RapidMiner

RapidMiner

DatabaseExampleSource returns only 10 rows

DatabaseExampleSource returns only 10 rows

Hello,
I have a problem with the DatabaseExampleSource in combination with Oracle and rapidminer version 4.1.

My query that is executed should return over 100.000 data rows. But all i get within rapidminer are the first 10 rows. This matches (maybe) the default fetch size of the ODBC driver? So it seems that the first 10 rows will be fetched and afterwards the operator terminates...

Sometimes I get the following error message:
Jun 24, 2008 8:49:46 AM: [Error] While reading examples from result set: operation not allowed: Unsupported syntax for refreshRow()

Btw: when i switch to version 4.0 everything works fine...

Any suggestions what I'm doing wrong?

greetz
ajay
3 REPLIES
Moderator

Re: DatabaseExampleSource returns only 10 rows

Hi,

I have not yet an idea, what could be the problem here. Does the error show up, when you only try to read the data via the [tt]DatabaseExamplesSource[/tt] operator? Or does your process involve other operators? What settings do you use in the example source operator? Maybe you can post your process XML?

Regards,
Tobias

Re: DatabaseExampleSource returns only 10 rows

ok, I tried it with and without further operators, but this does not make any difference.
still it isn't a real error - it just does not retrieve all data rows Smiley Happy

here are some more details:

rapidminer 4.0

Process XML:
<operator name="Root" class="Process">
    <operator name="DatabaseExampleSource" class="DatabaseExampleSource">
        <parameter key="database_system" value="Oracle"/>
        <parameter key="database_url" value="jdbc:oracle:thin:@*******:1521:***"/>
        <parameter key="password" value="***"/>
        <parameter key="query" value="SELECT JH_JOBNUMBER, DEVFROMMEAN FROM (  SELECT JH_JOBNUMBER, (CD_VALUE-AVG(CD_VALUE) OVER(PARTITION BY JH_JOBNUMBER, FEATURE_SIZE, ORIENTATION)) AS DEVFROMMEAN  FROM (  SELECT JH_JOBNUMBER, CD_FEATURENAME,    CASE    WHEN CD_FEATURENAME LIKE &#39;%280%&#39; THEN &#39;280&#39;    WHEN CD_FEATURENAME LIKE &#39;%400%&#39; THEN &#39;400&#39;  END AS FEATURE_SIZE,  CASE    WHEN CD_FEATURENAME LIKE &#39;%X%&#39; THEN &#39;X&#39;    WHEN CD_FEATURENAME LIKE &#39;%Y%&#39; THEN &#39;Y&#39;  END AS ORIENTATION,  CD_VALUE  FROM DIM_GENERAL_JOBHEADER header, DIM_CDFEATURE feature, FACT_CDRAWANDLER fact  WHERE header.DIM_JH_SK = fact.DIM_JH_SK  AND feature.DIM_CDFEATURE_SK = fact.DIM_CDFEATURE_SK  AND CD_FEATURENAME LIKE &#39;%_GCD_%&#39;  ORDER BY JH_JOBNUMBER, FEATURE_SIZE, ORIENTATION, FLOOR((fact.CD_MEASUREMENTPOINTCOORDINATEY+76200)/10500) DESC, (fact.CD_MEASUREMENTPOINTCOORDINATEX+76200)  ) ) -- WHERE DEVFROMMEAN &gt; -30 AND DEVFROMMEAN &lt; 30"/>
        <parameter key="username" value="***"/>
    </operator>
</operator>


output at the console:
P Jun 24, 2008 9:54:32 PM: Initialising process setup
P Jun 24, 2008 9:54:32 PM: [NOTE] No filename given for result file, using stdout for logging results!
P Jun 24, 2008 9:54:32 PM: Checking properties...
P Jun 24, 2008 9:54:32 PM: Properties are ok.
P Jun 24, 2008 9:54:32 PM: Checking process setup...
P Jun 24, 2008 9:54:32 PM: Inner operators are ok.
P Jun 24, 2008 9:54:32 PM: Checking i/o classes...
P Jun 24, 2008 9:54:32 PM: i/o classes are ok. Process output: ExampleSet.
P Jun 24, 2008 9:54:32 PM: Process ok.
P Jun 24, 2008 9:54:32 PM: Process initialised
P Jun 24, 2008 9:54:32 PM: [NOTE] Process starts
P Jun 24, 2008 9:54:32 PM: Process:
  Root[0] (Process)
  +- DatabaseExampleSource[0] (DatabaseExampleSource)
P Jun 24, 2008 9:54:50 PM: [NOTE] Process finished after 18 seconds
P Jun 24, 2008 9:54:50 PM: Process:
  Root[1] (Process)
  +- DatabaseExampleSource[1] (DatabaseExampleSource)
P Jun 24, 2008 9:54:50 PM: Produced output:
IOContainer (1 objects):
SimpleExampleSet:
134424 examples,
2 regular attributes,
no special attributes
(created by DatabaseExampleSource)
P Jun 24, 2008 9:54:50 PM: [NOTE] Process finished successfully
G Jun 24, 2008 9:54:50 PM: [Error] Plotter: the given data contains missing values. Probably most plotters will not be able to produce proper visualizations. Please replace missing values beforehand if possible.
G Jun 24, 2008 9:54:50 PM: [Warning] Cannot plot all data points, using only a sample of 1000 rows.
G Jun 24, 2008 9:54:51 PM: [NOTE] Cannot use plotter 'Scatter 3D': Data table is not allowed to contain missing values. Please replace them beforehand.
G Jun 24, 2008 9:54:52 PM: [NOTE] Cannot use plotter 'Scatter 3D Color': Data table is not allowed to contain missing values. Please replace them beforehand.
G Jun 24, 2008 9:54:52 PM: [NOTE] Cannot use plotter 'SOM': Data table is not allowed to contain missing values. Please replace them beforehand.
G Jun 24, 2008 9:54:52 PM: [NOTE] Cannot use plotter 'Density': Data table is not allowed to contain missing values. Please replace them beforehand.
G Jun 24, 2008 9:54:52 PM: [NOTE] Cannot use plotter 'Sticks 3D': Data table is not allowed to contain missing values. Please replace them beforehand.
G Jun 24, 2008 9:54:52 PM: [NOTE] Cannot use plotter 'Box 3D': Data table is not allowed to contain missing values. Please replace them beforehand.
G Jun 24, 2008 9:54:52 PM: [NOTE] Cannot use plotter 'Surface 3D': Data table must have between 0 and 50 rows, was 1000.


-----------------------------------

rapidminer 4.1

Process XML:
<operator name="Root" class="Process" expanded="yes">
    <operator name="DatabaseExampleSource" class="DatabaseExampleSource">
        <parameter key="database_system" value="Oracle"/>
        <parameter key="database_url" value="jdbc:oracle:thin:@*******:1521:***"/>
        <parameter key="password" value="***"/>
        <parameter key="query" value="SELECT JH_JOBNUMBER, DEVFROMMEAN FROM (  SELECT JH_JOBNUMBER, (CD_VALUE-AVG(CD_VALUE) OVER(PARTITION BY JH_JOBNUMBER, FEATURE_SIZE, ORIENTATION)) AS DEVFROMMEAN  FROM (  SELECT JH_JOBNUMBER, CD_FEATURENAME,    CASE    WHEN CD_FEATURENAME LIKE &#39;%280%&#39; THEN &#39;280&#39;    WHEN CD_FEATURENAME LIKE &#39;%400%&#39; THEN &#39;400&#39;  END AS FEATURE_SIZE,  CASE    WHEN CD_FEATURENAME LIKE &#39;%X%&#39; THEN &#39;X&#39;    WHEN CD_FEATURENAME LIKE &#39;%Y%&#39; THEN &#39;Y&#39;  END AS ORIENTATION,  CD_VALUE  FROM DIM_GENERAL_JOBHEADER header, DIM_CDFEATURE feature, FACT_CDRAWANDLER fact  WHERE header.DIM_JH_SK = fact.DIM_JH_SK  AND feature.DIM_CDFEATURE_SK = fact.DIM_CDFEATURE_SK  AND CD_FEATURENAME LIKE &#39;%_GCD_%&#39;  ORDER BY JH_JOBNUMBER, FEATURE_SIZE, ORIENTATION, FLOOR((fact.CD_MEASUREMENTPOINTCOORDINATEY+76200)/10500) DESC, (fact.CD_MEASUREMENTPOINTCOORDINATEX+76200)  ) ) -- WHERE DEVFROMMEAN &gt; -30 AND DEVFROMMEAN &lt; 30"/>
        <parameter key="username" value="***"/>
    </operator>
</operator>


output at the console:
P Jun 24, 2008 10:15:23 PM: Initialising process setup
P Jun 24, 2008 10:15:23 PM: [NOTE] No filename given for result file, using stdout for logging results!
P Jun 24, 2008 10:15:23 PM: Checking properties...
P Jun 24, 2008 10:15:23 PM: Properties are ok.
P Jun 24, 2008 10:15:23 PM: Checking process setup...
P Jun 24, 2008 10:15:23 PM: Inner operators are ok.
P Jun 24, 2008 10:15:23 PM: Checking i/o classes...
P Jun 24, 2008 10:15:23 PM: i/o classes are ok. Process output: ExampleSet.
P Jun 24, 2008 10:15:23 PM: Process ok.
P Jun 24, 2008 10:15:23 PM: Process initialised
P Jun 24, 2008 10:15:23 PM: [NOTE] Process starts
P Jun 24, 2008 10:15:23 PM: Process:
  Root[0] (Process)
  +- DatabaseExampleSource[0] (DatabaseExampleSource)
G Jun 24, 2008 10:15:28 PM: [Error] While reading examples from result set: operation not allowed: Unsupported syntax for refreshRow()
P Jun 24, 2008 10:15:28 PM: [NOTE] Process finished after 5 seconds
P Jun 24, 2008 10:15:28 PM: Process:
  Root[1] (Process)
  +- DatabaseExampleSource[1] (DatabaseExampleSource)
P Jun 24, 2008 10:15:28 PM: Produced output:
IOContainer (1 objects):
SimpleExampleSet:
10 examples,
2 regular attributes,
no special attributes
(created by DatabaseExampleSource)
P Jun 24, 2008 10:15:28 PM: [NOTE] Process finished successfully


when i execute the process for a second time, the error message disappear..
P Jun 24, 2008 10:15:59 PM: Initialising process setup
P Jun 24, 2008 10:15:59 PM: [NOTE] No filename given for result file, using stdout for logging results!
P Jun 24, 2008 10:15:59 PM: Checking properties...
P Jun 24, 2008 10:15:59 PM: Properties are ok.
P Jun 24, 2008 10:15:59 PM: Checking process setup...
P Jun 24, 2008 10:15:59 PM: Inner operators are ok.
P Jun 24, 2008 10:15:59 PM: Checking i/o classes...
P Jun 24, 2008 10:15:59 PM: i/o classes are ok. Process output: ExampleSet.
P Jun 24, 2008 10:15:59 PM: Process ok.
P Jun 24, 2008 10:15:59 PM: Process initialised
P Jun 24, 2008 10:15:59 PM: [NOTE] Process starts
P Jun 24, 2008 10:15:59 PM: Process:
  Root[1] (Process)
  +- DatabaseExampleSource[1] (DatabaseExampleSource)
P Jun 24, 2008 10:16:06 PM: [NOTE] Process finished after 6 seconds
P Jun 24, 2008 10:16:06 PM: Process:
  Root[1] (Process)
  +- DatabaseExampleSource[1] (DatabaseExampleSource)
P Jun 24, 2008 10:16:06 PM: Produced output:
IOContainer (1 objects):
SimpleExampleSet:
10 examples,
2 regular attributes,
no special attributes
(created by DatabaseExampleSource)
P Jun 24, 2008 10:16:06 PM: [NOTE] Process finished successfully


I use the latest java jdk 1.5 with the ojdbc driver for java 1.4/1.5 (ojdbc14.jar).
Hope this helps a little ?!

-ajay-
RMStaff

Re: DatabaseExampleSource returns only 10 rows

Hello,

hmm, it could be that there is a problem with the Oracle JDBC driver and order / grouping statements:

http://www.orafaq.com/forum/?t=msg&;th=34929/0/

and

http://forum.java.sun.com/thread.jspa?threadID=272795&;messageID=1282624

Maybe these links and the suggested workaround could help? Although I would not yet have an idea why it worked in RM 4.0 then. I try to check what differs between both version with respect to database access.

Cheers,
Ingo