"Error in using CashedDatabaseExampleSourse"

Anki · June 2011

Hi..This is Anki.....I am new to this community.

I am using SQL Server 2005 (Microsoft) as Database in local system. Because of the large database I am trying to access the data using
operator CashedDatabaseExampleSourse to work directly on database. All connections set correct, but the error is

Error in: CachedDatabaseExampleSource (CachedDatabaseExampleSource)
Database error occurred: Incorrect syntax near 'AUTO_INCREMENT'
-------------------------------------------------------------------------------------
The JDBC driver has thrown an SQLException. This may because of a lack of privileges, wrong table name or url and similar problems. Please note that some databases are case sensitive. Details are given in the message.
What could be the reason? I have takencare of database name, table names, field names...
Please Help me

Yours
Anki

keith · July 2011

Which version of RapidMiner are you using? Does the error still occur when you use the non-cached read database operator (reading a subset of rows since your data is large)? In SQL Server you can limit the # of rows coming back using the TOP keyword in your SELECT statement:

SELECT TOP 1000 * FROM MYTABLE

Anki · July 2011

Hi Keith,

This error comes for me when I used version RM 4.5. So I have switched to Latest Ver 5. Here also same error is coming when I used "Stream Database". When I used "Read Database" in (Version 5) operater It is working properly. I have 4 millon examples in my table.

I am not getting exactly what to do. Can u please eloborate the solution.

Thank you Keith

Yours
Anki

keith · July 2011

Upgrading to RM version 5 will help since it's the latest and best supported.

All I was asking you to do was to check if the problem occurred with any database operator or just the Stream Database version. Since you had a lot of data, I suggested testing with a smaller subset of data (TOP 1000 rows, for example) rather than the whole 4 million. From your last response, it sounds like you have now tested with the regular version of the operator in RM 5, which worked, but the stream option still didn't . I don't have experience with the stream database option, so may not be of much more help.

Strange thing is that I don't think AUTO_INCREMENT is a valid SQL Server keyword (it uses IDENTITY for system generated unique columns). Are you specifying a query in the operator to retrieve the rows? If so, what is the SQL statement you are using? Is this a query that originated from another (non-SQL Server) database system? Does the query work if you run it directly against the database outside of RM?

Anki · July 2011

Hi Keith.

I am trying to read entire table from MS SQL Server 2005.

select * from CUST_TRANSACTIONS;

I have written query in query tab. and all the data in SQL Server only.

Now I am using RM 5.1 and In this version all things are working fine with regular "Database Read" operator. Where as the problem is only with "stream database" operator. I also tried by writing same query at SQL Server 2005 without RM, here it is working fine.

Anyhow i am continuing with normal "Database read" operator. but while I am trying to write (Append mode, because it is in Loop) into Database, It showing error after 3 Iterations. Here is the process.


<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<process version="5.1.006">
  <context>
    <input/>
    <output/>
    <macros/>
  </context>
  <operator activated="true" class="process" compatibility="5.1.006" expanded="true" name="Process">
    <process expanded="true" height="607" width="773">
      <operator activated="true" class="read_database" compatibility="5.1.006" expanded="true" height="60" name="Read Database" width="90" x="45" y="120">
        <parameter key="connection" value="anki"/>
        <parameter key="query" value="SELECT distinct &quot;CUST_ID&quot;&#13;, count(*) TRAN_CNT&#10;FROM &quot;CUST_TRAND&quot; group by CUST_ID"/>
        <enumeration key="parameters"/>
      </operator>
      <operator activated="true" class="filter_examples" compatibility="5.1.006" expanded="true" height="76" name="Filter Examples" width="90" x="166" y="101">
        <parameter key="condition_class" value="attribute_value_filter"/>
        <parameter key="parameter_string" value="TRAN_CNT&gt;6"/>
      </operator>
      <operator activated="true" class="loop_examples" compatibility="5.1.006" expanded="true" height="94" name="Loop Examples" width="90" x="380" y="75">
        <process expanded="true" height="607" width="773">
          <operator activated="true" class="extract_macro" compatibility="5.1.006" expanded="true" height="60" name="Extract Macro" width="90" x="45" y="30">
            <parameter key="macro" value="customer_id"/>
            <parameter key="macro_type" value="data_value"/>
            <parameter key="attribute_name" value="CUST_ID"/>
            <parameter key="example_index" value="%{example}"/>
          </operator>
          <operator activated="true" class="extract_macro" compatibility="5.1.006" expanded="true" height="60" name="Extract Macro (2)" width="90" x="179" y="30">
            <parameter key="macro" value="tranc"/>
            <parameter key="macro_type" value="data_value"/>
            <parameter key="attribute_name" value="TRAN_CNT"/>
            <parameter key="example_index" value="%{example}"/>
          </operator>
          <operator activated="true" class="generate_macro" compatibility="5.1.006" expanded="true" height="76" name="Generate Macro" width="90" x="380" y="30">
            <list key="function_descriptions">
              <parameter key="outlier_count" value="ceil(%{tranc} * 0.2)"/>
            </list>
          </operator>
          <operator activated="true" class="read_database" compatibility="5.1.006" expanded="true" height="60" name="Read Database (2)" width="90" x="45" y="210">
            <parameter key="connection" value="anki"/>
            <parameter key="query" value="SELECT *&#13;&#10;FROM &quot;CUST_TRAND&quot;&#13;&#10;WHERE CUST_ID=%{customer_id}"/>
            <enumeration key="parameters"/>
          </operator>
          <operator activated="true" class="work_on_subset" compatibility="5.1.006" expanded="true" height="76" name="Work on Subset" width="90" x="179" y="210">
            <parameter key="attribute_filter_type" value="subset"/>
            <parameter key="attributes" value="|TRAN_AMT"/>
            <process expanded="true">
              <connect from_port="exampleSet" to_port="example set"/>
              <portSpacing port="source_exampleSet" spacing="0"/>
              <portSpacing port="sink_example set" spacing="0"/>
              <portSpacing port="sink_through 1" spacing="0"/>
            </process>
          </operator>
          <operator activated="true" breakpoints="after" class="detect_outlier_distances" compatibility="5.1.006" expanded="true" height="76" name="Detect Outlier (Distances)" width="90" x="313" y="210">
            <parameter key="number_of_neighbors" value="3"/>
            <parameter key="number_of_outliers" value="%{outlier_count}"/>
          </operator>
          <operator activated="true" class="write_database" compatibility="5.1.006" expanded="true" height="60" name="Write Database" width="90" x="447" y="210">
            <parameter key="connection" value="anki"/>
            <parameter key="table_name" value="newone"/>
            <parameter key="overwrite_mode" value="append"/>
          </operator>
          <connect from_port="example set" to_op="Extract Macro" to_port="example set"/>
          <connect from_op="Extract Macro" from_port="example set" to_op="Extract Macro (2)" to_port="example set"/>
          <connect from_op="Extract Macro (2)" from_port="example set" to_op="Generate Macro" to_port="through 1"/>
          <connect from_op="Generate Macro" from_port="through 1" to_port="example set"/>
          <connect from_op="Read Database (2)" from_port="output" to_op="Work on Subset" to_port="example set"/>
          <connect from_op="Work on Subset" from_port="example set" to_op="Detect Outlier (Distances)" to_port="example set input"/>
          <connect from_op="Detect Outlier (Distances)" from_port="example set output" to_op="Write Database" to_port="input"/>
          <connect from_op="Write Database" from_port="through" to_port="output 1"/>
          <portSpacing port="source_example set" spacing="18"/>
          <portSpacing port="sink_example set" spacing="0"/>
          <portSpacing port="sink_output 1" spacing="72"/>
          <portSpacing port="sink_output 2" spacing="0"/>
        </process>
      </operator>
      <connect from_op="Read Database" from_port="output" to_op="Filter Examples" to_port="example set input"/>
      <connect from_op="Filter Examples" from_port="example set output" to_op="Loop Examples" to_port="example set"/>
      <connect from_op="Loop Examples" from_port="output 1" to_port="result 1"/>
      <portSpacing port="source_input 1" spacing="0"/>
      <portSpacing port="sink_result 1" spacing="0"/>
      <portSpacing port="sink_result 2" spacing="0"/>
    </process>
  </operator>
</process>

colo · July 2011

Hi Anki,

did you create the table structure prior to the use of "Write Database" in some database interface? Or is the table created by the RapidMiner operator? "Read Database" has an awful behavior when creating tables (see this old topic, still not working completely). When creating the table the length for varchar columns is set to the longest value present in your example set for this attribute. If you are writing from a loop, there is always only one value, so the column width is set to the length of the string from your first example. As soon as you try to store an example containing a longer value (although you didn't provide your error message, I assume this is the case for your third example), an error message appears since the data doesn't fit in the table column. "Read Database" offers the promising "set default varchar length" parameter, which could help to set a column length containing enough room for all values. But this option is simply ignored in my case (just tested it).
So there exist two possible workaraounds for this: You can either create the table structure and define all columns with their respective length before you execute your process (I use PHP-MyAdmin for my mySQL database) or you have to set all nominal attributes to text attributes. The text type doesn't require a fixed length when creating the database table. The first option needs more work but should be preferred if possible.

If this does not help you, please post the error message that appeared when trying to execute the process.

Regards
Matthias

Anki · July 2011

Hi Matthias,

I have created Table from SQL Server interface, and tried the same things.
Now It is running upto 432 Iterations and giving Erroe message as

The process would need more than maximum amount of availavle memory.......

where as the total number of Iterations I need to run are 920.
Please tell me where the problem is ...

Thank you

Yours
Anki

colo · July 2011

Hi Anki,

did you have a look at the memory consumption by RapidMiner? How much memory do you have available (in your system and for RapidMiner)? If you open the results perspective you will see the resource monitor (System Monitor) showing available memory (Max) and memory in usage (Total). Please check this and have an eye on the increasing usage when executing your process.

Regards
Matthias

Anki · July 2011

Hi Matthies,

IActully My RAM is 2GB. In system Monitor it is showing MAX : 1GB and Available: 1GB. As you said, when the process is going on memory consumption is increasing, and finally it is giving Error.

So for this any other solution except dividing the original data into sub parts and repeating?

Thank you

Yours
Anki

keith · July 2011

I'm still suspicious of the AUTO_INCREMENT error. Poking around in the sources, I found this snippet in com.rapidminer.example.table.IndexCachedDatabaseExampleTable.java

	
	/** Subclasses might want to override this method if they do not support auto_increment. */
	protected void createRMPrimaryKeyIndex(DatabaseHandler databaseHandler, String tableName) throws SQLException {
		String addKeyQuery = 
			"ALTER TABLE " + databaseHandler.getStatementCreator().makeIdentifier(tableName)+
			" ADD " + databaseHandler.getStatementCreator().makeIdentifier(INDEX_COLUMN_NAME)+ 
			" INT NOT NULL AUTO_INCREMENT PRIMARY KEY";

		Statement statement = databaseHandler.createStatement(true);
		statement.execute(addKeyQuery);
		statement.close();
	}

But it's not clear to me that there's an override for SQL Server elsewhere in the code as suggested might be needed by the code comment. Since AUTO_INCREMENT isn't supported by SQL Server (they use IDENTITY for similar functionality), this could be the source of the problem?

Keith

colo · July 2011

Hi Anki,

I never used outlier detection operators and don't know how much memory these calculations require. But one thing seems suspicious in your process. You included "Work on Subset" which makes me think, that you want to find outliers just for the attributes specified for this operator!? But the way you connected it to your process, it's absolutely useless and doesn't have any effect. If you really intended to work only on the chosen attribute subset, you have to put "Detect Outliers" inside the "Work on Subset" operator. Perhaps using all attributes may be a reason for the high memory consumption?

Another thing seemed strange to me. What exactly do you achieve by using COUNT() and GROUP BY in this query (just being nosy):

SELECT distinct "CUST_ID", count(*) TRAN_CNT FROM "CUST_TRAND" group by CUST_ID

Best regards
Matthias

Anki · July 2011

Hi Matthias,

I want to detect outliers for the customers only if they made more than 6 transactions. For that purpose I have written the query.
And also I am using TRAN_CNT value for writing the macro inside the loop, and 20% of TRAN_CNT is input to the "number_of_outlier" parameter in outlier detection.

I want to give only TRAN_AMT as input for "Outlier Detection" thats why I have choosen that operator "work on subset" .
I think I may made some silly mistakes, because I am not much familier with this operators.

And also I have observed one thing that ...the process stoped in the inner loop while processing the number of transactions count is 4325 for one customer
I think It is unable to load 4325 transactions to work on. Am I talking sence Matthias

Please suggest me Any changes required...

Thank you

Yours
Anki

Anki · July 2011

Hi Matthies,

Here is the process.

<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<process version="5.1.006">
  <context>
    <input/>
    <output/>
    <macros/>
  </context>
  <operator activated="true" class="process" compatibility="5.1.006" expanded="true" name="Process">
    <process expanded="true" height="607" width="773">
      <operator activated="true" class="read_database" compatibility="5.1.006" expanded="true" height="60" name="Read Database" width="90" x="45" y="75">
        <parameter key="connection" value="anki"/>
        <parameter key="query" value="SELECT distinct &quot;CUST_ID&quot;&#13;, count(*) TRAN_CNT&#10;FROM &quot;CUST_TRAND&quot; group by CUST_ID"/>
        <enumeration key="parameters"/>
      </operator>
      <operator activated="true" class="filter_examples" compatibility="5.1.006" expanded="true" height="76" name="Filter Examples" width="90" x="184" y="74">
        <parameter key="condition_class" value="attribute_value_filter"/>
        <parameter key="parameter_string" value="TRAN_CNT&gt;6"/>
      </operator>
      <operator activated="true" class="loop_examples" compatibility="5.1.006" expanded="true" height="94" name="Loop Examples" width="90" x="380" y="75">
        <process expanded="true" height="607" width="773">
          <operator activated="true" class="extract_macro" compatibility="5.1.006" expanded="true" height="60" name="Extract Macro" width="90" x="45" y="30">
            <parameter key="macro" value="customer_id"/>
            <parameter key="macro_type" value="data_value"/>
            <parameter key="attribute_name" value="CUST_ID"/>
            <parameter key="example_index" value="%{example}"/>
          </operator>
          <operator activated="true" class="extract_macro" compatibility="5.1.006" expanded="true" height="60" name="Extract Macro (2)" width="90" x="167" y="29">
            <parameter key="macro" value="tranc"/>
            <parameter key="macro_type" value="data_value"/>
            <parameter key="attribute_name" value="TRAN_CNT"/>
            <parameter key="example_index" value="%{example}"/>
          </operator>
          <operator activated="true" class="generate_macro" compatibility="5.1.006" expanded="true" height="76" name="Generate Macro" width="90" x="292" y="29">
            <list key="function_descriptions">
              <parameter key="outlierc" value="ceil(%{tranc} * 0.2)"/>
            </list>
          </operator>
          <operator activated="true" class="read_database" compatibility="5.1.006" expanded="true" height="60" name="Read Database (2)" width="90" x="45" y="210">
            <parameter key="connection" value="anki"/>
            <parameter key="query" value="SELECT *&#13;&#10;FROM &quot;CUST_TRAND&quot;&#13;&#10;WHERE CUST_ID=%{customer_id}"/>
            <parameter key="table_name" value="CUST_TRAND"/>
            <parameter key="prepare_statement" value="true"/>
            <enumeration key="parameters"/>
          </operator>
          <operator activated="false" class="select_attributes" compatibility="5.1.006" expanded="true" height="76" name="Select Attributes" width="90" x="179" y="165">
            <parameter key="attribute_filter_type" value="subset"/>
          </operator>
          <operator activated="true" class="detect_outlier_distances" compatibility="5.1.006" expanded="true" height="76" name="Detect Outlier (2)" width="90" x="313" y="210">
            <parameter key="number_of_neighbors" value="3"/>
            <parameter key="number_of_outliers" value="%{outlierc}"/>
          </operator>
          <operator activated="true" class="write_database" compatibility="5.1.006" expanded="true" height="60" name="Write Database (2)" width="90" x="447" y="120">
            <parameter key="connection" value="anki"/>
            <parameter key="table_name" value="outnew"/>
            <parameter key="overwrite_mode" value="append"/>
          </operator>
          <operator activated="true" class="free_memory" compatibility="5.1.006" expanded="true" height="76" name="Free Memory (2)" width="90" x="581" y="210"/>
          <connect from_port="example set" to_op="Extract Macro" to_port="example set"/>
          <connect from_op="Extract Macro" from_port="example set" to_op="Extract Macro (2)" to_port="example set"/>
          <connect from_op="Extract Macro (2)" from_port="example set" to_op="Generate Macro" to_port="through 1"/>
          <connect from_op="Generate Macro" from_port="through 1" to_port="example set"/>
          <connect from_op="Read Database (2)" from_port="output" to_op="Detect Outlier (2)" to_port="example set input"/>
          <connect from_op="Detect Outlier (2)" from_port="example set output" to_op="Write Database (2)" to_port="input"/>
          <connect from_op="Write Database (2)" from_port="through" to_op="Free Memory (2)" to_port="through 1"/>
          <connect from_op="Free Memory (2)" from_port="through 1" to_port="output 1"/>
          <portSpacing port="source_example set" spacing="0"/>
          <portSpacing port="sink_example set" spacing="0"/>
          <portSpacing port="sink_output 1" spacing="72"/>
          <portSpacing port="sink_output 2" spacing="0"/>
        </process>
      </operator>
      <connect from_op="Read Database" from_port="output" to_op="Filter Examples" to_port="example set input"/>
      <connect from_op="Filter Examples" from_port="example set output" to_op="Loop Examples" to_port="example set"/>
      <connect from_op="Loop Examples" from_port="output 1" to_port="result 1"/>
      <portSpacing port="source_input 1" spacing="0"/>
      <portSpacing port="sink_result 1" spacing="0"/>
      <portSpacing port="sink_result 2" spacing="0"/>
    </process>
  </operator>
</process>

Thank you

Yours
Anki

colo · July 2011

Hi Anki,

As I said, I am not familiar with outlier detection, but I guess the operator uses all regular attributes as the different models do. So you either have to load only the relevant attribute (or filter this one activating "Select Attributes"). It should also be possible to use "Work on Subset", but then you have to put the outlier operator inside it. If you use "Select Attributes" perhaps some attributes are lost, that are required later when writing back to database. To avoid this, it should be possible not to filter them out but to set them to some special role. Use the "Set Role" operator and assign some non-standard role, such as "keep_attribute". This way outlier detection should not consider them (and use lots of memory for unwanted calculations). If this won't help, someone with experience in using outlier analysis may probably be able to provide better suggestions.

Regards
Matthias

Edit: Oops, I misinterpreted your first database query. I thought TRAN_CNT was already contained in the database. But after looking over it once again, this query of course makes sense, sorry.

Anki · July 2011

Hi Matthias,

Sorry that I did not tell you, I did not have TRAN_CNT in my database and because of that I am calculating while reading into database only.
Any how I followed the steps taken from your reply. Now I am able run the process perfectly for all the cases except if number of transactions more than 4000. Any how I have only 2 cases lke that. Now I kept those case aside.

Everything is fine. I will think of the cases which more examples.
Thank you once again Matthias for your continuous support.

Yours
Anki

Howdy, Stranger!

Quick Links

Categories

Altair RapidMiner Community

GET HELP. LEARN BEST PRACTICES. NETWORK WITH YOUR PEERS.

"Error in using CashedDatabaseExampleSourse"

Answers