"How can I Iterate by dividing the database"

AnkiAnki Member Posts: 39  Maven
edited May 23 in Help
Hi all,

I am new to RM. I have a database (SQL Server) with 1 millon bank transactions of 1400 customers. I want to find out outliers (In case of Transaction amount) customerwise. I read all data usung batch processing into Database in RM.

Can anyone please suggest me, how can I get transactions data customer wise and Iterate the process for each customer.

Thank you 
Tagged:

Answers

  • colocolo Member Posts: 236  Guru
    Hi Anki,

    how are the customers identified in your database? Do they have some sort of ID? If so, you can simply loop over these IDs either using a list of all customer IDs (maybe generated by a simple SQL query) or using the "Loop Values" operator for the ID attribute (if you loaded the entire database to RapidMiner). Inside this loop you can use "Filter Examples" to consider only the transactions from this customer and then start your outlier detection.
    I would prefer just loading the data from a single customer by defining a proper query when importing the data ("Read Database" operator allows this). In this case you don't need to filter the examples and start analysing the data instantly.

    This is just a simple way you could iterate over all of the customers...

    Regards
    Matthias
  • AnkiAnki Member Posts: 39  Maven
    Hi Matthias,

    Thank you verymuch for your reply. Your solution exactly as per my requirements.
    Customer Ids are like this   

    CUST_ID                TRAN_ID                        TRAN_AMT
    ------------------------------------------------------------------
    25051610              TM2134                            5400
    25051610                TT3245                            12390
    25051610                CD9807                            120
    25051610                CD235                              1298011
    ......
    25051610                CF9087                          12

    833282731                  ...........                          ..........
    ......
    319021623

    From your reply I got as ..First connect RM to read Database, Write Query ( Seclect ............Where Cust_ID=25051610).  Outliers belong to that customer.
    After that How can I repeat for every customer? This question may be silly, But please help me.....Because I am new to RM...I am unable to find the process to repeat.
    Please help me

    Thank you once again Matthias
  • colocolo Member Posts: 236  Guru
    Hi Anki,

    if you don't want to load the whole table from database at once, you can use two "Read Database" operators. I would try it this way: first "Read Database" delivers all customer IDs (SELECT DISTINCT Cust_ID ...). Then you can use the "Loop Examples" operator to execute the embedded process of this operator for every row/customer from your list. You will need to extract the current ID via "Extract Macro" and start a second database query ("Read Database" again) using the macro for the customer ID to filter the desired entries (WHERE clause).

    Regards
    Matthias
  • AnkiAnki Member Posts: 39  Maven
    Hi Matthias,

    The solution is seems to be simple but I have struggled for whole day but I am unable to solve this, becuse I am new to RM and not so familier with these operators.
    Can you please give me an example process to solve this, so that It will help me a lot.

    Thank you Matthias

    Yours
    Anki
  • colocolo Member Posts: 236  Guru
    Hi Anki,

    just built a quick example without testing it, since I don't have the time to create some data. I described a process like this:
    <?xml version="1.0" encoding="UTF-8" standalone="no"?>
    <process version="5.1.008">
      <context>
        <input/>
        <output/>
        <macros/>
      </context>
      <operator activated="true" class="process" compatibility="5.1.008" expanded="true" name="Process">
        <process expanded="true" height="607" width="773">
          <operator activated="true" class="read_database" compatibility="5.1.008" expanded="true" height="60" name="Read Database" width="90" x="45" y="30">
            <parameter key="connection" value="localDB"/>
            <parameter key="query" value="SELECT DISTINCT `Cust_ID`&#13;&#10;FROM `tabellenname`"/>
            <enumeration key="parameters"/>
          </operator>
          <operator activated="true" class="loop_examples" compatibility="5.1.008" expanded="true" height="94" name="Loop Examples" width="90" x="179" y="30">
            <process expanded="true" height="607" width="773">
              <operator activated="true" class="extract_macro" compatibility="5.1.008" expanded="true" height="60" name="Extract Macro" width="90" x="45" y="30">
                <parameter key="macro" value="customer_id"/>
                <parameter key="macro_type" value="data_value"/>
                <parameter key="attribute_name" value="Cust_ID"/>
                <parameter key="example_index" value="%{example}"/>
              </operator>
              <operator activated="true" class="read_database" compatibility="5.1.008" expanded="true" height="60" name="Read Database (2)" width="90" x="45" y="120">
                <parameter key="connection" value="localDB"/>
                <parameter key="query" value="SELECT *&#13;&#10;FROM `tabellenname`&#13;&#10;WHERE `Cust_ID`=%{customer_id}"/>
                <enumeration key="parameters"/>
              </operator>
              <connect from_port="example set" to_op="Extract Macro" to_port="example set"/>
              <connect from_op="Extract Macro" from_port="example set" to_port="example set"/>
              <connect from_op="Read Database (2)" from_port="output" to_port="output 1"/>
              <portSpacing port="source_example set" spacing="0"/>
              <portSpacing port="sink_example set" spacing="0"/>
              <portSpacing port="sink_output 1" spacing="72"/>
              <portSpacing port="sink_output 2" spacing="0"/>
            </process>
          </operator>
          <connect from_op="Read Database" from_port="output" to_op="Loop Examples" to_port="example set"/>
          <connect from_op="Loop Examples" from_port="output 1" to_port="result 1"/>
          <portSpacing port="source_input 1" spacing="0"/>
          <portSpacing port="sink_result 1" spacing="0"/>
          <portSpacing port="sink_result 2" spacing="0"/>
        </process>
      </operator>
    </process>
    Your outlier detection has to be applied after the inner "Read Database" operator inside "Loop Examples". If you don't write back to database inside the loop, you will have to append the multiple example sets afterwards. But the "Append" operator is very memory consuming, so I would prefer incremental database storage.

    Regards
    Matthias
  • AnkiAnki Member Posts: 39  Maven
    Hi Matthias,

    Thank you verymuch. It is workig perfectly.

    Thanks again for your valuable time

    Yours
    Anki
  • AnkiAnki Member Posts: 39  Maven
    Hi Matthies,

    Thanks a lot for your continues support for me in solving this problem.
    Here while I am trying to append the output into Database ( with write Database operator ), it showing error after 4 iterations (I checked using break point)
    The error is like
    Database error occured: Data Truncation
    Error may be because of lack of previlages....

    It is working fine when I tried to write into CSV files ( into different file each iteration time).
    What could be the reason...

    Thank you Matthias

    Yours
    Anki

Sign In or Register to comment.