
Data Preprocessing - Urgently need help

fdhhaki Member Posts: 6 Contributor II
Hello everybody,

I want to transform the data given below, but I don't know how to do it in RapidMiner.

Given data:


Client Date        Product
Client 1 14.02.1980 Product 1
Client 1 14.02.1980 Product 2
Client 1 14.02.1980 Product 3
Client 1 14.02.1980 Product 4
Client 1 14.02.1980 Product 5
Client 1 13.02.1934 Product 1
Client 1 13.02.1934 Product 2
Client 1 13.02.1934 Product 3
Client 1 13.02.1934 Product 4
Client 3 14.02.1934 Product 1
Client 3 14.02.1934 Product 2
Client 3 14.02.1934 Product 3
Client 4 15.02.1934 Product 1
Client 4 15.02.1934 Product 2
Client 5 16.02.1934 Product 1


This is what I want:

Client    Date        Product 1  Product 2  Product 3  Product 4  Product 5
Client 1  14.02.1980  1          1          1          1          1
Client 1  13.02.1934  1          1          1          1          0
Client 3  14.02.1934  1          1          1          0          0
Client 4  15.02.1934  1          1          0          0          0
Client 5  16.02.1934  1          0          0          0          0




I would be very grateful if someone could help me. It is very urgent and important.

Greetings!

Answers

  • el_chief Member Posts: 63 Contributor II
    Use the Nominal to Binominal operator.

    You will find it under Data Transformation, Type Conversion.
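
    For readers who want to see what this kind of conversion does to the Product column, here is a minimal sketch in plain Python (not RapidMiner). It assumes the usual behaviour of such an encoding: one true/false column per distinct value. The function name `one_hot` is only an illustration.

```python
def one_hot(values):
    """Return (categories, rows): one true/false flag per distinct value."""
    # Every distinct value becomes its own column.
    categories = sorted(set(values))
    # A row's flag is True only in the column that matches its value.
    rows = [[value == category for category in categories] for value in values]
    return categories, rows


# A few Product values taken from the sample data in this thread.
products = ["Product 1", "Product 2", "Product 3", "Product 1"]
categories, rows = one_hot(products)

for product, flags in zip(products, rows):
    print(product, flags)
```

    Each input row still produces exactly one output row, so this step alone does not yet give one row per client and date.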
  • fdhhaki Member Posts: 6 Contributor II
    Hi,

    thank you for your answer.
    Yes, I can use it for type conversion.

    But how can I get this table structure?

    For example: if a client buys some articles on 14.02.1980, that represents one data set.
    If the same client buys articles on another day, that represents another data set, and so on.

    Can you show me an example workflow in RapidMiner?

    Thanks!
  • haddock Member Posts: 849 Maven
    Hi there,

    I think this may help..

    Here's the data...

    Client,  Date,            Product
    Client 1,  14.02.1980,  Product 1
    Client 1,  14.02.1980,  Product 2
    Client 1,  14.02.1980,  Product 3
    Client 1,  14.02.1980,  Product 4
    Client 1,  14.02.1980,  Product 5
    Client 1,  13.02.1934,  Product 1
    Client 1,  13.02.1934,  Product 2
    Client 1,  13.02.1934,  Product 3
    Client 1,  13.02.1934,  Product 4
    Client 3,  14.02.1934,  Product 1
    Client 3,  14.02.1934,  Product 2
    Client 3,  14.02.1934,  Product 3
    Client 4,  15.02.1934,  Product 1
    Client 4,  15.02.1934,  Product 2
    Client 5,  16.02.1934,  Product 1

    And here's the code..

    <?xml version="1.0" encoding="UTF-8" standalone="no"?>
    <process version="5.0">
      <context>
        <input/>
        <output/>
        <macros/>
      </context>
      <operator activated="true" class="process" compatibility="5.0.11" expanded="true" name="Process">
        <process expanded="true" height="251" width="748">
          <operator activated="true" class="read_csv" compatibility="5.0.11" expanded="true" height="60" name="Read CSV" width="90" x="46" y="58">
            <parameter key="file_name" value="C:\Documents and Settings\Administrator.KNOWLEDG-P6715Y\My Documents\RM5\a.csv"/>
            <parameter key="column_separators" value=","/>
            <list key="data_set_meta_data_information"/>
          </operator>
          <operator activated="true" class="nominal_to_binominal" compatibility="5.0.11" expanded="true" height="94" name="Nominal to Binominal" width="90" x="179" y="30">
            <parameter key="attribute_filter_type" value="single"/>
            <parameter key="attribute" value="Product"/>
            <parameter key="use_underscore_in_name" value="true"/>
          </operator>
          <operator activated="true" class="nominal_to_numerical" compatibility="5.0.11" expanded="true" height="94" name="Nominal to Numerical" width="90" x="313" y="30">
            <parameter key="attribute_filter_type" value="regular_expression"/>
            <parameter key="regular_expression" value="Pr.*"/>
          </operator>
          <operator activated="true" class="aggregate" compatibility="5.0.11" expanded="true" height="76" name="Aggregate" width="90" x="447" y="30">
            <list key="aggregation_attributes">
              <parameter key="Product_Product 1" value="sum"/>
              <parameter key="Product_Product 2" value="sum"/>
              <parameter key="Product_Product 3" value="sum"/>
              <parameter key="Product_Product 4" value="sum"/>
              <parameter key="Product_Product 5" value="sum"/>
            </list>
            <parameter key="group_by_attributes" value="Client|Date"/>
          </operator>
          <operator activated="true" class="rename_by_replacing" compatibility="5.0.11" expanded="true" height="76" name="Rename by Replacing" width="90" x="581" y="30">
            <parameter key="replace_what" value="sum\(Product_|\)"/>
          </operator>
          <connect from_op="Read CSV" from_port="output" to_op="Nominal to Binominal" to_port="example set input"/>
          <connect from_op="Nominal to Binominal" from_port="example set output" to_op="Nominal to Numerical" to_port="example set input"/>
          <connect from_op="Nominal to Numerical" from_port="example set output" to_op="Aggregate" to_port="example set input"/>
          <connect from_op="Aggregate" from_port="example set output" to_op="Rename by Replacing" to_port="example set input"/>
          <connect from_op="Rename by Replacing" from_port="example set output" to_port="result 1"/>
          <portSpacing port="source_input 1" spacing="0"/>
          <portSpacing port="sink_result 1" spacing="0"/>
          <portSpacing port="sink_result 2" spacing="0"/>
        </process>
      </operator>
    </process>
    Hope this helps, and a good weekend to all!
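
    The grouping step in the process above (sum each product column per Client and Date) can be sketched in plain Python as follows. This is only an illustration of the logic, not RapidMiner code; the function name `count_per_transaction` is made up for the example, and the rows are a subset of the sample data.

```python
from collections import defaultdict

# (Client, Date, Product) rows: a subset of the sample data in this thread.
rows = [
    ("Client 3", "14.02.1934", "Product 1"),
    ("Client 3", "14.02.1934", "Product 2"),
    ("Client 3", "14.02.1934", "Product 3"),
    ("Client 4", "15.02.1934", "Product 1"),
    ("Client 4", "15.02.1934", "Product 2"),
    ("Client 5", "16.02.1934", "Product 1"),
]


def count_per_transaction(rows):
    """Count how often each product occurs per (client, date) pair."""
    products = sorted({product for _, _, product in rows})
    # Every new (client, date) key starts with a zero count for each product.
    counts = defaultdict(lambda: dict.fromkeys(products, 0))
    for client, date, product in rows:
        counts[(client, date)][product] += 1
    return products, dict(counts)


products, counts = count_per_transaction(rows)
for (client, date), row in counts.items():
    print(client, date, [row[p] for p in products])
```

    Because the counts are sums, a product that appears more than once for the same client and date would get a value greater than 1.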

  • fdhhaki Member Posts: 6 Contributor II
    Good, thank you very much, haddock!

    Have a good weekend!
  • haddock Member Posts: 849 Maven
    My pleasure!

    Thanks for acknowledging, far too often folks don't bother to do that.

    Have fun!

    PS After I posted I thought it might be better to aggregate the data, here's how...
    <?xml version="1.0" encoding="UTF-8" standalone="no"?>
    <process version="5.0">
      <context>
        <input/>
        <output/>
        <macros/>
      </context>
      <operator activated="true" class="process" compatibility="5.0.11" expanded="true" name="Process">
        <process expanded="true" height="251" width="748">
          <operator activated="true" class="read_csv" compatibility="5.0.11" expanded="true" height="60" name="Read CSV" width="90" x="46" y="58">
            <parameter key="file_name" value="C:\Documents and Settings\Administrator.KNOWLEDG-P6715Y\My Documents\RM5\a.csv"/>
            <parameter key="column_separators" value=","/>
            <list key="data_set_meta_data_information"/>
          </operator>
          <operator activated="true" class="nominal_to_binominal" compatibility="5.0.11" expanded="true" height="94" name="Nominal to Binominal" width="90" x="179" y="30">
            <parameter key="attribute_filter_type" value="single"/>
            <parameter key="attribute" value="Product"/>
            <parameter key="use_underscore_in_name" value="true"/>
          </operator>
          <operator activated="true" class="nominal_to_numerical" compatibility="5.0.11" expanded="true" height="94" name="Nominal to Numerical" width="90" x="313" y="30">
            <parameter key="attribute_filter_type" value="regular_expression"/>
            <parameter key="regular_expression" value="Pr.*"/>
          </operator>
          <operator activated="true" class="aggregate" compatibility="5.0.11" expanded="true" height="76" name="Aggregate" width="90" x="447" y="30">
            <list key="aggregation_attributes">
              <parameter key="Product_Product 1" value="sum"/>
              <parameter key="Product_Product 2" value="sum"/>
              <parameter key="Product_Product 3" value="sum"/>
              <parameter key="Product_Product 4" value="sum"/>
              <parameter key="Product_Product 5" value="sum"/>
            </list>
            <parameter key="group_by_attributes" value="Client|Date"/>
          </operator>
          <operator activated="true" class="rename_by_replacing" compatibility="5.0.11" expanded="true" height="76" name="Rename by Replacing" width="90" x="581" y="30">
            <parameter key="replace_what" value="sum\(Product_|\)"/>
          </operator>
          <connect from_op="Read CSV" from_port="output" to_op="Nominal to Binominal" to_port="example set input"/>
          <connect from_op="Nominal to Binominal" from_port="example set output" to_op="Nominal to Numerical" to_port="example set input"/>
          <connect from_op="Nominal to Numerical" from_port="example set output" to_op="Aggregate" to_port="example set input"/>
          <connect from_op="Aggregate" from_port="example set output" to_op="Rename by Replacing" to_port="example set input"/>
          <connect from_op="Rename by Replacing" from_port="example set output" to_port="result 1"/>
          <portSpacing port="source_input 1" spacing="0"/>
          <portSpacing port="sink_result 1" spacing="0"/>
          <portSpacing port="sink_result 2" spacing="0"/>
        </process>
      </operator>
    </process>
  • fdhhaki Member Posts: 6 Contributor II
    Hello,
    the workflow works correctly, but there is a small problem.

    Here is the data:

    Client     Date          Product
    Client 1 14.02.1980 Product 1
    Client 1 14.02.1980 Product 1
    Client 1 14.02.1980 Product 1
    Client 1 15.02.1980 Product 1
    Client 1 14.02.1980 Product 3
    Client 1 14.02.1980 Product 4
    Client 1 14.02.1980 Product 5
    Client 1 13.02.1934 Product 1
    Client 1 13.02.1934 Product 2
    Client 1 13.02.1934 Product 3
    Client 1 13.02.1934 Product 4
    Client 3 14.02.1934 Product 1
    Client 3 14.02.1934 Product 2
    Client 3 14.02.1934 Product 3
    Client 4 15.02.1934 Product 1
    Client 4 15.02.1934 Product 2
    Client 5 16.02.1934 Product 1

    The output of the workflow is:
    Row  Client    Date        Prod.1  Prod.2  Prod.3  Prod.4  Prod.5
    1    Client 1  14.02.1980  3.0     0.0     1.0     1.0     1.0
    2    Client 1  15.02.1980  1.0     0.0     0.0     0.0     0.0
    3    Client 1  13.02.1934  1.0     1.0     1.0     1.0     0.0
    4    Client 3  14.02.1934  1.0     1.0     1.0     0.0     0.0
    5    Client 4  15.02.1934  1.0     1.0     0.0     0.0     0.0
    6    Client 5  16.02.1934  1.0     0.0     0.0     0.0     0.0


    It aggregates the number of products.

    What I want is to group by date (which is correct here), so that:

    1. If a customer buys several products on the same date, that is one data set (transaction) in the table, so I know which products were bought together. This is correct.

    2. The grouping by date is also correct, because if the same client buys on another day, that is a new row and a new transaction.

    MY PROBLEM IS:

    I don't want the sum per product. I just want a "1" for bought and a "0" for not bought.

    How can I do this, haddock? :)
  • haddock Member Posts: 849 Maven
    Hi there,

    You can add a discretizing operator, which puts values in bands, like this...
    <?xml version="1.0" encoding="UTF-8" standalone="no"?>
    <process version="5.0">
      <context>
        <input/>
        <output/>
        <macros/>
      </context>
      <operator activated="true" class="process" compatibility="5.0.11" expanded="true" name="Process">
        <process expanded="true" height="251" width="815">
          <operator activated="true" class="read_csv" compatibility="5.0.11" expanded="true" height="60" name="Read CSV" width="90" x="46" y="58">
            <parameter key="file_name" value="C:\Documents and Settings\Administrator.KNOWLEDG-P6715Y\My Documents\RM5\a.csv"/>
            <parameter key="column_separators" value=","/>
            <list key="data_set_meta_data_information"/>
          </operator>
          <operator activated="true" class="nominal_to_binominal" compatibility="5.0.11" expanded="true" height="94" name="Nominal to Binominal" width="90" x="179" y="30">
            <parameter key="attribute_filter_type" value="single"/>
            <parameter key="attribute" value="Product"/>
            <parameter key="use_underscore_in_name" value="true"/>
          </operator>
          <operator activated="true" class="nominal_to_numerical" compatibility="5.0.11" expanded="true" height="94" name="Nominal to Numerical" width="90" x="313" y="30">
            <parameter key="attribute_filter_type" value="regular_expression"/>
            <parameter key="regular_expression" value="Pr.*"/>
          </operator>
          <operator activated="true" class="aggregate" compatibility="5.0.11" expanded="true" height="76" name="Aggregate" width="90" x="447" y="30">
            <list key="aggregation_attributes">
              <parameter key="Product_Product 1" value="sum"/>
              <parameter key="Product_Product 2" value="sum"/>
              <parameter key="Product_Product 3" value="sum"/>
              <parameter key="Product_Product 4" value="sum"/>
              <parameter key="Product_Product 5" value="sum"/>
            </list>
            <parameter key="group_by_attributes" value="Client|Date"/>
          </operator>
          <operator activated="true" class="rename_by_replacing" compatibility="5.0.11" expanded="true" height="76" name="Rename by Replacing" width="90" x="581" y="30">
            <parameter key="replace_what" value="sum\(Product_|\)"/>
          </operator>
          <operator activated="true" class="discretize_by_user_specification" compatibility="5.0.11" expanded="true" height="94" name="Discretize" width="90" x="701" y="34">
            <list key="classes">
              <parameter key="0" value="0.0"/>
              <parameter key="1" value="Infinity"/>
            </list>
          </operator>
          <connect from_op="Read CSV" from_port="output" to_op="Nominal to Binominal" to_port="example set input"/>
          <connect from_op="Nominal to Binominal" from_port="example set output" to_op="Nominal to Numerical" to_port="example set input"/>
          <connect from_op="Nominal to Numerical" from_port="example set output" to_op="Aggregate" to_port="example set input"/>
          <connect from_op="Aggregate" from_port="example set output" to_op="Rename by Replacing" to_port="example set input"/>
          <connect from_op="Rename by Replacing" from_port="example set output" to_op="Discretize" to_port="example set input"/>
          <connect from_op="Discretize" from_port="example set output" to_port="result 1"/>
          <portSpacing port="source_input 1" spacing="0"/>
          <portSpacing port="sink_result 1" spacing="0"/>
          <portSpacing port="sink_result 2" spacing="0"/>
        </process>
      </operator>
    </process>
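
    The two bands in the Discretize step above (everything up to 0.0 becomes "0", everything above becomes "1") amount to a simple threshold. Here is a minimal plain-Python sketch of that idea, not RapidMiner code. The function name `to_flags` is made up for the example, the counts are the first two rows of the output reported earlier in this thread, and the sketch uses integers where the process above produces the class names "0" and "1".

```python
def to_flags(counts):
    """Turn per-transaction counts into bought (1) / not bought (0) flags."""
    return {
        key: {product: 1 if count > 0 else 0 for product, count in row.items()}
        for key, row in counts.items()
    }


# Counts per (Client, Date), taken from the first two output rows reported above.
counts = {
    ("Client 1", "14.02.1980"): {
        "Product 1": 3, "Product 2": 0, "Product 3": 1, "Product 4": 1, "Product 5": 1,
    },
    ("Client 1", "15.02.1980"): {
        "Product 1": 1, "Product 2": 0, "Product 3": 0, "Product 4": 0, "Product 5": 0,
    },
}

flags = to_flags(counts)
for (client, date), row in flags.items():
    print(client, date, list(row.values()))
```

    The count of 3 for Product 1 on 14.02.1980 becomes 1, and the zero counts stay 0.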