Create the right data out of a warehouse return dataset.

ferdinand_papa Member Posts: 7 Contributor I
Hello,

First of all, please excuse my bad English. The background to my question is that we have to write a thesis that includes a data analysis with RapidMiner.

I have a dataset with 20146 customers. The set includes about 60 attributes, but only 3 of them are relevant. Let me try to explain it this way: the whole dataset is about return rates in warehouse trade. In simple words, how many articles did a customer order, and how many of them did he return instead of keeping?
We were given the following thresholds: a return rate above 40% is high, a return rate below 18% is low, and the 22 percentage points in between are neither high nor low. So we have 3 different classes of customers, which are supposed to be labeled with the values H (high return rate), N (low return rate) and U (unidentified).

We have the customer number, the delivered amount of products and the returned amount of products for each customer. 

The OUTCOME data needs to be like that: <Customer Number>, <Class (H/N/U)>

230823, N
230824, H
230825, U

I managed to create a dataset that includes the customer number and the classes N and H, but I can't define the class that lies between high and low. I tried the if function, Generate Attributes, and so on. Another problem is that Generate Attributes only gives me true or false, which doesn't really help me.


Does anyone have an idea how to solve this? I am pretty desperate. I hope you can help me :)

regards


<?xml version="1.0" encoding="UTF-8"?><process version="9.1.000">

  <context>

    <input/>

    <output/>

    <macros/>

  </context>

  <operator activated="true" class="process" compatibility="9.1.000" expanded="true" name="Process">

    <parameter key="logverbosity" value="init"/>

    <parameter key="random_seed" value="2001"/>

    <parameter key="send_mail" value="never"/>

    <parameter key="notification_email" value=""/>

    <parameter key="process_duration_for_mail" value="30"/>

    <parameter key="encoding" value="SYSTEM"/>

    <process expanded="true">

      <operator activated="true" class="retrieve" compatibility="9.1.000" expanded="true" height="68" name="Retrieve retouren_train" width="90" x="45" y="34">

        <parameter key="repository_entry" value="//Local Repository/retouren_train"/>

      </operator>

      <operator activated="true" class="select_attributes" compatibility="9.1.000" expanded="true" height="82" name="Select Attributes" width="90" x="179" y="85">

        <parameter key="attribute_filter_type" value="subset"/>

        <parameter key="attribute" value=""/>

        <parameter key="attributes" value="RETOUREN_MENGE|LIEFER_MENGE|KDNR"/>

        <parameter key="use_except_expression" value="false"/>

        <parameter key="value_type" value="attribute_value"/>

        <parameter key="use_value_type_exception" value="false"/>

        <parameter key="except_value_type" value="time"/>

        <parameter key="block_type" value="attribute_block"/>

        <parameter key="use_block_type_exception" value="false"/>

        <parameter key="except_block_type" value="value_matrix_row_start"/>

        <parameter key="invert_selection" value="false"/>

        <parameter key="include_special_attributes" value="false"/>

      </operator>

      <operator activated="true" class="generate_attributes" compatibility="9.1.000" expanded="true" height="82" name="Generate Attributes" width="90" x="313" y="85">

        <list key="function_descriptions">

          <parameter key="Niedrigretournierer" value="RETOUREN_MENGE/LIEFER_MENGE*100&lt;18"/>

          <parameter key="Hochretournierer" value="RETOUREN_MENGE/LIEFER_MENGE*100&gt;40"/>

        </list>

        <parameter key="keep_all" value="true"/>

      </operator>

      <operator activated="true" class="replace" compatibility="9.1.000" expanded="true" height="82" name="Replace" width="90" x="648" y="187">

        <parameter key="attribute_filter_type" value="single"/>

        <parameter key="attribute" value="Hochretournierer"/>

        <parameter key="attributes" value="Hochretournierer"/>

        <parameter key="use_except_expression" value="false"/>

        <parameter key="value_type" value="nominal"/>

        <parameter key="use_value_type_exception" value="false"/>

        <parameter key="except_value_type" value="file_path"/>

        <parameter key="block_type" value="single_value"/>

        <parameter key="use_block_type_exception" value="false"/>

        <parameter key="except_block_type" value="single_value"/>

        <parameter key="invert_selection" value="false"/>

        <parameter key="include_special_attributes" value="true"/>

        <parameter key="replace_what" value="true"/>

        <parameter key="replace_by" value="H"/>

      </operator>

      <operator activated="true" class="replace" compatibility="9.1.000" expanded="true" height="82" name="Replace (2)" width="90" x="246" y="238">

        <parameter key="attribute_filter_type" value="single"/>

        <parameter key="attribute" value="Niedrigretournierer"/>

        <parameter key="attributes" value=""/>

        <parameter key="use_except_expression" value="false"/>

        <parameter key="value_type" value="nominal"/>

        <parameter key="use_value_type_exception" value="false"/>

        <parameter key="except_value_type" value="file_path"/>

        <parameter key="block_type" value="single_value"/>

        <parameter key="use_block_type_exception" value="false"/>

        <parameter key="except_block_type" value="single_value"/>

        <parameter key="invert_selection" value="false"/>

        <parameter key="include_special_attributes" value="false"/>

        <parameter key="replace_what" value="true"/>

        <parameter key="replace_by" value="N"/>

      </operator>

      <operator activated="true" class="replace" compatibility="9.1.000" expanded="true" height="82" name="Replace (3)" width="90" x="313" y="340">

        <parameter key="attribute_filter_type" value="single"/>

        <parameter key="attribute" value="Hochretournierer"/>

        <parameter key="attributes" value=""/>

        <parameter key="use_except_expression" value="false"/>

        <parameter key="value_type" value="nominal"/>

        <parameter key="use_value_type_exception" value="false"/>

        <parameter key="except_value_type" value="file_path"/>

        <parameter key="block_type" value="single_value"/>

        <parameter key="use_block_type_exception" value="false"/>

        <parameter key="except_block_type" value="single_value"/>

        <parameter key="invert_selection" value="false"/>

        <parameter key="include_special_attributes" value="false"/>

        <parameter key="replace_what" value="false"/>

        <parameter key="replace_by" value="N"/>

      </operator>

      <operator activated="true" class="replace" compatibility="9.1.000" expanded="true" height="82" name="Replace (4)" width="90" x="514" y="238">

        <parameter key="attribute_filter_type" value="single"/>

        <parameter key="attribute" value="Niedrigretournierer"/>

        <parameter key="attributes" value=""/>

        <parameter key="use_except_expression" value="false"/>

        <parameter key="value_type" value="nominal"/>

        <parameter key="use_value_type_exception" value="false"/>

        <parameter key="except_value_type" value="file_path"/>

        <parameter key="block_type" value="single_value"/>

        <parameter key="use_block_type_exception" value="false"/>

        <parameter key="except_block_type" value="single_value"/>

        <parameter key="invert_selection" value="false"/>

        <parameter key="include_special_attributes" value="false"/>

        <parameter key="replace_what" value="false"/>

        <parameter key="replace_by" value="H"/>

      </operator>

      <operator activated="true" class="set_role" compatibility="9.1.000" expanded="true" height="82" name="Set Role" width="90" x="447" y="391">

        <parameter key="attribute_name" value="KDNR"/>

        <parameter key="target_role" value="label"/>

        <list key="set_additional_roles">

          <parameter key="Hochretournierer" value="prediction"/>

          <parameter key="Niedrigretournierer" value="prediction"/>

        </list>

      </operator>

      <operator activated="true" class="select_attributes" compatibility="9.1.000" expanded="true" height="82" name="Select Attributes (2)" width="90" x="581" y="391">

        <parameter key="attribute_filter_type" value="subset"/>

        <parameter key="attribute" value=""/>

        <parameter key="attributes" value="KDNR|Niedrigretournierer"/>

        <parameter key="use_except_expression" value="false"/>

        <parameter key="value_type" value="attribute_value"/>

        <parameter key="use_value_type_exception" value="false"/>

        <parameter key="except_value_type" value="time"/>

        <parameter key="block_type" value="attribute_block"/>

        <parameter key="use_block_type_exception" value="false"/>

        <parameter key="except_block_type" value="value_matrix_row_start"/>

        <parameter key="invert_selection" value="false"/>

        <parameter key="include_special_attributes" value="false"/>

      </operator>

      <operator activated="true" class="rename" compatibility="9.1.000" expanded="true" height="82" name="Rename" width="90" x="681" y="289">

        <parameter key="old_name" value="Niedrigretournierer"/>

        <parameter key="new_name" value="Einteilung Hoch-/Niedrigretournierer"/>

        <list key="rename_additional_attributes"/>

      </operator>

      <connect from_op="Retrieve retouren_train" from_port="output" to_op="Select Attributes" to_port="example set input"/>

      <connect from_op="Select Attributes" from_port="example set output" to_op="Generate Attributes" to_port="example set input"/>

      <connect from_op="Generate Attributes" from_port="example set output" to_op="Replace" to_port="example set input"/>

      <connect from_op="Replace" from_port="example set output" to_op="Replace (2)" to_port="example set input"/>

      <connect from_op="Replace (2)" from_port="example set output" to_op="Replace (3)" to_port="example set input"/>

      <connect from_op="Replace (3)" from_port="example set output" to_op="Replace (4)" to_port="example set input"/>

      <connect from_op="Replace (4)" from_port="example set output" to_op="Set Role" to_port="example set input"/>

      <connect from_op="Set Role" from_port="example set output" to_op="Select Attributes (2)" to_port="example set input"/>

      <connect from_op="Select Attributes (2)" from_port="example set output" to_op="Rename" to_port="example set input"/>

      <connect from_op="Rename" from_port="example set output" to_port="result 1"/>

      <portSpacing port="source_input 1" spacing="0"/>

      <portSpacing port="sink_result 1" spacing="0"/>

      <portSpacing port="sink_result 2" spacing="0"/>

    </process>

  </operator>

</process>


Answers

  • yyhuang Administrator, Employee, RapidMiner Certified Analyst, RapidMiner Certified Expert, Member Posts: 230  RM Data Scientist
    Hi @ferdinand_papa,

    Thanks for sharing the process. I cannot run it to check the logic without the input data, but judging from your formula, I guess a nested if() statement will work for your case.

    <?xml version="1.0" encoding="UTF-8"?><process version="9.1.000">
      <context>
        <input/>
        <output/>
        <macros/>
      </context>
      <operator activated="true" class="process" compatibility="9.1.000" expanded="true" name="Process">
        <parameter key="logverbosity" value="init"/>
        <parameter key="random_seed" value="2001"/>
        <parameter key="send_mail" value="never"/>
        <parameter key="notification_email" value=""/>
        <parameter key="process_duration_for_mail" value="30"/>
        <parameter key="encoding" value="SYSTEM"/>
        <process expanded="true">
          <operator activated="true" class="retrieve" compatibility="9.1.000" expanded="true" height="68" name="Retrieve retouren_train" width="90" x="45" y="136">
            <parameter key="repository_entry" value="//Local Repository/retouren_train"/>
          </operator>
          <operator activated="true" class="select_attributes" compatibility="9.1.000" expanded="true" height="82" name="Select Attributes" width="90" x="246" y="136">
            <parameter key="attribute_filter_type" value="subset"/>
            <parameter key="attribute" value=""/>
            <parameter key="attributes" value="RETOUREN_MENGE|LIEFER_MENGE|KDNR"/>
            <parameter key="use_except_expression" value="false"/>
            <parameter key="value_type" value="attribute_value"/>
            <parameter key="use_value_type_exception" value="false"/>
            <parameter key="except_value_type" value="time"/>
            <parameter key="block_type" value="attribute_block"/>
            <parameter key="use_block_type_exception" value="false"/>
            <parameter key="except_block_type" value="value_matrix_row_start"/>
            <parameter key="invert_selection" value="false"/>
            <parameter key="include_special_attributes" value="false"/>
          </operator>
          <operator activated="true" class="generate_attributes" compatibility="9.1.000" expanded="true" height="82" name="Generate Attributes (2)" width="90" x="380" y="136">
            <list key="function_descriptions">
              <parameter key="Class" value="if(RETOUREN_MENGE/LIEFER_MENGE*100&lt;18, &quot;N&quot;, if(RETOUREN_MENGE/LIEFER_MENGE*100&gt;40, &quot;H&quot;, &quot;U&quot;))"/>
            </list>
            <parameter key="keep_all" value="true"/>
          </operator>
          <operator activated="true" class="select_attributes" compatibility="9.1.000" expanded="true" height="82" name="Select Attributes (2)" width="90" x="514" y="136">
            <parameter key="attribute_filter_type" value="subset"/>
            <parameter key="attribute" value=""/>
            <parameter key="attributes" value="KDNR|Class"/>
            <parameter key="use_except_expression" value="false"/>
            <parameter key="value_type" value="attribute_value"/>
            <parameter key="use_value_type_exception" value="false"/>
            <parameter key="except_value_type" value="time"/>
            <parameter key="block_type" value="attribute_block"/>
            <parameter key="use_block_type_exception" value="false"/>
            <parameter key="except_block_type" value="value_matrix_row_start"/>
            <parameter key="invert_selection" value="false"/>
            <parameter key="include_special_attributes" value="false"/>
          </operator>
          <operator activated="true" class="rename" compatibility="9.1.000" expanded="true" height="82" name="Rename" width="90" x="648" y="136">
            <parameter key="old_name" value="Class"/>
            <parameter key="new_name" value="Einteilung Hoch-/Niedrigretournierer"/>
            <list key="rename_additional_attributes"/>
          </operator>
          <connect from_op="Retrieve retouren_train" from_port="output" to_op="Select Attributes" to_port="example set input"/>
          <connect from_op="Select Attributes" from_port="example set output" to_op="Generate Attributes (2)" to_port="example set input"/>
          <connect from_op="Generate Attributes (2)" from_port="example set output" to_op="Select Attributes (2)" to_port="example set input"/>
          <connect from_op="Select Attributes (2)" from_port="example set output" to_op="Rename" to_port="example set input"/>
          <connect from_op="Rename" from_port="example set output" to_port="result 1"/>
          <portSpacing port="source_input 1" spacing="0"/>
          <portSpacing port="sink_result 1" spacing="0"/>
          <portSpacing port="sink_result 2" spacing="0"/>
        </process>
      </operator>
    </process>
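
    A slightly more compact variant of the nested if() idea, as a sketch only: Generate Attributes should evaluate its entries from top to bottom, so the rate can be computed once and reused in the second entry. The attribute name return_rate and the operator name are assumptions and not part of the original data, and rows where LIEFER_MENGE is 0 are not handled here.

    ```xml
    <!-- Sketch: one Generate Attributes operator with two entries.
         1) return_rate (hypothetical name): returned / delivered, in percent.
         2) Class: "N" below 18, "H" above 40, "U" for everything in between. -->
    <operator activated="true" class="generate_attributes" compatibility="9.1.000" expanded="true" height="82" name="Generate Rate and Class" width="90" x="380" y="136">
      <list key="function_descriptions">
        <parameter key="return_rate" value="RETOUREN_MENGE/LIEFER_MENGE*100"/>
        <parameter key="Class" value="if(return_rate&lt;18, &quot;N&quot;, if(return_rate&gt;40, &quot;H&quot;, &quot;U&quot;))"/>
      </list>
      <parameter key="keep_all" value="true"/>
    </operator>
    ```

    Because the inner if() is only reached when the rate is not below 18, its last argument covers exactly the 18 to 40 range, which is the class that true/false comparisons alone cannot express.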
    



    HTH!

    YY
  • lionelderkrikor Moderator, RapidMiner Certified Analyst, Member Posts: 895   Unicorn
    Hi @ferdinand_papa,

    Another solution is to use the Discretize by User Specification operator:
    <?xml version="1.0" encoding="UTF-8"?><process version="9.1.000">
      <context>
        <input/>
        <output/>
        <macros/>
      </context>
      <operator activated="true" class="process" compatibility="9.1.000" expanded="true" name="Process">
        <parameter key="logverbosity" value="init"/>
        <parameter key="random_seed" value="2001"/>
        <parameter key="send_mail" value="never"/>
        <parameter key="notification_email" value=""/>
        <parameter key="process_duration_for_mail" value="30"/>
        <parameter key="encoding" value="SYSTEM"/>
        <process expanded="true">
          <operator activated="true" class="operator_toolbox:create_exampleset" compatibility="1.7.000" expanded="true" height="68" name="Create ExampleSet" width="90" x="112" y="85">
            <parameter key="generator_type" value="comma_separated_text"/>
            <parameter key="number_of_examples" value="100"/>
            <parameter key="use_stepsize" value="false"/>
            <list key="function_descriptions"/>
            <parameter key="add_id_attribute" value="false"/>
            <list key="numeric_series_configuration"/>
            <list key="date_series_configuration"/>
            <list key="date_series_configuration (interval)"/>
            <parameter key="date_format" value="yyyy-MM-dd HH:mm:ss"/>
            <parameter key="input_csv_text" value="Customer_number, return_rate&#10;1,99&#10;2,70&#10;3,39&#10;4,19&#10;5,16&#10;6,4"/>
            <parameter key="column_separator" value=","/>
            <parameter key="parse_all_as_nominal" value="false"/>
            <parameter key="decimal_point_character" value="."/>
            <parameter key="trim_attribute_names" value="true"/>
          </operator>
          <operator activated="true" class="discretize_by_user_specification" compatibility="9.1.000" expanded="true" height="103" name="Discretize" width="90" x="313" y="85">
            <parameter key="return_preprocessing_model" value="false"/>
            <parameter key="create_view" value="false"/>
            <parameter key="attribute_filter_type" value="single"/>
            <parameter key="attribute" value="return_rate"/>
            <parameter key="attributes" value=""/>
            <parameter key="use_except_expression" value="false"/>
            <parameter key="value_type" value="numeric"/>
            <parameter key="use_value_type_exception" value="false"/>
            <parameter key="except_value_type" value="real"/>
            <parameter key="block_type" value="value_series"/>
            <parameter key="use_block_type_exception" value="false"/>
            <parameter key="except_block_type" value="value_series_end"/>
            <parameter key="invert_selection" value="false"/>
            <parameter key="include_special_attributes" value="false"/>
            <list key="classes">
              <parameter key="N" value="18.0"/>
              <parameter key="U" value="40.0"/>
              <parameter key="H" value="101.0"/>
            </list>
          </operator>
          <connect from_op="Create ExampleSet" from_port="output" to_op="Discretize" to_port="example set input"/>
          <connect from_op="Discretize" from_port="example set output" to_port="result 1"/>
          <portSpacing port="source_input 1" spacing="0"/>
          <portSpacing port="sink_result 1" spacing="0"/>
          <portSpacing port="sink_result 2" spacing="0"/>
        </process>
      </operator>
    </process>
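
    To use the same operator on the real data instead of a small generated example set, the return rate has to be computed first. A rough sketch, assuming the repository entry and attribute names from the original question; return_rate is a made-up helper name, and the class labels follow the definition in the question (N low, U in between, H high). As far as I understand the operator, each value is an inclusive upper limit, so a rate of exactly 18 would land in N.

    ```xml
    <?xml version="1.0" encoding="UTF-8"?>
    <!-- Sketch: Retrieve, then Generate Attributes (return_rate in percent),
         then Discretize by User Specification (N up to 18, U up to 40, H above),
         then Select Attributes (keep KDNR and the discretized rate).
         return_rate is a hypothetical attribute name. -->
    <process version="9.1.000">
      <operator activated="true" class="process" compatibility="9.1.000" expanded="true" name="Process">
        <process expanded="true">
          <operator activated="true" class="retrieve" compatibility="9.1.000" expanded="true" height="68" name="Retrieve retouren_train" width="90" x="45" y="34">
            <parameter key="repository_entry" value="//Local Repository/retouren_train"/>
          </operator>
          <operator activated="true" class="generate_attributes" compatibility="9.1.000" expanded="true" height="82" name="Generate Attributes" width="90" x="179" y="34">
            <list key="function_descriptions">
              <parameter key="return_rate" value="RETOUREN_MENGE/LIEFER_MENGE*100"/>
            </list>
            <parameter key="keep_all" value="true"/>
          </operator>
          <operator activated="true" class="discretize_by_user_specification" compatibility="9.1.000" expanded="true" height="103" name="Discretize" width="90" x="313" y="34">
            <parameter key="attribute_filter_type" value="single"/>
            <parameter key="attribute" value="return_rate"/>
            <list key="classes">
              <parameter key="N" value="18.0"/>
              <parameter key="U" value="40.0"/>
              <parameter key="H" value="Infinity"/>
            </list>
          </operator>
          <operator activated="true" class="select_attributes" compatibility="9.1.000" expanded="true" height="82" name="Select Attributes" width="90" x="447" y="34">
            <parameter key="attribute_filter_type" value="subset"/>
            <parameter key="attributes" value="KDNR|return_rate"/>
          </operator>
          <connect from_op="Retrieve retouren_train" from_port="output" to_op="Generate Attributes" to_port="example set input"/>
          <connect from_op="Generate Attributes" from_port="example set output" to_op="Discretize" to_port="example set input"/>
          <connect from_op="Discretize" from_port="example set output" to_op="Select Attributes" to_port="example set input"/>
          <connect from_op="Select Attributes" from_port="example set output" to_port="result 1"/>
        </process>
      </operator>
    </process>
    ```

    The last class uses Infinity as its upper limit so that rates above 100 (more returns than deliveries in the period) still fall into H; parameters left out of the sketch are assumed to keep their defaults.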
    
    Hope it helps,

    Regards,

    Lionel
     


  • ferdinand_papa Member Posts: 7 Contributor I
    That worked perfectly, thank you! Maybe you can help me with the 2nd task as well?
    We were given a 2nd dataset in which the attributes "RETOUREN_MENGE" and "LIEFER_MENGE" are missing, but it has about 69 other attributes. So we have 2 datasets: the first already contains the outcome data we want, and the 2nd does not. We are supposed to make a prediction from the first dataset for the 2nd one. So basically we need to find some kind of correlation or something similar in the first dataset and transfer that knowledge to the 2nd dataset, so that I get the same kind of outcome there. I attached the 2 datasets. Maybe you can give me a hint on how to do that? :)
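
    One way this second task could be set up, sketched under several assumptions: the first dataset is labeled with the H/N/U class, the two quantity columns are dropped so the model only sees attributes that also exist in the 2nd dataset, a learner is trained, and the model is applied to the 2nd dataset. The repository entry retouren_second is a hypothetical name, Decision Tree is just one possible learner, and the sketch assumes both datasets share their remaining attribute names.

    ```xml
    <?xml version="1.0" encoding="UTF-8"?>
    <!-- Sketch: train on the labeled 1st dataset, predict the class for the 2nd.
         Generate Class builds the H/N/U label from the two quantity columns.
         Drop Quantities removes RETOUREN_MENGE and LIEFER_MENGE (absent in the 2nd dataset).
         Set Role marks Class as label and KDNR as id; Set Role (2) marks KDNR as id.
         //Local Repository/retouren_second is a hypothetical entry name. -->
    <process version="9.1.000">
      <operator activated="true" class="process" compatibility="9.1.000" expanded="true" name="Process">
        <process expanded="true">
          <operator activated="true" class="retrieve" compatibility="9.1.000" expanded="true" height="68" name="Retrieve retouren_train" width="90" x="45" y="34">
            <parameter key="repository_entry" value="//Local Repository/retouren_train"/>
          </operator>
          <operator activated="true" class="generate_attributes" compatibility="9.1.000" expanded="true" height="82" name="Generate Class" width="90" x="179" y="34">
            <list key="function_descriptions">
              <parameter key="Class" value="if(RETOUREN_MENGE/LIEFER_MENGE*100&lt;18, &quot;N&quot;, if(RETOUREN_MENGE/LIEFER_MENGE*100&gt;40, &quot;H&quot;, &quot;U&quot;))"/>
            </list>
            <parameter key="keep_all" value="true"/>
          </operator>
          <operator activated="true" class="select_attributes" compatibility="9.1.000" expanded="true" height="82" name="Drop Quantities" width="90" x="313" y="34">
            <parameter key="attribute_filter_type" value="subset"/>
            <parameter key="attributes" value="RETOUREN_MENGE|LIEFER_MENGE"/>
            <parameter key="invert_selection" value="true"/>
          </operator>
          <operator activated="true" class="set_role" compatibility="9.1.000" expanded="true" height="82" name="Set Role" width="90" x="447" y="34">
            <parameter key="attribute_name" value="Class"/>
            <parameter key="target_role" value="label"/>
            <list key="set_additional_roles">
              <parameter key="KDNR" value="id"/>
            </list>
          </operator>
          <operator activated="true" class="concurrency:parallel_decision_tree" compatibility="9.1.000" expanded="true" height="103" name="Decision Tree" width="90" x="581" y="34"/>
          <operator activated="true" class="retrieve" compatibility="9.1.000" expanded="true" height="68" name="Retrieve second dataset" width="90" x="45" y="187">
            <parameter key="repository_entry" value="//Local Repository/retouren_second"/>
          </operator>
          <operator activated="true" class="set_role" compatibility="9.1.000" expanded="true" height="82" name="Set Role (2)" width="90" x="447" y="187">
            <parameter key="attribute_name" value="KDNR"/>
            <parameter key="target_role" value="id"/>
            <list key="set_additional_roles"/>
          </operator>
          <operator activated="true" class="apply_model" compatibility="9.1.000" expanded="true" height="82" name="Apply Model" width="90" x="715" y="187">
            <list key="application_parameters"/>
          </operator>
          <connect from_op="Retrieve retouren_train" from_port="output" to_op="Generate Class" to_port="example set input"/>
          <connect from_op="Generate Class" from_port="example set output" to_op="Drop Quantities" to_port="example set input"/>
          <connect from_op="Drop Quantities" from_port="example set output" to_op="Set Role" to_port="example set input"/>
          <connect from_op="Set Role" from_port="example set output" to_op="Decision Tree" to_port="training set"/>
          <connect from_op="Decision Tree" from_port="model" to_op="Apply Model" to_port="model"/>
          <connect from_op="Retrieve second dataset" from_port="output" to_op="Set Role (2)" to_port="example set input"/>
          <connect from_op="Set Role (2)" from_port="example set output" to_op="Apply Model" to_port="unlabelled data"/>
          <connect from_op="Apply Model" from_port="labelled data" to_port="result 1"/>
        </process>
      </operator>
    </process>
    ```

    Dropping the two quantity columns before training matters because the class is computed directly from them; a model that saw them could not be applied to a dataset where they are missing. Before trusting the predictions, it would be sensible to check the model on a held-out part of the first dataset.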
