Generate attribute

mario_sarkmario_sark Member Posts: 13 Contributor I
Hello!

I am trying the add a column in order to check if an attribute is in between it's [Average -/+ one Standard Deviation]  give it a value of "2" if it is less then Average- one Standard Deviation give it a value of "1"  and finally if it is Greater then Average+Standard deviation  give it a value of "3". Please check the example below.

ID      X1         Average of X1                  Standard deviation o fX1      Average - STD        Average + STD                 Generated Attribute
1    1,000                  2,417                                         1,017                  1,400                   3,434                                         1
2    2,000                  2,417                                         1,017                  1,400                   3,434                                         2
3    2,000                  2,417                                         1,017                  1,400                   3,434                                         2
4    3,500                  2,417                                         1,017                  1,400                   3,434                                         3
5    4,000                  2,417                                         1,017                  1,400                   3,434                                         3
6    2,000                  2,417                                         1,017                  1,400                   3,434                                         1

Thank you, 
Mario

Answers

  • lionelderkrikorlionelderkrikor Moderator, RapidMiner Certified Analyst, Member Posts: 1,195 Unicorn
    Hi @mario_sark,

    You can use Generate attributes operator: 

    Here the process with a sample of your data : 
    <?xml version="1.0" encoding="UTF-8"?><process version="9.2.001">
      <context>
        <input/>
        <output/>
        <macros/>
      </context>
      <operator activated="true" class="process" compatibility="9.2.001" expanded="true" name="Process">
        <parameter key="logverbosity" value="init"/>
        <parameter key="random_seed" value="2001"/>
        <parameter key="send_mail" value="never"/>
        <parameter key="notification_email" value=""/>
        <parameter key="process_duration_for_mail" value="30"/>
        <parameter key="encoding" value="SYSTEM"/>
        <process expanded="true">
          <operator activated="true" class="utility:create_exampleset" compatibility="9.2.001" expanded="true" height="68" name="Create ExampleSet" width="90" x="112" y="85">
            <parameter key="generator_type" value="comma separated text"/>
            <parameter key="number_of_examples" value="100"/>
            <parameter key="use_stepsize" value="false"/>
            <list key="function_descriptions"/>
            <parameter key="add_id_attribute" value="false"/>
            <list key="numeric_series_configuration"/>
            <list key="date_series_configuration"/>
            <list key="date_series_configuration (interval)"/>
            <parameter key="date_format" value="yyyy-MM-dd HH:mm:ss"/>
            <parameter key="time_zone" value="SYSTEM"/>
            <parameter key="input_csv_text" value="X1,average-std,average+std&#10;1,1.4,3.434&#10;2,1.4,3.434&#10;3.5,1.4,3.434"/>
            <parameter key="column_separator" value=","/>
            <parameter key="parse_all_as_nominal" value="false"/>
            <parameter key="decimal_point_character" value="."/>
            <parameter key="trim_attribute_names" value="true"/>
          </operator>
          <operator activated="true" class="generate_attributes" compatibility="9.2.001" expanded="true" height="82" name="Generate Attributes" width="90" x="246" y="85">
            <list key="function_descriptions">
              <parameter key="generated_attribute" value="if(X1&lt;[average-std],1,if(X1&gt;[average+std],3,2))"/>
            </list>
            <parameter key="keep_all" value="true"/>
          </operator>
          <connect from_op="Create ExampleSet" from_port="output" to_op="Generate Attributes" to_port="example set input"/>
          <connect from_op="Generate Attributes" from_port="example set output" to_port="result 1"/>
          <portSpacing port="source_input 1" spacing="0"/>
          <portSpacing port="sink_result 1" spacing="0"/>
          <portSpacing port="sink_result 2" spacing="0"/>
        </process>
      </operator>
    </process>
    

    Hope this helps,

    Regards,

    Lionel

  • mario_sarkmario_sark Member Posts: 13 Contributor I
    Hi @lionelderkrikor  Thank you for you reply, 

    but I forget to mention that i don't have the Average and the standard Deviation columns in my data, however I need to generate it, how can i do that ? (repeat the Average and Standard Deviation of X1 in a single columns like i shared the table above?)

    Hope  that you can help with this!
    Thank you, 
    Mario
  • IngoRMIngoRM Administrator, Moderator, Employee, RapidMiner Certified Analyst, RapidMiner Certified Expert, Community Manager, RMResearcher, Member, University Professor Posts: 1,751 RM Founder
    Hi,
    In this case you can extract the average and the standard deviation values from your data with the operator Extract Macro and use the macros in the Generate Attributes operator.  You can also do that in a Loop Attributes operator if you want to perform the same operation on multiple columns.
    Below is a little example process.
    Hope this helps,
    Ingo
    <?xml version="1.0" encoding="UTF-8"?><process version="9.2.001"><br>&nbsp; <context><br>&nbsp;&nbsp;&nbsp; <input/><br>&nbsp;&nbsp;&nbsp; <output/><br>&nbsp;&nbsp;&nbsp; <macros/><br>&nbsp; </context><br>&nbsp; <operator activated="true" class="process" compatibility="9.2.001" expanded="true" name="Process"><br>&nbsp;&nbsp;&nbsp; <parameter key="logverbosity" value="init"/><br>&nbsp;&nbsp;&nbsp; <parameter key="random_seed" value="2001"/><br>&nbsp;&nbsp;&nbsp; <parameter key="send_mail" value="never"/><br>&nbsp;&nbsp;&nbsp; <parameter key="notification_email" value=""/><br>&nbsp;&nbsp;&nbsp; <parameter key="process_duration_for_mail" value="30"/><br>&nbsp;&nbsp;&nbsp; <parameter key="encoding" value="UTF-8"/><br>&nbsp;&nbsp;&nbsp; <process expanded="true"><br>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; <operator activated="true" class="retrieve" compatibility="9.2.001" expanded="true" height="68" name="Retrieve Iris" width="90" x="45" y="34"><br>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; <parameter key="repository_entry" value="//Samples/data/Iris"/><br>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; </operator><br>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; <operator activated="true" class="extract_macro" compatibility="9.2.001" expanded="true" height="68" name="Extract Avg" width="90" x="179" y="34"><br>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; <parameter key="macro" value="a1_avg"/><br>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; <parameter key="macro_type" value="statistics"/><br>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; <parameter key="statistics" value="average"/><br>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; <parameter key="attribute_name" value="a1"/><br>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; <list key="additional_macros"/><br>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; </operator><br>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; <operator activated="true" class="extract_macro" compatibility="9.2.001" expanded="true" height="68" name="Extract SD" width="90" x="313" y="34"><br>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; <parameter key="macro" value="a1_std_dev"/><br>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; <parameter key="macro_type" value="statistics"/><br>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; <parameter key="statistics" value="deviation"/><br>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; <parameter key="attribute_name" value="a1"/><br>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; <list key="additional_macros"/><br>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; </operator><br>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; <operator activated="true" class="generate_attributes" compatibility="9.2.001" expanded="true" height="82" name="Generate Attributes" width="90" x="447" y="34"><br>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; <list key="function_descriptions"><br>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; <parameter key="a1_new" value="if(a1&lt;eval(%{a1_avg})-eval(%{a1_std_dev}),1,if(a1&gt;eval(%{a1_avg})+eval(%{a1_std_dev}),3,2))"/><br>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; </list><br>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; <parameter key="keep_all" value="true"/><br>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; </operator><br>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; <connect from_op="Retrieve Iris" from_port="output" to_op="Extract Avg" to_port="example set"/><br>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; <connect from_op="Extract Avg" from_port="example set" to_op="Extract SD" to_port="example set"/><br>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; <connect from_op="Extract SD" from_port="example set" to_op="Generate Attributes" to_port="example set input"/><br>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; <connect from_op="Generate Attributes" from_port="example set output" to_port="result 1"/><br>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; <portSpacing port="source_input 1" spacing="0"/><br>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; <portSpacing port="sink_result 1" spacing="0"/><br>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; <portSpacing port="sink_result 2" spacing="0"/><br>&nbsp;&nbsp;&nbsp; </process><br>&nbsp; </operator><br></process>
  • Telcontar120Telcontar120 Moderator, RapidMiner Certified Analyst, RapidMiner Certified Expert, Member Posts: 1,635 Unicorn
    To accomplish this even more easily, you can also just use Normalize operator with Z-score transformation method, and then it will compute the number of standard deviations from the mean for you automatically!
    You could then use Generate Attributes to recode this (e.g., round it, truncate it, take the absolute value, etc.) as desired.

    Brian T.
    Lindon Ventures 
    Data Science Consulting from Certified RapidMiner Experts
Sign In or Register to comment.