Changing the attribute to be predicted

jbarrick (Member, Contributor I)
edited November 2018 in Help
I've only been working with RapidMiner for a short time, so this may be a simple question to answer:

When modeling with either the Default Model or the k-NN Model, I cannot seem to change which attribute is being predicted. For the model I'm attempting to create, I used "Generate Sales Data" and added a "Total Price" attribute (this is all from the example in the User Manual). When I added a model, it predicted only the transaction number, which is just a running number from 1 to 100, so there is really nothing to predict. How can I change this so that the model instead predicts a more meaningful attribute, such as the total price? Its values vary within a range, so predicting them would make more sense.

Thanks in advance for any help! I really appreciate it!

Answers

  • awchisholm (RapidMiner Certified Expert, Member)
    Hello

    The "Set Role" operator is the one you want. Use it to set the role of the attribute you want to predict to "label"; a supervised algorithm learns its model from the label values of the examples. Be warned, though, that the "Generate Sales Data" operator, as far as I can tell, generates nothing but random data, so any model will fail to predict anything meaningful.

    If, however, you've created a new attribute such as "total_price" defined as "amount*single_price", and you use it as the label with "amount" and "single_price" as regular attributes, then your model will try to predict "total_price" from the relationship between the input attributes. An algorithm such as a neural network should be able to work out that "total_price" is indeed "amount*single_price". This is not tremendously helpful, but it illustrates the point.

    For fun I've attached a process that shows what I mean.
    <?xml version="1.0" encoding="UTF-8" standalone="no"?>
    <process version="5.1.001">
      <context>
        <input/>
        <output/>
        <macros/>
      </context>
      <operator activated="true" class="process" compatibility="5.1.001" expanded="true" name="Process">
        <process expanded="true" height="476" width="882">
          <operator activated="true" class="generate_sales_data" compatibility="5.1.001" expanded="true" height="60" name="Generate Sales Data" width="90" x="45" y="75">
            <parameter key="number_examples" value="1000"/>
          </operator>
          <operator activated="true" class="generate_attributes" compatibility="5.1.001" expanded="true" height="76" name="Generate Attributes" width="90" x="112" y="165">
            <list key="function_descriptions">
              <parameter key="total_price" value="amount*single_price"/>
            </list>
          </operator>
          <operator activated="true" class="set_role" compatibility="5.1.001" expanded="true" height="76" name="Set Role" width="90" x="112" y="255">
            <parameter key="name" value="total_price"/>
            <parameter key="target_role" value="label"/>
            <list key="set_additional_roles"/>
          </operator>
          <operator activated="true" class="select_attributes" compatibility="5.1.001" expanded="true" height="76" name="Select Attributes" width="90" x="112" y="345">
            <parameter key="attribute_filter_type" value="subset"/>
            <parameter key="attributes" value="product_id|amount|single_price|total_price"/>
            <parameter key="include_special_attributes" value="true"/>
          </operator>
          <operator activated="true" class="split_data" compatibility="5.1.001" expanded="true" height="94" name="Split Data" width="90" x="246" y="345">
            <enumeration key="partitions">
              <parameter key="ratio" value="0.3"/>
              <parameter key="ratio" value="0.7"/>
            </enumeration>
          </operator>
          <operator activated="true" class="neural_net" compatibility="5.1.001" expanded="true" height="76" name="Neural Net" width="90" x="380" y="210">
            <list key="hidden_layers"/>
          </operator>
          <operator activated="true" class="apply_model" compatibility="5.1.001" expanded="true" height="76" name="Apply Model (2)" width="90" x="514" y="345">
            <list key="application_parameters"/>
          </operator>
          <operator activated="true" class="performance_regression" compatibility="5.1.001" expanded="true" height="76" name="Performance" width="90" x="648" y="210">
            <parameter key="squared_error" value="true"/>
            <parameter key="correlation" value="true"/>
            <parameter key="squared_correlation" value="true"/>
          </operator>
          <connect from_op="Generate Sales Data" from_port="output" to_op="Generate Attributes" to_port="example set input"/>
          <connect from_op="Generate Attributes" from_port="example set output" to_op="Set Role" to_port="example set input"/>
          <connect from_op="Set Role" from_port="example set output" to_op="Select Attributes" to_port="example set input"/>
          <connect from_op="Select Attributes" from_port="example set output" to_op="Split Data" to_port="example set"/>
          <connect from_op="Split Data" from_port="partition 1" to_op="Neural Net" to_port="training set"/>
          <connect from_op="Split Data" from_port="partition 2" to_op="Apply Model (2)" to_port="unlabelled data"/>
          <connect from_op="Neural Net" from_port="model" to_op="Apply Model (2)" to_port="model"/>
          <connect from_op="Apply Model (2)" from_port="labelled data" to_op="Performance" to_port="labelled data"/>
          <connect from_op="Apply Model (2)" from_port="model" to_port="result 3"/>
          <connect from_op="Performance" from_port="performance" to_port="result 1"/>
          <connect from_op="Performance" from_port="example set" to_port="result 2"/>
          <portSpacing port="source_input 1" spacing="0"/>
          <portSpacing port="sink_result 1" spacing="270"/>
          <portSpacing port="sink_result 2" spacing="54"/>
          <portSpacing port="sink_result 3" spacing="0"/>
          <portSpacing port="sink_result 4" spacing="0"/>
        </process>
      </operator>
    </process>
    Plot "total_price" against "prediction(total_price)" and you should get a straight line that shows the prediction is very close indeed to the correct value.
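
    Since the question mentioned k-NN, the same process should also work with that learner in place of the neural network. The fragment below is only a sketch: it assumes the operator class is "k_nn" with a parameter "k" (worth checking against your RapidMiner version), and the value 5 is an arbitrary choice. It stands in for the "Neural Net" operator and the two connections that mention it; layout attributes are left out for brevity.

    ```xml
    <!-- Sketch: swap the learner for k-NN; the class name and "k" are assumptions -->
    <operator activated="true" class="k_nn" compatibility="5.1.001" expanded="true" name="k-NN">
      <parameter key="k" value="5"/>
    </operator>
    <!-- These replace the two connections that refer to "Neural Net" -->
    <connect from_op="Split Data" from_port="partition 1" to_op="k-NN" to_port="training set"/>
    <connect from_op="k-NN" from_port="model" to_op="Apply Model (2)" to_port="model"/>
    ```

    With a numerical label such as "total_price", k-NN should behave as a regressor, so the same "Performance" operator and the same plot ought to apply.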

    You'll notice that I filter out most of the attributes; the reason is that neural networks can't handle nominal values. You'll find this to be one of the headaches as you learn the product: working out which algorithms handle which attribute types.
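
    One possible way to keep nominal attributes instead of dropping them is to convert them to numbers before the learner. The fragment below is a sketch under assumptions: it uses the "Nominal to Numerical" operator (assumed class "nominal_to_numerical") with its default settings, placed between "Select Attributes" and "Split Data". "Select Attributes" would also have to let the nominal columns through.

    ```xml
    <!-- Sketch: convert nominal attributes to numerical ones (default settings assumed) -->
    <operator activated="true" class="nominal_to_numerical" compatibility="5.1.001" expanded="true" name="Nominal to Numerical"/>
    <!-- These replace the direct "Select Attributes" to "Split Data" connection -->
    <connect from_op="Select Attributes" from_port="example set output" to_op="Nominal to Numerical" to_port="example set input"/>
    <connect from_op="Nominal to Numerical" from_port="example set output" to_op="Split Data" to_port="example set"/>
    ```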

    regards,

    Andrew