Copy and Paste Attribute List After Weighting

scooter7 Member Posts: 8 Contributor II
edited November 2019 in Help

After using the "Nominal to Binominal" operator to create hundreds of new variables and then filtering them with the "Select by Weights" operator, I am left with hundreds of variables that I would like to keep using the "Select Attributes" operator with the "subset" option. I copied the variable names from the weights table into Excel and have been pasting them in by hand (my goal is to use this subset to score unseen data in the future). Is there an easier way to do this than copying and pasting one variable at a time?

The issue is that, using a split validation, I have a nice model working, but the training data has been through a lot of preprocessing. When I apply the exact same preprocessing steps to the new data, the prediction scoring does not work, probably because I am selecting variables based on chi-squared weights, and the weights computed on the scoring data are not the same as those computed on the training data. I figured that selecting the variables explicitly and using them in a new process, without repeating the weight-based preprocessing, would eliminate the scoring errors that I have been experiencing. Thanks!


  • JEdward RapidMiner Certified Analyst, RapidMiner Certified Expert, Member Posts: 578 Unicorn
    Could you store the weights from your training preprocessing steps and then apply Select by Weights to the scoring data using those stored weights?
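
The idea of storing the training-time selection and reusing it at scoring time can be sketched outside RapidMiner. The Python below is only an illustration under stated assumptions: the weights, attribute names, threshold, and file name are made up, and the helper functions are hypothetical, not RapidMiner operators. The point it shows is that the selection is computed once on the training data, persisted, and then applied unchanged to the scoring data, so the scoring data never gets its own weights.

```python
import json
import os
import tempfile


def select_by_weight(weights, threshold):
    """Return attribute names with weight >= threshold, highest weight first."""
    kept = [name for name, weight in weights.items() if weight >= threshold]
    return sorted(kept, key=lambda name: -weights[name])


def save_selection(names, path):
    """Persist the selected attribute names next to the model."""
    with open(path, "w", encoding="utf-8") as handle:
        json.dump(names, handle)


def load_selection(path):
    """Load the attribute names chosen at training time."""
    with open(path, encoding="utf-8") as handle:
        return json.load(handle)


def apply_selection(rows, names):
    """Keep only the selected columns; a column absent from a row becomes None."""
    return [{name: row.get(name) for name in names} for row in rows]


# Hypothetical weights computed once, on the training data only.
training_weights = {"color = red": 0.91, "color = blue": 0.12, "size = L": 0.67}
selected = select_by_weight(training_weights, threshold=0.5)

# Hypothetical storage location for the selection.
path = os.path.join(tempfile.gettempdir(), "selected_attributes.json")
save_selection(selected, path)

# At scoring time, reuse the stored selection instead of re-weighting.
scoring_rows = [{"color = red": 1, "color = blue": 0, "size = L": 0, "extra": 5}]
scored_input = apply_selection(scoring_rows, load_selection(path))
print(scored_input)
```

Filling absent columns with None is a design choice for this sketch; raising an error instead may be preferable if a missing attribute should stop the scoring run.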

    Alternatively, one way I can think of is to store a single example (or the whole lot, if you feel like it) of your training data set as a reference and then use Reorder Attributes on your scoring data to select only the desired attributes.
    See this hastily worked example:
    <?xml version="1.0" encoding="UTF-8" standalone="no"?>
    <process version="5.3.015">
      <operator activated="true" class="process" compatibility="5.3.015" expanded="true" name="Process">
        <process expanded="true">
          <operator activated="true" class="retrieve" compatibility="5.3.015" expanded="true" height="60" name="Retrieve Iris" width="90" x="45" y="30">
            <parameter key="repository_entry" value="//Samples/data/Iris"/>
          </operator>
          <operator activated="true" class="subprocess" compatibility="5.3.015" expanded="true" height="76" name="TrainingSetSimulation" width="90" x="45" y="120">
            <description>This uses a Select Attributes operator to simulate your training set.</description>
            <process expanded="true">
              <operator activated="true" class="retrieve" compatibility="5.3.015" expanded="true" height="60" name="Retrieve Iris (2)" width="90" x="45" y="30">
                <parameter key="repository_entry" value="//Samples/data/Iris"/>
              </operator>
              <operator activated="true" class="select_attributes" compatibility="5.3.015" expanded="true" height="76" name="Select Attributes" width="90" x="179" y="30">
                <parameter key="attribute_filter_type" value="subset"/>
                <parameter key="attributes" value="|a1|a4|id|label"/>
                <parameter key="include_special_attributes" value="true"/>
              </operator>
              <operator activated="true" class="filter_example_range" compatibility="5.3.015" expanded="true" height="76" name="Filter Example Range" width="90" x="313" y="30">
                <description>You only need one example. You would store this result with the model to use in preprocessing as a reference set in Reorder Attributes.</description>
                <parameter key="first_example" value="1"/>
                <parameter key="last_example" value="1"/>
              </operator>
              <connect from_op="Retrieve Iris (2)" from_port="output" to_op="Select Attributes" to_port="example set input"/>
              <connect from_op="Select Attributes" from_port="example set output" to_op="Filter Example Range" to_port="example set input"/>
              <connect from_op="Filter Example Range" from_port="example set output" to_port="out 1"/>
              <portSpacing port="source_in 1" spacing="0"/>
              <portSpacing port="sink_out 1" spacing="0"/>
              <portSpacing port="sink_out 2" spacing="0"/>
            </process>
          </operator>
          <operator activated="true" class="order_attributes" compatibility="5.3.015" expanded="true" height="76" name="Reorder Attributes" width="90" x="246" y="30">
            <parameter key="sort_mode" value="reference data"/>
            <parameter key="handle_unmatched" value="remove"/>
          </operator>
          <connect from_op="Retrieve Iris" from_port="output" to_op="Reorder Attributes" to_port="example set input"/>
          <connect from_op="TrainingSetSimulation" from_port="out 1" to_op="Reorder Attributes" to_port="reference_data"/>
          <connect from_op="Reorder Attributes" from_port="example set output" to_port="result 1"/>
          <portSpacing port="source_input 1" spacing="0"/>
          <portSpacing port="sink_result 1" spacing="90"/>
          <portSpacing port="sink_result 2" spacing="0"/>
        </process>
      </operator>
    </process>
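
For readers who want to see the logic of the process above without opening RapidMiner, here is a rough Python sketch of the same two ideas. It is an assumption-laden illustration, not RapidMiner code: `align_to_reference` mimics what Reorder Attributes does with a reference set and unmatched attributes removed, and `to_subset_value` builds a pipe-delimited string in the same shape as the `attributes` value in the process (`|a1|a4|id|label`), which might save pasting names one at a time. Both function names and the sample values are hypothetical, and whether the leading pipe is required by RapidMiner is not verified here; it simply copies the form shown above.

```python
def align_to_reference(rows, reference_row):
    """Order each row's columns like reference_row and drop all other columns."""
    reference_names = list(reference_row)
    aligned = []
    for index, row in enumerate(rows):
        missing = [name for name in reference_names if name not in row]
        if missing:
            raise KeyError(f"row {index} lacks reference attributes: {missing}")
        aligned.append({name: row[name] for name in reference_names})
    return aligned


def to_subset_value(names):
    """Join attribute names into the pipe-delimited form used in the process."""
    clashing = [name for name in names if "|" in name]
    if clashing:
        raise ValueError(f"names containing '|' cannot be joined safely: {clashing}")
    return "|" + "|".join(names)


# A single stored training example acts as the reference (values are made up).
reference_row = {"a1": 5.1, "a4": 0.2, "id": "id_1", "label": "Iris-setosa"}

# Scoring data with extra attributes in a different order (values are made up).
scoring_rows = [
    {"id": "id_7", "a1": 4.6, "a2": 3.4, "a3": 1.4, "a4": 0.3, "label": "Iris-setosa"},
]

aligned_rows = align_to_reference(scoring_rows, reference_row)
subset_value = to_subset_value(list(reference_row))
print(aligned_rows)
print(subset_value)
```

The sketch raises an error when a scoring row lacks a reference attribute, since a silently missing column is a likely cause of the kind of scoring failure described in the question.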