Combining multiple, segmented models

keith_drake Member Posts: 11 Contributor II
edited November 2018 in Help
I want to employ a strategy that builds 24 separate, independent models by segmenting the input feature space by two variables: one into three segments and the other into eight segments: 3 x 8 = 24.
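
For readers who think in code, the 3 x 8 segmentation can be sketched outside RapidMiner. This is only an illustration in plain Python, not a RapidMiner process. The cut points `A_CUTS` and `B_CUTS` and the function `segment_key` are made-up names and values, since the post does not say which two variables are used or where they are split.

```python
from bisect import bisect_right

# Hypothetical cut points: 2 cuts give 3 bins, 7 cuts give 8 bins.
A_CUTS = [30, 60]
B_CUTS = [10, 20, 30, 40, 50, 60, 70]


def segment_key(a, b):
    """Return (bin index of a, bin index of b) as the segment key."""
    return bisect_right(A_CUTS, a), bisect_right(B_CUTS, b)


# Enumerate every possible key: 3 bins times 8 bins.
all_keys = {(i, j)
            for i in range(len(A_CUTS) + 1)
            for j in range(len(B_CUTS) + 1)}

print(len(all_keys))       # 24
print(segment_key(45, 5))  # (1, 0)
```

Each training or hold-out example gets exactly one key, so the key can later decide which of the 24 models applies.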

I know how to use the Filter Examples operator to do the incoming splits and then how to use X-Validation, Backward Elimination, etc., to do the modeling. But what operators can I use to select among the resulting 24 models, so that each hold-out example is run through the right one?

My approach will result in 24 different models, each derived from a different segment of the original data. When I process new (hold-out) data, only one of the 24 models will be appropriate to use since it was trained using the same segment of data (out of 24) as the new example. The other 23 models should be ignored.

My challenge is determining overall performance statistics for the 24 models automatically, without running each model separately and then combining the individual results by hand to calculate RMSE, AE, etc.
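
The goal described here, one set of performance numbers across all 24 models, amounts to routing each hold-out example to the model of its own segment and pooling the errors before computing RMSE and absolute error. Below is a minimal sketch in plain Python, assuming a stand-in "model" that just predicts the training mean of its segment. `train_segment_models` and `pooled_errors` are hypothetical helper names, and the numbers are toy data.

```python
import math
from collections import defaultdict


def train_segment_models(rows):
    """rows: iterable of (segment, target).

    Stand-in for real modeling: each segment's "model" is the mean
    of its training targets. Returns {segment: mean}.
    """
    targets = defaultdict(list)
    for seg, y in rows:
        targets[seg].append(y)
    return {seg: sum(ys) / len(ys) for seg, ys in targets.items()}


def pooled_errors(models, holdout):
    """Score every hold-out row with its own segment's model only,
    then compute RMSE and mean absolute error over all rows at once."""
    errors = [models[seg] - y for seg, y in holdout]
    rmse = math.sqrt(sum(e * e for e in errors) / len(errors))
    mae = sum(abs(e) for e in errors) / len(errors)
    return rmse, mae


train = [("A", 1.0), ("A", 3.0), ("B", 10.0), ("B", 14.0)]
holdout = [("A", 2.0), ("B", 10.0), ("B", 16.0)]

models = train_segment_models(train)
rmse, mae = pooled_errors(models, holdout)
print(models)
print(rmse, mae)
```

Pooling the errors first generally gives a different number than averaging 24 per-segment RMSE values, especially when the segments contain different numbers of hold-out examples.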

If you can point me in the right direction by suggesting the operator(s) I should look at, that is all I need.

Thanks!

Answers

  • MartinLiebig Administrator, Moderator, Employee, RapidMiner Certified Analyst, RapidMiner Certified Expert, University Professor Posts: 3,503 RM Data Scientist
    Hey,

    Initially I thought this would be easy, but it is more complicated than I thought. Maybe there is an easier way to do this.
    Check the attached process. You might need to add a Handle Exception operator if some segments may be missing during testing.

    ~Martin

    <?xml version="1.0" encoding="UTF-8" standalone="no"?>
    <process version="6.5.002">
      <context>
        <input/>
        <output/>
        <macros/>
      </context>
      <operator activated="true" class="process" compatibility="6.5.002" expanded="true" name="Process">
        <process expanded="true">
          <operator activated="true" class="retrieve" compatibility="6.5.002" expanded="true" height="60" name="Retrieve Deals" width="90" x="112" y="210">
            <parameter key="repository_entry" value="//Samples/data/Deals"/>
          </operator>
          <operator activated="false" class="productivity:execute_process" compatibility="6.5.002" expanded="true" height="60" name="Execute Segmentation Process" width="90" x="313" y="435">
            <parameter key="process_location" value="//Local Repository/Forum/KCD43/processes/Segmentation Process"/>
            <list key="macros"/>
          </operator>
          <operator activated="true" class="generate_attributes" compatibility="6.5.002" expanded="true" height="76" name="Generate Attributes" width="90" x="246" y="210">
            <list key="function_descriptions">
              <parameter key="Segment" value="if(Age &lt; 50, &quot;Young&quot;,&quot;Old&quot;)"/>
            </list>
          </operator>
          <operator activated="true" class="set_role" compatibility="6.5.002" expanded="true" height="76" name="Set Role" width="90" x="380" y="210">
            <parameter key="attribute_name" value="Segment"/>
            <parameter key="target_role" value="cluster"/>
            <list key="set_additional_roles"/>
          </operator>
          <operator activated="true" class="multiply" compatibility="6.5.002" expanded="true" height="94" name="Multiply" width="90" x="514" y="210"/>
          <operator activated="true" class="remove_duplicates" compatibility="6.5.002" expanded="true" height="76" name="Remove Duplicates" width="90" x="648" y="345">
            <parameter key="attribute_filter_type" value="single"/>
            <parameter key="attribute" value="Segment"/>
            <parameter key="include_special_attributes" value="true"/>
          </operator>
          <operator activated="true" class="remember" compatibility="6.5.002" expanded="true" height="60" name="Remember" width="90" x="782" y="345">
            <parameter key="name" value="Segments"/>
          </operator>
          <operator activated="true" class="x_validation" compatibility="6.5.002" expanded="true" height="112" name="Validation" width="90" x="916" y="120">
            <process expanded="true">
              <operator activated="true" class="recall" compatibility="6.5.002" expanded="true" height="60" name="Recall" width="90" x="45" y="30">
                <parameter key="name" value="Segments"/>
                <parameter key="remove_from_store" value="false"/>
                <description align="center" color="transparent" colored="false" width="126">Only to keep track</description>
              </operator>
              <operator activated="true" class="extract_macro" compatibility="6.5.002" expanded="true" height="60" name="Extract Macro (4)" width="90" x="179" y="30">
                <parameter key="macro" value="numberOfSegments"/>
                <list key="additional_macros"/>
              </operator>
              <operator activated="true" class="loop" compatibility="6.5.002" expanded="true" height="94" name="Loop" width="90" x="246" y="120">
                <parameter key="set_iteration_macro" value="true"/>
                <parameter key="iterations" value="%{numberOfSegments}"/>
                <process expanded="true">
                  <operator activated="true" class="extract_macro" compatibility="6.5.002" expanded="true" height="60" name="Extract Macro" width="90" x="179" y="30">
                    <parameter key="macro" value="currentSegment"/>
                    <parameter key="macro_type" value="data_value"/>
                    <parameter key="attribute_name" value="Segment"/>
                    <parameter key="example_index" value="%{iteration}"/>
                    <list key="additional_macros"/>
                  </operator>
                  <operator activated="true" class="filter_examples" compatibility="6.5.002" expanded="true" height="94" name="Filter Examples" width="90" x="380" y="165">
                    <list key="filters_list">
                      <parameter key="filters_entry_key" value="Segment.equals.%{currentSegment}"/>
                    </list>
                  </operator>
                  <operator activated="true" class="parallel_decision_tree" compatibility="6.5.002" expanded="true" height="76" name="Decision Tree" width="90" x="581" y="165"/>
                  <connect from_port="input 1" to_op="Extract Macro" to_port="example set"/>
                  <connect from_port="input 2" to_op="Filter Examples" to_port="example set input"/>
                  <connect from_op="Filter Examples" from_port="example set output" to_op="Decision Tree" to_port="training set"/>
                  <connect from_op="Decision Tree" from_port="model" to_port="output 1"/>
                  <portSpacing port="source_input 1" spacing="0"/>
                  <portSpacing port="source_input 2" spacing="126"/>
                  <portSpacing port="source_input 3" spacing="0"/>
                  <portSpacing port="sink_output 1" spacing="0"/>
                  <portSpacing port="sink_output 2" spacing="0"/>
                </process>
              </operator>
              <connect from_port="training" to_op="Loop" to_port="input 2"/>
              <connect from_op="Recall" from_port="result" to_op="Extract Macro (4)" to_port="example set"/>
              <connect from_op="Extract Macro (4)" from_port="example set" to_op="Loop" to_port="input 1"/>
              <connect from_op="Loop" from_port="output 1" to_port="model"/>
              <portSpacing port="source_training" spacing="108"/>
              <portSpacing port="sink_model" spacing="90"/>
              <portSpacing port="sink_through 1" spacing="36"/>
            </process>
            <process expanded="true">
              <operator activated="true" class="recall" compatibility="6.5.002" expanded="true" height="60" name="Recall (2)" width="90" x="45" y="30">
                <parameter key="name" value="Segments"/>
                <parameter key="remove_from_store" value="false"/>
                <description align="center" color="transparent" colored="false" width="126">Only to keep track</description>
              </operator>
              <operator activated="false" class="loop_clusters" compatibility="6.5.002" expanded="true" height="112" name="Loop Clusters (2)" width="90" x="179" y="435">
                <process expanded="true">
                  <operator activated="true" class="extract_macro" compatibility="6.5.002" expanded="true" height="60" name="Extract Macro (2)" width="90" x="45" y="30">
                    <parameter key="macro" value="currentSegment"/>
                    <parameter key="macro_type" value="data_value"/>
                    <parameter key="attribute_name" value="Segment"/>
                    <parameter key="example_index" value="1"/>
                    <list key="additional_macros"/>
                  </operator>
                  <operator activated="true" class="filter_examples" compatibility="6.5.002" expanded="true" height="94" name="Filter Examples (2)" width="90" x="246" y="165">
                    <list key="filters_list">
                      <parameter key="filters_entry_key" value="Segment.equals.%{currentSegment}"/>
                    </list>
                  </operator>
                  <operator activated="true" class="apply_model" compatibility="6.5.002" expanded="true" height="76" name="Apply Model" width="90" x="514" y="75">
                    <list key="application_parameters"/>
                  </operator>
                  <connect from_port="cluster subset" to_op="Extract Macro (2)" to_port="example set"/>
                  <connect from_port="in 1" to_op="Apply Model" to_port="model"/>
                  <connect from_port="in 2" to_op="Filter Examples (2)" to_port="example set input"/>
                  <connect from_op="Filter Examples (2)" from_port="example set output" to_op="Apply Model" to_port="unlabelled data"/>
                  <connect from_op="Apply Model" from_port="labelled data" to_port="out 1"/>
                  <portSpacing port="source_cluster subset" spacing="0"/>
                  <portSpacing port="source_in 1" spacing="54"/>
                  <portSpacing port="source_in 2" spacing="0"/>
                  <portSpacing port="source_in 3" spacing="0"/>
                  <portSpacing port="sink_out 1" spacing="0"/>
                  <portSpacing port="sink_out 2" spacing="0"/>
                </process>
              </operator>
              <operator activated="true" class="loop" compatibility="6.5.002" expanded="true" height="112" name="Loop (2)" width="90" x="246" y="120">
                <parameter key="set_iteration_macro" value="true"/>
                <parameter key="iterations" value="%{numberOfSegments}"/>
                <process expanded="true">
                  <operator activated="true" class="extract_macro" compatibility="6.5.002" expanded="true" height="60" name="Extract Macro (3)" width="90" x="112" y="30">
                    <parameter key="macro" value="currentSegment"/>
                    <parameter key="macro_type" value="data_value"/>
                    <parameter key="attribute_name" value="Segment"/>
                    <parameter key="example_index" value="%{iteration}"/>
                    <list key="additional_macros"/>
                  </operator>
                  <operator activated="true" class="filter_examples" compatibility="6.5.002" expanded="true" height="94" name="Filter Examples (3)" width="90" x="380" y="255">
                    <list key="filters_list">
                      <parameter key="filters_entry_key" value="Segment.equals.%{currentSegment}"/>
                    </list>
                  </operator>
                  <operator activated="true" class="select" compatibility="6.5.002" expanded="true" height="60" name="Select" width="90" x="380" y="165">
                    <parameter key="index" value="%{iteration}"/>
                  </operator>
                  <operator activated="true" class="apply_model" compatibility="6.5.002" expanded="true" height="76" name="Apply Model (2)" width="90" x="648" y="165">
                    <list key="application_parameters"/>
                  </operator>
                  <operator activated="false" breakpoints="before" class="parallel_decision_tree" compatibility="6.5.002" expanded="true" height="76" name="Decision Tree (2)" width="90" x="581" y="390"/>
                  <connect from_port="input 1" to_op="Extract Macro (3)" to_port="example set"/>
                  <connect from_port="input 2" to_op="Select" to_port="collection"/>
                  <connect from_port="input 3" to_op="Filter Examples (3)" to_port="example set input"/>
                  <connect from_op="Filter Examples (3)" from_port="example set output" to_op="Apply Model (2)" to_port="unlabelled data"/>
                  <connect from_op="Select" from_port="selected" to_op="Apply Model (2)" to_port="model"/>
                  <connect from_op="Apply Model (2)" from_port="labelled data" to_port="output 1"/>
                  <portSpacing port="source_input 1" spacing="0"/>
                  <portSpacing port="source_input 2" spacing="108"/>
                  <portSpacing port="source_input 3" spacing="36"/>
                  <portSpacing port="source_input 4" spacing="0"/>
                  <portSpacing port="sink_output 1" spacing="0"/>
                  <portSpacing port="sink_output 2" spacing="0"/>
                </process>
              </operator>
              <operator activated="true" class="append" compatibility="6.5.002" expanded="true" height="76" name="Append" width="90" x="380" y="120"/>
              <operator activated="true" class="performance_classification" compatibility="6.5.002" expanded="true" height="76" name="Performance" width="90" x="581" y="120">
                <list key="class_weights"/>
              </operator>
              <connect from_port="model" to_op="Loop (2)" to_port="input 2"/>
              <connect from_port="test set" to_op="Loop (2)" to_port="input 3"/>
              <connect from_op="Recall (2)" from_port="result" to_op="Loop (2)" to_port="input 1"/>
              <connect from_op="Loop (2)" from_port="output 1" to_op="Append" to_port="example set 1"/>
              <connect from_op="Append" from_port="merged set" to_op="Performance" to_port="labelled data"/>
              <connect from_op="Performance" from_port="performance" to_port="averagable 1"/>
              <portSpacing port="source_model" spacing="90"/>
              <portSpacing port="source_test set" spacing="0"/>
              <portSpacing port="source_through 1" spacing="0"/>
              <portSpacing port="sink_averagable 1" spacing="0"/>
              <portSpacing port="sink_averagable 2" spacing="0"/>
            </process>
          </operator>
          <connect from_op="Retrieve Deals" from_port="output" to_op="Generate Attributes" to_port="example set input"/>
          <connect from_op="Generate Attributes" from_port="example set output" to_op="Set Role" to_port="example set input"/>
          <connect from_op="Set Role" from_port="example set output" to_op="Multiply" to_port="input"/>
          <connect from_op="Multiply" from_port="output 1" to_op="Validation" to_port="training"/>
          <connect from_op="Multiply" from_port="output 2" to_op="Remove Duplicates" to_port="example set input"/>
          <connect from_op="Remove Duplicates" from_port="example set output" to_op="Remember" to_port="store"/>
          <connect from_op="Validation" from_port="model" to_port="result 2"/>
          <connect from_op="Validation" from_port="averagable 1" to_port="result 1"/>
          <portSpacing port="source_input 1" spacing="0"/>
          <portSpacing port="sink_result 1" spacing="0"/>
          <portSpacing port="sink_result 2" spacing="0"/>
          <portSpacing port="sink_result 3" spacing="0"/>
          <description align="center" color="yellow" colored="false" height="178" resized="true" width="263" x="232" y="134">Do the segmentation (also possible via clustering)</description>
          <description align="center" color="yellow" colored="false" height="193" resized="true" width="323" x="620" y="272">Store an example per segment to keep track</description>
        </process>
      </operator>
    </process>
    - Sr. Director Data Solutions, Altair RapidMiner -
    Dortmund, Germany
  • keith_drake Member Posts: 11 Contributor II
    Thanks so much Martin, I appreciate the guidance! Could you point me in the right direction regarding implementing your code? I've not looked at code in many years, but want to refine my technical skills for sure.  Keith
  • MartinLiebig Administrator, Moderator, Employee, RapidMiner Certified Analyst, RapidMiner Certified Expert, University Professor Posts: 3,503 RM Data Scientist
    Hi,

    Simply copy the XML into the XML view of RapidMiner and click the green check mark.
    To open the XML view, go to View -> Show View -> XML.

    You could also save it as an XML file and use File -> Import Process.

    ~Martin
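
Martin's remark about segments that may be missing during testing can also be illustrated outside RapidMiner. The sketch below is plain Python under assumptions, not the Handle Exception operator itself: `route_predictions` is a hypothetical helper, the models are toy functions, and hold-out rows whose segment has no trained model are set aside instead of raising an error.

```python
def route_predictions(models, holdout, fallback=None):
    """Predict each row with the model of its own segment.

    models:   {segment: callable taking the row's features}
    holdout:  iterable of (segment, features)
    fallback: optional model used when a segment has no model of its own
    Returns (predictions, unmatched rows).
    """
    predictions, unmatched = [], []
    for seg, features in holdout:
        model = models.get(seg, fallback)
        if model is None:
            # No model for this segment and no fallback: keep the row aside.
            unmatched.append((seg, features))
        else:
            predictions.append((seg, model(features)))
    return predictions, unmatched


# Toy models keyed by the segment names used in the process above.
models = {"Young": lambda x: x * 2, "Old": lambda x: x + 1}
holdout = [("Young", 3), ("Old", 4), ("Unknown", 5)]

predictions, unmatched = route_predictions(models, holdout)
print(predictions)  # [('Young', 6), ('Old', 5)]
print(unmatched)    # [('Unknown', 5)]
```

A segment that has a model but no hold-out rows simply contributes nothing to the predictions, so the pooled statistics are computed only over rows that were actually scored.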