How to filter data with two models applied in sequence

inceptorfull Member Posts: 44 Contributor II
edited November 2018 in Help
Hi all, I am trying to apply a neural network (NN) to a credit default problem. I applied the NN and it gave me 85% accuracy.

Now I want to filter those results further using KNN, either to get higher accuracy or to find the cases closest to the predicted case as additional confirmation of the prediction.
So how do I do that in RapidMiner?

NN -> KNN? And

what if I want to do it in reverse order, so that KNN assigns the nearest neighbours first and the NN then predicts from those neighbours? Is that possible?

Answers

  • MartinLiebig Administrator, Moderator, Employee, RapidMiner Certified Analyst, RapidMiner Certified Expert, University Professor Posts: 3,503 RM Data Scientist
    Is the general idea that you want to learn a neural net per neighbourhood and use it to score that single customer?

    Best,
    Martin
    - Sr. Director Data Solutions, Altair RapidMiner -
    Dortmund, Germany
  • inceptorfull Member Posts: 44 Contributor II
    I could use it that way, so that the NN learns per KNN neighbourhood,

    OR

    let the NN learn first and then filter the NN results by neighbourhood, so that the output of the NN becomes the input for KNN. KNN would then give a new prediction, or the closest neighbours, so that applicants flagged as bad in the default problem can be reassigned to good (if they really are good applicants).


    It would be great if you could give me an overall idea of how to implement both to get better accuracy.
    Thanks a lot for your support.
  • MartinLiebig Administrator, Moderator, Employee, RapidMiner Certified Analyst, RapidMiner Certified Expert, University Professor Posts: 3,503 RM Data Scientist
    Hi,

    Have a look at the attached process. I built it a while ago for a forum user. It runs one learner per cluster, with the clusters calculated by k-means. This is pretty close to what you want to do.

    Best,
    Martin

    <?xml version="1.0" encoding="UTF-8" standalone="no"?>
    <process version="7.0.001">
      <context>
        <input/>
        <output/>
        <macros/>
      </context>
      <operator activated="true" class="process" compatibility="7.0.001" expanded="true" name="Process">
        <process expanded="true">
          <operator activated="true" class="retrieve" compatibility="7.0.001" expanded="true" height="68" name="Retrieve Deals" width="90" x="112" y="210">
            <parameter key="repository_entry" value="//Samples/data/Deals"/>
          </operator>
          <operator activated="true" class="k_means" compatibility="7.0.001" expanded="true" height="82" name="Clustering" width="90" x="246" y="187">
            <parameter key="k" value="20"/>
            <parameter key="measure_types" value="MixedMeasures"/>
          </operator>
          <operator activated="false" class="generate_attributes" compatibility="7.0.001" expanded="true" height="82" name="Generate Attributes" width="90" x="246" y="340">
            <list key="function_descriptions">
              <parameter key="Segment" value="if(Age &lt; 50, &quot;Young&quot;,&quot;Old&quot;)"/>
            </list>
          </operator>
          <operator activated="false" class="set_role" compatibility="7.0.001" expanded="true" height="82" name="Set Role" width="90" x="380" y="340">
            <parameter key="attribute_name" value="Segment"/>
            <parameter key="target_role" value="cluster"/>
            <list key="set_additional_roles"/>
          </operator>
          <operator activated="true" class="rename" compatibility="7.0.001" expanded="true" height="82" name="Rename" width="90" x="380" y="187">
            <parameter key="old_name" value="cluster"/>
            <parameter key="new_name" value="Segment"/>
            <list key="rename_additional_attributes"/>
          </operator>
          <operator activated="true" class="multiply" compatibility="7.0.001" expanded="true" height="103" name="Multiply" width="90" x="514" y="210"/>
          <operator activated="true" class="remove_duplicates" compatibility="7.0.001" expanded="true" height="82" name="Remove Duplicates" width="90" x="648" y="345">
            <parameter key="attribute_filter_type" value="single"/>
            <parameter key="attribute" value="Segment"/>
            <parameter key="include_special_attributes" value="true"/>
          </operator>
          <operator activated="true" class="remember" compatibility="7.0.001" expanded="true" height="68" name="Remember" width="90" x="782" y="345">
            <parameter key="name" value="Segments"/>
          </operator>
          <operator activated="true" class="x_validation" compatibility="7.0.001" expanded="true" height="124" name="Validation" width="90" x="916" y="120">
            <process expanded="true">
              <operator activated="true" class="recall" compatibility="7.0.001" expanded="true" height="60" name="Recall" width="90" x="45" y="30">
                <parameter key="name" value="Segments"/>
                <parameter key="remove_from_store" value="false"/>
                <description align="center" color="transparent" colored="false" width="126">Only to keep track</description>
              </operator>
              <operator activated="true" class="extract_macro" compatibility="7.0.001" expanded="true" height="60" name="Extract Macro (4)" width="90" x="179" y="30">
                <parameter key="macro" value="numberOfSegments"/>
                <list key="additional_macros"/>
              </operator>
              <operator activated="true" class="loop" compatibility="7.0.001" expanded="true" height="94" name="Loop" width="90" x="246" y="120">
                <parameter key="set_iteration_macro" value="true"/>
                <parameter key="iterations" value="%{numberOfSegments}"/>
                <process expanded="true">
                  <operator activated="true" class="extract_macro" compatibility="7.0.001" expanded="true" height="60" name="Extract Macro" width="90" x="179" y="30">
                    <parameter key="macro" value="currentSegment"/>
                    <parameter key="macro_type" value="data_value"/>
                    <parameter key="attribute_name" value="Segment"/>
                    <parameter key="example_index" value="%{iteration}"/>
                    <list key="additional_macros"/>
                  </operator>
                  <operator activated="true" class="filter_examples" compatibility="7.0.001" expanded="true" height="94" name="Filter Examples" width="90" x="380" y="165">
                    <list key="filters_list">
                      <parameter key="filters_entry_key" value="Segment.equals.%{currentSegment}"/>
                    </list>
                  </operator>
                  <operator activated="true" class="parallel_decision_tree" compatibility="7.0.001" expanded="true" height="76" name="Decision Tree" width="90" x="581" y="165"/>
                  <connect from_port="input 1" to_op="Extract Macro" to_port="example set"/>
                  <connect from_port="input 2" to_op="Filter Examples" to_port="example set input"/>
                  <connect from_op="Filter Examples" from_port="example set output" to_op="Decision Tree" to_port="training set"/>
                  <connect from_op="Decision Tree" from_port="model" to_port="output 1"/>
                  <portSpacing port="source_input 1" spacing="0"/>
                  <portSpacing port="source_input 2" spacing="126"/>
                  <portSpacing port="source_input 3" spacing="0"/>
                  <portSpacing port="sink_output 1" spacing="0"/>
                  <portSpacing port="sink_output 2" spacing="0"/>
                </process>
              </operator>
              <connect from_port="training" to_op="Loop" to_port="input 2"/>
              <connect from_op="Recall" from_port="result" to_op="Extract Macro (4)" to_port="example set"/>
              <connect from_op="Extract Macro (4)" from_port="example set" to_op="Loop" to_port="input 1"/>
              <connect from_op="Loop" from_port="output 1" to_port="model"/>
              <portSpacing port="source_training" spacing="108"/>
              <portSpacing port="sink_model" spacing="90"/>
              <portSpacing port="sink_through 1" spacing="36"/>
            </process>
            <process expanded="true">
              <operator activated="true" class="recall" compatibility="7.0.001" expanded="true" height="60" name="Recall (2)" width="90" x="45" y="30">
                <parameter key="name" value="Segments"/>
                <parameter key="remove_from_store" value="false"/>
                <description align="center" color="transparent" colored="false" width="126">Only to keep track</description>
              </operator>
              <operator activated="false" class="loop_clusters" compatibility="7.0.001" expanded="true" height="112" name="Loop Clusters (2)" width="90" x="179" y="435">
                <process expanded="true">
                  <operator activated="true" class="extract_macro" compatibility="7.0.001" expanded="true" height="60" name="Extract Macro (2)" width="90" x="45" y="30">
                    <parameter key="macro" value="currentSegment"/>
                    <parameter key="macro_type" value="data_value"/>
                    <parameter key="attribute_name" value="Segment"/>
                    <parameter key="example_index" value="1"/>
                    <list key="additional_macros"/>
                  </operator>
                  <operator activated="true" class="filter_examples" compatibility="7.0.001" expanded="true" height="94" name="Filter Examples (2)" width="90" x="246" y="165">
                    <list key="filters_list">
                      <parameter key="filters_entry_key" value="Segment.equals.%{currentSegment}"/>
                    </list>
                  </operator>
                  <operator activated="true" class="apply_model" compatibility="7.0.001" expanded="true" height="76" name="Apply Model" width="90" x="514" y="75">
                    <list key="application_parameters"/>
                  </operator>
                  <connect from_port="cluster subset" to_op="Extract Macro (2)" to_port="example set"/>
                  <connect from_port="in 1" to_op="Apply Model" to_port="model"/>
                  <connect from_port="in 2" to_op="Filter Examples (2)" to_port="example set input"/>
                  <connect from_op="Filter Examples (2)" from_port="example set output" to_op="Apply Model" to_port="unlabelled data"/>
                  <connect from_op="Apply Model" from_port="labelled data" to_port="out 1"/>
                  <portSpacing port="source_cluster subset" spacing="0"/>
                  <portSpacing port="source_in 1" spacing="54"/>
                  <portSpacing port="source_in 2" spacing="0"/>
                  <portSpacing port="source_in 3" spacing="0"/>
                  <portSpacing port="sink_out 1" spacing="0"/>
                  <portSpacing port="sink_out 2" spacing="0"/>
                </process>
              </operator>
              <operator activated="true" class="loop" compatibility="7.0.001" expanded="true" height="112" name="Loop (2)" width="90" x="246" y="120">
                <parameter key="set_iteration_macro" value="true"/>
                <process expanded="true">
                  <operator activated="true" class="extract_macro" compatibility="7.0.001" expanded="true" height="60" name="Extract Macro (3)" width="90" x="112" y="30">
                    <parameter key="macro" value="currentSegment"/>
                    <parameter key="macro_type" value="data_value"/>
                    <parameter key="attribute_name" value="Segment"/>
                    <parameter key="example_index" value="1"/>
                    <list key="additional_macros"/>
                  </operator>
                  <operator activated="true" class="filter_examples" compatibility="7.0.001" expanded="true" height="94" name="Filter Examples (3)" width="90" x="380" y="255">
                    <list key="filters_list">
                      <parameter key="filters_entry_key" value="Segment.equals.%{currentSegment}"/>
                    </list>
                  </operator>
                  <operator activated="true" class="select" compatibility="7.0.001" expanded="true" height="60" name="Select" width="90" x="380" y="165">
                    <parameter key="index" value="%{iteration}"/>
                  </operator>
                  <operator activated="true" class="apply_model" compatibility="7.0.001" expanded="true" height="76" name="Apply Model (2)" width="90" x="648" y="165">
                    <list key="application_parameters"/>
                  </operator>
                  <operator activated="false" breakpoints="before" class="parallel_decision_tree" compatibility="7.0.001" expanded="true" height="76" name="Decision Tree (2)" width="90" x="581" y="390"/>
                  <connect from_port="input 1" to_op="Extract Macro (3)" to_port="example set"/>
                  <connect from_port="input 2" to_op="Select" to_port="collection"/>
                  <connect from_port="input 3" to_op="Filter Examples (3)" to_port="example set input"/>
                  <connect from_op="Filter Examples (3)" from_port="example set output" to_op="Apply Model (2)" to_port="unlabelled data"/>
                  <connect from_op="Select" from_port="selected" to_op="Apply Model (2)" to_port="model"/>
                  <connect from_op="Apply Model (2)" from_port="labelled data" to_port="output 1"/>
                  <portSpacing port="source_input 1" spacing="0"/>
                  <portSpacing port="source_input 2" spacing="108"/>
                  <portSpacing port="source_input 3" spacing="36"/>
                  <portSpacing port="source_input 4" spacing="0"/>
                  <portSpacing port="sink_output 1" spacing="0"/>
                  <portSpacing port="sink_output 2" spacing="0"/>
                </process>
              </operator>
              <operator activated="true" class="append" compatibility="7.0.001" expanded="true" height="76" name="Append" width="90" x="380" y="120"/>
              <operator activated="true" class="performance_classification" compatibility="7.0.001" expanded="true" height="76" name="Performance" width="90" x="581" y="120">
                <list key="class_weights"/>
              </operator>
              <connect from_port="model" to_op="Loop (2)" to_port="input 2"/>
              <connect from_port="test set" to_op="Loop (2)" to_port="input 3"/>
              <connect from_op="Recall (2)" from_port="result" to_op="Loop (2)" to_port="input 1"/>
              <connect from_op="Loop (2)" from_port="output 1" to_op="Append" to_port="example set 1"/>
              <connect from_op="Append" from_port="merged set" to_op="Performance" to_port="labelled data"/>
              <connect from_op="Performance" from_port="performance" to_port="averagable 1"/>
              <portSpacing port="source_model" spacing="90"/>
              <portSpacing port="source_test set" spacing="0"/>
              <portSpacing port="source_through 1" spacing="0"/>
              <portSpacing port="sink_averagable 1" spacing="0"/>
              <portSpacing port="sink_averagable 2" spacing="0"/>
            </process>
          </operator>
          <connect from_op="Retrieve Deals" from_port="output" to_op="Clustering" to_port="example set"/>
          <connect from_op="Clustering" from_port="clustered set" to_op="Rename" to_port="example set input"/>
          <connect from_op="Rename" from_port="example set output" to_op="Multiply" to_port="input"/>
          <connect from_op="Multiply" from_port="output 1" to_op="Validation" to_port="training"/>
          <connect from_op="Multiply" from_port="output 2" to_op="Remove Duplicates" to_port="example set input"/>
          <connect from_op="Remove Duplicates" from_port="example set output" to_op="Remember" to_port="store"/>
          <connect from_op="Validation" from_port="model" to_port="result 2"/>
          <connect from_op="Validation" from_port="averagable 1" to_port="result 1"/>
          <portSpacing port="source_input 1" spacing="0"/>
          <portSpacing port="sink_result 1" spacing="0"/>
          <portSpacing port="sink_result 2" spacing="0"/>
          <portSpacing port="sink_result 3" spacing="0"/>
          <description align="center" color="yellow" colored="false" height="178" resized="true" width="263" x="228" y="289">Do the segmentation by hand (also possible via clustering)</description>
          <description align="center" color="yellow" colored="false" height="193" resized="true" width="323" x="620" y="272">Store an example per segment to keep track</description>
        </process>
      </operator>
    </process>
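
For readers who find the process hard to follow, the "one learner per segment" idea can be sketched outside RapidMiner in a few lines of plain Python. This is only an illustration under assumptions: the segment rule mirrors the deactivated Generate Attributes operator (Age < 50) rather than the k-means clustering, the rows and `label` values are made up, and a majority-class "model" stands in for the Decision Tree.

```python
from collections import Counter, defaultdict

def segment_of(row):
    # Hand-made segmentation, mirroring if(Age < 50, "Young", "Old") in the process.
    return "Young" if row["Age"] < 50 else "Old"

def train_per_segment(rows):
    """Train one model per segment; here a 'model' is just the segment's majority label."""
    labels = defaultdict(Counter)
    for row in rows:
        labels[segment_of(row)][row["label"]] += 1
    return {segment: counts.most_common(1)[0][0] for segment, counts in labels.items()}

def score(models, row):
    """Route the row to the model of its own segment and return that model's prediction."""
    return models[segment_of(row)]

# Made-up training rows (hypothetical data, not the Deals sample set).
training = [
    {"Age": 25, "label": "yes"},
    {"Age": 31, "label": "yes"},
    {"Age": 44, "label": "no"},
    {"Age": 58, "label": "no"},
    {"Age": 63, "label": "no"},
    {"Age": 70, "label": "yes"},
]

models = train_per_segment(training)
print(models)
print(score(models, {"Age": 29}))
```

The process does the same three things: it assigns a segment to every example, trains one model per segment inside the training loop, and in the test loop picks the matching model for each segment before applying it.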

  • inceptorfull Member Posts: 44 Contributor II
    Thanks a lot for the reply and feedback, but it is actually very complicated and I did not understand any of it.
    Do you think applying a neural network and then KNN would be a good way to get higher accuracy? If so, how do I set up the two models so that each improves on the other's results?
  • MartinLiebig Administrator, Moderator, Employee, RapidMiner Certified Analyst, RapidMiner Certified Expert, University Professor Posts: 3,503 RM Data Scientist
    You might try the Stacking and Vote operators. Those meta operators also go in that direction.

    And yes, meta-learning can help. I am not sure whether your concrete idea helps, but I would give it a try.
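
As a rough picture of what stacking does (this is plain Python, not the RapidMiner operator), the sketch below trains two base learners and then a meta-learner on their predictions. Everything in it is an assumption for illustration: the two numeric features and the rows are made up, a nearest-centroid classifier stands in for the neural net to keep the code short, and the meta-learner is a simple lookup table from pairs of base predictions to the most frequent true label.

```python
from collections import Counter, defaultdict

def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def knn_predict(train, x, k=3):
    """Base learner 1: majority label among the k nearest training rows."""
    nearest = sorted(train, key=lambda row: sq_dist(row[0], x))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]

def centroid_predict(train, x):
    """Base learner 2 (stand-in for the NN): label of the closest class centroid."""
    groups = defaultdict(list)
    for features, label in train:
        groups[label].append(features)
    centroids = {label: [sum(col) / len(rows) for col in zip(*rows)]
                 for label, rows in groups.items()}
    return min(centroids, key=lambda label: sq_dist(centroids[label], x))

def fit_stack(base_train, meta_train):
    """Meta-learner: for each pair of base predictions, remember the majority true label."""
    votes = defaultdict(Counter)
    for x, y in meta_train:
        key = (knn_predict(base_train, x), centroid_predict(base_train, x))
        votes[key][y] += 1
    return {key: counts.most_common(1)[0][0] for key, counts in votes.items()}

def stack_predict(base_train, table, x):
    key = (knn_predict(base_train, x), centroid_predict(base_train, x))
    # Unseen combination of base predictions: fall back to the KNN vote.
    return table.get(key, key[0])

# Made-up applicants: (feature_1, feature_2) -> label.
base_train = [((5, 1), "Good"), ((6, 1), "Good"), ((5, 2), "Good"),
              ((1, 5), "Bad"), ((1, 6), "Bad"), ((2, 5), "Bad")]
# Held-out rows used only to train the meta-learner.
meta_train = [((6, 2), "Good"), ((5, 1.5), "Good"),
              ((2, 6), "Bad"), ((1.5, 5), "Bad")]

table = fit_stack(base_train, meta_train)
print(stack_predict(base_train, table, (5.5, 1)))
```

The point to notice is that the meta-learner never sees the original attributes here, only the base predictions, which is why the type of those prediction attributes matters for whichever learner is placed on top.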
  • inceptorfull Member Posts: 44 Contributor II
    Thanks a lot for following up. This is actually the last step of my analysis, and it is where I am stuck.
    Stacking seems like it may work, but when I use both KNN and NN as base learners and the NN as the stacking model learner, it stops and gives me the following error: "binomial attribute is not supported"
  • MartinLiebig Administrator, Moderator, Employee, RapidMiner Certified Analyst, RapidMiner Certified Expert, University Professor Posts: 3,503 RM Data Scientist
    Try using Nominal to Numerical before the learner. I would also suggest using breakpoints to see what's happening :)

    ~Martin
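
What a nominal-to-numerical step does can be shown with a small dummy-coding sketch in plain Python. It is not the RapidMiner operator itself: the column names, the rows, and the `column = value` naming of the new attributes are assumptions chosen for illustration.

```python
def fit_dummy_coding(rows, column):
    """Learn the list of categories from the training rows (the 'preprocessing model')."""
    return sorted({row[column] for row in rows})

def apply_dummy_coding(row, column, categories):
    """Replace one nominal column by one 0/1 column per known category."""
    encoded = {key: value for key, value in row.items() if key != column}
    for category in categories:
        encoded[f"{column} = {category}"] = 1 if row[column] == category else 0
    return encoded

# Made-up rows with one nominal attribute.
training = [
    {"age": 30, "housing": "own"},
    {"age": 45, "housing": "rent"},
]

categories = fit_dummy_coding(training, "housing")
encoded = [apply_dummy_coding(row, "housing", categories) for row in training]
print(encoded)
```

After this step the learner only ever sees the new numeric columns, so any data scored later has to go through the same conversion with the same category list.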
  • inceptorfull Member Posts: 44 Contributor II
    I am really very grateful for your attention and for sticking with me.

    I tried it, and it gives me another error in Apply Model: "the input exampleset doesnot match the training exampleset, missing attribute: base prediction0=Good"

    I also tried using a Decision Tree as the stacking model learner. The accuracy actually increased, but I am not sure whether that is right or wrong, since I used NN and KNN as base learners. I wish I could try it with the NN.

  • MartinLiebig Administrator, Moderator, Employee, RapidMiner Certified Analyst, RapidMiner Certified Expert, University Professor Posts: 3,503 RM Data Scientist
    Hi,

    The trick is that you need to combine the models (the preprocessing model from Nominal to Numerical and the NN) using Group Models.

    ~Martin
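
The reason grouping helps can be sketched in plain Python (again an illustration, not RapidMiner code). The assumption is that the learner on top was trained on dummy-coded columns, while at apply time it receives the raw nominal base prediction. The attribute name `base_prediction0`, its two values, and the stand-in "NN" are all hypothetical.

```python
def nominal_to_numerical(row):
    """Preprocessing model: dummy-code the nominal base prediction (categories fixed at training)."""
    encoded = {key: value for key, value in row.items() if key != "base_prediction0"}
    for category in ("Bad", "Good"):
        encoded[f"base_prediction0 = {category}"] = 1 if row["base_prediction0"] == category else 0
    return encoded

def numeric_learner(row):
    """Stand-in for the trained NN: it only knows the dummy-coded attribute names."""
    return "Good" if row["base_prediction0 = Good"] == 1 else "Bad"

def group_models(*models):
    """Return one model that applies the given models in order."""
    def grouped(row):
        for model in models:
            row = model(row)
        return row
    return grouped

raw = {"base_prediction0": "Good"}

# Applying the learner alone fails, because the raw row lacks the dummy-coded column.
try:
    numeric_learner(raw)
except KeyError as missing:
    print("missing attribute:", missing)

# The grouped model converts first and predicts second, so the raw row works.
combined = group_models(nominal_to_numerical, numeric_learner)
print(combined(raw))
```

Because the grouped model is applied as a single unit, the conversion travels with the learner and cannot be skipped when new data is scored.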