How to combine several Data Mining-Algorithms and get one final result

johnson66 · July 2011

Hi all,

I have two datasets with a label that takes two values (Fraud=yes or Fraud=no).
I noticed that one algorithm is better at detecting Fraud=yes and the other at detecting Fraud=no. How can I combine both result sets/algorithms to get a better result? I want to aggregate the results and get a single confusion matrix. Which operator can I use for this task?

Many thanks for your help!

Regards,
Johnson

homburg · August 2011

Hi johnson66.

Please take a look at this process. It might help you to solve your problem.

<process version="5.1.006">
  <context>
    <input/>
    <output/>
    <macros/>
  </context>
  <operator activated="true" class="process" compatibility="5.1.006" expanded="true" name="Process">
    <process expanded="true" height="317" width="1686">
      <operator activated="true" class="retrieve" compatibility="5.1.006" expanded="true" height="60" name="Retrieve" width="90" x="45" y="75">
        <parameter key="repository_entry" value="//Samples/data/Golf"/>
      </operator>
      <operator activated="true" class="generate_id" compatibility="5.1.006" expanded="true" height="76" name="Generate ID" width="90" x="179" y="75"/>
      <operator activated="true" class="multiply" compatibility="5.1.006" expanded="true" height="94" name="Multiply" width="90" x="313" y="75"/>
      <operator activated="true" class="naive_bayes" compatibility="5.1.006" expanded="true" height="76" name="Naive Bayes" width="90" x="447" y="120"/>
      <operator activated="true" class="apply_model" compatibility="5.1.006" expanded="true" height="76" name="Apply Model (2)" width="90" x="581" y="120">
        <list key="application_parameters"/>
      </operator>
      <operator activated="true" class="decision_tree" compatibility="5.1.006" expanded="true" height="76" name="Decision Tree" width="90" x="447" y="30"/>
      <operator activated="true" class="apply_model" compatibility="5.1.006" expanded="true" height="76" name="Apply Model" width="90" x="581" y="30">
        <list key="application_parameters"/>
      </operator>
      <operator activated="true" class="filter_examples" compatibility="5.1.006" expanded="true" height="76" name="Yes Filter" width="90" x="715" y="30">
        <parameter key="condition_class" value="attribute_value_filter"/>
        <parameter key="parameter_string" value="prediction(Play)=yes"/>
      </operator>
      <operator activated="true" class="multiply" compatibility="5.1.006" expanded="true" height="94" name="Multiply (2)" width="90" x="849" y="30"/>
      <operator activated="true" class="set_minus" compatibility="5.1.006" expanded="true" height="76" name="No &amp; default" width="90" x="983" y="120"/>
      <operator activated="true" class="append" compatibility="5.1.006" expanded="true" height="94" name="Merge" width="90" x="1117" y="30"/>
      <operator activated="true" class="sort" compatibility="5.1.006" expanded="true" height="76" name="Sort" width="90" x="1251" y="30">
        <parameter key="attribute_name" value="id"/>
      </operator>
      <operator activated="true" class="performance_classification" compatibility="5.1.006" expanded="true" height="76" name="Performance" width="90" x="1385" y="30">
        <list key="class_weights"/>
      </operator>
      <connect from_op="Retrieve" from_port="output" to_op="Generate ID" to_port="example set input"/>
      <connect from_op="Generate ID" from_port="example set output" to_op="Multiply" to_port="input"/>
      <connect from_op="Multiply" from_port="output 1" to_op="Decision Tree" to_port="training set"/>
      <connect from_op="Multiply" from_port="output 2" to_op="Naive Bayes" to_port="training set"/>
      <connect from_op="Naive Bayes" from_port="model" to_op="Apply Model (2)" to_port="model"/>
      <connect from_op="Naive Bayes" from_port="exampleSet" to_op="Apply Model (2)" to_port="unlabelled data"/>
      <connect from_op="Apply Model (2)" from_port="labelled data" to_op="No &amp; default" to_port="example set input"/>
      <connect from_op="Decision Tree" from_port="model" to_op="Apply Model" to_port="model"/>
      <connect from_op="Decision Tree" from_port="exampleSet" to_op="Apply Model" to_port="unlabelled data"/>
      <connect from_op="Apply Model" from_port="labelled data" to_op="Yes Filter" to_port="example set input"/>
      <connect from_op="Yes Filter" from_port="example set output" to_op="Multiply (2)" to_port="input"/>
      <connect from_op="Multiply (2)" from_port="output 1" to_op="Merge" to_port="example set 1"/>
      <connect from_op="Multiply (2)" from_port="output 2" to_op="No &amp; default" to_port="subtrahend"/>
      <connect from_op="No &amp; default" from_port="example set output" to_op="Merge" to_port="example set 2"/>
      <connect from_op="Merge" from_port="merged set" to_op="Sort" to_port="example set input"/>
      <connect from_op="Sort" from_port="example set output" to_op="Performance" to_port="labelled data"/>
      <connect from_op="Performance" from_port="performance" to_port="result 1"/>
      <portSpacing port="source_input 1" spacing="0"/>
      <portSpacing port="sink_result 1" spacing="0"/>
      <portSpacing port="sink_result 2" spacing="0"/>
    </process>
  </operator>
</process>
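
For readers who do not have RapidMiner at hand, the logic of the process above can be sketched in plain Python: rows that the first model (the Decision Tree) predicts as "yes" keep that prediction, every other row takes the second model's (Naive Bayes) prediction, and one confusion matrix is computed over the merged result. The function names and the toy data are hypothetical and are not taken from the Golf sample set. Note that the process applies both models to the data they were trained on, so this only illustrates the merge step and says nothing about how well the combination generalizes.

```python
from collections import Counter

def combine_predictions(first_pred, second_pred, trusted="yes"):
    """Keep the first model's prediction where it equals `trusted`;
    otherwise fall back to the second model's prediction."""
    return [a if a == trusted else b for a, b in zip(first_pred, second_pred)]

def confusion_matrix(labels, predictions):
    """Count (true label, predicted label) pairs."""
    return Counter(zip(labels, predictions))

# Hypothetical toy data, aligned by row (the process uses an id attribute for this).
labels     = ["yes", "no", "yes", "no", "no"]
tree_pred  = ["yes", "yes", "no", "no", "no"]   # stands in for the Decision Tree output
bayes_pred = ["no", "no", "yes", "no", "yes"]   # stands in for the Naive Bayes output

combined = combine_predictions(tree_pred, bayes_pred)
matrix = confusion_matrix(labels, combined)

for (true_label, predicted), count in sorted(matrix.items()):
    print(f"true={true_label} predicted={predicted}: {count}")
```

The list comprehension plays the role of the "Yes Filter", "No & default" and "Merge" operators; the row-wise `zip` replaces the sort by id.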

Best regards
Helge

Jolly · April 2015

Hello,

I want to combine the output (confusion matrix) of two processes using the naive Bayes combiner method and generate a resulting confusion matrix.
Please help me.

MartinLiebig · April 2015

Hi Jolly,

It is not obvious to me what you want to do. Naive Bayes runs on individual observations, whereas the confusion matrix is aggregated over all observations. Furthermore, the confusion matrix uses information about the label, so it could not be applied to unlabeled data. If you really need that, you can use Performance to Data to get the information into an example set.
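
A small plain-Python illustration of why aggregated confusion matrices are a poor input for a per-observation combiner: two models can disagree on every single row and still produce identical matrices. The data and the helper name are hypothetical.

```python
from collections import Counter

def confusion_matrix(labels, predictions):
    """Count (true label, predicted label) pairs."""
    return Counter(zip(labels, predictions))

# Hypothetical toy data: the two models disagree on every row.
labels = ["yes", "yes", "no", "no"]
pred_a = ["yes", "no", "no", "yes"]
pred_b = ["no", "yes", "yes", "no"]

matrix_a = confusion_matrix(labels, pred_a)
matrix_b = confusion_matrix(labels, pred_b)

# Identical matrices, so they carry no information about which rows differ.
same_matrix = matrix_a == matrix_b
rows_in_agreement = sum(a == b for a, b in zip(pred_a, pred_b))
print(same_matrix, rows_in_agreement)
```

Combining models therefore has to happen on the per-row predictions or confidences, before the matrix is computed.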

Are you sure you don't want to use Naive Bayes on the confidences produced by two different learners? That is possible. A straightforward way of just averaging the confidences would be to use the Vote operator. If you need to use Naive Bayes on the confidences, you need to use a split inside your cross-validation. If needed, I can provide an example process.
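
As a minimal sketch of the averaging idea, assuming each learner outputs one confidence per class for every row: the confidences are averaged per class and the class with the highest average becomes the combined prediction. This is a plain-Python illustration with hypothetical names and numbers, not a reproduction of what any RapidMiner operator does internally.

```python
def average_confidences(conf_a, conf_b):
    """Average two learners' per-class confidences row by row."""
    return [
        {cls: (row_a[cls] + row_b[cls]) / 2 for cls in row_a}
        for row_a, row_b in zip(conf_a, conf_b)
    ]

def predict(confidences):
    """Pick the class with the highest confidence for each row."""
    return [max(row, key=row.get) for row in confidences]

# Hypothetical confidences from two learners for three rows.
conf_a = [{"yes": 0.75, "no": 0.25}, {"yes": 0.25, "no": 0.75}, {"yes": 1.0, "no": 0.0}]
conf_b = [{"yes": 0.5, "no": 0.5}, {"yes": 0.5, "no": 0.5}, {"yes": 0.25, "no": 0.75}]

averaged = average_confidences(conf_a, conf_b)
predictions = predict(averaged)
print(predictions)
```

Training a second-level learner such as Naive Bayes on these confidences instead of averaging them requires that the confidences come from rows the base learners did not see during training, which is the reason for the split inside the cross-validation mentioned above.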

Cheers,
Martin
