How to combine several Data Mining-Algorithms and get one final result

johnson66 Member Posts: 1 Contributor I
edited November 2018 in Help
Hi all,

I have two datasets with a label that takes two values (Fraud=yes or Fraud=no).
I noticed that one algorithm is better at detecting fraud "yes" and the other at detecting fraud "no". How can I combine both algorithms, or their result sets, to get a better result? I want to aggregate the results into a single confusion matrix. Which operator can I use for this task?

Many thanks for your help!



  • homburg Moderator, Employee, Member Posts: 114 RM Data Scientist
    Hi johnson66.

    Please take a look at this process. It might help you to solve your problem.
    <process version="5.1.006">
      <operator activated="true" class="process" compatibility="5.1.006" expanded="true" name="Process">
        <process expanded="true" height="317" width="1686">
          <operator activated="true" class="retrieve" compatibility="5.1.006" expanded="true" height="60" name="Retrieve" width="90" x="45" y="75">
            <parameter key="repository_entry" value="//Samples/data/Golf"/>
          </operator>
          <operator activated="true" class="generate_id" compatibility="5.1.006" expanded="true" height="76" name="Generate ID" width="90" x="179" y="75"/>
          <operator activated="true" class="multiply" compatibility="5.1.006" expanded="true" height="94" name="Multiply" width="90" x="313" y="75"/>
          <operator activated="true" class="naive_bayes" compatibility="5.1.006" expanded="true" height="76" name="Naive Bayes" width="90" x="447" y="120"/>
          <operator activated="true" class="apply_model" compatibility="5.1.006" expanded="true" height="76" name="Apply Model (2)" width="90" x="581" y="120">
            <list key="application_parameters"/>
          </operator>
          <operator activated="true" class="decision_tree" compatibility="5.1.006" expanded="true" height="76" name="Decision Tree" width="90" x="447" y="30"/>
          <operator activated="true" class="apply_model" compatibility="5.1.006" expanded="true" height="76" name="Apply Model" width="90" x="581" y="30">
            <list key="application_parameters"/>
          </operator>
          <operator activated="true" class="filter_examples" compatibility="5.1.006" expanded="true" height="76" name="Yes Filter" width="90" x="715" y="30">
            <parameter key="condition_class" value="attribute_value_filter"/>
            <parameter key="parameter_string" value="prediction(Play)=yes"/>
          </operator>
          <operator activated="true" class="multiply" compatibility="5.1.006" expanded="true" height="94" name="Multiply (2)" width="90" x="849" y="30"/>
          <operator activated="true" class="set_minus" compatibility="5.1.006" expanded="true" height="76" name="No &amp; default" width="90" x="983" y="120"/>
          <operator activated="true" class="append" compatibility="5.1.006" expanded="true" height="94" name="Merge" width="90" x="1117" y="30"/>
          <operator activated="true" class="sort" compatibility="5.1.006" expanded="true" height="76" name="Sort" width="90" x="1251" y="30">
            <parameter key="attribute_name" value="id"/>
          </operator>
          <operator activated="true" class="performance_classification" compatibility="5.1.006" expanded="true" height="76" name="Performance" width="90" x="1385" y="30">
            <list key="class_weights"/>
          </operator>
          <connect from_op="Retrieve" from_port="output" to_op="Generate ID" to_port="example set input"/>
          <connect from_op="Generate ID" from_port="example set output" to_op="Multiply" to_port="input"/>
          <connect from_op="Multiply" from_port="output 1" to_op="Decision Tree" to_port="training set"/>
          <connect from_op="Multiply" from_port="output 2" to_op="Naive Bayes" to_port="training set"/>
          <connect from_op="Naive Bayes" from_port="model" to_op="Apply Model (2)" to_port="model"/>
          <connect from_op="Naive Bayes" from_port="exampleSet" to_op="Apply Model (2)" to_port="unlabelled data"/>
          <connect from_op="Apply Model (2)" from_port="labelled data" to_op="No &amp; default" to_port="example set input"/>
          <connect from_op="Decision Tree" from_port="model" to_op="Apply Model" to_port="model"/>
          <connect from_op="Decision Tree" from_port="exampleSet" to_op="Apply Model" to_port="unlabelled data"/>
          <connect from_op="Apply Model" from_port="labelled data" to_op="Yes Filter" to_port="example set input"/>
          <connect from_op="Yes Filter" from_port="example set output" to_op="Multiply (2)" to_port="input"/>
          <connect from_op="Multiply (2)" from_port="output 1" to_op="Merge" to_port="example set 1"/>
          <connect from_op="Multiply (2)" from_port="output 2" to_op="No &amp; default" to_port="subtrahend"/>
          <connect from_op="No &amp; default" from_port="example set output" to_op="Merge" to_port="example set 2"/>
          <connect from_op="Merge" from_port="merged set" to_op="Sort" to_port="example set input"/>
          <connect from_op="Sort" from_port="example set output" to_op="Performance" to_port="labelled data"/>
          <connect from_op="Performance" from_port="performance" to_port="result 1"/>
          <portSpacing port="source_input 1" spacing="0"/>
          <portSpacing port="sink_result 1" spacing="0"/>
          <portSpacing port="sink_result 2" spacing="0"/>
        </process>
      </operator>
    </process>
    Best regards
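
For readers who want to follow the logic outside RapidMiner, the combination step in the process above can be sketched in plain Python. This is only an illustration under stated assumptions: the function names `combine_predictions` and `confusion_matrix` and the toy data are hypothetical, and the two prediction lists stand in for the outputs of the two Apply Model operators. Model A plays the role of the learner that is trusted for "yes" (the Yes Filter branch), and model B supplies the prediction for every other example (the Set Minus branch).

```python
from collections import Counter

def combine_predictions(ids, pred_a, pred_b, trusted_class="yes"):
    """Keep model A's prediction wherever A predicts `trusted_class`;
    fall back to model B's prediction for every other example."""
    combined = {}
    for example_id, a, b in zip(ids, pred_a, pred_b):
        combined[example_id] = a if a == trusted_class else b
    return combined

def confusion_matrix(labels, predictions):
    """Count (true label, predicted label) pairs over all examples."""
    return Counter(zip(labels, predictions))

# Hypothetical toy data: ids, true labels, and the two models' predictions.
ids = [1, 2, 3, 4, 5, 6]
labels = ["yes", "yes", "no", "no", "yes", "no"]
pred_a = ["yes", "no", "yes", "no", "yes", "no"]
pred_b = ["no", "yes", "no", "no", "no", "yes"]

combined = combine_predictions(ids, pred_a, pred_b)
# Restore the original order by id, as the Sort operator does.
final = [combined[i] for i in sorted(combined)]
matrix = confusion_matrix(labels, final)

for (true_label, predicted), count in sorted(matrix.items()):
    print(f"true={true_label} pred={predicted}: {count}")
```

Note that both the process and this sketch evaluate on the same data the models were trained on, so the resulting confusion matrix is likely to be optimistic.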
  • Jolly Member Posts: 11 Contributor II

    I want to combine the outputs (confusion matrices) of two processes using the naive Bayes combiner method and generate a resulting confusion matrix.
    Please help me.
  • MartinLiebig Administrator, Moderator, Employee, RapidMiner Certified Analyst, RapidMiner Certified Expert, University Professor Posts: 3,517 RM Data Scientist
    Hi Jolly,

    It is not obvious to me what you want to do. Naive Bayes runs on individual observations, whereas the confusion matrix is aggregated over all observations. Furthermore, the confusion matrix uses information about the label, so it cannot be computed on unlabeled data. If you really need that, you can use the Performance to Data operator to get the information into an example set.

    Are you sure you don't want to use Naive Bayes on the confidences produced by two different learners? That is possible. A straightforward way of just averaging the confidences would be to use the Vote operator. If you need to use Naive Bayes on the confidences, you need to use a split inside your cross-validation. If needed, I can provide an example process.

    - Sr. Director Data Solutions, Altair RapidMiner -
    Dortmund, Germany
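
The two options mentioned in the reply above (averaging the confidences, or training Naive Bayes on them) can be sketched in plain Python. This is a rough illustration, not RapidMiner's implementation: all function names and numbers are hypothetical, the confidences are assumed to be each learner's confidence for the class "yes", and the tiny Gaussian Naive Bayes is a stand-in for a real Naive Bayes learner. The rows used to fit the combiner are assumed to come from a held-out split that the two base learners did not see during training, which is the reason for the split inside the cross-validation.

```python
import math

def average_confidences(conf_a, conf_b):
    """Average two learners' confidences for the positive class."""
    return [(a + b) / 2 for a, b in zip(conf_a, conf_b)]

def fit_gaussian_nb(rows, labels):
    """Fit a tiny Gaussian Naive Bayes: per class, a prior plus a mean
    and variance for each feature (here: each learner's confidence)."""
    model = {}
    for cls in set(labels):
        cls_rows = [r for r, y in zip(rows, labels) if y == cls]
        stats = []
        for col in zip(*cls_rows):
            mean = sum(col) / len(col)
            # Small constant keeps the variance away from zero.
            var = sum((v - mean) ** 2 for v in col) / len(col) + 1e-3
            stats.append((mean, var))
        model[cls] = (len(cls_rows) / len(rows), stats)
    return model

def predict_gaussian_nb(model, row):
    """Return the class with the highest log posterior for one row."""
    def log_posterior(cls):
        prior, stats = model[cls]
        score = math.log(prior)
        for value, (mean, var) in zip(row, stats):
            score += -0.5 * math.log(2 * math.pi * var)
            score -= (value - mean) ** 2 / (2 * var)
        return score
    return max(model, key=log_posterior)

# Option 1: average the confidences and threshold at 0.5.
averaged = average_confidences([0.9, 0.2], [0.7, 0.4])
avg_predictions = ["yes" if c >= 0.5 else "no" for c in averaged]

# Option 2: fit the combiner on held-out confidences (hypothetical numbers).
# Each row is (confidence from learner A, confidence from learner B).
meta_rows = [(0.9, 0.8), (0.8, 0.6), (0.7, 0.9),
             (0.2, 0.3), (0.1, 0.4), (0.3, 0.1)]
meta_labels = ["yes", "yes", "yes", "no", "no", "no"]
combiner = fit_gaussian_nb(meta_rows, meta_labels)

print(avg_predictions)
print(predict_gaussian_nb(combiner, (0.85, 0.7)))
print(predict_gaussian_nb(combiner, (0.15, 0.2)))
```

If the combiner were fitted on confidences the base learners produced for their own training data, those confidences would tend to be overconfident and the combiner would learn from misleading inputs.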