Implement this algorithm in RapidMiner

Melody Member Posts: 9 Contributor I
edited December 2018 in Help

Hi, I want to implement an algorithm like this in RapidMiner, but I do not know how. Please guide me.

 

[Attached image: Untitled.png]

Answers

  • Thomas_Ott RapidMiner Certified Analyst, RapidMiner Certified Expert, Member Posts: 1,761 Unicorn

    Based on your graph, you will need a Read operator to load your data, a Set Role operator to set your label, then a Sample operator, a Cross Validation (CV) operator, and a Stacking operator on the training side of the CV operator. You embed the different machine learners in the Stacking operator.

  • Melody Member Posts: 9 Contributor I

    Hi, thank you for your reply.
    For the Sample operator, should I use the bootstrap operator or bagging?

    This error occurred for the operator I used. What is this error? What should I do?

    [Attached image: p1.png]

  • Thomas_Ott RapidMiner Certified Analyst, RapidMiner Certified Expert, Member Posts: 1,761 Unicorn

    Well, that depends on what you want to do with sampling as you balance your classes. Is it better to bootstrap (aka upsample) or downsample? Have you considered weighting them using Generate Weight (Stratification)?

    Your other error means that you can't deliver an example set (EXA) from that operator; rather, you need an operator that delivers a model (MOD). Something like a Naive Bayes or Decision Tree, etc.
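
The upsample-versus-downsample choice mentioned above can be sketched outside RapidMiner in plain Python. This is only an illustration of the idea, not what any RapidMiner operator does internally; the function name `rebalance`, its `mode` argument, and the toy rows are made up for this example.

```python
import random
from collections import defaultdict

def rebalance(rows, label_of, mode="up", seed=0):
    """Balance classes by bootstrap upsampling or by random downsampling.

    Hypothetical helper: mode="up" grows every class to the size of the
    largest one, mode="down" shrinks every class to the size of the smallest.
    """
    if mode not in ("up", "down"):
        raise ValueError("mode must be 'up' or 'down'")
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for row in rows:
        by_class[label_of(row)].append(row)
    sizes = [len(group) for group in by_class.values()]
    target = max(sizes) if mode == "up" else min(sizes)
    balanced = []
    for group in by_class.values():
        if len(group) < target:
            # Upsample: keep the originals and add draws with replacement.
            extra = [rng.choice(group) for _ in range(target - len(group))]
            balanced.extend(group + extra)
        else:
            # Downsample (or already at target): draw without replacement.
            balanced.extend(rng.sample(group, target))
    return balanced

# Toy data: 8 negative rows and 2 positive rows (invented values).
rows = [("a", "neg")] * 8 + [("b", "pos")] * 2
up = rebalance(rows, lambda r: r[1], mode="up")      # 8 neg + 8 pos
down = rebalance(rows, lambda r: r[1], mode="down")  # 2 neg + 2 pos
```

Upsampling keeps every original row but repeats minority rows, while downsampling discards majority rows, which matters on a small data set.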

  • Melody Member Posts: 9 Contributor I

    I want to build an optimal model that achieves higher classification accuracy on unbalanced data. The idea is to combine two ensemble methods, bagging and boosting, and to use a genetic programming model as the learning algorithm for classifying the unbalanced data. If I use bagging only for sampling and pass the data to boosting for training, boosting builds a better model through weighting.

    I want to use genetic programming to improve this model. How do you think I can build this model? Is this idea feasible?

     

  • Thomas_Ott RapidMiner Certified Analyst, RapidMiner Certified Expert, Member Posts: 1,761 Unicorn

    Yup, you can do that in RapidMiner. Post your process when you're ready and we can troubleshoot. 

  • Melody Member Posts: 9 Contributor I

    Thank you.

    Should I post my process here or email it to you?

  • Thomas_Ott RapidMiner Certified Analyst, RapidMiner Certified Expert, Member Posts: 1,761 Unicorn

    Please post it to the thread, thanks. 

  • Melody Member Posts: 9 Contributor I

    Hi, Mr. Ott.

    Is my process correct?
    Do you think it matches the model I explained?
    Is the sampling done in the same way?
    How can I give the minority (positive) class more weight so that it is predicted more often?

  • Thomas_Ott RapidMiner Certified Analyst, RapidMiner Certified Expert, Member Posts: 1,761 Unicorn

    I'm guessing that the positive class is the minority class. I would handle it by overweighting the minority class and underweighting the majority class. Something like this.

    Then I would use a Cross Validation (not Split Validation) inside the Optimize Weights operator.

     

    <?xml version="1.0" encoding="UTF-8"?><process version="7.5.003">
    <context>
    <input/>
    <output/>
    <macros/>
    </context>
    <operator activated="true" class="process" compatibility="7.5.003" expanded="true" name="Process">
    <parameter key="encoding" value="SYSTEM"/>
    <process expanded="true">
    <operator activated="false" class="read_excel" compatibility="7.5.003" expanded="true" height="68" name="Read Excel" width="90" x="45" y="34">
    <parameter key="excel_file" value="D:\thesis96\dataset\Main.DataSet\Glass2.xlsx"/>
    <parameter key="imported_cell_range" value="A1:J215"/>
    <parameter key="encoding" value="SYSTEM"/>
    <parameter key="first_row_as_names" value="false"/>
    <list key="annotations">
    <parameter key="0" value="Name"/>
    </list>
    <list key="data_set_meta_data_information">
    <parameter key="0" value=" RI.true.real.attribute"/>
    <parameter key="1" value=" Na.true.real.attribute"/>
    <parameter key="2" value=" Mg.true.real.attribute"/>
    <parameter key="3" value=" Al.true.real.attribute"/>
    <parameter key="4" value=" Si.true.real.attribute"/>
    <parameter key="5" value=" K.true.real.attribute"/>
    <parameter key="6" value=" Ca.true.real.attribute"/>
    <parameter key="7" value=" Ba.true.real.attribute"/>
    <parameter key="8" value=" Fe.true.real.attribute"/>
    <parameter key="9" value="class.true.nominal.label"/>
    </list>
    </operator>
    <operator activated="false" class="bagging" compatibility="7.5.003" expanded="true" height="82" name="Bagging" width="90" x="179" y="34">
    <process expanded="true">
    <operator activated="true" class="concurrency:parallel_decision_tree" compatibility="7.5.003" expanded="true" height="82" name="Decision Tree" width="90" x="246" y="34"/>
    <connect from_port="training set" to_op="Decision Tree" to_port="training set"/>
    <connect from_op="Decision Tree" from_port="model" to_port="model"/>
    <portSpacing port="source_training set" spacing="0"/>
    <portSpacing port="sink_model" spacing="0"/>
    </process>
    </operator>
    <operator activated="false" class="sample_model_based" compatibility="7.5.003" expanded="true" height="82" name="Sample (Model-Based)" width="90" x="313" y="34"/>
    <operator activated="true" class="retrieve" compatibility="7.5.003" expanded="true" height="68" name="Retrieve Golf" width="90" x="112" y="187">
    <parameter key="repository_entry" value="//Samples/data/Golf"/>
    </operator>
    <operator activated="true" breakpoints="after" class="generate_weight_stratification" compatibility="7.5.003" expanded="true" height="82" name="Generate Weight (Stratification)" width="90" x="246" y="187"/>
    <operator activated="true" class="optimize_weights_evolutionary" compatibility="7.5.003" expanded="true" height="103" name="Optimize Weights (2)" width="90" x="514" y="34">
    <parameter key="population_size" value="100"/>
    <parameter key="maximum_number_of_generations" value="40"/>
    <parameter key="use_early_stopping" value="true"/>
    <parameter key="show_population_plotter" value="true"/>
    <parameter key="selection_scheme" value="roulette wheel"/>
    <parameter key="p_crossover" value="0.2"/>
    <parameter key="crossover_type" value="shuffle"/>
    <process expanded="true">
    <operator activated="true" class="concurrency:cross_validation" compatibility="7.5.003" expanded="true" height="145" name="Cross Validation" width="90" x="447" y="34">
    <parameter key="number_of_folds" value="3"/>
    <process expanded="true">
    <operator activated="true" class="adaboost" compatibility="7.5.003" expanded="true" height="82" name="AdaBoost (2)" width="90" x="112" y="34">
    <process expanded="true">
    <operator activated="true" class="concurrency:parallel_decision_tree" compatibility="7.5.003" expanded="true" height="82" name="Decision Tree (3)" width="90" x="246" y="34"/>
    <connect from_port="training set" to_op="Decision Tree (3)" to_port="training set"/>
    <connect from_op="Decision Tree (3)" from_port="model" to_port="model"/>
    <portSpacing port="source_training set" spacing="0"/>
    <portSpacing port="sink_model" spacing="0"/>
    </process>
    </operator>
    <connect from_port="training set" to_op="AdaBoost (2)" to_port="training set"/>
    <connect from_op="AdaBoost (2)" from_port="model" to_port="model"/>
    <portSpacing port="source_training set" spacing="0"/>
    <portSpacing port="sink_model" spacing="0"/>
    <portSpacing port="sink_through 1" spacing="0"/>
    </process>
    <process expanded="true">
    <operator activated="true" class="apply_model" compatibility="7.5.003" expanded="true" height="82" name="Apply Model (2)" width="90" x="45" y="34">
    <list key="application_parameters"/>
    </operator>
    <operator activated="true" class="performance_classification" compatibility="7.5.003" expanded="true" height="82" name="Performance (2)" width="90" x="246" y="34">
    <parameter key="classification_error" value="true"/>
    <parameter key="weighted_mean_recall" value="true"/>
    <parameter key="weighted_mean_precision" value="true"/>
    <parameter key="root_mean_squared_error" value="true"/>
    <parameter key="root_relative_squared_error" value="true"/>
    <list key="class_weights"/>
    </operator>
    <connect from_port="model" to_op="Apply Model (2)" to_port="model"/>
    <connect from_port="test set" to_op="Apply Model (2)" to_port="unlabelled data"/>
    <connect from_op="Apply Model (2)" from_port="labelled data" to_op="Performance (2)" to_port="labelled data"/>
    <connect from_op="Performance (2)" from_port="performance" to_port="performance 1"/>
    <portSpacing port="source_model" spacing="0"/>
    <portSpacing port="source_test set" spacing="0"/>
    <portSpacing port="source_through 1" spacing="0"/>
    <portSpacing port="sink_test set results" spacing="0"/>
    <portSpacing port="sink_performance 1" spacing="0"/>
    <portSpacing port="sink_performance 2" spacing="0"/>
    </process>
    </operator>
    <connect from_port="example set" to_op="Cross Validation" to_port="example set"/>
    <connect from_op="Cross Validation" from_port="performance 1" to_port="performance"/>
    <portSpacing port="source_example set" spacing="0"/>
    <portSpacing port="source_through 1" spacing="0"/>
    <portSpacing port="sink_performance" spacing="0"/>
    </process>
    </operator>
    <connect from_op="Read Excel" from_port="output" to_op="Bagging" to_port="training set"/>
    <connect from_op="Bagging" from_port="model" to_op="Sample (Model-Based)" to_port="model"/>
    <connect from_op="Bagging" from_port="example set" to_op="Sample (Model-Based)" to_port="example set input"/>
    <connect from_op="Retrieve Golf" from_port="output" to_op="Generate Weight (Stratification)" to_port="example set input"/>
    <connect from_op="Generate Weight (Stratification)" from_port="example set output" to_op="Optimize Weights (2)" to_port="example set in"/>
    <connect from_op="Optimize Weights (2)" from_port="example set out" to_port="result 1"/>
    <connect from_op="Optimize Weights (2)" from_port="weights" to_port="result 2"/>
    <portSpacing port="source_input 1" spacing="0"/>
    <portSpacing port="sink_result 1" spacing="0"/>
    <portSpacing port="sink_result 2" spacing="0"/>
    <portSpacing port="sink_result 3" spacing="0"/>
    </process>
    </operator>
    </process>
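
The over- and underweighting idea in this reply can be illustrated with a small Python sketch. It assumes that stratified weighting means giving every class the same summed weight, which is how the Generate Weight (Stratification) operator is understood here; the function name `stratification_weights`, its `total_weight` argument, and the toy labels are made up for illustration.

```python
from collections import Counter

def stratification_weights(labels, total_weight=1.0):
    """Return one weight per example so that every class sums to the same weight.

    Assumption: each class receives an equal share of total_weight, split
    evenly among its examples, so rare classes get larger per-example weights.
    """
    counts = Counter(labels)
    per_class = total_weight / len(counts)  # equal share for each class
    return [per_class / counts[label] for label in labels]

# Toy labels: 8 majority ("neg") and 2 minority ("pos") examples.
labels = ["neg"] * 8 + ["pos"] * 2
weights = stratification_weights(labels, total_weight=10.0)

# Summed weight per class, to check that the classes are balanced.
neg_sum = sum(w for w, y in zip(weights, labels) if y == "neg")
pos_sum = sum(w for w, y in zip(weights, labels) if y == "pos")
```

With these toy labels each minority example ends up with four times the weight of a majority example, so a weight-aware learner pays more attention to its mistakes on the minority class.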
  • Melody Member Posts: 9 Contributor I

    I used cross validation, but why does the number of wrong predictions in the confusion matrix not match the number of errors displayed by Optimize Weights, or the counts of wrongly predicted negatives and positives? Or am I wrong?

    The visualization does not match the confusion matrix. How can this be corrected?

    [Attached images: 1.png, 2.png]

    How can I get the tree as an output of this process?
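
One possible reason for such a mismatch, offered here as an assumption and not confirmed anywhere in this thread, is that a cross-validated error can be reported as the average of the per-fold error rates, while a confusion matrix pools the counts from all folds. The two numbers differ whenever the folds have different sizes. The sketch below uses invented fold counts; `error_rate` and `pooled` are hypothetical helper names.

```python
def error_rate(cm):
    """cm[i][j] = number of examples with true class i predicted as class j."""
    total = sum(sum(row) for row in cm)
    correct = sum(cm[i][i] for i in range(len(cm)))
    return (total - correct) / total

def pooled(cms):
    """Add up the confusion matrices of all folds, cell by cell."""
    n = len(cms[0])
    return [[sum(cm[i][j] for cm in cms) for j in range(n)] for i in range(n)]

# Invented confusion matrices for three folds of a 14-example data set.
folds = [
    [[2, 1], [1, 1]],  # 5 examples, 2 errors
    [[3, 0], [1, 1]],  # 5 examples, 1 error
    [[2, 1], [0, 1]],  # 4 examples, 1 error
]

# Average of the per-fold error rates.
fold_average = sum(error_rate(cm) for cm in folds) / len(folds)
# Error rate read off the single pooled confusion matrix.
pooled_error = error_rate(pooled(folds))
```

Here the pooled matrix contains 4 errors out of 14 examples, while the fold average is computed from 0.4, 0.2 and 0.25, so the two figures are close but not equal.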

     
