Need to check whether my solution is correct (machine learning techniques with an ensemble approach)

twilight_baya Member Posts: 10 Contributor I
edited December 2018 in Help

Dear all,

 

I am a complete beginner with RapidMiner. My task is to apply three classification techniques (ANN, DT, and SVM) and then combine the three models with an ensemble technique to improve accuracy. I am supposed to report the final score as a percentage.

 

Could someone look at my solution and let me know whether it is correct?

 

Thank you very much for your help.

Answers

  • kypexin Moderator, RapidMiner Certified Analyst, Member Posts: 291 Unicorn

    Hi @twilight_baya

     

    It would be helpful for all community members if you could also post your dataset and the formulation of the problem itself. In other words, we need to understand what kind of data you are working with and what exactly you need to predict.

  • twilight_baya Member Posts: 10 Contributor I

    Thank you for your reply, kypexin.

     

    Here is my dataset. It has one dependent variable, YGPA (pass = 1, fail = 0), and 16 independent variables: gender (male, female) and X1 to X15.
    X1 is a score out of 100; X2 to X15 are scores out of 5.
    I would like to apply three classification techniques (ANN, DT, and SVM) to this dataset and then combine the three with the stacking ensemble method.
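    For readers new to stacking, the sketch below shows the textbook way a stacking ensemble builds the training set for its meta learner: each example's base-model predictions come from models fitted without that example (out-of-fold predictions). It is plain Python, not RapidMiner, and the "learners" are hypothetical one-feature threshold rules standing in for SVM, NN and DT. I have not verified that RapidMiner's Stacking operator builds its meta-training set exactly this way.

```python
# Out-of-fold meta-features for stacking (illustrative sketch, stdlib only).
# The threshold learners are hypothetical stand-ins for SVM / Neural Net / Decision Tree.

def make_threshold_learner(feature):
    """Return a fit function that learns one cut-off on one feature,
    halfway between the class means (assumes label 1 has the higher mean)."""
    def fit(rows, labels):
        ones = [r[feature] for r, y in zip(rows, labels) if y == 1]
        zeros = [r[feature] for r, y in zip(rows, labels) if y == 0]
        cut = (sum(ones) / len(ones) + sum(zeros) / len(zeros)) / 2
        return lambda row: 1 if row[feature] >= cut else 0
    return fit


def out_of_fold_meta_features(learners, rows, labels, k=3):
    """Build the meta learner's training set: for each row, the tuple of
    base-model predictions made by models that never saw that row."""
    meta = [None] * len(rows)
    for fold in range(k):
        test_idx = [i for i in range(len(rows)) if i % k == fold]
        train_idx = [i for i in range(len(rows)) if i % k != fold]
        train_rows = [rows[i] for i in train_idx]
        train_labels = [labels[i] for i in train_idx]
        models = [fit(train_rows, train_labels) for fit in learners]
        for i in test_idx:
            meta[i] = tuple(model(rows[i]) for model in models)
    return meta


# Made-up toy rows (X1 out of 100, X2 out of 5) and YGPA labels (1 = pass, 0 = fail).
rows = [{"X1": 90, "X2": 4}, {"X1": 10, "X2": 2}, {"X1": 85, "X2": 5},
        {"X1": 15, "X2": 1}, {"X1": 80, "X2": 4}, {"X1": 20, "X2": 2}]
labels = [1, 0, 1, 0, 1, 0]
learners = [make_threshold_learner("X1"), make_threshold_learner("X2")]
meta = out_of_fold_meta_features(learners, rows, labels, k=3)
print(meta)  # one (X1-rule, X2-rule) prediction pair per row
```

    In a full stacking setup the meta learner (Naive Bayes in the processes posted in this thread) would then be trained on `meta` against `labels`, and the base models would be refit on all rows for prediction.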

     

    Thomas_Ott has kindly provided me with the process XML, but I am not sure which parts of it I need to change to fit my data.

  • twilight_baya Member Posts: 10 Contributor I

    The process XML provided by Thomas_Ott is attached here.

  • kypexin Moderator, RapidMiner Certified Analyst, Member Posts: 291 Unicorn

    Hi @twilight_baya

     

    Your attached process XML does not seem to be valid, so I can't open it in RapidMiner. Could you please copy the XML source of the process directly from RapidMiner's 'XML' tab and post it here? Thanks.

  • twilight_baya Member Posts: 10 Contributor I

    I received this process from Thomas_Ott in a private message and will paste it below.

     

    <?xml version="1.0" encoding="UTF-8"?><process version="8.1.001">
    <context>
    <input/>
    <output/>
    <macros/>
    </context>
    <operator activated="true" class="process" compatibility="8.1.001" expanded="true" name="Process">
    <process expanded="true">
    <operator activated="true" class="generate_data" compatibility="8.1.001" expanded="true" height="68" name="Generate Data" width="90" x="112" y="34">
    <parameter key="target_function" value="random classification"/>
    <parameter key="number_examples" value="2000"/>
    <parameter key="number_of_attributes" value="15"/>
    </operator>
    <operator activated="true" class="concurrency:cross_validation" compatibility="8.1.001" expanded="true" height="145" name="Validation" width="90" x="380" y="34">
    <parameter key="sampling_type" value="stratified sampling"/>
    <process expanded="true">
    <operator activated="true" class="stacking" compatibility="8.1.001" expanded="true" height="68" name="Stacking" width="90" x="179" y="34">
    <process expanded="true">
    <operator activated="true" class="support_vector_machine" compatibility="8.1.001" expanded="true" height="124" name="SVM" width="90" x="112" y="34"/>
    <operator activated="true" class="neural_net" compatibility="8.1.001" expanded="true" height="82" name="Neural Net" width="90" x="112" y="187">
    <list key="hidden_layers"/>
    </operator>
    <operator activated="true" class="concurrency:parallel_decision_tree" compatibility="8.1.001" expanded="true" height="103" name="Decision Tree" width="90" x="112" y="289"/>
    <connect from_port="training set 1" to_op="SVM" to_port="training set"/>
    <connect from_port="training set 2" to_op="Neural Net" to_port="training set"/>
    <connect from_port="training set 3" to_op="Decision Tree" to_port="training set"/>
    <connect from_op="SVM" from_port="model" to_port="base model 1"/>
    <connect from_op="Neural Net" from_port="model" to_port="base model 2"/>
    <connect from_op="Decision Tree" from_port="model" to_port="base model 3"/>
    <portSpacing port="source_training set 1" spacing="0"/>
    <portSpacing port="source_training set 2" spacing="0"/>
    <portSpacing port="source_training set 3" spacing="0"/>
    <portSpacing port="source_training set 4" spacing="0"/>
    <portSpacing port="sink_base model 1" spacing="0"/>
    <portSpacing port="sink_base model 2" spacing="0"/>
    <portSpacing port="sink_base model 3" spacing="0"/>
    <portSpacing port="sink_base model 4" spacing="0"/>
    </process>
    <process expanded="true">
    <operator activated="true" class="naive_bayes" compatibility="8.1.001" expanded="true" height="82" name="Naive Bayes" width="90" x="179" y="34"/>
    <connect from_port="stacking examples" to_op="Naive Bayes" to_port="training set"/>
    <connect from_op="Naive Bayes" from_port="model" to_port="stacking model"/>
    <portSpacing port="source_stacking examples" spacing="0"/>
    <portSpacing port="sink_stacking model" spacing="0"/>
    </process>
    </operator>
    <connect from_port="training set" to_op="Stacking" to_port="training set"/>
    <connect from_op="Stacking" from_port="model" to_port="model"/>
    <portSpacing port="source_training set" spacing="0"/>
    <portSpacing port="sink_model" spacing="0"/>
    <portSpacing port="sink_through 1" spacing="0"/>
    </process>
    <process expanded="true">
    <operator activated="true" class="apply_model" compatibility="8.1.001" expanded="true" height="82" name="Apply Model" width="90" x="45" y="34">
    <list key="application_parameters"/>
    </operator>
    <operator activated="true" class="performance" compatibility="8.1.001" expanded="true" height="82" name="Performance" width="90" x="179" y="34"/>
    <connect from_port="model" to_op="Apply Model" to_port="model"/>
    <connect from_port="test set" to_op="Apply Model" to_port="unlabelled data"/>
    <connect from_op="Apply Model" from_port="labelled data" to_op="Performance" to_port="labelled data"/>
    <connect from_op="Performance" from_port="performance" to_port="performance 1"/>
    <connect from_op="Performance" from_port="example set" to_port="test set results"/>
    <portSpacing port="source_model" spacing="0"/>
    <portSpacing port="source_test set" spacing="0"/>
    <portSpacing port="source_through 1" spacing="0"/>
    <portSpacing port="sink_test set results" spacing="0"/>
    <portSpacing port="sink_performance 1" spacing="0"/>
    <portSpacing port="sink_performance 2" spacing="0"/>
    <description align="left" color="blue" colored="true" height="103" resized="true" width="315" x="38" y="137">The model created in the Training step is applied to the current test set (10 %).&lt;br/&gt;The performance is evaluated and sent to the operator results.</description>
    </process>
    <description align="center" color="transparent" colored="false" width="126">A cross-validation evaluating a stacking ensemble model.</description>
    </operator>
    <connect from_op="Generate Data" from_port="output" to_op="Validation" to_port="example set"/>
    <connect from_op="Validation" from_port="model" to_port="result 1"/>
    <connect from_op="Validation" from_port="performance 1" to_port="result 2"/>
    <portSpacing port="source_input 1" spacing="0"/>
    <portSpacing port="sink_result 1" spacing="0"/>
    <portSpacing port="sink_result 2" spacing="0"/>
    <portSpacing port="sink_result 3" spacing="0"/>
    </process>
    </operator>
    </process>

     

  • kypexin Moderator, RapidMiner Certified Analyst, Member Posts: 291 Unicorn

    Hi @twilight_baya

     

    I have modified the process for you and built an all-in-one process with the following logic:

     

    • The process reads your .csv file from disk; you have to specify the path in the 'Read CSV' operator.
    • The nominal attribute (Gender) is converted to numerical so that SVM and Neural Net, which can only handle numerical attributes, can be used.
    • Four Cross Validation operators then train and evaluate the models (SVM, NN, DT, and the stacking ensemble), each on an identical copy of the data.
    • Each performance result is converted into data and, after some transformation, output as a table in which you can compare every performance metric across the classification algorithms.
    • In the Results tab you can also examine the ROC curve of each algorithm separately.
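    The 'transform data' subprocesses in the XML below (Select Attributes on Criterion/Value, Rename, Transpose, Rename by Example Values) followed by Append essentially pivot each model's performance list into one row of a comparison table. A rough plain-Python analogue, using made-up placeholder numbers rather than results from this thread:

```python
# Pivot per-model (criterion, value) pairs into one comparison table.
# Rough analogue of Performance to Data -> transform data -> Append; not RapidMiner code.

def performance_table(perf_by_model):
    """perf_by_model maps a model name to its (criterion, value) pairs.
    Returns a header row plus one row per model, one column per criterion."""
    criteria = []
    for pairs in perf_by_model.values():
        for criterion, _ in pairs:
            if criterion not in criteria:
                criteria.append(criterion)
    header = ["Model"] + criteria
    body = []
    for model, pairs in perf_by_model.items():
        values = dict(pairs)
        # A criterion missing for a model is left empty (None).
        body.append([model] + [values.get(c) for c in criteria])
    return header, body


# Placeholder values for illustration only.
perf = {
    "SVM": [("accuracy", 0.70), ("AUC", 0.60)],
    "NeuralNet": [("accuracy", 0.75), ("AUC", 0.72)],
    "Ensemble": [("accuracy", 0.78), ("AUC", 0.75)],
}
header, body = performance_table(perf)
for row in [header] + body:
    print("\t".join(str(cell) for cell in row))
```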

     

    Screenshot 2018-04-16 17.03.29.png

    Screenshot 2018-04-16 17.04.59.png

    <?xml version="1.0" encoding="UTF-8"?><process version="8.1.001">
    <context>
    <input/>
    <output/>
    <macros/>
    </context>
    <operator activated="true" class="process" compatibility="8.1.001" expanded="true" name="Process">
    <process expanded="true">
    <operator activated="true" class="read_csv" compatibility="8.1.001" expanded="true" height="68" name="Read CSV" width="90" x="45" y="34">
    <parameter key="csv_file" value="/Users/kypexin/Downloads/MYDATA.csv"/>
    <parameter key="column_separators" value=","/>
    <parameter key="first_row_as_names" value="false"/>
    <list key="annotations">
    <parameter key="0" value="Name"/>
    </list>
    <parameter key="encoding" value="UTF-8"/>
    <list key="data_set_meta_data_information">
    <parameter key="0" value="YGPA.true.binominal.label"/>
    <parameter key="1" value="Gender.true.polynominal.attribute"/>
    <parameter key="2" value="X1.true.integer.attribute"/>
    <parameter key="3" value="X2.true.real.attribute"/>
    <parameter key="4" value="X3.true.real.attribute"/>
    <parameter key="5" value="X4.true.real.attribute"/>
    <parameter key="6" value="X5.true.real.attribute"/>
    <parameter key="7" value="X6.true.real.attribute"/>
    <parameter key="8" value="X7.true.real.attribute"/>
    <parameter key="9" value="X8.true.real.attribute"/>
    <parameter key="10" value="X9.true.real.attribute"/>
    <parameter key="11" value="X10.true.real.attribute"/>
    <parameter key="12" value="X11.true.real.attribute"/>
    <parameter key="13" value="X12.true.real.attribute"/>
    <parameter key="14" value="X13.true.real.attribute"/>
    <parameter key="15" value="X14.true.real.attribute"/>
    <parameter key="16" value="X15.true.real.attribute"/>
    </list>
    </operator>
    <operator activated="true" class="nominal_to_numerical" compatibility="8.1.001" expanded="true" height="103" name="Nominal to Numerical" width="90" x="45" y="136">
    <parameter key="attribute_filter_type" value="single"/>
    <parameter key="attribute" value="Gender"/>
    <list key="comparison_groups"/>
    </operator>
    <operator activated="true" class="multiply" compatibility="8.1.001" expanded="true" height="145" name="Multiply" width="90" x="179" y="136"/>
    <operator activated="true" class="concurrency:cross_validation" compatibility="8.1.001" expanded="true" height="145" name="x-Val SVM" width="90" x="313" y="34">
    <parameter key="sampling_type" value="stratified sampling"/>
    <process expanded="true">
    <operator activated="true" class="support_vector_machine" compatibility="8.1.001" expanded="true" height="124" name="SVM (2)" width="90" x="112" y="34"/>
    <connect from_port="training set" to_op="SVM (2)" to_port="training set"/>
    <connect from_op="SVM (2)" from_port="model" to_port="model"/>
    <portSpacing port="source_training set" spacing="0"/>
    <portSpacing port="sink_model" spacing="0"/>
    <portSpacing port="sink_through 1" spacing="0"/>
    </process>
    <process expanded="true">
    <operator activated="true" class="apply_model" compatibility="8.1.001" expanded="true" height="82" name="Apply Model" width="90" x="45" y="34">
    <list key="application_parameters"/>
    </operator>
    <operator activated="true" class="performance" compatibility="8.1.001" expanded="true" height="82" name="performance SVM" width="90" x="179" y="34"/>
    <connect from_port="model" to_op="Apply Model" to_port="model"/>
    <connect from_port="test set" to_op="Apply Model" to_port="unlabelled data"/>
    <connect from_op="Apply Model" from_port="labelled data" to_op="performance SVM" to_port="labelled data"/>
    <connect from_op="performance SVM" from_port="performance" to_port="performance 1"/>
    <connect from_op="performance SVM" from_port="example set" to_port="test set results"/>
    <portSpacing port="source_model" spacing="0"/>
    <portSpacing port="source_test set" spacing="0"/>
    <portSpacing port="source_through 1" spacing="0"/>
    <portSpacing port="sink_test set results" spacing="0"/>
    <portSpacing port="sink_performance 1" spacing="0"/>
    <portSpacing port="sink_performance 2" spacing="0"/>
    <description align="left" color="blue" colored="true" height="103" resized="true" width="315" x="38" y="137">The model created in the Training step is applied to the current test set (10 %).&lt;br/&gt;The performance is evaluated and sent to the operator results.</description>
    </process>
    </operator>
    <operator activated="true" class="performance_to_data" compatibility="8.1.001" expanded="true" height="82" name="Performance to Data" width="90" x="447" y="85"/>
    <operator activated="true" class="subprocess" compatibility="8.1.001" expanded="true" height="82" name="transform data" width="90" x="581" y="34">
    <process expanded="true">
    <operator activated="true" class="select_attributes" compatibility="8.1.001" expanded="true" height="82" name="Select Attributes" width="90" x="45" y="34">
    <parameter key="attribute_filter_type" value="subset"/>
    <parameter key="attributes" value="Value|Criterion"/>
    </operator>
    <operator activated="true" class="rename" compatibility="8.1.001" expanded="true" height="82" name="Rename" width="90" x="179" y="34">
    <parameter key="old_name" value="Value"/>
    <parameter key="new_name" value="SVM"/>
    <list key="rename_additional_attributes"/>
    </operator>
    <operator activated="true" class="transpose" compatibility="8.1.001" expanded="true" height="82" name="Transpose" width="90" x="313" y="34"/>
    <operator activated="true" class="rename_by_example_values" compatibility="8.1.001" expanded="true" height="82" name="Rename by Example Values" width="90" x="447" y="34"/>
    <connect from_port="in 1" to_op="Select Attributes" to_port="example set input"/>
    <connect from_op="Select Attributes" from_port="example set output" to_op="Rename" to_port="example set input"/>
    <connect from_op="Rename" from_port="example set output" to_op="Transpose" to_port="example set input"/>
    <connect from_op="Transpose" from_port="example set output" to_op="Rename by Example Values" to_port="example set input"/>
    <connect from_op="Rename by Example Values" from_port="example set output" to_port="out 1"/>
    <portSpacing port="source_in 1" spacing="0"/>
    <portSpacing port="source_in 2" spacing="0"/>
    <portSpacing port="sink_out 1" spacing="0"/>
    <portSpacing port="sink_out 2" spacing="0"/>
    </process>
    </operator>
    <operator activated="true" class="concurrency:cross_validation" compatibility="8.1.001" expanded="true" height="145" name="x-val NN" width="90" x="313" y="187">
    <parameter key="sampling_type" value="stratified sampling"/>
    <process expanded="true">
    <operator activated="true" class="neural_net" compatibility="8.1.001" expanded="true" height="82" name="Neural Net (4)" width="90" x="179" y="34">
    <list key="hidden_layers"/>
    </operator>
    <connect from_port="training set" to_op="Neural Net (4)" to_port="training set"/>
    <connect from_op="Neural Net (4)" from_port="model" to_port="model"/>
    <portSpacing port="source_training set" spacing="0"/>
    <portSpacing port="sink_model" spacing="0"/>
    <portSpacing port="sink_through 1" spacing="0"/>
    </process>
    <process expanded="true">
    <operator activated="true" class="apply_model" compatibility="8.1.001" expanded="true" height="82" name="Apply Model (2)" width="90" x="45" y="34">
    <list key="application_parameters"/>
    </operator>
    <operator activated="true" class="performance" compatibility="8.1.001" expanded="true" height="82" name="performance NN" width="90" x="179" y="34"/>
    <connect from_port="model" to_op="Apply Model (2)" to_port="model"/>
    <connect from_port="test set" to_op="Apply Model (2)" to_port="unlabelled data"/>
    <connect from_op="Apply Model (2)" from_port="labelled data" to_op="performance NN" to_port="labelled data"/>
    <connect from_op="performance NN" from_port="performance" to_port="performance 1"/>
    <connect from_op="performance NN" from_port="example set" to_port="test set results"/>
    <portSpacing port="source_model" spacing="0"/>
    <portSpacing port="source_test set" spacing="0"/>
    <portSpacing port="source_through 1" spacing="0"/>
    <portSpacing port="sink_test set results" spacing="0"/>
    <portSpacing port="sink_performance 1" spacing="0"/>
    <portSpacing port="sink_performance 2" spacing="0"/>
    <description align="left" color="blue" colored="true" height="103" resized="false" width="315" x="38" y="137">The model created in the Training step is applied to the current test set (10 %).&lt;br/&gt;The performance is evaluated and sent to the operator results.</description>
    </process>
    </operator>
    <operator activated="true" class="performance_to_data" compatibility="8.1.001" expanded="true" height="82" name="Performance to Data (2)" width="90" x="447" y="187"/>
    <operator activated="true" class="concurrency:cross_validation" compatibility="8.1.001" expanded="true" height="145" name="x-val DT" width="90" x="313" y="340">
    <parameter key="sampling_type" value="stratified sampling"/>
    <process expanded="true">
    <operator activated="true" class="concurrency:parallel_decision_tree" compatibility="8.1.001" expanded="true" height="103" name="Decision Tree (6)" width="90" x="112" y="34"/>
    <connect from_port="training set" to_op="Decision Tree (6)" to_port="training set"/>
    <connect from_op="Decision Tree (6)" from_port="model" to_port="model"/>
    <portSpacing port="source_training set" spacing="0"/>
    <portSpacing port="sink_model" spacing="0"/>
    <portSpacing port="sink_through 1" spacing="0"/>
    </process>
    <process expanded="true">
    <operator activated="true" class="apply_model" compatibility="8.1.001" expanded="true" height="82" name="Apply Model (3)" width="90" x="45" y="34">
    <list key="application_parameters"/>
    </operator>
    <operator activated="true" class="performance" compatibility="8.1.001" expanded="true" height="82" name="performance DT" width="90" x="179" y="34"/>
    <connect from_port="model" to_op="Apply Model (3)" to_port="model"/>
    <connect from_port="test set" to_op="Apply Model (3)" to_port="unlabelled data"/>
    <connect from_op="Apply Model (3)" from_port="labelled data" to_op="performance DT" to_port="labelled data"/>
    <connect from_op="performance DT" from_port="performance" to_port="performance 1"/>
    <connect from_op="performance DT" from_port="example set" to_port="test set results"/>
    <portSpacing port="source_model" spacing="0"/>
    <portSpacing port="source_test set" spacing="0"/>
    <portSpacing port="source_through 1" spacing="0"/>
    <portSpacing port="sink_test set results" spacing="0"/>
    <portSpacing port="sink_performance 1" spacing="0"/>
    <portSpacing port="sink_performance 2" spacing="0"/>
    <description align="left" color="blue" colored="true" height="103" resized="false" width="315" x="38" y="137">The model created in the Training step is applied to the current test set (10 %).&lt;br/&gt;The performance is evaluated and sent to the operator results.</description>
    </process>
    </operator>
    <operator activated="true" class="performance_to_data" compatibility="8.1.001" expanded="true" height="82" name="Performance to Data (3)" width="90" x="447" y="340"/>
    <operator activated="true" class="concurrency:cross_validation" compatibility="8.1.001" expanded="true" height="145" name="x-val stacking" width="90" x="313" y="493">
    <parameter key="sampling_type" value="stratified sampling"/>
    <process expanded="true">
    <operator activated="true" class="stacking" compatibility="8.1.001" expanded="true" height="68" name="Stacking (4)" width="90" x="246" y="34">
    <process expanded="true">
    <operator activated="true" class="support_vector_machine" compatibility="8.1.001" expanded="true" height="124" name="SVM (7)" width="90" x="112" y="34"/>
    <operator activated="true" class="neural_net" compatibility="8.1.001" expanded="true" height="82" name="Neural Net (7)" width="90" x="112" y="187">
    <list key="hidden_layers"/>
    </operator>
    <operator activated="true" class="concurrency:parallel_decision_tree" compatibility="8.1.001" expanded="true" height="103" name="Decision Tree (7)" width="90" x="112" y="289"/>
    <connect from_port="training set 1" to_op="SVM (7)" to_port="training set"/>
    <connect from_port="training set 2" to_op="Neural Net (7)" to_port="training set"/>
    <connect from_port="training set 3" to_op="Decision Tree (7)" to_port="training set"/>
    <connect from_op="SVM (7)" from_port="model" to_port="base model 1"/>
    <connect from_op="Neural Net (7)" from_port="model" to_port="base model 2"/>
    <connect from_op="Decision Tree (7)" from_port="model" to_port="base model 3"/>
    <portSpacing port="source_training set 1" spacing="0"/>
    <portSpacing port="source_training set 2" spacing="0"/>
    <portSpacing port="source_training set 3" spacing="0"/>
    <portSpacing port="source_training set 4" spacing="0"/>
    <portSpacing port="sink_base model 1" spacing="0"/>
    <portSpacing port="sink_base model 2" spacing="0"/>
    <portSpacing port="sink_base model 3" spacing="0"/>
    <portSpacing port="sink_base model 4" spacing="0"/>
    </process>
    <process expanded="true">
    <operator activated="true" class="naive_bayes" compatibility="8.1.001" expanded="true" height="82" name="Naive Bayes (4)" width="90" x="179" y="34"/>
    <connect from_port="stacking examples" to_op="Naive Bayes (4)" to_port="training set"/>
    <connect from_op="Naive Bayes (4)" from_port="model" to_port="stacking model"/>
    <portSpacing port="source_stacking examples" spacing="0"/>
    <portSpacing port="sink_stacking model" spacing="0"/>
    </process>
    </operator>
    <connect from_port="training set" to_op="Stacking (4)" to_port="training set"/>
    <connect from_op="Stacking (4)" from_port="model" to_port="model"/>
    <portSpacing port="source_training set" spacing="0"/>
    <portSpacing port="sink_model" spacing="0"/>
    <portSpacing port="sink_through 1" spacing="0"/>
    </process>
    <process expanded="true">
    <operator activated="true" class="apply_model" compatibility="8.1.001" expanded="true" height="82" name="Apply Model (4)" width="90" x="45" y="34">
    <list key="application_parameters"/>
    </operator>
    <operator activated="true" class="performance" compatibility="8.1.001" expanded="true" height="82" name="performance stacking" width="90" x="179" y="34"/>
    <connect from_port="model" to_op="Apply Model (4)" to_port="model"/>
    <connect from_port="test set" to_op="Apply Model (4)" to_port="unlabelled data"/>
    <connect from_op="Apply Model (4)" from_port="labelled data" to_op="performance stacking" to_port="labelled data"/>
    <connect from_op="performance stacking" from_port="performance" to_port="performance 1"/>
    <connect from_op="performance stacking" from_port="example set" to_port="test set results"/>
    <portSpacing port="source_model" spacing="0"/>
    <portSpacing port="source_test set" spacing="0"/>
    <portSpacing port="source_through 1" spacing="0"/>
    <portSpacing port="sink_test set results" spacing="0"/>
    <portSpacing port="sink_performance 1" spacing="0"/>
    <portSpacing port="sink_performance 2" spacing="0"/>
    <description align="left" color="blue" colored="true" height="103" resized="false" width="315" x="38" y="137">The model created in the Training step is applied to the current test set (10 %).&lt;br/&gt;The performance is evaluated and sent to the operator results.</description>
    </process>
    </operator>
    <operator activated="true" class="performance_to_data" compatibility="8.1.001" expanded="true" height="82" name="Performance to Data (4)" width="90" x="447" y="493"/>
    <operator activated="true" class="subprocess" compatibility="8.1.001" expanded="true" height="82" name="transform data (2)" width="90" x="581" y="136">
    <process expanded="true">
    <operator activated="true" class="select_attributes" compatibility="8.1.001" expanded="true" height="82" name="Select Attributes (2)" width="90" x="45" y="34">
    <parameter key="attribute_filter_type" value="subset"/>
    <parameter key="attributes" value="Value|Criterion"/>
    </operator>
    <operator activated="true" class="rename" compatibility="8.1.001" expanded="true" height="82" name="Rename (2)" width="90" x="179" y="34">
    <parameter key="old_name" value="Value"/>
    <parameter key="new_name" value="NeuralNet"/>
    <list key="rename_additional_attributes"/>
    </operator>
    <operator activated="true" class="transpose" compatibility="8.1.001" expanded="true" height="82" name="Transpose (2)" width="90" x="313" y="85"/>
    <operator activated="true" class="rename_by_example_values" compatibility="8.1.001" expanded="true" height="82" name="Rename by Example Values (2)" width="90" x="447" y="85"/>
    <connect from_port="in 1" to_op="Select Attributes (2)" to_port="example set input"/>
    <connect from_op="Select Attributes (2)" from_port="example set output" to_op="Rename (2)" to_port="example set input"/>
    <connect from_op="Rename (2)" from_port="example set output" to_op="Transpose (2)" to_port="example set input"/>
    <connect from_op="Transpose (2)" from_port="example set output" to_op="Rename by Example Values (2)" to_port="example set input"/>
    <connect from_op="Rename by Example Values (2)" from_port="example set output" to_port="out 1"/>
    <portSpacing port="source_in 1" spacing="0"/>
    <portSpacing port="source_in 2" spacing="0"/>
    <portSpacing port="sink_out 1" spacing="0"/>
    <portSpacing port="sink_out 2" spacing="0"/>
    </process>
    </operator>
    <operator activated="true" class="subprocess" compatibility="8.1.001" expanded="true" height="82" name="transform data (3)" width="90" x="581" y="289">
    <process expanded="true">
    <operator activated="true" class="select_attributes" compatibility="8.1.001" expanded="true" height="82" name="Select Attributes (3)" width="90" x="45" y="34">
    <parameter key="attribute_filter_type" value="subset"/>
    <parameter key="attributes" value="Value|Criterion"/>
    </operator>
    <operator activated="true" class="rename" compatibility="8.1.001" expanded="true" height="82" name="Rename (3)" width="90" x="179" y="34">
    <parameter key="old_name" value="Value"/>
    <parameter key="new_name" value="DecisionTree"/>
    <list key="rename_additional_attributes"/>
    </operator>
    <operator activated="true" class="transpose" compatibility="8.1.001" expanded="true" height="82" name="Transpose (3)" width="90" x="313" y="85"/>
    <operator activated="true" class="rename_by_example_values" compatibility="8.1.001" expanded="true" height="82" name="Rename by Example Values (3)" width="90" x="447" y="85"/>
    <connect from_port="in 1" to_op="Select Attributes (3)" to_port="example set input"/>
    <connect from_op="Select Attributes (3)" from_port="example set output" to_op="Rename (3)" to_port="example set input"/>
    <connect from_op="Rename (3)" from_port="example set output" to_op="Transpose (3)" to_port="example set input"/>
    <connect from_op="Transpose (3)" from_port="example set output" to_op="Rename by Example Values (3)" to_port="example set input"/>
    <connect from_op="Rename by Example Values (3)" from_port="example set output" to_port="out 1"/>
    <portSpacing port="source_in 1" spacing="0"/>
    <portSpacing port="source_in 2" spacing="0"/>
    <portSpacing port="sink_out 1" spacing="0"/>
    <portSpacing port="sink_out 2" spacing="0"/>
    </process>
    </operator>
    <operator activated="true" class="subprocess" compatibility="8.1.001" expanded="true" height="82" name="transform data (4)" width="90" x="581" y="442">
    <process expanded="true">
    <operator activated="true" class="select_attributes" compatibility="8.1.001" expanded="true" height="82" name="Select Attributes (4)" width="90" x="45" y="34">
    <parameter key="attribute_filter_type" value="subset"/>
    <parameter key="attributes" value="Value|Criterion"/>
    </operator>
    <operator activated="true" class="rename" compatibility="8.1.001" expanded="true" height="82" name="Rename (4)" width="90" x="179" y="34">
    <parameter key="old_name" value="Value"/>
    <parameter key="new_name" value="Ensemble"/>
    <list key="rename_additional_attributes"/>
    </operator>
    <operator activated="true" class="transpose" compatibility="8.1.001" expanded="true" height="82" name="Transpose (4)" width="90" x="313" y="85"/>
    <operator activated="true" class="rename_by_example_values" compatibility="8.1.001" expanded="true" height="82" name="Rename by Example Values (4)" width="90" x="447" y="85"/>
    <connect from_port="in 1" to_op="Select Attributes (4)" to_port="example set input"/>
    <connect from_op="Select Attributes (4)" from_port="example set output" to_op="Rename (4)" to_port="example set input"/>
    <connect from_op="Rename (4)" from_port="example set output" to_op="Transpose (4)" to_port="example set input"/>
    <connect from_op="Transpose (4)" from_port="example set output" to_op="Rename by Example Values (4)" to_port="example set input"/>
    <connect from_op="Rename by Example Values (4)" from_port="example set output" to_port="out 1"/>
    <portSpacing port="source_in 1" spacing="0"/>
    <portSpacing port="source_in 2" spacing="0"/>
    <portSpacing port="sink_out 1" spacing="0"/>
    <portSpacing port="sink_out 2" spacing="0"/>
    </process>
    </operator>
    <operator activated="true" class="append" compatibility="8.1.001" expanded="true" height="145" name="Append" width="90" x="782" y="391"/>
    <operator activated="true" class="select_attributes" compatibility="8.1.001" expanded="true" height="82" name="Select Attributes (5)" width="90" x="916" y="442">
    <parameter key="attribute_filter_type" value="subset"/>
    </operator>
    <connect from_op="Read CSV" from_port="output" to_op="Nominal to Numerical" to_port="example set input"/>
    <connect from_op="Nominal to Numerical" from_port="example set output" to_op="Multiply" to_port="input"/>
    <connect from_op="Multiply" from_port="output 1" to_op="x-Val SVM" to_port="example set"/>
    <connect from_op="Multiply" from_port="output 2" to_op="x-val NN" to_port="example set"/>
    <connect from_op="Multiply" from_port="output 3" to_op="x-val DT" to_port="example set"/>
    <connect from_op="Multiply" from_port="output 4" to_op="x-val stacking" to_port="example set"/>
    <connect from_op="x-Val SVM" from_port="performance 1" to_op="Performance to Data" to_port="performance vector"/>
    <connect from_op="Performance to Data" from_port="example set" to_op="transform data" to_port="in 1"/>
    <connect from_op="Performance to Data" from_port="performance vector" to_port="result 1"/>
    <connect from_op="transform data" from_port="out 1" to_op="Append" to_port="example set 1"/>
    <connect from_op="x-val NN" from_port="performance 1" to_op="Performance to Data (2)" to_port="performance vector"/>
    <connect from_op="Performance to Data (2)" from_port="example set" to_op="transform data (2)" to_port="in 1"/>
    <connect from_op="Performance to Data (2)" from_port="performance vector" to_port="result 2"/>
    <connect from_op="x-val DT" from_port="performance 1" to_op="Performance to Data (3)" to_port="performance vector"/>
    <connect from_op="Performance to Data (3)" from_port="example set" to_op="transform data (3)" to_port="in 1"/>
    <connect from_op="Performance to Data (3)" from_port="performance vector" to_port="result 3"/>
    <connect from_op="x-val stacking" from_port="performance 1" to_op="Performance to Data (4)" to_port="performance vector"/>
    <connect from_op="Performance to Data (4)" from_port="example set" to_op="transform data (4)" to_port="in 1"/>
    <connect from_op="Performance to Data (4)" from_port="performance vector" to_port="result 4"/>
    <connect from_op="transform data (2)" from_port="out 1" to_op="Append" to_port="example set 2"/>
    <connect from_op="transform data (3)" from_port="out 1" to_op="Append" to_port="example set 3"/>
    <connect from_op="transform data (4)" from_port="out 1" to_op="Append" to_port="example set 4"/>
    <connect from_op="Append" from_port="merged set" to_op="Select Attributes (5)" to_port="example set input"/>
    <connect from_op="Select Attributes (5)" from_port="example set output" to_port="result 5"/>
    <portSpacing port="source_input 1" spacing="0"/>
    <portSpacing port="sink_result 1" spacing="0"/>
    <portSpacing port="sink_result 2" spacing="147"/>
    <portSpacing port="sink_result 3" spacing="147"/>
    <portSpacing port="sink_result 4" spacing="147"/>
    <portSpacing port="sink_result 5" spacing="0"/>
    <portSpacing port="sink_result 6" spacing="0"/>
    </process>
    </operator>
    </process>

     

  • Thomas_Ott RapidMiner Certified Analyst, RapidMiner Certified Expert, Member Posts: 1,761 Unicorn

    So I took a look at @twilight_baya's homework and find it wrong in the sense that I would not analyze it that way. @kypexin's approach is the correct one, IMHO: he uses cross-validation for each algorithm and then builds an overall 'performance' result.
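    To make "CV for each algorithm" concrete: the Cross Validation operators in these processes use stratified sampling, which keeps the pass/fail ratio roughly equal in every fold. The stdlib-only sketch below (an illustration, not RapidMiner's implementation) builds stratified folds and averages a per-fold score. The `majority_baseline` "model" is hypothetical and shows how a bare accuracy number can look decent on an imbalanced label without any learning.

```python
import random
from collections import Counter, defaultdict


def stratified_folds(labels, k=10, seed=42):
    """Return k lists of example indices. Examples of each class are shuffled
    and dealt round-robin, so every fold keeps roughly the overall class ratio."""
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    rng = random.Random(seed)
    folds = [[] for _ in range(k)]
    position = 0
    for label in sorted(by_class):
        indices = by_class[label]
        rng.shuffle(indices)
        for idx in indices:
            folds[position % k].append(idx)
            position += 1
    return folds


def cross_validate(train_and_score, labels, k=10):
    """Train on k-1 folds, score on the held-out fold, return the mean score."""
    folds = stratified_folds(labels, k)
    scores = []
    for i, test_idx in enumerate(folds):
        train_idx = [j for f, fold in enumerate(folds) if f != i for j in fold]
        scores.append(train_and_score(train_idx, test_idx))
    return sum(scores) / len(scores)


# Made-up labels: 20 "fail" (0) and 80 "pass" (1).
labels = [0] * 20 + [1] * 80


def majority_baseline(train_idx, test_idx):
    """Hypothetical model that always predicts the most common training label."""
    majority = Counter(labels[j] for j in train_idx).most_common(1)[0][0]
    return sum(labels[j] == majority for j in test_idx) / len(test_idx)


print(cross_validate(majority_baseline, labels, k=10))  # about 0.8 without learning anything
```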

     

    The process I shared with you privately is just a simple ensemble application inside a cross-validation. You just used it and didn't even ask what the Cross Validation is about. And why did you use Vote? Voting and Stacking combine the base models' predictions in different ways.
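    One way to see that difference: Vote gives every base model one equal vote, while Stacking trains a second-level model on the base models' predictions, so it can learn, for example, that one model is more trustworthy than the others. In the stdlib-only sketch below the "stacker" is a deliberately simple lookup-table learner (hypothetical; the processes in this thread use Naive Bayes as the stacking learner), and the base predictions are made-up values.

```python
from collections import Counter, defaultdict


def majority_vote(predictions):
    """Vote: each base model's prediction counts once; the most common label wins."""
    return Counter(predictions).most_common(1)[0][0]


def fit_stacker(base_predictions, labels):
    """Simplified stacking learner: for each combination of base-model
    predictions seen in training, remember the most frequent true label."""
    table = defaultdict(Counter)
    for preds, label in zip(base_predictions, labels):
        table[tuple(preds)][label] += 1
    fallback = Counter(labels).most_common(1)[0][0]

    def predict(preds):
        key = tuple(preds)
        return table[key].most_common(1)[0][0] if key in table else fallback

    return predict


# Made-up (SVM, NN, DT) predictions on training examples: here the DT matches
# the true label every time, while SVM and NN often agree on the wrong answer.
train_preds = [(1, 1, 0), (1, 1, 0), (0, 0, 1), (1, 0, 1), (0, 1, 0), (1, 1, 1)]
train_labels = [0, 0, 1, 1, 0, 1]
stacker = fit_stacker(train_preds, train_labels)

new_example = (1, 1, 0)
print("vote:", majority_vote(new_example))  # SVM and NN outvote DT -> 1
print("stacking:", stacker(new_example))    # learned that this pattern means 0
```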

     

    My first reaction to the ANN model is that it's overfitting; the DT *might be overfitting* because the recall for Atrisk is piss-poor; and the SVM is god-awful. The worst mistake a new machine learning practitioner can make, IMHO, is to rely solely on accuracy. At a minimum, I always evaluate the Area Under the Curve (AUC, for two-class problems), accuracy, and precision/recall.
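    The point about not relying on accuracy alone is easy to demonstrate. The stdlib-only sketch below computes accuracy, precision/recall for one class, and AUC via its rank interpretation (the chance that a random positive example is scored above a random negative one, ties counting half). The data is made up: a "model" that always predicts pass on a set where 2 of 10 students fail.

```python
def accuracy(y_true, y_pred):
    """Share of examples whose predicted label equals the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def precision_recall(y_true, y_pred, positive):
    """Precision and recall for the class `positive` (0.0 when undefined)."""
    tp = sum(t == positive and p == positive for t, p in zip(y_true, y_pred))
    fp = sum(t != positive and p == positive for t, p in zip(y_true, y_pred))
    fn = sum(t == positive and p != positive for t, p in zip(y_true, y_pred))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


def auc(y_true, scores, positive):
    """AUC as P(score of a random positive > score of a random negative),
    counting ties as half. `scores` are confidences for the positive class."""
    pos = [s for s, t in zip(scores, y_true) if t == positive]
    neg = [s for s, t in zip(scores, y_true) if t != positive]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Made-up example: 8 students pass, 2 fail; the model always says "pass".
y_true = ["pass"] * 8 + ["fail"] * 2
y_pred = ["pass"] * 10
fail_scores = [0.5] * 10  # same confidence for "fail" on every student

print("accuracy:", accuracy(y_true, y_pred))                                  # 0.8
print("precision/recall (fail):", precision_recall(y_true, y_pred, "fail"))  # (0.0, 0.0)
print("AUC:", auc(y_true, fail_scores, "fail"))                              # 0.5, chance level
```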

     

    My two cents. 

  • twilight_baya Member Posts: 10 Contributor I

    Thank you very much, @kypexin, for your help. I will try out your process and let you know.

     

     

  • twilight_baya Member Posts: 10 Contributor I

    Thank you @Thomas_Ott

     

    Well, I needed so much help with the process that I was embarrassed to ask you again :-)

    I know what cross-validation is, but I couldn't apply the process to my data. I wanted to use stacking because it has been shown to perform best among ensemble techniques.

    Thank you for all the other comments. I am taking them into consideration.

    Best regards.

     

  • kypexin Moderator, RapidMiner Certified Analyst, Member Posts: 291 Unicorn

    Hi @twilight_baya

     

    You are welcome. I don't know exactly what issue you ran into with your dataset, but my guess is that it was most likely with the NN and SVM algorithms: they cannot handle polynominal attributes (Gender in your case), so applying them directly to the dataset gives an error. This is why I used the 'Nominal to Numerical' transformation on the 'Gender' attribute.
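    For illustration, this is roughly what dummy coding does to Gender: one 0/1 column per distinct value, with the original text column dropped. The plain-Python sketch assumes 'Nominal to Numerical' runs with its dummy-coding option and that the new columns are named like "Gender = male"; both are assumptions about the operator, not checked against its documentation.

```python
def dummy_code(rows, attribute):
    """Replace a nominal attribute with one 0/1 column per distinct value
    (column names like "Gender = male" are an assumed naming scheme)."""
    values = sorted({row[attribute] for row in rows})
    coded = []
    for row in rows:
        # Copy every other attribute unchanged, then add the indicator columns.
        new_row = {k: v for k, v in row.items() if k != attribute}
        for value in values:
            new_row[f"{attribute} = {value}"] = 1 if row[attribute] == value else 0
        coded.append(new_row)
    return coded


rows = [{"Gender": "male", "X1": 70}, {"Gender": "female", "X1": 55}]
for row in dummy_code(rows, "Gender"):
    print(row)
```

    The resulting 0/1 columns are plain numbers, which is what SVM and Neural Net need.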
