How to calculate the performance (MAE, RMSE, NMAE) of Decision tree

shannoncates_te Member Posts: 4 Contributor I
edited December 2018 in Help

Hi guys :) I'm a newbie here and I really need your help with my thesis. Our team is using a hybrid algorithm (Decision Tree + item k-NN) to build a recommender system.

This is our process. It works, but the main problem is that the result only shows the ranking. We want to know the performance results (MAE, RMSE, and NMAE) of the recommender system. Hope you can help me! Thanks :) [Attachments: ranking result.png, question 1.png]

 

Answers

  • lionelderkrikor Moderator, RapidMiner Certified Analyst, Member Posts: 1,195 Unicorn

    Hi @shannoncates_te

     

    I will try to give you some elements of a response:

    1. First, to be sure: the scoring metrics MAE, RMSE and NMAE measure the performance of a regression model (where the predicted value is continuous).

    In your case, it seems that you are working on a classification task (recommender system) and using classification algorithms (k-NN / decision tree). To measure the performance of your models, you then need to calculate the accuracy (ratio of correct predictions to total predictions), the recall, the precision and the other metrics offered by RapidMiner.

    If you are working on a regression task, disregard this paragraph 1.
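    For reference, here is a minimal sketch in plain Python of how the three regression metrics are usually computed outside RapidMiner. The sample ratings are made up for illustration, and NMAE is assumed here to be the MAE divided by the rating range (a common convention for recommender systems); RapidMiner's own "normalized absolute error" may be defined differently.

```python
import math

def mae(actual, predicted):
    """Mean absolute error: average of |actual - predicted|."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)

def rmse(actual, predicted):
    """Root mean squared error: square root of the average squared error."""
    squared = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    return math.sqrt(squared / len(actual))

def nmae(actual, predicted, r_min, r_max):
    """Assumed definition: MAE normalized by the rating range (r_max - r_min)."""
    return mae(actual, predicted) / (r_max - r_min)

# Hypothetical ratings on a 1-5 scale (illustration only, not real data)
actual = [4.0, 3.0, 5.0, 1.0]
predicted = [3.5, 3.0, 4.0, 2.0]

print("MAE :", mae(actual, predicted))
print("RMSE:", rmse(actual, predicted))
print("NMAE:", nmae(actual, predicted, r_min=1.0, r_max=5.0))
```

    All three functions need a numeric prediction for each example, which is why they only make sense when the model outputs a continuous value such as a predicted rating.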

     

    2. To measure the performance of your model, you can perform a cross-validation using the Cross Validation operator combined with the Performance (Regression) operator. In the parameters of the latter, check the metrics you want to calculate.
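    To illustrate the idea behind cross-validation, here is a rough sketch in plain Python. It is not what RapidMiner runs internally: the function names are hypothetical, the folds are simple contiguous slices, and the "model" is a placeholder that always predicts the mean of the training labels. In a real process the decision tree would take its place.

```python
def k_fold_indices(n, k):
    """Split indices 0..n-1 into k contiguous folds of near-equal size."""
    sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    folds, start = [], 0
    for size in sizes:
        folds.append(list(range(start, start + size)))
        start += size
    return folds

def cross_validate_mae(labels, k):
    """Average the per-fold MAE of a mean-of-training-labels predictor."""
    fold_maes = []
    for test_idx in k_fold_indices(len(labels), k):
        held_out = set(test_idx)
        # Train on every example that is not in the current test fold
        train = [y for i, y in enumerate(labels) if i not in held_out]
        prediction = sum(train) / len(train)  # placeholder "model"
        errors = [abs(labels[i] - prediction) for i in test_idx]
        fold_maes.append(sum(errors) / len(errors))
    return sum(fold_maes) / len(fold_maes)

# Hypothetical ratings (illustration only, not real data)
ratings = [4.0, 3.0, 5.0, 1.0, 2.0, 4.0]
cv_mae = cross_validate_mae(ratings, k=3)
print("Cross-validated MAE:", cv_mae)
```

    Each example is used for testing exactly once, so the averaged error is a fairer estimate than an error measured on the training data itself.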

    [Screenshot: Regression_lineaire.png]

    Here is a simple process with a decision tree model inside a Cross Validation operator (to be adapted to your own models/process):

    <?xml version="1.0" encoding="UTF-8"?><process version="8.0.001">
    <context>
    <input/>
    <output/>
    <macros/>
    </context>
    <operator activated="true" class="process" compatibility="8.0.001" expanded="true" name="Process">
    <process expanded="true">
    <operator activated="true" class="retrieve" compatibility="8.0.001" expanded="true" height="68" name="Retrieve Golf" width="90" x="45" y="34">
    <parameter key="repository_entry" value="//Samples/data/Golf"/>
    </operator>
    <operator activated="true" class="concurrency:cross_validation" compatibility="8.0.001" expanded="true" height="145" name="Cross Validation" width="90" x="313" y="34">
    <process expanded="true">
    <operator activated="true" class="concurrency:parallel_decision_tree" compatibility="8.0.001" expanded="true" height="103" name="Decision Tree" width="90" x="179" y="34"/>
    <connect from_port="training set" to_op="Decision Tree" to_port="training set"/>
    <connect from_op="Decision Tree" from_port="model" to_port="model"/>
    <portSpacing port="source_training set" spacing="0"/>
    <portSpacing port="sink_model" spacing="0"/>
    <portSpacing port="sink_through 1" spacing="0"/>
    </process>
    <process expanded="true">
    <operator activated="true" class="apply_model" compatibility="8.0.001" expanded="true" height="82" name="Apply Model" width="90" x="45" y="34">
    <list key="application_parameters"/>
    </operator>
    <operator activated="true" class="performance_classification" compatibility="8.0.001" expanded="true" height="82" name="Performance" width="90" x="246" y="34">
    <list key="class_weights"/>
    </operator>
    <connect from_port="model" to_op="Apply Model" to_port="model"/>
    <connect from_port="test set" to_op="Apply Model" to_port="unlabelled data"/>
    <connect from_op="Apply Model" from_port="labelled data" to_op="Performance" to_port="labelled data"/>
    <connect from_op="Performance" from_port="performance" to_port="performance 1"/>
    <portSpacing port="source_model" spacing="0"/>
    <portSpacing port="source_test set" spacing="0"/>
    <portSpacing port="source_through 1" spacing="0"/>
    <portSpacing port="sink_test set results" spacing="0"/>
    <portSpacing port="sink_performance 1" spacing="0"/>
    <portSpacing port="sink_performance 2" spacing="0"/>
    </process>
    </operator>
    <operator activated="true" class="retrieve" compatibility="8.0.001" expanded="true" height="68" name="Retrieve Golf-Testset" width="90" x="45" y="238">
    <parameter key="repository_entry" value="//Samples/data/Golf-Testset"/>
    </operator>
    <operator activated="true" class="apply_model" compatibility="8.0.001" expanded="true" height="82" name="Apply Model (2)" width="90" x="447" y="238">
    <list key="application_parameters"/>
    </operator>
    <connect from_op="Retrieve Golf" from_port="output" to_op="Cross Validation" to_port="example set"/>
    <connect from_op="Cross Validation" from_port="model" to_op="Apply Model (2)" to_port="model"/>
    <connect from_op="Cross Validation" from_port="example set" to_port="result 2"/>
    <connect from_op="Cross Validation" from_port="performance 1" to_port="result 1"/>
    <connect from_op="Retrieve Golf-Testset" from_port="output" to_op="Apply Model (2)" to_port="unlabelled data"/>
    <connect from_op="Apply Model (2)" from_port="labelled data" to_port="result 3"/>
    <connect from_op="Apply Model (2)" from_port="model" to_port="result 4"/>
    <portSpacing port="source_input 1" spacing="0"/>
    <portSpacing port="sink_result 1" spacing="0"/>
    <portSpacing port="sink_result 2" spacing="0"/>
    <portSpacing port="sink_result 3" spacing="0"/>
    <portSpacing port="sink_result 4" spacing="0"/>
    <portSpacing port="sink_result 5" spacing="0"/>
    </process>
    </operator>
    </process>

    NB: you have to replace the Performance (Classification) operator with the Performance (Regression) operator in the right-hand side of the Cross Validation subprocess (testing part).

     

    NB2: In your first screenshot, I don't see any "prediction" column (the result of the Apply Model operator).

     

    I hope this will be helpful,

     

    Regards,

     

    Lionel

     
