Comparing ROC - AUC

dsideratos Member Posts: 3 Contributor I
edited June 2020 in Help

Hello,

I am stuck. I have already built a process that produces a ROC curve, as described above. My problem is that I would like to compare ROC curves that come from different machine learning algorithms. In other words, I would like to plot all the ROC curves in one graph.

 

<?xml version="1.0" encoding="UTF-8"?><process version="7.5.001">
<context>
<input/>
<output/>
<macros/>
</context>
<operator activated="true" class="process" compatibility="7.5.001" expanded="true" name="Process">
<process expanded="true">
<operator activated="false" class="performance_classification" compatibility="7.5.001" expanded="true" height="82" name="Performance (3)" width="90" x="849" y="544">
<list key="class_weights"/>
</operator>
<operator activated="true" class="retrieve" compatibility="7.5.001" expanded="true" height="68" name="Retrieve acutal training binominal-real data" width="90" x="45" y="85">
<parameter key="repository_entry" value="data/acutal training binominal-real data"/>
</operator>
<operator activated="true" class="set_role" compatibility="7.5.001" expanded="true" height="82" name="Set Role" width="90" x="179" y="85">
<parameter key="attribute_name" value="Total definition"/>
<parameter key="target_role" value="label"/>
<list key="set_additional_roles"/>
</operator>
<operator activated="false" class="h2o:logistic_regression" compatibility="7.5.000" expanded="true" height="103" name="Logistic Regression (3)" width="90" x="313" y="85"/>
<operator activated="false" class="naive_bayes" compatibility="7.5.001" expanded="true" height="82" name="Naive Bayes (3)" width="90" x="246" y="595"/>
<operator activated="true" class="neural_net" compatibility="7.5.001" expanded="true" height="82" name="Neural Net (3)" width="90" x="380" y="493">
<list key="hidden_layers"/>
</operator>
<operator activated="false" class="concurrency:parallel_random_forest" compatibility="7.5.001" expanded="true" height="82" name="Random Forest (2)" width="90" x="514" y="493">
<parameter key="number_of_trees" value="15"/>
<parameter key="criterion" value="information_gain"/>
<parameter key="minimal_gain" value="0.0"/>
</operator>
<operator activated="false" class="concurrency:parallel_decision_tree" compatibility="7.5.001" expanded="true" height="82" name="Decision Tree (3)" width="90" x="514" y="391">
<parameter key="criterion" value="information_gain"/>
<parameter key="minimal_gain" value="0.0"/>
</operator>
<operator activated="false" class="k_nn" compatibility="7.5.001" expanded="true" height="82" name="k-NN (2)" width="90" x="246" y="391">
<parameter key="k" value="8"/>
<parameter key="measure_types" value="NumericalMeasures"/>
<parameter key="numerical_measure" value="CamberraDistance"/>
<parameter key="kernel_type" value="polynomial"/>
</operator>
<operator activated="false" class="concurrency:cross_validation" compatibility="7.5.001" expanded="true" height="145" name="Cross Validation" width="90" x="715" y="442">
<parameter key="sampling_type" value="stratified sampling"/>
<process expanded="true">
<operator activated="false" class="support_vector_machine_libsvm" compatibility="7.5.001" expanded="true" height="82" name="SVM" width="90" x="179" y="34">
<parameter key="kernel_type" value="linear"/>
<list key="class_weights"/>
<parameter key="calculate_confidences" value="true"/>
</operator>
<operator activated="false" class="h2o:deep_learning" compatibility="7.5.000" expanded="true" height="82" name="Deep Learning" width="90" x="45" y="340">
<enumeration key="hidden_layer_sizes">
<parameter key="hidden_layer_sizes" value="50"/>
<parameter key="hidden_layer_sizes" value="50"/>
</enumeration>
<enumeration key="hidden_dropout_ratios"/>
<list key="expert_parameters"/>
<list key="expert_parameters_"/>
</operator>
<operator activated="true" class="h2o:logistic_regression" compatibility="7.5.000" expanded="true" height="103" name="Logistic Regression" width="90" x="179" y="238">
<parameter key="solver" value="IRLSM"/>
<parameter key="missing_values_handling" value="Skip"/>
</operator>
<operator activated="false" class="concurrency:parallel_decision_tree" compatibility="7.5.001" expanded="true" height="82" name="Decision Tree" width="90" x="313" y="238">
<parameter key="criterion" value="gini_index"/>
<parameter key="minimal_gain" value="0.0"/>
</operator>
<operator activated="false" class="neural_net" compatibility="7.5.001" expanded="true" height="82" name="Neural Net" width="90" x="179" y="340">
<list key="hidden_layers"/>
</operator>
<operator activated="false" class="naive_bayes" compatibility="7.5.001" expanded="true" height="82" name="Naive Bayes" width="90" x="45" y="442"/>
<operator activated="false" class="k_nn" compatibility="7.5.001" expanded="true" height="82" name="k-NN" width="90" x="45" y="238">
<parameter key="k" value="15"/>
<parameter key="measure_types" value="NumericalMeasures"/>
</operator>
<operator activated="false" class="concurrency:parallel_random_forest" compatibility="7.5.001" expanded="true" height="82" name="Random Forest" width="90" x="313" y="340"/>
<operator activated="false" class="random_tree" compatibility="7.5.001" expanded="true" height="82" name="Random Tree" width="90" x="313" y="442"/>
<connect from_port="training set" to_op="Logistic Regression" to_port="training set"/>
<connect from_op="Logistic Regression" from_port="model" to_port="model"/>
<portSpacing port="source_training set" spacing="0"/>
<portSpacing port="sink_model" spacing="0"/>
<portSpacing port="sink_through 1" spacing="0"/>
</process>
<process expanded="true">
<operator activated="true" class="apply_model" compatibility="7.5.001" expanded="true" height="82" name="Apply Model" width="90" x="112" y="34">
<list key="application_parameters"/>
</operator>
<operator activated="true" class="performance_binominal_classification" compatibility="7.5.001" expanded="true" height="82" name="Performance (4)" width="90" x="313" y="34">
<parameter key="main_criterion" value="AUC"/>
<parameter key="AUC" value="true"/>
<parameter key="skip_undefined_labels" value="false"/>
</operator>
<operator activated="false" class="performance_support_vector_count" compatibility="7.5.001" expanded="true" height="82" name="Performance (2)" width="90" x="313" y="187"/>
<connect from_port="model" to_op="Apply Model" to_port="model"/>
<connect from_port="test set" to_op="Apply Model" to_port="unlabelled data"/>
<connect from_op="Apply Model" from_port="labelled data" to_op="Performance (4)" to_port="labelled data"/>
<connect from_op="Performance (4)" from_port="performance" to_port="performance 1"/>
<portSpacing port="source_model" spacing="0"/>
<portSpacing port="source_test set" spacing="0"/>
<portSpacing port="source_through 1" spacing="0"/>
<portSpacing port="sink_test set results" spacing="0"/>
<portSpacing port="sink_performance 1" spacing="0"/>
<portSpacing port="sink_performance 2" spacing="0"/>
</process>
</operator>
<operator activated="true" class="retrieve" compatibility="7.5.001" expanded="true" height="68" name="Retrieve acutal testing binominal real data" width="90" x="45" y="238">
<parameter key="repository_entry" value="data/acutal testing binominal real data"/>
</operator>
<operator activated="true" class="set_role" compatibility="7.5.001" expanded="true" height="82" name="Set Role (2)" width="90" x="179" y="238">
<parameter key="attribute_name" value="Total definition"/>
<parameter key="target_role" value="label"/>
<list key="set_additional_roles"/>
</operator>
<operator activated="false" class="support_vector_machine_libsvm" compatibility="7.5.001" expanded="true" height="82" name="SVM (2)" width="90" x="380" y="595">
<parameter key="kernel_type" value="linear"/>
<list key="class_weights"/>
<parameter key="calculate_confidences" value="true"/>
</operator>
<operator activated="false" class="h2o:deep_learning" compatibility="7.5.000" expanded="true" height="82" name="Deep Learning (2)" width="90" x="246" y="493">
<enumeration key="hidden_layer_sizes">
<parameter key="hidden_layer_sizes" value="50"/>
<parameter key="hidden_layer_sizes" value="50"/>
</enumeration>
<enumeration key="hidden_dropout_ratios"/>
<list key="expert_parameters"/>
<list key="expert_parameters_"/>
</operator>
<operator activated="false" class="random_tree" compatibility="7.5.001" expanded="true" height="82" name="Random Tree (2)" width="90" x="514" y="595"/>
<operator activated="true" class="apply_model" compatibility="7.5.001" expanded="true" height="82" name="Apply Model (2)" width="90" x="514" y="187">
<list key="application_parameters"/>
</operator>
<operator activated="true" class="performance_binominal_classification" compatibility="7.5.001" expanded="true" height="82" name="Performance" width="90" x="648" y="85">
<parameter key="main_criterion" value="AUC"/>
<parameter key="AUC" value="true"/>
<parameter key="skip_undefined_labels" value="false"/>
</operator>
<connect from_op="Retrieve acutal training binominal-real data" from_port="output" to_op="Set Role" to_port="example set input"/>
<connect from_op="Set Role" from_port="example set output" to_op="Neural Net (3)" to_port="training set"/>
<connect from_op="Neural Net (3)" from_port="model" to_op="Apply Model (2)" to_port="model"/>
<connect from_op="Retrieve acutal testing binominal real data" from_port="output" to_op="Set Role (2)" to_port="example set input"/>
<connect from_op="Set Role (2)" from_port="example set output" to_op="Apply Model (2)" to_port="unlabelled data"/>
<connect from_op="Apply Model (2)" from_port="labelled data" to_op="Performance" to_port="labelled data"/>
<connect from_op="Apply Model (2)" from_port="model" to_port="result 1"/>
<connect from_op="Performance" from_port="performance" to_port="result 2"/>
<connect from_op="Performance" from_port="example set" to_port="result 3"/>
<portSpacing port="source_input 1" spacing="0"/>
<portSpacing port="sink_result 1" spacing="0"/>
<portSpacing port="sink_result 2" spacing="0"/>
<portSpacing port="sink_result 3" spacing="0"/>
<portSpacing port="sink_result 4" spacing="0"/>
</process>
</operator>
</process>

Does anyone have an idea how to do this?

Best Answer

  • Pavithra_Rao Administrator, Moderator, Employee, RapidMiner Certified Analyst, RapidMiner Certified Expert, Member Posts: 123 RM Data Scientist
    Solution Accepted

    Hi,

     

    The 'Compare ROCs' operator can be used here.

     

    Please use the following example process, which shows how to compare the ROCs of more than one machine learning algorithm.

     

    Hope this helps!

     

    Cheers,

     

    <?xml version="1.0" encoding="UTF-8"?><process version="8.1.000">
    <context>
    <input/>
    <output/>
    <macros/>
    </context>
    <operator activated="true" class="process" compatibility="6.0.002" expanded="true" name="Root">
    <process expanded="true">
    <operator activated="true" class="retrieve" compatibility="8.1.000" expanded="true" height="68" name="Ripley-Set" width="90" x="313" y="120">
    <parameter key="repository_entry" value="//Samples/data/Ripley-Set"/>
    </operator>
    <operator activated="true" class="compare_rocs" compatibility="8.1.000" expanded="true" height="82" name="Compare ROCs" width="90" x="514" y="120">
    <process expanded="true">
    <operator activated="true" class="naive_bayes" compatibility="8.1.000" expanded="true" height="82" name="Naive Bayes" width="90" x="112" y="30"/>
    <operator activated="true" class="rule_induction" compatibility="8.1.000" expanded="true" height="82" name="Rule Induction" width="90" x="112" y="120"/>
    <operator activated="true" class="concurrency:parallel_decision_tree" compatibility="8.1.000" expanded="true" height="103" name="Decision Tree" width="90" x="112" y="210">
    <parameter key="confidence" value="0.1"/>
    </operator>
    <connect from_port="train 1" to_op="Naive Bayes" to_port="training set"/>
    <connect from_port="train 2" to_op="Rule Induction" to_port="training set"/>
    <connect from_port="train 3" to_op="Decision Tree" to_port="training set"/>
    <connect from_op="Naive Bayes" from_port="model" to_port="model 1"/>
    <connect from_op="Rule Induction" from_port="model" to_port="model 2"/>
    <connect from_op="Decision Tree" from_port="model" to_port="model 3"/>
    <portSpacing port="source_train 1" spacing="0"/>
    <portSpacing port="source_train 2" spacing="72"/>
    <portSpacing port="source_train 3" spacing="72"/>
    <portSpacing port="source_train 4" spacing="18"/>
    <portSpacing port="sink_model 1" spacing="0"/>
    <portSpacing port="sink_model 2" spacing="72"/>
    <portSpacing port="sink_model 3" spacing="72"/>
    <portSpacing port="sink_model 4" spacing="0"/>
    </process>
    </operator>
    <connect from_op="Ripley-Set" from_port="output" to_op="Compare ROCs" to_port="example set"/>
    <connect from_op="Compare ROCs" from_port="rocComparison" to_port="result 1"/>
    <portSpacing port="source_input 1" spacing="0"/>
    <portSpacing port="sink_result 1" spacing="108"/>
    <portSpacing port="sink_result 2" spacing="0"/>
    </process>
    </operator>
    </process>

Answers

  • Telcontar120 Moderator, RapidMiner Certified Analyst, RapidMiner Certified Expert, Member Posts: 1,635 Unicorn

    Happily there is an operator that does exactly that, which is actually called "Compare ROCs".  You simply insert inside it the different learners that you want to compare to each other.  There is even a tutorial process for this operator that you can copy!

     

    Brian T.
    Lindon Ventures 
    Data Science Consulting from Certified RapidMiner Experts
  • dsideratos Member Posts: 3 Contributor I

    Sorry, but I don't think you understood my problem. With the Compare ROCs operator, the ROC graph relies on the training data set. In my case, a machine learning algorithm is trained on the training data set, and the resulting model is then applied to a separate test data set. I think this is clear from the XML above. I then want to measure performance metrics on the test data. To visualize the differences in performance, I would like to draw the ROC curves on the same graph. Unfortunately, the Compare ROCs operator doesn't solve my problem.
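
The requirement in the post above (several models, each scored on the same held-out test set, with one ROC curve per model ready to be drawn on one graph) can be sketched outside RapidMiner in plain Python. This is only an illustration under stated assumptions, not a RapidMiner feature: the labels and confidence values are made-up data, the model names are reused from the process above purely as dictionary keys, and `roc_points` and `auc` are hypothetical helper names. In practice the scores would be the positive-class confidences that Apply Model writes for the test set.

```python
from typing import Dict, List, Sequence, Tuple

Point = Tuple[float, float]  # (false positive rate, true positive rate)


def roc_points(labels: Sequence[int], scores: Sequence[float]) -> List[Point]:
    """Sweep a threshold from high to low scores and collect ROC points."""
    pos = sum(1 for y in labels if y == 1)
    neg = len(labels) - pos
    if pos == 0 or neg == 0:
        raise ValueError("ROC needs at least one positive and one negative example")
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    points: List[Point] = [(0.0, 0.0)]
    tp = fp = 0
    i = 0
    while i < len(ranked):
        threshold = ranked[i][0]
        # Examples with the same score cross the threshold together.
        while i < len(ranked) and ranked[i][0] == threshold:
            if ranked[i][1] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        points.append((fp / neg, tp / pos))
    return points


def auc(points: Sequence[Point]) -> float:
    """Area under the ROC curve by the trapezoidal rule."""
    return sum((x1 - x0) * (y0 + y1) / 2.0
               for (x0, y0), (x1, y1) in zip(points, points[1:]))


# Hypothetical test-set labels (1 = positive class) and per-model confidences.
test_labels = [1, 0, 1, 1, 0, 0]
model_scores: Dict[str, List[float]] = {
    "Neural Net": [0.9, 0.2, 0.8, 0.6, 0.4, 0.1],
    "Naive Bayes": [0.6, 0.7, 0.8, 0.3, 0.4, 0.2],
}

# One curve per model, all computed on the same test set.
curves = {name: roc_points(test_labels, s) for name, s in model_scores.items()}
for name, pts in curves.items():
    print(f"{name}: AUC = {auc(pts):.3f}, points = {pts}")
```

Each entry in `curves` is a list of (FPR, TPR) pairs, so drawing every list on the same axes with any plotting tool gives the combined graph.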

  • Pavithra_Rao Administrator, Moderator, Employee, RapidMiner Certified Analyst, RapidMiner Certified Expert, Member Posts: 123 RM Data Scientist

    Hi,

     

    The new Auto Model feature in version 8.1 can produce a ROC comparison chart (as shown below), built on validation data, with all models together on one chart.

     

    Here are the complete details on the Auto Model feature and how it works.

     

    Hope this helps!

     

    Cheers,

     

    ROC Comparision.png

  • baran Member Posts: 5 Contributor II

    Hi there! 

     

    My process is below, but I cannot apply Compare ROCs to it.

     

     

    Document data from files > select attributes > set role (the label is the sentiment: neg, noutr, pos) > split validation (n-gram > apply model > performance)

     

    Where can I put Compare ROCs, and how?

     

    Thank you for your support :)
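
One point that matters for a three-class sentiment label like the one described above: a ROC curve is defined for a two-class problem, so a common workaround is to score each class against the rest. The sketch below illustrates that one-vs-rest idea in plain Python; it is not taken from the thread and does not show how Compare ROCs itself behaves. The class names, labels and confidence values are made up, and `one_vs_rest` and `rank_auc` are hypothetical helper names.

```python
from typing import Dict, List, Sequence


def one_vs_rest(labels: Sequence[str], positive: str) -> List[int]:
    """Turn a multi-class label into a binary one: 1 for `positive`, else 0."""
    return [1 if y == positive else 0 for y in labels]


def rank_auc(binary: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the share of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for y, s in zip(binary, scores) if y == 1]
    neg = [s for y, s in zip(binary, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical test labels and per-class confidences (one list per class).
labels = ["neg", "neutral", "pos", "pos", "neg", "neutral"]
confidences: Dict[str, List[float]] = {
    "neg":     [0.7, 0.2, 0.1, 0.2, 0.5, 0.4],
    "neutral": [0.2, 0.3, 0.3, 0.1, 0.3, 0.4],
    "pos":     [0.1, 0.5, 0.6, 0.7, 0.2, 0.2],
}

# One binary AUC per class, each treating that class as the positive one.
per_class_auc = {
    cls: rank_auc(one_vs_rest(labels, cls), conf)
    for cls, conf in confidences.items()
}
for cls, value in per_class_auc.items():
    print(f"{cls} vs rest: AUC = {value:.3f}")
```

Whether a one-vs-rest view is the right summary depends on the use case; it gives one curve per class rather than a single curve for the whole model.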

  • dsideratos Member Posts: 3 Contributor I


    @Pavithra_Rao unfortunately, it doesn't solve my problem. I think I have to make a new post because this one is marked as solved.

     



     