
"Learning Curve" operator is necessary for a research paper.

User10313 Member Posts: 2 Contributor I
Hello,
We used RapidMiner Studio 9.8 for the analysis in a COVID-19 study that we plan to publish in a top-tier journal. For this study we need the "Learning Curve" operator, but we noticed that it has been deprecated in the current version. To complete the study, we need to cite the "Learning Curve" operator as used in it. Could you please add this operator back to RapidMiner or suggest a solution we could use? We would also like to let you know that we will acknowledge your help in the study and will cite the "Learning Curve" operator.
Thank you in advance for your interest.
Sincerely.

Best Answers

  • jacobcybulski Member, University Professor Posts: 391 Unicorn
    Solution Accepted
    A previous discussion of how to replace the Learning Curve operator can be found here: https://community.rapidminer.com/discussion/39096/how-to-plot-a-learning-curve-for-a-given-model
  • jacobcybulski Member, University Professor Posts: 391 Unicorn
    edited January 2021 Solution Accepted
    I am not sure if this is what you want, but here is an example of how to create a learning curve that plots the sample ratio against validation performance. (If you also wanted to show training performance, you would need to collect the model's performance on the training data as well.) Here is the chart displaying the accuracy, kappa and AUC validation performance measures:


    And here is the process that generated it (using the Titanic data).


    <?xml version="1.0" encoding="UTF-8"?><process version="9.8.001">
      <context>
        <input/>
        <output/>
        <macros/>
      </context>
      <operator activated="true" class="process" compatibility="9.8.001" expanded="true" name="Process">
        <parameter key="logverbosity" value="init"/>
        <parameter key="random_seed" value="2001"/>
        <parameter key="send_mail" value="never"/>
        <parameter key="notification_email" value=""/>
        <parameter key="process_duration_for_mail" value="30"/>
        <parameter key="encoding" value="SYSTEM"/>
        <process expanded="true">
          <operator activated="true" class="retrieve" compatibility="9.8.001" expanded="true" height="68" name="Retrieve Titanic Training" width="90" x="112" y="34">
            <parameter key="repository_entry" value="//Samples/data/Titanic Training"/>
          </operator>
          <operator activated="true" class="concurrency:optimize_parameters_grid" compatibility="9.8.001" expanded="true" height="124" name="Optimize Parameters (Grid)" width="90" x="313" y="34">
            <list key="parameters">
              <parameter key="Sample.sample_ratio" value="[0.02;1.0;49;linear]"/>
            </list>
            <parameter key="error_handling" value="fail on error"/>
            <parameter key="log_performance" value="true"/>
            <parameter key="log_all_criteria" value="true"/>
            <parameter key="synchronize" value="false"/>
            <parameter key="enable_parallel_execution" value="true"/>
            <process expanded="true">
              <operator activated="true" class="sample" compatibility="9.8.001" expanded="true" height="82" name="Sample" width="90" x="112" y="34">
                <parameter key="sample" value="relative"/>
                <parameter key="balance_data" value="false"/>
                <parameter key="sample_size" value="100"/>
                <parameter key="sample_ratio" value="0.1"/>
                <parameter key="sample_probability" value="0.1"/>
                <list key="sample_size_per_class"/>
                <list key="sample_ratio_per_class"/>
                <list key="sample_probability_per_class"/>
                <parameter key="use_local_random_seed" value="true"/>
                <parameter key="local_random_seed" value="2021"/>
              </operator>
              <operator activated="true" class="concurrency:cross_validation" compatibility="9.8.001" expanded="true" height="145" name="Cross Validation" width="90" x="313" y="34">
                <parameter key="split_on_batch_attribute" value="false"/>
                <parameter key="leave_one_out" value="false"/>
                <parameter key="number_of_folds" value="10"/>
                <parameter key="sampling_type" value="automatic"/>
                <parameter key="use_local_random_seed" value="true"/>
                <parameter key="local_random_seed" value="2021"/>
                <parameter key="enable_parallel_execution" value="false"/>
                <process expanded="true">
                  <operator activated="true" class="concurrency:parallel_decision_tree" compatibility="9.8.001" expanded="true" height="103" name="Decision Tree" width="90" x="179" y="34">
                    <parameter key="criterion" value="gain_ratio"/>
                    <parameter key="maximal_depth" value="15"/>
                    <parameter key="apply_pruning" value="true"/>
                    <parameter key="confidence" value="0.1"/>
                    <parameter key="apply_prepruning" value="true"/>
                    <parameter key="minimal_gain" value="0.01"/>
                    <parameter key="minimal_leaf_size" value="2"/>
                    <parameter key="minimal_size_for_split" value="4"/>
                    <parameter key="number_of_prepruning_alternatives" value="3"/>
                  </operator>
                  <connect from_port="training set" to_op="Decision Tree" to_port="training set"/>
                  <connect from_op="Decision Tree" from_port="model" to_port="model"/>
                  <portSpacing port="source_training set" spacing="0"/>
                  <portSpacing port="sink_model" spacing="0"/>
                  <portSpacing port="sink_through 1" spacing="105"/>
                </process>
                <process expanded="true">
                  <operator activated="true" class="apply_model" compatibility="9.8.001" expanded="true" height="82" name="Apply Model" width="90" x="45" y="34">
                    <list key="application_parameters"/>
                    <parameter key="create_view" value="false"/>
                  </operator>
                  <operator activated="true" class="performance_binominal_classification" compatibility="9.8.001" expanded="true" height="82" name="Performance" width="90" x="179" y="34">
                    <parameter key="manually_set_positive_class" value="false"/>
                    <parameter key="main_criterion" value="first"/>
                    <parameter key="accuracy" value="true"/>
                    <parameter key="classification_error" value="false"/>
                    <parameter key="kappa" value="true"/>
                    <parameter key="AUC (optimistic)" value="false"/>
                    <parameter key="AUC" value="true"/>
                    <parameter key="AUC (pessimistic)" value="false"/>
                    <parameter key="precision" value="false"/>
                    <parameter key="recall" value="false"/>
                    <parameter key="lift" value="false"/>
                    <parameter key="fallout" value="false"/>
                    <parameter key="f_measure" value="false"/>
                    <parameter key="false_positive" value="false"/>
                    <parameter key="false_negative" value="false"/>
                    <parameter key="true_positive" value="false"/>
                    <parameter key="true_negative" value="false"/>
                    <parameter key="sensitivity" value="false"/>
                    <parameter key="specificity" value="false"/>
                    <parameter key="youden" value="false"/>
                    <parameter key="positive_predictive_value" value="false"/>
                    <parameter key="negative_predictive_value" value="false"/>
                    <parameter key="psep" value="false"/>
                    <parameter key="skip_undefined_labels" value="true"/>
                    <parameter key="use_example_weights" value="true"/>
                  </operator>
                  <connect from_port="model" to_op="Apply Model" to_port="model"/>
                  <connect from_port="test set" to_op="Apply Model" to_port="unlabelled data"/>
                  <connect from_op="Apply Model" from_port="labelled data" to_op="Performance" to_port="labelled data"/>
                  <connect from_op="Performance" from_port="performance" to_port="performance 1"/>
                  <connect from_op="Performance" from_port="example set" to_port="test set results"/>
                  <portSpacing port="source_model" spacing="0"/>
                  <portSpacing port="source_test set" spacing="0"/>
                  <portSpacing port="source_through 1" spacing="84"/>
                  <portSpacing port="sink_test set results" spacing="0"/>
                  <portSpacing port="sink_performance 1" spacing="0"/>
                  <portSpacing port="sink_performance 2" spacing="0"/>
                </process>
              </operator>
              <connect from_port="input 1" to_op="Sample" to_port="example set input"/>
              <connect from_op="Sample" from_port="example set output" to_op="Cross Validation" to_port="example set"/>
              <connect from_op="Cross Validation" from_port="model" to_port="model"/>
              <connect from_op="Cross Validation" from_port="performance 1" to_port="performance"/>
              <portSpacing port="source_input 1" spacing="0"/>
              <portSpacing port="source_input 2" spacing="0"/>
              <portSpacing port="sink_performance" spacing="0"/>
              <portSpacing port="sink_model" spacing="0"/>
              <portSpacing port="sink_output 1" spacing="0"/>
            </process>
          </operator>
          <operator activated="true" class="log_to_data" compatibility="9.8.001" expanded="true" height="82" name="Log to Data" width="90" x="514" y="136">
            <parameter key="log_name" value="Optimize Parameters (Grid)"/>
          </operator>
          <connect from_op="Retrieve Titanic Training" from_port="output" to_op="Optimize Parameters (Grid)" to_port="input 1"/>
          <connect from_op="Optimize Parameters (Grid)" from_port="performance" to_port="result 1"/>
          <connect from_op="Optimize Parameters (Grid)" from_port="model" to_port="result 2"/>
          <connect from_op="Optimize Parameters (Grid)" from_port="parameter set" to_port="result 3"/>
          <connect from_op="Log to Data" from_port="exampleSet" to_port="result 4"/>
          <portSpacing port="source_input 1" spacing="0"/>
          <portSpacing port="sink_result 1" spacing="0"/>
          <portSpacing port="sink_result 2" spacing="0"/>
          <portSpacing port="sink_result 3" spacing="0"/>
          <portSpacing port="sink_result 4" spacing="0"/>
          <portSpacing port="sink_result 5" spacing="0"/>
        </process>
      </operator>
    </process>
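
    In this process the learning curve comes from the grid in Optimize Parameters (Grid): each grid point sets Sample.sample_ratio, Cross Validation scores the Decision Tree on that sample, and Log to Data turns the logged performances into an example set that can then be charted. The key line is excerpted below. Assuming RapidMiner's [min;max;steps;scale] range notation, it asks for 49 linear steps from 0.02 to 1.0, which should give the ratios 0.02, 0.04, ..., 1.0; that reading is an assumption, not something stated by the author.

    ```xml
    <list key="parameters">
      <!-- Grid over the sample ratio in [min;max;steps;scale] form. Assumed to yield 50 ratios: 0.02 to 1.0 in increments of 0.02 -->
      <parameter key="Sample.sample_ratio" value="[0.02;1.0;49;linear]"/>
    </list>
    ```

    Changing the range (for example a coarser step count) trades chart resolution for run time, since every grid point runs a full 10-fold cross-validation.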

    Note that, to remove the influence of random effects and to ensure that performance depends only on the sample ratio, you need to set the random seed of every operator whose behaviour involves randomness.
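
    In the process above, the operators that expose a local seed are Sample and Cross Validation; both enable it and set it to 2021, while the process itself uses random_seed 2001. The pair of parameters, excerpted unchanged from both operators, looks like this (the comment is an added annotation):

    ```xml
    <!-- Used in both Sample and Cross Validation: a fixed local seed makes the sampling and the fold split repeatable across runs -->
    <parameter key="use_local_random_seed" value="true"/>
    <parameter key="local_random_seed" value="2021"/>
    ```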
  • MartinLiebig Administrator, Moderator, Employee, RapidMiner Certified Analyst, RapidMiner Certified Expert, University Professor Posts: 3,510 RM Data Scientist
    edited January 2021 Solution Accepted

    We never delete operators; we only deprecate them, which makes them 'unavailable', but they still work in old processes. There is actually a trick to get them back if you need them.
    Let's say you have a process containing only one operator, like this:
    <?xml version="1.0" encoding="UTF-8"?><process version="9.8.001">
      <context>
        <input/>
        <output/>
        <macros/>
      </context>
      <operator activated="true" class="process" compatibility="9.8.001" expanded="true" name="Process">
        <parameter key="logverbosity" value="init"/>
        <parameter key="random_seed" value="2001"/>
        <parameter key="send_mail" value="never"/>
        <parameter key="notification_email" value=""/>
        <parameter key="process_duration_for_mail" value="30"/>
        <parameter key="encoding" value="SYSTEM"/>
        <process expanded="true">
          <operator activated="true" class="retrieve" compatibility="9.8.001" expanded="true" height="68" name="Retrieve" width="90" x="380" y="34"/>
          <portSpacing port="source_input 1" spacing="0"/>
          <portSpacing port="sink_result 1" spacing="0"/>
        </process>
      </operator>
    </process>
    You can then replace class="retrieve" with the class of the operator you want; in your case, that is class="create_learning_curve". This way you get the old operator, which you can then copy and paste wherever you need it. All available 'class' keywords are listed on GitHub: https://github.com/rapidminer/rapidminer-studio/blob/master/src/main/resources/com/rapidminer/resources/OperatorsCore.xml
    A warning: there are reasons why we deprecate operators, usually because there are better ways to do the same thing or because of technical issues. So use this trick with care!
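    As a minimal sketch, applying this to the example process above changes only the class attribute of the single operator line; everything else, including the name "Retrieve" and the layout attributes, is copied unchanged. The thread does not say whether Studio then renames the operator or adjusts its ports and size on loading, so check the result before copying the operator into your own process.

    ```xml
    <process expanded="true">
      <!-- Only the class attribute differs from the original line: retrieve becomes create_learning_curve -->
      <operator activated="true" class="create_learning_curve" compatibility="9.8.001" expanded="true" height="68" name="Retrieve" width="90" x="380" y="34"/>
      <portSpacing port="source_input 1" spacing="0"/>
      <portSpacing port="sink_result 1" spacing="0"/>
    </process>
    ```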

    Best,
    Martin





    - Sr. Director Data Solutions, Altair RapidMiner -
    Dortmund, Germany
  • user4460 Member Posts: 3 Contributor I
    Solution Accepted

Answers

  • User10313 Member Posts: 2 Contributor I
    Dear jacobcybulski,
    Thank you very much for your help and for sending the XML code for the process.
    Sincerely.
  • user4460 Member Posts: 3 Contributor I
    First of all, thank you for your help. However, I have not been able to work out how to activate the learning curve at the step shown in my screenshot. Could you explain how this process works?
    Thank you