"Learning Curve" operator is necessary for a research paper.
Hello,
We used RapidMiner Studio 9.8 for the analysis in a COVID-19 study that we plan to publish in a first-class journal. In the current version we need the "Learning Curve" operator, but we noticed that this operator has been deprecated. In order to complete this study, we need to provide a reference for the use of the "Learning Curve" operator. Can you please add the relevant operator back to RapidMiner, or provide a solution that lets us use it? We would also like to let you know that we will acknowledge your help in our study and will cite the "Learning Curve" operator as well.
Thank you in advance for your interest.
Sincerely.
Best Answers
jacobcybulski (Member, University Professor):
Previous discussion of how to replace the Learning Curve operator can be found here: https://community.rapidminer.com/discussion/39096/how-to-plot-a-learning-curve-for-a-given-model
jacobcybulski (Member, University Professor):
I am not sure if this is what you want, but here is an example of how to create a learning curve showing sample ratio vs. validation performance (if you also wanted to show training performance, you would need to collect model performance on the training data as well). Here is the chart displaying the accuracy, kappa and AUC validation performance measures:
And here is the process which generated it (from the Titanic data):
<?xml version="1.0" encoding="UTF-8"?>
<process version="9.8.001">
<context>
<input/>
<output/>
<macros/>
</context>
<operator activated="true" class="process" compatibility="9.8.001" expanded="true" name="Process">
<parameter key="logverbosity" value="init"/>
<parameter key="random_seed" value="2001"/>
<parameter key="send_mail" value="never"/>
<parameter key="notification_email" value=""/>
<parameter key="process_duration_for_mail" value="30"/>
<parameter key="encoding" value="SYSTEM"/>
<process expanded="true">
<operator activated="true" class="retrieve" compatibility="9.8.001" expanded="true" height="68" name="Retrieve Titanic Training" width="90" x="112" y="34">
<parameter key="repository_entry" value="//Samples/data/Titanic Training"/>
</operator>
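<!-- Optimize Parameters (Grid) varies Sample.sample_ratio linearly from 0.02 to 1.0 (50 grid points) and logs the cross-validated performance for each ratio, which produces the learning curve. -->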
<operator activated="true" class="concurrency:optimize_parameters_grid" compatibility="9.8.001" expanded="true" height="124" name="Optimize Parameters (Grid)" width="90" x="313" y="34">
<list key="parameters">
<parameter key="Sample.sample_ratio" value="[0.02;1.0;49;linear]"/>
</list>
<parameter key="error_handling" value="fail on error"/>
<parameter key="log_performance" value="true"/>
<parameter key="log_all_criteria" value="true"/>
<parameter key="synchronize" value="false"/>
<parameter key="enable_parallel_execution" value="true"/>
<process expanded="true">
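<!-- Sample draws a relative subset of the data; its sample_ratio is overridden by the grid above, and the fixed local random seed keeps the sampling reproducible. -->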
<operator activated="true" class="sample" compatibility="9.8.001" expanded="true" height="82" name="Sample" width="90" x="112" y="34">
<parameter key="sample" value="relative"/>
<parameter key="balance_data" value="false"/>
<parameter key="sample_size" value="100"/>
<parameter key="sample_ratio" value="0.1"/>
<parameter key="sample_probability" value="0.1"/>
<list key="sample_size_per_class"/>
<list key="sample_ratio_per_class"/>
<list key="sample_probability_per_class"/>
<parameter key="use_local_random_seed" value="true"/>
<parameter key="local_random_seed" value="2021"/>
</operator>
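<!-- 10-fold cross validation of a Decision Tree on the current sample; its validation performance (accuracy, kappa, AUC) is what gets logged for each sample ratio. -->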
<operator activated="true" class="concurrency:cross_validation" compatibility="9.8.001" expanded="true" height="145" name="Cross Validation" width="90" x="313" y="34">
<parameter key="split_on_batch_attribute" value="false"/>
<parameter key="leave_one_out" value="false"/>
<parameter key="number_of_folds" value="10"/>
<parameter key="sampling_type" value="automatic"/>
<parameter key="use_local_random_seed" value="true"/>
<parameter key="local_random_seed" value="2021"/>
<parameter key="enable_parallel_execution" value="false"/>
<process expanded="true">
<operator activated="true" class="concurrency:parallel_decision_tree" compatibility="9.8.001" expanded="true" height="103" name="Decision Tree" width="90" x="179" y="34">
<parameter key="criterion" value="gain_ratio"/>
<parameter key="maximal_depth" value="15"/>
<parameter key="apply_pruning" value="true"/>
<parameter key="confidence" value="0.1"/>
<parameter key="apply_prepruning" value="true"/>
<parameter key="minimal_gain" value="0.01"/>
<parameter key="minimal_leaf_size" value="2"/>
<parameter key="minimal_size_for_split" value="4"/>
<parameter key="number_of_prepruning_alternatives" value="3"/>
</operator>
<connect from_port="training set" to_op="Decision Tree" to_port="training set"/>
<connect from_op="Decision Tree" from_port="model" to_port="model"/>
<portSpacing port="source_training set" spacing="0"/>
<portSpacing port="sink_model" spacing="0"/>
<portSpacing port="sink_through 1" spacing="105"/>
</process>
<process expanded="true">
<operator activated="true" class="apply_model" compatibility="9.8.001" expanded="true" height="82" name="Apply Model" width="90" x="45" y="34">
<list key="application_parameters"/>
<parameter key="create_view" value="false"/>
</operator>
<operator activated="true" class="performance_binominal_classification" compatibility="9.8.001" expanded="true" height="82" name="Performance" width="90" x="179" y="34">
<parameter key="manually_set_positive_class" value="false"/>
<parameter key="main_criterion" value="first"/>
<parameter key="accuracy" value="true"/>
<parameter key="classification_error" value="false"/>
<parameter key="kappa" value="true"/>
<parameter key="AUC (optimistic)" value="false"/>
<parameter key="AUC" value="true"/>
<parameter key="AUC (pessimistic)" value="false"/>
<parameter key="precision" value="false"/>
<parameter key="recall" value="false"/>
<parameter key="lift" value="false"/>
<parameter key="fallout" value="false"/>
<parameter key="f_measure" value="false"/>
<parameter key="false_positive" value="false"/>
<parameter key="false_negative" value="false"/>
<parameter key="true_positive" value="false"/>
<parameter key="true_negative" value="false"/>
<parameter key="sensitivity" value="false"/>
<parameter key="specificity" value="false"/>
<parameter key="youden" value="false"/>
<parameter key="positive_predictive_value" value="false"/>
<parameter key="negative_predictive_value" value="false"/>
<parameter key="psep" value="false"/>
<parameter key="skip_undefined_labels" value="true"/>
<parameter key="use_example_weights" value="true"/>
</operator>
<connect from_port="model" to_op="Apply Model" to_port="model"/>
<connect from_port="test set" to_op="Apply Model" to_port="unlabelled data"/>
<connect from_op="Apply Model" from_port="labelled data" to_op="Performance" to_port="labelled data"/>
<connect from_op="Performance" from_port="performance" to_port="performance 1"/>
<connect from_op="Performance" from_port="example set" to_port="test set results"/>
<portSpacing port="source_model" spacing="0"/>
<portSpacing port="source_test set" spacing="0"/>
<portSpacing port="source_through 1" spacing="84"/>
<portSpacing port="sink_test set results" spacing="0"/>
<portSpacing port="sink_performance 1" spacing="0"/>
<portSpacing port="sink_performance 2" spacing="0"/>
</process>
</operator>
<connect from_port="input 1" to_op="Sample" to_port="example set input"/>
<connect from_op="Sample" from_port="example set output" to_op="Cross Validation" to_port="example set"/>
<connect from_op="Cross Validation" from_port="model" to_port="model"/>
<connect from_op="Cross Validation" from_port="performance 1" to_port="performance"/>
<portSpacing port="source_input 1" spacing="0"/>
<portSpacing port="source_input 2" spacing="0"/>
<portSpacing port="sink_performance" spacing="0"/>
<portSpacing port="sink_model" spacing="0"/>
<portSpacing port="sink_output 1" spacing="0"/>
</process>
</operator>
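<!-- Log to Data converts the optimization log into an example set, so sample ratio vs. accuracy/kappa/AUC can be plotted as the learning curve. -->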
<operator activated="true" class="log_to_data" compatibility="9.8.001" expanded="true" height="82" name="Log to Data" width="90" x="514" y="136">
<parameter key="log_name" value="Optimize Parameters (Grid)"/>
</operator>
<connect from_op="Retrieve Titanic Training" from_port="output" to_op="Optimize Parameters (Grid)" to_port="input 1"/>
<connect from_op="Optimize Parameters (Grid)" from_port="performance" to_port="result 1"/>
<connect from_op="Optimize Parameters (Grid)" from_port="model" to_port="result 2"/>
<connect from_op="Optimize Parameters (Grid)" from_port="parameter set" to_port="result 3"/>
<connect from_op="Log to Data" from_port="exampleSet" to_port="result 4"/>
<portSpacing port="source_input 1" spacing="0"/>
<portSpacing port="sink_result 1" spacing="0"/>
<portSpacing port="sink_result 2" spacing="0"/>
<portSpacing port="sink_result 3" spacing="0"/>
<portSpacing port="sink_result 4" spacing="0"/>
<portSpacing port="sink_result 5" spacing="0"/>
</process>
</operator>
</process>
Note that, to avoid the impact of various random effects on performance and to ensure that the performance depends only on the sample ratio, you need to set the random seeds of all operators that have any random effect in their behaviour.
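In the process above this is done with the process-level random_seed and by switching on the local random seed of the Sample and Cross Validation operators, for example:
<parameter key="use_local_random_seed" value="true"/>
<parameter key="local_random_seed" value="2021"/>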
MartinLiebig (Administrator, Moderator, Employee, RapidMiner Certified Analyst, RapidMiner Certified Expert, University Professor):
Hi @User10313,
We never delete operators, we only deprecate them to make them 'unavailable', but they still work in old processes. There is actually a trick to get them back if you need them. Let's say you have a process with only one operator, like this:
<?xml version="1.0" encoding="UTF-8"?>
<process version="9.8.001">
<context>
<input/>
<output/>
<macros/>
</context>
<operator activated="true" class="process" compatibility="9.8.001" expanded="true" name="Process">
<parameter key="logverbosity" value="init"/>
<parameter key="random_seed" value="2001"/>
<parameter key="send_mail" value="never"/>
<parameter key="notification_email" value=""/>
<parameter key="process_duration_for_mail" value="30"/>
<parameter key="encoding" value="SYSTEM"/>
<process expanded="true">
<operator activated="true" class="retrieve" compatibility="9.8.001" expanded="true" height="68" name="Retrieve" width="90" x="380" y="34"/>
<portSpacing port="source_input 1" spacing="0"/>
<portSpacing port="sink_result 1" spacing="0"/>
</process>
</operator>
</process>
You can then replace class="retrieve" with the operator you want to get; in your case it is class="create_learning_curve" (a sketch of the resulting XML is shown after this post). This way you get the old operator, and you can then copy and paste it to wherever you need it. You can see all available 'class' keywords on GitHub: https://github.com/rapidminer/rapidminer-studio/blob/master/src/main/resources/com/rapidminer/resources/OperatorsCore.xml
A warning: there are reasons why we deprecate operators, usually because there are better ways of doing things or because of technical issues. So use this trick with care!
Best,
Martin
- Sr. Director Data Solutions, Altair RapidMiner -
Dortmund, Germany
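For illustration, here is a minimal sketch of the same one-operator process after the swap described above, with class="retrieve" replaced by class="create_learning_curve" (the operator's parameters are omitted here, so it falls back to its defaults, and the name attribute is arbitrary):
<?xml version="1.0" encoding="UTF-8"?>
<process version="9.8.001">
<context>
<input/>
<output/>
<macros/>
</context>
<operator activated="true" class="process" compatibility="9.8.001" expanded="true" name="Process">
<parameter key="logverbosity" value="init"/>
<parameter key="random_seed" value="2001"/>
<parameter key="send_mail" value="never"/>
<parameter key="notification_email" value=""/>
<parameter key="process_duration_for_mail" value="30"/>
<parameter key="encoding" value="SYSTEM"/>
<process expanded="true">
<!-- class swapped from "retrieve" to "create_learning_curve"; parameters omitted, defaults apply -->
<operator activated="true" class="create_learning_curve" compatibility="9.8.001" expanded="true" height="68" name="Learning Curve" width="90" x="380" y="34"/>
<portSpacing port="source_input 1" spacing="0"/>
<portSpacing port="sink_result 1" spacing="0"/>
</process>
</operator>
</process>
Pasting this into Studio's XML panel and switching back to the design view should restore the deprecated operator, which can then be copied into any other process.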
Answers
We thank you so much for your help and for sending the XML code for the related process.
Sincerely.