ensemble learning
I tried a data set with ensemble learning (using k-NN, decision tree, Naïve Bayes). I'm not able to see any improvement in accuracy/precision/recall.
1. Is there any way to view the performance of the individual models while using ensemble learning, i.e. to compare the performance of the individual models as well as the ensemble in a single process?
2. When I use the generate weight option, there is a warning for the k-NN sub-model saying "input example set has example weights, but learner will ignore them". Even though the learner supposedly ignores the weights, the accuracy still drops. First of all, why does that warning appear only for k-NN, and what is its impact?
thanks
thiru
Best Answer
BalazsBarany Administrator, Moderator, Employee, RapidMiner Certified Analyst, RapidMiner Certified Expert Posts: 955 Unicorn

Hi @Thiru,
You can do a Cross Validation inside the ensemble model. You can use the Remember operator to store the individual performance results, or even store them in the repository.
Here's an example process:

<?xml version="1.0" encoding="UTF-8"?>
<process version="9.5.001">
<context>
<input/>
<output/>
<macros/>
</context>
<operator activated="true" class="process" compatibility="9.5.001" expanded="true" name="Process">
<parameter key="logverbosity" value="init"/>
<parameter key="random_seed" value="-1"/>
<parameter key="send_mail" value="never"/>
<parameter key="notification_email" value=""/>
<parameter key="process_duration_for_mail" value="30"/>
<parameter key="encoding" value="SYSTEM"/>
<process expanded="true">
<operator activated="true" class="retrieve" compatibility="9.5.001" expanded="true" height="68" name="Retrieve Iris" width="90" x="112" y="34">
<parameter key="repository_entry" value="//Samples/data/Iris"/>
</operator>
<operator activated="true" class="generate_weight_stratification" compatibility="9.5.001" expanded="true" height="82" name="Generate Weight (Stratification)" width="90" x="246" y="34">
<parameter key="total_weight" value="1.0"/>
</operator>
<operator activated="true" class="concurrency:cross_validation" compatibility="9.5.001" expanded="true" height="145" name="Validation" width="90" x="380" y="34">
<parameter key="split_on_batch_attribute" value="false"/>
<parameter key="leave_one_out" value="false"/>
<parameter key="number_of_folds" value="10"/>
<parameter key="sampling_type" value="stratified sampling"/>
<parameter key="use_local_random_seed" value="false"/>
<parameter key="local_random_seed" value="1992"/>
<parameter key="enable_parallel_execution" value="true"/>
<process expanded="true">
<operator activated="true" class="vote" compatibility="9.5.001" expanded="true" height="68" name="Vote" width="90" x="112" y="34">
<process expanded="true">
<operator activated="true" class="concurrency:cross_validation" compatibility="9.5.001" expanded="true" height="145" name="Validation DT" width="90" x="246" y="34">
<parameter key="split_on_batch_attribute" value="false"/>
<parameter key="leave_one_out" value="false"/>
<parameter key="number_of_folds" value="10"/>
<parameter key="sampling_type" value="stratified sampling"/>
<parameter key="use_local_random_seed" value="false"/>
<parameter key="local_random_seed" value="1992"/>
<parameter key="enable_parallel_execution" value="true"/>
<process expanded="true">
<operator activated="true" class="concurrency:parallel_decision_tree" compatibility="9.5.001" expanded="true" height="103" name="Decision Tree" width="90" x="45" y="34">
<parameter key="criterion" value="gain_ratio"/>
<parameter key="maximal_depth" value="10"/>
<parameter key="apply_pruning" value="true"/>
<parameter key="confidence" value="0.1"/>
<parameter key="apply_prepruning" value="true"/>
<parameter key="minimal_gain" value="0.01"/>
<parameter key="minimal_leaf_size" value="2"/>
<parameter key="minimal_size_for_split" value="4"/>
<parameter key="number_of_prepruning_alternatives" value="3"/>
</operator>
<connect from_port="training set" to_op="Decision Tree" to_port="training set"/>
<connect from_op="Decision Tree" from_port="model" to_port="model"/>
<portSpacing port="source_training set" spacing="0"/>
<portSpacing port="sink_model" spacing="0"/>
<portSpacing port="sink_through 1" spacing="0"/>
<description align="left" color="green" colored="true" height="80" resized="false" width="248" x="37" y="158">In the training phase, a model is built on the current training data set. (90 % of data by default, 10 times)</description>
</process>
<process expanded="true">
<operator activated="true" class="apply_model" compatibility="9.5.001" expanded="true" height="82" name="Apply Model" width="90" x="45" y="34">
<list key="application_parameters"/>
<parameter key="create_view" value="false"/>
</operator>
<operator activated="true" class="performance" compatibility="9.5.001" expanded="true" height="82" name="Performance (DT)" width="90" x="179" y="34">
<parameter key="use_example_weights" value="true"/>
</operator>
<connect from_port="model" to_op="Apply Model" to_port="model"/>
<connect from_port="test set" to_op="Apply Model" to_port="unlabelled data"/>
<connect from_op="Apply Model" from_port="labelled data" to_op="Performance (DT)" to_port="labelled data"/>
<connect from_op="Performance (DT)" from_port="performance" to_port="performance 1"/>
<connect from_op="Performance (DT)" from_port="example set" to_port="test set results"/>
<portSpacing port="source_model" spacing="0"/>
<portSpacing port="source_test set" spacing="0"/>
<portSpacing port="source_through 1" spacing="0"/>
<portSpacing port="sink_test set results" spacing="0"/>
<portSpacing port="sink_performance 1" spacing="0"/>
<portSpacing port="sink_performance 2" spacing="0"/>
<description align="left" color="blue" colored="true" height="103" resized="false" width="315" x="38" y="158">The model created in the Training step is applied to the current test set (10 %).<br/>The performance is evaluated and sent to the operator results.</description>
</process>
</operator>
<operator activated="true" class="remember" compatibility="9.5.001" expanded="true" height="68" name="Remember DTResults" width="90" x="447" y="85">
<parameter key="name" value="DTResults"/>
<parameter key="io_object" value="PerformanceVector"/>
<parameter key="store_which" value="1"/>
<parameter key="remove_from_process" value="true"/>
</operator>
<operator activated="true" class="concurrency:cross_validation" compatibility="9.5.001" expanded="true" height="145" name="Validation k-NN" width="90" x="112" y="187">
<parameter key="split_on_batch_attribute" value="false"/>
<parameter key="leave_one_out" value="false"/>
<parameter key="number_of_folds" value="10"/>
<parameter key="sampling_type" value="stratified sampling"/>
<parameter key="use_local_random_seed" value="false"/>
<parameter key="local_random_seed" value="1992"/>
<parameter key="enable_parallel_execution" value="true"/>
<process expanded="true">
<operator activated="true" class="k_nn" compatibility="9.5.001" expanded="true" height="82" name="k-NN" width="90" x="112" y="34">
<parameter key="k" value="5"/>
<parameter key="weighted_vote" value="true"/>
<parameter key="measure_types" value="MixedMeasures"/>
<parameter key="mixed_measure" value="MixedEuclideanDistance"/>
<parameter key="nominal_measure" value="NominalDistance"/>
<parameter key="numerical_measure" value="EuclideanDistance"/>
<parameter key="divergence" value="GeneralizedIDivergence"/>
<parameter key="kernel_type" value="radial"/>
<parameter key="kernel_gamma" value="1.0"/>
<parameter key="kernel_sigma1" value="1.0"/>
<parameter key="kernel_sigma2" value="0.0"/>
<parameter key="kernel_sigma3" value="2.0"/>
<parameter key="kernel_degree" value="3.0"/>
<parameter key="kernel_shift" value="1.0"/>
<parameter key="kernel_a" value="1.0"/>
<parameter key="kernel_b" value="0.0"/>
</operator>
<connect from_port="training set" to_op="k-NN" to_port="training set"/>
<connect from_op="k-NN" from_port="model" to_port="model"/>
<portSpacing port="source_training set" spacing="0"/>
<portSpacing port="sink_model" spacing="0"/>
<portSpacing port="sink_through 1" spacing="0"/>
<description align="left" color="green" colored="true" height="80" resized="false" width="248" x="37" y="158">In the training phase, a model is built on the current training data set. (90 % of data by default, 10 times)</description>
</process>
<process expanded="true">
<operator activated="true" class="apply_model" compatibility="9.5.001" expanded="true" height="82" name="Apply Model (2)" width="90" x="45" y="34">
<list key="application_parameters"/>
<parameter key="create_view" value="false"/>
</operator>
<operator activated="true" class="performance" compatibility="9.5.001" expanded="true" height="82" name="Performance (k-NN)" width="90" x="179" y="34">
<parameter key="use_example_weights" value="true"/>
</operator>
<connect from_port="model" to_op="Apply Model (2)" to_port="model"/>
<connect from_port="test set" to_op="Apply Model (2)" to_port="unlabelled data"/>
<connect from_op="Apply Model (2)" from_port="labelled data" to_op="Performance (k-NN)" to_port="labelled data"/>
<connect from_op="Performance (k-NN)" from_port="performance" to_port="performance 1"/>
<connect from_op="Performance (k-NN)" from_port="example set" to_port="test set results"/>
<portSpacing port="source_model" spacing="0"/>
<portSpacing port="source_test set" spacing="0"/>
<portSpacing port="source_through 1" spacing="0"/>
<portSpacing port="sink_test set results" spacing="0"/>
<portSpacing port="sink_performance 1" spacing="0"/>
<portSpacing port="sink_performance 2" spacing="0"/>
<description align="left" color="blue" colored="true" height="103" resized="false" width="315" x="38" y="158">The model created in the Training step is applied to the current test set (10 %).<br/>The performance is evaluated and sent to the operator results.</description>
</process>
</operator>
<operator activated="true" class="remember" compatibility="9.5.001" expanded="true" height="68" name="Remember k-NN results" width="90" x="447" y="238">
<parameter key="name" value="kNNResults"/>
<parameter key="io_object" value="PerformanceVector"/>
<parameter key="store_which" value="1"/>
<parameter key="remove_from_process" value="true"/>
</operator>
<operator activated="true" class="concurrency:cross_validation" compatibility="9.5.001" expanded="true" height="145" name="Validation Naive Bayes" width="90" x="246" y="340">
<parameter key="split_on_batch_attribute" value="false"/>
<parameter key="leave_one_out" value="false"/>
<parameter key="number_of_folds" value="10"/>
<parameter key="sampling_type" value="stratified sampling"/>
<parameter key="use_local_random_seed" value="false"/>
<parameter key="local_random_seed" value="1992"/>
<parameter key="enable_parallel_execution" value="true"/>
<process expanded="true">
<operator activated="true" class="naive_bayes" compatibility="9.5.001" expanded="true" height="82" name="Naive Bayes" width="90" x="112" y="34">
<parameter key="laplace_correction" value="true"/>
</operator>
<connect from_port="training set" to_op="Naive Bayes" to_port="training set"/>
<connect from_op="Naive Bayes" from_port="model" to_port="model"/>
<portSpacing port="source_training set" spacing="0"/>
<portSpacing port="sink_model" spacing="0"/>
<portSpacing port="sink_through 1" spacing="0"/>
<description align="left" color="green" colored="true" height="80" resized="false" width="248" x="37" y="158">In the training phase, a model is built on the current training data set. (90 % of data by default, 10 times)</description>
</process>
<process expanded="true">
<operator activated="true" class="apply_model" compatibility="9.5.001" expanded="true" height="82" name="Apply Model (3)" width="90" x="45" y="34">
<list key="application_parameters"/>
<parameter key="create_view" value="false"/>
</operator>
<operator activated="true" class="performance" compatibility="9.5.001" expanded="true" height="82" name="Performance (Naive Bayes)" width="90" x="179" y="34">
<parameter key="use_example_weights" value="true"/>
</operator>
<connect from_port="model" to_op="Apply Model (3)" to_port="model"/>
<connect from_port="test set" to_op="Apply Model (3)" to_port="unlabelled data"/>
<connect from_op="Apply Model (3)" from_port="labelled data" to_op="Performance (Naive Bayes)" to_port="labelled data"/>
<connect from_op="Performance (Naive Bayes)" from_port="performance" to_port="performance 1"/>
<connect from_op="Performance (Naive Bayes)" from_port="example set" to_port="test set results"/>
<portSpacing port="source_model" spacing="0"/>
<portSpacing port="source_test set" spacing="0"/>
<portSpacing port="source_through 1" spacing="0"/>
<portSpacing port="sink_test set results" spacing="0"/>
<portSpacing port="sink_performance 1" spacing="0"/>
<portSpacing port="sink_performance 2" spacing="0"/>
<description align="left" color="blue" colored="true" height="103" resized="false" width="315" x="38" y="158">The model created in the Training step is applied to the current test set (10 %).<br/>The performance is evaluated and sent to the operator results.</description>
</process>
</operator>
<operator activated="true" class="remember" compatibility="9.5.001" expanded="true" height="68" name="Remember Bayes results" width="90" x="447" y="391">
<parameter key="name" value="NaiveBayesResults"/>
<parameter key="io_object" value="PerformanceVector"/>
<parameter key="store_which" value="1"/>
<parameter key="remove_from_process" value="true"/>
</operator>
<connect from_port="training set 1" to_op="Validation DT" to_port="example set"/>
<connect from_port="training set 2" to_op="Validation k-NN" to_port="example set"/>
<connect from_port="training set 3" to_op="Validation Naive Bayes" to_port="example set"/>
<connect from_op="Validation DT" from_port="model" to_port="base model 1"/>
<connect from_op="Validation DT" from_port="performance 1" to_op="Remember DTResults" to_port="store"/>
<connect from_op="Validation k-NN" from_port="model" to_port="base model 2"/>
<connect from_op="Validation k-NN" from_port="performance 1" to_op="Remember k-NN results" to_port="store"/>
<connect from_op="Validation Naive Bayes" from_port="performance 1" to_op="Remember Bayes results" to_port="store"/>
<portSpacing port="source_training set 1" spacing="0"/>
<portSpacing port="source_training set 2" spacing="0"/>
<portSpacing port="source_training set 3" spacing="0"/>
<portSpacing port="source_training set 4" spacing="0"/>
<portSpacing port="sink_base model 1" spacing="0"/>
<portSpacing port="sink_base model 2" spacing="0"/>
<portSpacing port="sink_base model 3" spacing="0"/>
</process>
</operator>
<connect from_port="training set" to_op="Vote" to_port="training set"/>
<connect from_op="Vote" from_port="model" to_port="model"/>
<portSpacing port="source_training set" spacing="0"/>
<portSpacing port="sink_model" spacing="0"/>
<portSpacing port="sink_through 1" spacing="0"/>
<description align="left" color="green" colored="true" height="80" resized="true" width="248" x="37" y="158">In the training phase, a model is built on the current training data set. (90 % of data by default, 10 times)</description>
</process>
<process expanded="true">
<operator activated="true" class="apply_model" compatibility="9.5.001" expanded="true" height="82" name="Apply Model (4)" width="90" x="45" y="34">
<list key="application_parameters"/>
<parameter key="create_view" value="false"/>
</operator>
<operator activated="true" class="performance" compatibility="9.5.001" expanded="true" height="82" name="Performance (Vote)" width="90" x="179" y="34">
<parameter key="use_example_weights" value="true"/>
</operator>
<connect from_port="model" to_op="Apply Model (4)" to_port="model"/>
<connect from_port="test set" to_op="Apply Model (4)" to_port="unlabelled data"/>
<connect from_op="Apply Model (4)" from_port="labelled data" to_op="Performance (Vote)" to_port="labelled data"/>
<connect from_op="Performance (Vote)" from_port="performance" to_port="performance 1"/>
<connect from_op="Performance (Vote)" from_port="example set" to_port="test set results"/>
<portSpacing port="source_model" spacing="0"/>
<portSpacing port="source_test set" spacing="0"/>
<portSpacing port="source_through 1" spacing="0"/>
<portSpacing port="sink_test set results" spacing="0"/>
<portSpacing port="sink_performance 1" spacing="0"/>
<portSpacing port="sink_performance 2" spacing="0"/>
<description align="left" color="blue" colored="true" height="103" resized="true" width="315" x="38" y="158">The model created in the Training step is applied to the current test set (10 %).<br/>The performance is evaluated and sent to the operator results.</description>
</process>
</operator>
<operator activated="true" class="recall" compatibility="9.5.001" expanded="true" height="68" name="Recall Decision Tree" width="90" x="581" y="136">
<parameter key="name" value="DTResults"/>
<parameter key="io_object" value="PerformanceVector"/>
<parameter key="remove_from_store" value="true"/>
</operator>
<operator activated="true" class="recall" compatibility="9.5.001" expanded="true" height="68" name="Recall k-NN" width="90" x="648" y="238">
<parameter key="name" value="kNNResults"/>
<parameter key="io_object" value="PerformanceVector"/>
<parameter key="remove_from_store" value="true"/>
</operator>
<operator activated="true" class="recall" compatibility="9.5.001" expanded="true" height="68" name="Recall Naive Bayes" width="90" x="715" y="340">
<parameter key="name" value="NaiveBayesResults"/>
<parameter key="io_object" value="PerformanceVector"/>
<parameter key="remove_from_store" value="true"/>
</operator>
<connect from_op="Retrieve Iris" from_port="output" to_op="Generate Weight (Stratification)" to_port="example set input"/>
<connect from_op="Generate Weight (Stratification)" from_port="example set output" to_op="Validation" to_port="example set"/>
<connect from_op="Validation" from_port="model" to_port="result 1"/>
<connect from_op="Validation" from_port="performance 1" to_port="result 2"/>
<connect from_op="Recall Decision Tree" from_port="result" to_port="result 3"/>
<connect from_op="Recall k-NN" from_port="result" to_port="result 4"/>
<connect from_op="Recall Naive Bayes" from_port="result" to_port="result 5"/>
<portSpacing port="source_input 1" spacing="0"/>
<portSpacing port="sink_result 1" spacing="0"/>
<portSpacing port="sink_result 2" spacing="0"/>
<portSpacing port="sink_result 3" spacing="42"/>
<portSpacing port="sink_result 4" spacing="105"/>
<portSpacing port="sink_result 5" spacing="63"/>
<portSpacing port="sink_result 6" spacing="0"/>
</process>
</operator>
</process>
The warning is somewhat unexpected, as k-NN is used to explain example weighting on the Academy:
https://academy.rapidminer.com/learn/video/sampling-weighting-intro
@jmergler @Knut-RM
The impact is: if you use example weighting to make some examples more important (e.g. the minority class, or customers with high revenue), you expect the models to make more effort to predict these examples correctly. If the model is good and catches these examples correctly anyway, you won't see a big impact.
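To make that concrete, here is a minimal sketch in plain Python (illustrative values only, not RapidMiner code). Weighted performance counts each example by its weight, so the score only moves noticeably when an up-weighted example is misclassified. Also note that the Performance operators in the process above have "use example weights" enabled, so the evaluation itself is weighted even if a learner such as k-NN ignores the weights during training.

# Minimal sketch in plain Python (illustrative values, not RapidMiner code):
# weighted accuracy counts each example by its weight, so an up-weighted
# example only changes the score a lot when the model gets it wrong.

labels      = ["A", "A", "B", "B", "B"]
predictions = ["A", "A", "B", "B", "A"]    # the last "B" is misclassified
weights     = [1.0, 1.0, 1.0, 1.0, 3.0]    # that misclassified example is up-weighted

plain_accuracy = sum(l == p for l, p in zip(labels, predictions)) / len(labels)

weighted_correct  = sum(w for l, p, w in zip(labels, predictions, weights) if l == p)
weighted_accuracy = weighted_correct / sum(weights)

print(plain_accuracy)      # 0.8
print(weighted_accuracy)   # 4/7, roughly 0.57 -- the weighted score drops much more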
@mbs: No, Group Models won't help here. The output of the first model is a model; if you feed that into a second model, it will complain, because it expects an Example Set as its input.
Regards,
Balázs
Answers
Hello
You can use the Group Models operator. It groups the given models into a single combined model; when the combined model is applied, it is equivalent to applying the original models in their respective order (see the sketch after the link below).
You can also use this link:
https://community.rapidminer.com/discussion/comment/61377#Comment_61377
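Conceptually, "grouping" just means that the sub-models are kept in order and applied one after the other, each consuming the output of the previous one. A rough sketch in plain Python (conceptual only, not RapidMiner's actual API):

# Conceptual sketch only (plain Python, not RapidMiner's actual API):
# a grouped model keeps its sub-models in order and applies them in sequence,
# e.g. a preprocessing model followed by a prediction model.

class GroupedModel:
    def __init__(self, models):
        self.models = list(models)        # kept in their original order

    def apply(self, example_set):
        result = example_set
        for model in self.models:         # each model consumes the previous model's output
            result = model.apply(result)
        return result

As the accepted answer above notes, this is also why it does not fit the ensemble case here: each learner needs the Example Set itself as input, not another model's output.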
All the best
mbs
Hello
Thank you for your help.
The problem was with the documentation:
https://docs.rapidminer.com/latest/studio/operators/rapidminer-studio-operator-reference.pdf
@sgenzer
Please look at the Group Models part in the PDF.
Regards
mbs