Feature Request: Model Simulator for Grouped Models

christos_karras Member Posts: 50 Guru
edited January 2020 in Product Ideas
I have a "Grouped Model" that groups different model pre-processing steps, for example Normalization, Imputation, etc. I tried to use this model with the Model Simulator, but I get this error:
Process failed: Wrong input of type 'Grouped Model' at port 'model'. Expected type 'Model'. 

As a workaround, I could re-apply all the pre-processing steps myself and pass only the actual model to the Model Simulator. However, this makes experimenting with different pre-processing steps (enabling or disabling normalization, imputation, etc.) harder, because I have to duplicate the steps to make sure they are applied correctly for the Model Simulator.

Would it be feasible for a future version to support Grouped Models in the Model Simulator? Is there a technical reason why they aren't supported, or is it just a missing feature?

Scheduled for Release · Last Updated

MW-232 & IC-1764

Comments

  • MartinLiebig Administrator, Moderator, Employee, RapidMiner Certified Analyst, RapidMiner Certified Expert, University Professor Posts: 3,503 RM Data Scientist

    We are on it. Give us a bit.

    Best,
    Martin

    CC: @IngoRM who may reveal more

    - Sr. Director Data Solutions, Altair RapidMiner -
    Dortmund, Germany
  • IngoRM Administrator, Moderator, Employee, RapidMiner Certified Analyst, RapidMiner Certified Expert, Community Manager, RMResearcher, Member, University Professor Posts: 1,751 RM Founder
    Now I can reveal a bit more :-)
    We have just released a new beta version of RM 9.6, and you can try this yourself: https://static.rapidminer.com/rnd/html/rapidminer-9.6-preview.html
    Cheers,
    Ingo
  • christos_karras Member Posts: 50 Guru
    Hi @IngoRM, great, so I guess I had good timing to ask that question :) However, I just tried the 9.6 beta and I see that there are still cases of grouped models that don't work:

    1. Nested grouped models: I have a first Group Model operator that is used for a sequence of pre-processing operations (normalization, PCA, etc). Then a second Group Model operator adds the prediction model. This results in a nested grouped model (a grouped model that contains another grouped model with the pre-processing steps). For this I have a workaround: I start by ungrouping the pre-processing model to get a collection of models, then I re-group this collection and the prediction model. This results in a flat group of models that works correctly.

    2. Some pre-processing steps also cause the error. For example, if I have a PCA followed by Random Forest, I get an error saying that the last model must be a prediction model. With the same model but without the PCA, I don't get this error.

    I tested this with the Model Simulator and Explain Predictions operators.
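
The flattening workaround from point 1 can be sketched in a few lines of Python. This is only an illustration of the idea, not RapidMiner's API: the `Model` and `GroupedModel` classes and the `flatten` function are hypothetical stand-ins for the ungroup and re-group steps described above.

```python
from dataclasses import dataclass, field
from typing import List, Union


@dataclass
class Model:
    """Hypothetical stand-in for a single (non-grouped) model."""
    name: str


@dataclass
class GroupedModel:
    """Hypothetical stand-in for a grouped model; it may contain other groups."""
    models: List[Union["Model", "GroupedModel"]] = field(default_factory=list)


def flatten(model: Union[Model, GroupedModel]) -> List[Model]:
    """Return the plain models in application order, unwrapping nested groups."""
    if isinstance(model, GroupedModel):
        flat: List[Model] = []
        for inner in model.models:
            flat.extend(flatten(inner))
        return flat
    return [model]


# A pre-processing group nested inside a second group that adds the predictor.
preprocessing = GroupedModel([Model("Normalization"), Model("PCA")])
nested = GroupedModel([preprocessing, Model("Random Forest")])

# Ungroup everything, then re-group once: the result has no nesting left.
flat_group = GroupedModel(flatten(nested))
print([m.name for m in flat_group.models])
```

The order of the inner models is preserved, which matters because the pre-processing steps have to be applied before the prediction model.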
  • IngoRM Administrator, Moderator, Employee, RapidMiner Certified Analyst, RapidMiner Certified Expert, Community Manager, RMResearcher, Member, University Professor Posts: 1,751 RM Founder
    Thanks for the feedback - that is very helpful!  Re 1) we probably will not support nested models, at least not in the next version.  Models are rarely nested and the workaround is relatively obvious (I hope).  But let's monitor and see if more people run into the same issue...
    Re 2) that is indeed an unintended bug - the check in the simulator is very rigorous and only allows preprocessing models which are defined as such, not general models which just behave like a preprocessing model.  We will look into relaxing this check to make this work for all models.
    Thanks again for pointing this out.
    Best,
    Ingo
  • IngoRM Administrator, Moderator, Employee, RapidMiner Certified Analyst, RapidMiner Certified Expert, Community Manager, RMResearcher, Member, University Professor Posts: 1,751 RM Founder
    Just FYI, I have created an internal ticket for the second point (IC-1764).

  • IngoRM Administrator, Moderator, Employee, RapidMiner Certified Analyst, RapidMiner Certified Expert, Community Manager, RMResearcher, Member, University Professor Posts: 1,751 RM Founder
    And another FYI: the issue has been resolved.  Starting with RapidMiner 9.7, you can add any model to the grouped model, not just those which are (technically) preprocessing models.  As a result, you can now also combine, for example, a PCA model with a prediction model.
    Hope this helps,
    Ingo
  • christos_karras Member Posts: 50 Guru

    I tested this change successfully with RapidMiner 9.8 using the Model Simulator and the Explain Predictions operators. However, I tried to deploy this grouped model to model ops, and I still have a similar error:

    Nov 17, 2020 4:48:57 PM com.rapidminer.extension.mdm.PluginInitModelManagement$3 done
    WARNING: Problem during deployment in background thread: java.lang.IllegalArgumentException: Only prediction models or grouped models consisting of preprocessing models and ending with a prediction model are supported for deployments.
    java.util.concurrent.ExecutionException: java.lang.IllegalArgumentException: Only prediction models or grouped models consisting of preprocessing models and ending with a prediction model are supported for deployments.

    Could it simply be that a validation needs to be removed in Model Ops, now that Model Simulator and Explain Predictions support this? Or is there another limitation with Model Ops?
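
To make the question concrete, here is a minimal Python sketch of the two kinds of validation discussed in this thread. It is a guess at the shape of such a check, not the actual Model Ops or Model Simulator code: the `Model` class, the `kind` labels, and both functions are hypothetical, and treating PCA as a model that is not declared as a preprocessing model is an assumption based on the behavior reported above.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Model:
    """Hypothetical model descriptor; `kind` is how the model is declared."""
    name: str
    kind: str  # "preprocessing", "prediction", or "other" (assumed for PCA)


def strict_check(models: List[Model]) -> bool:
    """Strict rule: declared preprocessing models, then one prediction model."""
    if not models:
        return False
    *head, last = models
    return last.kind == "prediction" and all(m.kind == "preprocessing" for m in head)


def relaxed_check(models: List[Model]) -> bool:
    """Relaxed rule: any models may come first; only the last must predict."""
    return bool(models) and models[-1].kind == "prediction"


chain = [
    Model("Normalization", "preprocessing"),
    Model("PCA", "other"),  # assumption: not declared as a preprocessing model
    Model("Random Forest", "prediction"),
]

print(strict_check(chain))   # rejected by the strict rule
print(relaxed_check(chain))  # accepted by the relaxed rule
```

Under these assumptions, the same chain passes a relaxed check but fails a strict one, which would match a grouped model that works in one feature and is rejected at deployment.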

    Thanks
  • IngoRM Administrator, Moderator, Employee, RapidMiner Certified Analyst, RapidMiner Certified Expert, Community Manager, RMResearcher, Member, University Professor Posts: 1,751 RM Founder
    Hi @christos_karras,

    Thanks for letting us know.  I just tried a couple of models, including ones with one or several preprocessing models, with no problem.  Can you share the set of preprocessing models and the chain of models you have been trying?  I am happy to take a deeper look based on that.

    Cheers,
    Ingo
  • christos_karras Member Posts: 50 Guru

    I tried a "complex" example with:
    - Normalization (Z-Transformation) for one subset of attributes
    - Normalization (MinMaxNormalizationModel) for another subset of attributes
    - PCA
    - Final prediction model: Stacking

    However, I get the same error with something simpler, for example PCA or ICA followed by Random Forest.
    I can create an example to reproduce the error later today if needed.

    Thanks
  • christos_karras Member Posts: 50 Guru
    Hi @IngoRM,

    I created a sample process to generate grouped models that will cause this error. The process includes several models in the group to try to identify what works and what doesn't. It's probably more complex than a real grouped model would typically be, but I think it can be a good test case for various features that use grouped models.

    This process will generate 2 files that can then be used to deploy a custom model to model ops:
    //Local Repository/ComplexGroupedModelData
    //Local Repository/ComplexGroupedModel

    The grouped model includes the following steps, and most of them work fine:

    Only PCA and KMeans Clustering (used to generate a "cluster" attribute as a pre-processing step) cause the grouped model to fail to deploy to model ops, even though the same grouped model works with Explain Predictions and Model Simulator (which the process also tests).

    To easily allow testing that removing PCA and KMeans Clustering resolves the issue, and adding either one of them causes the error, the process adds these two pre-processing models conditionally, based on two macros defined in the process context: IncludePCA and IncludeKMeans.

    To reproduce the error:
    * Set either IncludePCA or IncludeKMeans to 1, run the process, then deploy the resulting model to model ops

    To create a model that can be deployed to model ops without error:
    * Set both IncludePCA and IncludeKMeans to 0, run the process, then deploy the resulting model to model ops
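
If you prefer to switch the two macros outside of RapidMiner Studio, the process XML can be edited with a small script. The following is a minimal sketch using Python's standard `xml.etree.ElementTree`; the `set_macro` and `get_macros` helpers are hypothetical names, and the embedded XML is only the `<context>` part of the process, shortened for illustration.

```python
import xml.etree.ElementTree as ET

# Shortened excerpt of the process XML: only the macro definitions are kept.
PROCESS_XML = """<?xml version="1.0" encoding="UTF-8"?><process version="9.8.000">
<context>
<input/>
<output/>
<macros>
<macro>
<key>IncludePCA</key>
<value>1</value>
</macro>
<macro>
<key>IncludeKMeans</key>
<value>1</value>
</macro>
</macros>
</context>
</process>"""


def set_macro(root: ET.Element, key: str, value: str) -> None:
    """Set the value of the process-context macro named `key`."""
    for macro in root.findall("./context/macros/macro"):
        if macro.findtext("key") == key:
            macro.find("value").text = value
            return
    raise KeyError(f"macro not found: {key}")


def get_macros(root: ET.Element) -> dict:
    """Return all process-context macros as a {key: value} dictionary."""
    return {
        macro.findtext("key"): macro.findtext("value")
        for macro in root.findall("./context/macros/macro")
    }


# Parse from bytes so the encoding declaration in the XML header is honored.
root = ET.fromstring(PROCESS_XML.encode("utf-8"))

# Disable both optional steps to get the deployable variant of the model.
set_macro(root, "IncludePCA", "0")
set_macro(root, "IncludeKMeans", "0")
print(get_macros(root))
```

The values are kept as strings because the branch conditions in the process compare the macro against the string "1".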



    <?xml version="1.0" encoding="UTF-8"?><process version="9.8.000">
    <context>
    <input/>
    <output/>
    <macros>
    <macro>
    <key>IncludePCA</key>
    <value>1</value>
    </macro>
    <macro>
    <key>IncludeKMeans</key>
    <value>1</value>
    </macro>
    </macros>
    </context>
    <operator activated="true" class="process" compatibility="9.8.000" expanded="true" name="Process">
    <parameter key="logverbosity" value="init"/>
    <parameter key="random_seed" value="2001"/>
    <parameter key="send_mail" value="never"/>
    <parameter key="notification_email" value=""/>
    <parameter key="process_duration_for_mail" value="30"/>
    <parameter key="encoding" value="SYSTEM"/>
    <process expanded="true">
    <operator activated="true" class="generate_data" compatibility="9.8.000" expanded="true" height="68" name="Generate Data" width="90" x="313" y="187">
    <parameter key="target_function" value="driller oscillation timeseries"/>
    <parameter key="number_examples" value="100"/>
    <parameter key="number_of_attributes" value="20"/>
    <parameter key="attributes_lower_bound" value="-10.0"/>
    <parameter key="attributes_upper_bound" value="10.0"/>
    <parameter key="gaussian_standard_deviation" value="10.0"/>
    <parameter key="largest_radius" value="10.0"/>
    <parameter key="use_local_random_seed" value="false"/>
    <parameter key="local_random_seed" value="1992"/>
    <parameter key="datamanagement" value="double_array"/>
    <parameter key="data_management" value="auto"/>
    </operator>
    <operator activated="true" class="set_role" compatibility="9.8.000" expanded="true" height="82" name="Set Role label (2)" width="90" x="447" y="187">
    <parameter key="attribute_name" value="att20"/>
    <parameter key="target_role" value="label"/>
    <list key="set_additional_roles"/>
    </operator>
    <operator activated="true" class="generate_id" compatibility="9.8.000" expanded="true" height="82" name="Generate ID" width="90" x="581" y="187">
    <parameter key="create_nominal_ids" value="false"/>
    <parameter key="offset" value="0"/>
    </operator>
    <operator activated="true" class="store" compatibility="9.8.000" expanded="true" height="68" name="Store ComplexGroupedModelData" width="90" x="715" y="187">
    <parameter key="repository_entry" value="ComplexGroupedModelData"/>
    </operator>
    <operator activated="true" class="multiply" compatibility="9.8.000" expanded="true" height="124" name="Multiply All Data" width="90" x="849" y="187"/>
    <operator activated="true" class="subprocess" compatibility="9.8.000" expanded="true" height="103" name="Preprocessing models" width="90" x="1050" y="289">
    <process expanded="true">
    <operator activated="true" class="subprocess" compatibility="9.8.000" expanded="true" height="103" name="Normalization models" width="90" x="45" y="34">
    <process expanded="true">
    <operator activated="true" class="normalize" compatibility="9.8.000" expanded="true" height="103" name="Normalize (Z-transformation)" width="90" x="45" y="187">
    <parameter key="return_preprocessing_model" value="false"/>
    <parameter key="create_view" value="false"/>
    <parameter key="attribute_filter_type" value="subset"/>
    <parameter key="attribute" value=""/>
    <parameter key="attributes" value="att1|att3|att5|att7|att9|att11|att13|att15"/>
    <parameter key="use_except_expression" value="false"/>
    <parameter key="value_type" value="numeric"/>
    <parameter key="use_value_type_exception" value="false"/>
    <parameter key="except_value_type" value="real"/>
    <parameter key="block_type" value="value_series"/>
    <parameter key="use_block_type_exception" value="false"/>
    <parameter key="except_block_type" value="value_series_end"/>
    <parameter key="invert_selection" value="false"/>
    <parameter key="include_special_attributes" value="false"/>
    <parameter key="method" value="Z-transformation"/>
    <parameter key="min" value="0.0"/>
    <parameter key="max" value="1.0"/>
    <parameter key="allow_negative_values" value="false"/>
    </operator>
    <operator activated="true" class="normalize" compatibility="9.8.000" expanded="true" height="103" name="Normalize (MinMax)" width="90" x="179" y="34">
    <parameter key="return_preprocessing_model" value="false"/>
    <parameter key="create_view" value="false"/>
    <parameter key="attribute_filter_type" value="subset"/>
    <parameter key="attribute" value=""/>
    <parameter key="attributes" value="att1|att3|att5|att7|att9|att11|att13|att15"/>
    <parameter key="use_except_expression" value="false"/>
    <parameter key="value_type" value="numeric"/>
    <parameter key="use_value_type_exception" value="false"/>
    <parameter key="except_value_type" value="real"/>
    <parameter key="block_type" value="value_series"/>
    <parameter key="use_block_type_exception" value="false"/>
    <parameter key="except_block_type" value="value_series_end"/>
    <parameter key="invert_selection" value="false"/>
    <parameter key="include_special_attributes" value="false"/>
    <parameter key="method" value="range transformation"/>
    <parameter key="min" value="-1.0"/>
    <parameter key="max" value="1.0"/>
    <parameter key="allow_negative_values" value="false"/>
    </operator>
    <operator activated="true" class="normalize" compatibility="9.8.000" expanded="true" height="103" name="Normalize (IQR)" width="90" x="447" y="34">
    <parameter key="return_preprocessing_model" value="false"/>
    <parameter key="create_view" value="false"/>
    <parameter key="attribute_filter_type" value="subset"/>
    <parameter key="attribute" value=""/>
    <parameter key="attributes" value="att17|att18|att19"/>
    <parameter key="use_except_expression" value="false"/>
    <parameter key="value_type" value="numeric"/>
    <parameter key="use_value_type_exception" value="false"/>
    <parameter key="except_value_type" value="real"/>
    <parameter key="block_type" value="value_series"/>
    <parameter key="use_block_type_exception" value="false"/>
    <parameter key="except_block_type" value="value_series_end"/>
    <parameter key="invert_selection" value="false"/>
    <parameter key="include_special_attributes" value="false"/>
    <parameter key="method" value="interquartile range"/>
    <parameter key="min" value="0.0"/>
    <parameter key="max" value="1.0"/>
    <parameter key="allow_negative_values" value="false"/>
    </operator>
    <operator activated="true" class="collect" compatibility="9.8.000" expanded="true" height="124" name="Collect (2)" width="90" x="581" y="136">
    <parameter key="unfold" value="false"/>
    </operator>
    <connect from_port="in 1" to_op="Normalize (Z-transformation)" to_port="example set input"/>
    <connect from_op="Normalize (Z-transformation)" from_port="example set output" to_op="Normalize (MinMax)" to_port="example set input"/>
    <connect from_op="Normalize (Z-transformation)" from_port="preprocessing model" to_op="Collect (2)" to_port="input 2"/>
    <connect from_op="Normalize (MinMax)" from_port="example set output" to_op="Normalize (IQR)" to_port="example set input"/>
    <connect from_op="Normalize (MinMax)" from_port="preprocessing model" to_op="Collect (2)" to_port="input 1"/>
    <connect from_op="Normalize (IQR)" from_port="example set output" to_port="out 1"/>
    <connect from_op="Normalize (IQR)" from_port="preprocessing model" to_op="Collect (2)" to_port="input 3"/>
    <connect from_op="Collect (2)" from_port="collection" to_port="out 2"/>
    <portSpacing port="source_in 1" spacing="0"/>
    <portSpacing port="source_in 2" spacing="0"/>
    <portSpacing port="sink_out 1" spacing="0"/>
    <portSpacing port="sink_out 2" spacing="0"/>
    <portSpacing port="sink_out 3" spacing="0"/>
    </process>
    </operator>
    <operator activated="true" class="subprocess" compatibility="9.8.000" expanded="true" height="103" name="Discretize and dummy encode" width="90" x="179" y="187">
    <process expanded="true">
    <operator activated="true" class="discretize_by_bins" compatibility="9.8.000" expanded="true" height="103" name="Discretize att4" width="90" x="45" y="85">
    <parameter key="return_preprocessing_model" value="false"/>
    <parameter key="create_view" value="false"/>
    <parameter key="attribute_filter_type" value="subset"/>
    <parameter key="attribute" value=""/>
    <parameter key="attributes" value="att4"/>
    <parameter key="use_except_expression" value="false"/>
    <parameter key="value_type" value="numeric"/>
    <parameter key="use_value_type_exception" value="false"/>
    <parameter key="except_value_type" value="real"/>
    <parameter key="block_type" value="value_series"/>
    <parameter key="use_block_type_exception" value="false"/>
    <parameter key="except_block_type" value="value_series_end"/>
    <parameter key="invert_selection" value="false"/>
    <parameter key="include_special_attributes" value="false"/>
    <parameter key="number_of_bins" value="2"/>
    <parameter key="define_boundaries" value="false"/>
    <parameter key="range_name_type" value="long"/>
    <parameter key="automatic_number_of_digits" value="true"/>
    <parameter key="number_of_digits" value="3"/>
    </operator>
    <operator activated="true" class="nominal_to_numerical" compatibility="9.8.000" expanded="true" height="103" name="Nominal to Numerical (2)" width="90" x="246" y="34">
    <parameter key="return_preprocessing_model" value="false"/>
    <parameter key="create_view" value="false"/>
    <parameter key="attribute_filter_type" value="single"/>
    <parameter key="attribute" value="att4"/>
    <parameter key="attributes" value=""/>
    <parameter key="use_except_expression" value="false"/>
    <parameter key="value_type" value="nominal"/>
    <parameter key="use_value_type_exception" value="false"/>
    <parameter key="except_value_type" value="file_path"/>
    <parameter key="block_type" value="single_value"/>
    <parameter key="use_block_type_exception" value="false"/>
    <parameter key="except_block_type" value="single_value"/>
    <parameter key="invert_selection" value="false"/>
    <parameter key="include_special_attributes" value="false"/>
    <parameter key="coding_type" value="dummy coding"/>
    <parameter key="use_comparison_groups" value="false"/>
    <list key="comparison_groups"/>
    <parameter key="unexpected_value_handling" value="all 0 and warning"/>
    <parameter key="use_underscore_in_name" value="false"/>
    </operator>
    <operator activated="true" class="collect" compatibility="9.8.000" expanded="true" height="103" name="Collect (4)" width="90" x="313" y="187">
    <parameter key="unfold" value="false"/>
    </operator>
    <connect from_port="in 1" to_op="Discretize att4" to_port="example set input"/>
    <connect from_op="Discretize att4" from_port="example set output" to_op="Nominal to Numerical (2)" to_port="example set input"/>
    <connect from_op="Discretize att4" from_port="preprocessing model" to_op="Collect (4)" to_port="input 1"/>
    <connect from_op="Nominal to Numerical (2)" from_port="example set output" to_port="out 1"/>
    <connect from_op="Nominal to Numerical (2)" from_port="preprocessing model" to_op="Collect (4)" to_port="input 2"/>
    <connect from_op="Collect (4)" from_port="collection" to_port="out 2"/>
    <portSpacing port="source_in 1" spacing="0"/>
    <portSpacing port="source_in 2" spacing="0"/>
    <portSpacing port="sink_out 1" spacing="0"/>
    <portSpacing port="sink_out 2" spacing="0"/>
    <portSpacing port="sink_out 3" spacing="0"/>
    </process>
    </operator>
    <operator activated="true" class="branch" compatibility="9.8.000" expanded="true" height="103" name="if IncludePCA" width="90" x="313" y="340">
    <parameter key="condition_type" value="expression"/>
    <parameter key="expression" value="%{IncludePCA} == &quot;1&quot;"/>
    <parameter key="io_object" value="ANOVAMatrix"/>
    <parameter key="return_inner_output" value="true"/>
    <process expanded="true">
    <operator activated="true" class="principal_component_analysis" compatibility="9.8.000" expanded="true" height="103" name="PCA (2)" width="90" x="112" y="34">
    <parameter key="dimensionality_reduction" value="keep variance"/>
    <parameter key="variance_threshold" value="0.95"/>
    <parameter key="number_of_components" value="1"/>
    </operator>
    <connect from_port="condition" to_op="PCA (2)" to_port="example set input"/>
    <connect from_op="PCA (2)" from_port="example set output" to_port="input 1"/>
    <connect from_op="PCA (2)" from_port="preprocessing model" to_port="input 2"/>
    <portSpacing port="source_condition" spacing="0"/>
    <portSpacing port="source_input 1" spacing="0"/>
    <portSpacing port="sink_input 1" spacing="0"/>
    <portSpacing port="sink_input 2" spacing="0"/>
    <portSpacing port="sink_input 3" spacing="0"/>
    </process>
    <process expanded="true">
    <connect from_port="condition" to_port="input 1"/>
    <portSpacing port="source_condition" spacing="0"/>
    <portSpacing port="source_input 1" spacing="0"/>
    <portSpacing port="sink_input 1" spacing="0"/>
    <portSpacing port="sink_input 2" spacing="0"/>
    <portSpacing port="sink_input 3" spacing="0"/>
    </process>
    </operator>
    <operator activated="true" class="branch" compatibility="9.8.000" expanded="true" height="103" name="if IncludeKMeans" width="90" x="447" y="442">
    <parameter key="condition_type" value="expression"/>
    <parameter key="expression" value="%{IncludeKMeans} == &quot;1&quot;"/>
    <parameter key="io_object" value="ANOVAMatrix"/>
    <parameter key="return_inner_output" value="true"/>
    <process expanded="true">
    <operator activated="true" class="set_role" compatibility="9.8.000" expanded="true" height="82" name="Set Role (2)" width="90" x="45" y="136">
    <parameter key="attribute_name" value="label"/>
    <parameter key="target_role" value="label2"/>
    <list key="set_additional_roles"/>
    </operator>
    <operator activated="true" class="concurrency:k_means" compatibility="9.8.000" expanded="true" height="82" name="Clustering (2)" width="90" x="179" y="136">
    <parameter key="add_cluster_attribute" value="true"/>
    <parameter key="add_as_label" value="false"/>
    <parameter key="remove_unlabeled" value="false"/>
    <parameter key="k" value="5"/>
    <parameter key="max_runs" value="10"/>
    <parameter key="determine_good_start_values" value="true"/>
    <parameter key="measure_types" value="BregmanDivergences"/>
    <parameter key="mixed_measure" value="MixedEuclideanDistance"/>
    <parameter key="nominal_measure" value="NominalDistance"/>
    <parameter key="numerical_measure" value="EuclideanDistance"/>
    <parameter key="divergence" value="SquaredEuclideanDistance"/>
    <parameter key="kernel_type" value="radial"/>
    <parameter key="kernel_gamma" value="1.0"/>
    <parameter key="kernel_sigma1" value="1.0"/>
    <parameter key="kernel_sigma2" value="0.0"/>
    <parameter key="kernel_sigma3" value="2.0"/>
    <parameter key="kernel_degree" value="3.0"/>
    <parameter key="kernel_shift" value="1.0"/>
    <parameter key="kernel_a" value="1.0"/>
    <parameter key="kernel_b" value="0.0"/>
    <parameter key="max_optimization_steps" value="100"/>
    <parameter key="use_local_random_seed" value="false"/>
    <parameter key="local_random_seed" value="1992"/>
    </operator>
    <operator activated="true" class="set_role" compatibility="9.8.000" expanded="true" height="82" name="Set Role (3)" width="90" x="313" y="34">
    <parameter key="attribute_name" value="cluster"/>
    <parameter key="target_role" value="regular"/>
    <list key="set_additional_roles"/>
    </operator>
    <operator activated="true" class="nominal_to_numerical" compatibility="9.8.000" expanded="true" height="103" name="Nominal to Numerical" width="90" x="581" y="34">
    <parameter key="return_preprocessing_model" value="false"/>
    <parameter key="create_view" value="false"/>
    <parameter key="attribute_filter_type" value="single"/>
    <parameter key="attribute" value="cluster"/>
    <parameter key="attributes" value=""/>
    <parameter key="use_except_expression" value="false"/>
    <parameter key="value_type" value="nominal"/>
    <parameter key="use_value_type_exception" value="false"/>
    <parameter key="except_value_type" value="file_path"/>
    <parameter key="block_type" value="single_value"/>
    <parameter key="use_block_type_exception" value="false"/>
    <parameter key="except_block_type" value="single_value"/>
    <parameter key="invert_selection" value="false"/>
    <parameter key="include_special_attributes" value="true"/>
    <parameter key="coding_type" value="dummy coding"/>
    <parameter key="use_comparison_groups" value="false"/>
    <list key="comparison_groups"/>
    <parameter key="unexpected_value_handling" value="all 0 and warning"/>
    <parameter key="use_underscore_in_name" value="false"/>
    </operator>
    <operator activated="true" class="set_role" compatibility="9.8.000" expanded="true" height="82" name="Set Role" width="90" x="782" y="34">
    <parameter key="attribute_name" value="att20"/>
    <parameter key="target_role" value="label"/>
    <list key="set_additional_roles"/>
    </operator>
    <operator activated="true" class="collect" compatibility="9.8.000" expanded="true" height="103" name="Collect" width="90" x="849" y="136">
    <parameter key="unfold" value="false"/>
    </operator>
    <connect from_port="condition" to_op="Set Role (2)" to_port="example set input"/>
    <connect from_op="Set Role (2)" from_port="example set output" to_op="Clustering (2)" to_port="example set"/>
    <connect from_op="Clustering (2)" from_port="cluster model" to_op="Collect" to_port="input 1"/>
    <connect from_op="Clustering (2)" from_port="clustered set" to_op="Set Role (3)" to_port="example set input"/>
    <connect from_op="Set Role (3)" from_port="example set output" to_op="Nominal to Numerical" to_port="example set input"/>
    <connect from_op="Nominal to Numerical" from_port="example set output" to_op="Set Role" to_port="example set input"/>
    <connect from_op="Nominal to Numerical" from_port="preprocessing model" to_op="Collect" to_port="input 2"/>
    <connect from_op="Set Role" from_port="example set output" to_port="input 1"/>
    <connect from_op="Collect" from_port="collection" to_port="input 2"/>
    <portSpacing port="source_condition" spacing="0"/>
    <portSpacing port="source_input 1" spacing="0"/>
    <portSpacing port="sink_input 1" spacing="0"/>
    <portSpacing port="sink_input 2" spacing="0"/>
    <portSpacing port="sink_input 3" spacing="0"/>
    </process>
    <process expanded="true">
    <connect from_port="condition" to_port="input 1"/>
    <portSpacing port="source_condition" spacing="0"/>
    <portSpacing port="source_input 1" spacing="0"/>
    <portSpacing port="sink_input 1" spacing="0"/>
    <portSpacing port="sink_input 2" spacing="0"/>
    <portSpacing port="sink_input 3" spacing="0"/>
    </process>
    </operator>
    <operator activated="true" class="collect" compatibility="9.8.000" expanded="true" height="145" name="Collect Preprocessing Models" width="90" x="782" y="136">
    <parameter key="unfold" value="true"/>
    </operator>
    <connect from_port="in 1" to_op="Normalization models" to_port="in 1"/>
    <connect from_op="Normalization models" from_port="out 1" to_op="Discretize and dummy encode" to_port="in 1"/>
    <connect from_op="Normalization models" from_port="out 2" to_op="Collect Preprocessing Models" to_port="input 1"/>
    <connect from_op="Discretize and dummy encode" from_port="out 1" to_op="if IncludePCA" to_port="condition"/>
    <connect from_op="Discretize and dummy encode" from_port="out 2" to_op="Collect Preprocessing Models" to_port="input 2"/>
    <connect from_op="if IncludePCA" from_port="input 1" to_op="if IncludeKMeans" to_port="condition"/>
    <connect from_op="if IncludePCA" from_port="input 2" to_op="Collect Preprocessing Models" to_port="input 3"/>
    <connect from_op="if IncludeKMeans" from_port="input 1" to_port="out 1"/>
    <connect from_op="if IncludeKMeans" from_port="input 2" to_op="Collect Preprocessing Models" to_port="input 4"/>
    <connect from_op="Collect Preprocessing Models" from_port="collection" to_port="out 2"/>
    <portSpacing port="source_in 1" spacing="0"/>
    <portSpacing port="source_in 2" spacing="0"/>
    <portSpacing port="sink_out 1" spacing="0"/>
    <portSpacing port="sink_out 2" spacing="0"/>
    <portSpacing port="sink_out 3" spacing="0"/>
    </process>
    </operator>
    <operator activated="true" class="set_role" compatibility="9.8.000" expanded="true" height="82" name="Set Role label" width="90" x="1318" y="136">
    <parameter key="attribute_name" value="att20"/>
    <parameter key="target_role" value="label"/>
    <list key="set_additional_roles"/>
    </operator>
    <operator activated="true" class="stacking" compatibility="9.8.000" expanded="true" height="68" name="Stacking" width="90" x="1452" y="136">
    <parameter key="keep_all_attributes" value="true"/>
    <parameter key="keep_confidences" value="false"/>
    <process expanded="true">
    <operator activated="true" class="concurrency:parallel_random_forest" compatibility="9.8.000" expanded="true" height="103" name="Random Forest" width="90" x="179" y="34">
    <parameter key="number_of_trees" value="100"/>
    <parameter key="criterion" value="least_square"/>
    <parameter key="maximal_depth" value="10"/>
    <parameter key="apply_pruning" value="false"/>
    <parameter key="confidence" value="0.1"/>
    <parameter key="apply_prepruning" value="false"/>
    <parameter key="minimal_gain" value="0.01"/>
    <parameter key="minimal_leaf_size" value="2"/>
    <parameter key="minimal_size_for_split" value="4"/>
    <parameter key="number_of_prepruning_alternatives" value="3"/>
    <parameter key="random_splits" value="false"/>
    <parameter key="guess_subset_ratio" value="true"/>
    <parameter key="subset_ratio" value="0.2"/>
    <parameter key="voting_strategy" value="confidence vote"/>
    <parameter key="use_local_random_seed" value="false"/>
    <parameter key="local_random_seed" value="1992"/>
    <parameter key="enable_parallel_execution" value="true"/>
    </operator>
    <operator activated="true" class="h2o:generalized_linear_model" compatibility="9.8.000" expanded="true" height="124" name="Generalized Linear Model" width="90" x="179" y="187">
    <parameter key="family" value="AUTO"/>
    <parameter key="link" value="family_default"/>
    <parameter key="solver" value="AUTO"/>
    <parameter key="reproducible" value="false"/>
    <parameter key="maximum_number_of_threads" value="4"/>
    <parameter key="use_regularization" value="true"/>
    <parameter key="lambda_search" value="false"/>
    <parameter key="number_of_lambdas" value="0"/>
    <parameter key="lambda_min_ratio" value="0.0"/>
    <parameter key="early_stopping" value="true"/>
    <parameter key="stopping_rounds" value="3"/>
    <parameter key="stopping_tolerance" value="0.001"/>
    <parameter key="standardize" value="true"/>
    <parameter key="non-negative_coefficients" value="false"/>
    <parameter key="add_intercept" value="true"/>
    <parameter key="compute_p-values" value="false"/>
    <parameter key="remove_collinear_columns" value="false"/>
    <parameter key="missing_values_handling" value="MeanImputation"/>
    <parameter key="max_iterations" value="0"/>
    <parameter key="specify_beta_constraints" value="false"/>
    <list key="beta_constraints"/>
    <parameter key="max_runtime_seconds" value="0"/>
    <list key="expert_parameters"/>
    </operator>
    <connect from_port="training set 1" to_op="Random Forest" to_port="training set"/>
    <connect from_port="training set 2" to_op="Generalized Linear Model" to_port="training set"/>
    <connect from_op="Random Forest" from_port="model" to_port="base model 1"/>
    <connect from_op="Generalized Linear Model" from_port="model" to_port="base model 2"/>
    <portSpacing port="source_training set 1" spacing="0"/>
    <portSpacing port="source_training set 2" spacing="0"/>
    <portSpacing port="source_training set 3" spacing="0"/>
    <portSpacing port="sink_base model 1" spacing="0"/>
    <portSpacing port="sink_base model 2" spacing="0"/>
    <portSpacing port="sink_base model 3" spacing="0"/>
    </process>
    <process expanded="true">
    <operator activated="true" class="h2o:gradient_boosted_trees" compatibility="9.8.000" expanded="true" height="103" name="Gradient Boosted Trees" width="90" x="179" y="34">
    <parameter key="number_of_trees" value="50"/>
    <parameter key="reproducible" value="false"/>
    <parameter key="maximum_number_of_threads" value="4"/>
    <parameter key="use_local_random_seed" value="false"/>
    <parameter key="local_random_seed" value="1992"/>
    <parameter key="maximal_depth" value="5"/>
    <parameter key="min_rows" value="10.0"/>
    <parameter key="min_split_improvement" value="1.0E-5"/>
    <parameter key="number_of_bins" value="20"/>
    <parameter key="learning_rate" value="0.01"/>
    <parameter key="sample_rate" value="1.0"/>
    <parameter key="distribution" value="AUTO"/>
    <parameter key="early_stopping" value="false"/>
    <parameter key="stopping_rounds" value="1"/>
    <parameter key="stopping_metric" value="AUTO"/>
    <parameter key="stopping_tolerance" value="0.001"/>
    <list key="monotone_constraints"/>
    <parameter key="max_runtime_seconds" value="0"/>
    <list key="expert_parameters"/>
    </operator>
    <connect from_port="stacking examples" to_op="Gradient Boosted Trees" to_port="training set"/>
    <connect from_op="Gradient Boosted Trees" from_port="model" to_port="stacking model"/>
    <portSpacing port="source_stacking examples" spacing="0"/>
    <portSpacing port="sink_stacking model" spacing="0"/>
    </process>
    </operator>
    <operator activated="true" class="group_models" compatibility="9.8.000" expanded="true" height="103" name="Group Models" width="90" x="1653" y="238"/>
    <operator activated="true" class="store" compatibility="9.8.000" expanded="true" height="68" name="Store ComplexGroupedModel" width="90" x="1854" y="238">
    <parameter key="repository_entry" value="ComplexGroupedModel"/>
    </operator>
    <operator activated="true" class="multiply" compatibility="9.8.000" expanded="true" height="103" name="Multiply ComplexGroupedModel" width="90" x="1988" y="238"/>
    <operator activated="true" class="subprocess" compatibility="9.8.000" expanded="true" height="103" name="Apply Each Model Separately" width="90" x="2122" y="340">
    <process expanded="true">
    <operator activated="true" class="remember" compatibility="9.8.000" expanded="true" height="68" name="Remember (2)" width="90" x="45" y="85">
    <parameter key="name" value="ApplyModelDataSet"/>
    <parameter key="io_object" value="ExampleSet"/>
    <parameter key="store_which" value="1"/>
    <parameter key="remove_from_process" value="true"/>
    </operator>
    <operator activated="true" class="ungroup_models" compatibility="9.8.000" expanded="true" height="68" name="Ungroup Models" width="90" x="179" y="34"/>
    <operator activated="true" class="loop_collection" compatibility="9.8.000" expanded="true" height="82" name="Loop Collection" width="90" x="313" y="34">
    <parameter key="set_iteration_macro" value="false"/>
    <parameter key="macro_name" value="iteration"/>
    <parameter key="macro_start_value" value="1"/>
    <parameter key="unfold" value="true"/>
    <process expanded="true">
    <operator activated="true" class="recall" compatibility="9.8.000" expanded="true" height="68" name="Recall" width="90" x="112" y="136">
    <parameter key="name" value="ApplyModelDataSet"/>
    <parameter key="io_object" value="ExampleSet"/>
    <parameter key="remove_from_store" value="true"/>
    </operator>
    <operator activated="true" class="apply_model" compatibility="9.8.000" expanded="true" height="82" name="Apply Model" width="90" x="246" y="34">
    <list key="application_parameters"/>
    <parameter key="create_view" value="false"/>
    </operator>
    <operator activated="true" class="remember" compatibility="9.8.000" expanded="true" height="68" name="Remember" width="90" x="514" y="34">
    <parameter key="name" value="ApplyModelDataSet"/>
    <parameter key="io_object" value="ExampleSet"/>
    <parameter key="store_which" value="1"/>
    <parameter key="remove_from_process" value="true"/>
    </operator>
    <connect from_port="single" to_op="Apply Model" to_port="model"/>
    <connect from_op="Recall" from_port="result" to_op="Apply Model" to_port="unlabelled data"/>
    <connect from_op="Apply Model" from_port="labelled data" to_op="Remember" to_port="store"/>
    <connect from_op="Remember" from_port="stored" to_port="output 1"/>
    <portSpacing port="source_single" spacing="0"/>
    <portSpacing port="sink_output 1" spacing="0"/>
    <portSpacing port="sink_output 2" spacing="0"/>
    </process>
    </operator>
    <operator activated="true" class="recall" compatibility="9.8.000" expanded="true" height="68" name="Recall (2)" width="90" x="447" y="136">
    <parameter key="name" value="ApplyModelDataSet"/>
    <parameter key="io_object" value="ExampleSet"/>
    <parameter key="remove_from_store" value="true"/>
    </operator>
    <connect from_port="in 1" to_op="Ungroup Models" to_port="grouped model"/>
    <connect from_port="in 2" to_op="Remember (2)" to_port="store"/>
    <connect from_op="Ungroup Models" from_port="models" to_op="Loop Collection" to_port="collection"/>
    <connect from_op="Loop Collection" from_port="output 1" to_port="out 1"/>
    <connect from_op="Recall (2)" from_port="result" to_port="out 2"/>
    <portSpacing port="source_in 1" spacing="0"/>
    <portSpacing port="source_in 2" spacing="0"/>
    <portSpacing port="source_in 3" spacing="0"/>
    <portSpacing port="sink_out 1" spacing="0"/>
    <portSpacing port="sink_out 2" spacing="0"/>
    <portSpacing port="sink_out 3" spacing="0"/>
    </process>
    <description align="center" color="transparent" colored="false" width="126">To debug the grouped models, we can ungroup them and apply them separately</description>
    </operator>
    <operator activated="true" class="subprocess" compatibility="9.8.000" expanded="true" height="103" name="Test Simulator and Explain Predictions" width="90" x="2122" y="646">
    <process expanded="true">
    <operator activated="true" class="multiply" compatibility="9.8.000" expanded="true" height="103" name="Multiply model" width="90" x="112" y="34"/>
    <operator activated="true" class="split_data" compatibility="9.8.000" expanded="true" height="103" name="Split Data" width="90" x="45" y="187">
    <enumeration key="partitions">
    <parameter key="ratio" value="0.7"/>
    <parameter key="ratio" value="0.3"/>
    </enumeration>
    <parameter key="sampling_type" value="automatic"/>
    <parameter key="use_local_random_seed" value="false"/>
    <parameter key="local_random_seed" value="1992"/>
    </operator>
    <operator activated="true" class="multiply" compatibility="9.8.000" expanded="true" height="103" name="Multiply tes" width="90" x="179" y="493"/>
    <operator activated="true" class="multiply" compatibility="9.8.000" expanded="true" height="103" name="Multiply tra" width="90" x="179" y="340"/>
    <operator activated="true" class="model_simulator:explain_predictions" compatibility="9.8.000" expanded="true" height="124" name="Explain Predictions" width="90" x="380" y="289">
    <parameter key="maximal explaining attributes" value="3"/>
    <parameter key="apply maximum to importances output" value="false"/>
    <parameter key="local sample size" value="500"/>
    <parameter key="only create predictions" value="false"/>
    <parameter key="normalize global weights" value="false"/>
    <parameter key="sort_weights" value="true"/>
    <parameter key="sort_direction" value="descending"/>
    </operator>
    <operator activated="true" class="model_simulator:model_simulator" compatibility="9.8.000" expanded="true" height="103" name="Model Simulator" width="90" x="380" y="34"/>
    <connect from_port="in 1" to_op="Multiply model" to_port="input"/>
    <connect from_port="in 2" to_op="Split Data" to_port="example set"/>
    <connect from_op="Multiply model" from_port="output 1" to_op="Model Simulator" to_port="model"/>
    <connect from_op="Multiply model" from_port="output 2" to_op="Explain Predictions" to_port="model"/>
    <connect from_op="Split Data" from_port="partition 1" to_op="Multiply tra" to_port="input"/>
    <connect from_op="Split Data" from_port="partition 2" to_op="Multiply tes" to_port="input"/>
    <connect from_op="Multiply tes" from_port="output 1" to_op="Model Simulator" to_port="test data"/>
    <connect from_op="Multiply tes" from_port="output 2" to_op="Explain Predictions" to_port="test data"/>
    <connect from_op="Multiply tra" from_port="output 1" to_op="Model Simulator" to_port="training data"/>
    <connect from_op="Multiply tra" from_port="output 2" to_op="Explain Predictions" to_port="training data"/>
    <connect from_op="Explain Predictions" from_port="visualization output" to_port="out 1"/>
    <connect from_op="Model Simulator" from_port="simulator output" to_port="out 2"/>
    <portSpacing port="source_in 1" spacing="0"/>
    <portSpacing port="source_in 2" spacing="0"/>
    <portSpacing port="source_in 3" spacing="0"/>
    <portSpacing port="sink_out 1" spacing="0"/>
    <portSpacing port="sink_out 2" spacing="0"/>
    <portSpacing port="sink_out 3" spacing="0"/>
    </process>
    <description align="center" color="transparent" colored="false" width="126">Test RapidMiner features that did not work with grouped models before v9.7</description>
    </operator>
    <connect from_op="Generate Data" from_port="output" to_op="Set Role label (2)" to_port="example set input"/>
    <connect from_op="Set Role label (2)" from_port="example set output" to_op="Generate ID" to_port="example set input"/>
    <connect from_op="Generate ID" from_port="example set output" to_op="Store ComplexGroupedModelData" to_port="input"/>
    <connect from_op="Store ComplexGroupedModelData" from_port="through" to_op="Multiply All Data" to_port="input"/>
    <connect from_op="Multiply All Data" from_port="output 1" to_op="Preprocessing models" to_port="in 1"/>
    <connect from_op="Multiply All Data" from_port="output 2" to_op="Apply Each Model Separately" to_port="in 2"/>
    <connect from_op="Multiply All Data" from_port="output 3" to_op="Test Simulator and Explain Predictions" to_port="in 2"/>
    <connect from_op="Preprocessing models" from_port="out 1" to_op="Set Role label" to_port="example set input"/>
    <connect from_op="Preprocessing models" from_port="out 2" to_op="Group Models" to_port="models in 1"/>
    <connect from_op="Set Role label" from_port="example set output" to_op="Stacking" to_port="training set"/>
    <connect from_op="Stacking" from_port="model" to_op="Group Models" to_port="models in 2"/>
    <connect from_op="Group Models" from_port="model out" to_op="Store ComplexGroupedModel" to_port="input"/>
    <connect from_op="Store ComplexGroupedModel" from_port="through" to_op="Multiply ComplexGroupedModel" to_port="input"/>
    <connect from_op="Multiply ComplexGroupedModel" from_port="output 1" to_op="Apply Each Model Separately" to_port="in 1"/>
    <connect from_op="Multiply ComplexGroupedModel" from_port="output 2" to_op="Test Simulator and Explain Predictions" to_port="in 1"/>
    <connect from_op="Apply Each Model Separately" from_port="out 1" to_port="result 1"/>
    <connect from_op="Apply Each Model Separately" from_port="out 2" to_port="result 2"/>
    <connect from_op="Test Simulator and Explain Predictions" from_port="out 1" to_port="result 4"/>
    <connect from_op="Test Simulator and Explain Predictions" from_port="out 2" to_port="result 3"/>
    <portSpacing port="source_input 1" spacing="0"/>
    <portSpacing port="sink_result 1" spacing="315"/>
    <portSpacing port="sink_result 2" spacing="0"/>
    <portSpacing port="sink_result 3" spacing="189"/>
    <portSpacing port="sink_result 4" spacing="126"/>
    <portSpacing port="sink_result 5" spacing="0"/>
    </process>
    </operator>
    </process>
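
For readers who don't want to import the process, the "Apply Each Model Separately" subprocess above boils down to: ungroup the grouped model into an ordered list, then apply each model to the output of the previous one, keeping the intermediate data so each step can be inspected. Below is a rough Python sketch of that idea only. It is not RapidMiner code; `ToyModel`, `apply_each_separately` and the three toy steps are hypothetical names made up for illustration, and they merely stand in for the preprocessing models and the Stacking model in the process.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Tuple

Row = Dict[str, Optional[float]]


@dataclass
class ToyModel:
    """Hypothetical stand-in for one model inside a grouped model."""
    name: str
    apply: Callable[[List[Row]], List[Row]]


def apply_each_separately(
    models: List[ToyModel], data: List[Row]
) -> List[Tuple[str, List[Row]]]:
    """Apply the models in order and keep the data produced after each step.

    Mirrors the Remember / Recall loop: the output of one model becomes
    the input of the next one.
    """
    snapshots = []
    current = data
    for model in models:
        current = model.apply(current)
        snapshots.append((model.name, current))
    return snapshots


# Toy preprocessing and prediction steps (assumptions, not RapidMiner operators).
def impute(rows: List[Row]) -> List[Row]:
    # Replace missing values with 0.0 without mutating the input rows.
    return [{k: (0.0 if v is None else v) for k, v in r.items()} for r in rows]


def scale(rows: List[Row]) -> List[Row]:
    # Divide "x" by its largest absolute value (fall back to 1.0 if all zero).
    peak = max(abs(r["x"]) for r in rows) or 1.0
    return [{**r, "x": r["x"] / peak} for r in rows]


def predict(rows: List[Row]) -> List[Row]:
    # Trivial threshold "model" that adds a prediction column.
    return [{**r, "prediction": 1.0 if r["x"] > 0.5 else 0.0} for r in rows]


grouped = [
    ToyModel("impute", impute),
    ToyModel("scale", scale),
    ToyModel("predict", predict),
]
data = [{"x": 4.0}, {"x": None}, {"x": 1.0}]

steps = apply_each_separately(grouped, data)
for name, rows in steps:
    print(name, rows)
```

Looking at the intermediate snapshots is what makes this useful for debugging: if the final predictions look wrong, you can see which step in the group first produced unexpected data.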
