"Loops / Iterations"
I consider loops/iterations one of the most interesting tools in RapidMiner.
I am trying to use ClusterLoop, but something strange happens when I use it.
In this example (see code) a ClusterLoop is set on the cluster attribute DistrictName. The ClusterLoop node contains a simple Replace node, which, according to its description, “creates new attributes from nominal attributes with replaced substrings”. The Replace node is applied to the attribute dataset, which takes the values “train” and “test”.
After switching debug mode on, I start the process and check the results:
- the loop cycles through the different clusters, which is great!
- the values “test” are transformed into “TTTesTTT”, as expected
- the values “train” are transformed into “TTTesTTT” too. Is this a bug?
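To make the expected result concrete, here is a minimal Python sketch (not RapidMiner code) of what the Replace settings used in the process, replace_what = t and replace_by = TTT, should produce. It assumes the pattern is treated as a regular expression and applied independently to each nominal value; the helper name replace_substrings is hypothetical.

```python
import re


def replace_substrings(values, replace_what, replace_by):
    """Apply the same regex replacement to every value, one value at a time."""
    return [re.sub(replace_what, replace_by, value) for value in values]


# Same settings as the Replace operator in the process: "t" -> "TTT"
result = replace_substrings(["test", "train"], "t", "TTT")

for original, replaced in zip(["test", "train"], result):
    print(original, "->", replaced)
# test -> TTTesTTT
# train -> TTTrain
```

Under this assumption “train” should become “TTTrain”, so getting “TTTesTTT” for both values would indeed point to a bug rather than to the parameter settings.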
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<process version="5.0">
<context>
<input/>
<output/>
<macros/>
</context>
<operator activated="true" class="process" expanded="true" name="Process">
<process expanded="true" height="521" width="1016">
<operator activated="true" class="retrieve" expanded="true" height="60" name="TestData2" width="90" x="45" y="165">
<parameter key="repository_entry" value="data2"/>
</operator>
<operator activated="true" class="select_attributes" expanded="true" height="76" name="Select Attributes" width="90" x="179" y="165">
<parameter key="attribute_filter_type" value="subset"/>
<parameter key="attributes" value="Year|House_size_tot|House_size|Basement_size|House_ground|DistrictName|Price_2009|ID|dataset"/>
</operator>
<operator activated="true" class="set_role" expanded="true" height="76" name="Set Role" width="90" x="313" y="165">
<parameter key="name" value="DistrictName"/>
<parameter key="target_role" value="cluster"/>
</operator>
<operator activated="true" class="loop_clusters" expanded="true" height="76" name="Loop Clusters" width="90" x="447" y="165">
<process expanded="true" height="502" width="1059">
<operator activated="true" class="replace" expanded="true" height="76" name="Replace" width="90" x="447" y="30">
<parameter key="attribute_filter_type" value="single"/>
<parameter key="attribute" value="dataset"/>
<parameter key="replace_what" value="t"/>
<parameter key="replace_by" value="TTT"/>
</operator>
<connect from_port="cluster subset" to_op="Replace" to_port="example set input"/>
<connect from_op="Replace" from_port="example set output" to_port="out 1"/>
<portSpacing port="source_cluster subset" spacing="0"/>
<portSpacing port="source_in 1" spacing="0"/>
<portSpacing port="sink_out 1" spacing="0"/>
<portSpacing port="sink_out 2" spacing="0"/>
</process>
</operator>
<connect from_op="TestData2" from_port="output" to_op="Select Attributes" to_port="example set input"/>
<connect from_op="Select Attributes" from_port="example set output" to_op="Set Role" to_port="example set input"/>
<connect from_op="Set Role" from_port="example set output" to_op="Loop Clusters" to_port="example set"/>
<connect from_op="Loop Clusters" from_port="out 1" to_port="result 1"/>
<portSpacing port="source_input 1" spacing="0"/>
<portSpacing port="sink_result 1" spacing="0"/>
<portSpacing port="sink_result 2" spacing="144"/>
</process>
</operator>
</process>
In another process I work on each cluster by applying a LinearRegression. I want to see the results of the operation, but I am not able to show them in the “Result page”.
For that reason, I decided to save them using the nodes WritePerformance and WriteModel. In the file path I need to use parameter macros, but they do not seem to work (is this related to this bug? http://bugs.rapid-i.com/show_bug.cgi?id=84#c0).
I tried both the old version %{a} and the one proposed in the tutorial, %{loop_value}, but neither worked!
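What I expect from the macros can be sketched in a few lines of Python (an illustration only, not RapidMiner's implementation). The sketch assumes that %{a} stands for the number of times the operator has been applied, so that each loop iteration writes to its own file; the function name expand_macros and the counter values are hypothetical.

```python
import re

# Matches macro references of the form %{name}
MACRO = re.compile(r"%\{([^}]+)\}")


def expand_macros(template, macros):
    """Replace each %{name} with macros[name]; unknown macros stay untouched."""
    def substitute(match):
        name = match.group(1)
        return str(macros.get(name, match.group(0)))
    return MACRO.sub(substitute, template)


# Path taken from the Write Model parameter in the process
template = r"C:\IO\loop\model%{a}.mod"

# One file name per loop iteration, assuming %{a} counts operator applications
paths = [expand_macros(template, {"a": count}) for count in (1, 2, 3)]
for path in paths:
    print(path)
```

With working macros, the three iterations would produce three different model files instead of overwriting a single one.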
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<process version="5.0">
<context>
<input/>
<output/>
<macros/>
</context>
<operator activated="true" class="process" expanded="true" name="Process">
<process expanded="true" height="521" width="1016">
<operator activated="true" class="retrieve" expanded="true" height="60" name="TestData2" width="90" x="45" y="165">
<parameter key="repository_entry" value="data2"/>
</operator>
<operator activated="true" class="select_attributes" expanded="true" height="76" name="Select Attributes" width="90" x="179" y="165">
<parameter key="attribute_filter_type" value="subset"/>
<parameter key="attributes" value="Year|House_size_tot|House_size|Basement_size|House_ground|DistrictName|Price_2009|ID|dataset"/>
</operator>
<operator activated="true" class="set_role" expanded="true" height="76" name="Set Role" width="90" x="313" y="165">
<parameter key="name" value="DistrictName"/>
<parameter key="target_role" value="cluster"/>
</operator>
<operator activated="true" class="loop_clusters" expanded="true" height="76" name="Loop Clusters" width="90" x="447" y="165">
<process expanded="true" height="502" width="1016">
<operator activated="true" class="filter_examples" expanded="true" height="76" name="Filter Examples" width="90" x="45" y="30">
<parameter key="condition_class" value="attribute_value_filter"/>
<parameter key="parameter_string" value="dataset=train"/>
</operator>
<operator activated="true" class="select_attributes" expanded="true" height="76" name="Select Attributes (2)" width="90" x="313" y="30">
<parameter key="attribute_filter_type" value="subset"/>
<parameter key="attributes" value="Year|House_size_tot|House_size|Basement_size|House_ground|DistrictName|Price_2009|ID"/>
</operator>
<operator activated="true" class="linear_regression" expanded="true" height="76" name="Linear Regression" width="90" x="447" y="30"/>
<operator activated="true" class="filter_examples" expanded="true" height="76" name="Filter Examples (3)" width="90" x="179" y="120">
<parameter key="condition_class" value="attribute_value_filter"/>
<parameter key="parameter_string" value="dataset=test"/>
</operator>
<operator activated="true" class="select_attributes" expanded="true" height="76" name="Select Attributes (3)" width="90" x="313" y="120">
<parameter key="attribute_filter_type" value="subset"/>
<parameter key="attributes" value="Year|House_size_tot|House_size|Basement_size|House_ground|DistrictName|Price_2009|ID"/>
</operator>
<operator activated="true" class="apply_model" expanded="true" height="76" name="Apply Model" width="90" x="581" y="75">
<list key="application_parameters"/>
</operator>
<operator activated="true" class="performance_regression" expanded="true" height="76" name="Performance" width="90" x="782" y="30">
<parameter key="root_mean_squared_error" value="true"/>
</operator>
<operator activated="true" class="write_performance" expanded="true" height="60" name="Write Performance" width="90" x="916" y="30">
<parameter key="performance_file" value="C:\IO\loop\performance%{a}.per"/>
</operator>
<operator activated="true" class="write_model" expanded="true" height="60" name="Write Model" width="90" x="782" y="120">
<parameter key="model_file" value="C:\IO\loop\model%{a}.mod"/>
<parameter key="output_type" value="XML"/>
</operator>
<connect from_port="cluster subset" to_op="Filter Examples" to_port="example set input"/>
<connect from_op="Filter Examples" from_port="example set output" to_op="Select Attributes (2)" to_port="example set input"/>
<connect from_op="Filter Examples" from_port="original" to_op="Filter Examples (3)" to_port="example set input"/>
<connect from_op="Select Attributes (2)" from_port="example set output" to_op="Linear Regression" to_port="training set"/>
<connect from_op="Linear Regression" from_port="model" to_op="Apply Model" to_port="model"/>
<connect from_op="Filter Examples (3)" from_port="example set output" to_op="Select Attributes (3)" to_port="example set input"/>
<connect from_op="Select Attributes (3)" from_port="example set output" to_op="Apply Model" to_port="unlabelled data"/>
<connect from_op="Apply Model" from_port="labelled data" to_op="Performance" to_port="labelled data"/>
<connect from_op="Apply Model" from_port="model" to_op="Write Model" to_port="input"/>
<connect from_op="Performance" from_port="performance" to_op="Write Performance" to_port="input"/>
<portSpacing port="source_cluster subset" spacing="0"/>
<portSpacing port="source_in 1" spacing="54"/>
<portSpacing port="sink_out 1" spacing="0"/>
</process>
</operator>
<connect from_op="TestData2" from_port="output" to_op="Select Attributes" to_port="example set input"/>
<connect from_op="Select Attributes" from_port="example set output" to_op="Set Role" to_port="example set input"/>
<connect from_op="Set Role" from_port="example set output" to_op="Loop Clusters" to_port="example set"/>
<portSpacing port="source_input 1" spacing="0"/>
<portSpacing port="sink_result 1" spacing="0"/>
</process>
</operator>
</process>
Answers
Could you try to reproduce this behavior on a generated example set? Then I could simply load your process and see what is going wrong.
Thank you in advance.
Greetings,
Sebastian
Here is the example code for the Replace problem:
Here is the example code for the save problem:
Everything works fine for me, so both bugs seem to be fixed already in the current version.
Your problems should therefore be solved with the final version.
Greetings,
Sebastian