Bug in Loop and Deliver Best
marcin_blachnik
I have noticed that if one of the performances delivered to the "Loop and Deliver Best" operator is missing (NaN is delivered), the operator treats it as the best performance and returns it at its output. Below is a sample process:
<?xml version="1.0" encoding="UTF-8"?><process version="7.4.000">
<context>
<input/>
<output/>
<macros/>
</context>
<operator activated="true" class="process" compatibility="7.4.000" expanded="true" name="Process">
<process expanded="true">
<operator activated="true" class="retrieve" compatibility="7.4.000" expanded="true" height="68" name="Retrieve Iris" width="90" x="93" y="74">
<parameter key="repository_entry" value="//Samples/data/Iris"/>
</operator>
<operator activated="true" class="set_data" compatibility="7.4.000" expanded="true" height="82" name="Set Data" width="90" x="241" y="75">
<parameter key="example_index" value="3"/>
<parameter key="attribute_name" value="a1"/>
<parameter key="value" value="NaN"/>
<list key="additional_values"/>
</operator>
<operator activated="true" class="loop_and_deliver_best" compatibility="7.4.000" expanded="true" height="103" name="Loop and Deliver Best" width="90" x="514" y="69">
<process expanded="true">
<operator activated="true" class="extract_performance" compatibility="7.4.000" expanded="true" height="82" name="Performance" width="90" x="314" y="34">
<parameter key="performance_type" value="data_value"/>
<parameter key="attribute_name" value="a1"/>
<parameter key="example_index" value="%{a}"/>
</operator>
<operator activated="true" class="multiply" compatibility="7.4.000" expanded="true" height="103" name="Multiply" width="90" x="556" y="34"/>
<connect from_port="in 1" to_op="Performance" to_port="example set"/>
<connect from_op="Performance" from_port="performance" to_op="Multiply" to_port="input"/>
<connect from_op="Multiply" from_port="output 1" to_port="performance vector"/>
<connect from_op="Multiply" from_port="output 2" to_port="out 1"/>
<portSpacing port="source_in 1" spacing="0"/>
<portSpacing port="source_in 2" spacing="0"/>
<portSpacing port="sink_performance vector" spacing="0"/>
<portSpacing port="sink_out 1" spacing="0"/>
<portSpacing port="sink_out 2" spacing="0"/>
</process>
</operator>
<connect from_op="Retrieve Iris" from_port="output" to_op="Set Data" to_port="example set input"/>
<connect from_op="Set Data" from_port="example set output" to_op="Loop and Deliver Best" to_port="in 1"/>
<connect from_op="Loop and Deliver Best" from_port="performance" to_port="result 1"/>
<connect from_op="Loop and Deliver Best" from_port="out 1" to_port="result 2"/>
<portSpacing port="source_input 1" spacing="0"/>
<portSpacing port="sink_result 1" spacing="0"/>
<portSpacing port="sink_result 2" spacing="0"/>
<portSpacing port="sink_result 3" spacing="0"/>
</process>
</operator>
</process>
Best
Marcin
Answers
Can you repaste the XML or export it as an RMP file? I can't get it to populate.
Sorry
There was one tag missing:
Hmm. I'm not sure that's a "bug bug", since you are introducing a NaN, i.e. a missing input. Maybe it should give you a warning that there is a missing value and the result might not be correct. The same thing happens with the Forecasting Performance operator if I don't keep good data quality.
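The warning idea could look roughly like the minimal Java sketch below. `NanWarningSelector` and its methods are hypothetical helpers, not part of RapidMiner's API; the sketch assumes the per-iteration performance values are available as a `double[]` and that a larger value means better performance. NaN iterations are logged and skipped instead of being allowed to win.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.logging.Logger;

/**
 * Hypothetical helper (not RapidMiner API): before picking the best
 * iteration, log a warning for every NaN performance and skip it.
 * Assumes larger performance values are better.
 */
final class NanWarningSelector {

    private static final Logger LOG = Logger.getLogger(NanWarningSelector.class.getName());

    /** Returns the indices of iterations whose performance is NaN. */
    static List<Integer> nanIndices(double[] performances) {
        List<Integer> result = new ArrayList<>();
        for (int i = 0; i < performances.length; i++) {
            if (Double.isNaN(performances[i])) {
                result.add(i);
            }
        }
        return result;
    }

    /**
     * Returns the index of the largest non-NaN value, or -1 if every value
     * is NaN (or the array is empty). Logs one warning per skipped NaN.
     */
    static int indexOfBestIgnoringNaN(double[] performances) {
        for (int i : nanIndices(performances)) {
            LOG.warning("Iteration " + i + " delivered a NaN performance; it is ignored.");
        }
        int best = -1;
        for (int i = 0; i < performances.length; i++) {
            double v = performances[i];
            if (Double.isNaN(v)) {
                continue; // missing performance: never a candidate for "best"
            }
            if (best < 0 || v > performances[best]) {
                best = i;
            }
        }
        return best;
    }
}
```

Returning -1 when no iteration produced a usable value lets the caller decide whether to fail or fall back, rather than silently delivering a NaN result.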
Hmm,
I just wanted to draw your attention to this strange interpretation of "Best Value".
By the way, is there a working bug tracker available yet?
Hi Marcin,
I've checked our code. The relevant spot is com/rapidminer/operator/performance/PerformanceCriterion.java, line 102:
which uses the Java method compare. Looking at that method, it is:
That explains the behaviour. It might make sense to handle NaNs in our own methods. What would you expect? Should NaN always be "worse" than any other performance?
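To illustrate, here is a small, self-contained Java sketch. The class `NanRankingDemo` and its members are hypothetical, not RapidMiner code, and it assumes that "deliver best" boils down to keeping the value that compares highest, with larger values meaning better performance. `Double.compare` orders NaN above every other double, including positive infinity, so a NaN wins the selection; the alternative comparator ranks NaN below every number instead.

```java
import java.util.Comparator;

/**
 * Sketch (hypothetical class, not RapidMiner code): why comparing
 * performance values with Double.compare lets a NaN win a "pick the best"
 * loop, and one NaN-aware alternative that ranks NaN below every number.
 */
final class NanRankingDemo {

    /** Java's built-in ordering: NaN is treated as greater than +Infinity. */
    static final Comparator<Double> JAVA_ORDER = Double::compare;

    /** Assumed alternative: NaN is always "worse" than any real value. */
    static final Comparator<Double> NAN_IS_WORST = (a, b) -> {
        boolean aNan = Double.isNaN(a);
        boolean bNan = Double.isNaN(b);
        if (aNan && bNan) return 0;
        if (aNan) return -1; // a is NaN, so a ranks lower
        if (bNan) return 1;  // b is NaN, so b ranks lower
        return Double.compare(a, b);
    };

    /** Returns the index of the largest value under the given ordering. */
    static int indexOfBest(double[] values, Comparator<Double> order) {
        if (values.length == 0) {
            throw new IllegalArgumentException("no performance values");
        }
        int best = 0;
        for (int i = 1; i < values.length; i++) {
            if (order.compare(values[i], values[best]) > 0) {
                best = i;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        double[] performances = {0.91, Double.NaN, 0.97};
        System.out.println(indexOfBest(performances, JAVA_ORDER));   // prints 1 (the NaN)
        System.out.println(indexOfBest(performances, NAN_IS_WORST)); // prints 2 (0.97)
    }
}
```

For criteria where smaller is better (such as an error rate), the numeric comparison would be reversed, but NaN should stay at the bottom in either case.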
~Martin
Dortmund, Germany