Information Selection for Regression
I am using the Information Selection extension package to detect noise in the labels. The targets (labels) of my problem are numerical values. I am trying to use CNN (Condensed Nearest Neighbor), but it does not select any examples. Does the package support regression as well, or is it strictly for classification? If it does support regression, which parameters need to be set to configure the process correctly?
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<process version="7.1.000">
  <context>
    <input/>
    <output/>
    <macros/>
  </context>
  <operator activated="true" class="process" compatibility="7.1.000" expanded="true" name="Process">
    <process expanded="true">
      <operator activated="true" breakpoints="after" class="read_csv" compatibility="7.1.000" expanded="true" height="68" name="Read CSV" width="90" x="45" y="34">
        <parameter key="csv_file" value="/Users/xinkwang/test_file.csv"/>
        <parameter key="column_separators" value=","/>
        <parameter key="use_quotes" value="false"/>
        <list key="annotations"/>
        <list key="data_set_meta_data_information"/>
        <parameter key="read_not_matching_values_as_missings" value="false"/>
      </operator>
      <operator activated="true" class="set_role" compatibility="7.1.000" expanded="true" height="82" name="Set Role" width="90" x="179" y="34">
        <parameter key="attribute_name" value="att324"/>
        <parameter key="target_role" value="label"/>
        <list key="set_additional_roles">
          <parameter key="asin" value="id"/>
          <parameter key="product_name" value="id"/>
        </list>
      </operator>
      <operator activated="true" breakpoints="after" class="select_attributes" compatibility="7.1.000" expanded="true" height="82" name="Select Attributes" width="90" x="313" y="34">
        <parameter key="attribute_filter_type" value="subset"/>
        <parameter key="attributes" value="product_name|asin"/>
        <parameter key="invert_selection" value="true"/>
      </operator>
      <operator activated="true" class="prules:cnn_sel" compatibility="7.0.000" expanded="true" height="103" name="Select by CNN" width="90" x="447" y="34">
        <parameter key="measure_types" value="NumericalMeasures"/>
      </operator>
      <connect from_op="Read CSV" from_port="output" to_op="Set Role" to_port="example set input"/>
      <connect from_op="Set Role" from_port="example set output" to_op="Select Attributes" to_port="example set input"/>
      <connect from_op="Select Attributes" from_port="example set output" to_op="Select by CNN" to_port="exampleSet"/>
      <connect from_op="Select by CNN" from_port="exampleSet" to_port="result 1"/>
      <connect from_op="Select by CNN" from_port="prototypes" to_port="result 2"/>
      <portSpacing port="source_input 1" spacing="0"/>
      <portSpacing port="sink_result 1" spacing="0"/>
      <portSpacing port="sink_result 2" spacing="0"/>
      <portSpacing port="sink_result 3" spacing="0"/>
    </process>
  </operator>
</process>
Answers
To be honest, I have to admit that I have never used this extension; it has always been on my to-do list. Marcin, the author of the package, is also active on this board. You might send him a private message: http://rapid-i.com/rapidforum/index.php?action=profile;u=2560
~Martin
Dortmund, Germany
The threshold parameter must be configured for both regression decision functions. It defines the maximum acceptable difference between the label predicted by the nearest neighbors and the true label. With the Local decision function, the threshold is multiplied by the standard deviation of the labels of the nearest examples (the noise estimation parameter sets how many nearest neighbors are used to estimate that standard deviation). The local function is more robust but more computationally expensive. For the non-local decision function, STD = 1, so the threshold acts as an absolute tolerance on the label error.
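To make the two decision functions concrete, here is a minimal Python sketch of the rule described above, plugged into a basic Condensed Nearest Neighbor pass. It is not the extension's code. The helper names (`is_error`, `cnn_select`, `nearest`), the use of Euclidean distance, predicting the label as the mean of the k nearest labels, using the population standard deviation, and seeding CNN with the first example are all assumptions made for illustration.

```python
import math
import statistics
from typing import List, Sequence, Tuple

# Hypothetical representation: (feature vector, numeric label)
Example = Tuple[Sequence[float], float]


def euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def nearest(x: Sequence[float], reference: List[Example], k: int) -> List[Example]:
    # The k examples of `reference` closest to x (assumed Euclidean distance).
    return sorted(reference, key=lambda ex: euclidean(x, ex[0]))[:k]


def is_error(x, y, reference, threshold, k=1, local=False, noise_k=3) -> bool:
    """Hypothetical decision function: True if the label predicted by the
    k nearest neighbours differs from the true label by more than
    threshold * STD."""
    y_hat = statistics.mean(label for _, label in nearest(x, reference, k))
    if local:
        # Local variant: STD is estimated from the labels of the noise_k
        # nearest examples (the "noise estimation" parameter in the answer).
        std = statistics.pstdev([label for _, label in nearest(x, reference, noise_k)])
    else:
        # Non-local variant: STD = 1, so threshold is an absolute tolerance.
        std = 1.0
    return abs(y_hat - y) > threshold * std


def cnn_select(examples: List[Example], threshold: float, **kwargs) -> List[Example]:
    """Sketch of a CNN pass for regression: start from the first example and
    keep adding every example the current prototype set predicts badly,
    until a full sweep adds nothing."""
    selected = [0]  # indices of chosen prototypes (assumed seed: first example)
    changed = True
    while changed:
        changed = False
        for i, (x, y) in enumerate(examples):
            if i in selected:
                continue
            prototypes = [examples[j] for j in selected]
            if is_error(x, y, prototypes, threshold, **kwargs):
                selected.append(i)
                changed = True
    return [examples[j] for j in selected]
```

In this sketch, a threshold that is large relative to the label scale means no example is ever flagged, so the condensed set stays at its single seed example. The local variant needs a second neighbour query per example, which mirrors the extra cost mentioned above.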
Hope it helped
Best
Marcin