Discretize by Entropy not working properly?
miguelbiron
Hello,
I'm doing some experimental tests on the capabilities of this software, which seems to be really great for data mining tasks, and I'm running into a problem with the "Discretize by Entropy" operator. When I apply it to the Iris data set, the two most powerful features, "Petal Width" and "Petal Length" (called "a3" and "a4" in the sample data set that ships with RapidMiner), are removed by the operator as "useless attributes". This makes no sense to me (unless I'm really missing something), since those attributes are picked by any attribute selection method, and when I train a "Decision Tree" operator, as in the process below, they are the only ones used in the resulting tree.
I searched the forum and Google but couldn't find an answer. Interestingly, Weka has a similar filter called "Discretize" that works great, but unfortunately it is not available in RapidMiner.
Thanks, and sorry for my poor English...
P.S.: Here is the XML of the process I'm experimenting with:
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<process version="5.2.008">
  <context>
    <input/>
    <output/>
    <macros/>
  </context>
  <operator activated="true" class="process" compatibility="5.2.008" expanded="true" name="Process">
    <process expanded="true" height="386" width="614">
      <operator activated="true" class="retrieve" compatibility="5.2.008" expanded="true" height="60" name="Retrieve" width="90" x="45" y="30">
        <parameter key="repository_entry" value="//Samples/data/Iris"/>
      </operator>
      <operator activated="true" class="multiply" compatibility="5.2.008" expanded="true" height="94" name="Multiply" width="90" x="179" y="120"/>
      <operator activated="true" class="decision_tree" compatibility="5.2.008" expanded="true" height="76" name="Decision Tree" width="90" x="375" y="155"/>
      <operator activated="true" class="discretize_by_entropy" compatibility="5.2.008" expanded="true" height="94" name="Discretize" width="90" x="380" y="30">
        <parameter key="attributes" value="lapiz|peo|"/>
      </operator>
      <connect from_op="Retrieve" from_port="output" to_op="Multiply" to_port="input"/>
      <connect from_op="Multiply" from_port="output 1" to_op="Discretize" to_port="example set input"/>
      <connect from_op="Multiply" from_port="output 2" to_op="Decision Tree" to_port="training set"/>
      <connect from_op="Decision Tree" from_port="model" to_port="result 3"/>
      <connect from_op="Discretize" from_port="example set output" to_port="result 2"/>
      <connect from_op="Discretize" from_port="original" to_port="result 1"/>
      <portSpacing port="source_input 1" spacing="0"/>
      <portSpacing port="sink_result 1" spacing="0"/>
      <portSpacing port="sink_result 2" spacing="0"/>
      <portSpacing port="sink_result 3" spacing="0"/>
      <portSpacing port="sink_result 4" spacing="0"/>
    </process>
  </operator>
</process>
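For comparison, here is a minimal Python sketch (not taken from RapidMiner or Weka) of the core step of entropy-based discretization: sort one attribute, try a cut between each pair of distinct neighbouring values, and keep the cut with the highest information gain about the label. The classes "A", "B", "C" and all values are made-up toy data, not the real Iris measurements.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a sequence of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def best_cut(values, labels):
    """Return (cut_point, gain) for the binary split with the highest
    information gain; cut_point is None if no split has positive gain."""
    pairs = sorted(zip(values, labels))
    n = len(pairs)
    ys = [y for _, y in pairs]
    base = entropy(ys)
    best_point, best_gain = None, 0.0
    for i in range(1, n):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # a cut can only fall between two distinct values
        left, right = ys[:i], ys[i:]
        # Weighted class entropy after splitting at this position.
        cond = (len(left) * entropy(left) + len(right) * entropy(right)) / n
        gain = base - cond
        if gain > best_gain:
            best_point = (pairs[i - 1][0] + pairs[i][0]) / 2  # midpoint cut
            best_gain = gain
    return best_point, best_gain


# Toy data (made up, NOT the Iris values) with three hypothetical classes.
labels = ["A"] * 3 + ["B"] * 3 + ["C"] * 3
separating = [1.25, 1.5, 1.75, 4.0, 4.5, 4.75, 5.5, 6.0, 5.75]
interleaved = [5.0, 5.75, 6.25, 5.125, 5.875, 6.125, 5.25, 5.625, 6.5]

print(best_cut(separating, labels))   # cut at 2.875, gain of about 0.918 bits
print(best_cut(interleaved, labels))  # gain of only about 0.2 bits
```

An attribute that separates the classes as cleanly as the first toy column gets a single cut worth about 0.92 bits, so a correct entropy-based discretizer should not reduce it to one range.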
Answers
I just started using RapidMiner after several years of working with Weka, and I am experiencing the same problem with the entropy-based discretization.
Since the entropy-based discretization of Fayyad and Irani is extremely helpful for learners such as Naive Bayes (NB) or J48, it would be nice if this problem were fixed or, at least, if the Weka discretization were included.
Cheers,
Sebastian
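For reference, the Fayyad and Irani method mentioned above splits recursively at the best cut and stops when the gain no longer pays for itself under an MDL criterion: a cut T on a set S of N examples is accepted only if Gain(T; S) > (log2(N - 1) + Δ) / N, where Δ = log2(3^k - 2) - [k·Ent(S) - k1·Ent(S1) - k2·Ent(S2)] and k, k1, k2 are the numbers of classes in S and in the two halves. Below is a minimal Python sketch of that rule, written from the published criterion and not from RapidMiner's or Weka's source. The `discretize` helper and its `remove_useless` flag are hypothetical names; the flag only imitates dropping attributes that end up with a single range. The data are made-up toy values, not Iris.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a sequence of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def mdlp_cuts(pairs):
    """Recursive entropy-based discretization with Fayyad and Irani's MDL
    stopping rule. `pairs` is a list of (value, label) sorted by value."""
    n = len(pairs)
    ys = [y for _, y in pairs]
    ent_s = entropy(ys)
    best_gain, best_i = None, None
    for i in range(1, n):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # cuts only between distinct values
        cond = (i * entropy(ys[:i]) + (n - i) * entropy(ys[i:])) / n
        gain = ent_s - cond
        if best_gain is None or gain > best_gain:
            best_gain, best_i = gain, i
    if best_i is None:
        return []  # fewer than two distinct values: nothing to cut
    left, right = ys[:best_i], ys[best_i:]
    k, k1, k2 = len(set(ys)), len(set(left)), len(set(right))
    delta = math.log2(3 ** k - 2) - (
        k * ent_s - k1 * entropy(left) - k2 * entropy(right)
    )
    # MDL test: reject the cut unless the gain exceeds the threshold.
    if best_gain <= (math.log2(n - 1) + delta) / n:
        return []
    cut = (pairs[best_i - 1][0] + pairs[best_i][0]) / 2
    # Recurse into both halves and keep the cut points in sorted order.
    return mdlp_cuts(pairs[:best_i]) + [cut] + mdlp_cuts(pairs[best_i:])


def discretize(columns, labels, remove_useless=True):
    """Hypothetical helper (not a RapidMiner API): map each attribute to its
    cut points; with remove_useless=True, single-range attributes are dropped."""
    result = {}
    for name, values in columns.items():
        cuts = mdlp_cuts(sorted(zip(values, labels)))
        if cuts or not remove_useless:
            result[name] = cuts
    return result


# Toy data (made up, NOT the Iris values) with hypothetical classes A/B/C.
labels = ["A"] * 3 + ["B"] * 3 + ["C"] * 3
columns = {
    "separating": [1.25, 1.5, 1.75, 4.0, 4.5, 4.75, 5.5, 6.0, 5.75],
    "interleaved": [5.0, 5.75, 6.25, 5.125, 5.875, 6.125, 5.25, 5.625, 6.5],
}
print(discretize(columns, labels))  # {'separating': [2.875, 5.125]}
print(discretize(columns, labels, remove_useless=False))
# {'separating': [2.875, 5.125], 'interleaved': []}
```

On the toy columns, the separating attribute keeps two cut points while the interleaved one is left with a single range and is dropped, which is what one would expect for strongly predictive attributes like a3 and a4 versus a noisy one.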
You are right, something seems to be wrong. I have created an internal issue for this operator. Thanks for reporting!
Best regards,
Marius
Regards,
Mario