ID3 - Novice question
Hello,
I am very new to RapidMiner and have just finished reading the RM tutorial (PDF).
I have a homework assignment in which the ID3 algorithm is to be applied to a data set made up of yes/no values in an Excel sheet.
I apply the ID3 algorithm and get a decision tree. It is OK up to here. But how do I get RapidMiner to give me a column of its own results, preferably in an Excel sheet?
I tried this: I named a column "rapid", assigned it the 'prediction' attribute, and used the Write Excel operator to save the data in XLS format. But I just got the same sheet that I originally imported.
Thank you in advance.
ASKural
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<process version="6.3.000">
<context>
<input/>
<output/>
<macros/>
</context>
<operator activated="true" class="process" compatibility="6.3.000" expanded="true" name="Process">
<process expanded="true">
<operator activated="true" class="retrieve" compatibility="6.3.000" expanded="true" height="60" name="Retrieve Denem_Roclu_ID3" width="90" x="45" y="30">
<parameter key="repository_entry" value="../Denem_Roclu_ID3"/>
</operator>
<operator activated="true" class="id3" compatibility="6.3.000" expanded="true" height="76" name="ID3" width="90" x="179" y="30"/>
<operator activated="true" class="write_excel" compatibility="6.3.000" expanded="true" height="76" name="Write Excel" width="90" x="313" y="30">
<parameter key="excel_file" value="C:\Users\serhat\Copy\SAU_Bahar_2015\Veri Madenciligi\Ödev2\Lastik.xlsx"/>
<parameter key="file_format" value="xls"/>
</operator>
<connect from_op="Retrieve Denem_Roclu_ID3" from_port="output" to_op="ID3" to_port="training set"/>
<connect from_op="ID3" from_port="model" to_port="result 1"/>
<connect from_op="ID3" from_port="exampleSet" to_op="Write Excel" to_port="input"/>
<portSpacing port="source_input 1" spacing="0"/>
<portSpacing port="sink_result 1" spacing="0"/>
<portSpacing port="sink_result 2" spacing="0"/>
</process>
</operator>
</process>
Answers
A decision tree model is just a set of rules, for example: if A < 2 and B > 3, then assign X.
What you want is to apply those rules to given data. For that you need the Apply Model operator.
To make sure you never apply the model to the data it was trained on (which would hide any overfitting and make the predictions look better than they are), you usually use cross-validation (X-Validation). If you need a predicted data set, you can use X-Prediction. Below is an example process doing this.
~Martin
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<process version="6.4.000">
<context>
<input/>
<output/>
<macros/>
</context>
<operator activated="true" class="process" compatibility="6.4.000" expanded="true" name="Process">
<process expanded="true">
<operator activated="true" class="retrieve" compatibility="6.4.000" expanded="true" height="60" name="Retrieve Denem_Roclu_ID3" width="90" x="45" y="30">
<parameter key="repository_entry" value="../Denem_Roclu_ID3"/>
</operator>
<operator activated="true" class="x_prediction" compatibility="6.4.000" expanded="true" height="60" name="X-Prediction" width="90" x="179" y="30">
<process expanded="true">
<operator activated="true" class="id3" compatibility="6.4.000" expanded="true" height="76" name="ID3" width="90" x="112" y="30"/>
<connect from_port="training" to_op="ID3" to_port="training set"/>
<connect from_op="ID3" from_port="model" to_port="model"/>
<portSpacing port="source_training" spacing="0"/>
<portSpacing port="sink_model" spacing="0"/>
<portSpacing port="sink_through 1" spacing="0"/>
</process>
<process expanded="true">
<operator activated="true" class="apply_model" compatibility="6.4.000" expanded="true" height="76" name="Apply Model" width="90" x="45" y="30">
<list key="application_parameters"/>
</operator>
<connect from_port="model" to_op="Apply Model" to_port="model"/>
<connect from_port="unlabelled data" to_op="Apply Model" to_port="unlabelled data"/>
<connect from_op="Apply Model" from_port="labelled data" to_port="labelled data"/>
<portSpacing port="source_model" spacing="0"/>
<portSpacing port="source_unlabelled data" spacing="0"/>
<portSpacing port="source_through 1" spacing="0"/>
<portSpacing port="sink_labelled data" spacing="0"/>
</process>
</operator>
<operator activated="true" class="write_excel" compatibility="6.4.000" expanded="true" height="76" name="Write Excel" width="90" x="313" y="30">
<parameter key="excel_file" value="C:\Users\serhat\Copy\SAU_Bahar_2015\Veri Madenciligi\Ödev2\Lastik.xlsx"/>
<parameter key="file_format" value="xls"/>
</operator>
<connect from_op="Retrieve Denem_Roclu_ID3" from_port="output" to_op="X-Prediction" to_port="example set"/>
<connect from_op="X-Prediction" from_port="labelled data" to_op="Write Excel" to_port="input"/>
<portSpacing port="source_input 1" spacing="0"/>
<portSpacing port="sink_result 1" spacing="0"/>
</process>
</operator>
</process>
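The idea behind X-Prediction in the process above can be sketched outside RapidMiner in plain Python: the data is split into folds, and each row gets its prediction from a model trained only on the other folds. This is only an illustration under stated assumptions. The learner `train_one_rule` is a hypothetical stand-in (it is not ID3), the attribute name `worn` and the sample rows are made up, and the fold assignment by row index is a simplification that may differ from what RapidMiner does internally.

```python
from collections import Counter, defaultdict


def train_one_rule(rows, attr, label_key="label"):
    """Hypothetical stand-in learner (not ID3): one rule per value of a single attribute."""
    overall = Counter(r[label_key] for r in rows).most_common(1)[0][0]
    by_value = defaultdict(Counter)
    for r in rows:
        by_value[r[attr]][r[label_key]] += 1
    rule = {value: counts.most_common(1)[0][0] for value, counts in by_value.items()}
    # Attribute values not seen in training fall back to the overall majority label.
    return lambda row: rule.get(row[attr], overall)


def cross_predict(rows, k, train):
    """Give every row a prediction from a model trained on the other folds only."""
    predicted = [dict(r) for r in rows]  # copy so the input rows stay untouched
    for fold in range(k):
        train_rows = [r for i, r in enumerate(rows) if i % k != fold]
        model = train(train_rows)
        for i, r in enumerate(rows):
            if i % k == fold:
                predicted[i]["prediction(label)"] = model(r)
    return predicted


# Made-up yes/no data; "worn" is an illustrative attribute name.
rows = [
    {"worn": "yes", "label": "yes"},
    {"worn": "no", "label": "no"},
    {"worn": "yes", "label": "yes"},
    {"worn": "no", "label": "no"},
    {"worn": "yes", "label": "yes"},
    {"worn": "no", "label": "no"},
]

result = cross_predict(rows, k=3, train=lambda tr: train_one_rule(tr, "worn"))
for r in result:
    print(r)
```

The output rows carry the original columns plus one new prediction column, which is the kind of table that is then passed on to Write Excel in the process.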
Dortmund, Germany
I think I worded my question poorly. The problem is not that ID3 doesn't run. It runs, but where do I get values such as the gain values, and the column of results that the ID3 algorithm itself produces? I don't just want a tree based on my results column, which ID3 forces to be labeled as 'label'. I am looking for a column of yes/no results from the ID3 algorithm, along with values like the gain values.
Regards,
ASKural
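For reference, the gain values asked about here can be computed by hand. The following is a minimal sketch of the textbook information gain used by ID3 (entropy of the label minus the weighted entropy after splitting on an attribute). It is not RapidMiner code, the attribute names `a` and `b` and the rows are made up, and it makes no claim about how or where the ID3 operator reports these numbers.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total) for n in Counter(labels).values())


def information_gain(rows, attr, label_key="label"):
    """Entropy of the label minus the weighted entropy after splitting on attr."""
    labels = [r[label_key] for r in rows]
    groups = {}
    for r in rows:
        groups.setdefault(r[attr], []).append(r[label_key])
    remainder = sum(len(g) / len(rows) * entropy(g) for g in groups.values())
    return entropy(labels) - remainder


# Made-up yes/no data: "a" separates the label perfectly, "b" tells us nothing.
rows = [
    {"a": "yes", "b": "yes", "label": "yes"},
    {"a": "yes", "b": "no", "label": "yes"},
    {"a": "no", "b": "yes", "label": "no"},
    {"a": "no", "b": "no", "label": "no"},
]

gain_a = information_gain(rows, "a")
gain_b = information_gain(rows, "b")
print("gain(a) =", gain_a)
print("gain(b) =", gain_b)
```

ID3 picks the attribute with the highest gain at each node, so on this toy data it would split on `a` first.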
I am a bit confused. The process I attached will produce a new column called prediction(labelname), plus confidence values. The prediction is then either "yes" or "no", if those are the values of your label.
Label = column with the truth
Prediction = column with the prediction of your tree
Confidence = "likelihood" for each class
Cheers,
Martin
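To make the three columns concrete, here is a small Python sketch of how a prediction and per-class confidences could be derived from the training labels that ended up in one leaf of a tree. It assumes that confidence is simply the class fraction in the leaf, which is a common convention but is an assumption here, not something confirmed in this thread. The function `score_leaf` and the column names `confidence(yes)` and `confidence(no)` are illustrative.

```python
from collections import Counter


def score_leaf(leaf_labels, classes=("yes", "no")):
    """Return (prediction, confidences) from the training labels in one leaf.

    Assumption: confidence for a class is its fraction among the leaf's labels.
    """
    counts = Counter(leaf_labels)
    total = len(leaf_labels)
    confidences = {c: counts[c] / total for c in classes}
    prediction = max(classes, key=lambda c: confidences[c])
    return prediction, confidences


# A leaf that saw three "yes" rows and one "no" row during training.
prediction, conf = score_leaf(["yes", "yes", "yes", "no"])

# One scored row: the true label stays, prediction and confidences are added.
row = {
    "label": "no",
    "prediction(label)": prediction,
    "confidence(yes)": conf["yes"],
    "confidence(no)": conf["no"],
}
print(row)
```

The label column is untouched by scoring, so comparing it with the prediction column shows where the tree is wrong.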
I copied and pasted the code as is into the main process area and this is what I got.
What could I be doing wrong?
Regards,
ASKural
???
X-Prediction.example set (example set) Meta data: - expects: ExampleSet, expects: ExampleSet 2 error(s): Mandatory input missing at port X-Prediction.example set. Mandatory input missing at port X-Prediction.example set.
Write Excel.input (input) Meta data: - expects: ExampleSet 1 error(s): Mandatory input missing at port Write Excel.input.
Your process uses a relative path in the repository (the Retrieve operator's repository_entry is "../Denem_Roclu_ID3"). Thus you need to save the process in a matching location or change the path.
Cheers,
Martin
It is like this at the moment and gives the following errors:
X-Prediction.example set (example set) Meta data: - expects: ExampleSet, expects: ExampleSet
Write Excel.input (input) Meta data: - expects: ExampleSet
Best Regards,
ASKural ::)
Martin!!!
It works.
Thank you so much.
Regards,
Aziz