ID3 - Novice question

askural Member Posts: 5 Contributor II
edited November 2018 in Help
Hello,
I am very new to RapidMiner and have just finished reading the RM tutorial (PDF).
I have a homework assignment where the ID3 algorithm is to be applied to a set of data made up of yeses and nos in an Excel sheet.
I apply the ID3 algorithm and get a decision tree. It is OK up to here. But how do I get RapidMiner to give me a column of its own results, preferably in an Excel sheet?
I tried this: I named a column 'rapid', assigned it the 'prediction' attribute, and used the Write Excel operator to save the data in xls format. But I just got the same sheet that I originally imported.
Thank you in advance.
ASKural
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<process version="6.3.000">
  <context>
    <input/>
    <output/>
    <macros/>
  </context>
  <operator activated="true" class="process" compatibility="6.3.000" expanded="true" name="Process">
    <process expanded="true">
      <operator activated="true" class="retrieve" compatibility="6.3.000" expanded="true" height="60" name="Retrieve Denem_Roclu_ID3" width="90" x="45" y="30">
        <parameter key="repository_entry" value="../Denem_Roclu_ID3"/>
      </operator>
      <operator activated="true" class="id3" compatibility="6.3.000" expanded="true" height="76" name="ID3" width="90" x="179" y="30"/>
      <operator activated="true" class="write_excel" compatibility="6.3.000" expanded="true" height="76" name="Write Excel" width="90" x="313" y="30">
        <parameter key="excel_file" value="C:\Users\serhat\Copy\SAU_Bahar_2015\Veri Madenciligi\Ödev2\Lastik.xlsx"/>
        <parameter key="file_format" value="xls"/>
      </operator>
      <connect from_op="Retrieve Denem_Roclu_ID3" from_port="output" to_op="ID3" to_port="training set"/>
      <connect from_op="ID3" from_port="model" to_port="result 1"/>
      <connect from_op="ID3" from_port="exampleSet" to_op="Write Excel" to_port="input"/>
      <portSpacing port="source_input 1" spacing="0"/>
      <portSpacing port="sink_result 1" spacing="0"/>
      <portSpacing port="sink_result 2" spacing="0"/>
    </process>
  </operator>
</process>

Answers

  • MartinLiebig Administrator, Moderator, Employee, RapidMiner Certified Analyst, RapidMiner Certified Expert, University Professor Posts: 3,482 RM Data Scientist
    Hello ASKural,

    The decision tree model is just a set of rules, for example: if A < 2 and B > 3, then assign X.
    What you want is to apply those rules to given data. For that you need the Apply Model operator.
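
    As a rough illustration in plain Python (not RapidMiner code), a tree model can be thought of as a function that holds such rules, and applying the model means calling it on every row. The attribute names A and B, the thresholds and the class X come from the sentence above; the fallback class Y and the sample rows are made up for this sketch.

```python
# Hypothetical rule set mirroring "if A < 2 and B > 3 then assign X".
# The fallback class "Y" is an assumption made for this sketch.
def tree_rules(row):
    if row["A"] < 2 and row["B"] > 3:
        return "X"
    return "Y"


def apply_model(model, rows):
    """Return copies of the rows with an added 'prediction' column."""
    return [dict(row, prediction=model(row)) for row in rows]


# Made-up sample data: only the first row satisfies both conditions.
rows = [{"A": 1, "B": 5}, {"A": 3, "B": 5}, {"A": 1, "B": 2}]
scored = apply_model(tree_rules, rows)
for row in scored:
    print(row)
```

    The point is that training only produces the rules; the prediction column appears only once the rules are applied to data.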

    To ensure that you never apply the model to the data it was trained on (which would give overly optimistic results, because an overfitted tree looks better on its own training data than it really is), you usually use a Cross Validation (X-Validation). If you need a predicted data set, you can use X-Prediction. Below is an example process doing this.

    ~Martin

    <?xml version="1.0" encoding="UTF-8" standalone="no"?>
    <process version="6.4.000">
      <context>
        <input/>
        <output/>
        <macros/>
      </context>
      <operator activated="true" class="process" compatibility="6.4.000" expanded="true" name="Process">
        <process expanded="true">
          <operator activated="true" class="retrieve" compatibility="6.4.000" expanded="true" height="60" name="Retrieve Denem_Roclu_ID3" width="90" x="45" y="30">
            <parameter key="repository_entry" value="../Denem_Roclu_ID3"/>
          </operator>
          <operator activated="true" class="x_prediction" compatibility="6.4.000" expanded="true" height="60" name="X-Prediction" width="90" x="179" y="30">
            <process expanded="true">
              <operator activated="true" class="id3" compatibility="6.4.000" expanded="true" height="76" name="ID3" width="90" x="112" y="30"/>
              <connect from_port="training" to_op="ID3" to_port="training set"/>
              <connect from_op="ID3" from_port="model" to_port="model"/>
              <portSpacing port="source_training" spacing="0"/>
              <portSpacing port="sink_model" spacing="0"/>
              <portSpacing port="sink_through 1" spacing="0"/>
            </process>
            <process expanded="true">
              <operator activated="true" class="apply_model" compatibility="6.4.000" expanded="true" height="76" name="Apply Model" width="90" x="45" y="30">
                <list key="application_parameters"/>
              </operator>
              <connect from_port="model" to_op="Apply Model" to_port="model"/>
              <connect from_port="unlabelled data" to_op="Apply Model" to_port="unlabelled data"/>
              <connect from_op="Apply Model" from_port="labelled data" to_port="labelled data"/>
              <portSpacing port="source_model" spacing="0"/>
              <portSpacing port="source_unlabelled data" spacing="0"/>
              <portSpacing port="source_through 1" spacing="0"/>
              <portSpacing port="sink_labelled data" spacing="0"/>
            </process>
          </operator>
          <operator activated="true" class="write_excel" compatibility="6.4.000" expanded="true" height="76" name="Write Excel" width="90" x="313" y="30">
            <parameter key="excel_file" value="C:\Users\serhat\Copy\SAU_Bahar_2015\Veri Madenciligi\Ödev2\Lastik.xlsx"/>
            <parameter key="file_format" value="xls"/>
          </operator>
          <connect from_op="Retrieve Denem_Roclu_ID3" from_port="output" to_op="X-Prediction" to_port="example set"/>
          <connect from_op="X-Prediction" from_port="labelled data" to_op="Write Excel" to_port="input"/>
          <portSpacing port="source_input 1" spacing="0"/>
          <portSpacing port="sink_result 1" spacing="0"/>
        </process>
      </operator>
    </process>
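
    The idea behind X-Prediction can be sketched in plain Python (not RapidMiner code): the data is split into k folds, and each row gets its prediction from a model trained on the other folds only. The round-robin fold assignment and the majority-class stand-in learner are assumptions made to keep the sketch short; the process above uses ID3 and RapidMiner's own sampling.

```python
from collections import Counter


def train_majority(rows):
    """Stand-in learner (assumption): always predicts the most common training label."""
    majority = Counter(r["label"] for r in rows).most_common(1)[0][0]
    return lambda row: majority


def cross_val_predict(rows, train, k=3):
    """Predict every row with a model trained only on the other folds."""
    predictions = [None] * len(rows)
    for fold in range(k):
        # Round-robin fold assignment: row i belongs to fold i % k.
        test_idx = [i for i in range(len(rows)) if i % k == fold]
        train_rows = [r for i, r in enumerate(rows) if i % k != fold]
        model = train(train_rows)
        for i in test_idx:
            predictions[i] = model(rows[i])
    # Return copies so the input rows stay untouched.
    return [dict(r, prediction=p) for r, p in zip(rows, predictions)]


# Made-up label column of yeses and nos.
rows = [{"label": value} for value in ["yes", "yes", "no", "yes", "no", "yes"]]
result = cross_val_predict(rows, train_majority, k=3)
for row in result:
    print(row)
```

    Every row ends up with a prediction, yet no model ever predicts a row it was trained on.
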
    - Sr. Director Data Solutions, Altair RapidMiner -
    Dortmund, Germany
  • askural Member Posts: 5 Contributor II
    Hello,
    I think I phrased my question badly. The problem is not that ID3 doesn't run. It runs, but where do I get the gain values etc. and the column of results that the ID3 algorithm itself produces? I don't want a tree based on my results column, which ID3 forces to be labeled as 'label'. I am looking for a column of the ID3 algorithm's own yes/no results, plus values such as the gain values.
    Regards,
    ASKural :'(
  • MartinLiebig Administrator, Moderator, Employee, RapidMiner Certified Analyst, RapidMiner Certified Expert, University Professor Posts: 3,482 RM Data Scientist
    Hi,

    I am kind of confused. The process I attached will produce a new column called prediction(labelname), plus confidence values. The prediction is then either "yes" or "no", if those are the values of your label.

    Label = column with the truth
    Prediction = column with the prediction of your tree
    Confidence = "likelihood" for each class
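
    Regarding the gain values asked about above: nothing in this thread shows where RapidMiner exposes them, so the following is only a textbook-style sketch in plain Python of how ID3's information gain is usually defined (entropy of the label minus the weighted entropy after splitting on an attribute). The attribute names worn and noisy and the four rows are made up for illustration; RapidMiner's own implementation may differ in details.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())


def information_gain(rows, attribute, label="label"):
    """Label entropy minus the weighted entropy after splitting on `attribute`."""
    base = entropy([r[label] for r in rows])
    remainder = 0.0
    for value in {r[attribute] for r in rows}:
        subset = [r[label] for r in rows if r[attribute] == value]
        remainder += len(subset) / len(rows) * entropy(subset)
    return base - remainder


# Hypothetical yes/no data; "label" is the column with the truth.
rows = [
    {"worn": "yes", "noisy": "yes", "label": "yes"},
    {"worn": "yes", "noisy": "no", "label": "yes"},
    {"worn": "no", "noisy": "yes", "label": "no"},
    {"worn": "no", "noisy": "no", "label": "no"},
]
gains = {name: information_gain(rows, name) for name in ("worn", "noisy")}
print(gains)
```

    In this toy data, worn separates the label perfectly, so its gain equals the full label entropy of 1 bit, while noisy tells nothing about the label and has a gain of 0. ID3 would therefore split on worn first.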

    Cheers,
    Martin
  • askural Member Posts: 5 Contributor II
    Hello,
    I copied and pasted the code as is into the main process area, and this is what I got.
    What could I be doing wrong?
    Regards,
    ASKural
    ???

    X-Prediction.example set (example set) Meta data: - expects: ExampleSet, expects: ExampleSet 2 error(s): Mandatory input missing at port X-Prediction.example set. Mandatory input missing at port X-Prediction.example set.

    Write Excel.input (input) Meta data: - expects: ExampleSet 1 error(s): Mandatory input missing at port Write Excel.input.
  • MartinLiebig Administrator, Moderator, Employee, RapidMiner Certified Analyst, RapidMiner Certified Expert, University Professor Posts: 3,482 RM Data Scientist
    Hi,

    Your process uses a relative path in the repository ("../Denem_Roclu_ID3" in the Retrieve operator). So you need to save the process in a location where that path resolves to your data, or change the path.

    Cheers,
    Martin
  • askural Member Posts: 5 Contributor II
    Hello,

    It is like this at the moment and gives the following errors:

    X-Prediction.example set (example set) Meta data: - expects: ExampleSet, expects: ExampleSet

    Write Excel.input (input) Meta data: - expects: ExampleSet
    Best Regards,
    ASKural ::)
    <?xml version="1.0" encoding="UTF-8" standalone="no"?>
    <process version="6.3.000">
      <context>
        <input/>
        <output/>
        <macros/>
      </context>
      <operator activated="true" class="process" compatibility="6.3.000" expanded="true" name="Process">
        <process expanded="true">
          <operator activated="true" class="retrieve" compatibility="6.3.000" expanded="true" height="60" name="Retrieve deneme_RP_ID3" width="90" x="45" y="30">
            <parameter key="repository_entry" value="../deneme_RP_ID3"/>
          </operator>
          <operator activated="true" class="x_prediction" compatibility="6.3.000" expanded="true" height="60" name="X-Prediction" width="90" x="179" y="30">
            <process expanded="true">
              <operator activated="true" class="id3" compatibility="6.3.000" expanded="true" height="76" name="ID3" width="90" x="112" y="30"/>
              <connect from_port="training" to_op="ID3" to_port="training set"/>
              <connect from_op="ID3" from_port="model" to_port="model"/>
              <portSpacing port="source_training" spacing="0"/>
              <portSpacing port="sink_model" spacing="0"/>
              <portSpacing port="sink_through 1" spacing="0"/>
            </process>
            <process expanded="true">
              <operator activated="true" class="apply_model" compatibility="6.3.000" expanded="true" height="76" name="Apply Model" width="90" x="45" y="30">
                <list key="application_parameters"/>
              </operator>
              <connect from_port="model" to_op="Apply Model" to_port="model"/>
              <connect from_port="unlabelled data" to_op="Apply Model" to_port="unlabelled data"/>
              <connect from_op="Apply Model" from_port="labelled data" to_port="labelled data"/>
              <portSpacing port="source_model" spacing="0"/>
              <portSpacing port="source_unlabelled data" spacing="0"/>
              <portSpacing port="source_through 1" spacing="0"/>
              <portSpacing port="sink_labelled data" spacing="0"/>
            </process>
          </operator>
          <operator activated="true" class="write_excel" compatibility="6.3.000" expanded="true" height="76" name="Write Excel" width="90" x="313" y="30">
            <parameter key="excel_file" value="C:\Users\serhat\Copy\SAU_Bahar_2015\Veri Madenciligi\Ödev2\Lastik.xlsx"/>
            <parameter key="file_format" value="xls"/>
          </operator>
          <operator activated="true" class="retrieve" compatibility="6.3.000" expanded="true" height="60" name="Retrieve deneme_RP_ID3 (2)" width="90" x="45" y="30">
            <parameter key="repository_entry" value="../deneme_RP_ID3"/>
          </operator>
          <connect from_op="Retrieve deneme_RP_ID3" from_port="output" to_op="X-Prediction" to_port="example set"/>
          <connect from_op="X-Prediction" from_port="labelled data" to_op="Write Excel" to_port="input"/>
          <connect from_op="Write Excel" from_port="through" to_port="result 1"/>
          <portSpacing port="source_input 1" spacing="0"/>
          <portSpacing port="sink_result 1" spacing="0"/>
          <portSpacing port="sink_result 2" spacing="0"/>
        </process>
      </operator>
    </process>
  • askural Member Posts: 5 Contributor II
    Hello,
    Martin!!!
    It works.
    Thank you so much.
    Regards,
    Aziz
  • MartinLiebig Administrator, Moderator, Employee, RapidMiner Certified Analyst, RapidMiner Certified Expert, University Professor Posts: 3,482 RM Data Scientist
    You are welcome :)