Re: Evaluating numeric label

tektek Member Posts: 19 Contributor II
edited November 2018 in Help

Answers

  • earmijo Member Posts: 270 Unicorn
    Hi Tek:

    You are confused. The X-Validation operator does work with both numeric and categorical labels. Here's a quick example (it uses the default performance measure for numerical labels, RMSE):
    <?xml version="1.0" encoding="UTF-8" standalone="no"?>
    <process version="5.1.006">
      <context>
        <input/>
        <output/>
        <macros/>
      </context>
      <operator activated="true" class="process" compatibility="5.1.006" expanded="true" name="Process">
        <process expanded="true" height="550" width="480">
          <operator activated="true" class="retrieve" compatibility="5.1.006" expanded="true" height="60" name="Retrieve" width="90" x="112" y="75">
            <parameter key="repository_entry" value="//Samples/data/Polynomial"/>
          </operator>
          <operator activated="true" class="x_validation" compatibility="5.0.000" expanded="true" height="112" name="Validation" width="90" x="313" y="75">
            <description>Cross Validation with Neural Nets.</description>
            <process expanded="true" height="654" width="466">
              <operator activated="true" class="neural_net" compatibility="5.1.006" expanded="true" height="76" name="Neural Net" width="90" x="188" y="30">
                <list key="hidden_layers"/>
              </operator>
              <connect from_port="training" to_op="Neural Net" to_port="training set"/>
              <connect from_op="Neural Net" from_port="model" to_port="model"/>
              <portSpacing port="source_training" spacing="0"/>
              <portSpacing port="sink_model" spacing="0"/>
              <portSpacing port="sink_through 1" spacing="0"/>
            </process>
            <process expanded="true" height="654" width="466">
              <operator activated="true" class="apply_model" compatibility="5.0.000" expanded="true" height="76" name="Apply Model" width="90" x="45" y="30">
                <list key="application_parameters"/>
              </operator>
              <operator activated="true" class="performance_regression" compatibility="5.1.006" expanded="true" height="76" name="Performance" width="90" x="255" y="30"/>
              <connect from_port="model" to_op="Apply Model" to_port="model"/>
              <connect from_port="test set" to_op="Apply Model" to_port="unlabelled data"/>
              <connect from_op="Apply Model" from_port="labelled data" to_op="Performance" to_port="labelled data"/>
              <connect from_op="Performance" from_port="performance" to_port="averagable 1"/>
              <portSpacing port="source_model" spacing="0"/>
              <portSpacing port="source_test set" spacing="0"/>
              <portSpacing port="source_through 1" spacing="0"/>
              <portSpacing port="sink_averagable 1" spacing="0"/>
              <portSpacing port="sink_averagable 2" spacing="0"/>
            </process>
          </operator>
          <connect from_op="Retrieve" from_port="output" to_op="Validation" to_port="training"/>
          <connect from_op="Validation" from_port="averagable 1" to_port="result 1"/>
          <portSpacing port="source_input 1" spacing="0"/>
          <portSpacing port="sink_result 1" spacing="0"/>
          <portSpacing port="sink_result 2" spacing="0"/>
        </process>
      </operator>
    </process>
  • tektek Member Posts: 19 Contributor II
    Hi,

    indeed, confused I am. : )

    Thanks for the reply. I only recently started with RM and I am using it for my studies at university. I hope you understand if my questions are kind of stupid.
    I pasted the code you replied into RM and actually it gave me an error message:

    Message: X-Validation cannot handle numerical label.
    Fixes:
    -Switch sampling to shuffled
    -Add discretization operator
    Location: Validation.Training

    That's the problem I tried to describe above. : /

    If I use the fix "Switch sampling to shuffled", it actually works, though I cannot understand why. How is the sampling method related to the data types?

    You mentioned that X-Validation uses the root mean squared error for evaluating numeric labels. What is the second error given (the "mikro" one):

    root_mean_squared_error: 23.233 +/- 11.548 (mikro: 25.945 +/- 0.000)

    Thanks a lot for further replies!
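On the question of how the sampling method relates to the label type, here is a minimal sketch. It assumes that the error comes from stratified sampling being the active sampling type; the thread only shows that switching to shuffled sampling removes the error. Stratified sampling groups examples by label class so that every fold keeps the class proportions, which is only meaningful for nominal labels. Shuffled sampling never looks at the label. The function names `shuffled_folds` and `stratified_folds` are hypothetical, and this is not RapidMiner's implementation.

```python
import random
from collections import defaultdict


def shuffled_folds(labels, k, seed=0):
    """Shuffle the example indices and deal them into k folds.

    The label values are never inspected, so numeric labels are fine.
    """
    indices = list(range(len(labels)))
    random.Random(seed).shuffle(indices)
    return [sorted(indices[i::k]) for i in range(k)]


def stratified_folds(labels, k, seed=0):
    """Deal the indices of each class into k folds to keep class proportions.

    Toy rule (an assumption for illustration): float labels are treated as
    numeric and rejected, because a continuous label has no classes.
    """
    if any(isinstance(value, float) for value in labels):
        raise ValueError("stratified sampling needs a nominal label")
    by_class = defaultdict(list)
    for index, label in enumerate(labels):
        by_class[label].append(index)
    rng = random.Random(seed)
    folds = [[] for _ in range(k)]
    for members in by_class.values():
        rng.shuffle(members)
        for position, index in enumerate(members):
            folds[position % k].append(index)
    return [sorted(fold) for fold in folds]


if __name__ == "__main__":
    nominal = ["a"] * 6 + ["b"] * 3
    print(stratified_folds(nominal, 3))   # each fold: two "a", one "b"
    numeric = [1.5, 2.0, 3.25, 4.0, 5.5, 6.0]
    print(shuffled_folds(numeric, 3))     # works, labels are ignored
```

With a numeric label, only the shuffled variant (or a linear split) has something sensible to do, which would explain why the suggested fixes are either to change the sampling type or to discretize the label.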
  • earmijo Member Posts: 270 Unicorn
    Don't worry, Tek. Your questions are not stupid. We have all been there. (I wasn't trying to put you down or anything like that. I was just stating the fact that X-Validation handles both numeric and categorical labels.)

    About your reply: the error you are getting typically means that the learning operator you are using does not accept numeric labels (for instance, if you try to use a Decision Tree with a numerical label). Since the learning operator is inside the X-Validation operator, it looks as if the latter is the one complaining. Please upload your process so I can take a look at it.

    For your question about mikro, I'll refer you to an entry by Ingo Mierswa:

    http://rapid-i.com/rapidforum/index.php/topic,3718.0.html
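As a rough illustration of the two numbers in the output above, here is a sketch under the assumption that the first value is the mean of the per-fold RMSEs (a macro average, with its standard deviation across folds) and that "mikro" is a single RMSE computed over the predictions of all folds pooled together. The function names and the sample errors are hypothetical; the linked thread is the authoritative explanation.

```python
import math
from statistics import mean


def rmse(errors):
    """Root mean squared error of a list of prediction errors."""
    return math.sqrt(mean(e * e for e in errors))


def macro_rmse(fold_errors):
    """Average of the RMSE values computed separately for each fold."""
    return mean(rmse(errors) for errors in fold_errors)


def micro_rmse(fold_errors):
    """One RMSE over the errors of all folds pooled together."""
    pooled = [e for errors in fold_errors for e in errors]
    return rmse(pooled)


if __name__ == "__main__":
    # Made-up prediction errors for two test folds.
    folds = [[3, 4], [0, 0]]
    print("macro:", macro_rmse(folds))
    print("micro:", micro_rmse(folds))  # 2.5
```

Under this reading, the pooled value is a single number, so it has no spread across folds, which would match the "+/- 0.000" shown next to it.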
  • tektek Member Posts: 19 Contributor II
    Hey there,

    thanks for the micro thread.

    Here is the requested code:

    This process gives me an error message (I hope it does as well for you). One remark: even though the "description" for X-Validation says it uses a decision tree, it actually uses a neural net. I replaced the decision tree.

    <?xml version="1.0" encoding="UTF-8" standalone="no"?>
    <process version="5.1.006">
      <context>
        <input/>
        <output/>
        <macros/>
      </context>
      <operator activated="true" class="process" compatibility="5.1.006" expanded="true" name="Process">
        <process expanded="true" height="393" width="748">
          <operator activated="true" class="read_excel" compatibility="5.1.006" expanded="true" height="60" name="Read Excel" width="90" x="45" y="30">
            <parameter key="excel_file" value="c:\abx.xls"/>
            <parameter key="imported_cell_range" value="A1:AM280"/>
            <parameter key="first_row_as_names" value="false"/>
            <list key="annotations"/>
            <list key="data_set_meta_data_information">
              <parameter key="0" value="ID.true.polynominal.id"/>
              <parameter key="1" value="attribute1.true.polynominal.attribute"/>
              <parameter key="2" value="attribute2.true.polynominal.attribute"/>
              <parameter key="3" value="attribute3.true.polynominal.attribute"/>
              <parameter key="4" value="attribute4.true.polynominal.attribute"/>
              <parameter key="5" value="attribute5.true.polynominal.attribute"/>
              <parameter key="6" value="attribute6.true.polynominal.attribute"/>
              <parameter key="7" value="attribute7.true.polynominal.attribute"/>
              <parameter key="8" value="attribute8.true.polynominal.attribute"/>
              <parameter key="9" value="attribute9.true.numeric.attribute"/>
              <parameter key="10" value="attribute10.true.numeric.attribute"/>
              <parameter key="11" value="attribute11.true.numeric.attribute"/>
              <parameter key="12" value="attribute12.true.numeric.attribute"/>
              <parameter key="13" value="attribute13.true.polynominal.attribute"/>
              <parameter key="14" value="attribute14.true.numeric.label"/>
            </list>
          </operator>
          <operator activated="true" class="filter_example_range" compatibility="5.1.006" expanded="true" height="76" name="Filter Example Range" width="90" x="179" y="30">
            <parameter key="first_example" value="1"/>
            <parameter key="last_example" value="4"/>
            <parameter key="invert_filter" value="true"/>
          </operator>
          <operator activated="true" class="nominal_to_numerical" compatibility="5.1.006" expanded="true" height="94" name="Nominal to Numerical" width="90" x="313" y="30"/>
          <operator activated="true" class="x_validation" compatibility="5.1.002" expanded="true" height="112" name="Validation" width="90" x="447" y="30">
            <description>A cross-validation evaluating a decision tree model.</description>
            <process expanded="true" height="654" width="466">
              <operator activated="true" class="neural_net" compatibility="5.1.006" expanded="true" height="76" name="Neural Net" width="90" x="188" y="30">
                <list key="hidden_layers"/>
              </operator>
              <connect from_port="training" to_op="Neural Net" to_port="training set"/>
              <connect from_op="Neural Net" from_port="model" to_port="model"/>
              <portSpacing port="source_training" spacing="0"/>
              <portSpacing port="sink_model" spacing="0"/>
              <portSpacing port="sink_through 1" spacing="0"/>
            </process>
            <process expanded="true" height="654" width="466">
              <operator activated="true" class="apply_model" compatibility="5.1.006" expanded="true" height="76" name="Apply Model" width="90" x="45" y="30">
                <list key="application_parameters"/>
              </operator>
              <operator activated="true" class="performance" compatibility="5.1.006" expanded="true" height="76" name="Performance" width="90" x="179" y="30"/>
              <connect from_port="model" to_op="Apply Model" to_port="model"/>
              <connect from_port="test set" to_op="Apply Model" to_port="unlabelled data"/>
              <connect from_op="Apply Model" from_port="labelled data" to_op="Performance" to_port="labelled data"/>
              <connect from_op="Performance" from_port="performance" to_port="averagable 1"/>
              <portSpacing port="source_model" spacing="0"/>
              <portSpacing port="source_test set" spacing="0"/>
              <portSpacing port="source_through 1" spacing="0"/>
              <portSpacing port="sink_averagable 1" spacing="0"/>
              <portSpacing port="sink_averagable 2" spacing="0"/>
            </process>
          </operator>
          <connect from_op="Read Excel" from_port="output" to_op="Filter Example Range" to_port="example set input"/>
          <connect from_op="Filter Example Range" from_port="example set output" to_op="Nominal to Numerical" to_port="example set input"/>
          <connect from_op="Nominal to Numerical" from_port="example set output" to_op="Validation" to_port="training"/>
          <connect from_op="Validation" from_port="averagable 1" to_port="result 1"/>
          <portSpacing port="source_input 1" spacing="0"/>
          <portSpacing port="sink_result 1" spacing="0"/>
          <portSpacing port="sink_result 2" spacing="0"/>
        </process>
      </operator>
    </process>
    This one actually is error free:

    <?xml version="1.0" encoding="UTF-8" standalone="no"?>
    <process version="5.1.006">
      <context>
        <input/>
        <output/>
        <macros/>
      </context>
      <operator activated="true" class="process" compatibility="5.1.006" expanded="true" name="Process">
        <parameter key="parallelize_main_process" value="true"/>
        <process expanded="true" height="393" width="701">
          <operator activated="true" class="read_excel" compatibility="5.1.006" expanded="true" height="60" name="Read Excel" width="90" x="45" y="30">
            <parameter key="excel_file" value="c:\abx"/>
            <parameter key="imported_cell_range" value="A1:AM280"/>
            <parameter key="first_row_as_names" value="false"/>
            <list key="annotations"/>
            <list key="data_set_meta_data_information">
              <parameter key="0" value="ID.true.polynominal.id"/>
              <parameter key="1" value="attribute1.true.polynominal.attribute"/>
              <parameter key="2" value="attribute2.true.polynominal.attribute"/>
              <parameter key="3" value="attribute3.true.polynominal.attribute"/>
              <parameter key="4" value="attribute4.true.polynominal.attribute"/>
              <parameter key="5" value="attribute5.true.polynominal.attribute"/>
              <parameter key="6" value="attribute6.true.polynominal.attribute"/>
              <parameter key="7" value="attribute7.true.polynominal.attribute"/>
              <parameter key="8" value="attribute8.true.polynominal.attribute"/>
              <parameter key="9" value="attribute9.true.polynominal.attribute"/>
              <parameter key="10" value="attribute10.true.polynominal.attribute"/>
              <parameter key="11" value="attribute11.true.polynominal.attribute"/>
              <parameter key="12" value="attribute12.true.polynominal.attribute"/>
              <parameter key="13" value="attribute13.true.polynominal.attribute"/>
              <parameter key="14" value="attribute14.true.polynominal.label"/>
            </list>
          </operator>
          <operator activated="true" class="filter_example_range" compatibility="5.1.006" expanded="true" height="76" name="Filter Example Range" width="90" x="112" y="30">
            <parameter key="first_example" value="1"/>
            <parameter key="last_example" value="4"/>
            <parameter key="invert_filter" value="true"/>
          </operator>
          <operator activated="true" class="set_role" compatibility="5.1.006" expanded="true" height="76" name="Set Role" width="90" x="179" y="30">
            <parameter key="name" value="ID"/>
            <parameter key="target_role" value="id"/>
            <list key="set_additional_roles">
              <parameter key="Reibung" value="label"/>
            </list>
          </operator>
          <operator activated="true" class="set_role" compatibility="5.1.006" expanded="true" height="76" name="Set Role (2)" width="90" x="246" y="30">
            <parameter key="name" value="Reibung"/>
            <parameter key="target_role" value="label"/>
            <list key="set_additional_roles">
              <parameter key="Reibung" value="label"/>
            </list>
          </operator>
          <operator activated="true" class="nominal_to_numerical" compatibility="5.1.006" expanded="true" height="94" name="Nominal to Numerical" width="90" x="380" y="30"/>
          <operator activated="true" class="x_validation" compatibility="5.1.006" expanded="true" height="112" name="Validation" width="90" x="514" y="30">
            <description>A cross-validation evaluating a decision tree model.</description>
            <parameter key="sampling_type" value="linear sampling"/>
            <parameter key="parallelize_training" value="true"/>
            <parameter key="parallelize_testing" value="true"/>
            <process expanded="true" height="393" width="165">
              <operator activated="true" class="neural_net" compatibility="5.1.006" expanded="true" height="76" name="Neural Net" width="90" x="45" y="30">
                <list key="hidden_layers"/>
              </operator>
              <connect from_port="training" to_op="Neural Net" to_port="training set"/>
              <connect from_op="Neural Net" from_port="model" to_port="model"/>
              <portSpacing port="source_training" spacing="0"/>
              <portSpacing port="sink_model" spacing="0"/>
              <portSpacing port="sink_through 1" spacing="0"/>
            </process>
            <process expanded="true" height="393" width="300">
              <operator activated="true" class="apply_model" compatibility="5.1.006" expanded="true" height="76" name="Apply Model" width="90" x="45" y="30">
                <list key="application_parameters"/>
              </operator>
              <operator activated="true" class="performance" compatibility="5.1.006" expanded="true" height="76" name="Performance" width="90" x="180" y="30"/>
              <connect from_port="model" to_op="Apply Model" to_port="model"/>
              <connect from_port="test set" to_op="Apply Model" to_port="unlabelled data"/>
              <connect from_op="Apply Model" from_port="labelled data" to_op="Performance" to_port="labelled data"/>
              <connect from_op="Performance" from_port="performance" to_port="averagable 1"/>
              <portSpacing port="source_model" spacing="0"/>
              <portSpacing port="source_test set" spacing="0"/>
              <portSpacing port="source_through 1" spacing="0"/>
              <portSpacing port="sink_averagable 1" spacing="0"/>
              <portSpacing port="sink_averagable 2" spacing="0"/>
            </process>
          </operator>
          <connect from_op="Read Excel" from_port="output" to_op="Filter Example Range" to_port="example set input"/>
          <connect from_op="Filter Example Range" from_port="example set output" to_op="Set Role" to_port="example set input"/>
          <connect from_op="Set Role" from_port="example set output" to_op="Set Role (2)" to_port="example set input"/>
          <connect from_op="Set Role (2)" from_port="example set output" to_op="Nominal to Numerical" to_port="example set input"/>
          <connect from_op="Nominal to Numerical" from_port="example set output" to_op="Validation" to_port="training"/>
          <connect from_op="Nominal to Numerical" from_port="original" to_port="result 2"/>
          <connect from_op="Validation" from_port="averagable 1" to_port="result 1"/>
          <portSpacing port="source_input 1" spacing="0"/>
          <portSpacing port="sink_result 1" spacing="0"/>
          <portSpacing port="sink_result 2" spacing="0"/>
          <portSpacing port="sink_result 3" spacing="0"/>
        </process>
      </operator>
    </process>
    Maybe the explanation is the following: In the first process, the numeric label is defined in the "retrieve" operator. In the second process, the label is defined in "set role" and afterwards changed to numeric. Does that make a difference?

    Please understand that I cannot post the excel sheet, since it includes internal data. : /

    Thanks for the replies!

    /edit: I just noticed that you used a different performance evaluator than I did. Why did you use the "performance(regression)" operator?
  • earmijo Member Posts: 270 Unicorn
    Hi Tek:

    You can still use the Performance operator, which defaults to the RMSE metric for numerical labels, but if you want control over the exact metric it uses, then you have to use the one I used.

    I understand you cannot post the data. I tried to run your example with a similar dataset and had no problem whatsoever. I noticed a small detail too: when you read the Excel sheet, you have to declare the first row as Name (you do this in Step 3 of 4 of the Data Import Wizard). Click on the first row (in the first column, labeled Annotation) and select Name. Try doing that.
  • tektek Member Posts: 19 Contributor II
    Hey,

    okay, thanks a lot. But actually, the first row of the spreadsheet isn't the names. There are 3 to 4 almost empty rows, and a couple of attributes aren't named either, so I decided to rename them manually.

    Regarding the evaluator: thanks again, I am going to try around some more. Maybe I can find the error. I am just glad it works now. : )

    Thanks!