input example set does not have a label attribute

filan · May 2017

Is there a way to get by without having a label in the test data?

Thomas_Ott · May 2017

Ah, your process is not quite right. I would use the Cross Validation building block and encapsulate the Linear Regression, Apply Model, and Performance operators in there, then pass the test set to the resulting model. See this example:

<?xml version="1.0" encoding="UTF-8"?><process version="7.5.001">
  <context>
    <input/>
    <output/>
    <macros/>
  </context>
  <operator activated="true" class="process" compatibility="7.5.001" expanded="true" name="Process">
    <process expanded="true">
      <operator activated="true" class="retrieve" compatibility="7.5.001" expanded="true" height="68" name="Training Set" width="90" x="45" y="34">
        <parameter key="repository_entry" value="//Samples/data/Sonar"/>
      </operator>
      <operator activated="true" class="concurrency:cross_validation" compatibility="7.5.001" expanded="true" height="145" name="Validation" width="90" x="313" y="34">
        <parameter key="sampling_type" value="shuffled sampling"/>
        <process expanded="true">
          <operator activated="true" class="h2o:generalized_linear_model" compatibility="7.2.000" expanded="true" height="82" name="Generalized Linear Model" width="90" x="45" y="34">
            <list key="beta_constraints"/>
            <list key="expert_parameters"/>
          </operator>
          <connect from_port="training set" to_op="Generalized Linear Model" to_port="training set"/>
          <connect from_op="Generalized Linear Model" from_port="model" to_port="model"/>
          <portSpacing port="source_training set" spacing="0"/>
          <portSpacing port="sink_model" spacing="0"/>
          <portSpacing port="sink_through 1" spacing="0"/>
          <description align="left" color="green" colored="true" height="113" resized="true" width="284" x="33" y="148">Builds a model on the current training data set (90 % of the data by default, 10 times).&lt;br&gt;&lt;br&gt;Make sure that you only put numerical attributes into a linear regression!</description>
        </process>
        <process expanded="true">
          <operator activated="true" class="apply_model" compatibility="7.5.001" expanded="true" height="82" name="Apply Model" width="90" x="45" y="34">
            <list key="application_parameters"/>
          </operator>
          <operator activated="true" class="performance" compatibility="7.5.001" expanded="true" height="82" name="Performance" width="90" x="179" y="34"/>
          <connect from_port="model" to_op="Apply Model" to_port="model"/>
          <connect from_port="test set" to_op="Apply Model" to_port="unlabelled data"/>
          <connect from_op="Apply Model" from_port="labelled data" to_op="Performance" to_port="labelled data"/>
          <connect from_op="Performance" from_port="performance" to_port="performance 1"/>
          <connect from_op="Performance" from_port="example set" to_port="test set results"/>
          <portSpacing port="source_model" spacing="0"/>
          <portSpacing port="source_test set" spacing="0"/>
          <portSpacing port="source_through 1" spacing="0"/>
          <portSpacing port="sink_test set results" spacing="0"/>
          <portSpacing port="sink_performance 1" spacing="0"/>
          <portSpacing port="sink_performance 2" spacing="0"/>
          <description align="left" color="blue" colored="true" height="107" resized="true" width="333" x="28" y="139">Applies the model built from the training data set on the current test set (10 % by default).&lt;br/&gt;The Performance operator calculates performance indicators and sends them to the operator result.</description>
        </process>
        <description align="center" color="transparent" colored="false" width="126">A cross validation including a linear regression.</description>
      </operator>
      <operator activated="true" class="retrieve" compatibility="7.5.001" expanded="true" height="68" name="Testing Set" width="90" x="45" y="289">
        <parameter key="repository_entry" value="//Samples/data/Sonar"/>
      </operator>
      <operator activated="true" class="apply_model" compatibility="7.5.001" expanded="true" height="82" name="Apply Model (2)" width="90" x="581" y="187">
        <list key="application_parameters"/>
      </operator>
      <connect from_op="Training Set" from_port="output" to_op="Validation" to_port="example set"/>
      <connect from_op="Validation" from_port="model" to_op="Apply Model (2)" to_port="model"/>
      <connect from_op="Validation" from_port="performance 1" to_port="result 2"/>
      <connect from_op="Testing Set" from_port="output" to_op="Apply Model (2)" to_port="unlabelled data"/>
      <connect from_op="Apply Model (2)" from_port="labelled data" to_port="result 1"/>
      <portSpacing port="source_input 1" spacing="0"/>
      <portSpacing port="sink_result 1" spacing="0"/>
      <portSpacing port="sink_result 2" spacing="0"/>
      <portSpacing port="sink_result 3" spacing="0"/>
    </process>
  </operator>
</process>

This way you get an honest evaluation and performance measure of your model.

Good luck on your Kaggle, let us know how well you place. We'll give you some swag.

Thomas_Ott · May 2017

Hey, there are no dumb questions.

GLM is a better algorithm in some cases. You can swap it out for Linear Regression if you like. In fact, you might even want to try an SVM in some cases, because of the "no free lunch" theorem.
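As a rough sketch of that swap, the Generalized Linear Model operator and its two connections in the training subprocess of the process above could be replaced with something like the following. The operator key `linear_regression` and the default parameters are assumptions here, so the easiest route is to drag the operator in from the Operators panel and let RapidMiner write the XML; also check that the learner supports your label type.

```xml
<!-- Assumption: "linear_regression" is the operator key of the core Linear Regression learner -->
<operator activated="true" class="linear_regression" compatibility="7.5.001" expanded="true" height="82" name="Linear Regression" width="90" x="45" y="34"/>
<!-- Same wiring as the GLM in the original process: training data in, model out -->
<connect from_port="training set" to_op="Linear Regression" to_port="training set"/>
<connect from_op="Linear Regression" from_port="model" to_port="model"/>
```

Everything outside the training subprocess (Apply Model, Performance, and the final Apply Model (2) on the test set) stays the same.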

To output the performance results, just connect the PER port on the Cross Validation operator to a RES port.
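In the example process posted earlier in this thread, that connection is the following line in the top-level process. The port names `performance 1` and `result 2` come from that XML; the result port number in your own process may differ.

```xml
<!-- Sends the cross-validation performance from the Validation operator to a process result port -->
<connect from_op="Validation" from_port="performance 1" to_port="result 2"/>
```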

filan · May 2017

I was able to run the process successfully but I have 2 questions.

First, why should we use GLM instead of the normal Linear Regression operator?

Second, is there a method to output my model's accuracy after running the processes?

My apologies if my questions are dumb; it's my second time using RapidMiner :cathappy:
