
"Error using Adaboost"

inthewoods Member Posts: 9 Contributor II
edited May 2019 in Help
I get the following error when I try to run a simulation with the AdaBoost component:

Exception: com.rapidminer.example.AttributeTypeException
Message: Cannot map index of nominal attribute to nominal value: index 4 is out of bounds!
Stack trace:

  com.rapidminer.example.table.PolynominalMapping.mapIndex(PolynominalMapping.java:137)
  com.rapidminer.operator.learner.meta.AdaBoostModel.evaluateSpecialAttributes(AdaBoostModel.java:231)
  com.rapidminer.operator.learner.meta.AdaBoostModel.performPrediction(AdaBoostModel.java:166)
  com.rapidminer.operator.learner.PredictionModel.apply(PredictionModel.java:76)
  com.rapidminer.operator.ModelApplier.doWork(ModelApplier.java:100)
  com.rapidminer.operator.Operator.execute(Operator.java:771)
  com.rapidminer.operator.execution.SimpleUnitExecutor.execute(SimpleUnitExecutor.java:51)
  com.rapidminer.operator.ExecutionUnit.execute(ExecutionUnit.java:709)
  com.rapidminer.operator.OperatorChain.doWork(OperatorChain.java:368)
  com.rapidminer.operator.Operator.execute(Operator.java:771)
  com.rapidminer.Process.run(Process.java:899)
  com.rapidminer.Process.run(Process.java:795)
  com.rapidminer.Process.run(Process.java:790)
  com.rapidminer.Process.run(Process.java:780)
  com.rapidminer.gui.ProcessThread.run(ProcessThread.java:62)

Here's the setup:
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<process version="5.0">
  <context>
    <input/>
    <output/>
    <macros/>
  </context>
  <operator activated="true" class="process" compatibility="5.0.11" expanded="true" name="Process">
    <process expanded="true" height="341" width="605">
      <operator activated="true" class="retrieve" compatibility="5.0.11" expanded="true" height="60" name="Retrieve" width="90" x="45" y="30">
        <parameter key="repository_entry" value="//NewLocalRepository/SPY_test_data"/>
      </operator>
      <operator activated="true" class="retrieve" compatibility="5.0.11" expanded="true" height="60" name="Retrieve (2)" width="90" x="99" y="164">
        <parameter key="repository_entry" value="//NewLocalRepository/SPY_apply_model"/>
      </operator>
      <operator activated="true" class="series:windowing" compatibility="5.0.2" expanded="true" height="76" name="Windowing" width="90" x="179" y="30">
        <parameter key="horizon" value="1"/>
        <parameter key="window_size" value="1"/>
        <parameter key="create_label" value="true"/>
        <parameter key="label_attribute" value="ROC-1"/>
      </operator>
      <operator activated="true" class="adaboost" compatibility="5.0.11" expanded="true" height="76" name="AdaBoost" width="90" x="357" y="35">
        <process expanded="true" height="315" width="605">
          <operator activated="true" class="parallel:decision_tree_weight_based_parallel" compatibility="5.0.1" expanded="true" height="60" name="DecisionTree (Weight-Based)" width="90" x="243" y="58">
            <process expanded="true">
              <portSpacing port="source_training set" spacing="0"/>
              <portSpacing port="sink_weights" spacing="0"/>
            </process>
          </operator>
          <connect from_port="training set" to_op="DecisionTree (Weight-Based)" to_port="training set"/>
          <connect from_op="DecisionTree (Weight-Based)" from_port="model" to_port="model"/>
          <portSpacing port="source_training set" spacing="0"/>
          <portSpacing port="sink_model" spacing="0"/>
        </process>
      </operator>
      <operator activated="true" class="apply_model" compatibility="5.0.11" expanded="true" height="76" name="Apply Model (2)" width="90" x="380" y="210">
        <list key="application_parameters"/>
      </operator>
      <connect from_op="Retrieve" from_port="output" to_op="Windowing" to_port="example set input"/>
      <connect from_op="Retrieve (2)" from_port="output" to_op="Apply Model (2)" to_port="unlabelled data"/>
      <connect from_op="Windowing" from_port="example set output" to_op="AdaBoost" to_port="training set"/>
      <connect from_op="AdaBoost" from_port="model" to_op="Apply Model (2)" to_port="model"/>
      <connect from_op="Apply Model (2)" from_port="labelled data" to_port="result 1"/>
      <portSpacing port="source_input 1" spacing="0"/>
      <portSpacing port="sink_result 1" spacing="0"/>
      <portSpacing port="sink_result 2" spacing="0"/>
    </process>
  </operator>
</process>
I've submitted the bug, but I was wondering if anyone had any insight as to what I'm doing wrong.

Thanks!

Answers

  • land RapidMiner Certified Analyst, RapidMiner Certified Expert, Member Posts: 2,531 Unicorn
    Hi,
    I guess the two data sets contain different nominal values. I admit this shouldn't cause any problems, but if you first combine both data sets and then split them into a training set and a test set, this error won't occur.

    Greetings,
    Sebastian
  • inthewoods Member Posts: 9 Contributor II
    If you look at the way I've got it set up, I've got two different data sets. I'm feeding in a test data set, then outputting a model and applying that model to a new data set. So I don't think what you've highlighted is the problem. Any other thoughts?
  • inthewoods Member Posts: 9 Contributor II
    Whoops, Sebastian, I misread what you wrote. In answer to your question, the two data sets have the same data, but I don't know what you mean by having different nominal values. I'm afraid my level of math isn't high enough to understand the definition!
  • land RapidMiner Certified Analyst, RapidMiner Certified Expert, Member Posts: 2,531 Unicorn
    Hi,
    Actually, nominal values have nothing to do with math: nominal values are non-numerical values such as words. What can happen is this:
    You have a training data set containing examples of things in two different colors, say "red" and "green". But what happens if the color "blue" then appears in the test set? The model doesn't know this value, because it simply can't know that it exists. This is a general problem, and all the model could (and definitely should) do is throw a better and more detailed error message.

    To avoid this problem, append one data set to the other and then split the result again. Both data sets then know which values exist in the combined data, and the model will be able to cope with them.

    Anyway, I will look into the problem causing the crash right now.

    Greetings,
    Sebastian
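
For readers who want to see the idea in the last reply in concrete form, here is a minimal Python sketch. It is a toy illustration only, not RapidMiner's actual implementation: the class `NominalMapping` and the helper `encode` are hypothetical names made up for this example, and the colors are the ones from the reply. It assumes each data set stores nominal values as integer indices into its own value list, so an index produced by one list can be out of bounds in another. Building a single list from both data sets first (the "append, then split" advice) removes the mismatch.

```python
class NominalMapping:
    """Toy index <-> value mapping for one nominal attribute (hypothetical)."""

    def __init__(self):
        self.values = []    # position in this list is the index
        self.index_of = {}  # reverse lookup: value -> index

    def add(self, value):
        """Register a value if it is new and return its index."""
        if value not in self.index_of:
            self.index_of[value] = len(self.values)
            self.values.append(value)
        return self.index_of[value]

    def map_index(self, index):
        """Return the value stored at index, or fail if it is unknown."""
        if not 0 <= index < len(self.values):
            raise IndexError(f"index {index} is out of bounds")
        return self.values[index]


def encode(column, mapping):
    """Turn a list of nominal values into indices using the given mapping."""
    return [mapping.add(value) for value in column]


train_colors = ["red", "green", "red"]
apply_colors = ["red", "green", "blue"]  # "blue" never occurs in training

# Case 1: each data set builds its own mapping.
train_map = NominalMapping()
encode(train_colors, train_map)           # train_map knows red and green
apply_map = NominalMapping()
apply_idx_separate = encode(apply_colors, apply_map)

failed = False
try:
    # Decoding the second set's indices with the first set's mapping
    # breaks as soon as the index for "blue" shows up.
    [train_map.map_index(i) for i in apply_idx_separate]
except IndexError:
    failed = True

# Case 2: build one mapping from the combined data, then split again.
shared_map = NominalMapping()
encode(train_colors + apply_colors, shared_map)
apply_idx_shared = encode(apply_colors, shared_map)
decoded = [shared_map.map_index(i) for i in apply_idx_shared]
```

In the first case the lookup fails because the training-side mapping holds only two values; in the second case every index resolves, because both halves were encoded against the same combined list.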