Testing a model

nicugeorgian Member Posts: 31 Maven
edited November 2018 in Help
Hi,

I have a model obtained by learning from a training set. More precisely, it is the regression tree model produced by the W-M5P learner.

I then apply this model to a test set which has the same structure as the training set, i.e., nominal attributes and a numerical (to-be-predicted) label.

When using the ModelApplier operator, I get the following warnings and error:

Warning 1: The number of nominal values is not the same for training and application for attribute 'x', training 150, application 100

Warning 2: The internal nominal mappings are not the same between training and application for attribute 'y'. This will probably lead to wrong results during model application.

Error: AttributeTypeException caught: Attribute 'z'. Cannot map index of nominal attribute to nominal value: index 81 is out of bounds!

Concerning Warning 1: I think it's nothing to worry about. The training set is larger than the test set anyway.

Concerning Warning 2: I don't really get the message here. Why do the two internal mappings differ?

Concerning Error: Could this show up because there are (nominal) values of the attribute 'z' in the test set that are not among the values of the same attribute 'z' in the training set? Why would this be so? There would anyway be a branch of the regression tree those values can be mapped to, right? After all, every tree node's underlying boolean condition can be evaluated regardless of whether some test values cannot be found among the training values.

Thanks,
Geo

Answers

  • TobiasMalbrecht Moderator, Employee, Member Posts: 294 RM Product Management
    Hi Geo,
    nicugeorgian wrote:

    Concerning Error: Could this show up because there are (nominal) values of the attribute 'z' in the test set that are not among the values of the same attribute 'z' in the training set? Why would this be so? There would anyway be a branch of the regression tree those values can be mapped to, right? After all, every tree node's underlying boolean condition can be evaluated regardless of whether some test values cannot be found among the training values.
    This error presumably does show up for the reason you mention. But it should not result in an error, as the values which occur only in the test set simply do not appear in the attribute-value tests in the tree. Hence, the behaviour you observed seems to be a bug. We will look into the issue and keep you informed if we find anything irregular.
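
    To illustrate the difference between the two behaviours, here is a minimal sketch in plain Python. It is not RapidMiner or Weka code; the helper names, the example values and the first-seen ordering of the mapping are all assumptions made for illustration. It contrasts a node test that compares values directly with a lookup that decodes a test-set index through the training mapping:

```python
# Hypothetical illustration only; not RapidMiner or Weka code.

def build_mapping(values):
    """Assign each distinct nominal value an index in first-seen order (assumed)."""
    mapping = []
    for v in values:
        if v not in mapping:
            mapping.append(v)
    return mapping

train_z = ["a", "b", "c"]
test_z = ["b", "d", "a", "e"]  # "d" and "e" never occur in training

train_map = build_mapping(train_z)  # index -> value for the training data
test_map = build_mapping(test_z)    # index -> value for the test data

def node_test_by_value(value, accepted):
    """A tree node test on the value itself: unseen values simply fail the test."""
    return value in accepted

def decode_with_training_mapping(index):
    """Decode an index with the training mapping; fails if the index is too large."""
    return train_map[index]

# Value-based evaluation works for every test value, seen or unseen.
by_value = [node_test_by_value(v, {"a", "b"}) for v in test_z]

# Index-based decoding either returns the wrong value or goes out of bounds.
decoded = []
failures = []
for v in test_z:
    idx = test_map.index(v)
    try:
        decoded.append((v, decode_with_training_mapping(idx)))
    except IndexError:
        failures.append((v, idx))

print(by_value)
print(decoded)
print(failures)
```

    In this sketch the value-based test never fails, while the index-based lookup silently decodes some values wrongly and raises for an index beyond the training mapping, which is the kind of situation the error message describes.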

    Regards,
    Tobias
  • nicugeorgian Member Posts: 31 Maven
    Tobias, thanks for the answer.

    Please let me know when you find something.

    Geo
  • martyns Member Posts: 15 Maven
    This is happening to me right now.

    My training set is over 1000 instances and the test set is 50 instances, so a warning that some values do not exist in the test set seems well and good, but the other warning "internal nominal mappings are not the same" I have no idea about.

    It seems that the results are coming out almost correctly but are swapped around for the class label. In my data set there are 2 class labels Successful and Unsuccessful. There are 37 Successful and 13 unsuccessful in the test set.

    When I add up the numbers on the confusion matrix there are 13 listed as being "true successful" and 37 as "true unsuccessful".

    If I break after loading the testexamplesource it looks correct.

    Was there any resolution to this problem?
  • steffen Member Posts: 347 Maven
    Hello Martyns

    The problem is that RapidMiner stores nominal values via an internal mapping. This mapping is built the first time you load a dataset in a non-RapidMiner format. I assume that your test and training sets are stored in different files. In that case the internal mappings can differ. To work around this problem you can:
    • Check the *.aml files to compare the order of the nominal values and adjust them if necessary.
    • Reload the data from the original source so that training and test data stay in one set, and use a self-constructed attribute to mark which examples are training and which are test.
    Please note that for a binominal mapping the so-called "positive class" has to be the second value in the sequence of nominal values (in the *.aml file). This sequence has changed in your case, which explains the swapping of Successful and Unsuccessful.
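
    The effect can be sketched in plain Python. This is only an illustration under assumptions: it is not RapidMiner code, the helper first_seen_mapping is hypothetical, and it assumes that a mapping assigns indices in the order in which values are first encountered. The label counts are the ones from the post above.

```python
# Hypothetical illustration only; not RapidMiner code.
from collections import Counter

def first_seen_mapping(values):
    """Map each distinct nominal value to an index in first-seen order (assumed)."""
    mapping = {}
    for v in values:
        mapping.setdefault(v, len(mapping))
    return mapping

# The training file happens to start with "Unsuccessful", the test file with "Successful".
train_labels = ["Unsuccessful", "Successful", "Successful"]
test_labels = ["Successful"] * 37 + ["Unsuccessful"] * 13

train_map = first_seen_mapping(train_labels)  # Unsuccessful -> 0, Successful -> 1
test_map = first_seen_mapping(test_labels)    # Successful -> 0, Unsuccessful -> 1

# Reading test indices through the training mapping swaps the two classes.
train_names = sorted(train_map, key=train_map.get)
misread = Counter(train_names[test_map[v]] for v in test_labels)

# Workaround in the spirit of the second bullet: build one shared mapping.
shared = first_seen_mapping(train_labels + test_labels)
shared_names = sorted(shared, key=shared.get)
correct = Counter(shared_names[shared[v]] for v in test_labels)

print(misread)
print(correct)
```

    With separate mappings the 37 Successful examples are read as Unsuccessful and vice versa; with one shared mapping the counts come out as expected.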

    If your data is already in one set and you still get this error, please post the process setup.

    hope this was helpful

    regards,

    Steffen
  • Bella Member Posts: 4 Contributor I
    Hello,
    I decided to add my problem here, as I think it is similar to the one discussed above. I am using RapidMiner 5.3 with a decision tree as the model. I have already trained it on labeled data and saved the model. What I am trying to do is run the model on unlabeled data. Unfortunately, I get this error message for several attributes (the ones occurring in my decision tree):

    Tree: The internal nominal mappings are not the same between training and application for attribute ...

    I have seen the discussion about changing the *.aml files, but I do not know where to find them. What else could be done to fix the problem?

    Thanks a lot for the advice
    Regards
    Bella
  • memon_mehran Member Posts: 2 Contributor I
    Sir, kindly give me a hint. When I run the Naive Bayes algorithm on data in numerical format, it gives an error saying that Naive Bayes does not support numerical data. How can I use this algorithm on numerical data?