NominalToNumerical inconsistency with different sources

pablo_admig · March 2011

The situation can be reproduced with the "Apply to Test Set" template by using data with, e.g., one nominal column and replacing the kNN model with a Neural Network.

To use the Neural Network (or any algorithm that does not support nominal attributes), I have to convert that attribute to a numerical one with the NominalToNumerical operator, and RapidMiner creates a "mapping" for each category. For example, the operator reads the category "Sunny" in that column and assigns the number 1, reads the category "Cloudy" and assigns the number 2, and so on.

The problem arises when this mapping is not the same in the training and test sets. I need two NominalToNumerical operators (one for the training set and one for the test set), and they are not related, so each one numbers the categories in the order in which they appear in its own table. For example, if the first record of the training set has "Sunny", it is converted to 1. And if the first record of the test set has "Cloudy", it is converted to 1 as well! So for the neural network Cloudy = Sunny, which is a serious problem.
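To make the concern concrete, here is a minimal Python sketch (not RapidMiner code) that assumes each independent conversion numbers categories 1, 2, 3, ... in order of first appearance, as described above. The helper `first_seen_mapping` and the sample data are hypothetical.

```python
def first_seen_mapping(values):
    """Assign codes 1, 2, 3, ... in order of first appearance.

    This is an assumed, simplified model of an independent nominal-to-numerical
    conversion, not RapidMiner's actual implementation.
    """
    mapping = {}
    for v in values:
        if v not in mapping:
            mapping[v] = len(mapping) + 1
    return mapping

# Hypothetical training and test columns
training = ["Sunny", "Cloudy", "Rainy", "Sunny"]
test = ["Cloudy", "Sunny", "Cloudy"]

# Each set gets its own, unrelated mapping
train_map = first_seen_mapping(training)
test_map = first_seen_mapping(test)

print(train_map)  # {'Sunny': 1, 'Cloudy': 2, 'Rainy': 3}
print(test_map)   # {'Cloudy': 1, 'Sunny': 2}
```

Under this assumption, code 1 means "Sunny" in the training data but "Cloudy" in the test data.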

I would like to know whether there is a solution for this within the RapidMiner environment.

Thanks in advance,
Pablo.

IngoRM · March 2011

Hi Pablo,

yes, there is a solution: as far as I know, you don't have to worry about this.

The neural net model, like all other models, keeps the header information of the example set used for training. This header also contains the mapping that was used, i.e. the fact that "Sunny" was assigned to "1" and so on. During model application, an incoming test set value such as "1" is first translated back to "Cloudy" (since this was the transformation used in the test set), and "Cloudy" is then transformed again, based on the training header information, to "2" before the model is actually applied. So there is no serious problem, at least as long as no bug prevents this automatic nominal remapping, as one did a couple of years ago.
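The remapping Ingo describes can be sketched in plain Python as follows. This is only an illustration of the idea (decode a test code through the test mapping, then re-encode the category through the training mapping), not RapidMiner's internal code; the mappings and the function `remap_to_training` are hypothetical.

```python
# Hypothetical mappings from the example: each set numbered its categories independently
train_map = {"Sunny": 1, "Cloudy": 2}
test_map = {"Cloudy": 1, "Sunny": 2}

def remap_to_training(codes, test_map, train_map):
    """Translate test-set codes into training-set codes via the category names.

    Raises KeyError for a category that never appeared in training; this sketch
    does not try to handle unseen values.
    """
    # Invert the test-set mapping so each code can be turned back into its category
    decode = {code: category for category, code in test_map.items()}
    # Decode each test code to its category, then encode it with the training mapping
    return [train_map[decode[code]] for code in codes]

test_codes = [1, 2, 1]  # i.e. Cloudy, Sunny, Cloudy under the test-set mapping
remapped = remap_to_training(test_codes, test_map, train_map)
print(remapped)  # [2, 1, 2]
```

The model therefore sees "Cloudy" as 2 in both phases, even though the test set's own conversion labeled it 1.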

If you want to transform the values yourself in order to be absolutely sure, without relying on the automatic mechanism described above, you could of course first use the "Map" operator to map the nominal values to "nominal" numbers and afterwards use "Parse Numbers" to transform them into real numbers. But I would not bother with this.
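The manual alternative boils down to defining one mapping up front and applying it identically to both sets. The Python sketch below mirrors the two steps ("Map" to number-like strings, then "Parse Numbers") under that assumption; `FIXED_MAPPING`, `map_values`, and `parse_numbers` are hypothetical names, not RapidMiner parameters.

```python
# One fixed mapping, defined once and shared by both sets (hypothetical values)
FIXED_MAPPING = {"Sunny": "1", "Cloudy": "2", "Rainy": "3"}

def map_values(values, mapping):
    # Analogous to "Map": replace each nominal value with a number written as text
    return [mapping[v] for v in values]

def parse_numbers(values):
    # Analogous to "Parse Numbers": convert the text values into real numbers
    return [float(v) for v in values]

training = ["Sunny", "Cloudy", "Rainy", "Sunny"]
test = ["Cloudy", "Sunny", "Cloudy"]

train_numeric = parse_numbers(map_values(training, FIXED_MAPPING))
test_numeric = parse_numbers(map_values(test, FIXED_MAPPING))
print(train_numeric)  # [1.0, 2.0, 3.0, 1.0]
print(test_numeric)   # [2.0, 1.0, 2.0]
```

Because both sets go through the same dictionary, "Cloudy" becomes 2.0 everywhere, regardless of the order in which values appear.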

Cheers,
Ingo

pablo_admig · March 2011

Ingo, thanks for the reply.
I tested this in detail with a simple example, and you're right: the prediction is the same. However, if I look at the outputs of the conversions in the training and test sets (with the label from the model), I can still see the "inconsistency". That is, if I look at the numbers instead of the categorical values together with their associated labels, the label calculation is consistent, but the numerical input columns shown in the table next to the label are not.
Is that clear?

Regards,
Pablo.

IngoRM · March 2011

Hi Pablo,

yes, I see. But rest assured: those "inconsistencies" only exist until the model is applied, because applying the model resolves them. So sometimes it's better not to look into the details too much.

Cheers,
Ingo
