How to retain real nominal values after nominal2numerical and processing?

ruser · August 2009

I use these operators for 'Training' with the input data set:
- ExampleSource
- Nominal2Numerical
- LinearRegression
- ModelWriter

and I use the following ones for 'Testing' with the model created in the previous execution:
- ExampleSource
- Nominal2Numerical
- ModelLoader
- ModelApplier

In the output, the columns that held nominal values show the internal index numbers assigned to each nominal value instead.
For example, if I have a column 'login id' with values 'A1', 'A2', 'B3' etc., the output is generated with '0', '1', '2', etc. Basically, I lose the real nominal values and have to do the mapping manually to work out which output record belongs to which input record given in the 'Testing' phase.
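
To make the symptom concrete outside RapidMiner, here is a minimal Python sketch of index encoding. It is only an illustration: the helper `index_encode` is hypothetical, and assigning indices in order of first appearance is an assumption, not a statement about how Nominal2Numerical works internally.

```python
# Illustration only: mimics index encoding of a nominal column.
# This is not RapidMiner code; index_encode is a hypothetical helper.
def index_encode(values):
    """Map each distinct value to an integer index (assumed: first-appearance order)."""
    mapping = {}
    encoded = []
    for value in values:
        if value not in mapping:
            mapping[value] = len(mapping)
        encoded.append(mapping[value])
    return encoded, mapping


login_ids = ["A1", "A2", "B3", "A1"]
encoded, mapping = index_encode(login_ids)

# The encoded column alone no longer carries 'A1', 'A2', 'B3'.
# Only the mapping lets you translate the indices back.
inverse = {index: value for value, index in mapping.items()}
decoded = [inverse[index] for index in encoded]
```

Without the mapping, the indices in the output cannot be traced back to the original login ids, which is the manual step described above.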

kochan · August 2009

Hi,

If I understand your problem correctly, you can get around it by saving a copy of the original nominal value before you use Nominal2Numerical. So your process might look like:

- ExampleSource
- AttributeCopy
- ChangeAttributeRole
- Nominal2Numerical
- ModelLoader
- ModelApplier

In AttributeCopy you can create a copy of "login_id" and call it, for example, "login_id_copy". In ChangeAttributeRole you can then turn "login_id_copy" into, for example, an id attribute, which will be unaffected by Nominal2Numerical.
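
The same idea, sketched in plain Python rather than RapidMiner: copy the column, mark the copy as an id, and encode only the remaining string columns. The function `encode_regular_attributes`, the `duration` column and the sample rows are all made up for illustration.

```python
# Sketch of "copy the column, protect the copy, encode the rest".
# Not RapidMiner code; all names and sample data are hypothetical.
def encode_regular_attributes(rows, id_attributes):
    """Index-encode every string column except those listed as id attributes."""
    mappings = {}
    result = []
    for row in rows:
        new_row = {}
        for name, value in row.items():
            if name in id_attributes or not isinstance(value, str):
                new_row[name] = value  # ids and numbers pass through untouched
            else:
                column_map = mappings.setdefault(name, {})
                new_row[name] = column_map.setdefault(value, len(column_map))
        result.append(new_row)
    return result


rows = [
    {"login_id": "A1", "duration": 12.0},
    {"login_id": "A2", "duration": 3.0},
    {"login_id": "B3", "duration": 7.5},
]

# Step 1 (AttributeCopy): duplicate the nominal column.
with_copy = [dict(row, login_id_copy=row["login_id"]) for row in rows]

# Steps 2 and 3 (ChangeAttributeRole, Nominal2Numerical): the copy is treated
# as an id and skipped; only the original column is encoded.
encoded_rows = encode_regular_attributes(with_copy, id_attributes={"login_id_copy"})
```

Each output row then carries both the numeric index and the readable login id, so predictions can be matched to input records directly.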

Regards,

Andreas

fischer · August 2009

Umh, well. First, the aml files should actually define the mapping. Are you sure both mappings are defined equally?
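
Why the two mappings matter can be shown with a small Python sketch. It assumes, purely for illustration, that indices are handed out in order of first appearance; `build_mapping` is a hypothetical helper, not something RapidMiner provides.

```python
# Illustration only: two independently built mappings can disagree.
def build_mapping(values):
    """Assign indices to distinct values in order of first appearance (assumption)."""
    mapping = {}
    for value in values:
        mapping.setdefault(value, len(mapping))
    return mapping


train_ids = ["A1", "A2", "B3"]
test_ids = ["B3", "A1", "A2"]  # same values, different order

train_mapping = build_mapping(train_ids)
test_mapping = build_mapping(test_ids)  # built independently of training

# If this is False, index 0 means a different login id in each process.
mappings_agree = train_mapping == test_mapping

# Reusing the training mapping on the test data keeps the indices consistent.
test_encoded = [train_mapping[value] for value in test_ids]
```

If the two processes derive their mappings separately, the same index can stand for different login ids, so the model would be applied to inconsistent inputs.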

More importantly: Do you think your process setup makes a lot of sense? A linear regression over login ids doesn't look particularly promising to me. Maybe try Nominal2Binominal.
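
As a rough picture of the alternative, here is a Python sketch of one indicator column per nominal value, which is the general idea behind a binominal encoding. The helper `one_indicator_per_value` and the generated column names are hypothetical; the attributes Nominal2Binominal actually produces may be named and typed differently.

```python
# Sketch of one indicator column per nominal value. Not RapidMiner code.
def one_indicator_per_value(name, values):
    """Return one row per input value with a 0/1 indicator for each distinct value."""
    categories = sorted(set(values))
    return [
        {f"{name} = {category}": int(value == category) for category in categories}
        for value in values
    ]


login_ids = ["A1", "A2", "B3", "A1"]
indicator_rows = one_indicator_per_value("login_id", login_ids)
```

Unlike a single 0, 1, 2 column, this encoding does not impose an artificial order or distance between login ids, which is why it suits a linear model better.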

I may have got your question wrong: perhaps RapidMiner is not confusing the internal indices between the two processes, and your point is only that you dislike the "output", i.e. the fact that your predictions are 0, 1, 2... rather than login ids. Well, that corresponds perfectly to the fact that linear regression on nominal data disguised as numbers doesn't make much sense.

Best,
Simon

ruser · August 2009

Great, it works!

But now, I tried to remove the unwanted attribute ('login_id' which is shown with internal index values) from the output in the whole process.
I used
- AttributeFilter (attribute_name_filter, 'login_id', invert_filter=true)
- ChangeAttributeName (old_name='login_id_copy', new_name='login_id')

The 'AttributeFilter' operation works fine, i.e. it removes the 'login_id' column. But 'ChangeAttributeName' fails with the exception: 'Cannot rename attribute. Duplicate name: login_id'. So it looks like the attribute that was removed by 'AttributeFilter' is still kept internally, and that causes the error during the 'ChangeAttributeName' operation. How do we solve it?
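
For reference, this is the result I am after, sketched in plain Python. It does not diagnose the RapidMiner error; `drop_then_rename` and the sample row (including the `prediction` column) are hypothetical and only show the intended order of the two steps.

```python
# Sketch of the intended outcome: drop the encoded column, then rename the copy.
# Not RapidMiner code; the function and sample data are hypothetical.
def drop_then_rename(rows, drop_name, old_name, new_name):
    """Remove drop_name from every row, then rename old_name to new_name."""
    result = []
    for row in rows:
        kept = {key: value for key, value in row.items() if key != drop_name}
        if new_name in kept:
            # The rename is only safe once the old column is really gone.
            raise ValueError(f"Duplicate name: {new_name}")
        kept[new_name] = kept.pop(old_name)
        result.append(kept)
    return result


rows = [{"login_id": 0, "login_id_copy": "A1", "prediction": 3.2}]
cleaned = drop_then_rename(rows, "login_id", "login_id_copy", "login_id")
```

Here the rename succeeds because the encoded column has been fully removed first, which is what I expected the two operators to do in sequence.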
