How to retain real nominal values after nominal2numerical and processing?

ruser (Member, Posts: 40, Maven)
edited November 2018 in Help
For training, I use these operators on the input data set:
- ExampleSource
- Nominal2Numerical
- LinearRegression
- ModelWriter

For testing, I use the following operators with the model created in the previous run:
- ExampleSource
- Nominal2Numerical
- ModelLoader
- ModelApplier

In the output, the columns that held nominal values show the internal index numbers that were assigned to each nominal value.
For example, if I have a column 'login id' with values 'A1', 'A2', 'B3' etc., the output contains '0', '1', '2', ... instead. Basically, I lose the real nominal values and have to do the mapping manually to understand which output record belongs to which input record in the testing phase.
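To make the problem concrete, here is a conceptual sketch in plain Python (not RapidMiner's internals) of what index encoding does and how a reverse lookup would recover the original values. It assumes each distinct value gets the next integer in order of first appearance, mirroring the 'A1' -> 0, 'A2' -> 1 example; the function names are hypothetical.

```python
# Conceptual sketch only: plain Python, not RapidMiner code.
# Assumption: indices are assigned in order of first appearance.

def build_mapping(values):
    """Assign an integer index to each distinct nominal value."""
    mapping = {}
    for v in values:
        mapping.setdefault(v, len(mapping))
    return mapping

def encode(values, mapping):
    """Replace nominal values by their indices (what the output shows)."""
    return [mapping[v] for v in values]

def decode(indices, mapping):
    """Invert the mapping to recover the original nominal values."""
    inverse = {i: v for v, i in mapping.items()}
    return [inverse[i] for i in indices]

login_ids = ["A1", "A2", "B3", "A1"]
mapping = build_mapping(login_ids)
encoded = encode(login_ids, mapping)
print(encoded)                   # [0, 1, 2, 0]
print(decode(encoded, mapping))  # ['A1', 'A2', 'B3', 'A1']
```

The manual mapping described above amounts to the `decode` step; it only works if you still have the exact mapping that was used for encoding.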

Answers

  • kochan (Member, Posts: 11, Contributor II)
    Hi,

    If I understand your problem correctly, you can get around it by saving a copy of the original nominal value before you apply Nominal2Numerical. Your process might then look like this:

    - ExampleSource
    - AttributeCopy
    - ChangeAttributeRole
    - Nominal2Numerical
    - ModelLoader
    - ModelApplier

    In AttributeCopy you can create a copy of "login_id" and name it, for example, "login_id_copy". In ChangeAttributeRole you can then turn "login_id_copy" into, for example, an id attribute, which Nominal2Numerical leaves unaffected.
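    The idea behind this workaround can be modelled in plain Python. This is not RapidMiner code: `attribute_copy`, `change_role` and `nominal_to_numerical` are hypothetical stand-ins for the operators listed above, and treating every string column with the "regular" role as nominal is a simplifying assumption. The point it shows is that the conversion only touches regular attributes, so a copy with the id role keeps its original values.

```python
# Conceptual sketch: rows are dicts, roles maps attribute name -> role.
# Helper names are hypothetical stand-ins for RapidMiner operators.

def attribute_copy(rows, roles, source, target):
    """Copy one attribute's values (and role) under a new name."""
    for row in rows:
        row[target] = row[source]
    roles[target] = roles[source]

def change_role(roles, name, role):
    """Give an attribute a special role such as 'id'."""
    roles[name] = role

def nominal_to_numerical(rows, roles):
    """Index-encode regular string attributes; special roles are skipped."""
    for name, role in roles.items():
        if role != "regular":
            continue
        if all(isinstance(row[name], str) for row in rows):
            mapping = {}
            for row in rows:
                row[name] = mapping.setdefault(row[name], len(mapping))

rows = [{"login_id": "A1", "x": 1.0},
        {"login_id": "A2", "x": 2.0},
        {"login_id": "B3", "x": 3.0}]
roles = {"login_id": "regular", "x": "regular"}

attribute_copy(rows, roles, "login_id", "login_id_copy")
change_role(roles, "login_id_copy", "id")
nominal_to_numerical(rows, roles)

print([r["login_id"] for r in rows])       # [0, 1, 2]
print([r["login_id_copy"] for r in rows])  # ['A1', 'A2', 'B3']
```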

    Regards,

    Andreas
  • fischer (Member, Posts: 439, Maven)
    Um, well. First, the .aml files should actually define the mapping. Are you sure both mappings are defined identically?
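    Why identical mappings matter can be shown with a small plain-Python sketch. It is illustrative only: it assumes indices are assigned in order of first appearance and does not model how RapidMiner stores the mapping in the .aml file. If the test set builds its own mapping, the same login id can end up with a different number than it had during training.

```python
# Illustrative sketch, not RapidMiner code.

def build_mapping(values):
    """Index distinct values in order of first appearance."""
    mapping = {}
    for v in values:
        mapping.setdefault(v, len(mapping))
    return mapping

train = ["A1", "A2", "B3"]
test = ["B3", "A1"]

train_map = build_mapping(train)  # {'A1': 0, 'A2': 1, 'B3': 2}
test_map = build_mapping(test)    # {'B3': 0, 'A1': 1}

# Reusing the training mapping keeps the numbers consistent with the model.
print([train_map[v] for v in test])  # [2, 0]
# A mapping built from the test data alone gives the same ids other numbers.
print([test_map[v] for v in test])   # [0, 1]
```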

    More importantly: Do you think your process setup makes a lot of sense? A linear regression over login ids doesn't look particularly promising to me. Maybe try Nominal2Binominal.
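    The difference between index coding and binary (dummy) coding can be sketched in plain Python. Creating one 0/1 column per distinct value is roughly what Nominal2Binominal is meant for; the exact attribute names and value types RapidMiner produces may differ, and the helpers here are hypothetical. With index coding, a linear model fits a single coefficient w, so 'B3' (index 2) is forced to contribute exactly twice as much as 'A2' (index 1); with one column per value, each login id gets its own coefficient.

```python
# Illustrative sketch, not RapidMiner code.

def index_code(values):
    """One integer per value: implies an order and spacing that isn't there."""
    mapping = {}
    for v in values:
        mapping.setdefault(v, len(mapping))
    return [mapping[v] for v in values]

def one_hot(name, values):
    """One 0/1 column per distinct value (dummy coding)."""
    categories = sorted(set(values))
    return [{f"{name} = {c}": int(v == c) for c in categories} for v in values]

ids = ["A1", "A2", "B3"]
print(index_code(ids))  # [0, 1, 2]
for row in one_hot("login_id", ids):
    print(row)
```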

    Or maybe I got your question wrong: RapidMiner is not confusing the internal indices between the two processes, and your point is only that you dislike the "output", i.e. the fact that your predictions are 0, 1, 2, ... rather than login ids. Well, that corresponds exactly to the fact that linear regression on nominal data disguised as numbers doesn't make much sense.

    Best,
    Simon
  • ruser (Member, Posts: 40, Maven)
    Great, it works!

    But now I'm trying to remove the unwanted attribute ('login_id', which shows the internal index values) from the output of the whole process.
    I used:
    - AttributeFilter (attribute_name_filter, 'login_id', invert_filter=true)
    - ChangeAttributeName (old_name='login_id_copy', new_name='login_id')

    The AttributeFilter operator works fine, i.e. it removes the 'login_id' column. But ChangeAttributeName fails with the exception 'Cannot rename attribute. Duplicate name: login_id'. So it looks like the attribute removed by AttributeFilter is still kept internally, and that causes the error in ChangeAttributeName. How do we solve this?