Cleanup of metadata in the process

mfulgeri (Member, Contributor I)
edited December 2018 in Help

Hi guys, 

I'll try to be short and clear: after "dummyfying" a nominal attribute, I see that the system generates columns even for values that were filtered out earlier. The image below shows the process: first I filter out every country except Italy, Germany and Spain, and then I apply the "Nominal to Numerical" operator to dummy-code the Country column.

[Image: the whole process]

In the result, I can still see dummy columns generated for the countries that are no longer in the data.

[Image: result]

Now, as far as I understand, the system keeps some kind of "metadata" describing the initial state of the data (in this example, all the possible values of Country). The same thing happens when I open the statistics, so this is clearly intended (and helpful) behaviour.

[Image: statistics]

HERE IS THE QUESTION: is it possible to clean up this information during the execution of the process, so that after the dummy coding I would have just 3 columns?

One solution would be to write the data somewhere and load it back, but I was hoping for something more elegant. Any suggestions?

Thank you in advance!

Matteo

Best Answer

  • MartinLiebig (Administrator, Moderator, Employee, RapidMiner Certified Analyst, RapidMiner Certified Expert, University Professor, RM Data Scientist)
    Solution Accepted

    Hi,

    just use Remove Useless Values first.

    Best,

    Martin

    - Sr. Director Data Solutions, Altair RapidMiner -
    Dortmund, Germany
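As a rough illustration outside RapidMiner, the behaviour discussed in this thread can be modelled in plain Python. This is a minimal sketch under stated assumptions, not RapidMiner's implementation: the `NominalColumn` class, the helpers `filter_rows`, `remove_unused_values` and `dummy_code`, the column naming and the sample countries are all hypothetical. It shows why a declared value set that survives row filtering yields extra dummy columns, and how pruning the values that no longer occur in the data before dummy coding (the idea behind the answer above) leaves only the three expected columns.

```python
from dataclasses import dataclass


# Hypothetical model of a nominal attribute: the declared value set
# ("metadata") is stored separately from the actual row values.
@dataclass
class NominalColumn:
    name: str
    declared_values: list[str]  # every value the attribute was declared with
    data: list[str]             # values present in the current rows


def filter_rows(col: NominalColumn, keep: set[str]) -> NominalColumn:
    # Filtering drops rows but leaves declared_values untouched (stale metadata).
    return NominalColumn(col.name, col.declared_values,
                         [v for v in col.data if v in keep])


def remove_unused_values(col: NominalColumn) -> NominalColumn:
    # Keep only declared values that still occur in the data, preserving order.
    present = set(col.data)
    return NominalColumn(col.name,
                         [v for v in col.declared_values if v in present],
                         col.data)


def dummy_code(col: NominalColumn) -> dict[str, list[int]]:
    # One 0/1 column per *declared* value, whether or not it occurs in the data.
    return {f"{col.name} = {v}": [1 if x == v else 0 for x in col.data]
            for v in col.declared_values}


# Hypothetical sample data.
country = NominalColumn(
    "Country",
    ["Italy", "Germany", "Spain", "France", "Austria"],
    ["Italy", "France", "Germany", "Spain", "Austria", "Italy"],
)
filtered = filter_rows(country, {"Italy", "Germany", "Spain"})

print(len(dummy_code(filtered)))                        # 5: stale values still produce columns
print(len(dummy_code(remove_unused_values(filtered))))  # 3: only Italy, Germany, Spain
```

The design point is that dummy coding iterates over the declared values rather than the observed ones, so the cleanup has to happen on the value set itself; writing the data out and reading it back works only because re-reading rebuilds that set from the rows.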