
One-Hot Encoding Top 10 Items (Fractional) Rest Other

Zarrok Member Posts: 3 Learner I
Hello everyone,

I am looking for a smart way to one-hot encode only the top 10 items (by fractional share) of an attribute.
Currently I solve the problem by creating a new attribute for each of the top 10 values. For example,
for each attribute I need to generate a new column:
if((contains([Attri],"Example Data")) ,1,0) 

Does anybody have a smarter solution for this kind of problem?

Kind regards,
ZaRRoK

Answers

  • MartinLiebig Administrator, Moderator, Employee, RapidMiner Certified Analyst, RapidMiner Certified Expert, University Professor Posts: 3,528 RM Data Scientist
    Hi,
    you could probably just use Remove Rare Values first and then One Hot Encoding?
    BR,
    Martin
    - Sr. Director Data Solutions, Altair RapidMiner -
    Dortmund, Germany
  • Zarrok Member Posts: 3 Learner I
    edited July 2022
    I understand what you mean. The problem is rather that I have a large dataset with about 4000 groups, of which I would like to look at the top 100; all others should be labeled "Other". That would give me 101 columns.
    The top 100 groups account for about 70% of the total.
  • MartinLiebig Administrator, Moderator, Employee, RapidMiner Certified Analyst, RapidMiner Certified Expert, University Professor Posts: 3,528 RM Data Scientist
    Yes, that's why I would propose using the Remove Rare Values operator to replace all strings that are not in the top 100 with "Other".
  • Zarrok Member Posts: 3 Learner I
    I have found a solution, but I am not happy with it. I created an aggregation (fractional) and joined it back to the table. Then I generate a new attribute that, depending on the share, either keeps the original value or sets it to "Other".
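
For readers who want to see the logic outside RapidMiner, the idea discussed in this thread (compute each value's share, keep the top N, label everything else "Other", then one-hot encode) can be sketched in plain Python. This is only an illustration under stated assumptions, not a RapidMiner process: the function name `top_n_one_hot`, its parameters, and the sample data are hypothetical, and it assumes that no real value is already called "Other".

```python
from collections import Counter


def top_n_one_hot(values, n=10, other_label="Other"):
    """One-hot encode the n most frequent values; lump the rest into one column.

    Hypothetical helper for illustration. Assumes other_label does not
    occur as a real value in the data.
    """
    counts = Counter(values)
    total = len(values)

    # Fractional share of each distinct value (the "aggregation (fractional)" step).
    shares = {value: count / total for value, count in counts.items()}

    # The n most frequent values, most frequent first.
    top = [value for value, _ in counts.most_common(n)]
    top_set = set(top)

    # One column per top value plus a single catch-all column.
    columns = top + [other_label]

    rows = []
    for value in values:
        # Keep the value if it is in the top n, otherwise map it to the catch-all.
        key = value if value in top_set else other_label
        rows.append({column: int(column == key) for column in columns})

    return columns, rows, shares


if __name__ == "__main__":
    data = ["a", "a", "a", "b", "b", "c", "d"]
    columns, rows, shares = top_n_one_hot(data, n=2)
    print(columns)
    for row in rows:
        print(row)
```

With about 4000 groups and n=100 this produces 101 columns, matching the count mentioned above. Selecting by rank (top N) and selecting by a minimum share are different criteria; the sketch uses rank and returns the shares only so that a threshold could be applied instead.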
