Choose elements from Column

Me_Again447 Member Posts: 4 Contributor I
edited December 2018 in Help

My problem is that I need to remove all rows from a data sheet whose value in a specific column is unique (occurs only once).

 

For example, say there is a column whose values range from 1 to 9, and each value can occur anywhere from 0 to 100 times or more. If the values 1 and 2 each occur only once in the column, I want to remove their rows.

 

Any ideas?

 

Thanks

 

Best Answer

  • FBT Member Posts: 106 Unicorn
    Solution Accepted

    Ok, got it. It sounds like you could try the "Aggregate" operator with the aggregation function "Count" on your attributes, in order to get the values that should be filtered out (because they rarely show up). You could then use those values as input to the "Filter Examples" operator, e.g. via a macro ("Extract Macro" operator). You would need the "Multiply" operator to get separate copies of your data, though. It may become a bit labor-intensive if you have a huge number of attributes, but there is probably a way to handle that situation with a loop operator.
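
The count-then-filter idea above can be sketched outside RapidMiner in plain Python. This is only an illustration of the logic, not a RapidMiner process: the function name `drop_rare_rows`, the column name `code`, the `min_count` threshold, and the sample rows are all made up for the example.

```python
from collections import Counter

def drop_rare_rows(rows, column, min_count=2):
    """Keep only rows whose value in `column` occurs at least `min_count` times."""
    # Step 1 (the "Aggregate" + "Count" idea): count how often each value appears.
    counts = Counter(row[column] for row in rows)
    # Step 2 (the "Filter Examples" idea): keep rows whose value is frequent enough.
    return [row for row in rows if counts[row[column]] >= min_count]

# Hypothetical sample data: "B" and "GT" occur once, "A" and "AA" occur twice.
rows = [
    {"id": 1, "code": "A"},
    {"id": 2, "code": "B"},
    {"id": 3, "code": "A"},
    {"id": 4, "code": "GT"},
    {"id": 5, "code": "AA"},
    {"id": 6, "code": "AA"},
]

kept = drop_rare_rows(rows, "code")
print([row["id"] for row in kept])  # [1, 3, 5, 6]
```

Raising `min_count` to 3 would also drop values that occur "once or twice", which matches the case where most values are rare.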

Answers

  • Thomas_Ott RapidMiner Certified Analyst, RapidMiner Certified Expert, Member Posts: 1,761 Unicorn

    Use the RegEx parameter in Select Attributes, write the RegEx, and then toggle on Invert Condition.

  • Thomas_Ott RapidMiner Certified Analyst, RapidMiner Certified Expert, Member Posts: 1,761 Unicorn

    Ah, I just read your post a bit more closely: you want to remove rows based on a specific column value.

  • Me_Again447 Member Posts: 4 Contributor I

    I can't find the correct RegEx ... I can't work out the syntax for it :/

  • FBT Member Posts: 106 Unicorn

    You could try generating a filter attribute with "Generate Attributes" and then filter out the rows that have the specific filter value with "Filter Examples". If you can post a small subset of your data, I'll have a look. 
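
As a rough illustration of the flag-then-filter approach just described, here is a plain-Python sketch. It is not RapidMiner code: the helper names `flag_rare` and `filter_flagged`, the `is_rare` attribute, the `max_count` threshold, and the sample values are assumptions chosen for the example.

```python
from collections import Counter

def flag_rare(rows, column, max_count=1, flag="is_rare"):
    """Return copies of the rows with a boolean flag attribute added
    (the "Generate Attributes" step): True when the row's value in
    `column` occurs at most `max_count` times."""
    counts = Counter(row[column] for row in rows)
    return [{**row, flag: counts[row[column]] <= max_count} for row in rows]

def filter_flagged(rows, flag="is_rare"):
    """Drop the rows where the flag is set (the "Filter Examples" step)."""
    return [row for row in rows if not row[flag]]

# Hypothetical column: "1" and "2" occur once, "3" twice, "4" three times.
data = [{"result": v} for v in ["1", "2", "3", "3", "4", "4", "4"]]

flagged = flag_rare(data, "result")
cleaned = filter_flagged(flagged)
print([row["result"] for row in cleaned])  # ['3', '3', '4', '4', '4']
```

Keeping the flag as a separate attribute makes it easy to inspect which rows would be removed before actually filtering them.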

  • Me_Again447 Member Posts: 4 Contributor I

    There are almost 150 different values, and 80% of them occur only once or twice (example values: A, B, AA, CA, GT, etc.).

    I need to remove them in order to get a clean sample result.

     

  • Me_Again447 Member Posts: 4 Contributor I

    Thanks a lot ... the Aggregate and Filter operators did the trick and gave me the results I needed.

     

    Thanks for your help, all :)

  • JEdward RapidMiner Certified Analyst, RapidMiner Certified Expert, Member Posts: 578 Unicorn

    Interestingly, a similar problem and solution are taught in the official RM Radoop training.
    I recommend going through as many RapidMiner training courses as you can: besides a snazzy certificate, there are quite a few practical tips on how to approach data mining problems like this.

     

     
