Dynamic Attribute Filter
When testing, I read data from a CSV. I'd like to limit the samples to a set of categories that is dynamically generated from a training set.
The training set might only have 20 categories but the test set could have 200. I only want to test on the 20.
The rest of the samples will be filtered out.
I read in the training set and extract the category list.
I remove duplicates to now have a unique list of categories.
This is what I want to filter my test set on.
I save the list to a file for later lookup if needed.
Now I'd like to read in the test data, filter on that list of categories, and press on with testing.
How would I do such a thing?
Thanks.
Best Answers
fstarsinic Member Posts: 20
Contributor II
I realized I could solve this by taking the unique list of categories and performing an inner join (Join operator) with the test set, using the category column as the key attribute. That removes all the unwanted samples. Easy!
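For anyone who wants to see the same idea outside of RapidMiner, here is a rough sketch in plain Python. It is not the RapidMiner process itself: the CSV contents, the column names `category` and `text`, and the helper `keep_known_categories` are all made up for illustration. Because the category list is already unique and holds only the key column, keeping the test rows whose key appears in that list gives the same rows an inner join on `category` would.

```python
import csv
import io

# Hypothetical stand-ins for the training and test CSV files.
TRAIN_CSV = """category,text
sports,match report
politics,election news
sports,transfer rumour
"""

TEST_CSV = """category,text
sports,cup final
weather,storm warning
politics,budget vote
travel,hotel review
"""


def read_rows(csv_text):
    """Parse CSV text into a list of dicts, one per row."""
    return list(csv.DictReader(io.StringIO(csv_text)))


def keep_known_categories(rows, category_rows, key="category"):
    """Keep only the rows whose key value appears in category_rows.

    With a duplicate-free, key-only category list this matches the
    result of an inner join on the key column.
    """
    allowed = {row[key] for row in category_rows}
    return [row for row in rows if row[key] in allowed]


train_rows = read_rows(TRAIN_CSV)
test_rows = read_rows(TEST_CSV)

# Extract the category column from the training set and remove duplicates.
unique_categories = [
    {"category": c} for c in sorted({row["category"] for row in train_rows})
]

# Test rows with categories unseen in training (weather, travel) are dropped.
filtered = keep_known_categories(test_rows, unique_categories)
```

The unique list could also be written to a file at this point for later lookup, as described in the question.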
MartinLiebig Administrator, Moderator, Employee, RapidMiner Certified Analyst, RapidMiner Certified Expert, University Professor Posts: 3,525
RM Data Scientist
Hi @fstarsinic,
this is a great solution, and I hope I would have recommended the same if I had seen this earlier! Beautiful!
Best,
Martin
- Sr. Director Data Solutions, Altair RapidMiner -
Dortmund, Germany