Filter examples with dictionary
Hi,
I'm looking for a way to filter examples out of a data set using a dictionary of words. It would be a "Filter Examples" operator that works like "Replace (Dictionary)". It would filter out every example whose chosen attribute contains a word from the dictionary (or, with the "invert filter" option, keep only those examples).
Best,
Answers
Best,
Dortmund, Germany
To explain: when you have a dictionary of words that you are sure only one category of people uses, such a feature would let you isolate the entire rows (i.e. the speakers) and then filter or split the original data set. That could be very relevant for the subsequent analysis. I would therefore like a feature that matches a set of words, supplied via an Excel file (the dictionary), against the attribute containing the text (verbatim). The "Replace (Dictionary)" operator is the idea, but the output would be to filter the entire row rather than to replace one string with another.
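To make the request concrete, here is a minimal sketch in plain Python (not a RapidMiner process) of the behaviour I have in mind. The function name `filter_examples`, the `verbatim` column and the sample words are all made up for illustration, and I assume whole-word, case-insensitive matching; by default the matching rows are kept, and `invert=True` keeps the others.

```python
import re

def filter_examples(rows, attribute, dictionary, invert=False):
    """Keep rows whose `attribute` text contains at least one dictionary word.

    With invert=True, keep only the rows that contain none of the words.
    Matching is whole-word and case-insensitive (an assumption).
    """
    words = [w for w in dictionary if w]
    if words:
        # One pattern that matches any dictionary word as a whole word.
        pattern = re.compile(
            r"\b(?:" + "|".join(re.escape(w) for w in words) + r")\b",
            re.IGNORECASE,
        )
    else:
        pattern = None  # An empty dictionary matches nothing.

    def matches(text):
        return pattern is not None and pattern.search(text) is not None

    return [row for row in rows if matches(row[attribute]) != invert]

# Hypothetical data: each dict is one example (row) of the data set.
rows = [
    {"id": 1, "verbatim": "I love my tractor"},
    {"id": 2, "verbatim": "The meeting ran late"},
    {"id": 3, "verbatim": "Harvest was good"},
]
dictionary = ["tractor", "harvest"]

kept = filter_examples(rows, "verbatim", dictionary)
dropped = filter_examples(rows, "verbatim", dictionary, invert=True)
```

Here `kept` holds the rows of the speakers who used a dictionary word and `dropped` holds the rest, which is the split of the original data set I am after.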
best,
Dortmund, Germany
You're definitely right, and I've done it. Thanks for the solution.
A few questions remain.
First, here's what I've done:
1- LEVEL 0 => a "Loop Attributes" operator to loop over the attributes of the dictionary:
2- LEVEL -1 => a "Loop Values" operator within the "Loop Attributes" operator =>
3- LEVEL -2 => inside the "Loop Values" operator:
The remaining questions, for which I couldn't find a solution myself:
A- Inside "Loop Values": I added a "Generate Attributes" operator after the "Filter Examples" operator so that the output keeps, for each row of the data set, first the names of the dictionary categories and second the number of times the dictionary words were matched. The problem is that I could not find a way to calculate the value of the generated attribute:
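To show what I am trying to compute, here is a rough sketch in plain Python, outside RapidMiner. It assumes the dictionary is a mapping from category name (one attribute of the dictionary file) to its list of words; the names `annotate_matches`, `matched_categories` and `match_count` are hypothetical, and matching is whole-word and case-insensitive.

```python
import re

def annotate_matches(rows, attribute, dictionary):
    """Add two generated attributes to every row:

    matched_categories: names of the dictionary categories found in the text
    match_count: total number of dictionary word occurrences in the text
    """
    annotated = []
    for row in rows:
        text = row[attribute]
        counts = {}
        # Outer loop: dictionary attributes (categories).
        for category, words in dictionary.items():
            n = 0
            # Inner loop: the values (words) of that category.
            for word in words:
                pattern = r"\b" + re.escape(word) + r"\b"
                n += len(re.findall(pattern, text, re.IGNORECASE))
            if n:
                counts[category] = n
        new_row = dict(row)  # Copy so the input rows stay unchanged.
        new_row["matched_categories"] = ", ".join(counts)
        new_row["match_count"] = sum(counts.values())
        annotated.append(new_row)
    return annotated

# Hypothetical dictionary and data.
dictionary = {"farming": ["tractor", "harvest"], "office": ["meeting"]}
rows = [
    {"id": 1, "verbatim": "The tractor broke before the harvest meeting"},
    {"id": 2, "verbatim": "Nothing relevant here"},
]
result = annotate_matches(rows, "verbatim", dictionary)
```

The two loops mirror the "Loop Attributes" and "Loop Values" levels; what I am missing is the RapidMiner equivalent of accumulating `counts` across the iterations for each row.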
Thanks in advance for all help!
best,