Keywords
One follow-up question:
Right now we have two streams: the first reads the Excel list with the texts and the second reads the one with the keywords. We then used Process Documents with Tokenize on both paths (1. Read Excel, 2. Nominal to Text, 3. Process Documents from Data, 4. Data to Documents - Process Documents). We see all the words and the rows in which they occur, but is it possible to filter the results or change a setting so that we only see the keywords? That would be much more convenient because we wouldn't have to look through all the words.
Best Answer
yyhuang Administrator, Employee, RapidMiner Certified Analyst, RapidMiner Certified Expert, Member Posts: 364
RM Data Scientist
Hi @Tim91,
When you use the "Process Documents from Data" operator, have you tried supplying a short word list for keyword detection? Please check out the process in the community repository:
//Community Samples/Community Data Science/Text Mining Tutorials by Neil McGuigan/Part 6 - Applying the Model to New Documents/2 - Applying the Model to New Documents
This way, only the keywords listed in the word list will be tokenized.
Cheers,
YY
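For anyone who wants to see the idea outside RapidMiner, here is a minimal Python sketch of the same principle: tokenize every text, but keep counts only for the words on a fixed keyword list. It is not the RapidMiner process from the repository above. The sample texts, the keyword list, and the function names `tokenize` and `keyword_counts` are all made up for illustration and stand in for the two Excel sheets.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into runs of letters."""
    # [^\W\d_]+ matches letters only (no digits, no underscore).
    return re.findall(r"[^\W\d_]+", text.lower())


def keyword_counts(texts, keywords):
    """Return one dict per text holding counts for the listed keywords only."""
    vocab = {kw.lower() for kw in keywords}
    rows = []
    for text in texts:
        # Tokens that are not on the keyword list are simply dropped.
        counts = Counter(tok for tok in tokenize(text) if tok in vocab)
        rows.append({kw: counts[kw] for kw in sorted(vocab)})
    return rows


# Hypothetical sample data (assumption, not from the original thread).
texts = [
    "The pump failed after the valve leaked.",
    "No leak found; pump is fine.",
]
keywords = ["pump", "valve", "leak"]

for row_number, counts in enumerate(keyword_counts(texts, keywords), start=1):
    print(row_number, counts)
```

Note that matching is exact in this sketch, so "leaked" does not count as "leak". If inflected forms should match, a stemming step would be needed on both the texts and the keywords.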
Answers
Cheers
Tim