Text Processing Help! (Beginner at RapidMiner)
antioquia_jonas
Member Posts: 1 Contributor I
I'm new to RapidMiner and I want to generate n-grams from an Excel file that contains comments and replies from forum posts. My process currently contains the following operators: Data, Process Documents (with Tokenize, Filter Stopwords (English), Generate n-grams, and Filter Tokens by Length), and Write Excel. I'm not sure why my results show every possible combination of words in the data instead of only the combinations that occur twice or more. Maybe I'm missing an important detail. I really need urgent help! TIA!
(The images below depict my current problem.)
[Image: what I want it to look like] [Image: what it actually looks like]
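To make the intended result concrete outside RapidMiner, here is a minimal Python sketch of the same pipeline order (tokenize, drop stopwords, build n-grams, filter by length) followed by a frequency filter. It is only an illustration: the tiny stopword list, the underscore joiner, the sample comments, and the parameter names `max_n`, `min_count`, and `min_len` are all assumptions, and "occurs twice or more" is read here as the total count across all comments. Inside RapidMiner, I believe the Process Documents operator has a prune method parameter meant for this kind of filtering, but please check that against your version.

```python
import re
from collections import Counter

# Tiny illustrative stopword list (an assumption, not RapidMiner's English list).
STOPWORDS = {"the", "a", "is", "and", "to", "of", "in", "it"}

def tokenize(text):
    # Lowercase the text and keep runs of letters as tokens.
    return re.findall(r"[a-z]+", text.lower())

def ngrams(tokens, max_n=2):
    # Build all n-grams from length 1 up to max_n, joined with "_".
    out = []
    for n in range(1, max_n + 1):
        for i in range(len(tokens) - n + 1):
            out.append("_".join(tokens[i:i + n]))
    return out

def frequent_ngrams(comments, max_n=2, min_count=2, min_len=3):
    # Count every n-gram over all comments, then keep only those
    # that occur at least min_count times in total.
    counts = Counter()
    for comment in comments:
        tokens = [t for t in tokenize(comment) if t not in STOPWORDS]
        counts.update(g for g in ngrams(tokens, max_n) if len(g) >= min_len)
    return {g: c for g, c in counts.items() if c >= min_count}

# Hypothetical sample data standing in for the Excel column of comments.
comments = [
    "The server is down again",
    "server down after update",
    "Update fixed it",
]
result = frequent_ngrams(comments)
print(result)
```

Without the final `min_count` filter, every n-gram that appears even once is kept, which matches the "all possible combinations" symptom described above.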
Answers
I want to extract the five words with the highest TF-IDF from the output TF-IDF matrix.
How should I do this?
Thanks
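As a rough illustration of the idea (not a RapidMiner process), the Python sketch below computes a plain TF-IDF weight and picks the top five tokens, either for one document (one row of the matrix) or over the whole matrix by taking each token's maximum weight. The formula `tf * log(N / df)`, the function names, and the sample documents are assumptions; RapidMiner's exact weighting and normalization may differ, so the numbers will not necessarily match its output.

```python
import math
from collections import Counter

def tfidf(docs):
    # docs: list of token lists. Returns one {token: weight} dict per document.
    n_docs = len(docs)
    df = Counter()
    for tokens in docs:
        df.update(set(tokens))  # document frequency: count each token once per doc
    rows = []
    for tokens in docs:
        tf = Counter(tokens)
        total = len(tokens) or 1
        rows.append({t: (c / total) * math.log(n_docs / df[t]) for t, c in tf.items()})
    return rows

def top_k(row, k=5):
    # Highest weights first; ties are broken alphabetically for a stable order.
    return sorted(row.items(), key=lambda kv: (-kv[1], kv[0]))[:k]

def top_k_overall(rows, k=5):
    # Collapse the matrix to one score per token (its maximum over all documents).
    best = {}
    for row in rows:
        for token, weight in row.items():
            best[token] = max(weight, best.get(token, 0.0))
    return top_k(best, k)

# Hypothetical, already tokenized documents.
docs = [["cat", "sat", "mat"], ["cat", "ran"], ["dog", "ran", "far", "far"]]
rows = tfidf(docs)
print(top_k(rows[2], 2))
print(top_k_overall(rows))
```

Whether "top five" should be taken per document or over the whole matrix depends on what you need; the sketch shows both.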
Hi @antioquia_jonas,
Here is a process that extracts each token and its number of occurrences into an Excel file.
I don't know how to create the "string" attribute (where the token is repeated n times).
You will need to adapt this process to your own data:
I hope it helps,
Regards,
Lionel
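For the "string" attribute mentioned above (the token repeated n times), the logic itself is simple, even if the matching RapidMiner operator is unclear. A minimal Python sketch, assuming a space as the separator and hypothetical column names `token`, `occurrences`, and `string`:

```python
def repeated_string(token, count, sep=" "):
    # Repeat the token `count` times, separated by `sep`.
    return sep.join([token] * count)

# Hypothetical token counts, e.g. taken from a token/occurrences table.
counts = {"server": 3, "update": 2}
table = [
    {"token": t, "occurrences": n, "string": repeated_string(t, n)}
    for t, n in counts.items()
]
print(table)
```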