Filter Stopwords with a Regular Expression
Hi guys,
I'm currently doing a sentiment analysis in RapidMiner with k-NN. I want to count the number of words left in each document after removing stopwords. The "Filter Stopwords" operator inside the "Process Documents from Data" operator only works if I apply the "Nominal to Text" operator first and tokenize the data. The issue is that the output then looks like the image below, with the text split into tokens, which is not what I need. So I wonder whether there is a regular expression that could be used inside a "Replace" operator or similar to remove only the stopwords, without tokenizing the text, so that I can count the remaining words.
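To make the idea concrete, here is a rough sketch in Python of the kind of pattern I have in mind. The stopword list and the helper names (`remove_stopwords`, `count_words`) are made up for illustration, and I have not verified that the same pattern behaves identically in RapidMiner's "Replace" operator, so treat it as an assumption rather than a working solution.

```python
import re

# Hypothetical, tiny stopword list for illustration only.
STOPWORDS = ["the", "a", "an", "is", "and", "of", "to", "in"]

# \b word boundaries keep whole words only, so "a" does not match inside "about".
STOPWORD_PATTERN = re.compile(
    r"\b(?:" + "|".join(map(re.escape, STOPWORDS)) + r")\b",
    flags=re.IGNORECASE,
)

def remove_stopwords(text):
    """Replace each stopword with a space, then collapse repeated whitespace."""
    stripped = STOPWORD_PATTERN.sub(" ", text)
    return " ".join(stripped.split())

def count_words(text):
    """Count whitespace-separated words in the text."""
    return len(text.split())

cleaned = remove_stopwords("The movie is a triumph of style and wit")
remaining = count_words(cleaned)
print(cleaned, remaining)
```

The pattern itself has the form `\b(?:the|a|an|...)\b` with case-insensitive matching. One caveat: a word boundary also sits before an apostrophe, so a stopword such as "it" would match inside "it's" and leave a stray "'s" behind.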
Cheers!