Problem with stopwords (dictionary)
Hi everyone,
I want to filter some txt files and remove some useless words. I use the Filter Stopwords (Dictionary) operator with a list of Greek words, but the words I want to remove are still there after the filtering. I use UTF-8 as the encoding, and all the txt files are in UTF-8. At first my txt files were ANSI-encoded; in that case the stopwords were removed, but the word list contained unreadable words. Now, with UTF-8, the word list is correct, but the stopwords are still there. Sorry for my English.
Thanks!!
Answers
A partial solution is to transliterate the Greek letters into English (Latin) letters with Replace Tokens.
For example, the Greek word συμφωνώ is transformed into simfono.
With this, the problem is solved.
But you have to apply the same replacement again in the classification processes. If you want any further information, just tell me.
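To show the idea outside RapidMiner, here is a minimal Python sketch of the same workaround: transliterate both the tokens and the stopword list to Latin letters, then filter. The mapping table and the function names (`transliterate`, `remove_stopwords`) are my own assumptions for illustration, not something RapidMiner provides, and the mapping is a simplified phonetic one that ignores digraphs such as ου or μπ.

```python
import unicodedata

# Hypothetical, simplified Greek-to-Latin mapping (an assumption for this
# sketch; it maps single letters only and ignores digraphs).
GREEK_TO_LATIN = {
    "α": "a", "β": "v", "γ": "g", "δ": "d", "ε": "e", "ζ": "z",
    "η": "i", "θ": "th", "ι": "i", "κ": "k", "λ": "l", "μ": "m",
    "ν": "n", "ξ": "x", "ο": "o", "π": "p", "ρ": "r", "σ": "s",
    "ς": "s", "τ": "t", "υ": "i", "φ": "f", "χ": "ch", "ψ": "ps",
    "ω": "o",
}


def strip_accents(text):
    """Remove accent marks (e.g. the tonos) by decomposing to NFD."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def transliterate(token):
    """Lowercase, drop accents, then map each Greek letter to Latin."""
    plain = strip_accents(token.lower())
    return "".join(GREEK_TO_LATIN.get(ch, ch) for ch in plain)


def remove_stopwords(tokens, stopwords):
    """Transliterate tokens and stopwords alike, then drop the stopwords."""
    latin_stopwords = {transliterate(word) for word in stopwords}
    latin_tokens = [transliterate(token) for token in tokens]
    return [token for token in latin_tokens if token not in latin_stopwords]


if __name__ == "__main__":
    print(transliterate("συμφωνώ"))
    print(remove_stopwords(["συμφωνώ", "και", "εγώ"], ["και"]))
```

The important part is that the tokens and the stopword dictionary go through exactly the same replacement, which is also why the same step has to be repeated when you classify new documents.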
Regards