UTF8
Hi everyone
I have an example set with one text column in UTF-8 encoding (the titles of news articles), and I want to cluster the articles based on their titles. Which operator can I use to convert UTF-8 to plain text?
BTW, the titles are in German.
I would really appreciate it if someone could help. Thank you.
Answers
RM can handle different encoding types; you can set this in the Preferences dialog. First thing though: what format are these files in? If they are text files, you can use the Process Documents from Files operator (Text Processing extension) or Open File. Do you plan on doing tokenization or some sort of entity extraction?
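To illustrate the encoding point outside of RapidMiner: UTF-8 text is already plain text once it is read with the correct charset, so there is usually nothing to "convert". Below is a minimal Python sketch, not a RapidMiner process. The sample title and the helper names `decode_title` and `tokenize` are made up for illustration.

```python
import re
import unicodedata
from typing import List


def decode_title(raw: bytes) -> str:
    # Decode the bytes as UTF-8, then normalize to NFC so that a character
    # such as "ü" is stored as a single code point.
    text = raw.decode("utf-8")
    return unicodedata.normalize("NFC", text)


def tokenize(title: str) -> List[str]:
    # Lowercase the title and keep runs of letters only. The pattern matches
    # any Unicode letter, so umlauts and "ß" survive intact.
    return re.findall(r"[^\W\d_]+", title.lower())


# Hypothetical German news title, encoded as UTF-8 bytes for the demo.
raw = "Größere Städte planen neue Straßen".encode("utf-8")
title = decode_title(raw)
tokens = tokenize(title)
print(tokens)
```

The resulting tokens are what a clustering step would work on after being turned into word vectors; within RapidMiner, tokenization is done with the Text Processing operators rather than with code like this.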
Hi,
The following 6-part video series is probably still one of the best resources on how to do this kind of analysis with RapidMiner:
http://vancouverdata.blogspot.com/2010/11/text-analytics-with-rapidminer-loading.html
Hope this helps,
Ingo