Text mining in UTF-8
Hello all,
I need to use RapidMiner for text mining in Cyrillic.
I tried setting the encoding to UTF-8. The results are displayed as garbled characters instead of Cyrillic words.
Thanks,
Best Answer
i_anicka Member Posts: 2 Contributor I
Hi guys,
I have solved my problem.
I had set the UTF-8 encoding everywhere except at the process level.
I changed that as well, and now it works!
Thank you all for your replies.
Ana
Answers
Hi,
Could you maybe post an example?
~Martin
Dortmund, Germany
It could be that your original document isn't in UTF-8, but in another encoding.
One way to be absolutely sure is to create a loop that uses macros to change the encoding parameter of your Process Documents operator, and then to look at all the resulting outputs. The one that looks 'right' tells you which encoding the document actually uses.
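The same idea can be tried outside RapidMiner before touching the process. Below is a minimal Python sketch, not a RapidMiner feature: it decodes the same raw bytes with several candidate encodings so you can see which output looks right. The candidate list, the `decode_with_candidates` helper, and the sample word are all assumptions chosen for illustration; substitute the bytes of your own file.

```python
# Candidate encodings are an assumption; extend the list for your own data.
CANDIDATES = ["utf-8", "cp1251", "koi8-r", "iso-8859-5"]


def decode_with_candidates(raw, encodings=CANDIDATES):
    """Decode the same bytes with each candidate encoding.

    Returns a dict mapping encoding name to the decoded text, or to None
    when the bytes are not valid in that encoding.
    """
    results = {}
    for enc in encodings:
        try:
            results[enc] = raw.decode(enc)
        except UnicodeDecodeError:
            results[enc] = None
    return results


# Hypothetical sample: a Cyrillic word saved as Windows-1251, not UTF-8.
sample = "Привет".encode("cp1251")

for enc, text in decode_with_candidates(sample).items():
    # Inspect the output by eye; only one line should read as real Cyrillic.
    print(f"{enc:>10}: {text!r}")
```

To check a real document, read it in binary mode (`open(path, "rb").read()`) and pass the bytes to the helper. An encoding that fails outright is ruled out, but several single-byte encodings may decode without error and still produce nonsense, so the final judgment is still visual.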
Agreed. I just did a quick check and there's no problem with Cyrillic in UTF-8.
Scott
I want to use the Tamil language for text mining.
Where did you change the UTF-8 option for this?
I tried at the process level, but could not get it to work.
Could anybody please give me the answer?
I am trying to change the Unicode option to UTF-8 (for processing the Tamil language).
I changed the encoding to UTF-8 in the RapidMiner Studio preferences.
I simply read the document using the Read Document operator from the Text Mining extension.
But it is not working; a screenshot is attached (doc7.docx).
Kindly help me sort out this problem.
Thank you
Hello @arunasethupathy - so Tamil is not a language I have worked with before. Could you please post your XML process AND your text document (in Tamil) so I can take a look?
Thank you.
Scott
Sir,
Kindly find attached the sample Tamil text document.
Thank you @arunasethupathy. Can you please also post your XML process?
Scott