"How to process Tamil text in RapidMiner?"

arunasethupathy Member Posts: 4 Contributor I
edited June 2019 in Help

Hello everybody

I am trying to mine Tamil-language text in RapidMiner.

Is there an option to process Tamil in RapidMiner? (I have seen posts related to Arabic, Cyrillic, etc.)

I have set the encoding to UTF-8 in the RapidMiner preferences, and I also saved the .txt file in UTF-8.

But I am unable to read the file using the Read Document operator in the Text Mining extension.

Is there any other solution? Kindly suggest.

Thank you

Answers

  • Pavithra_Rao Administrator, Moderator, Employee, RapidMiner Certified Analyst, RapidMiner Certified Expert, Member Posts: 123 RM Data Scientist

    Hi Aruna,

     

    There is no Tamil-specific operator comparable to Filter Stopwords (German), Stem (German), etc.

     

    But you could try the Filter Stopwords (Dictionary), Stem (Dictionary), etc. operators and provide a file containing Tamil (or any other language) words to accomplish your task.
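To make the dictionary idea concrete, here is a minimal Python sketch of what a dictionary-based stopword filter does. It assumes the dictionary is a plain UTF-8 text file with one stopword per line (please check the operator help for the exact format it expects), and it splits on whitespace only. The function names and the sample words are hypothetical and purely for illustration; this is not RapidMiner's own implementation.

```python
def write_stopword_file(path, stopwords):
    """Write one stopword per line, encoded as UTF-8 (assumed dictionary format)."""
    with open(path, "w", encoding="utf-8") as f:
        for word in stopwords:
            f.write(word + "\n")


def load_stopwords(path):
    """Read the dictionary file back into a set, skipping blank lines."""
    with open(path, encoding="utf-8") as f:
        return {line.strip() for line in f if line.strip()}


def filter_stopwords(text, stopwords):
    """Split on whitespace and drop every token found in the stopword set."""
    return [token for token in text.split() if token not in stopwords]
```

For example, after `write_stopword_file("tamil_stopwords.txt", ["ஒரு", "இந்த", "மற்றும்"])`, calling `filter_stopwords(text, load_stopwords("tamil_stopwords.txt"))` keeps only the tokens that are not in the list. The same file could then be tried as the dictionary for Filter Stopwords (Dictionary).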

     

     

    Hope this helps.

     

    Cheers,

    Pavithra

  • Pavithra_Rao Administrator, Moderator, Employee, RapidMiner Certified Analyst, RapidMiner Certified Expert, Member Posts: 123 RM Data Scientist

    Hi Aruna,

     

    After checking the error screenshot you attached in the following post:

    https://community.rapidminer.com/t5/RapidMiner-Studio-Forum/Text-mining-in-utf-8/m-p/34525#M24221

     

    I figured out that the Tamil text is being read and written in the TSCII encoding, and as of now the "Read Document" operator in RapidMiner does not support this encoding.

     

    But there is a workaround: you could try using the Python code described in the following blog post inside the "Execute Python" operator in RapidMiner Studio to do the conversion.

     

    https://ezhillang.blog/tag/open-tamil-text-processing/
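As a rough sketch of how such a conversion step could be structured, the Python below first checks whether the file is already valid UTF-8 containing Tamil characters (Unicode block U+0B80 to U+0BFF) and only falls back to a legacy converter otherwise. The names `contains_tamil`, `read_tamil_text` and `legacy_converter` are hypothetical. The converter is passed in as a parameter because the exact Open-Tamil call is an assumption on my side: I believe the library offers a TSCII-to-Unicode function in its `tamil.tscii` module, but please verify the name and the input type it expects against the blog post above.

```python
def contains_tamil(text):
    """Return True if any character lies in the Unicode Tamil block (U+0B80-U+0BFF)."""
    return any("\u0b80" <= ch <= "\u0bff" for ch in text)


def read_tamil_text(path, legacy_converter=None):
    """Read a text file and return Unicode Tamil text.

    legacy_converter is a hypothetical callable (bytes -> str) for files that
    are not UTF-8, for example a wrapper around a TSCII converter.
    """
    with open(path, "rb") as f:
        raw = f.read()

    try:
        # "utf-8-sig" also strips a byte order mark if the editor wrote one.
        text = raw.decode("utf-8-sig")
    except UnicodeDecodeError:
        text = None

    # Valid UTF-8 that really contains Tamil characters needs no conversion.
    if text is not None and contains_tamil(text):
        return text

    if legacy_converter is None:
        raise ValueError(
            "File is not UTF-8 Tamil text and no legacy converter was supplied"
        )
    return legacy_converter(raw)
```

Inside the "Execute Python" operator, a function like this would be called from the script's `rm_main` entry point, and the resulting text returned to the process as a pandas DataFrame so that the text processing operators can pick it up from there.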

     

    Hope this helps.

     

    Cheers,
