UTF8

MOKA Member Posts: 1 Contributor I
edited November 2018 in Help

Hi everyone

I have an example set with one text column in UTF-8 encoding (the titles of news articles), and I want to cluster the articles based on their titles. Which operator can I use to convert UTF-8 to plain text?
BTW, titles are in the German language.

I would really appreciate it if someone could help. Thank you.

Answers

  • Thomas_Ott RapidMiner Certified Analyst, RapidMiner Certified Expert, Member Posts: 1,761 Unicorn

    RM can handle different encoding types; you can set this in the Preferences dialog. First, though: what format are these files in? If they are text files, you can use the Process Documents from Files operator (Text Processing extension) or Open File. Do you plan on doing tokenization or some sort of entity extraction?

  • IngoRM Administrator, Moderator, Employee, RapidMiner Certified Analyst, RapidMiner Certified Expert, Community Manager, RMResearcher, Member, University Professor Posts: 1,751 RM Founder

    Hi,

    The following 6-part video series is probably still one of the best resources on how to do this kind of analysis with RapidMiner:

    http://vancouverdata.blogspot.com/2010/11/text-analytics-with-rapidminer-loading.html

    Hope this helps,

    Ingo
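
As a rough illustration of the encoding point raised in the first answer: UTF-8 text is already plain text once it is decoded with the right character set, so there is usually nothing to "convert", only an encoding setting to get right before tokenizing. The sketch below is plain Python, not RapidMiner's API; the function names `decode_title` and `tokenize` and the sample German title are made up for the example.

```python
import re
from typing import List


def decode_title(raw: bytes) -> str:
    """Decode UTF-8 bytes into an ordinary text string (hypothetical helper)."""
    return raw.decode("utf-8")


def tokenize(title: str) -> List[str]:
    """Lowercase the title and keep runs of letters, including umlauts and ß."""
    # [^\W\d_] matches any Unicode letter: word characters minus digits and "_".
    return re.findall(r"[^\W\d_]+", title.lower())


# A made-up German headline, stored as UTF-8 bytes as it might arrive from a file.
raw_title = "Bundesregierung prüft höhere Steuern".encode("utf-8")

title = decode_title(raw_title)
tokens = tokenize(title)
print(tokens)  # ['bundesregierung', 'prüft', 'höhere', 'steuern']

# Reading the same bytes with the wrong encoding garbles the umlauts,
# which is why the encoding setting matters before any text processing.
garbled = raw_title.decode("latin-1")
print(garbled != title)  # True
```

Token lists like these are the kind of input that would then be turned into word vectors for clustering.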
