language filter issue

huaiyanggongzi Member Posts: 39 Contributor II
edited June 2019 in Help
I have a document that includes both Chinese and English text. Can I filter out all the English text and keep only the Chinese text? Or, in the other direction, can I filter out all the Chinese text and keep only the English text?

Answers

  • JEdward RapidMiner Certified Analyst, RapidMiner Certified Expert, Member Posts: 578 Unicorn

    Just saw this one. Yes, you can: use a regular expression in your filter and search for \p{Han}, which matches only Han (Chinese) characters.

    To get the reverse, just invert the match.

  • LaurenPlummer Member Posts: 4 Contributor I

    Btw, my team has released a RapidMiner extension for multilingual text analysis: the Rosette Text Toolkit. Its "Identify Language" operator returns the language of every cell in the input attribute (it identifies 56 languages, including Chinese). The extension may help in analyzing multilingual input, and most of our operators support Chinese.

    -Lauren

  • masirumi Member Posts: 2 Contributor I
    I tried Rosette's Identify Language operator but received a "Couldn't retrieve data for input text" error message. I tried it with text produced from RapidMiner's Generate Nominal Example and with other datasets, but to no avail.

    Hope Rosette can fix this soon.

    Rumi
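
For anyone who wants to try the regular-expression approach from the first answer outside RapidMiner, here is a minimal sketch in plain Java. It makes some assumptions: the class name `HanFilter` and its two methods are hypothetical helpers, not part of RapidMiner or any extension, and the pattern is written as `\p{IsHan}`, which is how `java.util.regex` spells the Han script class. The exact spelling your RapidMiner operator accepts may differ. Also note that the Han script covers ideographs only, so Chinese punctuation is not matched by it.

```java
import java.util.regex.Pattern;

// Hypothetical helper for illustration only; not part of RapidMiner.
final class HanFilter {
    // \p{IsHan} matches any code point whose Unicode script is Han.
    private static final Pattern HAN = Pattern.compile("\\p{IsHan}");
    // \P{IsHan} (capital P) is the inverse: any code point that is not Han.
    private static final Pattern NOT_HAN = Pattern.compile("\\P{IsHan}");

    // Keep only Han characters by deleting everything else.
    static String keepHan(String text) {
        return NOT_HAN.matcher(text).replaceAll("");
    }

    // Remove Han characters and keep everything else.
    static String dropHan(String text) {
        return HAN.matcher(text).replaceAll("");
    }

    public static void main(String[] args) {
        // "Hello" followed by two Han characters, written as Unicode escapes
        // so the source file stays ASCII-only.
        String mixed = "Hello\u4e16\u754c";
        System.out.println(keepHan(mixed)); // prints only the two Han characters
        System.out.println(dropHan(mixed)); // prints Hello
    }
}
```

Deleting the inverse class is one way to "invert" the filter. Keeping English this way also keeps digits, spaces, and punctuation, so a stricter pattern may be needed depending on the data.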