
n-Grams with a length of 3-6

Kathi Member Posts: 11 Contributor I
Hi everyone, 

I am currently using the n-Grams operator. If I set the max length to 6, all n-grams with a word count of 1-6 are displayed. I only want to see the n-grams with a length of 3-6 words. Is that possible?


Best Answer

  • lionelderkrikor Moderator, RapidMiner Certified Analyst, Member Posts: 1,195 Unicorn
    Solution Accepted
    Hi @Kathi

    Nice challenge!

    Yes, it is possible. You have to duplicate your text processing, using:

    for the first: max length = 6
    for the second: max length = 2

    then Transpose the 2 resulting example sets,

    then use a Set Minus operator to remove the n-grams of 1-2 words, which keeps only the attributes with 3 <= length <= 6,

    and finally (re)Transpose the final example set.

    The process is in the attached file.
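
The set-difference idea behind these steps can be sketched outside RapidMiner in plain Python. This is only an illustration, not the attached process: the helper `ngrams` and the sample sentence are hypothetical, and it assumes n-grams are built from consecutive words joined by underscores.

```python
def ngrams(words, max_length):
    """Return all n-grams of 1..max_length consecutive words, joined by '_'."""
    result = set()
    for n in range(1, max_length + 1):
        for i in range(len(words) - n + 1):
            result.add("_".join(words[i:i + n]))
    return result


# Hypothetical sample text, already tokenized into words.
words = "the quick brown fox jumps over the lazy dog".split()

up_to_six = ngrams(words, 6)   # first branch: max length = 6
up_to_two = ngrams(words, 2)   # second branch: max length = 2

# Set Minus: drop everything that also appears in the 1-2 word branch.
three_to_six = up_to_six - up_to_two

print(sorted(three_to_six, key=lambda g: (g.count("_"), g)))
```

Every n-gram that survives the subtraction has between 3 and 6 words, because the second set contains exactly the 1-word and 2-word n-grams of the same text.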




  • kayman Member Posts: 662 Unicorn
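    Use Filter Tokens (by Content), select the match condition, and use a regex like ^.*?_.*?_.*
    This keeps only the n-grams that contain at least 2 underscores, i.e. those with 3 words or more. With the n-Grams max length set to 6, that leaves the n-grams of 3-6 words.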
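
To see what the regex above keeps, here is a minimal Python sketch. The token list and the helper `keep_long_ngrams` are hypothetical, and the sketch assumes the words themselves contain no underscores, since each underscore is counted as a word boundary.

```python
import re

# Pattern quoted in the post: at least two underscores anywhere in the token.
AT_LEAST_THREE_WORDS = re.compile(r"^.*?_.*?_.*")


def keep_long_ngrams(tokens):
    """Keep only tokens that match the pattern, i.e. n-grams of 3+ words."""
    return [t for t in tokens if AT_LEAST_THREE_WORDS.match(t)]


# Hypothetical tokens as an n-gram step with max length 4 might produce them.
tokens = ["data", "data_mining", "data_mining_is", "data_mining_is_fun"]

print(keep_long_ngrams(tokens))
# prints ['data_mining_is', 'data_mining_is_fun']
```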
  • Kathi Member Posts: 11 Contributor I

    I first tried the set-minus-process, it's working and looks fantastic! Thank you both!

