
"count syllables based on a predefined dictionary"

markus_dressel Member Posts: 5 Contributor I
edited June 2019 in Help

Hi community,

I want to count the syllables within a document. For this I have a predefined dictionary (in Excel) which contains 85,000 words and their corresponding syllable counts.

Now I want to tokenize the document and count the syllables within it. As a result, I would like to retrieve the number of syllables in the text. What operators do I need?

I hope you can help me with that topic.

 

Best regards and thanks in advance,

 

Markus 

Best Answer

  • Telcontar120 RapidMiner Certified Analyst, RapidMiner Certified Expert, Member Posts: 1,635 Unicorn
    Solution Accepted

    Sure, this is actually pretty straightforward.  First tokenize the document using "Process Documents" and then output the wordlist using the "WordList to Data" operator, which will give you an ExampleSet of the wordlist with the counts.  Then you should be able to join your syllable counts in using "Join" (you'll join on the words/tokens), use "Generate Attributes" to compute the product of the word count and the syllable count per word, and then use "Aggregate" to get the sum of that product.  And you should have the total syllables in the document!

     

     

    Brian T.
    Lindon Ventures 
    Data Science Consulting from Certified RapidMiner Experts
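
For readers who want to check the logic outside RapidMiner, the tokenize / join / multiply / aggregate steps from the accepted answer can be sketched in plain Python. This is only an illustration under stated assumptions, not the RapidMiner process itself: the small `SYLLABLES` dictionary is a made-up stand-in for the 85,000-word Excel sheet, the helper names `tokenize` and `count_syllables` are hypothetical, and the regex tokenizer (lowercase, letters only) is a guess at a reasonable tokenization that may differ from what the RapidMiner operators produce.

```python
import re
from collections import Counter

# Hypothetical stand-in for the Excel dictionary (word -> syllables per word).
SYLLABLES = {"the": 1, "company": 3, "reported": 3, "revenue": 3, "growth": 1}


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens (an assumption)."""
    return re.findall(r"[a-z]+", text.lower())


def count_syllables(text, syllable_dict):
    """Return (total syllables, sorted list of tokens missing from the dictionary)."""
    # Equivalent of the wordlist with counts: one entry per distinct token.
    word_counts = Counter(tokenize(text))
    total = 0
    missing = []
    for word, count in word_counts.items():
        if word in syllable_dict:
            # Join on the token, then multiply word count by syllables per word.
            total += count * syllable_dict[word]
        else:
            # Tokens without a dictionary entry contribute nothing to the sum.
            missing.append(word)
    # The running sum plays the role of the final aggregation step.
    return total, sorted(missing)


text = "The company reported revenue growth. The growth surprised analysts."
total, missing = count_syllables(text, SYLLABLES)
print(total, missing)
```

Tracking the unmatched tokens is a design choice worth copying in the real process as well: words that are absent from the dictionary silently drop out of the total when the join only keeps matching rows, so it helps to know how many there are.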

Answers

  • sgenzer Administrator, Moderator, Employee, RapidMiner Certified Analyst, Community Manager, Member, University Professor, PM Moderator Posts: 2,959 Community Manager

    Thanks, @Telcontar120, for that nice solution.  @markus_dressel - would you be willing to share your Excel "syllable" sheet with the community?  It may be a resource that others find useful.

     

    Scott

  • markus_dressel Member Posts: 5 Contributor I

    That was exactly the solution I was looking for. It worked perfectly. Thank you so much @Telcontar120

     

    @sgenzer I use the business dictionary provided by Loughran & McDonald.

    The list and a comprehensive explanation can be found here:

     

    Thank you so much for the help,

     

    Best regards,

     

    Markus 

  • TFJ95 Member Posts: 1 Learner I
    Thanks for your help @Telcontar120
    I just started using RapidMiner yesterday, so I'm really new and a bit overwhelmed right now. Could you maybe explain to me with a screenshot what the solution looks like in the end?

