Looking for help: Building Models for Text Classification/Topic Modeling/Clustering

Doudr85 (Member)
edited November 2018 in Help

Hello all,

I'm just getting familiar with topic modeling and text classification, using clustering or supervised learning to build models. Is there a way to edit or manually create part of a model so that I can force a category, or ensure that keywords that may not have been in the tagged data are included in future runs of the model?

I haven't posted my current process because I don't know where to start with building the model. The test data I am working with is a list of past exam questions, and I want to run them through a process that categorizes them by topic. Is there a way to adjust the model after training it on a set of data, to ensure that specific keywords rank high in the distribution table?

Thanks,

Ryan

Answers

  • Telcontar120 (RapidMiner Certified Analyst, RapidMiner Certified Expert, Member)

    You can certainly create a wordlist from one dataset and then apply it to a future dataset with no problems; that's why the Process Documents operators have a wordlist input. You can even create your own wordlist manually and use that if you like. But if the words in that wordlist aren't part of the dataset you use to build a model, they won't be incorporated into any machine-learning model you create with typical techniques such as Naive Bayes, SVM, neural nets, etc.

    So you would basically have to build a machine-learning model from an actual dataset of words, and then supplement it with a set of manual rules or overrides. That is a multi-step process, not something combined into a single model.

    Brian T.
    Lindon Ventures 
    Data Science Consulting from Certified RapidMiner Experts
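To illustrate the wordlist point in the answer outside of RapidMiner, here is a minimal plain-Python sketch. It is not RapidMiner's Process Documents operators; the function names (`build_wordlist`, `vectorize`) and the sample questions are hypothetical. It builds a wordlist from one set of documents, then vectorizes new documents against that fixed wordlist, so any word not on the list is simply dropped. A hand-written wordlist can be passed in exactly the same way.

```python
import re
from collections import Counter


def tokenize(text):
    # Lowercase and keep runs of letters only (a deliberately simple tokenizer)
    return re.findall(r"[a-z]+", text.lower())


def build_wordlist(docs):
    """Return the sorted set of words that occur in the given documents."""
    words = set()
    for doc in docs:
        words.update(tokenize(doc))
    return sorted(words)


def vectorize(docs, wordlist):
    """Term-occurrence counts restricted to a fixed wordlist; other words are dropped."""
    rows = []
    for doc in docs:
        counts = Counter(tokenize(doc))
        rows.append([counts[word] for word in wordlist])
    return rows


# Build the wordlist from one (hypothetical) set of past exam questions...
train = ["Define photosynthesis", "Explain mitosis and meiosis"]
wordlist = build_wordlist(train)
print(wordlist)  # ['and', 'define', 'explain', 'meiosis', 'mitosis', 'photosynthesis']

# ...and reuse it on new questions: "in" and "plants" are not on the list, so they vanish
new = ["Explain photosynthesis in plants"]
print(vectorize(new, wordlist))  # [[0, 0, 1, 0, 0, 1]]

# A hand-written wordlist works the same way
manual = ["chlorophyll", "photosynthesis"]
print(vectorize(new, manual))  # [[0, 1]]
```

A word that appears only in a manual wordlist still gets a column, but if that column is always zero in the training data, a model trained on it typically learns nothing useful from it, which is the limitation the answer points out.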
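The multi-step approach described in the answer (a learned model first, then manual rules or overrides on top) might look roughly like the following minimal plain-Python sketch. It uses a hand-rolled multinomial Naive Bayes as the learned step (a stand-in, not RapidMiner's Naive Bayes operator) and a hypothetical `overrides` list of (keyword, category) pairs as the rule step. The sample questions and category names are made up for illustration.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    # Lowercase and keep runs of letters only (a deliberately simple tokenizer)
    return re.findall(r"[a-z]+", text.lower())


def train_naive_bayes(docs, labels):
    """Step 1: multinomial Naive Bayes; it only learns words present in the training docs."""
    class_counts = Counter(labels)
    word_counts = defaultdict(Counter)  # label -> word -> count
    vocab = set()
    for doc, label in zip(docs, labels):
        tokens = tokenize(doc)
        word_counts[label].update(tokens)
        vocab.update(tokens)
    return class_counts, word_counts, vocab


def predict_naive_bayes(model, doc):
    class_counts, word_counts, vocab = model
    total_docs = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label in sorted(class_counts):  # sorted so ties resolve deterministically
        score = math.log(class_counts[label] / total_docs)
        total_words = sum(word_counts[label].values())
        for token in tokenize(doc):
            if token not in vocab:
                continue  # words never seen in training contribute nothing
            # Laplace (add-one) smoothed word likelihood
            score += math.log((word_counts[label][token] + 1) / (total_words + len(vocab)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


def classify(model, doc, overrides):
    """Step 2: hypothetical keyword rules are checked first; otherwise use the model."""
    tokens = set(tokenize(doc))
    for keyword, label in overrides:
        if keyword in tokens:
            return label
    return predict_naive_bayes(model, doc)


docs = [
    "Define photosynthesis in plants",
    "Describe the stages of mitosis",
    "Solve the quadratic equation",
    "Factor the polynomial expression",
]
labels = ["biology", "biology", "math", "math"]
model = train_naive_bayes(docs, labels)
overrides = [("chlorophyll", "biology")]  # hypothetical manual rule

question = "Calculate the chlorophyll concentration"
print(predict_naive_bayes(model, question))  # math
print(classify(model, question, overrides))  # biology
print(classify(model, "Solve for x in the equation", overrides))  # math
```

Because "chlorophyll" never appears in the training questions, the model alone cannot use it and decides on the weak evidence from "the"; only the separate override step routes that question to biology. Keeping the rules outside the model, as the answer suggests, also means they can be edited without retraining.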