Looking for help: Building Models for Text Classification/Topic Modeling/Clustering

Doudr85 Member Posts: 1 Learner I
edited November 2018 in Help

Hello all,

I'm just getting familiar with topic modeling and text classification, using clustering or supervised learning to build models. Is there a way to edit or manually create part of a model so that I can force a category, or ensure that keywords that may not have appeared in the tagged data are included in future runs of the model?


I haven't posted my current process because I don't know where to start with the model building. The test data I am working with is a list of past exam questions. I want to run them through a process that categorizes them by topic. Is there a way to adjust the model after training it on a dataset so that specific keywords rank high in the distribution table?







  • Telcontar120 Moderator, RapidMiner Certified Analyst, RapidMiner Certified Expert, Member Posts: 1,635 Unicorn

    You can certainly create a wordlist from one dataset and then apply it to a future dataset with no problems. That's why the Process Documents operators have a wordlist input object. Or you can even create your own wordlist manually and use that if you like. But if the words in that wordlist don't appear in the dataset that you use to build a model, they won't be incorporated into any machine-learning based model that you create using typical techniques such as Naive Bayes, SVM, neural nets, etc.
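
    As a rough illustration of the wordlist idea outside RapidMiner, here is a minimal plain-Python sketch. The helper names (`build_wordlist`, `vectorize`) and the sample questions are made up for this example and are not RapidMiner operators. It builds a wordlist from training text, adds one manual term, and reuses the same fixed list on a new question.

```python
import re
from collections import Counter

def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())

def build_wordlist(documents, extra_terms=()):
    """Collect every token seen in `documents`, plus manually supplied terms."""
    vocab = {tok for doc in documents for tok in tokenize(doc)}
    vocab.update(term.lower() for term in extra_terms)
    return sorted(vocab)

def vectorize(text, wordlist):
    """Count each wordlist term in `text`; tokens outside the list are ignored."""
    counts = Counter(tokenize(text))
    return [counts[term] for term in wordlist]

# Hypothetical training questions (illustrative data only).
train_questions = [
    "Define the derivative of a function",
    "Describe the structure of a cell membrane",
]

# "integral" never occurs in the training text; it is added by hand.
wordlist = build_wordlist(train_questions, extra_terms=["integral"])

# A future question is encoded with the same fixed wordlist.
new_vector = vectorize("Compute the integral of the function", wordlist)
```

    The manual term gets its own column, but that column is all zeros in the training data, so a model trained on that data has nothing to learn from it. That matches the caveat above.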

    So you would basically have to create a machine-learning model based on an actual dataset of words, and then you could supplement that with a set of rules or overrides manually, but it would be a multi-step process, not all combined in a single model.  
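
    The multi-step approach could look something like the sketch below, again in plain Python rather than RapidMiner. Everything here is an assumption for illustration: the tiny Naive Bayes class, the `OVERRIDES` table, the `classify` helper, and the sample exam questions. Step one trains a model on labeled text; step two checks hand-written keyword rules first and falls back to the model only when no rule matches.

```python
import math
import re
from collections import Counter, defaultdict

def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())

class TinyNaiveBayes:
    """Multinomial Naive Bayes with add-one smoothing (illustrative only)."""

    def fit(self, texts, labels):
        self.class_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)
        for text, label in zip(texts, labels):
            self.word_counts[label].update(tokenize(text))
        self.vocab = {w for c in self.word_counts.values() for w in c}
        return self

    def predict(self, text):
        total_docs = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label, n_docs in self.class_counts.items():
            total_words = sum(self.word_counts[label].values())
            score = math.log(n_docs / total_docs)
            for tok in tokenize(text):
                if tok in self.vocab:  # words never seen in training are skipped
                    count = self.word_counts[label][tok]
                    score += math.log((count + 1) / (total_words + len(self.vocab)))
            if score > best_score:
                best_label, best_score = label, score
        return best_label

# Hypothetical manual rules: keyword -> forced category.
OVERRIDES = {"mitochondria": "biology", "eigenvalue": "math"}

def classify(text, model, overrides=OVERRIDES):
    """Apply the manual rules first; fall back to the trained model."""
    tokens = set(tokenize(text))
    for keyword, category in overrides.items():
        if keyword in tokens:
            return category
    return model.predict(text)

# Hypothetical labeled exam questions (illustrative data only).
train_texts = [
    "Find the derivative of the function",
    "Solve the equation for x",
    "Describe the function of the cell membrane",
    "Name the parts of a plant cell",
]
train_labels = ["math", "math", "biology", "biology"]
model = TinyNaiveBayes().fit(train_texts, train_labels)

question = "What does the mitochondria do in an equation"
model_only = model.predict(question)      # the model never saw "mitochondria"
with_rules = classify(question, model)    # the manual rule decides instead
```

    The rules live outside the model, so they can be edited without retraining, and the model itself stays a product of the training data only.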

    Brian T.
    Lindon Ventures 
    Data Science Consulting from Certified RapidMiner Experts