
Support Vector Machine Process for Text Mining

jdude35 Member Posts: 6 Contributor I
edited July 2020 in Help

I am trying to set up a process that classifies a dataset of movie reviews into two classes, negative and positive, using a support vector machine (SVM). I created this model, which uses SVM, and I split the dataset into a "training" set with 700 text reviews (both positive and negative) and a "test" set with 300 reviews (positive and negative). Whenever I run the model, I get this error about three quarters of the way through. I tried adding a "stopwords" dictionary to solve the error, but the model seemed to hate every other word it came across. Can someone help shed some light on this? I have attached the saved model.

error.jpg

Answers

  • Telcontar120 RapidMiner Certified Analyst, RapidMiner Certified Expert, Member Posts: 1,635 Unicorn

    You need to save the wordlist from your training data and then apply it when you process any new reviews later (using the Wordlist input of the Process Documents operator). You are getting the error because the wordlist built from the new data you are trying to score is different from the original one, so attributes that the saved model expects from the original wordlist are not present.

    Brian T.
    Lindon Ventures 
    Data Science Consulting from Certified RapidMiner Experts
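
To illustrate the idea behind the answer above outside of RapidMiner, here is a minimal Python sketch. It is not how the Process Documents operator is implemented; the function names (`tokenize`, `build_wordlist`, `vectorize`) and the tiny example reviews are made up for illustration. The point is only that the wordlist is built once from the training reviews and then reused for the test reviews, so both sets end up with exactly the same attributes.

```python
from collections import Counter

def tokenize(text):
    # Very simple tokenizer (assumption): lowercase and split on whitespace.
    return text.lower().split()

def build_wordlist(train_docs):
    # Build the wordlist from the TRAINING documents only.
    return sorted({tok for doc in train_docs for tok in tokenize(doc)})

def vectorize(doc, wordlist):
    # Count only words that are in the saved wordlist. Words unseen in
    # training are ignored, and missing words get a count of 0, so every
    # vector has one column per wordlist entry.
    counts = Counter(tokenize(doc))
    return [counts[word] for word in wordlist]

# Hypothetical example data.
train_docs = ["great movie loved it", "terrible movie hated it"]
test_docs = ["loved this wonderful movie"]

wordlist = build_wordlist(train_docs)
train_vectors = [vectorize(d, wordlist) for d in train_docs]
# Reuse the training wordlist here instead of building a new one.
test_vectors = [vectorize(d, wordlist) for d in test_docs]

print(wordlist)
print(test_vectors[0])
```

If the test reviews were vectorized with their own wordlist instead, the columns would differ from the ones the model was trained on, which is the kind of mismatch described in the answer.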