Support Vector Machine Process for Text Mining

jdude35 Member Posts: 6 Contributor I
edited July 2020 in Help

I am trying to set up a process that classifies a dataset of movie reviews into two classes, negative and positive, using a support vector machine (SVM). I created this model, which uses SVM, and split the dataset into a "training" set of 700 text reviews and a "test" set of 300 reviews, each containing both positive and negative reviews. Whenever I run the model, I get this error about three quarters of the way through. I tried adding a "stopwords" dictionary to solve the error, but the model still seemed to hate every other word it came across. Can someone help shed some light on this? I have attached the saved model.

error.jpg

Answers

  • Telcontar120 Moderator, RapidMiner Certified Analyst, RapidMiner Certified Expert, Member Posts: 1,635 Unicorn

    You need to save the wordlist from your training data and then apply it whenever you process new reviews later (using the Wordlist input of the Process Documents operator). You are getting the error because the wordlist built from the new data you are trying to score differs from the original one, so attributes that the saved model expects from the original wordlist are not present.

    Brian T.
    Lindon Ventures 
    Data Science Consulting from Certified RapidMiner Experts
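
The idea in the answer above can be illustrated outside RapidMiner. The following is a minimal plain-Python sketch, not RapidMiner code: the helper names `build_wordlist` and `vectorize` and the toy reviews are hypothetical, and simple whitespace tokenization is assumed. It shows that vectors built from the training wordlist always have the same attributes, while a wordlist rebuilt from the new data does not match.

```python
from collections import Counter


def build_wordlist(documents):
    """Collect the sorted vocabulary seen in a set of documents."""
    vocab = set()
    for doc in documents:
        vocab.update(doc.lower().split())
    return sorted(vocab)


def vectorize(document, wordlist):
    """Count each wordlist entry in the document; unknown words are ignored."""
    counts = Counter(document.lower().split())
    return [counts[word] for word in wordlist]


# Toy data (assumed for illustration only).
train_reviews = ["great movie", "terrible plot", "great acting"]
test_reviews = ["great plot twist"]

# Build the wordlist once, from the training data only.
wordlist = build_wordlist(train_reviews)

# Reuse the same wordlist for both sets so the attributes line up.
train_vectors = [vectorize(doc, wordlist) for doc in train_reviews]
test_vectors = [vectorize(doc, wordlist) for doc in test_reviews]

# Rebuilding the wordlist from the test data gives different attributes,
# which is the kind of mismatch the answer describes.
mismatched_wordlist = build_wordlist(test_reviews)

print(wordlist)
print(test_vectors)
print(mismatched_wordlist)
```

In this sketch the word "twist" never appeared in training, so it is simply dropped when the test review is vectorized, and every vector has one entry per training word.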