Polynominal sentiment analysis with SVM

HeikoeWin786 Member Posts: 64 Contributor II
Dear all,

I am trying to run an SVM on a dataset where the customer review is polynominal (text) and the sentiment score is binominal. I have read the tutorials and figured out that SVM can only handle numerical attributes, so nominal values need to be converted to numerical. However, do I need to convert both the customer reviews and the sentiment score to numerical? At which step should the conversion happen, after the data has been processed? I am a bit confused about how sentiment analysis works with SVM in RapidMiner. The RM tutorial under the sample templates uses text and a binominal label and does not even convert to numerical.
Can anyone suggest how to fix this correctly?
I have attached my process flow for your easy reference.

thanks.
Heikoe

Answers

  • HeikoeWin786 Member Posts: 64 Contributor II
    Hello,

    Could anyone please help me with this? :(

  • jacobcybulski Member, University Professor Posts: 391 Unicorn
    edited July 2020
    I am not sure at what point your model fails. This is just a binominal classification of reviews, in which your binominal label happens to represent sentiment. I cannot see any major problems with your model training and its cross-validation, and I suspect this is not where the process fails. However, the process will definitely fail in your honest testing (the lower leg of your process), because your pre-processing for training and cross-validation is different from your pre-processing for model testing: you do not create an ID, do not define a label, and do not convert the nominal attributes to numerical, and you must also apply exactly the same pre-processing model there. So the process will fail regardless of whether you use SVM or another model (which I'd also recommend trying).
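
    As a rough illustration of the "same pre-processing for training and honest testing" point, here is a minimal Python/scikit-learn sketch of the equivalent idea (the column names review and sentiment, the Excel file names, and the TfidfVectorizer/LinearSVC choices are assumptions for the example, not taken from the attached process):

        # Sketch only: the text-to-numeric step and the SVM live in one pipeline,
        # so the pre-processing fitted on the training data is reused unchanged on the test data.
        import pandas as pd
        from sklearn.pipeline import Pipeline
        from sklearn.feature_extraction.text import TfidfVectorizer
        from sklearn.svm import LinearSVC

        train = pd.read_excel("reviews_train.xlsx")   # hypothetical training file
        test = pd.read_excel("reviews_test.xlsx")     # hypothetical honest-testing file

        model = Pipeline([
            ("tfidf", TfidfVectorizer()),  # converts the polynominal review text into numerical features
            ("svm", LinearSVC()),          # the binominal sentiment label itself is left as-is
        ])

        model.fit(train["review"], train["sentiment"])   # pre-processing + SVM fitted on training data only
        predictions = model.predict(test["review"])      # the identical fitted pre-processing is applied here

    In RapidMiner terms, this loosely corresponds to reusing the word list / pre-processing model from the training leg in the testing leg rather than rebuilding it there.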
  • sgenzer Administrator, Moderator, Employee, RapidMiner Certified Analyst, Community Manager, Member, University Professor, PM Moderator Posts: 2,959 Community Manager
    Yes, I agree with @jacobcybulski of course :) I would also wonder why you are so insistent on using SVM. It's very possible that another model might give you much better performance. Also, have you explored the samples in the Community repo?



    If you shared your Excel file, we could run your process and see what's going on.

    Scott
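
    In the same spirit, a minimal scikit-learn sketch of comparing a few binominal learners under the same cross-validation (the data loading and the specific candidate models are assumptions for illustration; in RapidMiner you would simply swap the learner inside Cross Validation):

        # Sketch only: score several candidate classifiers with the same 10-fold cross-validation.
        import pandas as pd
        from sklearn.model_selection import cross_val_score
        from sklearn.pipeline import make_pipeline
        from sklearn.feature_extraction.text import TfidfVectorizer
        from sklearn.svm import LinearSVC
        from sklearn.naive_bayes import MultinomialNB
        from sklearn.linear_model import LogisticRegression

        train = pd.read_excel("reviews_train.xlsx")   # hypothetical file

        candidates = {
            "SVM": LinearSVC(),
            "Naive Bayes": MultinomialNB(),
            "Logistic Regression": LogisticRegression(max_iter=1000),
        }

        for name, learner in candidates.items():
            pipe = make_pipeline(TfidfVectorizer(), learner)
            scores = cross_val_score(pipe, train["review"], train["sentiment"], cv=10, scoring="accuracy")
            print(f"{name}: mean accuracy {scores.mean():.3f}")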
  • HeikoeWin786 Member Posts: 64 Contributor II
    @sgenzer
    @jacobcybulski


    Hello both,

    Thanks for your kind input.
    Yes, I have changed the label to binominal and processed the data.
    I am really not sure how to pick the optimal model (I have tried SVM and NBC so far).
    I see the sample sentiment analysis uses SVM inside cross-validation as well.
    Let me check the sample repository once again and explore other models.
    And yes, for sure, I can share the file with you.

    Really appreciate all your input!!

    thanks and regards,
    Heikoe
  • jacobcybulski Member, University Professor Posts: 391 Unicorn
    When you test different binominal models, you can of course select the best by accuracy, kappa or AUC. The issue is that models such as SVM are difficult to optimise. I suggest using the grid optimizer (Optimize Parameters (Grid)), inside which you can place the whole cross-validation, or its holdout equivalent when you have lots of data (for efficiency's sake). Then you can vary your SVM parameters (which ones depends on the selected kernel, e.g. C and gamma), and when you execute the process you will be able to view the log of performance indicators to see which combination of SVM parameters gives the best performance. Once you find these optimal parameters, go back to the process you previously created and plug the values into the SVM.
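
    A minimal Python/scikit-learn sketch of the same idea, with a cross-validation nested inside a grid search over C and gamma (the parameter ranges, the rbf kernel and the data loading are assumptions for illustration):

        # Sketch only: GridSearchCV plays the role of the grid optimizer with a cross-validation inside it.
        import pandas as pd
        from sklearn.model_selection import GridSearchCV
        from sklearn.pipeline import Pipeline
        from sklearn.feature_extraction.text import TfidfVectorizer
        from sklearn.svm import SVC

        train = pd.read_excel("reviews_train.xlsx")   # hypothetical file

        pipe = Pipeline([("tfidf", TfidfVectorizer()), ("svm", SVC(kernel="rbf"))])

        param_grid = {
            "svm__C": [0.1, 1, 10, 100],
            "svm__gamma": [0.001, 0.01, 0.1, 1],
        }

        # Every C/gamma combination is evaluated by its own 10-fold cross-validation.
        search = GridSearchCV(pipe, param_grid, cv=10, scoring="accuracy", n_jobs=-1)
        search.fit(train["review"], train["sentiment"])

        print(search.best_params_)   # plug the winning C and gamma back into your original SVM process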
  • HeikoeWin786 Member Posts: 64 Contributor II
    Hello @jacobcybulski

    Thanks much again here also.
    Does this mean I need to place my cross-validation process inside the grid optimizer?
    Currently, the SVM is inside the cross-validation.
    So now I put the cross-validation inside the grid optimizer and run the process, and it returns the parameters that fit best. I then take those parameters and apply them in the actual SVM process. Am I correct?

    thanks and regards,
    Heikoe
  • jacobcybulski Member, University Professor Posts: 391 Unicorn
    Let's say you logged 1000 results of your grid optimisation (make sure you pick the option to log all performance measurements: kappa, accuracy and AUC). You can then order them by accuracy (I'd avoid it if your label has a class imbalance), kappa (pretty good) or AUC (especially if you are prepared to optimise your performance later via the threshold), and pick the best performance and its corresponding SVM parameters. However, I'd recommend plotting parameters vs performance (which is a challenge in its own right when you have multiple dimensions) and picking not necessarily the overall best, but rather the best within a stable range of parameter combinations (e.g. avoid a maximum kappa surrounded by cliffs of poor performance).
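
    Continuing the grid-search sketch above (it reuses pipe, param_grid and train from there; adding kappa and AUC as extra scorers is an assumption for illustration), the optimisation log can be inspected for a stable region of parameters rather than a single isolated peak:

        # Sketch only: log all three measures and pivot the results to look for a plateau of good kappa.
        import pandas as pd
        from sklearn.metrics import cohen_kappa_score, make_scorer
        from sklearn.model_selection import GridSearchCV

        scoring = {
            "accuracy": "accuracy",
            "kappa": make_scorer(cohen_kappa_score),
            "auc": "roc_auc",
        }
        search = GridSearchCV(pipe, param_grid, cv=10, scoring=scoring, refit="kappa", n_jobs=-1)
        search.fit(train["review"], train["sentiment"])

        log = pd.DataFrame(search.cv_results_).sort_values("mean_test_kappa", ascending=False)

        # A C-by-gamma table of mean kappa makes it easy to spot a stable plateau
        # instead of a lone maximum surrounded by poor results.
        print(log.pivot_table(index="param_svm__C", columns="param_svm__gamma", values="mean_test_kappa"))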