How to get F_score in Naive Bayes sentiment analysis

HeikoeWin786 Member Posts: 49 Contributor II
edited July 15 in Help
Dear all,

I get an error when I connect the Performance (Binominal Classification) operator to the model.
I need to calculate the F-score because my dataset is imbalanced.
I would truly appreciate it if anyone who has faced this issue before could suggest a way out.

thanks a lot in advance,
regards,
Heikoe

Answers

  • Telcontar120 Moderator, RapidMiner Certified Analyst, RapidMiner Certified Expert, Member Posts: 1,530 Unicorn
    This error is telling you that your label is polynominal (meaning it has many potential values) and not binominal (meaning it has exactly two values).  So you need to make sure you are using a compatible label for this performance operator.

    Brian T.
    Lindon Ventures 
    Data Science Consulting from Certified RapidMiner Experts
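
To illustrate the point above outside RapidMiner: a plain F-score is defined for one positive class against everything else, so a label with more than two values needs one score per class. The following is a minimal sketch in plain Python, not RapidMiner's implementation; the function name `f1_per_class` and the toy labels are assumptions made for illustration.

```python
from collections import Counter

def f1_per_class(y_true, y_pred):
    """Return {label: F1}, treating each label in turn as the positive class."""
    labels = sorted(set(y_true) | set(y_pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1          # correct prediction for this class
        else:
            fp[pred] += 1           # predicted class gets a false positive
            fn[truth] += 1          # true class gets a false negative
    scores = {}
    for label in labels:
        denom = 2 * tp[label] + fp[label] + fn[label]
        # F1 = 2*TP / (2*TP + FP + FN); defined as 0.0 here when the class never occurs
        scores[label] = 2 * tp[label] / denom if denom else 0.0
    return scores

# Hypothetical three-valued sentiment label (a "polynominal" label in RapidMiner terms)
y_true = ["pos", "neg", "neg", "neg", "neu", "pos"]
y_pred = ["pos", "neg", "neg", "pos", "neu", "neg"]

scores = f1_per_class(y_true, y_pred)
macro_f1 = sum(scores.values()) / len(scores)   # unweighted mean over classes
print(scores, macro_f1)
```

If the label really has only two values, the per-class score of the class you care about is the usual binary F-score.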
  • jacobcybulski Member, University Professor Posts: 235 Unicorn
    You can also use the ordinary classification performance operator and measure Kappa, which copes better with imbalanced class distributions. However, any model trained on imbalanced label classes may end up biased towards the majority class, so changing the performance measure may not fix your problem. Instead, you could try balancing your classes before model training, e.g. using the SMOTE operator, and then apply the resulting model to a test set that has the original class distribution (to get a realistic idea of the model's performance). Also, always check the whole confusion matrix rather than a single-value performance measure.
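
As a rough illustration of why Kappa and the confusion matrix say more than accuracy on imbalanced data, here is a small sketch in plain Python. It is not RapidMiner code; the helper names `confusion_counts` and `cohen_kappa` and the 90/10 toy data are assumptions for the example.

```python
from collections import Counter

def confusion_counts(y_true, y_pred):
    """Count (true label, predicted label) pairs, i.e. the confusion matrix cells."""
    return Counter(zip(y_true, y_pred))

def cohen_kappa(y_true, y_pred):
    """Cohen's kappa: agreement beyond what the class frequencies alone would give."""
    n = len(y_true)
    observed = sum(t == p for t, p in zip(y_true, y_pred)) / n
    true_freq, pred_freq = Counter(y_true), Counter(y_pred)
    expected = sum(true_freq[c] * pred_freq[c] for c in true_freq) / (n * n)
    if expected == 1.0:
        return 0.0  # degenerate single-class case; returning 0.0 is a choice made for this sketch
    return (observed - expected) / (1 - expected)

# Hypothetical imbalanced test set: 90 negative, 10 positive reviews,
# scored by a "model" that always predicts the majority class.
y_true = ["neg"] * 90 + ["pos"] * 10
y_pred = ["neg"] * 100

confusion = confusion_counts(y_true, y_pred)
accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
kappa = cohen_kappa(y_true, y_pred)
print(confusion, accuracy, kappa)
```

Accuracy looks high here, while Kappa is zero and the confusion matrix shows that every positive example was missed.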
  • HeikoeWin786 Member Posts: 49 Contributor II
    @jacobcybulski

    Hello Jacob, thanks a lot for the explanation. If I understood correctly, the steps are:
    1) Retrieve training dataset --> SMOTE --> Pre-processing the data (Process data to doc) --> NBC --> Store the model
    2) Retrieve training dataset --> Pre-processing the data (Process data to doc) --> Apply the model (which we stored in step 1)

    Am I correct?

    thanks much,
    Heikoe
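
One way to read Jacob's suggestion is that balancing is applied to the training data only, while the data used for evaluation keeps its original class distribution. The sketch below shows that ordering in plain Python. It uses simple random duplication of minority rows as a stand-in, which is not SMOTE (SMOTE synthesises new interpolated examples); the function `oversample_minority` and the toy rows are hypothetical.

```python
import random
from collections import Counter

def oversample_minority(rows, label_of, seed=0):
    """Duplicate random rows of smaller classes until every class matches the largest."""
    rng = random.Random(seed)
    by_class = {}
    for row in rows:
        by_class.setdefault(label_of(row), []).append(row)
    target = max(len(group) for group in by_class.values())
    balanced = []
    for group in by_class.values():
        balanced.extend(group)
        # top up this class with randomly chosen duplicates
        balanced.extend(rng.choice(group) for _ in range(target - len(group)))
    rng.shuffle(balanced)
    return balanced

# Hypothetical (text, label) rows, already split into training and test parts
train = [("great product", "pos")] * 3 + [("bad service", "neg")] * 12
test = [("great product", "pos")] * 1 + [("bad service", "neg")] * 4

balanced_train = oversample_minority(train, label_of=lambda row: row[1])

train_counts = Counter(label for _, label in balanced_train)
test_counts = Counter(label for _, label in test)
print(train_counts, test_counts)   # train is balanced, test keeps its 1:4 ratio
```

A model would then be trained on `balanced_train` and scored on the untouched `test` rows.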
  • HeikoeWin786 Member Posts: 49 Contributor II
    @jacobcybulski

    Fully understood, Jacob. I will try as advised. I much appreciate your time and help.

    Regards,
    Heikoe