Should One-Class SVM be trained on Positive or Negative examples?

mohammadreza Member Posts: 23 Contributor I
edited November 2018 in Help
Hi all,

I have a data set of roughly 1000 examples: 900 negative and 100 positive. I want to apply a one-class SVM and train the model on just one class label. Does anyone have an idea that would help me decide whether I should train the model on the negative examples or on the positive ones?

Cheers,

Answers

  • JEdward RapidMiner Certified Analyst, RapidMiner Certified Expert, Member Posts: 563   Unicorn
    Personally, this seems to be a two-class problem, so I'd choose a different learner. 

    However, since you want to use one class, I would advise training on the class with the highest ROI for your problem. 
    So, if matches to the positive class are worth more to you than matches to the negative class (for example, in a direct marketing problem), then train on the positive class. 
    If matches to the negative class are worth more to you (for example, in insurance fraud), then train on the negative class. 

    Given that you have far more negative than positive examples in your data, though, training on the negative class is the natural starting point. Bear in mind, however, that a one-class model never looks at the differences between the positive and the negative classes, so a model trained on the negative class might still accept results that should be positive. 

    You could always learn a one-class SVM on both and compare/combine the models.    ;)
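JEdward's last suggestion, training a one-class SVM on each class and comparing, could be sketched outside RapidMiner with scikit-learn's `OneClassSVM`. The two-dimensional Gaussian blobs, the 900/100 split, and the `nu=0.1` setting below are assumptions chosen purely for illustration:

```python
# Illustrative sketch (scikit-learn, not RapidMiner): train a one-class SVM
# on each class in turn and see how well each model accepts its own class
# and rejects the other. Data is synthetic, mimicking the thread's
# 900-negative / 100-positive imbalance.
import numpy as np
from sklearn.svm import OneClassSVM

rng = np.random.default_rng(0)

# Synthetic stand-in data: two well-separated Gaussian blobs (assumption).
X_neg = rng.normal(loc=0.0, scale=1.0, size=(900, 2))  # abundant "negative" class
X_pos = rng.normal(loc=4.0, scale=1.0, size=(100, 2))  # rare "positive" class

def evaluate(train_X, inlier_X, outlier_X, label):
    """Fit a one-class SVM on one class and report how it scores both classes."""
    model = OneClassSVM(kernel="rbf", gamma="scale", nu=0.1).fit(train_X)
    # predict() returns +1 for points accepted as the trained class, -1 otherwise.
    inlier_acc = np.mean(model.predict(inlier_X) == 1)
    outlier_rej = np.mean(model.predict(outlier_X) == -1)
    print(f"trained on {label}: accepts {inlier_acc:.0%} of its own class, "
          f"rejects {outlier_rej:.0%} of the other class")
    return model

evaluate(X_neg, X_neg, X_pos, "negative")
evaluate(X_pos, X_pos, X_neg, "positive")
```

With `nu=0.1`, roughly 10% of the training points end up outside the learned boundary by construction, so the comparison between the two directions comes down to how cleanly each model rejects the class it never saw.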
  • mschmitz Administrator, Moderator, Employee, RapidMiner Certified Analyst, RapidMiner Certified Expert, University Professor Posts: 2,074  RM Data Scientist
    Hi,

    could you elaborate on why you do not use a regular SVM or another supervised learner? Going unsupervised/semi-supervised is always tricky, because it is hard to define performance values, and you definitely need them to tune the parameters of your one-class SVM.

    Cheers,
    Martin
    - Head of Data Science Services at RapidMiner -
    Dortmund, Germany
  • mohammadreza Member Posts: 23 Contributor I
    Special thanks to Edward for his illuminating explanation.
    Martin, the reason I am considering a semi-supervised approach such as one-class SVM is that my negative examples are impossible to gather exhaustively, because there are far too many of them. I have gathered 1000 so far, but if I wanted I could gather even 100,000,000 negative examples. On the other hand, positive samples are very rare compared to negatives. Please let me know whether you think this is a good justification for using semi-supervised methods.

  • mschmitz Administrator, Moderator, Employee, RapidMiner Certified Analyst, RapidMiner Certified Expert, University Professor Posts: 2,074  RM Data Scientist
    Hi,

    I personally would rather use 1000 examples per class and see whether it works that way. You simply lose too much predictive power if you go unsupervised.

    Cheers,
    Martin
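    Martin's suggestion of a balanced supervised model could be sketched like this with scikit-learn (again, not RapidMiner): downsample the abundant negative class to the size of the positive class and train an ordinary two-class SVM. The synthetic blobs and kernel settings are assumptions for illustration only:

```python
# Hedged sketch: balanced two-class SVM instead of one-class learning.
# Downsample the majority class to match the minority class, then train
# and cross-validate a standard SVC on the balanced set.
import numpy as np
from sklearn.svm import SVC
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(1)
X_neg = rng.normal(0.0, 1.0, size=(900, 2))  # abundant negatives (synthetic)
X_pos = rng.normal(3.0, 1.0, size=(100, 2))  # rare positives (synthetic)

# Downsample negatives to the size of the positive class for a balanced set.
idx = rng.choice(len(X_neg), size=len(X_pos), replace=False)
X = np.vstack([X_neg[idx], X_pos])
y = np.array([0] * len(X_pos) + [1] * len(X_pos))  # 0 = negative, 1 = positive

scores = cross_val_score(SVC(kernel="rbf", gamma="scale"), X, y, cv=5)
print(f"balanced two-class SVM, 5-fold accuracy: {scores.mean():.2f}")
```

If discarding 800 negatives feels wasteful, an alternative in scikit-learn is to keep all the data and pass `class_weight="balanced"` to `SVC`, which reweights the classes instead of subsampling.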
  • mohammadreza Member Posts: 23 Contributor I
    Thank you all. Nice discussion.