Semi-Supervised classification
Hi all,
I would like to build a classifier that learns from only a set of positive examples and a set of unlabeled examples.
When predicting on new instances, this classifier should also output a probability for the positive and the negative class (as Logistic Regression does, for example).
Any suggestions? Is there a semi-supervised classifier in RapidMiner?
Thank you
Answers
You could use classification via clustering.
To measure performance, count how many of the positive examples end up clustered around positive centroids.
Beware: coming up with a good performance measure can be quite tricky.
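A rough sketch of the clustering idea, written in plain Python rather than as a RapidMiner process. Everything here is an assumption made for illustration: the toy 2-D points, the small k-means with farthest-first initialisation, and the rule that a cluster counts as "positive" when its share of known positives is above the share in the whole data set. The measure at the end is the count described above, expressed as a fraction.

```python
import math

def farthest_first_init(points, k):
    # Deterministic start: first point, then repeatedly the point farthest from all chosen centroids.
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(max(points, key=lambda p: min(math.dist(p, c) for c in centroids)))
    return centroids

def nearest(point, centroids):
    # Index of the centroid closest to the point.
    return min(range(len(centroids)), key=lambda i: math.dist(point, centroids[i]))

def kmeans(points, k, iterations=10):
    centroids = farthest_first_init(points, k)
    for _ in range(iterations):
        members = [[] for _ in range(k)]
        for p in points:
            members[nearest(p, centroids)].append(p)
        # Move each centroid to the mean of its members (keep it if the cluster is empty).
        centroids = [
            tuple(sum(axis) / len(m) for axis in zip(*m)) if m else centroids[i]
            for i, m in enumerate(members)
        ]
    return centroids

def positive_cluster_ids(positives, unlabeled, centroids):
    # Assumed rule: a cluster is "positive" if known positives are over-represented in it.
    overall_share = len(positives) / (len(positives) + len(unlabeled))
    pos_counts = [0] * len(centroids)
    totals = [0] * len(centroids)
    for p in positives:
        pos_counts[nearest(p, centroids)] += 1
        totals[nearest(p, centroids)] += 1
    for p in unlabeled:
        totals[nearest(p, centroids)] += 1
    return {i for i in range(len(centroids))
            if totals[i] and pos_counts[i] / totals[i] > overall_share}

# Toy data (hypothetical): the last known positive sits far away from the others.
positives = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1), (5.0, 5.1)]
unlabeled = [(1.1, 0.9), (0.8, 1.2), (5.0, 5.0), (5.2, 4.9), (4.8, 5.1), (5.1, 5.2)]

centroids = kmeans(positives + unlabeled, k=2)
positive_ids = positive_cluster_ids(positives, unlabeled, centroids)

# Performance: fraction of known positives that fall into a positive cluster.
positive_recall = sum(nearest(p, centroids) in positive_ids for p in positives) / len(positives)

def predict(point):
    # True if the new point lands in a positive cluster.
    return nearest(point, centroids) in positive_ids
```

On this toy data three of the four known positives fall into the positive cluster. Note that this measure only looks at the positives, so on its own it rewards a clustering that calls everything positive, which is part of why a good measure is tricky here.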
Alternatively, you can generate a set of negatively labeled examples.
For example, simply set all your unlabeled examples to negative.
Then you can build any classifier in the standard way.
Hopefully you can then use the model to filter out some of the wrongly labeled examples.
For example, if you use boosting, the more 'difficult' examples gain weight after each round of boosting.
So look at the examples that gain a lot of weight: these are likely to be wrongly labeled.
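To make the boosting idea concrete, here is a minimal sketch in plain Python, not a RapidMiner process. It assumes AdaBoost with one-dimensional decision stumps, a made-up data set in which one unlabeled example (x = 1.5) is really a positive, and the average weight over the rounds as the "gained a lot of weight" score. All of these are my choices for the illustration, and with other data or more rounds the ranking can be less clear-cut.

```python
import math

def stump_predict(x, threshold, polarity):
    # Predict `polarity` left of the threshold and the opposite label right of it.
    return polarity if x < threshold else -polarity

def best_stump(xs, ys, weights):
    # Try every midpoint between sorted values, plus one threshold below the minimum
    # (which gives the two constant classifiers), and keep the lowest weighted error.
    values = sorted(set(xs))
    thresholds = [values[0] - 1.0] + [(a + b) / 2 for a, b in zip(values, values[1:])]
    best = None
    for thr in thresholds:
        for pol in (1, -1):
            err = sum(w for x, y, w in zip(xs, ys, weights)
                      if stump_predict(x, thr, pol) != y)
            if best is None or err < best[0]:
                best = (err, thr, pol)
    return best

def boosting_weight_history(xs, ys, rounds):
    n = len(xs)
    weights = [1.0 / n] * n
    history = []
    for _ in range(rounds):
        err, thr, pol = best_stump(xs, ys, weights)
        err = min(max(err, 1e-10), 1 - 1e-10)  # guard against log(0)
        alpha = 0.5 * math.log((1 - err) / err)
        # Misclassified examples are scaled up, correctly classified ones scaled down.
        weights = [w * math.exp(-alpha * y * stump_predict(x, thr, pol))
                   for x, y, w in zip(xs, ys, weights)]
        total = sum(weights)
        weights = [w / total for w in weights]
        history.append(weights)
    return history

# Hypothetical data: known positives, then unlabeled examples that are all set to negative.
positive_xs = [1.0, 2.0, 3.0]
unlabeled_xs = [1.5, 6.0, 7.0, 8.0]   # 1.5 is a hidden positive
xs = positive_xs + unlabeled_xs
ys = [1] * len(positive_xs) + [-1] * len(unlabeled_xs)

history = boosting_weight_history(xs, ys, rounds=3)

# Score each example by its average weight over the boosting rounds.
mean_weight = [sum(round_w[i] for round_w in history) / len(history) for i in range(len(xs))]
suspect_index = max(range(len(xs)), key=lambda i: mean_weight[i])
suspect_x = xs[suspect_index]
```

Here the hidden positive at x = 1.5 ends up with the largest average weight, so it is the first candidate to relabel or drop. Its true positive neighbours also pick up weight in some rounds, so treat the ranking as a hint for manual inspection rather than a verdict.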
Best regards,
Wessel