Classification by Regression Operator vs. Polynomial By Binomial Classification
I am analyzing some existing text mining processes that use SVM (mySVM) classification. Because there are several possible classes, a multi-class approach is needed. I also want a multi-class output, so that I get a confidence value for each class but only one label / prediction. From my point of view, an SVM can only perform binomial classification. To enable multi-class classification, one of these two operators can be wrapped around the SVM operator: Classification by Regression or Polynomial By Binomial Classification.
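To make my understanding of the wrapping concrete, here is a minimal Python sketch of a one-vs-rest scheme: one binary model per class, trained on relabeled data, with the highest score giving the single prediction. This is only my assumption of how such a wrapper might work, not RapidMiner's actual implementation. The centroid-based scorer is a hypothetical stand-in for the SVM, and all function names are made up for illustration.

```python
from typing import Callable, Dict, Hashable, List, Sequence, Tuple

Vector = Sequence[float]
Scorer = Callable[[Vector], float]


def train_centroid_scorer(X: List[Vector], y: List[int]) -> Scorer:
    """Hypothetical stand-in for a binary SVM (labels +1 / -1).

    Returns a linear scoring function whose sign separates the two class
    centroids. The score is signed and unbounded, not a probability.
    """
    dim = len(X[0])
    pos = [x for x, t in zip(X, y) if t == 1]
    neg = [x for x, t in zip(X, y) if t == -1]
    mu_pos = [sum(x[i] for x in pos) / len(pos) for i in range(dim)]
    mu_neg = [sum(x[i] for x in neg) / len(neg) for i in range(dim)]
    w = [p - n for p, n in zip(mu_pos, mu_neg)]
    # Bias places the decision boundary at the midpoint between the centroids.
    b = -sum(wi * (p + n) / 2.0 for wi, p, n in zip(w, mu_pos, mu_neg))
    return lambda x: sum(wi * xi for wi, xi in zip(w, x)) + b


def train_one_vs_rest(X: List[Vector], labels: List[Hashable]) -> Dict[Hashable, Scorer]:
    """Train one binary model per class: that class is +1, all others are -1."""
    models: Dict[Hashable, Scorer] = {}
    for c in sorted(set(labels)):
        y = [1 if label == c else -1 for label in labels]
        models[c] = train_centroid_scorer(X, y)
    return models


def predict(models: Dict[Hashable, Scorer], x: Vector) -> Tuple[Hashable, Dict[Hashable, float]]:
    """Return the class with the highest score plus the score of every class."""
    scores = {c: f(x) for c, f in models.items()}
    return max(scores, key=scores.get), scores


X = [[0, 0], [0, 1], [5, 5], [5, 6], [10, 0], [10, 1]]
labels = ["a", "a", "b", "b", "c", "c"]
models = train_one_vs_rest(X, labels)
label, scores = predict(models, [5, 5.5])
```

Each per-class score in this sketch comes from a separately trained model with its own scale, which is exactly why I doubt that such raw values can be compared across classes.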
I have trained a model with each of the two operators, and they produce different kinds of confidence values.
For Classification by Regression: the confidence values lie in (-∞, 1]. This seems to be the signed distance to the hyperplane. Is this correct? Why are there no values higher than 1? (A value of 1 would mean that the example lies on the edge of the margin. Might it depend on the kernel function?)
For Polynomial By Binomial Classification: the confidence values lie in [0,1]. Is this some kind of probability? How is it defined?
For my purpose I need confidence values that are quantitatively comparable. From my point of view, however, the signed distances produced by separate binomial classification models are not comparable. A probability would be very helpful. I have read that Platt scaling and isotonic regression are suitable methods to achieve this. Unfortunately, I have not understood these methods yet (can I apply them after training, based only on the confidence values?).
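As far as I understand it, Platt scaling fits a sigmoid p = 1 / (1 + exp(A*s + B)) to pairs of raw score s and true label, so it only needs the confidence values plus the known labels (ideally from data not used for training the SVM). Below is a minimal Python sketch under that assumption. It uses plain gradient descent on the log loss instead of the optimizer and smoothed targets from Platt's original method, and the function names and example numbers are made up for illustration.

```python
import math
from typing import List, Tuple


def platt_prob(score: float, A: float, B: float) -> float:
    """Map a raw score to a probability with the fitted sigmoid."""
    z = max(-500.0, min(500.0, A * score + B))  # clamp to avoid overflow in exp
    return 1.0 / (1.0 + math.exp(z))


def fit_platt(scores: List[float], labels: List[int],
              lr: float = 0.1, epochs: int = 2000) -> Tuple[float, float]:
    """Fit A and B by gradient descent on the mean log loss.

    labels must be 0 or 1. For z = A*s + B the derivative of the
    per-example log loss with respect to z is (label - p).
    """
    A, B = 0.0, 0.0
    n = len(scores)
    for _ in range(epochs):
        grad_A = 0.0
        grad_B = 0.0
        for s, t in zip(scores, labels):
            p = platt_prob(s, A, B)
            grad_A += (t - p) * s
            grad_B += (t - p)
        A -= lr * grad_A / n
        B -= lr * grad_B / n
    return A, B


# Made-up held-out scores for one binary model and their true labels.
held_out_scores = [-2.0, -1.0, -0.5, 0.3, 0.5, 1.0, 2.0]
held_out_labels = [0, 0, 1, 0, 1, 1, 1]
A, B = fit_platt(held_out_scores, held_out_labels)
calibrated = [platt_prob(s, A, B) for s in held_out_scores]
```

If this is right, one sigmoid would be fitted per binary model, and the resulting values in [0, 1] would be easier to compare across classes than the raw distances.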
So my final question is: how do these two operators handle training and the training data, and what is the meaning/definition of their confidence values? Are there any references or official information on these definitions? The RapidMiner documentation does not give any hint regarding these issues.