
Logistic Regression (Radial kernel) weights -> Implementation question

convergence Member Posts: 2 Contributor I
edited November 2018 in Help
I've recently started using Rapidminer and am learning as I go forward.

For the binary classification problem I've been working on, I've chosen Logistic Regression with a Radial Kernel, which after some tuning has given me really good results (AUC: 0.9xx).

Now, I'd like to be able to implement this model in an application using a programming language (without involving Rapidminer) and make use of it in production.

While the documentation mentions that the radial kernel "is defined by exp(-g ||x-y||^2) where g is the gamma", it is not clear how the weights generated by RapidMiner correspond to that equation.
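For reference, the kernel itself is easy to write down from the documented formula alone. A minimal sketch in Python (the function name is my own, just restating the formula from the docs):

```python
import numpy as np

def rbf_kernel(x, y, gamma):
    """Radial kernel as given in the docs: exp(-g * ||x - y||^2)."""
    diff = np.asarray(x, dtype=float) - np.asarray(y, dtype=float)
    return np.exp(-gamma * np.dot(diff, diff))
```

What I can't work out is how the weights RapidMiner reports plug into this.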

I'd like to be able to plug the model results (the weights provided by RapidMiner) into the corresponding logit + kernel equation so that I get a probability from 0 to 1, as standard logistic regression provides.

I'm trying to figure this out on my own, but am really stuck here. Since I've already spent a lot of time tuning the model, I was hoping that making use of the model results would be straightforward.

Can anybody provide any pointers (or pseudocode if possible) on how the weights produced by Logistic Regression (Kernel: Radial) can be implemented outside of RapidMiner? It doesn't have to be code; just an equation would be fine.

Answers

  • earmijo Member Posts: 263   Unicorn
    Check the operator "Create Formula". There is an example associated with the operator; it shows you precisely what you want.

  • convergence Member Posts: 2 Contributor I
    Thanks earmijo !

    For Logistic Regression, it seems to output a really long equation with one entry per training row, all of which are then summed up. The size of the equation grows proportionally with the size of the training data.

    Is this the intended result? Because it seems very expensive to implement and run.
  • earmijo Member Posts: 263   Unicorn
    Logistic regression in RapidMiner is closer to Support Vector Machines than to the classic logistic regression of statistics books. If you want the latter, use the Weka extension. That one is a breeze to code.
  • earmijo Member Posts: 263   Unicorn
    Convergence:
    convergence wrote:

    For Logistic Regression, it seems to output a really long equation with one entry per training row, all of which are then summed up. The size of the equation grows proportionally with the size of the training data.

    Is this the intended result? Because it seems very expensive to implement and run.
    Yes, that is the intended result. For KLR,

    f(x) = bias + Sum_over_i ( alpha_i * K(x_i, x) )

    and typically all the alpha_i's are different from 0 (unlike SVM, where there is some data reduction), so the formula involves all training points. This is a disadvantage of KLR compared to SVM. Zhu & Hastie came up with a version of KLR that achieves some compression (they called it the Import Vector Machine). That one is not available in RM.

    But... the formula can obviously simplify a lot (at least when you are using the dot kernel); you'll have to do that on your own.

    [ Note too that RapidMiner normalizes (scales) the data before running KLR. ]
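    To make the recipe concrete, here is a minimal sketch in Python of the kernel expansion plus the logistic link. All names are illustrative; it assumes you have exported the per-example weights (the alpha_i's), the bias, gamma, and the normalization statistics from your trained model, and that the stored training points are already in normalized form:

```python
import numpy as np

def rbf_kernel(x, y, gamma):
    """Radial kernel: exp(-g * ||x - y||^2)."""
    diff = x - y
    return np.exp(-gamma * np.dot(diff, diff))

def klr_probability(x, train_points, alphas, bias, gamma, mean, std):
    """Score a new example with a kernel logistic regression model.

    train_points, alphas, bias, gamma, mean, std are assumed to be
    exported from the trained model (all names are illustrative).
    """
    # Apply the same normalization (z-scaling) used at training time.
    z = (np.asarray(x, dtype=float) - mean) / std
    # Kernel expansion over ALL training points -- KLR has no sparsity,
    # so every alpha_i contributes, which is why the formula is so long.
    f = bias + sum(a * rbf_kernel(xi, z, gamma)
                   for a, xi in zip(alphas, train_points))
    # Logistic link maps the raw score to a probability in (0, 1).
    return 1.0 / (1.0 + np.exp(-f))
```

    As a sanity check: with all alphas equal to zero the score is just the bias, so the probability reduces to sigmoid(bias), exactly as in plain logistic regression with no features.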