RapidMiner

Logistic Regression

Contributor II

Re: Logistic Regression

I was able to spend the afternoon today playing around with different learners and model outputs, and I finally think I understand the resulting model from the W-Logistic learner (mentioned above by ratheesan). Using Haddock's example, I was able to substitute W-Logistic as the learner and get a resulting model that I could implement outside of RapidMiner, calculating probabilities/confidences/predictions similar to RapidMiner's. Yay!!!
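For anyone trying to reproduce this, the conventional logistic scoring formula is p = 1 / (1 + exp(-(w . x + b))). A minimal Python sketch with made-up weights (the attribute names and numbers are hypothetical, just to show the plumbing):

import math

def logistic_score(weights, offset, example):
    # Standard logistic scoring: p = 1 / (1 + exp(-(w.x + b)))
    z = offset + sum(weights[a] * example[a] for a in weights)
    return 1.0 / (1.0 + math.exp(-z))

# Hypothetical two-attribute model.
weights = {"attribute_1": 0.75, "attribute_2": -1.20}
print(logistic_score(weights, offset=0.3,
                     example={"attribute_1": 0.5, "attribute_2": 0.1}))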

However, without having to dig into the source code, I would still like to be able to take the resulting models from the RapidMiner (not Weka) classification learners and implement them outside of RapidMiner. This includes Logistic Regression, SVM, etc.

I run an experiment and obtain a model with attribute weightings and an offset; I can see the example set and the confidence levels RapidMiner calculates, but I don't know how RapidMiner arrives at these numbers. I believe B_Miner is running into the same issue.

Is there anyone out there who can help me understand how to use these SVM or Logistic Regression models once they are created by RapidMiner? What are the formulas that these weightings/offsets get plugged into? Are the formulas linear, quadratic, higher-order polynomial equations? Again, any guidance would be greatly appreciated.

Many thanks in advance,
David
Regular Contributor

Re: Logistic Regression

Hey ElPato,

Yep, I am stumped by the non-Weka implementations of logistic regression. Either (1) there is a bug, or (2) this is some flavor of LR other than the ordinary one implemented in SAS, R, SPSS, etc., i.e. the one of Hosmer/Lemeshow and Agresti.

I did not know there were issues with SVM as well. How are you setting this up (can you post code), and what are you comparing the results to?
Contributor II

Re: Logistic Regression

Hey B_Miner,

Thanks for the reply! Glad to see I am not the only one a bit confused. I know I am not an expert in data mining or machine learning algorithms, but I am trying to educate myself as much as possible. It just seems important to understand exactly what the different algorithms are doing; otherwise, how can anyone interpret the results?

As far as the logistic regression operators go, I ran the same set of data above with the W-SimpleLogistic operator and received the exact same results as the RapidMiner Logistic Regression operator! They must be performing the same calculations. Now ... if only someone could explain what those calculations are, I would be extremely grateful.

As far as the SVM models go, let's say I take the same example as Haddock gave above, but substitute the LibSVM RapidMiner learner. Below is the XML:
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<process version="5.0">
  <context>
    <input>
      <location/>
    </input>
    <output>
      <location/>
      <location/>
      <location/>
    </output>
    <macros/>
  </context>
  <operator activated="true" class="process" expanded="true" name="Root">
    <process expanded="true" height="758" width="882">
      <operator activated="true" class="retrieve" expanded="true" height="60" name="Retrieve" width="90" x="45" y="30">
        <parameter key="repository_entry" value="//Samples/data/Sonar"/>
      </operator>
      <operator activated="true" class="support_vector_machine_libsvm" expanded="true" height="76" name="SVM" width="90" x="238" y="29">
        <parameter key="kernel_type" value="poly"/>
        <list key="class_weights"/>
      </operator>
      <operator activated="true" class="retrieve" expanded="true" height="60" name="Retrieve (2)" width="90" x="179" y="210">
        <parameter key="repository_entry" value="//Samples/data/Sonar"/>
      </operator>
      <operator activated="true" class="apply_model" expanded="true" height="76" name="Apply Model" width="90" x="447" y="210">
        <list key="application_parameters"/>
      </operator>
      <connect from_op="Retrieve" from_port="output" to_op="SVM" to_port="training set"/>
      <connect from_op="SVM" from_port="model" to_op="Apply Model" to_port="model"/>
      <connect from_op="Retrieve (2)" from_port="output" to_op="Apply Model" to_port="unlabelled data"/>
      <connect from_op="Apply Model" from_port="labelled data" to_port="result 2"/>
      <connect from_op="Apply Model" from_port="model" to_port="result 1"/>
      <portSpacing port="source_input 1" spacing="0"/>
      <portSpacing port="sink_result 1" spacing="0"/>
      <portSpacing port="sink_result 2" spacing="0"/>
      <portSpacing port="sink_result 3" spacing="0"/>
    </process>
  </operator>
</process>


I get the following results:

Kernel Model
Total number of Support Vectors: 159
Bias (offset): -1.191 
w[attribute_1] = 23749.738
w[attribute_2] = 31592.323
w[attribute_3] = 35680.074
w[attribute_4] = 46113.371
w[attribute_5] = 58430.884
w[attribute_6] = 74797.426
w[attribute_7] = 86353.872
w[attribute_8] = 95989.628
w[attribute_9] = 129648.901
w[attribute_10] = 152098.800
w[attribute_11] = 179324.874
w[attribute_12] = 191024.717
w[attribute_13] = 200005.157
w[attribute_14] = 207625.943
...
...
w[attribute_58] = 6238.179
w[attribute_59] = 6269.692
w[attribute_60] = 4968.341 
number of classes: 2
number of support vectors for class Rock: 78
number of support vectors for class Mine: 81


Using the polynomial kernel, how am I supposed to apply the weightings to the attributes? And what about some of the other kernels, like rbf or sigmoid? I understand the concept and math surrounding SVMs and the separating hyperplane, but I have no idea how to apply these weightings or derive confidence/prediction values. Any assistance, again, would be greatly appreciated (even if it involves pointing me elsewhere on the web for education).
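From what I have read so far, in the standard dual formulation the prediction comes from the support vectors themselves rather than from a single weight vector; the printed w[] values only correspond to a usable hyperplane for the linear (dot) kernel. A rough Python sketch of that form, f(x) = sum_i alpha_i * y_i * K(x_i, x) + b, with hypothetical numbers:

import math

def poly_kernel(x, z, gamma=1.0, coef0=0.0, degree=3):
    # LibSVM-style polynomial kernel: (gamma * <x, z> + coef0) ^ degree
    return (gamma * sum(a * b for a, b in zip(x, z)) + coef0) ** degree

def svm_decision(support_vectors, dual_coefs, bias, x, kernel):
    # Dual-form decision function: f(x) = sum_i (alpha_i * y_i) * K(x_i, x) + b
    return sum(c * kernel(sv, x)
               for sv, c in zip(support_vectors, dual_coefs)) + bias

# Hypothetical tiny model, just to show the mechanics.
svs   = [[0.1, 0.9], [0.8, 0.2]]
coefs = [1.5, -1.5]  # alpha_i * y_i for each support vector
f = svm_decision(svs, coefs, bias=-0.3, x=[0.4, 0.5], kernel=poly_kernel)
print("prediction:", "Mine" if f > 0 else "Rock")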

Thanks,
David
Elite II

Re: Logistic Regression

Hi David,
if you want to understand what each learner does, I would recommend taking a look at "The Elements of Statistical Learning" by Hastie, Tibshirani, and Friedman. It's a very statistically oriented book, but it gives detailed insight into these methods and models.

Greetings,
  Sebastian
Old World Computing - Establishing the Future

Professional consulting for your Data Science problems

Regular Contributor

Re: Logistic Regression

Hi,

Should logistic regression in RM produce weights that match, say, SAS or SPSS?
Elite II

Re: Logistic Regression

Hi,
there will probably be differences in the implementations, and I doubt the weights will be identical. But they should either come close to the other weights or at least perform equally well.

Greetings,
  Sebastian
Old World Computing - Establishing the Future

Professional consulting for your Data Science problems

Regular Contributor

Re: Logistic Regression

It's curious: the weights from RM and WEKA logistic regression (RM set to the dot kernel, WEKA using SimpleLogistic) are not close to SAS, and not close to each other at all. The prediction probabilities from WEKA are close to SAS; RM's are far different.

This matters because logistic regression is used not only for prediction but also for inference; from a strictly statistical position, the exponentiated weights are odds ratios.
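For example, under the conventional maximum-likelihood fit, a coefficient of about 0.69 means each one-unit increase in that predictor roughly doubles the odds (a made-up number, just to illustrate):

import math

beta = 0.69                  # hypothetical fitted coefficient
odds_ratio = math.exp(beta)  # exp(0.69) is roughly 2.0
print(odds_ratio)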

I have the coefficients from SAS and a small data file if anyone is interested.
RMStaff

Re: Logistic Regression

Hello,

it is actually not a big surprise that those differences occur. First, in contrast to most other implementations, the logistic regression learner in RapidMiner is basically a support vector machine with a different loss function. The author of this implementation once told me that the whole optimization approach is quite different from that of more traditional implementations. While this makes some nifty things possible, such as the integration of kernel functions, it can also lead to different results. In particular, the predictions depend heavily on parameters such as "C" and can hardly be compared directly.
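To make the "different loss function" point concrete, here is a rough sketch of the two per-example losses as I understand the general idea (not RapidMiner's actual code): hinge loss for the standard SVM versus logistic loss for kernel logistic regression:

import math

def hinge_loss(margin):
    # Standard SVM loss: max(0, 1 - y*f(x))
    return max(0.0, 1.0 - margin)

def logistic_loss(margin):
    # Kernel logistic regression loss: log(1 + exp(-y*f(x)))
    return math.log(1.0 + math.exp(-margin))

# Both penalize small or negative margins; the logistic loss is smooth and
# never exactly zero, which is what yields probability-like outputs.
for m in (-1.0, 0.0, 1.0, 2.0):
    print(m, hinge_loss(m), logistic_loss(m))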

The second difference seems to be the way the confidences are calculated. Due to the kernel-based optimization approach, they are derived from the predictions based on the Lagrange multipliers, the training examples, and the kernel function. On those predictions, a probability scaling somewhat similar to (but much simpler than) Platt scaling is applied. As long as you read the confidences as what they are ("confidences" rather than "probabilities"), this is usually fine.
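For readers unfamiliar with Platt scaling, the classic form fits a sigmoid onto the raw decision values; RapidMiner's variant is simpler, so take this only as the general idea:

import math

def platt_scale(decision_value, A, B):
    # Classic Platt scaling: p = 1 / (1 + exp(A * f(x) + B)).
    # A and B are fitted on held-out decision values; A is typically
    # negative, so a larger f(x) maps to a larger confidence.
    return 1.0 / (1.0 + math.exp(A * decision_value + B))

# Hypothetical fitted parameters, just to show the mapping.
for f in (-2.0, 0.0, 2.0):
    print(f, platt_scale(f, A=-1.7, B=0.0))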

Cheers,
Ingo


How to load processes in XML from the forum into RapidMiner: Read this!
Regular Contributor

Re: Logistic Regression

Thanks Ingo! If I get a chance, I will test performance of this implementation against the traditional maximum likelihood logistic regression (SAS) and advise if I see anything interesting.

B
RMStaff

Re: Logistic Regression

Yes, please keep me updated if you get the chance. I could imagine that the real strength of the kernel logistic regression lies in cases where the classification task is non-linear and an appropriate kernel function is used. Traditional logistic regression, on the other hand, might outperform it in the linear case and is definitely better suited if real probabilities are necessary. But maybe I am completely wrong.

Don't forget to optimize at least C; without it, the kernel logistic regression is unlikely to produce good results anyway...
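As an illustration of such a parameter sweep outside RapidMiner (using scikit-learn's SVC as a stand-in, with synthetic data in place of Sonar):

from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

# Synthetic stand-in for the Sonar data (60 attributes, 2 classes).
X, y = make_classification(n_samples=200, n_features=60, random_state=0)

# Cross-validated sweep over C (and gamma), analogous to wrapping the
# learner in a parameter-optimization operator in RapidMiner.
grid = GridSearchCV(
    SVC(kernel="poly", degree=3),
    param_grid={"C": [0.01, 0.1, 1, 10, 100], "gamma": [0.01, 0.1, 1]},
    cv=5,
)
grid.fit(X, y)
print(grid.best_params_, grid.best_score_)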

Cheers,
Ingo

How to load processes in XML from the forum into RapidMiner: Read this!