Logistic Regression

ratheesan Member Posts: 68 Guru
Hi,

I have two independent variables, Score1 and Score2, and my dependent class variable is State, which contains two values, Yes and No. My objective is to get
log(odds ratio), or logit = a + b*Score1 + c*Score2. I applied logistic regression to these data and the result looks like this:

Bias (offset) = 1.2
W[Score1] = 17.32
W[Score2] = 18.33

Could anybody help me interpret these results?

By
Ratheesan

Answers

  • haddock Member Posts: 849 Guru
    Hi there,

    There is a Wiki page on logistic regression, http://en.wikipedia.org/wiki/Logistic_regression, from which it looks like...


    Intercept   = β0 = a = Bias (offset) = 1.2
    Coefficient = β1 = b = W[Score1] = 17.32
    Coefficient = β2 = c = W[Score2] = 18.33

    ???
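    Reading the model that way, the fitted probability can be sketched in a few lines of Python (coefficients as reported above; this is an illustration of the standard logistic formula, not RapidMiner's internal code):

```python
import math

# Values reported by RapidMiner in the original post
BIAS = 1.2        # a = intercept (beta0)
W_SCORE1 = 17.32  # b = coefficient for Score1 (beta1)
W_SCORE2 = 18.33  # c = coefficient for Score2 (beta2)

def logit(score1, score2):
    # log(odds) = a + b*Score1 + c*Score2
    return BIAS + W_SCORE1 * score1 + W_SCORE2 * score2

def p_yes(score1, score2):
    # Logistic link turns the log-odds into a probability
    z = logit(score1, score2)
    return 1.0 / (1.0 + math.exp(-z))
```

    Whether this gives the probability of Yes or of No depends on which class the learner treats as positive, which is discussed further down the thread.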
  • ratheesan Member Posts: 68 Guru
    Hello Haddock,

    Thanks for your valuable help.

    Thanks
    Ratheesan
  • ratheesan Member Posts: 68 Guru
    Hello,
    I have one more doubt: does the model correspond to the probability of Yes or the probability of No?

    Thanks
    Ratheesan
  • haddock Member Posts: 849 Guru
    Greetings,

    The regression enables you to work out the probabilities for each of the label values,  and to select the most probable, as you can see from the following...
    <?xml version="1.0" encoding="UTF-8" standalone="no"?>
    <process version="5.0">
      <context>
        <input/>
        <output/>
        <macros/>
      </context>
      <operator activated="true" class="process" expanded="true" name="Root">
        <description>Using a logistic regression learner for a classification task of numerical data.</description>
        <process expanded="true" height="584" width="962">
          <operator activated="true" class="retrieve" expanded="true" height="60" name="Retrieve" width="90" x="45" y="30">
            <parameter key="repository_entry" value="../../data/Sonar"/>
          </operator>
          <operator activated="true" class="logistic_regression" expanded="true" height="94" name="MyKLRLearner" width="90" x="179" y="30">
            <parameter key="calculate_weights" value="false"/>
            <parameter key="return_optimization_performance" value="false"/>
          </operator>
          <operator activated="true" class="apply_model" expanded="true" height="76" name="Apply Model" width="90" x="431" y="28">
            <list key="application_parameters"/>
          </operator>
          <connect from_op="Retrieve" from_port="output" to_op="MyKLRLearner" to_port="training set"/>
          <connect from_op="MyKLRLearner" from_port="model" to_op="Apply Model" to_port="model"/>
          <connect from_op="MyKLRLearner" from_port="exampleSet" to_op="Apply Model" to_port="unlabelled data"/>
          <connect from_op="Apply Model" from_port="labelled data" to_port="result 2"/>
          <connect from_op="Apply Model" from_port="model" to_port="result 1"/>
          <portSpacing port="source_input 1" spacing="0"/>
          <portSpacing port="sink_result 1" spacing="0"/>
          <portSpacing port="sink_result 2" spacing="0"/>
          <portSpacing port="sink_result 3" spacing="0"/>
        </process>
      </operator>
    </process>
  • ElPato Member Posts: 10 Contributor II
    I think my earlier post might be very similar to what ratheesan is wanting to know.

    For a logistic regression problem, RapidMiner calculates the intercept and the coefficients: β0, β1, and β2 in his example.

    Then, according to http://en.wikipedia.org/wiki/Logistic_regression, we use these values to obtain a "z" value. For example,
    z = β0 + β1*<attribute1 value> + β2*<attribute2 value>.

    This "z" value is then evaluated in the equation 1/(1+e-z) in order to obtain the probabiliity that this particular instance will evaluate to "TRUE". Is this correct?

    I am interested in extending a RapidMiner model to evaluate real-time data and calculate confidence probabilities outside of RapidMiner, and I want to know if I am on the right track. Unfortunately, when I follow the process outlined above, I do not obtain the same confidence probabilities for data entries that RapidMiner calculates.

    Again, thanks for all the help!

    David
  • haddock Member Posts: 849 Guru
    Hi David,
    This "z" value is then evaluated in the equation 1/(1+e-z) in order to obtain the probabiliity that this particular instance will evaluate to "TRUE". Is this correct?
    In a word, no. If you run the code I posted you will see that z is calculated for each possible label value, i.e. the probability that this instance is a rock and the probability that this instance is a mine.
  • ElPato Member Posts: 10 Contributor II
    Thanks for the reply! I am still confused, but I have high hopes I will understand this very soon.

    Let me tell you where I am at ...
    In the example you provided, I run it and view the Example Set in Data View. Here I can see each row: the class, the confidence(Rock), the confidence(Mine), the prediction(class), as well as each attribute value.

    Explicitly, how can I replicate the confidence values calculated for each row of data (is this the possible label value which you are speaking of)?

    I have tried to replicate these values using the instructions discussed in the Logistic Regression wiki, but I do not get the same answers. Again, I would like to extend what I am discovering in RapidMiner to real-time confidence calculations of data.

    Thanks again for all the education!

    David
  • B_Miner Member Posts: 72 Maven
    Is the confusion around the fitted parameters? Depending on whether you are modeling 1 or 0, the signs of the coefficients flip... at least in the other implementations I am familiar with (SAS).

    I tried to look and see what RM was doing...but then I got worried.

    I took the Sonar data from Haddock's example and used just the first attribute as a predictor (see below).
    The first example has attribute_1 = 0.020, so with the model returned, here is what I would have expected:

    Bias: 0.184
    weight_att1: 0.69

    att1 = 0.02
    linear predictor (mu) = 0.184 + 0.69*0.02 = 0.1978
    1/(1+exp(-mu)) = 0.549289401

    But the two confidences RM gives are 0.6438660616438266 and 0.3561339383561734
    Any ideas?
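    For what it's worth, the hand calculation above does follow the ordinary logistic formula, so the plain sigmoid of the reported weights really does not reproduce RM's confidence here. A quick Python check (numbers as quoted above):

```python
import math

# Model values reported for the one-attribute Sonar model
bias = 0.184
w_att1 = 0.69
att1 = 0.020                         # attribute_1 of the first example

mu = bias + w_att1 * att1            # linear predictor, ~0.1978
p = 1.0 / (1.0 + math.exp(-mu))      # ordinary sigmoid, ~0.5493

# RM instead reports confidences 0.6438.../0.3561..., so the plain
# sigmoid of the reported weights is not what RM computes here.
```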

    Here is the RM xml:



    <?xml version="1.0" encoding="UTF-8" standalone="no"?>
    <process version="5.0">
      <context>
        <input>
          <location/>
        </input>
        <output>
          <location/>
          <location/>
          <location/>
          <location/>
          <location/>
        </output>
        <macros/>
      </context>
      <operator activated="true" class="process" expanded="true" name="Root">
        <description>Using a logistic regression learner for a classification task of numerical data.</description>
        <process expanded="true" height="298" width="614">
          <operator activated="true" class="retrieve" expanded="true" height="60" name="Retrieve" width="90" x="45" y="30">
            <parameter key="repository_entry" value="//Samples/data/Sonar"/>
          </operator>
          <operator activated="true" class="work_on_subset" expanded="true" height="94" name="Work on Subset" width="90" x="112" y="120">
            <parameter key="attribute_filter_type" value="subset"/>
            <parameter key="attributes" value="attribute_1|class"/>
            <parameter key="include_special_attributes" value="true"/>
            <parameter key="keep_subset_only" value="true"/>
            <process expanded="true" height="298" width="614">
              <connect from_port="exampleSet" to_port="example set"/>
              <portSpacing port="source_exampleSet" spacing="0"/>
              <portSpacing port="sink_example set" spacing="0"/>
              <portSpacing port="sink_through 1" spacing="0"/>
              <portSpacing port="sink_through 2" spacing="0"/>
            </process>
          </operator>
          <operator activated="true" class="logistic_regression" expanded="true" height="94" name="Logistic Regression" width="90" x="246" y="75"/>
          <operator activated="true" class="apply_model" expanded="true" height="76" name="Apply Model" width="90" x="447" y="165">
            <list key="application_parameters"/>
          </operator>
          <connect from_op="Retrieve" from_port="output" to_op="Work on Subset" to_port="example set"/>
          <connect from_op="Work on Subset" from_port="example set" to_op="Logistic Regression" to_port="training set"/>
          <connect from_op="Work on Subset" from_port="through 1" to_port="result 4"/>
          <connect from_op="Logistic Regression" from_port="model" to_op="Apply Model" to_port="model"/>
          <connect from_op="Logistic Regression" from_port="weights" to_port="result 3"/>
          <connect from_op="Logistic Regression" from_port="exampleSet" to_op="Apply Model" to_port="unlabelled data"/>
          <connect from_op="Apply Model" from_port="labelled data" to_port="result 2"/>
          <connect from_op="Apply Model" from_port="model" to_port="result 1"/>
          <portSpacing port="source_input 1" spacing="0"/>
          <portSpacing port="sink_result 1" spacing="0"/>
          <portSpacing port="sink_result 2" spacing="0"/>
          <portSpacing port="sink_result 3" spacing="0"/>
          <portSpacing port="sink_result 4" spacing="0"/>
          <portSpacing port="sink_result 5" spacing="0"/>
        </process>
      </operator>
    </process>


  • ratheesan Member Posts: 68 Guru
    Hi,
    I applied W-Logistic instead of Logistic Regression in the process mentioned above. There it clearly states that the model corresponds to Class=ROCK.
    So I think the result provided by Logistic Regression is also for class=ROCK. Is that right??

    Thanks
    Ratheesan
  • ElPato Member Posts: 10 Contributor II
    I was able to spend this afternoon playing around with different learners and model outputs, and I finally think I understand the resulting model from the W-Logistic learner (mentioned above by ratheesan). Using Haddock's example, I was able to substitute W-Logistic as the learner and get a resulting model that I could implement outside of RapidMiner, calculating similar probabilities/confidences/predictions to RapidMiner's. Yay!!!

    However, without having to dig into source code, I would still like to be able to take the resulting models from the RapidMiner (not Weka) classification learners and implement these models outside of RapidMiner. This includes Logistic Regression, SVM, etc.

    I run an experiment, I obtain a model with attribute weightings and an offset, and I see the example set and the calculated confidence levels from the RapidMiner experiment, but I don't know how RapidMiner comes up with these calculations. I believe B_Miner is running into the same issue.

    Is there anyone out there who can help me understand how to use these SVM or Logistic Regression models once they are created by RapidMiner? What are the formulas that these weightings/offsets get plugged into? Are the formulas linear, quadratic, higher-order polynomial equations? Again, any guidance would be greatly appreciated.

    Many thanks in advance,
    David
  • B_Miner Member Posts: 72 Maven
    Hey ElPato,

    Yep, I am stumped by the non-Weka implementations of logistic regression. Either (1) there is a bug, or (2) this is some flavor of LR besides the ordinary one implemented in SAS, R, SPSS, etc., i.e. the one of Hosmer/Lemeshow and Agresti.

    I did not know there were issues with SVM as well? What are you setting this up as (can you post code), and what are you comparing the results to?
  • ElPato Member Posts: 10 Contributor II
    Hey B_Miner,

    Thanks for the reply! Glad to see I am not the only one who is a bit confused. I know I am not an expert in data mining or machine learning algorithms, but I am trying to educate myself as much as possible. It just seems important to understand exactly what the different algorithms are doing; otherwise, how can someone possibly interpret the results?

    As far as the logistic regression operators go, I ran the same data as above with the W-SimpleLogistic operator and received exactly the same results as the RapidMiner Logistic Regression operator! They must be performing the same calculations. Now... if only someone can explain what those calculations are, I would be extremely grateful :).

    As far as the SVM models go, let's say I take the same example Haddock gave above, but substitute the LibSVM RapidMiner learner. Below is the XML:
    <?xml version="1.0" encoding="UTF-8" standalone="no"?>
    <process version="5.0">
      <context>
        <input>
          <location/>
        </input>
        <output>
          <location/>
          <location/>
          <location/>
        </output>
        <macros/>
      </context>
      <operator activated="true" class="process" expanded="true" name="Root">
        <process expanded="true" height="758" width="882">
          <operator activated="true" class="retrieve" expanded="true" height="60" name="Retrieve" width="90" x="45" y="30">
            <parameter key="repository_entry" value="//Samples/data/Sonar"/>
          </operator>
          <operator activated="true" class="support_vector_machine_libsvm" expanded="true" height="76" name="SVM" width="90" x="238" y="29">
            <parameter key="kernel_type" value="poly"/>
            <list key="class_weights"/>
          </operator>
          <operator activated="true" class="retrieve" expanded="true" height="60" name="Retrieve (2)" width="90" x="179" y="210">
            <parameter key="repository_entry" value="//Samples/data/Sonar"/>
          </operator>
          <operator activated="true" class="apply_model" expanded="true" height="76" name="Apply Model" width="90" x="447" y="210">
            <list key="application_parameters"/>
          </operator>
          <connect from_op="Retrieve" from_port="output" to_op="SVM" to_port="training set"/>
          <connect from_op="SVM" from_port="model" to_op="Apply Model" to_port="model"/>
          <connect from_op="Retrieve (2)" from_port="output" to_op="Apply Model" to_port="unlabelled data"/>
          <connect from_op="Apply Model" from_port="labelled data" to_port="result 2"/>
          <connect from_op="Apply Model" from_port="model" to_port="result 1"/>
          <portSpacing port="source_input 1" spacing="0"/>
          <portSpacing port="sink_result 1" spacing="0"/>
          <portSpacing port="sink_result 2" spacing="0"/>
          <portSpacing port="sink_result 3" spacing="0"/>
        </process>
      </operator>
    </process>
    I get the following results:

    Kernel Model
    Total number of Support Vectors: 159
    Bias (offset): -1.191 
    w[attribute_1] = 23749.738
    w[attribute_2] = 31592.323
    w[attribute_3] = 35680.074
    w[attribute_4] = 46113.371
    w[attribute_5] = 58430.884
    w[attribute_6] = 74797.426
    w[attribute_7] = 86353.872
    w[attribute_8] = 95989.628
    w[attribute_9] = 129648.901
    w[attribute_10] = 152098.800
    w[attribute_11] = 179324.874
    w[attribute_12] = 191024.717
    w[attribute_13] = 200005.157
    w[attribute_14] = 207625.943
    ...
    ...
    w[attribute_58] = 6238.179
    w[attribute_59] = 6269.692
    w[attribute_60] = 4968.341 
    number of classes: 2
    number of support vectors for class Rock: 78
    number of support vectors for class Mine: 81
    Using the polynomial kernel, how am I supposed to apply the weightings to the attributes? What about some of the other kernels, like rbf or sigmoid? I understand the concept and math surrounding SVMs and the separating hyperplane, but I have no idea how to apply these weightings or derive confidence/prediction values. Any assistance, again, would be greatly appreciated (even if it involves pointing me elsewhere on the web for education).
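    For reference, here is the textbook formulation (not necessarily RapidMiner's or LibSVM's exact code): a flattened weight vector w only defines the decision function directly for the linear (dot) kernel, where f(x) = w·x + b. For the polynomial, rbf, or sigmoid kernels the model lives in the support vectors and their dual coefficients, so the per-attribute weights alone are not enough. A minimal sketch with made-up values:

```python
import math

def poly_kernel(x, y, degree=3, gamma=1.0, coef0=0.0):
    # Polynomial kernel: (gamma * <x, y> + coef0) ** degree
    dot = sum(a * b for a, b in zip(x, y))
    return (gamma * dot + coef0) ** degree

def rbf_kernel(x, y, gamma=1.0):
    # RBF kernel: exp(-gamma * ||x - y||^2)
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * sq_dist)

def decision_function(x, support_vectors, dual_coefs, bias, kernel=poly_kernel):
    # f(x) = sum_i (alpha_i * y_i) * K(sv_i, x) + bias
    # dual_coefs[i] holds alpha_i * y_i for support vector i;
    # the predicted class follows the sign of f(x).
    return sum(c * kernel(sv, x) for sv, c in zip(support_vectors, dual_coefs)) + bias

# Toy illustration with made-up support vectors and coefficients
svs = [(1.0, 0.0), (0.0, 1.0)]
coefs = [0.5, -0.5]
f = decision_function((1.0, 1.0), svs, coefs, bias=0.1)
```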

    Thanks,
    David
  • land RapidMiner Certified Analyst, RapidMiner Certified Expert, Member Posts: 2,525 Unicorn
    Hi David,
    if you want to understand what each learner does, I would recommend taking a look at "The Elements of Statistical Learning" by Hastie, Tibshirani, and Friedman. It is a very statistically oriented book, but it gives detailed insight into these methods and models.

    Greetings,
      Sebastian
  • B_Miner Member Posts: 72 Maven
    Hi,

    Should logistic regression in RM produce weights that match say SAS or SPSS ?
  • land RapidMiner Certified Analyst, RapidMiner Certified Expert, Member Posts: 2,525 Unicorn
    Hi,
    there will probably be differences between the implementations, and I doubt the weights will be exactly the same. But they should either come close to the other weights or at least perform equally well.

    Greetings,
      Sebastian
  • B_Miner Member Posts: 72 Maven
    It's curious: the weights are not close to SAS for either RM or Weka logistic regression (RM was set to the dot kernel and Weka is the SimpleLogistic), and they are not close to each other at all. The prediction probabilities for Weka are close to SAS; RM's are far different.

    It's curious because logistic regression is used not only for prediction but also for inference, from a strictly statistical position, where the exponentiated weights are odds ratios.

    I have the coefficients from SAS and a small data file if anyone is interested.
  • IngoRM Administrator, Moderator, Employee, RapidMiner Certified Analyst, RapidMiner Certified Expert, Community Manager, RMResearcher, Member, University Professor Posts: 1,643 RM Founder
    Hello,

    it is actually not a big surprise that those differences occur. First, in contrast to most other implementations, the logistic regression learner in RapidMiner is basically a support vector machine with a different loss function. The author of this implementation once told me that the whole optimization approach is a bit different from that known from more traditional implementations. While this makes some nifty things possible, like the integration of kernel functions, it can also lead to different results. At the very least, the predictions depend a lot on parameters such as "C" and can hardly be compared directly.

    The second difference seems to be the way the confidences are calculated. Due to the kernel-based optimization approach, they are derived from the predictions based on the Lagrange multipliers, the training examples, and the kernel function. To those predictions a probability scaling somewhat similar to (but much simpler than) Platt scaling is applied. As long as you read the confidences as what they are ("confidence" instead of "probability"), this is usually fine.
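    The scaling Ingo describes could look roughly like the following. This is a sketch of generic Platt-style scaling, not RapidMiner's actual implementation; in real Platt scaling, the parameters A and B are fitted by maximum likelihood on the training predictions:

```python
import math

def platt_like_confidence(f, A=-1.0, B=0.0):
    # Map a raw kernel prediction f onto (0, 1).
    # In Platt scaling, A and B are fitted to the training outputs;
    # with A = -1 and B = 0 this reduces to a plain sigmoid.
    return 1.0 / (1.0 + math.exp(A * f + B))
```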

    Cheers,
    Ingo

  • B_Miner Member Posts: 72 Maven
    Thanks, Ingo! If I get a chance, I will test the performance of this implementation against traditional maximum-likelihood logistic regression (SAS) and report back if I see anything interesting.

    B
  • IngoRM Administrator, Moderator, Employee, RapidMiner Certified Analyst, RapidMiner Certified Expert, Community Manager, RMResearcher, Member, University Professor Posts: 1,643 RM Founder
    Yes, please keep me updated if you get the chance. I could imagine that the real strength of kernel logistic regression lies in cases where the classification task is non-linear and an appropriate kernel function is used. Traditional logistic regression, on the other hand, might outperform it in the linear case and is definitely better suited if real probabilities are necessary. But maybe I am completely wrong :D

    Don't forget to optimize at least C, since without that the kernel logistic regression is not likely to produce good results anyway...

    Cheers,
    Ingo
  • psantipov Member Posts: 3 Contributor I
    Hello, dear friends!
    I see that this is a very old thread and that Ingo wrote a lot of interesting information, but I have to ask. For an RM logistic regression model, is there any way to use it, for example, in Excel (via a logistic function formula with the weights defined in RM, or something like this)? Maybe I should use some specific logistic regression operator for that goal?
    I am not very good at data science or English, but I hope that somebody who knows the answer will read this soon.
    Thank you for reading!
    Pavel.
  • mschmitz Administrator, Moderator, Employee, RapidMiner Certified Analyst, RapidMiner Certified Expert, University Professor Posts: 2,132 RM Data Scientist

    Dear Pavel,

    welcome to the community!

    It is technically possible to take the equation from RM and put it into Excel. But the big question is: why do you want this?

    A model is always the combination of the preprocessing and the machine learning part itself. If you use a machine learning model on a table which is not prepared in the very same way, it will work but create wrong or unreasonable results.

    Why don't you create an RM process: Read Excel -> Prepare Data -> Apply Model -> Write Excel?
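    That pipeline, sketched in the same process-XML style as the examples above (a sketch only; operator parameters, repository paths, and file names are placeholders):

```xml
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<process version="5.0">
  <operator activated="true" class="process" expanded="true" name="Root">
    <process expanded="true">
      <operator activated="true" class="read_excel" expanded="true" name="Read Excel">
        <parameter key="excel_file" value="loans.xlsx"/>
      </operator>
      <!-- The "Prepare Data" steps (the same preprocessing used in training)
           would go between Read Excel and Apply Model -->
      <operator activated="true" class="retrieve" expanded="true" name="Retrieve Model">
        <parameter key="repository_entry" value="//Local Repository/my_logistic_model"/>
      </operator>
      <operator activated="true" class="apply_model" expanded="true" name="Apply Model">
        <list key="application_parameters"/>
      </operator>
      <operator activated="true" class="write_excel" expanded="true" name="Write Excel">
        <parameter key="excel_file" value="loans_scored.xlsx"/>
      </operator>
      <connect from_op="Read Excel" from_port="output" to_op="Apply Model" to_port="unlabelled data"/>
      <connect from_op="Retrieve Model" from_port="output" to_op="Apply Model" to_port="model"/>
      <connect from_op="Apply Model" from_port="labelled data" to_op="Write Excel" to_port="input"/>
    </process>
  </operator>
</process>
```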

     

    Best,

    Martin

    - Head of Data Science Services at RapidMiner -
    Dortmund, Germany
  • psantipov Member Posts: 3 Contributor I
    Dear Martin,

    thanks a lot for the response!

    Our goal is to build an instrument to predict potentially bad bank loans in the future. We use some transaction data, and for simplicity we want to apply a logistic function with the specific weights from the RM model to this data in Excel. But now it seems that we can't. Am I right? If so, what do you think we should do? Insert the RM model code into the core of the script that mines our data? Or is there a simpler way?

    Thanks a lot
    and best regards,
    Pavel

  • Thomas_Ott RapidMiner Certified Analyst, RapidMiner Certified Expert, Member Posts: 1,761 Unicorn

    IMHO, the method that Martin proposes is probably the faster and better solution.

    Import your data (via Excel or a database), do your training and scoring in RapidMiner, and at the end write out the predicted results to Excel. Going through the trouble of finding the weights in RapidMiner and then plugging them into an Excel logistic regression model to crunch it there feels very time-consuming.

    However, you can extract the Logistic Regression operator weights by using a Weight to Data operator and then Write Excel.

  • psantipov Member Posts: 3 Contributor I
    Dear T-bone,

    Thanks for the response and your opinion!
    Could you please clarify one thing for me (it was discussed earlier, but unfortunately I didn't catch it)?
    Why, if I use the logistic formula 1/(1+exp(-z)) in Excel with the weights from the RM model for the attributes, do I get different results for the probabilities on the same data? How does it work? I really can't understand it, and I would be truly thankful if someone could explain.

    Thanks a lot,
    Pavel