RapidMiner

Logistic Regression

Regular Contributor

Hi,

I have two independent variables, Score1 and Score2, and my dependent class variable is state, which contains two values, Yes and No. My objective is to get
log(odds), i.e. logit = a + b*score1 + c*score2. I applied logistic regression to these data and the result looks like this:

Bias(offset)=1.2
W[score1]=17.32
W[Score2]=18.33

Could anybody please help me interpret the results?

By
Ratheesan
19 REPLIES
Regular Contributor

Re: Logistic Regression

Hi there,

There is a Wikipedia page on logistic regression, http://en.wikipedia.org/wiki/Logistic_regression, from which it looks like...


Intercept        =  β0 = a = Bias(offset) = 1.2
Coefficient      =  β1 = b = W[score1]  = 17.32
Coefficient      =  β2 = c = W[score2]  = 18.33

???
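
One way to read these numbers, as a minimal sketch: assuming the fitted model really has the form logit = a + b*score1 + c*score2, each weight is the change in log-odds per unit increase in that score, so the odds are multiplied by exp(weight * change). The function names and the 0.1 step below are illustrative assumptions, and nothing here settles which class (Yes or No) the logit refers to.

```python
import math

# Values reported in the original post.
BIAS = 1.2
W_SCORE1 = 17.32
W_SCORE2 = 18.33


def logit(score1, score2):
    """Log-odds under the assumed form a + b*score1 + c*score2."""
    return BIAS + W_SCORE1 * score1 + W_SCORE2 * score2


def odds_multiplier(weight, change):
    """Factor by which the odds change when one score moves by `change`."""
    return math.exp(weight * change)


if __name__ == "__main__":
    # With both scores at zero, the log-odds equal the bias.
    print(logit(0.0, 0.0))
    # Effect on the odds of raising score1 by 0.1 (hypothetical step).
    print(odds_multiplier(W_SCORE1, 0.1))
```

With weights this large, even a small change in either score moves the odds by a large factor, so the scale of Score1 and Score2 matters when reading the result.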
Regular Contributor

Re: Logistic Regression

Hello Haddock,

Thanks for your valuable help.

Thanks
Ratheesan
Regular Contributor

Re: Logistic Regression

Hello,
I have one more question: does the model correspond to the probability of YES or the probability of NO?

Thanks
Ratheesan
Regular Contributor

Re: Logistic Regression

Greetings,

The regression enables you to work out the probabilities for each of the label values,  and to select the most probable, as you can see from the following...

<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<process version="5.0">
  <context>
    <input/>
    <output/>
    <macros/>
  </context>
  <operator activated="true" class="process" expanded="true" name="Root">
    <description>Using a logistic regression learner for a classification task of numerical data.</description>
    <process expanded="true" height="584" width="962">
      <operator activated="true" class="retrieve" expanded="true" height="60" name="Retrieve" width="90" x="45" y="30">
        <parameter key="repository_entry" value="../../data/Sonar"/>
      </operator>
      <operator activated="true" class="logistic_regression" expanded="true" height="94" name="MyKLRLearner" width="90" x="179" y="30">
        <parameter key="calculate_weights" value="false"/>
        <parameter key="return_optimization_performance" value="false"/>
      </operator>
      <operator activated="true" class="apply_model" expanded="true" height="76" name="Apply Model" width="90" x="431" y="28">
        <list key="application_parameters"/>
      </operator>
      <connect from_op="Retrieve" from_port="output" to_op="MyKLRLearner" to_port="training set"/>
      <connect from_op="MyKLRLearner" from_port="model" to_op="Apply Model" to_port="model"/>
      <connect from_op="MyKLRLearner" from_port="exampleSet" to_op="Apply Model" to_port="unlabelled data"/>
      <connect from_op="Apply Model" from_port="labelled data" to_port="result 2"/>
      <connect from_op="Apply Model" from_port="model" to_port="result 1"/>
      <portSpacing port="source_input 1" spacing="0"/>
      <portSpacing port="sink_result 1" spacing="0"/>
      <portSpacing port="sink_result 2" spacing="0"/>
      <portSpacing port="sink_result 3" spacing="0"/>
    </process>
  </operator>
</process>

Contributor II

Re: Logistic Regression

I think my earlier post might be very similar to what ratheesan is wanting to know.

For a logistic regression problem, RapidMiner calculates the intercept and the coefficients: β0, β1, and β2 in his example.

Then, according to http://en.wikipedia.org/wiki/Logistic_regression, we use these values to obtain a "z" value. For example,
z = β0 + β1*<attribute1 value> + β2*<attribute2 value>.

This "z" value is then evaluated in the equation 1/(1+e^(-z)) in order to obtain the probability that this particular instance will evaluate to "TRUE". Is this correct?

I am interested in extending a RapidMiner model to evaluate real-time data and calculate confidence probabilities outside of RapidMiner, and I want to know if I am on the right track. Unfortunately, when I follow the process outlined above, I do not obtain the same confidence probabilities that RapidMiner calculates for the same data entries.
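
The procedure described above can be sketched as follows. This is only an illustration of the textbook formula, with hypothetical attribute names and weights; it does not claim to reproduce how RapidMiner computes its confidences, which is the open question in this thread.

```python
import math


def sigmoid(z):
    """Logistic function 1 / (1 + e^(-z))."""
    return 1.0 / (1.0 + math.exp(-z))


def score(intercept, weights, row):
    """Return the textbook logistic probability for one data row.

    `weights` and `row` are dicts keyed by attribute name.
    """
    z = intercept + sum(w * row[name] for name, w in weights.items())
    return sigmoid(z)


if __name__ == "__main__":
    # Hypothetical coefficients and a hypothetical incoming record.
    weights = {"attribute1": 0.5, "attribute2": -0.25}
    print(score(0.1, weights, {"attribute1": 2.0, "attribute2": 4.0}))
```

Keeping the weights in a dict keyed by attribute name makes it harder to pair a coefficient with the wrong column when the model is exported by hand.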

Again, thanks for all the help!

David
Regular Contributor

Re: Logistic Regression

Hi David,

This "z" value is then evaluated in the equation 1/(1+e^(-z)) in order to obtain the probability that this particular instance will evaluate to "TRUE". Is this correct?


In a word, no. If you run the code I posted, you will see that z is calculated for each possible label value, i.e. the probability that this instance is a rock and the probability that this instance is a mine.
Contributor II

Re: Logistic Regression

Thanks for the reply! I am still confused, but I have high hopes I will understand this very soon.

Let me tell you where I am at ...
I ran the example you provided and viewed the Example Set in Data View. There I can see each row, the class, the confidence(Rock), the confidence(Mine), the prediction(class), as well as each attribute value for each row.

Explicitly, how can I replicate the confidence values calculated for each row of data (is this the possible label value which you are speaking of)?

I have tried to replicate these values using the instructions discussed in the Logistic Regression wiki, but I do not get the same answers. Again, I would like to extend what I am discovering in RapidMiner to real-time confidence calculations of data.

Thanks again for all the education!

David
Regular Contributor

Re: Logistic Regression

Is the confusion around the fitted parameters? Depending on whether you are modeling 1 or 0, the signs of the coefficients flip, at least in other implementations I am familiar with (SAS).
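
The sign flip follows from the identity sigmoid(-z) = 1 - sigmoid(z): negating the intercept and every coefficient yields the probability of the other class. A small sketch with made-up coefficients (none of these values come from RapidMiner or SAS):

```python
import math


def sigmoid(z):
    """Logistic function 1 / (1 + e^(-z))."""
    return 1.0 / (1.0 + math.exp(-z))


def linear_predictor(intercept, coefs, values):
    """Intercept plus the dot product of coefficients and attribute values."""
    return intercept + sum(c * v for c, v in zip(coefs, values))


if __name__ == "__main__":
    # Hypothetical model of class "1" and one hypothetical example.
    intercept, coefs = 0.3, [1.5, -2.0]
    values = [0.4, 0.1]
    p_one = sigmoid(linear_predictor(intercept, coefs, values))

    # The same model with all signs flipped describes class "0".
    flipped = [-c for c in coefs]
    p_zero = sigmoid(linear_predictor(-intercept, flipped, values))

    print(p_one, p_zero, p_one + p_zero)
```

So if a hand calculation gives 1 minus the confidence shown by a tool, the usual suspect is which class the tool treats as the modeled one.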

I tried to look and see what RM was doing...but then I got worried.

I took the Sonar data Haddock used and used just the first attribute as a predictor (see below).
The first example has attribute_1 = 0.020, so with the model returned, here is what I would have expected:


Bias                     0.184
weight_att 1             0.69

att1                     0.02
linear predictor (mu)    0.1978
1/(1+exp(-mu))           0.549289401



But the two confidences RM gives are 0.6438660616438266 and 0.3561339383561734
Any ideas?
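
For anyone checking the arithmetic, here is a short sketch that repeats the hand calculation and also inverts the logistic function to see which linear predictor would be needed to produce the larger confidence RM reports. The `logit` helper is just the standard inverse, log(p / (1 - p)); it is a diagnostic, not a description of what RM does internally.

```python
import math


def sigmoid(z):
    """Logistic function 1 / (1 + e^(-z))."""
    return 1.0 / (1.0 + math.exp(-z))


def logit(p):
    """Inverse of the logistic function: log(p / (1 - p))."""
    return math.log(p / (1.0 - p))


if __name__ == "__main__":
    # Hand calculation from the post.
    bias, weight, att1 = 0.184, 0.69, 0.02
    mu = bias + weight * att1
    print(mu, sigmoid(mu))

    # Larger of the two confidences reported by RM for the same example.
    rm_conf = 0.6438660616438266
    print(logit(rm_conf))
```

The two RM confidences sum to 1, and the larger one corresponds to a linear predictor of roughly 0.59 rather than 0.1978, so the gap is in the linear predictor itself and not just in which class is being modeled.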

Here is the RM xml:



<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<process version="5.0">
  <context>
    <input>
      <location/>
    </input>
    <output>
      <location/>
      <location/>
      <location/>
      <location/>
      <location/>
    </output>
    <macros/>
  </context>
  <operator activated="true" class="process" expanded="true" name="Root">
    <description>Using a logistic regression learner for a classification task of numerical data.</description>
    <process expanded="true" height="298" width="614">
      <operator activated="true" class="retrieve" expanded="true" height="60" name="Retrieve" width="90" x="45" y="30">
        <parameter key="repository_entry" value="//Samples/data/Sonar"/>
      </operator>
      <operator activated="true" class="work_on_subset" expanded="true" height="94" name="Work on Subset" width="90" x="112" y="120">
        <parameter key="attribute_filter_type" value="subset"/>
        <parameter key="attributes" value="attribute_1|class"/>
        <parameter key="include_special_attributes" value="true"/>
        <parameter key="keep_subset_only" value="true"/>
        <process expanded="true" height="298" width="614">
          <connect from_port="exampleSet" to_port="example set"/>
          <portSpacing port="source_exampleSet" spacing="0"/>
          <portSpacing port="sink_example set" spacing="0"/>
          <portSpacing port="sink_through 1" spacing="0"/>
          <portSpacing port="sink_through 2" spacing="0"/>
        </process>
      </operator>
      <operator activated="true" class="logistic_regression" expanded="true" height="94" name="Logistic Regression" width="90" x="246" y="75"/>
      <operator activated="true" class="apply_model" expanded="true" height="76" name="Apply Model" width="90" x="447" y="165">
        <list key="application_parameters"/>
      </operator>
      <connect from_op="Retrieve" from_port="output" to_op="Work on Subset" to_port="example set"/>
      <connect from_op="Work on Subset" from_port="example set" to_op="Logistic Regression" to_port="training set"/>
      <connect from_op="Work on Subset" from_port="through 1" to_port="result 4"/>
      <connect from_op="Logistic Regression" from_port="model" to_op="Apply Model" to_port="model"/>
      <connect from_op="Logistic Regression" from_port="weights" to_port="result 3"/>
      <connect from_op="Logistic Regression" from_port="exampleSet" to_op="Apply Model" to_port="unlabelled data"/>
      <connect from_op="Apply Model" from_port="labelled data" to_port="result 2"/>
      <connect from_op="Apply Model" from_port="model" to_port="result 1"/>
      <portSpacing port="source_input 1" spacing="0"/>
      <portSpacing port="sink_result 1" spacing="0"/>
      <portSpacing port="sink_result 2" spacing="0"/>
      <portSpacing port="sink_result 3" spacing="0"/>
      <portSpacing port="sink_result 4" spacing="0"/>
      <portSpacing port="sink_result 5" spacing="0"/>
    </process>
  </operator>
</process>




Regular Contributor

Re: Logistic Regression

Hi,
I applied W-Logistic instead of Logistic Regression in the above-mentioned process. There it clearly indicated that the model corresponds to class=ROCK.
So I think the result provided by Logistic Regression is also for class=ROCK. Is that right?

Thanks
Ratheesan