
# Logistic Regression

Hi,

I have two independent variables, Score1 and Score2, and my dependent class variable is State, which takes two values, Yes and No. My objective is to get log(odds), i.e. logit = a + b*Score1 + c*Score2. I applied logistic regression to these data and the result looks like this:

Bias(offset)=1.2

W[score1]=17.32

W[Score2]=18.33

Could anybody please help me interpret the results?

By

Ratheesan


## Answers

849 Guru: There is a Wikipedia page on logistic regression, http://en.wikipedia.org/wiki/Logistic_regression, from which it looks like...

Intercept = β0 = a = Bias(offset) = 1.2

Coefficient = β1 = b = W[score1] = 17.32

Coefficient = β2 = c = W[score2] = 18.33

???
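Written out, and assuming the operator follows the standard logistic regression convention from the Wikipedia page linked above (which the thread has not confirmed at this point), the mapping would read as follows. Here $p$ is the probability of whichever class the model treats as positive; which of Yes/No that is cannot be told from the weights alone.

$$
\begin{aligned}
\operatorname{logit}(p) = \ln\frac{p}{1-p} &= a + b\,\text{Score1} + c\,\text{Score2} \\
&= 1.2 + 17.32\,\text{Score1} + 18.33\,\text{Score2}, \\
p &= \frac{1}{1 + e^{-(a + b\,\text{Score1} + c\,\text{Score2})}}.
\end{aligned}
$$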

68 Maven: Thanks for your valuable help.

Thanks

Ratheesan

68 Maven: I have one more question: does the model correspond to the probability of YES or the probability of NO?

Thanks

Ratheesan

849 Guru: The regression enables you to work out the probabilities for each of the label values, and to select the most probable, as you can see from the following...

10 Contributor II: For a logistic regression problem, RapidMiner calculates the intercept and the coefficients: β0, β1, and β2 in his example.

Then, according to http://en.wikipedia.org/wiki/Logistic_regression, we use these values to obtain a "z" value. For example,

z = β0 + β1*<attribute1 value> + β2*<attribute2 value>.

This "z" value is then evaluated in the equation $1/(1+e^{-z})$ in order to obtain the probability that this particular instance will evaluate to "TRUE". Is this correct?

I am interested in extending a RapidMiner model to evaluate real-time data and calculate confidence probabilities outside of RapidMiner, and I want to know if I am on the right track as to how to accomplish this. Unfortunately, when I follow the process outlined above, I do not obtain the same confidence probabilities as the ones calculated in RapidMiner.

Again, thanks for all the help!

David
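The procedure David describes can be sketched in a few lines of Python. This is the textbook logistic model only, not RapidMiner's internal calculation; the function name `logistic_confidences` and the attribute values are made up for illustration, while the intercept and weights are the ones from the original question.

```python
import math


def logistic_confidences(intercept, weights, values):
    """Return (p_positive, p_negative) under the standard logistic model."""
    # Linear part: z = b0 + b1*x1 + b2*x2 + ...
    z = intercept + sum(w * x for w, x in zip(weights, values))
    # Numerically stable evaluation of 1 / (1 + e^(-z))
    if z >= 0:
        p_positive = 1.0 / (1.0 + math.exp(-z))
    else:
        e = math.exp(z)
        p_positive = e / (1.0 + e)
    return p_positive, 1.0 - p_positive


# Intercept and weights from the original question; attribute values are hypothetical.
p_pos, p_neg = logistic_confidences(1.2, [17.32, 18.33], [0.01, -0.05])
print(p_pos, p_neg)
```

The two returned values always sum to 1, which matches the paired confidences shown in the Data View, but agreement with RapidMiner's numbers is not guaranteed.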

849 Guru: ... for each possible label value, i.e. the probability that this instance is a rock, and the probability that this instance is a mine.

10 Contributor II: Let me tell you where I am at ...

In your example you provided, I run it and view the Example Set in Data View. Here I can see each row, the class, the confidence(Rock), the confidence(Mine), the prediction(class), as well as each attribute value for each row.

Explicitly, how can I replicate the confidence values calculated for each row of data (is this the possible label value which you are speaking of)?

I have tried to replicate these values using the instructions discussed in the Logistic Regression wiki, but I do not get the same answers. Again, I would like to extend what I am discovering in RapidMiner to real-time confidence calculations of data.

Thanks again for all the education!

David

72 Maven: I tried to look and see what RM was doing... but then I got worried.

I took the sonar data from Haddock's example and used just the first attribute as a predictor (see below).

The first example has attribute 1 = 0.020, so with the model returned, here is what I would have expected: But the two confidences RM gives are 0.6438660616438266 and 0.3561339383561734.

Any ideas?

Here is the RM xml:

68 Maven: I applied W-Logistic instead of Logistic Regression in the above-mentioned example. There it clearly indicated that the model corresponds to Class=ROCK.

So I think the result provided by Logistic Regression is also for class=ROCK. Is that right?

Thanks

Ratheesan

10 Contributor II: However, without having to dig into source code, I would still like to be able to take the resulting models from the RapidMiner (not Weka) classification learners and implement these models outside of RapidMiner. This includes Logistic Regression, SVM, etc.

I run an experiment, I obtain a model with weightings of attributes and an offset, I see the example set and the calculated confidence levels from the RapidMiner experiment, but I don't know how RapidMiner is coming up with these calculations. I believe B_Miner is running into the same issue.

Is there anyone out there who can help me understand how to use these SVM or Logistic Regression models once they are created by RapidMiner? What are the formulas that these weightings/offsets get plugged into? Are the formulas linear, quadratic, higher-order polynomial equations? Again, any guidance would be greatly appreciated.

Many thanks in advance,

David

72 Maven: Yep, I am stumped by the non-Weka implementations of logistic regression. Either (1) there is a bug or (2) this is some flavor of LR besides the ordinary one implemented in SAS, R, SPSS etc., i.e. the one of Hosmer/Lemeshow and Agresti.

I did not know there were issues with SVM as well? What are you setting this up as (can you post code) and what are you using to compare the results to?

10 Contributor II: Thanks for the reply! Glad to see I am not the only one a bit confused. I know I am not an expert in data mining or machine learning algorithms, but I am trying to educate myself as much as possible. It just seems kinda important to understand exactly what the different algorithms are doing; otherwise, how can someone possibly understand how to interpret the results?

As far as the logistic regression operators go, I ran the same set of data above with the W-SimpleLogistic operator and received the exact same results as with the RapidMiner Logistic Regression operator! They must be performing the same calculations. Now ... if only someone could explain what those calculations are, I would be extremely grateful.

As far as the SVM models go, let's say I take the same example as Haddock gave above, but substitute the LibSVM RapidMiner learner. Below is the XML:

I get the following results:

Using the polynomial kernel, how am I supposed to apply the weightings to the attributes? What about some of the other kernels, like rbf or sigmoid? I understand the concept and math surrounding SVMs and the separating hyperplane, but I have no idea how to apply these weightings or derive confidence/prediction values. Any assistance, again, would be greatly appreciated (even if it involves pointing me elsewhere on the web for education).

Thanks,

David
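For reference, in the textbook form of a kernel SVM the model is not a set of per-attribute weights: it stores support vectors with one coefficient each, and a new example is scored by evaluating the kernel against every support vector. The sketch below follows that general form. The kernel parameter names (`gamma`, `coef0`, `degree`) mirror LibSVM's conventions, the support vectors and coefficients are made up, and nothing here has been checked against what RapidMiner actually exports.

```python
import math


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def poly_kernel(u, v, gamma=1.0, coef0=0.0, degree=3):
    # (gamma * <u, v> + coef0) ^ degree
    return (gamma * dot(u, v) + coef0) ** degree


def rbf_kernel(u, v, gamma=1.0):
    # exp(-gamma * ||u - v||^2)
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(u, v)))


def sigmoid_kernel(u, v, gamma=1.0, coef0=0.0):
    # tanh(gamma * <u, v> + coef0)
    return math.tanh(gamma * dot(u, v) + coef0)


def decision_value(x, support_vectors, coefficients, bias, kernel):
    """f(x) = sum_i coef_i * K(sv_i, x) + bias; the sign of f(x) picks the class."""
    return sum(c * kernel(sv, x) for sv, c in zip(support_vectors, coefficients)) + bias


# Made-up model: two support vectors with opposite-signed coefficients.
support_vectors = [[1.0, 0.0], [0.0, 1.0]]
coefficients = [0.5, -0.5]
f = decision_value([1.0, 0.0], support_vectors, coefficients, 0.0, poly_kernel)
print(f)
```

Note that LibSVM reports `rho`, which is the negative of the `bias` term used here. Only for a linear kernel does the sum collapse into one weight per attribute.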

2,531 Unicorn: If you want to understand what each learner does, I would recommend taking a look at "The Elements of Statistical Learning" by Hastie, Tibshirani, and Friedman. It's a very statistically oriented book, but it gives detailed insight into these methods and models.

Greetings,

Sebastian

72 Maven: Should logistic regression in RM produce weights that match, say, SAS or SPSS?

2,531 Unicorn: There will probably be differences in the implementations, and I doubt the weights will be the same. But they should either come close to the other weights or at least perform equally well.

Greetings,

Sebastian

72 Maven: It's curious, because logistic regression is used not only for prediction but also for inference, from a strictly statistical position, where the exponentiated weights are odds ratios.

I have coefficients from SAS and a small data file if anyone is interested.
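To make the inference point concrete: under ordinary maximum-likelihood logistic regression, exponentiating a coefficient gives the odds ratio for a one-unit increase in that attribute. A minimal sketch, with made-up attribute names and coefficient values (these are not the SAS coefficients mentioned above, and this reading is not confirmed to carry over to RapidMiner's weights):

```python
import math


def odds_ratios(coefficients):
    """Map each coefficient beta to its odds ratio exp(beta)."""
    return {name: math.exp(beta) for name, beta in coefficients.items()}


# Hypothetical coefficients: x1 has no effect, x2 doubles the odds per unit.
example = {"x1": 0.0, "x2": math.log(2.0)}
ratios = odds_ratios(example)
print(ratios)
```

An odds ratio of 1 means the attribute leaves the odds unchanged; values above 1 raise the odds of the positive class, values below 1 lower them.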

1,749 RM Founder: It is actually not a big surprise that those differences occur. First, in contrast to most other implementations, the logistic regression learner from RapidMiner is basically a support vector machine with a different loss function. The author of this implementation told me once that the whole optimization approach is a bit different from that known from more traditional implementations. While this makes some nifty things possible, like the integration of kernel functions, it might also lead to different results. At the least, the predictions depend heavily on parameters such as "C" and can hardly be compared directly.

The second difference seems to be the way the confidences are calculated. Due to the kernel-based optimization approach, they are derived from the predictions based on the Lagrange multipliers, the training examples, and the kernel function. To those predictions, a probability scaling somewhat similar to (but much simpler than) Platt scaling is applied. As long as you read the confidences as what they are ("confidence" rather than "probability"), this is usually fine.

Cheers,

Ingo
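For orientation, standard Platt scaling maps a raw decision value f to a confidence via 1 / (1 + exp(A*f + B)), where A and B are normally fitted on held-out decision values. The sketch below shows only that standard form with made-up `a` and `b`; the simpler scaling that RapidMiner applies is not specified in this thread, so this will not reproduce its confidences.

```python
import math


def platt_confidence(decision_value, a, b):
    """Standard Platt form: P(positive | f) = 1 / (1 + exp(a * f + b))."""
    t = a * decision_value + b
    # Numerically stable evaluation of 1 / (1 + e^t)
    if t >= 0:
        e = math.exp(-t)
        return e / (1.0 + e)
    return 1.0 / (1.0 + math.exp(t))


# Made-up parameters; a is negative so larger decision values give higher confidence.
a, b = -1.0, 0.0
for f in (-2.0, 0.0, 2.0):
    print(f, platt_confidence(f, a, b))
```

With `a = -1` and `b = 0` this reduces to the plain logistic function of the decision value; other values stretch or shift the curve, which is one way identical weights can yield different confidences.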

72MavenB

1,749 RM Founder: Don't forget to optimize at least C, since without it the kernel logistic regression is not likely to produce good results anyway...

Cheers,

Ingo

3 Contributor I: I see that this is a very old thread and that Ingo wrote a lot of interesting information, but I have to ask. Is there any way to use an RM logistic regression model in, for example, Excel (via the logistic function formula with the weights defined in RM, or something like this)? Maybe I should use some specific logistic regression operator for that purpose?

I am not very good at data science or English, but I hope that somebody who knows the answer will read this soon.

Thank you for reading!

Pavel.

2,485 RM Data Scientist: Dear Pavel,

welcome to the community!

It is technically possible to take the equation from RM and put it into Excel. But the big question is: why do you want this?

A model is always the combination of the preprocessing and the machine learning part itself. If you apply a machine learning model to a table which is not prepared in the very same way, it will run but produce wrong or unreasonable results.

Why don't you create an RM process: Read Excel -> Prepare Data -> Apply Model -> Write Excel

Best,

Martin

Dortmund, Germany
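Martin's point about preprocessing can be illustrated with a small sketch. It assumes the model was trained on z-score-normalized attributes, which is only one possible preprocessing step and is not stated anywhere in the thread; the intercept, weight, and training statistics are all hypothetical.

```python
import math


def sigmoid(z):
    # Numerically stable 1 / (1 + e^(-z))
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def z_score(value, mean, std):
    """Normalize a raw value with the statistics of the training data."""
    return (value - mean) / std


# Hypothetical model and training statistics.
intercept, weight = 0.5, 2.0
train_mean, train_std = 50.0, 10.0
raw_value = 60.0

# Same weights, different inputs: raw value vs. value prepared like the training data.
p_raw = sigmoid(intercept + weight * raw_value)
p_prepared = sigmoid(intercept + weight * z_score(raw_value, train_mean, train_std))
print(p_raw, p_prepared)
```

The weights only make sense on inputs prepared exactly as during training, so a spreadsheet formula would have to replicate every preprocessing step as well.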

3 Contributor I: Thanks a lot for the response!

Our goal is to build a tool to predict potential bad bank loans. We use some transaction data and, for simplicity, we want to apply a logistic function with the specific weights from the RM model to this data in Excel. But now it seems that we can't. Am I right? If so, what do you think we should do? Insert the RM model code into the core of the script that mines our data? Or is there a simpler way?

Thanks a lot

and best regards,

Pavel

1,761 Unicorn: IMHO, the method that Martin proposes is probably the faster and better solution.

Import your data (from Excel or a database), do your training and scoring in RapidMiner, and at the end write out the predicted results to Excel. Going through the trouble of finding the weights in RapidMiner, then plugging them into an Excel logistic regression model and crunching it there, feels very time consuming.

However, you can extract the Logistic Regression operator weights by using a Weights to Data operator and then Write Excel.

3 Contributor I: Thanks for the response and opinion!

Could you please clarify one thing for me (it was discussed earlier, but unfortunately I didn't catch it)?

Why is it that, if I use the logistic formula 1/(1+exp(-z)) in Excel with the attribute weights from the RM model, I get different probabilities on the same data? How does it work? I really can't understand it, and I would be truly thankful if someone could explain it.

Thanks a lot,

Pavel