
Community Home : Product Help : Use Cases Forum : Logistic Regression

04-07-2010 05:45 PM

I was able to spend the afternoon today playing around with different learners and model outputs, and I finally think I understand the resulting model from the W-Logistic learner (mentioned above by ratheesan). Using Haddock's example, I substituted W-Logistic as the learner and got a resulting model that I could implement outside of RapidMiner, calculating the same probabilities/confidences/predictions as RapidMiner. Yay!!!

However, without having to dig into the source code, I would still like to be able to take the resulting models from the RapidMiner (not Weka) classification learners and implement them outside of RapidMiner. This includes Logistic Regression, SVM, etc.

I run an experiment, I obtain a model with attribute weightings and an offset, and I can see the example set and the calculated confidence levels from the RapidMiner experiment, but I don't know how RapidMiner comes up with these calculations. I believe B_Miner is running into the same issue.

Is there anyone out there who can help me understand how to use these SVM or Logistic Regression models once they are created by RapidMiner? What formulas do these weightings/offsets get plugged into? Are the formulas linear, quadratic, or higher-order polynomial equations? Again, any guidance would be greatly appreciated.

Many thanks in advance,

David

04-07-2010 05:56 PM

Hey ElPato,

Yep, I am stumped by the non-Weka implementations of logistic regression too. Either (1) there is a bug, or (2) this is some flavor of LR other than the ordinary one implemented in SAS, R, SPSS, etc., i.e. the one of Hosmer/Lemeshow and Agresti.

I did not know there were issues with SVM as well. How are you setting this up (can you post the process XML), and what are you comparing the results against?

04-09-2010 03:36 PM

Hey B_Miner,

Thanks for the reply! Glad to see I am not the only one a bit confused. I know I am not an expert in data mining or machine learning algorithms, but I am trying to educate myself as much as possible. It just seems kind of important to understand exactly what the different algorithms are doing; otherwise, how can anyone possibly interpret the results?

As far as the logistic regression operators go, I ran the same set of data above with the W-SimpleLogistic operator and received the exact same results as the RapidMiner Logistic Regression operator! They must be performing the same calculations. Now ... if only someone could explain what those calculations are, I would be extremely grateful.
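For comparison, here is the textbook logistic regression calculation as implemented in SAS/R/SPSS, just as a sketch with made-up weights and attribute values; it is not necessarily what RapidMiner's operator does internally:

```python
import math

def logistic_probability(weights, offset, x):
    """Textbook logistic regression: P(y = 1 | x) = 1 / (1 + exp(-(w.x + b)))."""
    z = sum(w * xi for w, xi in zip(weights, x)) + offset
    return 1.0 / (1.0 + math.exp(-z))

# Hypothetical model: two attributes with weights 0.8 and -1.2, offset 0.5
p = logistic_probability([0.8, -1.2], 0.5, [1.0, 2.0])
# p is a probability in (0, 1); the predicted class is the one whose
# probability exceeds 0.5
```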

As far as the SVM models go, let's say I take the same example as Haddock gave above, but substitute the LibSVM RapidMiner learner. Below is the XML:

```xml
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<process version="5.0">
  <context>
    <input>
      <location/>
    </input>
    <output>
      <location/>
      <location/>
      <location/>
    </output>
    <macros/>
  </context>
  <operator activated="true" class="process" expanded="true" name="Root">
    <process expanded="true" height="758" width="882">
      <operator activated="true" class="retrieve" expanded="true" height="60" name="Retrieve" width="90" x="45" y="30">
        <parameter key="repository_entry" value="//Samples/data/Sonar"/>
      </operator>
      <operator activated="true" class="support_vector_machine_libsvm" expanded="true" height="76" name="SVM" width="90" x="238" y="29">
        <parameter key="kernel_type" value="poly"/>
        <list key="class_weights"/>
      </operator>
      <operator activated="true" class="retrieve" expanded="true" height="60" name="Retrieve (2)" width="90" x="179" y="210">
        <parameter key="repository_entry" value="//Samples/data/Sonar"/>
      </operator>
      <operator activated="true" class="apply_model" expanded="true" height="76" name="Apply Model" width="90" x="447" y="210">
        <list key="application_parameters"/>
      </operator>
      <connect from_op="Retrieve" from_port="output" to_op="SVM" to_port="training set"/>
      <connect from_op="SVM" from_port="model" to_op="Apply Model" to_port="model"/>
      <connect from_op="Retrieve (2)" from_port="output" to_op="Apply Model" to_port="unlabelled data"/>
      <connect from_op="Apply Model" from_port="labelled data" to_port="result 2"/>
      <connect from_op="Apply Model" from_port="model" to_port="result 1"/>
      <portSpacing port="source_input 1" spacing="0"/>
      <portSpacing port="sink_result 1" spacing="0"/>
      <portSpacing port="sink_result 2" spacing="0"/>
      <portSpacing port="sink_result 3" spacing="0"/>
    </process>
  </operator>
</process>
```

I get the following results:

```text
Kernel Model

Total number of Support Vectors: 159
Bias (offset): -1.191

w[attribute_1]  = 23749.738
w[attribute_2]  = 31592.323
w[attribute_3]  = 35680.074
w[attribute_4]  = 46113.371
w[attribute_5]  = 58430.884
w[attribute_6]  = 74797.426
w[attribute_7]  = 86353.872
w[attribute_8]  = 95989.628
w[attribute_9]  = 129648.901
w[attribute_10] = 152098.800
w[attribute_11] = 179324.874
w[attribute_12] = 191024.717
w[attribute_13] = 200005.157
w[attribute_14] = 207625.943
...
...
w[attribute_58] = 6238.179
w[attribute_59] = 6269.692
w[attribute_60] = 4968.341

number of classes: 2
number of support vectors for class Rock: 78
number of support vectors for class Mine: 81
```

Using the polynomial kernel, how am I supposed to apply the weightings to the attributes? What about some of the other kernels, like rbf or sigmoid? I understand the concepts and math surrounding SVMs and the separating hyperplane, but I have no idea how to apply these weightings or derive confidence/prediction values. Any assistance, again, would be greatly appreciated (even if it involves pointing me elsewhere on the web for education).

Thanks,

David

04-12-2010 05:06 AM

Hi David,

If you want to understand what each learner does, I would recommend taking a look at "The Elements of Statistical Learning" by Hastie, Tibshirani, and Friedman. It is a very statistically oriented book, but it gives detailed insight into these methods and models.

Greetings,

Sebastian


04-15-2010 04:09 AM

Hi,

There will probably be differences between the implementations, and I doubt the weights will be identical. But they should either come close to the other weights or at least perform equally well.

Greetings,

Sebastian


04-16-2010 06:17 PM

It's curious: the weights are not close for RM or WEKA logistic regression (RM was set to the dot kernel, WEKA is the Simple Logistic) compared to SAS. They are not close to each other at all, either. The prediction probabilities for WEKA are close to SAS; RM's are far different.

It's curious because logistic regression is used not only for prediction but also for inference, from a strictly statistical position, where the exponentiated weights are odds ratios.

I have the coefficients from SAS and a small data file if anyone is interested.
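Just to make the inference point concrete, here is a one-line sketch with a made-up coefficient showing how an exponentiated logistic regression weight reads as an odds ratio:

```python
import math

beta = 0.693  # hypothetical fitted weight on one predictor

odds_ratio = math.exp(beta)  # ~2.0: a one-unit increase in the predictor
                             # roughly doubles the odds of the positive class
```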


04-19-2010 12:00 PM

Hello,

It is actually not a big surprise that those differences occur. First, in contrast to most other implementations, the logistic regression learner in RapidMiner is basically a support vector machine with a different loss function. The author of this implementation once told me that the whole optimization approach is a bit different from the one known from more traditional implementations. While this makes some nifty things possible, like the integration of kernel functions, it can also lead to different results. At the very least, the predictions depend a lot on parameters such as "C" and can hardly be compared directly.

The second difference seems to be the way the confidences are calculated. Due to the kernel-based optimization approach, they are derived from the predictions, which are based on the Lagrange multipliers, the training examples, and the kernel function. A probability scaling somewhat similar to (but much simpler than) Platt scaling is then applied to those predictions. As long as you read the confidences as what they are ("confidence" rather than "probability"), this is usually fine.
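In code, the general shape would be something like the following sketch. The support vectors, multipliers, and the constants in the scaling step are all hypothetical; they are meant to illustrate the structure, not the parameters RapidMiner actually fits:

```python
import math

def poly_kernel(x, z, degree=3, gamma=1.0, coef0=0.0):
    """Polynomial kernel: K(x, z) = (gamma * <x, z> + coef0) ** degree."""
    dot = sum(a * b for a, b in zip(x, z))
    return (gamma * dot + coef0) ** degree

def decision_value(support_vectors, alphas_times_labels, bias, x):
    """Kernel SVM prediction: f(x) = sum_i alpha_i * y_i * K(sv_i, x) + b."""
    return sum(ay * poly_kernel(sv, x)
               for sv, ay in zip(support_vectors, alphas_times_labels)) + bias

def sigmoid_confidence(f, a=-1.0, b=0.0):
    """Platt-style squashing of the raw decision value into (0, 1)."""
    return 1.0 / (1.0 + math.exp(a * f + b))

# Toy model: two support vectors in 2-D, made-up alpha_i * y_i values
svs = [[1.0, 0.0], [0.0, 1.0]]
ay = [0.7, -0.4]
f = decision_value(svs, ay, bias=-0.1, x=[0.5, 0.5])
conf = sigmoid_confidence(f)  # read as a confidence, not a probability
```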

Cheers,

Ingo

How to load processes in XML from the forum into RapidMiner: Read this!


04-20-2010 03:57 AM

Yes, please keep me updated if you get the chance. I could imagine that the real strength of the kernel logistic regression lies in cases where the classification task is non-linear and an appropriate kernel function is used. The traditional logistic regression, on the other hand, might outperform in the linear case and is definitely better suited if real probabilities are necessary. But maybe I am completely wrong.

Don't forget to optimize at least C, since without that the kernel logistic regression is not likely to produce good results anyway...
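For what it's worth, the search itself can be as simple as a loop over log-spaced candidates. A sketch: `train_and_evaluate` stands in for whatever cross-validated run you set up (in RapidMiner or elsewhere), and the toy score function below is made up purely so the example runs:

```python
def grid_search_C(train_and_evaluate, candidates=(0.01, 0.1, 1, 10, 100)):
    """Try each candidate C and keep the one with the best evaluation score."""
    best_c, best_score = None, float("-inf")
    for c in candidates:
        score = train_and_evaluate(c)  # e.g. cross-validated accuracy
        if score > best_score:
            best_c, best_score = c, score
    return best_c, best_score

# Toy stand-in for a real evaluation: pretend performance peaks at C = 10
best_c, best_score = grid_search_C(lambda c: -abs(c - 10))
```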

Cheers,

Ingo

How to load processes in XML from the forum into RapidMiner: Read this!
