"Probability Outputs of Logistic Regression (kernelized)"

confuzio Member Posts: 4 Contributor I
edited May 2019 in Help
Hello,

Using Kernel Logistic Regression in RapidMiner 5.1, I have not yet figured out how to get probability predictions (e.g. "the estimated probability that this customer will default is 0.11", rather than simply "this customer will default"). When using "Apply Model", all I get is 0/1 predictions (does RM internally use the threshold p_hat = 0.5?).
As I use the radial kernel, I cannot simply plug the estimated coefficients into p_hat = exp(Xb)/(1+exp(Xb)).
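
For reference, a kernelized model typically replaces the linear score Xb with a kernel expansion over the training points. The sketch below assumes the standard textbook form of kernel logistic regression with a radial kernel. The symbols alpha_i (per-example weights), b (bias), and the exact role of gamma are assumptions for illustration and are not taken from RapidMiner's documentation:

```latex
\hat{p}(x) = \frac{\exp\bigl(f(x)\bigr)}{1 + \exp\bigl(f(x)\bigr)}, \qquad
f(x) = \sum_{i=1}^{n} \alpha_i \, K(x_i, x) + b, \qquad
K(x_i, x) = \exp\!\left(-\gamma \, \lVert x_i - x \rVert^{2}\right)
```

Under this assumption the probability still comes from the logistic function, but the score depends on all weighted training points rather than on a single coefficient vector.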

Can anyone tell me how to get probability predictions? I'd be really grateful!

Here is my process (as I usually work with R, I'm not really used to RapidMiner yet):

<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<process version="5.1.001">
  <context>
    <input/>
    <output/>
    <macros/>
  </context>
  <operator activated="true" class="process" compatibility="5.1.001" expanded="true" name="Process">
    <process expanded="true" height="404" width="592">
      <operator activated="true" class="retrieve" compatibility="5.1.001" expanded="true" height="60" name="Retrieve" width="90" x="45" y="75">
        <parameter key="repository_entry" value="Data"/>
      </operator>
      <operator activated="true" class="set_role" compatibility="5.1.001" expanded="true" height="76" name="Set Role" width="90" x="179" y="75">
        <parameter key="name" value="default"/>
        <parameter key="target_role" value="label"/>
        <list key="set_additional_roles"/>
      </operator>
      <operator activated="true" class="optimize_parameters_grid" compatibility="5.1.001" expanded="true" height="94" name="Optimize Parameters (Grid)" width="90" x="313" y="75">
        <list key="parameters">
          <parameter key="Logistic Regression (2).kernel_gamma" value="[0.0001;2000;5;logarithmic]"/>
          <parameter key="Logistic Regression (2).C" value="[0.00000001;10;5;logarithmic]"/>
        </list>
        <process expanded="true" height="381" width="592">
          <operator activated="true" class="x_validation" compatibility="5.1.001" expanded="true" height="112" name="Validation (2)" width="90" x="179" y="30">
            <parameter key="number_of_validations" value="3"/>
            <parameter key="use_local_random_seed" value="true"/>
            <parameter key="local_random_seed" value="1994"/>
            <process expanded="true" height="399" width="280">
              <operator activated="true" class="logistic_regression" compatibility="5.1.001" expanded="true" height="94" name="Logistic Regression (2)" width="90" x="95" y="30">
                <parameter key="kernel_type" value="radial"/>
                <parameter key="kernel_gamma" value="2000.0"/>
                <parameter key="C" value="10.0"/>
              </operator>
              <connect from_port="training" to_op="Logistic Regression (2)" to_port="training set"/>
              <connect from_op="Logistic Regression (2)" from_port="model" to_port="model"/>
              <portSpacing port="source_training" spacing="0"/>
              <portSpacing port="sink_model" spacing="0"/>
              <portSpacing port="sink_through 1" spacing="0"/>
            </process>
            <process expanded="true" height="399" width="280">
              <operator activated="true" class="apply_model" compatibility="5.1.001" expanded="true" height="76" name="Apply Model (2)" width="90" x="45" y="30">
                <list key="application_parameters"/>
              </operator>
              <operator activated="true" class="performance_binominal_classification" compatibility="5.1.001" expanded="true" height="76" name="Performance (2)" width="90" x="162" y="30">
                <parameter key="main_criterion" value="AUC"/>
                <parameter key="accuracy" value="false"/>
                <parameter key="AUC" value="true"/>
              </operator>
              <connect from_port="model" to_op="Apply Model (2)" to_port="model"/>
              <connect from_port="test set" to_op="Apply Model (2)" to_port="unlabelled data"/>
              <connect from_op="Apply Model (2)" from_port="labelled data" to_op="Performance (2)" to_port="labelled data"/>
              <connect from_op="Performance (2)" from_port="performance" to_port="averagable 1"/>
              <portSpacing port="source_model" spacing="0"/>
              <portSpacing port="source_test set" spacing="0"/>
              <portSpacing port="source_through 1" spacing="0"/>
              <portSpacing port="sink_averagable 1" spacing="0"/>
              <portSpacing port="sink_averagable 2" spacing="0"/>
            </process>
          </operator>
          <operator activated="true" class="log" compatibility="5.1.001" expanded="true" height="76" name="Log" width="90" x="361" y="31">
            <list key="log">
              <parameter key="gamma" value="operator.Logistic Regression (2).parameter.kernel_gamma"/>
              <parameter key="C" value="operator.Logistic Regression (2).parameter.C"/>
            </list>
          </operator>
          <connect from_port="input 1" to_op="Validation (2)" to_port="training"/>
          <connect from_op="Validation (2)" from_port="averagable 1" to_op="Log" to_port="through 1"/>
          <connect from_op="Log" from_port="through 1" to_port="performance"/>
          <portSpacing port="source_input 1" spacing="0"/>
          <portSpacing port="source_input 2" spacing="0"/>
          <portSpacing port="sink_performance" spacing="0"/>
          <portSpacing port="sink_result 1" spacing="0"/>
        </process>
      </operator>
      <operator activated="true" class="store" compatibility="5.1.001" expanded="true" height="60" name="Store" width="90" x="454" y="112">
        <parameter key="repository_entry" value="valid.param"/>
      </operator>
      <connect from_op="Retrieve" from_port="output" to_op="Set Role" to_port="example set input"/>
      <connect from_op="Set Role" from_port="example set output" to_op="Optimize Parameters (Grid)" to_port="input 1"/>
      <connect from_op="Optimize Parameters (Grid)" from_port="performance" to_port="result 1"/>
      <connect from_op="Optimize Parameters (Grid)" from_port="parameter" to_op="Store" to_port="input"/>
      <connect from_op="Store" from_port="through" to_port="result 2"/>
      <portSpacing port="source_input 1" spacing="0"/>
      <portSpacing port="sink_result 1" spacing="0"/>
      <portSpacing port="sink_result 2" spacing="0"/>
      <portSpacing port="sink_result 3" spacing="0"/>
    </process>
  </operator>
</process>
Answers

  • steffen Member Posts: 347 Maven
    Hello confuzio

    Here is my dummy process:

    <?xml version="1.0" encoding="UTF-8" standalone="no"?>
    <process version="5.1.001">
     <context>
       <input/>
       <output/>
       <macros/>
     </context>
     <operator activated="true" class="process" compatibility="5.1.001" expanded="true" name="Process">
       <process expanded="true" height="449" width="882">
         <operator activated="true" class="retrieve" compatibility="5.1.001" expanded="true" height="60" name="Retrieve" width="90" x="45" y="210">
           <parameter key="repository_entry" value="//Samples/data/Golf"/>
         </operator>
         <operator activated="true" class="select_attributes" compatibility="5.1.001" expanded="true" height="76" name="Select_Numerical_Predictors" width="90" x="179" y="210">
           <parameter key="attribute_filter_type" value="value_type"/>
           <parameter key="value_type" value="numeric"/>
         </operator>
         <operator activated="true" breakpoints="after" class="multiply" compatibility="5.1.001" expanded="true" height="94" name="Multiply" width="90" x="313" y="210"/>
         <operator activated="true" class="logistic_regression" compatibility="5.1.001" expanded="true" height="94" name="Logistic Regression" width="90" x="514" y="210">
           <parameter key="kernel_type" value="radial"/>
         </operator>
         <operator activated="true" class="apply_model" compatibility="5.1.001" expanded="true" height="76" name="Apply Model" width="90" x="715" y="210">
           <list key="application_parameters"/>
         </operator>
         <connect from_op="Retrieve" from_port="output" to_op="Select_Numerical_Predictors" to_port="example set input"/>
         <connect from_op="Select_Numerical_Predictors" from_port="example set output" to_op="Multiply" to_port="input"/>
         <connect from_op="Multiply" from_port="output 1" to_op="Logistic Regression" to_port="training set"/>
         <connect from_op="Multiply" from_port="output 2" to_op="Apply Model" to_port="unlabelled data"/>
         <connect from_op="Logistic Regression" from_port="model" to_op="Apply Model" to_port="model"/>
         <connect from_op="Apply Model" from_port="labelled data" to_port="result 1"/>
         <portSpacing port="source_input 1" spacing="0"/>
         <portSpacing port="sink_result 1" spacing="0"/>
         <portSpacing port="sink_result 2" spacing="0"/>
       </process>
     </operator>
    </process>
    which creates confidences aka a rough prob output. What does your process look like?

    In general, it is recommended to post the process you used whenever possible. Make it easy for us to help you ;)

    Hope this was helpful,

    steffen
  • confuzio Member Posts: 4 Contributor I
    Ok, thanks for your process. I figured out how to save the confidences.
    Just one more question: "which creates confidences aka a rough prob output" -- do you simply mean the probabilities estimated by the logistic regression? I'm confused about the "rough".
  • steffen Member Posts: 347 Maven
    The answer is: yes, I mean the estimated probabilities.

    Sorry for the confusion

    Greetings,

    steffen
  • confuzio Member Posts: 4 Contributor I
    Thanks a lot for your help!
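
The relationship discussed in this thread, between a confidence (estimated probability) and the 0/1 prediction, can be sketched in plain Python. This is only an illustration and not RapidMiner code: the function names, the toy weights, and the 0.5 cut-off are all assumptions, and RapidMiner's internal model may be parametrized differently.

```python
import math


def rbf_kernel(x, z, gamma):
    """Radial kernel exp(-gamma * ||x - z||^2) for two equal-length tuples."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-gamma * sq_dist)


def confidence(x, train_points, alphas, bias, gamma):
    """Estimated probability of the positive class for example x.

    Assumes the textbook kernel expansion: a weighted sum of kernel
    values over the training points, passed through the logistic function.
    """
    score = bias + sum(
        a * rbf_kernel(p, x, gamma) for a, p in zip(alphas, train_points)
    )
    return 1.0 / (1.0 + math.exp(-score))


def predict(conf, threshold=0.5):
    """Turn a confidence into a 0/1 prediction (0.5 cut-off is an assumption)."""
    return 1 if conf >= threshold else 0


# Hypothetical toy model: two training points with opposite weights.
train_points = [(0.0, 0.0), (1.0, 1.0)]
alphas = [-2.0, 2.0]
bias = 0.0
gamma = 1.0

p = confidence((1.0, 1.0), train_points, alphas, bias, gamma)
label = predict(p)
```

Here `p` plays the role of the confidence column and `label` the role of the prediction column. A confidence such as 0.11 would map to a prediction of 0 under the assumed cut-off, which is why looking only at the prediction hides the probability.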