Find Threshold returns NaN as the threshold

HIshaq, Member, Posts: 1, Contributor I
edited November 2018 in Help
Hello Folks,

I am trying to use the "Find Threshold" operator to find a threshold for some dummy data I made about high school dropouts. I import the data using the wizard and have assigned the "label", "prediction" and "confidence" roles by selecting them from the drop-down menus; the roles are applied through the "Set Role" operators. My rule is this: if the "label" says "no" and the confidence is at least 0.5, I set my "prediction" to "no", i.e. if a person has not dropped out so far, with a confidence of >= 0.5, it is predicted that they will not drop out in the coming year. Here is the XML code:


<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<process version="5.3.008">
  <context>
    <input/>
    <output/>
    <macros/>
  </context>
  <operator activated="true" class="process" compatibility="5.3.008" expanded="true" name="Root">
    <process expanded="true">
      <operator activated="true" class="retrieve" compatibility="5.3.008" expanded="true" height="60" name="Retrieve test8" width="90" x="45" y="120">
        <parameter key="repository_entry" value="//Local Repository/test8"/>
      </operator>
      <operator activated="true" class="set_role" compatibility="5.3.008" expanded="true" height="76" name="Set Role" width="90" x="179" y="120">
        <parameter key="attribute_name" value="Dropped out"/>
        <parameter key="target_role" value="label"/>
        <list key="set_additional_roles"/>
      </operator>
      <operator activated="true" class="set_role" compatibility="5.3.008" expanded="true" height="76" name="Set Role (2)" width="90" x="296" y="120">
        <parameter key="attribute_name" value="Prediction"/>
        <parameter key="target_role" value="prediction"/>
        <list key="set_additional_roles"/>
      </operator>
      <operator activated="true" class="set_role" compatibility="5.3.008" expanded="true" height="76" name="Set Role (3)" width="90" x="438" y="120">
        <parameter key="attribute_name" value="Confidence"/>
        <parameter key="target_role" value="confidence"/>
        <list key="set_additional_roles"/>
      </operator>
      <operator activated="true" class="find_threshold" compatibility="5.3.008" expanded="true" height="76" name="Find Threshold" width="90" x="581" y="120">
        <parameter key="show_roc_plot" value="true"/>
      </operator>
      <connect from_op="Retrieve test8" from_port="output" to_op="Set Role" to_port="example set input"/>
      <connect from_op="Set Role" from_port="example set output" to_op="Set Role (2)" to_port="example set input"/>
      <connect from_op="Set Role (2)" from_port="example set output" to_op="Set Role (3)" to_port="example set input"/>
      <connect from_op="Set Role (3)" from_port="example set output" to_op="Find Threshold" to_port="example set"/>
      <portSpacing port="source_input 1" spacing="0"/>
      <portSpacing port="sink_result 1" spacing="0"/>
    </process>
  </operator>
</process>


What I was expecting was that, as in the tutorial, the "Find Threshold" operator would give me a new, better threshold based on the data. Instead, the threshold value I get is NaN. The ROC curve is a vertical line along the y-axis followed by a horizontal line along the top. I guess that means FP/N must be zero? Why? How do I fix this?

Following is the relevant part of the data I am working with:

Dropped out   Confidence   Confidence(negative)   Prediction
n             0.75         0.25                   n
n             0.82         0.18                   n
n             0.43         0.57                   y
y             0.1          0.9                    y
n             0.7          0.3                    n
n             0.85         0.15                   n
n             0.6          0.4                    n
n             0.89         0.11                   n
n             0.46         0.54                   y
n             0.39         0.61                   y
n             0.7          0.3                    n
n             0.4          0.6                    y
n             0.9          0.1                    n
n             0.81         0.19                   n
y             0.69         0.31                   y
n             0.55         0.45                   n
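
As a sanity check on the rows above, here is a minimal sketch in plain Python (not RapidMiner, and not necessarily how "Find Threshold" computes its curve) that works out the ROC points by hand. It assumes that the "Confidence" column is the confidence for class "n" and that "n" is treated as the positive class; the names rows, roc_points and auc are made up for this illustration.

```python
# Rows from the table above: (label, confidence).
# ASSUMPTION: "Confidence" is the confidence for class "n".
rows = [
    ("n", 0.75), ("n", 0.82), ("n", 0.43), ("y", 0.10),
    ("n", 0.70), ("n", 0.85), ("n", 0.60), ("n", 0.89),
    ("n", 0.46), ("n", 0.39), ("n", 0.70), ("n", 0.40),
    ("n", 0.90), ("n", 0.81), ("y", 0.69), ("n", 0.55),
]


def roc_points(rows, positive="n"):
    """Return (threshold, FP/N, TP/P) triples.

    An example is predicted `positive` when its score >= threshold.
    ASSUMPTION: "n" is the positive class.
    """
    n_pos = sum(1 for label, _ in rows if label == positive)
    n_neg = len(rows) - n_pos
    points = []
    # Use every distinct score as a candidate threshold, highest first.
    for t in sorted({score for _, score in rows}, reverse=True):
        tp = sum(1 for label, s in rows if s >= t and label == positive)
        fp = sum(1 for label, s in rows if s >= t and label != positive)
        points.append((t, fp / n_neg, tp / n_pos))
    return points


def auc(rows, positive="n"):
    """Fraction of (positive, negative) pairs ranked correctly; ties count half."""
    pos = [s for label, s in rows if label == positive]
    neg = [s for label, s in rows if label != positive]
    wins = sum((p > q) + 0.5 * (p == q) for p in pos for q in neg)
    return wins / (len(pos) * len(neg))
```

Under these assumptions, roc_points(rows) gives FP/N = 0 only down to a threshold of 0.7; at 0.69 one of the two "y" rows is counted as a false positive, so FP/N becomes 0.5 while TP/P is 8/14. auc(rows) comes out at 22/28 (about 0.79). So these rows on their own would not be expected to produce a curve that runs straight up the y-axis and then along the top.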

I hope I have followed all the required steps in making this post. And I thank you in advance for your help.

Kind regards,
HIshaq

Answers

  • MariusHelf, RapidMiner Certified Expert, Member, Posts: 1,869, Unicorn
    Hishaq, there does indeed seem to be a problem with the Find Threshold operator. I have created an internal ticket so that the development team can have a look at it.

    Best regards,
    Marius