Creating and Applying Thresholds

M_Martin RapidMiner Certified Analyst, Member Posts: 125 Unicorn
edited December 2018 in Help

Colleagues:  

 

I've used the Preventive Maintenance Machine Failure data set that comes with RapidMiner to experiment with creating various classification models.  I saved a model I developed to predict machine failure using the "Write Model" operator. This model was the output of a fair amount of optimization and feature selection experimentation using "Optimize Parameters", "Cross Validation", and other feature-selection-related operators.

 

I'd like to use the "Read Model" operator to load the model I developed, load new data that the model hasn't seen, and generate predictions with that model. I then want to set various thresholds using the "Create Threshold" and "Apply Threshold" operators (which work on the confidence attributes added by applying the model) to see the effect on prediction outcomes.
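
As an illustration only (this is not RapidMiner's implementation), the effect of a confidence threshold can be sketched in Python. The function name, the "Fail" / "No Fail" labels, the example confidences, and the rule "predict the positive class when its confidence is strictly greater than the threshold" are all assumptions made for the sketch.

```python
def apply_threshold(confidences, threshold, positive="Fail", negative="No Fail"):
    """Map each positive-class confidence to a predicted label.

    Assumption: a row is labelled `positive` when its confidence is
    strictly greater than `threshold`; otherwise it is labelled `negative`.
    """
    return [positive if c > threshold else negative for c in confidences]


# Hypothetical confidence(Fail) values for five unseen machines.
confidence_fail = [0.10, 0.35, 0.50, 0.65, 0.90]

# Same confidences, two different thresholds.
default_predictions = apply_threshold(confidence_fail, 0.5)
stricter_predictions = apply_threshold(confidence_fail, 0.7)

print(default_predictions)
print(stricter_predictions)
```

Note that the sketch needs only the confidence values, not a label column, so in principle it can be applied to unlabeled new data.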

 

The file "Create_and_Apply_Threshold_Example_No_Error.png" shows a very simple process (based on the tutorial example) in which I can set and apply thresholds, but only by using a very generic setup with a learner (k-NN in this case) and no cross validation.  All attributes are recognized (26 in the data and 3 added by the model) for a total of 29.

 

The file "Create_and_Apply_Threshold_Example_Error.png" is another process in which I load the aforementioned saved model using "Read Model" and apply it to new data. As the error message shows, however, the output of "Apply Model" shows only 26 attributes, and the "Apply Threshold" operator returns the error message shown in "Create_and_Apply_Threshold_Example_No_Error_Nr_2.jpg".   For some reason, the attributes (the Fail / No Fail predictions and confidences) added by applying the model to new data are not recognized by the "Apply Threshold" operator.

 

I went back to the original process in which I created the model and tried to apply thresholds to new data there, but I still get the same error messages.  Once again, the attributes (the Fail / No Fail predictions and confidences) added by applying the model to new data within my original process are not recognized by the "Apply Threshold" operator.

 

The only way I can get the "Apply Threshold" operator to work is within the simplest of processes, as mentioned above.

 

I imagine I am missing a very obvious point, as it appears that setting and applying thresholds is dead simple to do.  To ensure that all metadata would be available at run time, I stored the test data, the new data, and the predictive model in my local repository before trying to build a process that included setting and applying thresholds using these objects.

 

Thanks for any suggestions and best wishes, Michael Martin

Best Answer

  • M_Martin RapidMiner Certified Analyst, Member Posts: 125 Unicorn
    Solution Accepted

    Colleagues:

     

    A comment from contributor xitign, in a post from today (6 August) named "Unable to select attribute subset using select attributes", gave me an idea to try in a process in which I attempt to set and apply prediction confidence thresholds.

     

    In this process, I read a classification model from disk and apply that model to data the model hasn't seen before.  After applying the model to the new data, I want to set a probability threshold for prediction confidence other than the default confidence.

     

    The solution: I simply opened the process, went to the "Process" menu, and clicked on "Validate Process" as described by xitign.

     

    The warning message within the "Apply Threshold" operator remains, but I can now experiment with a variety of confidence thresholds and note the effect that changing the confidence threshold has on model predictions.

     

    RapidMiner no longer aborts the running of the process due to the lack of a label (predictable attribute) in the new data.

     

    This is one tip I won't forget.  Thanks xitign!   ;-)

     

    Michael

Answers

  • MartinLiebig Administrator, Moderator, Employee, RapidMiner Certified Analyst, RapidMiner Certified Expert, University Professor Posts: 3,503 RM Data Scientist

    Hi Michael,

    This looks pretty strange. I use thresholds weekly and have never encountered such a problem. Any chance you can share the data/model privately?

     

    Best,

    Martin

    - Sr. Director Data Solutions, Altair RapidMiner -
    Dortmund, Germany
  • M_Martin RapidMiner Certified Analyst, Member Posts: 125 Unicorn

    Hello Martin:

    Thanks for your message!

     

    I've tried posting the data in Excel format, the .rmp file, and the model file to this thread in the forum, but I keep getting a message stating that the file contents don't match the expected file type.  I've even tried zipping everything up in a .rar archive, with no luck.

     

    Can I email everything to you privately?  If so, please tell me where I should send everything to.  My email is michael@informationarts.ca

     

     

    Best wishes,

     

    Michael

  • M_Martin RapidMiner Certified Analyst, Member Posts: 125 Unicorn

    Hello Martin:

     

    I put the .mod file (which was saved in .xml format) inside a Word document (attached), and that solved the issue I mentioned above.

     

    You would need to copy the contents of the Word document, paste them into an empty text document, and save it with a .mod extension under the file name:

     

    mdl_Machine_Failiure_Predictions.mod

     

    Also attached are the data (Excel format) the model generates predictions against and the .rmp process file with the "Create Threshold" and "Apply Threshold" operators.

     

    What I want to do is experiment with different thresholds and see how these changes impact the number of 'yes' (i.e. failure) predictions.
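
    A rough Python sketch of this kind of experiment, outside RapidMiner, might look like the following. The confidence values, the threshold grid, the function names, and the strict "greater than" rule are hypothetical and chosen only for illustration.

```python
def count_yes(confidences_yes, threshold):
    """Count rows whose confidence for 'yes' is strictly above the threshold.

    Assumption: a row is predicted 'yes' (failure) only when its
    confidence exceeds the threshold.
    """
    return sum(1 for c in confidences_yes if c > threshold)


def sweep_thresholds(confidences_yes, thresholds):
    """Return a dict mapping each threshold to its number of 'yes' predictions."""
    return {t: count_yes(confidences_yes, t) for t in thresholds}


# Hypothetical confidence(yes) values produced by scoring six new rows.
confidence_yes = [0.05, 0.20, 0.45, 0.55, 0.70, 0.95]

counts = sweep_thresholds(confidence_yes, [0.3, 0.5, 0.7, 0.9])
for threshold, n_yes in counts.items():
    print(f"threshold={threshold:.1f} -> {n_yes} 'yes' predictions")
```

    With these made-up numbers, raising the threshold can only keep or reduce the number of 'yes' predictions, which is the trade-off the experiment is meant to expose.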

     

    Thanks for considering all of this when you get a chance.

     

    Best wishes,

     

    Michael

  • M_Martin RapidMiner Certified Analyst, Member Posts: 125 Unicorn

    Hello Martin:

     

    I see that Excel files cannot be posted, so attached is the same data in .csv format.  As the process expects Excel, you would need to either modify the process to read the .csv file or put the .csv file contents into an Excel file.  I am also attaching the two other files (attached to my last post) to this post.

     

    Best wishes, Michael

  • MartinLiebig Administrator, Moderator, Employee, RapidMiner Certified Analyst, RapidMiner Certified Expert, University Professor Posts: 3,503 RM Data Scientist

    Hi,

    I am not able to get this running. Can you send it to me: mschmitz at rapidminer.com

     

    Cheers,

    Martin
