Out of spec data

Process_Intern Member Posts: 13 Contributor I
Hi all, 

I am a newbie with RM. I was wondering whether there is a way in RapidMiner to check my data. I am more advanced in Python and Excel, and much as I would in Excel, I would like RM to check a certain column of my table. If that column contains a value outside a specification that I give to my process, I would like RapidMiner to alert me, for example by raising a flag telling me that a value is out of spec.

Thanks :smile:

Answers

  • MartinLiebig Administrator, Moderator, Employee, RapidMiner Certified Analyst, RapidMiner Certified Expert, University Professor Posts: 3,517 RM Data Scientist
    Hi,
    What do you mean by "out of specification"? A simple range? Or something like: the attribute should be real, but is nominal?

    For outliers, I would look at, for example, the Tukey test. For conformance, I would check "Check Model Conformance".

    Cheers,
    Martin
    - Sr. Director Data Solutions, Altair RapidMiner -
    Dortmund, Germany
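
For readers coming from Python, as the original poster is, the Tukey test mentioned above can be sketched with the standard library alone. This is a rough illustration, not RapidMiner's implementation: the function name `tukey_outliers`, the sample values, the conventional multiplier k = 1.5, and the choice of the "inclusive" quartile method are all assumptions. Note that the fences are derived from the data itself, so this finds unusual values rather than values outside a fixed specification.

```python
import statistics


def tukey_outliers(values, k=1.5):
    """Return the values lying outside Tukey's fences.

    Fences are Q1 - k*IQR and Q3 + k*IQR. The quartile method
    ("inclusive") is an assumption; other methods shift the fences slightly.
    """
    q1, _median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lower_fence = q1 - k * iqr
    upper_fence = q3 + k * iqr
    return [v for v in values if v < lower_fence or v > upper_fence]


# Hypothetical measurements, invented purely for illustration.
measurements = [210.0, 215.0, 220.0, 225.0, 230.0, 235.0, 240.0, 400.0]
outliers = tukey_outliers(measurements)
print(outliers)
```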
  • Process_Intern Member Posts: 13 Contributor I

    In my case, out of spec means a number outside a simple range. As an example, I want RM to notify me when my att1 value is under 200. Note that att1 is of type Real. I tried to generate a macro with an "If" formula, but it didn't work: the macro reported that it didn't know what att1 was, even though I had clearly defined it previously.
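
The check described here (flag every row whose att1 is under 200) can be sketched in plain Python, which may help pin down the logic before rebuilding it in RapidMiner. Only the column name att1 and the limit 200 come from the post; the function name `flag_out_of_spec`, the `out_of_spec` flag column, and the sample rows are hypothetical. The key point is that the check runs per row and adds a new column, rather than producing one value for the whole table.

```python
def flag_out_of_spec(rows, column, lower=None, upper=None):
    """Return copies of the rows with an added boolean 'out_of_spec' field.

    A value is out of spec if it is below `lower` or above `upper`;
    either limit may be omitted by leaving it as None.
    """
    flagged = []
    for row in rows:
        value = row[column]
        too_low = lower is not None and value < lower
        too_high = upper is not None and value > upper
        flagged.append({**row, "out_of_spec": too_low or too_high})
    return flagged


# Hypothetical sample table; only 'att1' and the limit 200 are from the thread.
table = [
    {"id": 1, "att1": 250.0},
    {"id": 2, "att1": 199.5},
    {"id": 3, "att1": 200.0},
]
result = flag_out_of_spec(table, "att1", lower=200.0)
for row in result:
    print(row)
```

A value of exactly 200 is treated as in spec here, because the post asks for a flag only when att1 is under 200.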
  • Process_Intern Member Posts: 13 Contributor I
    Thank you. 

    I've figured out what I will do, thank you!
  • MartinLiebig Administrator, Moderator, Employee, RapidMiner Certified Analyst, RapidMiner Certified Expert, University Professor Posts: 3,517 RM Data Scientist
    One neat trick I've learned is that you can use Normalize on the training set, apply the resulting normalization to the application set, and check that the values fall within [0,1]. This way you can easily check all attributes against [0,1] instead of against arbitrary ranges.

    ~Martin

    - Sr. Director Data Solutions, Altair RapidMiner -
    Dortmund, Germany
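
The normalization trick above can be sketched in plain Python as follows. It assumes that the normalization is a min-max (range) scaling to [0,1] learned from the training set; the post does not say which normalization method is meant. The helper names `fit_min_max`, `apply_min_max`, and `find_violations`, the second attribute att2, and all data values are hypothetical.

```python
def fit_min_max(rows):
    """Learn the (min, max) of every column from the training rows."""
    columns = rows[0].keys()
    return {c: (min(r[c] for r in rows), max(r[c] for r in rows)) for c in columns}


def apply_min_max(rows, ranges):
    """Scale rows using the training ranges, so unseen values may leave [0, 1]."""
    scaled = []
    for row in rows:
        scaled.append({
            # A constant training column cannot be scaled; it is mapped to 0.0
            # here, which means this sketch cannot detect violations in it.
            c: (row[c] - lo) / (hi - lo) if hi > lo else 0.0
            for c, (lo, hi) in ranges.items()
        })
    return scaled


def find_violations(scaled_rows):
    """List (row index, column) pairs whose scaled value lies outside [0, 1]."""
    return [
        (i, c)
        for i, row in enumerate(scaled_rows)
        for c, v in row.items()
        if not 0.0 <= v <= 1.0
    ]


# Hypothetical training and application sets.
training = [
    {"att1": 200.0, "att2": 1.0},
    {"att1": 300.0, "att2": 5.0},
    {"att1": 250.0, "att2": 3.0},
]
application = [
    {"att1": 220.0, "att2": 2.0},
    {"att1": 180.0, "att2": 4.0},
    {"att1": 260.0, "att2": 6.0},
]

ranges = fit_min_max(training)
violations = find_violations(apply_min_max(application, ranges))
print(violations)
```

With this approach the allowed range of each attribute is whatever the training set happened to cover, so it only works as a specification check if the training data spans exactly the acceptable values.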