R-Extension: "R Script" operator changes ExampleSet values

jbg Member Posts: 3 Contributor I
I set up a simple process that reads a .csv file, then passes the ExampleSet to the R Script operator. The R Script operator is "empty" (for testing purposes) and does nothing except pass the ExampleSet to a Result node.

When I view the Result ExampleSet (coming out of the R Script Operator), I discover that certain missing values in my input ExampleSet have been replaced with some undesired values.  This has happened with two attributes: one of the type "text", and one of the type "date".  In both cases, these attribute fields have also been re-designated as "nominal" in the output ExampleSet. 

So the R Script operator has re-designated some of my attributes as "nominal" and arbitrarily replaced missing values with other data. I intended neither of these actions.

Is this a (serious) bug or should I have expected this behavior?  Is there a way to control how the R Script operator treats its input ExampleSet?

I am using RapidMiner Studio 6.0 Professional.

Answers

  • Marco_Boeck Administrator, Moderator, Employee, Member, University Professor Posts: 1,993 RM Engineering
    Hi,

    Unfortunately, this is a side effect of the way R is integrated in RapidMiner and cannot be avoided. You will need to fix these problems manually, using one of the transformation operators or the Guess Value Types operator.

    Regards,
    Marco
  • jbg Member Posts: 3 Contributor I
    Thank you for your reply.  I will be able to work around this problem in the manner that you suggested.  However...

    1) Is this behavior highlighted in any RapidMiner documentation?  If I had been able to read about this early on, then I would have saved countless frustrating hours diagnosing why I was getting false results in my RapidMiner process.

    2) Is there a plan in place to correct this?  Should I file an official bug report using the Bugzilla application?

    Thank you.
  • Marco_Boeck Administrator, Moderator, Employee, Member, University Professor Posts: 1,993 RM Engineering
    Hi,

    1) I'm afraid not. We are in the process of improving the documentation, but because this is an extension, its documentation has not been improved yet. It is on our list, though.

    2) There is no way to correct this, because normally the R script is used to alter the results. Because RapidMiner and R use different internal data models, the problem cannot be fixed in a reasonable manner. So the answer is no, this behavior will not change in the foreseeable future.

    Regards,
    Marco
  • jbg Member Posts: 3 Contributor I
    Okay.  It is disappointing that this isn't documented yet, and it is disappointing that there are no foreseeable plans to fix it.  But it IS a bug.  Should I file it with the Bug Tracker?

    Also, I am new to this forum, so I don't know the proper protocol for marking this thread as "[SOLVED]" or not. Clearly the problem isn't solved, but I guess there is nothing more to discuss. Would you like me to keep this thread open, or mark it "solved"?
  • Marco_Boeck Administrator, Moderator, Employee, Member, University Professor Posts: 1,993 RM Engineering
    Hi,

    I have already created a ticket for it in our internal issue tracker. Also you can keep this thread open for possible future updates.

    Regards,
    Marco
  • Marco_Boeck Administrator, Moderator, Employee, Member, University Professor Posts: 1,993 RM Engineering
    Hi,

    It has been pointed out to me that I misread your original post. I overlooked the part about missing values of nominal attributes being converted to something else. This is obviously a bug and will be dealt with. I cannot provide a timeframe, though.

    Regards,
    Marco
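
The workaround discussed in this thread (repairing the affected attributes after the R Script operator) starts with knowing which attributes changed. Below is a minimal sketch of such a check in plain Python. It does not use any RapidMiner or R API. It assumes the table has been exported before and after the operator as lists of dictionaries, with missing values represented as `None`. The function names `summarize` and `changed_columns`, the sample rows, and the `"?"` replacement value are all hypothetical and chosen only for illustration.

```python
from datetime import date

MISSING = None  # assumption: a missing value is represented as None


def summarize(rows):
    """Return {column: (set of value type names, missing count)}."""
    summary = {}
    columns = rows[0].keys() if rows else []
    for col in columns:
        values = [row[col] for row in rows]
        types = {type(v).__name__ for v in values if v is not MISSING}
        missing = sum(1 for v in values if v is MISSING)
        summary[col] = (types, missing)
    return summary


def changed_columns(before, after):
    """Report columns whose value types or missing counts differ."""
    s_before, s_after = summarize(before), summarize(after)
    report = {}
    for col, info in s_before.items():
        if info != s_after.get(col):
            report[col] = {"before": info, "after": s_after.get(col)}
    return report


# Made-up sample data: the table as read from the .csv file ...
before = [
    {"id": 1, "note": "ok", "seen": date(2014, 1, 5)},
    {"id": 2, "note": None, "seen": None},
]
# ... and a hypothetical result after a round trip that turned the date
# column into strings and replaced missing values with a placeholder.
after = [
    {"id": 1, "note": "ok", "seen": "2014-01-05"},
    {"id": 2, "note": "?", "seen": "?"},
]

for column, diff in changed_columns(before, after).items():
    print(column, diff)
```

With this sample data, the report lists `note` (its missing-value count dropped from 1 to 0) and `seen` (its type changed and its missing value disappeared), while `id` is left out because nothing about it changed. The flagged attributes are the ones that would then need a type conversion or missing-value repair.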