How to use prediction() in formula?

keith Member Posts: 157  Guru
edited November 2018 in Help
Hi again,

How do you use the predictions of a model in a subsequent FeatureGeneration formula to transform the values?

I've developed a model that predicts the log-odds of an event, but I want to transform it back to a probability so I can plot it on a scale that is more familiar.

e.g. LinearRegression of y on x, then use ModelApplier to get prediction(y)

I can use FeatureGeneration to convert the dependent variable to a probability:

prob_y = exp(y) / (1+exp(y))       

... which in RM would be written as:

/(exp(y), +(const[1](), exp(y)))

However, when I try to do the same thing with the predicted values, which are named by the ModelApplier node as "prediction(y)", the parentheses in the name cause errors:

/(exp(prediction(y)), +(const[1](), exp(prediction(y))))

I've tried it with single or double quotes, without success:

/(exp('prediction(y)'), +(const[1](), exp('prediction(y)')))
/(exp("prediction(y)"), +(const[1](), exp("prediction(y)")))

Is there a way to reference a feature name that contains a special character (like parens), or do you have to rename the feature to something "safer"?

Thanks,
Keith
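
For reference, the log-odds-to-probability transform in the question, prob_y = exp(y) / (1+exp(y)), can be sketched outside RapidMiner. This is a minimal Python illustration, not RapidMiner code; the function names log_odds_to_prob and naive are made up for this example. It also shows the algebraically equivalent form 1 / (1+exp(-y)), which avoids overflow for large positive y.

```python
import math


def naive(y: float) -> float:
    """The formula exactly as written in the question."""
    return math.exp(y) / (1.0 + math.exp(y))


def log_odds_to_prob(y: float) -> float:
    """Same transform, arranged so exp() is only called on non-positive values.

    Hypothetical helper for illustration; not part of RapidMiner.
    """
    if y >= 0:
        # exp(y) / (1 + exp(y)) == 1 / (1 + exp(-y))
        return 1.0 / (1.0 + math.exp(-y))
    e = math.exp(y)
    return e / (1.0 + e)


if __name__ == "__main__":
    for y in (-2.0, 0.0, 2.0):
        print(y, naive(y), log_odds_to_prob(y))
```

Both functions agree for moderate values of y; for very large y the naive form raises an OverflowError in Python, while the rearranged form does not.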

Answers

  • keith Member Posts: 157  Guru
    Following up on my own question...

    I renamed the predicted value from "prediction(y)" to "pred_y".  I am then able to use pred_y in mathematical expressions.  However, I cannot create nested ("complex") mathematical expressions, only simple ones.

    e.g.

    I want to be able to do

    pred_y_prob    =    /(exp(pred_y), +(exp(pred_y), const[1]()))

    However, I get an error when trying to do so.

    NullPointerException occured in 1st application of FeatureGeneration
    Process failed: operator cannot be executed.  Check the log messages...

    But if I break it up into three steps within the FeatureGeneration node, it works:

    pred_y_exp      =    exp(pred_y)
    pred_y_plus1  =    +(const[1](), pred_y_exp)
    pred_y_prob    =      /(pred_y_exp, pred_y_plus1)

    The thing that really puzzles me is that my original formulation works if I use the target variable instead of the predicted target variable:

    y_prob    =    /(exp(y), +(exp(y), const[1]()))            #  This works!

    Any ideas?

    Thanks,
    Keith
  • TobiasMalbrecht Moderator, Employee, Member Posts: 291  RM Product Management
    Hi Keith,

    We have run into such problems ourselves and are currently working on the issue. The fix, however, involves a complete redesign of the feature generation, which is tedious because the feature generation mechanism in RM is quite complex, so it will probably take a while. We will keep you informed once the work is finished. Until then, you will have to use the workaround you discovered yourself. There is also another trick that sometimes avoids odd behaviour of the feature generation: simply write the data set to a file and reload it before generating the features.

    Regards,
    Tobias
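
The workaround from Keith's follow-up (rename "prediction(y)" to "pred_y", then build the probability in three simple steps) can be sketched in plain Python to show what each intermediate attribute holds. This is only an illustration under assumptions: the rows, the column "x", and the helpers rename_key and add_probability are invented for the example and say nothing about how RapidMiner evaluates the expressions internally.

```python
import math


def rename_key(row: dict, old: str, new: str) -> dict:
    """Return a copy of the row with one attribute renamed."""
    return {(new if key == old else key): value for key, value in row.items()}


def add_probability(row: dict) -> dict:
    """Mirror the three-step workaround from the thread."""
    out = dict(row)
    out["pred_y_exp"] = math.exp(out["pred_y"])                    # exp(pred_y)
    out["pred_y_plus1"] = 1.0 + out["pred_y_exp"]                  # +(const[1](), pred_y_exp)
    out["pred_y_prob"] = out["pred_y_exp"] / out["pred_y_plus1"]   # /(pred_y_exp, pred_y_plus1)
    return out


# Made-up example data: predicted log-odds under the name the model applier produces.
rows = [
    {"x": 0.5, "prediction(y)": -1.2},
    {"x": 2.0, "prediction(y)": 0.0},
]

result = [add_probability(rename_key(r, "prediction(y)", "pred_y")) for r in rows]

if __name__ == "__main__":
    for r in result:
        print(r["pred_y"], r["pred_y_prob"])
```

The three intermediate values combine to the same number as the single nested expression /(exp(pred_y), +(exp(pred_y), const[1]())), so splitting the formula changes only how it is entered, not the result.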