"How to declare filtered values/non-relevant values in RM?"

CausalityvsCorr Member Posts: 17 Contributor II
edited June 2019 in Help
Hello,

How can I tell RapidMiner that a certain type of missingness should not be taken into account in calculations?

This missingness is neither missing at random (MAR) nor missing completely at random (MCAR). Instead, the parameter value is missing because logically it cannot exist (e.g. due to the value of some other parameter). At least in my case, neither “Filter Examples” nor “Filter Parameters” helps, because this kind of missing value is so frequent that filtering removes all of the data.

I also tried the Declare Missing Value options (using the code NR, "not relevant", instead of an empty cell), but the output of this operator still had to go through the normal “replace missing value” processes, which leads to biased results.

A feature like this could be called “declare filtered value” or “declare non-relevant values”.
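To make the concern concrete outside RapidMiner, here is a minimal sketch in Python with pandas. It is not a RapidMiner process, and the data set, the column names (`has_car`, `car_age`) and the choice of 0 as replacement value are all invented for illustration. It shows the three behaviours described above: filtering out incomplete rows removes most of the data, replacing the NR cells shifts the result, and simply skipping the NR cells keeps the calculation restricted to rows where the value can exist.

```python
import pandas as pd

# Hypothetical data: car_age cannot exist when has_car is False,
# so those cells carry the code "NR" (not relevant).
df = pd.DataFrame({
    "has_car": [True, False, True, False, False],
    "car_age": [4, "NR", 10, "NR", "NR"],
    "income": [30, 25, 41, 38, 22],
})

# Turn the NR code into a missing marker (NaN); real numbers are kept.
df["car_age"] = pd.to_numeric(df["car_age"], errors="coerce")

# Keep the reason for the gap as its own column so it is not lost.
df["car_age_relevant"] = df["car_age"].notna()

# Filtering out every row with a missing value discards most examples.
rows_left = len(df.dropna())

# pandas skips NaN by default, so only rows where the value exists count.
mean_relevant = df["car_age"].mean()

# Replacing NR with a constant treats non-relevant cells as observations.
mean_imputed = df["car_age"].fillna(0).mean()

print(rows_left, mean_relevant, mean_imputed)
```

In this toy data only two of five rows survive filtering, and the mean over the relevant rows differs clearly from the mean after replacement. Whether RapidMiner can be configured to behave like the skip-NaN case for a given operator is exactly the open question of this thread.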

regards P/K

Answers

  • DocMusher Member Posts: 333 Unicorn
    Dear P/K,
    Although I cannot answer your question, your statement about bias raises an issue that needs further attention in the RM community. Outliers can obviously also introduce bias, which should be considered when deciding to keep them. "Replace missing values by ..." or any other editing that RM provides to "clean" your data is, I must admit, a new approach that classical scientists have not yet accepted. We therefore need to define a policy, after consulting peers from statistical societies, on whether this can be used in scientific research and, if it is used, how it should be reported and justified in a paper.
    Cheers
    Sven