What is the difference between Declare Missing Value and Replace Missing Values?

supremerrysupremerry Member Posts: 1 Newbie
The instructor told me to replace missing values with 0, but he also told me to use the Declare Missing Value operator. I wonder why we don't use the Replace Missing Values operator instead.

Best Answers

  • yyhuangyyhuang Administrator, Employee, RapidMiner Certified Analyst, RapidMiner Certified Expert, Member Posts: 364 RM Data Scientist
    edited April 2021 Solution Accepted
    Hi @supremerry,

    These two operators serve different purposes. I usually apply Declare before Replace, but it depends on the data.

    In RapidMiner, missing values are represented by a question mark (?). Say your age attribute uses -1 or 9999 to record a missing age. You "declare" that -1 or 9999 in this column should be treated as missing, not as a valid age. If you run Replace Missing Values on the age attribute before declaring those codes, it will treat -1 and 9999 as non-missing values, because Replace Missing Values only acts on data already marked with a question mark (?).
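To make the ordering concrete outside RapidMiner, here is a minimal plain-Python sketch of the same idea. It is not RapidMiner code: the helper names `declare_missing` and `replace_missing` and the sample ages are made up for illustration, and NaN stands in for RapidMiner's "?".

```python
import math

# Hypothetical sentinel codes that mean "age not recorded".
SENTINELS = {-1, 9999}

def declare_missing(values, sentinels):
    """Turn sentinel codes into NaN, the stand-in here for RapidMiner's '?'."""
    return [math.nan if v in sentinels else v for v in values]

def replace_missing(values, fill):
    """Replace only values that are already marked missing (NaN)."""
    return [fill if isinstance(v, float) and math.isnan(v) else v for v in values]

# Made-up sample: two sentinel-coded ages and one value that is already missing.
ages = [34, -1, 51, 9999, math.nan]

# Replace without declaring: the sentinels survive as if they were real ages.
replace_only = replace_missing(ages, 0)

# Declare first, then replace: every missing age ends up as 0.
declare_then_replace = replace_missing(declare_missing(ages, SENTINELS), 0)

print(replace_only)          # [34, -1, 51, 9999, 0]
print(declare_then_replace)  # [34, 0, 51, 0, 0]
```

In the first result the -1 and 9999 codes would still distort any statistic computed on age, which is why the declare step comes first.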

    How much work this takes depends on how dirty the input data is. I always explore the data to check for any special coding of missing values before handling them. Sometimes invalid data is automatically recognized as missing (shown as "?") during loading, but sometimes it is not. If the data comes from another platform, the missing values may not be properly declared. For example, in SAS, numeric missing values are represented by a single period (.) and character missing values by a single blank enclosed in quotes (' '). In R, NA marks missing values, and in Python it is NaN.
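As a rough illustration of checking for platform-specific missing codes while loading, the following plain-Python sketch maps several spellings to NaN. The `MISSING_CODES` set, the `parse_age` helper, and the sample export are assumptions for illustration; the right set of codes depends on the source system.

```python
import csv
import io
import math

# Assumed spellings of "missing": SAS period, blank, R's NA, Python's NaN, RapidMiner's ?.
MISSING_CODES = {".", "", "NA", "NaN", "?"}

def parse_age(raw):
    """Return NaN for any known missing code, otherwise the numeric value."""
    token = raw.strip()
    return math.nan if token in MISSING_CODES else float(token)

# Made-up export in which each row uses a different convention for a missing age.
raw_export = "name,age\nAnn,34\nBob,.\nCid,NA\nDee, \nEve,NaN\n"

rows = list(csv.DictReader(io.StringIO(raw_export)))
ages = [parse_age(row["age"]) for row in rows]

# Names whose age was recognized as missing after applying the declared codes.
missing_names = [row["name"] for row, age in zip(rows, ages) if math.isnan(age)]
print(missing_names)  # ['Bob', 'Cid', 'Dee', 'Eve']
```

Listing the codes explicitly keeps the decision visible, so a value such as 9999 is only treated as missing once someone has confirmed it is a code and not a real measurement.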

    HTH!

    YY


  • ceaperezceaperez Member Posts: 525 Unicorn
    Solution Accepted
    Hi @supremerry

    There are a few differences between the two operators.
    On the one hand, the Replace Missing Values operator replaces values that are already missing, using a short list of options such as average, maximum, zero, etc.
    On the other hand, the Declare Missing Value operator is more flexible: it lets you specify which values should be treated as missing, by numeric value, by nominal value, or by an expression written in RapidMiner's powerful expression editor.
    Depending on what you want to do with your missing values, the model you want to build, and the role the missing values play in that model, you may choose one or the other.

    Best