Add Random Missing Data points

btibert Member, University Professor Posts: 146 Guru
I am sure this is possible, but what is the best way to add missing data to a dataset? I want to add noise and save out the dataset for my class to explore and handle.

Answers

  • varunm1 Moderator, Member Posts: 1,207 Unicorn
    Hello @btibert

    Is this data related to a general problem or a time series problem? For a general problem, the Impute Missing Values operator with an algorithm like k-NN is suitable. For time series, you can use the Replace Missing Values operator with the mean, or the Replace Missing Values (Series) operator with linear interpolation.
    Regards,
    Varun
    https://www.varunmandalapu.com/

    Be Safe. Follow precautions and Maintain Social Distancing

  • MartinLiebig Administrator, Moderator, Employee, RapidMiner Certified Analyst, RapidMiner Certified Expert, University Professor Posts: 3,507 RM Data Scientist
    I usually go for Generate Attributes with:
    if(rand()<0.2,MISSING_NUMERIC,value)

    Cheers,
    Martin
    - Sr. Director Data Solutions, Altair RapidMiner -
    Dortmund, Germany
  • btibert Member, University Professor Posts: 146 Guru
    Thanks Scott.  I suppose I could get there via multiple splits and declare missing value paths (and then append/union), but good to know about the Noise Operator because I was not aware.  Thanks!
  • sgenzer Administrator, Moderator, Employee, RapidMiner Certified Analyst, Community Manager, Member, University Professor, PM Moderator Posts: 2,959 Community Manager
    Glad we could help @btibert. @mschmitz @varunm1 nice solutions as well. I always love to see how many ways you can tackle a problem in RapidMiner. :smile:

    Scott
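
For anyone who wants to prepare the same kind of classroom dataset outside RapidMiner, here is a rough Python sketch of the rand() < 0.2 idea from Martin's reply. It assumes pandas and NumPy are available; the function name add_missing, its parameters, and the toy columns are made up for illustration and do not come from this thread or from any RapidMiner API.

```python
import numpy as np
import pandas as pd


def add_missing(df, rate=0.2, columns=None, seed=None):
    """Return a copy of df with roughly `rate` of the values in `columns` set to NaN.

    Hypothetical helper: mirrors if(rand() < 0.2, missing, value) per cell.
    """
    rng = np.random.default_rng(seed)  # seed makes the blanked cells reproducible
    out = df.copy()                    # leave the caller's data untouched
    cols = list(out.columns) if columns is None else list(columns)
    for col in cols:
        # One uniform draw per row; True marks the cells to blank out.
        mask = rng.random(len(out)) < rate
        out[col] = out[col].mask(mask)
    return out


# Toy data, invented for this example only.
df = pd.DataFrame({
    "a": np.arange(1000, dtype=float),
    "b": np.arange(1000, dtype=float),
})
noisy = add_missing(df, rate=0.2, seed=42)
```

Because every cell is drawn independently, the share of missing values is only approximately 20% per column, which is probably what you want for a class exercise. Something like noisy.to_csv("noisy.csv", index=False) would then save it out (the file name is just a placeholder). Note that integer columns become floats once they contain NaN.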