"discarding attributes with many missing values"

dan_agape Member Posts: 106 Maven
edited June 2019 in Help

Hi there

Just enquiring if there is a pre-processing operator that discards attributes having more missing values than a specified threshold (given as a percentage for instance).

Thanks!
Dan

Answers

  • land RapidMiner Certified Analyst, RapidMiner Certified Expert, Member Posts: 2,531 Unicorn
    Hi Dan,
    I think you can use the Remove Useless Attributes operator, provided the missing values outnumber the occurrences of any single nominal value.

    Anyway, you could post a feature request on our bug tracker, since I think a dedicated "less than x% missing values" filter makes absolute sense.

    Greetings,
      Sebastian
  • IngoRM Administrator, Moderator, Employee, RapidMiner Certified Analyst, RapidMiner Certified Expert, Community Manager, RMResearcher, Member, University Professor Posts: 1,751 RM Founder
    Hi,

    You are right, such an operator would be nice. Using our new Community Extension, I have uploaded a process that performs exactly this task. It is called "Discard Attribute with More than x% Missing Values (Loops + Macros)", and you can download and execute it with a few clicks after installing our new myExperiment Community Extension from the Help menu of RapidMiner.

    This process loops over all attributes and calculates the fraction of missing values for each attribute. If this fraction is larger than the fraction defined in the first "Set Macro" operator (macro: max_unknown), the attribute is removed from the example set.

    Cheers,
    Ingo
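For readers outside RapidMiner, the per-attribute loop Ingo describes can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the uploaded process itself: the example set is assumed to be a dict mapping attribute names to lists of values, with `None` marking a missing value, and the function name `discard_sparse_attributes` is hypothetical. The `max_unknown` parameter mirrors the macro name and is taken to be a fraction between 0 and 1.

```python
from typing import Dict, List, Optional

# Assumed representation: attribute name -> column of values, None = missing.
ExampleSet = Dict[str, List[Optional[object]]]


def discard_sparse_attributes(example_set: ExampleSet, max_unknown: float) -> ExampleSet:
    """Hypothetical helper: keep only attributes whose fraction of missing
    values is not larger than max_unknown (mirrors the max_unknown macro)."""
    kept: ExampleSet = {}
    for name, values in example_set.items():
        n = len(values)
        missing = sum(1 for v in values if v is None)
        # An empty column has no missing values, so its fraction is 0.0.
        fraction = missing / n if n else 0.0
        # Remove the attribute only if its fraction is larger than the threshold.
        if fraction <= max_unknown:
            kept[name] = values
    return kept


data: ExampleSet = {
    "age": [34, None, 51, 29],                # 1 of 4 missing -> 0.25
    "income": [None, None, None, 42000],      # 3 of 4 missing -> 0.75
    "city": ["Dortmund", "Berlin", None, "Bonn"],  # 1 of 4 missing -> 0.25
}

filtered = discard_sparse_attributes(data, max_unknown=0.5)
print(sorted(filtered))  # ['age', 'city']
```

An attribute whose missing fraction exactly equals `max_unknown` is kept, matching the "larger than" wording above; switching `<=` to `<` would drop it instead.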
  • dragoljub Member Posts: 241 Contributor II
    Hey, how can we access this operator? Do we have to sign up for any service?

    Thanks,
    -Gagi
  • land RapidMiner Certified Analyst, RapidMiner Certified Expert, Member Posts: 2,531 Unicorn
    Hi,
    In fact, Ingo uploaded a complete process, not a single operator. You can download the Community Extension as usual with the Update Manager, and you don't have to sign in to the community itself to download publicly available processes.

    Greetings,
      Sebastian
  • dragoljub Member Posts: 241 Contributor II
    Thanks, guys,

    For some reason I did not see the list of public processes. This will help a lot.

    While this works, it seems very cumbersome. Is there any way to extract the metadata and filter based on the number of missing values? ;)

    Thanks,
    -Gagi
  • land RapidMiner Certified Analyst, RapidMiner Certified Expert, Member Posts: 2,531 Unicorn
    Hi,
    I guess Ingo wouldn't have posted this process if an easier way existed that requires no coding on either your side or ours. If you find an easier solution, or extend RapidMiner yourself, please keep the community informed.
    Greetings,
      Sebastian
  • haddock Member Posts: 849 Maven
    Greets Seb,

    I must be missing something: wouldn't transposing the data and applying Ingo's stuff work?

    Just a thought.

    Ciao
  • wanglu2014 Member Posts: 19 Contributor II

    Thanks for your suggestion. However, I ran into two problems:

    1. The Community Extension is installed, but no operators are added.

    2. At https://www.myexperiment.org/workflows/1276/versions/1.html, only a .txt file is downloaded, and it cannot be opened as XML.
