discard attribute with more than x% missing values operator

dan_agape Member Posts: 106 Maven
edited June 2019 in Help
A suggestion: RM seems to need the operator named in the subject line. It is very useful during data pre-processing, and almost every popular DM suite offers this simple but essential function.

Best
Dan

Answers

  • IngoRM Administrator, Moderator, Employee, RapidMiner Certified Analyst, RapidMiner Certified Expert, Community Manager, RMResearcher, Member, University Professor Posts: 1,751 RM Founder
    Hi,

    you are right, such an operator would be nice. I have uploaded a process with our new Community Extension which performs exactly the desired task. It is called "Discard Attribute with More than x% Missing Values (Loops + Macros)" and you can download and execute the process with a few clicks after having installed our new myExperiment Community Extension from the help menu of RapidMiner.

    This process loops over all attributes and calculates the fraction of missing values for each attribute. If this fraction is larger than the fraction defined in the first "Set Macro" operator (macro: max_unknown), the attribute is removed from the example set.
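
    For readers who want to see the same logic outside RapidMiner, here is a minimal sketch in plain Python. It is not the uploaded process, only an illustration of the loop described above. It assumes the example set is a list of dictionaries with None marking a missing value; the function names and the sample data are hypothetical, and only the name max_unknown is taken from the macro mentioned in the post.

```python
from typing import Any, Dict, List, Optional

# Assumption: one row = one dict, and None (or an absent key) means "missing".
Row = Dict[str, Optional[Any]]


def missing_fraction(rows: List[Row], attribute: str) -> float:
    """Return the fraction of rows in which `attribute` is missing."""
    if not rows:
        return 0.0
    missing = sum(1 for row in rows if row.get(attribute) is None)
    return missing / len(rows)


def discard_sparse_attributes(rows: List[Row], max_unknown: float) -> List[Row]:
    """Drop every attribute whose missing fraction is larger than max_unknown."""
    # Collect attribute names in first-seen order.
    attributes: List[str] = []
    for row in rows:
        for name in row:
            if name not in attributes:
                attributes.append(name)

    # Loop over all attributes and keep those at or below the threshold.
    keep = [a for a in attributes if missing_fraction(rows, a) <= max_unknown]
    return [{a: row.get(a) for a in keep} for row in rows]


# Hypothetical sample data: "income" is missing in 3 of 4 rows (75%).
example_set = [
    {"age": 34, "income": None, "city": "Berlin"},
    {"age": None, "income": None, "city": "Paris"},
    {"age": 51, "income": 42000, "city": "Rome"},
    {"age": 29, "income": None, "city": None},
]

cleaned = discard_sparse_attributes(example_set, max_unknown=0.5)
print(sorted(cleaned[0]))  # prints ['age', 'city']
```

    With max_unknown set to 0.5, only "income" exceeds the threshold and is dropped. An attribute whose missing fraction equals the threshold exactly is kept, matching the "larger than" wording above.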

    Cheers,
    Ingo

  • dan_agape Member Posts: 106 Maven
    Hi Ingo,

    Thanks for the prompt reply. The RM team does a great job, and we, the users, thank you for that.

    BTW, it is excellent that most of the Weka algorithms are included as a plug-in component. It might also be useful to include all of Weka's pre-processing functionality, even though RM is already very strong in this area. In particular, the operator from the subject line would then have been included.

    Best
    Dan 