How can I keep only the variables with at least 5,000 observations?

ceci_ro Member Posts: 3 Contributor I
Hello folks, 

I need a hand here...
How can I keep only the variables that have at least 5,000 observations?
I have too many variables. Thank you in advance.


Cecilia 



Best Answers

  • BalazsBarany Administrator, Moderator, Employee, RapidMiner Certified Analyst, RapidMiner Certified Expert Posts: 955 Unicorn
    Solution Accepted
    Hi @ceci_ro

    One approach would be to use the Quality Measures operator, which calculates measures such as missing values for each attribute.
    Then use "ExampleSet to Weights" from the Converters extension. Here you select the attribute name and the measure you need (missing values).
    Finally, apply "Select by Weights" to a copy of the original data together with the weights you created. Set weight relation = less equals and weight = e.g. 0.2, or whatever is appropriate for your data.

    Regards,
    Balázs 
  • ceci_ro Member Posts: 3 Contributor I
    Solution Accepted
    There is an operator that does exactly this: the Filter Attributes with Missing Values operator from the Toolbox extension. Simple and beautiful.
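
For readers who want to check the same logic outside RapidMiner, here is a minimal sketch in Python with pandas. It is not part of either answer above. It assumes that "observations" means non-missing values per column; the function names and the toy data are made up for illustration, and a threshold of 5 stands in for 5,000.

```python
import numpy as np
import pandas as pd


def keep_columns_with_min_observations(df: pd.DataFrame, min_count: int) -> pd.DataFrame:
    """Keep only the columns with at least `min_count` non-missing values."""
    non_missing = df.notna().sum()  # non-missing count per column
    return df.loc[:, non_missing >= min_count]


def max_missing_fraction(n_rows: int, min_count: int) -> float:
    """Largest share of missing values a column may have and still reach `min_count`."""
    return 1.0 - min_count / n_rows


# Toy data (hypothetical): 6 rows, threshold of 5 observations.
data = pd.DataFrame({
    "a": [1, 2, 3, 4, 5, 6],                        # 6 observations
    "b": [1.0, np.nan, np.nan, 4.0, np.nan, 6.0],   # 3 observations
    "c": [1.0, 2.0, np.nan, 4.0, 5.0, 6.0],         # 5 observations
})

filtered = keep_columns_with_min_observations(data, min_count=5)
print(list(filtered.columns))  # → ['a', 'c']
```

pandas also has a built-in that should give the same result here: `data.dropna(axis="columns", thresh=5)`. If the missing-values measure you threshold on is a fraction rather than a count (an assumption, check your own output), `max_missing_fraction` converts a minimum count into that fraction; for example, 5,000 observations out of 10,000 rows corresponds to a maximum missing share of 0.5.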

Answers

  • ceci_ro Member Posts: 3 Contributor I
    Hi @BalazsBarany

    Thank you very much. I think it works!!

    Very well explained :smile:

    Regards, 

    Cecilia

