How can I keep only the variables with at least 5,000 observations?

ceci_ro Member Posts: 3 Contributor I
Hello folks, 

I need a hand here...
How can I keep only the variables with at least 5,000 observations?
I have too many variables. Thank you in advance.


Cecilia 


Best Answers

  • BalazsBarany Administrator, Moderator, Employee, RapidMiner Certified Analyst, RapidMiner Certified Expert Posts: 901 Unicorn
    Solution Accepted
    Hi @ceci_ro

    One approach is the Quality Measures operator, which calculates measures such as missing values for each attribute.
    Then use "ExampleSet to Weights" from the Converters extension, where you select the attribute name and the measure you need (missing values).
    Finally, use "Select by Weights" with a copy of the original data and the weights you created. Set weight relation = less equals and weight = e.g. 0.2, or whatever is appropriate for your data.

    Regards,
    Balázs 
  • ceci_ro Member Posts: 3 Contributor I
    Solution Accepted
    There is an operator that does exactly this: Filter Attributes with Missing Values, from the Toolbox extension. Simple and beautiful.
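For anyone who wants to check the logic outside RapidMiner, here is a minimal Python sketch of the same idea. It is not RapidMiner code: the function names, the toy table, and the use of None for a missing value are all assumptions made for illustration. It also shows how a minimum observation count could be converted into a maximum missing-value fraction for "Select by Weights", assuming the missing-values measure is a fraction between 0 and 1, as the 0.2 example above suggests.

```python
def non_missing_count(values):
    """Count the entries that are not missing (None marks a missing value here)."""
    return sum(1 for v in values if v is not None)


def keep_columns_with_min_observations(table, min_obs):
    """Keep only the columns that have at least `min_obs` non-missing values.

    `table` is a dict mapping a column name to its list of values.
    """
    return {
        name: values
        for name, values in table.items()
        if non_missing_count(values) >= min_obs
    }


def max_missing_fraction(n_rows, min_obs):
    """Translate 'at least min_obs observations' into a missing-fraction limit."""
    return 1 - min_obs / n_rows


# Toy data (hypothetical): 4 rows, 3 columns.
table = {
    "a": [1.0, 2.0, 3.0, 4.0],     # 4 observations
    "b": [1.0, None, None, 4.0],   # 2 observations
    "c": [None, None, None, 1.0],  # 1 observation
}

filtered = keep_columns_with_min_observations(table, min_obs=2)
print(sorted(filtered))  # ['a', 'b']

# With a hypothetical 20,000 rows, 5,000 observations means at most 75% missing.
print(max_missing_fraction(20000, 5000))  # 0.75
```

If the data is already in a pandas DataFrame, `df.dropna(axis=1, thresh=5000)` applies the same rule: it keeps the columns that have at least 5,000 non-missing values.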
