
Data Cleanup

iason, Member, Posts: 20, Contributor II
edited November 2018 in Help
Hello all,

This is my first post and my first attempt to work with real data in RapidMiner, so please excuse any ignorance.

What I am trying to do is clean up my data, which I imported from CSV files.
First, I have a lot of missing values, which show up as "?" in the tables. I need a way to filter those out.
Second, I have some rules (e.g. att1*att2 < 5000) and I want to filter the data based on them, preferably without adding an extra column.
I can do all of this in a spreadsheet and import the clean data into RapidMiner, but it would save a lot of time if it could be done within RapidMiner.

Thank you all in advance.

Answers

  • IngoRM, Administrator, Moderator, Employee, RapidMiner Certified Analyst, RapidMiner Certified Expert, Community Manager, RMResearcher, Member, University Professor, Posts: 1,751, RM Founder
    Hi,

    First of all, I have a lot of missing values, which show up as ? on the tables. I need a way to keep those out.
    What exactly would you like to filter out: examples (rows) containing any missing value (?), or attributes (columns) containing any missing values? For the first, use the operator "Filter Examples" with the condition "no missing attributes"; for the second, use the operator "Select Attributes" with the filter type "no missing values".
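    In RapidMiner both options are configured on the operators themselves, so the following is not RapidMiner code. It is a minimal plain-Python sketch of what the two settings do, assuming a missing cell is either empty or the literal "?" shown in the tables; the function names are hypothetical.

```python
import csv
import io

# Assumption: a missing cell is either empty or the literal "?" that RapidMiner displays.
MISSING_MARKERS = {"", "?"}


def is_missing(value):
    """Return True for a missing cell; csv.DictReader uses None for absent trailing fields."""
    return value is None or value.strip() in MISSING_MARKERS


def rows_without_missing(rows):
    """Keep only rows with no missing cell (cf. Filter Examples, 'no missing attributes')."""
    return [
        row for row in rows
        if not any(is_missing(v) for k, v in row.items() if k is not None)
    ]


def columns_without_missing(rows):
    """Keep only columns with no missing cell (cf. Select Attributes, 'no missing values')."""
    if not rows:
        return []
    keep = [c for c in rows[0] if c is not None
            and not any(is_missing(r.get(c)) for r in rows)]
    return [{c: r[c] for c in keep} for r in rows]


data = "att1,att2,label\n10,20,a\n?,30,b\n40,,c\n"
rows = list(csv.DictReader(io.StringIO(data)))
print(rows_without_missing(rows))     # [{'att1': '10', 'att2': '20', 'label': 'a'}]
print(columns_without_missing(rows))  # [{'label': 'a'}, {'label': 'b'}, {'label': 'c'}]
```

    Note the trade-off: dropping rows keeps every attribute but loses examples, while dropping columns keeps every example but can discard most attributes if missing values are scattered.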

    Secondly, I have some rules (ie att1*att2 < 5000) and I want to filter the data based on that, preferably without adding an extra column.
    Currently the best option is probably to create such an index column with the operator "Generate Attributes", filter the examples with "Filter Examples", and then remove the index column again with "Select Attributes".
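    As a rough illustration of that three-step workaround outside RapidMiner, here is a minimal plain-Python sketch for the rule att1*att2 < 5000. The helper column name "index" and the choice to reject rows with missing or non-numeric values are assumptions, not RapidMiner behaviour.

```python
import csv
import io


def filter_by_product(rows, limit=5000.0):
    """Mimic Generate Attributes -> Filter Examples -> Select Attributes for att1*att2 < limit."""
    kept = []
    for row in rows:
        tmp = dict(row)
        # Step 1 (Generate Attributes): add a temporary helper column.
        # The column name "index" is hypothetical.
        try:
            tmp["index"] = float(row["att1"]) * float(row["att2"])
        except (KeyError, TypeError, ValueError):
            # Assumption: rows with missing or non-numeric values fail the rule.
            continue
        # Step 2 (Filter Examples): keep only examples satisfying the rule.
        if tmp["index"] < limit:
            # Step 3 (Select Attributes): remove the helper column again.
            del tmp["index"]
            kept.append(tmp)
    return kept


data = "att1,att2\n10,20\n100,60\n?,5\n"
rows = list(csv.DictReader(io.StringIO(data)))
print(filter_by_product(rows))  # [{'att1': '10', 'att2': '20'}]
```

    In plain code the helper column is not strictly needed, since the product can be tested inline; it is kept here only to mirror the three operators.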

    We are actually revising the operator "Filter Examples" for one of the next versions, and it will then also allow expressions like these to be used directly in the operator.

    Cheers,
    Ingo
  • iason, Member, Posts: 20, Contributor II
    Thank you, problem solved.