How to handle empty fields problems (Not missing data) in a data set

MasoudG Member Posts: 2 Learner I
edited May 2019 in Help
Hello guys.
I have a data set that I collected from 35 companies. One of my attributes is "do they have this type of plan", with the values "Yes" and "No", and my second attribute is "how much is the price of this plan". For companies whose first attribute is "Yes", the value is a number such as 30 euros, but for companies whose first attribute is "No", this field is empty.
I want to do clustering, but because of the empty fields I can't proceed. I don't want to remove this attribute or any example, or fill these fields using any missing-data technique, because the values are not missing.
Is there any technique in RapidMiner to define: if the first attribute is "No", then ignore the second attribute for that example?
Thank you very much

Answers

  • David_A Administrator, Moderator, Employee, RMResearcher, Member Posts: 297 RM Research

    You have different options here, depending on what you actually want to cluster and how you want to proceed afterwards.
    You could use the Replace Missing Values operator to replace the empty price field with something useful (for example 0 or the average price).
    The other option is to use Filter Examples first, filtering either for "Yes" in the relevant attribute or with the "no_missing_attributes" condition.
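
The two options above can be sketched outside RapidMiner in plain Python. This is only an illustration of the idea, not how the operators are implemented; the toy data, the field names (`has_plan`, `price`) and the helper functions are all hypothetical.

```python
from statistics import mean

# Hypothetical toy data: None marks an empty price field.
companies = [
    {"name": "A", "has_plan": "Yes", "price": 30.0},
    {"name": "B", "has_plan": "No", "price": None},
    {"name": "C", "has_plan": "Yes", "price": 50.0},
]


def replace_missing(rows, value):
    """Return copies of the rows with empty prices replaced by `value`."""
    return [
        {**row, "price": value if row["price"] is None else row["price"]}
        for row in rows
    ]


def filter_has_plan(rows):
    """Keep only the rows whose plan attribute is "Yes"."""
    return [row for row in rows if row["has_plan"] == "Yes"]


# Option 1: fill the empty prices with 0 or with the average known price.
known_prices = [row["price"] for row in companies if row["price"] is not None]
with_zero = replace_missing(companies, 0.0)
with_mean = replace_missing(companies, mean(known_prices))

# Option 2: drop the rows that have no plan before clustering.
only_yes = filter_has_plan(companies)
```

Note that the second option shrinks the data set, so it only fits when the companies without the plan do not need to be clustered.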

    Best,
    David
  • MasoudG Member Posts: 2 Learner I
    Hi @David_A
    Thank you very much for your quick response. Actually, I have around 30 different attributes of 35 companies and I want to cluster these companies based on their features.
    1- Replace Missing Values: I don't want to replace any value in these fields, since they are not missing. They have no value because the company does not have this type of plan, and I think replacing them with a value like 0 or the average can affect the clustering process.
    2- Filter Examples: I don't want to filter any example because my examples are my companies and my main goal is clustering them, so I need them.
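
The concern in point 1 can be made concrete with a small Python sketch. The prices and the `price_gap` helper are hypothetical; the point is only that the fill value decides how close a company without the plan appears to the others on this attribute.

```python
# Hypothetical prices: company B has no plan, so its price is empty (None).
prices = {"A": 30.0, "B": None, "C": 50.0}


def price_gap(prices, fill, x, y):
    """Absolute price difference between x and y, using `fill` for empty prices."""
    px = fill if prices[x] is None else prices[x]
    py = fill if prices[y] is None else prices[y]
    return abs(px - py)


# Filling with 0 pushes B far away from the companies that have the plan.
gap_zero_a = price_gap(prices, 0.0, "A", "B")
gap_zero_c = price_gap(prices, 0.0, "C", "B")

# Filling with the average (40.0 here) places B exactly between A and C.
gap_mean_a = price_gap(prices, 40.0, "A", "B")
gap_mean_c = price_gap(prices, 40.0, "C", "B")
```

Either way, a distance-based clustering algorithm would treat the filled-in number as if it were a real price.
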
    Do you have any other ideas?
    Thank you in advance.
    Masoud

  • Telcontar120 Moderator, RapidMiner Certified Analyst, RapidMiner Certified Expert, Member Posts: 1,635 Unicorn
    What do you mean exactly by the statement that you can't proceed with clustering because of the empty fields? Are you saying the clustering algorithm is preventing you because those fields are currently designated as missing within RapidMiner?
    If you need to remove all your missing values in order to run the clustering algorithm you want, then you can populate them appropriately with a two-step process. First, use Generate Attributes with an expression along the lines of PricePlan = if(HavePlan == "Yes", PricePlan, "N/A"). This keeps whatever value is in the plan price attribute if the company answered "Yes" to having the plan, and otherwise sets the plan price to "N/A" (or whatever you want it to be). Then you can run a subsequent Replace Missing Values and decide how to represent the prices that are still missing where the company answered "Yes" to having the plan (for example, with the average price).
    If the fields are not technically missing but simply populated by a space or similar, then you should be fine. 
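
As a rough illustration of the two-step process described above, here is a plain Python sketch. It is not RapidMiner code; the rows, the field names (`have_plan`, `price_plan`) and both helper functions are assumptions made for the example.

```python
from statistics import mean

# Hypothetical rows: None marks an empty price field.
rows = [
    {"have_plan": "Yes", "price_plan": 30.0},
    {"have_plan": "No", "price_plan": None},
    {"have_plan": "Yes", "price_plan": None},  # has the plan, price not collected
    {"have_plan": "Yes", "price_plan": 50.0},
]

NOT_APPLICABLE = "N/A"


def mark_not_applicable(rows):
    """Step 1: keep the price where have_plan is "Yes", otherwise set the marker."""
    return [
        {
            **row,
            "price_plan": row["price_plan"]
            if row["have_plan"] == "Yes"
            else NOT_APPLICABLE,
        }
        for row in rows
    ]


def impute_missing_prices(rows):
    """Step 2: replace prices that are still empty with the average known price."""
    known = [r["price_plan"] for r in rows if isinstance(r["price_plan"], float)]
    average = mean(known)
    return [
        {**row, "price_plan": average if row["price_plan"] is None else row["price_plan"]}
        for row in rows
    ]


step1 = mark_not_applicable(rows)
step2 = impute_missing_prices(step1)
```

After both steps no price is empty: companies without the plan carry the marker, and only the genuinely missing prices are imputed. A clustering algorithm that accepts only numeric attributes would still need the marker encoded in some other way.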
    Brian T.
    Lindon Ventures 
    Data Science Consulting from Certified RapidMiner Experts