ARFF files with ? for nominal data

Kazmin Member Posts: 2 Contributor I
edited November 2018 in Help
Hi all,
I'm new to RapidMiner, so I apologize in advance for any stupid comments I make.

I have an ARFF file on which I am trying to run a Decision Tree. The problem is that one of my nominal variables has only "?" as values, and the decision tree algorithm fails with an error message. If I remove that variable beforehand, it finishes correctly with the right result. Is there any way to work around that problem? I am going to process a lot of these ARFF files automatically, and they are also generated automatically, so a way to handle the situation more gracefully would be great.

Thank you very much for the help; it is highly appreciated.
Nikolay

Answers

  • land RapidMiner Certified Analyst, RapidMiner Certified Expert, Member Posts: 2,531 Unicorn
    Hi,
    there are several possible solutions:
    As a start, you can use the Remove Useless Attributes operator to filter out all attributes that always have the same value. This also affects attributes whose values are always unknown.
    Another solution would be to use the Replace Missing Values or the Impute Missing Values operator. Take a look at their documentation for more information.
    Last but not least, you could simply filter out attributes that have missing values with the Select Attributes operator.

    Which of these solutions suits you best depends on your task and on what you are going to do with the generated model.

    Greetings,
      Sebastian
  • Kazmin Member Posts: 2 Contributor I
    Hey Sebastian,
    thank you very much for the quick and helpful reply. It was exactly what I needed.
    Best Regards,
    Nikolay
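
The first suggestion in the thread (dropping attributes that carry no information) can also be sketched as a preprocessing step outside RapidMiner, which may suit automatically generated files. The following is a minimal Python sketch, not part of the original answers: the function name drop_all_missing_attributes is hypothetical, and it assumes a simple dense ARFF file with one @attribute declaration per line, comma-separated data rows of full length, and no commas inside quoted values.

```python
def drop_all_missing_attributes(arff_text):
    """Return ARFF text without attributes whose every value is '?'.

    Assumption: simple dense ARFF (one @attribute per line, comma-separated
    rows with one value per attribute, no commas inside quoted values).
    """
    header, attributes, rows = [], [], []
    in_data = False
    for line in arff_text.splitlines():
        stripped = line.strip()
        if in_data:
            # Skip blank lines and '%' comment lines in the data section.
            if stripped and not stripped.startswith("%"):
                rows.append([value.strip() for value in stripped.split(",")])
        elif stripped.lower().startswith("@attribute"):
            attributes.append(line)
        elif stripped.lower().startswith("@data"):
            in_data = True
        else:
            # @relation, comments, and blank lines are kept ahead of the attributes.
            header.append(line)

    # Keep a column if at least one row holds a value other than "?".
    # With no data rows at all, every attribute is kept.
    keep = [
        i for i in range(len(attributes))
        if not rows or any(row[i] != "?" for row in rows)
    ]

    out = header + [attributes[i] for i in keep] + ["@data"]
    out += [",".join(row[i] for i in keep) for row in rows]
    return "\n".join(out) + "\n"


# Made-up example data: the 'note' attribute contains only missing values.
example = """@relation demo
@attribute outlook {sunny,rainy}
@attribute note {a,b}
@attribute play {yes,no}
@data
sunny,?,yes
rainy,?,no
"""

cleaned = drop_all_missing_attributes(example)
print(cleaned)
```

In this example the note attribute and its column are removed, while outlook and play are left unchanged. Sparse ARFF rows or quoted values containing commas would need a real ARFF parser instead of this line-based approach.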