"Replace Missing Values - help!"

Sguish Member Posts: 13 Contributor II
edited June 2019 in Help
Hello everybody,
this is my first post here and I hope I'm just stuck on something trivial.
I'm building a process with a libSVM. My dataset's label attribute (casi_certi) has missing values (its values are "1" and "?"), so I tried using Replace Missing Values to turn it into a binary attribute with values "1" and "0". Other attributes also have missing values, and they are all replaced properly. So is the label, if I just connect Replace Missing Values to the process output, run the process, and check the results. But if I check the dataset at the exa port of Replace Missing Values, something goes wrong: the label attribute has the right type (binominal) and zero missing values, but its range is [1] instead of [0-1]. This is a problem, since libSVM obviously needs at least two classes/values. I really can't understand what's wrong, so I'm asking here for help. Thank you!

Answers

  • sgenzer Administrator, Moderator, Employee, RapidMiner Certified Analyst, Community Manager, Member, University Professor, PM Moderator Posts: 2,959 Community Manager
    Hi, just to clarify: the problem occurs only when you add Set Role to Label?
  • sgenzer Administrator, Moderator, Employee, RapidMiner Certified Analyst, Community Manager, Member, University Professor, PM Moderator Posts: 2,959 Community Manager
    Just making sure: there should be no range, because you have transformed the attribute into binominal (from numeric, I presume). In nominal/polynominal/binominal attributes, numbers are treated like text.

    Scott
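
For readers who think in code rather than in RapidMiner operators, Scott's point can be illustrated with a rough pandas analogue. This is only a sketch and not part of the original thread: the sample values are made up, and pandas' `category` dtype is used here as a stand-in for a RapidMiner nominal attribute. A numeric column has a min/max range, while a nominal column just has a set of text labels.

```python
import pandas as pd

# Hypothetical label column: 1 = present, missing = absent (values are made up).
label = pd.Series([1, None, 1, None], name="casi_certi")

# Treated as numeric, the column has a range once the missing values are filled.
numeric = label.fillna(0).astype(int)
numeric_range = (int(numeric.min()), int(numeric.max()))

# Treated as nominal (categorical), the values are text labels, not numbers,
# so what matters is the set of distinct classes rather than a numeric range.
nominal = numeric.astype(str).astype("category")
classes = sorted(nominal.cat.categories)

print("numeric range:", numeric_range)
print("nominal classes:", classes)
```

The check that matters for a classifier such as libSVM is that the list of classes contains at least two entries.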
  • Sguish Member Posts: 13 Contributor II
    I resolved it by switching the label attribute type from integer to binominal when adding the CSV file to the repository!
  • sgenzer Administrator, Moderator, Employee, RapidMiner Certified Analyst, Community Manager, Member, University Professor, PM Moderator Posts: 2,959 Community Manager
    Aha, I see. OK, it's generally not a good idea to constantly pull data from a CSV file; it's better to load it into the repository like you did. If you do want to read from the CSV file directly, I'd recommend setting the meta data in the Read CSV operator so that it pulls in the attributes the way you want, as opposed to allowing RapidMiner to choose everything automatically. You do this in the Parameters pane of Read CSV by clicking the "Edit List" button next to "data set meta data information". It's an advanced parameter, so you may need to turn on "Advanced Parameters" at the bottom of the pane.

    Scott
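
The same idea of declaring attribute types at import time, rather than letting the tool guess, can be sketched outside RapidMiner with pandas. This is an illustrative analogue only, not RapidMiner's API: the inline CSV, the `age` column, and its values are hypothetical, and only the `casi_certi` name and the "1"/"?" values come from the thread.

```python
import io
import pandas as pd

# Hypothetical CSV content; "?" marks a missing label, as in the original post.
csv_text = "age,casi_certi\n34,1\n51,?\n47,1\n29,?\n"

# Declare the label as text up front instead of letting the reader infer integer,
# and tell the reader that "?" means missing.
df = pd.read_csv(
    io.StringIO(csv_text),
    dtype={"casi_certi": str},
    na_values=["?"],
)

# Replace the missing labels with the second class, then mark the column as nominal.
df["casi_certi"] = df["casi_certi"].fillna("0").astype("category")

label_classes = sorted(df["casi_certi"].cat.categories)
missing_labels = int(df["casi_certi"].isna().sum())

print("classes:", label_classes)
print("missing labels:", missing_labels)
```

Fixing the type at read time means every later step sees a two-class label, which mirrors the fix reported above of setting the type when importing the CSV file.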

