CSVExampleSource and Numeric?

Stefan_E Member Posts: 53 Maven
edited November 2018 in Help
Hi,
I am trying to load a .csv file with lines like this:
  64977,0.0542555,0.0531881,0.053273,0.0528606,0.0520237
through CSVExampleSource. I declare the first column as the label. The remaining columns are converted to 'Nominal' rather than to Numeric, which is what I would expect.

I tried NominalNumbers2Numerical, but it does nothing: the attributes stay nominal.

(I can work with ExampleSource - but what is CSVExampleSource good for then?)

Kind regards,
Stefan

Answers

  • Stefan_E Member Posts: 53 Maven
    ... same problem as in my other post on PCA: The dataset contained a line with all missing values except for the label. Once I remove this line, CSVExampleSource correctly recognizes the real values.

    So, next time I'll print out my .csv files and read them in my leisure time before I give them to RM  :o

    Stefan
  • martyns Member Posts: 15 Maven
    Another problem I have run into recently is that Excel/SPSS often format numbers in the thousands with a comma as the thousands separator,
    1,234 for example.

    This is a horrible thing to have happen when you save the data as a CSV, because the comma then splits one number into two fields!
  • steffen Member Posts: 347 Maven
    Hello
    Stefan_E wrote:

    So, next time I'll print out my .csv files and read them in my leisure time before I give them to RM  :o
    Just a remark: You can use RM as an ETL tool, but this is not its primary focus. Other tools are more suitable for this task. Besides, I tried working with Pentaho Kettle, and even this tool (which is meant for ETL) has trouble handling corrupted data of any kind.
    martyns wrote:

    Another problem I have run into recently is that Excel/SPSS often format numbers in the thousands with a comma as the thousands separator,
    1,234 for example.
    I guess this is an adjustable option in Excel ;)

    regards,

    Steffen
  • land RapidMiner Certified Analyst, RapidMiner Certified Expert, Member Posts: 2,531 Unicorn
    Hi,
    just a short comment: if you can get your files into Excel, you might simply import them from the Excel file directly. This way you avoid the problems with separators and number formatting.

    Greetings,
      Sebastian
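
The two pitfalls mentioned in this thread (a row in which every value except the label is missing, and a thousands separator such as 1,234 that splits one number into two fields) can be caught with a quick check before the file is handed to RM. The following is only a minimal sketch in Python, run outside RapidMiner; it does not reproduce how CSVExampleSource detects value types. The function name find_suspect_rows, the sample data, and the choice of "" and "?" as missing-value markers are assumptions made for illustration.

```python
import csv
import io


def find_suspect_rows(text, label_index=0, missing=("", "?")):
    """Return (line_number, reason) pairs for rows that may break numeric detection.

    Hypothetical helper: 'missing' lists the strings treated as missing values,
    which is an assumption and should be adapted to the actual file.
    """
    problems = []
    expected_width = None
    for line_number, row in enumerate(csv.reader(io.StringIO(text)), start=1):
        if not row:
            continue  # skip completely blank lines
        if expected_width is None:
            expected_width = len(row)  # assumes the first row has the correct width
        if len(row) != expected_width:
            # e.g. an unquoted 1,234 adds an extra field to the row
            problems.append((line_number, "unexpected field count"))
            continue
        values = [v.strip() for i, v in enumerate(row) if i != label_index]
        if all(v in missing for v in values):
            problems.append((line_number, "all values missing"))
            continue
        for v in values:
            if v in missing:
                continue
            try:
                float(v)
            except ValueError:
                problems.append((line_number, "non-numeric value: " + repr(v)))
                break
    return problems


# Made-up sample data: line 2 has only a label, line 3 contains an unquoted 1,234
sample = (
    "64977,0.0542555,0.0531881,0.053273\n"
    "64978,,,\n"
    "64979,1,234,0.05,0.06\n"
    "64980,0.05,abc,0.06\n"
)
for line_number, reason in find_suspect_rows(sample):
    print(line_number, reason)
```

To check a real file, read its contents into a string first and pass that string to the function; any reported line is worth inspecting by hand before loading the data.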