"Read Excel without number parsing?"

colo  Member  Posts: 236  Guru
edited May 23 in Help
Hello everybody,

I'm building a process that extends some data I collected from the web earlier. The previous processes ended by creating an Excel file containing all the relevant data. That file also contains some numbers with leading zeros (postal codes, area codes), which were extracted and written as text; that is presumably why Excel kept the leading zeros. Now I want to load that data into my process again via "Read Excel". Guess what happens? All those numbers are parsed as integers, the leading zeros are removed, and when the data is written back via "Write Excel", one fraction digit is appended to every one of those numbers (although they are displayed as integers before). The "Read CSV" operator lets you disable this unwanted parsing. Do you have any suggestions on how best to handle this case?

Thanks for all your hints and help.
Regards,
Matthias
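The effect described in the question can be reproduced outside RapidMiner. The following is a minimal Python sketch, assuming the reader guesses a real-valued numeric type for the postal code column and the writer renders real values with one fraction digit. The helper names, the sample postal codes and the fixed five-digit width are hypothetical illustrations, not RapidMiner behavior or API.

```python
def parse_as_number(cell: str) -> float:
    """Hypothetical stand-in for a reader that guesses a numeric type."""
    return float(cell)  # "01067" becomes 1067.0: the leading zero is gone


def write_number(value: float) -> str:
    """Hypothetical stand-in for a writer that renders real values with one fraction digit."""
    return f"{value:.1f}"  # 1067.0 is written as "1067.0"


def restore_leading_zeros(value: float, width: int) -> str:
    """Zero-pads an integral value back to a fixed width (only works if the width is known)."""
    return str(int(value)).zfill(width)


postal_codes = ["01067", "04109", "80331"]  # sample values, stored as text in the sheet

# Guessed as numeric: leading zeros are lost and a fraction digit appears on output.
round_tripped = [write_number(parse_as_number(c)) for c in postal_codes]

# Kept as text (a "nominal" column): the values survive unchanged.
as_text = list(postal_codes)

# After-the-fact repair, assuming every code is exactly five digits wide.
repaired = [restore_leading_zeros(parse_as_number(c), 5) for c in postal_codes]

print(round_tripped)
print(as_text)
print(repaired)
```

Padding only helps when every value has the same known width. Area codes vary in length, so keeping such columns as text from the start (as "Read CSV" allows) is the more robust choice.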
Answers

  • land  RapidMiner Certified Analyst, RapidMiner Certified Expert, Member  Posts: 2,525  Unicorn
    Hi Matthias,
    There have been many changes to the read operators in the last few versions of RapidMiner. Which exact version are you using?

    Greetings,
      Sebastian
  • colo  Member  Posts: 236  Guru
    Hi Sebastian,

    I always use the newest version available from Subversion (building and running RapidMiner from Eclipse), since it contains some relevant fixes. It should be a (still mislabeled ;)) 5.0.010 with some additional changes.

    Regards,
    Matthias
  • land  RapidMiner Certified Analyst, RapidMiner Certified Expert, Member  Posts: 2,525  Unicorn
    Hi,
    OK. Did you try configuring the Read Excel operator with the wizard? It offers more settings than the operator's parameters do.

    Greetings,
      Sebastian
  • colo  Member  Posts: 236  Guru
    Hi Sebastian,

    Thanks for this suggestion. With the wizard, even the automatic type guessing yields "nominal" for my postal codes, and if it didn't, I could change the type manually, which is great. However, the wizard doesn't seem to offer a way to use the first row (containing the column headings) as attribute names, and setting the corresponding parameter afterwards has no effect. This is a bit confusing: I usually think of a wizard as an easy guide through setting parameters, but here the wizard and the parameters seem to be independent of each other, and decisions made in the wizard cannot be revised later through the parameters. Perhaps usability could be improved here? ;)

    Regards,
    Matthias
  • land  RapidMiner Certified Analyst, RapidMiner Certified Expert, Member  Posts: 2,525  Unicorn
    Hi Matthias,
    Inside the wizard you can select the usage of each row: in the second step, the first column lets you set a row's usage to "Name".

    Greetings,
      Sebastian
  • colo  Member  Posts: 236  Guru
    Hi Sebastian,

    Thank you for that hint. I indeed didn't notice that the first column was clickable, since nothing there says so (the column heading "Use as" is the only clue, and that wasn't enough for me ;)). I hadn't touched the process for a few days, so I only just tried it. At first it seemed to work fine, but the downstream operators complained about missing attributes. So I opened the wizard again and noticed that all columns were labeled correctly except the last one, which was named attribute_11 although it has a proper heading just like the others. I wanted to check the name of the preceding attribute and widened its column to reveal the full name. While I was dragging the column border with the mouse, all the other column labels disappeared and reverted to the default labels (attribute_1 to attribute_n).
    I'm using the latest version from SVN. Could someone please verify this problem?

    Thanks,
    Matthias