"Read Excel without number parsing?"

colo · August 2010

Hello everybody,

I'm building a process to extend some data I collected from the web earlier. The previous processes created an Excel file containing all the relevant data. It includes some numbers with leading zeros (postal codes, area codes), which were extracted and written as text (presumably that is why Excel didn't remove the leading zeros at that point). Now I want to load that data into my process again via "Read Excel". Guess what happens? All those numbers are parsed as integers, the leading zeros are removed, and when the data is written via "Write Excel", one fraction digit is added to all those numbers (although they were displayed as integers before). The "Read CSV" operator allows disabling the unwanted parsing. Do you have any suggestions on what to do best in this case?

Thanks for all your hints and help.
Regards,
Matthias
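
For readers who want to see the effect in isolation, here is a minimal Python sketch. It is not RapidMiner code and does not use the "Read Excel" operator; it only mimics the two behaviours described above, under the assumption that parsed cells end up as floating-point numbers. The sample data and the function names `read_as_text` and `read_as_numbers` are hypothetical.

```python
import csv
import io

# Hypothetical sample data standing in for the spreadsheet from the post.
SAMPLE = "postal_code,area_code\n01067,0351\n04109,0341\n"


def read_as_text(source):
    """Read every cell as a string, so leading zeros survive."""
    return list(csv.DictReader(io.StringIO(source)))


def read_as_numbers(source):
    """Parse every cell as a float (assumed stand-in for numeric parsing)."""
    return [
        {name: float(value) for name, value in row.items()}
        for row in read_as_text(source)
    ]


text_rows = read_as_text(SAMPLE)
numeric_rows = read_as_numbers(SAMPLE)

print(text_rows[0]["postal_code"])     # 01067
print(numeric_rows[0]["postal_code"])  # 1067.0
```

Keeping the column as text preserves the code exactly as written, while numeric parsing both drops the leading zero and introduces the extra fraction digit when the value is written out again.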

land · August 2010

Hi Matthias,
There have been many changes to the read operators in the last few versions of RapidMiner. Which version exactly are you using?

Greetings,
Sebastian

colo · August 2010

Hi Sebastian,

I am currently always using the newest version available through Subversion (building and running RapidMiner from Eclipse), since there have been some relevant fixes. This should be a (still mislabeled) 5.0.010 with some additional changes.

Regards,
Matthias

land · August 2010

Hi,
OK then. Did you try to configure the Read Excel operator using the wizard? It offers more settings than the parameters themselves.

Greetings,
Sebastian

colo · August 2010

Hi Sebastian,

Thanks for this suggestion. Using the wizard, even the type guessing results in "nominal" for my postal codes, and otherwise I would be able to change it manually. This is great. But it seems the wizard doesn't offer a way to take the first row (containing the column headings) as attribute names. Setting the respective parameter afterwards has no effect. It's a bit confusing: I usually consider wizards an easy guide through setting parameters, but in this case parameters and wizard seem to be independent of each other. Decisions made in the wizard cannot be revised later using the parameters. Perhaps usability could be improved here?

Regards,
Matthias

land · September 2010

Hi Matthias,
Inside the wizard you can select the usage of a row. See the second step, first column: you can select a "Name" usage there.

Greetings,
Sebastian

colo · September 2010

Hi Sebastian,

Thank you for that hint. I indeed didn't notice that the first column was clickable, since this possibility isn't mentioned there (the column heading "Use as" is the only clue, and that was not enough for me). I hadn't touched the process for some days, so I only just tried it. At first it seemed to work fine, but the following operators complained about missing attributes. So I used the wizard again and noticed that the columns are labeled correctly up to the last one (which was named attribute_11 although it was properly labeled like the other attributes). I wanted to check the name of the previous attribute and increased the column width to reveal the full name. While I was doing this by dragging with the mouse, all the other column labels disappeared and changed to the default labels (attribute_1 - attribute_n).
I'm using the latest version from SVN. Could someone perhaps verify this problem?

Thanks,
Matthias
