RapidMiner

I cannot read the data correctly. Help.

Learner I wdeng


I have a task that requires reading data from a government publication. When I read it with RapidMiner, it shows the wrong labels and wrong cell contents. The file is in CSV format. For example, when I open it in Excel, a cell contains two lines, but RapidMiner shows two cells instead.

https://files.ontario.ca/opendata/2536_bridge_conditions.csv
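
A minimal sketch of the symptom described above, assuming the header cells contain line breaks inside quotes (the sample text is made up for illustration and is not taken from the linked file). A reader that treats every physical line as a row splits such a cell in two, while a quote-aware parser keeps it as one cell:

```python
import csv
import io

# Made-up sample (an assumption, not the Ontario data): the second
# header cell contains a line break inside quotes.
sample = '"ID","Name\nNom"\n1,Bridge A\n'

# Naive approach: every physical line is treated as a separate row.
naive_rows = [line.split(",") for line in sample.splitlines()]

# Quote-aware approach: csv.reader keeps the quoted line break in one cell.
quoted_rows = list(csv.reader(io.StringIO(sample, newline="")))

print(len(naive_rows))   # 3 rows: the header is split across two lines
print(len(quoted_rows))  # 2 rows: one header row and one data row
print(quoted_rows[0])    # ['ID', 'Name\nNom']
```

This is roughly the difference between what Excel shows (one cell with two lines) and what a line-by-line import shows (two cells).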

8 REPLIES
Unicorn

Re: I cannot read the data correctly. Help.

Hi @wdeng

 

It's hard to pin down the issue without knowing exactly how you expect the data to look.

I read this file with RM, and the structure looks logically fine, except that the header row seems to consist of multiple (bilingual) entries. That may be where you need to rename the columns manually. Otherwise, the comma-delimited values in the columns look valid to me.

 

Screenshot 2017-11-13 10.53.11.png, Screenshot 2017-11-13 10.52.56.png

 

UPD: Opening the file in Excel (Mac version) gives me this:

 

Screenshot 2017-11-13 11.31.03.png

Unicorn

Re: I cannot read the data correctly. Help.

I am afraid the header of the CSV is incorrect. I think you should edit it so that it is a single line, with the column names separated by single commas.

RM Staff

Re: I cannot read the data correctly. Help.

Hi @wdeng,

 

Indeed, it would be better to have a file with better (i.e. consistent) formatting.

In this case, you could use the Read CSV operator with the wizard and experiment with the settings. Screenshots of my settings are attached. They worked for me to get at least a portion of your data.

 

Best,

Edin

 

image.png

 

image.png 

 

image.png

------------------------------------------------------------------------------------------------------------
How can I share my RapidMiner Process?
Where do I find the Logfile of RapidMiner Studio?
Where do I find the Logfile of RapidMiner Server?
Where do I find the Logfile of a RapidMiner Server Job Agent?
Where do I find the Logfile of a Process executed on a RapidMiner Server Job Agent?
------------------------------------------------------------------------------------------------------------
Learner I wdeng

Re: I cannot read the data correctly. Help.

csv.PNG

I expect the data to look like this. Thank you for replying.

Learner I wdeng

Re: I cannot read the data correctly. Help.

When I edit the CSV file, it gets corrupted.

Learner I wdeng

Re: I cannot read the data correctly. Help.

I did this, but RapidMiner only shows half of the header. It cuts off at current header.

 

csv.PNG

Unicorn

Re: I cannot read the data correctly. Help.

This particular CSV file is trickier than most because it is malformed when you load it into RM. There are errors relating to row 3 and its header. I think this might require writing a regex instead of just using the comma as the default delimiter.
Regards,
Thomas

Blog: Neural Market Trends

RapidMiner Tutorial Videos here!
RM Staff

Re: I cannot read the data correctly. Help.

Hi @wdeng,

 

Unfortunately, I don't think fancy regular expressions will help you in this case either.

You need to have all relevant attributes in one line. Your desired output shows that you want a combination of lines 4, 5 and 6 as attribute names, so you need to combine them manually in a text editor. While doing this, I recommend deleting lines 1-3.

Then you should be all set to read in your file with the separator set to "Comma" and "Use Quotes" checked.
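
If editing the file by hand is not practical, the same clean-up could be scripted outside RapidMiner. The following is only a sketch under stated assumptions: it assumes the header spans several physical lines because its cells contain quoted line breaks, and that the first three lines are a preamble that can be dropped. The function name `flatten_header`, the `skip_lines` parameter, and the sample text are hypothetical and not taken from the actual file.

```python
import csv
import io


def flatten_header(raw_text, skip_lines=3):
    """Drop leading lines and collapse a multi-line quoted header into one line.

    Hypothetical helper: assumes the preamble lines contain no quoted
    line breaks of their own.
    """
    # Discard the first `skip_lines` physical lines (title/preamble rows).
    remaining = raw_text.splitlines(keepends=True)[skip_lines:]
    # csv.reader honours quotes, so a quoted cell containing a line break
    # comes back as one cell instead of being split into two rows.
    rows = list(csv.reader(io.StringIO("".join(remaining), newline="")))
    if not rows:
        return ""
    # Collapse line breaks (and other whitespace runs) in header cells.
    header = [" ".join(cell.split()) for cell in rows[0]]
    out = io.StringIO(newline="")
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(header)
    writer.writerows(rows[1:])
    return out.getvalue()


# Made-up sample (an assumption, not the Ontario data): three preamble
# lines, then a header that occupies physical lines 4, 5 and 6.
sample = (
    "Report title\n"
    "Published by someone\n"
    "\n"
    '"ID","Name\nNom","Year\nBuilt"\n'
    "1,Bridge A,1965\n"
    "2,Bridge B,1972\n"
)

cleaned = flatten_header(sample)
print(cleaned)
```

The cleaned text has a single header line, so it should then be readable with a plain comma separator and quotes enabled. Work on a copy of the original file and check the result before relying on it.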

 

Best,

Edin

 
