I have a task that requires reading data from a government publication. When I read the file with RapidMiner, it shows the wrong labels and the wrong cell contents. The file is in CSV format. For example, when I open it in Excel, a cell contains two lines, but RapidMiner shows two cells instead.
It's a bit tricky to understand the issue without knowing exactly how you expect the data to look.
I tried reading this file with RM, and the structure looks logically OK, except that the heading row seems to consist of multiple (bilingual) entries (?). Maybe this is where you should rename the columns manually. Otherwise, the comma-delimited values in the columns seem valid to me.
UPD: opening the file in Excel (Mac version) gives me this:
Indeed, it would be better to have a file with better (i.e. consistent) formatting.
In this case you could use the Read CSV operator with the wizard and play around with the settings. Screenshots of my settings are attached. They worked for me to read at least a portion of your data.
Unfortunately, I don't think fancy regular expressions will help you in this case either.
You need to have all relevant attribute names in one line. Your desired output shows that you want a combination of lines 4, 5 and 6 as the attribute names, so you need to combine them manually in a text editor. While doing this, I recommend deleting lines 1-3.
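If editing by hand gets tedious, the same cleanup could be scripted outside RapidMiner. Below is a minimal Python sketch, not something taken from your file: it assumes the three header lines are themselves valid comma-separated rows with aligned columns, and the function name `merge_header_rows`, its parameters and the sample text are all made up for illustration.

```python
import csv
import io


def merge_header_rows(text, skip=3, header_rows=3):
    """Drop the first `skip` lines and merge the next `header_rows` lines
    into a single header row, column by column.

    Assumption: every header line is a valid CSV row whose columns line up
    with the data columns. `skip` and `header_rows` are hypothetical
    parameters matching "delete lines 1-3, combine lines 4-6".
    """
    rows = list(csv.reader(io.StringIO(text, newline="")))
    header_block = rows[skip:skip + header_rows]
    data = rows[skip + header_rows:]

    # Build one name per column by joining the non-empty header pieces.
    width = max((len(r) for r in header_block), default=0)
    names = []
    for col in range(width):
        parts = [r[col].strip() for r in header_block
                 if col < len(r) and r[col].strip()]
        names.append(" ".join(parts))

    out = io.StringIO(newline="")
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(names)
    writer.writerows(data)
    return out.getvalue()


# Made-up sample: 3 junk lines, 3 header lines, then data.
sample = (
    "Title line\n"
    "Subtitle line\n"
    "\n"
    "Year,Count\n"
    "Annee,Nombre\n"
    "(yyyy),(units)\n"
    "2001,10\n"
    "2002,12\n"
)

cleaned = merge_header_rows(sample)
```

The resulting text has a single header row, so it can be saved and loaded with the first row used as attribute names. If the real header lines are not aligned column by column, the manual edit is still the safer route.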
Then you should be all set to read in your file with Separator "Comma" and "Use Quotes" checked.
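To see why the quotes setting matters for cells that contain a line break, here is a small Python sketch using the standard `csv` module. It is only an illustration of the general idea, not RapidMiner's own parser, and the sample text and the helper name `parse_quoted_csv` are made up.

```python
import csv
import io


def parse_quoted_csv(text):
    """Parse comma-separated text, treating double quotes as cell delimiters
    so that a line break inside quotes stays within one cell."""
    return list(csv.reader(io.StringIO(text, newline=""),
                           delimiter=",", quotechar='"'))


# Made-up sample: the first data cell spans two physical lines.
sample = 'name,note\n"first line\nsecond line",42\n'

rows = parse_quoted_csv(sample)

# Splitting on line breaks alone ignores the quotes and breaks the cell apart.
naive_lines = sample.splitlines()
```

With quote handling, `rows` holds the header plus one data row whose first cell keeps both lines. The naive split yields three physical lines, which matches the "2 cells instead of 1" symptom from the original question.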