I have a file which is basically a .CSV file, but it is malformed and needs to be corrected before it can be used. (Just to clarify: there is no option to change how the file is produced.)
I have tried to clean up this file in RM before any further processing, but have failed.
It can easily be cleaned by applying a sequence of regexes.
Just to give an impression of what I would need to do:
Yes, I could do this manually up front, but that is not the intention, since this needs to be a repeatable process.
An option would be to write a Python procedure.
But maybe there is already something out there.
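For reference, the Python option could look roughly like the sketch below: read the raw file as text, apply an ordered list of regex substitutions, and write the result to a new file that Read CSV can then pick up. The two rules shown (strip trailing whitespace, drop a stray trailing semicolon) are hypothetical placeholders, since the actual patterns are not listed in this thread, and the names `RULES`, `clean_text` and `clean_file` are made up for illustration.

```python
import re

# Hypothetical cleanup rules, applied in order. The real patterns are not
# given in the thread; replace these with the ones the file actually needs.
RULES = [
    (re.compile(r"[ \t]+$", re.MULTILINE), ""),  # strip trailing whitespace
    (re.compile(r";$", re.MULTILINE), ""),       # drop one trailing semicolon
]


def clean_text(text, rules=RULES):
    """Apply each (pattern, replacement) pair to the whole text, in order."""
    for pattern, replacement in rules:
        text = pattern.sub(replacement, text)
    return text


def clean_file(src_path, dst_path, rules=RULES, encoding="utf-8"):
    """Read src_path as plain text, clean it, and write it to dst_path."""
    with open(src_path, encoding=encoding) as src:
        text = src.read()
    with open(dst_path, "w", encoding=encoding) as dst:
        dst.write(clean_text(text, rules))
```

Keeping the rules in a list makes the order explicit, which matters when one substitution depends on the result of an earlier one.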
hi @kludikovsky - so this may be too simple a solution but have you tried just using the regex built into the Select Attributes / Filter Examples operators for getting rid of rows/columns via regex? And if the rows to be deleted are always in the same place, I would use Filter Example Range.
I might be on the wrong path, but all those functions require an example set as input.
I am at the stage before having an example set.
I attach the file.
ok I understand. Yes I would use the Read CSV operator first to "convert" to an example set, use those operators, and then go back to Write CSV if you want. If you really want to make changes on the actual CSV file without converting to example set, I would treat the CSV as a document and then use the text operators. But that sounds pretty icky to me!
Just an idea,
Operator toolbox has an operator called Read Lines (or so?) which gives you a collection of documents with one line of the document each. Afterwards, it's possible to use Loop Collection and Extract information to do line wise parsing.
I understand it's "Split Document into Collection".
But any idea how I could ensure the lines stay in the correct sequence afterwards and, more importantly, how to combine two consecutive lines into one line?
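Outside RapidMiner, one way to picture the line-joining step is the Python sketch below. It assumes (this is not stated in the thread) that a record counts as broken when it has fewer fields than expected, and that the separator is a semicolon; `merge_broken_lines` and `expected_fields` are hypothetical names. Because the lines are processed as an ordered list, the original sequence is preserved.

```python
def merge_broken_lines(lines, expected_fields, sep=";"):
    """Join consecutive lines until each record has enough fields.

    Assumption: a line with fewer than expected_fields fields is the first
    part of a record that continues on the following line(s). Separators
    inside quoted values are not handled in this sketch.
    """
    merged = []
    buffer = ""
    for line in lines:
        line = line.rstrip("\r\n")
        # Append the current line to any pending partial record.
        buffer = buffer + line if buffer else line
        if buffer.count(sep) + 1 >= expected_fields:
            merged.append(buffer)
            buffer = ""
    if buffer:
        # Keep a trailing incomplete record rather than silently dropping it.
        merged.append(buffer)
    return merged
```

For example, with `expected_fields=3` the lines `d;e` and `;f` would be joined into the single record `d;e;f`, while complete lines pass through unchanged.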