Grouping lines in input file

Joeyd Member Posts: 2 Contributor I
edited November 2018 in Help
All,

I'm new to RapidMiner and to the forum, so apologies if I'm asking the obvious in the wrong place. I've looked around on the forum and on the net and could not find what I'm looking for.

The issue I have is this.
One of our suppliers sends us reporting on processed transactions where the data is spread over several lines in the file. It looks a bit like this:
HDR;somedata
DAT1;somedata
DAT2;somedata
DAT3;somedata
DAT1;somedata
DAT3;somedata
DAT1 - etc.
The information I need is in the DAT1 and DAT3 records; DAT2 is optional. There is no identifier in the data that binds the three records together: the fact that DAT1, DAT2 and DAT3 appear sequentially in the file defines them as a group, and DAT1 starts a group.
What I do now is preprocess the file with a bash script that concatenates the DAT lines of each group. This is an annoying extra step on a remote server, which complicates and slows down reading in the report file.
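
To make the grouping rule concrete, here is a rough Python sketch of the logic (an illustration only, not the actual bash script). The function name `group_records`, the `;` used to join the lines, and the choice to ignore HDR and any DAT2/DAT3 line that appears before the first DAT1 are all assumptions made for the example.

```python
from typing import Iterable, List


def group_records(lines: Iterable[str]) -> List[str]:
    """Join each DAT1 line with the DAT2/DAT3 lines that follow it.

    Assumption: a DAT1 line always starts a new group, and HDR (or any
    other record type) is skipped.
    """
    groups: List[List[str]] = []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        record_type = line.split(";", 1)[0]
        if record_type == "DAT1":
            groups.append([line])  # DAT1 opens a new group
        elif record_type in ("DAT2", "DAT3") and groups:
            groups[-1].append(line)  # attach to the most recent DAT1
        # anything else (e.g. HDR) is ignored
    # Concatenate the lines of each group into a single line
    return [";".join(group) for group in groups]


# Hypothetical sample data in the shape described above
sample = [
    "HDR;somedata",
    "DAT1;a",
    "DAT2;b",
    "DAT3;c",
    "DAT1;d",
    "DAT3;e",
]

for row in group_records(sample):
    print(row)
# DAT1;a;DAT2;b;DAT3;c
# DAT1;d;DAT3;e
```

Each output line then carries one complete group, so the optional DAT2 record simply shows up as extra fields when it is present.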

Is there a process I can set up in RapidMiner that can group data in the input file by record type so I can process each group as a single line? Or am I asking for something the tool isn't meant for? What I do with the data is match it against transactions we expect to receive and create some overviews.

Regards,

Joe