Reading non-standard data file structures - pls help

roger_iles Member Posts: 1 Contributor I
edited November 2018 in Help

I am evaluating RapidMiner as a solution for research and application prototyping. It's important to have an easy way to import data and manipulate it into the structure I need before storing the result to a DB, and I need this to work repeatedly across many files.

However, I have hit an early block: although I can read in data from a file containing a standard table, I hit issues if the file has a slightly different structure. Is there a straightforward way to read in CSV and Excel data when the header structure is either non-standard or even repeats (e.g. multiple data sets in one file, appended one after another)?

 

I have provided an example of one of the data files below. One of the columns is time, but there is no date column; the date is instead stored as metadata at the top of the file. I need to add the date to the time to create a date-time column, but I can't find a straightforward way to read the different parts of the data file (metadata and column data) separately and then perform the transformation to create a new table to store to the DB.

 

Any advice would be welcome.

Thanks 

Roger

 

ABC Aircraft Registration      
XYZ Nose Number      
123 Flight Number      
CDE Departure Station      
FGH Destination Station      
31.10.2014 Date        
           
           
Offset AIR/GROUND  GMT (HH:MM:SS) PRESENT POSN LATITUDE (DEG) PRESENT POSN LONGITUDE (DEG) ALTITUDE (FEET)
785 GROUND 18:02:44 11.97018 -8.92304 874
845 GROUND 18:12:44 21.9698 -7.9315 881
905 GROUND 18:22:43 31.96881 -6.93081 892

Answers

  • bhupendra_patil Administrator, Employee, Member Posts: 168 RM Data Scientist

    See if the attached example and accompanying videos give you some ideas.

  • Thomas_Ott RapidMiner Certified Analyst, RapidMiner Certified Expert, Member Posts: 1,761 Unicorn

    Yes, you can comment out the repeating header lines in the Read CSV or XLS wizard. I do this all the time with NOAA weather data. The Read CSV operator is like the Swiss Army knife of data loaders in RapidMiner: it can handle many other file formats and encodings too. I used to read in txt files too.
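
For anyone who wants to prototype the same idea outside RapidMiner first, here is a minimal Python sketch of the two steps discussed in this thread: skipping repeated header lines, and combining the date from the metadata block with the time column. It is not a RapidMiner process. It assumes the file is comma-separated, that metadata lines hold the value first and the label second (as in the example file above), and that the column header line starts with "Offset". The function name `parse_flight_file` and the added `DATETIME` column are made up for illustration.

```python
import csv
import io
from datetime import datetime


def parse_flight_file(text, header_marker="Offset"):
    """Split one file into a metadata dict and a list of row dicts.

    Assumptions (not confirmed by the thread): comma-separated input,
    metadata lines laid out as "value,label", and a column header line
    whose first cell equals header_marker.
    """
    metadata = {}
    header = None
    rows = []
    for record in csv.reader(io.StringIO(text)):
        cells = [c.strip() for c in record]
        if not any(cells):
            continue  # skip blank or empty lines
        if cells[0] == header_marker:
            # The first header line defines the columns; repeats are skipped.
            if header is None:
                header = cells
            continue
        if header is None:
            # Still in the metadata block: value first, then label.
            if len(cells) >= 2:
                metadata[cells[1]] = cells[0]
            continue
        row = dict(zip(header, cells))
        # Combine the metadata date with the per-row time.
        flight_date = datetime.strptime(metadata["Date"], "%d.%m.%Y").date()
        clock = datetime.strptime(row["GMT (HH:MM:SS)"], "%H:%M:%S").time()
        row["DATETIME"] = datetime.combine(flight_date, clock)
        rows.append(row)
    return metadata, rows


# Shortened, comma-separated version of the example file, with one
# repeated header line added to show that it is skipped.
sample = """ABC,Aircraft Registration
XYZ,Nose Number
31.10.2014,Date

Offset,AIR/GROUND,GMT (HH:MM:SS),ALTITUDE (FEET)
785,GROUND,18:02:44,874
Offset,AIR/GROUND,GMT (HH:MM:SS),ALTITUDE (FEET)
845,GROUND,18:12:44,881
"""

metadata, rows = parse_flight_file(sample)
for row in rows:
    print(row["DATETIME"].isoformat(), row["ALTITUDE (FEET)"])
```

The sketch does not handle a flight that runs past midnight, or files where a new metadata block appears partway through; both would need extra logic.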
