read csv file skip first n lines

Telcontar120 Moderator, RapidMiner Certified Analyst, RapidMiner Certified Expert, Member Posts: 1,635 Unicorn
edited December 2018 in Product Feedback - Resolved
The Read CSV operator should be given a parameter to skip the first n lines (often header lines).
While there is already an option for skipping comments, it only helps when the lines carry a comment indicator. If they do not, users have to edit the file by hand, which is not efficient for automated processing of large numbers of files.
Instead, if the operator could automatically skip the first n lines, take the header from line n+1, and read all data normally thereafter, it would drastically improve the efficiency of working with CSV files.
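For comparison, here is a minimal sketch of the requested behavior outside RapidMiner, using only Python's standard `csv` module. The function name `read_csv_skipping` and the sample data are hypothetical and only illustrate the idea of dropping n leading lines and treating line n+1 as the header; this is not how the Read CSV operator is implemented.

```python
import csv
import io
from itertools import islice


def read_csv_skipping(stream, skip_lines, delimiter=","):
    """Skip the first `skip_lines` lines, then parse the rest as CSV.

    The first remaining line is used as the header row.
    (Hypothetical helper, for illustration only.)
    """
    # Consume the leading lines without parsing them as CSV.
    for _ in islice(stream, skip_lines):
        pass
    # DictReader takes its field names from the next line it reads.
    reader = csv.DictReader(stream, delimiter=delimiter)
    return list(reader)


# Made-up sample: two free-text lines, then the real header and data.
sample = io.StringIO(
    "Exported by some tool\n"
    "Generated on some date\n"
    "id,name\n"
    "1,alpha\n"
    "2,beta\n"
)
rows = read_csv_skipping(sample, skip_lines=2)
print(rows)
```

Because the leading lines are discarded before any CSV parsing happens, they do not need a comment character and can contain any number of delimiters.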
Brian T.
Lindon Ventures 
Data Science Consulting from Certified RapidMiner Experts

Fixed and Released

Comments

  • sgenzer Administrator, Moderator, Employee, RapidMiner Certified Analyst, Community Manager, Member, University Professor, PM Moderator Posts: 2,959 Community Manager
  • MartinLiebig Administrator, Moderator, Employee, RapidMiner Certified Analyst, RapidMiner Certified Expert, University Professor Posts: 3,503 RM Data Scientist

    Brian,

    you can set the first n lines to "Comment":

    [image: image.png]

    That should do the trick.

    - Sr. Director Data Solutions, Altair RapidMiner -
    Dortmund, Germany
  • Telcontar120 Moderator, RapidMiner Certified Analyst, RapidMiner Certified Expert, Member Posts: 1,635 Unicorn

    @mschmitz  Thanks, this actually pointed me to the answer.  If you don't want to run the wizard (which I wanted to avoid since it was going to be in a loop using the "file" input rather than pointing to a specific file), I think you can still accomplish the same thing by using the "Annotations" parameter and setting the first lines to comment, like so:

    [image: annotations.PNG]

    I was getting hung up before because there is a separate parameter for a comment character, which I didn't want to have to add manually, but I tested using this method and it appears to work, starting the import on the specified line and taking the correct number of columns from that.  So thanks for the pointer!

     

    Brian T.
    Lindon Ventures 
    Data Science Consulting from Certified RapidMiner Experts
  • sgenzer Administrator, Moderator, Employee, RapidMiner Certified Analyst, Community Manager, Member, University Professor, PM Moderator Posts: 2,959 Community Manager

    workaround available
