Splitting output into multiple (many) CSV files

MichaelWall Member Posts: 9 Contributor II
edited November 2018 in Help

Hi

Question from a newbie. I have a process built in RapidMiner Studio that creates an output containing anywhere between 100 and 5000 rows (depending on the starting input). I want to write out the output as one CSV per row. At the moment I can get the full data set using the Write CSV operator, but that gives me a single file with everything, when I want one CSV per record. I've tried doing this in post-processing by adding a new section to the Python script that handles the data after it has been through the process, but the formatting of the CSV is causing problems. I really want the results to come out of RapidMiner as separate files to maintain their integrity.
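
For the post-processing route, here is a minimal Python sketch of splitting one combined CSV (such as the file Write CSV produces) into one file per row. It assumes the first line is a header and the file is UTF-8; the function name `split_csv_per_row` and the `<row number>.csv` naming are illustrative choices. The post does not say what the formatting problem was, but one common cause is splitting the text on raw newlines or commas. Reading and writing through Python's `csv` module keeps quoted fields that contain commas or line breaks intact. If Write CSV was configured with a different column separator (for example `;`), pass it as `delimiter`.

```python
import csv
from pathlib import Path


def split_csv_per_row(source_csv, out_dir, delimiter=","):
    """Write each data row of source_csv to its own file, repeating the header.

    Assumes the first line of source_csv is a header row. The csv module
    handles quoting, so fields containing the delimiter, quotes or line
    breaks are written back out unchanged.
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    # newline="" lets the csv module handle embedded line breaks itself.
    with open(source_csv, newline="", encoding="utf-8") as src:
        reader = csv.reader(src, delimiter=delimiter)
        header = next(reader)
        for row_number, row in enumerate(reader, start=1):
            target = out_dir / f"{row_number}.csv"
            with open(target, "w", newline="", encoding="utf-8") as dst:
                writer = csv.writer(dst, delimiter=delimiter)
                writer.writerow(header)
                writer.writerow(row)
            written.append(target)
    return written
```

For example, `split_csv_per_row("all_rows.csv", "per_row")` would write `per_row/1.csv`, `per_row/2.csv`, and so on, each with the header line followed by one record.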

Any thoughts would be appreciated.

Thanks

Best Answer

  • bhupendra_patil Administrator, Employee, Member Posts: 168 RM Data Scientist
    Solution Accepted

    Hi @MichaelWall

    Welcome to the RapidMiner community.

    See if the attached process helps. You can open it via File > Import Process.

    You may need to change the path of the CSV output location.

    Here is what it does:

    It loops over the examples (rows), one row at a time.

    Inside the loop, it filters the data down to the current row number and then writes that single row to its own CSV file.

    Each file is named after its row number, e.g. 1.csv, 2.csv, and so on.
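
The loop-filter-write logic described above can be approximated outside RapidMiner. The Python sketch below is not the attached process, only a rough analogue of its steps: a list of dictionaries stands in for the example set, each iteration keeps just the current row, and that row is written with a header to `<row number>.csv`. The function name `write_one_csv_per_example` and the 1-based numbering are assumptions made for illustration.

```python
import csv
from pathlib import Path


def write_one_csv_per_example(examples, out_dir):
    """Loop over row numbers, keep only the current row, write it to its own file.

    examples: list of dicts with identical keys (a stand-in for an example set).
    Files are named <row_number>.csv, counting from 1.
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    if not examples:
        return []
    fieldnames = list(examples[0].keys())
    paths = []
    for row_number in range(1, len(examples) + 1):
        # "Filter to the current row": a one-row subset of the data.
        subset = examples[row_number - 1 : row_number]
        path = out_dir / f"{row_number}.csv"
        with open(path, "w", newline="", encoding="utf-8") as f:
            writer = csv.DictWriter(f, fieldnames=fieldnames)
            writer.writeheader()
            writer.writerows(subset)
        paths.append(path)
    return paths
```

Slicing a one-row subset on each pass mirrors the filter step inside the loop; in plain Python you could just as well write each row directly while iterating.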

    If you need to name the files differently, that should be possible with an additional operator, but hopefully this will get you started.
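
As one possible naming scheme, sketched in Python rather than as a RapidMiner operator: build each file name from an identifying column instead of the row number. The column name `id` and the helper `filename_for` are hypothetical. Characters that are unsafe in file names are replaced with `_`, and the row number is used as a fallback when the value is missing or empty. Note that two records with the same value would map to the same file and overwrite each other.

```python
import re


def filename_for(example, row_number, key="id"):
    """Build a file name from a column value, falling back to the row number.

    'key' is a hypothetical column name; use whichever attribute identifies
    your records. Anything other than letters, digits, '.', '_' or '-' is
    replaced with '_' so the value is safe to use as a file name.
    """
    value = str(example.get(key, "")).strip()
    if not value:
        return f"{row_number}.csv"
    safe = re.sub(r"[^A-Za-z0-9._-]", "_", value)
    return f"{safe}.csv"
```

For instance, a record whose `id` is `ACME/42` would be written to `ACME_42.csv`.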

Answers

  • MichaelWall Member Posts: 9 Contributor II

    Thanks for this, it works really well and is much faster than the existing process I am replicating. The key thing was to set the iteration macro on the Loop Examples operator to row_number so that it indexed through each row.
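
Presumably the macro's value is then referenced as `%{row_number}` in the operators inside the loop (for example in the Write CSV file path), so each iteration writes to a different file. The simplified Python sketch below only mimics that placeholder substitution for illustration. `expand_macros` is a hypothetical helper, not part of RapidMiner, and the `output/...` path is an assumed example.

```python
import re


def expand_macros(template, macros):
    """Replace %{name} placeholders with values from the 'macros' dict.

    A simplified stand-in for how a parameter containing %{row_number}
    takes a new value on each loop iteration. Unknown macros are left as-is.
    """
    def lookup(match):
        name = match.group(1)
        return str(macros[name]) if name in macros else match.group(0)

    return re.sub(r"%\{([^}]+)\}", lookup, template)


# One file path per iteration, as the loop would produce them.
paths = [expand_macros("output/%{row_number}.csv", {"row_number": n}) for n in range(1, 4)]
# paths == ["output/1.csv", "output/2.csv", "output/3.csv"]
```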