Read TSV & Read Multiple files from S3

Contributor I MichaelWall

Hi All,


I'm trying to read in files from Amazon S3 and have three related questions. The 'Read Amazon S3' operator itself works fine, but I have these problems:

1) I'm trying to read in tsv files, so I've connected 'Read Amazon S3' to 'Read CSV' but it results in no records. I also tried with 'Read Excel' but that just throws errors. Is there an operator that can handle tsv files?

2) Eventually I want to be able to read all files in an S3 bucket, rather than just selecting one. So is there a way of looping through all the files?

3) Are there any operators I can apply to filter multiple file types through the workflow, so they get routed to the appropriate 'Read' operator? A bucket may have different file formats in it, and the process will throw errors if it tries to read a file with the wrong extension.





RM Staff

Re: Read TSV & Read Multiple files from S3

Dear Mike,


1) Have you tried Read CSV with tab as the delimiter? By default it uses ; as the delimiter.

2) Have you had a look at Loop Amazon S3?

3) Loop Amazon S3 has a filter option where you can use .+tsv as a regex to include only TSV files in the loop.
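
To illustrate why the delimiter matters and what the .+tsv filter selects, here is a minimal sketch in plain Python rather than in RapidMiner itself. The file names and sample data are made up for illustration, and it assumes the filter has to match the whole file name, which may not be exactly how the operator applies it.

```python
import csv
import io
import re

# Hypothetical object keys, standing in for the contents of an S3 bucket.
keys = ["data/jan.tsv", "data/feb.tsv", "data/notes.xlsx", "data/readme.txt"]

# Same pattern as suggested for the loop filter.
# Assumption: the whole key must match, hence fullmatch.
tsv_pattern = re.compile(r".+tsv")
tsv_keys = [k for k in keys if tsv_pattern.fullmatch(k)]


def read_tsv(text):
    """Parse tab-separated text into a list of dicts keyed by the header row."""
    return list(csv.DictReader(io.StringIO(text), delimiter="\t"))


# Made-up sample content of one TSV file.
sample = "id\tname\n1\talice\n2\tbob\n"

# With tab as the delimiter, each line splits into two columns.
rows = read_tsv(sample)

# With ; as the delimiter, nothing is split: every line stays one single field.
unsplit = list(csv.reader(io.StringIO(sample), delimiter=";"))

print(tsv_keys)
print(rows)
print(unsplit)
```

Parsing with the wrong delimiter does not raise an error here; each line simply ends up as one unsplit field, which is worth checking for when a read step returns unexpected results.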




Head of Data Science Services at RapidMiner