RapidMiner

Read TSV & Read Multiple files from S3

Contributor I MichaelWall

Hi All,

I'm trying to read files from Amazon S3. The 'Read Amazon S3' operator works fine, but I have three related questions:

1) I'm trying to read TSV files, so I've connected 'Read Amazon S3' to 'Read CSV', but the result contains no records. I also tried 'Read Excel', but that just throws errors. Is there an operator that can handle TSV files?

2) Eventually I want to read all files in an S3 bucket rather than selecting just one. Is there a way to loop through all the files?

3) Are there operators I can use to filter multiple file types in the workflow, so that each file is routed to the appropriate 'Read' operator? A bucket may contain files in different formats, and the process throws errors if it tries to read a file with the wrong extension.

Thanks

Mike

RM Staff

Re: Read TSV & Read Multiple files from S3

Dear Mike,

1) Have you tried 'Read CSV' with tab as the delimiter? By default it uses ; as the delimiter.

2) Have you looked at the 'Loop Amazon S3' operator?

3) 'Loop Amazon S3' has a filter option where you can use the regular expression .+tsv to include only TSV files in the loop.
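For anyone doing the same thing outside RapidMiner, the three steps above can be sketched in Python: parse with tab as the delimiter, list every object in a bucket, and filter or route keys by extension. This is a minimal sketch, not how the RapidMiner operators are implemented. It assumes boto3 and configured AWS credentials for the listing step only, and the names parse_tsv, parse_csv, list_keys, filter_keys, route and PARSERS are hypothetical. The pattern .+\.tsv escapes the dot, so a name like "footsv" is not matched.

```python
import csv
import io
import re


def parse_tsv(text):
    """Parse tab-separated text into a list of dicts (first row is the header)."""
    return list(csv.DictReader(io.StringIO(text), delimiter="\t"))


def parse_csv(text):
    """Parse comma-separated text into a list of dicts (first row is the header)."""
    return list(csv.DictReader(io.StringIO(text), delimiter=","))


# Hypothetical extension-to-parser table, standing in for routing each
# file type to the matching 'Read' operator.
PARSERS = {".tsv": parse_tsv, ".csv": parse_csv}


def list_keys(bucket, prefix=""):
    """Yield every object key in an S3 bucket, following pagination.

    Assumes boto3 is installed and AWS credentials are configured; it is
    imported here so the other functions work without it.
    """
    import boto3

    s3 = boto3.client("s3")
    paginator = s3.get_paginator("list_objects_v2")
    for page in paginator.paginate(Bucket=bucket, Prefix=prefix):
        for obj in page.get("Contents", []):
            yield obj["Key"]


def filter_keys(keys, pattern=r".+\.tsv"):
    """Keep only keys whose whole name matches the regular expression."""
    regex = re.compile(pattern)
    return [key for key in keys if regex.fullmatch(key)]


def route(key):
    """Return the parser for a key's extension, or None to skip the file."""
    lowered = key.lower()
    for ext, parser in PARSERS.items():
        if lowered.endswith(ext):
            return parser
    return None
```

In use, something like `for key in filter_keys(list_keys("my-bucket")):` (with a hypothetical bucket name) would visit only the TSV files. Calling `route(key)` on each key from `list_keys` instead dispatches each file to a matching parser and skips unknown extensions.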

Best,

Martin

--------------------------------------------------------------------------
Head of Data Science Services at RapidMiner