How to import multiple files
Hi,
Is there any way to import multiple files at once? I've got about 70 .csv files with the same schema that I want to import into a RapidMiner repository. So far I can't figure out how to do this without user interaction :-/
My quick and dirty workaround is a little Ruby script that
first) reads all filenames in a given directory and
second) creates a RapidMiner project file containing lots of Read CSV and Store operators.
I guess that's not the way you're meant to import multiple files.
Answers
You can use the "Loop Files" operator. Then you will probably need the "Append" operator to merge all example sets from the collection into one.
Best,
Vaclav
Create the following process:
Loop Files -> Append -> Write CSV
Click on Loop Files and set the directory parameter to point to the directory that contains your CSV files (or other files readable by RapidMiner).
Double click on Loop Files to go into this sub-process.
Create the following sub-process:
fil -> Read CSV -> Select Attributes -> out
"fil" and "out" are not Operator objects, they are the connectors on the left and right border of the window that look like knobs.
Click on Select Attributes and set the parameter attribute filter type to either subset or regular_expression.
For subset, click on the Select Attributes... button and add the attributes (columns) of your CSVs that you want to have in your merged output. Add them to the right-hand list of the window by typing the name and clicking the plus icon.
For regular_expression you can define a list of attributes (columns) like this: .*attribute1.*|.*attribute2.*|.*attribute3.*
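To get a feel for what such an expression selects, here is a small Python sketch outside of RapidMiner. It assumes the pattern has to match the whole attribute name, which is what the surrounding .* wrappers suggest. The column names and the select_attributes helper are made up for illustration.

```python
import re

def select_attributes(names, pattern):
    """Return the names that the pattern matches in full."""
    compiled = re.compile(pattern)
    return [name for name in names if compiled.fullmatch(name)]

# Hypothetical column names, for illustration only.
columns = ["id", "attribute1", "my_attribute2_raw", "attribute3", "comment"]
pattern = ".*attribute1.*|.*attribute2.*|.*attribute3.*"

selected = select_attributes(columns, pattern)
print(selected)  # ['attribute1', 'my_attribute2_raw', 'attribute3']
```

Because each alternative is wrapped in .*, any column whose name merely contains attribute1, attribute2 or attribute3 is kept, so check that no unwanted column slips in.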
Then you are done.
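If it helps to see the same flow as plain code, this is a rough Python equivalent of Loop Files -> Read CSV -> Select Attributes -> Append -> Write CSV. It is only a sketch, not what RapidMiner does internally. It assumes comma-separated UTF-8 files with a header row, and merge_csv_files and its parameters are names I made up.

```python
import csv
from pathlib import Path

def merge_csv_files(directory, wanted_columns, output_path):
    """Read every .csv in `directory`, keep `wanted_columns`, write one merged file."""
    merged_rows = []
    # "Loop Files": visit each CSV in the directory, in a stable order.
    for csv_path in sorted(Path(directory).glob("*.csv")):
        with csv_path.open(newline="", encoding="utf-8") as handle:
            # "Read CSV" followed by "Select Attributes" in subset mode;
            # a column missing from a file is filled with an empty string.
            for row in csv.DictReader(handle):
                merged_rows.append({col: row.get(col, "") for col in wanted_columns})
    # "Append" and "Write CSV": all rows end up in a single output file.
    with open(output_path, "w", newline="", encoding="utf-8") as handle:
        writer = csv.DictWriter(handle, fieldnames=wanted_columns)
        writer.writeheader()
        writer.writerows(merged_rows)
    return len(merged_rows)
```

Write the merged file to a different directory than the input files, otherwise a second run would pick up the previous output as one more input.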
Merge columns with different names into the same column (attribute):
If your CSVs name the same column differently, like e-mail, eMail, e_mail, you can do the following:
In your existing Select Attributes operator choose regular_expression. Define a regular expression that contains all columns you want, including the variants. If I have the following columns:
I would create the following regular_expression:
This will still create an output with different columns (attributes). To merge the 3 email columns into one, you have to rename them to be identical. Add a Rename by Replacing operator after the already existing Select Attributes.
On the Rename by Replacing operator select regular_expression as the attribute filter type. Then fill out the fields below like this:
regular expression: .*mail.*
replace what: .*
replace by: mail
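The idea behind these three fields can be tried out in Python. This is a sketch of the intent only, not of RapidMiner's exact replace semantics, and rename_by_replacing is a hypothetical helper. One assumption to flag: Python patterns are case-sensitive, so I added (?i) to the filter here, otherwise eMail would not match. It may be worth checking whether you need the same in your RapidMiner version.

```python
import re

def rename_by_replacing(names, attribute_filter, replace_what, replace_by):
    """Rename every name that `attribute_filter` matches in full."""
    renamed = []
    for name in names:
        if re.fullmatch(attribute_filter, name):
            # count=1 replaces only the first match; without it, Python would
            # also replace the empty match that `.*` finds at the end of the string.
            name = re.sub(replace_what, replace_by, name, count=1)
        renamed.append(name)
    return renamed

# Hypothetical headers of three files that spell the email column differently.
headers_per_file = [["id", "e-mail"], ["id", "eMail"], ["id", "e_mail"]]
unified = [
    rename_by_replacing(headers, "(?i).*mail.*", ".*", "mail")
    for headers in headers_per_file
]
print(unified)  # [['id', 'mail'], ['id', 'mail'], ['id', 'mail']]
```

Once all files use the same name, appending them puts the values into one mail column instead of three separate ones.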
I attached a gif of my process, to clarify.
very nice explanation, @underlines !
Scott