Newbie Question - Repository Data Access

wirtcal · January 2011

Hey community...

I have one Excel file that I want to explore, but it has many fields that I don't want to use in some experiments, although I will use them in others.

How can I ignore fields X, Y and Z while accessing the same repository (or reading the same Excel file)? Is this possible?

Currently I save many custom Excel files...

My other question is about "Set Role"...
When I use "Read CSV"/"Read Excel", I have to specify what the fields are. I'm using "Set Role" for that. Is this the best way?

Thanks! And sorry for my English, hehe

steffen · January 2011

Hello wirtcal

Regarding your first question:
If you use different subsets of features for different tasks and do not want to filter them again and again, then storing the data in various forms in the repository is the way to go. The operators to use are "Store" and "Retrieve". The reason is that data in the repository is always kept in RapidMiner's internal data format, so it does not have to be converted again and again (speedup!).
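To make "convert once, reuse many times" concrete outside RapidMiner, here is a minimal Python sketch: the text file is parsed and type-converted a single time, the result is saved in a binary form, and every experiment loads that instead of re-parsing the text. `pickle` only stands in for the repository's internal format (an assumption for illustration; this is not how RapidMiner actually stores data), and the file name, columns and value types are hypothetical.

```python
import csv
import io
import os
import pickle
import tempfile

# Hypothetical stand-in for the exported CSV file; the values are invented.
RAW_CSV = "A,B,C\n1,2.5,yes\n2,3.0,no\n"

# Hypothetical value types, declared once when the data is imported.
VALUE_TYPES = {"A": int, "B": float, "C": str}


def import_csv(text):
    """Parse the text and convert every field to its declared type (the costly step)."""
    return [
        {name: VALUE_TYPES[name](value) for name, value in record.items()}
        for record in csv.DictReader(io.StringIO(text))
    ]


def store(rows, path):
    """Stand-in for storing in the repository: save the converted rows in binary form."""
    with open(path, "wb") as fh:
        pickle.dump(rows, fh)


def retrieve(path):
    """Stand-in for loading from the repository: types stay intact, no re-parsing."""
    with open(path, "rb") as fh:
        return pickle.load(fh)


if __name__ == "__main__":
    path = os.path.join(tempfile.mkdtemp(), "hello_miner.pkl")
    store(import_csv(RAW_CSV), path)  # convert once ...
    for experiment in ("X", "Y", "Z"):
        data = retrieve(path)  # ... and reuse the converted data in every experiment
        print(experiment, data[0])
```

The design point is that the expensive text-to-typed-values conversion happens once, at import time, rather than at the start of every experiment.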

Regarding the second question:
I assume you are referring to the Import Configuration Wizard. I generally recommend using it (it's great!), but frankly, I do not understand your question here.

greetings,

Steffen

btw: Have you read the manual? (Just to be sure, no offense.) In case not, here is the link *click*

wirtcal · January 2011

Hey steffen, thanks again for the support!

With my second question, I was trying to ask something like this:

In my repository called "HelloMiner", I imported one CSV file with maaaaany fields.
The wizard asked me what kind each field is (regular, id, label, cluster, etc.), and I set them all as regular.

[consider HelloMiner data with fields A, B, C, D, E, F and G]

In Experiment X, I want to use just fields A, B and C: A as id, B as regular and C as label.
In Experiment Y, I want to use fields B, C, E and F: C as integer, F as label and the others as regular.
In Experiment Z, I want to use all fields of the repository.

Currently I'm saving a lot of custom CSV files:
For experiment X, a CSV file with just fields A, B and C, and I'm using:
Read CSV -> Set Role -> (...)

For experiment Y, I saved another CSV file with just fields B, C, E and F.

My question is:

Can I use just one CSV file with all my fields and filter the columns I need for each experiment?

Like:
Read the full CSV file (or load the repository data) -> Filter the useful columns -> Set Role -> Modeling
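A rough plain-Python analogue of that pipeline, assuming the table is already loaded as a list of rows: one full table is kept, and each experiment only selects its columns and then assigns roles. The functions `select_attributes`, `set_role` and `features_and_label`, the `ExampleSet` class and the data are all hypothetical; they only mimic the idea behind the RapidMiner operators and are not RapidMiner's API.

```python
from dataclasses import dataclass

# Hypothetical HelloMiner table with fields A..G; all values are invented.
FULL_TABLE = [
    {"A": 1, "B": 0.5, "C": 3, "D": "x", "E": 7, "F": "yes", "G": "555-0100"},
    {"A": 2, "B": 1.5, "C": 4, "D": "y", "E": 8, "F": "no", "G": "555-0101"},
]


@dataclass
class ExampleSet:
    rows: list   # one dict per example, holding only the selected columns
    roles: dict  # column name -> role ("id", "label" or "regular")


def select_attributes(rows, keep):
    """Keep only the listed columns; the full table itself is not modified."""
    return [{name: row[name] for name in keep} for row in rows]


def set_role(rows, special_roles):
    """Every column is "regular" unless it is given a special role such as "id" or "label"."""
    names = list(rows[0]) if rows else []
    return ExampleSet(rows, {name: special_roles.get(name, "regular") for name in names})


def features_and_label(example_set):
    """What a learner would use: regular columns as features, the label column as target."""
    regular = [n for n, r in example_set.roles.items() if r == "regular"]
    label = next(n for n, r in example_set.roles.items() if r == "label")
    features = [[row[n] for n in regular] for row in example_set.rows]
    target = [row[label] for row in example_set.rows]
    return features, target


# Experiment X: A as id, B regular, C as label.
exp_x = set_role(select_attributes(FULL_TABLE, ["A", "B", "C"]), {"A": "id", "C": "label"})
# Experiment Y: B, C, E regular and F as label ("integer" is a value type, not a role).
exp_y = set_role(select_attributes(FULL_TABLE, ["B", "C", "E", "F"]), {"F": "label"})
# Experiment Z: every field, all regular.
exp_z = set_role(select_attributes(FULL_TABLE, list(FULL_TABLE[0])), {})

print(features_and_label(exp_x))
print(features_and_label(exp_y))
```

Field G (think "phone number") never reaches experiments X and Y, and no extra copy of the data has to be saved per experiment.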

I'm having data type problems in some experiments, so I want to ignore some fields.

I want to disregard fields like "phone number", because sometimes they confuse my models.

Thanks once more!

Best regards

PS: I have not read the manual [yet], but I've started!

[I hope you understand my English]

steffen · January 2011

Hello wirtcal

Ok, it seems you do want to filter them again and again.

As far as I understand you, you may find the following operators helpful:

Data Transformation -> Name and Role Modification -> Set Role
Data Transformation -> Attribute Set Reduction and Transformation -> Selection

"integer" etc reflects the type of data stored and is independent of the role. You may want to set this "value type" once for all experiments and the role to "attribute". Then you change the roles as you like.

There are also plenty of operators for converting value types; see Data Transformation -> Type Conversion.
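As a hypothetical illustration of what such conversions do (plain Python, not the RapidMiner operators themselves): a phone number read as a number can be turned into a nominal string so it is not treated as a quantity, and a text column can be parsed into numbers, with unparsable entries marked as missing. The function names and rows below are invented for the example.

```python
def to_nominal(rows, column):
    """Treat a numeric column as nominal by storing its values as strings."""
    return [{**row, column: str(row[column])} for row in rows]


def to_numeric(rows, column):
    """Parse a text column as numbers; values that do not parse become None (missing)."""
    def parse(value):
        try:
            return float(value)
        except (TypeError, ValueError):
            return None

    return [{**row, column: parse(row[column])} for row in rows]


# Hypothetical rows: a phone number read as a number, and an age read as text.
rows = [{"phone": 5550100, "age": "42"}, {"phone": 5550101, "age": "n/a"}]
rows = to_numeric(to_nominal(rows, "phone"), "age")
print(rows)
```

Each conversion returns new rows and leaves the input untouched, so the same stored data can be converted differently in different experiments.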

I strongly recommend reading the manual and (more importantly) checking the provided example processes. RapidMiner is like Lego (as Haddock once said): learn the single bricks and you can build (nearly) anything.

good luck,

steffen


Altair RapidMiner Community
