Newbie Question - Repository Data Access

wirtcal Member Posts: 16 Maven
edited November 2018 in Help
Hey community....

I have one Excel file that I want to explore, but it has many fields that I won't use in some experiments and will use in others.

How can I ignore fields X, Y and Z while still accessing the same repository (or reading the same Excel file)? Is this possible?

Currently I save many custom Excel files...

My other question is about "Set Role":
When I use "Read CSV" or "Read Excel", I have to specify what the fields are. I'm using "Set Role" for that. Is this the best way?

Thanks! And sorry for my English, hehe.

Answers

  • steffen Member Posts: 347 Maven
    Hello wirtcal

    Regarding your first question:
    If you use different subsets of features for different tasks and do not want to filter them again and again, then storing the data in various forms in the repository is the way to go. The operators to use are named "Store" and "Read". The reason is that data in the repository is always kept in the internal RapidMiner data format, so it does not have to be converted again and again (speedup!).

    Regarding the second question:
    I assume you are referring to the Import Configuration Wizard. I generally recommend using it (it's great!), but frankly, I do not understand your question here.

    greetings,

    Steffen

    btw: Have you read the manual? (Just to be sure, no offense.) In case not, here is the link *click*


  • wirtcal Member Posts: 16 Maven
    Hey, steffen... thanks again for the support.

    What I tried to ask in my second question was something like this:

    In my repository called "HelloMiner", I imported one CSV file with maaaaaaany fields....
    RapidMiner asked me what kind each field is (regular, id, label, cluster, etc.), and I set them all as regular.

    [consider HelloMiner data with A,B,C,D,E,F and G fields]

    In my Experiment X, I want to use just the fields A, B, C: A as id, B as regular and C as label.
    In my Experiment Y, I want to use the fields B, C, E, F: C as integer, F as label and the others as regular.
    In my Experiment Z, I want to use all fields of the repository.

    Currently I'm saving a lot of custom CSV files:
    For experiment X, a CSV file with just the A, B, C fields, and I'm using:
    Read CSV -> Set Role -> (...)

    For experiment Y, I saved another CSV file with just the B, C, E, F fields....

    ... my question is...

    Can I use just one CSV file with all my fields and filter the columns that I need for each experiment?

    Like:
    Read the full CSV file or load the repository data -> Filter useful columns -> Set Role -> Modeling

    I'm having data type problems in some experiments, so I want to ignore some fields.

    I want to disregard fields like "phone number" because sometimes they confuse my models.

    Once more, thanks!

    Best Regards...

    PS: I have not read the manual [yet]... but I've started!

    [I hope you understand my English]
  • steffen Member Posts: 347 Maven
    Hello wirtcal

    Ok, it seems you want to filter them again and again ;).

    As far as I understand you, you may find the following operators helpful:
    • Data Transformation -> Name and Role Modification -> Set Role
    • Data Transformation -> Attribute Set Reduction and Transformation -> Selection
    "integer" etc. reflects the type of data stored and is independent of the role. You may want to set this "value type" once for all experiments and the role to "attribute". Then you change the roles as you like.

    For converting value types there are also a ton of operators available. See Data Transformation -> Type Conversion.
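
As a rough illustration outside RapidMiner, the idea discussed here (read the full table once, then pick columns, value types and roles per experiment) can be sketched in plain Python. The sample data, the `load_rows` and `prepare` helpers and the way roles are stored in a dictionary are all assumptions made up for this sketch; in RapidMiner itself the same steps are done with the selection, Set Role and type conversion operators mentioned above.

```python
import csv
import io

# Hypothetical stand-in for the full file with fields A..G (made-up values).
FULL_CSV = """A,B,C,D,E,F,G
1,0.5,1,x,10,red,555-0101
2,0.7,0,y,20,blue,555-0102
"""


def load_rows(text):
    """Read the full table once; every value starts out as a string."""
    return list(csv.DictReader(io.StringIO(text)))


def prepare(rows, roles, types=None):
    """Keep only the columns named in `roles` and convert value types.

    roles: column name -> "id" | "regular" | "label"
    types: column name -> conversion function; the value type is
           independent of the role, so it is passed separately.
    """
    types = types or {}
    selected = [
        {col: types.get(col, str)(row[col]) for col in roles}
        for row in rows
    ]
    return selected, dict(roles)


rows = load_rows(FULL_CSV)

# Experiment X: A as id, B as regular, C as label.
x_rows, x_roles = prepare(
    rows,
    {"A": "id", "B": "regular", "C": "label"},
    types={"A": int, "B": float},
)

# Experiment Y: B, C, E, F with C as integer and F as label.
y_rows, y_roles = prepare(
    rows,
    {"B": "regular", "C": "regular", "E": "regular", "F": "label"},
    types={"C": int},
)

print(x_rows[0])
print(y_roles)
```

Columns that are not listed in `roles` (for example the phone-number-like field G) never reach the model, and the same source table serves every experiment.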

    I strongly recommend reading the manual and (more importantly) checking the provided example processes. RapidMiner is like Lego (as Haddock once said): get to know the single units and you can build (nearly) anything.

    good luck,

    steffen