Combine two files

mbsmbs Member, KB Contributor Posts: 211  Guru
hi 

Is it possible to combine two datasets (train + test) into a new dataset in which the training part has labels and the test part has none?

thank you

Best Answer

  • varunm1varunm1 Posts: 840   Unicorn
    Solution Accepted
    You told me that you already created a single file (manually) containing both unlabeled and labeled samples, am I right? If so, just import that new file and use Filter Examples as I said earlier. If you still have a problem with attribute names, delete that row and try creating new attribute names.
    Regards,
    Varun
    Rapidminer Wisdom 2020 (User Track): Call for proposals 

    https://www.varunmandalapu.com/

Answers

  • varunm1varunm1 Moderator, Member Posts: 840   Unicorn
    Hello @mbs

    If the two datasets have the same attribute names, you can use the Append operator to combine them. If the test or train dataset has extra columns compared to the other, you can use Append (Superset) instead. Regarding the testing data, do you have labels for it? If you do not, you can use Filter Examples to separate the testing data and use it for prediction.

    If you do have labels for the testing data, my suggestion is to use Generate Attributes to add a new attribute (with the same name) to both the training and testing sets. This new attribute can have the value "Tra" for training and "Tes" for testing. That way, we can filter them after appending using this new column.
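As a rough illustration outside RapidMiner, the append-then-filter idea can be sketched in plain Python. This is only a sketch under stated assumptions: the toy rows, the helper `append_superset`, and the marker column name `Data_type` are hypothetical stand-ins, not RapidMiner operators or the actual data from this thread.

```python
def append_superset(*tables):
    """Stack tables (lists of dicts); columns missing from a row become None."""
    columns = []
    for table in tables:
        for row in table:
            for name in row:
                if name not in columns:
                    columns.append(name)
    return [
        {name: row.get(name) for name in columns}
        for table in tables
        for row in table
    ]


# Hypothetical toy data: the test set has no "label" column at all.
train = [{"a1": 1.0, "label": "yes"}, {"a1": 2.0, "label": "no"}]
test = [{"a1": 3.0}]

# Tag every row with its origin before appending ("Tra" / "Tes").
train_tagged = [dict(row, Data_type="Tra") for row in train]
test_tagged = [dict(row, Data_type="Tes") for row in test]

combined = append_superset(train_tagged, test_tagged)

# Later, the marker column is enough to split the data again.
train_part = [row for row in combined if row["Data_type"] == "Tra"]
test_part = [row for row in combined if row["Data_type"] == "Tes"]
```

The marker column is what makes the split reliable even when the test rows also carry labels.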
    Regards,
    Varun
  • mbsmbs Member, KB Contributor Posts: 211  Guru
    edited October 9
    @varunm1

    Thank you for the answer, but it is a bit complex. Could you please send me an example (process)?

    you saw my data before B)

    thank you
  • varunm1varunm1 Moderator, Member Posts: 840   Unicorn
    Hello @mbs

    Here is the dummy data I created and the .rmp file that you can import into RapidMiner to take a look. The Append (Superset) operator is in the Operator Toolbox extension, which you need to install from the Marketplace.



    Regards,
    Varun
  • mbsmbs Member, KB Contributor Posts: 211  Guru
    @varunm1


    It uses the Read Excel operator, and I got an error again :'(
  • mbsmbs Member, KB Contributor Posts: 211  Guru
    Please combine both datasets.

    thanks
  • varunm1varunm1 Moderator, Member Posts: 840   Unicorn
    You can import the data into a repository and then drag and drop it into the process instead of using the Read Excel operators. If this doesn't work, my suggestion is to create a single Excel file with both train and test data and then import it into RapidMiner. You can then apply a filter to divide it into training and test datasets. I attached a dummy Excel file with a new attribute that defines whether a sample belongs to training (Tra) or testing (Tes). This column is used to separate the data (Filter Examples).


    Regards,
    Varun
  • mbsmbs Member, KB Contributor Posts: 211  Guru
    edited October 9
    thank you 

    Does it have a label?

    label is important for my work
  • varunm1varunm1 Moderator, Member Posts: 840   Unicorn
    edited October 9
    The data I created has a label for the training rows and missing values for the testing rows. If you have labels in your testing data, that is fine, because you filter out the test data using the column that says whether a row belongs to training or testing. See the "Data_type" column in the Excel sheet attached to my previous post; that column specifies which set each sample belongs to. Once you separate them and use Apply Model, it will take care of testing.
    Regards,
    Varun
  • mbsmbs Member, KB Contributor Posts: 211  Guru
    @varunm1

    hi 

    I still see an error :(

    The names of some empty columns in my data show an error :/
  • varunm1varunm1 Moderator, Member Posts: 840   Unicorn
    @mbs, it looks like something weird is happening in your Excel file. Hidden spaces may be causing an issue with your data import. I'm not sure, though.
    Regards,
    Varun
  • mbsmbs Member, KB Contributor Posts: 211  Guru
    edited October 12
    @varunm1


    I don't have any spaces in my data.

    I tried it on my friend's laptop (RM version 9.2), but it still has the problem :'(
  • varunm1varunm1 Moderator, Member Posts: 840   Unicorn
    Did you try my Excel files to check whether they produce an error as well?
    Regards,
    Varun
  • mbsmbs Member, KB Contributor Posts: 211  Guru
    Your file is OK, but mine still has the problem.
  • varunm1varunm1 Moderator, Member Posts: 840   Unicorn
    edited October 12
    Unfortunately, I cannot help much more with this without your files. I have thought through most of the options. My understanding is that the issue is a formatting error in that particular Excel file. By the way, did you create these Excel files manually, or did you get them as output from another system or software?

    Also, try replacing the column names with dummy names and see whether that works.
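For readers who want to check a header row before importing, here is a minimal Python sketch of the two suspicions raised in this thread: hidden spaces in column names and empty column names. The function `clean_headers` and the `attN` naming scheme are assumptions made for illustration, not something RapidMiner provides, and the sample header list is invented.

```python
def clean_headers(headers):
    """Strip hidden whitespace and give empty or duplicate columns dummy names."""
    cleaned = []
    seen = set()
    for index, raw in enumerate(headers, start=1):
        # Treat None as empty and turn non-breaking spaces into normal ones.
        name = (raw or "").replace("\u00a0", " ").strip()
        if not name or name in seen:
            # Dummy name based on the column position, e.g. "att2".
            name = f"att{index}"
            while name in seen:
                name += "_"
        seen.add(name)
        cleaned.append(name)
    return cleaned


# Invented header row with a trailing space, two empty cells,
# a non-breaking space, and a duplicate name.
fixed = clean_headers(["id ", "", None, "\u00a0label", "id"])
```

Comparing the cleaned names against the originals shows which columns would have caused trouble.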
    Regards,
    Varun
  • mbsmbs Member, KB Contributor Posts: 211  Guru
    @varunm1

    thank you for your help

    I created this file manually, then copied it into another Excel file and fixed it. But now half of my data has labels and the other half doesn't. What do you suggest in this situation?

    regards 

    mbs
  • varunm1varunm1 Moderator, Member Posts: 840   Unicorn
    You want to use the unlabeled data for prediction, right? You can separate labeled and unlabeled data in RapidMiner using the Filter Examples operator. I mentioned this in my earlier post in this thread. You can use the labeled data to build the model and the unlabeled data to make predictions with that model.
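The same split-then-predict flow can be sketched in plain Python. Everything here is an assumption for illustration: the toy rows, the column names, and especially the 1-nearest-neighbour rule, which is only a stand-in for whatever model is actually trained and applied in RapidMiner.

```python
def split_by_label(rows, label="label"):
    """Separate rows that have a label from rows where it is missing."""
    labeled = [r for r in rows if r.get(label) not in (None, "")]
    unlabeled = [r for r in rows if r.get(label) in (None, "")]
    return labeled, unlabeled


def predict_nearest(labeled, row, features, label="label"):
    """Stand-in model: copy the label of the closest labeled row."""
    def squared_distance(other):
        return sum((other[f] - row[f]) ** 2 for f in features)
    return min(labeled, key=squared_distance)[label]


# Hypothetical combined file: the last two rows have no label.
rows = [
    {"a1": 1.0, "a2": 1.0, "label": "yes"},
    {"a1": 5.0, "a2": 5.0, "label": "no"},
    {"a1": 1.2, "a2": 0.9, "label": None},
    {"a1": 4.8, "a2": 5.1, "label": None},
]

labeled, unlabeled = split_by_label(rows)
predictions = [predict_nearest(labeled, r, ["a1", "a2"]) for r in unlabeled]
```

Only the labeled rows are ever used to decide a prediction, which mirrors building the model on labeled data and applying it to the rest.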
    Regards,
    Varun
  • mbsmbs Member, KB Contributor Posts: 211  Guru
    I know, but if you remember, I had a problem with the two datasets that I told you about, so that is why I combined them.

    Is there any other way to do that now?

    thank you
  • mbsmbs Member, KB Contributor Posts: 211  Guru
    yes yes 

    Finally it works o:)

    thank you very much @varunm1
