Combine two files

[Deleted User] Posts: 0 Learner III
hi 

Is it possible to combine two datasets (train + test) into a new dataset in which the training part has labels and the test part does not?

thank you

Best Answers

  • varunm1 Moderator, Member Posts: 1,207 Unicorn
    Solution Accepted
    Hello @mbs

    If both datasets have the same attribute names, you can use the Append operator to combine them. If the train or test dataset has extra columns that the other lacks, use Append (Superset) instead. Regarding the testing data: does it have labels? If it does not, you can use Filter Examples to separate the testing data and use it for prediction.
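
Outside RapidMiner, the idea behind appending with a superset of columns can be sketched in a few lines of plain Python. This is only an illustration under assumptions: the attribute names `a1`, `a2` and `label`, the toy rows, and the helper `append_superset` are invented here and are not part of RapidMiner or of any attached process.

```python
def append_superset(*tables):
    """Stack lists of row dicts; the result has every column seen in any input."""
    columns = []
    for table in tables:
        for row in table:
            for name in row:
                if name not in columns:
                    columns.append(name)
    # None marks a value that is missing for that row.
    return [
        {name: row.get(name) for name in columns}
        for table in tables
        for row in table
    ]


# Hypothetical toy data: the test rows carry no "label" column at all.
train = [
    {"a1": 1.0, "a2": 5.0, "label": "yes"},
    {"a1": 2.0, "a2": 3.0, "label": "no"},
]
test = [
    {"a1": 1.5, "a2": 4.0},
]

combined = append_superset(train, test)
for row in combined:
    print(row)
```

The test row ends up with a missing label (`None`), which is what later makes it possible to tell the two parts apart again.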

    If you do have labels for the testing data, my suggestion is to use Generate Attributes to add a new attribute (with the same name) to both the training and testing sets. This attribute can hold the value "Tra" for training rows and "Tes" for testing rows, so that the two parts can be separated again with Filter Examples after appending.
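
As a rough plain-Python analogue of this marker-column approach (not a RapidMiner process): each row is tagged before the two sets are combined, and the tag is used afterwards to split them. The column name `Data_type` is the one used in the example sheet mentioned later in this thread; the helper `tag_rows` and the toy rows are assumptions made for the illustration.

```python
def tag_rows(rows, data_type):
    """Return copies of the rows with a Data_type marker added."""
    return [{**row, "Data_type": data_type} for row in rows]


# Hypothetical toy data: here both sets carry a label.
train = [
    {"a1": 1.0, "label": "yes"},
    {"a1": 2.0, "label": "no"},
]
test = [
    {"a1": 1.5, "label": "yes"},
]

# Tag first, then append.
combined = tag_rows(train, "Tra") + tag_rows(test, "Tes")

# Later, filter on the marker instead of on the label.
training_part = [row for row in combined if row["Data_type"] == "Tra"]
testing_part = [row for row in combined if row["Data_type"] == "Tes"]

print(len(training_part), len(testing_part))
```

Filtering on the marker rather than on the label means the split still works when the test rows have labels too.
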
    Regards,
    Varun
    https://www.varunmandalapu.com/

    Be Safe. Follow precautions and Maintain Social Distancing

  • varunm1 Moderator, Member Posts: 1,207 Unicorn
    Solution Accepted
    You told me that you have already created a single file (manually) containing both the unlabeled and the labeled samples, am I right? If so, just import that new file and use Filter Examples as I described earlier. If you still have a problem with the attribute names, delete that row and create new attribute names.

Answers

  • [Deleted User] Posts: 0 Learner III
    edited October 2019
    @varunm1

    Thank you for the answer, but it is a bit complex. Could you please send me an example process?

    you saw my data before B)

    thank you
  • varunm1 Moderator, Member Posts: 1,207 Unicorn
    Hello @mbs

    Here is some dummy data I created, along with the .rmp file that you can import into RapidMiner to take a look. The Append (Superset) operator is part of the Operator Toolbox extension, which you need to install from the Marketplace.




  • [Deleted User] Posts: 0 Learner III
    @varunm1


    It uses the Read Excel operator, and again I got an error :'(
  • [Deleted User] Posts: 0 Learner III
    Please combine both datasets.

    thanks
  • varunm1 Moderator, Member Posts: 1,207 Unicorn
    You can import the data into a repository and then drag and drop those entries into the process instead of using the Read Excel operators. If that doesn't work, my suggestion is to create a single Excel file with both the train and test data and import it into RapidMiner. You can then apply a filter to divide it into training and test datasets. I attached a dummy Excel file with a new attribute that marks whether each sample belongs to training (Tra) or testing (Tes). This column is used to separate the data (Filter Examples).



  • [Deleted User] Posts: 0 Learner III
    edited October 2019
    thank you 

    Does it have a label?

    The label is important for my work.
  • varunm1 Moderator, Member Posts: 1,207 Unicorn
    edited October 2019
    The data I created has a label for the training rows and missing values for the testing rows. If your testing data also has labels, that is fine: you filter out the test data using the column that says whether a row belongs to training or testing. See the "Data_type" column in the Excel sheet attached to my previous post; it specifies which set each sample belongs to. Once you have separated them, use Apply Model on the test part and it will take care of the testing.

  • [Deleted User] Posts: 0 Learner III
    @varunm1

    hi 

    I still see an error :(

    The error is about the name of some empty column in my data :/
  • varunm1 Moderator, Member Posts: 1,207 Unicorn
    @mbs It looks like something odd is going on in your Excel file. Hidden spaces may be causing a problem for your data import, though I am not sure.

  • [Deleted User] Posts: 0 Learner III
    edited October 2019
    @varunm1


    I don't have any spaces in the data.

    I tried it on my friend's laptop (RM version 9.2), but it still has the problem :'(
  • varunm1 Moderator, Member Posts: 1,207 Unicorn
    Did you try my Excel files to check whether they produce the error as well?

  • [Deleted User] Posts: 0 Learner III
    Your file is OK, but mine still has the problem.
  • varunm1 Moderator, Member Posts: 1,207 Unicorn
    edited October 2019
    Unfortunately, I cannot help much with this without your files. I have gone through most of the options I can think of. My understanding is that the issue is a formatting error in that particular Excel file. By the way, did you create these Excel files manually, or did you get them as output from another system or software?

    Also, try creating dummy names for the columns and see how it works.
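
For anyone who wants to check a header row outside RapidMiner, here is a small plain-Python sketch of that suggestion. It assumes the trouble comes from empty or whitespace-padded column names, which is only a guess in this thread. The helper `clean_headers` and the dummy prefix `att` are made up for the illustration.

```python
def clean_headers(headers):
    """Strip hidden spaces and give empty or duplicate names a dummy name."""
    cleaned = []
    seen = set()
    for index, raw in enumerate(headers, start=1):
        # Treat a missing cell as empty and turn non-breaking spaces into plain ones.
        name = (raw or "").replace("\u00a0", " ").strip()
        if not name:
            name = f"att{index}"  # hypothetical dummy name
        base = name
        suffix = 2
        while name in seen:  # keep every name unique
            name = f"{base}_{suffix}"
            suffix += 1
        seen.add(name)
        cleaned.append(name)
    return cleaned


# Hypothetical header row with padding, empty cells, and a duplicate.
headers = ["id", " text ", "", None, "text"]
print(clean_headers(headers))
# prints ['id', 'text', 'att3', 'att4', 'text_2']
```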

  • [Deleted User] Posts: 0 Learner III
    @varunm1

    thank you for your help

    I created this file manually, copied it into another Excel file, and fixed it. But now half of my data has a label and the other half doesn't. What is your suggestion in this situation?

    regards 

    mbs
  • varunm1 Moderator, Member Posts: 1,207 Unicorn
    You want to use the unlabeled data for prediction, right? You can separate the labeled and unlabeled data in RapidMiner using the Filter Examples operator, as I mentioned in my earlier post in this thread. You can use the labeled data to build the model and the unlabeled data to make predictions with that model.
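
The same flow can be sketched in plain Python to show the logic: split on the missing label, build something from the labeled rows, and apply it to the unlabeled rows. Everything here is an assumption for illustration. The column names are invented, and the 1-nearest-neighbour rule is only a stand-in for whichever learner you actually train in RapidMiner.

```python
def split_by_label(rows, label="label"):
    """Separate rows that have a label from rows where it is missing."""
    labeled = [r for r in rows if r.get(label) not in (None, "")]
    unlabeled = [r for r in rows if r.get(label) in (None, "")]
    return labeled, unlabeled


def predict_nearest(labeled, row, features, label="label"):
    """Stand-in model: copy the label of the closest labeled row."""
    def squared_distance(other):
        return sum((other[f] - row[f]) ** 2 for f in features)
    return min(labeled, key=squared_distance)[label]


# Hypothetical combined data: the last two rows have no label.
rows = [
    {"a1": 1.0, "a2": 1.0, "label": "yes"},
    {"a1": 5.0, "a2": 5.0, "label": "no"},
    {"a1": 1.2, "a2": 0.9, "label": None},
    {"a1": 4.8, "a2": 5.1, "label": None},
]

labeled, unlabeled = split_by_label(rows)
predictions = [predict_nearest(labeled, r, ["a1", "a2"]) for r in unlabeled]
print(predictions)
# prints ['yes', 'no']
```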

  • [Deleted User] Posts: 0 Learner III
    I know, but if you remember, I told you I had a problem with the two separate datasets, so that is why I combined them.

    Now, is there any other way to do that?

    thank you
  • [Deleted User] Posts: 0 Learner III
    yes yes 

    Finally it works o:)

    thank you very much @varunm1
