
test and train data set

abeetbhat1995 Member Posts: 6 Contributor I
edited December 2018 in Help

Should I make two data sets if I want to use algorithms? And if I want to build the dataset on my own, should I create a single Excel file, or two Excel files with one as the training dataset and the other as the test dataset? If they are two different files, how should the training dataset differ from the test dataset?


Answers

  • lionelderkrikor Moderator, RapidMiner Certified Analyst, Member Posts: 1,195 Unicorn

    Hi @abeetbhat1995,

     

    1. You can create either:

     - one Excel file with the training set in sheet 1 and the test set in sheet 2 (in this case, don't forget to specify the sheet number in each of the two Read Excel operators),

    or

     - two Excel files (one for the training set and the second for the test set).

     

    2. Your training set and test set must contain the same attributes, and your training set must additionally contain the label.

    Example:

    training set:                test set:

    Att1  Att2  Att3  label      Att1  Att2  Att3
    a     b     c     2          z     y     x
    j     k     l     3          t     u     v
    m     n     o     4          g     h     i
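
To make point 2 concrete outside of RapidMiner, here is a minimal Python sketch (standard library only) that checks the structure described above: both sets share the same attributes, and every training row carries the label. The function name `check_train_test` is hypothetical, and the in-memory rows simply mirror the example table; this is an illustration, not part of any RapidMiner process.

```python
def check_train_test(train_rows, test_rows, label="label"):
    """Return the shared attribute names, or raise ValueError on a mismatch."""
    # Every training row must carry the label.
    if any(label not in row for row in train_rows):
        raise ValueError("training set is missing the label")
    # Compare attribute names, ignoring the label column.
    train_attrs = {key for row in train_rows for key in row} - {label}
    test_attrs = {key for row in test_rows for key in row} - {label}
    if train_attrs != test_attrs:
        raise ValueError(f"attribute mismatch: {sorted(train_attrs ^ test_attrs)}")
    return sorted(train_attrs)


# Rows copied from the example table (hypothetical in-memory form).
train = [
    {"Att1": "a", "Att2": "b", "Att3": "c", "label": 2},
    {"Att1": "j", "Att2": "k", "Att3": "l", "label": 3},
    {"Att1": "m", "Att2": "n", "Att3": "o", "label": 4},
]
test = [
    {"Att1": "z", "Att2": "y", "Att3": "x"},
    {"Att1": "t", "Att2": "u", "Att3": "v"},
    {"Att1": "g", "Att2": "h", "Att3": "i"},
]

shared = check_train_test(train, test)
print(shared)  # ['Att1', 'Att2', 'Att3']
```

If the test file were missing a column, or the training file had no label, the check would raise a `ValueError` instead of returning the attribute list.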

     

     

    3. An example of a simple, fictitious process:

     Training_test.png

    Regards,

     

    Lionel

  • Telcontar120 Moderator, RapidMiner Certified Analyst, RapidMiner Certified Expert, Member Posts: 1,635 Unicorn

    You may want to look at the training video series on modeling and validation on this page: https://rapidminer.com/training/videos/

     

    RapidMiner has a lot of built-in functionality around model validation that you should take advantage of.  Cross-validation in particular is an approach that is considered "best practice" and should be part of your workflow.  It does not require you to split your labeled data into separate training and testing sets.  
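
As a rough illustration of the idea (this is not RapidMiner's implementation), the following Python sketch splits labeled examples into k folds, so that each example is used for testing exactly once and for training in the remaining rounds. The function name `k_fold_indices` and the choice of 6 examples with k = 3 are assumptions made for the example.

```python
def k_fold_indices(n_examples, k):
    """Yield (train_indices, test_indices) for each of k folds."""
    if not 2 <= k <= n_examples:
        raise ValueError("k must be between 2 and the number of examples")
    indices = list(range(n_examples))
    # Spread the remainder so fold sizes differ by at most one.
    fold_sizes = [n_examples // k + (1 if i < n_examples % k else 0)
                  for i in range(k)]
    start = 0
    for size in fold_sizes:
        test_idx = indices[start:start + size]
        train_idx = indices[:start] + indices[start + size:]
        yield train_idx, test_idx
        start += size


folds = list(k_fold_indices(6, 3))
for train_idx, test_idx in folds:
    print("train:", train_idx, "test:", test_idx)
# train: [2, 3, 4, 5] test: [0, 1]
# train: [0, 1, 4, 5] test: [2, 3]
# train: [0, 1, 2, 3] test: [4, 5]
```

In each round a model would be trained on the `train` indices and scored on the `test` indices, and the scores averaged. Real tools usually shuffle (and often stratify) the examples before splitting; this sketch keeps the original order to stay short.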

     

    Brian T.
    Lindon Ventures 
    Data Science Consulting from Certified RapidMiner Experts