Getting training and testing sets from KennardStoneSampling operator

pengiepengie Member Posts: 21 Maven
edited November 2018 in Help

I have a dataset that I wish to split into a training set and a testing set. I wish to use the KennardStoneSampling operator, but it seems that it will only provide me with the training set. How do I get the remaining compounds, those that were not selected, to use as the testing set?


  • landland RapidMiner Certified Analyst, RapidMiner Certified Expert, Member Posts: 2,531 Unicorn
    If I understand you correctly, you want to use the sampling operator for something it is not intended for. KennardStoneSampling takes one dataset and reduces it to a smaller, evenly distributed sample: it selects some examples from the input set and returns only those as the output set. If you want to split your ExampleSet into a training set and a test set, you should use the SimpleValidation operator. Take a look at the operator description to understand how it works. If you combine it with sampling, it is probably best to test a classifier's performance by sampling the training data but not the test data.

  • pengiepengie Member Posts: 21 Maven
    Thanks for the reply. I was hoping that I had overlooked an operator, but it seems that RapidMiner does not have the functionality I want.

    Basically, in my field of research, one method of deriving a training set and a testing set from a dataset is to use the Kennard and Stone algorithm. The algorithm selects a set of well-distributed objects which can serve as a training set. The remaining objects, which are not selected, will be less evenly distributed than the selected ones but will be similar to them. Hence, these objects are useful as a testing set to gauge the performance of the model.

    I guess I have to look at the source code of the KennardStoneSampling operator and see how I can modify it to behave like the SimpleValidation operator.
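
For illustration, the split described above (Kennard and Stone selection as the training set, everything left over as the testing set) can be sketched outside RapidMiner in a few lines of Python. This is a minimal sketch under assumptions, not the code of the KennardStoneSampling operator: the function name `kennard_stone_split`, the Euclidean distance, and the seeding with the two most distant points are choices made here for the example.

```python
import math


def kennard_stone_split(points, n_train):
    """Return (train_idx, test_idx) from a basic Kennard-Stone selection.

    Hypothetical helper for illustration; assumes numeric feature vectors
    and Euclidean distance.
    """
    n = len(points)
    if not 2 <= n_train <= n:
        raise ValueError("n_train must be between 2 and len(points)")

    # Pairwise distances (O(n^2) memory, acceptable for a sketch).
    dist = [[math.dist(p, q) for q in points] for p in points]

    # Seed the selection with the two points that are farthest apart.
    i0, j0 = max(
        ((i, j) for i in range(n) for j in range(i + 1, n)),
        key=lambda ij: dist[ij[0]][ij[1]],
    )
    selected = [i0, j0]
    remaining = [k for k in range(n) if k not in (i0, j0)]

    # Distance from each unselected point to its nearest selected point.
    min_dist = {k: min(dist[k][i0], dist[k][j0]) for k in remaining}

    while len(selected) < n_train:
        # Add the unselected point that is farthest from the selected set.
        nxt = max(remaining, key=lambda k: min_dist[k])
        selected.append(nxt)
        remaining.remove(nxt)
        del min_dist[nxt]
        for k in remaining:
            min_dist[k] = min(min_dist[k], dist[k][nxt])

    # Selected indices form the training set; the rest form the testing set.
    return selected, remaining


compounds = [(0, 0), (10, 0), (5, 0), (1, 0), (9, 0)]
train_idx, test_idx = kennard_stone_split(compounds, n_train=3)
print(sorted(train_idx))  # [0, 1, 2]
print(test_idx)           # [3, 4]
```

The point of the sketch is that the testing set needs no separate algorithm: it is simply the complement of the selected indices, so keeping track of which rows were picked is enough to recover it.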