Getting training and testing sets from KennardStoneSampling operator

pengie  Member  Posts: 21  Maven
edited November 2018 in Help
Hi,

I have a dataset that I wish to split into a training set and a testing set. I wish to use the KennardStoneSampling operator, but it seems to provide only the training set. How do I get the remaining compounds, the ones that were not selected, so I can use them as the testing set?

Answers

  • land  RapidMiner Certified Analyst, RapidMiner Certified Expert, Member  Posts: 2,527  Unicorn
    Hi,
    If I understood you correctly, you want to use the sampling algorithm for something it is not intended for. If you have one dataset, you can sample it with KennardStoneSampling so that a smaller, evenly distributed sample remains. In other words, it selects some examples from the input set and returns them as the output set. If you want to split your ExampleSet into a training set and a test set, you should use the SimpleValidation operator; take a look at its operator description to understand how it works. If you then test a classifier's performance in combination with sampling, you will probably get the best results by sampling the training data but not the test data!


    Greetings,
      Sebastian
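As a minimal sketch of the "split first, then sample only the training data" advice, assuming the data is a NumPy array: the helper `ratio_split` below is hypothetical and only mirrors the idea of holding out a fixed fraction of rows as a test set. It is not RapidMiner's SimpleValidation implementation.

```python
import numpy as np

def ratio_split(n_examples, train_ratio=0.7, seed=0):
    """Hypothetical helper: shuffle row indices and split them by train_ratio.

    Returns (train_idx, test_idx) as two disjoint index arrays.
    """
    rng = np.random.default_rng(seed)
    idx = rng.permutation(n_examples)  # shuffled 0 .. n_examples-1
    n_train = int(round(train_ratio * n_examples))
    return idx[:n_train], idx[n_train:]

# Usage: split first, then apply any sampling to the training part only.
X = np.arange(20, dtype=float).reshape(10, 2)
train_idx, test_idx = ratio_split(len(X), train_ratio=0.7)
X_train, X_test = X[train_idx], X[test_idx]
# A sampler (e.g. Kennard-Stone) would be run on X_train only;
# X_test is left as-is, following the advice above.
```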
  • pengie  Member  Posts: 21  Maven
    Thanks for the reply. I was hoping that I had missed an operator, but it seems RapidMiner does not have the functionality I want.

    Basically, in my field of research, one method of deriving a training set and a testing set from a dataset is the Kennard and Stone algorithm. The algorithm selects a set of evenly distributed objects that can serve as the training set. The objects that are not selected are less spread out than the selected ones but similar to them. Hence, these remaining objects are useful as a testing set to gauge the performance of the model.

    I guess I will have to look at the source code of the KennardStoneSampling operator and see how I can modify it to behave like the SimpleValidation operator.
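The Kennard-Stone split described in this post can be sketched outside RapidMiner as follows. This is a minimal NumPy sketch, assuming Euclidean distance on numeric features that are already scaled; `kennard_stone_split` is a hypothetical name and this is not the KennardStoneSampling operator's source. The point relevant to the original question is that the indices never picked during selection are returned too, so they can serve as the testing set.

```python
import numpy as np

def kennard_stone_split(X, n_train):
    """Hypothetical sketch: return (train_idx, test_idx) via Kennard-Stone.

    Assumes Euclidean distance on the raw values; scale features beforehand.
    """
    X = np.asarray(X, dtype=float)
    n = X.shape[0]
    if not 2 <= n_train <= n:
        raise ValueError("n_train must be between 2 and the number of rows")
    # Full pairwise Euclidean distance matrix (n x n).
    diff = X[:, None, :] - X[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    # Start with the two rows that are farthest apart.
    i, j = np.unravel_index(np.argmax(dist), dist.shape)
    selected = [int(i), int(j)]
    remaining = [k for k in range(n) if k not in selected]
    # Distance from each remaining row to its nearest selected row.
    min_dist = dist[remaining][:, selected].min(axis=1)
    while len(selected) < n_train:
        # Pick the remaining row that is farthest from the selected set.
        pos = int(np.argmax(min_dist))
        new = remaining.pop(pos)
        selected.append(new)
        min_dist = np.delete(min_dist, pos)
        # Update nearest-selected distances with the newly added row.
        if remaining:
            min_dist = np.minimum(min_dist, dist[remaining, new])
    # Rows never selected form the test set.
    return np.array(selected, dtype=int), np.array(remaining, dtype=int)

X = np.array([[0, 0], [1, 0], [0, 1], [10, 10], [5, 5]])
train_idx, test_idx = kennard_stone_split(X, n_train=3)
```

Here rows 0 and 3 are the farthest pair, row 4 is the next most distant from them, and rows 1 and 2, which sit close to row 0, are left over as the test set. The distance matrix is O(n²) in memory, which is fine for typical compound sets but not for very large datasets.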