Getting training and testing sets from KennardStoneSampling operator
Hi,
I have a dataset that I wish to split into a training set and a testing set. I wish to use the KennardStoneSampling operator, but it seems that it only provides me with the training set. How do I get the remaining compounds, the ones that were not selected, to use as the testing set?
Answers
If I understood you correctly, you want to use the sampling algorithm for something it is not intended for. If you have one dataset, you can sample it with KennardStoneSampling so that a smaller, evenly distributed sample remains. That is, it selects some examples from the input set and returns them as the output set. If you want to split your ExampleSet into a training set and a test set, you should use the SimpleValidation operator. Take a look at the operator description to understand how it works. If you then want to test a classifier's performance in combination with the sampling, it is probably best to sample the training data but not the test data!
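To illustrate the last point outside RapidMiner, here is a minimal Python sketch of a single holdout split followed by a sampling step applied only to the training part. It is an analogy, not the SimpleValidation operator's implementation: the function names `holdout_split` and `subsample`, the `train_ratio` parameter, and the use of plain random subsampling as a stand-in for KennardStoneSampling are all assumptions for illustration.

```python
import random


def holdout_split(n_examples, train_ratio=0.7, seed=0):
    """Randomly partition example indices into a training and a test part.

    Hypothetical helper: mimics the idea of a single holdout split with a
    given ratio, not any specific RapidMiner operator.
    """
    rng = random.Random(seed)
    indices = list(range(n_examples))
    rng.shuffle(indices)
    cut = int(round(train_ratio * n_examples))
    return sorted(indices[:cut]), sorted(indices[cut:])


def subsample(indices, k, seed=0):
    """Stand-in for any sampling step (assumption): keep k of the given indices."""
    rng = random.Random(seed)
    return sorted(rng.sample(indices, k))


train, test = holdout_split(100, train_ratio=0.7, seed=42)
# Sample the training part only; the test part is left untouched.
train_sampled = subsample(train, 20, seed=42)
print(len(train), len(test), len(train_sampled))  # 70 30 20
```

Because the test indices are fixed before any sampling happens, the model is evaluated on data that the sampling step never touched.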
Greetings,
Sebastian
Basically, in my field of research, one method of deriving a training set and a testing set from a dataset is the Kennard and Stone algorithm. The algorithm selects a set of objects that are evenly distributed over the data space, and these serve as the training set. The remaining, unselected objects are less widely spread than the selected ones but are similar to them. Hence, these objects are useful as a testing set to gauge the performance of the model.
I guess I will have to look at the source code of the KennardStoneSampling operator and see how I can modify it to work like the SimpleValidation operator.
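The idea described in this reply, keeping the unselected objects as the test set instead of discarding them, can be sketched in plain Python. This is a minimal sketch of the classic Kennard-Stone selection, not the RapidMiner operator's source code. The function name `kennard_stone_split` is hypothetical, and using Euclidean distance and starting from the two most distant rows are assumptions (some variants start from the row closest to the mean instead). It builds the full distance matrix, so memory grows with the square of the number of rows.

```python
import math
from typing import List, Sequence, Tuple


def kennard_stone_split(
    X: Sequence[Sequence[float]], n_train: int
) -> Tuple[List[int], List[int]]:
    """Split row indices of X into (train, test) with Kennard-Stone selection.

    Training rows are chosen to cover the feature space as evenly as possible;
    every row that is not selected is returned as the test set.
    """
    n = len(X)
    if not 2 <= n_train <= n:
        raise ValueError("n_train must be between 2 and the number of rows")
    # Pairwise Euclidean distances between all rows.
    dist = [[math.dist(X[i], X[j]) for j in range(n)] for i in range(n)]
    # Start with the two rows that are farthest apart.
    i0, j0 = max(
        ((i, j) for i in range(n) for j in range(i + 1, n)),
        key=lambda p: dist[p[0]][p[1]],
    )
    selected = [i0, j0]
    remaining = [r for r in range(n) if r not in (i0, j0)]
    # min_dist[r]: distance from row r to its nearest already-selected row.
    min_dist = {r: min(dist[r][i0], dist[r][j0]) for r in remaining}
    while len(selected) < n_train:
        # Pick the unselected row that is farthest from the selected set.
        nxt = max(remaining, key=lambda r: min_dist[r])
        selected.append(nxt)
        remaining.remove(nxt)
        del min_dist[nxt]
        # Update each remaining row's distance to the selected set.
        for r in remaining:
            min_dist[r] = min(min_dist[r], dist[r][nxt])
    return selected, remaining


X = [[0, 0], [0, 1], [1, 0], [1, 1], [0.5, 0.5], [0.1, 0.1]]
train, test = kennard_stone_split(X, n_train=4)
print(sorted(train), test)  # [0, 1, 2, 3] [4, 5]
```

In this toy example the four corner points become the training set, and the two interior points, which lie inside the region the training set covers, are returned as the test set.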