Target Shuffling
Is there any quick way to implement "Target Shuffling" in RM?
In a target shuffling model evaluation, performance should be measured for the actual dataset as well as for a number of datasets with randomly rearranged label values.
Using random labels is not enough. The actual labels should be used and assigned to different examples.
Answers
The way I understand it is as follows:
Step 1. You train a classifier and observe that it has X percent accuracy.
Step 2. You then randomize your labels, train another classifier, and observe that it has Y percent accuracy.
Step 3. You repeat Step 2 multiple times and find Z = best(Y).
When X is sufficiently better than Z, you claim that the model underlying X is not caused by noise.
Is this correct?
Sorry for the late reply, I was out of office for a while.
Target Shuffling works as you describe. The only restriction is that step 2 should take care not to bias the label distribution. That is why the original set of labels is used with randomized order (hence the term shuffling instead of randomizing).
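For readers who want to check the logic outside RM first, the procedure can be sketched in plain Python. This is only a minimal illustration under stated assumptions, not an RM process: the function names, the score_fn callback, and the toy nearest-class-mean scorer are all hypothetical, and the "share_reaching_actual" summary is an extra that was not discussed in this thread. The labels are permuted rather than redrawn, so the label distribution stays the same in every repeat.

```python
import random


def shuffled_labels(labels, rng):
    """Return the same label values in a random order (class counts unchanged)."""
    permuted = list(labels)
    rng.shuffle(permuted)
    return permuted


def target_shuffling(features, labels, score_fn, n_shuffles=100, seed=0):
    """Score the real labels (X), then n_shuffles shuffled copies (Y values).

    score_fn(features, labels) -> float is a hypothetical callback that trains
    a model and returns its performance, e.g. accuracy on held-out rows.
    """
    rng = random.Random(seed)
    actual = score_fn(features, labels)                     # Step 1: X
    shuffled_scores = [
        score_fn(features, shuffled_labels(labels, rng))    # Step 2: Y
        for _ in range(n_shuffles)                          # Step 3: repeat
    ]
    reached = sum(score >= actual for score in shuffled_scores)
    return {
        "actual": actual,
        "best_shuffled": max(shuffled_scores),              # Z = best(Y)
        # Extra summary, not from the thread: share of shuffled runs that
        # matched or beat the real score.
        "share_reaching_actual": reached / n_shuffles,
        "shuffled_scores": shuffled_scores,
    }


def centroid_accuracy(features, labels):
    """Toy scorer (assumption): 1-D nearest-class-mean classifier,
    trained on even-numbered rows and tested on odd-numbered rows."""
    train = list(zip(features[0::2], labels[0::2]))
    test = list(zip(features[1::2], labels[1::2]))
    means = {}
    for cls in {label for _, label in train}:
        values = [x for x, label in train if label == cls]
        means[cls] = sum(values) / len(values)
    correct = sum(
        min(means, key=lambda c: abs(x - means[c])) == label
        for x, label in test
    )
    return correct / len(test)


if __name__ == "__main__":
    # Made-up, well-separated data purely for demonstration.
    features = list(range(10)) + list(range(20, 30))
    labels = ["a"] * 10 + ["b"] * 10
    result = target_shuffling(features, labels, centroid_accuracy, n_shuffles=200)
    print(result["actual"], result["best_shuffled"], result["share_reaching_actual"])
```

Any real learner and validation scheme can be plugged in through score_fn; the toy scorer is there only so the sketch runs on its own.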
Such a shuffled dataset can easily be constructed using R or even Excel, but how could one implement the whole process in RM and get one final result?
Sometimes the top-n random models are required for comparison, along with some dataset similarity measures. This is to avoid using too many repeats in step 3 relative to the number of examples, which would produce a large number of datasets that are not truly shuffled.
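One possible reading of the similarity concern, sketched in Python: with few examples there are only so many distinct label arrangements, so repeated shuffles can duplicate each other or land very close to the original. The overlap measure, the max_overlap threshold, and both function names below are assumptions made for illustration, not measures named in this thread.

```python
import random


def label_overlap(original, shuffled):
    """Fraction of examples whose label is unchanged after shuffling."""
    same = sum(o == s for o, s in zip(original, shuffled))
    return same / len(original)


def distinct_shuffles(labels, n_shuffles, max_overlap=0.9, seed=0, max_tries=10_000):
    """Draw shuffles, skipping duplicates and ones too similar to the original.

    May return fewer than n_shuffles when the dataset is too small to offer
    that many distinct arrangements within max_tries attempts.
    """
    rng = random.Random(seed)
    seen = set()
    result = []
    tries = 0
    while len(result) < n_shuffles and tries < max_tries:
        tries += 1
        candidate = list(labels)
        rng.shuffle(candidate)
        key = tuple(candidate)
        if key in seen or label_overlap(labels, candidate) > max_overlap:
            continue  # duplicate, or not "truly shuffled" by this measure
        seen.add(key)
        result.append(candidate)
    return result


if __name__ == "__main__":
    labels = ["a"] * 3 + ["b"] * 3
    shuffles = distinct_shuffles(labels, n_shuffles=50)
    # Six examples with two balanced classes cannot yield 50 distinct shuffles.
    print(len(shuffles))
```

Note that with few classes a fair amount of overlap is expected by chance alone, so the threshold would need to be chosen with the class balance in mind.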
I am struggling with the same problem of implementing target shuffling in RM.
Did you get a response or manage to find a solution?
Thanks,
Amnon
I implemented Target Shuffling in RM.
I saved it as a Building Block for easy inclusion in projects.
The enclosed code is for a building block. Save it in a file called Target Shuffling.buildingblock in your repository directory.
I hope you find it useful.
I'll be happy to get any comments.
Sincerely,
Amnon Khen