[SOLVED] Automatic dataset shuffle
Dear All,
I have 5 different datasets (from 5 different users).
I wish to do "user-cross-validation".
Meaning, I wish to test on user n, and train on all other users, for n = 1, ..., 5.
Any way to do this automatically?
I can retrieve all 5 data sets, but after this, I should "dynamically" join them.
Best regards,
Wessel
Answers
And then use 'linear sampling' option?
Best regards,
Wessel
OK, unfortunately there is no easy out-of-the-box, single-operator method for this. But thanks to the almighty toolbox power of RapidMiner, we can try to mimic a cross-validation with your desired behaviour!
There are actually several methods for this. One could work like this: before appending, add a special attribute, let us say 'set_id', to every single example set. This attribute contains the number of the example set (1, 2, 3, ..., k). Then append all of your datasets into one. After this you can loop k times and, in iteration n, filter the data with the help of this attribute: examples with set_id = n form the test data, all others form the training data. After you calculate the performance for each iteration, you can average the k values.
Here is an example of such a process with 5 identical Iris datasets:
If you find a more elegant or remarkable way to achieve this, feel free to post it here.
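For readers who want to see the logic outside of RapidMiner, the append / tag / loop / average scheme described above can be sketched in plain Python. This is only an illustration under stated assumptions: the toy data, the function names (`append_with_set_id`, `majority_label`, `leave_one_set_out`) and the majority-class "learner" are all made up for this sketch and stand in for whatever data and learner you actually use. It is not the RapidMiner process itself.

```python
from collections import Counter
from statistics import mean


def append_with_set_id(datasets):
    """Append all datasets into one list, tagging each row with a 1-based set_id."""
    combined = []
    for set_id, rows in enumerate(datasets, start=1):
        for features, label in rows:
            combined.append({"set_id": set_id, "features": features, "label": label})
    return combined


def majority_label(train_rows):
    """Placeholder learner (an assumption): always predict the most frequent training label."""
    return Counter(row["label"] for row in train_rows).most_common(1)[0][0]


def leave_one_set_out(combined, k):
    """Test on set n and train on all other sets, for n = 1..k; return per-fold and mean accuracy."""
    accuracies = []
    for n in range(1, k + 1):
        train = [row for row in combined if row["set_id"] != n]
        test = [row for row in combined if row["set_id"] == n]
        predicted = majority_label(train)
        correct = sum(1 for row in test if row["label"] == predicted)
        accuracies.append(correct / len(test))
    return accuracies, mean(accuracies)


# Hypothetical toy data: 5 users, 4 examples each, as (features, label) pairs.
datasets = [
    [((0,), "a"), ((1,), "a"), ((2,), "b"), ((3,), "a")],  # user 1
    [((0,), "a"), ((1,), "a"), ((2,), "a"), ((3,), "a")],  # user 2
    [((0,), "a"), ((1,), "b"), ((2,), "a"), ((3,), "a")],  # user 3
    [((0,), "a"), ((1,), "a"), ((2,), "b"), ((3,), "b")],  # user 4
    [((0,), "a"), ((1,), "a"), ((2,), "a"), ((3,), "b")],  # user 5
]

combined = append_with_set_id(datasets)
accuracies, average = leave_one_set_out(combined, k=len(datasets))
print(accuracies, average)
```

The point of the sketch is the filter on `set_id`: no example from the held-out user ever reaches the training data, which is what distinguishes this from an ordinary shuffled cross-validation.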