Different results with leave-one-out X-Val
Hello everyone,
first of all, thanks for your fantastic data mining tool RM.
I'm using version 5.0.008 and I've got a problem:
When I run an X-Validation with leave-one-out on my data set (the random seed in the main process is still 2001), I get different results.
I've found out that when the leave-one-out option is not set, you can set and unset the "use local random seed" option.
Okay so far. When I set "use local random seed" to, let's say, 1000 and then set the leave-one-out option again, I get a result of 69% accuracy.
But if I leave "use local random seed" unset and then set the leave-one-out option again, I get about 74% accuracy.
How can that be?
It seems a bit absurd to me, as this option shouldn't even take effect once the leave-one-out option is set...(?)
Any suggestions, or am I doing something wrong?
Thanks in advance & best regards,
Sasch
Answers
would you be so kind as to provide your process? I will check it then. Please post it in a code area, using the # button.
Greetings,
Sebastian
it's a different data set with different accuracies, but it shows the same effect.
in fact you are right. This behavior results from the way the cross-validation sets are built: instead of treating the case x=n differently, n random sets are simply built, each consisting of one single example. The result is the same unless you are using an algorithm that incorporates randomness, as LibSVM does.
Because the XValidation then consumes the first random numbers of the global random number sequence, LibSVM behaves differently: it receives different numbers...
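To illustrate the effect, here is a minimal Python sketch. It is not RapidMiner's code; the names build_singleton_folds, randomized_learner_draws and run are hypothetical, and the "learner" is just a stand-in that draws numbers from the global generator. It shows that the folds are the same either way, while the numbers the learner receives depend on whether the split used the global sequence or its own local seed.

```python
import random


def build_singleton_folds(n, rng):
    """Build n folds of one example each by shuffling (this consumes random numbers)."""
    indices = list(range(n))
    rng.shuffle(indices)
    return [[i] for i in indices]


def randomized_learner_draws(rng, k=3):
    """Stand-in for a learner with internal randomness: it just draws k numbers."""
    return [rng.random() for _ in range(k)]


def run(global_seed, use_local_seed, local_seed=1000, n=10):
    """Return the set of held-out examples and the numbers the learner received."""
    global_rng = random.Random(global_seed)
    # With a local seed, the split has its own generator and leaves the
    # global sequence untouched; otherwise it consumes the global sequence.
    split_rng = random.Random(local_seed) if use_local_seed else global_rng
    folds = build_singleton_folds(n, split_rng)
    held_out = sorted(fold[0] for fold in folds)
    return held_out, randomized_learner_draws(global_rng)


folds_local, draws_local = run(2001, use_local_seed=True)
folds_global, draws_global = run(2001, use_local_seed=False)

print("same folds:", folds_local == folds_global)
print("same learner draws:", draws_local == draws_global)
```

In both runs every example is held out exactly once, so a deterministic learner would give identical results. Only a learner that draws from the global sequence sees a difference.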
Greetings,
Sebastian
thank you for your detailed answer. That was my second thought, that it depends on the SVM.
But how shall I deal with it now?
Any idea how to get that behaviour out of the process?
Have a nice day,
Sasch.
this depends on what you are trying to achieve. Why does this behavior bother you anyway?
Greetings,
Sebastian
that's because my group and I are trying to achieve the best results (accuracies) in the classification of our data.
First we do a grid search for the best parameters for the SVM (gamma and C), and after applying these we run an X-Val again.
(I know about overfitting the model but in this case it doesn't matter...)
And at that point I noticed the effect with the leave-one-out option.
By the way, we've also got the same problem as in the thread http://rapid-i.com/rapidforum/index.php/topic,214.msg831.html#msg831 but the solution given there doesn't work at all. (But that doesn't matter either.)
So I just want to know which accuracy I should choose, because I don't know which one is the right one.
We need to know this in order to finish our study...
Thanks so much for your patience,
Sasch.
I guess it doesn't matter. As long as you optimize without a valid performance estimation, the accuracy in a subsequent validation will increase anyway. So go ahead with the higher value. But to get down to the point: you can't say. It's just an estimate, and the differences seem to come from the randomness of the process itself... So you might repeat it several times with varying random seeds / settings and average the results to get a valid estimate...
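A rough sketch of that repeat-and-average idea in Python, outside of RapidMiner: loo_accuracy is a hypothetical toy stand-in for one leave-one-out run with a randomized learner (here the "learner" is simply right about 70% of the time, an arbitrary assumption for illustration), and the labels are made up. In practice you would replace it with your real process, run once per seed.

```python
import random
import statistics


def loo_accuracy(labels, seed):
    """Toy stand-in for one leave-one-out run whose learner uses randomness."""
    rng = random.Random(seed)
    correct = 0
    for true_label in labels:
        # Hypothetical randomized learner: predicts the true label with
        # probability 0.7 (an assumed value, not taken from real data).
        predicted = true_label if rng.random() < 0.7 else 1 - true_label
        correct += int(predicted == true_label)
    return correct / len(labels)


labels = [0, 1] * 50          # made-up binary labels, 100 examples
seeds = list(range(1, 21))    # 20 different random seeds

accuracies = [loo_accuracy(labels, seed) for seed in seeds]
mean_accuracy = statistics.mean(accuracies)
spread = statistics.stdev(accuracies)

print(f"runs: {len(accuracies)}")
print(f"mean accuracy: {mean_accuracy:.3f} +/- {spread:.3f}")
```

Reporting the mean together with the spread shows how much of a difference like 69% vs. 74% could be explained by the seed alone.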
Greetings,
Sebastian
that's a good idea. Thanks again for your answers and your suggestions.
Regards,
Sasch.