RapidMiner

Same process different results

Contributor

What could be the reason for different users obtaining completely different results (e.g. classification matrix) even though they use the exact same process, same algorithms and the parameters are exactly the same? 

 

I have also noticed the following: I ran an estimation model with linear regression and got an RMSE. I then added a second model to the same process (using the Multiply operator, without changing any parameters), and when I ran the process with both models, the RMSE of the linear regression changed. What might be the reason for that?

4 REPLIES
RMStaff

Re: Same process different results

Hi,

 

Have a look at your GLM operator; it should have a boolean parameter for reproducibility. If you run a learner in parallel, some computations can finish earlier or later from one run to the next, and this can change the overall outcome slightly. The same is true for our X-Validation. To rule this out, disable all parallelism.
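
To illustrate why the finishing order of parallel computations can matter at all, here is a minimal Python sketch. It is only an analogy and not RapidMiner code: the `combine` function and the three partial results are made up for the example. Floating-point addition is not associative, so adding the same partial results in a different order can give a slightly different total.

```python
def combine(partials):
    """Add partial results in arrival order using plain float addition."""
    total = 0.0
    for p in partials:
        total += p
    return total

# Hypothetical partial results from three parallel workers (assumed values).
partials = [0.1, 0.2, 0.3]

run_1 = combine(partials)                  # workers finish in order 1, 2, 3
run_2 = combine(list(reversed(partials)))  # workers finish in order 3, 2, 1

print(run_1 == run_2)      # False
print(abs(run_1 - run_2))  # tiny, but not zero
```

A difference this small is usually harmless, but inside an iterative learner it can be enough to shift the final numbers, which is why switching off parallelism helps when exact reproducibility is the goal.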

 

Best,

Martin

--------------------------------------------------------------------------
Head of Data Science Services at RapidMiner
Elite III

Re: Same process different results

You should also make sure that the local random seed is set (any arbitrary number will do) if you want complete reproducibility, either between users or even the same user running the process in different sessions.
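
As a rough illustration of what a fixed seed buys you, here is a small Python sketch (an analogy only, not RapidMiner internals). The `holdout_split` helper and the seed value 1992 are assumptions made for the example. With the same seed, the shuffled split is identical on every run and for every user; with no seed, the generator is initialized from an unpredictable source and the split generally differs between runs.

```python
import random

def holdout_split(rows, test_size, seed=None):
    """Shuffle a copy of rows and split off test_size rows.

    seed=None lets the generator pick an unpredictable starting state.
    """
    rng = random.Random(seed)
    shuffled = list(rows)
    rng.shuffle(shuffled)
    cut = len(shuffled) - test_size
    return shuffled[:cut], shuffled[cut:]

rows = list(range(20))

# Same seed: two independent runs produce the same split.
train_a, test_a = holdout_split(rows, 6, seed=1992)
train_b, test_b = holdout_split(rows, 6, seed=1992)
print(test_a == test_b)  # True

# No seed: the split is not guaranteed to repeat between runs.
train_c, test_c = holdout_split(rows, 6)
```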

Brian T., Lindon Ventures - www.lindonventures.com
Analytics Consulting by Certified RapidMiner Analysts
Contributor

Re: Same process different results

Thank you. How do I check that?

Elite III

Re: Same process different results

You will need to click "show advanced parameters", the link at the bottom of your parameters window. You will then see an option to check a box to use a local random seed, and once you check it, a field appears where you can enter the seed number. It will look like this:

[Screenshot: local random seed.PNG]

 

You will need to do this for any operator that is using any pseudo-random processes (like sampling).  So at a minimum it will be in your cross-validation operator, and you might have other operators (like Sample) in your process that would need it as well.
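
Here is a hedged Python sketch of why each randomized operator benefits from its own seed. It assumes, for illustration only, that operators without a local seed draw from one shared generator; the `draw_sample` helper and the seed values are made up. Under that assumption, adding one more randomized step ahead of an existing one consumes numbers from the shared stream and changes what the existing step gets, while a step with its own local seed is unaffected.

```python
import random

def draw_sample(rng, rows, k):
    """Stand-in for any operator that samples rows pseudo-randomly."""
    return rng.sample(rows, k)

rows = list(range(100))

# Shared generator: the sampling step runs alone.
shared = random.Random(7)
alone = draw_sample(shared, rows, 5)

# Shared generator: an extra randomized step now runs first.
shared = random.Random(7)
_extra = draw_sample(shared, rows, 5)       # consumes part of the stream
after_extra = draw_sample(shared, rows, 5)  # receives different numbers

print(alone == after_extra)  # almost certainly False

# Local seeds: each step owns its generator, so extra steps do not interfere.
local_alone = draw_sample(random.Random(1992), rows, 5)
_extra = draw_sample(random.Random(7), rows, 5)
local_after = draw_sample(random.Random(1992), rows, 5)

print(local_alone == local_after)  # True
```

If this assumption holds for your process, it would also be consistent with the RMSE of one model changing after a second model is added to the same process.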

 

Brian T., Lindon Ventures - www.lindonventures.com
Analytics Consulting by Certified RapidMiner Analysts