What could be the reason for different users obtaining completely different results (e.g. a different classification matrix) even though they use the exact same process, the same algorithms, and exactly the same parameters?
Also, I noticed something else: I ran an estimation model with linear regression and got an RMSE; then I added a second model to the same process (using the Multiply operator, without changing any parameters), and once I ran the process with two models, the RMSE of the linear regression changed. What might be the reason for that?
Have a look at your GLM operator; it should have a boolean parameter for reproducibility. If you run a learner in parallel, some computations may finish faster or slower than others, which can change the overall outcome a bit. The same is true for our X-Validation. Just disable all parallelism.
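This isn't RapidMiner-specific; it's a general property of floating-point arithmetic. A minimal Python sketch of why the order in which parallel partial results are combined (which depends on which worker finishes first) can shift a numeric result:

```python
# Floating-point addition is not associative, so combining partial
# sums in a different order can yield a different final value.
a = sum([1e16, -1e16, 1.0])  # large terms cancel first -> 1.0
b = sum([1e16, 1.0, -1e16])  # 1.0 is absorbed into 1e16 and lost -> 0.0
print(a, b)
```

The same three numbers, summed in two orders, give 1.0 and 0.0. In a real learner the discrepancies are tiny, but they can be enough to tip a borderline split or coefficient, which is why disabling parallelism (or enabling the reproducibility flag) stabilizes the output.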
You should also make sure that a local random seed is set (any arbitrary number will do) if you want complete reproducibility, whether between different users or for the same user running the process in different sessions.
You will need to "show advanced parameters" in your Parameters panel using the link at the bottom. You will then see a checkbox to use a local random seed, and when you check it, a field appears where you can enter the seed number.
You will need to do this for any operator that uses a pseudo-random process (like sampling). So at a minimum that means your Cross Validation operator, and your process might contain other operators (like Sample) that need it as well.
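To see what a local random seed buys you, here is a plain-Python sketch (RapidMiner's seeding is internal to each operator; the function name here is made up for illustration):

```python
import random

def sample_rows(n_rows, k, seed=None):
    # With a fixed seed, the "random" sample is identical on every run
    # and for every user; with seed=None, each run draws differently.
    rng = random.Random(seed)
    return rng.sample(range(n_rows), k)

run1 = sample_rows(1000, 5, seed=1992)
run2 = sample_rows(1000, 5, seed=1992)
assert run1 == run2  # same seed -> same sample, hence same downstream metrics
```

Each seeded operator behaves like `sample_rows` with a fixed `seed`: the splits and samples it draws become deterministic, so two users running the identical process get identical classification matrices and RMSE values.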