Same process different results

HikeFan Member Posts: 3 Contributor I
edited November 2018 in Help

What could be the reason for different users obtaining completely different results (e.g. classification matrix) even though they use the exact same process, same algorithms and the parameters are exactly the same? 

 

I have also noticed something related. I ran an estimation model with linear regression and obtained an RMSE. Then I added a second model to the same process (using the Multiply operator, without changing any parameters), and once I ran the process with both models, the RMSE of the linear regression had changed. What might be the reason for that?

Answers

  • MartinLiebig Administrator, Moderator, Employee, RapidMiner Certified Analyst, RapidMiner Certified Expert, University Professor Posts: 3,517 RM Data Scientist

    Hi,

     

    Have a look at your GLM operator. It should have a boolean parameter for reproducibility. If you run a learner in parallel, some computations may finish sooner or later from one run to the next, and this can change the overall outcome slightly. The same is true for our X-Validation. If you need identical results, disable all parallelism.
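
To illustrate why timing alone can shift a result, here is a minimal Python sketch. It is not RapidMiner code; the helper `add_in_order` and the three partial values are made up for illustration. It relies only on the fact that floating-point addition is not associative, so combining the same partial results in a different arrival order can change the last digits of the total.

```python
def add_in_order(values):
    """Add values one at a time, in the order they arrive."""
    total = 0.0
    for v in values:
        total = total + v
    return total

# Hypothetical partial results produced by three parallel workers.
partials = [0.1, 0.2, 0.3]

# Run A: workers happen to finish in the order 1, 2, 3.
run_a = add_in_order(partials)

# Run B: same workers, same numbers, but they finish in the order 3, 2, 1.
run_b = add_in_order(reversed(partials))

print(run_a)           # 0.6000000000000001
print(run_b)           # 0.6
print(run_a == run_b)  # False
```

A difference this small is usually harmless, but it can be amplified by later steps such as iterative optimization or a threshold comparison.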

     

    Best,

    Martin

    - Sr. Director Data Solutions, Altair RapidMiner -
    Dortmund, Germany
  • Telcontar120 Moderator, RapidMiner Certified Analyst, RapidMiner Certified Expert, Member Posts: 1,635 Unicorn

    You should also make sure that the local random seed is set (any arbitrary number will do) if you want complete reproducibility, either between users or even the same user running the process in different sessions.
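
As a rough illustration of what a fixed seed buys you, here is a small Python sketch. It is not how RapidMiner implements its operators; `shuffled_split` and the seed value are hypothetical. It only shows that a pseudo-random shuffle started from the same seed gives the same train/test split every time.

```python
import random

def shuffled_split(rows, train_fraction, seed=None):
    """Shuffle rows with a private generator, then cut into train and test parts."""
    rng = random.Random(seed)  # seed=None means a different start on every run
    shuffled = list(rows)
    rng.shuffle(shuffled)
    cut = int(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]

rows = list(range(20))

# Two "users" running with the same (arbitrary) seed get identical splits.
split_user_1 = shuffled_split(rows, 0.75, seed=1992)
split_user_2 = shuffled_split(rows, 0.75, seed=1992)
print(split_user_1 == split_user_2)  # True

# Without a seed the split will generally differ between runs and users.
unseeded_train, unseeded_test = shuffled_split(rows, 0.75)
print(len(unseeded_train), len(unseeded_test))  # 15 5
```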

    Brian T.
    Lindon Ventures 
    Data Science Consulting from Certified RapidMiner Experts
  • HikeFan Member Posts: 3 Contributor I

    Thank you. How do I check that?

  • Telcontar120 Moderator, RapidMiner Certified Analyst, RapidMiner Certified Expert, Member Posts: 1,635 Unicorn

    You will need to "show advanced parameters" in your parameters window using the link at the bottom.  Then you will see an option to check a box to use a local random seed, and when you check that, you will be given a box to enter the seed number.  It will look like this:

    [Screenshot: local random seed.PNG]

     

    You will need to do this for any operator that is using any pseudo-random processes (like sampling).  So at a minimum it will be in your cross-validation operator, and you might have other operators (like Sample) in your process that would need it as well.
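
This also suggests one plausible explanation for a result changing when a second model is added to the process. The Python sketch below is a toy model, not RapidMiner internals. It assumes that operators without a local seed draw from one shared, process-wide generator; `run_process`, `sample_rows` and all seed values are hypothetical.

```python
import random

def sample_rows(rng, n_rows, k):
    """Pick k row indices at random, as a sampling or validation step might."""
    return sorted(rng.sample(range(n_rows), k))

def run_process(extra_operator, local_seeds):
    """Return the rows seen by the 'original' operator in a toy process."""
    shared = random.Random(2001)  # assumed process-wide generator
    if extra_operator:
        # The newly added branch consumes random numbers before the original one.
        rng = random.Random(7) if local_seeds else shared
        sample_rows(rng, 100, 10)
    rng = random.Random(42) if local_seeds else shared
    return sample_rows(rng, 100, 10)

# Shared generator: adding an operator shifts the random stream downstream.
print(run_process(False, False) == run_process(True, False))  # False

# Local seeds: the original operator is unaffected by the new one.
print(run_process(False, True) == run_process(True, True))    # True
```

Under that assumption, giving each randomized operator its own local seed isolates it from changes made elsewhere in the process.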

     

    Brian T.
    Lindon Ventures 
    Data Science Consulting from Certified RapidMiner Experts