RapidMiner best practices for multiple customers/models

divya_das Member Posts: 13 Newbie
edited April 22 in Help
Hi,

What are some best practices to follow when using RapidMiner Server for multiple customers?
Suppose I have a RapidMiner Server with the training and prediction processes exposed as web services. The processes are parameterized (using macros) to dynamically handle training/prediction for different customers. There are two models per customer.
1. How should we manage the models when their number becomes huge? Currently, we have one folder per customer.
2. How should we handle the load when multiple prediction processes are invoked at once? Should we use multiple RapidMiner Servers? Is there any mechanism (auto-scaling) to scale the training/prediction processes horizontally?

Regards,
Divya

Answers

  • sgenzer Administrator, Moderator, Employee, RapidMiner Certified Analyst, Community Manager, Member, University Professor, PM Moderator Posts: 2,351  Community Manager
    hi @divya_das

    I'm not the resident expert on RM Server deployment solutions but here are some thoughts for you...

    1. When you say the number of models becomes huge, do you mean that all these models are in production, or are some of them legacy models? If it's the latter, I would certainly archive older models in some storage solution; the models are just file objects that you can archive anywhere.

    Otherwise, if it were me, I'd start making subfolders (you can even automate their creation using Create Directory), I guess. Can you help me understand why there are so many models when there are only two per customer?

    2. For scaling: yes, you only need one RM Server, but you should create unique job agents to scale outwards (see this docs page for an overview of the scaling architecture in RM Server). To handle the load, you can upgrade to High Availability load balancing.
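
    The archiving idea in point 1 could be sketched roughly as follows. This is only an illustration in Python, assuming the models have been exported as plain files under one root folder with a subfolder per customer; the function name, folder names, and age threshold are all hypothetical, and nothing here uses a RapidMiner API.

```python
import shutil
import time
from pathlib import Path


def archive_old_models(model_root, archive_root, max_age_days, now=None):
    """Move model files not modified within max_age_days into archive_root.

    Assumption: archive_root lies outside model_root, and models are plain
    files on disk. The per-customer subfolder layout is preserved.
    Returns the list of destination paths of the archived files.
    """
    model_root = Path(model_root)
    archive_root = Path(archive_root)
    now = time.time() if now is None else now
    cutoff = now - max_age_days * 86400  # seconds per day
    archived = []
    # sorted() materialises the listing so moving files cannot disturb iteration.
    for path in sorted(model_root.rglob("*")):
        if not path.is_file() or path.stat().st_mtime >= cutoff:
            continue  # skip directories and recently modified models
        target = archive_root / path.relative_to(model_root)
        target.parent.mkdir(parents=True, exist_ok=True)
        shutil.move(str(path), str(target))
        archived.append(target)
    return archived
```

    A call such as archive_old_models("models", "archive", 180) would move every file under models/ that has not been modified for 180 days, keeping the same customer subfolder under archive/.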

    Scott
  • divya_das Member Posts: 13 Newbie
    Hi Scott,

    Thanks for the ideas. I will go through the articles you have mentioned.
    We are planning multiple use cases that will be in production, say one for linear regression, one for logistic regression, etc. We will have one model per customer for each use case. For now, we are planning to keep one folder for each customer. But what if we have 100 customers? Then we will end up with 100 folders.

    Thanks,
    Divya
  • mschmitz Administrator, Moderator, Employee, RapidMiner Certified Analyst, RapidMiner Certified Expert, University Professor Posts: 2,059  RM Data Scientist

    I don't see any reason why 100 folders would be an issue. In my opinion, it's the most straightforward way to do this. Any other solution (like a HashMap of models) adds overhead when loading it.
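
    To illustrate why the folder approach stays simple, here is a minimal Python sketch of how a caller might derive the model location from the customer and use case before passing it to the parameterized process as a macro value. The base path, the naming scheme, and the function name are assumptions for illustration only, not part of RapidMiner.

```python
import re
from pathlib import PurePosixPath

# Hypothetical rule: IDs may contain only letters, digits, "_" and "-".
_SAFE_NAME = re.compile(r"[A-Za-z0-9_-]+")


def model_location(base, customer_id, use_case):
    """Build '<base>/<customer_id>/<use_case>' for one customer's model.

    Rejects components that could escape the customer's folder (such as
    '..' or names containing '/').
    """
    for part in (customer_id, use_case):
        if not _SAFE_NAME.fullmatch(part):
            raise ValueError(f"unsafe path component: {part!r}")
    return str(PurePosixPath(base) / customer_id / use_case)


# One folder per customer, one entry per use case inside it.
print(model_location("/models", "customer_042", "logistic_regression"))
```

    With this layout, adding customer number 101 means adding one more folder; no lookup table has to be loaded or kept in sync.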

    Best,
    Martin

    - Head of Data Science Services at RapidMiner -
    Dortmund, Germany
  • divya_das Member Posts: 13 Newbie
    Thanks Martin. I will go with the folder approach, one folder for each customer.