RapidMiner Server 8 will be out soon. What’s in it for you?

jpuente Employee, Member Posts: 53 RM Product Management
edited December 2018 in Knowledge Base

New ways: scale out

The most obvious way in which RapidMiner environments will change with this new release is the option to scale out. If your computational needs exceed the capacity of a single machine, you can now deploy multiple RapidMiner Job Agents across multiple machines or VMs and leverage all your resources. Your environment can now scale both vertically and horizontally.

Job Agents connect to queues and constantly poll them for something to do. This way, RapidMiner works in a grid-like fashion: jobs go to whichever free resources can work on them.
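
The post does not show how the polling works internally. As a rough illustration only, the Python sketch below models the pull-based pattern with the standard library: several hypothetical agents poll one shared in-memory queue, so whichever agent is free picks up the next job. The names `job_agent` and `run_grid` are invented for this example and are not RapidMiner APIs; real Job Agents are separate processes on separate machines, not threads sharing a `queue.Queue`.

```python
import queue
import threading


def job_agent(name, job_queue, results, lock):
    """Hypothetical Job Agent: keeps polling one queue and runs whatever it finds."""
    while True:
        try:
            job = job_queue.get(timeout=0.1)  # poll; wait at most 100 ms for work
        except queue.Empty:
            continue  # idle: ask again
        if job is None:  # sentinel value: stop this agent
            job_queue.task_done()
            return
        output = job()  # a "job" here is just a zero-argument callable
        with lock:
            results.append((name, output))
        job_queue.task_done()


def run_grid(jobs, n_agents=3):
    """Start n_agents polling one shared queue, enqueue the jobs, wait for all of them."""
    job_queue = queue.Queue()
    results = []
    lock = threading.Lock()
    agents = [
        threading.Thread(target=job_agent, args=(f"agent-{i}", job_queue, results, lock))
        for i in range(n_agents)
    ]
    for agent in agents:
        agent.start()
    for job in jobs:
        job_queue.put(job)
    for _ in agents:  # one sentinel per agent, queued after the real jobs
        job_queue.put(None)
    for agent in agents:
        agent.join()
    return results


if __name__ == "__main__":
    results = run_grid([lambda i=i: i * i for i in range(10)])
    print(sorted(output for _, output in results))  # [0, 1, 4, 9, 16, 25, 36, 49, 64, 81]
```

Which agent ends up running which job varies from run to run. In this sketch nothing assigns jobs to specific agents; idle agents simply take the next one, which is what makes the setup grid-like.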

Adding some structure: the new queues

The most basic grid-like configuration has all the available nodes connected to the same queue. But that’s not the only option: in RapidMiner 8.0, queues have acquired a new meaning.

[Image: queues.png]

Each Job Agent can pick up jobs from only one queue, but multiple Job Agents can connect to each queue. With this ‘one queue to many agents’ relationship, one can effectively configure sub-clusters that serve different purposes, which gives administrators a great tool for resource management.
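
A minimal sketch of the ‘one queue to many agents’ rule, assuming a hypothetical agent-to-queue mapping (all agent and queue names are made up for illustration). Storing the assignment as a dictionary keyed by agent enforces that each agent has exactly one queue; inverting it yields the sub-clusters.

```python
from collections import defaultdict

# Hypothetical topology: a dict key maps to exactly one value,
# so every Job Agent is assigned to exactly one queue.
AGENT_QUEUE = {
    "agent-a": "default",
    "agent-b": "default",
    "agent-c": "training",
    "agent-d": "scoring",
    "agent-e": "scoring",
}


def sub_clusters(agent_queue):
    """Invert agent -> queue into queue -> agents; each queue's agents form a sub-cluster."""
    clusters = defaultdict(list)
    for agent, queue_name in agent_queue.items():
        clusters[queue_name].append(agent)
    return {queue_name: sorted(agents) for queue_name, agents in clusters.items()}


if __name__ == "__main__":
    for queue_name, agents in sorted(sub_clusters(AGENT_QUEUE).items()):
        print(f"{queue_name}: {', '.join(agents)}")
```

Running it prints one line per sub-cluster: `default` with agent-a and agent-b, `scoring` with agent-d and agent-e, and `training` with agent-c.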

For example, different teams can have their own sub-clusters, but they can also share a common one. Or, within a group, there might be a standard queue and a high-priority one where only certain users or applications are allowed to send jobs.

Another option is to split the cluster according to the needs of the user processes. Typically, one would send big training processes to a large machine with enough memory (the “training queue”), while lightweight scoring processes go to another sub-cluster with less memory but perhaps more CPUs to take advantage of parallelization (the “scoring queue”).
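
The post does not describe how a job ends up on a particular queue or how access to a queue is enforced. Purely as a hypothetical illustration of the examples above (a restricted high-priority queue, plus separate training and scoring queues), the routing decision could be expressed like this. Every queue name, user name and rule here is invented.

```python
# Hypothetical routing rules; queue names, job kinds and users are invented examples.
QUEUE_FOR_KIND = {
    "training": "training",  # large-memory sub-cluster
    "scoring": "scoring",    # sub-cluster with less memory but more CPUs
}
HIGH_PRIORITY_USERS = {"alice", "nightly-report-app"}


def choose_queue(kind, user, high_priority=False):
    """Return the queue a job should go to, or raise if the user lacks access."""
    if high_priority:
        # Only explicitly allowed users or applications may use the high-priority queue.
        if user not in HIGH_PRIORITY_USERS:
            raise PermissionError(f"{user!r} may not submit to the high-priority queue")
        return "high-priority"
    # Everything else is routed by the kind of process; unknown kinds use a shared queue.
    return QUEUE_FOR_KIND.get(kind, "default")


if __name__ == "__main__":
    print(choose_queue("training", "bob"))  # training
    print(choose_queue("scoring", "alice", high_priority=True))  # high-priority
```

Keeping the rules in plain data (a dictionary and a set) means that reshaping the sub-clusters only changes configuration, not the routing logic.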

One could also have Job Agents specialized in extensions with particular needs, like Keras (Deep Learning), which has specific installation prerequisites.

Reliability and fault tolerance

This model, in which processes run on dedicated local or remote resources, brings another powerful benefit: since every process now runs independently, a process that goes amok can no longer affect the others, which makes the system highly reliable and robust.

Another useful side effect is increased fault tolerance, especially in the execution layer (the Job Agents). A queue becomes fault tolerant by default as soon as more than one Job Agent is connected to it. If one of those agents fails for any reason, the other Job Agents keep picking up jobs from the same queue and users are not affected. Only the job that fails is lost.

[Image: ft.png]
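
To make the failure behaviour concrete, here is a simplified, deterministic simulation (not RapidMiner code) of several Job Agents sharing one queue. The function `simulate` and the `crash_on` parameter, which injects a hypothetical failure of a given agent while it runs a given job, are invented for this illustration.

```python
from collections import deque


def simulate(jobs, agents, crash_on):
    """Deterministic round-robin simulation of Job Agents sharing one queue.

    crash_on maps an agent name to the job during which that agent fails
    (a hypothetical failure injected for illustration).
    Returns (completed, lost, still_queued).
    """
    pending = deque(jobs)
    alive = list(agents)
    completed, lost = [], []
    while pending and alive:
        for agent in list(alive):  # iterate over a copy so failed agents can be removed
            if not pending:
                break
            job = pending.popleft()  # the agent takes the next job from the shared queue
            if crash_on.get(agent) == job:
                lost.append(job)  # only the job in flight on the failed agent is lost
                alive.remove(agent)  # the failed agent stops polling
            else:
                completed.append((agent, job))
    # If every agent has failed, the remaining jobs simply stay queued.
    return completed, lost, list(pending)


if __name__ == "__main__":
    done, lost, queued = simulate(
        ["j1", "j2", "j3", "j4", "j5", "j6"],
        ["agent-1", "agent-2"],
        crash_on={"agent-1": "j3"},
    )
    print(done)
    print(lost, queued)
```

Here agent-1 fails while running j3, so j3 is lost, but agent-2 keeps draining the queue and completes j4, j5 and j6. With only one agent on the queue, the same failure would leave the remaining jobs waiting, which is why fault tolerance requires more than one agent per queue.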

Future outlook

RapidMiner Server 8.0 is just a first step. We still have a lot in store for future releases, like full high availability, centralized configuration and an improved UI. Stay tuned!

Answers

  • 781194025 Member Posts: 32 Contributor I

    wish some of the cool stuff was available to us poor folks :<

  • jpuente Employee, Member Posts: 53 RM Product Management

    The main focus this time is on the Server, which I agree is more interesting for company users. However, the new Server architecture is also available with the free license (with parallelization limitations). And there are a few good things in Studio and Radoop too. Take a look at Studio's release notes:

    http://docs-beta.rapidminer.com/studio/releases/8.0/

    Decision trees now have regression capabilities and a few new features. Also, fuzzy search is there for everyone to try (we're working on better search in general, by the way).
