Get ready for RapidMiner Server 8!
Why a major version?
RapidMiner Server 8.0 comes with a brand-new architecture, delivering some exciting changes: full scalability, more useful queues, improved resource management, a nicer UI and a lot of changes that will open many new use cases. This represents a big leap in what RapidMiner Server can do for you.
So, what does the new architecture look like?
Each blue box represents a separate machine. The big box to the left represents the central RapidMiner Server, which provides the web UI and receives all the user requests. All this is done by the RapidMiner Server central node:
- Scheduling of user jobs (processes).
- User, queue and permissions management.
- Execution of processes scheduled to the local queue (this queue is optional).
- Execution of processes scheduled through web services, web apps or triggers.
The Job Agents (in dark blue) are the new kids on the block. They can be deployed on remote machines or locally on the central node, and each one has to be configured to point to exactly one queue. A Job Agent checks its queue, picks up any pending job, spawns a Job Container and executes the job (the user process designed in Studio) in it. This means increased scalability, plus resource sharing and management among users or projects.
Although scalability is the main new feature, it's still possible to run RapidMiner Server on a single machine with one (or more) local Job Agents executing the jobs. Even if you run everything on a single machine, the new architecture provides better fault tolerance and improved reliability.
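As a rough illustration of the polling behavior described above, here is a minimal in-memory sketch. It is not the Job Agent implementation: the `Job`, `JobQueue` and `JobAgent` classes and all their methods are hypothetical names invented for this example, and "spawning a Job Container" is reduced to a plain method call.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class Job:
    """A user process scheduled for execution (hypothetical model)."""
    name: str
    status: str = "PENDING"


@dataclass
class JobQueue:
    """In-memory stand-in for a Server queue (hypothetical)."""
    name: str
    pending: deque = field(default_factory=deque)

    def submit(self, job: Job) -> None:
        self.pending.append(job)

    def take(self):
        # Return the oldest pending job, or None if the queue is empty.
        return self.pending.popleft() if self.pending else None


@dataclass
class JobAgent:
    """An agent bound to exactly one queue (hypothetical)."""
    name: str
    queue: JobQueue
    finished: list = field(default_factory=list)

    def run_in_container(self, job: Job) -> None:
        # Stand-in for spawning a Job Container: the "execution"
        # only updates the job status and records the job name.
        job.status = "RUNNING"
        job.status = "FINISHED"
        self.finished.append(job.name)

    def poll_once(self) -> bool:
        """Check the queue once and run one job if one is pending."""
        job = self.queue.take()
        if job is None:
            return False
        self.run_in_container(job)
        return True


training = JobQueue("training")
agent_a = JobAgent("agent-a", training)
agent_b = JobAgent("agent-b", training)

for name in ["p1", "p2", "p3"]:
    training.submit(Job(name))

# Both agents poll the same queue in turn until no work is left.
agents = [agent_a, agent_b]
progress = True
while progress:
    progress = False
    for agent in agents:
        progress = agent.poll_once() or progress

print(agent_a.finished, agent_b.finished)
```

Because both agents point to the same queue, the pending jobs end up split between them, while each individual job is handled by only one agent.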
More about each component. What should I install?
There are two components:
- RapidMiner Server
You will be able to download it from our website. The installation process is equivalent to what you know from the older Server versions. During the installation, you will be able to select whether you want a local Job Agent and, if so, what resources (memory and CPUs) will be dedicated to it.
If you already have a Server running, you can migrate it to the new version. There will be two migration options:
- You can migrate all your queues to an equivalent but scaled-out environment. In that case, you’ll need to deploy and configure the Job Agents manually.
- You can select the single-queue option and the installer will “collapse” all your queues into one, so you can keep working on a single machine as usual.
Potentially, you could have several local queues and Job Agents, but each will take up its share of memory. That kind of configuration could be good if you have a big machine that you want to split in a logical way to share its resources among user groups or applications.
- RapidMiner Job Agent
You can download the Job Agent from our downloads page. It is a zip package that you need to decompress wherever you want it to run. You then need to edit its configuration file to point it to the right Server and queue. Alternatively, every time you create a new queue in the Server’s UI, a link appears to download the matching configuration, which you can copy directly into the Job Agent’s folder.
How does it work?
When a user schedules a process from Studio or from the Server's UI, the process is placed into the corresponding queue. Any of the Job Agents connected to that queue can pick up the work and run the process. The RapidMiner Server (and the user, through the UI or Studio) gets notified and logs become available.
The process is fully executed in the Job Agent. It connects to the repository, external data sources or whatever is needed for the process independently. There is no data flow from the Server to the Job Agents.
Queues and Scheduling
Unlike in previous versions, queues are now linked to Job Agents. Queues have user permissions, and the queue a process/job is sent to determines which Job Agents can work on it and how many resources will be available. Many processes can run in parallel if there are enough free resources, but a single process is always run by a single Job Agent.
If no free resources are available when a process is scheduled, it waits in the queue until it's picked up by an available Job Agent.
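The waiting behavior can be illustrated with a small toy model. This is a sketch under stated assumptions, not how the Server works internally: the `Agent` and `QueueScheduler` classes are hypothetical, resources are reduced to a number of job slots per agent, and dispatching is centralized for brevity, whereas the real Job Agents pick up jobs from the queue themselves.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class Agent:
    """Hypothetical Job Agent with a fixed number of job slots."""
    name: str
    slots: int
    running: list = field(default_factory=list)

    def has_free_slot(self) -> bool:
        return len(self.running) < self.slots


class QueueScheduler:
    """Toy model of one queue and the Job Agents attached to it."""

    def __init__(self, agents):
        self.agents = list(agents)
        self.waiting = deque()

    def schedule(self, job: str) -> None:
        self.waiting.append(job)
        self.dispatch()

    def dispatch(self) -> None:
        # Hand waiting jobs, oldest first, to agents with a free slot.
        while self.waiting:
            agent = next((a for a in self.agents if a.has_free_slot()), None)
            if agent is None:
                return  # no free resources: remaining jobs stay queued
            # Each job goes to exactly one agent.
            agent.running.append(self.waiting.popleft())

    def finish(self, agent: Agent, job: str) -> None:
        agent.running.remove(job)
        self.dispatch()  # freed resources may unblock a waiting job


a1 = Agent("agent-1", slots=1)
a2 = Agent("agent-2", slots=1)
scoring = QueueScheduler([a1, a2])

for job in ["job-1", "job-2", "job-3"]:
    scoring.schedule(job)

state_before = (list(a1.running), list(a2.running), list(scoring.waiting))
scoring.finish(a1, "job-1")
state_after = (list(a1.running), list(a2.running), list(scoring.waiting))

print(state_before)
print(state_after)
```

With two single-slot agents, the third job waits in the queue and only starts once one of the first two jobs has finished.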
What doesn't change
Only processes launched or scheduled from Studio or from the Server's UI are executed in the Job Agents. Jobs requested through web services, web apps or triggers are not affected by the architecture change and will continue to run in the central RapidMiner Server.
In summary
In a nutshell, these are the most noticeable differences from the old version:
- Multiple Job Agents can be installed on multiple machines for process execution. You can scale your environment as much as you need.
- Queues now have a clear role in resource management. Each Job Agent is assigned to exactly one queue (but each queue can be connected to multiple Job Agents). Job Agents are configured to use certain resources (memory and processes). Those resources become available for jobs scheduled in the corresponding queue. Therefore, queues are a means to share and limit the system's resources among users.
- The new queues allow you to have dedicated machines or resources for groups of users or specialize in different use cases: training, scoring, text analytics, etc.
- Better resource control also provides a more orderly environment: jobs will run on any Job Agent with free resources connected to the queue. If there are no free resources, jobs are queued.
- Logs stay inside (possibly remote) Job Agents. They can be retrieved from the central RapidMiner Server as long as the Job Agent is running.
- First steps taken towards a fresh new UI design of Studio. Improved process list with filters and the possibility to stop running processes. There is a new UI for creating queues.
- Extensions have to be manually deployed on every Job Agent. Each Job Agent may have a different list of extensions, so it’s possible to create dedicated Job Agents for a particular use case.
- Executions are run on separate JVMs and even separate machines. All processes are fully independent, so a problem in an individual process no longer affects the others and the whole system becomes much more robust and fault tolerant.
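To make the relationship between queues and Job Agents more concrete, the following sketch sums up the resources that a set of agents contributes to each queue. Everything in it is an assumption made for illustration: the `AgentConfig` fields (`memory_mb`, `max_jobs`) and the `capacity_by_queue` helper are hypothetical and do not correspond to actual Job Agent configuration properties.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class AgentConfig:
    """Hypothetical description of one Job Agent."""
    name: str
    queue: str       # each agent is assigned to exactly one queue
    memory_mb: int   # illustrative resource settings
    max_jobs: int


def capacity_by_queue(agents):
    """Sum the resources that the agents contribute to each queue."""
    totals = defaultdict(lambda: {"agents": 0, "memory_mb": 0, "max_jobs": 0})
    for agent in agents:
        entry = totals[agent.queue]
        entry["agents"] += 1
        entry["memory_mb"] += agent.memory_mb
        entry["max_jobs"] += agent.max_jobs
    return dict(totals)


agents = [
    AgentConfig("agent-1", "training", memory_mb=16384, max_jobs=2),
    AgentConfig("agent-2", "training", memory_mb=8192, max_jobs=1),
    AgentConfig("agent-3", "scoring", memory_mb=4096, max_jobs=4),
]
capacity = capacity_by_queue(agents)

for queue_name, totals in capacity.items():
    print(queue_name, totals)
```

In this model, adding another agent to a queue raises that queue's capacity without touching the other queues, which is the sense in which queues share and limit resources among users.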
Future outlook
RapidMiner 8.0 is a big step for scalability and management, but we are just getting started! Take a look at this other post in our company blog. There are more architectural issues that we want to address, like moving the web-services executions to the Job Agents, improving latency and performance, providing a highly available environment, and much more. Stay tuned!
Answers
Hi Jesus:
Thanks for the article about 8.0. I read the installation instructions, but I could not tell if it is OK to have 7.6 and 8.0 (Studio and Server) on the same machine, given how system metadata is handled. Can installations of 7.6 and 8.0 be isolated from each other?
I can guess that you shouldn't run 7.6 and 8.0 at the same time, but can 7.6 and 8.0 co-exist on the same machine, or is it better to put 8.0 on its own machine?
Thanks for considering this question, and best wishes,
Michael Martin ;-)
Hi Michael,
Thanks for your interest. Studio 7.6 and 8.0 can be installed on the same machine without any problem. However, I would recommend not running both at the same time, because they both depend on the same .RapidMiner folder.
The Servers can also run on the same machine as long as:
- They use a different database (or at least a different database schema)
- They are installed in different folders
- They use different ports (or never run at the same time)
Other than that, it's OK to have the beta on the same machine where you have your Server (for test environments, obviously; we wouldn't recommend such a setting in a production environment).
Hope that helps.
Jesús
Thanks, Jesus! Will install and start working with 8.0 in the next day or so.
Best wishes, Michael ;-)