Parallel Processing - Multiple server instances
Dear Community,
I am currently working on my MSc research thesis, which is based on RapidMiner and cloud deployment models. I will therefore probably be bugging you a bit as I research and questions arise.
One of the questions I intend to answer is: can RapidMiner support parallel processing, or have multiple instances (deployed on different servers) collaboratively mine the same data set?
I would appreciate it if you could help me answer these questions and point me in the right direction to explore the various options. While the RapidMiner platform may not directly support this setup, some workarounds, custom configurations, or data mining strategies might.
Any help or feedback is very welcome.
Thanks in advance,
Nicolas
Best Answer
BalazsBarany (Administrator, Moderator, Employee-RapidMiner)
Hi Nicolas,
Studio and Server share the same execution engine. Therefore, processes that run in parallel in Studio also run in parallel on Server.
Distributing data mining algorithms across multiple independent systems is an active research topic. For most machine learning algorithms it's not worth the effort, because network communication latencies would make the entire process much less efficient than executing a smaller number of processes on one system.
There are exceptions, such as building neural networks using TensorFlow with the Deep Learning extension. This can use multiple systems, with or without GPUs, to work on one problem.
There are also related operations that are very easy to parallelize. For example, you can score large datasets in parallel with existing models on multiple servers. You can also use Radoop with a Hadoop cluster to parallelize preprocessing, filtering, etc., and to build a few model types.
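To illustrate why scoring is easy to parallelize, here is a minimal Python sketch (not RapidMiner code). It assumes the rows are independent, so the dataset can be split into chunks, each chunk scored separately, and the results concatenated in order. The names `score_chunk`, `split` and `score_parallel` are hypothetical, the "model" is a toy linear formula, and the thread pool only stands in for requests sent to separate scoring servers.

```python
from concurrent.futures import ThreadPoolExecutor


def score_chunk(rows):
    """Hypothetical stand-in for sending one chunk to a scoring server."""
    # Toy "model": a fixed linear score. A real setup would call a deployed model.
    w_x, w_y = 0.5, -0.25
    return [w_x * x + w_y * y for x, y in rows]


def split(rows, n_chunks):
    """Split rows into at most n_chunks contiguous slices of near-equal size."""
    size = max(1, -(-len(rows) // n_chunks))  # ceiling division, at least 1
    return [rows[i:i + size] for i in range(0, len(rows), size)]


def score_parallel(rows, n_workers=4):
    """Score all rows by handing one chunk to each worker."""
    chunks = split(rows, n_workers)
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        # map() yields results in chunk order, so scores line up with input rows.
        results = list(pool.map(score_chunk, chunks))
    return [score for chunk in results for score in chunk]


rows = [(float(i), 1.0) for i in range(10)]
scores = score_parallel(rows, n_workers=3)
```

Because no chunk needs information from any other chunk, the workers never communicate with each other, which is what keeps this case free of the latency problem described above for distributed model training.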
Regards,
Balázs