"RapidMiner in a Cluster"
Hi, I'm Pedro,
I would like to know how development of support for parallelized processing in a cluster is progressing. I have more than 100 GB of text to process and a cluster of 32 machines available.
Regards.
Answers
One largely unsolved problem in machine learning is that the algorithms for building models are almost impossible to parallelize. And since their runtime is at least quadratic in the amount of data, they cannot be applied to all of your data, even with 3000 machines.
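As a back-of-the-envelope illustration of the quadratic argument: the sketch below assumes an idealized perfect speedup across machines (which real clusters do not reach) and a runtime proportional to n squared. The function name and its parameters are made up for this example.

```python
def data_scale_factor(machines: int, exponent: float = 2.0) -> float:
    """How many times more data `machines` machines could process in the same
    wall-clock time as one machine, assuming runtime ~ n**exponent and a
    perfect (idealized) parallel speedup."""
    # n**exponent / machines == n0**exponent  =>  n / n0 == machines**(1/exponent)
    return machines ** (1.0 / exponent)


# Under these assumptions, 3000 machines buy only about 55 times more data.
print(round(data_scale_factor(3000), 1))
```

So even in the best case, thousands of machines extend the feasible training set by well under two orders of magnitude.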
But if you want to classify this amount of text, you should train your model on one machine. Applying the model is independent for each document, so it can be done without any cluster infrastructure at all: simply start the application process separately on each subset of the data.
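To make the idea concrete, here is a minimal sketch in plain Python, not RapidMiner's API. It assumes a toy word-count classifier as a stand-in for the real model, and it uses threads as a stand-in for separate machines; all function names are hypothetical. The point is only that each shard is scored without looking at any other shard.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def train(labeled_docs):
    """Train a toy word-count model on one machine (stand-in for a real learner)."""
    model = {}
    for text, label in labeled_docs:
        model.setdefault(label, Counter()).update(text.lower().split())
    return model


def classify(model, text):
    """Pick the label whose training documents share the most words with `text`."""
    words = text.lower().split()
    # sorted() makes tie-breaking deterministic (first label alphabetically wins).
    return max(sorted(model), key=lambda label: sum(model[label][w] for w in words))


def split_into_shards(docs, n_shards):
    """Deal documents round-robin into n_shards independent subsets."""
    return [docs[i::n_shards] for i in range(n_shards)]


def apply_to_shard(model, shard):
    """Apply the trained model to one shard; needs nothing from other shards."""
    return [(doc, classify(model, doc)) for doc in shard]


def apply_to_all(model, docs, n_workers=4):
    """Score every shard independently; each worker could be a different machine."""
    shards = split_into_shards(docs, n_workers)
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        results = pool.map(lambda shard: apply_to_shard(model, shard), shards)
    return [pair for shard_result in results for pair in shard_result]


if __name__ == "__main__":
    model = train([("cheap pills buy now", "spam"),
                   ("meeting agenda for monday", "ham")])
    for doc, label in apply_to_all(model, ["buy cheap pills", "monday meeting"], 2):
        print(label, "<-", doc)
```

On a real cluster, the same pattern would mean copying the trained model to each of the 32 machines and running the application step on that machine's share of the files, with no communication between them.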
Greetings,
 Sebastian