Answers
It's really up to you. Choosing a larger size will simply allow your process to run more quickly if the dataset is large. The exception is if you get "out of memory" errors when you run it locally, in which case you may need a larger size to ensure it finishes at all.
Lindon Ventures
Data Science Consulting from Certified RapidMiner Experts
I did some experimenting with this on AWS using an instance with 36 vCPUs. That configuration is basically a dual-CPU Intel server with nine physical cores and 18 threads per CPU, plus lots of memory.
What stood out was that I could only ever get RapidMiner Studio to use one CPU (max 18 threads) in this case. It was not that fast either. I decided that the cloud was not for me after that experience.
regards,
Alex
Hi Alex, and thanks for sharing the results of your testing. I am not sure when it was done, but the number of cores RapidMiner uses depends on which operators are in the process. RapidMiner has recently made progress on parallel processing by enabling more of the most commonly used, processing-intensive operators to parallelize their work. See, for example, this recent announcement about the changes to the cross-validation operator, which was released earlier this month: https://rapidminer.com/new-parallel-cross-validation/
So if your AWS testing was a while ago, you might want to redo it at some point to take advantage of the newer operators. I have tested the new cross-validation operator and it is definitely faster than the prior version.
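To illustrate why cross-validation lends itself to parallel execution, here is a minimal Python sketch. It is not RapidMiner's implementation; the function names (`k_fold_indices`, `evaluate_fold`, `parallel_cross_validation`) and the toy "predict the training mean" model are assumptions made up for this example. The point is only that each fold is evaluated independently, so the folds can be handed to separate workers.

```python
from concurrent.futures import ThreadPoolExecutor
import os


def k_fold_indices(n_rows, k):
    """Split row indices 0..n_rows-1 into k contiguous folds of near-equal size."""
    base, extra = divmod(n_rows, k)
    folds, start = [], 0
    for i in range(k):
        size = base + (1 if i < extra else 0)  # first `extra` folds get one more row
        folds.append(list(range(start, start + size)))
        start += size
    return folds


def evaluate_fold(data, test_idx):
    """Toy model (hypothetical): predict the training mean, return the
    mean absolute error on the held-out fold."""
    held_out = set(test_idx)
    train = [y for i, y in enumerate(data) if i not in held_out]
    prediction = sum(train) / len(train)
    return sum(abs(data[i] - prediction) for i in test_idx) / len(test_idx)


def parallel_cross_validation(data, k=5, max_workers=None):
    """Evaluate all k folds concurrently; no fold depends on another's result."""
    folds = k_fold_indices(len(data), k)
    # Threads keep the sketch simple. For CPU-bound pure-Python work, a
    # ProcessPoolExecutor would be needed to actually use several cores.
    with ThreadPoolExecutor(max_workers=max_workers or os.cpu_count()) as pool:
        return list(pool.map(lambda idx: evaluate_fold(data, idx), folds))


if __name__ == "__main__":
    scores = parallel_cross_validation([float(x) for x in range(20)], k=4)
    print(scores)
```

Because the folds share no state, the results match a plain sequential loop over the folds; only the wall-clock time changes when the workers really run on separate cores.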
Lindon Ventures
Data Science Consulting from Certified RapidMiner Experts