[SOLVED] Hardware Recommendations for Running RapidMiner
Folks,
I will be starting a text mining project and hope to use RapidMiner. The data could get fairly big.
Does anyone have hardware recommendations for running RapidMiner? I may be able to run it on a reasonably good UNIX server.
Is there more value in extra memory, for example? Or in processor power? Or in storage?
I will be using the Community Edition at this point, so I presume that processing will be done in memory rather than in the database, in which case fast storage wouldn't offer any benefit. So my guess is that memory and processor are the things to look at. But there may also be a scale beyond which the benefits stop increasing.
(I posted about this last week but didn't get responses. I've tried to rephrase my query in a clearer way.)
Regards,
John.
Answers
You can run RapidMiner (RM) on fairly complex problems on a laptop.
Some problems may need 2345 GB of RAM,
some will do with 200 MB.
Some problems may need all the time in the universe, some 1 s.
Basically, fritmore is right: the hardware requirements depend heavily on the task at hand. However, some general things can be said:
- most RapidMiner processes can only use one CPU
- if you have large amounts of data, you should consider a machine with a reasonable amount of RAM
- executing long-running processes on your workstation/laptop is at best inconvenient.
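To get a feel for how quickly RAM needs grow with data size in a text mining job, here is a rough back-of-the-envelope estimate as a small Python sketch. It is not based on RapidMiner's internal data structures; it simply assumes each stored value is an 8-byte double and that a sparse layout additionally keeps a 4-byte column index per non-zero entry. The corpus sizes are hypothetical.

```python
# Rough memory estimate for a document-term matrix (word vectors).
# Assumptions (hypothetical, not RapidMiner internals): each stored value is
# an 8-byte double; a sparse layout also stores a 4-byte column index per
# non-zero value. Real memory use will be higher due to object overhead.

BYTES_PER_DOUBLE = 8
BYTES_PER_INDEX = 4


def dense_bytes(n_docs: int, n_terms: int) -> int:
    """Memory if every (document, term) cell is stored, zeros included."""
    return n_docs * n_terms * BYTES_PER_DOUBLE


def sparse_bytes(n_docs: int, avg_terms_per_doc: int) -> int:
    """Memory if only non-zero cells are stored, each with its column index."""
    return n_docs * avg_terms_per_doc * (BYTES_PER_DOUBLE + BYTES_PER_INDEX)


def to_gib(n_bytes: int) -> float:
    """Convert bytes to GiB (2**30 bytes)."""
    return n_bytes / 2**30


if __name__ == "__main__":
    # Hypothetical corpus: 100,000 documents, a 50,000-term vocabulary,
    # and about 200 distinct terms per document.
    docs, vocab, per_doc = 100_000, 50_000, 200
    print(f"dense:  {to_gib(dense_bytes(docs, vocab)):.1f} GiB")    # dense:  37.3 GiB
    print(f"sparse: {to_gib(sparse_bytes(docs, per_doc)):.2f} GiB")  # sparse: 0.22 GiB
```

With these made-up numbers the dense estimate is about 37 GiB while the sparse one is under 1 GiB, which suggests that how the word vectors are represented can matter as much as how much RAM is installed.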
That said, I would recommend setting up RapidAnalytics on your high-performance server. RapidAnalytics offers a repository, i.e. your data and processes are stored on the server, but you can access them from within RapidMiner in the usual way, as if they were in your local repository. That way you can:
- design the processes on your personal laptop/workstation at home or at work
- execute the processes on the RapidAnalytics server with one click from within RapidMiner
- access the results as usual from within RapidMiner
Then it is no problem to shut down your laptop while a process is running, since it is executed on the server, or to design the next process while the previous one is running.
RapidAnalytics can execute several processes at the same time and thus use multiple CPUs. The only limit is the available RAM. So the first priority for your server should be a reasonable amount of RAM, the second fast CPUs.
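The "RAM first, CPUs second" rule can be made concrete with a little arithmetic. The sketch below is a hypothetical helper, not part of RapidAnalytics; it assumes each process occupies one core (most RapidMiner processes are single-threaded, as noted above) and needs a fixed, known amount of RAM, which in practice you would have to measure for your own processes.

```python
def max_concurrent_processes(total_ram_gb: float, ram_per_process_gb: float,
                             cpu_cores: int) -> int:
    """Upper bound on how many processes can run side by side.

    Hypothetical sizing helper. Assumes one core per process and a fixed
    RAM budget per process; ignores OS and server overhead.
    """
    if ram_per_process_gb <= 0:
        raise ValueError("ram_per_process_gb must be positive")
    # How many processes fit into memory at once.
    by_ram = int(total_ram_gb // ram_per_process_gb)
    # Whichever resource runs out first is the bottleneck.
    return max(0, min(cpu_cores, by_ram))


if __name__ == "__main__":
    # Hypothetical server: 16 cores, 64 GB RAM, ~12 GB per process.
    print(max_concurrent_processes(64, 12, 16))  # 5: RAM is the limit
    # Same RAM, small processes (4 GB each) on a 4-core machine.
    print(max_concurrent_processes(64, 4, 4))    # 4: cores are the limit
```

In the first case extra cores beyond five would sit idle, which is one way to see the "scale beyond which benefits stop increasing" from the original question.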
You probably also want to store your data in a database. It could run on the same machine as RapidAnalytics or on another machine. For this I can't give any recommendations without knowing your specific use case and budget.
Best,
Marius
As I understand it, RapidMiner does all of its processing in memory, so the database setup wouldn't affect performance: the data is loaded into memory through one of the read-in nodes, and for all of the processing from then on, the database is out of the equation. Is it fairly accurate to say this?
John.