Best Practice for RapidMiner workstation
severin_glaeser
Dears,
I am just a system administrator, so I know nothing about data mining.
One of our miners will get a replacement for his workstation next year, and I want to know what would be best for him.
What will increase his performance the most?
A) more RAM (currently 16 GB, up to 64 GB is possible)
B) more cores (currently 4C8T)
C) better cores (currently an i7, could be a Xeon)
D) more MHz
E) something completely different
Answers
Dear Severin,
There is no one-size-fits-all answer. In general, the two things one wants are more RAM and more CPU threads. Some algorithms are memory-intensive by default (e.g. FP-Growth), and for those I would definitely go for more RAM. The other question is what the future looks like in terms of data usage. Will he scale up and learn on more data? If yes, RAM.
Otherwise I would go for CPUs. With the recent update, all important operators run in parallel. For those operators, doubling the number of cores roughly halves the runtime.
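As a rough illustration of the rule of thumb above: doubling the cores only halves the runtime when the whole process parallelizes. The sketch below uses Amdahl's law, which is not mentioned in this thread and is brought in here purely as an illustration. The parallel fractions are made-up example values, not measurements of any RapidMiner operator.

```python
def amdahl_speedup(parallel_fraction: float, cores: int) -> float:
    """Theoretical speedup on `cores` cores (Amdahl's law).

    `parallel_fraction` is the share of the single-core runtime that
    can run in parallel; the rest is assumed to stay serial.
    """
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be between 0 and 1")
    if cores < 1:
        raise ValueError("cores must be at least 1")
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / cores)


def gain_from_doubling(parallel_fraction: float, cores: int) -> float:
    """How much faster a process gets when going from `cores` to 2 * `cores`."""
    return amdahl_speedup(parallel_fraction, 2 * cores) / amdahl_speedup(
        parallel_fraction, cores
    )


if __name__ == "__main__":
    # Hypothetical parallel fractions, chosen only for illustration.
    for p in (1.0, 0.9, 0.5):
        gain = gain_from_doubling(p, 4)
        print(f"parallel fraction {p:.0%}: 4 -> 8 cores gives x{gain:.2f}")
```

Under these assumptions, going from 4 to 8 cores gives the full factor of 2 only for a fully parallel process. With 90% parallel work the gain is about 1.5, and with 50% it is about 1.1.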
Knowing your use case, I would argue for 32 GB RAM and more threads.
Best,
Martin
Dortmund, Germany
Dear Severin,
I completely agree with Martin. But here are some additional thoughts:
With more threads you will most likely see the biggest speed-up (as long as your license also supports them).
Of course a better CPU will always have some effect, but not so much that I would say it always has to be bleeding edge.
The best amount of RAM is a bit trickier to decide. As Martin said, some algorithms inherently require more RAM. Also, if you run multiple threads, each one needs some memory of its own. So if your memory is already near its limit, adding more threads won't help much.
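To make the interplay between RAM and threads concrete, here is a minimal back-of-the-envelope sketch. The function name and all numbers (base usage, memory per thread) are hypothetical. Real values would have to be observed on the actual workstation while a typical process runs.

```python
def max_threads_for_memory(
    total_ram_gb: float,
    base_usage_gb: float,
    per_thread_gb: float,
    hardware_threads: int,
) -> int:
    """Number of threads that fit into RAM, capped by the CPU's thread count.

    base_usage_gb: memory used regardless of threading (OS, loaded data set).
    per_thread_gb: additional working memory each thread needs (assumed constant).
    """
    if per_thread_gb <= 0:
        raise ValueError("per_thread_gb must be positive")
    free_gb = total_ram_gb - base_usage_gb
    if free_gb <= 0:
        return 0
    fits_in_memory = int(free_gb // per_thread_gb)
    return min(fits_in_memory, hardware_threads)


if __name__ == "__main__":
    # Hypothetical workload: 6 GB base usage, 2.5 GB per thread, 8 hardware threads.
    for ram_gb in (16, 32, 64):
        usable = max_threads_for_memory(ram_gb, 6, 2.5, 8)
        print(f"{ram_gb} GB RAM: {usable} of 8 threads usable")
```

With these made-up numbers, a 16 GB machine can keep only 4 of its 8 threads busy, so more cores would not help until RAM is added. From 32 GB upwards, the CPU becomes the limit again.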
Another potential factor is of course data access. If your processes require a lot of file or database access, a slow connection or hard drive can be a serious bottleneck. In that case, investing in an SSD might be smart.
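One way to get a feeling for whether the drive is the bottleneck is to time a sequential read of a large file. The following is only a rough sketch: the helper name is made up, and the self-generated test file will usually be served from the operating system's cache. For a meaningful number, point it at a large, real data file that has not been read recently.

```python
import os
import tempfile
import time


def read_throughput_mb_s(path: str, chunk_size: int = 1024 * 1024) -> float:
    """Read `path` sequentially and return the observed throughput in MB/s."""
    total_bytes = 0
    start = time.perf_counter()
    with open(path, "rb") as handle:
        while True:
            chunk = handle.read(chunk_size)
            if not chunk:
                break
            total_bytes += len(chunk)
    elapsed = time.perf_counter() - start
    # Guard against a zero elapsed time on very small files.
    return total_bytes / (1024 * 1024) / max(elapsed, 1e-9)


if __name__ == "__main__":
    # Throwaway 64 MB file, only so the script runs on its own.
    # Because it was just written, the OS cache will inflate the result.
    with tempfile.NamedTemporaryFile(delete=False) as tmp:
        tmp.write(os.urandom(64 * 1024 * 1024))
        test_path = tmp.name
    try:
        print(f"sequential read: {read_throughput_mb_s(test_path):.0f} MB/s")
    finally:
        os.remove(test_path)
```

Comparing the result for the same file on the current hard drive, on an SSD, and on a network share gives a first hint of where the time goes.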
Best,
David
Thank you for your answer.
So I think I will order a notebook with a Xeon CPU and 64 GB RAM (a fully equipped Lenovo ThinkPad P51, I always wanted to order that).
Hi everyone,
I think this thread is a good place to share my experience with my corporate RM setup, which might not be very common but has actually come in very handy for me. @severin_glaeser, I don't know your network configuration, requirements, or policy, but maybe this could be interesting.
So far, I have been running RM Studio in a Windows virtual machine on a dedicated server with two 2.4 GHz Xeons and 16 GB RAM (enough for me now, but easily extendable if needed). My laptop is a MacBook, which connects to the VM via RDP. As the whole setup stays in the same network, there is absolutely no lag or visible latency when working over RDP, even with a VPN connection.
Initially this configuration was offered solely for strict DB security reasons (I cannot connect to the DB from any local workstation that is also connected to the internet), but in the end it proved to be very efficient for a number of reasons:
So far this might not be an answer to your question, but rather a different look at a possible configuration.
Vladimir
http://whatthefraud.wtf
Thank you for your best practice.
I also thought about that idea: get a PowerEdge R630, install Windows 10 (I don't know if Studio will work on a server OS), run it in the datacenter (next to the DB, so almost no latency), and so on.
But on the other hand I always wanted to have a fully equipped P51...
We will see, I will talk with the big data guys :-)