Which operator allows me to use all four logical processors for parallel computing?

jsramirezgo · March 2019

I got a RapidMiner Enterprise Medium license (which lets me use four logical processors); however, I don't know how to use it. That is, I don't know which operator in RapidMiner lets me run parallel computations on more than one logical processor.

I'd really appreciate the help, because this is part of an academic research project and I urgently need to continue my experiment.

Thanks!

sgenzer · March 2019

hi @jsramirezgo - most of the main modeling operators (and many of the ETL ones) are already optimized for parallelization and there is literally nothing you need to do. When you run processes (or Auto Model), you should see your processors all kick in. If you do NOT see them kick in, please let us know.

Scott

jsramirezgo · March 2019

Hi sgenzer, thanks a lot for your reply.

I have two questions regarding your reply:

1. How can I see whether the processors all kick in?

2. Can the "Execute Process" operator, which runs multiple processes, be considered a parallel computing operator?

Thanks!

sgenzer · April 2019

hi @jsramirezgo -

1. How you see your CPU usage really depends on your computer. On my Mac I look at Activity Monitor. I think there are more techie ways to do this, but I will let others who know more chime in. Maybe @Telcontar120? @rfuentealba? @lionelderkrikor?

2. The "Execute Process" operator will simply execute a process somewhere else in RapidMiner. If that process is optimized for parallelization, it will run parallelized.

Hope that helps?

Scott

lionelderkrikor · April 2019

Hi @jsramirezgo,

To answer Scott's question: I'm on Windows, and I use a Rainmeter skin displayed on my desktop that shows the "CPU usage" (with % Core 1, % Core 2, % Core 3, etc.) and the "RAM usage" in %.

Regards,

Lionel

Telcontar120 · April 2019

You can also use the Log operator to capture CPU execution time and memory utilization.

sgenzer · April 2019

🤦‍♂️ #occamsrazor thx @Telcontar120 @lionelderkrikor

jsramirezgo · April 2019

Hello guys.

Thanks a lot for your replies. I figured out how to check the processors, thank you!

Regarding Scott's answer, I have one more question that should help me resolve this for good:

How can I know if a process is optimized for parallelization?

thanks!

sgenzer · April 2019

oh that's easier - just change this parameter (in preferences) and look at the execution time

jsramirezgo · April 2019

Excellent information Scott. I finally solved my questions.

Thank you very much!

jsramirezgo · April 2019

Sorry Scott, I realised it didn't work for me. I understood what you said in theory, and I set up parallel execution in preferences. My hardware has 8 logical processors. However, when I ran my test with the parameter "worker threads for active process=0" I got a certain execution time, and when I ran it with "worker threads for active process=4" I got the same execution time.

I thought the idea of parallel computing was to reduce time, but both scenarios had the same execution time. What went wrong? Do I need a specific operator?

I really appreciate your help. Sorry for being persistent. I just want to clear up my doubts about RM.

jczogalla · April 2019

Hi @jsramirezgo!

Setting the preference to 0 means that all available/allowed processors should be used, so on your license 0 and 4 are effectively the same setting. If you want to test multiple cores versus one core, set the preference to 0 (or 4) and 1 respectively.

Also, for the loop operators that are parallelized, there is an option "enable parallel execution" which lets you decide if you want to execute the loop iterations in parallel or not.

Hope this helps!

Jan
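
The comparison Jan describes can be mimicked outside RapidMiner. The sketch below is only an illustration in Python, not RapidMiner code: `resolve_workers` and `timed_run` are hypothetical helper names, `time.sleep` stands in for the work of one loop iteration, and the rule "0 means everything allowed, larger values are capped at what is allowed" is an assumption based on the description above. It shows why comparing 0 against 4 on a four-processor license gives about the same time, while comparing 0 against 1 does not.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def resolve_workers(setting: int, allowed: int) -> int:
    """Map a thread-count setting to the number of workers actually used.

    Assumption: 0 means "use everything allowed", and any larger value
    is capped at the allowed count.
    """
    if setting < 0:
        raise ValueError("setting must be 0 or a positive integer")
    if setting == 0:
        return allowed
    return min(setting, allowed)


def simulated_iteration(_: int) -> None:
    # Stand-in for one loop iteration; sleeping releases the GIL,
    # so threads can overlap here the way parallel iterations would.
    time.sleep(0.05)


def timed_run(setting: int, allowed: int = 4, iterations: int = 8) -> float:
    """Return the wall-clock seconds needed to finish all iterations."""
    workers = resolve_workers(setting, allowed)
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=workers) as pool:
        list(pool.map(simulated_iteration, range(iterations)))
    return time.perf_counter() - start


if __name__ == "__main__":
    for setting in (0, 4, 1):
        print(f"setting={setting}: {timed_run(setting):.2f} s")
```

On a typical machine the runs for 0 and 4 should report roughly the same time, and the run for 1 a noticeably longer one.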

jsramirezgo · April 2019

Hey Jan!

thanks a lot! Quite clear. Questions solved.

regards.

rfuentealba · April 2019

Hello,

I am late to this thread, so I'll add a few more things that are not closely related to parallel execution of operators, but are related to parallel execution of processes.

I have a MacBook Pro for RapidMiner Studio, and I really don't care about parallel execution on it. However, for my world domination projects, I use 4 MacBook Pros with 12-core i9-9900, each one with an agent configured to run up to 11 parallel tasks. If you have such a setup, use Nagios. It uses the SNMP protocol to monitor the status of the machines, and due to the nature of that protocol, it doesn't take up much network throughput.

To be clear, I'm not running one task in parallel across all these computers (is that feasible? I need it badly) but many different enqueued processes. More often than not, when I need this kind of power, I divide my processes and use the Schedule Process operator to cascade them, or an API with data through RapidMiner Server.

A real case for this: let's say I have 32000 pages from a website that need NLP applied. I convert these to examples and run a Loop Examples, pass the entire data in a POST to the API and finish the process. This creates 32000 requests to the RapidMiner Server, and the results are handled by 44 processes. In my last development project, 42 tasks served by all 4 computers could process nearly 1200 pages per minute, taking only 30 minutes. I did the same task with my good old MacBook Air and it took 7 hours to complete.

If anyone has a better suggestion for me, I'm all ears.

Too bad the MacBooks aren't mine

Just my two cents.

All the best,

Rodrigo.
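
As a quick sanity check on those numbers, here is a back-of-the-envelope sketch in Python. The variable names are made up for illustration, and the figures are simply the ones quoted in the post above.

```python
pages = 32_000                 # pages to process
cluster_rate = 1_200           # pages per minute reported for the 4-machine setup
cluster_minutes = 30           # reported run time on the 4-machine setup
laptop_minutes = 7 * 60        # reported run time on the single MacBook Air

# Run time implied by the reported throughput
estimated_minutes = pages / cluster_rate

# Speedup based on the two reported run times
speedup = laptop_minutes / cluster_minutes

# Throughput implied for the single laptop
laptop_rate = pages / laptop_minutes

print(f"estimated cluster time: {estimated_minutes:.1f} min")   # 26.7 min
print(f"speedup vs. laptop: {speedup:.0f}x")                    # 14x
print(f"laptop throughput: {laptop_rate:.0f} pages/min")        # 76 pages/min
```

The implied run time of about 27 minutes is consistent with the "only 30 minutes" reported, and the two run times correspond to roughly a 14x speedup.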

sgenzer · April 2019

holy cow @rfuentealba looks like you have quite the rig!

SGolbert · April 2019

Hi @rfuentealba

Without knowing much about the use case, I would try to reduce the number of calls to the server. Each call generates a tremendous overhead (if the number of calls remains large, consider using the scoring agent).

One option is to do the web crawling on a separate process (possibly with an external tool), save the pages to a file or in the repository and then have RM Server process the files/dataset on one or more scheduled processes.

Let me know if this helps. If you tell us more, maybe we can come up with more ideas.

Regards,

Sebastian
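
The idea of reducing the number of calls can be sketched as batching: instead of one request per page, group the pages and send one request per group. The Python below is a minimal sketch under assumptions. `make_batches` is a hypothetical helper, the batch size of 100 is arbitrary, and the actual request to the server is deliberately left out because the API in use is not described in this thread.

```python
from typing import Iterator, List, Sequence


def make_batches(items: Sequence[str], batch_size: int) -> Iterator[List[str]]:
    """Yield consecutive slices of `items`, each with at most `batch_size` entries."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    for start in range(0, len(items), batch_size):
        yield list(items[start:start + batch_size])


# Placeholder page identifiers; real input would be the crawled pages.
pages = [f"page-{i}" for i in range(32_000)]

batches = list(make_batches(pages, batch_size=100))

# One request per batch instead of one per page.
print(f"{len(pages)} pages -> {len(batches)} requests")
```

With these placeholder values, 32000 single-page calls become 320 batched calls, so the fixed per-call overhead is paid far less often. The right batch size would depend on payload limits and on how long each batch takes to process.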

SGolbert · April 2019

Hi @jsramirezgo

IMHO the parallelism inside a given process is handled quite well by RapidMiner, so you don't need to do anything (that is a great advantage compared to doing data science in a programming language). The kind of parallelism that would be most useful to you is running several processes at the same time.

Do you know that you can run processes in the background?

That way you can keep working while your experiments run, pretty neat!


Altair RapidMiner Community
