Parallelization Operator
Hi!
Although I only recently started writing RapidMiner Operators, I want to tackle the parallelization issue. I want to write an OperatorChain that runs its inner Operators in parallel. Since I know that parallel execution brings a lot of trouble, I want to work around it: all IOObjects shall be duplicated, and each inner Operator shall run in a separate RapidMiner instance.
I have got two questions:
1. Would anybody besides me use such an Operator?
2. Have I missed anything? Or does anybody see a reason why my idea wouldn't work?
Greetings,
Michael
Answers
This is quite an interesting idea. However, most processes I use (as an example) have a very sequential character, i.e. operator i+1 waits for the output of operator i, or they work on different subsets of the data (e.g. cross-validation). So ... which use cases do you have in mind when you speak of duplication?
regards,
Steffen
I'm sorry, I described it wrongly. I have just edited my earlier post.
What I had in mind was duplicating all Operator input so that I do not have to synchronize anything. Each Operator shall then be executed in its own RapidMiner instance. Therefore I would duplicate the Operators, too, just to avoid trouble when creating the new process instances. Of course every Operator should be executed only once, so in fact there is only one duplicate of each Operator.
The primary use case I had in mind was grid parameter optimization. When you want to optimize your parameters and one of them ranges from 1 to 10, you could split it up into two operators, one with parameters from 1 to 5 and the other from 6 to 10. You would probably have to compare the two results by hand, but that would do.
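To make the split concrete, here is a minimal sketch in Python rather than RapidMiner's own Java API. Everything in it is an assumption for illustration: `evaluate` is a hypothetical stand-in for running the inner operators with one parameter value, threads stand in for separate RapidMiner instances, and a higher score is assumed to be better. Each worker gets its own deep copy of the input, so nothing has to be synchronized, and the comparison of the partial results is done in code instead of by hand.

```python
import copy
from concurrent.futures import ThreadPoolExecutor


def split_range(lo, hi, parts):
    """Split the inclusive integer range [lo, hi] into contiguous chunks."""
    values = list(range(lo, hi + 1))
    size, extra = divmod(len(values), parts)
    chunks, start = [], 0
    for i in range(parts):
        end = start + size + (1 if i < extra else 0)
        if end > start:  # skip empty chunks when parts > len(values)
            chunks.append(values[start:end])
        start = end
    return chunks


def evaluate(param, data):
    """Hypothetical scoring function (assumption, not a RapidMiner call).

    The toy score is highest when param equals the mean of the data.
    """
    target = sum(data) / len(data)
    return 1.0 / (1.0 + abs(param - target))


def search_chunk(chunk, data):
    """Evaluate every parameter in one chunk on a private copy of the input."""
    private = copy.deepcopy(data)  # duplicated input: no shared state
    return max((evaluate(p, private), p) for p in chunk)


def parallel_grid_search(lo, hi, parts, data):
    """Run one worker per chunk and return the best (param, score) overall."""
    chunks = split_range(lo, hi, parts)
    with ThreadPoolExecutor(max_workers=len(chunks)) as pool:
        partial_best = list(pool.map(lambda c: search_chunk(c, data), chunks))
    score, param = max(partial_best)
    return param, score
```

Calling `parallel_grid_search(1, 10, 2, data)` searches 1 to 5 and 6 to 10 in two workers, which mirrors the split described above.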
Best regards,
Michael
Ah, silly me. Here is another thought (I don't want to discourage you, I just like to play devil's advocate):
If you just want to copy all operators, input, etc., you could simply start RapidMiner twice (from the console, with different parameter ranges for GridParameterOptimization) and write out the parameters and final performance vectors to compare them manually (in a separate RapidMiner process).
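The comparison step could also be scripted. The following Python sketch assumes, purely for illustration, that each run writes its results to a CSV file with the columns `parameter` and `performance` and that a higher performance is better. This file layout is hypothetical and not something RapidMiner produces by itself.

```python
import csv
from pathlib import Path


def load_results(path):
    """Read (performance, parameter) pairs from one run's result file.

    Assumed layout (hypothetical): a CSV with a header row containing
    the columns "parameter" and "performance".
    """
    with Path(path).open(newline="") as handle:
        return [
            (float(row["performance"]), row["parameter"])
            for row in csv.DictReader(handle)
        ]


def best_across_runs(paths):
    """Merge the result files of several runs and return the best entry."""
    rows = [
        (perf, param, str(path))
        for path in paths
        for perf, param in load_results(path)
    ]
    if not rows:
        raise ValueError("no results found in the given files")
    perf, param, source = max(rows, key=lambda row: row[0])
    return {"parameter": param, "performance": perf, "source": source}
```

`best_across_runs(["run_1_to_5.csv", "run_6_to_10.csv"])` would then report the winning parameter and which run it came from (the file names are placeholders).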
kind regards,
Steffen
You're absolutely right. That was exactly the reason why I asked the question. I personally would prefer having all parts of my process in one process file.
And perhaps somebody can think of another, even more useful use case.
Greetings,
Michael