"RapidMiner random crashes"
I am having some severe problems with random crashes. I added a Loop Parameters operator to incrementally increase the population size of an Optimize Selection (Evolutionary, Parallel) operator. Nothing extreme, just nine increments from 5 to 50, and I am using six threads. About 50% of memory is still free while the process runs, so memory is not the issue.
The problem is that sometimes it works fine and other times it crashes with an error. I don't get the error if I run the process without Loop Parameters. Here is the exception, in case it helps:
Kind regards,
Alex
Exception: java.util.ConcurrentModificationException
Message: null
Stack trace:
java.util.LinkedList$ListItr.checkForComodification(LinkedList.java:966)
java.util.LinkedList$ListItr.next(LinkedList.java:888)
java.util.AbstractList.hashCode(AbstractList.java:540)
com.rapidminer.example.SimpleAttributes.hashCode(SimpleAttributes.java:84)
com.rapidminer.example.set.AbstractExampleSet.hashCode(AbstractExampleSet.java:366)
com.rapidminer.example.set.SplittedExampleSet.hashCode(SplittedExampleSet.java:174)
com.rapidminer.tools.ReferenceCache$TransparentWeakReference.<init>(ReferenceCache.java:139)
com.rapidminer.tools.ReferenceCache$TransparentWeakReference.<init>(ReferenceCache.java:131)
com.rapidminer.tools.ReferenceCache$Reference.<init>(ReferenceCache.java:71)
com.rapidminer.tools.ReferenceCache$Reference.<init>(ReferenceCache.java:59)
com.rapidminer.tools.ReferenceCache.newReference(ReferenceCache.java:204)
com.rapidminer.operator.ports.impl.AbstractPort.setData(AbstractPort.java:72)
com.rapidminer.operator.ports.impl.OutputPortImpl.deliver(OutputPortImpl.java:56)
com.rapidminer.operator.ModelApplier.doWork(ModelApplier.java:112)
com.rapidminer.operator.Operator.execute(Operator.java:1002)
com.rapidminer.operator.execution.SimpleUnitExecutor.execute(SimpleUnitExecutor.java:50)
com.rapidminer.operator.ExecutionUnit.execute(ExecutionUnit.java:798)
com.rapidminer.operator.validation.ValidationChain.executeEvaluator(ValidationChain.java:234)
com.rapidminer.operator.validation.ValidationChain.evaluate(ValidationChain.java:335)
com.rapidminer.operator.validation.SlidingWindowValidation.estimatePerformance(SlidingWindowValidation.java:141)
com.rapidminer.operator.validation.ValidationChain.doWork(ValidationChain.java:285)
com.rapidminer.operator.Operator.execute(Operator.java:1002)
com.rapidminer.operator.executor.ParallelUnitExecutor$OperatorExecution.run(ParallelUnitExecutor.java:59)
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
java.lang.Thread.run(Thread.java:745)
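The top frames show where the exception is raised: `AbstractList.hashCode` walks a `LinkedList` of attributes with an iterator, and `LinkedList` iterators are fail-fast, so they throw `ConcurrentModificationException` once they notice the list was structurally changed after iteration began. The minimal Java sketch below reproduces that check deterministically in a single thread. In the actual crash the change presumably comes from another worker thread touching the same attribute list; that is an inference from the trace, not something confirmed in this thread. The class and method names are made up for illustration.

```java
import java.util.Arrays;
import java.util.ConcurrentModificationException;
import java.util.LinkedList;
import java.util.List;

public class FailFastDemo {
    // Returns true if changing the list while iterating over it makes the
    // iterator throw ConcurrentModificationException.
    static boolean modifyDuringIteration() {
        List<String> attributes = new LinkedList<>(Arrays.asList("a1", "a2", "a3"));
        try {
            for (String name : attributes) {
                if (name.equals("a1")) {
                    // Structural change: increments the list's modCount, so the
                    // iterator's next call to next() detects the mismatch.
                    attributes.add("generated");
                }
            }
        } catch (ConcurrentModificationException e) {
            return true;
        }
        return false;
    }

    public static void main(String[] args) {
        System.out.println("ConcurrentModificationException thrown: " + modifyDuringIteration());
    }
}
```

Across threads the same check is only best-effort (the JDK documentation says fail-fast behavior cannot be guaranteed under unsynchronized concurrent modification), so whether a given run trips it depends on timing, which fits the "sometimes it works, sometimes it crashes" pattern.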
Answers
Yes, I know these errors. It would be best to submit a feature request to RapidMiner asking for the data sets to be made thread-safe.
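For context on what "thread-safe" could mean for a list like this: one standard Java option for a list that is read far more often than written is `java.util.concurrent.CopyOnWriteArrayList`, whose iterators work on a snapshot and never throw `ConcurrentModificationException`. This is only an illustration of that trade-off (every write copies the backing array); nothing in this thread says RapidMiner uses or plans to use it, and the class and method names below are hypothetical.

```java
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;

public class SnapshotIterationDemo {
    // Adds to a CopyOnWriteArrayList while iterating over it. Returns
    // {elements seen by the iterator, final list size}.
    static int[] iterateWhileAdding() {
        List<String> attributes = new CopyOnWriteArrayList<>(Arrays.asList("a1", "a2", "a3"));
        int seen = 0;
        for (String name : attributes) {
            // Each add copies the backing array; the running iterator keeps
            // reading the snapshot it started with, so no exception is thrown.
            attributes.add(name + "_copy");
            seen++;
        }
        return new int[] {seen, attributes.size()};
    }

    public static void main(String[] args) {
        int[] r = iterateWhileAdding();
        System.out.println("seen=" + r[0] + ", size=" + r[1]);
    }
}
```

The iterator sees only the three original elements while the list itself ends up with six, which is exactly why reads never conflict with writes, and also why writes get expensive on large lists.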
To avoid them, you can usually sacrifice some memory and time by inserting a Materialize Data operator as the first operator in the subprocess. If that does not work, I would rather parallelize the outer loop and use the standard, serial Optimize Selection operator. Unfortunately, you would need our Jackhammer extension for that, as you otherwise don't have a parallel loop. See the link below for more details.
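A rough Java sketch of the second suggestion, assuming the outer loop runs over the ten population sizes 5, 10, ..., 50: the outer loop is parallel, each inner run is serial, and each task works on its own copy of the data, loosely analogous to what inserting Materialize Data is meant to achieve. `runSerialSelection` is a hypothetical placeholder, not a RapidMiner API, and the "score" it returns is meaningless; the sketch also assumes the shared input is never modified while the tasks run.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class ParallelOuterLoop {
    // Hypothetical stand-in for one serial (single-threaded) selection run.
    // It may modify its input freely, because the input is a private copy.
    static double runSerialSelection(List<Double> data, int populationSize) {
        data.add(0.0); // e.g. the inner process adds something to its data
        double sum = 0.0;
        for (double v : data) {
            sum += v;
        }
        return sum * populationSize; // placeholder "score"
    }

    // Parallel outer loop over population sizes 5, 10, ..., 50 (ten values).
    // Assumes `shared` is not modified by anyone while the tasks are running.
    static Map<Integer, Double> run(List<Double> shared, int threads) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            Map<Integer, Future<Double>> futures = new TreeMap<>();
            for (int size = 5; size <= 50; size += 5) {
                final int populationSize = size;
                Future<Double> f = pool.submit(() -> {
                    // Private copy per task, loosely analogous to Materialize Data.
                    List<Double> privateCopy = new ArrayList<>(shared);
                    return runSerialSelection(privateCopy, populationSize);
                });
                futures.put(populationSize, f);
            }
            Map<Integer, Double> results = new TreeMap<>();
            for (Map.Entry<Integer, Future<Double>> e : futures.entrySet()) {
                results.put(e.getKey(), e.getValue().get());
            }
            return results;
        } catch (InterruptedException ex) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for results", ex);
        } catch (ExecutionException ex) {
            throw new IllegalStateException("a selection task failed", ex.getCause());
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println(run(Arrays.asList(1.0, 2.0, 3.0), 4));
    }
}
```

Because each copy is made inside its task, at most `threads` copies are in use at once; that per-task copy is the memory and time cost mentioned above.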
Greetings,
Sebastian
Hi Sebastian,
I tried many things to solve the problem. I finally took out the Loop Parameters operator and set up a subprocess for each parameter value. In my case that meant creating 10 subprocesses, which was not too bad. I have now run the new process many times with Optimize Selection (Evolutionary, Parallel) without any crashes.
It seems that combining a Loop Parameters operator with a threaded operator inside it may work in some cases and crash in others. I will certainly make a feature request on this issue, as it should work better.
Thank you for making me aware of your Jackhammer extension. I will take a look.
Kind regards,
Alex
I finally managed to get my process stable. Even taking out the Loop Parameters operator and creating individual subprocesses did not solve the problem completely: I was still getting random crashes, random in the sense that the process would run three or four times and then crash on the fifth.
Putting a Materialize Data operator inside the Optimize Parameters (Evolutionary) operator seems to solve the problem. The side effect is higher CPU usage: I had to drop from six threads to four, and even with four threads I was hitting more than 80% CPU usage. Processing times have also gone up significantly (about 2x).
To be honest, this fix seems more like a hack than anything else, but I have run out of ideas. The parallel processing extension seems very hit and miss. I could easily justify adding CPUs for this problem, but I am concerned about scalability. Is anyone running processes on workstations with more than 20 threads? Are there any other issues to consider?
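On the thread-count question, a common starting point is to size the worker pool from the number of processors the JVM reports and keep a couple free for the GUI, the OS and the garbage collector, instead of fixing the number by hand. The helper and the reserve of 2 below are hypothetical rules of thumb, not RapidMiner settings, and they say nothing about whether going past 20 threads will actually help; with Materialize Data in place, presumably each extra thread also adds another copy of the data.

```java
public class ThreadBudget {
    // Hypothetical rule of thumb: give the parallel operator all processors
    // except `reserved`, but never fewer than one worker thread.
    static int workerThreads(int availableCores, int reserved) {
        return Math.max(1, availableCores - reserved);
    }

    public static void main(String[] args) {
        // Number of processors available to the JVM (usually logical cores).
        int cores = Runtime.getRuntime().availableProcessors();
        System.out.println(cores + " processors -> " + workerThreads(cores, 2) + " worker threads");
    }
}
```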
Kind regards,
Alex
Dear Alex,
any chance you can send me the process, and maybe even the data, via PM or mail? I am happy to forward this to our developers so they can take a look at it. Of course, I cannot promise anything.
~Martin
Dortmund, Germany
Hi,
I can tell you that we are already aware of this particular issue and looking into it.
Regards,
Marco
Thanks Marco and Martin,
I will note any errors that pop up and send them in, if that helps.
Alex