Process failed exception, any clue?
confusedMonMon
Member Posts: 14 Learner III
I've created a process model that works fine on a sample dataset. However, when I run the process on my whole dataset, it fails. I'm not sure whether it's because of the size of the processed files/documents. Is there a size limit for processed documents in RapidMiner, or is something wrong with the process itself? The exception I'm getting:
- Exception: java.lang.StackOverflowError
- Message: null
- Stack trace:
- sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
- sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
- sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
- java.lang.reflect.Constructor.newInstance(Constructor.java:423)
- java.util.concurrent.ForkJoinTask.getThrowableException(ForkJoinTask.java:598)
- java.util.concurrent.ForkJoinTask.get(ForkJoinTask.java:1005)
- com.rapidminer.studio.concurrency.internal.AbstractConcurrencyContext.collectResults(AbstractConcurrencyContext.java:206)
- com.rapidminer.studio.concurrency.internal.StudioConcurrencyContext.collectResults(StudioConcurrencyContext.java:33)
- com.rapidminer.studio.concurrency.internal.AbstractConcurrencyContext.call(AbstractConcurrencyContext.java:141)
- com.rapidminer.studio.concurrency.internal.StudioConcurrencyContext.call(StudioConcurrencyContext.java:33)
- com.rapidminer.Process.executeRootInPool(Process.java:1349)
- com.rapidminer.Process.execute(Process.java:1314)
- com.rapidminer.Process.run(Process.java:1291)
- com.rapidminer.Process.run(Process.java:1177)
- com.rapidminer.Process.run(Process.java:1130)
- com.rapidminer.Process.run(Process.java:1125)
- com.rapidminer.Process.run(Process.java:1115)
- com.rapidminer.gui.ProcessThread.run(ProcessThread.java:65)
- Cause
- Exception: java.lang.StackOverflowError
- Message: null
- Stack trace:
- java.util.regex.Pattern$Branch.match(Pattern.java:4606)
- java.util.regex.Pattern$GroupHead.match(Pattern.java:4660)
- java.util.regex.Pattern$LazyLoop.match(Pattern.java:4849)
- .........................
Thank you
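For context on the error itself: the Cause section is the telling part. The repeated Pattern$Branch / Pattern$LazyLoop frames show Java's regex engine recursing once per matched repetition, so a long enough document overflows the thread stack no matter how much heap memory is available. A minimal, RapidMiner-independent Java sketch that should reproduce the same error (the pattern and input length are made up for illustration):

```java
import java.util.Arrays;
import java.util.regex.Pattern;

public class RegexStackOverflow {
    public static void main(String[] args) {
        // A very long input document (Java 8 compatible construction).
        char[] chars = new char[1_000_000];
        Arrays.fill(chars, 'a');
        String longText = new String(chars);

        // A branch inside a lazy loop, the same node types as in the trace.
        // java.util.regex matches loops recursively, one stack frame per
        // repetition, so this throws StackOverflowError on a long input,
        // regardless of how much heap (-Xmx) the JVM is given.
        Pattern pattern = Pattern.compile("(a|b)+?");
        System.out.println(pattern.matcher(longText).matches());
    }
}
```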
Best Answer
David_A Administrator, Moderator, Employee-RapidMiner, RMResearcher, Member Posts: 297 RM Research
Hi,
Text processing can be quite memory-expensive. Simply keeping the different strings in memory always has some overhead, and even filtering with a simple regex requires additional memory. So it's really hard to say at which point it will break in your example; all I can say is that it does not look like a general bug on our side.
I would suggest testing whether you can process the text from one file (the largest one). If that works, your memory is in general sufficient for the task, and you can work on reducing the memory footprint of your process. For example:
1) Disable the parallel execution of the Loop Files operator (so that several files are not loaded at once).
2) Store some intermediate results (for example, all the created documents after "Read Documents") and then do the regex filtering for each document independently.
Best,
David
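To make suggestion 2) concrete outside of Studio terms, the shape of the idea in plain Java is to stream one document at a time and apply a cheap contains() check instead of a backtracking regex. This is only an illustrative sketch with placeholder paths and keyword, not a RapidMiner API:

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.stream.Stream;

public class OneFileAtATime {
    public static void main(String[] args) throws IOException {
        // Visit each document independently instead of holding the whole
        // corpus in memory at once.
        try (Stream<Path> files = Files.list(Paths.get("C:/data/documents"))) {
            files.filter(Files::isRegularFile).forEach(OneFileAtATime::filterOne);
        }
    }

    private static void filterOne(Path file) {
        // A plain substring check is far cheaper than a backtracking regex.
        try (Stream<String> lines = Files.lines(file)) {
            long hits = lines.filter(line -> line.contains("keyword")).count();
            System.out.println(file.getFileName() + ": " + hits + " matching lines");
        } catch (IOException e) {
            System.err.println("Skipping " + file + ": " + e.getMessage());
        }
    }
}
```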
Answers
David,
I've tried it again with:
(1) 1.5 GB of available memory, on a different machine;
(2) all Filter Tokens operators with complex regexes removed;
(3) the match condition in the Filter Tokens operator changed to "contains" with a single word.
I still have the same problem. Any clue how to fix it? Thanks
I tried to run the process again on another machine with 24 GB of available memory, got rid of the parallel operators and the regular expressions, and still got the same exception. At one point I tried running it with only one filtering operator and unfortunately got the same exception.
However, what I did to make it work was to extract the zipped files and filter them by type before having RapidMiner process them. I used to have nested loops as part of my process model. Now it seems to work fine.
Thank you again for your help and suggestions.
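For anyone who needs the same pre-processing step, here is a small sketch of the "extract, then keep only one file type" preparation in plain Java; the paths and the .txt filter are placeholders:

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardCopyOption;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;

public class UnzipAndFilter {
    public static void main(String[] args) throws IOException {
        Path target = Paths.get("C:/data/extracted");
        Files.createDirectories(target);

        // Extract the archive, keeping only entries of the wanted type, so a
        // single flat folder of documents is left for RapidMiner to read.
        try (ZipInputStream zip = new ZipInputStream(
                Files.newInputStream(Paths.get("C:/data/archive.zip")))) {
            for (ZipEntry entry; (entry = zip.getNextEntry()) != null; ) {
                if (entry.isDirectory() || !entry.getName().endsWith(".txt")) {
                    continue; // filter by type before any processing
                }
                Path out = target.resolve(Paths.get(entry.getName()).getFileName());
                Files.copy(zip, out, StandardCopyOption.REPLACE_EXISTING);
            }
        }
    }
}
```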
Since I can't run RapidMiner on the whole dataset at once, because it has tens of folders (and the results will be spread over tens of files), I'm looking for a way to automate this. Is there a way to run a RapidMiner process from the command prompt on different small datasets and save the results as we go, instead of adjusting the read-folder and save-file parameters manually in the GUI?
Many thanks
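One possible route: the open-source RapidMiner core is embeddable from plain Java (the com.rapidminer.Process class in the stack trace above is the same entry point), so a small driver program can run the saved process once per folder and hand the paths over as macros. This is only a rough sketch: it assumes the Studio jars are on the classpath, the getContext().addMacro(...) call reflects the classic open-source API and should be checked against your version, the process would have to reference %{input_dir} and %{output_file} in its read/write operators, and all paths are placeholders:

```java
import java.io.File;

import com.rapidminer.Process;
import com.rapidminer.RapidMiner;
import com.rapidminer.RapidMiner.ExecutionMode;
import com.rapidminer.tools.container.Pair;

public class BatchRunner {
    public static void main(String[] args) throws Exception {
        // Headless start-up, no Studio GUI.
        RapidMiner.setExecutionMode(ExecutionMode.COMMAND_LINE);
        RapidMiner.init();

        File[] folders = new File("C:/data").listFiles(File::isDirectory);
        if (folders == null) {
            return; // C:/data missing or not a directory
        }
        for (File dir : folders) {
            // Re-load the saved .rmp process and run it once per data folder,
            // passing the input folder and result path in as macros.
            Process process = new Process(new File("C:/processes/filter.rmp"));
            process.getContext().addMacro(new Pair<>("input_dir", dir.getAbsolutePath()));
            process.getContext().addMacro(new Pair<>("output_file", "C:/results/" + dir.getName() + ".csv"));
            process.run();
        }
    }
}
```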