Memory usage in EvolutionaryWeighting
What data does RM need to retain as it processes multiple generations in an EvolutionaryWeighting node? I have a process that is trying to optimize attribute weights for a NearestNeighbor model, and I'm finding that after a relatively small number of generations (as few as 15), all the memory allocated to RM has been used, and the Java process freezes.
This is on 32-bit Win XP, with a few thousand records in an example set, and about a dozen attributes. The relevant process snippet is:
EvolutionaryWeighting (50 generations, 5 population_size, using intermediate weights file, tournament selection, keep_best_individual)
-MemoryCleanUp (attempt to limit memory usage)
-OperatorChain
--XValidationParallel (5 validations, shuffled sampling)
---NearestNeighbor (k=10, weighted vote)
---OperatorChain
----ModelApplier
----Performance
---ProcessLog (logging generation, best, performance)
My hope was that I'd be able to leave the process running for a long time, potentially days, to let it explore the search space for the best answer, but the reality right now is that RM stalls out. There's no error message, just a lack of processing; even the system time display fails to update. My guess is some sort of Java memory management issue, but I don't know whether the amount of memory in use should grow this high or not.
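To see whether heap usage really is growing monotonically across generations (as opposed to the normal allocate-and-reclaim sawtooth), one option is to sample the JVM heap periodically from the standard `Runtime` API. This is a minimal standalone sketch, not RapidMiner code; the class name and loop are my own:

```java
// Minimal sketch (not part of RapidMiner): sample JVM heap usage so a
// gradual leak shows up as a steadily rising "used" figure over time,
// rather than a flat sawtooth of allocation and reclamation.
public class HeapMonitor {

    // Bytes currently occupied on the heap.
    static long usedBytes() {
        Runtime rt = Runtime.getRuntime();
        return rt.totalMemory() - rt.freeMemory();
    }

    public static void main(String[] args) throws InterruptedException {
        for (int i = 0; i < 3; i++) {
            System.out.printf("used: %d MB of max %d MB%n",
                    usedBytes() / (1024 * 1024),
                    Runtime.getRuntime().maxMemory() / (1024 * 1024));
            Thread.sleep(500);
        }
    }
}
```

Watching that figure across a few generations would show whether memory climbs step-wise with every pass of the inner loop.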
Any ideas?
Thanks,
Keith
Answers
In my experience, my feature selection / weighting processes need much less memory when I don't use the ProcessLog operator ;D
The table view of the Process Log shows 3 values recorded for every member of the population within a generation, i.e., with 10 generations and a population of 5, it records a total of 10 * 5 * 3 = 150 values. That doesn't seem like it could be the cause of the memory growth I'm seeing, unless it is actually creating a lot more data behind the scenes that isn't made visible.
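As a back-of-the-envelope check of that arithmetic (assuming, hypothetically, that each logged value is stored as one 8-byte double), the visible ProcessLog table is far too small to exhaust a 32-bit heap on its own:

```java
// Rough size estimate for the visible ProcessLog table.
// Assumption (not confirmed RapidMiner internals): one 8-byte double
// per logged value.
public class ProcessLogSize {
    public static void main(String[] args) {
        int generations = 10;
        int populationSize = 5;
        int valuesPerIndividual = 3;  // generation, best, performance

        int loggedValues = generations * populationSize * valuesPerIndividual;
        long approxBytes = loggedValues * 8L;

        System.out.println(loggedValues + " values, ~" + approxBytes + " bytes");
        // 150 values, ~1200 bytes
    }
}
```

So if removing ProcessLog helps, the savings would have to come from something other than the table contents themselves.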
I will certainly try turning off logging to see if that helps. However, I'd prefer not to disable process logging, as it's one of the few ways to get any visibility into how RM is progressing during a long process run.
I'm not sure if it is the XValidationParallel operator itself, or the fact that the learner/applier/performance combination ends up being run so many more times that causes the memory usage to grow as high as it does. And I'm still not sure if this is expected behavior or a bug. But at least I know more about what's causing it, and for now I can run the model without cross-validation.
Nevertheless, there might also be a second reason for the high memory usage: there seems to be a GUI-related memory leak in cases where results are displayed at a breakpoint or at the end of the process. In many cases, the resources are not freed (at least not for a long time) after the results have been displayed once. The whole team is currently profiling RapidMiner and searching for this leak, but I have to admit we have not had any success yet. I will let you know as soon as we have found the reason and will deliver an updated version as soon as possible.
For now, I assume that the improvement you saw after removing the ProcessLog was probably related to this GUI memory leak. The additional memory used by the parallel cross-validation is quite normal.
We will keep you updated about that.
Cheers,
Ingo
One further clarification: RM is running on a single-CPU, dual-core system, so the parallel xval should be broken up into 2 subprocesses. After the XValidationParallel node completes, should the memory usage return to its pre-XVal level, or will RM retain some portion of that memory? What I was seeing was a gradual increase in memory usage as XValidationParallel was called multiple times (even after removing ProcessLog), not just a temporary spike during the execution of the node. And this was true even though I included a MemoryCleanUp node in the EvolutionaryWeighting inner loop.
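One way to tell a temporary spike apart from genuinely retained memory is to sample the heap only after requesting garbage collection: memory that survives repeated GC passes is held by live references (a real leak) rather than just waiting to be reclaimed. A sketch under my own assumptions; the simulated "heavy phase" stands in for one cross-validation pass:

```java
// Sketch: distinguishing a temporary allocation spike from memory that
// is genuinely retained after a heavy phase (e.g. one cross-validation
// pass) finishes. System.gc() is only a hint to the JVM, so we request
// it several times before sampling.
public class RetainedMemoryCheck {

    static long usedAfterGc() {
        for (int i = 0; i < 3; i++) {
            System.gc();  // a hint, not a guarantee
            try { Thread.sleep(100); } catch (InterruptedException e) { }
        }
        Runtime rt = Runtime.getRuntime();
        return rt.totalMemory() - rt.freeMemory();
    }

    public static void main(String[] args) {
        long before = usedAfterGc();

        // Simulate one heavy phase: allocate a large scratch buffer,
        // then drop every reference to it.
        int[] scratch = new int[10_000_000];  // roughly 40 MB
        scratch[0] = 1;                       // touch it so it is really used
        scratch = null;

        long after = usedAfterGc();
        System.out.println("retained delta (bytes): " + (after - before));
        // A delta near zero means the phase cleaned up; a delta that
        // grows on every repetition points at retained references.
    }
}
```

Applied to the question above: if each XValidationParallel pass leaves the post-GC figure higher than the last, something is holding references across passes, and no MemoryCleanUp operator can release that.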
Thanks again,
Keith