Problems loading and applying model
Hi
I am trying to read a model from a file and then apply it to a new dataset. The model is quite large (a Weka Random Forest with 3,500 trees). Whenever I try to do that, I get the following error message:
> UserError occured in 1st application of ModelLoader (ModelLoader)
> G Apr 6, 2010 2:26:35 PM: [Fatal] Process failed: Could not read file
> '/home/ruhe/wd/SetC_3500zip.mod': Cannot read from XML stream, wrong
> format: GC overhead limit exceeded
I have already tried writing the model using all three options offered by the ModelWriter. The model is quite large (about 1 GB), but I am running it on a machine with 27 GB of RAM on a small set (20,000 examples, 14 attributes), so I guess this is not a memory problem.
Timbo
Answers
Please consider this link: http://rapid-i.com/wiki/index.php?title=Memory_Issues
regards,
steffen
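As a side note, it is worth double-checking that the heap size configured in the RapidMiner launch script (the JVM's -Xmx option) has actually taken effect. A minimal sketch for verifying this, assuming you can run a small class against the same JVM setup; the class name is illustrative:

```java
// Prints the maximum heap the JVM is allowed to grow to.
// If this is far below the physical 27 GB, the -Xmx setting
// in the launch script is not being picked up.
public class HeapCheck {
    public static void main(String[] args) {
        long maxBytes = Runtime.getRuntime().maxMemory();
        System.out.printf("Max heap: %.2f GB%n",
                maxBytes / (1024.0 * 1024.0 * 1024.0));
    }
}
```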
Thanks for your reply, and thank you also for the link, although it does not really help: the settings are such that the 27 GB of RAM are actually used. I am pretty sure about that, since building the model took about the same amount of memory. If the error message is nonetheless due to a lack of heap space, I might be in trouble...
Timbo
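Note that "GC overhead limit exceeded" is not a format problem in the file itself: the HotSpot JVM throws it when it spends almost all of its time in garbage collection while reclaiming almost no memory, which usually means the heap is nearly exhausted while the XML model is being deserialized (deserialization can temporarily need much more memory than the finished model occupies). Raising -Xmx further, or disabling the check with the HotSpot flag -XX:-UseGCOverheadLimit, are the usual workarounds. A minimal sketch for checking how much time the running JVM spends in GC; the class name is illustrative:

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

// Reports cumulative GC time relative to JVM uptime. A very high share
// is exactly the condition behind "GC overhead limit exceeded".
public class GcOverheadProbe {
    public static void main(String[] args) {
        long gcMillis = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            gcMillis += Math.max(0, gc.getCollectionTime());
        }
        long uptime = ManagementFactory.getRuntimeMXBean().getUptime();
        System.out.printf("GC time: %d ms of %d ms uptime (%.1f%%)%n",
                gcMillis, uptime, 100.0 * gcMillis / Math.max(1, uptime));
    }
}
```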
OK, it is weird that you are able to save the model but not to load it on the same machine (sorry, I didn't notice this detail before; I shouldn't answer posts with a fuzzy mind).
When I get some time to spare, I'll perform some experiments with the Weka Random Forest to reproduce the problem. In the meantime, could you repeat the experiment using the Rapid-I RandomForest implementation ("Random Forest") and tell us whether it works or not?
regards,
steffen
I did the same thing using the RapidMiner RandomForest with the option "information_gain", since it produced the same results as the Weka RF when tested with smaller numbers of trees. Saving, reading, and applying the model went fine. The only problem is that the results produced this way are completely weird: all examples in the set are classified as "1" with confidence(1) = 1.0, while the actual share of examples labeled "1" is only about 50%. To me this looks a lot like some kind of overtraining, which is a bit surprising, as Leo Breiman states in his paper that Random Forests cannot be overtrained at all.
I'll try to perform some more tests and post the results, but due to the large number of trees that might take until Monday.
Timbo
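For what it's worth, the confidence a voting-based random forest reports is typically just the fraction of trees that vote for a class, so confidence(1) = 1.0 on every example would mean every single tree predicts "1". A minimal sketch of that vote counting, with all names illustrative:

```java
import java.util.List;
import java.util.function.Function;

// confidence(c) = (number of trees voting for c) / (number of trees).
// If it is 1.0 for every example, all trees agree everywhere, which points
// at a degenerate forest (e.g. a leaking attribute or identical trees)
// rather than ordinary overtraining.
class ForestVoting {
    static double confidence(List<Function<double[], Integer>> trees,
                             double[] example, int targetClass) {
        long votes = trees.stream()
                          .filter(tree -> tree.apply(example) == targetClass)
                          .count();
        return (double) votes / trees.size();
    }
}
```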