"[SOLVED] Amount of expected memory usage with Read CSV"
Folks,
I'm running RapidMiner 5.0 on a 64-bit Windows machine. All I'm doing is reading in a CSV file consisting of ~500 attributes and 10,000 samples, all doubles. It's approximately 65 MB on disk. When I run it (not connecting the output to anything), the process takes up 3.6 GB of memory. This seems excessive for the small amount of data I'm reading in. Is there something I'm missing? I searched the forum and found a couple of other questions like this, but no answers.
Any help is appreciated!
Thanks,
Dave
Answers
I can't give you any actual numbers, but yes, RapidMiner is quite memory hungry in some cases. However, it often does not actually use all the memory it has claimed from the system; that is a peculiarity of all Java-based programs. This means that if you load additional data, the memory consumption probably won't increase much, because RapidMiner can reuse some of the memory it has already claimed.
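To make the difference between "claimed" and "used" memory concrete, here is a minimal Java sketch. It is not RapidMiner code; the class name `HeapReport` and its helpers are made up for illustration, and it only uses the standard `Runtime` methods. The array has the same shape as the data in the question (10,000 x 500 doubles).

```java
class HeapReport {
    /** Returns {usedBytes, claimedBytes, maxBytes} for the current JVM. */
    static long[] snapshot() {
        Runtime rt = Runtime.getRuntime();
        long claimed = rt.totalMemory();        // heap the JVM currently holds
        long used = claimed - rt.freeMemory();  // part of that heap occupied by objects
        long max = rt.maxMemory();              // upper limit, e.g. as set with -Xmx
        return new long[] {used, claimed, max};
    }

    static double toMiB(long bytes) {
        return bytes / (1024.0 * 1024.0);
    }

    static void print(String label) {
        long[] s = snapshot();
        System.out.printf("%-16s used=%.0f MiB  claimed=%.0f MiB  max=%.0f MiB%n",
                label, toMiB(s[0]), toMiB(s[1]), toMiB(s[2]));
    }

    public static void main(String[] args) {
        print("start");
        double[][] table = new double[10_000][500]; // hypothetical stand-in for the loaded data
        print("after allocation");
        table = null;   // drop the only reference to the data
        System.gc();    // only a hint; the JVM is free to ignore it
        print("after gc hint");
    }
}
```

The exact numbers depend on the JVM and its settings, so none are given here. What a task manager shows for the process is closer to the "claimed" figure than to the "used" one.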
Btw, RapidMiner 5.0 is at least 2 years old. You can download the current version 5.2.9 from our website.
Best, Marius
Thanks,
Dave
I just did some testing here.
I created two .csv files, both with 500 attributes: one with 6,000 examples, the other with 30,000 examples. (I also tried 60,000 examples, but once the file was filled, Notepad++ refused to open it because it was too big.) I then let RapidMiner open both: the first one (55 MB) needed about 150 MB of memory, the second one (275 MB) about 750 MB. Both were opened by the latest RapidMiner development version without any problems (I have 8 GB of RAM on this machine). Note that these .csv files contained only double values.
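As a rough cross-check of these numbers, the sketch below computes how much space the values alone would need if each one were held as a plain 8-byte double, and compares that with the memory observed above. This is an assumption for the sake of the arithmetic, not a statement about how RapidMiner stores data internally; the class and method names are made up, and 1 MB is taken as 10^6 bytes.

```java
class RawSizeEstimate {
    /** Bytes needed for the values alone if each one is stored as a 64-bit double. */
    static long rawDoubleBytes(long attributes, long examples) {
        return attributes * examples * Double.BYTES; // Double.BYTES is 8
    }

    /** How many times larger the observed footprint is than the raw values. */
    static double overheadFactor(double observedMegabytes, long rawBytes) {
        return observedMegabytes * 1_000_000.0 / rawBytes;
    }

    public static void main(String[] args) {
        long small = rawDoubleBytes(500, 6_000);
        long large = rawDoubleBytes(500, 30_000);
        System.out.printf("6000 examples:  raw=%d bytes, factor=%.2f%n",
                small, overheadFactor(150, small));
        System.out.printf("30000 examples: raw=%d bytes, factor=%.2f%n",
                large, overheadFactor(750, large));
    }
}
```

Under that assumption the raw values come to 24 MB and 120 MB, so both files land at roughly 6.25 times the raw size. For the 500 x 10,000 file from the original question, the raw values would be 40 MB.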
Now for the .csv files with strings:
500 attributes, 6,000 examples, each string 26 characters long: 77 MB file, RM needed ~1 GB to load the data.
500 attributes, 30,000 examples, each string 26 characters long: 386 MB file, RM needed ~3.5 GB to load the data.
This leads me to this:
1) Please upgrade RapidMiner to the latest version.
2) If you still run into this kind of problem, please consider a more appropriate way of storing large amounts of data, e.g. a database, or, if you can't switch away from .csv, try using multiple files. A 500 MB .csv file is not the most efficient way of doing things; I couldn't even open it in Notepad++ on my machine.
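If you go the multiple-files route, something along these lines could do the splitting. It is only a sketch with made-up names (`CsvSplitter`, `part-N.csv`), and it assumes the first line is a header and that no quoted field contains a line break. It reads line by line, so the big file is never loaded as a whole.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

class CsvSplitter {
    /**
     * Splits source into part files of at most rowsPerFile data rows each,
     * repeating the header line at the top of every part.
     */
    static List<Path> split(Path source, Path outDir, int rowsPerFile) throws IOException {
        if (rowsPerFile <= 0) {
            throw new IllegalArgumentException("rowsPerFile must be positive");
        }
        List<Path> parts = new ArrayList<>();
        try (BufferedReader in = Files.newBufferedReader(source, StandardCharsets.UTF_8)) {
            String header = in.readLine();
            if (header == null) {
                return parts; // empty input, nothing to write
            }
            BufferedWriter out = null;
            try {
                int rowsInPart = 0;
                String line;
                while ((line = in.readLine()) != null) {
                    if (out == null || rowsInPart == rowsPerFile) {
                        if (out != null) {
                            out.close(); // finish the previous part
                        }
                        Path part = outDir.resolve("part-" + (parts.size() + 1) + ".csv");
                        parts.add(part);
                        out = Files.newBufferedWriter(part, StandardCharsets.UTF_8);
                        out.write(header);
                        out.newLine();
                        rowsInPart = 0;
                    }
                    out.write(line);
                    out.newLine();
                    rowsInPart++;
                }
            } finally {
                if (out != null) {
                    out.close();
                }
            }
        }
        return parts;
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical usage: java CsvSplitter big.csv parts 5000
        Path outDir = Files.createDirectories(Paths.get(args[1]));
        List<Path> parts = split(Paths.get(args[0]), outDir, Integer.parseInt(args[2]));
        System.out.println("Wrote " + parts.size() + " part file(s)");
    }
}
```

Each part can then be read on its own, which keeps the amount of data held in memory at one time smaller.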
Regards,
Marco
I did upgrade just to stay up to date and saw the same general pattern.
Thanks again!
Dave
We're happy that you found the cause of your problems. But I have to say that the first reply to your first post was basically referring to the Java garbage collection mechanism.
Anyway: happy mining!
-Marius