Normalization uses lots of memory?
Hello,
I am using RM for feature selection on a large data set, and the process always runs out of memory in the range normalization operator.
Here are some specs of the data set:
- 7200 examples
- 155335 numerical attributes
- double_sparse_array for data management
- 2.1 GB CSV, about 3.5 GB in memory after the "Read AML" operator
The machine has 16 GB of RAM, and Java is started with -Xmx16000m to make it available. The memory profile looks fine (<6 GB) until the "range transformation" normalization, where memory usage explodes and the process dies. For now I have worked around this by sampling about 2500 examples for further processing.
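A rough back-of-the-envelope sketch (my own numbers: plain 8-byte doubles, no container overhead, and a guessed 25% density for the sparse case, not measured) suggests that a single fully dense copy of this table already approaches the heap limit, so any step that densifies the data pushes it over:

    # Rough memory estimate for the data set described above.
    # Assumptions: 8 bytes per double, no container overhead.
    examples = 7200
    attributes = 155335

    dense_bytes = examples * attributes * 8
    print(f"dense copy:  {dense_bytes / 1024**3:.1f} GiB")   # ~8.3 GiB

    # A sparse layout stores roughly one value (8 B) plus one index (4 B)
    # per non-zero cell; 25% density is an assumed figure.
    density = 0.25
    sparse_bytes = examples * attributes * density * (8 + 4)
    print(f"sparse copy: {sparse_bytes / 1024**3:.1f} GiB")  # ~3.1 GiB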
Does anybody have experience with this kind of problem?
Greetings, Harald
Answers
I assume that your data is sparse, since otherwise you wouldn't have chosen the sparse array data management. Without seeing the actual data I cannot be sure, but I could imagine that the transformation changes the values so that the data becomes much less sparse, and hence the additional memory is needed. Have you tried selecting the parameter "create view" for the normalization operator? In that case the data is transformed on the fly, which should cause no additional memory usage.
Cheers,
Ingo
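To illustrate the densification Ingo describes, here is a minimal plain-Python sketch (not RapidMiner code; the toy values are made up): with range normalization to [0, 1], a cell stored as zero maps to (0 - min) / (max - min), which is non-zero whenever the attribute's minimum is not exactly zero, so a sparse column becomes dense.

    # Toy sparse column: mostly zeros, one negative and one positive value.
    values = [0.0, 0.0, 0.0, -2.0, 5.0]
    lo, hi = min(values), max(values)

    # Range (min-max) normalization to [0, 1].
    normalized = [(v - lo) / (hi - lo) for v in values]
    print(normalized)
    # [0.2857..., 0.2857..., 0.2857..., 0.0, 1.0]
    # The original zeros are now 2/7 ≈ 0.286, so every cell must be stored
    # explicitly and the sparse representation stops saving memory.
    # The "create view" option avoids materializing this dense result by
    # computing the transformed values on the fly when they are requested.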
UPDATE: The "create view" option works perfectly!
Greetings, Harald