HBOS memory issue in RapidMiner Studio 9.3.001
Hi all,
There was a previous thread about this issue, but that did not solve my problem.
I have a dataset with 13 features. I use HBOS from the Anomaly Detection extension, version 2.4.001.
If I sample my dataset down to 100 items and then apply HBOS, Studio still runs out of memory. It uses up to the maximum of 30 GB and then stops with an error after several minutes. It seems Studio spends more time on garbage collection than on the actual algorithm.
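To put the 30 GB figure in perspective, here is a minimal Python sketch of the basic HBOS idea with static, equal-width bins. It is not the extension's implementation: the function name `hbos_scores`, the default of 10 bins, and the assumption of purely numeric data without missing values are all choices made for illustration. Each feature gets one histogram, and a row's score is the sum over features of log(peak bin count / count of the row's bin), so memory grows roughly with rows × features. For 100 rows and 13 features that is a few thousand numbers, which suggests the blow-up is triggered by something in the data rather than by the algorithm's normal workload.

```python
import math

def hbos_scores(rows, n_bins=10):
    """Static-bin HBOS sketch: a higher score means a more unusual row."""
    n_features = len(rows[0])
    scores = [0.0] * len(rows)
    for j in range(n_features):
        column = [row[j] for row in rows]
        lo, hi = min(column), max(column)
        # Equal-width bins; a constant column falls back to width 1.0 (everything in bin 0).
        width = (hi - lo) / n_bins or 1.0
        counts = [0] * n_bins
        bins = []
        for v in column:
            b = min(int((v - lo) / width), n_bins - 1)  # clamp the maximum into the last bin
            bins.append(b)
            counts[b] += 1
        peak = max(counts)
        for i, b in enumerate(bins):
            # Normalising by the tallest bin gives it height 1, so log(peak / count)
            # is 0 for the most populated bin and grows for sparsely populated bins.
            scores[i] += math.log(peak / counts[b])
    return scores
```

For example, `hbos_scores([[1.0, 2.0]] * 9 + [[10.0, 20.0]], n_bins=5)` gives the nine identical rows a score of 0 and the last row a clearly higher score.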
Any helpful suggestions are welcome.
Kind regards,
Maarten
Answers
I think there was a known issue with Date-Time attributes. Can you please check whether your data set contains dates?
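If a copy of the data is also available outside Studio (for example as an export), the same check can be scripted. This is a minimal Python sketch under the assumption that the data is a list of rows plus a list of column names; the helper name `datetime_columns` is hypothetical.

```python
from datetime import date, datetime

def datetime_columns(rows, names):
    """Return the names of columns that contain at least one date or datetime value."""
    found = []
    for j, name in enumerate(names):
        # datetime is a subclass of date, so this isinstance check catches both types.
        if any(isinstance(row[j], (date, datetime)) for row in rows):
            found.append(name)
    return found
```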
Best,
Martin
Dortmund, Germany
It seems something in the dataset is triggering the problem. Also, if I place a Select Attributes operator before HBOS and select one attribute, the HBOS operator still shows all 10 attributes when using the 'single' option. And if I remove attributes using the selector in HBOS and apply the changes, it once again uses all 10 attributes.
Sorry, I'm just coming back to confirm that the bug is caused by the missing values, and that the only thing to do is what you have done: impute the missing values.
Indeed, after some reflection, I think that such an algorithm (outlier detection) cannot natively handle missing values, so the best strategy here (and in general, too) is to impute the missing values, which is what you have done.
Of course, the tricky part is finding the best method (mean, median, etc.) to impute the missing values.
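A minimal Python sketch of column-wise mean/median imputation, applied before any outlier scoring. It assumes the data is a list of numeric rows with missing entries encoded as NaN; the function name `impute` and the fallback of 0.0 for a column with no observed values are illustrative choices, not anything taken from the extension.

```python
import math
from statistics import mean, median

def impute(rows, strategy="median"):
    """Replace NaN entries column by column with the mean or median of the observed values."""
    stat = {"mean": mean, "median": median}[strategy]
    n_features = len(rows[0])
    fill = []
    for j in range(n_features):
        observed = [row[j] for row in rows if not math.isnan(row[j])]
        # A column with no observed values falls back to 0.0 (an arbitrary choice).
        fill.append(stat(observed) if observed else 0.0)
    return [[fill[j] if math.isnan(v) else v for j, v in enumerate(row)] for row in rows]
```

The median is less sensitive to extreme values than the mean, which matters for outlier detection: with the mean, the outliers themselves shift the value used to fill the gaps.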
This is the result of my humble reflection on this subject, but I'm not an expert in outlier/anomaly detection, and I would be happy if someone could add some thoughts and/or correct me if I'm wrong.
Regards,
Lionel
My educated guess is that the condition for handling null values is not missing, but that the null check does not close where it should.
Just to make sure: is the code on GitHub? I can take a look later today.
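To illustrate the kind of bug described above, here is a hypothetical Python sketch; it is not the extension's code, and both function names are made up. In the first version the missing-value guard only covers the range computation, so NaN still reaches the binning loop. In the second, everything after the guard works on the filtered values. Note that Python raises on `int(nan)`, whereas in Java a cast such as `(int) Double.NaN` silently yields 0, so a similar slip in Java code would not necessarily fail loudly.

```python
import math

def bin_counts_leaky(values, n_bins):
    # Hypothetical bug: the missing-value guard covers the range computation only...
    present = [v for v in values if not math.isnan(v)]
    lo, hi = min(present), max(present)
    width = (hi - lo) / n_bins or 1.0
    counts = [0] * n_bins
    for v in values:  # ...but the loop still runs over every value, NaN included
        counts[min(int((v - lo) / width), n_bins - 1)] += 1  # int(nan) raises ValueError
    return counts

def bin_counts(values, n_bins):
    # Fixed: every step after the guard works on the non-missing values only.
    present = [v for v in values if not math.isnan(v)]
    lo, hi = min(present), max(present)
    width = (hi - lo) / n_bins or 1.0
    counts = [0] * n_bins
    for v in present:
        counts[min(int((v - lo) / width), n_bins - 1)] += 1
    return counts
```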
All the best,
Rod.
build.xml indeed mentions 2.4.001 as the version.
Of course, I cannot be sure that this is the source the current extension was built from.