HBOS memory issue in RapidMiner Studio 9.3.001

MaartenK · September 2019

Hi all,

There was a previous thread about this issue, but it did not solve my problem.
I have a dataset with 13 features. I use HBOS from the Anomaly Detection extension, version 2.4.001.
Even if I sample my dataset down to 100 items and then apply HBOS, Studio still runs out of memory. It uses up to the maximum of 30 GB and then stops with an error after several minutes. Studio seems to spend more time on garbage collection than on the actual algorithm.
Any helpful suggestions are welcome.

Kind regards,

Maarten

MaartenK · September 2019

I did some more experimenting and I believe this is a bug in the HBOS operator of the Anomaly Detection extension. If I add HBOS to the process fresh, it runs once. After I apply any change to the model, the behaviour described above occurs.

MartinLiebig · September 2019

Hi @MaartenK,
I think there was a known issue with date-time attributes. Can you please check whether your dataset contains dates?
Best,
Martin

MaartenK · September 2019

Hi mschmitz, the dataset did contain date fields. However, I replaced them with numerical fields using DateToNumerical. The dataset now contains 1 label (polynominal), 3 integer, 2 polynominal and 5 real attributes. It contains 100 items.

It seems something in the dataset is triggering the problem. Also, if I place a Select Attributes operator before HBOS and select 1 attribute, the HBOS operator still shows all 10 attributes when using the 'single' option. And if I remove attributes using the selector in HBOS and apply the changes, it once again uses all 10 attributes.

MartinLiebig · September 2019

Hi,

can you please try adding a 'Materialize' operator right in front of HBOS? That may work.

Best,

Martin

MaartenK · September 2019

Thanks for the swift response. This did not solve my issue. Meanwhile, I have asked permission to share the dataset with you. It is an educational dataset. Once permission is granted, I can share the model and dataset with you for reproduction.

MaartenK · October 2019

I tried a few more steps to reproduce the issue. The problem seems to be triggered by missing values in the dataset. Please find attached 2 models and 2 datasets. The 100-item sample containing missing values triggers the memory issue; the 100-item sample without missing values is processed in a split second.

lionelderkrikor · October 2019

Hi @MaartenK,

Sorry, I'm just confirming that the bug is due to the missing values and that the only thing to do is what you have done: impute the missing values.

Indeed, after some reflection, I think that such an algorithm (outlier detection) cannot natively handle missing values,
so the best strategy here (and in general) is to impute the missing values, which is what you have done.
Of course, the tricky part is finding the best algorithm or method (mean, median, etc.) to impute the missing values.
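A minimal sketch of the mean/median imputation described above, written in Python for brevity. It assumes each attribute is a plain list of floats with missing entries stored as NaN (which is how RapidMiner represents missing numeric values); `impute` is a hypothetical helper, not part of RapidMiner or the extension. Inside Studio the equivalent step is the Replace Missing Values operator.

```python
import math
import statistics


def impute(values, strategy="mean"):
    """Return a copy of `values` with NaN entries replaced by the mean or
    median of the non-missing entries (hypothetical helper, illustration only)."""
    present = [v for v in values if not math.isnan(v)]
    if not present:
        raise ValueError("attribute has no non-missing values to impute from")
    if strategy == "mean":
        fill = statistics.fmean(present)
    elif strategy == "median":
        fill = statistics.median(present)
    else:
        raise ValueError(f"unknown strategy: {strategy!r}")
    return [fill if math.isnan(v) else v for v in values]


column = [1.0, float("nan"), 3.0, 10.0]
print(impute(column, "mean"))    # [1.0, 4.666666666666667, 3.0, 10.0]
print(impute(column, "median"))  # [1.0, 3.0, 3.0, 10.0]
```

The median is less sensitive than the mean to the very outliers HBOS is meant to find, which is one argument for preferring it here. Neither choice is neutral, though: every imputed value is identical, so all of them land in the same histogram bin and make that bin look more "normal" than it really is.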

This is the result of my humble reflection on the subject, but I'm not an expert in outlier/anomaly detection, and I will be happy if someone
can add some thoughts and/or correct me if I'm wrong.

Regards,

Lionel

MartinLiebig · October 2019

This is clearly a bug. The extension is not from RapidMiner but is open source. I tried to get it building with Gradle for... 30 minutes and didn't manage it. So it's tough for us to add the check for missing values.
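The "check for missing values" mentioned above would be a precondition that rejects the input before any histograms are built, so the operator fails fast with a readable message instead of exhausting memory. A minimal Python sketch of the idea, assuming the data arrives as a mapping from attribute name to a list of floats with NaN for missing values; `check_no_missing` is a hypothetical name, and a real fix would live in the extension's Java code.

```python
import math


def check_no_missing(columns):
    """Raise ValueError naming every attribute that contains NaN (missing) values.

    `columns` maps attribute name -> list of floats (hypothetical input layout).
    """
    counts = {
        name: sum(1 for v in values if math.isnan(v))
        for name, values in columns.items()
    }
    offenders = {name: n for name, n in counts.items() if n > 0}
    if offenders:
        details = ", ".join(
            f"{name} ({n} missing)" for name, n in sorted(offenders.items())
        )
        raise ValueError(f"HBOS input contains missing values: {details}")


check_no_missing({"age": [20.0, 31.0], "score": [0.5, 0.7]})  # passes silently
try:
    check_no_missing({"age": [20.0, float("nan")], "score": [0.5, 0.7]})
except ValueError as err:
    print(err)  # HBOS input contains missing values: age (1 missing)
```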

Any Java guru here to help? Maybe @rfuentealba ?

rfuentealba · October 2019

Hello,

An educated guess is that the condition for handling null values is not missing, but that the null check does not close where it should.
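To make the idea of null handling concrete, here is a minimal Python sketch of static-bin HBOS (one equal-width histogram per attribute, bin heights normalized so the tallest bin is 1, and a row's score is the sum of log(1/height) over its attributes) in which NaN values are excluded both when the histograms are built and when rows are scored. This is not the extension's code: the function name `hbos_scores`, the equal-width binning, and the skip-missing policy are all assumptions for illustration.

```python
import math


def hbos_scores(rows, n_bins=5):
    """Static-bin HBOS scores for `rows` (list of equal-length float lists).

    NaN entries are treated as missing: they are left out of the histograms
    and contribute nothing to a row's score (one possible policy, assumed here).
    """
    n_features = len(rows[0])
    histograms = []
    for j in range(n_features):
        present = [r[j] for r in rows if not math.isnan(r[j])]
        if not present:
            histograms.append(None)  # attribute entirely missing: ignore it
            continue
        lo, hi = min(present), max(present)
        width = (hi - lo) / n_bins or 1.0  # constant attribute: everything in bin 0
        counts = [0] * n_bins
        for v in present:
            counts[min(int((v - lo) / width), n_bins - 1)] += 1
        peak = max(counts)
        histograms.append((lo, width, [c / peak for c in counts]))

    scores = []
    for r in rows:
        score = 0.0
        for j, hist in enumerate(histograms):
            if hist is None or math.isnan(r[j]):
                continue  # missing value: skip it explicitly
            lo, width, heights = hist
            b = min(int((r[j] - lo) / width), n_bins - 1)
            score += math.log(1.0 / heights[b])  # height > 0: the row itself is in bin b
        scores.append(score)
    return scores


print(hbos_scores([[1.0, 5.0], [1.1, float("nan")], [1.2, 5.0], [10.0, 5.0]]))
```

Skipping a missing attribute can only lower a row's score, since every term log(1/height) is non-negative; imputing the value instead avoids that bias at the cost of inventing data.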

Just to make sure, is the code on GitHub? I can take a look later today.

All the best,

Rod.

MaartenK · October 2019

Thanks for the support. The source code seems to be here: https://github.com/Markus-Go/rapidminer-anomalydetection
build.xml indeed mentions 2.4.001 as the version.
Of course, I cannot be sure that this is the source the current extension was built from.

MaartenK · October 2019

In the meantime, I emailed Markus Goldstein. He let me know that he currently has a student working on new operators and will have him look into the HBOS preconditions afterwards.

sgenzer · October 2019

@MaartenK if you could please connect me with Markus I would appreciate it!

Altair RapidMiner Community
