Issue found in feature weights of RandomForest for regression

marcin_blachnik · September 2020

It seems that there is an issue, or a bug, in the feature weights returned by the RandomForest operator, but only for regression. I found the problem on one dataset and reproduced it on the Iris dataset, for which features a3 and a4 are the most important. According to the regression RandomForest, however, these two features are the least important.
I evaluated other implementations of RandomForest for regression, and they return the expected weights.

Best regards
Marcin

Telcontar120 · September 2020

I submitted a bug quite some time ago regarding the RandomForest weights. It looks like it may still be uncorrected, and this is another example of the same underlying issue.

marcin_blachnik · September 2020

Hi

I'm surprised that such reports are ignored. Many people use RandomForest weights as a feature importance indicator and make serious decisions based on them.

It would also be nice if someone from RM answered "thank you, we will analyze the reported issue", but there has been no response.

Below I attach another process in which the attribute containing pure noise is the second most important variable according to RapidMiner's implementation of RandomForest (the most important one also seems to be an attribute selected by chance). Because the trees are simple (5 trees of depth 5), one can count how many times each attribute appears as a decision node, and by that count the noise variable is the least important.
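For small forests like this one, the split-count check can be automated. Below is a minimal Python sketch, assuming a hypothetical nested-dict tree representation. It is not RapidMiner's internal model format, and the toy forest is made up rather than taken from the attached process. Every internal node that splits on an attribute adds one to that attribute's count.

```python
from collections import Counter

# Hypothetical tree format (not RapidMiner's): an internal node is
# {"attr": name, "left": subtree, "right": subtree}; a leaf is any non-dict value.

def count_splits(node, counts):
    """Add 1 to counts[attr] for every internal node that splits on attr."""
    if isinstance(node, dict):
        counts[node["attr"]] += 1
        count_splits(node["left"], counts)
        count_splits(node["right"], counts)
    return counts

def split_frequency(forest):
    """Split counts summed over all trees, most frequent attribute first."""
    counts = Counter()
    for tree in forest:
        count_splits(tree, counts)
    return counts.most_common()

# Made-up three-tree forest; leaves hold predicted values.
forest = [
    {"attr": "a3", "left": {"attr": "a4", "left": 1.0, "right": 2.0}, "right": 3.0},
    {"attr": "a4", "left": 0.5, "right": {"attr": "a3", "left": 1.5, "right": 2.5}},
    {"attr": "a5", "left": {"attr": "a3", "left": 0.1, "right": 0.2}, "right": 0.3},
]
print(split_frequency(forest))  # [('a3', 3), ('a4', 2), ('a5', 1)]
```

Split frequency ignores how much each split reduces the error, so it is only a rough cross-check. Still, a large disagreement with the reported weights is a useful warning sign.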

MartinLiebig · September 2020

Hi,

I have the odd feeling that the weight generation does not take the number of examples into account but just sums the gain of each node. Would this explain the behaviour?

~Martin
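To illustrate this hypothesis, here is a minimal Python sketch of impurity-decrease (variance-reduction) importance for a single regression tree. It computes the importance with and without weighting each node's gain by the fraction of training examples that reach that node. Common mean-decrease-in-impurity implementations use the sample-fraction weighting. The tree representation (leaves as lists of label values) and the toy numbers are hypothetical, and this is not RapidMiner's code.

```python
from collections import defaultdict

# Hypothetical tree format: an internal node is {"attr": name, "left": ..., "right": ...};
# a leaf is a list of the label values of the training examples that reach it.

def labels(node):
    """All label values of the examples reaching this node."""
    if isinstance(node, dict):
        return labels(node["left"]) + labels(node["right"])
    return list(node)

def variance(ys):
    """Population variance, the usual least-squares impurity for regression."""
    mean = sum(ys) / len(ys)
    return sum((y - mean) ** 2 for y in ys) / len(ys)

def importances(tree, weighted=True):
    """Sum each split's variance reduction per attribute.

    weighted=True scales every node's gain by n_node / n_total;
    weighted=False just sums the raw gains.
    """
    n_total = len(labels(tree))
    scores = defaultdict(float)

    def visit(node):
        if not isinstance(node, dict):
            return  # leaves contribute nothing
        ys = labels(node)
        left, right = labels(node["left"]), labels(node["right"])
        n = len(ys)
        gain = (variance(ys)
                - len(left) / n * variance(left)
                - len(right) / n * variance(right))
        scores[node["attr"]] += gain * n / n_total if weighted else gain
        visit(node["left"])
        visit(node["right"])

    visit(tree)
    return dict(scores)

# Toy tree: a3 separates six examples from two; a5 (noise) then splits those two.
tree = {"attr": "a3",
        "left": [1, 1, 1, 1, 1, 1],
        "right": {"attr": "a5", "left": [4], "right": [10]}}

print(importances(tree, weighted=False))  # {'a3': 6.75, 'a5': 9.0}
print(importances(tree, weighted=True))   # {'a3': 6.75, 'a5': 2.25}
```

In this toy tree, the noise split on a5 separates only two examples. Without weighting, its gain of 9.0 still outranks the root split on a3. With weighting, it drops to 2.25.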

marcin_blachnik · September 2020

Hi

I haven't checked the source code, but I have a feeling that the problem is deeper. In the example from my previous post, where the Random Forest consists of 5 trees, the noise attribute A5 appears only twice in the trees, while A3 and A4 appear most often. For classification the weights work correctly, so I think the problem may be related to the criterion and its properties.

Nevertheless, it would be great if RM corrected it in the upcoming release.

Best regards

gmeier · January 2021

Hi @marcin_blachnik,

Thank you for the bug report. We found the problem and fixed it. It will be part of the next release.

Altair RapidMiner Community

GET HELP. LEARN BEST PRACTICES. NETWORK WITH YOUR PEERS.

Issue found in feature weight of RandomForest for regression

Answers