Implementation of Random Forest (versus Decision Forest?)
Hi all,
I'm wondering how the RapidMiner RandomForest classifier is implemented. It seems to me that it differs significantly from Breiman's version (Breiman, L. 2001: Random Forests. Machine Learning, 45, 5–32).
Main features of Random Forests are:
- each tree is grown on its own bootstrap sample
- at each node of a tree, a fixed number of features is randomly selected and evaluated for the best split
Does the RapidMiner RandomForest classifier work like that? Are the individual trees grown on bootstrap samples? And I suspect that the feature subset is chosen once for the whole tree, not for each node (?). If so, it would rather resemble the "Decision Forest" of Ho (Ho, T.K. 1998: The Random Subspace Method for Constructing Decision Forests. IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 20, No. 8, August 1998).
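To make the distinction concrete, here is a minimal sketch of the two feature-sampling schemes. It is not taken from RapidMiner or WEKA; the function names and the flat "list of nodes" view of a tree are assumptions made purely for illustration.

```python
import random

def features_per_tree(n_features, k, n_nodes, rng):
    """Random subspace style (Ho): draw one subset of k features
    and reuse it at every node of the tree."""
    subset = sorted(rng.sample(range(n_features), k))
    return [subset for _ in range(n_nodes)]

def features_per_node(n_features, k, n_nodes, rng):
    """Breiman style: draw a fresh subset of k features at every node."""
    return [sorted(rng.sample(range(n_features), k)) for _ in range(n_nodes)]

if __name__ == "__main__":
    rng = random.Random(42)
    # Hypothetical tree with 5 split nodes, 10 features, 3 candidates per split.
    print("per tree:", features_per_tree(10, 3, 5, rng))
    print("per node:", features_per_node(10, 3, 5, rng))
```

With per-tree sampling, a single tree can never split on more than k distinct features; with per-node sampling, every feature has a chance to be considered somewhere in the tree.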
The WEKA version of the Random Forest classifier seems to follow Breiman's concept, I guess, so it could be used instead. However, the "Weight by Tree Importance" operator, which I would like to use, does not work with the WEKA version.
Thanks in advance.
Ollestrat
Answers
It would be great to get some help, as I am completely lost at the code level.
I have forwarded your question to one of our developers, maybe they can tell us more.
However, I can at least say something about the bootstrapping: No, this would only happen with a very small probability for decent data sets. Bootstrapping basically just means sampling with replacement, where the sample size is most often the size of the original data set. If you use a sample ratio of 1 for a data set consisting of n examples, you will end up with n examples, but several of them might be drawn more than once. On average, about 63% of the distinct examples of the original set end up in the sample; the rest are not part of it (but probably will be for another tree).
I am not sure where the random attribute sets are drawn (I think it's per node, but it might also be per tree). Maybe one of our developers can look it up (I would do this myself, but I am currently not in my office...)
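The 63% figure can be checked with a few lines of code. The following is a small sketch with hypothetical helper names, independent of any RapidMiner code: for a sample ratio of 1, the expected fraction of distinct examples is 1 - (1 - 1/n)^n, which approaches 1 - 1/e (roughly 0.632) as n grows.

```python
import math
import random

def bootstrap_unique_fraction(n, seed=0):
    """Draw n indices with replacement from range(n) and return the
    fraction of distinct original indices that appear in the sample."""
    rng = random.Random(seed)
    sample = [rng.randrange(n) for _ in range(n)]
    return len(set(sample)) / n

def expected_unique_fraction(n):
    """Probability that a given example is drawn at least once in n draws."""
    return 1.0 - (1.0 - 1.0 / n) ** n

if __name__ == "__main__":
    n = 10_000
    print("simulated:", bootstrap_unique_fraction(n))
    print("expected :", expected_unique_fraction(n))
    print("limit    :", 1.0 - math.exp(-1.0))
```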
Cheers,
Ingo