[SOLVED] Weights in a weka random forest model
misanthropic789
Member Posts: 8 Contributor II
First off, apologies if any of this sounds ignorant. I'd appreciate any insight, as I am making the transition from working with SPSS doing logistic regression prediction models to exploring other types of models in rapidminer.
In a logistic regression model you are given the coefficients, which can be used to approximate the importance of each element in the model. You can use them to tell whether, for example, predictor A is 3x as important as predictor B. My management has gotten used to seeing that type of information.
I am now working with rapidminer and have hit upon a weka random forest model that is substantially better at predicting (per 10-fold x-validation testing) than my original logistic regression model. However my management is asking for some comparable indicator of comparative importance. Can anyone suggest an appropriate way to get that type of information? What objects? What should I be looking for?
Rebecca
Answers
A big disadvantage of Random Forests is in fact that they do not produce an easily interpretable model (with respect to information about the data). A Random Forest basically builds a number of decision trees on different random subsets of the input data. On application it applies all trees and returns the result of a majority vote over them. Thus it is de facto impossible to learn anything from the model itself, even if it often produces good prediction results.
Best, Marius
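To make the "many trees on random subsets, then a majority vote" description above concrete, here is a minimal sketch in plain Python. It is not Weka's or RapidMiner's implementation: each "tree" is reduced to a single split on one randomly chosen attribute, and all names (train_stump, train_forest, predict_forest) and the toy data are hypothetical, chosen only for illustration.

```python
import random
from collections import Counter


def majority(labels):
    """Return the most frequent label in a non-empty list."""
    return Counter(labels).most_common(1)[0][0]


def train_stump(rows, labels, feature):
    """Fit a one-split 'tree' on one attribute, splitting at the sample mean."""
    threshold = sum(row[feature] for row in rows) / len(rows)
    left = [y for row, y in zip(rows, labels) if row[feature] <= threshold]
    right = [y for row, y in zip(rows, labels) if row[feature] > threshold]
    fallback = majority(labels)
    return (feature, threshold,
            majority(left) if left else fallback,
            majority(right) if right else fallback)


def predict_stump(stump, row):
    feature, threshold, left_label, right_label = stump
    return left_label if row[feature] <= threshold else right_label


def train_forest(rows, labels, n_trees=25, seed=0):
    """Train each stump on a bootstrap sample and a randomly chosen attribute."""
    rng = random.Random(seed)
    forest = []
    for _ in range(n_trees):
        # Bootstrap: draw len(rows) examples with replacement.
        idx = [rng.randrange(len(rows)) for _ in rows]
        feature = rng.randrange(len(rows[0]))
        forest.append(train_stump([rows[i] for i in idx],
                                  [labels[i] for i in idx], feature))
    return forest


def predict_forest(forest, row):
    """Apply every tree and return the majority vote."""
    return majority([predict_stump(stump, row) for stump in forest])


# Toy data (made up): two numeric attributes, two well-separated classes.
rows = [(1.0, 3.0), (2.0, 1.0), (3.0, 2.0),
        (101.0, 103.0), (102.0, 101.0), (103.0, 102.0)]
labels = ["no", "no", "no", "yes", "yes", "yes"]

forest = train_forest(rows, labels)
print(predict_forest(forest, (2.0, 1.0)))
print(predict_forest(forest, (102.0, 101.0)))
```

Even in this tiny version the prediction is spread over 25 separate splits, which illustrates why there is no single set of coefficients to read off as in a logistic regression.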
My current plan is to include a Weight by Information Gain step in the run to get some comparative information for that purpose, even if it isn't quite the same as the final model. That should satisfy management while still giving me a chance to use the best model for my project.
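For readers wondering what such a weighting step computes, here is a rough sketch of information gain per attribute in plain Python. It is not the RapidMiner operator itself: it handles nominal attributes only, the column names and data are made up, and dividing by the largest gain to get relative weights is a choice of this sketch rather than something taken from the operator.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((count / total) * math.log2(count / total)
                for count in Counter(labels).values())


def information_gain(values, labels):
    """Label entropy minus the entropy left after splitting on one nominal attribute."""
    total = len(labels)
    groups = {}
    for value, label in zip(values, labels):
        groups.setdefault(value, []).append(label)
    remainder = sum(len(group) / total * entropy(group)
                    for group in groups.values())
    return entropy(labels) - remainder


def weight_by_information_gain(table, label_name):
    """Return each attribute's gain relative to the largest gain (sketch-only scaling)."""
    labels = table[label_name]
    gains = {name: information_gain(column, labels)
             for name, column in table.items() if name != label_name}
    top = max(gains.values())
    return {name: (gain / top if top > 0 else 0.0)
            for name, gain in gains.items()}


# Hypothetical example set: "contract" separates the label perfectly, "region" not at all.
table = {
    "contract": ["monthly", "monthly", "yearly", "yearly"],
    "region": ["north", "south", "north", "south"],
    "churn": ["yes", "yes", "no", "no"],
}

weights = weight_by_information_gain(table, "churn")
for name, weight in sorted(weights.items(), key=lambda item: -item[1]):
    print(name, round(weight, 3))
```

These weights score each attribute on its own against the label, so, as noted above, they are only a rough companion to the forest and not a description of what the forest actually learned.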