Gradient Boosted Tree Algorithm performance
I am working with the gradient boosted tree (GBT) algorithm, and under 5-fold cross-validation it performs better on most of my datasets, with high metrics such as AUC (1.0) and kappa (0.971). I can relate these results to GBT's capabilities, such as regularization and sequential learning. I also set aside 30 percent of the data for testing after the five-fold cross-validation and got a kappa of 0.974 on this unseen data.
My question is: are there any cautions or factors to consider when using a GBT and interpreting its results, and how well does GBT perform in real applications?
Thanks
Regards,
Varun
https://www.varunmandalapu.com/
Be Safe. Follow precautions and Maintain Social Distancing
Best Answers
Telcontar120 (RapidMiner Certified Analyst, RapidMiner Certified Expert):
GBTs are great in terms of predictions. In terms of interpretability, I think they are somewhat harder to work with, because the trees are boosted and not independent (so less interpretable than a Random Forest, in my view). But as long as you use other ways to communicate model results (including some of the great tools in RapidMiner, such as simulation and explaining predictions), they are fine.
You mentioned an AUC of 1.0, which is essentially perfect separation, so also make sure that you don't have any data leakage or sample contamination going on. Nothing is worse than deploying a model in production and watching its performance collapse!
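One quick check for the sample contamination mentioned above, sketched in Python under the assumption that the training and test feature rows are available as plain lists. The function name and toy data are hypothetical and this is not a RapidMiner feature. It counts test rows that are exact copies of training rows; it only catches verbatim duplicates, so near-duplicates and leaky features need separate checks.

```python
def contaminated_rows(train_rows, test_rows):
    """Return indices of test rows whose feature values also appear verbatim in training."""
    # Store each training row as a hashable tuple so membership checks are O(1).
    seen = {tuple(row) for row in train_rows}
    # Report every test row that is an exact copy of some training row.
    return [i for i, row in enumerate(test_rows) if tuple(row) in seen]


if __name__ == "__main__":
    # Hypothetical toy feature rows, for illustration only.
    train = [[1.0, 0.5, 3], [2.0, 0.1, 7], [0.3, 0.9, 1]]
    test = [[2.0, 0.1, 7], [4.0, 0.2, 5]]
    print(contaminated_rows(train, test))  # → [0]
```

A non-empty result on a supposedly unseen hold-out set is a strong hint that the near-perfect AUC is at least partly an artifact.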
MartinLiebig (Administrator, Moderator, Employee-RapidMiner, RapidMiner Certified Analyst, RapidMiner Certified Expert, University Professor):
@varunm1, sorry, I am a bit busy. But to clarify: are you sure that each ID is really independent of the others? Are these really different customers, different machines, etc.? Are they NOT correlated examples, like the same customer in different years or an item produced in the same batch as others?
Best,
Martin
Sr. Director Data Solutions, Altair RapidMiner
Dortmund, Germany
Answers
https://towardsdatascience.com/when-cross-validation-fails-9bd5a57f07b5
This approach is especially useful when the dataset contains multiple samples per subject.
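Why independent IDs matter can be sketched in a few lines of Python. This is a minimal stdlib-only illustration, assuming each row carries a subject ID; grouped_kfold and shared_subjects are hypothetical helpers, not RapidMiner operators. Whole subjects are assigned to folds, so no subject contributes rows to both the training and the test side, whereas a row-level split that ignores subjects can put the same subject on both sides and inflate the metrics.

```python
import random


def grouped_kfold(subject_ids, k, seed=0):
    """Yield (train_idx, test_idx) pairs in which each subject falls in exactly one test fold."""
    subjects = sorted(set(subject_ids))
    if k > len(subjects):
        raise ValueError("k cannot exceed the number of distinct subjects")
    # Shuffle subjects (not rows) so fold membership is random but reproducible.
    random.Random(seed).shuffle(subjects)
    # Deal subjects round-robin so fold sizes (counted in subjects) differ by at most one.
    fold_of = {s: i % k for i, s in enumerate(subjects)}
    for fold in range(k):
        test_idx = [i for i, s in enumerate(subject_ids) if fold_of[s] == fold]
        train_idx = [i for i, s in enumerate(subject_ids) if fold_of[s] != fold]
        yield train_idx, test_idx


def shared_subjects(subject_ids, train_idx, test_idx):
    """Return the subjects that have rows on both sides of a split (should be empty)."""
    return {subject_ids[i] for i in train_idx} & {subject_ids[i] for i in test_idx}


if __name__ == "__main__":
    # Hypothetical data: several rows (samples) per subject.
    ids = ["a", "a", "a", "b", "b", "c", "c", "c", "d", "e"]
    for train_idx, test_idx in grouped_kfold(ids, k=3):
        print(sorted({ids[i] for i in test_idx}), shared_subjects(ids, train_idx, test_idx))
    # A row-level split that ignores subjects: rows 0-6 for training, rows 7-9 for testing.
    print(shared_subjects(ids, list(range(7)), [7, 8, 9]))  # → {'c'}
```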
Thanks a lot for your support.
Ingo
Lindon Ventures
Data Science Consulting from Certified RapidMiner Experts
Thanks for your response; it looks like it does the same thing. I tested the CV with "split on batch attribute", and the performance metrics are the same as with the process Ingo provided. Any suggestions for running a similar CV with a fixed number of folds (5 or 10) rather than testing on each individual batch? Once I select "split on batch attribute" in the CV, the option for the number of folds disappears.
Sorry if it is confusing.
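You can use Generate Attributes with
batchid = id % 5
then use Set Role to give this attribute the role "batch", and use the batch option of the cross-validation. Because batchid is derived from id, all rows with the same ID end up in the same batch, and there are at most five batch values, so you get 5 folds.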
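The same recipe outside RapidMiner, as a rough Python sketch assuming integer IDs (batch_folds is a hypothetical helper, not a RapidMiner API). Because the batch value is computed from the ID, all rows of one ID share a batch, and since id % 5 has only five possible values, splitting on the batch yields at most five folds instead of one fold per original batch.

```python
from collections import defaultdict


def batch_folds(ids, k=5):
    """Mimic `batchid = id % k` followed by a CV that uses each batch value as one test fold."""
    # Equivalent of the generated "batch" attribute: one value per row, derived from its ID.
    batch = [subject_id % k for subject_id in ids]
    rows_by_batch = defaultdict(list)
    for row, b in enumerate(batch):
        rows_by_batch[b].append(row)
    folds = []
    # Only residues that actually occur become folds, so there are at most k folds.
    for b in sorted(rows_by_batch):
        test_idx = rows_by_batch[b]
        train_idx = [row for row in range(len(ids)) if batch[row] != b]
        folds.append((b, train_idx, test_idx))
    return folds


if __name__ == "__main__":
    # Hypothetical integer IDs; some IDs have several rows.
    ids = [101, 101, 102, 103, 103, 103, 104, 105, 106, 107, 108, 109]
    for b, train_idx, test_idx in batch_folds(ids):
        print(b, sorted({ids[r] for r in test_idx}), len(train_idx))
```

Fold sizes depend on how the IDs are distributed modulo 5; with roughly sequential IDs and similar row counts per ID, they come out roughly balanced.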
BR,
Martin