Classification accuracy (stacking)

keesloutebaan Member Posts: 2 Learner I
edited November 2019 in Help
Hey there,

I am currently working on a multi-class ('polynominal') classification project. The goal is to reach the highest possible accuracy.
I found out that the 'deep learning' and the 'gradient boosted trees' operators work really well.
Now, I want to find out whether stacking can improve the performance. However, every combination I have tried so far makes the performance drop.
Can someone tell me if there are any important rules to take into account when it comes to stacking? When is it helpful, and which settings does it require?
Thanks a lot
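
For reference, a minimal sketch of the kind of stacking setup described here, using scikit-learn as a stand-in for the RapidMiner operators (the dataset, model choices, and parameter values below are illustrative assumptions, not taken from this thread):

```python
# Minimal stacking sketch (scikit-learn as a stand-in for the RapidMiner
# operators; dataset and parameter choices are illustrative assumptions).
from sklearn.datasets import load_iris
from sklearn.ensemble import GradientBoostingClassifier, StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.neural_network import MLPClassifier

X, y = load_iris(return_X_y=True)

# Stacking tends to help only when the base learners are diverse, i.e.
# they make different kinds of errors that the meta-learner can combine.
base_learners = [
    ("gbt", GradientBoostingClassifier(random_state=0)),
    ("mlp", MLPClassifier(max_iter=1000, random_state=0)),
]

# A simple meta-learner (logistic regression) is usually a safe default;
# a complex meta-learner easily overfits the base models' predictions.
stack = StackingClassifier(
    estimators=base_learners,
    final_estimator=LogisticRegression(max_iter=1000),
    cv=5,  # out-of-fold predictions feed the meta-learner
)

print(cross_val_score(stack, X, y, cv=5).mean())
```

A common rule of thumb is that stacking only pays off when the base learners are diverse enough to make different kinds of errors; stacking several similar strong models often adds complexity without improving accuracy.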

Answers

  • keesloutebaan Member Posts: 2 Learner I
    edited November 2019
    Thanks so much, that saves me a lot of time! In that case, I will try to improve my GBT by tuning its parameters. Also, I noticed that using 'bagging' improves the performance. Do you maybe know which GBT parameters are the most important ones to play with? I started with the number of trees, but there are a lot more.
  • BalazsBarany Administrator, Moderator, Employee-RapidMiner, RapidMiner Certified Analyst, RapidMiner Certified Expert Posts: 955 Unicorn
    Hi,
    As with any tree method, you can apply prepruning and postpruning.
    Prepruning applies to the decision before a new split is created. 'maximal depth' and 'min rows' restrict these decisions, giving you a less complex (and maybe less overfitted) tree.
    Postpruning decides after a split has been created. 'min split improvement' applies a statistical test to each split result and decides whether it was worth it. This again reduces the tree complexity.
    That said, GBT (like random forest) is designed to reduce the overfitting problem of decision trees, so it is entirely possible that your model won't get better by changing these settings. (It is already coping well with possibly overfitted trees.)
    For other options, see the documentation; their effects can be very data dependent. (See the sketch after these answers for rough code equivalents of the parameters above.)
    Regards,
    Balázs
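
For reference, a rough scikit-learn analogue of the parameter tuning discussed above. RapidMiner's GBT operator is H2O-based, so the mapping is approximate: 'maximal depth' roughly corresponds to max_depth, 'min rows' to min_samples_leaf, and 'min split improvement' to min_impurity_decrease. The dataset and the grid values are illustrative assumptions:

```python
# Hedged sketch: tuning sklearn's GradientBoostingClassifier as a rough
# analogue of RapidMiner's (H2O-based) GBT parameters. The parameter
# mapping is approximate and the grid values are illustrative only.
from sklearn.datasets import load_iris
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import GridSearchCV

X, y = load_iris(return_X_y=True)

param_grid = {
    "n_estimators": [50, 200],             # number of trees
    "learning_rate": [0.05, 0.1],
    "max_depth": [2, 5],                   # prepruning: ~ 'maximal depth'
    "min_samples_leaf": [1, 10],           # prepruning: ~ 'min rows'
    "min_impurity_decrease": [0.0, 0.01],  # ~ 'min split improvement'
}

search = GridSearchCV(GradientBoostingClassifier(random_state=0),
                      param_grid, cv=5)
search.fit(X, y)
print(search.best_params_)
print(round(search.best_score_, 3))
```

As noted in the answer, a well-regularized GBT may not improve much from this kind of tuning; cross-validated comparison against the defaults is the safest way to tell.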