Classification accuracy (stacking)
keesloutebaan
Member Posts: 2 Learner I
Hey there,
I am currently working on a polynominal (multi-class) classification project. The goal is to reach the highest possible accuracy.
I found that the 'Deep Learning' and 'Gradient Boosted Trees' operators work really well.
Now I want to find out whether stacking can improve the performance further. However, I have tried a few combinations, and every time the performance drops.
Can someone maybe tell me if there are any important rules to take into account when it comes to stacking? When is it helpful and what settings are then required?
Thanks a lot
Best Answer
BalazsBarany Administrator, Moderator, Employee-RapidMiner, RapidMiner Certified Analyst, RapidMiner Certified Expert Posts: 955 Unicorn
Hi,
The idea behind ensemble models like Stacking is that they improve the performance of non-perfect learners. But they can also create more complex, overfitted models.
Both GBT and, to some extent, Deep Learning are already complex ensemble models.
Stacking can only improve on them if they have some systematic bias or error source, their errors differ, and the stacking model can somehow identify the right base model for most of the cases that are predicted differently.
If any of these assumptions does not hold, as is likely in your case, stacking or another model combination won't improve the result.
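For illustration, here is a minimal sketch of the usual sanity check, written outside RapidMiner with scikit-learn on placeholder data (the dataset, base learners and parameter values are assumptions, not your actual setup): compare the stacked model against its base learners under cross-validation and only keep the stack if it actually wins.

```python
# Illustrative only: a scikit-learn analogue, not RapidMiner's Stacking operator.
# X, y are synthetic placeholder data standing in for the real dataset.
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier, StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = make_classification(n_samples=1000, n_classes=3, n_informative=8,
                           random_state=42)

# Two reasonably different base learners: a boosted tree ensemble and a small
# neural network. Stacking only pays off if their errors differ.
base_learners = [
    ("gbt", GradientBoostingClassifier(random_state=42)),
    ("mlp", make_pipeline(StandardScaler(),
                          MLPClassifier(max_iter=500, random_state=42))),
]

# A simple meta-learner trained on out-of-fold predictions of the base learners.
stack = StackingClassifier(estimators=base_learners,
                           final_estimator=LogisticRegression(max_iter=1000),
                           cv=5)

# Compare each base learner against the stack; keep the stack only if it wins.
for name, model in base_learners + [("stacking", stack)]:
    scores = cross_val_score(model, X, y, cv=5, scoring="accuracy")
    print(f"{name:8s} accuracy: {scores.mean():.3f} +/- {scores.std():.3f}")
```

If the stacked model does not beat the best single model in this kind of comparison, the assumptions above (different error sources, a learnable pattern in the disagreements) probably do not hold.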
Regards,
Balázs
Answers
As with any tree method, you can apply prepruning and postpruning.
Prepruning applies to the decision before a new split is created: maximal depth and min rows restrict these splits, giving you a less complex (and possibly less overfitted) tree.
Postpruning decides after a split has been created: min split improvement applies a statistical test to each split result and decides whether it was worth it. This again reduces the tree complexity.
That said, GBT (like random forest) is meant to reduce the overfitting problem of decision trees, so it is entirely possible that your model won't become better by changing these settings. (Because it's already coping well with possibly overfitted trees.)
For the other options, see the documentation; their effect can be very data dependent.
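For reference, here is a rough analogue of these settings outside RapidMiner, sketched with scikit-learn on placeholder data. The parameter names and values are assumptions and only approximate the settings of the H2O-based operator discussed above.

```python
# Illustrative only: scikit-learn's GradientBoostingClassifier, not RapidMiner's
# H2O-based GBT operator; values are assumptions, not recommendations.
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import cross_val_score

# Synthetic placeholder data.
X, y = make_classification(n_samples=1000, n_classes=3, n_informative=8,
                           random_state=42)

gbt = GradientBoostingClassifier(
    n_estimators=100,
    max_depth=4,                 # prepruning: limits tree depth ("maximal depth")
    min_samples_leaf=10,         # prepruning: minimum rows per leaf ("min rows")
    min_impurity_decrease=1e-3,  # a split is only made if it improves impurity
                                 # by at least this much ("min split improvement")
    random_state=42,
)

scores = cross_val_score(gbt, X, y, cv=5, scoring="accuracy")
print(f"GBT accuracy: {scores.mean():.3f} +/- {scores.std():.3f}")
```

Tightening these settings trades tree complexity against underfitting, so the useful values depend heavily on the data, as noted above.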
Regards,
Balázs