Aggregation / compression instead of forecast / prediction
nicugeorgian
Member Posts: 31 Maven
Hi,
I have a data set with both nominal and numerical attributes and a numerical label.
I'm trying to fit some regression tree on this set.
I would like to use the regression tree as an aggregation / compression of the data set rows and not as a forecast. Concretely, my regression tree is not going to be applied to unseen data! So, overfitting would not be a problem in this case! Of course, I should avoid ending up with as many tree leaves as rows in the data set (that wouldn't be an aggregation anymore).
The goal is, however, that the trained model (the regression tree) "predicts / reflects" the training data as much as possible.
Would the regression tree (Weka W-M5P) be the best solution for this problem? If yes, how should I choose the algorithm's parameters?
I think it would be better if I select the "no-pruning" option ...
Any ideas?
Thanks!
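To illustrate the idea: a minimal sketch, assuming scikit-learn's DecisionTreeRegressor as a stand-in for Weka's W-M5P (an approximation only, since M5P builds model trees with linear models in the leaves). The data set, column names and leaf cap are made up for the example; the point is to keep far fewer leaves than rows while checking how closely the tree reproduces the training labels.

# Minimal sketch: regression tree as a compression of the training rows,
# using scikit-learn's DecisionTreeRegressor as a stand-in for W-M5P.
import pandas as pd
from sklearn.tree import DecisionTreeRegressor

# Hypothetical data set with a nominal attribute, a numerical attribute and a numerical label.
df = pd.DataFrame({
    "colour": ["red", "blue", "red", "green", "blue", "red"],
    "size":   [1.0, 2.5, 1.2, 3.3, 2.4, 1.1],
    "label":  [10.0, 20.0, 11.0, 30.0, 19.0, 10.5],
})

# scikit-learn trees need numerical inputs, so dummy-code the nominal attribute.
X = pd.get_dummies(df[["colour", "size"]])
y = df["label"]

# No pruning beyond a cap on the number of leaves: the tree is grown to fit the
# training data, but kept smaller than the number of rows so it stays a compression.
tree = DecisionTreeRegressor(max_leaf_nodes=4, random_state=0)
tree.fit(X, y)

print("rows:", len(df), "leaves:", tree.get_n_leaves())
print("training R^2:", tree.score(X, y))  # how closely the model reflects the training data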
Answers
Whether the regression tree is the best algorithm depends on your needs. If you want an understandable model, choose it. Otherwise, different alternatives are possible and possibly better. But you might have to transform your data then, because LinearRegression and SVMs don't support nominal values.
The best parameters for a learner depend on your data, so you have to try them out.
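For example, such a transformation usually means dummy-coding the nominal attributes into binary indicator columns before the learner sees them. A minimal sketch, assuming a pandas/scikit-learn environment rather than RapidMiner's own preprocessing operators (data and column names are made up):

import pandas as pd
from sklearn.svm import SVR

df = pd.DataFrame({
    "colour": ["red", "blue", "green", "red"],
    "size":   [1.0, 2.5, 3.3, 1.2],
    "label":  [10.0, 20.0, 30.0, 11.0],
})

# Dummy-code the nominal attribute so the SVM only sees numerical columns.
X = pd.get_dummies(df[["colour", "size"]])
y = df["label"]

svm = SVR().fit(X, y)
print(svm.predict(X))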
Greetings,
Sebastian