Auto Model: choosing the data set split method (e.g. linear sampling)
Hello, I wonder whether it is possible to specify the split method (e.g. linear sampling) used to generate the training and test data sets within the "Auto Model" module.
Somehow the predicted values are far too good, so for my data set it would be better to split the data using linear sampling.
Of course it would be possible to do so after running Auto Model by editing the stored process, but for convenience it would be better to be able to choose it up front.
Thank you.
Best Answer
ceaperez (Member, Posts: 541, Unicorn)
Hi @behnish,
Auto Model performs many operations automatically, following standard good practices for ML. Each model created this way has a lot of parameters, and exposing all of them in a single panel would be unmanageable.
The best solution is to run Auto Model and then go into the model and adapt it.
Regards.
Answers
Hello, it looks like Auto Model is designed to extract interleaved training and test sets at a ratio of 0.6 to 0.4 over the whole example set range. The model then gives a very good regression on my data set.
Creating the model from training and test data sets obtained by linear sampling (0.9 to 0.1) resulted in roughly 4 times worse performance. This indicates that the model needs further steps to generalize better, and it shows how important the preparation of the training set is.
Thus, it would still be nice to have a choice of data set splitting method in Auto Model.
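To make the difference between the two splitting schemes concrete, here is a minimal Python sketch. It is not how Auto Model is implemented; it only assumes that the interleaved behaviour described above takes 3 of every 5 consecutive rows for training. The data (a plain linear trend) and the 1-nearest-neighbour regressor are hypothetical stand-ins chosen to keep the example small, and the function names are made up for illustration.

```python
def linear_split(rows, train_ratio):
    """Contiguous split: the first part is training, the remainder is test."""
    cut = round(len(rows) * train_ratio)
    return rows[:cut], rows[cut:]


def interleaved_split(rows, train_per_block=3, block=5):
    """Interleaved split: in every block of rows, the first few go to training.

    With the defaults this gives a 0.6 / 0.4 ratio spread over the whole range
    (an assumption about the behaviour described in the post).
    """
    train = [r for i, r in enumerate(rows) if i % block < train_per_block]
    test = [r for i, r in enumerate(rows) if i % block >= train_per_block]
    return train, test


def nearest_neighbor_mae(train, test):
    """Mean absolute error of a 1-nearest-neighbour regressor on the x value."""
    errors = []
    for x, y in test:
        # Predict the label of the training row whose x is closest.
        _, y_hat = min(train, key=lambda p: abs(p[0] - x))
        errors.append(abs(y - y_hat))
    return sum(errors) / len(errors)


# Synthetic, ordered data: (x, y) pairs following a simple trend.
rows = [(i, 0.5 * i) for i in range(100)]

lin_train, lin_test = linear_split(rows, 0.9)
int_train, int_test = interleaved_split(rows)

mae_linear = nearest_neighbor_mae(lin_train, lin_test)
mae_interleaved = nearest_neighbor_mae(int_train, int_test)

print("linear split MAE:     ", mae_linear)
print("interleaved split MAE:", mae_interleaved)
```

On ordered data like this, every interleaved test row has a training row right next to it, so the score looks optimistic, while the linear split forces the model to extrapolate beyond the training range. The size of the gap here is specific to this toy setup and says nothing about the factor of 4 observed above.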
In addition, the problem remains of how to further optimize the model so that it generalizes better. One way could be to run the model on a variety of data set splits to optimize the model parameters, or to add random noise to the data, as is done in image recognition approaches.
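Both ideas can be sketched in a few lines of Python. This is only an illustration under assumptions of my own: the helper names are hypothetical, the splits are expanding-window splits (each test block comes after its training block, in the spirit of linear sampling), and the noise is Gaussian noise added to the input value only. None of this is an Auto Model feature.

```python
import random


def expanding_window_splits(n_rows, n_folds):
    """Yield (train_indices, test_indices) pairs of index ranges.

    Each training range starts at row 0 and grows by one fold per step;
    the test range is the block that directly follows it.
    """
    fold = n_rows // (n_folds + 1)
    for k in range(1, n_folds + 1):
        train_end = fold * k
        # The last fold takes any leftover rows at the end.
        test_end = n_rows if k == n_folds else train_end + fold
        yield range(0, train_end), range(train_end, test_end)


def add_gaussian_noise(rows, sigma, seed=0):
    """Return a copy of (x, y) rows with Gaussian noise added to x only."""
    rng = random.Random(seed)  # seeded so the augmentation is reproducible
    return [(x + rng.gauss(0.0, sigma), y) for x, y in rows]


rows = [(float(i), 0.5 * i) for i in range(100)]
splits = list(expanding_window_splits(len(rows), n_folds=4))

for train_idx, test_idx in splits:
    # Augment only the training rows; the test rows stay untouched.
    train_rows = add_gaussian_noise([rows[i] for i in train_idx], sigma=0.1)
    test_rows = [rows[i] for i in test_idx]
    print(len(train_rows), "training rows,", len(test_rows), "test rows")
```

A model and parameter set would then be scored on every split, and the parameters with the best average test score kept, rather than relying on a single split.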