Auto Model data set split using choice (e.g. linear sampling)

tomMEM · April 2021

Hello, I wonder whether it is possible to choose the split type, e.g. linear sampling, for generating the training and test data sets within the "Auto Model" module.
The predicted values are somehow far too good, so for my data set it would be better to split with linear sampling.
Of course this could be done after Auto Model by editing the stored process, but for convenience it would be better to choose it up front.
Thank you.

ceaperez · April 2021

Hi @behnish,

Auto Model performs many operations automatically, following standard good practices for ML. Each model created this way has many parameters, and exposing them all in a single panel would be unmanageable.

The best solution is to run Auto Model, then open the generated process and adapt it.

Regards.

tomMEM · April 2021

Hello @ceaperez, thank you for the prompt response. Indeed, Auto Model gives a great overview of models and feature sets. Then that is the way to do it: adapt it afterwards.

Best. T

tomMEM · April 2021

Hello, it looks like Auto Model is designed to extract interleaved training and test sets at a ratio of 0.6 to 0.4 over the whole example set range. With that split, the model gives a very good regression on my dataset.

Building the model on training and test sets created with linear sampling (0.9 to 0.1) resulted in roughly 4 times worse performance. This indicates that the model needs further steps to generalize better, and it shows how much training set preparation matters.
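To make the difference between the two splits concrete, here is a minimal Python sketch. It is not RapidMiner code and not how Auto Model is implemented; `linear_split` and `interleaved_split` are hypothetical helper names, and the interleaving rule is only one simple way to spread test rows over the whole range.

```python
from fractions import Fraction


def _ratio(train_ratio):
    """Turn a float like 0.6 into an exact fraction (3/5) to avoid rounding drift."""
    return Fraction(train_ratio).limit_denominator(1000)


def linear_split(rows, train_ratio):
    """Keep row order: the first block is training, the tail is test."""
    r = _ratio(train_ratio)
    cut = len(rows) * r.numerator // r.denominator
    return rows[:cut], rows[cut:]


def interleaved_split(rows, train_ratio):
    """Spread training and test rows evenly over the whole range (assumed rule)."""
    r = _ratio(train_ratio)
    train, test = [], []
    for i, row in enumerate(rows):
        # Row i goes to training whenever the running quota floor(i * ratio) increases.
        before = i * r.numerator // r.denominator
        after = (i + 1) * r.numerator // r.denominator
        (train if after > before else test).append(row)
    return train, test


rows = list(range(10))  # stand-in for an ordered example set
print(linear_split(rows, 0.9))       # ([0, 1, 2, 3, 4, 5, 6, 7, 8], [9])
print(interleaved_split(rows, 0.6))  # ([1, 3, 4, 6, 8, 9], [0, 2, 5, 7])
```

With the interleaved split, every test row has training rows close by on at least one side, so on ordered data the model is mostly asked to interpolate. With the linear split, the test rows sit at the end of the range, so the model has to extrapolate, which is usually the harder test.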

Thus, it would still be nice to have a choice of data set splitting in Auto Model.

In addition, the problem remains of how to optimize the model further so that it generalizes better. One way could be to run the model with a variety of data set splits to optimize the model parameters, or to add random noise to the data, as in image recognition approaches.
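As a rough illustration of both ideas, the following Python sketch scores a model over several contiguous (linear-style) test blocks and adds Gaussian noise to the training targets only. Everything here is an assumption made for the example: the toy data, the straight-line model, the noise level, and the helper names `blocked_folds` and `add_noise` are hypothetical and not part of Auto Model.

```python
import math
import random


def fit_line(xs, ys):
    """Ordinary least squares for y = a * x + b on 1-D inputs."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def rmse(model, xs, ys):
    """Root mean squared error of a (slope, intercept) model."""
    slope, intercept = model
    return math.sqrt(sum((slope * x + intercept - y) ** 2 for x, y in zip(xs, ys)) / len(xs))


def blocked_folds(n, k):
    """Yield (train_idx, test_idx) where each test fold is one contiguous block."""
    bounds = [n * j // k for j in range(k + 1)]
    for j in range(k):
        test = list(range(bounds[j], bounds[j + 1]))
        train = list(range(0, bounds[j])) + list(range(bounds[j + 1], n))
        yield train, test


def add_noise(values, sigma, rng):
    """Return a copy of values with zero-mean Gaussian noise added."""
    return [v + rng.gauss(0.0, sigma) for v in values]


rng = random.Random(0)             # fixed seed so runs are repeatable
xs = [i / 49 for i in range(50)]   # ordered toy inputs
ys = [x * x for x in xs]           # mildly nonlinear toy target

scores = []
for train_idx, test_idx in blocked_folds(len(xs), 5):
    train_x = [xs[i] for i in train_idx]
    train_y = add_noise([ys[i] for i in train_idx], 0.01, rng)  # noise on training data only
    model = fit_line(train_x, train_y)
    scores.append(rmse(model, [xs[i] for i in test_idx], [ys[i] for i in test_idx]))

print([round(s, 3) for s in scores])  # one error per held-out block
```

The spread of the per-block errors says more about generalization than a single split does: if the error jumps for particular blocks, the model depends on having seen that part of the range during training.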


Altair RapidMiner Community
