RapidMiner AutoModel customize validation set

christos_karras · October 2019

I would like to make a feature request for RapidMiner AutoModel: it should be possible to customize how the training and validation data are split. I often work with time series data, where rows that are close in time are frequently correlated. AutoModel splits the training and validation sets randomly, so information from the validation set leaks into the training set through the correlation between nearby rows. As a result, for this kind of data AutoModel overestimates how good the model will be on new data.

AutoModel should allow selecting an alternative training-validation splitting method, for example linear sampling. Also, for cases where the built-in methods are not adequate, it should be possible to give AutoModel a custom validation set, so that the datasets can be split with any method before they are tried in AutoModel.
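To make the leakage argument concrete, here is a minimal Python sketch. It is not RapidMiner code, and the helper names (`linear_split`, `random_split`, `neighbour_leak_fraction`) are made up for this illustration. It treats rows as integer timestamps and counts how many validation rows have a directly adjacent timestamp in the training set, which is a rough proxy for the leakage described above.

```python
import random

def linear_split(rows, train_fraction=0.8):
    """Keep the time order: the first part trains, the rest validates."""
    cut = int(len(rows) * train_fraction)
    return rows[:cut], rows[cut:]

def random_split(rows, train_fraction=0.8, seed=0):
    """Shuffle a copy of the rows, then cut (a random split)."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    cut = int(len(rows) * train_fraction)
    return shuffled[:cut], shuffled[cut:]

def neighbour_leak_fraction(train, valid):
    """Fraction of validation timestamps with an adjacent timestamp in training."""
    train_set = set(train)
    hits = sum(1 for t in valid if (t - 1) in train_set or (t + 1) in train_set)
    return hits / len(valid)

timestamps = list(range(100))  # stand-in for 100 time-ordered rows
lin_train, lin_valid = linear_split(timestamps)
rnd_train, rnd_valid = random_split(timestamps)

linear_leak = neighbour_leak_fraction(lin_train, lin_valid)
random_leak = neighbour_leak_fraction(rnd_train, rnd_valid)
print("linear split:", linear_leak)
print("random split:", random_leak)
```

With the unshuffled split only the first validation row borders the training data, while with the shuffled split most validation rows sit right next to a training row.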

varunm1 · October 2019

Hello @christos_karras

Can you provide more details on what kind of feature you are looking for in auto model?

christos_karras · October 2019

Hi @varunm1, see my edited description (the original comment was saved before I finished writing the details). Thanks

varunm1 · October 2019

Hello @christos_karras

Thanks, I looked at it. The current Auto Model is not intended for time series data, for the reasons you already mentioned, so I won't go in depth on that. Still, you can use the Auto Model process for time series if you add appropriate operators, such as windowing, before the data is fed to the ML model. This requires manual customization: after Auto Model finishes running, open the generated process and make the changes carefully. In many instances I change the split to cross-validation, and @Noel uses the Auto Model process for time series forecasting with windowing operators. It is a bit challenging at first, because there are many connections in the Auto Model process that need to be taken care of when customizing manually, but the operator arrangement in the 9.4 Auto Model is far better than in earlier versions, thanks to @IngoRM for that.
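For readers unfamiliar with windowing, the general idea can be sketched in a few lines of Python. This is only an illustration of the concept, not the RapidMiner Windowing operator; the function name `make_windows` and its parameters are hypothetical.

```python
def make_windows(series, window_size, horizon=1):
    """Turn a series into (features, label) pairs.

    Each example uses `window_size` consecutive past values as features
    and the value `horizon` steps after the window as the label.
    """
    examples = []
    last_start = len(series) - window_size - horizon + 1
    for start in range(last_start):
        features = series[start:start + window_size]
        label = series[start + window_size + horizon - 1]
        examples.append((features, label))
    return examples

series = [10, 11, 13, 12, 15, 16]
for features, label in make_windows(series, window_size=3):
    print(features, "->", label)
# [10, 11, 13] -> 12
# [11, 13, 12] -> 15
# [13, 12, 15] -> 16
```

The windowed examples are still ordered in time and overlap each other, so the validation split applied afterwards should also respect that order.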

@IngoRM might inform you if there are any plans for time series. I think he will definitely have some.

Just my 2c.

christos_karras · October 2019

Yes, as you explained, I have to resort to customizing the process generated by AutoModel, which takes longer than simply using AutoModel. While more in-depth support for time series built directly into AutoModel would be great, in the short term I think adding the ability to customize the validation set would be an easy way to make it more useful for time series data, or for any other case where a random split is not adequate.

I understand that AutoModel is not intended to completely replace a customized process, but rather it is just a way to get started faster. However, if AutoModel overestimates the performance of some model types, it may lead to taking the wrong direction for further customizations. For example, with an incorrect training-validation split, AutoModel could determine that a random forest is the best option, but then when I try with a customized process I could find that a random forest is not so good and it would have been better to use a linear model.
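A toy Python experiment shows how the split can flip the model ranking. Everything here is an assumption made for illustration: the synthetic series (a trend plus a slow wave), a nearest-neighbour-in-time "memorizer" standing in for a flexible model such as a random forest, and a least-squares line standing in for the linear model. It is not how AutoModel evaluates models.

```python
import math
import random

def fit_line(ts, ys):
    """Ordinary least squares for y = intercept + slope * t."""
    n = len(ts)
    mean_t = sum(ts) / n
    mean_y = sum(ys) / n
    slope = (sum((t - mean_t) * (y - mean_y) for t, y in zip(ts, ys))
             / sum((t - mean_t) ** 2 for t in ts))
    intercept = mean_y - slope * mean_t
    return lambda t: intercept + slope * t

def fit_nearest(ts, ys):
    """Memorizer: predict the label of the training row closest in time."""
    pairs = list(zip(ts, ys))
    return lambda t: min(pairs, key=lambda p: abs(p[0] - t))[1]

def mean_abs_error(model, ts, ys):
    return sum(abs(model(t) - y) for t, y in zip(ts, ys)) / len(ts)

def compare(train_idx, valid_idx, ts, ys):
    """Fit both models on the training rows and score them on the validation rows."""
    train_t = [ts[i] for i in train_idx]
    train_y = [ys[i] for i in train_idx]
    valid_t = [ts[i] for i in valid_idx]
    valid_y = [ys[i] for i in valid_idx]
    return {
        "memorizer": mean_abs_error(fit_nearest(train_t, train_y), valid_t, valid_y),
        "linear": mean_abs_error(fit_line(train_t, train_y), valid_t, valid_y),
    }

# Synthetic, strongly autocorrelated series (assumed for this example only).
ts = list(range(200))
ys = [0.5 * t + 5 * math.sin(t / 5) for t in ts]
cut = 160

shuffled = ts[:]  # timestamps double as row indices here
random.Random(0).shuffle(shuffled)
random_scores = compare(shuffled[:cut], shuffled[cut:], ts, ys)
chrono_scores = compare(ts[:cut], ts[cut:], ts, ys)

print("random split:", random_scores)
print("chronological split:", chrono_scores)
```

On this series the memorizer has the lower error under the random split, because every validation row has a training neighbour right next to it, while the linear model has the lower error once the validation rows all come after the training rows.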

IngoRM · October 2019

Stay tuned on Auto Model for time series

And on the second point of trying out the model on another independent data set quickly: the new Model Ops (Deployments) view in RM 9.4 makes this really simple now. Check out the videos below:

https://academy.rapidminer.com/learn/video/rapidminer-model-operations-introduction

https://academy.rapidminer.com/learn/video/rapidminer-model-operations-deployment

https://academy.rapidminer.com/learn/video/rapidminer-model-operations-management

I recommend watching all three, but the second one covers the "Scoring" functionality, which is what you would need to do...

Hope this helps,
Ingo

Noel · October 2019

@IngoRM Can't wait for Auto Model for time series!!

Noel · October 2019

@christos_karras (& @varunm1)-

One needs to be quite careful editing the exported Auto Model process to "convert" for use with time series. Personally, I did not appreciate all the interconnections that exist therein. (And I think there was a wholesale change in 9.4.)

varunm1 · October 2019

@Noel

@IngoRM will always keep us in the hot seat with new releases

Open for Voting · Last Updated March 2020
