ALL FEATURE REQUESTS HERE ARE MONITORED BY OUR PRODUCT TEAM.
VOTING MATTERS!
IDEAS WITH HIGH NUMBERS OF VOTES (USUALLY ≥ 10) ARE PRIORITIZED IN OUR ROADMAP.
NOTE: IF YOU WISH TO SUGGEST A NEW FEATURE, PLEASE POST A NEW QUESTION AND TAG AS "FEATURE REQUEST". THANK YOU.
RapidMiner AutoModel customize validation set
christos_karras
Member Posts: 50 Guru
I would like to make a feature request for RapidMiner AutoModel: it should be possible to customize how the training and validation data are split. I often work with time series data, where rows that are close in time are frequently correlated. AutoModel splits the training and validation sets randomly, so information from the validation set leaks into the training set through the correlation between nearby rows. As a result, AutoModel overestimates how well the model will perform on new data. AutoModel should allow selecting an alternative training-validation splitting method, for example Linear sampling. Also, for cases where the built-in methods are not adequate, it should be possible to pass a custom validation set to AutoModel, which would give the flexibility to split the datasets with any method before trying them in AutoModel.
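To illustrate the leakage concern outside of RapidMiner, here is a minimal Python sketch that compares a random split with an order-preserving (linear) split on a toy series. The function names, the 75% training fraction, and the "adjacent timestamp" check are assumptions made for this illustration only; this is not how AutoModel is implemented.

```python
import random


def linear_split(rows, train_fraction=0.75):
    """Keep the existing order: the first part trains, the rest validates."""
    if not 0.0 < train_fraction < 1.0:
        raise ValueError("train_fraction must be strictly between 0 and 1")
    cut = int(len(rows) * train_fraction)
    return rows[:cut], rows[cut:]


def shuffled_split(rows, train_fraction=0.75, seed=0):
    """Shuffle first, then cut: the kind of random split described above."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


def validation_rows_next_to_training(train, validation):
    """Count validation rows whose immediate time neighbor is a training row.

    This is only a rough proxy for leakage between correlated nearby rows.
    """
    train_times = {t for t, _ in train}
    return sum(
        1
        for t, _ in validation
        if (t - 1) in train_times or (t + 1) in train_times
    )


# Hypothetical toy data: (timestamp, value) pairs in time order.
rows = [(t, float(t % 7)) for t in range(100)]

lin_train, lin_val = linear_split(rows)
rnd_train, rnd_val = shuffled_split(rows)

lin_leaky = validation_rows_next_to_training(lin_train, lin_val)
rnd_leaky = validation_rows_next_to_training(rnd_train, rnd_val)

print("linear split, validation rows adjacent to training:", lin_leaky)
print("random split, validation rows adjacent to training:", rnd_leaky)
```

With the linear split, only the single row at the boundary touches the training data. With the random split, most validation rows sit right next to a training row in time, which is the situation that inflates the validation score.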
Comments
Can you provide more details on what kind of feature you are looking for in auto model?
Varun
https://www.varunmandalapu.com/
Be Safe. Follow precautions and Maintain Social Distancing
Thanks, I looked at it. The current Auto Model is not intended for time series data, for the reasons you already mentioned, so I won't go into depth on that. You can still use the Auto Model process for time series if you add appropriate operators, such as windowing, before the data is fed to the ML model. This requires manual customization: after Auto Model finishes running, open the generated process and carefully make your changes. In many instances I change the split to cross-validation, and @Noel uses the Auto Model process for time series forecasting with windowing operators. It is a bit challenging at first, because there are many connections in the Auto Model process that need to be taken care of when customizing manually, but the operator arrangement in the 9.4 Auto Model is far better than in earlier versions, thanks to @IngoRM for that.
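For readers unfamiliar with windowing, the general idea can be sketched in a few lines of Python: each example gets the previous values of the series as attributes and a later value as the label. This is only an illustration of the concept, not the RapidMiner Windowing operator; `make_windows`, `window_size`, and `horizon` are names made up for this sketch.

```python
def make_windows(series, window_size, horizon=1):
    """Turn a series into (lagged values, label) examples.

    Each example uses `window_size` consecutive values as features and the
    value `horizon` steps after the window as the label.
    """
    if window_size < 1 or horizon < 1:
        raise ValueError("window_size and horizon must be at least 1")
    examples = []
    last_start = len(series) - window_size - horizon
    for start in range(last_start + 1):
        features = list(series[start:start + window_size])
        label = series[start + window_size + horizon - 1]
        examples.append((features, label))
    return examples


# Hypothetical toy series in time order.
series = [10, 11, 13, 12, 15, 16]
windows = make_windows(series, window_size=3)

for features, label in windows:
    print(features, "->", label)
```

Note that consecutive windows overlap, so the windowed examples should still be split in time order rather than shuffled before validation.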
@IngoRM might inform you if there are any plans for time series. I think he will definitely have some.
Just my 2c.
Varun
I understand that AutoModel is not intended to completely replace a customized process, but rather it is just a way to get started faster. However, if AutoModel overestimates the performance of some model types, it may lead to taking the wrong direction for further customizations. For example, with an incorrect training-validation split, AutoModel could determine that a random forest is the best option, but then when I try with a customized process I could find that a random forest is not so good and it would have been better to use a linear model.
Ingo
One needs to be quite careful when editing the exported Auto Model process to "convert" it for use with time series. Personally, I did not appreciate all the interconnections that exist in it. (And I think there was a wholesale change in 9.4.)
@IngoRM will always keep us in the hot seat with new releases.
Varun