ALL FEATURE REQUESTS HERE ARE MONITORED BY OUR PRODUCT TEAM.
VOTING MATTERS!
IDEAS WITH HIGH NUMBERS OF VOTES (USUALLY ≥ 10) ARE PRIORITIZED IN OUR ROADMAP.
NOTE: IF YOU WISH TO SUGGEST A NEW FEATURE, PLEASE POST A NEW QUESTION AND TAG AS "FEATURE REQUEST". THANK YOU.
Feature Request: Batch validation with optional fold numbers
varunm1
Member Posts: 1,207
Unicorn
Dear All,
I have a simple feature request that could, if possible, be added to the Cross Validation operator. Currently, we have a "Batch Validation" option that lets us define batches and divides folds based on the number of batches. I am looking for an enhancement that lets us control the number of folds created from these batches.
For example, if I have data on 100 subjects and each subject has 10 samples, there are 1,000 samples in total. To do a leave-one-subject-out cross-validation, I need to set 100 batch IDs (one per subject) and use batch validation in the Cross Validation operator. If I instead want only 5 folds with 20 subjects each, I currently need to generate the attribute again with 5 batch IDs. Instead of this, the operator could provide an option that uses the 100 batch IDs created first as an index and divides the data into 5 subsets based on them.
This would make it easy to switch between leave-one-batch-out and group K-fold validation.
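Outside RapidMiner, the requested behavior corresponds to a group K-fold split: the per-subject batch IDs act as group labels, and folds are built so that no subject is split across folds. A minimal stdlib sketch (the helper name `group_kfold` is hypothetical, not a RapidMiner operator):

```python
import random

def group_kfold(batch_ids, fold_count, seed=42):
    """Assign each unique batch ID to one of `fold_count` folds,
    then place every sample into its batch's fold.

    Hypothetical illustration of the request: the existing 100
    per-subject batch IDs are used as an index to build 5 folds
    without splitting any subject across folds.
    """
    unique_ids = sorted(set(batch_ids))
    random.Random(seed).shuffle(unique_ids)           # reproducible shuffle
    id_to_fold = {b: i % fold_count for i, b in enumerate(unique_ids)}
    folds = [[] for _ in range(fold_count)]
    for row, b in enumerate(batch_ids):
        folds[id_to_fold[b]].append(row)              # row indices per fold
    return folds

# 100 subjects x 10 samples each = 1000 rows
batch_ids = [subject for subject in range(100) for _ in range(10)]
folds = group_kfold(batch_ids, 5)
assert sum(len(f) for f in folds) == 1000
# each subject's 10 samples land in exactly one fold
for subject in range(100):
    rows = {i for i, b in enumerate(batch_ids) if b == subject}
    assert sum(bool(rows & set(f)) for f in folds) == 1
```

With `fold_count` equal to the number of subjects, this degenerates to leave-one-subject-out, which is exactly the switch described above.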
Regards,
Varun
https://www.varunmandalapu.com/
Be Safe. Follow precautions and Maintain Social Distancing
Comments
function value_to_fold = batch_validation(batch_attribute, fold_count, seed):
    unique_values = unique(batch_attribute)
    randomly_permuted_values = randperm(unique_values, seed)
    value_to_fold = map()
    fold = 0
    for value in randomly_permuted_values:
        value_to_fold.put(value, fold)
        fold = modulo(fold + 1, fold_count)

function training_set, testing_set = training_testing_split(exampleSet, batch_attribute_to_map, testing_fold):
    training_set = set()
    testing_set = set()
    for row in exampleSet:
        is_training = True
        is_testing = True
        for batch_attribute, value_to_folds in batch_attribute_to_map:
            assigned_fold = value_to_folds.get(exampleSet[row, batch_attribute])
            if assigned_fold != testing_fold:
                is_testing = False
            if assigned_fold == testing_fold:
                is_training = False
        if is_training:
            training_set.add(exampleSet[row, :])
        if is_testing:
            testing_set.add(exampleSet[row, :])

which returns the training and testing sets. Note that when more than one batch attribute is used, samples are not assigned only into training_set or testing_set, but rather into training_set, testing_set, or not_used_in_this_split; this is the tax we have to pay for estimating a model's generalization ability over multiple attributes at once.

I apologize for the length of this post, but the ability to quickly estimate a model's generalization ability across some id-like attribute (and possibly across time) is so useful that I felt compelled to write it.
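The two pseudocode functions above translate directly into Python. This is a sketch under the same assumptions as the pseudocode (rows are dicts, batch attributes map to fold numbers); it is not RapidMiner's actual implementation:

```python
import random

def batch_validation(batch_values, fold_count, seed):
    """Shuffle the unique batch values reproducibly, then assign
    them round-robin to folds, as in the pseudocode above."""
    unique_values = sorted(set(batch_values))
    random.Random(seed).shuffle(unique_values)
    return {value: i % fold_count for i, value in enumerate(unique_values)}

def training_testing_split(rows, attribute_maps, testing_fold):
    """rows: list of dicts; attribute_maps: {attribute: value_to_fold}.

    A row goes to testing only if every batch attribute assigns it to
    testing_fold, and to training only if none does. With multiple
    attributes, rows matching neither condition are dropped from the
    split (the "not_used_in_this_split" tax described above)."""
    training, testing = [], []
    for row in rows:
        is_training = is_testing = True
        for attribute, value_to_fold in attribute_maps.items():
            assigned = value_to_fold[row[attribute]]
            if assigned != testing_fold:
                is_testing = False
            if assigned == testing_fold:
                is_training = False
        if is_training:
            training.append(row)
        if is_testing:
            testing.append(row)
    return training, testing

# 10 subjects x 3 samples, split on a single batch attribute
rows = [{"subject": s, "sample": k} for s in range(10) for k in range(3)]
v2f = batch_validation([r["subject"] for r in rows], 5, seed=0)
train, test = training_testing_split(rows, {"subject": v2f}, testing_fold=0)
assert len(train) + len(test) == len(rows)   # one attribute: no rows dropped
assert not {r["subject"] for r in train} & {r["subject"] for r in test}
```

With a single batch attribute every row lands in exactly one of the two sets; only the multi-attribute case pays the exclusion tax.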