Getting training and testing sets from KennardStoneSampling operator
Hi,
I have a dataset that I wish to split into a training set and a testing set. I wish to use the KennardStoneSampling operator but it seems like that will only provide me with the training set. How do I get the remaining compounds which were not selected as the testing set?
Answers
If I understand you correctly, you want to use the sampling algorithm for something it is not intended for. Given one dataset, KennardStoneSampling draws a smaller, evenly distributed sample: it selects some examples from the input set and returns them as the output set. If you want to split your exampleSet into a training set and a test set, you should use the SimpleValidation operator. Take a look at the operator description to understand how it works. To test a classifier's performance in combination with sampling, it is probably best to sample the training data but not the test data.
Greetings,
 Sebastian
Basically, in my field of research, one method to derive a training set and a testing set from a dataset is to use the Kennard and Stone algorithm. The algorithm selects a set of objects that are spread evenly across the data space, and these can serve as a training set. The remaining objects, which were not selected, are less spread out than the selected ones but similar to them. Hence, these objects are useful as a testing set to gauge the performance of the model.
I guess I have to look at the source code of the KennardStoneSampling operator and see how I can modify it to behave like the SimpleValidation operator.
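For reference, the split described above (selected objects as the training set, everything else as the testing set) can be sketched outside the operator in a few lines of Python. This is only a minimal illustration, not the KennardStoneSampling operator's actual code: the function name `kennard_stone_split` and the parameter `n_train` are made up for this example, and it assumes plain Euclidean distance on numeric descriptors.

```python
import math


def kennard_stone_split(points, n_train):
    """Return (train_idx, test_idx) for a Kennard-Stone style split.

    Hypothetical helper for illustration only; assumes Euclidean distance.
    """
    n = len(points)
    if not 2 <= n_train <= n:
        raise ValueError("n_train must be between 2 and the number of points")

    # Pairwise distance matrix (fine for small datasets, O(n^2) memory).
    dist = [[math.dist(p, q) for q in points] for p in points]

    # Seed the training set with the two most distant points.
    first, second = max(
        ((i, j) for i in range(n) for j in range(i + 1, n)),
        key=lambda pair: dist[pair[0]][pair[1]],
    )
    selected = [first, second]
    remaining = [k for k in range(n) if k not in (first, second)]

    # For each unselected point, track its distance to the nearest selected point.
    min_dist = {k: min(dist[k][first], dist[k][second]) for k in remaining}

    # Repeatedly add the point that is farthest from everything selected so far.
    while len(selected) < n_train:
        nxt = max(remaining, key=lambda k: min_dist[k])
        selected.append(nxt)
        remaining.remove(nxt)
        del min_dist[nxt]
        for k in remaining:
            min_dist[k] = min(min_dist[k], dist[k][nxt])

    # Whatever was never selected becomes the testing set.
    return selected, remaining


# Toy data: five one-descriptor "compounds" on a line.
train_idx, test_idx = kennard_stone_split(
    [(0, 0), (10, 0), (5, 0), (1, 0), (9, 0)], n_train=3
)
print(train_idx, test_idx)
```

The key point is that the function keeps the list of unselected indices instead of discarding it, which is the part the sampling operator does not expose.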