Combining (classified) example sets
Hello,
I have a pretty straightforward classification task, and I'm experimenting with a variety of classifiers (thanks for making this so easy!).
Unfortunately, because of overlap in the features of my training data, I cannot use straight cross-validation: if I did, some of the training data would leak into the test set. So I've created five splits of my data, training and test pairs with no overlap between them. I've replicated the model learning and application steps five times, so I now have the classified output of five models.
Here is my question: which block can I use to merge the resulting example sets so that I get one overall performance measure? The "Append" set operation doesn't work because the attributes don't match (is this because the example sets include both real and categorical attributes?).
Cheers,
Rony
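For illustration only, here is a minimal Python sketch of the merge being asked about, done outside RapidMiner. It assumes each fold's classified output has been exported as a list of rows, and that every fold carries a true-label column and a predicted-label column. The names `label`, `prediction`, `pool_folds` and `accuracy` are hypothetical and do not come from RapidMiner. Columns that differ between folds are dropped before stacking, which is one way to sidestep a mismatched-attribute error.

```python
from typing import Dict, List, Sequence

Row = Dict[str, object]


def pool_folds(folds: List[List[Row]],
               keep: Sequence[str] = ("label", "prediction")) -> List[Row]:
    """Stack per-fold results, keeping only the columns named in `keep`."""
    pooled: List[Row] = []
    for fold_id, rows in enumerate(folds):
        for row in rows:
            missing = [c for c in keep if c not in row]
            if missing:
                raise KeyError(f"fold {fold_id} is missing columns {missing}")
            slim: Row = {c: row[c] for c in keep}
            slim["fold"] = fold_id  # remember which split the row came from
            pooled.append(slim)
    return pooled


def accuracy(rows: List[Row]) -> float:
    """Fraction of rows whose prediction equals the true label."""
    if not rows:
        raise ValueError("no rows to score")
    correct = sum(1 for r in rows if r["label"] == r["prediction"])
    return correct / len(rows)


# Two toy folds whose extra attributes do not match (made-up data).
fold_a = [
    {"label": "yes", "prediction": "yes", "confidence(yes)": 0.9},
    {"label": "no", "prediction": "yes", "confidence(yes)": 0.6},
]
fold_b = [
    {"label": "no", "prediction": "no", "score": 1.3},
    {"label": "yes", "prediction": "yes", "score": 2.1},
]

pooled = pool_folds([fold_a, fold_b])
print(accuracy(pooled))  # 3 of 4 rows are correct -> 0.75
```

Pooling the rows and scoring once weights every test example equally; averaging five per-fold accuracies would instead weight every fold equally, which differs when the test sets have different sizes.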
Answers
I have about 2000 training examples, which encode features from periods of time that sometimes overlap. I have some custom code that chooses randomized training sets and then removes from the held-out test set any examples that overlap temporally with the training set (and hence share some of its features). Keeping track of the indices of the overlapping training data would be hell.
It seems like this is something that would come up often, and I even found a module in the javadoc, com.rapidminer.operator.preprocessing.join.ExampleSetMerge, but I can't find the corresponding block in the GUI.
Cheers,
Rony
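To make the splitting procedure described above concrete, here is a rough Python sketch. It is not the custom code mentioned in the post. It assumes each example covers a half-open time interval `(start, end)`, and the names `overlaps` and `purged_split` are made up for the illustration.

```python
import random
from typing import List, Tuple

Interval = Tuple[float, float]


def overlaps(a: Interval, b: Interval) -> bool:
    """True if two half-open intervals [start, end) share any time."""
    return a[0] < b[1] and b[0] < a[1]


def purged_split(intervals: List[Interval],
                 train_fraction: float = 0.8,
                 seed: int = 0) -> Tuple[List[int], List[int]]:
    """Pick a random training set, then drop held-out examples that
    overlap in time with any training example."""
    rng = random.Random(seed)
    indices = list(range(len(intervals)))
    rng.shuffle(indices)
    n_train = int(len(indices) * train_fraction)
    train = sorted(indices[:n_train])
    held_out = indices[n_train:]
    test = sorted(
        i for i in held_out
        if not any(overlaps(intervals[i], intervals[j]) for j in train)
    )
    return train, test


# Toy intervals (made-up data); some pairs overlap, some do not.
intervals = [(0, 10), (5, 15), (20, 30), (25, 35), (40, 50), (60, 70)]
train, test = purged_split(intervals, train_fraction=0.5, seed=1)
print(train, test)
```

Because the function returns index lists, the test indices double as a record of which examples survived the purge, so the overlapping examples do not need to be tracked separately. The overlap check is quadratic in the worst case, which should be fine for around 2000 examples.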