Splitting data
Hi,
I have what I consider a simple problem but due to poor understanding or perhaps poor documentation I cannot figure out how to:
Split a dataset of, say, 1000 observations into two separate datasets of, say, 700 and 300 observations respectively. That is, an operator that has one input and two outputs.
Is this done with the "Split Data" operator? If so, what are these "partitions" I need to define?
The split should be random, preferably with a predefined seed for reproducibility.
-frankie
Answers
Yes, you can do this easily in RapidMiner (RM) with the "Split Data" operator. As an example, take a process that splits the Iris dataset into two partitions of 70% and 30%. The "partitions" are simply the ratios of the output datasets: you give them to RM by clicking the "Edit Enumeration" button and adding one ratio per partition. You can get k partitions by adding k ratios.
If you select the option "local random seed", the partitions will be the same in repeated runs.
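To make the idea of ratio-based partitions and a fixed seed concrete, here is a minimal Python sketch of the same logic. It is only an illustration, not RapidMiner's implementation: the function name `split_by_ratios`, the seed value 1992, and the check that the ratios sum to 1 are all assumptions made for this example.

```python
import random


def split_by_ratios(rows, ratios, seed=None):
    """Shuffle rows and cut them into one partition per ratio.

    Hypothetical helper for illustration; not part of RapidMiner.
    """
    # Assumption of this sketch: ratios are fractions that sum to 1.
    if abs(sum(ratios) - 1.0) > 1e-9:
        raise ValueError("ratios must sum to 1")

    # A dedicated generator with a fixed seed makes the shuffle repeatable.
    rng = random.Random(seed)
    shuffled = list(rows)
    rng.shuffle(shuffled)

    n = len(shuffled)
    partitions = []
    start = 0
    cumulative = 0.0
    for i, ratio in enumerate(ratios):
        cumulative += ratio
        # The last partition takes whatever is left, so no row is lost to rounding.
        end = n if i == len(ratios) - 1 else round(cumulative * n)
        partitions.append(shuffled[start:end])
        start = end
    return partitions


observations = list(range(1000))

# Two partitions, 70% and 30%, with a fixed seed (1992 is an arbitrary choice).
train, test = split_by_ratios(observations, [0.7, 0.3], seed=1992)

# Running it again with the same seed gives the same partitions.
repeat_train, repeat_test = split_by_ratios(observations, [0.7, 0.3], seed=1992)

# k partitions work the same way: one ratio per partition.
part_a, part_b, part_c = split_by_ratios(observations, [0.5, 0.3, 0.2], seed=1992)
```

With 1000 observations this yields partitions of 700 and 300 rows, and the three-ratio call yields 500, 300 and 200 rows. Leaving `seed=None` gives a different random split on each run, which corresponds to not fixing a seed.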
Hope this helps.