Creating equally sized clusters that are representative of the population
Kristjan_Mar
Member Posts: 2 Learner I
Hi all,
I have a data set (the population) of individuals who have signed up to be part of a group. When they signed up they gave some background information, which leaves me with 5 variables that I am mostly focusing on.
What I want to do is create 4 equally sized groups that are as representative of the whole population as possible. That is, I want to create 4 homogeneous groups.
I also have some other columns in the dataset that are important for handling and using it. I would like these columns to be carried along into each of the groups (subsamples) so that every row still matches the respondent it belongs to.
In short: how can I create four homogeneous subsamples that are representative of the population, using only selected variables from the dataset?
Cheers, K
Best Answers
MarcoBarradas Administrator, Employee-RapidMiner, RapidMiner Certified Analyst, Member Posts: 272 Unicorn
Hi @Kristjan_Mar, it seems you need to create 4 stratified samples of your data.
For that you need to use the Split Data operator with the sampling type set to stratified.
Hope that helps you.
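For readers who want to see the idea outside RapidMiner, here is a minimal sketch of a stratified split in plain Python. It is not how the Split Data operator is implemented; the function name `stratified_groups`, the `region` column and the toy data are all made up for illustration. Rows are shuffled within each label class and then dealt out round-robin, so each group ends up with roughly the same class proportions and the same size.

```python
import random
from collections import defaultdict

def stratified_groups(rows, label_key, n_groups=4, seed=0):
    """Deal rows into n_groups so every group gets a similar share of each label."""
    rng = random.Random(seed)
    # Collect the rows of each label class (stratum).
    by_label = defaultdict(list)
    for row in rows:
        by_label[row[label_key]].append(row)
    # Shuffle inside each stratum, then line the strata up one after another.
    ordered = []
    for label in sorted(by_label):
        stratum = by_label[label]
        rng.shuffle(stratum)
        ordered.extend(stratum)
    # Round-robin dealing keeps group sizes within one row of each other.
    groups = [[] for _ in range(n_groups)]
    for i, row in enumerate(ordered):
        groups[i % n_groups].append(row)
    return groups

# Hypothetical toy population: 8 "north" and 32 "south" respondents.
population = [
    {"id": i, "region": "north" if i % 5 == 0 else "south"}
    for i in range(40)
]
groups = stratified_groups(population, "region")
for g in groups:
    north = sum(1 for r in g if r["region"] == "north")
    print(len(g), "rows,", north, "north")
```

Each row is a full record, so any extra columns travel with the respondent into its group.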
Telcontar120 RapidMiner Certified Analyst, RapidMiner Certified Expert, Member Posts: 1,635 Unicorn
I think I am confused by the wording of your intended outcome here: "as representative of the whole population as possible" and "homogeneous" are typically not synonymous. If you want the groups to be as representative of the whole as possible, you basically want random subsets, which you can get easily with Split Data by choosing the shuffled sampling type. You would only need the stratified sampling type if you first choose a nominal attribute as your label to stratify on and you want to make sure that each resulting partition contains the same proportions of the label classes. I suggest you have a look at the tutorial and help text of the Split Data operator. (You can use Select Attributes before the split to bring in only the 5 attributes you are interested in, if those are the only ones you want to look at.)
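As a rough illustration of the random-subsets idea, the sketch below shuffles a made-up population and cuts it into 4 equal groups in plain Python. It is only an analogy to a shuffled Split Data, not the operator's actual implementation, and the column names (`respondent_id`, `age`, `email`) and helper names are hypothetical. Whole rows are split so the extra columns stay attached to each respondent, and only the variable of interest is used to check how close each group is to the population.

```python
import random
from statistics import mean

def shuffled_groups(rows, n_groups=4, seed=0):
    """Shuffle a copy of the rows and slice it into n_groups near-equal parts."""
    rng = random.Random(seed)
    shuffled = rows[:]          # leave the caller's list untouched
    rng.shuffle(shuffled)
    # Taking every n-th row gives groups whose sizes differ by at most one.
    return [shuffled[i::n_groups] for i in range(n_groups)]

def group_means(groups, key):
    """Mean of one variable per group, for comparison with the population mean."""
    return [mean(row[key] for row in g) for g in groups]

# Hypothetical population: one variable of interest plus bookkeeping columns.
data_rng = random.Random(1)
population = [
    {"respondent_id": i, "age": data_rng.randint(18, 70), "email": f"user{i}@example.com"}
    for i in range(200)
]

groups = shuffled_groups(population)
print("group sizes:", [len(g) for g in groups])
print("population mean age:", mean(row["age"] for row in population))
print("group mean ages:", group_means(groups, "age"))
```

With a random split the group means will be close to the population mean but not identical; comparing them for each of the 5 variables is a quick way to judge whether the groups are representative enough.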
Answers