Group-based sampling

chrisleong Member Posts: 4 Contributor I
edited November 2018 in Help
I have a data set of users, each of whom attends a school. I want to filter the data so that only users from a random selection of schools are kept. My current method uses `Select Attributes` to extract the school attribute (one row per attending student), applies `Remove Duplicates` to get one row per school, samples the schools, and then joins the result with the original data set.

Four operators seem rather complicated for such a simple task, so I was wondering if there is a better way.
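
For readers who want to see the logic outside RapidMiner, the four steps described above can be sketched in plain Python. This is only an illustration under assumptions: the function name `sample_groups`, the dictionary-based rows, and the example data are all hypothetical and are not part of the original process.

```python
import random


def sample_groups(rows, key, n_groups, seed=None):
    """Keep every row whose `key` value belongs to a random subset of groups.

    Hypothetical helper for illustration; it is not a RapidMiner API.
    """
    rng = random.Random(seed)
    # Steps 1-2: select the grouping attribute and remove duplicates.
    # Rows with a missing value for `key` are skipped here.
    groups = sorted({row[key] for row in rows if row.get(key) is not None})
    # Step 3: sample the groups themselves, not the individual rows.
    chosen = set(rng.sample(groups, n_groups))
    # Step 4: "join" back to the original data by keeping all rows
    # that belong to a chosen group.
    return [row for row in rows if row.get(key) in chosen]


# Made-up example data: five users with a known school, one without.
users = [
    {"user": "u1", "school": "A"},
    {"user": "u2", "school": "A"},
    {"user": "u3", "school": "B"},
    {"user": "u4", "school": "C"},
    {"user": "u5", "school": "C"},
    {"user": "u6", "school": None},
]

subset = sample_groups(users, "school", n_groups=2, seed=0)
```

Whichever two schools are drawn, `subset` contains all of their users and nobody else. Asking for more groups than exist raises a `ValueError` from `random.sample`.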

Answers

  • fras Member Posts: 93 Contributor II
    Maybe the operator "Sample (Stratified)" is your friend. The attribute "school" must have the role "label".
  • chrisleong Member Posts: 4 Contributor I
    Thanks for that suggestion. It's helpful because I didn't know how to choose the attribute for a stratified sample before. Unfortunately, what I am trying to do is pick random groups (schools) and select all users in these groups - as opposed to taking a sub-sample with the same distribution (among schools) as the full set.
  • chrisleong Member Posts: 4 Contributor I
    I also had to add a Filter Examples operator with the "no missing attributes" condition, as otherwise the process might not select the desired number of schools.

  • MariusHelf RapidMiner Certified Expert, Member Posts: 1,869 Unicorn
    Hi Chrisleong,

    There are so many "easy operations" around that we decided not to give each of them its own operator, as long as the same result can be achieved with a combination of other operators. So what you are doing is exactly the correct way.

    Best regards,
    Marius
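
To illustrate why a stratified sample does not solve the problem raised in this thread, here is a minimal Python sketch of stratified sampling. It is a simplified stand-in and does not claim to reproduce how RapidMiner implements its operator; the function name `stratified_sample`, the `fraction` parameter, and the data are assumptions made for the example.

```python
import random
from collections import defaultdict


def stratified_sample(rows, key, fraction, seed=None):
    """Draw roughly `fraction` of the rows from every group.

    Hypothetical, simplified helper for illustration only.
    """
    rng = random.Random(seed)
    # Bucket the rows by their group value (here: the school).
    by_group = defaultdict(list)
    for row in rows:
        by_group[row[key]].append(row)
    sample = []
    for group in sorted(by_group):
        members = by_group[group]
        # Take the same share from each group, but at least one row.
        k = max(1, round(len(members) * fraction))
        sample.extend(rng.sample(members, k))
    return sample


# Made-up data: three schools with four users each.
users = [
    {"user": f"{school}{i}", "school": school}
    for school in ("A", "B", "C")
    for i in range(4)
]

half = stratified_sample(users, "school", fraction=0.5, seed=0)
```

Here `half` holds two users from each of the three schools: every school is still represented, but no school is complete. Group-based sampling is the opposite case, where some schools are dropped entirely and the remaining ones keep all of their users.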