
Cluster basketball players based on their performance...

LilC Member Posts: 24 Learner I
edited July 2020 in Help
I have a dataset with many players and their performance for the season.
My goal is to cluster them into 3 or more groups based on their performance, such as high, average, and low performers.
The attributes include position, average points, steals, mistakes, blocks, running distance, etc.

I guess this will probably involve k-means. However, I don't think I need all of the attributes for the clustering. The other task is to find out which few attributes can be used to separate the players.

I am still very new to RapidMiner, so thanks for all your help.
If anyone can point me in the right direction, that would be great. I am open to using any extensions.
Thanks.
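For illustration, here is a minimal sketch of the k-means idea in plain Python, assuming the numeric attributes have already been chosen. The six player rows and the four columns (average points, steals, blocks, mistakes) are made-up data, and `standardize`, `farthest_first` and `kmeans` are hypothetical helpers, not RapidMiner operators. The attributes are z-scored first because k-means uses Euclidean distance, so an unscaled attribute with a large range (e.g. points vs. blocks) would otherwise dominate. The centroids are seeded with a deterministic farthest-point heuristic instead of random sampling so that the result is reproducible.

```python
import math

def standardize(rows):
    """Z-score each column so attributes on different scales count equally."""
    n_cols = len(rows[0])
    means = [sum(r[j] for r in rows) / len(rows) for j in range(n_cols)]
    stds = []
    for j in range(n_cols):
        var = sum((r[j] - means[j]) ** 2 for r in rows) / len(rows)
        stds.append(math.sqrt(var) or 1.0)  # constant column: avoid dividing by zero
    return [[(r[j] - means[j]) / stds[j] for j in range(n_cols)] for r in rows]

def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def farthest_first(points, k):
    """Start from the first point, then repeatedly add the point farthest from all chosen centroids."""
    centroids = [list(points[0])]
    while len(centroids) < k:
        nxt = max(points, key=lambda p: min(sq_dist(p, c) for c in centroids))
        centroids.append(list(nxt))
    return centroids

def kmeans(points, k, iters=100):
    centroids = farthest_first(points, k)
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: each player joins the nearest centroid.
        labels = [min(range(k), key=lambda c: sq_dist(p, centroids[c])) for p in points]
        # Update step: move each centroid to the mean of its members.
        new_centroids = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                new_centroids.append([sum(col) / len(members) for col in zip(*members)])
            else:
                new_centroids.append(centroids[c])  # keep an empty cluster's centroid in place
        if new_centroids == centroids:
            break  # converged: no centroid moved
        centroids = new_centroids
    return labels, centroids

# Hypothetical data: [avg points, steals, blocks, mistakes] per player.
players = [
    [25.1, 1.8, 0.9, 2.5],
    [24.3, 2.0, 1.1, 2.9],
    [12.0, 1.0, 0.5, 1.8],
    [11.5, 0.9, 0.4, 2.0],
    [4.2, 0.3, 0.1, 1.0],
    [3.8, 0.2, 0.2, 1.2],
]
labels, centroids = kmeans(standardize(players), k=3)
print(labels)
```

The cluster numbers themselves carry no meaning: inspecting each cluster's centroid (e.g. its mean average points) is what lets you call a group "high" or "low". In RapidMiner the analogous steps would be a Normalize operator followed by k-Means.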

Best Answers

  • jacobcybulski Member, University Professor Posts: 391 Unicorn
    Solution Accepted
    If you use k-means, you need numerical attributes. Make sure you select attributes that are independent of each other. Although k-means is not a linear model, you can use the Correlation Matrix operator to gauge the independence of the attributes: ignore the matrix itself and look at the weights it produces. The higher an attribute's weight, the more (linearly) independent it is of the other attributes, and vice versa. There are many other ways of weighting attributes, but one great advantage of this approach is that you do not need to define a label, since we are not predicting anything.
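As a rough sketch of the same idea in plain Python, assume a simple scoring rule: each attribute's weight is 1 minus its mean absolute Pearson correlation with the other attributes. This rule is an illustrative assumption, not necessarily the exact formula RapidMiner's Correlation Matrix operator uses, and the column names and values are hypothetical. No label column is involved, which matches the point above.

```python
import math

def pearson(x, y):
    """Pearson correlation of two equal-length numeric lists (non-constant columns only)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def independence_weights(columns):
    """Assumed rule: weight = 1 - mean |r| with every other attribute."""
    names = list(columns)
    weights = {}
    for a in names:
        others = [abs(pearson(columns[a], columns[b])) for b in names if b != a]
        weights[a] = 1 - sum(others) / len(others)
    return weights

# Hypothetical columns: field_goals is a linear function of avg_points (redundant),
# while running_km is only weakly correlated with both.
columns = {
    "avg_points": [10, 20, 30, 40, 50, 60],
    "field_goals": [4, 9, 14, 19, 24, 29],
    "running_km": [5.1, 4.9, 5.1, 4.9, 5.1, 4.9],
}
weights = independence_weights(columns)
for name, w in sorted(weights.items(), key=lambda kv: -kv[1]):
    print(f"{name}: {w:.3f}")
```

Here the two redundant attributes receive the same low weight, so keeping only one of them (plus the weakly correlated `running_km`) would be the natural choice.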

Answers

  • LilC Member Posts: 24 Learner I
    edited July 2020
    Thanks again for the explanation and all the help. One more thing: after running k-means, I saw in a video that the Cluster Distance Performance operator can be used to evaluate the clustering. Is there a guideline for interpreting 'Avg. within centroid distance' or 'Davies Bouldin', like the rules of thumb for correlation coefficients?
    Or does the result need to be below 1 for the clustering to be a 'good' one?
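For reference, here is a sketch of how the two measures are usually defined, written in plain Python with hypothetical helper names. It uses the textbook formulas: the average Euclidean distance from each point to its cluster centroid, and the Davies-Bouldin index, which for each cluster i takes the worst ratio (s_i + s_j) / d(c_i, c_j) of combined within-cluster scatter to centroid separation and then averages over clusters. RapidMiner's Cluster Distance Performance operator may differ in details (for example squared distances or sign conventions), so treat this as an assumption. For both measures, lower means tighter, better-separated clusters; neither has a universally agreed threshold such as 1, so they are most useful for comparing candidate clusterings of the same data, e.g. different values of k (bearing in mind that the average within-centroid distance almost always shrinks as k grows).

```python
import math

def centroids_of(points, labels):
    """Group points by label and return (centroid per label, members per label)."""
    groups = {}
    for p, lab in zip(points, labels):
        groups.setdefault(lab, []).append(p)
    cents = {lab: [sum(col) / len(ms) for col in zip(*ms)] for lab, ms in groups.items()}
    return cents, groups

def avg_within_centroid_distance(points, labels):
    """Mean Euclidean distance from each point to its own cluster centroid."""
    cents, _ = centroids_of(points, labels)
    return sum(math.dist(p, cents[lab]) for p, lab in zip(points, labels)) / len(points)

def davies_bouldin(points, labels):
    """Average over clusters of the worst (s_i + s_j) / d(c_i, c_j); needs >= 2 distinct centroids."""
    cents, groups = centroids_of(points, labels)
    scatter = {lab: sum(math.dist(p, cents[lab]) for p in ms) / len(ms) for lab, ms in groups.items()}
    ks = list(cents)
    total = 0.0
    for i in ks:
        total += max((scatter[i] + scatter[j]) / math.dist(cents[i], cents[j]) for j in ks if j != i)
    return total / len(ks)

pts = [(0, 0), (0, 1), (10, 0), (10, 1)]
good = davies_bouldin(pts, [0, 0, 1, 1])  # two tight, far-apart groups
bad = davies_bouldin(pts, [0, 1, 0, 1])   # groups that mix distant points
print(good, bad)
```

The same four points give a much lower Davies-Bouldin value for the sensible grouping than for the mixed one, which is the kind of relative comparison the index supports.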
      