
Clustering k-means

3erthe3er Member Posts: 3 Learner I
Hello everyone, 
I am looking for a way to cluster data. With the tools I am using, I cannot determine the right number of clusters k directly, so the data is simply split into however many clusters I have set k to.

Is there any way or tool to find the right number of clusters without knowing it beforehand?
And what kind of function should I use to check the result and its robustness?

I have read that X-Means clustering should help to find the right number of clusters.
I see a display on the right-hand side that makes an "assumption", but in my case this is incorrect and does not match the data set. 
Surely there must be an iterative/mathematical function that solves this problem? 

To clarify once again: after the analysis, my data set always ends up in exactly kmin clusters. I am looking for an automatic method to find the right value of k.
Maybe my selection of attributes is wrong?

Thanks to everyone for the help. I appreciate it very much!


P.S. Perhaps k-means is not the right choice either?
Any help is very much appreciated!! 😊

Answers

  • MartinLiebig Administrator, Moderator, Employee-RapidMiner, RapidMiner Certified Analyst, RapidMiner Certified Expert, University Professor Posts: 3,533 RM Data Scientist
    Hi there,

    finding the number of clusters is arguably the toughest part of using a clustering algorithm.
    X-Means is already one way to get a good estimate for k. There are also some heuristics out there, most prominently the elbow method. But there is even a paper arguing that you shouldn't use it: https://arxiv.org/pdf/2212.12189.pdf

    Also be careful with the normalization of your data. I see you do not use a Normalize operator, so the clustering might produce results you don't want. The same goes for the one-hot encoding you use.

    BR,
    Martin

    - Sr. Director Data Solutions, Altair RapidMiner -
    Dortmund, Germany
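
For readers who want to try the advice above outside RapidMiner, here is a minimal sketch in Python with scikit-learn. It is not the process from this thread. It assumes purely numeric attributes, uses synthetic data from make_blobs, and swaps in the silhouette coefficient as the selection heuristic, which is one common alternative to the elbow method and is not something the answer above recommends by name. The helper choose_k and all parameter values are hypothetical. The attributes are standardized first, because k-means relies on Euclidean distance and an attribute with a large range would otherwise dominate.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score
from sklearn.preprocessing import StandardScaler


def choose_k(X, k_values, random_state=0):
    """Hypothetical helper: scan candidate k values and score each clustering.

    Returns the k with the highest mean silhouette coefficient, plus all scores.
    """
    # Standardize every attribute to zero mean and unit variance before clustering.
    X_scaled = StandardScaler().fit_transform(X)
    scores = {}
    for k in k_values:
        model = KMeans(n_clusters=k, n_init=10, random_state=random_state)
        labels = model.fit_predict(X_scaled)
        # Silhouette lies in [-1, 1]; higher means tighter, better separated clusters.
        scores[k] = silhouette_score(X_scaled, labels)
    best_k = max(scores, key=scores.get)
    return best_k, scores


# Synthetic example data (an assumption): three well separated groups in 2D.
X, _ = make_blobs(
    n_samples=300,
    centers=[[0, 0], [10, 10], [-10, 10]],
    cluster_std=1.0,
    random_state=0,
)
# Put the second attribute on a much larger scale to mimic unnormalized data.
X[:, 1] *= 1000.0

best_k, scores = choose_k(X, range(2, 8))
for k, score in scores.items():
    print(f"k={k}: silhouette={score:.3f}")
print("selected k:", best_k)
```

As a rough robustness check, the scan can be repeated with several random_state values or on random subsamples of the rows. If the selected k changes from run to run, the data probably does not have a clear cluster structure at that granularity, and k-means may not be the right model for it.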