Minimal k for x-Means?
Muhammed_Fatih_
Dear community,
My question is: does X-Means always select the minimum given k as the optimum?
I ran X-Means on my data with k-min=2 and k-max=60, and again with k-min=20 and k-max=60. Each time the model returned the minimum k (k=2 in the first run, k=20 in the second). Is it normal for X-Means to always pick the minimum k?
Best regards!
Answers
The situation you describe can happen if you don't have many examples to cluster, or if the examples are so similar to one another that X-Means always falls back to the simplest clustering scheme.
In that case it is better to normalize the data beforehand. This ensures that all attributes are on the same scale before the algorithm is applied.
For example, suppose attribute1 has a range of 0-100 and attribute2 has a range of 0-1. In this case attribute1 carries more weight than attribute2 in the distance calculation. If you normalize, both attributes are converted to a 0-1 scale.
RapidMiner operator to use: "Normalize"
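As a rough illustration of what range normalization does to the two attributes in the example above, here is a minimal sketch in plain Python. It is not RapidMiner's implementation of the "Normalize" operator; the function name `min_max_normalize` and the sample values are made up for this post.

```python
def min_max_normalize(values):
    """Rescale a list of numbers linearly to the range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # A constant attribute has no spread; map every value to 0.0.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


# Hypothetical sample data: attribute1 spans 0-100, attribute2 spans 0-1.
attribute1 = [0.0, 25.0, 50.0, 100.0]
attribute2 = [0.0, 0.25, 0.5, 1.0]

scaled1 = min_max_normalize(attribute1)
scaled2 = min_max_normalize(attribute2)

print(scaled1)
print(scaled2)
```

After scaling, both attributes cover the same 0-1 range, so neither dominates a distance computed over them. Whether this changes the k that X-Means picks still depends on the data.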
Thank you for your response. I tried the "Normalize" operator, but it doesn't help. I got the same result as before: the X-Means operator again picked the given k-min parameter. I don't know whether this is "normal" behaviour for X-Means.
Does anyone have any other opinions?
Best regards!
Thank you for your answer.
Does this mean that X-Means, or rather the AIC/BIC penalties implemented in the corresponding operator, only work on specific datasets? What does "It really comes down to your dataset." mean in detail?
Best regards!