X-Means always same behavior
nelsonthekinger
Member Posts: 5 Contributor II
Hello Experts!
I'm trying to use X-Means because of its advantages over K-Means, but I'm not getting the expected result.
I tried K-Means on 6 files from 3 categories, and with k = 3 it worked perfectly.
Then I applied X-Means with k from 2 to 60, and I always get 2 clusters.
I thought it could be because I had too few files, so I tried again with 53 files from 3 categories,
and the result was the same: K-Means (k = 3) succeeded, X-Means (k = 2 to 60) gave the same 2 clusters.
I've tried many configurations, but the one I use most is:
measure type: NumericalMeasures
numerical measure: CosineSimilarity
clustering algorithm: KMeans
The rest is default.
I'm clueless about the reason. Any help is appreciated!
Answers
I have the same problem. I tried X-Means with k-min = 2 and k-max = 60. For my data the right result is 4 clusters, but X-Means ran and returned 2 clusters. I got the same result with other data that I tried.
Can anyone help me?
I tried X-Means on the interval k-min = 2 to k-max = 60 as well as k-min = 20 to k-max = 60. Each time, the X-Means model gives me the minimum number of k (k = 2 the first time and k = 20 the second). Is it normal that X-Means always picks the minimum k?
Best regards!
Dortmund, Germany
The situation you describe can happen if you don't have many examples to cluster, or if they are too similar to one another, so X-Means falls back to the simplest clustering scheme.
In that case it is better to normalize the data beforehand. This ensures all attributes are on the same scale before the algorithm is applied.
For example, suppose attribute1 has a range of 0-100 and attribute2 has a range of 0-1. In this case attribute1 gets more weight than attribute2. If you normalize, both attributes are converted to a 0-1 scale.
RapidMiner operator to use: "Normalize"
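
To make the 0-1 rescaling concrete, here is a minimal Python sketch of min-max (range) scaling. It is only an illustration, not what RapidMiner runs internally: the function name min_max_normalize and the sample values are made up for this example, and which transformation the Normalize operator actually applies depends on the method selected in its parameters.

```python
def min_max_normalize(column):
    """Rescale a list of numbers to the 0-1 range (min-max / range scaling)."""
    lo, hi = min(column), max(column)
    if hi == lo:
        # A constant column has no spread; map it to 0.0 to avoid dividing by zero.
        return [0.0 for _ in column]
    return [(x - lo) / (hi - lo) for x in column]


# Hypothetical sample values, chosen to match the ranges in the example above.
attribute1 = [0.0, 25.0, 50.0, 100.0]  # range 0-100
attribute2 = [0.0, 0.5, 0.25, 1.0]     # range 0-1

scaled1 = min_max_normalize(attribute1)
scaled2 = min_max_normalize(attribute2)

print(scaled1)  # [0.0, 0.25, 0.5, 1.0]
print(scaled2)  # [0.0, 0.5, 0.25, 1.0]
```

After scaling, both attributes span 0 to 1, so neither dominates a distance computation just because its raw numbers are larger.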