optimal number of clusters in fuzzy c-means

farzane Member Posts: 6 Learner I
edited August 2020 in Help
Hi
I'm using fuzzy c-means to cluster some text data. How can I find the optimal number of clusters? Is intra_cluster_distance a good measure?

Answers

  • jacobcybulski Member, University Professor Posts: 391 Unicorn
    edited August 2020
    I assume that you are talking about the Fuzzy C-Means operator from the Information Selection extension? The key to finding an optimum k is to create an optimisation loop, e.g. using Optimize Parameters (Grid), which varies the number of clusters and evaluates each setting against some performance measure.

    If you are interested only in the final cluster allocation, there are several possible solutions. Because Fuzzy C-Means, unlike k-Means, does not return a centroid table, you will not be able to use the Davies-Bouldin measure from Cluster Distance Performance. You can, however, rely on the commonly used Item Distribution Performance (e.g. the Sum of Squares measure) and plot it against k to apply the "elbow method" of finding the "optimum" cluster number. Alternatively, you could combine Data to Similarity with Cluster Density Performance to optimise the average cluster density.
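
    Outside RapidMiner, the elbow approach can be sketched in a few lines of Python. This is only an illustration under stated assumptions: `fuzzy_c_means` and `hard_sse` are hypothetical helpers written for this sketch (not the Information Selection operator), the data is a set of synthetic 2-D blobs rather than text vectors, and the fuzzifier m=2 is a common but arbitrary choice. Each point is assigned to its highest-membership cluster before the sum of squares is computed, so the fuzzy memberships themselves are ignored here.

```python
import numpy as np


def fuzzy_c_means(X, k, m=2.0, n_iter=300, tol=1e-5, seed=0):
    """Minimal fuzzy c-means sketch; returns (centers, U) with U of shape (n, k)."""
    rng = np.random.default_rng(seed)
    U = rng.random((X.shape[0], k))
    U /= U.sum(axis=1, keepdims=True)            # each row of memberships sums to 1
    for _ in range(n_iter):
        W = U ** m
        centers = (W.T @ X) / W.sum(axis=0)[:, None]
        # Euclidean distance from every point to every centre, shape (n, k)
        d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        d = np.fmax(d, 1e-12)                     # guard against division by zero
        inv = d ** (-2.0 / (m - 1.0))
        U_new = inv / inv.sum(axis=1, keepdims=True)
        converged = np.abs(U_new - U).max() < tol
        U = U_new
        if converged:
            break
    return centers, U


def hard_sse(X, centers, U):
    """Sum of squared distances after assigning each point to its top cluster."""
    labels = U.argmax(axis=1)
    return float(((X - centers[labels]) ** 2).sum())


# Synthetic stand-in data: three well-separated 2-D blobs (an assumption,
# not the text data from the question).
rng = np.random.default_rng(42)
blob_centres = np.array([[0.0, 0.0], [6.0, 0.0], [0.0, 6.0]])
X = np.vstack([c + rng.normal(scale=0.5, size=(50, 2)) for c in blob_centres])

# The "optimisation loop": vary k and record the performance measure.
sse_by_k = {}
for k in range(1, 7):
    centers, U = fuzzy_c_means(X, k)
    sse_by_k[k] = hard_sse(X, centers, U)

for k, sse in sse_by_k.items():
    print(f"k={k}  SSE={sse:.1f}")
```

    Plotting the values in `sse_by_k` against k and looking for the point where the curve flattens gives the elbow.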

    Note, however, that the whole idea of using Fuzzy C-Means is to utilise the fuzzy membership of examples in each cluster. If your aim is to consider all possible cluster memberships, there are no obvious performance measures available in RapidMiner. You could, however, create your own measure by weighting different clustering performance indicators with cluster membership confidence factors.
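
    As a rough idea of what such a membership-weighted measure could look like outside RapidMiner, here is a small Python sketch. The function names are hypothetical, and the choice of measures is an assumption of this sketch, not something RapidMiner provides: a sum of squares in which every point-to-centre distance is weighted by the membership raised to the fuzzifier m, the partition coefficient (mean squared membership, between 1/k and 1), and a Xie-Beni style ratio of weighted compactness to the smallest centre separation. The inputs are a tiny hand-made example, not real clustering output.

```python
import numpy as np


def fuzzy_weighted_sse(X, centers, U, m=2.0):
    """Sum over points and clusters of u_ij**m * squared distance to centre j."""
    d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)  # shape (n, k)
    return float(((U ** m) * d2).sum())


def partition_coefficient(U):
    """Mean of squared memberships; closer to 1 means crisper clusters."""
    return float((U ** 2).sum() / U.shape[0])


def xie_beni(X, centers, U, m=2.0):
    """Weighted compactness divided by n times the smallest squared centre gap."""
    k = centers.shape[0]
    sep = ((centers[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    min_sep = sep[~np.eye(k, dtype=bool)].min()   # ignore the zero diagonal
    return fuzzy_weighted_sse(X, centers, U, m) / (X.shape[0] * min_sep)


# Hand-made illustration: four 1-D points, two centres, fixed memberships.
X = np.array([[0.0], [1.0], [9.0], [10.0]])
centers = np.array([[0.5], [9.5]])
U = np.array([[0.9, 0.1], [0.9, 0.1], [0.1, 0.9], [0.1, 0.9]])

print("weighted SSE:", fuzzy_weighted_sse(X, centers, U))
print("partition coefficient:", partition_coefficient(U))
print("Xie-Beni style index:", xie_beni(X, centers, U))
```

    Lower weighted sum of squares and Xie-Beni values, and a higher partition coefficient, indicate tighter and crisper clusters, so any of them could be compared across different k.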

    The Information Selection extension also provides two performance operators worth investigating here. One calculates the within-cluster distance variance; unfortunately, it does not take the fuzzy cluster membership into consideration.

    Jacob
  • farzane Member Posts: 6 Learner I
    @jacobcybulski
    Thank you so much. The problem has been solved :)
  • endirizalf Member Posts: 1 Learner II
    Hi, @farzane
    Which solution did you use? Could you explain it to me, please?

    You can mention me in this discussion or write to my email, endirizal.f@gmail.com.

    Thank you for your help

    Endirizalf