Distance to cluster centre for every data point
Hi guys
I'm not a big expert in clustering and couldn't find a suitable solution on the forum, so here's the question.
When I perform clustering, is there a simple RapidMiner way to obtain the exact distances to each cluster centre for each and every example in the dataset?
For example, if I have cluster1 and cluster2, and cluster1 contains examples v1, v2, v3, how could I find out which of v1, v2, v3 is the closest to (most representative example) or the farthest from (least representative example) the cluster1 centre?
Thank you
Best Answer
MartinLiebig Administrator, Moderator, Employee-RapidMiner, RapidMiner Certified Analyst, RapidMiner Certified Expert, University Professor, RM Data Scientist
Hi,
Can't you do Extract Cluster Centroids + Cross Distance?
BR,
Martin
- Sr. Director Data Solutions, Altair RapidMiner -
Dortmund, Germany
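Outside RapidMiner, the idea behind Extract Cluster Centroids + Cross Distance can be sketched in plain Python. This is a minimal illustration, not the operators' actual implementation: it assumes the centroids are already known, uses Euclidean distance, and the function name `cross_distance` is hypothetical. It produces one row per (cluster, example) pair, following the request (cluster number) / document (example index) reading discussed in the thread.

```python
import math


def cross_distance(examples, centroids):
    """Return (request, document, distance) rows for every centroid/example pair.

    'request' is the cluster index, 'document' is the example index.
    Euclidean distance is an assumption of this sketch.
    """
    rows = []
    for c_idx, centroid in enumerate(centroids):
        for e_idx, example in enumerate(examples):
            rows.append((c_idx, e_idx, math.dist(centroid, example)))
    return rows


# Toy data: two clusters, two examples each (made up for illustration).
examples = [(0.0, 0.0), (1.0, 0.0), (5.0, 5.0), (6.0, 5.0)]
centroids = [(0.5, 0.0), (5.5, 5.0)]
rows = cross_distance(examples, centroids)
print(rows[0])  # (0, 0, 0.5)
```

The table holds n × k rows, one distance from every example to every centroid.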
Answers
Hi @mschmitz
Yes, I can. This seems to be a solution, though not a very obvious one.
But this way, I guess, I am getting the indexes of the examples (document column) for each cluster number (request column), correct?
So I will then need to somehow match these indexes with the original examples if I want the individual distances and not only the min/max?
Vladimir
http://whatthefraud.wtf
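Matching the indexes back can be sketched as a join on the document index: keep only the row whose request equals the example's assigned cluster, then take the minimum and maximum per cluster to get the most and least representative examples. A minimal Python sketch, assuming a long (request, document, distance) table and a list of assigned cluster labels indexed by document; the helper names and the toy numbers are made up for illustration.

```python
from collections import defaultdict


def assigned_distances(rows, labels):
    """Keep only the distance from each example to its own cluster.

    rows: (request, document, distance) tuples, one per cluster/example pair.
    labels: labels[document] is the cluster assigned to that example.
    Returns {document: distance}.
    """
    return {doc: dist for req, doc, dist in rows if labels[doc] == req}


def representatives(rows, labels):
    """Per cluster, return (most representative, least representative) document."""
    own = assigned_distances(rows, labels)
    members = defaultdict(list)
    for doc, dist in own.items():
        members[labels[doc]].append((dist, doc))
    # Smallest distance = most representative, largest = least representative.
    return {c: (min(m)[1], max(m)[1]) for c, m in members.items()}


# Toy data: cluster 0 holds documents 0, 1, 2 (the v1, v2, v3 of the question).
labels = [0, 0, 0, 1]
rows = [
    (0, 0, 1.2), (0, 1, 0.3), (0, 2, 2.5), (0, 3, 9.0),
    (1, 0, 8.1), (1, 1, 7.7), (1, 2, 6.4), (1, 3, 0.4),
]
print(representatives(rows, labels))  # {0: (1, 2), 1: (3, 3)}
```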
Hi,
Well, you get the distance to each centroid, so you would need to add an Aggregate afterwards to find the closest cluster centroid.
Cheers,
Martin
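The aggregation step amounts to grouping the rows by document and keeping the smallest distance. A rough Python equivalent, assuming the same (request, document, distance) row layout; `closest_centroid` is a hypothetical helper, and on ties the request that appears first in `rows` wins, because only a strictly smaller distance replaces the current best.

```python
def closest_centroid(rows):
    """Group (request, document, distance) rows by document and keep the minimum.

    Returns {document: (request, distance)}: the nearest cluster for each example.
    """
    best = {}
    for req, doc, dist in rows:
        # Replace only on a strictly smaller distance, so earlier rows win ties.
        if doc not in best or dist < best[doc][1]:
            best[doc] = (req, dist)
    return best


# Toy rows (made up for illustration).
rows = [
    (0, 0, 1.2), (0, 1, 0.3), (0, 2, 2.5), (0, 3, 9.0),
    (1, 0, 8.1), (1, 1, 7.7), (1, 2, 6.4), (1, 3, 0.4),
]
print(closest_centroid(rows))
# {0: (0, 1.2), 1: (0, 0.3), 2: (0, 2.5), 3: (1, 0.4)}
```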
Clear, @mschmitz.
But is there a reason these distances are not included in the default output example set of the clustering operators?
Vladimir
@kypexin,
do you mean all distances or just the lowest?
All distances would increase memory usage quite a lot. I can see some reason to get the distance to the assigned cluster as a kind of "confidence". Is that what you are asking for?
BR,
Martin
@mschmitz Not ALL distances but, as you said, only the distance from each example to its 'parent' cluster. And yes, this can serve as an analog of a confidence parameter.
Vladimir
@kypexin
Good question, especially because k-Means at least explicitly calculates this number... @sebastian_land wrote it, so maybe he knows?
And maybe @sgenzer can make a ticket out of this.
BR,
Martin
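Why keeping only the distance to the assigned cluster is cheap: in a k-means-style assignment step, the distance to every centroid is computed anyway to pick the nearest one, so retaining the minimum adds one stored value per example instead of k. A minimal Python sketch of such an assignment step, assuming Euclidean distance; it is not RapidMiner's k-Means code, and `assign` is a hypothetical name.

```python
import math


def assign(examples, centroids):
    """Assignment step: nearest centroid per example, plus its distance.

    All k distances are computed per example to find the nearest centroid;
    only the minimum is kept, so the output holds one distance per example.
    """
    labels, distances = [], []
    for example in examples:
        dists = [math.dist(example, c) for c in centroids]
        best = min(range(len(centroids)), key=dists.__getitem__)
        labels.append(best)
        distances.append(dists[best])
    return labels, distances


# Toy data (made up for illustration).
examples = [(0.0, 0.0), (1.0, 0.0), (5.0, 5.0), (6.0, 5.0)]
centroids = [(0.5, 0.0), (5.5, 5.0)]
labels, distances = assign(examples, centroids)
print(labels)     # [0, 0, 1, 1]
print(distances)  # [0.5, 0.5, 0.5, 0.5]
```

The `distances` list is the kind of per-example "confidence" column discussed above: smaller values mean the example sits closer to its cluster centre.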
@mschmitz
OK, nice. It seems I have just thrown in a little idea.
Vladimir
I certainly can. This is a feature request, not a bug, correct?