"K-Means clustering"
Hello Everyone,
I wanted to ask what is the best distance measure to use when we have a mix of numeric and categorical attributes?
Any help would be appreciated.
Thank you
Answers
The answer to this question depends on the type of data you want to use. Let's assume you go for the mixed measure. In this case, the distance between two nominal values is 0 if the strings are equal and 1 otherwise, while for numerical values the Euclidean distance is computed. Please note that you should always normalize and select proper attributes first. This ensures that all numbers share the same scale and that ID-like attributes are removed. If you want to use a purely numerical measure, you need to transform nominal values into numerical ones first. The best method for this depends heavily on the attribute's content.
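To make the mixed measure concrete, here is a minimal Python sketch of the idea above. It first min-max normalizes the numeric columns, then computes a per-attribute difference: 0 or 1 for nominal values and the plain difference for numeric ones. The differences are combined Euclidean-style (square root of the summed squares). Some assumptions apply. The function names `min_max_normalize` and `mixed_euclidean` are hypothetical and are not RapidMiner's API. Nominal values are assumed to be strings and numeric values floats. Combining the terms this way is one reasonable reading of "mixed", not necessarily the exact internal implementation.

```python
import math
from typing import List, Sequence, Union

Value = Union[float, str]  # assumption: nominal = str, numeric = float/int


def min_max_normalize(rows: Sequence[Sequence[Value]], numeric_idx: Sequence[int]) -> List[List[Value]]:
    """Rescale each numeric column to [0, 1] so no attribute dominates the distance."""
    result = [list(r) for r in rows]
    for j in numeric_idx:
        col = [r[j] for r in rows]
        lo, hi = min(col), max(col)
        span = hi - lo
        for r in result:
            # A constant column carries no information; map it to 0.0.
            r[j] = (r[j] - lo) / span if span else 0.0
    return result


def mixed_euclidean(a: Sequence[Value], b: Sequence[Value]) -> float:
    """Nominal: 0 if equal, else 1. Numeric: plain difference. Combined as sqrt(sum of squares)."""
    total = 0.0
    for x, y in zip(a, b):
        if isinstance(x, str) or isinstance(y, str):
            d = 0.0 if x == y else 1.0
        else:
            d = x - y
        total += d * d
    return math.sqrt(total)


# Hypothetical example data: (age, income, color)
rows = [[25, 50000, "red"], [35, 60000, "blue"], [45, 80000, "red"]]
norm = min_max_normalize(rows, numeric_idx=[0, 1])
print(round(mixed_euclidean(norm[0], norm[1]), 4))  # 1.1667
print(round(mixed_euclidean(norm[0], norm[2]), 4))  # 1.4142
```

Without the normalization step, the income column (differences in the tens of thousands) would completely swamp both the age column and the 0/1 nominal term. This illustrates why normalizing first matters.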
Cheers,
Helge