Similarity between non-identical numbers
Hello
I have a problem with the Data to Similarity block. When I feed it a text column that contains numbers, such as phone numbers, it only gives a similarity of 100% between identical numbers; all other similarity values are 0.
Any ideas how I can make it detect the similarity between, for example, "7788" and "7722"?
Thanks and best regards
Best Answer
Telcontar120 (RapidMiner Certified Analyst, RapidMiner Certified Expert)
You need to set the distance metric according to the way you want to measure similarity between nominal values, which is not necessarily intuitive. If your data has both numerical and nominal attributes and you are using the default "mixed Euclidean distance" parameter, then nominal values that are the same have a distance of 0 and all other values have a distance of 1. If you filter your dataset down to the nominal attributes only and then switch the measure type to "nominal", you will get several other options for measuring nominal distances. You can look these up on Wikipedia to understand exactly how they work, but they will generally provide values other than simple 1/0 match logic.
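To make the 1/0 match logic concrete, here is a minimal Python sketch. It is not RapidMiner's implementation, and the function names are hypothetical. It also assumes, purely for illustration, that each digit position is treated as its own nominal attribute, so that a matching-based measure can give partial credit.

```python
def nominal_distance(a: str, b: str) -> int:
    """1/0 match logic on whole values: identical -> 0, anything else -> 1."""
    return 0 if a == b else 1


def simple_matching_similarity(a: str, b: str) -> float:
    """Fraction of positions that hold the same character.

    Assumption for illustration: each position acts as a separate
    nominal attribute, so both strings must have the same length.
    """
    if len(a) != len(b):
        raise ValueError("strings must have equal length")
    if not a:
        return 1.0  # two empty strings are trivially identical
    matches = sum(1 for x, y in zip(a, b) if x == y)
    return matches / len(a)


# Whole-value comparison sees "7788" and "7722" as simply different.
print(nominal_distance("7788", "7722"))            # 1
# Position-wise comparison: the first two digits match, the last two do not.
print(simple_matching_similarity("7788", "7722"))  # 0.5
```

With whole values the two numbers are just "different", which matches the all-or-nothing similarities described in the question. Comparing position by position is one way to get a graded score, but whether that is meaningful for phone numbers depends on the data.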