How to compare which similarity measurement gives better results?
For my text document data sets, I have applied 'Data to Similarity' using Cosine, Jaccard, Dice, and other similarity measures. My goal is to determine which similarity measure gives better results for my input data set. How do I do this comparison?
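For reference, here is a minimal Python sketch of the textbook definitions of the three measures mentioned above: cosine on term-frequency vectors, and Jaccard and Dice on sets of distinct terms. The whitespace tokenizer and the function names are illustrative assumptions. RapidMiner's 'Data to Similarity' operator works on the attribute values of the example set, so its exact formulas may differ in detail.

```python
import math
from collections import Counter


def tokens(text: str) -> list[str]:
    # Hypothetical tokenizer: lowercase and split on whitespace only.
    # Real preprocessing would usually also strip punctuation, remove stopwords, etc.
    return text.lower().split()


def cosine(a: str, b: str) -> float:
    # Cosine similarity between raw term-frequency vectors.
    ca, cb = Counter(tokens(a)), Counter(tokens(b))
    dot = sum(count * cb[term] for term, count in ca.items())
    norm = math.sqrt(sum(v * v for v in ca.values())) * math.sqrt(sum(v * v for v in cb.values()))
    return dot / norm if norm else 0.0


def jaccard(a: str, b: str) -> float:
    # Jaccard similarity: |A ∩ B| / |A ∪ B| over the sets of distinct terms.
    sa, sb = set(tokens(a)), set(tokens(b))
    union = sa | sb
    return len(sa & sb) / len(union) if union else 0.0


def dice(a: str, b: str) -> float:
    # Dice similarity: 2|A ∩ B| / (|A| + |B|) over the sets of distinct terms.
    sa, sb = set(tokens(a)), set(tokens(b))
    total = len(sa) + len(sb)
    return 2 * len(sa & sb) / total if total else 0.0


doc1 = "the cat sat on the mat"
doc2 = "the cat ate the fish"
for name, sim in [("cosine", cosine), ("jaccard", jaccard), ("dice", dice)]:
    print(f"{name}: {sim(doc1, doc2):.4f}")
# cosine: 0.6682
# jaccard: 0.2857
# dice: 0.4444
```

With these definitions Dice = 2J / (1 + J), so Jaccard and Dice always rank document pairs in the same order; only cosine, which also uses term counts, can order them differently.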
Best Answer
Telcontar120 (RapidMiner Certified Analyst, RapidMiner Certified Expert)
I don't think there is a simple answer to this question. Each of these measures quantifies similarity in a slightly different way; you can read about the exact calculations on Wikipedia or other sites. You need to select the one that corresponds most closely to how you think about similarity between your texts. In a supervised learning problem, you can make this parameter subject to optimization and determine the "best" measure based on overall model performance. If you are simply computing similarity for its own sake, however, there is no way for RapidMiner to tell you which one is "best" for that comparison.
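As a sketch of the supervised approach, the Python snippet below scores each measure by leave-one-out 1-nearest-neighbour accuracy on a labelled corpus: the measure whose nearest neighbours most often share a document's label performs best on that data. The toy documents, labels, tokenizer, and function names are all hypothetical, and this is a plain-Python stand-in for the idea, not a RapidMiner process.

```python
import math
from collections import Counter
from typing import Callable

Similarity = Callable[[str, str], float]


def tokens(text: str) -> list[str]:
    # Hypothetical whitespace tokenizer (assumption: no stemming or stopword removal).
    return text.lower().split()


def cosine(a: str, b: str) -> float:
    # Cosine similarity between term-frequency vectors.
    ca, cb = Counter(tokens(a)), Counter(tokens(b))
    dot = sum(count * cb[term] for term, count in ca.items())
    norm = math.sqrt(sum(v * v for v in ca.values())) * math.sqrt(sum(v * v for v in cb.values()))
    return dot / norm if norm else 0.0


def jaccard(a: str, b: str) -> float:
    # |A ∩ B| / |A ∪ B| over distinct terms.
    sa, sb = set(tokens(a)), set(tokens(b))
    return len(sa & sb) / len(sa | sb) if sa | sb else 0.0


def dice(a: str, b: str) -> float:
    # 2|A ∩ B| / (|A| + |B|) over distinct terms.
    sa, sb = set(tokens(a)), set(tokens(b))
    return 2 * len(sa & sb) / (len(sa) + len(sb)) if sa or sb else 0.0


def loo_1nn_accuracy(docs: list[str], labels: list[str], sim: Similarity) -> float:
    # Leave-one-out evaluation: predict each document's label from its most
    # similar *other* document, then return the fraction predicted correctly.
    correct = 0
    for i, doc in enumerate(docs):
        others = [j for j in range(len(docs)) if j != i]
        nearest = max(others, key=lambda j: sim(doc, docs[j]))
        correct += int(labels[nearest] == labels[i])
    return correct / len(docs)


# Hypothetical labelled toy corpus; substitute your own documents and labels.
docs = [
    "the striker scored a late goal in the match",
    "the goalkeeper saved a penalty in the match",
    "fans cheered the winning goal",
    "the central bank raised interest rates",
    "stock markets fell after the rate decision",
    "investors worry about rising interest rates",
]
labels = ["sport", "sport", "sport", "finance", "finance", "finance"]

measures: dict[str, Similarity] = {"cosine": cosine, "jaccard": jaccard, "dice": dice}
scores = {name: loo_1nn_accuracy(docs, labels, sim) for name, sim in measures.items()}
for name, acc in scores.items():
    print(f"{name}: leave-one-out 1-NN accuracy = {acc:.2f}")
```

On a real data set you would use many more documents and probably cross-validation rather than leave-one-out. With these set-based definitions, Jaccard and Dice always pick the same nearest neighbour and therefore get identical accuracy, so a meaningful difference can only show up between cosine and that pair.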
Answers