Similarity between multiple tables
Hi,
I am currently working on a thesis research project for my university to solve an entity resolution problem. Today I tried to integrate two tables by measuring the similarity between them. If the similarity is above the 0.9 threshold, it is considered useful and is used in the second evaluation. In the second evaluation, the variables are weighted. For example, a phone number is a better unique key than a first name. At the end, the customer representation needs to be evaluated as follows: (0.9*2)+(0.8*7) = .... If the result is above a threshold of 0.8 (for example), the match is considered useful and the rows are integrated. I tried to compute the similarity in RapidMiner (with a couple of similarity measures), but I received extreme values (<0 or >1).
(Currently, I cannot post any screenshots, since I am new.)
What am I doing wrong?
Cheers, Robin
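The two-stage scoring described in the post can be sketched in a few lines of Python, outside RapidMiner. This is only an illustration under stated assumptions: the attribute names `first_name` and `phone`, the function `weighted_score`, and the assignment of the weights 2 and 7 to those attributes are hypothetical. Dividing by the sum of the weights is also an assumption that is not in the post; it is added here because the raw sum (0.9*2)+(0.8*7) is larger than 1 and so could not be compared with a threshold such as 0.8.

```python
def weighted_score(similarities, weights):
    """Weighted mean of per-attribute similarities (each expected in [0, 1])."""
    total_weight = sum(weights[name] for name in similarities)
    if total_weight == 0:
        raise ValueError("weights must not sum to zero")
    weighted_sum = sum(similarities[name] * weights[name] for name in similarities)
    # Assumption: normalise by the total weight so the score stays in [0, 1].
    return weighted_sum / total_weight


# Hypothetical attribute names; the numbers are the ones used in the post.
similarities = {"first_name": 0.9, "phone": 0.8}
weights = {"first_name": 2, "phone": 7}

score = weighted_score(similarities, weights)
is_match = score > 0.8  # second-stage threshold from the post
print(round(score, 3), is_match)
```

With these numbers the normalised score is 7.4 / 9, which is roughly 0.82, so the pair would just pass a 0.8 threshold.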
Answers
Best Regards,
Edwin Yaqub
Scott
Currently, I have built the following process in RapidMiner:
I used the same data set twice: one copy with the correct data and the other with manipulated data (same columns). For the first cross distance test I selected the "initials" attribute. Within the Cross Distances operator I selected "nominal measures" and "JaccardSimilarity". I received the following results:
Results:
I was expecting results such as 0.43, 0.33, etc. See a real example below:
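As a side note for readers, and separate from the poster's own example: the textbook Jaccard similarity is the size of the intersection of two sets divided by the size of their union, so it always lies between 0 and 1. The sketch below applies that definition to the set of characters in two strings. The function name `jaccard_similarity` and the sample initials are hypothetical, and this is not necessarily how the RapidMiner operator treats a single nominal attribute.

```python
def jaccard_similarity(a, b):
    """Jaccard similarity of the character sets of two strings."""
    set_a, set_b = set(a), set(b)
    if not set_a and not set_b:
        # Convention chosen for this sketch: two empty strings are identical.
        return 1.0
    return len(set_a & set_b) / len(set_a | set_b)


# Hypothetical initials: one original value and one manipulated value.
same = jaccard_similarity("RJ", "RJ")      # identical sets
partial = jaccard_similarity("RJ", "RK")   # one shared character out of three
disjoint = jaccard_similarity("RJ", "XY")  # nothing in common
print(same, partial, disjoint)
```

If a tool reports values below 0 or above 1 for a similarity of this kind, one thing worth checking is whether the output is actually a distance or a different measure rather than this set-based ratio.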