Permutations with string distance for near-duplicate detection
Dear Rapid-Miners,
A RapidMiner newbie needs your help :-)
We are trying to use RapidMiner to analyse a huge dataset stored in an Oracle database. The data represents information about organisations (companies), such as addresses, emails, descriptions, market sectors...
The idea would be to compute the similarity (using string distance functions?) of each of these companies with every other one (permutations?). The goal is to find the near duplicates in the database.
Would RapidMiner be able to achieve such a task? If yes, how should I proceed? Any help would be really appreciated.
Thanks a lot
Thibault
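For readers who want to see the idea in the question concretely, here is a minimal sketch outside RapidMiner, in plain Python. It is only an illustration under stated assumptions: the record ids, the sample company strings, the `normalize`, `similarity` and `near_duplicates` helpers, and the 0.85 threshold are all made up for this example, and `difflib.SequenceMatcher` is swapped in as a generic similarity measure rather than a specific edit distance. It compares unordered pairs (combinations) rather than permutations, on the assumption that comparing A with B is enough and B with A adds nothing useful.

```python
from difflib import SequenceMatcher
from itertools import combinations


def normalize(text):
    """Lowercase and collapse whitespace so trivial differences are ignored."""
    return " ".join(text.lower().split())


def similarity(a, b):
    """Similarity ratio in [0, 1] from difflib (a ratio, not an edit distance)."""
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()


def near_duplicates(records, threshold=0.85):
    """Return (id_a, id_b, score) for each unordered pair at or above threshold.

    records: dict mapping a record id to the string to compare.
    threshold: hypothetical cut-off, to be tuned on real data.
    """
    pairs = []
    for (id_a, text_a), (id_b, text_b) in combinations(records.items(), 2):
        score = similarity(text_a, text_b)
        if score >= threshold:
            pairs.append((id_a, id_b, score))
    return pairs


# Made-up sample data, standing in for rows read from the database.
companies = {
    1: "Acme Corporation, 12 Main Street, Paris",
    2: "ACME Corporation, 12 Main Street,  Paris",
    3: "Globex Industries, 4 Harbour Road, Lyon",
}

for id_a, id_b, score in near_duplicates(companies):
    print(id_a, id_b, round(score, 2))
```

Note that comparing every pair means n(n-1)/2 comparisons, which grows quickly on a huge table, so in practice one would probably only compare records that already share something cheap to check (for example the same postcode or email domain).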
Answers
For an outline of where to start, check out the videos on my blog, in particular Text Analytics #4 (links in signature).
But you are of course right. We offer consulting services to help with project setups like this one, since such a big problem is a little too large for starting with RapidMiner from scratch.
Greetings,
Sebastian