Fuzzy Match of Strings
I'm trying to work through a problem in RapidMiner: finding approximate matches for the strings of one dataset in another dataset. Is there a way to perform a fuzzy match on a string in RapidMiner? Any help will be appreciated!
Answers
Hi,
there is an operator to calculate the Levenshtein distance in the Operator Toolbox extension.
Best,
Martin
Dortmund, Germany
hi @prachi138 - yes I'd recommend trying the Levenshtein Distance operator as a good starting point. If you can post some examples of what you're doing, that can help us give you more guidance.
FYI I'm moving this to the Studio help forum.
SG
Hi,
Thank you for your prompt reply! I'm trying to use the Levenshtein distance in RapidMiner. However, I see that the second port requires a document input. When I use Data to Documents, an IOObject collection is generated, which is not accepted as input. Can you let me know exactly which operator I should be using with the 'Process Documents' operator? Thank you once again!
Hi @prachi138
Which operator are you talking about?
The 'Generate Levenshtein Distance' operator has only one input port, which takes an example set. You then need to select the 1st and 2nd string attributes to compare, and the operator will calculate the distance for all examples, adding it as a separate attribute in the output dataset.
Vladimir
http://whatthefraud.wtf
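For anyone who wants to see what the row-wise calculation described above amounts to, here is a minimal Python sketch. It is not the operator's actual implementation; the function names `levenshtein` and `add_distance`, the column names, and the output attribute name `distance` are all made up for illustration.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions,
    or substitutions needed to turn string a into string b."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string in the inner loop
    previous = list(range(len(b) + 1))  # distances from "" to prefixes of b
    for i, ca in enumerate(a, start=1):
        current = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(
                previous[j] + 1,         # delete ca
                current[j - 1] + 1,      # insert cb
                previous[j - 1] + cost,  # substitute (or keep) the character
            ))
        previous = current
    return previous[-1]


def add_distance(rows, first, second, out="distance"):
    """Return copies of the rows with one extra attribute holding the
    distance between the two chosen string attributes (hypothetical helper)."""
    return [{**row, out: levenshtein(row[first], row[second])} for row in rows]


# Example data is invented for this sketch.
examples = [
    {"name_a": "Jon Smith", "name_b": "John Smith"},
    {"name_a": "kitten", "name_b": "sitting"},
]
result = add_distance(examples, "name_a", "name_b")
```

Each input row keeps its original attributes and gains one numeric attribute, which mirrors the "one example set in, one extra attribute out" behaviour described in the reply.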
Is it possible to perform a many-to-many comparison across datasets with the fuzzy matching operator? Basically, I have 5 columns from 5 different data sources and want to see where they match. Currently, I’m comparing 2 attributes from 2 different data sources at a time. Looking forward to your inputs.
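One possible way to think about the many-to-many case, sketched in Python rather than as a RapidMiner process: pair up every two sources, build the cross join of their values, and keep the pairs whose edit distance is under a threshold. The names `cross_match` and `max_distance`, the lowercase normalisation, and the sample data are assumptions made for this sketch, not anything the operator provides.

```python
from itertools import combinations, product


def levenshtein(a: str, b: str) -> int:
    """Edit distance using a two-row dynamic programming table."""
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,               # deletion
                current[j - 1] + 1,            # insertion
                previous[j - 1] + (ca != cb),  # substitution
            ))
        previous = current
    return previous[-1]


def cross_match(sources, max_distance=2):
    """Compare every value of every source against every value of every
    other source; return (source_a, value_a, source_b, value_b, distance)
    tuples for pairs within max_distance. Hypothetical helper."""
    matches = []
    for (name_a, values_a), (name_b, values_b) in combinations(sources.items(), 2):
        for va, vb in product(values_a, values_b):
            # Lowercasing is an assumed normalisation step, not a requirement.
            d = levenshtein(va.lower(), vb.lower())
            if d <= max_distance:
                matches.append((name_a, va, name_b, vb, d))
    return matches


# Invented sample data: three sources, one string column each.
sources = {
    "crm": ["Acme Corp", "Globex"],
    "billing": ["ACME Corp.", "Initech"],
    "web": ["Globex Inc"],
}
matches = cross_match(sources, max_distance=2)
```

Note that the number of comparisons grows with the product of the column sizes, so with 5 large sources some form of pre-filtering (for example, only comparing values that share a first letter) may be needed.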