Concatenate words in comments
Hello there!
We are currently writing a research project on microtransactions using natural language processing.
We have an Excel file containing 450,000 comments.
To capture as many comments related to microtransactions as possible, we would like to concatenate some spelling variations, e.g.:
Microtransactions = "micro transactions", "micro-transactions", "microtransact", etc.
We would very much like the result to contain all 450,000 comments, but with the words concatenated as explained above.
How do we best achieve this?
Thanks a lot!
Answers
Lindon Ventures
Data Science Consulting from Certified RapidMiner Experts
Have you tried the Levenshtein Distance from the Operator Toolbox extension? It could help you find similar strings.
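For readers who want to see what the distance measures, here is a minimal Python sketch of the standard dynamic-programming Levenshtein distance. It is not the Operator Toolbox implementation; the function name `levenshtein` is just an illustrative choice.

```python
def levenshtein(a: str, b: str) -> int:
    """Return the minimum number of single-character edits turning a into b."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string in the inner loop
    # previous[j] holds the distance between the processed prefix of a and b[:j]
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(
                previous[j] + 1,         # deletion
                current[j - 1] + 1,      # insertion
                previous[j - 1] + cost,  # substitution (or match)
            ))
        previous = current
    return previous[-1]


print(levenshtein("microtransactions", "micro-transactions"))  # 1
print(levenshtein("microtransactions", "microtransact"))       # 4
```

Note that "microtransact" is four edits away from "microtransactions", so any distance threshold has to be chosen with truncated spellings like that in mind.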
Suppose you have processed the 450,000 comments with Tokenize inside a text mining operator such as "Process Documents"; you will then get a wordlist of the distinct tokens.
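Outside RapidMiner, the same idea can be sketched in a few lines of Python. This is only an approximation of tokenizing and building a wordlist: the helper `build_wordlist`, the regular expression, and the three sample comments are assumptions made for illustration, not part of the original process.

```python
import re
from collections import Counter


def build_wordlist(comments):
    """Count how often each lowercase token occurs across all comments."""
    counts = Counter()
    for comment in comments:
        # Treat every run of letters or digits as one token; everything
        # else (spaces, hyphens, punctuation) acts as a separator.
        counts.update(re.findall(r"[a-z0-9]+", comment.lower()))
    return counts


# Hypothetical sample comments, standing in for the Excel file.
comments = [
    "Microtransactions ruin the game",
    "Too many micro-transactions here",
    "I hate micro transactions",
]
wordlist = build_wordlist(comments)
```

One thing this sketch makes visible: a tokenizer that splits on non-letters turns "micro-transactions" and "micro transactions" into the two tokens "micro" and "transactions". A wordlist built this way therefore only contains single-token variants such as "microtransact", so the hyphenated and spaced spellings would need separate handling.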
Then you convert the wordlist to data, generate pairs of keywords, and apply the Levenshtein distance to each pair.
I lagged the wordlist for a quick demo, which only compares neighbouring keywords. For n keywords you basically need n*(n-1)/2 pairs for the distance calculation. The Data to Similarity operator will help you expand the data into pairwise format quickly.
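To illustrate the pairwise step, here is a small Python sketch that builds all n*(n-1)/2 keyword pairs and keeps those within a distance threshold. The keyword list, the helper `close_pairs`, and the threshold `max_distance` are all hypothetical; the distance function is a plain textbook Levenshtein, not the RapidMiner operator.

```python
from itertools import combinations


def levenshtein(a: str, b: str) -> int:
    """Textbook dynamic-programming edit distance."""
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,                   # deletion
                current[j - 1] + 1,                # insertion
                previous[j - 1] + (ch_a != ch_b),  # substitution
            ))
        previous = current
    return previous[-1]


def close_pairs(words, max_distance=1):
    """Return (word_a, word_b, distance) for each unordered pair within max_distance."""
    result = []
    for a, b in combinations(sorted(set(words)), 2):
        distance = levenshtein(a, b)
        if distance <= max_distance:
            result.append((a, b, distance))
    return result


# Hypothetical keywords taken from a wordlist.
words = ["microtransactions", "microtransaction", "microtransact", "game", "games"]

pairs = list(combinations(words, 2))
n = len(words)
assert len(pairs) == n * (n - 1) // 2  # 5 keywords give 10 pairs

matches = close_pairs(words, max_distance=1)
```

With a threshold of 1, only "game"/"games" and "microtransaction"/"microtransactions" are reported; "microtransact" is 3 and 4 edits away from the other two spellings, so it would need a looser threshold. The pair count also grows quickly: a wordlist of 10,000 keywords already means 49,995,000 comparisons, so it may be worth filtering the wordlist first.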