Replace Token Solution for abbreviations that are part of other words
I am working on a text analysis. My original text contains abbreviations, such as cust or cust. for customer. I can put the replace token operator before the tokenize operator and enter multiple replacements, such as replacing cust followed by a space with customer and cust. with customer, but I am curious whether there is a way to do it after tokenization, because tokenization has "grouped" the cust abbreviations together. I did try placing the replace operator after tokenization, but it replaced all occurrences of cust with customer, including the cust inside the full word customer. Any thoughts/ideas? Thank you for your help.
Best Answer
kayman Member Posts: 662 Unicorn
@lionelderkrikor probably forgot the dot. There are 2 ways to deal with this, either with + or *:
(cust).+$ means you need to have cust followed by at least one character
(cust).*$ means you need to have cust and optional additional characters.
So the last one is probably safer to use.
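For anyone who wants to see the difference between the two patterns outside of RapidMiner, here is a small sketch in Python. It is only an illustration: the token list and the `replace_tokens` helper are made up for this example, and Python's `re` module stands in for the regex engine of the Replace operator (the `+` and `*` quantifiers behave the same way for these cases).

```python
import re

# Hypothetical tokens, as they might look after tokenization.
tokens = ["cust", "cust.", "customer", "custody"]

# "cust" followed by at least one more character, up to the end of the token.
plus_pattern = re.compile(r"(cust).+$")
# "cust" followed by zero or more characters, up to the end of the token.
star_pattern = re.compile(r"(cust).*$")


def replace_tokens(pattern, tokens, replacement="customer"):
    """Apply the pattern to each token separately (illustrative helper)."""
    return [pattern.sub(replacement, token) for token in tokens]


plus_result = replace_tokens(plus_pattern, tokens)
star_result = replace_tokens(star_pattern, tokens)

print(plus_result)  # the bare token "cust" is left unchanged
print(star_result)  # every token starting with "cust" is rewritten
```

With `+` the bare token cust is not matched, because nothing follows it. With `*` it is matched as well. Note that both patterns also rewrite custody in this sketch, since neither one limits what may follow cust.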
Answers
So adding cust* would stem customer, customers, etc. to cust (or whatever you choose), but it would do the same with customs or custody, so be careful.
Replacing with a regex might be safer. Just make sure you use word boundaries in that case: \bcust\b would replace only the standalone cust with customer and leave all other words containing cust untouched.
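A quick way to check the word-boundary idea is a sketch like the one below. It uses Python's `re` module rather than RapidMiner itself, and the sample sentence is invented for the example. `\b` should behave the same way in the Replace operator for plain ASCII words like these, but that is an assumption worth testing on your own data.

```python
import re

# Invented sample text containing the abbreviation and several look-alikes.
text = "cust called about customs; the customer has custody. cust. again"

# Plain substring replacement: also rewrites the "cust" inside longer words.
naive = text.replace("cust", "customer")

# Word-boundary replacement: only the standalone token "cust" is rewritten.
bounded = re.sub(r"\bcust\b", "customer", text)

print(naive)
print(bounded)
```

In the bounded version, customs, customer and custody are left alone, and cust. becomes customer. with the dot kept, because the dot is not a word character and so a boundary exists between t and the dot.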
To perform your task use the following regex in the Replace operator :
Regards,
Lionel