How to exclude words from the matches of a regex in a "replace with dictionary" operator
hi everybody,
After many hours searching the internet, I have to admit I can't find any way to exclude words from the matches of a regex.
I use regular expressions in a spreadsheet connected to a "replace with dictionary" operator.
Some of the regexes match too many words.
One example:
- the regex: (?i)\b(([l|d]['])*ap+(l*i*e*o*|cation)*s*)\b
- matches all the words I need (app, applie, application, l'app, d'application, etc.)
- but also "apple", "appel", "l'appel", etc.
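A quick way to see the over-matching is to run the pattern against a handful of words. This is a minimal sketch using Python's `re` module, assuming the operator's regex engine treats these constructs the same way (Java's `java.util.regex` handles `(?i)`, `\b` and character classes like Python does); the word list is illustrative. Note also that `|` is a literal character inside `[...]`, so `[l|d]` accepts `l`, `d` or `|`; `[ld]` is enough.

```python
import re

# The pattern from the question, copied verbatim (including [l|d]).
PATTERN = re.compile(r"(?i)\b(([l|d]['])*ap+(l*i*e*o*|cation)*s*)\b")

# Illustrative words: the first six are wanted, the last three are not.
WORDS = ["app", "apps", "applie", "application", "l'app", "d'application",
         "apple", "appel", "l'appel"]

def matching_words(pattern, words):
    """Return the words that the pattern matches as a whole token."""
    return [w for w in words if pattern.fullmatch(w)]

print(matching_words(PATTERN, WORDS))
```

All nine words match, including the three unwanted ones, because the repeated group `(l*i*e*o*|cation)*` can assemble "le" (for "apple") or "e" followed by "l" (for "appel") out of its optional letters.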
My various attempts with a lookbehind expression failed...
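One common single-pattern alternative to a lookbehind is a negative lookahead placed right after the opening `\b`: before the main pattern runs, it checks whether the upcoming word is one of the listed exceptions and rejects it if so. A minimal sketch in Python, assuming the exception list is just the unwanted words named above (`apple`, `appel`, with an optional `l'`/`d'` prefix and plural `s`); Java's regex engine also supports `(?!...)`. The main pattern is the one from the question with `[l|d]` narrowed to `[ld]`.

```python
import re

# (?!...) right after \b rejects a token if it is one of the listed
# exceptions; the list here is illustrative, taken from the question.
PATTERN = re.compile(
    r"(?i)\b(?!(?:[ld]')?(?:apple|appel)s?\b)"
    # Main pattern from the question; group 1 is the whole matched word.
    r"((?:[ld]')*ap+(?:l*i*e*o*|cation)*s*)\b"
)

TEXT = "l'app, l'application, applie, apple, l'appel, appel"
print(PATTERN.findall(TEXT))
```

The trade-off is that the exception list has to be maintained by hand as new unwanted words turn up.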
It works if I split the problem into two regexes (one for "app", another for "application", with variants of both):
(?i)\b(([l|d][' ]*)*ap+s*)\b
(?i)\b(([l|d][' ])*ap+l+(i|ie|ic+ation|oc+ation)*s*)\b
but the goal was to find a smarter way with a single regex.
See the example set and regexes in this Google Sheet: https://docs.google.com/spreadsheets/d/14hyPlwrPLxDv-F4yAVOXH8wlN-RMtumnZYOZh1gOOPs/edit?usp=sharing
Thanks for your help!
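Since the two split regexes already behave as wanted, one way to get a single pattern is simply to join them with `|` inside one group, so a token is accepted when either of them would accept it. A minimal Python sketch, keeping both patterns from the question with only `[l|d]` narrowed to `[ld]`; the word lists are illustrative.

```python
import re

# The two split patterns from the question ([l|d] narrowed to [ld]).
SHORT_FORM = r"(?:[ld][' ]*)*ap+s*"                              # ap, app, apps, l'app
LONG_FORM = r"(?:[ld][' ])*ap+l+(?:i|ie|ic+ation|oc+ation)*s*"   # appli, applie, application
# Joined with | inside one non-capturing group, between word boundaries.
COMBINED = re.compile(rf"(?i)\b(?:{SHORT_FORM}|{LONG_FORM})\b")

WANTED = ["ap", "app", "apps", "l'app", "appli", "applie", "aplie",
          "application", "d'application"]
UNWANTED = ["apple", "appel", "l'appel"]

for word in WANTED + UNWANTED:
    print(word, bool(COMBINED.fullmatch(word)))
```

"apple" and "appel" fail because neither branch allows an `e` right after `ap+l` or right after `ap+`.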
Best Answer
kayman
Yeah, you could also capture most of these with some adaptations, like this:
(?i)\b([ld]')?ap+([lie]+)?(cation)?s?\b
but you'll again also get unwanted ones such as "apple", etc.
Anyway, it is always better to have a few simple regex replacements in your dictionary than one overly complex one: the complex one is much more expensive to compute and would also slow down your process.
Here too the golden rule applies: just keep it simple!
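The "few simple replacements" approach can be sketched as an ordered list of (pattern, replacement) rows applied one after another with `re.sub`. This only loosely mimics a dictionary-based replace operator; the rows reuse the two split patterns from the question without their `l'`/`d'` prefix part (so the elision stays in the text), and the replacement value "application" is a hypothetical normalised form.

```python
import re

# Hypothetical dictionary rows: (pattern, replacement), applied in order.
# The target word "application" is an assumed normalisation.
DICTIONARY = [
    (r"(?i)\bap+s*\b", "application"),                                # ap, app, apps
    (r"(?i)\bap+l+(?:i|ie|ic+ation|oc+ation)*s*\b", "application"),   # appli, applie, ...
]

def apply_dictionary(text, dictionary):
    """Apply each (pattern, replacement) row with re.sub, one after another."""
    for pattern, replacement in dictionary:
        text = re.sub(pattern, replacement, text)
    return text

TEXT = "l'app, apps, d'applie, l'appel, apple"
print(apply_dictionary(TEXT, DICTIONARY))
```

Each row stays short and easy to test on its own, and "l'appel" and "apple" pass through untouched.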
Answers
(?i)\b([ld]')?(ap+([lie]+cations?)?)\b
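Testing the pattern above word by word shows why it does not cover every case: whenever a run of `l`/`i`/`e` follows `ap+`, it must be followed by `cation`, so "apple" and "appel" are excluded, but so are short forms such as "appli" or "apps". A minimal Python sketch; the word list is illustrative.

```python
import re

# The pattern suggested above, copied from the thread.
PATTERN = re.compile(r"(?i)\b([ld]')?(ap+([lie]+cations?)?)\b")

WORDS = ["ap", "app", "application", "l'application", "apps", "appli",
         "applie", "aplie", "apple", "appel"]

matched = [w for w in WORDS if PATTERN.fullmatch(w)]
missed = [w for w in WORDS if not PATTERN.fullmatch(w)]
print("matched:", matched)
print("not matched:", missed)
```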
Thank you for your help, you're always there to help!
Unfortunately, the solution doesn't cover all the cases I put in the Excel file.
I need to capture "ap", "aplie", "applie", "appli", "apps", etc.
People write this word (in French) with so many misspellings...
So far, splitting it into two regexes still looks better.
Best,