Replacing whole words with a dictionary

EL75 Member Posts: 43 Contributor II
Hi RapidMiner community,
I can't find a way to replace whole words after a "Read Excel" operator. If I use a "Replace (Dictionary)" operator linked to an Excel file, words are only partially substituted, because the text is not tokenized: sometimes part of a word is replaced and the replacement ends up glued to the rest of the word. For instance, if my dictionary has many entries for misspelled forms of the word "application" (e.g. app, apple, etc.), the result can be "applicationlicationncation". My data set contains many misspelled terms, so I'd like to use such a process to substitute the common misspellings.
Inside the "text processing" operator, after tokenization, I could do it, but as far as I can see there is no operator that handles this. The "Replace Tokens" operator could do the job, but I would have to enter one by one all the entries that I currently have in my misspelling dictionary.
Thanks for your help!

Answers

  • kayman Member Posts: 662 Unicorn
    Use regex word boundaries. For instance, \bapp\b only matches the whole word app, whether it appears at the beginning, in the middle, or at the end of a sentence.
  • EL75 Member Posts: 43 Contributor II
    edited November 2020
    Thanks, Kayman, for your response. I've tried it: I duplicated my Excel sheet (see the enclosed file), but the Replace operator treats \b as part of the word and not as a regex, so the operator simply doesn't find the word and replaces nothing. And as I have many misspelling ways for "application"

    best regards
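
For readers who want to see the word-boundary idea from this thread outside RapidMiner, here is a minimal Python sketch. It is not a RapidMiner process and does not show how the Replace (Dictionary) operator behaves; the `misspellings` dictionary and both function names are made up for illustration. It contrasts plain substring replacement, which reproduces the glued-together words described in the question, with a single regex that wraps all dictionary entries in \b ... \b.

```python
import re

# Hypothetical misspelling dictionary (wrong form -> correct form).
# In practice this would be loaded from the Excel sheet.
misspellings = {
    "app": "application",
    "apple": "application",
    "aplication": "application",
}


def replace_substrings(text, mapping):
    """Naive approach: replaces every occurrence, even inside longer words."""
    for wrong, right in mapping.items():
        text = text.replace(wrong, right)
    return text


def replace_whole_words(text, mapping):
    """Replace only whole words, using \\b word boundaries around each entry."""
    # Longest entries first, and escaped so they are matched literally.
    keys = sorted(mapping, key=len, reverse=True)
    pattern = re.compile(r"\b(" + "|".join(re.escape(k) for k in keys) + r")\b")
    # Look up the replacement for whichever entry matched.
    return pattern.sub(lambda m: mapping[m.group(1)], text)


print(replace_substrings("my application", misspellings))
print(replace_whole_words("my application", misspellings))
print(replace_whole_words("the app and the aplication", misspellings))
```

Because all entries are matched in a single pass, a replacement that was just inserted is never scanned again, so one substitution cannot trigger another. Matching here is case-sensitive; passing `re.IGNORECASE` to `re.compile` would be one way to relax that, at the cost of needing a case-insensitive dictionary lookup.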