How to exclude words from the matches of a regex in a replace with dictionary operator

EL75 Member Posts: 43 Contributor II
edited December 2020 in Help
Hi everybody,
After many hours on the internet, I have to admit I can't find a way to exclude words from the matches of a regex.
I use regular expressions in a spreadsheet connected to a "replace with dictionary" operator.
Some of the regexes capture too many words.
One example:
- the REGEX: (?i)\b(([l|d]['])*ap+(l*i*e*o*|cation)*s*)\b
- it returns all the words I need (app, applie, application, l'app, d'application, etc.)
- but it also returns "apple", "appel", "l'appel", etc.

My attempts with a "look behind" expression all failed.
It works when I split the problem into two regexes (one for "app" and its variants, another for "application" and its variants):
(?i)\b(([l|d][' ]*)*ap+s*)\b
(?i)\b(([l|d][' ])*ap+l+(i|ie|ic+ation|oc+ation)*s*)\b
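A quick way to check the two split patterns against the words mentioned in the post is sketched below. It uses Python's `re` module purely for illustration; the regex engine behind the operator may differ in details, and the helper name `is_captured` and the word lists are assumptions built from the examples given above.

```python
import re

# The two split patterns, copied verbatim from the post.
APP_PATTERN = re.compile(r"(?i)\b(([l|d][' ]*)*ap+s*)\b")
APPLICATION_PATTERN = re.compile(
    r"(?i)\b(([l|d][' ])*ap+l+(i|ie|ic+ation|oc+ation)*s*)\b"
)

def is_captured(word):
    """Return True if either pattern matches the whole word (hypothetical helper)."""
    return bool(
        APP_PATTERN.fullmatch(word) or APPLICATION_PATTERN.fullmatch(word)
    )

# Word lists taken from the examples in the post.
wanted = ["app", "applie", "application", "l'app", "d'application"]
unwanted = ["apple", "appel", "l'appel"]

for word in wanted + unwanted:
    print(f"{word!r}: {is_captured(word)}")
```

Note that `[l|d]` is a character class, so the `|` inside it is matched literally rather than acting as "or"; `[ld]` is probably what was intended.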

But the goal was to find a smarter way with a single regex :)

See the example set and regexes in this Google Sheet: https://docs.google.com/spreadsheets/d/14hyPlwrPLxDv-F4yAVOXH8wlN-RMtumnZYOZh1gOOPs/edit?usp=sharing

Thanks for your help!

Best Answer

  • kayman Member Posts: 662 Unicorn
    Solution Accepted
    Yeah, you could also capture most of these with some adaptations, like this:

    (?i)\b([ld]')?ap+([lie]+)?(cation)?s?\b

    but you will again get unwanted matches such as "apple".

    Anyway, it is always better to have a few simple regex replacements in your dictionary than one overly complex one: the computational requirements of the complex one are much higher, and it would also slow down your process.

    Here too the golden rule applies: just keep it simple.
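As a rough illustration of the "few simple replacements" idea, the sketch below applies a small list of pattern/replacement pairs one after another. It is a Python stand-in, not the operator itself: the function name `replace_with_dictionary`, the replacement value "application", and the sample sentence are all assumptions, and the two patterns are tidied variants of the split regexes from the question (`[ld]'` instead of `[l|d][' ]`, non-capturing groups).

```python
import re

# Hypothetical dictionary: each entry is (simple pattern, replacement).
# "application" as the replacement value is an illustrative choice only.
DICTIONARY = [
    (r"(?i)\b(?:[ld]')?ap+l+(?:i|ie|ic+ation|oc+ation)?s?\b", "application"),
    (r"(?i)\b(?:[ld]')?ap+s?\b", "application"),
]

def replace_with_dictionary(text):
    """Apply every dictionary entry in order (hypothetical helper)."""
    for pattern, replacement in DICTIONARY:
        text = re.sub(pattern, replacement, text)
    return text

sample = "L'appli et l'application marchent, pas l'appel ni apple."
print(replace_with_dictionary(sample))
# prints: application et application marchent, pas l'appel ni apple.
```

Each entry stays short enough to read at a glance, and "appel" and "apple" are left alone because neither pattern allows an "e" directly after the "l" or the "p".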

Answers

  • kayman Member Posts: 662 Unicorn
    edited December 2020
    Try this:

    (?i)\b([ld]')?(ap+([lie]+cations?)?)\b
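To see which spellings this suggestion covers, it can be run against the variants mentioned in the thread. The sketch below uses Python's `re` for illustration only (the operator's regex engine may behave slightly differently), and the word list and the names `candidates` and `missed` are assumptions.

```python
import re

# Pattern copied verbatim from the suggestion above.
PATTERN = re.compile(r"(?i)\b([ld]')?(ap+([lie]+cations?)?)\b")

# Spellings mentioned in the thread (wanted variants, then unwanted words).
candidates = [
    "ap", "app", "apps", "appli", "applie", "aplie",
    "application", "applications", "l'app", "d'application",
    "apple", "appel",
]

# Keep the words that the pattern does not match in full.
missed = [word for word in candidates if not PATTERN.fullmatch(word)]
print(missed)
# prints: ['apps', 'appli', 'applie', 'aplie', 'apple', 'appel']
```

The pattern rejects "apple" and "appel" as intended, but it also drops the short variants such as "apps" and "appli", because the optional group only matches when "cation" follows.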
  • EL75 Member Posts: 43 Contributor II
    Hi kayman,
    Thank you for your help, you're always on board!
    Unfortunately, the solution doesn't cover all the cases I put in the Excel file.
    I need to capture "ap", "aplie", "applie", "appli", "apps", etc.
    People write this word (in French) with so many misspellings...
    Splitting it into two regexes still looks like the better option for now.
    Best,
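For the original question of excluding specific words within one regex, one possible approach is a negative lookahead placed in front of a broad pattern. The sketch below is an assumption, not something proposed in the thread: it takes the broad pattern from the accepted answer and adds `(?!ap+(?:le|el)s?\b)` to reject "apple" and "appel". It is written with Python's `re`; lookahead syntax is widely supported, but whether the operator's engine accepts it has not been verified here.

```python
import re

# Broad pattern from the accepted answer, with a negative lookahead added
# after the optional l'/d' prefix. The exclusion list (apple, appel) comes
# from the question; the lookahead itself is an illustrative assumption.
PATTERN = re.compile(
    r"(?i)\b(?:[ld]')?(?!ap+(?:le|el)s?\b)ap+(?:[lie]+)?(?:cation)?s?\b"
)

wanted = [
    "ap", "app", "apps", "appli", "applie", "aplie",
    "application", "l'app", "d'application",
]
unwanted = ["apple", "appel", "l'appel"]

for word in wanted + unwanted:
    print(f"{word!r}: {bool(PATTERN.fullmatch(word))}")
```

The lookahead only blocks the words listed inside it, so any other unwanted word that fits the broad pattern (for example "appelle") would still be captured until it is added to the exclusion. That maintenance cost is one reason the two-regex split may remain the simpler option.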