How to skip text after the last match and before a keyword in regular expressions?

February 2019

I would like to match every user comment up to KEYWORD. I would also like to skip the variable, unimportant text that follows the last comment and precedes the keyword.

import re

string = '''
COMMENTS:  
first comment /user_x  
second comment
two lines /user_y
Here is some unimportant text.  
KEYWORD:
Don't match comments or anything else after first keyword like this /user_x  
KEYWORD: <- again
Also ignore same keyword which could appear several times.
'''

My result doesn't skip the unimportant text.

pattern = re.compile(r'(?<=COMMENTS:)(.+?/(user_x|user_y))+?(?:.+?)(?=KEYWORD:)', flags=re.DOTALL)
match = re.search(pattern, string).group(0)

print(match)

I would like to have the following output:

first comment /user_x  
second comment
two lines /user_y

What am I doing wrong? Thanks a lot

1 answer

You may use

pattern = re.compile(r'COMMENTS:\s*((?:(?:(?!KEYWORD:).)+?/(?:user_x|user_y))+).+?KEYWORD:', flags=re.DOTALL)
match = re.search(pattern, string)
if match:
    print(match.group(1))

The output does not contain the irrelevant line any longer:

first comment /user_x  
second comment
two lines /user_y
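
For reference, here is a self-contained sketch of the same approach that can be run as is. The original attempt printed group(0), which is the whole match and therefore includes the text consumed by the trailing (?:.+?); the version below reads capturing group 1 instead. The helper name extract_comments and the constant names are only illustrative, and the sample text is the one from the question with trailing spaces removed.

```python
import re

# Sample input from the question (trailing spaces removed for clarity).
TEXT = '''
COMMENTS:
first comment /user_x
second comment
two lines /user_y
Here is some unimportant text.
KEYWORD:
Don't match comments or anything else after first keyword like this /user_x
KEYWORD: <- again
Also ignore same keyword which could appear several times.
'''

# Same pattern as above: group 1 holds the comments, the rest is matched but discarded.
PATTERN = re.compile(
    r'COMMENTS:\s*((?:(?:(?!KEYWORD:).)+?/(?:user_x|user_y))+).+?KEYWORD:',
    flags=re.DOTALL,
)


def extract_comments(text):
    """Return the comment block before the first KEYWORD:, or None if nothing matches."""
    match = PATTERN.search(text)
    return match.group(1) if match else None


print(extract_comments(TEXT))
```

Returning None when there is no match avoids the AttributeError that calling .group() directly on a failed re.search would raise.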

See the Python demo

Details

  • COMMENTS: - a literal substring
  • \s* - zero or more whitespace characters
  • ((?:(?:(?!KEYWORD:).)+?/(?:user_x|user_y))+) - Capturing group 1 (match.group(1) will hold this value if there is a match): one or more repetitions of
    • (?:(?!KEYWORD:).)+? - any char, one or more but as few as possible, that does not start the KEYWORD: char sequence
    • / - a / char
    • (?:user_x|user_y) - user_x or user_y
  • .+?KEYWORD: - a KEYWORD: after any 1 or more chars, as few as possible.
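
The breakdown above can also be kept next to the pattern itself by compiling it with re.VERBOSE, which ignores whitespace in the pattern and allows # comments. This is just a sketch of the same regex in a more readable layout; the names VERBOSE_PATTERN and sample are illustrative, and the sample string is a shortened stand-in for the input in the question.

```python
import re

VERBOSE_PATTERN = re.compile(
    r'''
    COMMENTS:                 # literal section header
    \s*                       # optional whitespace after the header
    (                         # group 1: the block of user comments
      (?:                     #   one comment, repeated one or more times
        (?:(?!KEYWORD:).)+?   #     shortest run of chars that never crosses KEYWORD:
        /(?:user_x|user_y)    #     the trailing user tag
      )+
    )
    .+?                       # filler text to skip, at least one char
    KEYWORD:                  # the first keyword ends the match
    ''',
    flags=re.DOTALL | re.VERBOSE,
)

# Shortened stand-in for the input in the question.
sample = (
    "COMMENTS:\n"
    "first comment /user_x\n"
    "second comment\n"
    "two lines /user_y\n"
    "unimportant\n"
    "KEYWORD:\n"
    "later /user_x\n"
    "KEYWORD:"
)

match = VERBOSE_PATTERN.search(sample)
if match:
    print(match.group(1))
```

Because the inner (?!KEYWORD:) lookahead is checked before every character, a comment can never extend past the first KEYWORD:, so the tagged line after it is not captured.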

See the regex demo.