
I need to find, process and remove (one by one) any substrings that match a rather long regex:

import re

# p is a compiled regex
# s is a string
while True:
    m = p.search(s)  # search, not match: the substring may occur anywhere in s
    if m is None:
        break
    process(m.group(0))  # do something with the matched text
    s = re.sub(m.group(0), '', s)  # remove it from string s

The code above is not good for 2 reasons:

  1. It doesn't work if m.group(0) happens to contain any regex-special characters (like *, +, etc.).

  2. It feels like I'm duplicating the work: first I search the string for the regular expression, and then I have to kinda go look for it again to remove it.

What's a good way to do this?

1 Answer


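The re.sub function (and the sub method of a compiled pattern) can take a function as the replacement argument. The function is called with each match object and its return value is used as the replacement, so you can combine the processing and removal steps if you wish: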

# p is a compiled regex
# s is a string  
def process_match(m):
    # Process the match here.
    return ''

s = p.sub(process_match, s)
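
As a minimal, self-contained sketch of the same idea (the pattern, the sample string, and the `seen` list are made up for illustration and are not from the question):

```python
import re

# Hypothetical pattern and input, chosen only for illustration.
p = re.compile(r'\d+')
s = 'abc123def45gh'

seen = []  # stands in for whatever process() would do with each match

def process_match(m):
    # Called once per non-overlapping match, left to right.
    seen.append(m.group(0))
    # The return value replaces the matched text; '' removes it.
    return ''

s = p.sub(process_match, s)
print(seen)  # ['123', '45']
print(s)     # abcdefgh
```

Note that sub makes a single pass over the string. Unlike the loop in the question, it will not pick up text that only becomes a match after an earlier removal joins its neighbours; if that matters, the call may need to be repeated until nothing changes.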

2 Comments

Ah, and I figured out what to do if I want to replace a string that may contain regex special characters: re.escape(s) takes care of that.
@Imray : p is the compiled regex.
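
For the literal-removal case mentioned in the first comment, here is a small sketch of what re.escape changes. The sample strings are invented for illustration:

```python
import re

# Hypothetical sample data, for illustration only.
s = 'a+b and a+b again'
literal = 'a+b'  # contains '+', which is special in a regex

# Unescaped, 'a+b' means "one or more 'a' followed by 'b'", so it does
# not match the literal text 'a+b' anywhere in s.
unescaped = re.sub(literal, '', s)

# re.escape makes every special character match itself.
escaped = re.sub(re.escape(literal), '', s)

print(repr(unescaped))  # 'a+b and a+b again'
print(repr(escaped))    # ' and  again'
```

If the text to remove is always a plain literal, s.replace(literal, '') gives the same result without involving the regex engine.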
