How can I design a regular expression that will capture all the characters between 2 strings? Specifically, from this big string:

Studies have shown that...[^title=Fish consumption and incidence of stroke: a meta-analysis of cohort studies]... Another experiment demonstrated that... [^title=The second title]

I want to extract all the characters between [^title= and ], that is, Fish consumption and incidence of stroke: a meta-analysis of cohort studies and The second title.

I think I will have to use re.findall(), and that I can start with this: re.findall(r'\[([^]]*)\]', big_string), which gives me everything between each pair of square brackets [ ], but I'm not sure how to extend it.

1 Answer

>>> import re
>>> text = "Studies have shown that...[^title=Fish consumption and incidence of stroke: a meta-analysis of cohort studies]... Another experiment demonstrated that... [^title=The second title]"
>>> re.findall(r"\[\^title=(.*?)\]", text)
['Fish consumption and incidence of stroke: a meta-analysis of cohort studies', 'The second title']

Here is a breakdown of the regex:

\[ is an escaped [ character.

\^ is an escaped ^ character.

title= matches title=

(.*?) matches any characters (except newlines, unless re.DOTALL is set), non-greedily, and puts them in a group (for findall to extract). Being non-greedy, it stops as soon as it finds a...

\], which is an escaped ] character.
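
To see why the non-greedy ? matters here, the following sketch compares three variants on the same text. The variable names (lazy, greedy, negated) are only illustrative, and the negated character class version is an alternative that extends the pattern from the question, not something the answer above relies on.

```python
import re

text = (
    "Studies have shown that...[^title=Fish consumption and incidence of stroke: "
    "a meta-analysis of cohort studies]... Another experiment demonstrated that... "
    "[^title=The second title]"
)

# Non-greedy: each match ends at the first ] after title=
lazy = re.findall(r"\[\^title=(.*?)\]", text)

# Greedy: .* runs on to the last ] in the string, so both titles
# (and everything between them) end up in a single match
greedy = re.findall(r"\[\^title=(.*)\]", text)

# Negated character class: [^\]]* can never cross a ], so it is safe
# even though * is greedy here
negated = re.findall(r"\[\^title=([^\]]*)\]", text)

print(lazy)             # two separate titles
print(len(greedy))      # 1
print(negated == lazy)  # True
```

The negated class version should behave the same as the non-greedy one on input like this, and it also matches titles that span several lines, since [^\]] matches newline characters while . does not by default.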
