2

I need to extract a date in the format dd Month yyyy (for example, 20 August 2013). I tried the following regex:

\d{2} (January|February|March|April|May|June|July|August|September|October|November|December) \d{4}

It works in regex testers (I checked several with the text "Monday, 19 August 2013"), but it seems that Python doesn't understand it. The output I get is:

>>> 
['August']
>>> 

Can somebody please explain why this is happening?

Thank you!

3
  • If @smerny answered your question, please hit the check-mark next to his answer to accept it. Commented Aug 19, 2013 at 21:56
  • I am interested to know the method in re you are using. Commented Aug 20, 2013 at 3:17
  • import re; date = "20 August 2013"; print(re.match(r"\d{2} (January|February|March|April|May|June|July|August|September|October|November|December) \d{4}", date).group()) This seems to be working fine for me. Commented Aug 20, 2013 at 3:19

2 Answers

3

Did you use re.findall? By default, if there's at least one capture group in the pattern, re.findall will return only the captured parts of the expression.
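A minimal sketch that reproduces the output from the question, assuming the original code called re.findall on the test string (the question does not show the actual call):

```python
import re

# Pattern from the question; the month alternation is its only capture group.
pattern = (r"\d{2} (January|February|March|April|May|June|July|August"
           r"|September|October|November|December) \d{4}")
text = "Monday, 19 August 2013"

# With exactly one capture group, findall returns that group's text
# for each match rather than the whole match.
result = re.findall(pattern, text)
print(result)  # ['August']
```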

You can avoid this by removing every capture group, causing re.findall to return the entire match:

\d{2} (?:January|February|...|December) \d{4}

or by making a single big capture group:

(\d{2} (?:January|February|...|December) \d{4})

or, possibly more conveniently, by making every component a capture group:

(\d{2}) (January|February|...|December) (\d{4})

This latter form is more useful if you will need to process the individual day/month/year components.
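As a rough illustration of the three variants above, here is how re.findall behaves with each one. The MONTHS helper and the variable names are my own, introduced only to keep the patterns short:

```python
import re

# Hypothetical helper: the month alternation from the question.
MONTHS = ("January|February|March|April|May|June|July|August"
          "|September|October|November|December")
text = "Monday, 19 August 2013"

# No capture groups: findall returns each whole match.
no_groups = re.findall(r"\d{2} (?:" + MONTHS + r") \d{4}", text)

# One capture group around everything: same result.
one_group = re.findall(r"(\d{2} (?:" + MONTHS + r") \d{4})", text)

# Several capture groups: findall returns one tuple per match.
components = re.findall(r"(\d{2}) (" + MONTHS + r") (\d{4})", text)

print(no_groups)   # ['19 August 2013']
print(one_group)   # ['19 August 2013']
print(components)  # [('19', 'August', '2013')]
```

The tuple form lets you unpack day, month and year directly, for example `day, month, year = components[0]`.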


2

It looks like you are only getting the data from the capture group, try this:

(\d{2} (?:January|February|March|April|May|June|July|August|September|October|November|December) \d{4})

I put a capture group around the entire thing and made the month a non-capture group. Now whatever was giving you "August" should give you the entire thing.


I just looked at some Python regex documentation, which has this example:

>>> p = re.compile('(a(b)c)d')
>>> m = p.match('abcd')
>>> m.group(0)
'abcd'
>>> m.group(1)
'abc'
>>> m.group(2)
'b'

Seeing this, I'm guessing (since you didn't show how you were actually using this regex) that you were calling group(1), which will now work with the regex I supplied above.

It also looks like you could have used group(0) to get the whole thing (if I am correct in the assumption that this is what you were doing). This would work in your original regex as well as my modified version.
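A small sketch of the group(0) versus group(1) difference on the question's test string, assuming a match object is used. The variable names are mine, and re.search is used instead of re.match because the text starts with "Monday, ":

```python
import re

# Hypothetical helper: the month alternation from the question.
months = ("January|February|March|April|May|June|July|August"
          "|September|October|November|December")
original = r"\d{2} (" + months + r") \d{4}"       # regex from the question
modified = r"(\d{2} (?:" + months + r") \d{4})"   # regex from this answer
text = "Monday, 19 August 2013"

# re.search scans the whole string; re.match only tries position 0,
# where "Monday" does not fit the pattern.
m_orig = re.search(original, text)
m_mod = re.search(modified, text)

print(m_orig.group(0))  # 19 August 2013
print(m_orig.group(1))  # August
print(m_mod.group(0))   # 19 August 2013
print(m_mod.group(1))   # 19 August 2013
```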

1 Comment

Thank you, that is working, but can you explain a bit what my mistake was?
