Extract substring from filename in Python?

Question

I have a directory full of files that have date strings as part of the filenames:

file_type_1_20140722_foo.txt
file_type_two_20140723_bar.txt
filetypethree20140724qux.txt

I need to get these date strings from the filenames and save them in an array:

['20140722', '20140723', '20140724']

But they can appear at various places in the filename, so I can't just use substring notation and extract it directly. In the past, the way I've done something similar to this in Bash is like so:

date=$(echo $file | egrep -o '[[:digit:]]{8}' | head -n1)

But I can't use Bash for this because it sucks at math (I need to be able to add and subtract floating point numbers). I've tried glob.glob() and re.match(), but both return empty sets:

>>> dates = [file for file in sorted(os.listdir('.')) if re.match("[0-9]{8}", file)]
>>> print dates
[]

I know the problem is it's looking for complete file names that are eight digits long, but I have no idea how to make it look for substrings instead. Any ideas?

Use re.search instead of match, and put the digits inside parentheses to get a match group.
– Tom Zych, Commented Jul 22, 2014 at 18:46
@Batman no, because the numbers are sometimes offset by underscores, and sometimes jammed up next to text.
– Jonathan E. Landrum, Commented Jul 22, 2014 at 18:47
@TomZych that doesn't give the substring, just the files that have that substring matching the pattern (all of them).
– Jonathan E. Landrum, Commented Jul 22, 2014 at 18:49

unutbu · Accepted Answer · 2018-04-07 20:43:43Z

6

>>> import re
>>> import os
>>> [date for file in os.listdir('.') for date in re.findall(r"(\d{8})", file)]
['20140722', '20140723']

Note that if a filename has a 9-digit substring, then only the first 8 digits will be matched. If a filename contains a 16-digit substring, there will be 2 non-overlapping matches.
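
If those edge cases matter, one option is to require that the eight digits do not touch any other digit, using lookarounds. This is only a sketch, assuming a date is always exactly eight digits; the helper name find_dates and the nine-digit sample filename are made up for illustration:

```python
import re

# (?<!\d) and (?!\d) reject a match that touches another digit on either side.
EXACT_8_DIGITS = re.compile(r"(?<!\d)\d{8}(?!\d)")

def find_dates(filename):
    """Return every run of exactly eight digits found in filename."""
    return EXACT_8_DIGITS.findall(filename)

print(find_dates("file_type_1_20140722_foo.txt"))  # ['20140722']
print(find_dates("filetypethree20140724qux.txt"))  # ['20140724']
print(find_dates("log_201407221.txt"))             # [] (nine digits in a row)
```

With this pattern a 9-digit or 16-digit run yields no match at all, which may or may not be what you want for your filenames.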

edited Apr 7, 2018 at 20:43

answered Jul 22, 2014 at 18:56


2 Comments

Leniel Maccaferri Over a year ago

Just a note to newcomers to Python... make sure you import the regular expressions engine with import re. :) I couldn't upvote because I exhausted my daily vote limit. hehehe

unutbu Over a year ago

@LenielMacaferi: Thanks for the improvement.

Andrew Johnson · Accepted Answer · 2014-07-22 18:49:43Z

2

Your regular expression looks good, but you should be using re.search instead of re.match so that it will search for that expression anywhere in the string:

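import re

filename = "file_type_1_20140722_foo.txt"  # example name from the question
r = re.compile("[0-9]{8}")
m = r.search(filename)  # a match object, or None if nothing matches
if m:
    print(m.group(0))  # group(0) is the matched text, not the whole filename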
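
To see the difference between the two calls side by side, here is a small sketch using one of the filenames from the question (the second filename, which starts with the digits, is invented for the comparison):

```python
import re

pattern = re.compile(r"[0-9]{8}")
filename = "file_type_1_20140722_foo.txt"

# match() only succeeds if the pattern matches at position 0.
print(pattern.match(filename))                      # None
# search() scans the whole string and returns the first match.
print(pattern.search(filename).group(0))            # 20140722
# match() does succeed when the name starts with the digits.
print(pattern.match("20140722_foo.txt").group(0))   # 20140722
```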

answered Jul 22, 2014 at 18:49


2 Comments

Jonathan E. Landrum Over a year ago

This gives the full file name, not the substrings

Jonathan E. Landrum Over a year ago

I missed the group() part, my bad

Daniel · Accepted Answer · 2014-07-22 18:54:43Z

1

re.match matches from the beginning of the string. re.search matches the pattern anywhere. Or you can try this:

import os
import re

extract_dates = re.compile("[0-9]{8}").findall
dates = [found[0] for found in sorted(
    extract_dates(filename) for filename in os.listdir('.')) if found]
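
Note that the snippet above sorts the lists of matches, so the result is ordered by date string rather than by filename. If you would rather keep the filename order from the question, something like the following might work. It is a sketch, not a drop-in: first_dates is a hypothetical helper, and it takes a plain list of names (instead of calling os.listdir) so it can be tried anywhere:

```python
import re

DATE = re.compile(r"[0-9]{8}")

def first_dates(filenames):
    """Return the first 8-digit run of each name, skipping names without one."""
    matches = (DATE.search(name) for name in sorted(filenames))
    return [m.group(0) for m in matches if m]

names = ["filetypethree20140724qux.txt", "README", "file_type_1_20140722_foo.txt"]
print(first_dates(names))  # ['20140722', '20140724']
```

In a real directory you would pass os.listdir('.') as the argument.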

answered Jul 22, 2014 at 18:54

