I'm trying to match some strings using Python's re module, but I can't get it to work correctly. The strings I have to deal with look like this (examples):
XY_efgh_1234_0040_rev_2_1_NC_asdf
XY_abcd_1122Ae_1150_rev2_1_NC
XY_efgh_0124e_50_NC
asdf_1980_2234a_2
XY_abcd_5098_2270_2_1_NC
PC_bos_7659Ae_1450sp_rev_2_1_NC_GRAPH
The pattern is not constant; it varies to some degree. These are the parts that matter to me:
Ignore the start of the string, up to the first numeric value. That part is not important and should be stripped from any result.
Then there are always four digits, which may be followed by up to three alphabetical characters. I need this part extracted.
Then, after one or more underscores (there might be a minus sign among them, too), comes another set of digits that I need. It is always two to four digits long and might also be followed by up to three alphabetical characters.
Right after this section, separated by further underscores, there may be further numeric values that are important and belong to the previous value. They might contain alphabetical characters, too.
The end of the string might contain something like "NC" and maybe further characters; this is not important and should be stripped.
So, for the examples above, this is what I need to work with:
('1234', '0040_rev_2_1')
('1122Ae', '1150_rev2_1')
('0124e', '50')
('1980', '2234a_2')
('5098', '2270_2_1')
('7659Ae', '1450sp_rev_2_1')
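To make the requirement easy to check, the examples and their expected results can be written down as a small table with a helper that tests any candidate pattern against them. This is only a sketch: the names EXPECTED and failures are made up for illustration, and the helper assumes the candidate is a compiled pattern with exactly two capture groups.

```python
import re

# Inputs and expected results, copied from the examples above.
EXPECTED = {
    "XY_efgh_1234_0040_rev_2_1_NC_asdf": ("1234", "0040_rev_2_1"),
    "XY_abcd_1122Ae_1150_rev2_1_NC": ("1122Ae", "1150_rev2_1"),
    "XY_efgh_0124e_50_NC": ("0124e", "50"),
    "asdf_1980_2234a_2": ("1980", "2234a_2"),
    "XY_abcd_5098_2270_2_1_NC": ("5098", "2270_2_1"),
    "PC_bos_7659Ae_1450sp_rev_2_1_NC_GRAPH": ("7659Ae", "1450sp_rev_2_1"),
}


def failures(pattern):
    """Return (input, expected, actual) for every example the pattern gets wrong.

    `pattern` is assumed to be a compiled regex with two capture groups.
    """
    bad = []
    for text, expected in EXPECTED.items():
        match = pattern.search(text)
        actual = match.groups() if match else None
        if actual != expected:
            bad.append((text, expected, actual))
    return bad
```

An empty list from failures(some_pattern) would mean the pattern reproduces all six results.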
I've never done this kind of conditional matching in a regex, and it's driving me crazy. Here is what I've got so far, but it's not exactly what I need:
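import re

# Example input: the last string from the list above.
string = u"PC_bos_7659Ae_1450sp_rev_2_1_NC_GRAPH"

pattern = re.compile(
    r"""
    ([0-9]{4}           # four digits ...
    [A-Z]{0,3})         # ... optionally followed by up to three letters
    [_-]{1,3}           # one to three separators
    ([0-9]{2,4}         # two to four digits ...
    [0-9A-Z_-]{0,16})   # ... plus up to 16 more characters of any allowed kind
    """,
    re.IGNORECASE | re.VERBOSE
)

if pattern.search(string):
    print(pattern.findall(string))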
When I use this on the last example, this is what I get:
[(u'7659Ae', u'1450sp_rev_2_1_NC_GR')]
This is almost what I need, but I don't know how to exclude the _NC_GR at the end, and simply limiting the number of characters is not a good approach.
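One direction that might work is a negative lookahead that refuses to continue into a segment consisting of "NC". The following is a rough sketch, checked only against the six examples above. It rests on an assumption that may not hold for all of my data: that the part to discard always starts with the literal, uppercase token "NC". The name sketch is just for illustration.

```python
import re

# Assumption: the part to discard always starts with the literal token "NC".
# No IGNORECASE here, so a lowercase "nc" would not be treated as the tail.
sketch = re.compile(
    r"""
    ([0-9]{4}[A-Za-z]{0,3})      # first value: four digits, up to three letters
    [_-]+                        # separator(s)
    ([0-9]{2,4}[A-Za-z]{0,3}     # second value: two to four digits, up to three letters
    (?:[_-]+                     # further segments, each introduced by separator(s) ...
       (?!NC(?:[_-]|$))          # ... unless the segment is exactly "NC"
       [0-9A-Za-z]+
    )*)
    """,
    re.VERBOSE,
)

match = sketch.search("PC_bos_7659Ae_1450sp_rev_2_1_NC_GRAPH")
print(match.groups() if match else None)
```

Because the lookahead only rejects a whole "NC" segment, something like "NCx" would still be captured, and any other kind of trailing junk would need its own rule.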
Does anyone have a clean, robust solution for this case?