Writing a regular expression in Python

Question

I am weak at writing regular expressions, so I need some help with this one. I need a regular expression that matches a section header such as SECTION 7.01 and then the point (a) that follows it.

The word SECTION can be followed by any number, such as 6.1, 7.1, or 2.1.

Examples:

SECTION 7.01. Events of Default. If any of the following events
("Events of Default") shall occur:
          (a) any Borrower shall fail to pay any principal of any Loan when and
     as the same shall become due and payable, whether at the due date thereof
     or at a date fixed for prepayment thereof or otherwise;

I am trying to write a regular expression that gives me groups containing the following:

Group 1

SECTION 7.01. Events of Default. If any of the following events
("Events of Default") shall occur:

Group 2

(a) any Borrower shall fail to pay any principal of any Loan when and
     as the same shall become due and payable, whether at the due date thereof
     or at a date fixed for prepayment thereof or otherwise;

There can also be more points after (a), such as (b) and so on.

Please help me write this regular expression.

^(?!().* I was trying to include everything from SECTION up to (a), but instead it skips ("Events of Default") and includes (a).
– nikunj2512, Commented Sep 9, 2016 at 3:15
I also wrote this: ^\s*\(([a-z]|a[a-z]|i[ivx]{0,2}|v[ivx]{0,2}|x[ivx]{0,2})\) but it does not give what I want either.
– nikunj2512, Commented Sep 9, 2016 at 3:23
Hmm, unless you strip away any newlines and capture the text as a single string, I would recommend context-sensitive parsing that tracks which nesting level you are at.
– ospahiu, Commented Sep 9, 2016 at 3:32
That's fine, I can strip the newlines, but can't we pass the re.M flag to the regex to enable multi-line parsing?
– nikunj2512, Commented Sep 9, 2016 at 3:35
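
On the re.M question in the last comment: in Python's re module, re.M (re.MULTILINE) only changes where ^ and $ can match, while re.S (re.DOTALL) is the flag that lets . cross newlines. A minimal sketch, using a made-up two-line sample string shortened from the question's text:

```python
import re

# Hypothetical two-line sample, shortened from the question's text.
text = 'SECTION 7.01. Events of Default:\n(a) first point;'

# '.' does not match a newline by default, so this cannot reach the ';'.
no_flag = re.search(r'SECTION.*;', text)

# re.M does not change '.', so there is still no match.
with_m = re.search(r'SECTION.*;', text, re.M)

# re.S lets '.' match newlines, so the match spans both lines.
with_s = re.search(r'SECTION.*;', text, re.S)

# What re.M does change: '^' also matches at the start of each line.
markers = re.findall(r'^\([a-z]\)', text, re.M)

print(no_flag, with_m)
print(with_s.group(0) == text)
print(markers)
```

So stripping the newlines and using re.S are two routes to the same goal; re.M is mainly useful for anchoring a point marker at the start of a line.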

ospahiu · Accepted Answer · 2016-09-09 03:55:41Z

3

You can use the following approach; however, it makes several assumptions. Each section header must begin with SECTION and end with a colon (:). Each sub-section must begin with a parenthesized lowercase letter, such as (a), and end with a semicolon.

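import re


def extract_groups(s):
    # Flatten the text: strip each line and join the lines with no separator.
    sanitized_string = ''.join(line.strip() for line in s.split('\n'))
    # Section headers: from SECTION up to the first colon.
    sections = re.findall(r'SECTION.*?:', sanitized_string)
    # Sub-sections: from a marker such as (a) up to the first semicolon.
    sub_sections = re.findall(r'\([a-z]\).*?;', sanitized_string)
    return sections, sub_sections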

Sample Output:

>>> s = """SECTION 7.01. Events of Default. If any of the following events
("Events of Default") shall occur:
          (a) Whether at the due date thereof
     or at a date fixed for prepayment thereof or otherwise;

          (b) Test;
SECTION 7.02. Second section:"""
>>> print(extract_groups(s))
(['SECTION 7.01. Events of Default. If any of the following events("Events of Default") shall occur:', 'SECTION 7.02. Second section:'], 
['(a) Whether at the due date thereofor at a date fixed for prepayment thereof or otherwise;', '(b) Test;'])

edited Sep 9, 2016 at 3:55

answered Sep 9, 2016 at 3:47


3 Comments

nikunj2512 Over a year ago

How would I modify the sub_sections regex if some sub-sections end with the keyword or instead of ;?

ospahiu Over a year ago

Interesting, this complicates the requirements somewhat. What if there are or's inside sub-sections that end with ;? With the flattened string used here, it is difficult to derive the context of an or (is it a simple word or an end delimiter?).

nikunj2512 Over a year ago

I get what you are saying, but what if sub-sections end in one of two ways: with ; or with the pattern ; or. How can the above expression be modified to accommodate this? I tried these versions: \([a-z]\).*?;|?or and \([a-z]\).*(?;|?or) but none of them worked.
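
One way to accept both terminators from the last comment is to match the semicolon and then optionally consume a trailing or. This is only a sketch: it assumes every sub-section still ends with a semicolon (optionally followed by or) and has no semicolon inside its body. The name extract_sub_sections and the sample text are made up for illustration.

```python
import re

# A marker such as (a), lazily up to the first ';', then an optional trailing "or".
SUB_SECTION_RE = re.compile(r'\([a-z]\).*?;(?:\s*or\b)?')


def extract_sub_sections(text):
    # Collapse every run of whitespace (including newlines) to a single space.
    flattened = ' '.join(text.split())
    return SUB_SECTION_RE.findall(flattened)


sample = """(a) any Borrower shall fail to pay principal
     or interest when due; or
(b) any representation shall prove incorrect;"""

for sub_section in extract_sub_sections(sample):
    print(sub_section)
```

Because only the semicolon ends the lazy match, an or in the middle of a sub-section is left alone; the optional group picks up an or only when it directly follows the semicolon.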

Hanshan · Accepted Answer · 2016-09-09 03:28:40Z

0

I got this to work:

import re

s = """
SECTION 7.01. Events of Default. If any of the following events
("Events of Default") shall occur:
          (a) any Borrower shall fail to pay any principal of any Loan when and
     as the same shall become due and payable, whether at the due date thereof
     or at a date fixed for prepayment thereof or otherwise;
"""

r = r'(SECTION 7\.01\.[\s\w\.()"]*:)[\s]*(\(a\)[\s\w,]*;)'
mo = re.search(r, s)
print('Group 1: ' + mo.group(1))
print('Group 2: ' + mo.group(2))

If you want to make it generic, so that it matches any section number, you could try:

r = r'(SECTION [1-9]\.[0-9]{2}\.[\s\w\.()"]*:)[\s]*(\([a-z]\)[\s\w,]*;)'
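
Note that [1-9]\.[0-9]{2} accepts only numbers shaped like 7.01, so a header such as SECTION 6.1. or SECTION 10.01. would not match. A looser variant might look like the sketch below. It assumes the header contains no colon before its final one and the point contains no semicolon before its final one; match_section is a hypothetical helper name.

```python
import re

# \d+\.\d+ accepts 6.1, 7.01, 10.2, and so on.
# [^:]* and [^;]* accept any character up to the closing delimiter.
SECTION_AND_POINT_RE = re.compile(
    r'(SECTION \d+\.\d+\.[^:]*:)\s*(\([a-z]\)[^;]*;)'
)


def match_section(text):
    mo = SECTION_AND_POINT_RE.search(text)
    if mo is None:
        return None
    # Group 1 is the section header, group 2 is the first point.
    return mo.group(1), mo.group(2)


print(match_section('SECTION 10.1. Remedies:\n   (a) the "Lender" may act;'))
```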

answered Sep 9, 2016 at 3:28


3 Comments

nikunj2512 Over a year ago

But what if I add one more point after (a)? Try adding a point (b); it should also be matched, in a separate group.

nikunj2512 Over a year ago

If I write just [\s]*(\([a-z]\)[\s\w,]*;) then it captures all the points (a), (b), and so on, but how can I achieve the same thing with the section included?

Hanshan Over a year ago

You might want to try capturing a section and its points together with the regex, then use a string split to separate the individual points.
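
The capture-then-split idea from this comment could be sketched roughly as follows. The names parse_sections and normalize are hypothetical, and the sketch assumes that headers look like SECTION 7.01. and that a point marker such as (a) always follows whitespace. A cross-reference like "see clause (a) above" inside a point would be split incorrectly.

```python
import re

# A section block runs from one SECTION header to the next one (or to the end).
SECTION_RE = re.compile(r'SECTION \d+\.\d+\..*?(?=SECTION \d+\.\d+\.|\Z)', re.S)
# Split on whitespace that is directly followed by a marker such as "(a) ".
POINT_SPLIT_RE = re.compile(r'\s+(?=\([a-z]\)\s)')


def normalize(fragment):
    # Collapse newlines and indentation to single spaces.
    return ' '.join(fragment.split())


def parse_sections(text):
    parsed = []
    for block in SECTION_RE.findall(text):
        # The first piece is the header; the remaining pieces are the points.
        header, *points = POINT_SPLIT_RE.split(block.strip())
        parsed.append((normalize(header), [normalize(p) for p in points]))
    return parsed


sample = """SECTION 7.01. Events of Default. If any of the following events
("Events of Default") shall occur:
          (a) Whether at the due date thereof
     or at a date fixed for prepayment thereof or otherwise;

          (b) Test;
SECTION 7.02. Second section:"""

for header, points in parse_sections(sample):
    print(header)
    for point in points:
        print('   ', point)
```

Each section comes back paired with its own points, which the two separate findall calls in the first answer do not preserve.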

Shawn · Accepted Answer · 2016-09-09 04:17:05Z

0

To help you learn, in case you have to write another set of regexes, I recommend checking out the docs below: https://docs.python.org/3/howto/regex.html#regex-howto

This is the "easy" introduction to Python regex. Essentially, you define a pattern, using the above link as a reference to build it up as you need it. Then you apply the pattern to whatever text needs processing.
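
As a small illustration of that define-then-apply workflow, the sketch below compiles a pattern once and then applies it to a string. The pattern and the sample text are assumptions chosen to resemble the question's section headers.

```python
import re

# Define the pattern once; the two groups capture the digits around the dot.
section_header = re.compile(r'SECTION (\d+)\.(\d+)\.')

# Apply the compiled pattern to the text that needs processing.
match = section_header.search('SECTION 7.01. Events of Default.')
if match:
    article, number = match.groups()
    print(article, number)
```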

answered Sep 9, 2016 at 4:17

