
I am weak at writing regular expressions, so I need some help with this one. I need a regular expression that matches a section heading such as SECTION 7.01 and then the sub-point (a) that follows it.

The word SECTION can be followed by any number, such as 6.1, 7.1 or 2.1.

Examples:

SECTION 7.01. Events of Default. If any of the following events
("Events of Default") shall occur:
          (a) any Borrower shall fail to pay any principal of any Loan when and
     as the same shall become due and payable, whether at the due date thereof
     or at a date fixed for prepayment thereof or otherwise;

I am trying to write a regular expression that gives me groups containing the following:

Group 1

SECTION 7.01. Events of Default. If any of the following events
("Events of Default") shall occur:

Group 2

(a) any Borrower shall fail to pay any principal of any Loan when and
     as the same shall become due and payable, whether at the due date thereof
     or at a date fixed for prepayment thereof or otherwise;

There can also be more points after (a), such as (b) and so on.

Please help me write this regular expression.

  • Can we see what you've tried? Commented Sep 9, 2016 at 3:02
  • ^(?!().* I was trying to include everything from the section heading up to (a), but instead it skips ("Events of Default") and includes (a). Commented Sep 9, 2016 at 3:15
  • I also wrote this -> ^\s*\(([a-z]|a[a-z]|i[ivx]{0,2}|v[ivx]{0,2}|x[ivx]{0,2})\) but it does not give me what I want either. Commented Sep 9, 2016 at 3:23
  • Hmm, unless you strip away any newlines and capture the text as a single string, I would recommend context-sensitive parsing that tracks what nesting level you are at. Commented Sep 9, 2016 at 3:32
  • That's fine, I can strip the newlines, but can't we pass the re.M flag to the regex to enable multi-line parsing? Commented Sep 9, 2016 at 3:35
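On the re.M question in the last comment, a small sketch may help. It uses a shortened, made-up two-line sample, and shows that re.M (re.MULTILINE) only changes where ^ and $ can match, while re.S (re.DOTALL) is the flag that lets `.` cross a newline:

```python
import re

# Shortened sample text (an assumption for illustration, not the full document).
text = 'SECTION 7.01. Events of Default\n("Events of Default") shall occur:'

# re.M only changes where ^ and $ can match; '.' still stops at a newline,
# so the pattern cannot reach the colon on the second line.
with_multiline = re.search(r'SECTION.*?:', text, re.M)

# re.S (re.DOTALL) lets '.' match newlines, so the match can span both lines.
with_dotall = re.search(r'SECTION.*?:', text, re.S)

print(with_multiline)        # no match object
print(with_dotall.group(0))  # the whole two-line header
```

So stripping the newlines or passing re.S are two ways to get a header that spans lines; re.M on its own is not enough for this pattern.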

3 Answers


You can use the following approach, but it relies on several assumptions. First, each section header must begin with SECTION and end with a colon (:). Second, each sub-section must begin with a letter in matching parentheses, such as (a), and end with a semicolon.

import re

def extract_groups(s):
    # Flatten the text: strip each line, drop empty lines, and join with a
    # single space so that words at line breaks do not run together.
    sanitized_string = ' '.join(line.strip() for line in s.split('\n') if line.strip())
    # A section runs from the word SECTION to the first colon.
    sections = re.findall(r'SECTION.*?:', sanitized_string)
    # A sub-section runs from a marker like (a) to the first semicolon.
    sub_sections = re.findall(r'\([a-z]\).*?;', sanitized_string)
    return sections, sub_sections

Sample Output:

>>> s = """SECTION 7.01. Events of Default. If any of the following events
("Events of Default") shall occur:
          (a) Whether at the due date thereof
     or at a date fixed for prepayment thereof or otherwise;

          (b) Test;
SECTION 7.02. Second section:"""
>>> print(extract_groups(s))
(['SECTION 7.01. Events of Default. If any of the following events ("Events of Default") shall occur:', 'SECTION 7.02. Second section:'], 
['(a) Whether at the due date thereof or at a date fixed for prepayment thereof or otherwise;', '(b) Test;'])

3 Comments

How should the sub_section regex be modified if some sub-sections end with the keyword or instead of ;?
Interesting, this complicates the requirements somewhat. What if there are or's inside sub-sections that end with ;? With the flattened string that we use here, it is difficult to tell from context whether an or is an ordinary word or an end delimiter.
I get what you are saying, but what if the strings end in one of two ways: some with ; and others with the pattern ; or. How can we modify the above expression to accommodate this? I tried these versions -> \([a-z]\).*?;|?or and \([a-z]\).*(?;|?or) but neither of them worked.
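One possible way to handle the "; or" case from the last comment is to keep the semicolon as the required terminator and make a trailing or optional. This is only a sketch under that assumption: the name SUB_SECTION and the sample text are made up for illustration, and it will still cut a sub-section short if a semicolon appears inside it.

```python
import re

# Match "(a) ... ;" and, if present, a trailing " or" after the semicolon.
# \b stops "or" from matching the start of a longer word such as "organization".
SUB_SECTION = re.compile(r'\([a-z]\).*?;(?:\s*or\b)?')

# Hypothetical flattened text with both kinds of endings.
text = '(a) fails to pay when due; or (b) breaches a covenant; (c) becomes insolvent;'

matches = SUB_SECTION.findall(text)
print(matches)
```

The group is non-capturing, so findall returns the whole matched sub-section rather than just the optional tail.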

I got this to work:

import re

s = """
SECTION 7.01. Events of Default. If any of the following events
("Events of Default") shall occur:
          (a) any Borrower shall fail to pay any principal of any Loan when and
     as the same shall become due and payable, whether at the due date thereof
     or at a date fixed for prepayment thereof or otherwise;
"""

r = r'(SECTION 7\.01\.[\s\w\.()"]*:)[\s]*(\(a\)[\s\w,]*;)'
mo = re.search(r, s)
print('Group 1: ' + mo.group(1))
print('Group 2: ' + mo.group(2))

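If you want to make it generic so that it can match any section number, you could try the pattern below. Note that it assumes a single-digit article number and a two-digit section number, as in 7.01; widen [1-9]\.[0-9]{2} to \d+\.\d+ if you also need numbers such as 6.1 or 10.12.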

r = r'(SECTION [1-9]\.[0-9]{2}\.[\s\w\.()"]*:)[\s]*(\([a-z]\)[\s\w,]*;)'

3 Comments

But what if I add one more point after (a)? Try adding a point (b); it should be matched as well, in a separate group.
If I write just [\s]*(\([a-z]\)[\s\w,]*;) then it captures all the points (a), (b), but how do I achieve the same thing with the section included?
You might want to try capturing a section and its points together with the regex, then use a string split to chop out all the points individually.
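The two-step idea in the last comment could look roughly like the sketch below. The names SECTION_BLOCK, ITEM and parse_sections are hypothetical, the sample text is shortened, and the points are pulled out with a second findall rather than a literal string split. It assumes every header ends with a colon and every point ends with a semicolon.

```python
import re

# Step 1: a header (up to its first colon) plus everything that follows,
# up to the next "SECTION n.n." or the end of the text.
SECTION_BLOCK = re.compile(
    r'(SECTION \d+\.\d+\..*?:)(.*?)(?=SECTION \d+\.\d+\.|\Z)', re.DOTALL)
# Step 2: inside one section body, each point runs from "(x)" to a semicolon.
ITEM = re.compile(r'\([a-z]\).*?;', re.DOTALL)

def parse_sections(text):
    """Return a list of (header, [points]) tuples with whitespace collapsed."""
    result = []
    for header, body in SECTION_BLOCK.findall(text):
        header = ' '.join(header.split())
        items = [' '.join(item.split()) for item in ITEM.findall(body)]
        result.append((header, items))
    return result

sample = '''SECTION 7.01. Events of Default. If any of the following events
("Events of Default") shall occur:
          (a) any Borrower shall fail to pay any principal
     when due;
          (b) Test;
SECTION 7.02. Second section:
          (a) x;
'''

for header, items in parse_sections(sample):
    print(header)
    for item in items:
        print('   ', item)
```

Keeping the header and its points in one tuple preserves which points belong to which section, which two independent findall calls over the whole text do not.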

To help you learn, in case you have to write another regex, I would recommend checking out the docs below: https://docs.python.org/3/howto/regex.html#regex-howto

This is the "easy" introduction to Python regular expressions. Essentially, you define a pattern, using the link above as a reference while you build it, and then apply that pattern to whatever text needs processing.
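As a minimal sketch of that define-then-apply workflow, the example below compiles a pattern for section numbers and runs it over a made-up string. The pattern and the sample text are assumptions for illustration only, not a complete solution to the question.

```python
import re

# Define the pattern once: the word SECTION, then two dot-separated numbers.
section_number = re.compile(r'SECTION (\d+)\.(\d+)\.')

# Hypothetical input text.
text = 'SECTION 7.01. Events of Default. SECTION 10.2. Remedies.'

# Apply the compiled pattern; each match exposes its groups.
found = [m.groups() for m in section_number.finditer(text)]
print(found)
```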
