I need to extract the lower and upper values from strings whose structure is a mix of the following formats:

Rules:
1. If both a lower and an upper bound are available, they are separated by '-'.
2. Sometimes the range is written as <=xx.y.

2a. If 'less than' appears anywhere in the text, search for the number that goes with it. Please see the examples below.

3. If an age range appears, it always comes before the value range and is separated from it by a ':'.
4. The unit is optional.

Example data

10.0 - 35.0 MCG/ML
<=6.0 MG/24 H
51-60 YEARS: 37-129
15 - 60
0.5-9.9 %
LESS THAN 30 PG/ML
LESS THAN OR EQUAL 35 UG/DL
LESS THAN OR EQUAL TO 35
NEGATIVE: LESS THAN 20
REF RANGE LESS THAN 2.0
1.3 OR LESS PMOL/L
LAR: LESS THAN 1 NG/M

From the above sample, my output would be:

10.0, 35.0, MCG/ML
0, 6.0, MG/24 H
37, 129,
15, 60,
0.5, 9.9, %

Edit:

The string is in 'refVal':
re.search(r'([0-9]*\.?[0-9]*)\s*-\s*([0-9]*\.?[0-9]*)', refVal)
re.search(r'(<=|<|<\s*=|<\sOR\s=)\s*([0-9.]+)', refVal)
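One thing worth checking in the first expression: `[0-9]*\.?[0-9]*` can match an empty string, so a bare hyphen between words still counts as a "range". A minimal sketch of a stricter number pattern follows; the names `loose` and `strict` and the input 'N/A - SEE NOTE' are made up for illustration and are not from the data.

```python
import re

# The range expression from the edit above: every part of each number is optional.
loose = re.compile(r'([0-9]*\.?[0-9]*)\s*-\s*([0-9]*\.?[0-9]*)')

# Stricter variant (an assumption about what is wanted): at least one digit,
# optionally followed by a decimal part.
strict = re.compile(r'([0-9]+(?:\.[0-9]+)?)\s*-\s*([0-9]+(?:\.[0-9]+)?)')

# Hypothetical input with a hyphen but no numbers.
loose_groups = loose.search('N/A - SEE NOTE').groups()   # matches with two empty groups
strict_match = strict.search('N/A - SEE NOTE')           # no match at all
strict_groups = strict.search('0.5-9.9 %').groups()      # real ranges still match
```

With the stricter pattern, a `None` result from `search` can be used to fall through to the 'less than' handling.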

I added some more examples to the example data above (especially for 'less than'). I want to write a regex that fetches the value if 'less than' is in the text.

The following gives me an unwanted 'None':

>>> re.search(r'([0-9.]+) OR LESS|LESS THAN ([0-9.]+)', '5.4 OR LESS').groups()
('5.4', None)
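The `None` appears because each alternative of the `|` has its own capturing group, and only the group in the alternative that actually matched gets a value. One possible way around it, sketched here with a hypothetical helper named `less_than_value`, is to ask the match object which group participated via `lastindex`:

```python
import re

# Same pattern as above: one capturing group per alternative.
pattern = re.compile(r'([0-9.]+) OR LESS|LESS THAN ([0-9.]+)')

def less_than_value(text):
    """Return the captured number as a string, or None if nothing matched.

    Hypothetical helper, not part of the original code.
    """
    m = pattern.search(text)
    if m is None:
        return None
    # Only one alternative can match, so lastindex is the index of the
    # single group that holds a value (1 or 2).
    return m.group(m.lastindex)
```

Note that this pattern still does not cover 'LESS THAN OR EQUAL 35'; that form would need its own alternative.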
    It looks like you want us to write some code for you. While many users are willing to produce code for a coder in distress, they usually only help when the poster has already tried to solve the problem on their own. A good way to demonstrate this effort is to include the code you've written so far, example input (if there is any), the expected output, and the output you actually get (console output, stack traces, compiler errors - whatever is applicable). The more detail you provide, the more answers you are likely to receive. Commented Jan 28, 2013 at 15:21

1 Answer

IMO you aren't going to get a reliable solution with a single regex. If it were me, I'd break it down into multiple conditions and regexes. Having said that, for shits and grins I did come up with this. It matches everything above, but it is quite ugly: for starters, the data is captured into different groups depending on the format.

(?(?=.*:).*:\s*([0-9.]+)\s*-\s*([0-9.]+)|(?(?=.*\<=)(.*?)<=\s*([0-9.]+)\s*(.*)|([0-9.]+)\s*-\s*([0-9.]+)\s*(.*)))
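The "multiple conditions and regexes" approach could be sketched in Python roughly as follows. This is a minimal sketch under stated assumptions, not tested against the full data set: the function name `parse_ref_range` and the pattern names are made up, and reporting 0 as the lower bound for every 'less than' form is an assumption extrapolated from the `<=6.0` line of the expected output. Note also that the lookahead conditional `(?(?=...)...)` used in the regex above is PCRE-style syntax that Python's `re` module does not support, so separate patterns are the practical route there anyway.

```python
import re

NUM = r'\d+(?:\.\d+)?'  # integer or decimal, at least one digit

# One pattern per format; each captures the number(s) and the trailing unit.
RANGE_RE = re.compile(rf'({NUM})\s*-\s*({NUM})\s*(.*)')
LE_SYMBOL_RE = re.compile(rf'<\s*=?\s*({NUM})\s*(.*)')
LESS_THAN_RE = re.compile(
    rf'LESS\s+THAN(?:\s+OR\s+EQUAL(?:\s+TO)?)?\s+({NUM})\s*(.*)')
OR_LESS_RE = re.compile(rf'({NUM})\s+OR\s+LESS\s*(.*)')

def parse_ref_range(ref_val):
    """Return (lower, upper, unit) as strings, or None if no format matches."""
    # Rule 3: anything before the last ':' (age range or label) is dropped.
    text = ref_val.rsplit(':', 1)[-1].strip().upper()

    # Rule 1: explicit lower-upper range.
    m = RANGE_RE.search(text)
    if m:
        return m.group(1), m.group(2), m.group(3).strip()

    # Rules 2 and 2a: upper bound only; lower bound assumed to be 0.
    for regex in (LE_SYMBOL_RE, LESS_THAN_RE, OR_LESS_RE):
        m = regex.search(text)
        if m:
            return '0', m.group(1), m.group(2).strip()

    return None
```

For example, `parse_ref_range('51-60 YEARS: 37-129')` yields `('37', '129', '')`. Trying the formats in a fixed order keeps each pattern simple and always returns the values in the same positions, which avoids the group-juggling of the single-regex version.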

1 Comment

Thanks. I am using part of your regular expression in my code. As you suggested, I am using several different expressions to accomplish the same thing. I am not done with this work yet, as the data is a lot messier. I edited the code; please see that.
