0

I would like to parse a string in Python which is of the format

"JXE 2000 This is a bug to fix blah " or of the format

"JXE-2000: This is a bug to fix blah " and check if the string has JXE and a number.

In the above example I need to check whether the string has JXE and 2000. I am new to Python.

I tried the following:

textpattern="JXE-5000: This is bug "
text=re.compile("^([A-Z][0-9]+)*$")

text=re.search("JXE (.*)", textpattern)

print (text.groups())

I seem to be getting only "5000 This is a bug".

3
  • What have you tried? Commented Jan 31, 2013 at 4:40
  • textpattern="EIX-5000"; text=re.compile("^([A-Z][0-9]+)*$"); text=re.search("EIX (.*)", textpattern) Commented Jan 31, 2013 at 4:42
  • You know to not use semicolons (;) in python, right? Commented Jan 31, 2013 at 4:54

3 Answers

1

As another alternative, you can allow any character between JXE and 2000:

>>> import re
>>> text=re.compile("(JXE).*(2000)(.*)")
>>> textpattern="JXE-2000: This is bug "
>>> text.search(textpattern).group(1,2) # or .group(1,2,3) if you want the bug as well
('JXE', '2000')

Your text=re.compile("^([A-Z][0-9]+)*$") matches only a string made up entirely of zero or more repetitions of one (ASCII) capital letter followed by one or more digits, such as "A1B22", so it cannot match "JXE-5000: This is bug ". re.compile compiles a pattern once so that you can reuse it later in the script without spelling it out again, which can also be slightly faster when the same pattern is used many times. If you choose to use re.compile (and you really don't need to here), you need to compile the pattern you are actually looking for (in this case, 'JXE' followed by '2000'). You then search with the compiled pattern in this form: compiled_pattern.search(string), which for you would be text.search(textpattern).
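As a minimal sketch of the compiled form, assuming the number can be any run of digits and the separator is a dash or a space (the \d+ and the name ticket_pattern are illustrative choices, not something from the question):

```python
import re

# Assumption: the tag is always "JXE" and the number is any run of digits.
ticket_pattern = re.compile(r"(JXE)[- ](\d+)")

samples = [
    "JXE 2000 This is a bug to fix blah ",
    "JXE-2000: This is a bug to fix blah ",
    "No ticket here",
]

# The compiled pattern is reused for every string in the list.
results = []
for line in samples:
    match = ticket_pattern.search(line)
    results.append(match.group(1, 2) if match else None)

print(results)
# [('JXE', '2000'), ('JXE', '2000'), None]
```

Checking for None before calling group matters, because search returns None when the string has no match.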


0

You can match either '-' or ' ' with [- ]:

>>> import re
>>> match = re.search("JXE[- ]2000[: ]+(.*)", "JXE-2000: This is bug ")
>>> if match is not None:
...     message = match.groups()[0]
...
>>> print(message)
This is bug 


0

Depends on what you want to capture:

>>> s
['JXE 2000 This is a bug to fix blah',
 'JXE-2000: This is a bug to fix blah',
 'JXE-2000 Blah']
>>> re.search(r'JXE[-\s]\d+(.+)',s[0]).groups()
(' This is a bug to fix blah',)
>>> re.search(r'JXE[-\s]\d+(.+)',s[1]).groups()
(': This is a bug to fix blah',)
>>> re.search(r'JXE[-\s]\d+(.+)',s[2]).groups()
(' Blah',)

Here is what this pattern matches:

  • JXE - the character J, followed by X, followed by E
  • [-\s] - a single dash (-) or a single whitespace character
  • \d+ - one or more digits
  • (.+) - one or more of any character (except for a line break), captured as a group
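The pieces above can be combined into a small helper. This is only a sketch: the function name parse_ticket and the named groups are hypothetical, and it assumes that a single dash or whitespace character separates JXE from the number and that an optional colon may follow the number.

```python
import re

# Hypothetical helper; the pattern is an assumption based on the sample strings.
TICKET_RE = re.compile(r"JXE[-\s](?P<number>\d+):?\s*(?P<message>.*)")

def parse_ticket(text):
    """Return (number, message) if text has JXE followed by a number, else None."""
    match = TICKET_RE.search(text)
    if match is None:
        return None
    # Convert the digits to an int and trim surrounding whitespace from the rest.
    return int(match.group("number")), match.group("message").strip()

print(parse_ticket("JXE 2000: This is check in for bug 1"))
# (2000, 'This is check in for bug 1')
print(parse_ticket("JXE-2002 This is a check in for bug 2"))
# (2002, 'This is a check in for bug 2')
print(parse_ticket("Nothing to see here"))
# None
```

Returning None for a non-matching string keeps the "does it contain JXE and a number?" check to a single comparison at the call site.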

1 Comment

Thanks to all of you for your ideas. Essentially I want to take a string and see if there is a "JXE" in the string followed by a number. The value of the number is unknown. So in one case it can be "JXE 2000: This is check in for bug 1". In the second case it can be like this "JXE-2002 This is a check in for bug 2".
