2

My question is a variation on this one, and I can't figure it out.

given = "{abc, xyz}, 123, {def, lmn, ijk}, {uvw}, opq"
expected = ["{abc, xyz}", "123", "{def, lmn, ijk}", "{uvw}", "opq"]

As the example shows, each item in the expected list is either a brace group {..., ...} or a plain string.

Many thanks in advance.

1
  • 2
    If the curly braces can be nested, you cannot split it using regular expressions (at least not in their "pure" form), because nested braces form a context-free language, not a regular one. Commented Jan 29, 2014 at 6:53

4 Answers

3

I think the following regexp does the job, provided the curly brackets are not nested (as far as I know, nested curly brackets can't be parsed with a regular expression).

>>> import re
>>> s = "{abc, xyz}, 123, {def, lmn, ijk}, {uvw}, opq"
>>> re.findall(r",?\s*(\{.*?\}|[^,]+)",s)
['{abc, xyz}', '123', '{def, lmn, ijk}', '{uvw}', 'opq']

3 Comments

It may be worth noting why nested curly brackets cannot be handled with a regular expression.
As @shx2 noted above, a language with nested curly brackets is context-free and requires a pushdown automaton to recognize. Regular expressions in Python are more or less an implementation of regular languages, which are recognized by finite automata and are thus less powerful.
I'd change: \{.*?\} to \{[^}]*\}. It avoids the non-greedy match (which might be slower) and it matches even if the string contains newlines. Your current solution fails to match things like {abc,\nxyz}.
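The suggestion in the last comment can be checked with a short sketch. The names `LAZY`, `NEGATED`, and `multiline` are made up for this illustration; the two patterns are the ones quoted in the answer and the comment.

```python
import re

# Pattern from the answer: lazy match inside the braces.
LAZY = r",?\s*(\{.*?\}|[^,]+)"
# Pattern from the comment: negated character class instead of a lazy match.
NEGATED = r",?\s*(\{[^}]*\}|[^,]+)"

# Hypothetical input with a newline inside the brace group.
multiline = "{abc,\nxyz}, 123"

# '.' does not match a newline by default, so the lazy brace branch fails
# and the group gets split at its inner comma.
lazy_result = re.findall(LAZY, multiline)

# [^}] matches any character except '}', newlines included.
negated_result = re.findall(NEGATED, multiline)

print(lazy_result)
print(negated_result)
```

Neither pattern copes with nested braces; the change only affects brace groups that span lines.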
1
given = "{abc,{a:b}, xyz} , 123 , {def, lmn, ijk}, {uvw}, opq"
# expected = ["{abc,{a:b},xyz}", "123", "{def,lmn,ijk}", "{uvw}", "opq"]
tmp_l = given.split(',')
tmp_l = [i.strip() for i in tmp_l]
result_l = []
element = ''
count = 0  # current brace nesting depth
for i in tmp_l:
    # count every brace in the piece, not just its first and last character
    # (this also avoids an IndexError on empty pieces)
    count += i.count('{') - i.count('}')
    element = element + i + ','
    if count == 0:
        # back at the top level: drop the trailing comma and store the item
        element = element[0:-1]
        result_l.append(element)
        element = ''

print(result_l)

This one can handle nested curly brackets, although it is not very elegant.
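The same depth-counting idea can also be applied character by character, which keeps the original spacing inside each brace group. This is only a sketch: the function name `split_top_level` and its parameters are invented here, and it assumes braces are balanced and never need escaping.

```python
def split_top_level(text, sep=",", open_ch="{", close_ch="}"):
    """Split text on sep, ignoring separators that sit inside braces."""
    items = []
    current = []
    depth = 0  # how many brace groups are currently open
    for ch in text:
        if ch == open_ch:
            depth += 1
        elif ch == close_ch:
            depth -= 1
            if depth < 0:
                raise ValueError("unbalanced closing brace")
        if ch == sep and depth == 0:
            # top-level separator: finish the current item
            items.append("".join(current).strip())
            current = []
        else:
            current.append(ch)
    if depth != 0:
        raise ValueError("unbalanced opening brace")
    items.append("".join(current).strip())
    return items


flat = split_top_level("{abc, xyz}, 123, {def, lmn, ijk}, {uvw}, opq")
nested = split_top_level("{abc,{a:b}, xyz} , 123 , {uvw}")
print(flat)
print(nested)
```

For the question's input this gives the expected list, and for the nested input the outer group comes back as one item.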

Comments

0

Does the following not provide you with what you are looking for?

import re
given = "{abc, xyz}, 123, {def, lmn, ijk}, {uvw}, opq"
expected = re.findall(r'(\w+)', given)

I ran that in Terminal and got:

>>> import re
>>> given = "{abc, xyz}, 123, {def, lmn, ijk}, {uvw}, opq"
>>> expected = re.findall(r'(\w+)', given)
>>> expected
['abc', 'xyz', '123', 'def', 'lmn', 'ijk', 'uvw', 'opq']

1 Comment

That's not it, 'abc' and 'xyz' shouldn't be separate words if they're in the same set of brackets, for instance.
0

You can use the regex below to do that. The rest is the same as in the similar question you linked.

import re

given = "{abc, xyz}, 123, {def, lmn, ijk}, {uvw}, opq"
regex = r",?\s*(\{.*?\}|[^,]+)"

print(re.findall(regex, given))

Output: ['{abc, xyz}', '123', '{def, lmn, ijk}', '{uvw}', 'opq']

Just import the re module and do the same as the link says. The pattern matches either anything inside curly braces { } or any run of characters without a comma.
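To see what each part of that pattern does, here is the same regex written with `re.VERBOSE` and comments. The name `ITEM` is just a label chosen for this sketch; the pattern itself is unchanged.

```python
import re

ITEM = re.compile(r"""
    ,?\s*          # optional comma and whitespace before the item
    (              # capture the item itself:
        \{.*?\}    #   a brace group, up to the first closing brace
      | [^,]+      #   or a run of characters containing no comma
    )
""", re.VERBOSE)

given = "{abc, xyz}, 123, {def, lmn, ijk}, {uvw}, opq"
items = ITEM.findall(given)
print(items)
```

Because the brace branch is tried first, commas inside a (non-nested) brace group never reach the second branch.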

2 Comments

You need re.findall somewhere; all you have is a tuple.
He has already given the link, which explains the rest, hasn't he?
