Python: Split string by pattern

Question

My question is a variation on this one, and I can't seem to figure it out.

given = "{abc, xyz}, 123, {def, lmn, ijk}, {uvw}, opq"
expected = ["{abc, xyz}", "123", "{def, lmn, ijk}", "{uvw}", "opq"]

As the example above shows, each item in the expected list is either a {..., ...} group or a plain string.

Many thanks in advance.

If the curly braces can be nested, you cannot split the string using regular expressions (at least not in their "pure" form), because a language of nested braces is context-free, not regular.
– shx2, Commented Jan 29, 2014 at 6:53

Xavier Combelle · Accepted Answer · 2014-01-29 06:58:03Z

3

I think the following regexp fits the job. However, it assumes the curly brackets are not nested (nested curly brackets can't be parsed with a regular expression, as far as I know).

>>> import re
>>> s = "{abc, xyz}, 123, {def, lmn, ijk}, {uvw}, opq"
>>> re.findall(r",?\s*(\{.*?\}|[^,]+)",s)
['{abc, xyz}', '123', '{def, lmn, ijk}', '{uvw}', 'opq']

edited Jan 29, 2014 at 6:58

answered Jan 29, 2014 at 6:52

Xavier Combelle



3 Comments

Fenikso Over a year ago

May be worth noting why nested curly brackets probably cannot be solved using regular expression...

Fenikso Over a year ago

As @shx2 noted above, a language with nested curly brackets is context-free and requires a pushdown automaton to recognize. Regular expressions in Python are more or less an implementation of regular languages, which are recognized by finite automata and are thus less powerful.
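To see the limitation concretely, here is a small sketch that runs the pattern from the answer on an input with nested braces. The input string is a made-up example, not taken from the question.

```python
import re

# Pattern from the answer above; it assumes braces are never nested.
pattern = r",?\s*(\{.*?\}|[^,]+)"

# Hypothetical input with one level of nesting.
nested = "{abc, {a, b}, xyz}, 123"

# The lazy `.*?` stops at the first closing brace, so the outer
# group is cut short and its tail becomes a separate item.
parts = re.findall(pattern, nested)
print(parts)  # ['{abc, {a, b}', 'xyz}', '123']
```

The pattern has no way to remember how many braces are still open, which is the counting a pushdown automaton (or a plain depth counter) provides.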

Bakuriu Over a year ago

I'd change: \{.*?\} to \{[^}]*\}. It avoids the non-greedy match (which might be slower) and it matches even if the string contains newlines. Your current solution fails to match things like {abc,\nxyz}.
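A quick comparison of the two patterns on a string with a newline inside the braces may help; the test string below is an assumed example built from the `{abc,\nxyz}` case mentioned in the comment.

```python
import re

lazy = r",?\s*(\{.*?\}|[^,]+)"        # original pattern
negated = r",?\s*(\{[^}]*\}|[^,]+)"   # variant suggested in the comment

# Assumed test input: a brace group that spans two lines.
text = "{abc,\nxyz}, 123"

# `.` does not match a newline by default, so the brace alternative
# fails and the fallback `[^,]+` splits the group at its inner comma.
lazy_result = re.findall(lazy, text)
print(lazy_result)     # ['{abc', 'xyz}', '123']

# `[^}]` matches any character except '}', including newlines.
negated_result = re.findall(negated, text)
print(negated_result)  # ['{abc,\nxyz}', '123']
```

Passing `re.DOTALL` to the original pattern would also let it cross newlines, but the negated character class needs no flag and avoids the lazy quantifier.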

bpceee · Accepted Answer · 2014-01-29 08:07:25Z

1

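given = "{abc,{a:b}, xyz} , 123 , {def, lmn, ijk}, {uvw}, opq"
# Split on every comma first, then glue the pieces back together
# for as long as a brace is still open.
# Note: pieces are rejoined with ',', so spaces after inner commas are dropped.
pieces = [p.strip() for p in given.split(',')]
result_l = []
element = []
count = 0  # number of currently open braces
for piece in pieces:
    # Count every brace in the piece, not only its first and last character;
    # this also avoids an IndexError on empty pieces.
    count += piece.count('{') - piece.count('}')
    element.append(piece)
    if count == 0:
        result_l.append(','.join(element))
        element = []

print(result_l)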

This one can handle nested curly brackets, although it is not very elegant.
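An alternative sketch of the same idea scans the string character by character and tracks brace depth, which keeps the original spacing inside each group. The function name `split_top_level` is hypothetical, and the code assumes the braces in the input are balanced.

```python
def split_top_level(text):
    """Split on commas that are not inside curly braces.

    Assumes braces are balanced; unbalanced input is not validated.
    """
    parts = []
    current = []
    depth = 0  # number of currently open braces
    for ch in text:
        if ch == "{":
            depth += 1
        elif ch == "}":
            depth -= 1
        if ch == "," and depth == 0:
            # Top-level comma: close the current item.
            parts.append("".join(current).strip())
            current = []
        else:
            current.append(ch)
    # Flush whatever follows the last top-level comma.
    parts.append("".join(current).strip())
    return parts


given = "{abc,{a:b}, xyz} , 123 , {def, lmn, ijk}, {uvw}, opq"
print(split_top_level(given))
# ['{abc,{a:b}, xyz}', '123', '{def, lmn, ijk}', '{uvw}', 'opq']
```

On the string from the question it returns the expected list unchanged, since that input simply never goes deeper than one level.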

answered Jan 29, 2014 at 8:07

bpceee


Comments

Douglas Denhartog · Accepted Answer · 2014-01-29 06:59:57Z

0

Does the following not provide you with what you are looking for?

import re
given = "{abc, xyz}, 123, {def, lmn, ijk}, {uvw}, opq"
expected = re.findall(r'(\w+)', given)

I ran that in Terminal and got:

>>> import re
>>> given = "{abc, xyz}, 123, {def, lmn, ijk}, {uvw}, opq"
>>> expected = re.findall(r'(\w+)', given)
>>> expected
['abc', 'xyz', '123', 'def', 'lmn', 'ijk', 'uvw', 'opq']

edited Jan 29, 2014 at 6:59

answered Jan 29, 2014 at 6:48

Douglas Denhartog


1 Comment

Milo P Over a year ago

That's not it, 'abc' and 'xyz' shouldn't be separate words if they're in the same set of brackets, for instance.

Furquan Khan · Accepted Answer · 2014-01-29 07:21:54Z

0

You can use the regex below to do that. The rest is the same as in the link you provided.

import re

given = "{abc, xyz}, 123, {def, lmn, ijk}, {uvw}, opq"
regex = r",?\s*(\{.*?\}|[^,]+)"

print(re.findall(regex, given))

Output: ['{abc, xyz}', '123', '{def, lmn, ijk}', '{uvw}', 'opq']

Just import the re module and do the same as the link says. The regex matches anything inside curly braces { } as well as any other string that contains no commas.

edited Jan 29, 2014 at 7:21

answered Jan 29, 2014 at 6:54

Furquan Khan


2 Comments

Xavier Combelle Over a year ago

You need re.findall somewhere; all you have so far is a tuple.

Furquan Khan Over a year ago

He has already given the link, which explains the rest, hasn't he?
