Pyparsing sequence of repeating pattern

Question

Yesterday I posted a similar question to this one: Python Regex Named Groups. That approach works pretty well for simple things.

After some research, I read about the pyparsing library, which seems to be a good fit for my task.

from pyparsing import *

text = '[@a eee, fff fff, ggg @b eee, fff, ggg @c eee eee, fff fff,ggg ggg@d]'
command_s = Suppress(Optional('[') + Literal('@'))
command_e = Suppress(Literal('@') | Literal(']'))
task = Word(alphas)
arguments = ZeroOrMore(
    Word(alphas) + 
    Suppress(
        Optional(Literal(',') + White()) | Optional(White() + Literal('@'))
    )
)
command = Group(OneOrMore(command_s + task + arguments + command_e))
print(command.parseString(text))

# which outputs only the first @a sequence
# [['a', 'eee', 'fff', 'fff', 'ggg']]

# the structure should be something like:
[
     ['a', 'eee', 'fff fff', 'ggg'],
     ['b', 'eee', 'fff', 'ggg'],
     ['c', 'eee eee', 'fff fff', 'ggg ggg'],
     ['d']
]

@ indicates the start of a sequence. The first word is a task (a), followed by optional comma-separated arguments (eee, fff fff, ggg). The problem is that @b, @c and @d are ignored by the above code. Also, "fff fff" is treated as two separate arguments when it should be a single one.

PaulMcG · Accepted Answer · 2013-01-02 17:50:22Z

4

See the embedded comments.

text = '[@a eee, fff fff, ggg @b eee, fff, ggg @c eee eee, fff fff,ggg ggg@d]'

from pyparsing import *

LBRACK,RBRACK,AT = map(Suppress,"[]@")

key = AT + Word(alphas)

# use originalTextFor to preserve whitespace between words between commas
list_item = originalTextFor(OneOrMore(Word(alphas)))

# define a key_value pair using Group to preserve structure
key_value = Group(key + Optional(delimitedList(list_item)))

parser = LBRACK + OneOrMore(key_value) + RBRACK
print(parser.parseString(text))

This will print your desired output.

[['a', 'eee', 'fff fff', 'ggg'], 
 ['b', 'eee', 'fff', 'ggg'], 
 ['c', 'eee eee', 'fff fff', 'ggg ggg'], 
 ['d']]
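To see what originalTextFor and delimitedList each contribute to this result, here is a small sketch that applies them to fragments of the input. The variable names (words, split_words, joined, items) are illustrative and not part of the answer, and the sketch assumes a pyparsing release that still accepts the camelCase names used above.

```python
from pyparsing import Word, alphas, OneOrMore, originalTextFor, delimitedList

# Without originalTextFor, each word comes back as a separate token.
words = OneOrMore(Word(alphas))
split_words = words.parseString("fff fff").asList()
print(split_words)

# originalTextFor returns the matched slice of the input as a single string,
# so the space between the words is preserved.
joined = originalTextFor(words).parseString("fff fff").asList()
print(joined)

# delimitedList matches comma-separated items and drops the commas.
items = delimitedList(originalTextFor(words)).parseString(
    "eee eee, fff fff,ggg ggg"
).asList()
print(items)
```

The first call yields two tokens, the second a single 'fff fff' token, and the third one string per comma-separated argument, which is the shape of each group in the output above.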

For extra credit, here is how to have pyparsing define keys for you:

# Extra credit:
# use Dict to auto-define named groups using each '@x' as a key
parser = LBRACK + Dict(OneOrMore(key_value)) + RBRACK
result = parser.parseString(text)

# print the parsed keys
print(list(result.keys()))

# print a value for a particular key
print(result['c'])

# print a value for a particular key using object notation
print(result.b)

# dump out the whole structure to see just what we got
print(result.dump())

Prints (the order of the keys in the first line may vary with the Python version)

['a', 'c', 'b', 'd']
['eee eee', 'fff fff', 'ggg ggg']
['eee', 'fff', 'ggg']
[['a', 'eee', 'fff fff', 'ggg'], ['b', 'eee', 'fff', 'ggg'], ['c', 'eee eee', 'fff fff', 'ggg ggg'], ['d']]
- a: ['eee', 'fff fff', 'ggg']
- b: ['eee', 'fff', 'ggg']
- c: ['eee eee', 'fff fff', 'ggg ggg']
- d:
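As a rough picture of the mapping shown in the dump, the first element of each group becomes the key and the remaining elements become the value. The plain-Python sketch below only imitates that idea and is not how pyparsing implements Dict; the names groups and by_key are made up for the illustration, and using an empty list for a key without values (such as d) is a simplification.

```python
# Parsed groups, copied from the output above.
groups = [
    ['a', 'eee', 'fff fff', 'ggg'],
    ['b', 'eee', 'fff', 'ggg'],
    ['c', 'eee eee', 'fff fff', 'ggg ggg'],
    ['d'],
]

# First element of each group is the key, the rest is the value.
by_key = {group[0]: group[1:] for group in groups}

print(by_key['c'])
print(by_key['b'])
print(by_key['d'])  # a key with no arguments maps to an empty list here
```

Unlike this dictionary, the ParseResults returned by the parser keeps the nested list form as well, so both positional and keyed access remain available.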

edited Jan 2, 2013 at 17:50

answered Jan 2, 2013 at 17:33


4 Comments

hetsch Over a year ago

Jesus Christ, I have to check that out. Thanks so far!

hetsch Over a year ago

Works perfectly! I have no idea how it works under the hood, so I'll have to dig into the API docs. Brilliant, thank you very much.

PaulMcG Over a year ago

Look at the single difference between the two definitions of parser: the second wraps the OneOrMore(key_value) with Dict. The parsed lists are the same, but Dict creates results names (like regex's named groups) using the first element of each group as the key and the rest of each group as the value. If you peek into pyparsing's code, this is done inside Dict.postParse.

hetsch Over a year ago

All right, got it. I had to search the docs for originalTextFor and delimitedList. Nice explanation, again!
