Yesterday I posted a similar question to this one: Python Regex Named Groups. That works pretty well for simple things.

After some research, I read about the pyparsing library, which seems to be a very good fit for my task.

from pyparsing import *

text = '[@a eee, fff fff, ggg @b eee, fff, ggg @c eee eee, fff fff,ggg ggg@d]'
command_s = Suppress(Optional('[') + Literal('@'))
command_e = Suppress(Literal('@') | Literal(']'))
task = Word(alphas)
arguments = ZeroOrMore(
    Word(alphas) + 
    Suppress(
        Optional(Literal(',') + White()) | Optional(White() + Literal('@'))
    )
)
command = Group(OneOrMore(command_s + task + arguments + command_e))
print(command.parseString(text))

# which outputs only the first @a sequence
# [['a', 'eee', 'fff', 'fff', 'ggg']]

# the structure should be something like:
[
     ['a', 'eee', 'fff fff', 'ggg'],
     ['b', 'eee', 'fff', 'ggg'],
     ['c', 'eee eee', 'fff fff', 'ggg ggg'],
     ['d']
]

@ indicates the start of a sequence; the first word is a task (a), followed by optional comma-separated arguments (eee, fff fff, ggg). The problem is that @b, @c and @d are ignored by the above code. Also, "fff fff" is treated as two separate arguments when it should be only one.

1 Answer

See the embedded comments.

text = '[@a eee, fff fff, ggg @b eee, fff, ggg @c eee eee, fff fff,ggg ggg@d]'

from pyparsing import *

LBRACK,RBRACK,AT = map(Suppress,"[]@")

key = AT + Word(alphas)

# use originalTextFor to preserve whitespace between words between commas
list_item = originalTextFor(OneOrMore(Word(alphas)))

# define a key_value pair using Group to preserve structure
key_value = Group(key + Optional(delimitedList(list_item)))

parser = LBRACK + OneOrMore(key_value) + RBRACK
print(parser.parseString(text))

This will print your desired output.

[['a', 'eee', 'fff fff', 'ggg'], 
 ['b', 'eee', 'fff', 'ggg'], 
 ['c', 'eee eee', 'fff fff', 'ggg ggg'], 
 ['d']]
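
To check the result programmatically rather than by eye, one option is to convert the parse results to plain nested lists with asList() and compare them with the structure asked for in the question. This is only a sketch, assuming pyparsing is installed; the variable names `expected` and `parsed` are introduced here for illustration.

```python
from pyparsing import (Group, OneOrMore, Optional, Suppress, Word, alphas,
                       delimitedList, originalTextFor)

text = '[@a eee, fff fff, ggg @b eee, fff, ggg @c eee eee, fff fff,ggg ggg@d]'

# same grammar as above, with explicit imports instead of "import *"
LBRACK, RBRACK, AT = map(Suppress, "[]@")
key = AT + Word(alphas)
list_item = originalTextFor(OneOrMore(Word(alphas)))
key_value = Group(key + Optional(delimitedList(list_item)))
parser = LBRACK + OneOrMore(key_value) + RBRACK

# the desired structure, restated from the question
expected = [
    ['a', 'eee', 'fff fff', 'ggg'],
    ['b', 'eee', 'fff', 'ggg'],
    ['c', 'eee eee', 'fff fff', 'ggg ggg'],
    ['d'],
]

# asList() turns the ParseResults into ordinary nested Python lists
parsed = parser.parseString(text).asList()
print(parsed == expected)
```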

For extra credit, here is how to have pyparsing define keys for you:

# Extra credit:
# use Dict to auto-define named groups using each '@x' as a key
parser = LBRACK + Dict(OneOrMore(key_value)) + RBRACK
result = parser.parseString(text)

# print the parsed keys (key order may differ between Python versions)
print(list(result.keys()))

# print a value for a particular key
print(result['c'])

# print a value for a particular key using object notation
print(result.b)

# dump out the whole structure to see just what we got
print(result.dump())

Prints

['a', 'c', 'b', 'd']
['eee eee', 'fff fff', 'ggg ggg']
['eee', 'fff', 'ggg']
[['a', 'eee', 'fff fff', 'ggg'], ['b', 'eee', 'fff', 'ggg'], ['c', 'eee eee', 'fff fff', 'ggg ggg'], ['d']]
- a: ['eee', 'fff fff', 'ggg']
- b: ['eee', 'fff', 'ggg']
- c: ['eee eee', 'fff fff', 'ggg ggg']
- d: 
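
Conceptually, Dict uses the first token of each group as the key and the remaining tokens as the value. The plain-Python sketch below illustrates only that mapping; it is not pyparsing's actual implementation, and `groups_to_dict` is a hypothetical helper name. One simplification to note: the sketch always stores a list, so a bare key such as d maps to an empty list, whereas the dump above shows an empty value for d.

```python
def groups_to_dict(groups):
    """Map the first token of each group to the list of its remaining tokens."""
    return {group[0]: group[1:] for group in groups}


# the grouped tokens, copied from the parser output above
groups = [
    ['a', 'eee', 'fff fff', 'ggg'],
    ['b', 'eee', 'fff', 'ggg'],
    ['c', 'eee eee', 'fff fff', 'ggg ggg'],
    ['d'],
]

named = groups_to_dict(groups)
print(named['c'])  # ['eee eee', 'fff fff', 'ggg ggg']
print(named['d'])  # []
```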

4 Comments

Jesus Christ, I'll have to check that out. Thanks so far!
Works perfectly!!! I have no idea how it works under the hood; I'll have to dig into the API docs. Brilliant, thank you very much.
Look at the single difference between the two definitions of parser: the second wraps OneOrMore(key_value) in Dict. The parsed lists are the same, but Dict creates results names (like regex's named groups), using the first element of each group as the key and the rest of each group as the value. If you peek into pyparsing's code, this is done inside Dict.postParse.
All right, got it. I had to search the docs for originalTextFor and delimitedList. Nice explanation, again!
