I'm a newbie to regular expressions and I have the following string:

sequence = '["{\"First\":\"Belyuen,NT,0801\",\"Second\":\"Belyuen,NT,0801\"}","{\"First\":\"Larrakeyah,NT,0801\",\"Second\":\"Larrakeyah,NT,0801\"}"]'

I am trying to extract the text Belyuen,NT,0801 and Larrakeyah,NT,0801 in Python. I have the following code, which is not working:

re.search('\:\\"...\\', ''.join(sequence))

I.e. I want to get the string between characters :\ and \.

3 Answers

3

Don't use regex for this. It appears to be a rather strangely split set of JSON strings. Join them back together and use the json module to decode it.

import json
sequence = '[%s]' % ','.join(sequence)
data = json.loads(sequence)
print data[0]['First'], data[0]['Second']

(Note: the json module is new in Python 2.6. If you have an older version, download and install simplejson.)

4 Comments

The sequence is actually a string (I updated the question). The interpreter keeps throwing an error on the line `data = json.loads(sequence)`, and the error is `raise ValueError(errmsg("Expecting object", s, end))`.
If I scrap the second line of your code and print `data[0]`, I get: {"First":"Belyuen,NT,0801","Second":"Belyuen,NT,0801"}
And if I print `data[0]['First']`, I get the following error: `print data[0]['First'] TypeError: string indices must be integers`
I ended up extracting what I wanted by doing the following: `data = json.loads(sequence)`, then `location = json.loads(data[0])`, then `print location['First']`.
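
The two-step decoding described in the last comment can be sketched as below. This is only a minimal sketch: it assumes the backslashes are literally present in the data (hence the raw string literal, which is an assumption about how the string arrives), and the names `outer` and `locations` are illustrative, not from the original posts.

```python
import json

# Assumption: the backslashes are part of the data itself, so a raw string
# literal is used to stop Python from consuming them as escapes.
sequence = r'["{\"First\":\"Belyuen,NT,0801\",\"Second\":\"Belyuen,NT,0801\"}","{\"First\":\"Larrakeyah,NT,0801\",\"Second\":\"Larrakeyah,NT,0801\"}"]'

# First pass: the outer JSON array decodes to a list of strings.
outer = json.loads(sequence)

# Second pass: each of those strings is itself a JSON object.
locations = [json.loads(item) for item in outer]

for location in locations:
    print(location['First'])
```

The first `json.loads` call only yields strings, which is why indexing `data[0]['First']` fails with a `TypeError` until each element is decoded a second time.
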
3

It looks like a proper serialization of a Python dict, so you could just do:

>>> sequence = ["{\"First\":\"Belyuen,NT,0801\",\"Second\":\"Belyuen,NT,0801\"}","{\"First\":\"Larrakeyah,NT,0801\",\"Second\":\"Larrakeyah,NT,0801\"}"]
>>> import json
>>> for i in sequence:
...     d = json.loads(i)
...     print(d['First'])
...
Belyuen,NT,0801
Larrakeyah,NT,0801

3 Comments

The sequence is actually a string, not a list (I updated the question). So how do I load it into the json module as a string?
@seth: Unfortunately, it seems that the quotes in your input string are misused. It doesn't work with either json or eval. If you fix them by alternating single and double quotes, escaped where needed, then it works just fine with the method I showed. Again, quotes within the string should alternate, and quotes of the kind used to delimit the original Python string should, of course, be escaped.
Thanks for your response. Check out my comments on Daniel Roseman's answer. I ended up extracting what I needed in a convoluted way, but got it nevertheless. +1 for your help and useful answer.
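
The quoting problem raised in these comments can be illustrated with a shortened version of the string. This is a hedged sketch, assuming the data is typed directly into Python source as a literal; the names `plain`, `raw`, `plain_ok`, and `first` are made up for the example.

```python
import json

# In an ordinary literal, \" is an escape for ", so the backslashes vanish
# and the result is no longer valid JSON.
plain = '["{\"First\":\"Belyuen,NT,0801\"}"]'

# In a raw literal the backslashes survive, so the inner quotes stay escaped.
raw = r'["{\"First\":\"Belyuen,NT,0801\"}"]'

try:
    json.loads(plain)
    plain_ok = True
except ValueError:  # json decoding errors are ValueError (or a subclass)
    plain_ok = False

inner = json.loads(raw)[0]          # a string holding a JSON object
first = json.loads(inner)['First']  # decode that string as well

print(plain_ok)
print(first)
```
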
2

You don't need regex:

>>> sequence = ["{\"First\":\"Belyuen,NT,0801\",\"Second\":\"Belyuen,NT,0801\"}","{\"First\":\"Larrakeyah,NT,0801\",\"Second\":\"Larrakeyah,NT,0801\"}"]
>>> for item in sequence:
...  print eval(item).values()
...
['Belyuen,NT,0801', 'Belyuen,NT,0801']
['Larrakeyah,NT,0801', 'Larrakeyah,NT,0801']

1 Comment

This solution works in versions earlier than 2.6, and I don't want to download any other modules.
