Parsing malformed string in python [duplicate]

Question

Possible Duplicate:
Decode HTML entities in Python string?

I have a malformed string in Python:

Muhammad Ali&#39;s fight with Larry Holmes

where &#39; is an apostrophe.

First, what representation is this: &#39;? Second, how can I parse the string in Python so that &#39; is replaced with '?

This looks like an HTML entity for the character with code 39 (which would make it easy to parse and reassemble using chr()). However, there are also a large number of symbolic HTML entities like &amp; (&) which you'd probably also want to consider.
– Kos, Commented Nov 13, 2011 at 20:17
@All: I did not know how to search for an answer because I did not know what to search for.
– Bruce, Commented Nov 13, 2011 at 20:20
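
To illustrate Kos's comment, here is a minimal Python 3 sketch of how a single entity maps to a character: numeric references go through chr(), and symbolic names are looked up in the standard library's html.entities.name2codepoint table. The helper name decode_entity is hypothetical and only handles decimal references and known names.

```python
# Python 3 sketch; decode_entity is a hypothetical helper, not a library function.
from html.entities import name2codepoint


def decode_entity(entity):
    """Decode one entity such as '&#39;' or '&amp;' into its character."""
    body = entity[1:-1]                # strip the leading '&' and trailing ';'
    if body.startswith("#"):
        return chr(int(body[1:]))      # decimal numeric reference -> code point
    return chr(name2codepoint[body])   # symbolic name -> code point


apostrophe = decode_entity("&#39;")
ampersand = decode_entity("&amp;")
print(apostrophe, ampersand)
```

Hexadecimal references such as &#x27; and unknown names are deliberately left out of this sketch; an unknown name raises KeyError.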

Acorn · Accepted Answer · 2011-11-13 20:20:21Z

5

The Python Standard Library's HTMLParser is able to decode HTML entities in strings.

>>> import HTMLParser
>>> h = HTMLParser.HTMLParser()
>>> s = h.unescape('&copy; 2010')
>>> s
u'\xa9 2010'
>>> print s
© 2010
>>> s = h.unescape('&#169; 2010')
>>> s
u'\xa9 2010'

A range of solutions are described here: http://fredericiana.com/2010/10/08/decoding-html-entities-to-text-in-python/
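
The session above is Python 2. On Python 3 the HTMLParser module was renamed html.parser, and the unescape method was later removed (Python 3.9); the standard library's html.unescape function (available since Python 3.4) does the same job. A minimal sketch, assuming Python 3.4 or newer:

```python
import html

# The string from the question, with a numeric character reference.
s = "Muhammad Ali&#39;s fight with Larry Holmes"
decoded = html.unescape(s)
print(decoded)

# Named and numeric forms of the same character decode identically.
copyright_line = html.unescape("&copy; 2010")
print(copyright_line == html.unescape("&#169; 2010"))
```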

answered Nov 13, 2011 at 20:20

Acorn


Adam Wagner · Accepted Answer · 2011-11-13 20:24:31Z

1

The &#CHAR-CODE; form is a syntax for special characters in HTML (maybe elsewhere too, but I'm not sure). There may be a more complete way to do this, but you could simply replace it with:

mystring = "Muhammad Ali&#39;s fight with Larry Holmes"
print(mystring.replace("&#39;", "'"))

Yields:

Muhammad Ali's fight with Larry Holmes
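
The single replace only covers &#39;. As a rough generalization, the sketch below uses a regular expression to decode any decimal or hexadecimal numeric reference. The names NUMERIC_REF and replace_numeric_refs are hypothetical, the code assumes Python 3, and named entities such as &amp; are left untouched; the standard library's html.unescape is the more complete option.

```python
import re

# Matches decimal (&#39;) and hexadecimal (&#x27;) numeric character references.
NUMERIC_REF = re.compile(r"&#(?:[xX]([0-9a-fA-F]+)|([0-9]+));")


def replace_numeric_refs(text):
    """Replace every numeric character reference in text with its character."""
    def _sub(match):
        hex_digits, dec_digits = match.groups()
        code = int(hex_digits, 16) if hex_digits else int(dec_digits)
        return chr(code)  # raises ValueError for out-of-range code points
    return NUMERIC_REF.sub(_sub, text)


mystring = "Muhammad Ali&#39;s fight with Larry Holmes"
result = replace_numeric_refs(mystring)
print(result)
```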

edited Nov 13, 2011 at 20:24

answered Nov 13, 2011 at 20:17

Adam Wagner
