HTML parser in Python [closed]

Question

Closed. This question is seeking recommendations for software libraries, tutorials, tools, books, or other off-site resources. It does not meet Stack Overflow guidelines. It is not currently accepting answers.


Closed 6 years ago.


Using the Python documentation, I found the HTML parser, but I have no idea which library to import to use it. How do I find this out, bearing in mind that the page doesn't say?

Community · Accepted Answer · 2017-05-23 11:45:36Z

24

You probably really want BeautifulSoup; check the link for an example.

But in any case (Python 2):

>>> import HTMLParser
>>> h = HTMLParser.HTMLParser()
>>> h.feed('<html></html>')
>>> h.get_starttag_text()
'<html>'
>>> h.close()

edited May 23, 2017 at 11:45

CommunityBot


answered Sep 16, 2008 at 10:54

Vinko Vrsalovic


diabloneo · Accepted Answer · 2011-09-27 14:55:46Z

21

Try:

import HTMLParser

In Python 3.0, the HTMLParser module was renamed to html.parser; the Python documentation notes the rename.

Python 3.0

import html.parser

Python 2 (2.2 and above)

import HTMLParser
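
If the same script has to run under both interpreters, one common pattern is to try the Python 3 name first and fall back to the Python 2 one. The sketch below is only an illustration: the LinkCollector class is a hypothetical helper, not something from the standard library, and it simply gathers the href values of the <a> tags it is fed.

```python
try:
    from html.parser import HTMLParser  # Python 3 name
except ImportError:
    from HTMLParser import HTMLParser  # Python 2 name


class LinkCollector(HTMLParser):
    """Hypothetical helper: collects href values from <a> tags."""

    def __init__(self):
        # Call the base initializer directly; the Python 2 class is
        # old-style, so super() would not work there.
        HTMLParser.__init__(self)
        self.links = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; value may be None.
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value is not None:
                    self.links.append(value)


def extract_links(html_text):
    """Feed a complete document to the parser and return its links."""
    parser = LinkCollector()
    parser.feed(html_text)
    parser.close()
    return parser.links
```

Calling extract_links() on a string of HTML returns the link targets in document order; anchors without an href are skipped.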

edited Sep 27, 2011 at 14:55

diabloneo


answered Sep 16, 2008 at 10:51

1077


1 Comment

noobninja Over a year ago

https://docs.python.org/2/library/htmlparser.html

Swaroop C H · Accepted Answer · 2008-09-16 10:54:21Z

4

I would recommend using the Beautiful Soup module instead; it has good documentation.
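
As a rough illustration of what that looks like in practice, here is a minimal sketch. It assumes the third-party beautifulsoup4 package is installed (it is imported as bs4) and uses the standard library's html.parser as the backend; the summarize function is a hypothetical helper, not part of Beautiful Soup.

```python
from bs4 import BeautifulSoup  # third-party package: beautifulsoup4


def summarize(html_text):
    """Hypothetical helper: return (page title, list of link targets)."""
    # "html.parser" selects the standard-library backend, so no
    # additional parser package is required.
    soup = BeautifulSoup(html_text, "html.parser")
    title = soup.title.string if soup.title else None
    # href=True keeps only the <a> tags that actually have an href.
    links = [a["href"] for a in soup.find_all("a", href=True)]
    return title, links
```

The same call works on markup with unclosed tags, which is most of the appeal for real-world pages.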

answered Sep 16, 2008 at 10:54

Swaroop C H


Paweł Hajdan · Accepted Answer · 2008-09-17 11:19:11Z

4

You may be interested in lxml. It is a separate package and has C components, but it is the fastest. It also has a very nice API that lets you easily list the links or forms in an HTML document, sanitize HTML, and more. It can also parse HTML that is not well-formed (this is configurable).
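
To give an idea of the link-listing part, here is a minimal sketch. It assumes the third-party lxml package is installed; list_links is a hypothetical helper name, while lxml.html.fromstring and iterlinks are part of lxml's HTML API.

```python
import lxml.html  # third-party package: lxml


def list_links(html_text):
    """Hypothetical helper: return every link found in the document."""
    doc = lxml.html.fromstring(html_text)
    # iterlinks() yields (element, attribute, link, pos) tuples for
    # href, src and similar link-carrying attributes.
    return [link for _element, _attribute, link, _pos in doc.iterlinks()]
```

Because iterlinks() also reports attributes such as src, filter on the element or attribute name if only anchor targets are wanted.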

answered Sep 17, 2008 at 11:19

Paweł Hajdan


Cristian Ciupitu · Accepted Answer · 2011-07-25 20:40:37Z

4

You should also look at html5lib for Python as it tries to parse HTML in a way that very much resembles what web browsers do, especially when dealing with invalid HTML (which is more than 90% of today's web).

edited Jul 25, 2011 at 20:40

Cristian Ciupitu


answered Sep 16, 2008 at 12:14

Alexey Feldgendler


1077 · Accepted Answer · 2008-09-16 13:21:55Z

3

I don't recommend BeautifulSoup if you want speed. lxml is much, much faster, and you can fall back on lxml's BeautifulSoup-based soupparser if the default parser doesn't work.

answered Sep 16, 2008 at 13:21

1077


1 Comment

DrDee Over a year ago

I agree. BeautifulSoup is only useful when parsing a handful of files; there are too many memory leaks.

Antti Rasinen · Accepted Answer · 2008-09-16 10:55:20Z

1

For real-world HTML processing I'd recommend BeautifulSoup. It is great and takes away much of the pain. Installation is easy.

answered Sep 16, 2008 at 10:55

Antti Rasinen


Eric Leschinski · Accepted Answer · 2014-03-05 01:29:04Z

1

There's a link to an example at the bottom of http://docs.python.org/2/library/htmlparser.html; it just doesn't work with the original python or python3. It has to be run with Python 2, as it says at the top of the page.
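
For Python 3, a roughly equivalent sketch can be written against html.parser. This is not the example from the documentation page, only an approximation in the same spirit: the EventLogger class and log_events function are hypothetical names, and the events are collected in a list rather than printed.

```python
from html.parser import HTMLParser  # Python 3 name of the module


class EventLogger(HTMLParser):
    """Hypothetical helper: records start tags, end tags and text."""

    def __init__(self):
        super().__init__()
        self.events = []

    def handle_starttag(self, tag, attrs):
        self.events.append(("start", tag))

    def handle_endtag(self, tag):
        self.events.append(("end", tag))

    def handle_data(self, data):
        self.events.append(("data", data))


def log_events(html_text):
    """Parse a complete document and return the recorded events."""
    parser = EventLogger()
    parser.feed(html_text)
    parser.close()  # flush any text still buffered by the parser
    return parser.events
```

Each handler is called by the parser as it walks the input, so overriding only the handlers you care about is enough.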

edited Mar 5, 2014 at 1:29

Eric Leschinski


answered Sep 16, 2008 at 10:52

Vytautas Šaltenis
