I have read the html.parser documentation, but I cannot find the anchorlist attribute of the HTMLParser class. Python 2.x has that attribute.

I searched for it but could not find an answer. In Python 3.x, does the HTMLParser class have it?

  • Where did you see this attribute? Do you have a reference to it? Commented Aug 3, 2013 at 14:31
  • @BurhanKhalid: See docs.python.org/2/library/… Commented Aug 3, 2013 at 14:37

1 Answer

The anchorlist attribute was part of the htmllib.HTMLParser class. The module was deprecated in Python 2.6 and is not present in Python 3.

The html.parser module in Python 3, on the other hand, was called HTMLParser in Python 2. It does not have the anchorlist attribute.

You can emulate the attribute by listening for start tag events: for every a tag, add the href attribute (if present) to a list. That builds the same list:

from html.parser import HTMLParser


class MyHTMLParser(HTMLParser):
    def __init__(self, *args, **kw):
        super().__init__(*args, **kw)
        # Collected href values, mirroring htmllib's anchorlist
        self.anchorlist = []

    def handle_starttag(self, tag, attrs):
        if tag == 'a':
            # attrs is a list of (name, value) tuples
            attributes = dict(attrs)
            if "href" in attributes:
                self.anchorlist.append(attributes["href"])
Alternatively, use a friendlier API like BeautifulSoup to gather link anchors instead.

2 Comments

The attrs argument of handle_starttag is actually a list rather than a dictionary, so one has to iterate over the list, which contains (name, value) tuples; see docs.python.org/3/library/…
@gforcada: indeed; it's easy enough to turn it into a dictionary where you don't need to expect multiple copies of the attribute. I've done so in an update. Thanks for fixing this for me!
