I have read the html.parser documentation, but I cannot find the anchorlist attribute of the HTMLParser class. Python 2.x has that attribute.
I googled for it but could not find an answer. In Python 3.x, does the HTMLParser class have it?
The anchorlist attribute was part of the htmllib.HTMLParser class. The module was deprecated in Python 2.6 and is not present in Python 3.
The html.parser module in Python 3, on the other hand, is the module that was called HTMLParser in Python 2. Its HTMLParser class does not have the anchorlist attribute.
You can emulate the attribute by handling start tag events: for every a tag, add its href attribute (if present) to a list. That builds the same list:
from html.parser import HTMLParser

class MyHTMLParser(HTMLParser):
    def __init__(self, *args, **kw):
        super().__init__(*args, **kw)
        self.anchorlist = []  # href values of <a> tags, in document order

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) tuples; dict() allows lookup by name
        if tag == 'a':
            attributes = dict(attrs)
            if "href" in attributes:
                self.anchorlist.append(attributes["href"])
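To see the emulation end to end, here is a minimal self-contained sketch. The names `AnchorCollector` and `collect_anchors` are hypothetical, and the optional `base_url` resolution via `urllib.parse.urljoin` is an extra beyond a plain anchor list. Note that html.parser lower-cases tag and attribute names, so `<A HREF=...>` is matched too.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class AnchorCollector(HTMLParser):
    """Hypothetical helper: collects href values of <a> tags in document order."""

    def __init__(self, base_url=None):
        super().__init__()
        self.base_url = base_url
        self.anchorlist = []

    def handle_starttag(self, tag, attrs):
        # Tag and attribute names arrive lower-cased, so <A HREF=...> matches as well.
        if tag != "a":
            return
        href = dict(attrs).get("href")
        if href is None:
            return  # <a> without an href (e.g. a named anchor): skip it
        # Optional extra: resolve relative links against a base URL.
        if self.base_url:
            href = urljoin(self.base_url, href)
        self.anchorlist.append(href)


def collect_anchors(html_text, base_url=None):
    """Hypothetical convenience wrapper: parse html_text and return the hrefs."""
    parser = AnchorCollector(base_url)
    parser.feed(html_text)
    parser.close()  # flush any buffered, not yet processed data
    return parser.anchorlist


page = ('<p><A HREF="/docs">Docs</A> <a name="top">no href</a> '
        '<a href="https://example.com/">Ex</a></p>')
print(collect_anchors(page))
print(collect_anchors(page, base_url="https://python.org/"))
```

Calling `close()` after `feed()` makes sure nothing is left in the parser's buffer; for a single complete document it is mostly a safety measure.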
Alternatively, use a friendlier API such as BeautifulSoup to gather link anchors.
The attrs argument of handle_starttag is actually a list rather than a dictionary: it contains (name, value) tuples, so you have to iterate over it (or convert it with dict(), as the code above does). See docs.python.org/3/library/…
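As a sketch of the point in that comment, the hypothetical `AttrListDemo` parser below records the raw attrs lists and walks the (name, value) pairs directly instead of building a dict. An attribute written without a value, such as a bare `<a href>`, shows up with `None` as its value, so the sketch skips it. Stopping at the first href is a choice made here for illustration.

```python
from html.parser import HTMLParser


class AttrListDemo(HTMLParser):
    """Hypothetical demo: treat attrs as the list of (name, value) tuples it is."""

    def __init__(self):
        super().__init__()
        self.anchorlist = []
        self.seen_attrs = []  # raw attrs lists, kept only for illustration

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        self.seen_attrs.append(attrs)
        # attrs looks like [('class', 'nav'), ('href', '/x')]; iterate the pairs.
        for name, value in attrs:
            if name == "href" and value is not None:
                self.anchorlist.append(value)
                break  # stop at the first usable href attribute


p = AttrListDemo()
p.feed('<a class="nav" href="/x">X</a><a href>empty</a>')
p.close()
print(p.seen_attrs)
print(p.anchorlist)
```

Keeping attrs as a list preserves attribute order and repeated names, which a dict() conversion would collapse.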