I'm trying to parse and extract some information from a web page that contains CSS as well as HTML. I'm using cssutils and BeautifulSoup for this. Let's say I want to find out the font size used for a table heading. BeautifulSoup tells me where the table is defined in the HTML, but if I want to know which style the table uses, can I get that information from BeautifulSoup? If not, how do I go about solving this problem? Thanks for any help.
1 Answer
Yes, at least for inline styles. BeautifulSoup gives you an element's `style` attribute as a string, and a regular expression can then pick out the property you want :)
Example:
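```python
import re
from bs4 import BeautifulSoup  # BeautifulSoup 4 (BS3 used "from BeautifulSoup import BeautifulSoup")

soup = BeautifulSoup('<h1 style="font-size: 12px; margin: 5px">Test</h1>', 'html.parser')
style = soup.find('h1')['style']               # raw inline style string
print(re.findall(r'font-size[^;]+', style))    # ['font-size: 12px']
```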
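BeautifulSoup only sees the `style` attribute written on the element itself. A font size for a table heading is often set in a `<style>` block instead, which needs some CSS parsing on top. Below is a rough, dependency-free sketch of that whole lookup using only the standard library: `html.parser` collects the `<style>` text and the tag's inline style, and a naive regex splits the rules. `StyleCollector`, `parse_declarations`, `parse_rules` and `property_for` are hypothetical helpers, not part of BeautifulSoup or cssutils. It assumes plain type selectors (e.g. `th`, not `table th`), no `@media` blocks and no external stylesheets, and it ignores specificity and inheritance: later rules win, and an inline style overrides the sheet.

```python
import re
from html.parser import HTMLParser


class StyleCollector(HTMLParser):
    """Hypothetical helper: gathers <style> text and inline styles of one tag."""

    def __init__(self, tag):
        super().__init__()
        self.tag = tag
        self.in_style = False
        self.css_chunks = []
        self.inline_styles = []

    def handle_starttag(self, tag, attrs):
        if tag == "style":
            self.in_style = True
        elif tag == self.tag:
            # attrs is a list of (name, value) pairs; value may be None
            self.inline_styles.append(dict(attrs).get("style") or "")

    def handle_endtag(self, tag):
        if tag == "style":
            self.in_style = False

    def handle_data(self, data):
        if self.in_style:
            self.css_chunks.append(data)


def parse_declarations(text):
    """Naively split 'a: b; c: d' into {'a': 'b', 'c': 'd'}."""
    decls = {}
    for part in text.split(";"):
        if ":" in part:
            name, value = part.split(":", 1)
            decls[name.strip().lower()] = value.strip()
    return decls


# Assumption: flat "selectors { declarations }" rules, no nesting.
RULE_RE = re.compile(r"([^{}]+)\{([^{}]*)\}")


def parse_rules(css):
    css = re.sub(r"/\*.*?\*/", "", css, flags=re.S)  # drop comments
    rules = []
    for selectors, body in RULE_RE.findall(css):
        names = [s.strip() for s in selectors.split(",")]
        rules.append((names, parse_declarations(body)))
    return rules


def property_for(html, tag, prop):
    collector = StyleCollector(tag)
    collector.feed(html)
    collector.close()
    value = None
    # Later matching rules overwrite earlier ones (specificity is ignored).
    for selectors, decls in parse_rules("".join(collector.css_chunks)):
        if tag in selectors and prop in decls:
            value = decls[prop]
    # The first matching element's inline style, if it sets prop, wins.
    if collector.inline_styles:
        value = parse_declarations(collector.inline_styles[0]).get(prop, value)
    return value


page = """
<html><head><style>
  table { border: 1px solid black; }
  th, td { font-size: 14px; }
</style></head>
<body><table><tr><th>Name</th></tr></table></body></html>
"""
print(property_for(page, "th", "font-size"))  # 14px
```

For real pages a proper CSS parser is the safer choice: cssutils' `parseString` returns a stylesheet object whose rules expose `selectorText` and `style`, which could replace the regex-based `parse_rules` above.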