parse embedded css beautifulsoup

Question

Is it possible to extract the embedded CSS properties that apply to an HTML tag? For instance, suppose I want to find out what the vertical-align property for "s5" is.

I'm currently using BeautifulSoup and have retrieved the span tag with tag=soup.find(class_="s5"). I've tried tag.attrs["class"], but that just gives me s5, with no way to link it to the embedded style. Is it possible to do this in Python? Every question of this sort that I've found involves parsing inline CSS styles.

<html>
    <head>
        <style type="text/css">
        * {margin:0; padding:0; text-indent:0; }
        .s5 {color: #000; font-family:Verdana, sans-serif; 
             font-style: normal; font-weight: normal; 
             text-decoration: none; font-size: 17.5pt; 
             vertical-align: 10pt;}
        </style>
    </head>

    <body>
        <p class="s1" style="padding-left: 7pt; text-indent: 0pt; text-align:left;">
        This is a sample sentence. <span class="s5"> 1</span>
        </p>
    </body>
</html>

I couldn't find anything in the documentation that concerned this. – nodel, commented Jan 24, 2019 at 16:49

MattDMo · Accepted Answer · 2022-07-30 21:39:56Z

You can use a CSS parser like [cssutils][1]. I don't know whether the package itself has a function for this (can someone comment on that?), but I wrote a custom function to do it.

from bs4 import BeautifulSoup
import cssutils
html='''
<html>
    <head>
        <style type="text/css">
        * {margin:0; padding:0; text-indent:0; }
        .s5 {color: #000; font-family:Verdana, sans-serif;
             font-style: normal; font-weight: normal;
             text-decoration: none; font-size: 17.5pt;
             vertical-align: 10pt;}
        </style>
    </head>

    <body>
        <p class="s1" style="padding-left: 7pt; text-indent: 0pt; text-align:left;">
        This is a sample sentence. <span class="s5"> 1</span>
        </p>
    </body>
</html>
'''
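def get_property(sheet, class_name, property_name):
    """Return the value declared for property_name under .class_name, or None."""
    for rule in sheet:
        # Comments, @media blocks and similar rules have no selectorText.
        if rule.type != rule.STYLE_RULE:
            continue
        if rule.selectorText == '.' + class_name:
            # getPropertyValue returns '' when the property is not set.
            value = rule.style.getPropertyValue(property_name)
            if value:
                return value
    return None

soup = BeautifulSoup(html, 'html.parser')
# .string gives the raw contents of the <style> tag.
sheet = cssutils.parseString(soup.find('style').string)
vl = get_property(sheet, 's5', 'vertical-align')
print(vl)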

Output

10pt

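This is not perfect: it only matches a rule whose selector is exactly the class name (.s5 here), so grouped or compound selectors such as ".s5, .s6" or "p .s5" are missed. Maybe you can improve upon it.

[1]: https://pypi.org/project/cssutils/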

luckydonald · Accepted Answer · 2022-04-07 15:24:36Z

To improve upon the cssutils answer:

For an inline style="..." attribute:

import cssutils

# get the style from beautiful soup, like: 
# style = tag['style']
style = "color: hotpink; background-color:#ff0000; visibility:hidden"

parsed_style = cssutils.parseStyle(style)

Now use parsed_style like you would a dict:

print(parsed_style['color'])  # hotpink
print(parsed_style['background-color'])  # #f00
print(parsed_style['visibility'])  # hidden

edited Apr 7, 2022 at 15:24

answered May 1, 2020 at 17:33
