Is it possible to extract the embedded CSS properties that apply to an HTML tag? For instance, suppose I want to find out what the vertical-align property for "s5" is.

I'm currently using BeautifulSoup and have retrieved the span tag with tag=soup.find(class_="s5"). I've tried tag.attrs["class"], but that just gives me the class name s5, with no way to link it to the embedded style. Is it possible to do this in Python? Every question of this sort that I've found involves parsing inline CSS styles.

<html>
    <head>
        <style type="text/css">
        * {margin:0; padding:0; text-indent:0; }
        .s5 {color: #000; font-family:Verdana, sans-serif; 
             font-style: normal; font-weight: normal; 
             text-decoration: none; font-size: 17.5pt; 
             vertical-align: 10pt;}
        </style>
    </head>

    <body>
        <p class="s1" style="padding-left: 7pt; text-indent: 0pt; text-align:left;">
        This is a sample sentence. <span class="s5"> 1</span>
        </p>
    </body>
</html>

  • Have you looked into tinycss? Commented Jan 22, 2019 at 20:55
  • I couldn't find anything in the documentation that concerned this Commented Jan 24, 2019 at 16:49

2 Answers

You can use a CSS parser like [cssutils][1]. I don't know whether the package itself has a function for this (can someone comment on this?), but I wrote a custom function to do it.

from bs4 import BeautifulSoup
import cssutils
html='''
<html>
    <head>
        <style type="text/css">
        * {margin:0; padding:0; text-indent:0; }
        .s5 {color: #000; font-family:Verdana, sans-serif;
             font-style: normal; font-weight: normal;
             text-decoration: none; font-size: 17.5pt;
             vertical-align: 10pt;}
        </style>
    </head>

    <body>
        <p class="s1" style="padding-left: 7pt; text-indent: 0pt; text-align:left;">
        This is a sample sentence. <span class="s5"> 1</span>
        </p>
    </body>
</html>
'''
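def get_property(sheet, class_name, property_name):
    """Return property_name from the rule whose selector is exactly .class_name."""
    for rule in sheet:
        # Skip comments, @media blocks, etc., which have no selectorText.
        if rule.type != rule.STYLE_RULE:
            continue
        if rule.selectorText == '.' + class_name:
            value = rule.style.getPropertyValue(property_name)
            if value:
                return value
    return None  # no matching rule or property

soup = BeautifulSoup(html, 'html.parser')
# Parse the contents of the first <style> element into a stylesheet object.
sheet = cssutils.parseString(soup.find('style').text)
vl = get_property(sheet, 's5', 'vertical-align')
print(vl)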

Output

10pt

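This is not perfect (it only matches a rule whose selector is exactly the class name, and it ignores inline styles and the rest of the cascade), but maybe you can improve upon it.

[1]: https://pypi.org/project/cssutils/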

To improve upon the cssutils answer:

For an inline style="..." attribute:

import cssutils

# get the style from beautiful soup, like: 
# style = tag['style']
style = "color: hotpink; background-color:#ff0000; visibility:hidden"

parsed_style = cssutils.parseStyle(style)

Now use parsed_style like you would a dict:

print(parsed_style['color'])  # hotpink
print(parsed_style['background-color'])  # #f00 (cssutils shortens #ff0000)
print(parsed_style['visibility'])  # hidden
