Scraping HTML text using Python

Question

How can I get only the name Roger Federer from the HTML below?

<div class="profile-heading--desktop"><h1><span class="profile-heading__rank">#1 </span>Roger Federer</h1><div class="profile-subheading">Athlete, Tennis</div></div>

I am using this code:

name = soup.find(class_ = 'profile-heading__rank').get_text()

but I am getting #1.

If the code you're working in is Python, it would be worth adding that (and the appropriate version) as a tag.
– DBS, Commented Jul 22, 2020 at 14:19

0stone0 · Accepted Answer · 2020-07-22 14:39:20Z

1

Use .next_sibling on the rank <span> to get the text node that follows it inside the <h1>:

from bs4 import BeautifulSoup

html = """
<div class="profile-heading--desktop">
    <h1>
        <span class="profile-heading__rank">#1 </span>
        Roger Federer
    </h1>
    <div class="profile-subheading">
        Athlete, Tennis
    </div>
</div>
"""

soup = BeautifulSoup(html, 'html.parser')
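# The next sibling of the rank <span> is the text node that follows it inside <h1>.
# strip() removes the newlines and indentation around the name.
name = soup.find(class_='profile-heading__rank').next_sibling.strip()

print(name)  # --> Roger Federer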
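
If the page layout varies, .next_sibling can be None (nothing follows the span) or another tag rather than text. A minimal sketch of a more defensive lookup follows; the helper name text_after is hypothetical and not part of Beautiful Soup, and the shortened HTML is only for illustration:

```python
from bs4 import BeautifulSoup
from bs4.element import NavigableString


def text_after(soup, class_name):
    """Return the stripped text node right after the first tag with class_name, or None.

    Hypothetical helper for illustration; not part of Beautiful Soup.
    """
    tag = soup.find(class_=class_name)
    if tag is None:
        return None
    sibling = tag.next_sibling
    # Accept only a text node; a following tag or no sibling at all yields None.
    if isinstance(sibling, NavigableString):
        return sibling.strip() or None
    return None


html = '<h1><span class="profile-heading__rank">#1 </span>Roger Federer</h1>'
soup = BeautifulSoup(html, "html.parser")

name = text_after(soup, "profile-heading__rank")
missing = text_after(soup, "no-such-class")
print(name)
print(missing)
```

Returning None instead of raising lets the caller decide how to handle pages where the rank span is absent.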


UWTD TV · Accepted Answer · 2020-07-22 17:54:25Z

0

Another way is to call .find(text=True, recursive=False) on the h1, which returns the first text node that is a direct child of the h1 and skips the text inside the span:

from bs4 import BeautifulSoup

html = '<div class="profile-heading--desktop"><h1><span class="profile-heading__rank">#1 </span>Roger Federer</h1><div class="profile-subheading">Athlete, Tennis</div></div>'
soup = BeautifulSoup(html, 'html.parser')
print(soup.find('h1').find(text=True, recursive=False))

Output:

Roger Federer
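
A variation on the same idea, as a sketch rather than a definitive version: it assumes Beautiful Soup 4.4.0 or later, where the argument is named string (older releases call it text), and it uses find_all so that every direct text node of the h1 is collected, not only the first one.

```python
from bs4 import BeautifulSoup

html = '<h1><span class="profile-heading__rank">#1 </span>Roger Federer</h1>'
soup = BeautifulSoup(html, "html.parser")
h1 = soup.find("h1")

# string=True matches text nodes; recursive=False limits the search to direct
# children of <h1>, so the "#1 " inside the <span> is not included.
direct_text = h1.find_all(string=True, recursive=False)

# Join the pieces and drop whitespace-only nodes left by indentation.
name = " ".join(part.strip() for part in direct_text if part.strip())
print(name)
```

This also works on indented markup, where the h1 can contain whitespace-only text nodes before the span.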
