Using Python and BeautifulSoup to Parse a Table

Question

I am trying to access content in certain td tags with Python and BeautifulSoup. I can either get the first td tag meeting the criteria (with find), or all of them (with findAll).

Now, I could just use findAll, get them all, and pull the content I want out of the results, but that seems inefficient (even if I put limits on the search). Is there any way to go directly to a specific td tag meeting the criteria I want? Say the third, or the 10th?

Here's my code so far:

from __future__ import division
from __future__ import unicode_literals
from __future__ import print_function
from mechanize import Browser
from BeautifulSoup import BeautifulSoup

br = Browser()
url = "http://finance.yahoo.com/q/ks?s=goog+Key+Statistics"
page = br.open(url)
html = page.read()
soup = BeautifulSoup(html)
td = soup.findAll("td", {'class': 'yfnc_tablehead1'})

# Print the first child (the label text) of each matching cell
for cell in td:
    print(cell.contents[0])

score 2 · Accepted Answer · 2011-06-21 05:56:45Z


Is there any way to go to a certain td tag meeting the criteria I want? Say the third, or the 10th?

Well...

all_tds = [td for td in soup.findAll("td", {'class': 'yfnc_tablehead1'})]

print(all_tds[3])  # index 3 is the fourth match; use all_tds[2] for the third

...there is no other way..
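For what it's worth, here is a minimal sketch of the same indexing idea that also uses the limit argument so the search stops once enough matches are found. It assumes the newer bs4 package (where findAll is spelled find_all) rather than the BeautifulSoup 3 import from the question; the sample HTML and the nth_match helper are made up for illustration and are not part of either library.

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup standing in for the Yahoo Finance page.
HTML = """
<table>
  <tr><td class="yfnc_tablehead1">Market Cap</td><td class="yfnc_tabledata1">1</td></tr>
  <tr><td class="yfnc_tablehead1">Enterprise Value</td><td class="yfnc_tabledata1">2</td></tr>
  <tr><td class="yfnc_tablehead1">Trailing P/E</td><td class="yfnc_tabledata1">3</td></tr>
  <tr><td class="yfnc_tablehead1">Forward P/E</td><td class="yfnc_tabledata1">4</td></tr>
</table>
"""


def nth_match(soup, n, name="td", css_class="yfnc_tablehead1"):
    """Return the n-th (1-based) matching tag, or None if there are fewer than n.

    Hypothetical helper: limit=n makes find_all stop after n matches,
    so the rest of the document is not searched.
    """
    if n < 1:
        raise ValueError("n must be 1 or greater")
    matches = soup.find_all(name, {"class": css_class}, limit=n)
    return matches[n - 1] if len(matches) >= n else None


soup = BeautifulSoup(HTML, "html.parser")
third = nth_match(soup, 3)
print(third.get_text())  # Trailing P/E
```

This still builds a short list internally, but it never collects more than n tags, which is about as close as find_all gets to "give me the n-th one".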

edited Jun 21, 2011 at 5:56

answered Jun 21, 2011 at 4:08

user2665694


4 Comments

Steven Matthews Over a year ago

Sigh, that's what I thought - I was hoping I was wrong! There isn't even a way using find? I just wish that there was a way to find a specific instance of a tag.

Steven Matthews Over a year ago

all_tds = [td for td in td = soup.findAll("td", {'class': 'yfnc_tablehead1'})] Also, that line doesn't work.

user2665694 Over a year ago

Well, fixed. You should be able to discover and fix a typo yourself... blindly copy-and-pasting code without thinking about what you are actually doing is not a good idea.

Steven Matthews Over a year ago

You're right, I probably should have analyzed it first. I'm still semi newish to programming (about 3 months of practice) and make rookie mistakes. Sorry!

cerberos · Accepted Answer · 2011-06-21 14:00:37Z

1

find and findAll are very flexible. The BeautifulSoup.findAll docs say:

5. You can pass in a callable object which takes a Tag object as its only argument, and returns a boolean. Every Tag object that findAll encounters will be passed into this object, and if the call returns True then the tag is considered to match.
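As a rough sketch of what the quoted passage describes, the callable below receives each tag and decides whether it matches by looking at the tag's name, class, and its own text. It assumes the newer bs4 package (find_all instead of findAll); the sample HTML and the is_pe_header function are hypothetical and only there to make the example runnable.

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup standing in for the Yahoo Finance page.
HTML = """
<table>
  <tr><td class="yfnc_tablehead1">Market Cap</td><td class="yfnc_tabledata1">1</td></tr>
  <tr><td class="yfnc_tablehead1">Trailing P/E</td><td class="yfnc_tabledata1">2</td></tr>
  <tr><td class="yfnc_tablehead1">Forward P/E</td><td class="yfnc_tabledata1">3</td></tr>
</table>
"""


def is_pe_header(tag):
    """Return True for header cells whose text mentions P/E (hypothetical filter)."""
    return (
        tag.name == "td"
        and "yfnc_tablehead1" in (tag.get("class") or [])
        and "P/E" in tag.get_text()
    )


soup = BeautifulSoup(HTML, "html.parser")
# Every tag in the document is passed to is_pe_header; only those
# for which it returns True end up in the result.
cells = soup.find_all(is_pe_header)
print([cell.get_text() for cell in cells])  # ['Trailing P/E', 'Forward P/E']
```

Because the function sees the whole tag, the test can be based on the cell's text even when the cell has no child tags.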

edited Jun 21, 2011 at 14:00

answered Jun 21, 2011 at 5:38

cerberos


5 Comments

Steven Matthews Over a year ago

Hrm, that might let me do what I need to do. I'll do some tests tonight after work.

Steven Matthews Over a year ago

Only issue I see with this is that it's the same tag with the same information. Unless there's the ability to check a child, maybe.

cerberos Over a year ago

Yes, it's the same tag, but you can check the child tags before deciding whether to return True or False, which gives you an iterable of all the tags you want.

Steven Matthews Over a year ago

Hrm, biggest problem is that there aren't actually any child tags, just text that I want to pull. Will check it out.

Steven Matthews Over a year ago

Awesome awesome, think I got this.
