Web Scraping particular tags using Python [closed]

Question

Closed. This question does not meet Stack Overflow guidelines. It is not currently accepting answers.

Questions asking for code must demonstrate a minimal understanding of the problem being solved. Include attempted solutions, why they didn't work, and the expected results. See also: Stack Overflow question checklist

Closed 12 years ago.

I need to extract the HTML content within particular tags, given that I have the URLs of the pages. Is there any way I can do this using Python?

Duplicate. stackoverflow.com/questions/1391657/… stackoverflow.com/questions/2081586/… stackoverflow.com/questions/6969567/…
– Logan, Commented Jul 26, 2013 at 5:07

Ralph Caraveo · Accepted Answer · 2013-07-26 05:04:22Z

1

There is an incredible scraping library for Python called BeautifulSoup that will make your life much easier: http://www.crummy.com/software/BeautifulSoup/

BeautifulSoup lets you select elements by HTML tag and/or by HTML attribute, for example a CSS class name. It also handles badly formed HTML really well, but you need to read the docs to learn how it works. It's pretty amazing what you can scrape with so few lines of code using this library.
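To make the tag and class selection concrete, here is a minimal sketch. It assumes the bs4 package (BeautifulSoup 4) is installed; the markup and the class names "post" and "sidebar" are made up for illustration and stand in for a downloaded page.

```python
from bs4 import BeautifulSoup

# Hypothetical markup standing in for a downloaded page.
html = """
<div class="post"><p>First post</p></div>
<div class="sidebar"><p>Links</p></div>
<div class="post"><p>Second post</p></div>
"""

soup = BeautifulSoup(html, "html.parser")

# Select by tag name alone.
all_divs = soup.find_all("div")

# Select by tag name plus CSS class (class_ avoids the Python keyword "class").
posts = soup.find_all("div", class_="post")

# The same selection written as a CSS selector.
posts_css = soup.select("div.post")

# Pull the visible text out of each matching element.
post_texts = [div.get_text(strip=True) for div in posts]
```

Both find_all and select return the matching elements in document order, so which one to use is mostly a matter of taste.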

Have fun!

Serial · Accepted Answer · 2013-07-26 05:04:38Z

0

Use BeautifulSoup

It is very easy to do this: just use urllib to get the data from the web, then use BeautifulSoup to parse out the information you need.

Here is an example:

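# Python 3; on Python 2 the same function is urllib2.urlopen
from urllib.request import urlopen
from bs4 import BeautifulSoup

# urlopen needs a full URL, including the scheme
response = urlopen('http://example.com')

# name the parser explicitly so bs4 does not have to guess
soup = BeautifulSoup(response, 'html.parser')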

You can then use BeautifulSoup to extract the information for a given tag, like this:

soup.find_all('tag_name')
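Since the question asks for the HTML content inside the tags, here is a small sketch of one way to get it from the find_all results. The helper name inner_html and the sample markup are made up for illustration; decode_contents() and str() on a tag are part of bs4.

```python
from bs4 import BeautifulSoup

# Hypothetical markup standing in for a downloaded page.
html = "<ul><li><b>one</b></li><li>two <i>three</i></li></ul>"
soup = BeautifulSoup(html, "html.parser")


def inner_html(soup, tag_name):
    """Return the inner HTML of every <tag_name> element, in document order."""
    return [tag.decode_contents() for tag in soup.find_all(tag_name)]


# Markup inside each <li>, without the <li> tags themselves.
items = inner_html(soup, "li")

# str() on a tag gives the element including its own tags;
# get_text() drops all markup and keeps only the text.
first = soup.find("li")
outer = str(first)
text = first.get_text()
```

Use decode_contents() when you want the nested markup, and get_text() when you only care about the text.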

There are also a lot of other ways to extract data; this site will help: Web-Scraping with bs4

4 Comments

Blender Over a year ago

from bs4 import * should be from bs4 import BeautifulSoup. Also, you don't need to read the file handle before passing it into BeautifulSoup.

Serial Over a year ago

Well, if you download BeautifulSoup 4, you can import it like that.

Blender Over a year ago

Sorry, I was talking about the asterisk. You shouldn't need to do that.

Serial Over a year ago

Ohhh yes, you're right, I'll fix it.
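For anyone following the comments above, here is a rough sketch of both points: the explicit import, and passing the file handle straight to BeautifulSoup. The io.BytesIO object is only a stand-in (an assumption for the example) for the response that urlopen returns, so the snippet runs without a network connection.

```python
import io

# Explicit import instead of "from bs4 import *".
from bs4 import BeautifulSoup

# Stand-in for the file-like response object returned by urlopen.
handle = io.BytesIO(b"<html><body><h1>Title</h1></body></html>")

# BeautifulSoup accepts the open handle directly and reads it itself.
soup_from_handle = BeautifulSoup(handle, "html.parser")

# Reading first also works, but the extra .read() is not needed.
handle.seek(0)
soup_from_bytes = BeautifulSoup(handle.read(), "html.parser")

heading = soup_from_handle.h1.string
```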
