Parsing through HTML with BeautifulSoup in Python

Question

Currently my code is as follows:

 from bs4 import BeautifulSoup
 import requests

 main_url = 'http://www.foodnetwork.com/recipes/a-z'
 response = requests.get(main_url)
 soup = BeautifulSoup(response.text, "html.parser")
 mylist = [t for tags in soup.find_all(class_='m-PromoList o-Capsule__m-PromoList')
           for t in tags if t != '\n']

As of now, I get a list containing the correct information, but it's still wrapped in HTML tags. An example of an element of the list is given below:

 <li class="m-PromoList__a-ListItem"><a href="//www.foodnetwork.com/recipes/ina-garten/16-bean-pasta-e-fagioli-3612570">"16 Bean" Pasta E Fagioli</a></li>

From this item I want to extract the href link and the link text as two separate values. I am having trouble doing this, and I don't think it should require a whole new set of operations. How can I do it?

Check the Beautiful Soup documentation. You can access a tag's attributes with t["href"] or t.get("href", None).
– Goodies, Commented Jan 3, 2018 at 5:22
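A minimal sketch of attribute access, using the list element quoted in the question as inline markup so nothing is fetched over the network. Note that dot access such as `a.href` does not read the attribute; as far as I know Beautiful Soup treats it as a search for a child tag named `href`. The variable names here are only for illustration.

```python
from bs4 import BeautifulSoup

# Sample markup copied from the list element shown in the question.
html = (
    '<li class="m-PromoList__a-ListItem">'
    '<a href="//www.foodnetwork.com/recipes/ina-garten/16-bean-pasta-e-fagioli-3612570">'
    '"16 Bean" Pasta E Fagioli</a></li>'
)

li = BeautifulSoup(html, "html.parser").find("li")
a = li.find("a")

href = a["href"]           # raises KeyError if the attribute is missing
safe_href = a.get("href")  # returns None if the attribute is missing
missing = a.get("title")   # there is no title attribute, so this is None
child = a.href             # looks for a child <href> tag, so this is None
text = a.get_text()        # the visible link text

print(href)
print(text)
```

`a["href"]` is the stricter choice when every item is expected to have a link; `a.get("href")` is safer when some items may not.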

Keyur Potdar · Accepted Answer · 2018-01-03 08:06:22Z

You can do this to get href and text for one element:

href = soup.find('li', attrs={'class':'m-PromoList__a-ListItem'}).find('a')['href']
text = soup.find('li', attrs={'class':'m-PromoList__a-ListItem'}).find('a').text

For a list of items:

my_list = soup.find_all('li', attrs={'class':'m-PromoList__a-ListItem'})
for el in my_list:
    href = el.find('a')['href']
    text = el.find('a').text
    print(href)
    print(text)

Edit:
An important tip to reduce run time: Don't search for the same tag more than once. Instead, save the tag in a variable and then use it multiple times.

a = soup.find('li', attrs={'class':'m-PromoList__a-ListItem'}).find('a')
href = a.get('href')
text = a.text

In a large HTML document, finding a tag can take a lot of time, so storing the result in a variable means the search runs only once.
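Putting the loop and the tip together, here is a rough sketch that looks up each `a` tag once and collects (text, href) pairs. The surrounding `ul` and the third item without a link are made up for illustration; the real page may be structured differently.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: the <ul> wrapper and the link-less item are assumptions.
html = """
<ul class="m-PromoList o-Capsule__m-PromoList">
<li class="m-PromoList__a-ListItem"><a href="//www.foodnetwork.com/recipes/ina-garten/16-bean-pasta-e-fagioli-3612570">"16 Bean" Pasta E Fagioli</a></li>
<li class="m-PromoList__a-ListItem"><a href="//www.foodnetwork.com/recipes/21-apple-pie-recipe-1925900">"21" Apple Pie</a></li>
<li class="m-PromoList__a-ListItem">No link here</li>
</ul>
"""
soup = BeautifulSoup(html, "html.parser")

recipes = []
for li in soup.find_all("li", class_="m-PromoList__a-ListItem"):
    a = li.find("a")   # search once, then reuse the result
    if a is None:      # skip items that contain no link
        continue
    recipes.append((a.get_text(strip=True), a.get("href")))

for title, link in recipes:
    print(title, link)
```

The `if a is None` check matters because `find` returns None when nothing matches, and indexing None would raise a TypeError.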

edited Jan 3, 2018 at 8:06

Keyur Potdar

answered Jan 3, 2018 at 5:22

Sagun Shrestha

5 Comments

Fancypants753 Over a year ago

What if there are two of the same tag, each with relevant info?

Sagun Shrestha Over a year ago

soup.find_all('li', attrs={'class':'m-PromoList__a-ListItem'}) will get all of them

Sagun Shrestha Over a year ago

cool, now can you accept my answer and help me gather some points ;)

Fancypants753 Over a year ago

Is there somewhere other than the beautiful soup website where you're taught to use beautiful soup?

Sagun Shrestha Over a year ago

A better way to scrape websites is to use Scrapy. It's fast, efficient and easy.

SIM · Accepted Answer · 2018-01-03 11:07:11Z

There are several ways to achieve the same result. Here is another approach using a CSS selector:

from bs4 import BeautifulSoup
import requests

response = requests.get('http://www.foodnetwork.com/recipes/a-z')
soup = BeautifulSoup(response.text, "lxml")
for item in soup.select(".m-PromoList__a-ListItem a"):
    print("Item_Title: {}\nItem_Link: {}\n".format(item.text,item['href']))

Partial result:

Item_Title: "16 Bean" Pasta E Fagioli
Item_Link: //www.foodnetwork.com/recipes/ina-garten/16-bean-pasta-e-fagioli-3612570

Item_Title: "16 Bean" Pasta e Fagioli
Item_Link: //www.foodnetwork.com/recipes/ina-garten/16-bean-pasta-e-fagioli-1-3753755

Item_Title: "21" Apple Pie
Item_Link: //www.foodnetwork.com/recipes/21-apple-pie-recipe-1925900
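The links in the result start with `//`, so they have no scheme. If absolute URLs are wanted (an assumption on my part, the question does not ask for it), one possible sketch is to combine the same CSS selector with `urljoin` from the standard library. The inline markup and the `BASE_URL` name are illustrative only, and `html.parser` is used so that lxml is not required.

```python
from urllib.parse import urljoin

from bs4 import BeautifulSoup

BASE_URL = "http://www.foodnetwork.com/recipes/a-z"  # page the links came from

# Inline sample built from the partial result above; not the full page.
html = """
<li class="m-PromoList__a-ListItem"><a href="//www.foodnetwork.com/recipes/ina-garten/16-bean-pasta-e-fagioli-3612570">"16 Bean" Pasta E Fagioli</a></li>
<li class="m-PromoList__a-ListItem"><a href="//www.foodnetwork.com/recipes/21-apple-pie-recipe-1925900">"21" Apple Pie</a></li>
"""
soup = BeautifulSoup(html, "html.parser")

# "a[href]" only matches anchors that actually carry an href attribute.
items = [
    (a.get_text(strip=True), urljoin(BASE_URL, a["href"]))
    for a in soup.select(".m-PromoList__a-ListItem a[href]")
]

for title, link in items:
    print("Item_Title: {}\nItem_Link: {}\n".format(title, link))
```

`urljoin` takes the scheme from the base URL, so the resulting links begin with `http://`.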
