0

Currently my code is as follows:

 from bs4 import BeautifulSoup
 import requests

 main_url = 'http://www.foodnetwork.com/recipes/a-z'
 response = requests.get(main_url)
 soup = BeautifulSoup(response.text, "html.parser")
 mylist = [t for tags in soup.find_all(class_='m-PromoList o-Capsule__m-PromoList')
           for t in tags if t != '\n']

As of now, I get a list containing the correct information, but it's still wrapped in HTML tags. An example of an element of the list is given below:

 <li class="m-PromoList__a-ListItem"><a href="//www.foodnetwork.com/recipes/ina-garten/16-bean-pasta-e-fagioli-3612570">"16 Bean" Pasta E Fagioli</a></li>

From this item I want to extract both the href link and the link text as separate values, but I am having trouble doing so, and I really don't think getting this info should require a whole new set of operations. How can I do this?

1
  • Check the Beautiful Soup documentation. You can access a tag's attributes with t["href"] or t.get("href", None). Commented Jan 3, 2018 at 5:22

2 Answers

1

You can do this to get href and text for one element:

href = soup.find('li', attrs={'class':'m-PromoList__a-ListItem'}).find('a')['href']
text = soup.find('li', attrs={'class':'m-PromoList__a-ListItem'}).find('a').text

For a list of items:

my_list = soup.find_all('li', attrs={'class':'m-PromoList__a-ListItem'})
for el in my_list:
    href = el.find('a')['href']
    text = el.find('a').text
    print(href)
    print(text)

Edit:
An important tip to reduce run time: Don't search for the same tag more than once. Instead, save the tag in a variable and then use it multiple times.

a = soup.find('li', attrs={'class':'m-PromoList__a-ListItem'}).find('a')
href = a.get('href')
text = a.text

In large HTML documents, searching for a tag can take a lot of time, so storing the result in a variable means the search runs only once.
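
The same tip can be applied to the loop over all items. Below is a minimal sketch, not tested against the live site: the `html` string is sample markup modeled on the list item from the question (plus a second item of the same shape), and `extract_links` is a hypothetical helper name. It uses "html.parser" so nothing beyond bs4 is needed.

```python
from bs4 import BeautifulSoup

# Sample markup (assumption): shaped like the <li> element in the question.
html = """
<ul>
<li class="m-PromoList__a-ListItem"><a href="//www.foodnetwork.com/recipes/ina-garten/16-bean-pasta-e-fagioli-3612570">"16 Bean" Pasta E Fagioli</a></li>
<li class="m-PromoList__a-ListItem"><a href="//www.foodnetwork.com/recipes/21-apple-pie-recipe-1925900">"21" Apple Pie</a></li>
</ul>
"""


def extract_links(soup):
    """Return (text, href) pairs, looking up each <a> tag only once."""
    pairs = []
    for li in soup.find_all("li", class_="m-PromoList__a-ListItem"):
        a = li.find("a")  # single lookup per list item
        if a is None:
            continue  # skip list items that contain no link
        pairs.append((a.get_text(strip=True), a.get("href")))
    return pairs


soup = BeautifulSoup(html, "html.parser")
recipes = extract_links(soup)
for text, href in recipes:
    print(text, href)
```

Using `a.get("href")` instead of `a["href"]` returns None rather than raising KeyError when a link has no href attribute, which may be safer on pages you do not control.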

5 Comments

What if there are two of the same tags, each with relevant info?
soup.find_all('li', attrs={'class':'m-PromoList__a-ListItem'}) will get all of them
cool, now can you accept my answer and help me gather some points ;)
Is there somewhere other than the beautiful soup website where you're taught to use beautiful soup?
A better way to scrape websites is to use Scrapy. It's fast, efficient and easy.
0

There are several ways to achieve the same result. Here is another approach using a CSS selector:

from bs4 import BeautifulSoup
import requests

response = requests.get('http://www.foodnetwork.com/recipes/a-z')
soup = BeautifulSoup(response.text, "lxml")
for item in soup.select(".m-PromoList__a-ListItem a"):
    print("Item_Title: {}\nItem_Link: {}\n".format(item.text,item['href']))

Partial result:

Item_Title: "16 Bean" Pasta E Fagioli
Item_Link: //www.foodnetwork.com/recipes/ina-garten/16-bean-pasta-e-fagioli-3612570

Item_Title: "16 Bean" Pasta e Fagioli
Item_Link: //www.foodnetwork.com/recipes/ina-garten/16-bean-pasta-e-fagioli-1-3753755

Item_Title: "21" Apple Pie
Item_Link: //www.foodnetwork.com/recipes/21-apple-pie-recipe-1925900
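
Note that the links above are protocol-relative (they start with `//`). If you need full URLs, one option is `urljoin` from the standard library. This is a small sketch under assumptions: `BASE_URL` is the page URL from the question, and `absolutize` is a hypothetical helper name.

```python
from urllib.parse import urljoin

# Assumption: the page the links were scraped from.
BASE_URL = "http://www.foodnetwork.com/recipes/a-z"


def absolutize(href, base=BASE_URL):
    """Resolve a relative or protocol-relative href against the page URL."""
    return urljoin(base, href)


links = [
    "//www.foodnetwork.com/recipes/ina-garten/16-bean-pasta-e-fagioli-3612570",
    "//www.foodnetwork.com/recipes/21-apple-pie-recipe-1925900",
]
absolute_links = [absolutize(href) for href in links]
for link in absolute_links:
    print(link)
```

A protocol-relative link takes its scheme from the base URL, so with the base above the results start with `http://`.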
