
I'm new to programming and trying to learn by building some small side projects. The code below works, but it isn't formatting the CSV correctly when it pulls all the information. It started adding weird spaces after I added price to the fields being pulled; if I comment out price and remove it from the write call, it works fine, but I can't figure out why I'm getting the weird spaces when I add it back.

from urllib.request import urlopen as uReq
from bs4 import BeautifulSoup as soup

my_url = "https://www.newegg.com/Product/ProductList.aspx?Submit=ENE&N=-1&IsNodeId=1&Description=graphics%20card&bop=And&PageSize=12&order=BESTMATCH"


# Opening up connection, grabbing the page
uClient = uReq(my_url)
page_html = uClient.read()
uClient.close()


# html parsing
page_soup = soup(page_html, "html.parser")


# grabs each product container
containers = page_soup.findAll("div",{"class":"item-container"})


filename = "products.csv"
f = open(filename, "w")

headers = "brand, product_name, shipping\n"

f.write(headers)

for container in containers:
    brand = container.div.div.a.img["title"]

    title_container = container.findAll("a", {"class":"item-title"})
    product_name = title_container[0].text

    shipping_container = container.findAll("li", {"class":"price-ship"})
    shipping = shipping_container[0].text.strip()

    price_container = container.findAll("li", {"class":"price-current"})
    price = price_container[0].text.strip()

    print("brand: " + brand)
    print("product_name: " + product_name)
    print("Price: " + price)
    print("shipping: " + shipping)


    f.write(brand + "," + product_name.replace(",", "|") + "," + shipping + "," + price + "\n")

f.close()
  • Would you like to show us the error? Commented Nov 1, 2018 at 4:18
  • I added a picture of the CSV with price included; you can see in the picture that for some reason it's not adding a column for price and is instead inserting spaces and shifting information around. Commented Nov 1, 2018 at 4:24

2 Answers


You can write to a CSV file the way I've shown below; the output it produces should serve the purpose. Check out the csv module documentation for clarity.

import csv
from urllib.request import urlopen
from bs4 import BeautifulSoup

my_url = "https://www.newegg.com/Product/ProductList.aspx?Submit=ENE&N=-1&IsNodeId=1&Description=graphics%20card&bop=And&PageSize=12&order=BESTMATCH"

page_html = urlopen(my_url).read()
page_soup = BeautifulSoup(page_html, "lxml")

# newline="" keeps the csv module from writing blank rows on Windows
with open("outputfile.csv", "w", newline="") as outfile:
    writer = csv.writer(outfile)
    writer.writerow(["brand", "product_name", "shipping", "price"])

    for container in page_soup.findAll("div", {"class": "item-container"}):

        brand = container.find(class_="item-brand").img.get("title")
        # get_text(strip=True) trims the stray whitespace and newlines
        product_name = container.find("a", {"class": "item-title"}).get_text(strip=True).replace(",", "|")
        shipping = container.find("li", {"class": "price-ship"}).get_text(strip=True)
        price = container.find("li", {"class": "price-current"}).get_text(strip=True).replace("|", "")

        writer.writerow([brand, product_name, shipping, price])
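As an aside, csv.writer already quotes any field that contains a comma, so the replace(",", "|") above is optional rather than required. A minimal, self-contained check (hypothetical product data):

import csv, io

# QUOTE_MINIMAL (the default) wraps fields containing the delimiter in
# quotes, so embedded commas survive the round trip untouched
buf = io.StringIO()
csv.writer(buf).writerow(["EVGA", "GeForce GTX 1070, 8GB", "$399.99"])
print(buf.getvalue())  # EVGA,"GeForce GTX 1070, 8GB",$399.99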

You're getting the newlines and stray characters because that is the data you're getting back from BS4: it isn't a product of the writing process. It happens because you're grabbing all of the text in the list item, and there's a lot going on in there. Looking at the page, if you'd rather pick out just the price, you can concatenate the text of the strong tag within the list item with the text of its sup tag, e.g. price = price_container[0].find("strong").text + price_container[0].find("sup").text. That will ensure you're only picking out the data you need.
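A sketch of that change dropped into the question's loop, assuming Newegg's markup still nests the dollar amount in a strong tag and the cents in a sup tag inside li.price-current:

price_container = container.findAll("li", {"class": "price-current"})
strong = price_container[0].find("strong")  # dollar part, e.g. "399"
sup = price_container[0].find("sup")        # cents part, e.g. ".99"
if strong and sup:
    price = strong.text + sup.text          # "399" + ".99" -> "399.99"
else:
    # fall back to the stripped text if a listing has no visible price
    price = price_container[0].text.strip()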
