
I'm new to programming and trying to learn by building some small side projects. The code below works, but it isn't formatting the CSV correctly when it pulls all the information. It started adding weird spaces after I added price to the fields being pulled; if I comment out price and remove it from the write call, it works fine, but I can't figure out why I'm getting the weird spaces when I add it back.

from urllib.request import urlopen as uReq
from bs4 import BeautifulSoup as soup

my_url = "https://www.newegg.com/Product/ProductList.aspx?Submit=ENE&N=-1&IsNodeId=1&Description=graphics%20card&bop=And&PageSize=12&order=BESTMATCH"


# Opening up connection, grabbing the page
uClient = uReq(my_url)
page_html = uClient.read()
uClient.close()


# html parsing
page_soup = soup(page_html, "html.parser")


# grabs each product container
containers = page_soup.findAll("div",{"class":"item-container"})


filename = "products.csv"
f = open(filename, "w")

headers = "brand, product_name, shipping\n"

f.write(headers)

for container in containers:
    brand = container.div.div.a.img["title"]

    title_container = container.findAll("a", {"class":"item-title"})
    product_name = title_container[0].text

    shipping_container = container.findAll("li", {"class":"price-ship"})
    shipping = shipping_container[0].text.strip()

    price_container = container.findAll("li", {"class":"price-current"})
    price = price_container[0].text.strip()

    print("brand: " + brand)
    print("product_name: " + product_name)
    print("Price: " + price)
    print("shipping: " + shipping)


    f.write(brand + "," + product_name.replace(",", "|") + "," + shipping + "," + price + "\n")

f.close()
  • Would you like to show us the error? Commented Nov 1, 2018 at 4:18
  • I added a picture of the CSV with price included; you can see in the picture that for some reason it's not adding a column for price and is instead inserting spaces and shifting information around. Commented Nov 1, 2018 at 4:24

2 Answers


You can write to a CSV file the way I've shown below; the output it produces should serve the purpose. Check out the csv module documentation for clarity.

import csv
from urllib.request import urlopen
from bs4 import BeautifulSoup

my_url = "https://www.newegg.com/Product/ProductList.aspx?Submit=ENE&N=-1&IsNodeId=1&Description=graphics%20card&bop=And&PageSize=12&order=BESTMATCH"

page_html = urlopen(my_url).read()
page_soup = BeautifulSoup(page_html, "lxml")

# newline="" keeps the csv module from writing blank rows on Windows
with open("outputfile.csv", "w", newline="") as outfile:
    writer = csv.writer(outfile)
    writer.writerow(["brand", "product_name", "shipping", "price"])

    for container in page_soup.findAll("div", {"class": "item-container"}):

        brand = container.find(class_="item-brand").img.get("title")
        # get_text(strip=True) trims the stray whitespace and newlines
        product_name = container.find("a", {"class": "item-title"}).get_text(strip=True).replace(",", "|")
        shipping = container.find("li", {"class": "price-ship"}).get_text(strip=True)
        price = container.find("li", {"class": "price-current"}).get_text(strip=True).replace("|", "")

        writer.writerow([brand, product_name, shipping, price])
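As an aside, csv.writer already quotes any field that contains a comma, so the replace(",", "|") above is optional rather than required. A minimal, self-contained check (hypothetical product data):

import csv, io

# QUOTE_MINIMAL (the default) wraps fields containing the delimiter in
# quotes, so embedded commas survive the round trip untouched
buf = io.StringIO()
csv.writer(buf).writerow(["EVGA", "GeForce GTX 1070, 8GB", "$399.99"])
print(buf.getvalue())  # EVGA,"GeForce GTX 1070, 8GB",$399.99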

You're getting the newlines and stray characters because that is the data you're getting back from BS4: it isn't a product of the writing process. It happens because you're grabbing all of the text in the list item, and there's a lot going on in there. Looking at the page, if you'd rather pick out just the price, you can concatenate the text of the strong tag within the list item with the text of its sup tag, e.g. price = price_container[0].find("strong").text + price_container[0].find("sup").text. That will ensure you're only picking out the data you need.
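A sketch of that change dropped into the question's loop, assuming Newegg's markup still nests the dollar amount in a strong tag and the cents in a sup tag inside li.price-current:

price_container = container.findAll("li", {"class": "price-current"})
strong = price_container[0].find("strong")  # dollar part, e.g. "399"
sup = price_container[0].find("sup")        # cents part, e.g. ".99"
if strong and sup:
    price = strong.text + sup.text          # "399" + ".99" -> "399.99"
else:
    # fall back to the stripped text if a listing has no visible price
    price = price_container[0].text.strip()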
