
I have an HTML scraper that pulls item titles and prices from a website. Once in a while I want to run this scraper to update my prices while also keeping the old ones.

The CSV file where I first save the titles and prices looks like this:

Title1, price1, 'END'
Title2, price2, 'END'
Title3, price3, 'END'

I compare the new price against the old with the following method:

ind = row.index('END')
lastprijs = row[ind-1]
print lastprijs
if lastprijs != prijstitel:
    row.pop(ind)
    row.append(prijstitel)
    row.append("END")

If the title is found in the row (and the price updated where it changed), I write the row to the CSV with:

with open('out.csv', 'a') as out:
    tester = csv.writer(out)
    tester.writerow(row)

If the title is not found in the row, I write the same row to the CSV:

else: 
    with open('out.csv', 'a') as out:
        tester = csv.writer(out)
        row.append("addedddd")   #add a new line.
        tester.writerow(row)

However, after running it, the output in my CSV looks like this:

Item1, price1, 'END'
Item2, price2, 'END'
Item3, price3, 'END'
Item1, price1, 'END'
item1, price1, 'END'
Item2, price2, 'END'
Item3, price3, 'END'
Item1, price1, 'END'
item1, price1, 'END'
Item2, price2, 'END'
Item2, price2, 'END'
Item3, price3, 'END'
Item1, price1, 'END'

And so on... How can I fix this?

** THE FULL CODE **

def updateprices(prijstitel, titelprijs):
  with open('pricewatch.csv', 'r') as csvfileadjust:
    filereader = csv.reader(csvfileadjust)
    print titelprijs
    if titelprijs == "Gembird externe Hardeschijf behuizing met USB 2.0 aansluiting":
        prijstitel = 'EDITED PRIJS!'
    for row in filereader:
        header = row
        print header
        print " ---- "
        if titelprijs in row:
            ind = row.index('END')
            lastprijs = row[ind-1]
            print lastprijs
            if lastprijs != prijstitel:
                row.pop(ind)
                row.append(prijstitel)
                row.append("END")
            with open('out.csv', 'a') as out:
                tester = csv.writer(out)
                tester.writerow(row)
        else: 
            with open('out.csv', 'a') as out:
                tester = csv.writer(out)
                row.append("addedddd")   #add a new line.
                tester.writerow(row)
  • I don't know what the specific problem is. But you might benefit from rewriting your code into smaller functions, possibly wrapping the CSV file in a class and making each item a class. That way you can read the CSV data into Item instances, then scrape the site, updating the Items as you go, and finally write all the items back to a new CSV file. You'll probably find that the code is much easier to debug and understand, and easier to modify and improve in the future. You could even add unit tests for it! Commented Dec 14, 2014 at 23:06
  • I think it has to do with the fact that if I print each row (for row in filereader: print row; print "----"), it prints multiple rows before continuing to the if statement. It is a weird thing, though. Commented Dec 15, 2014 at 14:16
  • The problem is that the for loop runs through the whole file on every call, so every row gets written again each time I run the script. I need to read specific lines only! Commented Dec 16, 2014 at 12:34

1 Answer


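You were right about the cause: the for loop walks the whole file on every call, so every row is appended to out.csv again each time.

Processing only one row per call fixes the problem: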

x = 0
while x < 20:  # replace 20 with the number of rows in the csv file
    updateprices("10,00", "TITELS", x)
    x += 1

import csv
from itertools import islice

csvfile = 'pricewatch.csv'  # input file name, taken from the question

def updateprices(prijstitel, titelprijs, var):
  with open(csvfile, 'r') as csvfileadjust:  # open the input file
    filereader = csv.reader(csvfileadjust)
    row = list(islice(filereader, var+1))[-1]  # read rows 0..var and keep only row number var
    if titelprijs in row:  # the title is in this row
        ind = row.index('END')  # position of the END marker in the list
        lastprijs = row[ind-1]  # the most recent price sits just before END
        print lastprijs
        if lastprijs != prijstitel:  # the price changed (e.g. 9,99 != 10,00)
            row.pop(ind)  # drop the END marker
            row.append(prijstitel)  # add the new price (10,00)
            row.append("END")  # put the END marker back at the end
        with open('out.csv', 'a') as out:
            tester = csv.writer(out)
            tester.writerow(row)   # write the (possibly updated) row to the output file
    else:   # the title is not in this row
        with open('out.csv', 'a') as out:
            tester = csv.writer(out)
            tester.writerow(row)  # copy the row unchanged
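
As an alternative to reopening the file once per row, here is a minimal sketch that reads the input once, updates every row in memory, and writes out.csv in a single pass with mode 'w', so rerunning the script cannot pile up duplicate rows. It assumes Python 3, rows shaped exactly like title, price, ..., END with no padding around the fields, and a hypothetical new_prices dict mapping each title to its freshly scraped price; the update_prices name and its src/dst parameters are illustrative and not part of the code above.

```python
import csv

def update_prices(new_prices, src='pricewatch.csv', dst='out.csv'):
    """Copy src to dst once, adding a new price to rows whose price changed.

    new_prices is a hypothetical dict: title -> latest scraped price.
    """
    # Read the whole input file exactly once.
    with open(src, newline='') as fin:
        rows = list(csv.reader(fin))

    for row in rows:
        if not row or 'END' not in row:
            continue  # leave blank or malformed rows untouched
        end = row.index('END')
        new_price = new_prices.get(row[0])  # assumes the title is the first field
        # Insert the new price just before END only when it differs from the last one.
        if new_price is not None and row[end - 1] != new_price:
            row.insert(end, new_price)

    # Mode 'w' overwrites the output, so each run produces one row per item.
    with open(dst, 'w', newline='') as fout:
        csv.writer(fout).writerows(rows)
    return rows
```

Calling update_prices({'Title1': '10,00'}) once after scraping would replace the whole while loop; titles missing from the dict are copied through unchanged.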