
I'm web scraping different webpages, and for each webpage I'm writing one row of the CSV file:

import csv

fieldnames = ["Title", "Author", "year"]
counter = 1
for webpage in webpages:
    # On the first pass, create the file and write the header row
    if counter == 1:
        with open('file.csv', 'w', newline='') as f:
            my_writer = csv.DictWriter(f, fieldnames)
            my_writer.writeheader()

    # ... scrape the title, author and year for this webpage ...

    variables = {ele: "NA" for ele in fieldnames}
    variables['Title'] = title
    variables['Author'] = author
    variables['year'] = year

    # Reopen in append mode and add this webpage's row
    with open('file.csv', 'a', newline='') as f:
        dict_writer = csv.DictWriter(f, fieldnames)
        dict_writer.writerow(variables)
    counter += 1

However, there can be more than one author (so after web scraping, author is actually a list), and I would like the headers of the CSV file to be Author1, Author2, Author3, etc. But I don't know in advance what the maximum number of authors will be, so inside the loop I would like to edit the header, adding Author2, Author3, and so on whenever a row needs more author columns.

  • After you write the headers you can't overwrite them. You can keep all the data in memory and write everything once you have collected it all. Or write all the data to a file and, at the end, create a new file, write the headers, and copy the data over from the headerless file. You can then also fill in empty values for rows that are missing some authors (to produce a correctly formatted CSV). Commented Oct 14, 2016 at 18:58
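A minimal sketch of the first suggestion (collect everything in memory, then write once the widest row is known). Here scrape_page() is a hypothetical stand-in for whatever scraping step returns the title, author list, and year:

import csv

rows = []
max_authors = 0
for webpage in webpages:
    title, authors, year = scrape_page(webpage)  # hypothetical scraping step
    rows.append((title, authors, year))
    max_authors = max(max_authors, len(authors))

# Build the header only now that the maximum author count is known
fieldnames = ["Title"] + ["Author%d" % i for i in range(1, max_authors + 1)] + ["year"]

with open('file.csv', 'w', newline='') as f:
    writer = csv.DictWriter(f, fieldnames)
    writer.writeheader()
    for title, authors, year in rows:
        row = {"Title": title, "year": year}
        for i, name in enumerate(authors, start=1):
            row["Author%d" % i] = name
        writer.writerow(row)  # missing AuthorN fields are written as empty strings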

2 Answers


Because "Author" is a variable-length list, you should serialize it in some way to fit inside a single field. For example, use a semicolon as a separator.

Assuming your webpage object has an authors attribute containing all the authors, you would change the assignment line to something like this:

variables['Author'] = ';'.join(webpage.authors)

This is a simple serialization of all the authors. You can of course come up with something else: use a different separator, or serialize to JSON or YAML, or something more elaborate.
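For instance, a quick sketch of the JSON variant (reusing the webpage.authors assumption from above):

import json

# Write: store the whole author list as one JSON string in a single CSV field
variables['Author'] = json.dumps(webpage.authors)

# Read it back later, e.g. from a row produced by csv.DictReader
authors = json.loads(row['Author'])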

Hopefully that gives you some ideas.




It could be something like:

import csv

def write_to_csv(file_name, records, fieldnames=None):
    # Default to the keys of the first record if no fieldnames are given
    with open('/tmp/' + file_name, 'w', newline='') as csvfile:
        if not fieldnames:
            fieldnames = records[0].keys()
        writer = csv.DictWriter(csvfile, fieldnames=fieldnames, extrasaction='ignore')
        writer.writeheader()
        for row in records:
            writer.writerow(row)

def scrape():
    for webpage in webpages:
        webpage_data = [{'title': '', 'author1': 'foo', 'author2': 'bar'}]  # sample data
        write_to_csv(webpage.title + '.csv', webpage_data, webpage_data[0].keys())

I'm assuming:

  • Data will be consistent within the same webpage, but may differ for the next webpage in the loop
  • The webpage data is a list of dictionaries, with values mapped to keys
  • The above code is based on Python 3

So in the loop, we'll just get the data and pass the relevant fieldnames and values to another function, which writes them to CSV.
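To tie this back to the variable-length author list from the question, here is one hedged sketch of how the records could be built before calling write_to_csv; with_numbered_authors() is a hypothetical helper, not part of the answer above:

def with_numbered_authors(title, year, authors):
    # Expand the author list into author1, author2, ... keys
    row = {'title': title, 'year': year}
    for i, name in enumerate(authors, start=1):
        row['author%d' % i] = name
    return row

webpage_data = [with_numbered_authors('Some Title', 2016, ['foo', 'bar'])]
write_to_csv('some_title.csv', webpage_data, webpage_data[0].keys())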

