How to change column datatype when loading csv in postgresql

Question

I have a large script that stores its result in a DataFrame and then writes it to a CSV file. The CSV is then opened and written to PostgreSQL. The problem is that one column has the data type int4 in the database, but after the CSV is read the column values are 'text'. I cannot change the data types in the database; the column must stay int. Please tell me how to do this.

total.to_csv("C:/Users/.../total_type19.csv", index = False, sep =';')

conn5 = psycopg2.connect(dbname='', user='',
                       password='', host='', port = '')
cursor5 = conn5.cursor()


with open("C:/Users/.../total_type19.csv", "r",encoding = 'utf-8') as file:
    reader = csv.reader(file, delimiter = ";")
    for row in reader:
        # print(row)
        cursor5.execute(
            'INSERT INTO interaction_fillword (test_id,data_size,data_matrix, data_words_selection, data_colors, data_answers) VALUES(%s,%s, %s, %s, %s, %s)',
            (row[0], row[1], row[2], row[3], row[4], row[5]))

conn5.commit()

The test_id column must be in int4 format

['312229', "['[{from:[3,4],to:[7,4],color:2},{from:[3,6],to:[10,6],color:3},{from:[5,8],to:[9,8],color:5},{from:[5,11],to:[10,11],color:6},{from:[1,0],to:[1,11],color:0},{from:[10,1],to:[10,6],color:4},{from:[3,0],to:[8,0],color:1}],']", '[\'["v","b","c","c","a","h","i","e","r","s","f","j"],["d","i","w","s","s","r","i","f","y","y","f","c"],["j","b","m","w","d","q","s","q","t","w","e","m"],["x","l","m","m","l","s","o","x","d","q","u","t"],["l","i","f","p","l","a","c","e","t","u","t","o"],["m","o","s","b","r","t","c","y","z","v","r","r"],["j","t","x","c","a","r","t","a","b","l","e","o"],["b","h","k","m","d","b","r","y","q","u","i","y"],["y","è","s","r","h","g","o","m","m","e","w","h"],["u","q","p","c","s","c","x","b","k","e","d","o"],["u","u","o","l","q","v","y","y","b","y","e","h"],["r","e","o","u","j","b","u","r","e","a","u","k"]],\']', '[\'"#ff0000","#00fe00","#0000ff","#d2ea9a","#407f76","#211f95","#e1f233"\']', '[\'"place","cartable","gomme","bureau","bibliothèque","feutre","cahier"\']']

This is an example of one line from the CSV. It looks bad, but that's the way it should be.

The code you posted doesn't change the data type of any column in the PostgreSQL table. Make sure all your columns use the correct data type and you will be fine. Off topic: why don't you use copy_from() to load the CSV into the table? It's much faster.
– Frank Heikens, Commented Jan 19, 2023 at 13:46
Do you mean that the CSV data comes in as strings, or that the data will not pass test_id = int(row[0])? Please post a sample data row.
– Cary H, Commented Jan 19, 2023 at 13:53
Thanks, but I'm pretty sure the data type of the column is not int. Can I somehow specify that the data is loaded in the desired format? I have never used copy_from(), but the method is clear to me.
– skrtsway, Commented Jan 19, 2023 at 13:53
When I try int(row[0]), I get the error ValueError: invalid literal for int() with base 10: 'test_id'. I will post an example from the CSV now.
– skrtsway, Commented Jan 19, 2023 at 13:57
A CSV file is a text format, so text is the only type you will get from it. Having an integer value as text is not an issue, as it will be automatically cast to integer on insert; for example, select '312229'::integer; returns 312229. The exceptions are empty strings and strings with non-numeric characters.
– Adrian Klaver, Commented Jan 19, 2023 at 16:06
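
The comments above come down to two points: every field read from a CSV is a string, and the first row is the header, which is why int(row[0]) failed on 'test_id'. Here is a minimal sketch of both, using an in-memory sample instead of the real file (the two sample rows are made up for illustration):

```python
import csv
import io

# Hypothetical two-line sample standing in for total_type19.csv
sample = "test_id;data_size\n312229;12\n"

reader = csv.reader(io.StringIO(sample), delimiter=";")
header = next(reader)   # consume the header row before the data loop
rows = list(reader)

first_id = rows[0][0]
print(type(first_id).__name__)  # CSV fields always arrive as strings
print(int(first_id))            # a string of plain digits converts cleanly

try:
    int(header[0])              # what happens if the header is not skipped
except ValueError as exc:
    header_error = str(exc)
    print(header_error)
```

Once the header is skipped, a value such as '312229' can be passed either as a string or as an int; PostgreSQL casts it to int4 on insert.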

Cary H · Accepted Answer · 2023-01-19 14:38:53Z

1

Can you convert your data to int, or does it contain non-integer values such as "m22"?

# Strip non-digit characters from test_id before inserting
with open("C:/Users/.../total_type19.csv", "r", encoding="utf-8") as file:
    reader = csv.reader(file, delimiter=";")
    header = next(reader)  # skip the header row so 'test_id' never reaches int()
    print(f"HEADER {header}")
    counter = 1  # or whatever number you want to start with
    for row in reader:
        print(row)
        test_id = row[0]
        test_id = ''.join([i for i in test_id if i.isdigit()])
        if test_id == '':
            # no digits at all: fall back to a generated id
            counter += 1
            test_id = counter
        else:
            test_id = int(test_id)
        print(test_id)
        cursor5.execute(
            'INSERT INTO interaction_fillword (test_id,data_size,data_matrix, data_words_selection, data_colors, data_answers) VALUES(%s,%s, %s, %s, %s, %s)',
            (test_id, row[1], row[2], row[3], row[4], row[5]))

conn5.commit()  # make the inserts permanent, as in the question's code
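
If replacing an unusable id with a counter is not what you want, one alternative is to separate the parsing from the insert so that bad rows can be skipped or logged. This is only a sketch: parse_test_id is a hypothetical helper, not part of csv or psycopg2, and it keeps ASCII digits only, which is slightly stricter than the isdigit() filter above.

```python
import re


def parse_test_id(raw):
    """Return raw as an int, or None if it contains no ASCII digits.

    Hypothetical helper: removes every character that is not 0-9 and
    converts whatever is left.
    """
    digits = re.sub(r"[^0-9]", "", raw)
    return int(digits) if digits else None


# Made-up inputs covering the cases discussed in this thread
for value in ["312229", " 312229 ", "m22", "", "test_id"]:
    print(repr(value), "->", parse_test_id(value))
```

In the loop, a row whose parse_test_id(row[0]) is None could then be skipped with continue instead of being inserted under an invented id.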

edited Jan 19, 2023 at 14:38

answered Jan 19, 2023 at 13:56

Cary H

177 reputation · 9 bronze badges


9 Comments

skrtsway Over a year ago

When I add your code, I get the error "invalid literal for int() with base 10:"

Cary H Over a year ago

does it print your row? Can you post what it prints?

skrtsway Over a year ago

You edited and the code works, thank you very much

Cary H Over a year ago

ok we can skip the headers. I'll post

Cary H Over a year ago

The 'header = next(reader )' is a good trick to remember :)


Adrian Klaver · Accepted Answer · 2023-01-19 16:17:30Z

1

Use copy_expert from psycopg2.

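import psycopg2

conn5 = psycopg2.connect(dbname='', user='',
                         password='', host='', port='')
cursor5 = conn5.cursor()

# The question writes the file with sep=';' and reads it as UTF-8,
# so use the same delimiter and encoding here.
with open("C:/Users/.../total_type19.csv", "r", encoding="utf-8") as csv_file:
    cursor5.copy_expert(
        "COPY interaction_fillword FROM STDIN WITH CSV HEADER DELIMITER ';'",
        csv_file)

conn5.commit()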

The CSV HEADER will do a couple of things:

Skip the header line automatically.
Take empty non-quoted strings as NULL.

copy_expert uses the Postgres COPY command to do bulk data import (or export) a lot quicker than inserting row by row. The downside is that COPY is all or nothing: either the entire import/export succeeds, or a single error rolls back the entire thing.
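
Two details matter for the file in the question: it is written with sep=';', while COPY in CSV mode expects a comma unless told otherwise, and naming the target columns avoids relying on the table's column order. Here is a sketch of a small wrapper that handles both. copy_csv is a hypothetical helper; it takes an already-open psycopg2 cursor, and the column names are the ones from the question's INSERT statement.

```python
def copy_csv(cursor, csv_path, table, columns, delimiter=";"):
    """Bulk-load a CSV file through cursor.copy_expert.

    Sketch only: the table and column names are interpolated as-is, so
    they must come from trusted code, never from user input.
    """
    sql = (
        f"COPY {table} ({', '.join(columns)}) "
        f"FROM STDIN WITH (FORMAT csv, HEADER, DELIMITER '{delimiter}')"
    )
    with open(csv_path, "r", encoding="utf-8") as csv_file:
        cursor.copy_expert(sql, csv_file)
    return sql


# Usage with the connection from the answer above (not executed here):
# copy_csv(cursor5, "C:/Users/.../total_type19.csv", "interaction_fillword",
#          ["test_id", "data_size", "data_matrix",
#           "data_words_selection", "data_colors", "data_answers"])
# conn5.commit()
```

Because COPY is all or nothing, a single non-numeric test_id aborts the whole load, so it can be worth validating that column in Python first.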

answered Jan 19, 2023 at 16:17

Adrian Klaver

20.4k reputation · 3 gold badges · 24 silver badges · 40 bronze badges
