0

I have a big script, the result is that the data is stored in a dataframe and then in csv. Then csv is opened and written to PostgreSQL. But there is a problem that the data type of one column is int4, and after opening csv the column format is 'text'. I cannot change the data type in the database, they must be there exactly as int. Tell me pls how to do it.

total.to_csv("C:/Users/.../total_type19.csv", index = False, sep =';')

conn5 = psycopg2.connect(dbname='', user='',
                       password='', host='', port = '')
cursor5 = conn5.cursor()


with open("C:/Users/.../total_type19.csv", "r",encoding = 'utf-8') as file:
    reader = csv.reader(file, delimiter = ";")
    for row in reader:
        # print(row)
        cursor5.execute(
            'INSERT INTO interaction_fillword (test_id,data_size,data_matrix, data_words_selection, data_colors, data_answers) VALUES(%s,%s, %s, %s, %s, %s)',
            (row[0], row[1], row[2], row[3], row[4], row[5]))

conn5.commit()
   

The test_id column must be in int4 format

['312229', "['[{from:[3,4],to:[7,4],color:2},{from:[3,6],to:[10,6],color:3},{from:[5,8],to:[9,8],color:5},{from:[5,11],to:[10,11],color:6},{from:[1,0],to:[1,11],color:0},{from:[10,1],to:[10,6],color:4},{from:[3,0],to:[8,0],color:1}],']", '[\'["v","b","c","c","a","h","i","e","r","s","f","j"],["d","i","w","s","s","r","i","f","y","y","f","c"],["j","b","m","w","d","q","s","q","t","w","e","m"],["x","l","m","m","l","s","o","x","d","q","u","t"],["l","i","f","p","l","a","c","e","t","u","t","o"],["m","o","s","b","r","t","c","y","z","v","r","r"],["j","t","x","c","a","r","t","a","b","l","e","o"],["b","h","k","m","d","b","r","y","q","u","i","y"],["y","è","s","r","h","g","o","m","m","e","w","h"],["u","q","p","c","s","c","x","b","k","e","d","o"],["u","u","o","l","q","v","y","y","b","y","e","h"],["r","e","o","u","j","b","u","r","e","a","u","k"]],\']', '[\'"#ff0000","#00fe00","#0000ff","#d2ea9a","#407f76","#211f95","#e1f233"\']', '[\'"place","cartable","gomme","bureau","bibliothèque","feutre","cahier"\']']

This is an example of one line from csv. Looks bad but that's the way it should be

5
  • The code you posted doesn't change the data type from a single column in the PostgreSQL table. Make sure all your columns use the correct data type and you will be fine. Off topic: Why don't you use copy_from() to load the csv into the table? Much faster Commented Jan 19, 2023 at 13:46
  • Do you mean that the csv data comes in as string or the data will not pass test_id = int(row[0]). Post a sample data row. Commented Jan 19, 2023 at 13:53
  • Thanks, but I'm pretty sure the data type of the column is not int. Tell me, can I somehow prescribe that the data is loaded in the desired format? Copy_from () just never used, but this method is clear to me Commented Jan 19, 2023 at 13:53
  • When trying to do int(row[0]), I get the error ValueError: invalid literal for int() with base 10: 'test_id'. Now I will post an example from csv Commented Jan 19, 2023 at 13:57
  • A CSV file is a text format so that is the only type you will get from it. Having an integer value as text is not an issue as it will be automatically cast to integer on entry per; select '312229'::integer; 312229. The exceptions would be empty strings or strings with non-numeric characters. Commented Jan 19, 2023 at 16:06

2 Answers 2

1

Can you change your data to int or is it something like "m22" non-integer?

# to remove non-numeric digits from string
with open("C:/Users/.../total_type19.csv", "r",encoding = 'utf-8') as file:
    reader = csv.reader(file, delimiter = ";")
    header = next(reader )
    print(f"HEADER {header}")
    counter = 1 #or whatever number you want to start with
    for row in reader:
        print(row)
        test_id =row[0]
        test_id = ''.join([i for i in test_id if i.isdigit()])
        if test_id == '':
            counter +=1
            test_id = counter
        else:
            test_id = int(test_id)
        print(test_id)
        cursor5.execute(
            'INSERT INTO interaction_fillword (test_id,data_size,data_matrix, data_words_selection, data_colors, data_answers) VALUES(%s,%s, %s, %s, %s, %s)',
            (test_id, row[1], row[2], row[3], row[4], row[5]))
Sign up to request clarification or add additional context in comments.

9 Comments

When I add your code, I get the error "invalid literal for int() with base 10:"
does it print your row? Can you post what it prints?
You edited and the code works, thank you very much
ok we can skip the headers. I'll post
The 'header = next(reader )' is a good trick to remember :)
|
1

Use copy_expert from `psycopg2.

import psycopg2

conn5 = psycopg2.connect(dbname='', user='',
                       password='', host='', port = '')
cursor5 = conn5.cursor()

with open("C:/Users/.../total_type19.csv", "r") as csv_file:
   cursor5.copy_expert("COPY interaction_fillword FROM STDIN WITH CSV HEADER", csv_file)

The CSV HEADER will do a couple of things:

  1. Skip the header line automatically.
  2. Take empty non-quoted strings as NULL.

copy_expert uses the Postgres COPY to do bulk data import(or export) a lot quicker then inserting. The down side is that COPY is all or nothing, either the entire import/export succeeds or a single error will rollback the entire thing.

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.