I'm having a somewhat strange issue with Python, and I'm not sure whether it has something to do with psycopg2 or is some rookie error I'm making with Python.

Essentially, I have a function that copies data from a CSV file and attempts to insert it into a PostgreSQL database using psycopg2. If there is a data type error, I want the code to try to rectify it and then re-attempt to insert the data into the database. Here is the code:

def copy(self, csvFile):
    error = True
    i = 0
    while error:
        try:
            i += 1
            print(f'attempt {i}')
            self.connect()
            csr = self.conn.cursor()
            csr.copy_expert("COPY foo.bar FROM STDOUT NULL '' CSV HEADER", csvFile)
        except psycopg2.DataError as err:
            print(err)
            print(err.pgcode)
            csr.close()
            self.conn.close()
            #self.conn.rollback()
            if err.pgcode == '22001':
                if 'character varying' in err.args[0]:
                    currlength = re.search(r'\((.*?)\)', err.args[0]).group(1)
                    newlength = int(currlength) * 2
                    s = err.args[0].split()
                    col = s[s.index('column') + 1].replace(':','')
                    sql = f'alter table foo.bar alter column {col} type varchar({newlength})'
                    print(f'Column Length too short adjusting {col} from {currlength} to {newlength}\n {sql}')
                    self.execute(sql)
            elif err.pgcode == '22P02':  # SQLSTATE codes are upper-case
                s = err.args[0].split()
                col = s[s.index('column') + 1].replace(':', '')
                sql = f'alter table foo.bar alter column {col} type varchar(64)'
                print(f'numeric column {col} contains text altering to varchar')
                self.execute(sql)
        else:
            self.conn.commit()
            csr.close()
            error = False

What happens is that the first attempt executes as expected and throws the error, and then the ALTER TABLE statement runs correctly. On the second attempt, the copy_expert function does nothing but doesn't raise an error, and the code completes without inserting the CSV data into the database. This is the output, showing that it tries a second time:

> attempt 1 
> value too long for type character varying(1) CONTEXT:  COPY table, line 3, column id: "12345678"
> 
> 22001 
> Column Length too short adjusting assetid from 1 to 2  
> alter table foo.bar alter column id type varchar(2) 
> Executing query alter table foo.bar alter column assetid type varchar(2)             
> attempt 2
> Download and insert of file.csv Complete
  • COPY foo.bar FROM STDOUT [...]: shouldn't that read COPY foo.bar FROM STDIN [...]? Commented Jan 23, 2019 at 7:25
  • It's a good point, @shmee, but it's not the problem. If the table doesn't need alteration, i.e. the exception isn't reached, the data copies to the database with no problem. Commented Jan 23, 2019 at 21:19

1 Answer

So, after having spent ~30 minutes enthusiastically walking down a blind alley, I think I found the cause of the issue. It has neither to do with psycopg2, nor would I necessarily call it a rookie mistake. I was actually quite convinced that it was about isolation levels ... it wasn't.

It's the file handle. copy_expert reads the file completely, so the internal pointer is at its end when the psycopg2.DataError is raised. There's simply nothing left to read from that handle the second time around.

If you put csvFile.seek(0) in your except block, the pointer will be reset to the beginning of the file.

except psycopg2.DataError as err:
    csvFile.seek(0)
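
The effect can be reproduced without a database. The following is only a minimal sketch: io.StringIO stands in for the opened CSV file (an assumption made so the example is self-contained), and the variable names are made up for illustration.

```python
import io

# StringIO stands in for the open CSV file; it offers the same read/seek interface.
csv_file = io.StringIO("id\n12345678\n")

first_read = csv_file.read()   # consumes everything, pointer is now at the end
second_read = csv_file.read()  # nothing left to read, returns an empty string

csv_file.seek(0)               # rewind to the beginning
third_read = csv_file.read()   # the full contents are available again

print(repr(first_read), repr(second_read), repr(third_read))
```

The second read is what the retried copy_expert call effectively sees when the handle has not been rewound.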

I have created a little test class using your copy method and implemented the methods execute and connect the way I assumed you did.
I was able to reproduce the behavior you describe in your post and resetting the pointer in the except block led to the data in the file being visible in the database after the second attempt, following the modification of the column length.
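
For illustration, here is a rough sketch of the retry pattern with the rewind in place. It is not the test class mentioned above: FakeDataError and fake_copy are hypothetical stand-ins for psycopg2.DataError and copy_expert, and doubling max_len stands in for the ALTER TABLE step, so the example runs without a database.

```python
import io


class FakeDataError(Exception):
    """Hypothetical stand-in for psycopg2.DataError."""


def fake_copy(file_obj, max_len):
    """Hypothetical stand-in for copy_expert: reads the whole stream,
    then fails if any value is longer than max_len."""
    rows = file_obj.read().splitlines()[1:]  # skip header; consumes the stream
    for row in rows:
        if len(row) > max_len:
            raise FakeDataError(f"value too long for length {max_len}")
    return rows


def copy_with_retry(file_obj, max_len=1):
    attempt = 0
    while True:
        attempt += 1
        try:
            rows = fake_copy(file_obj, max_len)
        except FakeDataError:
            max_len *= 2      # stands in for the ALTER TABLE step
            file_obj.seek(0)  # rewind, otherwise the retry reads nothing
        else:
            return attempt, max_len, rows


attempts, final_len, loaded = copy_with_retry(io.StringIO("id\n12345678\n"))
print(attempts, final_len, loaded)
```

If the seek(0) line is removed from this sketch, the second attempt reads an empty stream and returns with no rows and no error, which mirrors the silent "success" described in the question.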

2 Comments

Excellent. Interesting, I was starting to assume that nothing was actually wrong with the code and that it was reading nothing, which I guess is kind of true. I have never actually seen the seek method used with CSV files, but anyway, I can't thank you enough; it works a treat now.
@Tik seek is a method of TextIOBase objects. An instance of a subclass of these is returned by open() when passing the path to a file to it.
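
A quick way to check this, as a small sketch: the temporary file and variable names below are assumptions made only so the example is self-contained.

```python
import io
import os
import tempfile

# Create a throwaway CSV so the example does not depend on any existing file.
with tempfile.NamedTemporaryFile("w", suffix=".csv", delete=False) as tmp:
    tmp.write("id\n1\n")
    path = tmp.name

with open(path) as handle:
    is_text_io = isinstance(handle, io.TextIOBase)  # text-mode files derive from TextIOBase
    type_name = type(handle).__name__
    handle.read()                                   # move the pointer to the end
    position_after_rewind = handle.seek(0)          # seek returns the new position

os.remove(path)
print(type_name, is_text_io, position_after_rewind)
```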
