
I have a CSV file which is quite large (a few hundred MB) that I am trying to import into a Postgres table. The problem arises when there is a primary key violation (a duplicate record in the CSV file).

If it were a one-off I could filter those records out manually, but these files are generated by a program that produces new data every hour. My script has to import them into the database automatically.

My question is: is there a flag I can set in the COPY command, or in Postgres itself, so that it skips the duplicate records and continues importing the rest of the file into the table?
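For reference, a minimal sketch of the kind of import I run every hour (the table and file names here are made up, and the real table has more columns):

    -- hypothetical target table; the real one is wider
    CREATE TABLE readings (
        id       bigint PRIMARY KEY,
        recorded timestamptz,
        value    numeric
    );

    -- the hourly load; this aborts with a "duplicate key value violates
    -- unique constraint" error as soon as the CSV repeats an id
    COPY readings (id, recorded, value)
    FROM '/data/readings.csv'
    WITH (FORMAT csv, HEADER true);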

1 Answer


My thought would be to approach this in two ways:

  1. Use a utility that can produce an "exception report" of duplicate rows during the COPY process, such as this one.
  2. Change your workflow: load the data into a temp table first, massage it for duplicates (for example, JOIN it with your target table and mark every row that already exists with a dup flag), then import only the missing records and send the dups to an exception table (see the sketch below).

I personally prefer the second approach, but which one fits better depends on the specifics of your workflow.
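A minimal sketch of the second approach, assuming a target table `target_table` with primary key `id`, an `exception_table` with the same columns, and a placeholder file path (all names here are illustrative, not from your setup):

    -- 1. Stage the raw CSV in an unconstrained temp table
    CREATE TEMP TABLE staging (LIKE target_table INCLUDING DEFAULTS);

    COPY staging FROM '/path/to/data.csv' WITH (FORMAT csv, HEADER true);

    -- 2. Route rows whose key already exists in the target to the exception table
    INSERT INTO exception_table
    SELECT s.*
    FROM staging s
    JOIN target_table t ON t.id = s.id;

    -- 3. Insert only the missing rows, also collapsing duplicates within the file itself
    INSERT INTO target_table
    SELECT DISTINCT ON (s.id) s.*
    FROM staging s
    LEFT JOIN target_table t ON t.id = s.id
    WHERE t.id IS NULL
    ORDER BY s.id;

    DROP TABLE staging;

The JOIN in step 2 plays the role of the "dup flag" described above: anything that matches an existing key is diverted instead of loaded, so the COPY into the staging table never hits a constraint violation.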


4 Comments

I'd go for the second solution as well (or use a different tool to load the data)
I'd go for the second solution
I tried @dawebber's second approach, but since the database grows with every import, each subsequent import takes longer and longer.
@regexhacks, which part of the workflow is seeing a slowdown?
