
I have a CSV file that I'm trying to import into my PostgreSQL database (v10). I'm using the following basic COPY syntax:

COPY table (col_1, col_2, col_3)
FROM '/filename.csv'
DELIMITER ',' CSV HEADER
QUOTE '"'
ESCAPE '\';

The first 30,000 lines or so import without any problem, but then I start running into formatting issues in the CSV file that break the import:

  • Double quotes in double quotes: "value_1",""value_2"","value_3" or "value_1","val"ue_2","value_3"

The typical error I get is

ERROR: extra data after last expected column

So I started editing the CSV file manually in Vim (the file has close to 7 million lines, so I can't really think of another desktop tool that could handle it).

  • Is there anything I can do with my SQL syntax to handle those malformed strings? Using alternative ESCAPE clauses? Using regex?
  • Can you think of a way to handle those formatting issues in Vim or using another tool or function?

Thanks a lot!

1 Answer


Note that the file does not meet the CSV specification:

  1. If double-quotes are used to enclose fields, then a double-quote appearing inside a field must be escaped by preceding it with another double quote.

You should specify a quote character other than the double quote, for example '|':

create table test(a text, b text, c text);

copy test from '/data/example.csv' (format csv, quote '|');

select * from test;

     a     |      b      |     c     
-----------+-------------+-----------
 "value_1" | ""value_2"" | "value_3"
 "value_1" | "val"ue_2"  | "value_3"
(2 rows)

You can get rid of the unwanted double-quotes using the trim() or replace() functions, e.g.:

update test
set a = trim(a, '"'), b = trim(b, '"'), c = trim(c, '"');

select * from test;

    a    |    b     |    c    
---------+----------+---------
 value_1 | value_2  | value_3
 value_1 | val"ue_2 | value_3
(2 rows)
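If some of the double quotes inside the values are legitimate, trim() is too aggressive, because it strips every leading and trailing quote. A more conservative variant (just a sketch) uses regexp_replace() to remove only a single enclosing pair of quotes and leaves interior quotes alone:

```sql
-- sketch: strip only one enclosing pair of double quotes per value,
-- keeping any interior quotes that are part of the data
update test
set a = regexp_replace(a, '^"(.*)"$', '\1'),
    b = regexp_replace(b, '^"(.*)"$', '\1'),
    c = regexp_replace(c, '^"(.*)"$', '\1');
```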

7 Comments

I completely agree that the file doesn't meet the CSV specification (it's compliant for probably 99.99% of lines, but the remaining 500 or so cases I fear I'll have to fix manually). Unfortunately, I only have this CSV, not the database behind it, so I cannot generate another export using a different delimiter (pipe-delimited would be ideal). To complicate things, some of the double quotes in the fields are perfectly legitimate and would need escape characters before them: "2' 10", 2' 50"" for GPS coordinates in one case, "Mark "the beast" Hogan" for a nickname, or strings in Hebrew that are too tough to edit.
You define the quote character in a COPY command, and you asked about SQL syntax, so I believe you can do it this way. You don't have to own the table to do that. The second query is only an example; you can easily write a query that removes only the first and last characters of a string if they are double quotes.
If I change the quote character to a pipe at import, Postgres doesn't import anything: ERROR: invalid input syntax for integer: "" 1 "" on my primary key column. There are no pipes at all in my original CSV. I'm not sure this really solves the problem; I'd like to tell Postgres to ignore double quotes within double quotes.
In fact, the command won't work well when the anomalies occur in columns of a type other than text. Maybe you can create a temporary table to buffer the data?
You could import the data to a temporary table with text columns and then insert the data from the temp table into the destination one with necessary corrections in a single query.
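The staging approach described above could look like this (a sketch; the staging and target table names and columns are hypothetical): load every column into an all-text temporary table, clean the values there, and only then cast and insert into the real table.

```sql
-- hypothetical names: "staging" buffers the raw rows as text,
-- "target" is the real table with typed columns
create temporary table staging(id text, a text, b text, c text);

-- '|' as quote character effectively disables quote handling,
-- assuming the file contains no pipes
copy staging from '/data/example.csv' (format csv, quote '|', header);

-- strip stray quotes/spaces, then cast into the typed destination
insert into target(id, a, b, c)
select trim(id, ' "')::integer, trim(a, '"'), trim(b, '"'), trim(c, '"')
from staging;
```

This keeps the type errors out of the COPY step: any cast failure happens in the INSERT, where the offending rows can be inspected in the staging table first.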
