
I have a very large (200+ GB), incorrectly formatted CSV file. Some rows have fewer values than others, e.g.

col1,col2,col3,col4
val2,val3,val5,val6
val2
val2,val3
val2,val4,val8,val9

Obviously, when I try to import this into Postgres, it throws an error about missing data for columns. I would like to avoid rewriting this CSV file, as it is very large and would take quite a bit of time. How do I get the importer to simply insert NULL values for the missing data instead of throwing an error?

Comments:
  • You can't; this val2 is seen as a row with a single column. To get it to work it would need to be val2,,,. Commented Mar 26, 2022 at 22:36
  • You could use awk to edit the .csv file. Commented Mar 28, 2022 at 11:50

1 Answer


You could use awk to edit the .csv file.


#!/bin/sh

# Recreate the sample data from the question
cat - <<OMG > omg.csv
col1,col2,col3,col4
val2,val3,val5,val6
val2
val2,val3
val2,val4,val8,val9
OMG

# Pad each row with commas until it has 4 fields. Using printf "%s", $0
# (rather than printf($0)) keeps data containing % from being treated as
# a format string.
awk -F, '{ printf "%s", $0; for (i = NF; i < 4; i++) printf ","; print "" }' < omg.csv # > out.csv

Result:


$ sh awk.sh
col1,col2,col3,col4
val2,val3,val5,val6
val2,,,
val2,val3,,
val2,val4,val8,val9
$
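Since the file is 200+ GB, you may prefer not to write a padded copy to disk at all. As a sketch, you can pipe the awk output straight into psql's \copy; the database, table, and column names below are hypothetical, so adjust them to your schema:

```shell
# Pad short rows to 4 fields on the fly and stream the result into
# Postgres. In CSV mode, COPY treats empty unquoted fields as NULL by
# default, which gives you the NULLs you wanted for the missing values.
awk -F, '{ printf "%s", $0; for (i = NF; i < 4; i++) printf ","; print "" }' big.csv |
  psql -d mydb -c "\copy mytable (col1, col2, col3, col4) from stdin with (format csv, header)"
```

This avoids the intermediate file entirely; only the awk pass touches the raw data, and Postgres consumes the corrected rows as they are produced.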
