
I have a very large (200+ GB), incorrectly formatted CSV file. Some rows have fewer values than others, e.g.

col1,col2,col3,col4
val2,val3,val5,val6
val2
val2,val3
val2,val4,val8,val9

Obviously, when I try to import this into Postgres, it throws an error about missing data for columns. I would like to avoid rewriting this CSV file, as it is very large and would take quite a bit of time. How do I get the importer to simply insert NULL values for the missing data instead of throwing an error?

Comments:
  • You can't; this val2 is seen as a row with a single column. To get it to work it would need to be val2,,,. Commented Mar 26, 2022 at 22:36
  • You could use awk to edit the .csv file. Commented Mar 28, 2022 at 11:50

1 Answer


You could use awk to edit the .csv file.


#!/bin/sh

# Recreate the sample data from the question
cat - <<OMG > omg.csv
col1,col2,col3,col4
val2,val3,val5,val6
val2
val2,val3
val2,val4,val8,val9
OMG

# Pad each row with commas until it has 4 fields. Using printf "%s", $0
# (rather than printf($0)) keeps data containing % from being treated as
# a format string.
awk -F, '{ printf "%s", $0; for (i = NF; i < 4; i++) printf ","; print "" }' < omg.csv # > out.csv

Result:


$ sh awk.sh
col1,col2,col3,col4
val2,val3,val5,val6
val2,,,
val2,val3,,
val2,val4,val8,val9
$
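Since the file is 200+ GB, you may prefer not to write a padded copy to disk at all. As a sketch, you can pipe the awk output straight into psql's \copy; the database, table, and column names below are hypothetical, so adjust them to your schema:

```shell
# Pad short rows to 4 fields on the fly and stream the result into
# Postgres. In CSV mode, COPY treats empty unquoted fields as NULL by
# default, which gives you the NULLs you wanted for the missing values.
awk -F, '{ printf "%s", $0; for (i = NF; i < 4; i++) printf ","; print "" }' big.csv |
  psql -d mydb -c "\copy mytable (col1, col2, col3, col4) from stdin with (format csv, header)"
```

This avoids the intermediate file entirely; only the awk pass touches the raw data, and Postgres consumes the corrected rows as they are produced.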
