6

I have a table in Postgres with ~6M rows and want to split and export them as multiple CSV files. Is there a way to automatically generate a different CSV file based on the value of a row's column?

4
  • 1
    Are you using COPY for this? Have you read the documentation? Commented Sep 8, 2014 at 18:42
  • I have seen COPY but unfortunately it's still going to take some more commands for deciding splits based on a column's value Commented Sep 9, 2014 at 0:46
  • @Paul COPY (SELECT ...) ? Commented Sep 9, 2014 at 1:24
  • @CraigRinger yup, but there's a lot of different column values so it would be kind of difficult to manually create the copy statements Commented Sep 9, 2014 at 5:55

2 Answers 2

8

In general, you want COPY (SELECT ...).

One option is to use PL/PgSQL and EXECUTE. Something like:

DO
LANGUAGE plpgsql
$$
DECLARE
    colval integer;
BEGIN
    FOR colval IN SELECT DISTINCT thecol FROM thetable
    LOOP
       EXECUTE format('COPY (SELECT * FROM thetable WHERE colval = %L) TO ''/tmp/out-%s.csv'';', colval, colval);
    END LOOP;
END;
$$

Another is to write a script using psql and \copy.

Another would be to use your preferred client language and its support for COPY, e.g. PgJDBC's CopyManager, Python+psycopg2's copy_to, etc.

Update: I just realised it's simpler than this. ORDER BY the target column and split the file stream as you process it. Example with psql, bash and awk:

CREATE TABLE demo(
  id serial primary key,
  targetcol integer not null
);

-- Create 10 distinct values for targetcol with 100 entries each
insert into demo(targetcol)
select x
from generate_series(1,10) x cross join generate_series(1,100) y;

then take column $2 to be part of the filename, switching files to output records:

psql -At -c '\copy (SELECT * FROM demo ORDER BY targetcol) TO stdout' | \
awk '
  BEGIN {
    prev_col=0;
    cur_file="";
  } 
  {
    if ($2 != prev_col) {
      prev_col = $2;
      if (cur_file != "") {
        close(cur_file);
      }
      cur_file = sprintf("outfile-%d",$2);
      printf "" > cur_file;
    } 
    print $0 >> cur_file; 
  }
';

In fact, this doesn't even require sorted input if you don't mind it being a bit slower and possibly running out of max open files if there are lots of values of the target column:

psql -At -c '\copy demo TO stdout' | \
awk '
  BEGIN {
    cur_file="";
  } 
  {
    print $0 >> sprintf("outfile-%d",$2); 
  }
';
Sign up to request clarification or add additional context in comments.

1 Comment

@Paul Edited to provide a complete solution using bash and awk. Really simple once I thought about it, and it should be very fast too.
6

There are certainly a few ways to do this. I can't think of a way to do it in a single command automatically. I don't know what your operating system is, or if you might want to do this in a stored procedure, or ? If I was going to do this, quick and dirty, from the command line, I'd:

$ # bash shell here.
$ for i in `psql -Upostgres -h HOSTIP -Atq DBNAME -c 'select distinct COLNAME from TABLENAME'`; do
$   echo 'working on ': $i
$   cmd="select * from TABLENAME where COLNAME = '$i'"
$   psql -Upostgres -h HOSTIP -Atq DBNAME -c "copy ( $cmd ) to stdout with delimiter ','" > /tmp/$i
$ done

You will need to supply: HOSTIP (or omit the -h HOSTIP if you connect correctly by default) DBNAME the database with the data in it TABLENAME the name of the table with 6MM rows COLNAME the name of the column which dictates the name of file to copy data to

The result is a bunch of files in the /tmp directory with comma separated pieces of the table contents.

This should give you some ideas. I guess the answer to your question is no, there is no 'automatic' way. Good Luck!

-g

1 Comment

Yeah, I guess it might not be that easy to do in pure SQL. I guess using stored procedures or another program would be the best way to go. Thanks for the answer!

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.