I'm experimenting with the pg_bulkload project to import millions of rows of data into a database. However, none of the new rows have a primary key, and only two of the table's several columns are available in my input file. How do I tell pg_bulkload which columns I'm importing, and how do I generate the primary key field? Do I need to edit my import file to match exactly what the output of a COPY command would be and generate the id field myself?

For example, let's say my table's columns are:

id         title        body        published

The data that I have is limited to title and published, stored in a tab-delimited file. My .ctl file looks like this:

TABLE = posts
INFILE = stdin
TYPE = CSV
DELIMITER = "   "

1 Answer

You can use the FILTER functionality of pg_bulkload. Something like:

In the database:

CREATE FUNCTION pg_bulkload_filter(text, text) RETURNS record
AS $$
  SELECT nextval('tablename_id_seq'), NULL, NULL, $1, $2, NULL
$$ LANGUAGE SQL;

And in the pg_bulkload control file:

FILTER = pg_bulkload_filter

1 Comment

This does the trick. Looking back, it is in the documentation but it isn't too clear. Also, I had to cast everything, even the NULL values, to the appropriate types. Thanks for your help.
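
For anyone landing here later, the casting mentioned in the comment might look roughly like this for the posts table from the question. This is only a sketch under assumptions that are not stated in the thread: the column types (integer, text, text, timestamp) and the sequence name posts_id_seq are guesses, so adjust both to the real table definition.

```sql
-- Sketch only: the column types and the sequence name are assumed,
-- not taken from the thread.
-- Assumed table: posts(id integer, title text, body text, published timestamp)
CREATE FUNCTION pg_bulkload_filter(text, text) RETURNS record
AS $$
  SELECT nextval('posts_id_seq')::integer,  -- id: drawn from the assumed sequence
         $1::text,                          -- title: first field of the input file
         NULL::text,                        -- body: not present in the input file
         $2::timestamp                      -- published: second field of the input file
$$ LANGUAGE SQL;
```

The function returns one value per table column, in the table's column order, with each value (including the NULL) cast to the column's type.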
