Primary Key assignment in Postgres

Question

I have a table tmp in my Postgres database that contains roughly 139 million records. I am trying to move the columns col1, col2, and col3 to col1, col2, and col3 of another table named r4a. I created the table r4a with this query:

CREATE TABLE r4a(
    gid serial NOT NULL,
    col1 double precision,
    col2 double precision,
    col3 double precision,
    the_geom geometry,
    CONSTRAINT r4a_pkey PRIMARY KEY (gid));

I created this insert into query to populate fields in r4a:

INSERT INTO r4a (col1,col2,col3)
SELECT col1, col2, col3
FROM tmp
limit 500;

It populates the gid [PK] serial column with numbers ranging from 14816024 to 14816523.

How does it determine which 500 records to limit the query to?
Is it choosing to import rows 14816024 through 14816523, or is it just arbitrarily assigning numbers?

Ideally I want the primary key to begin at 1 and count upwards. Being new to postgres and having such a large (in my opinion) table, I want to make sure I understand what is going on.

In most databases, using LIMIT or TOP or something similar without an ORDER BY clause will return an arbitrary set of rows. The rows may come back in some sort of order (usually insertion order), but there is no guarantee. If you want a specific set of rows, you have to specify it. I won't post this as an answer as I'm not familiar with the specifics of PostgreSQL, but I would bet that it applies to PG as well.
– jpw, Commented Jul 31, 2015 at 23:18
How could I change the query to only move the first 500 rows?
– dubbbdan, Commented Jul 31, 2015 at 23:23
I don't know the mechanics of PostgreSQL's serial type, so I'm afraid I can't help you.
– jpw, Commented Jul 31, 2015 at 23:28

IMSoP · Accepted Answer · 2015-07-31 23:42:35Z

The values chosen for the Serial column have nothing to do with which rows are selected from the other table. Without an ORDER BY clause, though, those rows will be a completely arbitrary sample: whichever ones happen to be easiest to retrieve.
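As a minimal sketch of a repeatable version of the insert from the question: adding an ORDER BY pins down which 500 rows are taken. The ordering columns used here are an assumption, since the question does not show any key on tmp; substitute whichever column actually defines "first" for your data.

```sql
-- Assumption: ordering by (col1, col2, col3) is only for illustration;
-- tmp may have a better column for deciding which rows come "first".
INSERT INTO r4a (col1, col2, col3)
SELECT col1, col2, col3
FROM tmp
ORDER BY col1, col2, col3
LIMIT 500;
```

On a table of this size, sorting without a supporting index may be slow, so it is worth checking the plan with EXPLAIN before running it.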

A Serial column is actually an Integer column with a default value defined which takes the next value from a special object called a Sequence. The Sequence is a transaction-independent counter which starts at 1, and is never rewound, even if a value is read and discarded.
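To make that concrete, here is roughly what a serial column declaration expands to, written out by hand. The table and sequence names (r4a_demo, r4a_demo_gid_seq) are hypothetical and exist only for this illustration.

```sql
-- Roughly equivalent to declaring "gid serial NOT NULL".
-- r4a_demo and r4a_demo_gid_seq are hypothetical names for this sketch.
CREATE SEQUENCE r4a_demo_gid_seq;

CREATE TABLE r4a_demo (
    -- An ordinary integer column whose default pulls the next sequence value.
    gid integer NOT NULL DEFAULT nextval('r4a_demo_gid_seq'),
    col1 double precision
);

-- Tie the sequence to the column so it is dropped along with it.
ALTER SEQUENCE r4a_demo_gid_seq OWNED BY r4a_demo.gid;
```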

So if your sequence value is that high, it's because you've requested that many values from it already - maybe in inserts that you later deleted, transactions that you rolled back, or statements that were aborted halfway through with an error.
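The rollback case can be reproduced in a few statements. This is a small sketch against a hypothetical scratch table (seq_demo), assuming a single session and a freshly created table.

```sql
-- seq_demo is a hypothetical scratch table for this illustration.
CREATE TABLE seq_demo (gid serial PRIMARY KEY, col1 double precision);

BEGIN;
INSERT INTO seq_demo (col1) VALUES (1.0);  -- consumes sequence value 1
ROLLBACK;                                  -- the row is discarded, the sequence value is not given back

-- The next insert continues from where the sequence left off.
INSERT INTO seq_demo (col1) VALUES (2.0) RETURNING gid;  -- gid is 2, not 1
```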

You can manually reset the sequence with the setval() function; a useful recipe is setval(pg_get_serial_sequence('r4a', 'gid'), 1). Note that in this two-argument form the next value generated will be 2; pass false as a third argument if the next value should be exactly 1. Remember too that this won't check what values have already been inserted into the table, so you'll get duplicate key errors if it generates an ID that's already there. Repeating the insert will keep incrementing the sequence and eventually generate an ID which hasn't been used yet, but that's not something you'd want production code to rely on!
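A sketch of two ways to reset the sequence for the r4a table from the question, depending on whether the table is empty. Which one applies is an assumption about your data; neither touches existing rows.

```sql
-- Case 1: r4a is empty and gid should start at 1.
-- The third argument (is_called = false) makes the next nextval() return exactly 1.
SELECT setval(pg_get_serial_sequence('r4a', 'gid'), 1, false);

-- Case 2: r4a already contains rows; resume just past the highest existing gid
-- so that new inserts cannot collide with the primary key.
SELECT setval(
    pg_get_serial_sequence('r4a', 'gid'),
    COALESCE((SELECT max(gid) FROM r4a), 0) + 1,
    false
);
```

The second form falls back to 1 on an empty table, so it is the safer default when you are not sure what the table contains.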
