
I have 2 tables in my database. They both have more than 16M records and are related by a uuid column (both uuid fields are indexed). One of them is about 166GB and the other is around 50GB. I'll change the table names in my question, but I hope the question is still clear.

Let's say my first table is called users and the second one is profiles. I have a field in my users table and I want to copy it to my profiles table.

I started something last night, but it's still running and it has been more than 10 hours already.

I have 3 questions now. First question: are my queries OK?

ALTER TABLE profiles ADD COLUMN start_stamp TIMESTAMP DEFAULT NOW();

UPDATE profiles
SET start_stamp = (SELECT start_stamp::DATE FROM users WHERE uuid = profiles.uuid);

CREATE INDEX start_stamp ON profiles (start_stamp);

And the second question: is there any difference between these two queries? If yes, what's the difference and which one is better?

UPDATE profiles 
SET start_stamp = (SELECT start_stamp::DATE FROM users WHERE uuid = profiles.uuid);

QUERY PLAN
--------------------------------------------------------------------------
Update on profiles  (cost=0.00..159956638.61 rows=18491638 width=116)
->  Seq Scan on profiles  (cost=0.00..159956638.61 rows=18491638 width=116)
     SubPlan 1
       ->  Index Scan using unique_user_uuid on users  (cost=0.56..8.58 rows=1 width=20)
             Index Cond: ((uuid)::text = (profiles.uuid)::text)




UPDATE profiles
SET start_stamp = users.start_stamp
FROM users
WHERE profiles.start_stamp = users.start_stamp;

QUERY PLAN
--------------------------------------------------------------------------
Update on profiles  (cost=2766854.25..5282948.42 rows=11913522 width=142)
->  Hash Join  (cost=2766854.25..5282948.42 rows=11913522 width=142)
     Hash Cond: ((profiles.uuid)::text = (users.uuid)::text)
     ->  Seq Scan on profiles  (cost=0.00..1205927.56 rows=18491656 width=116)
     ->  Hash  (cost=2489957.22..2489957.22 rows=11913522 width=63)
           ->  Seq Scan on users  (cost=0.00..2489957.22 rows=11913522 width=63)

And my final question: is there a better way to copy a value from one table to another when the tables have more than 16M rows and around 200GB of data?

Thanks.

  • You've really asked three questions here. For the two updates, both are logically identical, but I'm not sure if the first version would even run on Postgres. The second version is the standard update join syntax. In terms of performance, you may check the execution plans of both updates, assuming both run. Commented Oct 16, 2018 at 6:26
  • I assume the mismatching WHERE condition in the latter one is a mistype and not actually meant that way Commented Oct 16, 2018 at 6:26
  • @a_horse_with_no_name Bad choice of words. I should have said something like "the typical way to do an update join in Postgres syntax." Does that work better? Yes, second one completely ANSI SQL. Commented Oct 16, 2018 at 6:31
  • The first query will only work if users.uuid is unique Commented Oct 16, 2018 at 6:31
  • Yes, the uuid field is unique for each record, and every profile and user share the same uuid across the two tables. But the point is that the query has been running since last night and I don't know how much longer it will take. That was the main reason to ask here. Can I do it faster, or should I wait for it? Commented Oct 16, 2018 at 6:33

2 Answers

1

The fastest way to update/copy a huge amount of data is CTAS (CREATE TABLE AS SELECT). It is only possible if you have the rights to do so and you can rename or drop the original table.

In your case it would be like this:

create table tmp_profiles as
select p.*, u.start_stamp::date
  from profiles p
  left join users u on p.uuid = u.uuid;

drop table profiles;

alter table tmp_profiles rename to profiles;

After that you have to recreate your keys, indexes, and other constraints.
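For example, a rough sketch of that rebuild step could look like the following (the primary key on uuid and the index name here are assumptions; adjust them to whatever constraints and indexes your real profiles table had):

-- Hypothetical rebuild after the swap; use your real constraint/index names.
ALTER TABLE profiles ADD PRIMARY KEY (uuid);

-- Recreate the index on the copied column, as in the question.
CREATE INDEX profiles_start_stamp_idx ON profiles (start_stamp);

-- Also re-add any foreign keys, defaults, NOT NULL constraints and triggers
-- that existed on the original table, then refresh planner statistics.
ANALYZE profiles;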

If you update more than roughly 5% of the records in your table, then CTAS will be at least a few times faster than a regular update. Below that threshold, an update can be faster than CTAS.


3 Comments

I've tried to do that; it's been 4 hours and it's still processing :/
This is the fastest way. I doubt you will find anything faster.
Do you have an HDD or an SSD? If an HDD, it will take a long time to rewrite over 166 GB.
0

Both of your queries are the same. It will take forever to update. This is a well-known problem when adding a NOT NULL column to a big table.

Sol 1: Update the values in chunks, running multiple queries to backfill the date (see the sketch below).

Sol 2: Recreate the entire table.
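As an illustration of Sol 1, a chunked backfill could look roughly like the sketch below. It assumes the new column was added without a default, so the rows still to be processed are NULL; the 10,000-row batch size and the NULL-based batching key are assumptions to tune for your data. Repeat the statement (committing between runs, e.g. from a script) until it updates 0 rows.

-- Hypothetical chunked backfill: run repeatedly until it reports UPDATE 0.
UPDATE profiles p
SET    start_stamp = u.start_stamp::date
FROM   users u
WHERE  p.uuid = u.uuid
  AND  p.uuid IN (
         SELECT uuid
         FROM   profiles
         WHERE  start_stamp IS NULL
         LIMIT  10000
       );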

Useful links for working with a large number of rows in Postgres: https://medium.com/doctolib-engineering/adding-a-not-null-constraint-on-pg-faster-with-minimal-locking-38b2c00c4d1c

https://dba.stackexchange.com/questions/52517/best-way-to-populate-a-new-column-in-a-large-table/52531#52531

https://dba.stackexchange.com/questions/41059/optimizing-bulk-update-performance-in-postgresql

2 Comments

I've tried to recreate the entire table with CREATE TABLE AS SELECT; it's been 4 hours and it's still processing :/
@htunc Have you locked the table before inserting? Other operations can block this. Please follow this: dba.stackexchange.com/questions/52517/…
