Retrieve a fixed number of random records from a Postgres database table

I have a table partitioned by day. It holds one month of data, about 3 billion rows in total, but many of its partitions are empty.

How can I efficiently select exactly 5000 random records from across the entire table?

I'm considering TABLESAMPLE SYSTEM (0.1) with a LIMIT, but when empty partitions are included in the sample, it returns fewer than 5000 records. The query select *, row_number() over (order by random()) takes a long time to execute and puts a heavy load on the CPU.

asked Jan 23, 2024 at 21:18

Violetta


You can consider several strategies: 1) Use an improved TABLESAMPLE SYSTEM method within a loop to accumulate the desired number of records. 2) Sample a few rows from each non-empty partition and union the results, ensuring a more uniform distribution. 3) Employ a random offset method, selecting rows based on a list of unique random numbers, though this can be CPU-intensive. 4) Combine TABLESAMPLE with a random offset approach to first narrow down the dataset and then apply randomness, potentially offering a balance between efficiency and randomness.

– TSCAmerica.com, commented Jan 23, 2024 at 21:37

Try other table sample methods.

– Laurenz Albe, commented Jan 24, 2024 at 8:21

It depends on what exactly you mean by random records. There is an easy way to get exactly N random records quickly, so long as said records are stored consecutively in the table (or rather, in segments of the table). Does that count as random to you? I am asking because the records will not look randomly chosen if, for example, rows are inserted into your table with CURRENT_TIMESTAMP or a SERIAL in one of the columns. If you want random records from random segments, I am afraid it will inherently be slow, because the whole table has to be read from disk.

– Atmo, commented Jan 30, 2024 at 15:07
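
The second strategy from the first comment (sample from each non-empty partition and union the results) could be sketched roughly as below. This is only an illustration under assumptions, not a tested solution for the table in the question: the partition names are hypothetical, the per-partition row counts are assumed to be supplied by the caller (for example from planner estimates or a separate count), and the oversampling factor of 3 is an arbitrary guess. The Python helper splits the 5000 rows across non-empty partitions in proportion to their size, then builds one query that oversamples each partition with TABLESAMPLE and trims it to its quota with ORDER BY random() LIMIT.

```python
def allocate_quota(partition_counts, total):
    """Split `total` rows across non-empty partitions in proportion
    to their row counts, using the largest-remainder method."""
    non_empty = {name: n for name, n in partition_counts.items() if n > 0}
    population = sum(non_empty.values())
    if total > population:
        raise ValueError("requested more rows than the partitions hold")

    # Integer floor of each partition's proportional share.
    quota = {name: total * n // population for name, n in non_empty.items()}

    # Give the rows lost to rounding to the largest fractional remainders.
    leftover = total - sum(quota.values())
    by_remainder = sorted(
        non_empty,
        key=lambda name: (total * non_empty[name]) % population,
        reverse=True,
    )
    for name in by_remainder[:leftover]:
        quota[name] += 1

    # Partitions whose share rounds down to zero are left out of the query.
    return {name: q for name, q in quota.items() if q > 0}


def build_query(partition_counts, total, oversample=3.0, method="SYSTEM"):
    """Build a UNION ALL query that samples each partition separately.

    Partition names are interpolated as-is, so they must come from a
    trusted source (assumption: they are read from the catalog, not
    from user input).
    """
    quota = allocate_quota(partition_counts, total)
    parts = []
    for name, q in quota.items():
        # Oversample so the block sample is likely to cover the quota,
        # capped at 100 percent of the partition.
        pct = min(100.0, 100.0 * oversample * q / partition_counts[name])
        parts.append(
            f"(SELECT * FROM {name} TABLESAMPLE {method} ({pct:.6f}) "
            f"ORDER BY random() LIMIT {q})"
        )
    return "\nUNION ALL\n".join(parts)


if __name__ == "__main__":
    # Hypothetical partition names and row counts, for illustration only.
    counts = {
        "events_2024_01_01": 120_000_000,
        "events_2024_01_02": 0,
        "events_2024_01_03": 80_000_000,
    }
    print(build_query(counts, 5000))
```

Because the sample percentage is only an expectation, a partition can still return fewer rows than its quota by chance, so the total should be checked and any shortfall topped up with a second pass. SYSTEM samples whole blocks and is fast but clustered; passing method="BERNOULLI" samples individual rows at the cost of scanning each partition.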
