I have a table partitioned by day. It stores one month of data, about 3 billion rows, but many of the partitions are empty.
How can I optimally select exactly 5000 random records from across the entire table?
I'm considering `TABLESAMPLE SYSTEM (0.1)` with a `LIMIT`, but if empty partitions fall into the sample, fewer than 5000 records come back. The alternative, `select *, row_number() over (order by random())`, takes a long time to execute and loads the CPU.
TABLESAMPLE SYSTEM gets you N random records quickly, so long as those records are stored consecutively in the table (or rather, in segments of the table). Does that count as random to you? I ask because the records will not look randomly chosen if, e.g., rows are inserted into your table with `CURRENT_TIMESTAMP` or a `SERIAL` in one of the columns. If you want random records from random segments, I am afraid it will inherently be slow anyway, because it means reading the whole table from disk.
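One common workaround for the "fewer than 5000 rows" problem is to deliberately oversample and then trim. A sketch, assuming a hypothetical partitioned table named `events` (adjust the sample percentage to your data density — it must be large enough that the sample survives hitting empty partitions):

```sql
-- Oversample with TABLESAMPLE SYSTEM so that empty partitions falling
-- into the sample still leave well over 5000 candidate rows, then
-- shuffle only the sampled rows (cheap compared to shuffling 3B rows)
-- and trim to exactly 5000.
SELECT *
FROM events TABLESAMPLE SYSTEM (0.5)  -- ~0.5% of pages; tune upward if short
ORDER BY random()
LIMIT 5000;
```

Note the caveat from the answer still applies: SYSTEM sampling picks whole pages, so rows within a page are physically adjacent and may look non-random if insertion order correlates with a timestamp or serial column. `TABLESAMPLE BERNOULLI` samples individual rows instead, at the cost of scanning every page.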