In my postgres database, I have the following relationships (simplified for the sake of this question):

Objects (currently has about 250,000 records)
-------
n_id
n_store_object_id (references store.n_id, 1-to-1 relationship, some objects don't have store records)
n_media_id (references media.n_id, 1-to-1 relationship, some objects don't have media records)

Store (currently has about 100,000 records)
-----
n_id
t_name
t_description
n_status
t_tag

Media
-----
n_id
t_media_path

So far, so good. When I need to query the data, I run this (note the limit 2 at the end, as part of the requirement):

select
    o.n_id,
    s.t_name,
    s.t_description,
    me.t_media_path
from
    objects o
    join store s on (o.n_store_object_id = s.n_id and s.n_status > 0 and s.t_tag is not null)
    join media me on o.n_media_id = me.n_id
limit
    2

This works fine and gives me two entries back, as expected. The execution time on this is about 20 ms - just fine.

Now I need to get 2 random entries every time the query runs. I thought I'd add order by random(), like so:

select
    o.n_id,
    s.t_name,
    s.t_description,
    me.t_media_path
from
    objects o
    join store s on (o.n_store_object_id = s.n_id and s.n_status > 0 and s.t_tag is not null)
    join media me on o.n_media_id = me.n_id
order by
    random()
limit
    2

While this gives the right results, the execution time is now about 2,500 ms (over 2 seconds). This is clearly not acceptable, as it's one of a number of queries to be run to get data for a page in a web app.

So, the question is: how can I get random entries, as above, but still keep the execution time within some reasonable amount of time (i.e. under 100 ms is acceptable for my purpose)?

  • Random means you might get the same two rows several times in succession. Is that really what you want? Commented Nov 28, 2011 at 18:13
  • @Catcall, judging by the use of order by random(), apparently not. Commented Nov 28, 2011 at 18:15
  • Take a look at this post: stackoverflow.com/questions/5297396/… Commented Nov 28, 2011 at 18:16
  • @StarShip3000, oh, it has my answer! :) Commented Nov 28, 2011 at 18:25
  • @Michael Krelin - hacker Yep +1 Commented Nov 28, 2011 at 18:27

3 Answers

Of course it needs to sort the whole result set by the random criteria before it can return the first rows. Maybe you can work around that by using random() in an offset instead?
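
A minimal sketch of the offset idea, assuming an approximate row count is good enough; reltuples from pg_class is one cheap way to get one (it is only as fresh as the last analyze, and because the joins filter rows out, an offset that overshoots the joined result simply returns no rows):

select
    o.n_id,
    s.t_name,
    s.t_description,
    me.t_media_path
from
    objects o
    join store s on (o.n_store_object_id = s.n_id and s.n_status > 0 and s.t_tag is not null)
    join media me on o.n_media_id = me.n_id
offset
    -- approximate count of objects; a cached exact count would work the same way
    floor(random() * (select reltuples from pg_class where relname = 'objects'))::int
limit
    2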

5 Comments

In order to use offset I would need to know the total number of records in the result set, right? However, just doing a select count(1) over that join still takes over 1 second.
Yes, you may have it cached or something… Also, you can combine Alex's approach with this. Well, that's the best I could come up with ;)
No worries, I'm not complaining. Caching the count is not an easy option, as it's one of the transactional tables, so the data there changes often (usually inserts and updates, though; very few deletes).
I just assumed that since you need randomness, being a bit out of sync is not awfully bad (as long as you don't delete rows).
Ok, having gone through various iterations and testing of different approaches, I've got something that kind of works, loosely based on Alex's answer. Thanks for your help.

I'm thinking you'll be better off selecting random objects first, then performing the join to those objects after they're selected. I.e., query once to select random objects, then query again to join just those objects that were selected.
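
A hedged sketch of this, using the names from the question; the inner limit of 100 is an arbitrary candidate-pool size, padded well above 2 because the joins can drop candidates that lack store or media rows:

select
    o.n_id,
    s.t_name,
    s.t_description,
    me.t_media_path
from
    (
        -- first pass: pick a small random pool from objects alone
        select n_id, n_store_object_id, n_media_id
        from objects
        order by random()
        limit 100
    ) o
    -- second pass: join only the candidates in the pool
    join store s on (o.n_store_object_id = s.n_id and s.n_status > 0 and s.t_tag is not null)
    join media me on o.n_media_id = me.n_id
limit
    2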

2 Comments

Thanks for the suggestion. Just tried it - better, but still not good enough: about 600 ms. Thanks anyway, though.
Ok, having gone through several iterations, etc, I finally got something that kind of works. As it's loosely based on your suggestion, I'll accept your answer.

It seems like your problem is this: You have a table with 250,000 rows and need two random rows. Thus, you have to generate 250,000 random numbers and then sort the rows by their numbers. Two seconds to do this seems pretty fast to me.

The only real way to speed up the selection is to avoid generating 250,000 random numbers and instead look up rows through an index.

I think you'd have to change the table schema to optimize for this case. How about something like:

  1) Create a new column with a sequence starting at 1.
  2) Every row will then have a number.
  3) Create an index on: number % 1000
  4) Query for rows where number % 1000 is equal to a random number between 0 and 999 (this should hit the index and load a random portion of your database).
  5) You can probably then add RANDOM() to your ORDER BY clause; it will then just sort that chunk of your database and be 1,000x faster.
  6) Then select the first two of those rows (see the sketch below).
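
A sketch of these steps; the n_seq column, the idx_objects_seq_mod index name, and the 1,000-bucket split are illustrative assumptions, and the random bucket is computed in a CTE so that the volatile random() is evaluated once rather than once per row:

-- steps 1 and 2: number every row
alter table objects add column n_seq serial;

-- step 3: expression index on the bucket
create index idx_objects_seq_mod on objects ((n_seq % 1000));

-- steps 4 to 6: pick one random bucket, sort only that slice, take two rows
with bucket as (
    select floor(random() * 1000)::int as n_bucket
)
select
    o.n_id,
    s.t_name,
    s.t_description,
    me.t_media_path
from
    objects o
    join store s on (o.n_store_object_id = s.n_id and s.n_status > 0 and s.t_tag is not null)
    join media me on o.n_media_id = me.n_id
where
    o.n_seq % 1000 = (select n_bucket from bucket)
order by
    random()
limit
    2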

If this still isn't random enough (since rows in the same bucket will always be paired together), you could do a union of two single-row queries, or put an OR clause in the query and generate two random keys; a sketch of the union variant follows.
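
A hedged take on that, reusing the hypothetical n_seq column and bucket split from above; each branch draws its own random bucket, so the two rows usually come from different slices of the table (the buckets can still occasionally collide):

(
    select o.n_id, s.t_name, s.t_description, me.t_media_path
    from objects o
        join store s on (o.n_store_object_id = s.n_id and s.n_status > 0 and s.t_tag is not null)
        join media me on o.n_media_id = me.n_id
    where o.n_seq % 1000 = (select floor(random() * 1000)::int)  -- first random bucket
    order by random()
    limit 1
)
union all
(
    select o.n_id, s.t_name, s.t_description, me.t_media_path
    from objects o
        join store s on (o.n_store_object_id = s.n_id and s.n_status > 0 and s.t_tag is not null)
        join media me on o.n_media_id = me.n_id
    where o.n_seq % 1000 = (select floor(random() * 1000)::int)  -- second, independent bucket
    order by random()
    limit 1
)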

Hopefully something along these lines could be very fast and decently random.
