In my postgres database, I have the following relationships (simplified for the sake of this question):

Objects (currently has about 250,000 records)
-------
n_id
n_store_object_id (references store.n_id, 1-to-1 relationship, some objects don't have store records)
n_media_id (references media.n_id, 1-to-1 relationship, some objects don't have media records)

Store (currently has about 100,000 records)
-----
n_id
t_name
t_description
n_status
t_tag

Media
-----
n_id
t_media_path

So far, so good. When I need to query the data, I run this (note the limit 2 at the end, as part of the requirement):

select
    o.n_id,
    s.t_name,
    s.t_description,
    me.t_media_path
from
    objects o
    join store s on (o.n_store_object_id = s.n_id and s.n_status > 0 and s.t_tag is not null)
    join media me on o.n_media_id = me.n_id
limit
    2

This works fine and gives me two entries back, as expected. The execution time on this is about 20 ms - just fine.

Now I need to get 2 random entries every time the query runs. I thought I'd add order by random(), like so:

select
    o.n_id,
    s.t_name,
    s.t_description,
    me.t_media_path
from
    objects o
    join store s on (o.n_store_object_id = s.n_id and s.n_status > 0 and s.t_tag is not null)
    join media me on o.n_media_id = me.n_id
order by
    random()
limit
    2

While this gives the right results, the execution time is now about 2,500 ms (over 2 seconds). This is clearly not acceptable, as it's one of a number of queries to be run to get data for a page in a web app.

So, the question is: how can I get random entries, as above, but still keep the execution time within some reasonable amount of time (i.e. under 100 ms is acceptable for my purpose)?

  • Random means you might get the same two rows several times in succession. Is that really what you want? Commented Nov 28, 2011 at 18:13
  • @Catcall, judging by the use of order by random(), apparently not. Commented Nov 28, 2011 at 18:15
  • Take a look at this post: stackoverflow.com/questions/5297396/… Commented Nov 28, 2011 at 18:16
  • @StarShip3000, oh, it has my answer! :) Commented Nov 28, 2011 at 18:25
  • @Michael Krelin - hacker Yep +1 Commented Nov 28, 2011 at 18:27

3 Answers

Of course it needs to sort the whole result set by the random criteria before it can return the first rows. Maybe you can work around that by using random() in an offset instead?
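
A minimal sketch of the offset idea, assuming an approximate row count is good enough; reltuples from pg_class is one cheap way to get one (it is only as fresh as the last analyze, and because the joins filter rows out, an offset that overshoots the joined result simply returns no rows):

select
    o.n_id,
    s.t_name,
    s.t_description,
    me.t_media_path
from
    objects o
    join store s on (o.n_store_object_id = s.n_id and s.n_status > 0 and s.t_tag is not null)
    join media me on o.n_media_id = me.n_id
offset
    -- approximate count of objects; a cached exact count would work the same way
    floor(random() * (select reltuples from pg_class where relname = 'objects'))::int
limit
    2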

5 Comments

In order to use offset I would need to know the total number of records in the result set, right? However, just doing a select count(1) over that join still takes over 1 second.
Yes, you may have it cached or something… Also, you can combine Alex's approach with this. Well, that's the best I could come up with ;)
No worries, I'm not complaining. Caching the count is not an easy option, as it's one of the transactional tables, so the data there changes often (usually inserts and updates, though; very few deletes).
I just assumed that since you need randomness, being a bit out of sync is not awfully bad (as long as you don't delete rows).
Ok, having gone through various iterations and testing of different approaches, I've got something that kind of works, loosely based on Alex's answer. Thanks for your help.

I'm thinking you'll be better off selecting random objects first, then performing the join to those objects after they're selected. I.e., query once to select random objects, then query again to join just those objects that were selected.
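
A hedged sketch of this, using the names from the question; the inner limit of 100 is an arbitrary candidate-pool size, padded well above 2 because the joins can drop candidates that lack store or media rows:

select
    o.n_id,
    s.t_name,
    s.t_description,
    me.t_media_path
from
    (
        -- first pass: pick a small random pool from objects alone
        select n_id, n_store_object_id, n_media_id
        from objects
        order by random()
        limit 100
    ) o
    -- second pass: join only the candidates in the pool
    join store s on (o.n_store_object_id = s.n_id and s.n_status > 0 and s.t_tag is not null)
    join media me on o.n_media_id = me.n_id
limit
    2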

2 Comments

Thanks for the suggestion. Just tried it - better, but still not good enough: about 600 ms. Thanks anyway, though.
Ok, having gone through several iterations, etc, I finally got something that kind of works. As it's loosely based on your suggestion, I'll accept your answer.

It seems like your problem is this: You have a table with 250,000 rows and need two random rows. Thus, you have to generate 250,000 random numbers and then sort the rows by their numbers. Two seconds to do this seems pretty fast to me.

The only real way to speed up the selection is to avoid generating 250,000 random numbers and instead look up rows through an index.

I think you'd have to change the table schema to optimize for this case. How about something like:

  1) Create a new column with a sequence starting at 1.
  2) Every row will then have a number.
  3) Create an index on: number % 1000
  4) Query for rows where number % 1000 is equal to a random number between 0 and 999 (this should hit the index and load a random portion of your database).
  5) You can probably then add RANDOM() to your ORDER BY clause; it will then just sort that chunk of your database and be 1,000x faster.
  6) Then select the first two of those rows (see the sketch below).
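
A sketch of these steps; the n_seq column, the idx_objects_seq_mod index name, and the 1,000-bucket split are illustrative assumptions, and the random bucket is computed in a CTE so that the volatile random() is evaluated once rather than once per row:

-- steps 1 and 2: number every row
alter table objects add column n_seq serial;

-- step 3: expression index on the bucket
create index idx_objects_seq_mod on objects ((n_seq % 1000));

-- steps 4 to 6: pick one random bucket, sort only that slice, take two rows
with bucket as (
    select floor(random() * 1000)::int as n_bucket
)
select
    o.n_id,
    s.t_name,
    s.t_description,
    me.t_media_path
from
    objects o
    join store s on (o.n_store_object_id = s.n_id and s.n_status > 0 and s.t_tag is not null)
    join media me on o.n_media_id = me.n_id
where
    o.n_seq % 1000 = (select n_bucket from bucket)
order by
    random()
limit
    2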

If this still isn't random enough (since rows in the same bucket will always be paired together), you could do a union of two single-row queries, or put an OR clause in the query and generate two random keys; a sketch of the union variant follows.
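
A hedged take on that, reusing the hypothetical n_seq column and bucket split from above; each branch draws its own random bucket, so the two rows usually come from different slices of the table (the buckets can still occasionally collide):

(
    select o.n_id, s.t_name, s.t_description, me.t_media_path
    from objects o
        join store s on (o.n_store_object_id = s.n_id and s.n_status > 0 and s.t_tag is not null)
        join media me on o.n_media_id = me.n_id
    where o.n_seq % 1000 = (select floor(random() * 1000)::int)  -- first random bucket
    order by random()
    limit 1
)
union all
(
    select o.n_id, s.t_name, s.t_description, me.t_media_path
    from objects o
        join store s on (o.n_store_object_id = s.n_id and s.n_status > 0 and s.t_tag is not null)
        join media me on o.n_media_id = me.n_id
    where o.n_seq % 1000 = (select floor(random() * 1000)::int)  -- second, independent bucket
    order by random()
    limit 1
)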

Hopefully something along these lines could be very fast and decently random.
