To get a single random record from the db, I'm currently doing:
User.all.sample
But when there are 100000+ users, it takes a few seconds to load them all, just to select one.
What's the simplest way to load a single random user from the db?
You can try the following database-independent query:
User.find(User.pluck(:id).sample)
[DEBUG] (36.5ms) SELECT `users`.`id` FROM `users`
[DEBUG] User Load (0.5ms) SELECT `users`.* FROM `users` WHERE `users`.`id` = 58229 LIMIT 1
This fires two queries, but it is efficient: it took only 37ms to get a single random user record,
whereas the following query takes around 624.7ms:
User.order("RAND()").first
[DEBUG] User Load (624.7ms) SELECT `users`.* FROM `users` ORDER BY RAND() LIMIT 1
I checked this against 105,510 user records.
User.find(User.pluck(:id).sample) is great and not DB-type dependent. If anyone needs more than one record, you can pass an argument to sample, like User.find(User.pluck(:id).sample(4)).
Using PostgreSQL or SQLite, use RANDOM():
User.order("RANDOM()").take
Presumably the same would work for MySQL with RAND()
User.order("RAND()").take
Note: as of Rails 6.1 this no longer runs. Use .order(Arel.sql('RAND()')) instead; passing raw string literals to .order is deemed unsafe and has been deprecated for quite some time.
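Since the random function name differs per database, a small helper can pick the right SQL for the adapter in use. This is a sketch, not a Rails API: random_order_sql is a hypothetical name, and in a real app the adapter string would come from ActiveRecord::Base.connection.adapter_name.

```ruby
# Hypothetical helper: map an ActiveRecord adapter name to that
# database's SQL random function. Plain Ruby, no Rails required.
def random_order_sql(adapter_name)
  case adapter_name
  when /mysql/i then "RAND()"   # MySQL / MariaDB
  else "RANDOM()"               # PostgreSQL, SQLite
  end
end

random_order_sql("Mysql2")      # => "RAND()"
random_order_sql("PostgreSQL")  # => "RANDOM()"
```

In a model you would then write something like User.order(Arel.sql(random_order_sql(User.connection.adapter_name))).take.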
Model.offset(rand(Model.count)).first does a better job.
You can find the maximum user id in the table and look up a user by a random id limited to that maximum. Example:
max_id = User.order(id: :desc).limit(1).pluck(:id).first
user = User.find_by('id > ?', rand(max_id))
These two queries are extremely fast because you are using the index for the primary key (id).
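Since ids can have gaps (deleted rows), an id-based lookup can miss. A minimal retry sketch, assuming a finder block standing in for something like ->(id) { User.find_by(id: id) }; the helper name and attempt count are illustrative, not from the answer above:

```ruby
# Sketch: retry a random-id lookup a few times to survive gaps
# left by deleted rows. `finder` stands in for a DB lookup such as
# ->(id) { User.find_by(id: id) }, which returns a record or nil.
def find_random_record(max_id, attempts: 5, &finder)
  attempts.times do
    record = finder.call(rand(1..max_id))
    return record if record
  end
  nil # every attempt hit a gap
end
```

With a sparse table you would tune attempts to match how gappy your id sequence is.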
rand() accepts an integer.
I decided to run some simple benchmarks, using PostgreSQL, Rails 7 and a table with 65k records. In my case I needed a random record after a scope.
Update: Now using Arel.sql('RANDOM()') instead of 'RANDOM()' to avoid deprecation warning mentioned by @MDickten.
TL;DR in order from best to worst
1) Model.scope.order(Arel.sql('RANDOM()')).first
2) Model.find(Model.scope.ids.sample)
3) Model.scope.sample
For a more detailed answer here's the snippet I ran
require 'benchmark'
N = 1000
Benchmark.bmbm do |x|
  x.report('sample')          { N.times { Model.scope.sample } }
  x.report('find ids sample') { N.times { Model.find(Model.scope.ids.sample) } }
  x.report('RANDOM()')        { N.times { Model.scope.order(Arel.sql('RANDOM()')).first } }
end
And the results were
| user system total real
sample | 957.387427 11.710236 969.097663 (1054.653218)
find ids sample | 64.859209 3.164221 68.023430 (97.080131)
RANDOM() | 0.819056 0.086088 0.905144 (11.837224)
Use Arel.sql('RAND()') for Rails 6.
You can pass a count to sample to get however many records you want:
User.all.sample(1)
The query above returns one random user record, but note that it still fetches every user into memory:
SELECT "users".* FROM "users"
TL;DR: Scroll to the bottom of the answer: strategy #5 is what I use today.
I've actually implemented this in an app where I need to randomly pull a record out of a growing list of currently 1_000_000 records (for business reasons).
Use DB RANDOM for PostgreSQL/SQLite or RAND for MySQL/MariaDB
User.order('RANDOM()').limit(1)
Pluck/select IDs, then use Ruby's Array#sample to get a random id, and look up that record in the DB:
User.find(User.pluck(:id).sample)
Get the min & max value of IDs in the DB, then rand(min..max) and do a find_by (in case you hit a deleted record):
User.find_by(id: rand(User.minimum(:id)..User.maximum(:id)))
There is another strategy that I haven't seen other answers mention: a random offset. Just make sure it stays in the range of your existing records.
User.offset(rand(User.count)).limit(1)
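One caveat worth spelling out: SQL offsets are zero-based, so for n rows the valid offsets are 0..n-1, which is exactly the range rand(n) produces. A plain-Ruby check of that claim (no database needed):

```ruby
# For a table of `count` rows, valid offsets are 0..count-1.
# Kernel#rand(count) yields exactly that range, so the offset never
# points past the last row; rand(1..count) would overflow by one.
count = 5
offsets = Array.new(1_000) { rand(count) }
offsets.all? { |o| (0...count).cover?(o) } # always true
```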
Ultimately, I chose the 3rd strategy: despite the cons, it is still faster on average for my use-case. Deletes from that table are also rare, and we use soft deletes, which I can rely on in my error handling. However, since I came up with strategy #4, I've been using that sometimes too. Works well.
Tips:
If the actual current count is not critical: you can preload User.count and memoize it as User.total_count (for example), or keep it in a config that gets loaded on app restart; you can then either invalidate this cache or restart your app on a regular basis. This removes the extra 15-20ms needed to get User.count and speeds up option 3 to under 3ms and option 4 to about 100ms:
def self.total_count
@total_count ||= count
end
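The memoized-count tip above can be sketched in plain Ruby (no Rails) with an explicit invalidation hook; CachedCount and reset! are illustrative names, not ActiveRecord API:

```ruby
# Plain-Ruby sketch of the memoized-count tip: compute an expensive
# count once, reuse it, and allow explicit invalidation.
# (||= recomputes when the cached value is nil; fine for positive counts.)
class CachedCount
  def initialize(&compute)
    @compute = compute # e.g. -> { User.count } in a Rails app
  end

  def total_count
    @total_count ||= @compute.call
  end

  def reset!
    @total_count = nil
  end
end

calls = 0
cache = CachedCount.new { calls += 1; 100_000 }
cache.total_count # computes:   calls == 1
cache.total_count # memoized:   calls still == 1
cache.reset!
cache.total_count # recomputes: calls == 2
```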
This is another reason I chose strategy #3. Even hitting "missing" records and having to handle the error and do additional lookups until I find a "working" random record is very cheap. In the time other strategies take for one lookup, I can do 5-10 lookups with this approach and still be ahead on speed.
Error handling can be done in 2 ways:
find_by will return a result or nil. Alternatively, fetch several candidate ids at once:
max = User.maximum(:id)
ids = 5.times.map { rand(1..max) }
User.where(id: ids).sample
Choose your error handling depending on how many "missing" records you have in a sequence. If you have a lot, then strategy #3 may not be a good choice.
EDIT - new strategy #5:
Adding a 5th strategy, which I came up with after finishing this post. I think it is the fastest while being the most robust.
Get the min & max value of IDs in the DB, then rand(min..max) and do a where lookup:
# `random_id` can be memoized / cached, but beware of scoping before caching!!!
random_id = rand(User.minimum(:id)..User.maximum(:id))
User.where("id >= ?", random_id).limit(1)
Just keep in mind that this essentially reimplements an offset, which may limit your data set for a scoped lookup like User.published_today.random_record; a plain offset will also get slower the bigger your data is.
Try this:
User.offset(rand(User.count)).limit(1).take
Enjoy it :)
User.offset(rand(User.count)).limit(1).first
It can sometimes be important to disable Rails caching, such as if you plan to run the same query multiple times within the same HTTP request:
random_user = User.uncached do
# query is not cached
User.order(Arel.sql("RANDOM()")).first
end
Any of the other, potentially more efficient queries in this thread will work as long as they're inside the uncached block.
Here's a simplified example of the bug we hit. Suppose you have a method that returns a random user, and you call that method more than once:
def get_random_user
User.order(Arel.sql("RANDOM()")).first
end
player_1 = get_random_user
player_2 = get_random_user
# In rails console...
player_1 == player_2 # => false (query not cached)
# If run in a Rails controller
player_1 == player_2 # => true (query IS cached)
As described above, the solution is to use ActiveRecord's uncached method to prevent caching:
def get_random_user
User.uncached do
User.order(Arel.sql("RANDOM()")).first
end
end
You can get a sample in Rails console using Model.all.sample.attribute.
E.g.:
Contact.all.sample.name
=> "Bob Mcmillan"