11

Let's say I'm creating a table foo with a column bar that should be a very large random integer.

CREATE TABLE foo (
    bar bigint DEFAULT round(((9223372036854775807::bigint)::double precision * random())) NOT NULL,
    baz text
);

Is this the best way to do this? Can anyone speak to the quality of PostgreSQL's random() function? Is the multiplication here masking the entropy?

Note that I do have good hardware entropy feeding into /dev/random.

1
  • For this example you might want to consider using UUID's instead, which supports a random 128-bit integer via gen_random_uuid(). Commented Aug 13 at 9:15

2 Answers 2

18

Postgresql random is based on their own portable implementation of POSIX erand48. It's a linear congruential PRNG in a 48 bit domain.

If you need something stronger look to the pg_crypto module's gen_random_bytes function which is used to produce cryptographically strong entropy.

Sign up to request clarification or add additional context in comments.

4 Comments

This is the beauty of open source!
Any more info about the multiplication portion of this question? Is the multiplication here masking the entropy?
Your number looks like a max signed long (263-1), which is a larger domain than pg's random produces. An IEEE 754 double has only 53 bits of mantissa precision which is also smaller than your 63 bit domain. So short-answer, I think exploding the random to 63 bits is going to get sparse coverage in the low order bits and you'll get better coverage if you limit the multiply to the same space as the core PRNG (248).
Note that I have just posted this example using the gen_random_bytes() function: dba.stackexchange.com/a/15884/6776
2

Starting with version 15.0 (released Oct 2022), PostgreSQL uses xoroshiro128** as its PRNG algorithm, for producing the values for random() among other Postgres functions. The xoroshiro algo family was introduced in 2018 and, as of writing, is still quite state-of-the-art.

Regarding the multiplication question, and in reference to @dbenhur's comment, the new algorithm and its 64-bit output will produce better results for your approach, but you still have the double precision bottleneck, so you still won't maximize your 263−1 domain.

If you can relax the requirement of an integer column, especially if you don't need it to be the primary key, consider the simpler and more elegant alternative of using a double precision-type column to save random() values directly.

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.