4

I have a table with user_ids that we've gathered from a streaming datasource of active accounts. Now I'm looking to go through and fill in the information about the user_ids that don't do much of anything.

Is there a SQL (postgres if it matters) way to have a query return random numbers not present in the table?

Eg something like this:

SELECT RANDOM(count, lower_bound, upper_bound) as new_id 
WHERE new_id NOT IN (SELECT user_id FROM user_table) AS user_id_table

Is this possible, or would it be best to generate a bunch of random numbers with a scripted wrapper and pass those into the DB to figure out which ones don't exist?

5 Answers

2

It is possible. If you want the IDs to be integers, try:

SELECT trunc((random() * (upper_bound - lower_bound)) + lower_bound) AS new_id 
FROM generate_series(1,upper_bound) 
WHERE new_id NOT IN (
    SELECT user_id 
    FROM user_table)

2 Comments

Hmm, it looks like it should work but postgres complains that new_id does not exist. It does the same with replacing the nested select with just a list of numbers. Maybe a DB engine limitation?
Hmm... Yes, apparently you cannot use column aliases in a WHERE or HAVING clause in PostgreSQL. You could consider using PL/pgSQL to set a variable to a random number, test it against the table, and repeat until you get a good one.
2

You can wrap the query above in a subselect, i.e.

SELECT * FROM (SELECT trunc(random() * (upper - lower) + lower) AS new_id  
FROM generate_series(1, count)) AS x 
WHERE x.new_id NOT IN (SELECT user_id FROM user_table)

1

I suspect you want a random sampling. I would do something like:

SELECT s
  FROM generate_series(1, (select max(user_id) from users)) s
  LEFT JOIN users ON s.s = user_id
 WHERE user_id IS NULL
 order by random() limit 5;

I haven't tested this, but the idea should work. If you have a lot of users and not many missing IDs, it will perform better than the other options, but performance may be a problem no matter what you do.
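
For the scripted-wrapper route mentioned in the question, the same anti-join idea could be sketched in Python. This is only an illustration under assumptions: `existing_ids` stands in for the result of `SELECT user_id FROM users`, and the function name `sample_missing_ids` is hypothetical.

```python
import random


def sample_missing_ids(existing_ids, k, rng=None):
    """Return up to k random IDs in 1..max(existing_ids) that are not in existing_ids.

    existing_ids is assumed to hold the user_id values already fetched from the table.
    """
    rng = rng or random.Random()
    existing = set(existing_ids)
    if not existing:
        return []
    # Counterpart of generate_series(1, max(user_id)) LEFT JOIN ... WHERE user_id IS NULL
    missing = [i for i in range(1, max(existing) + 1) if i not in existing]
    # Counterpart of ORDER BY random() LIMIT k
    return rng.sample(missing, min(k, len(missing)))
```

Like the SQL version, this enumerates the whole ID range, so it only makes sense when that range fits comfortably in memory.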

1

My pragmatic approach would be: generate 500 random numbers and then pick one which is not in the table:

-- lower_bound and upper_bound are placeholders for your ID range
WITH fivehundredrandoms AS (
    SELECT trunc(random() * (upper_bound - lower_bound) + lower_bound) AS onerandom
    FROM generate_series(1, 500)
)
SELECT onerandom FROM fivehundredrandoms
WHERE onerandom NOT IN (SELECT user_id FROM user_table WHERE user_id > 0) LIMIT 1;
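
If you would rather do this from a script, a rough Python counterpart of the "generate a batch, keep the first unused one" idea might look like the following. It is a sketch, not a tested production routine: `pick_unused_id` is a hypothetical name, and `existing_ids` is assumed to be the set of user_id values already loaded from the table.

```python
import random


def pick_unused_id(existing_ids, lower_bound, upper_bound, batch_size=500, rng=None):
    """Draw batch_size random integers in [lower_bound, upper_bound) and
    return the first one not in existing_ids, or None if all are taken."""
    rng = rng or random.Random()
    existing = set(existing_ids)
    # One batch of candidates, like the 500-row CTE
    candidates = [rng.randrange(lower_bound, upper_bound) for _ in range(batch_size)]
    for candidate in candidates:
        if candidate not in existing:
            return candidate
    # Every candidate in the batch was already in use
    return None
```

Returning None when the whole batch collides mirrors the SQL query returning zero rows, so the caller can decide whether to retry.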

0

There is a way to do what you want with recursive queries, though it is not pretty.

Suppose that you have the following table:

CREATE TABLE test (a int)

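To simplify, suppose you want to insert random numbers from a small range, generated with (random() * 5)::int, that are not in the table. Note that the cast to int rounds rather than truncates, so this expression yields values from 0 to 5.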

 WITH RECURSIVE rand (i, r, is_new) AS (
  SELECT 
     0,
     null::int,  -- cast so the column type matches the recursive term
     false
  UNION ALL
    SELECT 
      i + 1,
      next_number.v,
      NOT EXISTS (SELECT 1 FROM test WHERE test.a = next_number.v) 
   FROM
     rand r,
     (VALUES ((random() * 5)::int)) next_number(v)
   -- safety check to make sure we do not go into an infinite loop
   WHERE i < 500
)
SELECT * FROM rand WHERE rand.is_new LIMIT 1

I'm not completely sure, but PostgreSQL should stop iterating once it has one result: it evaluates only as many rows of a WITH query as the parent query actually fetches, so the LIMIT 1 should end the recursion early.

A nice thing about this query is that you can replace (random() * 5)::int with any ID-generating function you want.
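
The same retry-until-new loop, with a pluggable generator and the 500-attempt safety cap, could be sketched in Python as follows. The names `first_new_id` and `generate` are hypothetical, and `existing_ids` is assumed to hold the values already in the table.

```python
import random


def first_new_id(existing_ids, generate, max_attempts=500):
    """Call generate() until it yields a value not in existing_ids.

    Returns (new_id, attempts_used), or (None, max_attempts) if the
    safety cap is reached without finding an unused value.
    """
    existing = set(existing_ids)
    for attempt in range(1, max_attempts + 1):
        candidate = generate()
        if candidate not in existing:
            return candidate, attempt
    return None, max_attempts


# Example generator: random integers from 0 to 4 inclusive
new_id, attempts = first_new_id({0, 1, 2}, lambda: random.randrange(5))
```

As with the recursive query, any callable that produces candidate IDs can be passed as `generate`.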
