I have two tables: users and results. A user has many results.

I need to generate 10 million records in our users table, and I was able to do this using the generate_series function in Postgres.

Now I want to generate millions of rows in the results table, but with a specific distribution: 50% of the users should have only 1 result, 40% should have 2 results, and 10% should have 5 results.

Is there a way to generate this random data in the results table in Postgres?

1 Answer

Yes:

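select u.user_id, gs.result
from (select u.*,
             -- Postgres has random(), not rand()
             ntile(10) over (order by random()) as decile
      from users u
     ) u cross join lateral
     generate_series(1, (case when u.decile <= 5 then 1   -- deciles 1-5: 50% of users
                              when u.decile <= 9 then 2   -- deciles 6-9: 40% of users
                              else 5                      -- decile 10:   10% of users
                         end)) gs(result);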

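This generates one row per result for each user, with gs.result numbering that user's results starting from 1. You can fill in the remaining columns with the data you want.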
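
To make the "fill in with the data you want" step concrete, here is a minimal sketch that loads the generated rows into results and then checks the distribution. It assumes results has a user_id column referencing users.user_id plus a numeric score column; score is a hypothetical name used only for illustration, and the check assumes results was empty beforehand.

```sql
-- Assumed (hypothetical) schema: results(user_id, score).
-- One row is inserted per generated result; score is a random placeholder.
insert into results (user_id, score)
select u.user_id,
       round((random() * 100)::numeric, 2)
from (select user_id,
             ntile(10) over (order by random()) as decile
      from users
     ) u
cross join lateral
     generate_series(1, case when u.decile <= 5 then 1
                             when u.decile <= 9 then 2
                             else 5
                        end) as gs(result_no);

-- Sanity check: how many users ended up with 1, 2, or 5 results?
select results_per_user,
       count(*) as users,
       round(100.0 * count(*) / sum(count(*)) over (), 1) as pct
from (select user_id, count(*) as results_per_user
      from results
      group by user_id
     ) t
group by results_per_user
order by results_per_user;
```

With 10 million users this produces 5M x 1 + 4M x 2 + 1M x 5 = 18 million rows, and the pct column of the check should come out close to 50 / 40 / 10.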

4 Comments

This is really awesome, thank you. This is much more advanced SQL than I know, so a question for you: how does generate_series know to go through every "decile" value and generate its resulting table? Does it know that decile holds values 1-10 and do something under the hood?
@user402516 . . . ntile(10) assigns each row a bucket number between 1 and 10 (because 10 is the argument), splitting the rows into ten groups of nearly equal size. I added table aliases to make it clear that generate_series() is using this information.
I noticed you added "lateral" to the "cross join" in your query above, but the query seemed to work without it. From googling, it seems that you need lateral there in order to reference subquery aliases, but that didn't seem to be the case, as the original query worked for me. Did you change it just to be more explicit?
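@user402516 . . . The lateral keyword is optional for functions in the FROM clause, such as generate_series(); their arguments can reference earlier FROM items either way. I try to always include it so the logic is quite clear.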
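
As a small illustration of the lateral point discussed in these comments, the sketch below uses a three-row values list (a stand-in, not the real users table) to show that a function call in FROM may reference an earlier FROM item with or without the keyword, while a subquery needs it.

```sql
-- Function in FROM: lateral is implied, so t.n is visible to generate_series().
select t.n, gs.i
from (values (1), (2), (3)) as t(n)
cross join generate_series(1, t.n) as gs(i);

-- Same query with the keyword spelled out; it returns the same rows.
select t.n, gs.i
from (values (1), (2), (3)) as t(n)
cross join lateral generate_series(1, t.n) as gs(i);

-- Subquery in FROM: here lateral is required in order to reference t.n.
select t.n, s.doubled
from (values (1), (2), (3)) as t(n)
cross join lateral (select t.n * 2 as doubled) as s;
```

The first two statements each return six rows (one for n = 1, two for n = 2, three for n = 3), which is the same per-row expansion the answer relies on for the decile-based counts.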
