
I'm using Postgres 10 and I'm looking to randomise some data.

I start by creating a temporary table and filling it with 1,000 rows of random data.

I then want to merge that into another table that may have fewer or more rows than the random data.

For each row in my dimension table, I want to select a random row from the temporary table and set the dimension table's values to the values of that randomly selected row.

For example:

I have a table called reference.tv_shows with the fields Name and Category.

I have a temporary table called random_tv_shows with the fields Name and Category. This data is completely random and consists of 1,000 rows.
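A minimal sketch of how such a temporary table might be filled, for anyone who wants to reproduce the setup. The column names follow the question; the generated values and the category list are hypothetical placeholders, not the asker's real data:

```sql
-- Hypothetical setup: 1,000 rows of random placeholder data.
-- md5() of a random number gives a random-looking name; the category
-- is picked from a small, made-up list. random() in the select list is
-- evaluated once per generated row.
CREATE TEMPORARY TABLE random_tv_shows AS
SELECT md5(random()::text) AS "Name",
       (ARRAY['Drama', 'Comedy', 'Documentary'])[1 + floor(random() * 3)::int] AS "Category"
FROM generate_series(1, 1000);

SELECT "Category", count(*) FROM random_tv_shows GROUP BY "Category";
```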

I want to go through EACH row in reference.tv_shows, pick a random row from random_tv_shows, and set that row's Name and Category to the Name and Category of the selected random_tv_shows row.

I tried running a fairly simple UPDATE with a sub-select, but it looks as though the sub-select is evaluated once and the same value is then used for every row (or maybe RANDOM() is only random once per transaction?).

UPDATE reference.tv_shows SET "Name" = (SELECT "Name" FROM random_tv_shows ORDER BY RANDOM() LIMIT 1)

Is there a way to do this in Postgres?

  • You did forget to explain what you are trying to do (in a clear way). You just told us what you are doing, not what you have and what you want to get. Please add sample input and desired output (like in: minimal reproducible example). Commented Mar 14, 2022 at 10:34
  • @Luuk Wow, really? I'm quite surprised. I figured that it was quite clear but obviously not. I'll try and fix it so that it's easier to understand... Commented Mar 14, 2022 at 10:39
  • @Luuk I suppose it depends how you read it. I could re-phrase it a little bit as it may come across like I am suggesting "I tried this and it didn't work" but really I was trying to ask "How would I do this?". Let me re-phrase :) Commented Mar 14, 2022 at 10:41
  • So, you are really (trying to) pick a random row from a table that has random values (random_tv_shows)? Commented Mar 14, 2022 at 10:53
  • @Luuk Yep. Hopefully that's a bit clearer. Apologies :) Commented Mar 14, 2022 at 10:58

1 Answer


Suppose I have a test table with an integer field a. If I do this:

update test set a=random()*1000;

I get a random value for every record in my table, because random() is evaluated again for each updated row.

But when I do this:

update test set a=(select random()*1000);

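All values for a will be the same: the uncorrelated subquery is evaluated only once (as an InitPlan), and its single result is assigned to every row.

This is shown in this DBFIDDLE.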
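A minimal self-contained sketch of the same experiment, assuming a scratch temporary table named test with one integer column a, as described above. The final EXPLAIN shows the InitPlan that makes the second form run its subquery only once:

```sql
-- Scratch table: one integer column a, 1,000 rows.
CREATE TEMPORARY TABLE test (a integer);
INSERT INTO test (a) SELECT 0 FROM generate_series(1, 1000);

-- random() is volatile and appears directly in SET,
-- so it is evaluated again for every updated row.
UPDATE test SET a = random() * 1000;
SELECT count(DISTINCT a) AS distinct_values FROM test;  -- many different values

-- Here random() sits inside an uncorrelated scalar subquery. The planner
-- turns it into an InitPlan that runs once; that one result goes to every row.
UPDATE test SET a = (SELECT random() * 1000);
SELECT count(DISTINCT a) AS distinct_values FROM test;  -- 1

-- The plan for the second form lists an InitPlan.
EXPLAIN UPDATE test SET a = (SELECT random() * 1000);
```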

Because each row of reference.tv_shows has to be matched with exactly one random row, you need a unique identifier for every tv_show. Currently, that information is not available in the question.
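If reference.tv_shows has no such key yet, one way to get one is to add a serial column. A minimal sketch, using a hypothetical temporary stand-in table with made-up rows rather than the asker's real table:

```sql
-- Hypothetical stand-in for a tv_shows table that has no key column yet.
CREATE TEMPORARY TABLE tv_shows ("Name" text, "Category" text);
INSERT INTO tv_shows ("Name", "Category")
SELECT 'Show ' || g, 'Drama' FROM generate_series(1, 5) AS g;

-- Adding a serial column creates a new sequence and fills every existing
-- row from it, so the five rows get the distinct ids 1 to 5.
ALTER TABLE tv_shows ADD COLUMN id serial PRIMARY KEY;

SELECT id, "Name", "Category" FROM tv_shows ORDER BY id;
```

A freshly added serial column also gives gap-free ids starting at 1, which is what a join on a row number expects.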

EDIT: I tried to reproduce your data (fewer records, and a lack of imagination for categories, but... 😉).

When tv_shows has a unique id, you can number the random rows in a random order and join that number to the id:

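-- Assumes tv_shows.id runs from 1 to N without gaps. Ids larger than the
-- number of rows in random_tv_shows find no partner and are left unchanged.
UPDATE tv_shows ts
SET Name = rts.Name,
    Category = rts.Category
-- ROW_NUMBER() is computed before the subquery's own ORDER BY is applied,
-- so the random order must go inside the OVER clause to shuffle the numbering.
FROM (SELECT ROW_NUMBER() OVER (ORDER BY RANDOM()) AS R, Name, Category
      FROM random_tv_shows) rts
WHERE rts.R = ts.id;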

See DBFIDDLE.
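A different approach, not the one in this answer: let every target row draw its own random row number (sampling with replacement), which also covers the case where tv_shows has more rows than random_tv_shows. This is a sketch assuming tv_shows has a unique id column; the table contents below are hypothetical placeholders:

```sql
-- Hypothetical setup: 5 target rows, only 3 random source rows.
CREATE TEMPORARY TABLE tv_shows (id serial PRIMARY KEY, "Name" text, "Category" text);
INSERT INTO tv_shows ("Name", "Category")
SELECT 'Old ' || g, 'Old' FROM generate_series(1, 5) AS g;

CREATE TEMPORARY TABLE random_tv_shows ("Name" text, "Category" text);
INSERT INTO random_tv_shows VALUES ('A', 'Drama'), ('B', 'Comedy'), ('C', 'News');

WITH src AS (
    -- Number the source rows 1..N; the order does not matter here.
    SELECT row_number() OVER () AS rn, "Name", "Category"
    FROM random_tv_shows
), picks AS (
    -- random() in the select list is evaluated once per target row,
    -- so each row draws its own number in 1..N (with replacement).
    -- The count(*) subquery is uncorrelated and runs once, which is fine here.
    SELECT t.id, 1 + floor(random() * (SELECT count(*) FROM src))::int AS rn
    FROM tv_shows t
)
UPDATE tv_shows t
SET "Name" = s."Name",
    "Category" = s."Category"
FROM picks p
JOIN src s ON s.rn = p.rn
WHERE t.id = p.id;

SELECT * FROM tv_shows ORDER BY id;
```

Because rows are drawn with replacement, two shows can end up with the same random row; the ROW_NUMBER join gives each show a distinct row but only reaches ids up to the number of random rows.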


3 Comments

Haha, bingo! Thank you. Sorry I didn't provide any data. I thought it might be a quick thing, but it looks as though it was a bit more complex and some data would have helped. Thank you again.
I am working on a similar task and wondering how to update each row without a common unique id between both the tables.
@NishantGhodke: When you have a new question, please ask a new question
