2

I have a table as follows

id| page |   text
------------------------
1 | page1 | Hello World
2 | page1 | Foo Bar
3 | page2 | Baz Baz
3 | page2 | Some Text
4 | page3 | Some Other Text

I want to select 2 random entries - but each page is allowed to appear only once in the result.

I have

SELECT * FROM mydata ORDER BY RANDOM(); LIMIT 2

But can I combine this with DISTINCT or grouping?

2

4 Answers 4

2

Something like:

select id, page, text
from (
  select id, page, text,
         row_number() over (partition by page order by random()) as rn
  from mydata
) 
where rn <= 2
Sign up to request clarification or add additional context in comments.

2 Comments

I think the OP is looking for a total of two rows from the base table. Your query retrieves two random rows per page.
@ErwinBrandstetter: that was my understanding (retrieving two rows per page)
1

If you want:
... a total of two rows from the base table
... and give every page an equal chance to appear in the sample, regardless of how many entries it has in the table:

SELECT *
FROM  (
    SELECT DISTINCT ON (page) *
    FROM   mydata
    ORDER  BY page, random() -- pick one random entry per page
    ) x
ORDER BY random() -- pick two random pages
LIMIT 2;

Or, with a window function:

WITH x AS (
   SELECT *, row_number() OVER (PARTITION BY page ORDER BY random()) AS rn
   FROM   mydata
   )
SELECT id, page, text
FROM   x
WHERE  rn = 1
ORDER  BY random()
LIMIT  2;

You'll have to test which is faster.
If you are dealing with a big table and need fast performance, you can do better. Here is one way how.


If, on the other hand, you want:
... a total of two rows from table mydata
... and give every entry an (almost) equal chance a to appear in the sample, effectively giving a better chance to pages with more entries in the table.
Chances are still not truly equal - your restriction increases the chances for entries of rare pages by definition.

WITH x AS (
   SELECT *
   FROM   mydata
   ORDER  BY random()
   LIMIT 1
   )
SELECT * FROM x
UNION ALL
(
SELECT m.*
FROM mydata m
   , x
WHERE m.page <> x.page -- assuming page IS NOT NULL
ORDER BY random()
LIMIT 1
);

The parenthesis around the second SELECT of the UNION is required to allow for individual ordering.
Tested with PostgreSQL 9.1. Window functions require version 8.4 or later.

Comments

1

Same as Erwin's answer, just a bit structured: http://www.sqlfiddle.com/#!1/d3e83/6

with first_random as
(
  select * from tbl order by random() limit 1
)
, second_random as
(
  select * 
  from tbl 
  where page <> (select page from first_random)
  order by random() limit 1
)
select * from first_random
union
select * from second_random;

Same with a_horse_with_no_name's answer, except this is correct: http://www.sqlfiddle.com/#!1/d3e83/12

select id, page, text, rn
from (
  select id, page, text,
         row_number() over (partition by page order by random()) as rn
  from tbl
) x
where rn = 1
order by random() 
limit 2;

Opt for the latter, it has simpler execution plan

Comments

0

That may work:

SELECT * FROM
  (SELECT * FROM mydata GROUP BY page) t
ORDER BY RANDOM() LIMIT 2

1 Comment

No by this i will always get the text from the first occurrence of my main table.

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.