In PostgreSQL, return rows with unique values of one column based on the minimum value of another

Question

Background

I've got this PostgreSQL join that works pretty well for me:

select  m.id,
        m.zodiac_sign,
        m.favorite_color,
        m.state,
        c.combined_id
from people."People" m
LEFT JOIN people.person_to_person_composite_crosstable c on m.id = c.id

As you can see, I'm joining two tables to bring in a combined_id, which I need for later analysis elsewhere.

The Goal

I'd like to write a query that does so by picking the combined_id that's got the lowest value of m.id next to it (along with the other variables too). This ought to result in a new table with unique/distinct values of combined_id.

The Problem

The issue is that the current query returns ~300 records, but I need it to return ~100. Why? Each combined_id has, on average, 3 different m.id's. I don't actually care about the m.id's; I care about getting a unique combined_id. Because of this, I decided that a good "selection criterion" would be to select rows based on the lowest value m.id for rows with the same combined_id.

What I've tried

I've consulted several posts on this and I feel like I'm fairly close. See for instance this one or this one. This other one does exactly what I need (with MAX instead of MIN) but he's asking for it in Unix Bash 😞

Here's an example of something I've tried:

select  m.id,
        m.zodiac_sign,
        m.favorite_color,
        m.state,
        c.combined_id
from people."People" m
LEFT JOIN people.person_to_person_composite_crosstable c on m.id = c.id
WHERE m.id IN (select min(m.id))

This returns the error ERROR: aggregate functions are not allowed in WHERE.

Any ideas?

Tim Biegeleisen · Accepted Answer · 2021-07-28 06:19:34Z

3

Postgres's DISTINCT ON is probably the best approach here:

SELECT DISTINCT ON (c.combined_id)
    m.id,
    m.zodiac_sign,
    m.favorite_color,
    m.state,
    c.combined_id
FROM people."People" m
LEFT JOIN people.person_to_person_composite_crosstable c
    ON m.id = c.id
ORDER BY
    c.combined_id,
    m.id;

As for performance, the following index on the crosstable might speed up the query:

CREATE INDEX idx ON people.person_to_person_composite_crosstable (id, combined_id);

If used, the above index should let the join happen faster. Note that I cover the combined_id column, which is required by the select.

edited Jul 28, 2021 at 6:19

answered Jul 28, 2021 at 6:03

Tim Biegeleisen

526k32 gold badges324 silver badges399 bronze badges

Sign up to request clarification or add additional context in comments.

3 Comments

logjammin Over a year ago

Clicking refresh so fast I went with your first draft that had SELECT DISTINCT ON (m.id) instead of combined_id, which yielded the same ~300 rows 😂 The new version worked perfectly. I'd tried variations on SELECT DISTINCT but didn't bump into SELECT DISTINCT ON. Is it unique to Postgres?

Tim Biegeleisen Over a year ago

@logjammin Yeah... DISTINCT ON is totally unique to Postgres, though some other databases have similar things (such as SQL Server). By the way, the second edit of my answer actually matches what you asked for in the question. But I have reverted as the first version seems to be what you really want.

logjammin Over a year ago

Oh wait no, don't revert -- maybe I messed up in my question but that second version you had (SELECT DISTINCT ON (c.combined_id)) is the one I needed, as I want a table with unique combined_id's, and whichever m.id that appears next to it is fine.

Collectives™ on Stack Overflow

In PostgreSQL, return rows with unique values of one column based on the minimum value of another

1 Answer 1

3 Comments

Your Answer

Linked

Hot Network Questions

Collectives™ on Stack Overflow

1 Answer 1

3 Comments

Your Answer

Sign up or log in

Post as a guest

Linked

Related