Background

I've got this PostgreSQL join that works pretty well for me:

select  m.id,
        m.zodiac_sign,
        m.favorite_color,
        m.state,
        c.combined_id
from people."People" m
LEFT JOIN people.person_to_person_composite_crosstable c on m.id = c.id

As you can see, I'm joining two tables to bring in a combined_id, which I need for later analysis elsewhere.

The Goal

I'd like a query that returns one row per combined_id: the row with the lowest m.id for that combined_id (along with the other columns too). The result should be a new table in which every combined_id is unique.

The Problem

The issue is that the current query returns ~300 records, but I need it to return ~100. Why? Each combined_id has, on average, 3 different m.id's. I don't actually care about the m.id's; I care about getting a unique combined_id. Because of this, I decided that a good "selection criterion" would be to select rows based on the lowest value m.id for rows with the same combined_id.

What I've tried

I've consulted several posts on this and I feel like I'm fairly close. One of them does exactly what I need (with MAX instead of MIN), but the asker wants it in Unix Bash 😞

Here's an example of something I've tried:

select  m.id,
        m.zodiac_sign,
        m.favorite_color,
        m.state,
        c.combined_id
from people."People" m
LEFT JOIN people.person_to_person_composite_crosstable c on m.id = c.id
WHERE m.id IN (select min(m.id))

This returns the error ERROR: aggregate functions are not allowed in WHERE.

Any ideas?

1 Answer

Postgres's DISTINCT ON is probably the best approach here:

SELECT DISTINCT ON (c.combined_id)
    m.id,
    m.zodiac_sign,
    m.favorite_color,
    m.state,
    c.combined_id
FROM people."People" m
LEFT JOIN people.person_to_person_composite_crosstable c
    ON m.id = c.id
ORDER BY
    c.combined_id,
    m.id;
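For comparison, here is a sketch of the same "lowest m.id per combined_id" selection written with a window function, which also works in databases that lack DISTINCT ON. The alias ranked and the column rn are illustrative names I've made up; everything else comes from the query above.

```sql
-- Sketch only: number the rows within each combined_id by ascending m.id,
-- then keep the first row of each group.
-- "ranked" and "rn" are hypothetical names chosen for this example.
SELECT id, zodiac_sign, favorite_color, state, combined_id
FROM (
    SELECT
        m.id,
        m.zodiac_sign,
        m.favorite_color,
        m.state,
        c.combined_id,
        ROW_NUMBER() OVER (
            PARTITION BY c.combined_id  -- one group per combined_id
            ORDER BY m.id               -- lowest m.id gets rn = 1
        ) AS rn
    FROM people."People" m
    LEFT JOIN people.person_to_person_composite_crosstable c
        ON m.id = c.id
) ranked
WHERE rn = 1
ORDER BY combined_id;
```

In both versions, people with no match in the crosstable all share a NULL combined_id because of the LEFT JOIN, so they collapse into a single row. If that is not wanted, an inner JOIN would drop them instead.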

As for performance, the following index on the crosstable might speed up the query:

CREATE INDEX idx ON people.person_to_person_composite_crosstable (id, combined_id);

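If the planner uses it, the above index should speed up the join lookup on id. Note that it also covers the combined_id column, which the select needs, so Postgres may be able to read that value from the index instead of visiting the table.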
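Whether the index actually gets used depends on table sizes and statistics, so it is worth checking the plan. A minimal sketch, assuming the index above has been created; note that EXPLAIN with ANALYZE really executes the query:

```sql
-- Sketch only: show the actual execution plan and timings for the query.
-- Look for the index named idx in the plan output; with tables this small,
-- a sequential scan may still be chosen.
EXPLAIN (ANALYZE, BUFFERS)
SELECT DISTINCT ON (c.combined_id)
    m.id,
    m.zodiac_sign,
    m.favorite_color,
    m.state,
    c.combined_id
FROM people."People" m
LEFT JOIN people.person_to_person_composite_crosstable c
    ON m.id = c.id
ORDER BY
    c.combined_id,
    m.id;
```

With only a few hundred rows, as in the question, any difference is likely to be negligible either way.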


3 Comments

Clicking refresh so fast I went with your first draft that had SELECT DISTINCT ON (m.id) instead of combined_id, which yielded the same ~300 rows 😂 The new version worked perfectly. I'd tried variations on SELECT DISTINCT but didn't bump into SELECT DISTINCT ON. Is it unique to Postgres?
@logjammin Yeah... DISTINCT ON is a Postgres extension rather than standard SQL, though other databases can get the same result in other ways (SQL Server, for example, with ROW_NUMBER). By the way, the second edit of my answer actually matches what you asked for in the question, but I have reverted it because the first version seems to be what you really want.
Oh wait no, don't revert -- maybe I messed up in my question but that second version you had (SELECT DISTINCT ON (c.combined_id)) is the one I needed, as I want a table with unique combined_id's, and whichever m.id that appears next to it is fine.
