1

I have a one to many relationship. In this case, it's a pipelines entity that can have many segments. The segments entity has a column to list the wells associated with this pipeline. This column is purely informational, and is only updated from a regulatory source as a comma separated list, so the data type is text.

What I want to do is to list all the pipelines and show the segment column that has the most associated wells. Each well is identified with a standardized land location (text is the same length for each well). I am also doing other aggregate functions on the segments, so my query looks something like this (I have to simplify it because it's pretty large):

SELECT pipelines.*, max(segments.associated_wells), min(segments.days_without_production), max(segments.production_water_m3)
FROM pipelines
JOIN segments ON segments.pipeline_id = pipelines.id
GROUP BY pipelines.id

This selects the associated_wells that has the highest alphabetical value, which makes sense, but is not what I want.

max(length(segments.associated_wells)) will select the record I want, but only show the length. I need to show the column value.

How can I aggregate based on the string length but show the value?

Here's an example of what I am expecting:

Segment entity:

| id | pipeline_id | associated_wells         | days_without_production | production_water_m3 |
|----|-------------|--------------------------|-------------------------|---------------------|
| 1  | 1           | 'location1', 'location2' | 30                      | 2.3                 |
| 2  | 1           | 'location1'              | 15                      | 1.4                 |
| 3  | 2           | 'location1'              | 20                      | 1.8                 |

Pipeline entity:

| id | name        |
|----|-------------|
| 1  | 'Pipeline1' |
| 2  | 'Pipeline2' |
|    |             |

Desired Query Result:

| id | name        | associated_wells         | days_without_production | production_water_m3 |
|----|-------------|--------------------------|-------------------------|---------------------|
| 1  | 'Pipeline1' | 'location1', 'location2' | 15                      | 2.3                 |
| 2  | 'Pipeline2' | 'location1'              | 20                      | 1.8                 |
|    |             |                          |                         |                     |

2 Answers 2

1

If I understand correctly, you want DISTINCT ON:

SELECT DISTINCT ON (p.id) p.*, s.*
FROM pipelines p JOIN
     segments s
     ON s.pipeline_id = p.id
ORDER BY p.id, LENGTH(s.associated_wells) DESC;
Sign up to request clarification or add additional context in comments.

3 Comments

not quite, I realized in my question I forgot to add the group by clause in the query example. I am doing aggregates on many columns, which is why I included other columns in the example.
@Sehael . . . You are returning one row per pipeline.id in your query, which is what this does. It seems like what you are asking for.
take a look at the example I posted. I want to take the values of some columns from different records.
0

Keep normalising and verticalise the locations/associated wells, by cross joining with a series of integers, and then group twice:

WITH
segment(seg_id,pipeline_id,associated_wells,days_without_production,production_water_m3) AS (
          SELECT 1,1,'location1, location2',30,2.3
UNION ALL SELECT 2,1,'location1',15,1.4
UNION ALL SELECT 3,2,'location1',20,1.8
)
,
pipeline(pipeline_id,name) AS (
          SELECT 1,'Pipeline1'
UNION ALL SELECT 2,'Pipeline2'
)
,
i(i) AS (
          SELECT 1
UNION ALL SELECT 2
UNION ALL SELECT 3
)
,
location AS (
  SELECT
    seg_id
  , i AS loc_id
  , SPLIT_PART(associated_wells,', ',i) AS location
  FROM segment CROSS JOIN i
  WHERE SPLIT_PART(associated_wells,',' ,i) <>''
)
,
pregroup AS (
  SELECT
    segment.pipeline_id
  , location.location
  , MIN(days_without_production) AS days_without_production
  , MAX(production_water_m3)     AS production_water_m3
  FROM segment
  JOIN pipeline USING(pipeline_id)
  JOIN location USING(seg_id)
  GROUP BY 1,2 
)
SELECT
  pipeline_id
, STRING_AGG(location,',')     AS locations
, MIN(days_without_production) AS days_without_production
, MAX(production_water_m3)     AS production_water_m3
FROM pregroup
GROUP BY 1;
 pipeline_id |      locations      | days_without_production | production_water_m3 
-------------+---------------------+-------------------------+---------------------
           1 | location1,location2 |                      15 |                 2.3
           2 | location1           |                      20 |                 1.8

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.