0

I have the following query:

SELECT DISTINCT
  branch.id,
  branch.name,
  SUM(CAST(COALESCE(NULLIF(REPLACE(dia_anterior.col_9, ',', '.'), ''), '0.0') AS double precision)) AS total_venta,
  SUM(CAST(COALESCE(NULLIF(REPLACE(dia_anterior.col_4, ',', '.'), ''), '0.0') AS double precision)) AS total_personas,
  SUM(CAST(COALESCE(NULLIF(REPLACE(ocupacion_dia_anterior.col_3, ',', '.'), ''), '0') AS double precision)) AS total_ocupacion
FROM branches AS branch
  INNER JOIN queries AS q_dia_anterior
    ON q_dia_anterior.query_structure_id = (SELECT id FROM query_structures WHERE query_structures.slug = 'dia-anterior')
  INNER JOIN queries AS q_ocupacion_dia_anterior
    ON q_ocupacion_dia_anterior.query_structure_id = (SELECT id FROM query_structures WHERE query_structures.slug = 'ocupacion-dia-anterior')
  INNER JOIN queries AS q_ventas_x_articulo_dia_anterior
    ON q_ventas_x_articulo_dia_anterior.query_structure_id = (SELECT id FROM query_structures WHERE query_structures.slug = 'ventas-x-articulo-dia-anterior')
  INNER JOIN branch_data AS dia_anterior
    ON dia_anterior.query_id = q_dia_anterior.id
      AND dia_anterior.branch_id = branch.id
  INNER JOIN branch_data AS ocupacion_dia_anterior
    ON ocupacion_dia_anterior.query_id = q_ocupacion_dia_anterior.id
      AND ocupacion_dia_anterior.branch_id = branch.id
  INNER JOIN branch_data AS ventas_x_articulo_dia_anterior
    ON ventas_x_articulo_dia_anterior.query_id = q_ventas_x_articulo_dia_anterior.id
      AND ventas_x_articulo_dia_anterior.branch_id = branch.id
WHERE
  branch.account_id = 1
GROUP BY
  branch.id,
  branch.name
ORDER BY
  2

When I group the results to sum the columns, looks like the values from different rows are concatenating, resulting in huge values.

Am I missing something?
Why am I getting the values summed up multiple times?

5
  • I'm pretty sure you don't want SELECT DISTINCT if you're also doing a GROUP BY. Commented Apr 8, 2015 at 0:53
  • I'm actually getting the same result :/ Commented Apr 8, 2015 at 0:54
  • The GROUP BY in combination with the aggregate function SUM() is already returning a unique set of rows. So specifying DISTINCT, as you observed, doesn't alter the results, but it is an unnecessary distraction. It also causes the server to sort and unique your results unnecessarily. Sorry I couldn't help you with your main problem, though! Commented Apr 8, 2015 at 1:12
  • IIUC, "concatenation" is the wrong term here. This is rather confusing, since you also convert text to numeric data. Seems you mean "multiplication". Commented Apr 8, 2015 at 4:06
  • You are right @ErwinBrandstetter Commented Apr 8, 2015 at 16:28

2 Answers 2

2

Filling in with some educated guesses, this might be the query you want:

SELECT b.name
     , COALESCE(dia_actual.total_venta   , 0) AS total_venta
     , COALESCE(dia_actual.total_personas, 0) AS total_personas
     , COALESCE(ocupacion.total_ocupacion, 0) AS total_ocupacion
FROM   branches b
LEFT   JOIN LATERAL (
   SELECT sum(replace(NULLIF(bd.col_9, ''), ',', '.')::float) AS total_venta
        , sum(replace(NULLIF(bd.col_4, ''), ',', '.')::float) AS total_personas
   FROM   branch_data      bd
   JOIN   queries          q  ON q.id = bd.query_id
   JOIN   query_structures qs ON qs.id = q.query_structure_id
   WHERE  bd.branch_id = b.id  -- lateral reference
   AND    qs.slug = 'dia-actual'
   ) dia_actual ON true
LEFT   JOIN LATERAL (
   SELECT sum(replace(NULLIF(bd.col_3, ''), ',', '.')::float) AS total_ocupacion
   FROM   branch_data      bd
   JOIN   queries          q  ON q.id = bd.query_id
   JOIN   query_structures qs ON qs.id = q.query_structure_id
   WHERE  bd.branch_id = b.id  -- lateral reference
   AND    qs.slug = 'ocupacion-dia-anterior'
   ) ocupacion ON true
WHERE  b.account_id = 1
ORDER  BY  b.name;

Notes

  • Sum first, then join, to avoid multiplication of rows due to the proxy cross join.

  • Use LEFT JOIN to avoid losing rows it not values are found in one of the queries to the right.

  • However, since you select a single branch_id I switched to LEFT JOIN LATERAL ... ON true, assuming you are only using a small fraction of all rows.

  • No need for a more expensive CTE.

  • Also simplified your transformation from text to double precision.

Naturally, it would be better to store those numbers as a numeric data type to begin with - not as text.

Sign up to request clarification or add additional context in comments.

2 Comments

Thank you for your response. I didn't know about LEFT JOIN LATERAL ... ON true. It actually reduces the execution time using this in combination with CTEs. The data is coming from different data sources, different POS software packages, different databases, that's why I'm storing strings only.
@SaulMartínez: It is almost always better to store timestamp data as timestamps. And I don't see how CTEs could help with performance here.
0

Although I don't think this answer the question properly, I got the desired results with the following query:

WITH dia_actual AS (
SELECT
  branch_data.branch_id,
  CAST(COALESCE(NULLIF(REPLACE(branch_data.col_9, ',', '.'), ''), '0') AS double precision) AS total_venta,
  CAST(COALESCE(NULLIF(REPLACE(branch_data.col_4, ',', '.'), ''), '0') AS double precision) AS total_personas
FROM branch_data
  INNER JOIN queries AS query
    ON query.id = branch_data.query_id
  INNER JOIN query_structures AS query_structure
    ON query_structure.id = query.query_structure_id
WHERE
  query_structure.slug = 'dia-actual'
), ocupacion AS (
SELECT
  branch_data.branch_id,
  CAST(COALESCE(NULLIF(REPLACE(branch_data.col_3, ',', '.'), ''), '0') AS double precision) AS total_ocupacion
FROM branch_data
  INNER JOIN queries AS query
    ON query.id = branch_data.query_id
  INNER JOIN query_structures AS query_structure
    ON query_structure.id = query.query_structure_id
WHERE
  query_structure.slug = 'ocupacion-dia-anterior'
)
SELECT
  branch.name,
  SUM(dia_actual.total_venta) AS total_venta,
  SUM(dia_actual.total_personas) AS total_personas,
  SUM(ocupacion.total_ocupacion) AS total_ocupacion
FROM branches AS branch
  INNER JOIN dia_actual
    ON dia_actual.branch_id = branch.id
  INNER JOIN ocupacion
    ON ocupacion.branch_id = branch.id
WHERE
  branch.account_id = 1
GROUP BY
  branch.name
ORDER BY
  branch.name

Using CTEs the values are summed correctly.

1 Comment

You probably get multiplied values from this after the proxy cross-join between dia_actual and ocupacion.

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.