5

A web application can send to a function an array of arrays like

[
    [
        [1,2],
        [3,4]
    ],
    [
        [],
        [4,5,6]
    ]
]

The outer array length is n > 0. The middle arrays are of constant length, 2 in this example. And the inner arrays lengths are n >= 0.

I could string build it like this:

with t(a, b) as (
    values (1, 4), (2, 3), (1, 4), (7, 3), (7, 4)
)
select distinct a, b
from t
where
    (a = any(array[1,2]) or array_length(array[1,2],1) is null)
    and
    (b = any(array[3,4]) or array_length(array[3,4],1) is null)
    or
    (a = any(array[]::int[]) or array_length(array[]::int[],1) is null)
    and
    (b = any(array[4,5,6]) or array_length(array[4,5,6],1) is null)
;
 a | b 
---+---
 7 | 4
 1 | 4
 2 | 3

But I think I can do better like this

with t(a, b) as (
    values (1, 4), (2, 3), (1, 4), (7, 3), (7, 4)
), u as (
    select unnest(a)::text[] as a
    from (values
        (
            array[
                '{"{1,2}", "{3,4}"}',
                '{"{}", "{4,5,6}"}'
            ]::text[]
        )
    ) s(a)
), s as (
    select a[1]::int[] as a1, a[2]::int[] as a2
    from u
)
select distinct a, b
from
    t
    inner join
    s on
        (a = any(a1) or array_length(a1, 1) is null)
        and
        (b = any(a2) or array_length(a2, 1) is null)
;
 a | b 
---+---
 7 | 4
 2 | 3
 1 | 4

Notice that a text array was passed and then "casted" inside the function. That was necessary as Postgresql can only deal with arrays of matched dimensions and the passed inner arrays can vary in dimension. I could "fix" them before passing by adding some special value like zero to make them all the same length of the longest one but I think it is cleaner to deal with that inside the function.

Am I missing something? Is it the best approach?

2
  • Is the format of the 3-dimensional array wrapped in text format given or can you chose how to pass these values? And the function receiving these values is plpgsql? Postgres version? Commented Oct 9, 2013 at 23:29
  • @Erwin I can chose how to pass the values. The front end is Python so very flexible. That format is just how I managed it to work. The function is now working in sql for simplicity but it can be plpgsql or plpython. Postgresql 9.3. Commented Oct 10, 2013 at 0:05

1 Answer 1

2

I like your second approach.

SELECT DISTINCT t.*
FROM   (VALUES (1, 4), (5, 1), (2, 3), (1, 4), (7, 3), (7, 4)) AS t(a, b)
JOIN   (
   SELECT arr[1]::int[] AS a1
         ,arr[2]::int[] AS b1
   FROM   (
      SELECT unnest(ARRAY['{"{1,2}", "{3,4}"}'
                         ,'{"{}"   , "{4,5,6}"}'
                         ,'{"{5}"  , "{}"}'    -- added element to 1st dimension
                         ])::text[] AS arr     -- 1d text array
      ) sub
   ) s ON (a = ANY(a1) OR a1 = '{}')
      AND (b = ANY(b1) OR b1 = '{}')
;

Suggesting only minor improvements:

  1. Subqueries instead of CTEs for slightly better performance.

  2. Simplified test for empty array: checking against literal '{}' instead of function call.

  3. One less subquery level for unwrapping the array.

Result:

a | b
--+---
2 | 3
7 | 4
1 | 4
5 | 1

For the casual reader: Wrapping the multi-dimensional array of integer is necessary, since Postgres demands that (quoting error message):

multidimensional arrays must have array expressions with matching dimensions

An alternate route would be with a 2-dimensional text array and unnest it using generate_subscripts():

WITH a(arr) AS (SELECT '{{"{1,2}", "{3,4}"}
                        ,{"{}", "{4,5,6}"}
                        ,{"{5}", "{}"}}'::text[]   -- 2d text array
             )
SELECT DISTINCT t.*
FROM  (VALUES (1, 4), (5, 1), (2, 3), (1, 4), (7, 3), (7, 4)) AS t(a, b)
JOIN  (
   SELECT arr[i][1]::int[] AS a1
         ,arr[i][2]::int[] AS b1
   FROM   a, generate_subscripts(a.arr, 1) i       -- using implicit LATERAL
   ) s ON (t.a = ANY(s.a1) OR s.a1 = '{}')
      AND (t.b = ANY(s.b1) OR s.b1 = '{}');

Might be faster, can you test?

In versions before 9.3 one would use an explicit CROSS JOIN instead of lateral cross joining.

Sign up to request clarification or add additional context in comments.

1 Comment

Yes the generate_subscripts version is consistently faster in the real function. Execution time goes from 10 to 5 ms.

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.