3

I wish to compare two arrays in a Postgres query, returning true when the first array is embedded within the second array. The smaller array can occur at any point within the larger one. It's probably best shown with an example. For the following, *cmp* is the magical operator that I'm hoping to find.

{b,c}   *cmp* {a,b,b,c,d} -- true
{b,d}   *cmp* {a,b,b,c,d} -- false
{a,b,b} *cmp* {a,b,b,c,d} -- true
{a,b}   *cmp* {a,b,b,c,d} -- true
{a,b,c} *cmp* {a,b,b,c,d} -- false

I know of the <@ operator, which is a good start, but it does not take the order of elements into account.

   {b,d} <@ {a,b,b,c,d} -- true, but I want false

I have a workaround in my code, which is quite ugly (Perl's DBD::Pg uses '?' as a placeholder):

array_values::text similar to '%({|,)' || ? || '(,|})%'

It seems to work, but I'd love to be able to use an index here. It will also fall over whenever quotes are used in the text representation, but fortunately that won't happen in my use case. Am I missing a trick?
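To illustrate the quoting caveat, here is a minimal sketch (the sample values are made up for illustration): when an element contains a space, comma, brace or quote, the text form of the array wraps it in double quotes, so a pattern built from the raw values would no longer line up with it.

```sql
-- Illustrative values only: the element 'a b' contains a space,
-- so it is double-quoted in the array's text representation.
SELECT ARRAY['a b', 'c']::text;   -- {"a b",c}
```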

EDIT

I probably should have given better examples. Here are some more:

{bb,c}   *cmp* {a,b,bb,c,d} -- true
{b,c}    *cmp* {a,b,bb,c,d} -- false
{a,b,bb} *cmp* {a,b,bb,c,d} -- true
{a,b,b}  *cmp* {a,b,bb,c,d} -- false
{c,d}    *cmp* {a,b,bb,c,d} -- true
2
  • I cannot think of anything smarter than a substring match either. That can be made fast with a trigram index. Commented Nov 12, 2019 at 11:38
  • You can write a function that doesn't use string concatenation, but I don't see a way to make that use an index. Commented Nov 12, 2019 at 14:22

2 Answers

2

You can do this without comparing text versions, though I'm not sure the performance will be better. Basically, check with the @> operator as a (hopefully) fast fail, then find every position at which the first item of the candidate array occurs in the test array. For each such position, grab a slice of the test array that starts there and has the same length as the candidate, and see whether it equals the candidate.

CREATE TABLE test (a text[]);
INSERT INTO test VALUES ('{bb,c}'), ('{b,c}'), ('{a,b,bb}'), ('{a,b,b}'), ('{c,d}');
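SELECT a,
       -- Fast fail: every element of a must occur somewhere in the test array.
       '{a, b, bb, c, d, b}'::text[] @> a AND (
         -- For each position x where a[1] occurs, compare the slice of the
         -- same length as a, starting at x, with a itself.
         SELECT bool_or(
               ('{a, b, bb, c, d, b}'::text[])[x:(x+array_length(a,1) - 1)] = a
               )
         FROM unnest(array_positions('{a, b, bb, c, d, b}'::text[], a[1])) as pos(x)
       )
FROM test;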
    a     | ?column?
----------+----------
 {bb,c}   | t
 {b,c}    | f
 {a,b,bb} | t
 {a,b,b}  | f
 {c,d}    | t

I added an extra 'b' to the test array so array_positions would return more than one result for the second test.
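If the check is needed in more than one query, the same logic could be wrapped in a function. This is only a sketch: the name contains_subarray and its parameter names are made up here, and it uses EXISTS in place of bool_or, so it returns false rather than NULL when the first element is never found.

```sql
-- Hypothetical helper: true when needle occurs as a contiguous run in haystack.
CREATE FUNCTION contains_subarray(haystack text[], needle text[])
RETURNS boolean
LANGUAGE sql IMMUTABLE
AS $$
  SELECT haystack @> needle          -- fast fail on plain containment
     AND EXISTS (
           SELECT 1
           -- every position at which the first element of needle occurs
           FROM unnest(array_positions(haystack, needle[1])) AS pos(x)
           -- slice of the same length as needle, starting at that position
           WHERE haystack[x:(x + array_length(needle, 1) - 1)] = needle
         );
$$;

-- Usage against the test table created above.
SELECT a, contains_subarray('{a, b, bb, c, d, b}'::text[], a) FROM test;
```

With the rows inserted above, this should give the same t/f pattern as the original query. Like that query, it cannot use an index by itself.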

1 Comment

Wow, that's quite an involved process! Similar in intent to @jjanes's solution, but this one benefits from being general purpose, whereas jjanes's will fail for elements with quotes embedded in them.
0

Use @> so that the query can use an index (for example, a GIN index on the array column), then recheck the order using your current method.

array_values::text similar to '%({|,)' || $1 || '(,|})%' and array_values @> ('{'||$1||'}')::text[]

How fast this is depends on how many rows have all the right values but not in the right order.

I use $1 rather than ? so that you don't have to specify the same parameter twice from your Perl code.
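As a rough sketch of how this could be set up (the table name my_table, the index name and the prepared statement name are assumptions for illustration, not taken from the question), a GIN index on the array column supports the @> part, and the text comparison then only needs to filter the rows that survive it:

```sql
-- Hypothetical table and index names.
CREATE TABLE my_table (array_values text[]);
CREATE INDEX my_table_array_values_gin ON my_table USING gin (array_values);

-- $1 is the comma-separated element list, used in both conditions.
PREPARE find_rows(text) AS
SELECT *
FROM my_table
WHERE array_values @> ('{' || $1 || '}')::text[]                 -- containment, can use the GIN index
  AND array_values::text SIMILAR TO '%({|,)' || $1 || '(,|})%';  -- order recheck on the text form

EXECUTE find_rows('b,c');
```

On a small table the planner may still prefer a sequential scan, so EXPLAIN is the way to confirm whether the index is actually used.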
