1

I have two tables wherein one of the columns contain the data which is comma separated. I need to compare both the columns. The comma separated fields can be in any order, but the order is irrelevant.

create table x_a (id1 integer, shared text);
create table x_b (id1 integer, shared text);

insert into x_a values 
  (1, 'A,B,C,D,E')
, (2, 'A,B,C,D,E');

insert into x_b values 
  (1, 'B,A,C,E,D')
, (2, 'B,A,C,E');

I used below query, but it is not returning any output:

select a.id1,b.id1, a.shared, b.shared
from x_a a ,x_b b
where a.id1 = b.id1 
and regexp_split_to_array(LOWER(a.shared),',')
  = regexp_split_to_array(LOWER(b.shared),',')

I cannot use the operator && as it will return id=2 which is wrong as the "Shared" column is not the exact copy.

2
  • because regexp_split_to_array(LOWER(a.shared),',') != regexp_split_to_array(LOWER(b.shared),','), since abcde != baced and abcde != bace , respectively. Commented Mar 19, 2019 at 10:13
  • 1
    Change your database/table design. Storing comma separated values violates the fundamental rules of relational databases. Commented Mar 19, 2019 at 10:38

2 Answers 2

3

I cannot use the operator && as it will return id=2 which is wrong ...

But you can use the array operators @> and <@ like this:

SELECT id1, a.shared AS a_shared, b.shared AS b_shared
FROM   x_a a
JOIN   x_b b USING (id1)
WHERE  string_to_array(a.shared, ',') @> string_to_array(b.shared, ',')
AND    string_to_array(a.shared, ',') <@ string_to_array(b.shared, ',');

If A contains B, and B contains A, both are equal - ignoring duplicates.

You might want to store (sorted) arrays to begin with - or normalize your DB design with a 1:n relationship.

If your elements are integer numbers, consider the additional intarray module for superior performance. See:

Sign up to request clarification or add additional context in comments.

Comments

0

The = for array depends on the order of the element, so you need to sort the elements when you turn them into an array. I think the easiest way is to do that with your own function.

create function string_to_array_sorted(p_input text)
  returns text[]
as
$$
  select array_agg(t.x order by t.x)
  from unnest(string_to_array(lower(p_input), ',')) as t(x);
$$
language sql
immutable;

Note that string_to_array() is much more efficient than regexp_split_to_array() if you don't need a regex.

With the above function your query can be written as:

select a.id1,b.id1, a.shared, b.shared
from x_a a 
  join x_b b on a.id1 = b.id1 
where string_to_array_sorted(a.shared) = string_to_array_sorted(b.shared);

Note that I replaced the ancient, outdated and fragile implicit join with a "modern" (30 years old) explicit JOIN operator.

You can create an index to speed this up:

create index on x_a (id1, string_to_array_sorted(shared));
create index on x_b (id1, string_to_array_sorted(shared));

4 Comments

Thank you, it worked. But I do have Crores of data in the table, so I would have preferred avoiding a function call
You already have a function call (regexp_split_to_array) - but you have to sort the elements somehow. Whether you do that in a function or "inline" doesn't really matter. The overhead of a language sql (especially when declared immutable) is next to none compared to "inlining" the same functionality (where you would need an additional group by) statement. The function gives you the ability to create an index on those values. If you really have millions of rows, this might actually be faster than anything you "inline" in your statement
This function is returning the output in Curly braces. basically it is converting the string to an array. For example string " A,B,D,C,E' is getting converted to array and we are getting the output as " {a,b,c,d,e}
@PreetiKaur: yes, it's returning an array - the same that regexp_split_to_array returns

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.