5

Hi all and thanks in advance for your time.

  • The ANSI SQL standard defines the set operators UNION, EXCEPT, and INTERSECT.
  • Each of these has 2 variants - DISTINCT and ALL.
  • In addition, each can be of type SET or MULTISET.

AFAIK:

  • Only PostgreSQL supports the ALL variant for EXCEPT and INTERSECT.
  • Only Oracle supports MULTISETs.

UPDATE - Schwern informed me in a comment that MariaDB also supports ALL for set operators

For this question I'll use an EXCEPT example, although the same is true for INTERSECT.

The ANSI SQL (ISO/IEC CD 9075-2) standard defines the following:

Page 454 Section 7.16 'query expression' - Item 16: If a set operator is specified in a 'query term' or a 'query expression body', then: a) Let T1, T2, and TR be respectively the first operand, the second operand, and the result of the 'query term' or 'query expression body'. b) Let TN1 and TN2 be the effective names for T1 and T2, respectively. c) If the set operator is UNION DISTINCT, EXCEPT ALL, EXCEPT DISTINCT, INTERSECT ALL, or INTERSECT DISTINCT, then each column of T1 and T2 is an operand of a grouping operation...

Page 54 Section 4.10 Collection Types - Item 6.2: MULTISET EXCEPT is an operator that computes the multiset difference of two multisets. There are two variants, ALL and DISTINCT. The variant specified by ALL places in the result a number of instances of a value, equal to the number of instances of the value in the first operand minus the number of instances of the value in the second operand. The variant specified by DISTINCT removes duplicates from the result.

With that in mind, consider the following script (PostgreSQL 12):

CREATE TABLE T1 (C1 INT);
CREATE TABLE T2 (C1 INT);

INSERT INTO T1 VALUES (1), (2), (2), (3), (3), (3);
INSERT INTO T2 VALUES (3);

SELECT  * FROM T1
EXCEPT DISTINCT
SELECT * FROM T2;

SELECT * FROM T1
EXCEPT ALL
SELECT * FROM T2;

DROP TABLE T1, T2;

The result for the EXCEPT DISTINCT (default) is as expected: 2 rows with 1, 2.

The result for the EXCEPT ALL is not what I expect. It returns 5 rows with 3, 3, 2, 2, 1.

The PostgreSQL documentation confirms this behavior, but based on the ANSI definition I would expect to get back only 3 rows with 2, 2, 1.

The first quote suggests that the DISTINCT applies a grouping operation to the rows that qualify the operation. IMHO in this example 3 should not qualify the EXCEPT, and therefore should not be returned at all.

The behavior of subtracting (m-n) elements seems to be inline with a MULTISET operation as defined in the second quote above. However, PostgreSQL documentation state it does not support even basic multisets, and of course those require that we explicit specify MULTISET EXCEPT ALL

What am I missing here?

Based on the comments and replies I received I would like to clarify that I understand what EXCEPT ALL does here. PostgreSQL documentation is clear on this. My question is if this is the correct behavior for (SET) EXCEPT ALL. My understanding is that this is what MULTISET EXCEPT ALL is for.

Thank you in advance for your thoughts.

Be safe and healthy!

7
  • 2
    Though I agree with you about the confusing wording/keywords, the EXCEPT ALL doesn't remove duplicates. Therefore, it only removes one 3 from the first query, leaving the other two 3 intact. If you have more 3s in the second query it will remove more rows. The keypoint is that when using ALL the "result sets" don't behave as "sets" anymore, but as "lists" instead. Commented Jul 3, 2020 at 21:26
  • Thanks @TheImpaler. I understand what it does, my only problem is that it doesn't seem to agree with the standard as I understand it. So either I'm wrong (probably) or PostgreSQL got it wrong. The challenge is I have no other RDBMS to compare with... none of the others support EXCEPT ALL. Thanks for your thought! Commented Jul 3, 2020 at 21:30
  • Interesting question. Your second citation of the standard says it, though: "The variant specified by ALL places in the result a number of instances of a value, equal to the number of instances of the value in the first operand minus the number of instances of the value in the second operand". So it points out they should work as lists. Commented Jul 3, 2020 at 21:35
  • The SQL:1999 standard says when describing EXCEPT ALL on section 4.16.3 "Operations involving tables" (page 47): "The primary effect of a <delete statement: positioned> on T is to delete exactly one specified row from T". So, they behave as lists. It doesn't remove all instances, but only one row at a time. Commented Jul 3, 2020 at 21:38
  • 1
    FWIW Maria DB behaves the same as Postgres. That's the only other server on DBFiddle I found which supports the syntax. Commented Jul 3, 2020 at 23:30

2 Answers 2

1

After some more research and deciphering the standard, I believe that my original assumption was wrong, and that PostgreSQL behavior is correct.

The quote about the grouping refers to both DISTINCT and ALL. What I should have looked at is P462-463 where the standard explains:

Case:

a) If no set operator is specified, then T is the result of the specified .

b) Otherwise:

i) Let D be the degree of T.

ii) For each i, 1 (one) ≤ i ≤ D:

  1. 14 Let DTCi be the declared type of the i-th column of T.

  2. Let TCN1i be the effective name for the i-th column of T1.

  3. Let TCN2i be the effective name for the i-th column of T2.

  4. Let ET1 be a of the form SELECT CAST(TCN11 AS DTC1), CAST(TCN12 AS DTC2), ..., CAST(TCN1D AS DTCD) FROM TN1

  5. Let ET2 be a of the form SELECT CAST(TCN21 AS DTC1), CAST(TCN22 AS DTC2), ..., CAST(TCN2D AS DTCD) FROM TN2

iii) T contains the following rows:

  1. Let R be a row that is a duplicate of some row in ET1 or of some row in ET2 or both. Let m be the number of duplicates of R in ET1 and let n be the number of duplicates of R in ET2, where m ≥ 0 and n ≥ 0.

  2. If DISTINCT is specified or implicit, then

Case:

A) If UNION is specified, then

Case:

I) If m > 0 or n > 0, then T contains exactly one duplicate of R.

II) Otherwise, T contains no duplicate of R.

B) If EXCEPT is specified, then

Case:

I) If m > 0 and n = 0, then T contains exactly one duplicate of R.

II) Otherwise, T contains no duplicate of R.

C) If INTERSECT is specified, then

Case:

I) If m > 0 and n > 0, then T contains exactly one duplicate of R.

II) Otherwise, T contains no duplicates of R.

3) If ALL is specified, then

Case:

A) If UNION is specified, then the number of duplicates of R that T contains is (m + n).

B) If EXCEPT is specified, then the number of duplicates of R that T contains is the maximum of (m – n) and 0 (zero).

C) If INTERSECT is specified, then the number of duplicates of R that T contains is the minimum of m and n.

Sign up to request clarification or add additional context in comments.

Comments

0

EXCEPT will provide you the distinct result eliminating the values present in the the table you validate against. It is similar to MINUS operator in relational algebra. Lets take your example,

CREATE TABLE T1 (C1 INT);
CREATE TABLE T2 (C1 INT);

INSERT INTO T1 VALUES (1), (2), (2), (3), (3), (3);
INSERT INTO T2 VALUES (3);

EXCEPT will provide you the distinct values,

enter image description here

EXCEPT all will provide you the result eliminating the no. of occurrences of a value. In T2 table, you have value 3 with only one row, but in T1 table you have three rows with the value 3. When you use EXCEPT ALL it will just ignore only one occurrence present in the second table and provide you the result.

enter image description here

Now we can try inserting three rows in the T2 table of value 3 and check the result.

INSERT INTO T1 VALUES (1), (2), (2), (3), (3), (3);
INSERT INTO T2 VALUES (3), (3), (3);

Your result of EXCEPT remains same as we have value 3 alone.

But the result of EXCEPT ALL is below, which eliminated the 3 occurrences of the rows with value 3

enter image description here

1 Comment

Thank you for your thoughts Jim. I understand WHAT it does, my question is SHOULD it be that way? If I understand the ANSI SQL standard specifications, this behavior is what MULTISET EXCEPT ALL is for, not for (SET) EXCEPT ALL.

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.