I'm having trouble wrapping my head around the right way to use EXISTS (and whether there is a right way to use EXISTS for this particular case, or if I'm misunderstanding it).

I'm working against the Rigor schema (defined for SQLAlchemy here: https://github.com/blindsightcorp/rigor/blob/master/lib/types.py ).

The short of it is I have three tables I care about: "percept", "annotation", and "annotation_property". Each annotation_property row has an annotation_id, and each annotation row has a percept_id.

I want to find all of the percepts that have annotations with a specific annotation_property (FOO=BAR).

Percepts may have many annotations that have a specific property, so it seems like an EXISTS should make things faster.

The (relatively slow) option is:

 SELECT DISTINCT(percept.*) FROM percept, annotation, annotation_property 
        WHERE percept.id = annotation.percept_id AND 
              annotation_property.annotation_id = annotation.id AND 
              annotation_property.name = 'FOO' AND annotation_property.value = 'BAR';

How would I use EXISTS to optimize this?

It feels like the first step is something like:

 SELECT percept.* FROM percept WHERE id IN (SELECT percept_id FROM 
        annotation, annotation_property WHERE 
        annotation.id = annotation_property.annotation_id AND
        annotation_property.name = 'FOO' AND annotation_property.value = 'BAR');

But I don't see where to go from here....

2 Answers

2

To begin with, use ANSI JOIN syntax to distinguish your join conditions from your filter conditions. The result is easier to read, and it better displays the structure of your data:

SELECT DISTINCT(percept.*)
FROM
  percept
  JOIN annotation ON percept.id = annotation.percept_id
  JOIN annotation_property ON annotation_property.annotation_id = annotation.id
WHERE
    annotation_property.name = 'FOO'
    AND annotation_property.value = 'BAR'
;

It would probably be an improvement to do as you suggest and deduplicate on the primary key column rather than on whole percept rows, but that still likely involves computing a large result set and then collapsing it. It is an alternative to an EXISTS condition, not a supplement to one.

Employing an EXISTS condition in the WHERE clause might look like this:

SELECT *
FROM percept p
WHERE EXISTS (
    SELECT *
    FROM
      annotation ann
      JOIN annotation_property anp
        ON anp.annotation_id = ann.id
    WHERE
        anp.name = 'FOO'
        AND anp.value = 'BAR'
        AND ann.percept_id = p.id
  )
;
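
As a quick sanity check, the sketch below loads a toy version of the three tables into an in-memory SQLite database and confirms that the join-plus-DISTINCT query and the EXISTS query return the same percepts. The column layout (including the `locator` column) and the sample rows are assumptions made for illustration, not the real Rigor schema, and SQLite's planner is not the one your database uses, so this shows that the results agree, not which query is faster.

```python
import sqlite3

# Toy schema: only the columns the queries touch. This layout is an
# assumption for illustration; the real Rigor tables have more columns.
SCHEMA = """
CREATE TABLE percept (id INTEGER PRIMARY KEY, locator TEXT);
CREATE TABLE annotation (
    id INTEGER PRIMARY KEY,
    percept_id INTEGER REFERENCES percept(id)
);
CREATE TABLE annotation_property (
    id INTEGER PRIMARY KEY,
    annotation_id INTEGER REFERENCES annotation(id),
    name TEXT,
    value TEXT
);
"""

JOIN_DISTINCT = """
SELECT DISTINCT percept.id
FROM percept
  JOIN annotation ON percept.id = annotation.percept_id
  JOIN annotation_property ON annotation_property.annotation_id = annotation.id
WHERE annotation_property.name = 'FOO'
  AND annotation_property.value = 'BAR'
ORDER BY percept.id
"""

EXISTS_QUERY = """
SELECT p.id
FROM percept p
WHERE EXISTS (
    SELECT 1
    FROM annotation ann
      JOIN annotation_property anp ON anp.annotation_id = ann.id
    WHERE anp.name = 'FOO'
      AND anp.value = 'BAR'
      AND ann.percept_id = p.id
)
ORDER BY p.id
"""


def build_db():
    """Create the toy tables and load a few hypothetical rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany("INSERT INTO percept VALUES (?, ?)",
                     [(1, "a"), (2, "b"), (3, "c")])
    # Percept 1 has two annotations; percepts 2 and 3 have one each.
    conn.executemany("INSERT INTO annotation VALUES (?, ?)",
                     [(10, 1), (11, 1), (20, 2), (30, 3)])
    # Percept 2's only annotation has FOO=BAZ, so it should not match.
    conn.executemany("INSERT INTO annotation_property VALUES (?, ?, ?, ?)",
                     [(100, 10, "FOO", "BAR"), (101, 11, "FOO", "BAR"),
                      (102, 20, "FOO", "BAZ"), (103, 30, "FOO", "BAR")])
    return conn


def ids(conn, sql):
    """Run a query and return the first column of every row as a list."""
    return [row[0] for row in conn.execute(sql)]


conn = build_db()
join_ids = ids(conn, JOIN_DISTINCT)
exists_ids = ids(conn, EXISTS_QUERY)
print(join_ids, exists_ids)
```

To compare actual performance, run both queries against your own data with your database's EXPLAIN facility rather than relying on a toy like this.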

2

The problem with your original query (apart from the implicit join syntax) is that the joins bring together many rows: one for every matching annotation property, not one per percept. DISTINCT then has to aggregate that whole result to remove the duplicates.

You can eliminate the duplicate-removal step by selecting from just one table:

SELECT p.*
FROM percept p
WHERE EXISTS (SELECT 1
              FROM annotation a
                   JOIN annotation_property ap
                     ON ap.annotation_id = a.id
                    AND ap.name = 'FOO'
                    AND ap.value = 'BAR'
              WHERE p.id = a.percept_id
             );

This assumes that the rows in percept do not have duplicates, but that seems like a reasonable assumption.
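
To make the row multiplication concrete, here is a minimal sketch using Python's built-in sqlite3 module. The stripped-down tables and the sample data (one percept with three matching annotations) are assumptions for illustration only. The plain join returns the percept once per matching annotation, while the EXISTS form returns it once with no DISTINCT needed, because it only ever selects rows of percept.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical minimal tables: just the columns the queries reference.
conn.executescript("""
CREATE TABLE percept (id INTEGER PRIMARY KEY);
CREATE TABLE annotation (id INTEGER PRIMARY KEY, percept_id INTEGER);
CREATE TABLE annotation_property (annotation_id INTEGER, name TEXT, value TEXT);
""")

# One percept with three annotations, each carrying FOO=BAR.
conn.execute("INSERT INTO percept VALUES (1)")
conn.executemany("INSERT INTO annotation VALUES (?, 1)",
                 [(10,), (11,), (12,)])
conn.executemany("INSERT INTO annotation_property VALUES (?, 'FOO', 'BAR')",
                 [(10,), (11,), (12,)])

# Join without DISTINCT: one output row per matching annotation property.
joined_rows = conn.execute("""
    SELECT p.id
    FROM percept p
      JOIN annotation a ON a.percept_id = p.id
      JOIN annotation_property ap
        ON ap.annotation_id = a.id
       AND ap.name = 'FOO'
       AND ap.value = 'BAR'
""").fetchall()

# EXISTS: each percept row is tested once and emitted at most once.
exists_rows = conn.execute("""
    SELECT p.id
    FROM percept p
    WHERE EXISTS (SELECT 1
                  FROM annotation a
                    JOIN annotation_property ap
                      ON ap.annotation_id = a.id
                     AND ap.name = 'FOO'
                     AND ap.value = 'BAR'
                  WHERE a.percept_id = p.id)
""").fetchall()

print(len(joined_rows), len(exists_rows))
```

With this data the join yields three rows for the single percept and the EXISTS query yields one, which is the duplication that DISTINCT was cleaning up in the original query.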
