I recently installed PostgreSQL 9.0 with PostGIS on a Windows 7 machine and loaded several tables into one schema. Now I want to run a very simple query, but it takes more than 10 minutes. I have searched in a lot of places, including stackoverflow.com, and so far I cannot understand what my mistake is.

My problem:

Tbl_Proprietarios - 230000 records

Tbl_Predio - 160000 records

SELECT id_predios 
FROM "Tbl_Predio"
where id_predios not in 
    (
    SELECT id_predios 
    FROM "Tbl_Proprietarios"
    )
;

thanks

  • Do you have an index on the id_predios column in both tables? Commented Oct 6, 2011 at 14:42
  • Yes, I have (I forgot to mention that). Commented Oct 6, 2011 at 14:42
  • Is the data type of id_predios the same (presumably integer) in both tables? Commented Oct 6, 2011 at 22:06

3 Answers

2

Try a left outer join:

SELECT "Tbl_Predio".id_predios
FROM "Tbl_Predio"
LEFT OUTER JOIN "Tbl_Proprietarios" ON "Tbl_Predio".id_predios = "Tbl_Proprietarios".id_predios
WHERE "Tbl_Proprietarios".id_predios IS NULL;

Also, make sure that there is an index on Tbl_Proprietarios.id_predios.
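
A minimal sketch of checking for and creating that index, assuming the quoted table names from the question. The index name is made up for illustration; if id_predios is already a primary key, it is already indexed.

```sql
-- List the indexes that already exist on the table.
SELECT indexname, indexdef
FROM pg_indexes
WHERE tablename = 'Tbl_Proprietarios';

-- Hypothetical index name; table and column names are taken from the question.
CREATE INDEX idx_proprietarios_id_predios
    ON "Tbl_Proprietarios" (id_predios);

-- Refresh planner statistics after bulk loading the tables.
ANALYZE "Tbl_Proprietarios";
ANALYZE "Tbl_Predio";
```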

0

In my experience with SQL Server, the IN operator is usually very slow.

EXISTS usually performs better:

SELECT id_predios 
  FROM "Tbl_Predio"
 WHERE not exists (
           SELECT 1
             FROM "Tbl_Proprietarios"
            WHERE "Tbl_Proprietarios".id_predios = "Tbl_Predio".id_predios
       )

(Note that in MySQL the opposite is often true, but it depends on the query you are running.)
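
One caveat when swapping NOT IN for NOT EXISTS: the two only return the same rows when the subquery cannot produce NULL. A small sketch with made-up inline values (not the tables from the question) shows the difference:

```sql
-- NOT IN: the NULL in the list makes the comparison unknown for every
-- non-matching id, so this returns no rows.
SELECT p.id
FROM (VALUES (1), (2), (3)) AS p(id)
WHERE p.id NOT IN (SELECT o.id FROM (VALUES (1), (NULL)) AS o(id));

-- NOT EXISTS: a NULL in the inner set never matches, so ids 2 and 3 are returned.
SELECT p.id
FROM (VALUES (1), (2), (3)) AS p(id)
WHERE NOT EXISTS (
    SELECT 1
    FROM (VALUES (1), (NULL)) AS o(id)
    WHERE o.id = p.id
);
```

If id_predios is declared NOT NULL in "Tbl_Proprietarios", both forms give the same result for the query in the question.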

Or you can use a left join:

SELECT "Tbl_Predio".id_predios
  FROM "Tbl_Predio"
  left join "Tbl_Proprietarios" on "Tbl_Proprietarios".id_predios = "Tbl_Predio".id_predios
 WHERE "Tbl_Proprietarios".id_predios is null

To find out what is happening, use EXPLAIN.
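
As a sketch, this is how the original query and the NOT EXISTS rewrite could be compared. The table names come from the question; the aliases p and o are only for illustration.

```sql
-- EXPLAIN shows the estimated plan without running the query,
-- which is useful when the query itself takes minutes.
EXPLAIN
SELECT id_predios
FROM "Tbl_Predio"
WHERE id_predios NOT IN (SELECT id_predios FROM "Tbl_Proprietarios");

-- EXPLAIN ANALYZE actually runs the query and adds real timings and row counts.
EXPLAIN ANALYZE
SELECT p.id_predios
FROM "Tbl_Predio" p
WHERE NOT EXISTS (
    SELECT 1
    FROM "Tbl_Proprietarios" o
    WHERE o.id_predios = p.id_predios
);
```

In the first plan, look at how the subquery is evaluated: a filter on a plain SubPlan (as opposed to a hashed SubPlan) usually means the inner table is rescanned for every outer row. The second query would normally be expected to show an anti join, though the exact plan depends on your data and settings.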

1 Comment

I don't know how, but both of your queries are fast, and I had already tried using EXISTS and LEFT JOINs. After 2 days on this, better to have a snack! Thank you.

0

This should run in a few seconds on a decent machine. What are your table definitions, indexes, query plan, configuration, and memory settings?

SELECT id_predios 
FROM Tbl_Predio t1
WHERE NOT EXISTS ( 
  SELECT *
  FROM Tbl_Proprietarios t2
  WHERE t2.id_predios = t1.id_predios
  )
;

EDIT: Query plan with 2 * 999K records:

 Hash Anti Join  (cost=29813.47..61463.98 rows=79 width=4) (actual time=3470.658..7142.042 rows=1045 loops=1)
   Hash Cond: (t1.id_predios = t2.id_predios)
   ->  Seq Scan on tbl_predio t1  (cost=0.00..13912.33 rows=999033 width=4) (actual time=0.038..1458.946 rows=999033 loops=1)
   ->  Hash  (cost=13911.54..13911.54 rows=998954 width=4) (actual time=3238.919..3238.919 rows=998954 loops=1)
         ->  Seq Scan on tbl_proprietarios t2  (cost=0.00..13911.54 rows=998954 width=4) (actual time=0.057..1479.807 rows=998954 loops=1)
 Total runtime: 7143.919 ms
(6 rows)

EDIT2: the test script:

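-- Start from a clean scratch schema; IF EXISTS avoids errors on the first run.
DROP SCHEMA IF EXISTS tmp CASCADE;
CREATE SCHEMA tmp;

DROP TABLE IF EXISTS tmp.Tbl_Predio CASCADE;
DROP TABLE IF EXISTS tmp.Tbl_Proprietarios CASCADE;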

CREATE TABLE tmp.Tbl_Predio ( id_predios INTEGER NOT NULL );
CREATE TABLE tmp.Tbl_Proprietarios ( id_predios INTEGER NOT NULL );

INSERT INTO tmp.Tbl_Predio ( id_predios) SELECT serie.val
    FROM generate_series(1,1000000) AS serie(val)
    ;

INSERT INTO tmp.Tbl_Proprietarios ( id_predios) SELECT serie.val
    FROM generate_series(1,1000000) AS serie(val)
    ;

DELETE FROM tmp.Tbl_Predio WHERE random() <= 0.001 ;
DELETE FROM tmp.Tbl_Proprietarios WHERE random() <= 0.001 ;

ALTER TABLE tmp.Tbl_Predio ADD PRIMARY KEY (id_predios) ;
ALTER TABLE tmp.Tbl_Proprietarios ADD PRIMARY KEY (id_predios) ;

EXPLAIN ANALYZE
SELECT id_predios
FROM tmp.Tbl_Predio t1
WHERE NOT EXISTS (
  SELECT *
  FROM tmp.Tbl_Proprietarios t2
  WHERE t2.id_predios = t1.id_predios
  )
;

8 Comments

Your suggested query is also slow :(
Which configuration setting should I check?
Your table definitions and query plan are also slow :-)
Memory (shared_buffers, effective_cache_size), file descriptors. Is your disk a real disk or a network thing?
shared_buffers = 32MB? That's just enough to start the database service... See wiki.postgresql.org/wiki/Tuning_Your_PostgreSQL_Server, and show us the result of EXPLAIN.
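
The settings mentioned in these comments can be inspected from any SQL session. A rough sketch; the work_mem value below is an arbitrary example, not a recommendation:

```sql
-- Show the current values of the memory-related settings.
SHOW shared_buffers;
SHOW effective_cache_size;
SHOW work_mem;

-- work_mem can be changed for the current session only,
-- e.g. before re-running EXPLAIN ANALYZE on the slow query.
SET work_mem = '64MB';  -- illustrative value
```

shared_buffers cannot be changed with SET; it is set in postgresql.conf and needs a server restart to take effect.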