I have a query in PostgreSQL that accesses a big table using a LIKE clause for pattern matching:
Table "rmx_service_schema.document"
Column | Type | Collation | Nullable | Default | Storage | Compression | Stats target | Description
-----------------------+-----------------------------+-----------+----------+---------+----------+-------------+--------------+-------------
id | character varying(36) | | not null | | extended | | |
file_name | character varying(512) | | not null | | extended | | |
...
The query has very good selectivity:
select count(*) from RMX_SERVICE_SCHEMA.DOCUMENT d1_0;
count
--------
630015
select count(*) from RMX_SERVICE_SCHEMA.DOCUMENT d1_0 where d1_0.FILE_NAME LIKE 'sunet_attachments/20240207.xml';
count
-------
1
The application sometimes uses % at the end of the pattern, so replacing LIKE with = is not always possible.
I have created an index on that column with the matching operator definition:
CREATE INDEX rse_tmp_doc_file_name ON RMX_SERVICE_SCHEMA.DOCUMENT (file_name varchar_pattern_ops);
But still, the pattern matching query does a Seq Scan:
EXPLAIN ANALYZE select id from RMX_SERVICE_SCHEMA.DOCUMENT d1_0 where d1_0.FILE_NAME LIKE 'sunet_attachments/20240207.xml';
QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------
Gather (cost=1000.00..129562.16 rows=63 width=37) (actual time=81.075..90.793 rows=1 loops=1)
Workers Planned: 4
Workers Launched: 4
-> Parallel Seq Scan on document d1_0 (cost=0.00..128555.86 rows=16 width=37) (actual time=72.099..77.022 rows=0 loops=5)
Filter: ((file_name)::text ~~ 'sunet_attachments/20240207.xml'::text)
Rows Removed by Filter: 126007
Planning Time: 0.285 ms
Execution Time: 90.814 ms
(8 rows)
If I replace LIKE with =, it uses the index:
EXPLAIN ANALYZE select id from RMX_SERVICE_SCHEMA.DOCUMENT d1_0 where d1_0.FILE_NAME ='sunet_attachments/20240207.xml';
QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------
Index Scan using rse_tmp_doc_file_name on document d1_0 (cost=0.55..8.57 rows=1 width=37) (actual time=0.025..0.026 rows=1 loops=1)
Index Cond: ((file_name)::text = 'sunet_attachments/20240207.xml'::text)
Planning Time: 0.053 ms
Execution Time: 0.034 ms
(4 rows)
Did I miss some steps required to make this btree index usable for pattern-matching queries?
Indexes:
"pk_document" PRIMARY KEY, btree (id)
....
"rse_tmp_doc_file_name" btree (file_name varchar_pattern_ops)
I was expecting the index I created to be used for pattern matching too, as long as selectivity is good and the pattern doesn't start with a wildcard.
I have tried SET enable_seqscan=off, as suggested. The plan changed, but is still very slow:
EXPLAIN ANALYZE select id from RMX_SERVICE_SCHEMA.DOCUMENT d1_0 where d1_0.FILE_NAME LIKE 'sunet_attachments/20240207.xml';
QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------------------------
Gather (cost=31133.85..158837.55 rows=63 width=37) (actual time=300.945..314.717 rows=1 loops=1)
Workers Planned: 4
Workers Launched: 4
-> Parallel Bitmap Heap Scan on document d1_0 (cost=30133.85..157831.25 rows=16 width=37) (actual time=290.728..297.328 rows=0 loops=5)
Filter: ((file_name)::text ~~ 'sunet_attachments/20240207.xml'::text)
Rows Removed by Filter: 71233
Heap Blocks: exact=19555
-> Bitmap Index Scan on rse_tmp_doc_file_name (cost=0.00..30133.83 rows=355328 width=0) (actual time=149.426..149.426 rows=356167 loops=1)
Index Cond: (((file_name)::text ~>=~ 'sunet'::text) AND ((file_name)::text ~<~ 'suneu'::text))
Planning Time: 0.176 ms
Execution Time: 314.747 ms
(11 rows)
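But this plan gave me the right hint. The problem is the _ after the string sunet. In a LIKE pattern, _ is a wildcard that matches any single character, so the planner can only use the fixed prefix sunet for the index range (see the Index Cond above). The _ has to be escaped, otherwise the index condition isn't selective, since about 50% of the file_name values in the table start with sunet. With correct escaping in the SQL, the index works: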
EXPLAIN ANALYZE select id from RMX_SERVICE_SCHEMA.DOCUMENT d1_0 where d1_0.FILE_NAME LIKE 'sunet\_attachments/20240207.xml' escape '\';
QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------------------------------------
Index Scan using rse_tmp_doc_file_name on document d1_0 (cost=0.55..8.57 rows=63 width=37) (actual time=0.014..0.015 rows=1 loops=1)
Index Cond: (((file_name)::text ~>=~ 'sunet_attachments/20240207_10111647337'::text) AND ((file_name)::text ~<~ 'sunet_attachments/20240207'::text))
Filter: ((file_name)::text ~~ 'sunet\_attachments/20240207.xml'::text)
Planning Time: 0.152 ms
Execution Time: 0.024 ms
(5 rows)
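To illustrate the escaping issue described above, here is a minimal sketch in plain SQL. It assumes the default LIKE escape character (backslash) and standard_conforming_strings = on, which is the default in current PostgreSQL versions. The sample string 'sunetXattachments' and the date prefix are made up for illustration only.

```sql
-- An unescaped underscore is a single-character wildcard in LIKE,
-- so it also matches characters other than '_':
SELECT 'sunetXattachments' LIKE 'sunet_attachments';   -- true

-- Escaped, it only matches a literal underscore:
SELECT 'sunetXattachments' LIKE 'sunet\_attachments';  -- false

-- Prefix search with a trailing %, keeping the underscore literal.
-- The fixed prefix now extends up to the %, not just up to 'sunet'.
SELECT id
FROM rmx_service_schema.document
WHERE file_name LIKE 'sunet\_attachments/20240207%';

-- Escaping an arbitrary literal prefix before appending the wildcard.
-- The prefix value here is a hypothetical example; in an application
-- the same escaping would normally be done before binding the parameter.
SELECT id
FROM rmx_service_schema.document
WHERE file_name LIKE
      replace(replace(replace('sunet_attachments/20240207',
                              '\', '\\'),   -- escape the escape character first
                      '%', '\%'),           -- then the multi-character wildcard
              '_', '\_')                    -- then the single-character wildcard
      || '%';
```

Whether the planner actually picks the index for a given pattern is worth confirming with EXPLAIN, since that depends on the table statistics and on how the pattern is passed to the query.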
Could you run SET enable_seqscan = off; and then run the EXPLAIN ANALYZE with LIKE again and add the result to the question?