How to create an index in PostgreSQL for regexp_matches?

Question

I have a table product:

product_id | desciption                                     
============================================================
322919     | text {add}185{/add} text                       
322920     | text {add}184{/add} text {add}185{/add} text   
322921     | text {add}185{/add} text {add}187{/add} text

A SQL query with LIKE is very slow:

SELECT product_id, desciption 
FROM product 
WHERE LOWER(desciption) like '%{add}185{/add}%'
> Time: 340,159s

I only need to search for expressions of the form {add}185{/add}, so I need an index on this table that covers just those expressions. For example:

SELECT product_id, regexp_matches (desciption, '(\{add\}\d+\{\/add\})', 'g') 
FROM product

returns:

product_id | regexp_matches 
================================================================================
322919     | {"{add}185{/add}"}
322920     | {"{add}184{/add}"}
322920     | {"{add}185{/add}"}
322921     | {"{add}185{/add}"}
322921     | {"{add}187{/add}"}

Which kind of index is best for this kind of lookup?
Which expression should I use in the WHERE clause so that the index is used?

You should look into full text indexes or GIN indexes.

– Gordon Linoff, Commented Feb 26, 2020 at 12:55

jjanes · Accepted Answer · 2020-02-26 15:22:01Z

10

The easiest solution is just to build a pg_trgm index.

 create extension pg_trgm;
 create index on product using gin (description gin_trgm_ops);

Then you can use the same query, only remove the LOWER and change the LIKE to ILIKE.
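
As a sketch of the rewritten query: this assumes the column is spelled description, as in the index definition above. The table in the question spells it desciption, so adjust the name to match your schema.

```sql
-- Same search as in the question, without LOWER() and with ILIKE,
-- so the planner can consider the gin_trgm_ops index on description.
SELECT product_id, description
FROM product
WHERE description ILIKE '%{add}185{/add}%';
```

The tags and digits have no letter case, so plain LIKE should work just as well with this index. ILIKE only preserves the case-insensitive behaviour of the original query.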

That should probably be good enough, but if it isn't you can make a more targeted index. You will need to create a helper function to do an aggregation, as you can't put an aggregate directly into a functional index.

create function extract_tokens(text) returns text[] immutable language sql as $$ 
   -- collect the first capture group (the digits) of every match into one array
   select array_agg(regexp_matches[1]) from 
      regexp_matches ($1, '\{add\}(\d+)\{\/add\}','g') 
$$;

Note that I moved the capturing parentheses inward, so they capture only the digits and not the surrounding tags, which are just noise. The fact that there was a match is evidence that the tags were there; we don't need to see them.
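
Before building an index on the helper, it may be worth calling it on a literal to see what it returns. This is only a sanity check that reuses a sample description from the question:

```sql
-- Call the helper on one of the sample descriptions from the question.
-- The result should be a text[] holding the extracted numbers 184 and 185.
SELECT extract_tokens('text {add}184{/add} text {add}185{/add} text');

-- A description with no tags gives NULL (array_agg over zero rows),
-- so a containment test with @> does not match such a row.
SELECT extract_tokens('text without tags');
```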

create index on product using gin (extract_tokens(description));

select * from product where extract_tokens(description) @> ARRAY['185'];
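
To check whether the planner actually uses the expression index, the query can be run under EXPLAIN. This is a sketch that assumes the function and index above exist. The WHERE clause has to use the same expression as the index, extract_tokens(description), and the plan you get depends on table size and statistics.

```sql
-- Inspect the plan; with the GIN index in place one would hope to see
-- a bitmap scan on that index rather than a sequential scan.
EXPLAIN (ANALYZE, BUFFERS)
SELECT product_id, description
FROM product
WHERE extract_tokens(description) @> ARRAY['185'];

-- The default GIN operator class for arrays also supports the overlap
-- operator &&, for example rows tagged with 184 or 185.
SELECT product_id, description
FROM product
WHERE extract_tokens(description) && ARRAY['184', '185'];
```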

Perfect. > Time: 0,253s

– Zamkovyi Volodymyr, Commented over a year ago

hoangnh · Answer · 2020-02-26 08:21:37Z

0

To speed up the search you need an index on the description column.

With LIKE, remember that an ordinary B-tree index can only be used when the wildcard is at the end of the pattern, i.e. for a prefix search:

SELECT product_id, desciption FROM product WHERE LOWER(desciption) like '{add}185{/add}%'

So the query in the question, whose pattern starts with %, cannot use such an index.
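
For completeness, here is a sketch of the kind of index this answer has in mind. It assumes the column is named description, and the index name is made up for illustration. The text_pattern_ops operator class lets a B-tree serve left-anchored LIKE patterns when the database does not use the C locale. It helps only prefix searches, so it does not speed up the '%...%' query from the question.

```sql
-- Hypothetical index name; the expression matches the LOWER() call in the query.
CREATE INDEX product_description_lower_prefix_idx
    ON product (lower(description) text_pattern_ops);

-- A left-anchored pattern that such an index can serve.
SELECT product_id, description
FROM product
WHERE lower(description) LIKE '{add}185{/add}%';
```

B-tree entries have a size limit, so this approach may fail outright on very long descriptions.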
