Let's assume I have a table called comments with many records, and each record includes only a text body:

CREATE TABLE comments(id INT NOT NULL, body TEXT NOT NULL, PRIMARY KEY(id));
INSERT INTO comments VALUES (generate_series(1,100), md5(random()::text));

Now, I have an input array of N substrings of arbitrary length. For example:

abc
xyzw
123456
not_found

For each input value, I want to return all rows that match a certain condition.

For example, given that the table includes the following records:

| id | body        |
| -- | ----------- |
| 11 | abcd1234567 |
| 22 | unkown12    |
| 33 | abxyzw      |
| 44 | 12345abc    |
| 55 | found       |

I need a query that returns the following result:

| substring | comments.id | comments.body |
| --------- | ----------- | ------------- |
| abc       | 11          | abcd1234567   |
| abc       | 44          | 12345abc      |
| xyzw      | 33          | abxyzw        |
| 123456    | 11          | abcd1234567   |

So far, I have this SQL query:

SELECT substrings, comments.id, comments.body
FROM unnest(ARRAY[
  'abc',
  'xyzw',
  '123456',
  'not_found'
]) AS substrings
JOIN comments ON comments.id IN (
  SELECT id
  FROM comments as inner_comments
  WHERE inner_comments.body LIKE ('%' || substrings || '%')
);

But the database client gets stuck for more than 10 minutes. Am I missing something about joins?

Please note that this is a simplified example of my problem. My current check on the comment is not a LIKE statement, but a complex switch-case statement of different functions (fuzzy matching).

2 Answers

The detour through IN is unnecessary and, unless the optimizer can rewrite it (which it likely cannot), adds overhead. Try whether it gets better without it:

SELECT un.substring,
       comments.id,
       comments.body
       FROM unnest(ARRAY['abc',
                         'xyzw',
                         '123456',
                         'not_found']) un (substring)
       INNER JOIN comments
                  ON comments.body LIKE ('%' || un.substring || '%');

Still, an ordinary B-tree index cannot be used here because of the wildcard at the beginning of the pattern. You might want to look at Full Text Search and see what options it offers to improve the situation.
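
A plain index cannot serve a pattern that starts with a wildcard, but PostgreSQL's pg_trgm extension provides trigram indexes that can. A minimal sketch, assuming the extension can be installed on your server; the index name comments_body_trgm_idx is made up for illustration:

```sql
-- Assumption: the pg_trgm contrib extension is available on the server.
CREATE EXTENSION IF NOT EXISTS pg_trgm;

-- Hypothetical index name; gin_trgm_ops is the operator class shipped with pg_trgm.
CREATE INDEX comments_body_trgm_idx
    ON comments
 USING gin (body gin_trgm_ops);

-- Same join as above; inspect the plan to see whether the index is actually chosen.
EXPLAIN
SELECT un.substring,
       comments.id,
       comments.body
FROM unnest(ARRAY['abc', 'xyzw', '123456', 'not_found']) AS un (substring)
INNER JOIN comments
        ON comments.body LIKE ('%' || un.substring || '%');
```

Whether the planner picks the index depends on table size and statistics; with only a handful of rows it will usually prefer a sequential scan. Patterns that yield no trigrams (for example, one or two characters between the wildcards) degrade to a full index scan.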

1 Comment

Thanks, this is exactly what I needed. I didn't know that you could name columns after an unnest call. Don't worry about the index; the LIKE condition is not actually used in my real-world problem.

Basically you are performing FULLTEXT search in a column that most likely doesn't have a FULLTEXT index.

A first step you could try would be to put a FULLTEXT index on your "body" column (see details here) and then perform the search using CONTAINS. Quite honestly, though, since you want to perform fuzzy matching, you cannot rely on SQL server to perform the search; it would just not work properly. You will need an indexing service such as ElasticSearch, CloudSearch, Azure Search, etc.
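
The syntax in the question (generate_series, unnest, ::text) suggests PostgreSQL, where there is no CONTAINS; the rough equivalent would be a GIN index over to_tsvector and the @@ match operator. A minimal sketch under that assumption; the index name and the term alias are made up for illustration:

```sql
-- Hypothetical index name; 'simple' is a built-in configuration without stemming.
CREATE INDEX comments_body_fts_idx
    ON comments
 USING gin (to_tsvector('simple', body));

-- Join each search term against the indexed expression.
-- The expression must match the one used in the index for it to be usable.
SELECT un.term,
       comments.id,
       comments.body
FROM unnest(ARRAY['abc', 'xyzw', '123456', 'not_found']) AS un (term)
INNER JOIN comments
        ON to_tsvector('simple', comments.body)
           @@ plainto_tsquery('simple', un.term);
```

Note that full text search matches whole tokens, not arbitrary substrings, so 'abc' would not match a body like 'abcd1234567'. It is therefore not a drop-in replacement for the LIKE '%...%' condition in the question.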

2 Comments

I don't think that this is the problem. If I run a query for a single value (not an array), like SELECT id, body FROM comments WHERE comments.body LIKE ('%abcdef%');, it takes no more than 100ms. I don't know, but shouldn't it take no more than 1 second for 4 records, instead of more than 10 minutes?
Please check @sticky bit's solution, but most likely you only have a couple of entries in your table. If it scales, your performance will deteriorate fast, even for a simple query using LIKE...
