
I have a model representing an article:

public class Article {

    @Id
    private Integer id;
    private String title;
    private String content;
    // ...
    // plenty of other article properties, used to classify and filter over them
    // ...
    @ElementCollection
    @CollectionTable(name = "article_tags", joinColumns = @JoinColumn(name = "article_id"))
    @Column(name = "tag")
    private Set<String> tags;
}

And a specification, used among others, to build the query dynamically:

...
    public static Specification<Article> byTagAnyOf(Set<String> referenceTags) {
        return (root, query, builder) -> {
            if (CollectionUtils.isEmpty(referenceTags)) {
                return builder.conjunction();
            }
            return builder.or(referenceTags.stream()
                    .map(tag -> builder.isMember(tag, root.get("tags")))
                    .toArray(Predicate[]::new));
        };
    }
...

This solution works, but it generates a query with numerous OR clauses, each running its own subselect over the tag elements, which causes performance problems.

Is there a way to perform an array-intersection check with the Specification API, either by joining against referenceTags or by selecting with EXCEPT?

EDIT 1 (added the Hibernate-generated SQL)

Hibernate: 
    select
        article0_.id,
        article0_.title,
        article0_.content
    from
        articles article0_ 
    where
        1=1 
        and 1=1 
        and (
            article0_.author in (
                ? , ? , ?
            )
        ) 
        and (
            ? in (
                select
                    tagart1_.tag 
                from
                    article_tags tagart1_ 
                where
                    article0_.id=tagart1_.article_id
            ) 

            ...
            
            or ? in (
                select
                    tagart6_.tag 
                from
                    article_tags tagart6_ 
                where
                    article0_.id=tagart6_.article_id
            )
        ) 
        and 1=1 
        and 1=1 
        and 1=1 
        and 1=1 
        and 1=1 
        and 1=1 
        and 1=1

2 Answers


Your performance issue could have several causes, and it makes sense to investigate all of them.

You have many OR conditions because you specify them yourself with builder.or.

Honestly, I didn't understand what you mean by array intersection, but I have a question for you: have you considered using an IN clause instead of an OR over all the possible tags?

The second thing I suggest is checking the generated query. Please verify which query is actually sent to the database (you can find it in the logs if you enable SQL logging); that will make things clearer, at least for me.

I understand that you are using Specification to generate queries dynamically, and there is nothing wrong with that. However, you are also generating "unpredictable" queries, which would force you to create every possible index combination in the database (and creating an index for every combination is not realistic).

A chain of OR clauses can prevent the database from using the right indexes. Try an IN clause and let me know.

In conclusion, my suggestions are:

  1. I strongly suggest reconsidering whether you need to filter on every possible field, because this stresses the database and is the reason for your performance issues.
  2. Create the proper indexes in the database.
  3. Analyze your query to see how the database behaves.
  4. Consider using an IN clause instead of a chain of ORs.

5 Comments

Hi, Titto! Thank you for your answer. Yes, I am considering an IN clause rather than simply iterating over all possible options. I've edited my post to include the Hibernate-generated SQL. Welcome to Stack Overflow, by the way :-)
Hey @meridbt, I'm looking at the generated query you posted. It raises a few warnings for me: first the large number of AND 1=1 conditions, but I also see the OR chain nested inside an AND condition, which is very bad for performance. I think Hibernate is not generating a good query here; you should probably reconsider how the tags list is used in your model. Now I want to ask you something else: can you produce a real executable query based on the one you posted and provide the output of analyze format=json <your_query_here>, please?
1=1 is the result of builder.conjunction()
Sure, but it is still sent to the DB. Does that make sense?
@meridbt I see your final result. Congratulations, this is the way. I hope I helped guide your choice. In the end, using an IN clause instead of builder.or was the right choice.

The solution I found is quite straightforward:

...
    public static Specification<Article> byTagAnyOf(Set<String> referenceTags) {
        return (root, query, builder) -> {
            if (CollectionUtils.isEmpty(referenceTags)) {
                return builder.conjunction();
            }
            return root.joinSet("tags").in(referenceTags);
        };
    }
...

and the resulting generated SQL:

Hibernate: 
    select
        article0_.id,
        article0_.title,
        article0_.content
    from
        articles article0_ 
    inner join
        article_tags tagart1_
            on article0_.id=tagart1_.article_id
    where
        1=1
        and (
            article0_.author in (
                ? , ? , ?
            )
        ) 
        and (
            tagart1_.tag in (
                ? , ? , ? , ? , ? , ?
            )
        ) 

Result: ~140 ms, compared to 2000 ms before.

5 Comments

That is not equivalent to the initial query (hint: think about duplicates). You need to either use distinct or replace the join with exists.
Can you please explain what you mean by duplicates?
Imagine that an article has two tags, tag1 and tag2. If you pass both tag1 and tag2 to byTagAnyOf, the SQL query will return two rows for that article.
Yes, and then Hibernate will transform the extracted result set into managed objects in the persistence context according to the desired model.
Hi @AndreyB.Panfilov, I got your point, thank you!
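
To make the duplicate-row point from the comments concrete, here is a minimal in-memory sketch in plain Java. It is not JPA code: the class TagJoinDuplicatesDemo and its methods joinIn and distinctIds are hypothetical names that only mimic the row semantics of an inner join with IN versus a distinct selection. In a real Specification, one option (untested against the model above) would be to call query.distinct(true) on the CriteriaQuery before returning the IN predicate.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical in-memory stand-in for the articles / article_tags tables.
class TagJoinDuplicatesDemo {

    // Mimics "articles INNER JOIN article_tags ... WHERE tag IN (...)":
    // one result row is produced per matching (article, tag) pair.
    static List<Integer> joinIn(Map<Integer, Set<String>> tagsByArticle,
                                Set<String> referenceTags) {
        List<Integer> rows = new ArrayList<>();
        for (Map.Entry<Integer, Set<String>> entry : tagsByArticle.entrySet()) {
            for (String tag : entry.getValue()) {
                if (referenceTags.contains(tag)) {
                    rows.add(entry.getKey());
                }
            }
        }
        return rows;
    }

    // Mimics SELECT DISTINCT (or an EXISTS subquery): each article id at most once.
    static Set<Integer> distinctIds(Map<Integer, Set<String>> tagsByArticle,
                                    Set<String> referenceTags) {
        return new LinkedHashSet<>(joinIn(tagsByArticle, referenceTags));
    }

    public static void main(String[] args) {
        Map<Integer, Set<String>> tagsByArticle = Map.of(
                1, Set.of("tag1", "tag2"),
                2, Set.of("tag3"));
        Set<String> referenceTags = Set.of("tag1", "tag2");

        // Article 1 matches on both tags, so the join yields it twice.
        System.out.println(joinIn(tagsByArticle, referenceTags));      // [1, 1]
        System.out.println(distinctIds(tagsByArticle, referenceTags)); // [1]
    }
}
```

Whether the duplicates matter depends on how the result is consumed: with paging or count queries the extra rows can skew totals, so deduplicating in the query itself is usually the safer choice.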
