
I'm coming from a Postgres background and trying to convert my application to MySQL. I have a query which is very fast on Postgres and very slow on MySQL. After doing some analysis, I have determined that one cause of the drastic speed difference is nested queries. The following pseudo query takes 170 ms on Postgres and 5.5 seconds on MySQL.

SELECT * FROM (
  SELECT id FROM a INNER JOIN b
) AS first LIMIT 10

On both MySQL and Postgres the speed is the same for the following query (less than 10 ms)

SELECT id FROM a INNER JOIN b LIMIT 10

I have the exact same tables, indices, and data on both databases, so I really have no idea why this is so slow.

Any insight would be greatly appreciated.

Thanks

EDIT

Here is one specific example of why I need to do this. I need to get the sum of a set of per-group maximums; in order to do this I need a subselect, as shown in the query below.

SELECT SUM(a) AS a
  FROM (
    SELECT table2.b, MAX(table1.a) AS a
    FROM table1
    INNER JOIN table2 ON table2.abc_id = table1.abc_id
      AND table1.read_datetime >= table2.issuance_datetime
      AND table1.read_datetime < COALESCE(table2.unassignment_datetime, DATE('9999-01-01'))
    WHERE table1.read_datetime BETWEEN '2012-01-01 10:30:01' AND '2013-07-18 03:03:42' AND table2.c = 0
    GROUP BY table2.id, b
) AS first
GROUP BY b
LIMIT 10

Again, this query takes 14 seconds on MySQL and 238 ms on Postgres. Here is the output of EXPLAIN on MySQL:

id,select_type,table,type,possible_keys,key,key_len,ref,rows,Extra
1,PRIMARY,<derived2>,ALL,\N,\N,\N,\N,25584,Using temporary; Using filesort
2,DERIVED,table2,index,PRIMARY,index_table2_on_b,index_table2_on_d,index_table2_on_issuance_datetime,index_table2_on_unassignment_datetime,index_table2_on_e,PRIMARY,4,\N,25584,Using where
2,DERIVED,tz,ref,index_table1_on_d,index_table1_on_read_datetime,index_table1_on_d_and_read_datetime,index_table1_on_4,4,db.table2.dosimeter_id,1,Using where
  • What is the case for the outside nest? It would also be helpful if you posted the EXPLAIN output and table structure. Commented Jul 23, 2013 at 19:48
  • Did you run the query with EXPLAIN? If so, what is the output? Commented Jul 23, 2013 at 19:49
  • @Mark With 9.1+ that is legal as long as table2.id is the primary key. Commented Jul 23, 2013 at 20:29
  • You should be providing table definitions and indexes to go with that query. Commented Jul 23, 2013 at 23:15
  • Have you considered the obvious solution? Not moving to MySQL, since Postgres works so much better for you? On a different note: LIMIT without ORDER BY produces rather arbitrary results. Commented Jul 23, 2013 at 23:27

4 Answers


Jon, answering your comment, here is an example:

drop table if exists temp_preliminary_table;
create temporary table temp_preliminary_table
    SELECT table2.b, MAX(table1.a) AS a
    FROM table1
    INNER JOIN table2 ON table2.abc_id = table1.abc_id
      AND table1.read_datetime >= table2.issuance_datetime
      AND table1.read_datetime < COALESCE(table2.unassignment_datetime, DATE('9999-01-01'))
    WHERE table1.read_datetime BETWEEN '2012-01-01 10:30:01' AND '2013-07-18 03:03:42' AND table2.c = 0
    GROUP BY table2.id, b;
-- I suggest you add indexes to this temp table
alter table temp_preliminary_table
    add index idx_b(b); -- Add as many indexes as you need
-- Now perform your query on this temp_table
SELECT SUM(a) AS a
FROM temp_preliminary_table
GROUP BY b
LIMIT 10;

This is just an example, splitting your query into three steps.

You need to remember that temp tables in MySQL are only visible to the connection that created them, so no other connection will see them (for better or worse).

This "divide-and-conquer" approach has saved me many headaches. I hope it helps you.


4 Comments

@Barranka-- I'm confused as to why this approach is supposed to help. It seems like you've just moved the cost of the query into building the table, but the main problem (that the inner join is slow) is still around, right? Is the idea that these temporary tables can basically cache the cost of the inner join so that it only has to happen once?
@mmr Exactly. When you build the temp table you execute the inner query only once. Another advantage is that you can create indexes on the temp table to speed things up. I know it seems odd, but in some cases it does save time. Just give it a try... the worst thing that can happen is that it doesn't speed things up for you as it does for me.
@Barranka Totally agree! In fact this approach usually seems to work so well even for the most trivial data quantities that any course on MySQL should probably advise against any use of nested queries. Just one thing with your example: I'd've put "create temporary table temp_preliminary_table ( key (b) ) ENGINE=MEMORY"
@mikerodent It's not a bad idea, but if the temp table is big, it might eat all available RAM... I'd rather leave the decision to MySQL whether to create the table in RAM or on disk (as far as I know, there's a tuning parameter that tells MySQL when to transfer the temp table from memory to disk... all temp tables are "born" in RAM and eventually copied to disk if they grow big enough).
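
For reference, here is a minimal sketch of the ENGINE=MEMORY variant suggested in the comment above. It is illustrative only, not part of the original answer, and the size caveat from the reply applies.

-- Sketch of the ENGINE=MEMORY variant from the comments (illustrative only):
-- the index is declared in the CREATE statement and the table lives in RAM.
-- Caveat: a MEMORY table errors out ("table is full") once it exceeds
-- max_heap_table_size instead of spilling to disk.
drop table if exists temp_preliminary_table;
create temporary table temp_preliminary_table (key idx_b(b)) ENGINE=MEMORY
    SELECT table2.b, MAX(table1.a) AS a
    FROM table1
    INNER JOIN table2 ON table2.abc_id = table1.abc_id
      AND table1.read_datetime >= table2.issuance_datetime
      AND table1.read_datetime < COALESCE(table2.unassignment_datetime, DATE('9999-01-01'))
    WHERE table1.read_datetime BETWEEN '2012-01-01 10:30:01' AND '2013-07-18 03:03:42' AND table2.c = 0
    GROUP BY table2.id, b;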

In the nested query MySQL materializes the whole join as a derived table before applying the LIMIT, while PostgreSQL is smart enough to figure out that it only needs to join enough rows to produce 10 tuples.
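
For the toy query, a minimal sketch of the workaround of pushing the LIMIT into the subquery (the join condition is assumed here, since the original pseudo query omits it):

-- Sketch: push the LIMIT into the derived table so MySQL materializes at
-- most 10 rows instead of the whole join.
SELECT * FROM (
  SELECT id FROM a INNER JOIN b ON b.a_id = a.id  -- assumed join condition
  LIMIT 10
) AS first;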

8 Comments

How do I encourage MySQL to be smarter?
move the limit into the subquery.
In this toy example that is easy; however, in the full case it is not possible.
MySQL is notoriously bad at subqueries and complex joins. This is one area where PostgreSQL is much better and provides performance that is orders of magnitude faster.
Hmm, that is not good. Is there any parameter tuning I can do to make MySQL comparable with Postgres? What would be another solution to this problem? Would I have to denormalize my database?
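
As an aside on the parameter-tuning question above (an assumption beyond what this thread covers, since it requires a newer MySQL than was current here): MySQL 5.7+ can merge simple derived tables into the outer query via the derived_merge optimizer switch, although derived tables that aggregate, like the one in the EDIT, are still materialized.

-- Assumes MySQL 5.7 or later; derived_merge is on by default there.
-- Merging lets the outer LIMIT apply without materializing the whole join,
-- but only for derived tables without GROUP BY / aggregates.
SET SESSION optimizer_switch = 'derived_merge=on';
SELECT * FROM (
  SELECT id FROM a INNER JOIN b ON b.a_id = a.id  -- assumed join condition
) AS first LIMIT 10;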

Correct me if I am wrong, but why don't you try:

SELECT * FROM a INNER JOIN b LIMIT 10;

2 Comments

This would work fine in this toy example; I posted a more complex example above.
What if you use only GROUP BY b and not GROUP BY table2.id?

Given that table2.id is the primary key, this query with the LIMIT in the inner query is functionally equivalent to yours with the LIMIT in the outer query, and that is what the PostgreSQL planner figured out.

SELECT table2.b, MAX(table1.a) AS a
FROM table1
INNER JOIN table2 ON table2.abc_id = table1.abc_id
  AND table1.read_datetime >= table2.issuance_datetime
  AND table1.read_datetime < COALESCE(table2.unassignment_datetime, DATE('9999-01-01'))
WHERE table1.read_datetime BETWEEN '2012-01-01 10:30:01' AND '2013-07-18 03:03:42' AND table2.c = 0
GROUP BY table2.id, b
ORDER BY a DESC
LIMIT 10

6 Comments

That may be true, but I also need to ORDER BY SUM(a), which requires the LIMIT on the outside. Again, Postgres is MUCH faster.
@Jon You should have posted that also. But still, with table2.id as the PK, you can order it in the inner query, as in my updated answer.
@Jon BTW, looking at it again, the outer query is not necessary at all, as that SUM does nothing.
If there is no outer query, how do you get the sum of the max values?
@Jon The max values and the outer grouping are done on the PK. The sum is just the value itself. Try both and check.
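
For reference, a hedged sketch (not from the thread itself) of the shape described in the first comment above: the original outer query with the ORDER BY SUM(a) added, which is why the LIMIT has to stay on the outside.

-- Sketch only: the OP's outer query plus the ORDER BY SUM(a) mentioned in
-- the comments; the LIMIT therefore stays outside the derived table.
SELECT SUM(a) AS a
FROM (
    SELECT table2.b, MAX(table1.a) AS a
    FROM table1
    INNER JOIN table2 ON table2.abc_id = table1.abc_id
      AND table1.read_datetime >= table2.issuance_datetime
      AND table1.read_datetime < COALESCE(table2.unassignment_datetime, DATE('9999-01-01'))
    WHERE table1.read_datetime BETWEEN '2012-01-01 10:30:01' AND '2013-07-18 03:03:42' AND table2.c = 0
    GROUP BY table2.id, b
) AS first
GROUP BY b
ORDER BY a DESC   -- orders by the outer SUM(a) via its alias
LIMIT 10;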
