
I'm coming from a Postgres background and trying to convert my application to MySQL. I have a query which is very fast on Postgres and very slow on MySQL. After doing some analysis, I have determined that one cause of the drastic speed difference is nested queries. The following pseudo query takes 170 ms on Postgres and 5.5 seconds on MySQL.

SELECT * FROM (
  SELECT id FROM a INNER JOIN b
) AS first LIMIT 10

On both MySQL and Postgres the speed is the same for the following query (less than 10 ms)

SELECT id FROM a INNER JOIN b LIMIT 10

I have the exact same tables, indices, and data on both databases, so I really have no idea why this is so slow.

Any insight would be greatly appreciated.

Thanks

EDIT

Here is one specific example of why I need to do this. I need to get the sum of a set of per-group maximums; in order to do this I need a subselect, as shown in the query below.

SELECT SUM(a) AS a
  FROM (
    SELECT table2.b, MAX(table1.a) AS a
    FROM table1
    INNER JOIN table2 ON table2.abc_id = table1.abc_id
      AND table1.read_datetime >= table2.issuance_datetime
      AND table1.read_datetime < COALESCE(table2.unassignment_datetime, DATE('9999-01-01'))
    WHERE table1.read_datetime BETWEEN '2012-01-01 10:30:01' AND '2013-07-18 03:03:42' AND table2.c = 0
    GROUP BY table2.id, b
) AS first
GROUP BY b
LIMIT 10

Again, this query takes 14 seconds on MySQL and 238 ms on Postgres. Here is the output of EXPLAIN on MySQL:

id,select_type,table,type,possible_keys,key,key_len,ref,rows,Extra
1,PRIMARY,<derived2>,ALL,\N,\N,\N,\N,25584,Using temporary; Using filesort
2,DERIVED,table2,index,PRIMARY,index_table2_on_b,index_table2_on_d,index_table2_on_issuance_datetime,index_table2_on_unassignment_datetime,index_table2_on_e,PRIMARY,4,\N,25584,Using where
2,DERIVED,tz,ref,index_table1_on_d,index_table1_on_read_datetime,index_table1_on_d_and_read_datetime,index_table1_on_4,4,db.table2.dosimeter_id,1,Using where
  • What is the case for the outside nest? It would also be helpful if you posted the EXPLAIN output and table structure. Commented Jul 23, 2013 at 19:48
  • Did you run the query with EXPLAIN? If so, what is the output? Commented Jul 23, 2013 at 19:49
  • @Mark With 9.1+ that is legal as long as table2.id is the primary key. Commented Jul 23, 2013 at 20:29
  • You should be providing table definitions and indexes to go with that query. Commented Jul 23, 2013 at 23:15
  • Have you considered the obvious solution? Not moving to MySQL, since Postgres works so much better for you? On a different note: LIMIT without ORDER BY produces rather arbitrary results. Commented Jul 23, 2013 at 23:27

4 Answers


Jon, answering your comment, here is an example:

drop table if exists temp_preliminary_table;
create temporary table temp_preliminary_table
    SELECT table2.b, MAX(table1.a) AS a
    FROM table1
    INNER JOIN table2 ON table2.abc_id = table1.abc_id
      AND table1.read_datetime >= table2.issuance_datetime
      AND table1.read_datetime < COALESCE(table2.unassignment_datetime, DATE('9999-01-01'))
    WHERE table1.read_datetime BETWEEN '2012-01-01 10:30:01' AND '2013-07-18 03:03:42' AND table2.c = 0
    GROUP BY table2.id, b;
-- I suggest you add indexes to this temp table
alter table temp_preliminary_table
    add index idx_b(b); -- Add as many indexes as you need
-- Now perform your query on this temp_table
SELECT SUM(a) AS a
FROM temp_preliminary_table
GROUP BY b
LIMIT 10;

This is just an example, splitting your query into three steps.

You need to remember that temp tables in MySQL are only visible to the connection that created them, so no other connection will see them (for better or worse).

This "divide-and-conquer" approach has saved me many headaches. I hope it helps you.


4 Comments

@Barranka-- I'm confused as to why this approach is supposed to help. It seems like you've just moved the cost of the query into building the table, but the main problem (that the inner join is slow) is still around, right? Is the idea that these temporary tables can basically cache the cost of the inner join so that it only has to happen once?
@mmr Exactly. When you build the temp table you execute the inner query only once. Another advantage is that you can create indexes on the temp table to speed things up. I know it seems odd, but in some cases it does save time. Just give it a try... the worst thing that can happen is that it doesn't speed things up for you as it does for me.
@Barranka Totally agree! In fact this approach usually seems to work so well even for the most trivial data quantities that any course on MySQL should probably advise against any use of nested queries. Just one thing with your example: I'd've put "create temporary table temp_preliminary_table ( key (b) ) ENGINE=MEMORY"
@mikerodent It's not a bad idea, but if the temp table is big, it might eat all available RAM... I'd rather leave the decision to MySQL whether to create the table in RAM or on disk (as far as I know, there's a tuning parameter that tells MySQL when to transfer the temp table from memory to disk... all temp tables are "born" in RAM and eventually copied to disk if they grow big enough).
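
For reference, here is a minimal sketch of the ENGINE=MEMORY variant suggested in the comment above. It is illustrative only, not part of the original answer, and the size caveat from the reply applies.

-- Sketch of the ENGINE=MEMORY variant from the comments (illustrative only):
-- the index is declared in the CREATE statement and the table lives in RAM.
-- Caveat: a MEMORY table errors out ("table is full") once it exceeds
-- max_heap_table_size instead of spilling to disk.
drop table if exists temp_preliminary_table;
create temporary table temp_preliminary_table (key idx_b(b)) ENGINE=MEMORY
    SELECT table2.b, MAX(table1.a) AS a
    FROM table1
    INNER JOIN table2 ON table2.abc_id = table1.abc_id
      AND table1.read_datetime >= table2.issuance_datetime
      AND table1.read_datetime < COALESCE(table2.unassignment_datetime, DATE('9999-01-01'))
    WHERE table1.read_datetime BETWEEN '2012-01-01 10:30:01' AND '2013-07-18 03:03:42' AND table2.c = 0
    GROUP BY table2.id, b;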

In the nested query MySQL materializes the whole join as a derived table before applying the LIMIT, while PostgreSQL is smart enough to figure out that it only needs to join enough rows to produce 10 tuples.
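
For the toy query, a minimal sketch of the workaround of pushing the LIMIT into the subquery (the join condition is assumed here, since the original pseudo query omits it):

-- Sketch: push the LIMIT into the derived table so MySQL materializes at
-- most 10 rows instead of the whole join.
SELECT * FROM (
  SELECT id FROM a INNER JOIN b ON b.a_id = a.id  -- assumed join condition
  LIMIT 10
) AS first;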

8 Comments

How do I encourage MySQL to be smarter?
move the limit into the subquery.
In this toy example that is easy; however, in the full case it is not possible.
MySQL is notoriously bad at subqueries and complex joins. This is one area where PostgreSQL is much better and provides performance that is orders of magnitude faster.
Hmm, that is not good. Is there any parameter tuning I can do to make MySQL comparable with Postgres? What would be another solution to this problem? Would I have to denormalize my database?
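
As an aside on the parameter-tuning question above (an assumption beyond what this thread covers, since it requires a newer MySQL than was current here): MySQL 5.7+ can merge simple derived tables into the outer query via the derived_merge optimizer switch, although derived tables that aggregate, like the one in the EDIT, are still materialized.

-- Assumes MySQL 5.7 or later; derived_merge is on by default there.
-- Merging lets the outer LIMIT apply without materializing the whole join,
-- but only for derived tables without GROUP BY / aggregates.
SET SESSION optimizer_switch = 'derived_merge=on';
SELECT * FROM (
  SELECT id FROM a INNER JOIN b ON b.a_id = a.id  -- assumed join condition
) AS first LIMIT 10;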

Correct me if I am wrong, but why don't you try:

SELECT * FROM a INNER JOIN b LIMIT 10;

2 Comments

This would work fine in this toy example; I posted a more complex example above.
What if you use only GROUP BY b and not GROUP BY table2.id?

Given that table2.id is the primary key, this query with the LIMIT in the inner query is functionally equivalent to yours with the LIMIT in the outer query, and that is what the PostgreSQL planner figured out.

SELECT table2.b, MAX(table1.a) AS a
FROM table1
INNER JOIN table2 ON table2.abc_id = table1.abc_id
  AND table1.read_datetime >= table2.issuance_datetime
  AND table1.read_datetime < COALESCE(table2.unassignment_datetime, DATE('9999-01-01'))
WHERE table1.read_datetime BETWEEN '2012-01-01 10:30:01' AND '2013-07-18 03:03:42' AND table2.c = 0
GROUP BY table2.id, b
ORDER BY a DESC
LIMIT 10

6 Comments

That may be true, but I also need to ORDER BY SUM(a), which requires the LIMIT on the outside. Again, Postgres is MUCH faster.
@Jon You should have posted that also. But still, with table2.id as the PK, you can order it in the inner query, as in my updated answer.
@Jon BTW, looking at it again, the outer query is not necessary at all, as that SUM does nothing.
If there is no outer query, how do you get the sum of the max values?
@Jon The max values and the outer grouping are done on the PK. The sum is just the value itself. Try both and check.
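
For reference, a hedged sketch (not from the thread itself) of the shape described in the first comment above: the original outer query with the ORDER BY SUM(a) added, which is why the LIMIT has to stay on the outside.

-- Sketch only: the OP's outer query plus the ORDER BY SUM(a) mentioned in
-- the comments; the LIMIT therefore stays outside the derived table.
SELECT SUM(a) AS a
FROM (
    SELECT table2.b, MAX(table1.a) AS a
    FROM table1
    INNER JOIN table2 ON table2.abc_id = table1.abc_id
      AND table1.read_datetime >= table2.issuance_datetime
      AND table1.read_datetime < COALESCE(table2.unassignment_datetime, DATE('9999-01-01'))
    WHERE table1.read_datetime BETWEEN '2012-01-01 10:30:01' AND '2013-07-18 03:03:42' AND table2.c = 0
    GROUP BY table2.id, b
) AS first
GROUP BY b
ORDER BY a DESC   -- orders by the outer SUM(a) via its alias
LIMIT 10;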
