Translate

Wednesday, October 14, 2009

MySQL – optimize your query to be more scalable (Part 2/2)


databaseExperiment 3

The next experiment is to discover whether using the combination index can speed up the query. The following indices are created:

CREATE INDEX Id on T1 (Id);
CREATE INDEX Time_Id on T2 (Time, Id);

Note: The order of multiple columns index can affect how the query works. See MySQL documentation for more details.

We name this index scheme as E3 and use FORCE INDEX on Time_Id. Same again, the query is run 3 times with FLUSH TABLES.

E3
Test 129 secs
Test 231 secs
Test 332 secs
Average30.66 secs

In fact, this is fractionally slower. The reason may be that the index for T2 table becomes slightly larger and it takes slighltly longer for MySQL load up the index.

Another interesting question how will E3-Full (E3 index scheme with range query cover the full T2 table) compare to E2-Full?

E2-FullE3-Full
Test 12 mins 3 secs1 min 24 secs
Test 21 min 59 secs1 min 44 secs
Test 31 min 59 secs1 min 36 secs
Average2 mins .33 secs1 min 34 secs

Interestingly, using multi-columns index does further improve the performance on a larger scale of query.

Experiment 4

The last experiment uses full covering indices both tables and observes any significant improvement.

CREATE INDEX Id_A_B on T1 (Id, A, B);
CREATE INDEX Time_Id_C on T2 (Time, Id, C);

We name this index scheme as E4 and use FORCE INDEX on Time_Id_C. Same again, the query is run 3 times with FLUSH TABLES.

E4
Test 127 secs
Test 226 secs
Test 326 secs
Average26.33 secs

Having both sides of covering index seems to have a little impact on the performance on sub-range query. On the other hand, there is a noticeable improvement with full range query, E4-Full. This may indicate that the sub-range used (covers 3.9 million rows, 59% of T2 table) for the experiments hasn’t been substantial enough to see the benefit of multiple columns indexing. Nonetheless, there seems a potential for a much larger database table. Perhaps, this would be our next investigation in the future blog, ie varying the ratio of sub-range with covering indexes on a much larger table.

E2-FullE3-FullE4-Full
Test 12 mins 3 secs1 min 24 secs43 secs
Test 21 min 59 secs1 min 44 secs44 secs
Test 31 min 59 secs1 min 36 secs44 secs
Average2 mins .33 secs1 min 34 secs43.66 secs

Why range data is important for a large table?

Having some sort of range data is particular useful for queries with large table. It is because it:

  1. breaks the query and resolves it in smaller chunks. There are articles on the internet showing that it gives a better overall performance when a portion (let say 10% of the data) of a large table query is processed step by step, instead of processing the whole table in one go.
  2. gives better response between multiple client connections accessing the same table. For example, one client requests for read lock while another client operates a query on the whole giant table.
  3. avoids the dead lock of “Copying to tmp table” bug. The bug was first reported on Oct 2005 and it still hasn’t been fixed! See bug 14070. We came across this issue on a large table query with GROUP BY. However, when the same query is broken down into portions with addition of range data condition, deadlock is averted.
  4. provides a better integration with progress bar. It can actually inform the users how far the long query been processed.
  5. uses much less memory and less overall CPU usage.
  6. takes the full advantage of multi-core processor. You can multi-thread your application and run each thread with a range query.

What if the table doesn’t have the range data

Easy, make one. It is pretty common to have an Id column with AUTO_INCREMENT in MySQL tables. The range can then be acquired by MIN(Id) and MAX(Id).

Conclusion

In this article, we showed that we can’t always assume MySQL resolves the query with the optimal approach, especially for large tables. Even using a query with range condition, it is important to check with EXPLAIN first and experiment the benchmark. A significant performance improvement may be accomplished by slightly changing the query.

1 comment:

  1. Please link or RSS to my blog instead of copying & pasting of the full content.

    Joe

    ReplyDelete