Partitioning — Reduce query time by 2000x in a MySQL table with 500M+ rows
At Tookan, close monitoring of response times is very important to giving our customers and delivery partners a seamless, hassle-free experience.
A summary screen that displays the efficiency parameters and history of order notifications was released for the Tookan Agent App, the mobile app used by delivery partners. The API consumed by this screen queried data from a MySQL table where all the push notifications sent to the app are stored.
My attention was drawn to this table by a slow query in the slow query logs, with an execution time of around 2 seconds.
SELECT * FROM tb_notifications WHERE agent_id = 12345 AND notification_sent_time > '2021-05-03' AND notification_sent_time < '2021-05-04'
On checking the query's EXPLAIN output, I found that the table had no index on the notification_sent_time column, so MySQL could not use an index to satisfy the WHERE clause.
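For reference, this is how the problem shows up in practice (the comments describe the typical tell-tale columns, not the exact production output):

```sql
EXPLAIN SELECT * FROM tb_notifications
WHERE agent_id = 12345
  AND notification_sent_time > '2021-05-03'
  AND notification_sent_time < '2021-05-04';
-- With no usable index on notification_sent_time, the output shows
-- type: ALL and key: NULL, i.e. a full table scan over every row.
```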
Let me take you through the possible solutions one might come up with after some brainstorming.
Add an index on notification_sent_time?
ALTER TABLE `tb_notifications` ADD INDEX `notification_sent_time` (`notification_sent_time`)
What made matters worse was that the table had 500M+ rows and was growing every second, meaning I couldn't just run an ALTER TABLE to add an index (have you tried? 😉). Only the last two months of data was actually required; 400M+ rows of the table were just archival data that unnecessarily slowed down the query.
Delete the old data?
Now at this point, you may think that a straight-up DELETE query to wipe out the archived rows would solve the problem.
DELETE FROM tb_notifications WHERE notification_sent_time < NOW() - INTERVAL 2 MONTH
But deletes on large tables eat additional resources and make the slaves lag, and the disk space relinquished by InnoDB is not returned to the operating system.
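Even a batched delete, a common mitigation sketched below with an assumed batch size, only limits the lag caused per statement; it still burns I/O across hundreds of millions of rows, and the freed space is still not reclaimed without a subsequent table rebuild:

```sql
-- Hypothetical batched cleanup: run repeatedly until 0 rows are affected.
-- The LIMIT keeps each transaction small so slaves can keep up.
DELETE FROM tb_notifications
WHERE notification_sent_time < NOW() - INTERVAL 2 MONTH
LIMIT 10000;
```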
Partition the table?
ALTER TABLE `tb_notifications`
PARTITION BY RANGE (UNIX_TIMESTAMP(notification_sent_time))
PARTITIONS 4 (
  PARTITION drop_old VALUES LESS THAN (UNIX_TIMESTAMP('2021-04-01')),
  PARTITION p_20210501 VALUES LESS THAN (UNIX_TIMESTAMP('2021-05-01')),
  PARTITION p_20210601 VALUES LESS THAN (UNIX_TIMESTAMP('2021-06-01')),
  PARTITION future VALUES LESS THAN MAXVALUE
);
The next obvious solution was to partition the table, but MySQL requires the partition key to be part of every unique key, including the primary key, and this table's primary key did not include notification_sent_time.
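Attempting the ALTER above directly is rejected for exactly this reason; MySQL fails it with an error along these lines:

```sql
ALTER TABLE `tb_notifications`
PARTITION BY RANGE (UNIX_TIMESTAMP(notification_sent_time)) (...);
-- ERROR 1503 (HY000): A PRIMARY KEY must include all columns
-- in the table's partitioning function
```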
Here is where it gets tough: the table cannot be partitioned as-is, the data cannot be deleted, and an index cannot be added. I have tried each of these in a testing environment, so believe me when I state this.
OK. What next then?
The solution worked out was to create a new table, range-partitioned on `notification_sent_time`, and import the relevant data from the original tb_notifications table into it, so that old partitions could later be dropped at once.
CREATE TABLE `tb_notifications_new` (
  `id` bigint NOT NULL AUTO_INCREMENT,
  `agent_id` bigint NOT NULL,
  `notification_sent_time` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP,
  PRIMARY KEY (`id`, `notification_sent_time`),
  KEY `agent_id` (`agent_id`)
)
/*!50100 PARTITION BY RANGE (UNIX_TIMESTAMP(notification_sent_time))
PARTITIONS 4 (
  PARTITION drop_old VALUES LESS THAN (UNIX_TIMESTAMP('2021-04-01')),
  PARTITION p_20210501 VALUES LESS THAN (UNIX_TIMESTAMP('2021-05-01')),
  PARTITION p_20210601 VALUES LESS THAN (UNIX_TIMESTAMP('2021-06-01')),
  PARTITION future VALUES LESS THAN MAXVALUE
) */;
Now the relevant data was copied from the original tb_notifications table to the tb_notifications_new table. To ensure data consistency, this operation was done on an offline slave.
INSERT INTO tb_notifications_new (id, agent_id, notification_sent_time) SELECT id, agent_id, notification_sent_time FROM tb_notifications WHERE id > 472981000
Then rename the new table tb_notifications_new to tb_notifications, so that the API queries data from the newly created partitioned table.
ALTER TABLE tb_notifications RENAME TO tb_notifications_old;
ALTER TABLE tb_notifications_new RENAME TO tb_notifications;
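As an aside, MySQL's RENAME TABLE can perform both renames in a single atomic statement, which avoids the brief window between the two ALTERs where tb_notifications does not exist:

```sql
-- Atomic swap: no query ever sees a missing tb_notifications table.
RENAME TABLE tb_notifications TO tb_notifications_old,
             tb_notifications_new TO tb_notifications;
```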
Note: The whole operation took around 3 hours. If you are performing this operation on a slave, revisiting the binary log file size limit is highly recommended.
The offline slave was put back into replication, and once the slave lag was over, all the previously sluggish queries were directed to it. Only this time, the query time dropped to a few milliseconds: a 2000x improvement in query times.
It has been 200+ days, and the size of tb_notifications keeps rising every month as we onboard new delivery partners, without denting query performance. Partitioning should not be directly equated with better query performance; rather, it is a data management strategy. I have illustrated an example where partitioning helped manage old, irrelevant data, which in turn improved overall query performance.
PartitionKeeper
The partitions of the tb_notifications table are managed automatically by PartitionKeeper, a monthly cron job I built that carves a new partition out of the future partition and drops the oldest one.
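A sketch of the two statements such a monthly job might issue (the partition names follow the scheme above; the cron's actual internals may differ):

```sql
-- Drop the oldest month in one cheap metadata operation:
-- no row-by-row deletes, and the space is returned immediately.
ALTER TABLE tb_notifications DROP PARTITION p_20210501;

-- Carve next month's partition out of the catch-all `future` partition.
ALTER TABLE tb_notifications REORGANIZE PARTITION future INTO (
  PARTITION p_20210701 VALUES LESS THAN (UNIX_TIMESTAMP('2021-07-01')),
  PARTITION future VALUES LESS THAN MAXVALUE
);
```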
pt-online-schema-change
Percona’s pt-osc is a great free tool for altering large MySQL tables that suits a wide variety of applications, including partitioning a non-partitioned table, which is worth mentioning here.
pt-online-schema-change emulates the way that MySQL alters tables internally, but it works on a copy of the table you wish to alter. This means that the original table is not locked, and clients may continue to read and change data in it. It works by creating an empty copy of the table to alter, modifying it as desired, and then copying rows, in small chunks from the original table into the new table. When the copy is complete, it moves away the original table and replaces it with the new one.
pt-osc has additional configurable parameters (chunk-size, max-load, max-lag) which make it a very good candidate for handling large alters, even directly on the master. The compelling reasons I did not partition tb_notifications with pt-osc were its inability to copy only the relevant rows to the new table, and the alter option's inability to carry the multiple alter statements my case required. Custom CREATE TABLE statements were supported in older versions of pt-osc, but I preferred to stay on a newer version.
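For completeness, a typical pt-osc invocation looks like the following. The database name and alter text are illustrative placeholders; the throttling flags are the ones mentioned above:

```shell
# Online alter: copies rows in chunks, throttling on load and replica lag.
pt-online-schema-change \
  --alter "ADD INDEX idx_sent_time (notification_sent_time)" \
  --chunk-size 1000 \
  --max-load Threads_running=50 \
  --max-lag 5 \
  D=tookan_db,t=tb_notifications \
  --execute
```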
You can also read about Github’s Online Schema Migration For MySQL.
Acknowledgements
I am very thankful to Rick James and his blogs at http://mysql.rjweb.org/, which I religiously followed and continue to look up to, and to Dilraj Singh, my mentor and guide, who entrusted me with this responsibility.