How to automatically maintain a MySQL Table’s Partitions?
MySQL defines partitioning as —
Partitioning enables you to distribute portions of individual tables across a file system according to rules which you can set largely as needed. In effect, different portions of a table are stored as separate tables in different locations. The user-selected rule by which the division of data is accomplished is known as a partitioning function, which in MySQL can be the modulus, simple matching against a set of ranges or value lists, an internal hashing function, or a linear hashing function. The function is selected according to the partitioning type specified by the user, and takes as its parameter the value of a user-supplied expression. This expression can be a column value, a function acting on one or more column values, or a set of one or more column values, depending on the type of partitioning that is used.
After reducing query time by 2000x in a MySQL Table by partitioning, the next immediate requirement was to auto-maintain the partitions of the table. By auto maintenance of partitions, I simply mean that new partitions are created and old partitions, whose data is no longer required, are dropped from the table.
I searched the web and there are no direct tools available for the purpose. So, I created a Node.js script — MySQL Partition Keeper, that is run by a cronjob that automatically maintains MySQL Partitions.
Table Structure
For illustration, tb_foo_partitioned
is a range partitioned table with monthly partitions. The partition key is ts
which is the time at record insertion.
CREATE TABLE `tb_foo_partitioned` (
`id` int NOT NULL AUTO_INCREMENT,
`ts` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP,
PRIMARY KEY (`id`,`ts`)
)
/*!50100 PARTITION BY RANGE (UNIX_TIMESTAMP(`ts`))
(
PARTITION p_20211101 VALUES LESS THAN (UNIX_TIMESTAMP('2021-11-01')),
PARTITION p_20211201 VALUES LESS THAN (UNIX_TIMESTAMP('2021-12-01')),
PARTITION p_20220101 VALUES LESS THAN (UNIX_TIMESTAMP('2022-01-01')),
PARTITION future VALUES LESS THAN MAXVALUE ENGINE = InnoDB
) */
This means that all the records with —
timestamp < ‘2021–11–01’ are inserted into p_20211101,
timestamp < ‘2021–12–01’ are inserted into p_20211201,
..
timestamp < MAX_VALUE are inserted into future.
The `future` Partition
The future partition is created as a fail-safe partition, it remains usually empty all the time. In the case of tb_foo_partitioned
the timestamp of a record can never be greater than the current time and hence will always get inserted into the current month’s partition, so one may assume that the future partition is not needed. However, just in case the cron job fails to run, there would be no partition which to insert the next month’s records. The value of the partitioning expression UNIX_TIMESTAMP(`ts`)
for next month’s records does not satisfy any partition range, thus leading to the loss of data.
But in case the future partition exists, the value of the partitioning expression will satisfy the range of the future partition, and will prevent any data loss. Thus it is useful to have the future partition to catch overflows. The future partition can be reorganised to move this data out of the future partition to a separate partition.
I have learnt my lesson the hard way, thus in any current month M, the partition for the next month, M+1 is already present, and MySQLPartitionKeeper is run to create a partition for the month M+2. This saves you the hassle if the cron job fails to run, as the partition for the month M+1 already exists. The number of months for which partitions already exist is called partitionOffset.
Auto Maintenance of Partitions
The auto maintenance of partitions consist of two parts —
Drop the oldest partition (optional)
SELECT PARTITION_NAME, PARTITION_ORDINAL_POSITION FROM information_schema.partitions WHERE TABLE_SCHEMA =’sampledb’ AND TABLE_NAME = ‘tb_foo_partitioned’ AND PARTITION_ORDINAL_POSITION = 1[{"PARTITION_NAME": "p_20211101", "PARTITION_ORDINAL_POSITION": 1}]
The information of the first partition of the table is fetched for creating the query for dropping the oldest partition. It is advisable to backup the data of this partition.
ALTER TABLE tb_foo_partitioned DROP PARTITION p_20211101
Reorganise the future partition
The partition for the next month, M+2 is created out of the future
partition.
ALTER TABLE tb_foo_partitioned REORGANIZE PARTITION future INTO (
PARTITION p_20220101 VALUES LESS THAN (UNIX_TIMESTAMP(‘2022–01–01’)),
PARTITION future VALUES LESS THAN MAXVALUE )
MySQLPartitionKeeper
MySQLPartitionKeeper is a Node.js script that is triggered every month by a cronjob. It generates and runs the three queries illustrated above to carry auto maintenance of the partitions.
Usage
Each config file in the project points to a partitioned table whose partitions need auto-maintenance.
The partitioned table tb_foo_partitioned
in the database sampledb
is the table whose partitions are needed to be maintained. Here is how the config file sample.json
for this table looks like —
{
"dbSettings": {
"dbName" : "sampledb",
"host" : "localhost",
"password" : "password",
"tableName" : "tb_foo_partitioned",
"user" : "maintenance"
}
}
The cron expression for running the script for this config file is —
0 12 20 * * cd /project_location && NODE_ENV=sample.json node index.mjs
MySQLPartitionKeeper automatically maintains partitions for tb_notifications.
If you don’t know what is the partitioning story behind tb_notifications
, read my previous blog, where I discuss how the query time was reduced by 2000x in a MySQL Table with 500M+ rows with partitioning. MySQLPartitionKeeper currently supports auto-maintenance of only Range partitioned tables. In the absence of any other use case, I may not develop it further, but feel free to fork it or contribute to the project.
Acknowledgements
I am very thankful to Rick James and his blogs at http://mysql.rjweb.org/ whom I religiously followed and continue to lookup for, Dilraj Singh, my mentor and guide who entrusted me with this responsibility.