The Percona Toolkit contains many tools for MySQL. Usually, we use the pt-archive to archive our data. Pt-archive can archive the table data to a table or file, also can be used to purge data.
First of all, how to install the Percona-Toolkit?
1. Download a suitable package at https://www.percona.com/downloads/percona-toolkit/LATEST/ , for example, i choose the percona-toolkit-3.0.11-1.el6.x86_64.rpm.
2. Use rpm
to install
1 | rpm -ivh percona-toolkit-3.0.11-1.el6.x86_64.rpm |
You may need to install the dependencies:
1 2 3 4 5 6 7 | yum install perl-DBI yum install perl-DBD-MySQL yum install perl-IO-Socket-SSL yum install perl-Digest-MD5 |
3. Check if the installation is successful
1 | pt-archiver --help |
If it shows the help information, then the installation is finished.
4. Create users for the archiving
At the source side, the user should have at least SELECT, DELETE privileges.
At the target side, the user should have at least INSERT privilege.
We can know why it needs these privileges from the general log below.
Secondly, how to archive data?
(1) Archive data
Now, i’ll archive my table ‘testarchive.sourcetb’ to the target table,
Source database info: 192.168.100.184:8001 testarchive.sourcetb
Target database info: 192.168.100.103:4001 destdb.targettb
Source info:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 | mysql> show create table sourcetb \G *************************** 1. row *************************** Table: sourcetb Create Table: CREATE TABLE `sourcetb` ( `id` int(11) NOT NULL, `name` varchar(30) DEFAULT NULL, PRIMARY KEY (`id`) ) ENGINE=InnoDB DEFAULT CHARSET=utf8 |
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 | mysql> select * from sourcetb; +----+------+ | id | name | +----+------+ | 1 | aaa | | 2 | bbb | | 3 | ccc | +----+------+ |
Because PT-ARCHIVE won’t create the table automatically, so i have to create a same table on the target database:
1 2 3 4 5 6 7 8 9 | CREATE TABLE `targettb` ( `id` int(11) NOT NULL, `name` varchar(30) DEFAULT NULL, PRIMARY KEY (`id`) ) ENGINE=InnoDB DEFAULT CHARSET=utf8 |
Then executing as follows:
1 | pt-archiver --source h=192.168.100.184,P=8001,D=testarchive,t=sourcetb,u=archive,p=archivepwd --where 'id < 3' --dest h=192.168.100.103,P=4001,u=archive_wr,p=wrpwd,D=destdb,t=targettb --limit=1 --no-delete --why-quit 2>&1 >> archive.log |
–source: DSN specifying the table to archive from (required)
–dest: DSN specifying the table to archive to
–where: limit which rows to archive
–no-delete: Do not delete archived rows
–limit: Number of rows to fetch and archive per statement (default 1)
–why-quit: Print reason for exiting unless rows exhausted
The execution log was printed into archive.log
(2)Check the result
After that, check the source table:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 | mysql> select * from testarchive.sourcetb; +----+------+ | id | name | +----+------+ | 1 | aaa | | 2 | bbb | | 3 | ccc | +----+------+ |
Check the target table:
1 2 3 4 5 6 7 8 9 10 11 12 13 | mysql> select * from destdb.targettb; +----+------+ | id | name | +----+------+ | 1 | aaa | | 2 | bbb | +----+------+ |
It shows that the rows(” id < 3 “) have been archived to the target table, and the source rows are still exist(” –no-delete “).
(3)What the PT-ARCHIVE did?
What the pt-archive actually did in MySQL? As we know, the general log of MySQL can show all the queries, so we can get the details from the general log( turn it on before archiving data):
1 2 3 4 5 6 7 8 9 10 11 12 13 | mysql> show variables like '%general_log%'; +------------------+-----------------------------------+ | Variable_name | Value | +------------------+-----------------------------------+ | general_log | ON | | general_log_file | /data/mysql8001/mysql-general.log | +------------------+-----------------------------------+ |
If the general_log is OFF, use set global general_log=on;
to turn it on, and then open the “general_log_file“.
Source side:
1234567891011121314151617181920212223242526272829303132333435363738394041424344454647484950515253545556575859 2018-09-05T18:43:32.696341+08:00 54 Connect archive@192.168.100.184 on testarchive using TCP/IP2018-09-05T18:43:32.696705+08:00 54 Query set names utf82018-09-05T18:43:32.696873+08:00 54 Query set autocommit=02018-09-05T18:43:32.697653+08:00 54 Query SHOW VARIABLES LIKE 'wait\_timeout'2018-09-05T18:43:32.699146+08:00 54 Query SET SESSION wait_timeout=100002018-09-05T18:43:32.699466+08:00 54 Query SELECT @@SQL_MODE2018-09-05T18:43:32.700519+08:00 54 Query SET @@SQL_QUOTE_SHOW_CREATE = 1/*!40101, @@SQL_MODE='NO_AUTO_VALUE_ON_ZERO,ONLY_FULL_GROUP_BY,STRICT_TRANS_TABLES,NO_ZERO_IN_DATE,NO_ZERO_DATE,ERROR_FOR_DIVISION_BY_ZERO,NO_ENGINE_SUBSTITUTION'*/2018-09-05T18:43:32.700826+08:00 54 Query SELECT version()2018-09-05T18:43:32.701067+08:00 54 Query /*!40101 SET character_set_connection = utf8mb4 */2018-09-05T18:43:32.701433+08:00 54 Query SHOW VARIABLES LIKE 'version%'2018-09-05T18:43:32.703496+08:00 54 Query SHOW ENGINES2018-09-05T18:43:32.704993+08:00 54 Query SHOW VARIABLES LIKE 'innodb_version'2018-09-05T18:43:32.708559+08:00 54 Query show variables like 'innodb_rollback_on_timeout'2018-09-05T18:43:32.711276+08:00 54 Query /*!40101 SET @OLD_SQL_MODE := @@SQL_MODE, @@SQL_MODE := '', @OLD_QUOTE := @@SQL_QUOTE_SHOW_CREATE, @@SQL_QUOTE_SHOW_CREATE := 1 */2018-09-05T18:43:32.711539+08:00 54 Query USE `testarchive`2018-09-05T18:43:32.712462+08:00 54 Query SHOW CREATE TABLE `testarchive`.`sourcetb`2018-09-05T18:43:32.713398+08:00 54 Query /*!40101 SET @@SQL_MODE := @OLD_SQL_MODE, @@SQL_QUOTE_SHOW_CREATE := @OLD_QUOTE */2018-09-05T18:43:32.746212+08:00 54 Query SHOW VARIABLES LIKE 'wsrep_on'2018-09-05T18:43:32.749376+08:00 54 Query SHOW VARIABLES LIKE 'wsrep_on'2018-09-05T18:43:32.752939+08:00 54 Query SHOW VARIABLES LIKE 'version%'2018-09-05T18:43:32.754563+08:00 54 Query SHOW ENGINES2018-09-05T18:43:32.755286+08:00 54 Query SHOW VARIABLES LIKE 'innodb_version'2018-09-05T18:43:32.759183+08:00 54 Query SELECT CONCAT(@@hostname, @@port)2018-09-05T18:43:32.761297+08:00 54 Query SELECT /*!40001 SQL_NO_CACHE */ `id`,`name` FROM `testarchive`.`sourcetb` FORCE INDEX(`PRIMARY`) WHERE (id < 3) ORDER BY `id` LIMIT 12018-09-05T18:43:32.768565+08:00 54 Query commit2018-09-05T18:43:32.769014+08:00 54 Query SELECT /*!40001 SQL_NO_CACHE */ `id`,`name` FROM `testarchive`.`sourcetb` FORCE INDEX(`PRIMARY`) WHERE (id < 3) AND ((`id` > '1')) ORDER BY `id` LIMIT 12018-09-05T18:43:32.788186+08:00 54 Query commit2018-09-05T18:43:32.788530+08:00 54 Query SELECT /*!40001 SQL_NO_CACHE */ `id`,`name` FROM `testarchive`.`sourcetb` FORCE INDEX(`PRIMARY`) WHERE (id < 3) AND ((`id` > '2')) ORDER BY `id` LIMIT 12018-09-05T18:43:32.789915+08:00 54 Query commit2018-09-05T18:43:32.790269+08:00 54 Quit
Target side:
123456789101112131415161718192021222324252627282930313233343536373839404142434445 2018-09-05T18:43:32.714956+08:00 58 Connect archive_wr@192.168.100.184 on destdb using TCP/IP2018-09-05T18:43:32.715127+08:00 58 Query SET SQL_SAFE_UPDATES = 12018-09-05T18:43:32.715929+08:00 58 Query set autocommit=02018-09-05T18:43:32.716946+08:00 58 Query SHOW VARIABLES LIKE 'wait\_timeout'2018-09-05T18:43:32.718650+08:00 58 Query SET SESSION wait_timeout=100002018-09-05T18:43:32.720291+08:00 58 Query SELECT @@SQL_MODE2018-09-05T18:43:32.721198+08:00 58 Query SET @@SQL_QUOTE_SHOW_CREATE = 1/*!40101, @@SQL_MODE='NO_AUTO_VALUE_ON_ZERO'*/2018-09-05T18:43:32.722882+08:00 58 Query SELECT version()2018-09-05T18:43:32.727548+08:00 58 Query SHOW VARIABLES LIKE 'version%'2018-09-05T18:43:32.731768+08:00 58 Query SHOW ENGINES2018-09-05T18:43:32.735711+08:00 58 Query SHOW VARIABLES LIKE 'innodb_version'2018-09-05T18:43:32.738308+08:00 58 Query show variables like 'innodb_rollback_on_timeout'2018-09-05T18:43:32.740513+08:00 58 Query /*!40101 SET @OLD_SQL_MODE := @@SQL_MODE, @@SQL_MODE := '', @OLD_QUOTE := @@SQL_QUOTE_SHOW_CREATE, @@SQL_QUOTE_SHOW_CREATE := 1 */2018-09-05T18:43:32.743294+08:00 58 Query USE `destdb`2018-09-05T18:43:32.744080+08:00 58 Query SHOW CREATE TABLE `destdb`.`targettb`2018-09-05T18:43:32.745378+08:00 58 Query /*!40101 SET @@SQL_MODE := @OLD_SQL_MODE, @@SQL_QUOTE_SHOW_CREATE := @OLD_QUOTE */2018-09-05T18:43:32.759996+08:00 58 Query SELECT CONCAT(@@hostname, @@port)2018-09-05T18:43:32.762450+08:00 58 Query INSERT INTO `destdb`.`targettb`(`id`,`name`) VALUES ('1','aaa')2018-09-05T18:43:32.763767+08:00 58 Query commit2018-09-05T18:43:32.769621+08:00 58 Query INSERT INTO `destdb`.`targettb`(`id`,`name`) VALUES ('2','bbb')2018-09-05T18:43:32.781708+08:00 58 Query commit2018-09-05T18:43:32.789535+08:00 58 Query commit2018-09-05T18:43:32.790562+08:00 58 Quit
Pt-archive performs a lot of checks when it is executed, and it also allows you turn these checks off.
If you want to delete the data, then you can remove the “–no-delete” option, and we can get “delete” from the general log:
123456789101112131415 2018-09-05T18:45:45.187173+08:00 56 Query SELECT /*!40001 SQL_NO_CACHE */ `id`,`name` FROM `testarchive`.`sourcetb` FORCE INDEX(`PRIMARY`) WHERE (id < 3) ORDER BY `id` LIMIT 12018-09-05T18:45:45.191838+08:00 56 Query DELETE FROM `testarchive`.`sourcetb` WHERE (`id` = '1')2018-09-05T18:45:45.197355+08:00 56 Query commit2018-09-05T18:45:45.256209+08:00 56 Query SELECT /*!40001 SQL_NO_CACHE */ `id`,`name` FROM `testarchive`.`sourcetb` FORCE INDEX(`PRIMARY`) WHERE (id < 3) AND ((`id` >= '1')) ORDER BY `id` LIMIT 12018-09-05T18:45:45.258457+08:00 56 Query DELETE FROM `testarchive`.`sourcetb` WHERE (`id` = '2')2018-09-05T18:45:45.265718+08:00 56 Query commit2018-09-05T18:45:45.272541+08:00 56 Query SELECT /*!40001 SQL_NO_CACHE */ `id`,`name` FROM `testarchive`.`sourcetb` FORCE INDEX(`PRIMARY`) WHERE (id < 3) AND ((`id` >= '2')) ORDER BY `id` LIMIT 12018-09-05T18:45:45.274470+08:00 56 Query commit
Other options of the pt-archive
You can use pt-archiver --help
to get whole help information
Here i list some commonly used options:
–source: DSN specifying the table to archive from (required)
–dest: DSN specifying the table to archive to
–where: limit which rows to archive
–no-delete: Do not delete archived rows
–limit: Number of rows to fetch and archive per statement (default 1)
–why-quit: Print reason for exiting unless rows exhausted
–replace: Causes INSERTs into –dest to be written as REPLACE
–bulk-delete: Delete each chunk with a single statement (implies –commit-each)
–statistics: Collect and print timing statistics
–max-lag=m: Pause archiving if the slave given by –check-slave-lag lags (default 1s). Optional suffix s=seconds, m=minutes, h=hours, d=days; if no suffix, s is used.
–sleep=i: Sleep time between fetches