Archive data using pt-archiver

 

The Percona Toolkit contains many tools for MySQL. Usually, we use the pt-archive to archive our data. Pt-archive can archive the table data to a table or file, also can be used to purge data.

First of all, how to install the Percona-Toolkit?

1. Download a suitable package at https://www.percona.com/downloads/percona-toolkit/LATEST/ , for example, i choose the percona-toolkit-3.0.11-1.el6.x86_64.rpm.

2. Use rpm to install

 

You may need to install the dependencies:

 

3. Check if the installation is successful

 

If it shows the help information, then the installation is finished.

 

4. Create users for the archiving

At the source side, the user should have at least SELECT, DELETE privileges.

At the target side, the user should have at least INSERT privilege.

We can know why it needs these privileges from the general log below.

 

Secondly, how to archive data?

(1) Archive data

Now, i’ll archive my table ‘testarchive.sourcetb’ to the target table,

Source database info: 192.168.100.184:8001   testarchive.sourcetb

Target database info: 192.168.100.103:4001   destdb.targettb

Source info:

 

 

Because PT-ARCHIVE won’t create the table automatically, so i have to create a same table on the target database:

 

Then executing as follows:

 

–source: DSN specifying the table to archive from (required)

–dest: DSN specifying the table to archive to

–where: limit which rows to archive

–no-delete: Do not delete archived rows

–limit: Number of rows to fetch and archive per statement (default 1)

–why-quit: Print reason for exiting unless rows exhausted

The execution log was printed into archive.log

 

(2)Check the result

After that, check the source table:

 

Check the target table:

 

It shows that the rows(” id < 3 “) have been archived to the target table, and the source rows are still exist(” –no-delete “).

 

(3)What the PT-ARCHIVE did?

What the pt-archive actually did in MySQL? As we know, the general log of MySQL can show all the queries, so we can get the details from the general log( turn it on before archiving data):

 

If the general_log is OFF, use set global general_log=on; to turn it on, and then open the “general_log_file“.

Source side:

 

 

Target side:

 

 

Pt-archive performs a lot of checks when it is executed, and it also allows you turn these checks off.

If you want to delete the data, then you can remove the “–no-delete” option, and we can get “delete” from the general log:

 

 

Other options of the pt-archive

You can use pt-archiver --help  to get whole help information

Here i list some commonly used options:

–source: DSN specifying the table to archive from (required)

–dest: DSN specifying the table to archive to

–where: limit which rows to archive

–no-delete: Do not delete archived rows

–limit: Number of rows to fetch and archive per statement (default 1)

–why-quit: Print reason for exiting unless rows exhausted

–replace: Causes INSERTs into –dest to be written as REPLACE

–bulk-delete: Delete each chunk with a single statement (implies –commit-each)

–statistics: Collect and print timing statistics

–max-lag=m:  Pause archiving if the slave given by –check-slave-lag lags (default 1s). Optional suffix s=seconds, m=minutes, h=hours, d=days; if no suffix, s is used.

–sleep=i: Sleep time between fetches

Shuo Wang
Author: Shuo Wang

Leave a Reply

Your email address will not be published. Required fields are marked *