Change Data Capture (CDC) is a technology that can be used in incremental data transfers from OLTP systems to data warehouse systems. By storing the insert, update, and delete operations, it stores the first and last version of the record in a change table. This table where the changes are stored is called a change table.
It then reflects these changes to the destination that you want to transfer. We can do similar operations using trigger, but since CDC takes the changes from the log file, it both works more efficiently and makes our processes considerably easier.
Let’s continue with an example as we always do.
Check CDC Enabled Databases
First of all, with the help of the following script, let’s check if there is a database on the instance where CDC is enabled.
1 2 3 | USE master GO SELECT [name],is_cdc_enabled FROM sys.databases |
Enable CDC
Then, enable the CDC for the database we want to use the CDC with the help of the following script.
1 2 3 4 | USE [AdventureWorks2014] GO EXEC sys.sp_cdc_enable_db GO |
After running the above script, a schema named cdc and some tables will be created in this schema. The list of these tables is as follows. You can see these tables under the system tables in the database in which the CDC is enabled.
cdc.captured_columns | This table stores the list of captured columns |
cdc.change_tables | This table stores the list of captured tables |
cdc.ddl_history | This table stores the history of all DDL operations after the CDC is enabled |
cdc.index_columns | This table stores the changes of the indexes of the columns captred with CDC. |
cdc.lsn_time_mapping | This table stores the change and time of transactions |
List CDC Enabled Tables
CDC can be enabled based on the table in the database. You can use the following script to list the tables where the CDC was enabled.
1 2 3 4 5 | USE [AdventureWorks2014] GO SELECT [name] FROM sys.tables where is_tracked_by_cdc=1 |
Enable CDC on a Table
You can enable CDC on a table with the help of the following script
1 2 3 4 5 6 | USE [AdventureWorks2014] GO EXEC sys.sp_cdc_enable_table @source_schema = N'dbo', @source_name = N'Address', @role_name = NULL |
If the SQL Server Agent service is not started, you will receive a message as follows. To avoid this error, you must start the SQL Server Agent service. The reason you receive the error is that the CDC creates two jobs and the SQL Server Agent Service is not started, so these jobs cannot be executed. After starting the service, you don’t need to run the script above again.
SQLServerAgent is not currently running so it cannot be notified of this action.
After running the script, you will see that the CDC creates a new table named “cdc.dbo_Address_CT” under the system tables. In this table, you will see that there are some extra columns except the columns of the “dbo.Address” table.
Check if the CDC is working correctly
Perform an update as follows to check if the CDC is working correctly.
1 2 3 4 5 | USE [AdventureWorks2014] GO UPDATE [dbo].[Address] SET [City] = 'Ankara' WHERE City='Seattle' AND AddressID=23 |
Check whether the change is reflected to the “change table”.
1 2 3 4 | USE [AdventureWorks2014] GO SELECT * FROM [cdc].[dbo_Address_CT] GO |
The values you will see in the _$operation column are as follows.
1 | Delete Process |
2 | Insert Process |
3 | Before Update |
4 | After Update |
When you enable CDC with the script I shared above on the table, it monitors and logs the changes made to all columns in the table. If you want to do this for specific columns in the table, you need to add the columns you want with the @captured_column_list parameter to the end of the script.
1 2 3 4 5 6 7 | USE [AdventureWorks2014] GO EXEC sys.sp_cdc_enable_table @source_schema = N'dbo', @source_name = N'Address', @role_name = NULL, @captured_column_list = '[City],[PostalCode]' |
A function called “fn_cdc_get_all_changes_dbo_Address” has been created under the database when the CDC is enabled. You can also get specific results by querying this function. Example use;
1 2 3 4 5 6 7 8 9 10 11 | USE AdventureWorks2014 GO DECLARE @start_time DATETIME, @finish_time DATETIME, @first_lsn BINARY(10), @last_lsn BINARY(10); --We set the start time to yesterday to list changes in the last 1 day. SELECT @start_time = GETDATE()-1, @finish_time = GETDATE(); --You can type smallest greater than or equal instead of smallest greater than. SELECT @first_lsn = sys.fn_cdc_map_time_to_lsn('smallest greater than', @start_time); --You can type largest less than instead of largest less than or equal. SELECT @last_lsn = sys.fn_cdc_map_time_to_lsn('largest less than or equal', @finish_time); SELECT * FROM cdc.fn_cdc_get_all_changes_dbo_Address(@first_lsn,@last_lsn,'all') |
Disable CDC
Finally, you can disable CDC as follows.
1 2 3 4 5 6 7 8 9 10 | USE [AdventureWorks2014]; GO EXECUTE sys.sp_cdc_disable_table @source_schema = N'dbo', @source_name = N'Address', @capture_instance = N'dbo_Address'; GO --If you don't want to disable it on a database basis, don't run the following script. EXEC sys.sp_cdc_disable_db GO |
You can access Capture Instance by running the following stored procedure.
sys.sp_cdc_help_change_data_capture
After enabling the CDC, you can feed dataware house applications with SSIS packages. Below you will find examples.