Tuesday, January 31, 2012

Query to Find Column From All Tables of Database

How many tables in the AdventureWorks database have a column with a name like ‘EmployeeID’?
It was quite an interesting question, and I thought it would be great to have a script that could answer it. I quickly wrote the following script, which returns all the tables containing a specific column along with their schema names.

USE AdventureWorks;
SELECT t.name AS table_name,
SCHEMA_NAME(t.schema_id) AS schema_name,
c.name AS column_name
FROM sys.tables AS t
INNER JOIN sys.columns AS c ON t.object_id = c.object_id
WHERE c.name LIKE '%EmployeeID%'
ORDER BY schema_name, table_name;

In the above query, replace EmployeeID with any other column name.

If you want to list every column name in your database, run the following script. You can add any condition in the WHERE clause to get the desired result.

SELECT t.name AS table_name,
SCHEMA_NAME(t.schema_id) AS schema_name,
c.name AS column_name
FROM sys.tables AS t
INNER JOIN sys.columns AS c ON t.object_id = c.object_id
ORDER BY schema_name, table_name;


The INTERSECT operator in SQL Server 2005 retrieves the records that are common to the left and right queries of the operator. In many cases, INTERSECT returns almost the same results as an INNER JOIN.
When using the INTERSECT operator, the number and order of the columns must be the same in all queries, and the data types must be compatible.
Let us see how INTERSECT and INNER JOIN are related. We will use the AdventureWorks database to demonstrate our examples.

Example 1: Simple Example of INTERSECT

SELECT *
FROM HumanResources.EmployeeDepartmentHistory
WHERE EmployeeID IN (1,2,3)
INTERSECT
SELECT *
FROM HumanResources.EmployeeDepartmentHistory
WHERE EmployeeID IN (3,2,5)


The result set shows the EmployeeID values that are common to both queries, i.e. 2 and 3.

Example 2: Using a simple INTERSECT between two tables.

SELECT VendorID,ModifiedDate
FROM Purchasing.VendorContact
INTERSECT
SELECT VendorID,ModifiedDate
FROM Purchasing.VendorAddress

The result set shows the records that are common to both tables. It shows 104 common records between the tables.

Example 3:  Using INNER JOIN.

SELECT va.VendorID,va.ModifiedDate
FROM Purchasing.VendorContact vc
INNER JOIN Purchasing.VendorAddress va ON vc.VendorID = va.VendorID
AND vc.ModifiedDate = va.ModifiedDate

Explanation:

The result set displays all the records that are common to both tables. However, in the example above the INNER JOIN returns one row for every matching pair of rows from the left and right tables, so on careful observation many of the records appear as duplicates. An INNER JOIN can give us duplicate records, which is not the case with the INTERSECT operator.

Example 4:  Using INNER JOIN with Distinct.

SELECT DISTINCT va.VendorID,va.ModifiedDate
FROM Purchasing.VendorContact vc
INNER JOIN Purchasing.VendorAddress va ON vc.VendorID = va.VendorID
AND vc.ModifiedDate = va.ModifiedDate

The result set in this example does not contain any duplicate records because the DISTINCT clause is used in the SELECT statement. DISTINCT removes the duplicate rows, and the final result in this example is exactly the same as in Example 2 described above. In this way, INNER JOIN can simulate INTERSECT when used with DISTINCT.
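
The relationship between the three query shapes above can be reproduced on a toy data set. What follows is only a sketch under stated assumptions: it uses Python's built-in sqlite3 module as a stand-in for SQL Server, and the tables vendor_contact and vendor_address and their rows are made up for illustration, not taken from AdventureWorks.

```python
import sqlite3

# In-memory SQLite database: a stand-in for SQL Server, not AdventureWorks itself.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE vendor_contact (vendor_id INTEGER, modified_date TEXT);
    CREATE TABLE vendor_address (vendor_id INTEGER, modified_date TEXT);
    -- Hypothetical data: vendor 1 has two contacts and one address on the same date.
    INSERT INTO vendor_contact VALUES (1, '2002-02-25'), (1, '2002-02-25'), (2, '2002-02-17');
    INSERT INTO vendor_address VALUES (1, '2002-02-25'), (3, '2002-03-01');
""")

# INTERSECT: one row per distinct value present in both queries.
intersect_rows = conn.execute("""
    SELECT vendor_id, modified_date FROM vendor_contact
    INTERSECT
    SELECT vendor_id, modified_date FROM vendor_address
""").fetchall()

# INNER JOIN: one row per matching pair, so duplicates can appear.
join_rows = conn.execute("""
    SELECT va.vendor_id, va.modified_date
    FROM vendor_contact vc
    INNER JOIN vendor_address va
        ON vc.vendor_id = va.vendor_id AND vc.modified_date = va.modified_date
""").fetchall()

# INNER JOIN with DISTINCT: the duplicates are removed again.
distinct_join_rows = conn.execute("""
    SELECT DISTINCT va.vendor_id, va.modified_date
    FROM vendor_contact vc
    INNER JOIN vendor_address va
        ON vc.vendor_id = va.vendor_id AND vc.modified_date = va.modified_date
""").fetchall()

print(intersect_rows)
print(join_rows)
print(distinct_join_rows)
conn.close()
```

On this data the plain join returns the common row twice, while INTERSECT and the DISTINCT join each return it once.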

EXCEPT Clause in SQL Server is Similar to MINUS Clause in Oracle

The EXCEPT clause in SQL Server is equivalent to the MINUS operation in Oracle. Both the EXCEPT query and the MINUS query return all distinct rows from the first query that are not returned by the second query. Each SQL statement within an EXCEPT or MINUS query must have the same number of fields in its result set, with compatible data types. Let us see this using the example below.

First, create the table in SQL Server and Oracle.

CREATE TABLE EmployeeRecord
(EmpNo INT NOT NULL, EmpName VARCHAR(10), -- the types of these two columns are assumed
EmpPost VARCHAR(9), ManagerID INT,
Salery INT, COMM INT, DeptNO INT);
INSERT INTO EmployeeRecord
VALUES (7369, 'SMITH', 'CLERK', 7902, 800, NULL, 20);
INSERT INTO EmployeeRecord
VALUES (7499, 'ALLEN', 'SALESMAN', 7698, 1600, 300, 30);
INSERT INTO EmployeeRecord
VALUES (7521, 'WARD', 'SALESMAN', 7698, 1250, 500, 30);
INSERT INTO EmployeeRecord
VALUES (7566, 'JONES', 'MANAGER', 7839, 2975, NULL, 20);
INSERT INTO EmployeeRecord
VALUES (7654, 'MARTIN', 'SALESMAN', 7698, 1250, 1400, 30);
INSERT INTO EmployeeRecord
VALUES (7698, 'BLAKE', 'MANAGER', 7839, 2850, NULL, 30);
INSERT INTO EmployeeRecord
VALUES (7782, 'CLARK', 'MANAGER', 7839, 2450, NULL, 10);
INSERT INTO EmployeeRecord
VALUES (7788, 'SCOTT', 'ANALYST', 7566, 3000, NULL, 20);
INSERT INTO EmployeeRecord
VALUES (7839, 'KING', 'PRESIDENT', NULL, 5000, NULL, 10);
INSERT INTO EmployeeRecord
VALUES (7844, 'TURNER', 'SALESMAN', 7698, 1500, 0, 30);
INSERT INTO EmployeeRecord
VALUES (7876, 'ADAMS', 'CLERK', 7788, 1100, NULL, 20);
INSERT INTO EmployeeRecord
VALUES (7900, 'JAMES', 'CLERK', 7698, 950, NULL, 30);
INSERT INTO EmployeeRecord
VALUES (7902, 'FORD', 'ANALYST', 7566, 3000, NULL, 20);
INSERT INTO EmployeeRecord
VALUES (7934, 'MILLER', 'CLERK', 7782, 1300, NULL, 10);

SELECT *
FROM EmployeeRecord;

Now run the following query in SQL Server:

SELECT EmpNo, EmpName
FROM EmployeeRecord
WHERE Salery > 1000
EXCEPT
SELECT EmpNo, EmpName
FROM EmployeeRecord
WHERE Salery > 2000;

Now run the following query in Oracle:

SELECT EmpNo, EmpName
FROM EmployeeRecord
WHERE Salery > 1000
MINUS
SELECT EmpNo, EmpName
FROM EmployeeRecord
WHERE Salery > 2000;

You will find that both queries return the same results.
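
The set difference can also be checked without either server. The sketch below is an illustration only: it assumes SQLite (through Python's built-in sqlite3 module) as a stand-in, which supports EXCEPT but not MINUS, and it loads just the three columns of EmployeeRecord that the query needs.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE EmployeeRecord (EmpNo INTEGER, EmpName TEXT, Salery INTEGER)")

# Employee numbers, names and salaries from the INSERT statements above.
employees = [
    (7369, "SMITH", 800), (7499, "ALLEN", 1600), (7521, "WARD", 1250),
    (7566, "JONES", 2975), (7654, "MARTIN", 1250), (7698, "BLAKE", 2850),
    (7782, "CLARK", 2450), (7788, "SCOTT", 3000), (7839, "KING", 5000),
    (7844, "TURNER", 1500), (7876, "ADAMS", 1100), (7900, "JAMES", 950),
    (7902, "FORD", 3000), (7934, "MILLER", 1300),
]
conn.executemany("INSERT INTO EmployeeRecord VALUES (?, ?, ?)", employees)

# Rows of the first query that the second query does not return.
except_rows = conn.execute("""
    SELECT EmpNo, EmpName FROM EmployeeRecord WHERE Salery > 1000
    EXCEPT
    SELECT EmpNo, EmpName FROM EmployeeRecord WHERE Salery > 2000
    ORDER BY EmpNo
""").fetchall()

for emp_no, emp_name in except_rows:
    print(emp_no, emp_name)
conn.close()
```

The result is the six employees whose salary is above 1000 but not above 2000: ALLEN, WARD, MARTIN, TURNER, ADAMS and MILLER.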

Drop the table in SQL Server and Oracle, as we are done with the example.

DROP TABLE EmployeeRecord;

Wednesday, January 4, 2012

Introduction to Change Data Capture (CDC) in SQL Server 2008

Change Data Capture records INSERTs, UPDATEs, and DELETEs applied to SQL Server tables, and makes a record available of what changed, where, and when, in simple relational 'change tables' rather than in an esoteric chopped salad of XML. These change tables contain columns that reflect the column structure of the source table you have chosen to track, along with the metadata needed to understand the changes that have been made. Pinal Dave explains all, with plenty of examples in a simple introduction.
Often, you’ll be told that the specification of an application requires that the value of data in the database must be recorded before it is changed. In other words, we are required to save the full history of changes to the data. This feature is usually implemented for data security purposes. To implement it, I have seen a variety of solutions, from triggers and timestamps to complicated queries (stored procedures) that audit data.
In SQL Server 2005, ‘after update’, ‘after insert’ and ‘after delete’ triggers almost solved the problem of tracking changes in data. A better solution was introduced in SQL Server 2008 and is called Change Data Capture (CDC). CDC allows SQL Server developers to deliver data archiving and capturing without any additional programming.
CDC is one of the new data tracking and capturing features of SQL Server 2008. It only tracks changes in user-created tables. Because captured data is stored in relational tables, it can easily be accessed and retrieved later using regular T-SQL.
When you apply Change Data Capture to a database table, a mirror of the tracked table is created with the same column structure as the original table, plus additional columns holding the metadata that summarizes the nature of the change to each row. The SQL Server DBA can then easily monitor the activity on the logged table using these new audit tables.

Enabling Change Data Capture on a Database

CDC first has to be enabled for the database. Because CDC is a table-level feature, it then has to be enabled for each table to be tracked. You can run the following query to check whether it is enabled for any database.
USE master
SELECT [name], database_id, is_cdc_enabled
FROM sys.databases
This query returns every database name along with a column that shows whether CDC is enabled.
You can run this stored procedure in the context of each database to enable CDC at database level. (The following script will enable CDC in AdventureWorks database. )
USE AdventureWorks
EXEC sys.sp_cdc_enable_db
As soon as CDC is enabled, it will show this result in SSMS.
Additionally, in the AdventureWorks database, you will see that a schema named ‘cdc’ has now been created.
Some system tables will have been created within the AdventureWorks database as part of the cdc schema.
The tables that have been created are listed here.
  • cdc.captured_columns – This table lists the captured columns.
  • cdc.change_tables – This table lists all the tables that are enabled for capture.
  • cdc.ddl_history – This table contains the history of all the DDL changes since change data capture was enabled.
  • cdc.index_columns – This table contains the index columns associated with each change table.
  • cdc.lsn_time_mapping – This table maps LSNs (which we will learn about later) to times.

Enabling Change Data Capture on one or more Database Tables

CDC can be applied at the table level in any database for which CDC is enabled. It has to be enabled for every table that needs to be tracked. First run the following query to show which tables of the database have already been enabled for CDC.
USE AdventureWorks

SELECT [name], is_tracked_by_cdc
FROM sys.tables
The above query returns a column with the table name, along with a column that shows whether CDC is enabled for that table.
You can run the following stored procedure to enable each table. Before enabling CDC at the table level, make sure that SQL Server Agent is enabled. When CDC is enabled on a table, it creates two CDC-related jobs that are specific to the database and are executed by SQL Server Agent. Without SQL Server Agent enabled, these jobs will not execute.
Additionally, it is very important to understand the role of the required parameter @role_name. If there is any restriction on how data should be extracted from the database, this parameter names the database role that gates access to the change data. If you do not specify a role and instead pass NULL, access to the change table is not gated by any role.
The following script enables CDC on the HumanResources.Shift table.
USE AdventureWorks
EXEC sys.sp_cdc_enable_table
@source_schema = N'HumanResources',
@source_name   = N'Shift',
@role_name     = NULL
As we are using the AdventureWorks database, it creates jobs with the following names.
  1. cdc.AdventureWorks_capture – When this job is executed it runs the system stored procedure sys.sp_MScdc_capture_job, which internally calls sys.sp_cdc_scan. The procedure sys.sp_cdc_scan cannot be executed explicitly when a change data capture log scan operation is already active or when the database is enabled for transactional replication. This job is run by SQL Server Agent and is what populates the change tables.
  2. cdc.AdventureWorks_cleanup – When this job is executed it runs the system stored procedure sys.sp_MScdc_cleanup_job. This system SP cleans up the database change tables.
The stored procedure sys.sp_cdc_enable_table enables CDC. There are several options available with this SP, but we only mention the required ones here. CDC is a very powerful and versatile tool, and understanding sys.sp_cdc_enable_table is the way to unlock its full potential. One more thing to notice is that when these jobs are created they are automatically enabled as well.
By default, all the columns of the specified table are tracked. If you want only a few columns of the table to be tracked, you can specify those columns as one of the parameters of the SP mentioned above.
When everything has completed successfully, check the system tables again and you will find a new table called cdc.HumanResources_Shift_CT. This table will contain all the changes made to the table HumanResources.Shift.
If you expand this table, you will find five additional columns alongside the mirrored columns of the original table:
  • __$start_lsn
  • __$end_lsn
  • __$seqval
  • __$operation
  • __$update_mask
Two of these columns are very important to us: __$operation and __$update_mask.
The column __$operation contains a value that corresponds to the DML operation. Here is a quick list of the values and their meanings.
  • Delete Statement = 1
  • Insert Statement = 2
  • Value before Update Statement = 3
  • Value after Update Statement = 4
The column __$update_mask shows, via a bitmap, which columns were updated in the DML operation specified by __$operation. For a DELETE or INSERT operation, all columns count as updated, so every bit in the mask is set to 1.

Example of Change Data Capture

We will test this feature by running DML operations such as INSERT, UPDATE and DELETE on the table HumanResources.Shift, which we have set up for CDC. We will observe the effects on the CDC table cdc.HumanResources_Shift_CT.
Before we start, let’s first SELECT from both tables and see what is in them.
USE AdventureWorks
SELECT *
FROM HumanResources.Shift
USE AdventureWorks
SELECT *
FROM cdc.HumanResources_Shift_CT
The result of the query is as displayed here.
The original table HumanResources.Shift has three rows in it, whereas the table cdc.HumanResources_Shift_CT is completely empty. This table will have entries after an operation on the tracked table.

Insert Operation

Let’s run an INSERT operation on the table HumanResources.Shift.
USE AdventureWorks
INSERT INTO [HumanResources].[Shift]

Once the script is run, we will check the contents of our two tables, HumanResources.Shift and cdc.HumanResources_Shift_CT.
Because of the INSERT operation, we have a newly inserted fourth row in the tracked table HumanResources.Shift. The tracking table also shows the same row. The value of __$operation is 2, which means that this is an INSERT operation.

Update Operation

To illustrate the effects of an UPDATE we will update a newly inserted row.
USE AdventureWorks
UPDATE [HumanResources].[Shift]
SET Name = 'New Name',
      ModifiedDate = GETDATE()
WHERE ShiftID = 4
Once more, we check our tables HumanResources.Shift and cdc.HumanResources_Shift_CT.
UPDATE operations always result in two different entries in the tracking table. One entry contains the values before the UPDATE is executed. The second entry contains the new data after the UPDATE is executed. In our case we have only changed two columns of the table, but we are tracking the complete table, so all the columns of the row are logged before and after the update happens. The Change Data Capture mechanism always captures all the columns of the table unless, when CDC is set up on the table, it is restricted to tracking only a few columns. We will see how this can be done later in this article.
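To make the before and after entries concrete, the following sketch compares the two images of an update. It is an illustration in plain Python, not something CDC provides: the rows in change_rows are hypothetical stand-ins for what might be read from cdc.HumanResources_Shift_CT (the shift name 'Tracked Shift' is invented), and the operation codes are the ones listed earlier in this article.

```python
# Operation codes as listed for the __$operation column.
OPERATIONS = {1: "delete", 2: "insert", 3: "update (before)", 4: "update (after)"}

# Hypothetical change rows; only two data columns are shown for brevity.
change_rows = [
    {"__$operation": 2, "ShiftID": 4, "Name": "Tracked Shift"},
    {"__$operation": 3, "ShiftID": 4, "Name": "Tracked Shift"},
    {"__$operation": 4, "ShiftID": 4, "Name": "New Name"},
]

def changed_columns(before, after):
    """Return {column: (old, new)} for data columns that differ between the two images."""
    return {
        column: (before[column], after[column])
        for column in before
        if not column.startswith("__$") and before[column] != after[column]
    }

labels = [OPERATIONS[row["__$operation"]] for row in change_rows]
before_image = next(row for row in change_rows if row["__$operation"] == 3)
after_image = next(row for row in change_rows if row["__$operation"] == 4)
diff = changed_columns(before_image, after_image)

print(labels)
print(diff)
```

Here the diff contains only the Name column, because ShiftID is identical in both images.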

Delete Operation

To verify this operation, we will run a DELETE on the newly inserted row.
USE AdventureWorks
DELETE FROM [HumanResources].[Shift]
WHERE ShiftID = 4
Once this script is run, we can see the contents of our tables HumanResources.Shift and cdc.HumanResources_Shift_CT.
Due to the DELETE operation, we now have only three rows in the tracked table HumanResources.Shift. We can see the deleted row in the tracking table as a new entry. The value of __$operation is 1, meaning that this is a delete operation.

Change Data Capture and Operations

We have now verified that, by using CDC, we are able to capture all the data affected by DML operations. In the tracking table, __$operation takes four values: 1 for a delete, 2 for an insert, 3 for the values before an update, and 4 for the values after an update.

Understanding Update mask

It is important to understand the update mask column in the tracking table. It is named __$update_mask. The value displayed in the field is hexadecimal but is stored as binary.
In our example we have three different operations. INSERT and DELETE operations are performed on the complete row and not on individual columns. These operations are marked with the mask 0x1F, which translates to 0b11111 in binary and means all five columns of the table.
In our example, we had an UPDATE on only two columns: the second and the fifth. This is represented by 0x12 in hexadecimal (0b10010 in binary). Read from the right as a bitmap, the set bits are the second and the fifth. This is a useful way of finding out which columns were updated or changed.
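The bitmap reading described above can be sketched in a few lines of Python. This is only an illustration: decode_update_mask is a hypothetical helper, not a SQL Server function, and the column order of HumanResources.Shift listed here is an assumption of the sketch. Inside SQL Server, the built-in functions sys.fn_cdc_get_column_ordinal and sys.fn_cdc_is_bit_set serve the same purpose.

```python
def decode_update_mask(mask: int, column_names: list[str]) -> list[str]:
    """Return the names of the columns whose bit is set in the mask.

    Bit 0 (the rightmost bit) is taken to correspond to the first column.
    """
    return [
        name
        for position, name in enumerate(column_names)
        if mask & (1 << position)
    ]

# Assumed column order of HumanResources.Shift for this sketch.
shift_columns = ["ShiftID", "Name", "StartTime", "EndTime", "ModifiedDate"]

updated = decode_update_mask(0x12, shift_columns)      # mask of the UPDATE
all_columns = decode_update_mask(0x1F, shift_columns)  # mask of an INSERT or DELETE

print(updated)
print(all_columns)
```

With this column order, 0x12 decodes to Name and ModifiedDate, the two columns changed by the UPDATE in our example, and 0x1F decodes to all five columns.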
The tracking table has two columns that contain the suffix lsn, i.e. __$start_lsn and __$end_lsn. These two values are Log Sequence Numbers. An LSN is associated with the committed transaction of the DML operation on the tracked table.

Disabling Change Data Capture on a table

Disabling this feature is very simple. As we have seen earlier, enabling CDC takes two steps, one at the database level and one at the table level. In the same way, we can disable the feature at the same two levels. Let us look at both of them, one after the other.
To drop the tracking of a table we need three values: the source schema, the source table name, and the capture instance. It is very easy to get the schema and table name. In our case, the schema is HumanResources and the table name is Shift; however, we do not know the name of the capture instance. We can retrieve it very easily by running the following T-SQL query.
USE AdventureWorks;
EXEC sys.sp_cdc_help_change_data_capture
This returns a result that contains all three pieces of information required for disabling CDC on a table.
The stored procedure sys.sp_cdc_help_change_data_capture provides lots of other useful information as well. Once we have the name of the capture instance, we can disable tracking of the table by running this T-SQL query.
USE AdventureWorks;
EXECUTE sys.sp_cdc_disable_table
    @source_schema = N'HumanResources',
    @source_name = N'Shift',
    @capture_instance = N'HumanResources_Shift';
Once Change Data Capture is disabled on a table, its change table is dropped, along with all the functions that were associated with it. All the rows and data associated with this feature are also deleted from the system tables, and the relevant data in the catalog views is changed.
In our example, we can clearly see that capture table cdc.HumanResources_Shift_CT is dropped.

Disable Change Data Capture Feature on Database

This is the easiest task of the whole process. Running the following T-SQL query will disable CDC on the whole database.
USE AdventureWorks
EXEC sys.sp_cdc_disable_db
This stored procedure deletes all the data, functions and tables related to CDC. If this data is needed for any reason, you must take a backup before dropping CDC from the database.

Capture Selected Column

When CDC is enabled on a table, it usually captures the data of all the columns. During INSERT or DELETE operations it is necessary to capture all the data, but in UPDATE operations only the data of the updated columns is required. CDC is not yet advanced enough to provide this kind of dynamic column selection, but it does let you choose, from the beginning, the columns whose changes should be captured.
The following stored procedure should be run in the context of each database to enable CDC at the database level. This script enables CDC in the AdventureWorks database.
USE AdventureWorks
EXEC sys.sp_cdc_enable_db
Now we will enable this feature at the table level, but only for the selected columns ShiftID and Name. This script enables table-level change data capture for only those two columns.
USE AdventureWorks
EXEC sys.sp_cdc_enable_table
@source_schema = N'HumanResources',
@source_name   = N'Shift',
@role_name     = NULL,
@captured_column_list = '[ShiftID],[Name]'
So what is in the system table that is created for data capture in the AdventureWorks database?
You can see that only two columns are now tracked.
We will change the data of one of the columns that weren’t specified, to see what appears in the cdc.HumanResources_Shift_CT table.
Before we start, let us first select from both tables and observe their contents.
USE AdventureWorks
SELECT *
FROM HumanResources.Shift
USE AdventureWorks
SELECT *
FROM cdc.HumanResources_Shift_CT
Here is the result.
The original table HumanResources.Shift has three rows in it, whereas the table cdc.HumanResources_Shift_CT is completely empty. Let’s update ModifiedDate for ShiftID = 3 and see whether that change creates an entry in the tracking table.
USE AdventureWorks
UPDATE [HumanResources].[Shift]
SET        ModifiedDate = GETDATE()
WHERE  ShiftID = 3
Now let us check the contents of the tracking table cdc.HumanResources_Shift_CT and see whether that change was captured.
The tracking table is empty because it only tracks changes to the columns it captures, and it ignores changes in any other column.

Retrieve Captured Data of Specific Time Frame

Quite often, one is asked for the data that changed over a given time interval. If you look at the tracking data, there is apparently no time captured at all. However, there are a few fields that can definitely help us out, such as __$start_lsn. LSN stands for Log Sequence Number. Every record in the transaction log is uniquely identified by an LSN, and LSNs are always increasing.
LSNs are always associated with a time, and the mapping can be found by querying the system table cdc.lsn_time_mapping. This is one of the tables that were created when the AdventureWorks database was enabled for CDC. You can run this query to get all the data in the table cdc.lsn_time_mapping.
USE AdventureWorks
SELECT *
FROM cdc.lsn_time_mapping
When this query is run, it gives us all the rows of the table. It is a little difficult to find the necessary information among all that data. The usual case is that we need to inspect a change that occurred in a particular time period.
We can find the LSN that corresponds to a given time by using the system function sys.fn_cdc_map_time_to_lsn. If we want all the changes made since yesterday, we can use this function as described below to get the LSN range to query.
Before we run this query, let us explore the two table-valued functions (TVFs) in the AdventureWorks database. You can see that two new TVFs have been created in the cdc schema. These functions are created when table-level CDC is enabled.
The function cdc.fn_cdc_get_all_changes_HumanResources_Shift can be used to get the events that occurred over a particular time period. In our case, we will retrieve this data for the past 24 hours.
The following query retrieves the data that was modified in the past 24 hours.
USE AdventureWorks
DECLARE @begin_time DATETIME, @end_time DATETIME, @begin_lsn BINARY(10), @end_lsn BINARY(10);
SELECT @begin_time = GETDATE()-1, @end_time = GETDATE();
SELECT @begin_lsn = sys.fn_cdc_map_time_to_lsn('smallest greater than', @begin_time);
SELECT @end_lsn = sys.fn_cdc_map_time_to_lsn('largest less than or equal', @end_time);
SELECT *
FROM cdc.fn_cdc_get_all_changes_HumanResources_Shift(@begin_lsn,@end_lsn,'all')
We have used relational operators in the function sys.fn_cdc_map_time_to_lsn. There are four different relational operators available in that function:
  • largest less than
  • largest less than or equal
  • smallest greater than
  • smallest greater than or equal
In this way, captured data can be queried very easily by time interval.
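The meaning of the four relational operators listed above can be sketched outside SQL Server. The Python below is an illustration only: MAPPING is a hypothetical, time-sorted stand-in for cdc.lsn_time_mapping with made-up times and LSN values, and map_time_to_lsn is an invented helper that mimics the lookup, not the real sys.fn_cdc_map_time_to_lsn.

```python
from bisect import bisect_left, bisect_right
from datetime import datetime

# Hypothetical (commit time, LSN) pairs, sorted by time.
MAPPING = [
    (datetime(2012, 1, 3, 9, 0), 100),
    (datetime(2012, 1, 3, 12, 0), 200),
    (datetime(2012, 1, 4, 9, 0), 300),
]
TIMES = [commit_time for commit_time, _ in MAPPING]

def map_time_to_lsn(operator, when):
    """Return the LSN selected by the operator, or None if no row qualifies."""
    if operator == "largest less than":
        index = bisect_left(TIMES, when) - 1
    elif operator == "largest less than or equal":
        index = bisect_right(TIMES, when) - 1
    elif operator == "smallest greater than":
        index = bisect_right(TIMES, when)
    elif operator == "smallest greater than or equal":
        index = bisect_left(TIMES, when)
    else:
        raise ValueError(f"unknown operator: {operator}")
    return MAPPING[index][1] if 0 <= index < len(MAPPING) else None

noon = datetime(2012, 1, 3, 12, 0)
for op in ("largest less than", "largest less than or equal",
           "smallest greater than", "smallest greater than or equal"):
    print(op, "->", map_time_to_lsn(op, noon))
```

A time that exactly matches a row is where the strict and the "or equal" variants differ: for noon in this sketch, the strict operators pick the neighbouring rows and the "or equal" operators pick the noon row itself.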

Automatic Clean Up Process

If we track every change to all the data in our database, there is a very good chance that we will outgrow the hard drive of the main server. This will also lead to maintenance issues and input/output buffer issues.
CDC has an automatic cleanup process that runs at regular intervals. By default the interval is three days, but it can be configured. We have observed that, when we enable CDC on the database, an additional system stored procedure named sys.sp_cdc_cleanup_change_table is created, which cleans up the tracked data at that interval.

Track DML Changes Using after Trigger for Update, Delete and Insert rows

There are many options for tracking DML changes to data in SQL Server, such as Change Data Capture (CDC) and Change Tracking, which are very easy to implement and use. However, there are scenarios where we cannot use them, because they were introduced in SQL Server 2008. If you want to track changes in versions before 2008, the best way can be to use triggers.
We can track data changes to a SQL Server table, such as update, delete and insert operations, individually by creating AFTER triggers for UPDATE, INSERT and DELETE. Here I am going to explain how to create a trigger that tracks the data changes and saves them into an audit table for each individual operation that happens on rows in a table.
What is the purpose of this trigger?
  • Track the newly inserted or updated row and save it into the audit table
  • Track the deleted rows and save them into the audit table, using a date stamp column
By implementing this trigger we keep the historical changes to the rows in a table, and we can also query the audit table by its date column to find out how heavily the table is used.
Note: This trigger captures only the operation that occurred and the row that was affected. See below for a snapshot of the result.
Keep in mind that I’m using the SQL Server Denali (CTP) version for creating this trigger, but I have also tested it in the 2005 and 2008 versions.
To implement this trigger, I create a test database using the script below.

/*Creating a Test Database*/
 Create database TestDB

Next, create a test table on which to implement the trigger.
/*Create Test Table to make use for Implementing trigger for DML Changes */
 Create table Test_table(
 id int identity
,Name varchar(50)
,phonenumber varchar(10) )
Next, insert some data into the table for testing the trigger, using the script below.
/*Insert data into Test Table for which we track DML changes*/
/*Phone numbers are quoted so that leading zeros are kept in the varchar column*/
 Insert into Test_table(Name,phonenumber) values('Lucky','9191919191')
 Insert into Test_table(Name,phonenumber) values('Priya','0110101101')
 Insert into Test_table(Name,phonenumber) values('Meha','9987979237')
 Insert into Test_table(Name,phonenumber) values('stacy','9178697239')
 Insert into Test_table(Name,phonenumber) values('Nancy','9126827982')
/*select the rows inserted in the above script*/
select * from TestDB.dbo.Test_table

Next, create an audit table in TestDB to store the data tracked by the trigger for inserted, updated and deleted rows. Make sure to include a date and time column so that the table can be queried later for historical changes.
/*Create Table to save Audit Data changes */
Create table TestDB.dbo.Test_table_Audit(
Effective_date datetime -- To get the date and time of the changed row
,Operation char(10) -- To get the operation occurred like Insert or Update or Delete
,Id INT 
,Name Varchar(50)
,phonenumber varchar(10)
)
In the final step we will see how to create the AFTER trigger for all DML changes and save them into the audit table, using the script below, which is explained with comment lines.

Test Insert Operation – Insert a row into Test_table
/*Testing Insert Operation*/
insert into Test_table(Name,phonenumber) values('Microsoft','9190879979')
Then verify that the row we inserted above is tracked and saved in the audit table.
select * from dbo.Test_table_Audit where Name like 'Microsoft'
Test Update Operation – Update the row just inserted above using the script below
/*Testing Update Operation*/
Update Test_table
Set Name= 'SQLFRNDZ'
Where Name like 'Microsoft'
Then verify that the row we updated is tracked and inserted into the audit table.
select * from dbo.Test_table_Audit where Name like 'SQLFRNDZ'

Test Delete Operation – Delete the row we just updated above using the script below
/*Delete a row from the table test_table*/
DELETE FROM Test_table WHERE Name like 'SQLFRNDZ'
Then verify that the row we deleted is tracked and inserted into the audit table.
/*Verify the deleted row*/
Select * from dbo.Test_table_Audit where Operation like 'Deleted'
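
The audit pattern tested above can also be tried end to end without a SQL Server instance. The sketch below is not the trigger from this post: it assumes SQLite (through Python's built-in sqlite3 module) as a stand-in, where one trigger per DML event is needed and NEW and OLD play the role that the inserted and deleted tables play in SQL Server. The trigger names and the operation labels 'Inserted' and 'Updated' are assumptions made for the illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Test_table (
        id INTEGER PRIMARY KEY AUTOINCREMENT,
        Name TEXT,
        phonenumber TEXT
    );
    CREATE TABLE Test_table_Audit (
        Effective_date TEXT,
        Operation TEXT,
        Id INTEGER,
        Name TEXT,
        phonenumber TEXT
    );
    -- Hypothetical trigger names; one AFTER trigger per DML event.
    CREATE TRIGGER trg_audit_insert AFTER INSERT ON Test_table
    BEGIN
        INSERT INTO Test_table_Audit
        VALUES (datetime('now'), 'Inserted', NEW.id, NEW.Name, NEW.phonenumber);
    END;
    CREATE TRIGGER trg_audit_update AFTER UPDATE ON Test_table
    BEGIN
        INSERT INTO Test_table_Audit
        VALUES (datetime('now'), 'Updated', NEW.id, NEW.Name, NEW.phonenumber);
    END;
    CREATE TRIGGER trg_audit_delete AFTER DELETE ON Test_table
    BEGIN
        INSERT INTO Test_table_Audit
        VALUES (datetime('now'), 'Deleted', OLD.id, OLD.Name, OLD.phonenumber);
    END;
""")

# The same three test operations as in the post.
conn.execute("INSERT INTO Test_table (Name, phonenumber) VALUES ('Microsoft', '9190879979')")
conn.execute("UPDATE Test_table SET Name = 'SQLFRNDZ' WHERE Name = 'Microsoft'")
conn.execute("DELETE FROM Test_table WHERE Name = 'SQLFRNDZ'")

audit_rows = conn.execute(
    "SELECT Operation, Id, Name FROM Test_table_Audit ORDER BY rowid"
).fetchall()
remaining = conn.execute("SELECT COUNT(*) FROM Test_table").fetchone()[0]

for row in audit_rows:
    print(row)
conn.close()
```

Each of the three statements leaves exactly one audit row, so the history of the row survives even though the row itself has been deleted from Test_table.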