Statistic Concept and Performance Effect on SQL Server

 

We will talk about statistic concept in this article.

In the article “Index Concept and Performance Effect on SQL Server” we mentioned how index will affect the performance of a query.

When a query arrives at SQL Server, SQL Server decides how the query will work.

It benefits from statistics when deciding how the query will work. The statistics show the distribution of the data in the table or indexed views.

 

Index Access Methods

By using statistics in SQL Server, the query allows one or more of the following methods to be combined to provide access to the query it is looking for.

  • It goes to the index and finds records that it searched for without searching all of the index(Index Seek)
  • It goes to the index and finds records that it searched for with scanning all of the index(Index Scan)
  • First it goes to the index and finds a column that it searched for without searching all of the index, then it goes to the clustered index with the key value on that index and finds the other columns of the record it is looking for. (Index Seek+Key Lookup)
  • First it goes to the index and finds a column that it searched for with searching all of the index, then it goes to the clustered index with the key value on that index and finds the other columns of the record it is looking for. (Index Scan+Key Lookup)
  • It goes to the clustered index and finds records that it searched for without searching all of the clustered index (Clustered Index Seek)
  • It goes to the clustered index and finds records that it searched for with scanning all of the clustered index (Clustered Index Scan)

 

For example, a table named Table1 would have 100 records.

This table is an index in column A. 3 of 100 records in column A contain Tom value. 97 of them also include the value of Jerry.

When I run the below query,

Sql Server will go to the statistics and find Tom’s ratio to 3% on the table, and with statistics it will decide that index seek is the fastest solution.

 

When I run the below query,

 

Sql Server will go to the statistics and find Jerry‘s ratio to 97% on the table, with statistics it will decide that index scan is the fastest solution.

It would be faster to scan the table from scratch instead of seeking 97 times.

If the above query comes from within a sp, the query plan will be created once, and the second time, parameter sniffing will occur due to first query plan to be used.

I would recommend reading the articles named “Sp(Stored Procedure) On SQL Server” and “What is Parameter Sniffing“.

 

Let’s go back to the statistics

When we create an index, the statistics for that index are automatically generated.

But if the index contains more than one column, only statistics for the first column are generated.

When we rebuild the index, the statistics are updated.

There are a few settings related to statistics on a database basis.

Right-click on the database and select Properties, then select the Options tab.

 

 

Statistic Settings

Auto Create Statistics If set to true and if there is no statistics on where clause of the query, the statistics will be created automayically on that column. Default is True. I would recommend you leave it like this.
Auto Update Statistics If set to true, the statistic is automatically updated when the row change on the table exceeds 20%.

When the change exceeds 20%, the first query to wait for the update of the statistic. Auto value is True.

If there is no performance bottleneck on your system, it may remain True.

But if you do False and you do not have a job that updates statistics, your system’s performance will gradually decrease because the statistics will deteriorate over time.

Auto Update Statistics Asynchronously Used with Auto Update Statistics feature.

If you set it to True, you will ensure that the statistics update operation works asynchronously.

And incoming query will not wait this operation.

Auto Create Incremental Statistics It is a feature that comes with SQL Server 2014.

If your database has a partition, it can be very useful, if not , it is not needed.

When you set Auto Update Statistics to True, the statistics are updated when the row change on the table exceeds 20%.

And if you set Auto Create Incremental Statistics to True, it will be partition based.

You can reduce the load on the system by updating statistic by doing partition basis.

I have not enabled this feature even in a database with a very busy transaction size of 20 TB.

Because if you do not do statistical update FULLSCAN (I think it is not necessary), it is not a huge load on the system and it happens very fast.

 

 

 

Update Statistics Methods

There are two ways to update stats.

sp_updatestats Updates all statistics in the database. İt does not FULLSCAN.
UPDATE STATISTICS You can update your statistics on a statistical or table basis.

The “Update Statistics dbo.myIdentity (ss)” command updates the ss statistics in the myIdentity table.

The “Update Statistics dbo.myIdentity” command updates the statistics in the myIdentity table.

The “Update Statistics dbo.myIdentity WITH FULLSCAN” command updates all statistics in the myIdentity table by scanning the entire table. If the table is big, it will be long.

I have never used it until today. I do not think it’s necessary.

 

UPDATE STATISTICS

The most important thing you need to know about statistics is that they have to stay up to date.

In order to keep them up-to-date, you need to have a job that updates statistics outside of the settings I mentioned above.

I turned the following script into a regular job.

I usually create it to work once a week, but on some systems the statistics break down more quickly, and after a few hours of updating, the statistics may be degraded and performance may start to decline.

So if there is a performance problem in a system, the first thing we have to do is update the statistics.

On a heavily used system, you may need to set the job that updates the statistics to run one hour at a time.

Of course you can set it up to work once a week and update your stats more often if you need to.

I will share the script below which will update all statistics on all the databases on the Instance and create them as jobs to work once a week.

If there is a database in restoring, read only or offline mode on the instance, the script will fail.