Monday, 30 July 2012

SQL QUERY OPTIMIZATION AND EXECUTION UNDERSTANDING

Recently, I was reviewing  Education based application I developed. When I clicked to open course detail  screen It took almost 22-25 second , really it was too much time, at first glance I thought that my machine might be slow. I restarted my machine but problem was still there. This could be bad impact when my client going to use this application. I investigated and  found that my stored procedure is taking too much time. I checked the defects in query and after correction the execution time of my stored procedure reduced from 22-25 sec. to 12 sec. I experienced that most of programmers don't care about some minor but very important points and don't even know that how query executes on Server. This blog going to be very helpful for programmers. 
 
Writing optimized query, undoubtedly  , increases the performance of application. As developer mostly it is his  responsibility to write optimized queries rather than DBA to do this job. For writing optimized queries we need to understand  few very important points as developer ; 

What is a SQL execution Plan?
Every SQL query is divided in to the series of execution steps which are called operators. Each operator performs basic operations like insertion, search, scan, updation, aggregation etc.  There are two types of operators Logical operators and physical operators.
Logical Operators describe how the execution will be executed at a conceptual level whereas physical operators are the actual logic which perform the actions.

One logical operator can map to multiple physical operator. In rare scenario it can be vice versa. 
How Query Plan is Executed and parsed?
There are three phases by which SQL is parsed and executed. As Following graph shows;

The Query Processing Process
Parse :- In first step SQL Query is parsed to create processor tree which define logical steps to execute query.  SQL Server makes sure that all the object names do exist, and associates every table and column name on the parse tree with their corresponding object in the system catalog. This process is also called "algebrizer".

Optimize : - On the basis of processor tree defined by “algebrizer” Sql Finds optimized way to execute query using “Optimizer” which takes data statistics as;
  •  How many rows it contains.
  • How many unique data exists in rows.
  • Do table increases more than page size.
According to these statistics cost based plan is prepared. Every cost based plan prepared using resources of CPU and I/O. After doing that Optimizer chooses the best plan to execute.
The optimizer arrives to an estimated plan, for this estimated plan it tries to find an actual execution plan in the cache. Estimated plan is basically which comes out from the optimizer and actual plan is the one which is generated once the query is actually executed.
Execution : - Finally it executes plan sent by optimizer. 

Table Scanning 
In Where clause it goes record by record until it finds out the required result

 

Logical and Physical Reads. 
SQL Server allocates virtual memory to speed up operations like storing, retrieving of data.Every instance of SQL Server has its own cache. 
When data is read it is stored in the SQL cache until it’s not referenced or the cache is needed for some purpose. So rather than reading from physical disk its read from SQLcache. In the same way for writing, data is written back to disk only if it’s modified.

Logical Reads

Logical reads indicates total number of data pages needed to be accessed from data cache to process query. It is very possible that logical read will access the same data pages many times, so count of logical read value may be higher than actual number of pages in a table. Usually the best way to reduce logical read is to apply correct index or to rewrite the query.

Physical Reads

Physical read indicates total number of data pages that are read from disk. In case there is no data in data cache, the physical read will be equal to number of logical read. And usually it happens for the first query request. And for subsequent same query requests, the number will be substantially decreased because the data pages have been in data cache.

Problem with table Scan

Table scan operators are good if number of rows in table are less but if there is a table that contains million or more rows than table scan is not good idea.

Seek Scan

It does not scan all rows to go to a record.It uses b-tree  logic  to get the record as in following diagram.


Following diagram shows how b-tree finds record for clustered index.

Levels of a clustered index

What is Heap?

Heaps are the table without any Clustered index.By default, heap has single partition.Data within a table is grouped together into allocation unites based on their column data types, what it means is one kind of data types are stored together in allocation unites. Data within this allocation unit is stored in pages. Each pages are of size 8KB. Group of 8 pages is stored together and they are referred as Extent. Pages within a table store the data rows with structure which helps to search/locate data faster. If the data of table is not logically sorted, in other word there is no order of data specified in table it is called asHeap Structure.

IAM pages retrieve data in a single partition heap

 Seek Scan Vs Table Scan diagram;


Conclusion
After understanding Sql Statement Execution a good query should contain following characteristics.
  • Table should have primary key.
  • Table should have minimum of one clustered index.
  • Table should have appropriate amount of non-clustered index.
  • Non-clustered index should be created on columns of table based on query which is running
  • Do not to use Views or replace views with original source table.
  • Triggers should not be used if possible, incorporate the logic of trigger in stored procedure.
  • Remove any adhoc queries and use Stored Procedure instead.
  • Check if there is atleast 30% HHD is empty – it improves the performance a bit
  • If possible move the logic of UDF to SP as well.
  • Remove * from SELECT and use columns which are only necessary in code.
  • Remove any unnecessary joins from table.
  • If there is cursor used in query, see if there is any other way to avoid the usage of this (either by SELECT … INTO or INSERT … INTO, etc)