Simple vs Forced – Query Parameterization

October 31, 2009

On the drive back from a relaxing week off I decided to write about query parameterization. This is the process where the query optimizer takes the literal values used within the predicates of an ad-hoc query and turns them into parameters on the fly.

The parameterized query text, not the original query string, is then the one used to check the plan cache, so it can result in a far higher percentage of plan cache hits than would otherwise occur on ad-hoc queries.
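The effect on the cache can be pictured with a toy model. This is not how SQL Server implements parameterization; it is a minimal Python sketch that assumes a hypothetical `parameterize` helper which swaps numeric and quoted string literals for numbered placeholders, with a plain dictionary standing in for the plan cache.

```python
import re

# Hypothetical, simplified literal matcher: quoted strings or bare numbers
# that are not part of an identifier or an existing @parameter marker.
# The real optimizer's rules are far more involved than this.
LITERAL = re.compile(r"'(?:[^']|'')*'|(?<![\w@])\d+(?:\.\d+)?\b")

def parameterize(sql):
    """Replace each literal with @p1, @p2, ... and return (text, values)."""
    values = []

    def swap(match):
        values.append(match.group(0))
        return "@p%d" % len(values)

    return LITERAL.sub(swap, sql), values

def run(queries, use_parameterization):
    """Count plan cache hits when the cache is keyed on the query text."""
    cache, hits = {}, 0
    for sql in queries:
        key = parameterize(sql)[0] if use_parameterization else sql
        if key in cache:
            hits += 1
        else:
            cache[key] = "compiled plan"  # stand-in for a real plan
    return hits

queries = [
    "SELECT Name FROM Employee WHERE EmployeeId = 17",
    "SELECT Name FROM Employee WHERE EmployeeId = 42",
    "SELECT Name FROM Employee WHERE EmployeeId = 99",
]
print(run(queries, use_parameterization=False))  # 0 hits, 3 compilations
print(run(queries, use_parameterization=True))   # 2 hits, 1 compilation
```

Keyed on the raw text, every query is a cache miss. Keyed on the parameterized text, the second and third queries reuse the plan compiled for the first.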

There are two modes to this feature, and neither of them is ‘off’. ‘Simple’ parameterization is turned on by default, and you can raise this to ‘forced’ mode for your database if you desire. The difference between the two modes is that simple mode will only deal with relatively simple literal values within the predicates and will not consider every literal a candidate for parameterization, whereas forced mode converts almost every literal, with a short list of documented exceptions.

So overall it sounds like a good thing, but there is a known problem that comes with parameterization, usually discussed under the name parameter sniffing: the query plan generated for one value of a parameter is not necessarily the optimal plan for another. Consider a search on a personnel database where the EmployeeAge is between 95 and 99. Assuming an index is in place, you can expect the number of matching rows to be low to nil, so the index will not tip and an index seek is the expected behaviour.

If the next time the query is executed the range is 20 to 60, the number of people matching would be very high, yet the plan in cache has already fixed the method by which the query will be executed (the index seek), and it is no longer the optimal choice.
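The age example can be sketched as a toy cost model. Everything here is an assumption made for illustration: the row counts, the relative costs and the helper names are invented, and a real optimizer costs plans very differently. The only point is that the plan is chosen once, for the first values seen, and then reused.

```python
# Toy cost model; all numbers are invented for illustration only.
TOTAL_ROWS = 10_000
LOOKUP_COST = 4  # assumed relative cost of one seek plus lookup
SCAN_COST = 1    # assumed relative cost of reading one row in a scan

# Hypothetical age distribution: almost everyone is between 20 and 60.
AGES = [20 + (i % 41) for i in range(TOTAL_ROWS - 5)] + [95, 96, 97, 98, 99]

def matching_rows(low, high):
    return sum(1 for age in AGES if low <= age <= high)

def cost(plan, low, high):
    if plan == "index seek":
        return matching_rows(low, high) * LOOKUP_COST
    return TOTAL_ROWS * SCAN_COST  # a scan reads every row regardless

def best_plan(low, high):
    return min(("index seek", "table scan"), key=lambda p: cost(p, low, high))

plan_cache = {}

def execute(low, high):
    """Compile on first use only; later calls reuse the cached plan."""
    key = "SELECT ... WHERE EmployeeAge BETWEEN @p1 AND @p2"
    if key not in plan_cache:
        plan_cache[key] = best_plan(low, high)  # plan built for these values
    plan = plan_cache[key]
    return plan, cost(plan, low, high), best_plan(low, high)

print(execute(95, 99))  # ('index seek', 20, 'index seek')
print(execute(20, 60))  # ('index seek', 39980, 'table scan')
```

The second call returns the cached seek at roughly four times the cost of the scan that a fresh compilation would have picked in this model.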

From a user’s perspective this leads to what appears to be random, non-deterministic behaviour: sometimes the query will be very fast and other times it will be slow. There will not necessarily be a pattern to the behaviour, because the plan might be removed from cache under memory pressure, or invalidated for a wide variety of other reasons, the simplest being a statistics update on the underlying data.

So with the knowledge that parameter sniffing can cause issues, when should you consider investing the time into testing forced parameterization? The time and effort to test could be considerable, so I have written down how I would come to that decision.

The first thing I check is the CPU utilisation on the SQL Server. In theory there should be some kind of issue prompting the change to be considered, and one of the issues that can arise from insufficient parameterization is an increase in CPU utilisation due to a high query compilation rate. There are a lot of other causes of high CPU, such as incorrect indexing leading to table scans, but for the purpose of this explanation we can assume those have already been checked and fixing them was not enough.

The next performance counters I would check are:

SQLServer : SQL Statistics : Auto-Param Attempts/sec
SQLServer : SQL Statistics : Batch Requests/sec

I could also check the plan cache hit ratio, or the SQL Compilations/sec figure, but having the Auto-Param Attempts and Batch Requests figures per second allows you to do a rough calculation of the ratio between the number of queries that auto-parameterization is being attempted on (successfully or not) and the number of incoming batches. The higher the ratio, the more ad-hoc queries are already being affected by the ‘simple’ mode parameterization.
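That rough calculation might look like the sketch below. It assumes you have two readings of the raw cumulative counter values taken some time apart (if you are reading per-second rates straight from Performance Monitor, you can simply divide one rate by the other). The `Snapshot` shape and the numbers are invented for illustration.

```python
from dataclasses import dataclass

@dataclass
class Snapshot:
    """Cumulative counter values at one point in time (hypothetical shape)."""
    auto_param_attempts: int
    batch_requests: int

def auto_param_ratio(first, second):
    """Auto-param attempts per batch request between two snapshots."""
    batches = second.batch_requests - first.batch_requests
    attempts = second.auto_param_attempts - first.auto_param_attempts
    if batches <= 0:
        return None  # no activity, or the counters were reset in between
    return attempts / batches

# Invented readings: 60,000 batches and 45,000 auto-param attempts
# arrived during the sampling interval.
before = Snapshot(auto_param_attempts=1_200_000, batch_requests=2_000_000)
after = Snapshot(auto_param_attempts=1_245_000, batch_requests=2_060_000)
print(auto_param_ratio(before, after))  # 0.75
```

Treat the result as a rough indicator only. A batch can contain several statements, so the figure is not a strict percentage.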

If I see a high ratio, the next check would be on the plan cache statistics, primarily:

SQLServer : Plan Cache : Cache Object Counts for SQL Plans
SQLServer : Plan Cache : Cache Pages for SQL Plans

This is to try to gauge how many plans are sitting in the cache and whether the cache is under memory pressure. From the number of SQL plan objects and cached pages (each page is 8 KB) you can calculate how much memory the ad-hoc plans occupy and judge whether that amounts to memory pressure. You can also look at the Buffer Manager performance counter for the page life expectancy value, but this could be a bit misleading since data pages will also be affecting it.
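As a small worked example of that calculation, the sketch below converts the two counter readings into a total size and an average size per plan. The function name and the sample readings are invented, and what counts as too much depends entirely on how much memory the server has.

```python
PAGE_SIZE_KB = 8  # SQL Server pages are 8 KB

def plan_cache_summary(object_count, cache_pages):
    """Return (total MB, average KB per plan) for the SQL Plans cache."""
    total_kb = cache_pages * PAGE_SIZE_KB
    average_kb = total_kb / object_count if object_count else 0.0
    return total_kb / 1024, average_kb

# Invented counter readings for illustration.
total_mb, average_kb = plan_cache_summary(object_count=40_000, cache_pages=256_000)
print(total_mb, average_kb)  # 2000.0 51.2
```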

At this point I would start pulling the query plan cache and checking the query text for a sample of plans, to see whether a high number of literals remain within the submitted queries, since these are the ones that simple parameterization failed to convert to parameters. Sorting the cached plans in SQL text order makes it easy to spot where near-identical queries, differing only in their unconverted literals, have been submitted and each resulted in a separate query plan.
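Once the query text has been exported from the plan cache, the eyeballing can be partly automated. The sketch below is one possible approach, not a built-in tool: it assumes the texts are already available as a list of strings, masks anything that looks like a literal using a deliberately simplified pattern, and groups the texts that collapse to the same shape. The sample queries are invented.

```python
import re
from collections import defaultdict

# Hypothetical, simplified pattern: quoted strings, or bare numbers that
# are not part of an identifier or an existing @parameter marker.
LITERAL = re.compile(r"'(?:[^']|'')*'|(?<![\w@])\d+(?:\.\d+)?\b")

def near_duplicates(cached_texts):
    """Group cached query texts that differ only in their literal values."""
    groups = defaultdict(list)
    for text in sorted(cached_texts):
        shape = LITERAL.sub("?", text)
        groups[shape].append(text)
    # Keep only the shapes that were cached more than once.
    return {shape: texts for shape, texts in groups.items() if len(texts) > 1}

# Invented sample of ad-hoc text as it might appear in the plan cache.
# The third entry already uses a parameter marker, so it is left alone.
sample = [
    "SELECT * FROM Orders o JOIN Customer c ON o.CustId = c.Id WHERE c.Region = 'North'",
    "SELECT * FROM Orders o JOIN Customer c ON o.CustId = c.Id WHERE c.Region = 'South'",
    "SELECT Name FROM Employee WHERE EmployeeId = @1",
]
for shape, texts in near_duplicates(sample).items():
    print(len(texts), shape)
```

The size of each group gives a rough count of how many plans were compiled for what is effectively one query.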

There is no prescriptive guidance on how many of these near-identical queries it takes before you consider the switch. Clearly a single duplicate is not an issue, whilst thousands of them could be quite an issue, resulting in a high number of unnecessary SQL plan compilations and therefore increased CPU utilisation on the server.

As you can tell, this setting is certainly not one that should be changed without considerable forethought and thorough testing to ensure the situation you have is actually worth the switch. Preferably that means well-instrumented testing in a realistic test environment, to ensure the benefits you gain from the increased level of parameterization are not lost to occurrences of non-optimal plan cache hits.