Query Parameterization and Wildcard Searches
Time flies when you’re busy, and it has been far too long since I last posted. To business however, and I noticed a problem in query parameterization the other day which does not make much sense at first glance. To demonstrate, I will use the AdventureWorks example database and use a couple of simple queries. As with all parameterization issues, you need to make sure that you know which mode the database is within, so I’ll begin by setting it to Simple Mode
ALTER DATABASE AdventureWorks SET PARAMETERIZATION SIMPLE
And then run two queries, separately so that they are not considered a single batch.
Select * from HumanResources.Employee where loginid like '%a%' Select * from HumanResources.Employee where loginid like '%b%'
Under simple parameterization it should not be too surprising to see that when the query cache is inspected, the queries have not been parameterized, and 2 entries exist within the cache. So what happens when the mode is changed to Forced?
ALTER DATABASE AdventureWorks SET PARAMETERIZATION FORCED
Clear down the query cache and try the two queries again, in the hope of a plan cache hit – and it hasn’t changed. Two query plans still show in the cache and there was no parameterization. Perhaps it is the existence of the 2 wildcard characters? no, altering the wild-cards makes no difference, removing them entirely still results in the query plan generating a separate plan cache entry.
Parameterization is not limited to dates and numbers, it will work on strings without any problem, but clearly the introduction of the like clause prevents the cache hit. This behaviour is on both SQL Server 2005 and 2008 – which is a bit annoying.
So how can we get around this problem?
Well bizarrely by just using a different syntax to mean the same thing. PatIndex works just like the like clause and takes a wildcard, but returns the position. In a like clause we are just interested in a match – whilst the pat index gives us a character position. If the pattern is not found it returns zero, so the simple replacement is to using patindex and look for any values greater than zero.
Select * from HumanResources.Employee where patindex('%a%',loginid) > 0 Select * from HumanResources.Employee where patindex('%b%',loginid) > 0
In simple mode this still produces 2 cache hits, but in forced mode you get a plan cache hit finally!
If solving it was only that simple… by using PatIndex the query where clause has become non-sargable – which makes no difference if you have a wild card either side of your expression but if you only had a trailing wildcard then this would produce a very bad performance hit. The cost of the extra query plan in memory is unlikely to be more than the use of scans to resolve the query, so faced with a few additional query plans in memory using wildcards, you might be best to leave them there.
Excellent…
will have to give this a try and see the results. We seem to have had an increase in ‘%?%’ like queries just lately!
thanks andrew