Archive
Simple vs Forced – Query Parameterization
On the drive back from a relaxing week off I decided to write about Query Parameterization – the process where the query optimizer takes ad-hoc queries, picks out literal values used within the query predicates, and turns them into parameters on the fly.
This parameterized query is then the one used to check the plan cache, rather than the original query string, so it can result in a far higher percentage of plan cache hits than would otherwise occur with ad-hoc queries.
There are two modes to this feature, and neither of them is ‘off’: ‘simple’ parameterization is on automatically, and you can raise it to ‘forced’ mode for your database if you wish. The difference between the two is that simple mode will only deal with relatively simple literal values within the predicates, and will not consider every literal a candidate for parameterization.
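For reference, the mode is a database level setting; a quick sketch of checking and changing it (using a made-up database name) would be:

-- check the current mode (0 = simple, 1 = forced)
select name, is_parameterization_forced
from sys.databases
where name = 'MyDatabase'

-- switch to forced parameterization, and back again
alter database MyDatabase set parameterization forced
alter database MyDatabase set parameterization simple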
So, overall it sounds like a good thing, but there are some known problems with parameterization, or parameter sniffing as it is often called – the query plan generated for one value of the parameter is not necessarily the optimal plan for another. Consider a search on a personnel database where the EmployeeAge is between 95 and 99 – assuming an index is in place, you can already expect the matching data quantities to be low to nil, so the index will not tip and an index seek is the expected behaviour.
If the next time the query is executed the range is 20 to 60, the number of people matching would be very high, yet the query plan in cache has already fixed the method by which the query will be executed, and it is no longer optimal.
From a user’s perspective this leads to what appears to be random, non-deterministic behaviour – sometimes the query will be very fast and other times it will be slow. There will not necessarily be a pattern to it either, because the plan might be removed from cache under memory pressure, or invalidated for a wide variety of other reasons, the simplest being a statistics-based invalidation.
So, with the knowledge that parameter sniffing can cause issues, when should you consider investing the time into testing this setting? The time and effort to test could be considerable, so I have written up how I would come to that decision.
The first thing I check is the CPU utilisation on the SQL Server; in theory there should be some kind of issue forcing this to be considered, and one of the issues that can arise from insufficient parameterization is increased CPU utilisation due to a high query compilation rate. There are plenty of other causes of high CPU, such as incorrect indexing leading to table scans, but for the purpose of this explanation we can assume that has already been checked and ruled out.
The second performance counters I would check are:
SQLServer : SQL Statistics : Auto-Param Attempts/sec
SQLServer : SQL Statistics : Batch Requests/sec
I could also check the plan cache hit ratio, or the SQL Compilations/sec figure, but having the Auto-Param Attempts and Batch Requests figures per second allows a rough calculation of the ratio of incoming batches to the number of queries on which auto-parameterization is being attempted (successfully or not). The higher the ratio, the more ad-hoc queries are already being affected by ‘simple’ mode parameterization.
If I see a high ratio, the next check would be on the plan cache statistics, primarily:
SQL Server : Plan Cache : Cache Object Counts for SQL Plans
SQL Server : Plan Cache : Cache Pages for SQL Plans
This is to try to gauge how many plans are sitting in the cache and whether the cache is under memory pressure. If you have a high number of SQL plan objects / cached pages then you can calculate whether you are under memory pressure. You can also look at the buffer manager counter for the page life expectancy value, but this can be a bit misleading since data pages will also be affecting it.
At this point I would start pulling the query plan cache and checking the query text for a sample of plans, to see whether a high number of literals remain within the submitted queries, since these are the ones that simple parameterisation failed to convert to parameters. When pulling the plan cache, sorting the results by SQL text makes it easy to spot near-identical queries that differ only in their unconverted literals yet have resulted in different query plans.
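The sort of query I use for that is along these lines – a rough sketch against the standard plan cache DMVs, sorted by statement text so that near-identical entries sit next to each other:

-- list ad-hoc plans with their SQL text; near-duplicates differing only
-- in literal values will appear next to each other in the output
select st.text, cp.usecounts, cp.size_in_bytes
from sys.dm_exec_cached_plans cp
cross apply sys.dm_exec_sql_text(cp.plan_handle) st
where cp.objtype = 'Adhoc'
order by st.text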
There is no prescriptive guidance on how many of these near-identical queries it takes before you consider the change; clearly a single duplicate is not an issue, whilst thousands of them could be quite an issue, resulting in a high number of unnecessary SQL plan compilations that push up the CPU utilisation of the server.
As you can tell, this setting is certainly not one that should be changed without considerable forethought and thorough testing to ensure the situation you have is actually worth the switch – preferably with well-instrumented testing in a realistic test environment, to ensure the benefits you gain from the increased level of parameterisation are not being lost to occurrences of non-optimal plan cache hits.
The Sequence of an Index Uniquifier
During a training session today I was asked about the structure of the uniquifier, and whether it was a straight identity column. Off the top of my head I couldn’t remember the exact structure; I considered it a 4 byte int, but was not sure whether it acted as a pure identity value or incremented in a more complex manner, so decided to investigate it tonight.
To start from the beginning, an index uniquifier is the term given to the field that is automatically generated by SQL Server when you create a clustered index but the index key is not specified as unique. Since each record in the table has to be uniquely identifiable, SQL Server automatically assigns a 4 byte field to the row to make it unique, commonly called the ‘uniquifier’. At this point I am sure English scholars will be frowning, pondering on the nature of the word and whether it qualifies as English; however, that is the term used, so we will run with it.
It is actually quite easy to see this field in action, let’s create a simple table:
CREATE TABLE dbo.unique_test
(
    firstname char(20) NOT NULL,
    surname char(20) NOT NULL
) ON [PRIMARY]
GO
CREATE CLUSTERED INDEX [ix_test] ON [dbo].[unique_test]
(
    [firstname] ASC
) ON [PRIMARY]
The clustered index is not unique, by design, so let’s start adding duplicate rows to see the effect:
insert into unique_test values ('John', 'Smith')
go 10
The table now contains 10 rows, each with the same details. This does not cause any undue concern, because each row is actually still unique – the way to show this is using the DBCC IND and DBCC PAGE commands. I’ve cut the output down since it is so wide.
dbcc ind ('testdb','unique_test',1)

PageFID PagePID     IAMFID IAMPID      PageType
------- ----------- ------ ----------- --------
1       41          NULL   NULL        10
1       174         1      41          1
The output shows a data page numbered 174 for my example and the IAM page with an ID of 41. We can crack open the page and view the contents very easily using DBCC Page.
dbcc traceon(3604)
dbcc page ('testdb',1,174,3)
The output is quite large, but in essence, the first record is stored with the following details:
UNIQUIFIER = [NULL]

Slot 0 Column 1 Offset 0x4 Length 20
firstname = John

Slot 0 Column 2 Offset 0x18 Length 20
The second record:
Slot 1 Column 0 Offset 0x33 Length 4
UNIQUIFIER = 1

Slot 1 Column 1 Offset 0x4 Length 20
firstname = John

Slot 1 Column 2 Offset 0x18 Length 20
surname = Smith
The third record:
Slot 2 Column 0 Offset 0x33 Length 4
UNIQUIFIER = 2

Slot 2 Column 1 Offset 0x4 Length 20
firstname = John

Slot 2 Column 2 Offset 0x18 Length 20
surname = Smith
And so forth. The first record’s uniquifier is visible and clearly named within the data page, but set to null. The second copy of the same value receives a uniquifier of 1, the third copy receives a 2, and so on. This count is maintained separately for each duplicated key value, so inserting a different name multiple times will also receive its own counter, beginning at null and working upwards 1, 2, 3 etc. So just because the uniquifier is 4 bytes, this does not limit the total number of rows in the table to ~2.1 billion, but it does logically limit the total number of duplicates per key to 2.1 billion. I must confess to not having tested that limit; generating 2.1 billion rows of duplicate data is not trivial, and a scrapbook calculation predicts 435 hours of processing on a virtual PC. I suspect the error message it raises when it hits the limit would be interesting.
If we remove all the rows from the table and then add 10 more, does the uniquifier reset? Easy to test, but the short answer was no – the uniquifier continued to rise, 10 through 19.
I was a bit suspicious of this, since any requirement for the uniquifier to remember what existed before requires it to be stored somewhere – it has to survive a crash after all – but there is no apparent place the current count is stored. If there was, you wouldn’t be storing just one value; you would be forced to store a value for each record key that had duplicates. This could run into thousands of separate counters being maintained per clustered key, so it just doesn’t make sense for it to be stored – it would be a very noticeable overhead.
When checking DBCC IND for the empty table, it insisted the table still had a data page, but the only contents of that page was a single ghost record – a row that has been marked as deleted. The ghost record was for the ‘John Smith’ with the highest uniquifier from before; was this coincidence? The other ghost records had not hung around, so why did this one persist?
I dropped and recreated the table, inserted 10 rows and then deleted them. Checking DBCC IND, the table still showed a HoBT IAM allocation page and a data page, and the data page contained a single ghost record – the one with a uniquifier of 9, the highest given out when the 10 duplicates were added. Even after waiting a considerable time the ghost record was not cleaned up, so it appears it will not be deleted.
If I added another duplicate row, it picked up the next number in the sequence (10), and shortly afterwards the ghost record was removed from the page. Very convenient, and not a coincidence at all – the memory of the last uniquifier given out persists as a ghost record, even if all the duplicates for the table have been removed. What seems strange is this ghost record hanging about, persisting an entire record just to keep the duplicate count for that key, when no instances of it remain in the table.
It cannot possibly do this for every key, since the space overhead would become very noticeable again, so how does it choose what to persist – the last entry? Unfortunately it doesn’t appear to be that simple; after a number of tests it appeared to only be interested in keeping the ghost entry for the record that had the highest key value – alphabetically, the one closest to ‘Z’ in my example.
Conclusion? On the evidence, whilst the other ghost records persist for a short time, deleting and then adding more duplicates can see the numbering continue from where it left off; but given a short time for the ghost records to be removed, the uniquifier will restart the sequence back at null, 1, 2 etc. Except in the case of the highest entry from the index perspective – that ghost record sticks around until there is another entry using the same key, continuing the sequence, at which point the ghost record finally disappears.
I cannot think of any sensible reason why it would do this – can you?
Overall, the uniquifier is a cost overhead of not having a unique index, and at a cost of 4 bytes an int identity column makes a lot of sense – for all purposes it acts the same and serves the same purpose, but in a far more visible manner. So it really does not make much sense to rely on the uniquifier provided for you; take control and create your own.
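As a minimal sketch of that, re-using the earlier example table:

-- an explicit identity column as the unique clustered key,
-- instead of relying on the hidden 4 byte uniquifier
CREATE TABLE dbo.unique_test2
(
    id        int identity(1,1) NOT NULL,
    firstname char(20) NOT NULL,
    surname   char(20) NOT NULL
) ON [PRIMARY]
GO
CREATE UNIQUE CLUSTERED INDEX [ix_test2] ON [dbo].[unique_test2]
(
    [id] ASC
) ON [PRIMARY]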
Guidance on How to Layout a Partitioned Table across Filegroups
SQL Server’s table partitioning was one of my favourite features of the 2005 release. It really brought SQL Server into the mainstream when it came to holding very large data quantities, and allowed us to talk with confidence about large tables containing ‘billions’ of rows and not be limited to the ‘millions’. From extensive use, and a couple of SQL labs specifically around the partitioning feature, there are some rules and best practices I use to try to maximise the benefit and flexibility of a partitioned table, without causing unnecessary drops in performance or increases in the disk space requirements.
Many examples of table partitioning focus heavily on date-range based partition functions and schemes, with a common layout mechanism of striping the weeks / months across a number of filegroups. The filegroups are re-used, and in the example picture you can see 2 years’ worth of data striped across 12 filegroups.
This is pretty common and has an alluring charm of simplicity, but it is going to hurt when you start rolling the window. The underlying problem is that there is no gap.
For a good number of systems it would be unacceptable to remove a month of data unless you had already inserted the new month successfully. So the system starts to roll a new month of data in, is required to use the same filegroups, and the layout transforms into the following:
The filegroup has to expand by 50% to accommodate the new data before the old data can be removed – and once the old data is removed the filegroups look like:
So 50% of the space is now wasted unless you use a shrink, which is probably the worst thing you could do to your filegroup and data files at that point in time. Shrink can fragment the data to the extreme and is to be avoided at all costs. Which means you will have to maintain a 50% space penalty on every filegroup for this layout. That might not sound a lot, but on a large database in an enterprise with mirrored SANs, that additional 50% is going to cost a substantial amount.
There are also other issues: SQL Server allows you to back up at a filegroup level, and since the bulk of the data is historic and will not alter, you are forced to re-back-up historic data (Jan 08) whenever you back up the recently inserted Jan 09 data. So there is an impact on backup space, backup times and restore times.
The simplicity of the initial layout makes it seem like a good idea, but the side-effects are not pleasant. You could alter the layout and choose to have 6 filegroups, each storing 4 months of data; then the expansion is only from 4 months to 5, a 25% overhead. That is better, but still a cost overhead. The ultimate extreme is to place it all in one filegroup, but there are a number of difficulties and contention points with that.
A better approach is to use a different file group per month, but then also create an additional spare filegroup, so that no filegroup is forced to expand, as shown:
The difference here is that we have one free filegroup that is not part of the current partition scheme / function definition, but will be allocated as the ‘next used’ filegroup for the partition scheme, so that when we split the partition function that filegroup is brought into the scheme and used. The old data that is removed will free up an empty filegroup that will be used for the next month’s import. In essence the head of the partition function is chasing the tail and will never catch up.
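A cut-down sketch of that layout – filegroup and object names are made up, the filegroups are assumed to already exist, and only a few monthly boundaries are shown for brevity:

-- monthly partition function and a scheme placing each month on its own filegroup
create partition function pf_MonthRange (datetime)
as range right for values ('2009-01-01', '2009-02-01', '2009-03-01')

create partition scheme ps_MonthRange
as partition pf_MonthRange to (FG1, FG2, FG3, FG4)

-- nominate the spare, empty filegroup as the next one to be used
alter partition scheme ps_MonthRange next used FG5

-- splitting the function brings FG5 into the scheme for the new month
alter partition function pf_MonthRange() split range ('2009-04-01')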
The expansion of size for this is 1/n where n is the number of partitions to be stored, so for our 24 month example 1/24th expansion – a considerable saving. Even better is that you can choose to utilise file group backups for the older static months of data.
This though is not perfect for a couple of reasons, primarily data loading and indexing. To make the ‘scene’ better there should be a further filegroup dedicated to the data staging / loading that is going to occur to bring the data into the partitioned table.
The purpose of this is twofold:
- The disks used for loading are separate from the disks used for the actual data, so that we can maintain quality of service on the main table.
- The data wants to be loaded as a heap for speed, but is likely to be sorted prior to insertion; an in-place sort would expand one of the main filegroups, which is to be avoided.
By loading into a staging filegroup, you give yourself the opportunity to use a clustered index creation to sort the data and move it to the appropriate filegroup, prior to it being constrained and switched in to the main partitioned table. If you had loaded the data into FG25 and then tried to apply a clustered index, it would have doubled the size of that filegroup again, since it needs to write all the new rows out and commit them before the heap can be deleted. That would put you back at square one, wasting 50% of the disk space.
The staging filegroup does cost us another file group worth of space, so the 24 initial filegroups has grown to 26, which is still a smaller expansion than the potential 50%.
So some simple guidelines are (a rough end-to-end sketch follows the list):
- For a partition function that is going to have N partitions, create N+1 Filegroups.
- ETL / Stage your data into a dedicated staging file group
- Move your data using a clustered index creation to the ‘spare’ filegroup you always have.
- Switch new data in, then switch old out creating the new ‘spare’
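Putting those guidelines together, a single month’s roll might look something like the following sketch – the table, filegroup and boundary names are made up, the staging table is assumed to match the partitioned table’s structure with the appropriate check constraint, and the partition numbers will of course depend on your own function:

-- 1. bulk load the new month as a heap into a table on the staging filegroup

-- 2. sort the data and move it onto the spare filegroup in one pass
create clustered index ix_stage on dbo.tblFact_Staging (TransactionDate) on FG_Spare

-- 3. bring the spare filegroup into the scheme and split in the new boundary
alter partition scheme ps_MonthRange next used FG_Spare
alter partition function pf_MonthRange() split range ('2009-04-01')

-- 4. switch the new month in, and the oldest month out to an archive table
alter table dbo.tblFact_Staging switch to dbo.tblFact partition 24
alter table dbo.tblFact switch partition 1 to dbo.tblFact_Archive

-- 5. merge away the now-empty boundary; its filegroup becomes the new spare
alter partition function pf_MonthRange() merge range ('2007-04-01')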
It is not entirely flawless however – there is a problem in using such a design if the quantity of data per partition is going to vary by a large amount; you then have to provision each filegroup with enough space to cope with the fluctuations, which in itself can result in wasted space and starts chipping away at the gains made.
It works best as a technique on datasets that remain relatively consistent in terms of the number of rows per partition and as the number of partitions goes up, the savings increase.
What is the SQL Server 2008 DateTimeOffset Internal Structure?
After decoding the DateTime2 internal structure I thought I would take a quick look at the DateTimeOffset structure, since it should not present too many difficulties, and post it up quickly – but I was surprised at the initial result of a test. It follows the same basic premise: the time portion is followed by the date portion, the date is a day count from the epoch of 0001/01/01, and the time is the number of intervals since midnight, where the interval is defined by the accuracy.
I was expecting the offset value itself to occupy the additional 2 bytes quoted, and not affect the other values. Bad assumption – as soon as I cracked open a few examples I could immediately see that setting the offset also alters the underlying time / date component values.
Using the same comparison methods as before the underlying time value is clearly being adjusted:
'0001/01/01 00:00:00 -0:00'  => 0x0700000000000000000000
'0001/01/01 00:00:00 -12:00' => 0x0700E034956400000030FD
So, even though an offset is being stored, the underlying time is also being altered to match, and the internal storage is using UTC as the reference point. This makes sense as the most valid reference that you could use.
'0001-01-01 12:00:00.0000000 +00:00' => 0x0700E03495640000000000
'0001-01-01 00:00:00.0000000 -12:00' => 0x0700E034956400000030FD
The same time / date value is generated for the two inputs, but the last two bytes hold the offset that was used when the value was stored. The underlying storage of the time is clearly identical in both, so UTC is the common form in which each is stored.
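As a quick sanity check, reverse the five time bytes (00 E0 34 95 64 becomes 0x649534E000) and treat them as a number:

select cast(0x649534E000 as bigint)
-- 432000000000 intervals of 100ns = 43,200 seconds = 12 hours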
The final 2 bytes for the offset are pretty easy to decode, since the pattern is the same as before with a slight twist. They record the number of minutes of offset in hex, with the first byte of the two being the least significant, as before with the time – so you read the two bytes in reverse order to get the value.
The twist is that for positive offsets the value increments 1, 2, 3 in hex as appropriate, but for negative values it starts decrementing, with -1 represented as ‘FFFF’. I’ve split the hex output into its individual components by adding some spaces to make it easier to read (accuracy code, time value, date value, offset used):
'2001-01-01 12:00:00.0000000 +00:01' => 0x07 009A717164 75250B 0100
'2001-01-01 12:00:00.0000000 +00:00' => 0x07 00E0349564 75250B 0000
'2001-01-01 12:00:00.0000000 -00:01' => 0x07 0026F8B864 75250B FFFF
Since the offsets supported run only from -14 hours to +14 hours, there is no risk of the two ranges overlapping. Thinking about it a bit more, it is simply acting as a signed number, -1 being all bits set, so the 2 bytes at the end are a signed integer holding the number of minutes of offset.
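The signed interpretation is easy to confirm by reversing the stored bytes (least significant first on disk) and casting them as a signed 16 bit value:

select cast(0x0001 as smallint) as plus_one_minute,  -- stored on disk as 01 00
       cast(0xFFFF as smallint) as minus_one_minute  -- stored on disk as FF FF
-- returns 1 and -1 respectively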
There are a number of time zones in the world that are not at exact hourly intervals from UTC; some are on the half hour, such as Caracas (-4:30) or Delhi (+5:30), whilst Kathmandu (+5:45) is even more specific. In theory the format allows offsets to be specified at an even finer level of granularity, although I am not sure why you would wish to use it. Do you really want an offset of +3:17? It is slightly scary to consider that as a valid input.
That has made me wonder why the accuracy was set so high, when in reality 15 minute intervals would have been sufficient; with a -14 to +14 range, that is 113 different values inclusive, which could be accommodated within a single byte.
So why spend 2 bytes on the format, when 1 was enough? Was it to just be compatible to the ISO format in some way that required it? Not sure.
What is the SQL Server 2008 DateTime2 Internal Structure?
SQL Server 2008 has a number of new date / time formats, but the one I am most interested in is DateTime2. The internal format of the classic SQL DateTime is commonly mistaken as 2×4 byte integers, with the latter integer being milliseconds since midnight. It is in fact the number of 1/300ths of a second since midnight, which is why the accuracy of DateTime within SQL Server has historically been 3.33ms. (If you really want to see it, crack it open by converting it to binary, adding 1 to the time portion and re-converting – you add 3.33ms, not 1ms.)
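For anyone who wants to try that, a small sketch of the trick – split the classic datetime into its two halves, add 1 to the time half and reassemble:

declare @dt datetime
set @dt = '2000-01-01 00:00:00'

declare @bin binary(8)
set @bin = cast(@dt as binary(8))

-- first 4 bytes are days since 1900-01-01, last 4 are 1/300ths of a second since midnight
declare @days int, @ticks int
set @days  = cast(substring(@bin, 1, 4) as int)
set @ticks = cast(substring(@bin, 5, 4) as int) + 1

-- reassembling shows the time move on by 3.33ms, not 1ms
select cast(cast(@days as binary(4)) + cast(@ticks as binary(4)) as datetime)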
So DateTime2 must use a different format, and as a weekend exercise with no purpose other than understanding the internals, I thought I’d take a look. I have not seen the information in BoL or posted elsewhere as yet, so it might be of use. I am starting with DateTime2(7), the maximum accuracy structure. The code used to crack it open each time is basically as follows:
declare @dt datetime2(7)
set @dt = '2000/01/01 00:00:00'

declare @bin varbinary(max)
set @bin = CONVERT(varbinary(max), @dt)

select @bin
To make my life easier, SQL conveniently outputs the values as hexadecimal. The results are not what you would expect.
0x07000000000007240B
The date, which traditionally occupied the first 4 bytes, is clearly occupying the last few bytes, so the format is not going to be obvious or simple. Interestingly, the returned result is 9 bytes, yet the quoted length is 8, and checking the length also returns 8 – so that first byte is somewhat odd to make an appearance. It is also, suspiciously, the accuracy value, and a few tests with different accuracies show that it changes to match. So the first pseudo-byte is the accuracy indicator.
To start figuring out more, let’s take the time back to its beginning point, which in this case is not 1900/01/01 but 0001/01/01, and which when converted gives us:
'0001/01/01 00:00:00' => 0x070000000000000000
Start incrementing the day portion and there is an obvious pattern, the 6th byte changes.
'0001/01/02 00:00:00' => 0x070000000000010000
'0001/01/03 00:00:00' => 0x070000000000020000
'0001/01/31 00:00:00' => 0x0700000000001E0000
As you try the 2nd month, to check where the month is, the same byte alters, so it represents days, not specific date parts. Is it the number of days since the beginning of the year? No.
'0001/02/01 00:00:00' => 0x0700000000001F0000
If it were, there would be an issue, since 1 byte does not represent enough values; as we can see, FF occurs on the 13th of September, and then it rolls over and puts a 1 in the 7th byte position.
'0001/09/13 00:00:00' => 0x070000000000FF0000
'0001/09/14 00:00:00' => 0x070000000000000100
'0001/09/15 00:00:00' => 0x070000000000010100
It rolls over, then carries on as before. This immediately suggests the next test, to roll over the year, and the pattern continues.
'0001/12/31 00:00:00' => 0x0700000000006C0100
'0002/01/01 00:00:00' => 0x0700000000006D0100
So the format is just counting; we see it in the example as hex, but it is a straight number count with the bytes ordered least-significant first, left-to-right. Only 2 bytes are used so far, which do not represent enough day combinations, so add the third byte in by going past 180 years:
'0180/06/06 00:00:00' => 0x070000000000FFFF00
'0180/06/07 00:00:00' => 0x070000000000000001
So the final byte is then increased, and the number of day combinations becomes 16,777,215 – that seems a lot better and is certainly going to cover the range required.
'2001/01/01 00:00:00' => 0x07000000000075250B
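Those last 3 bytes (0x75, 0x25, 0x0B read least-significant-first, i.e. 0x0B2575) check out as the expected day count:

select (11 * 256 * 256) + (37 * 256) + 117

-----------
730485

730,485 is exactly the number of days between 0001/01/01 and 2001/01/01 (2,000 years of 365 days plus 485 leap days).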
So that is the final 3 bytes decoded – a simple pattern – and it provides the template for how the time is also stored.
'0001/01/01 00:00:00.0000000' => 0x070000000000000000
'0001/01/01 00:00:00.0000001' => 0x070100000000000000
'0001/01/01 00:00:00.0000255' => 0x07FF00000000000000
'0001/01/01 00:00:00.0065535' => 0x07FFFF000000000000
'0001/01/01 00:00:00.0065536' => 0x070000010000000000
So, to check whether the format is the same:
'0001/01/01 00:00:00.9999999' => 0x077F96980000000000
Decode that again and it all matches:
select (152 * 256 * 256) + (150 * 256) + 127

-----------
9999999
When we tick over into 1 second exactly, the first byte simply increments by 1 – the time portion is still represented in 100ns intervals, with the normal system of each byte counting up 1 every time the previous byte rolls over. As we reach the limit of the first 3 bytes, it rolls into the 4th and then the 5th.
'0001/01/01 00:00:01.0000000' => 0x078096980000000000
So the internal format of DateTime2(7) is decoded. Not difficult, but it is an interesting choice – it is now a straight binary number, with the least significant byte on the left and the most significant on the right (for each section). Within a byte, however, you still read the hex value in the normal way.
The first 5 bytes are recording how many time units intervals have passed since midnight, and the last 3 bytes recording how many days have passed since 0001/01/01.
The time unit interval is dictated by the accuracy: 100ns for DateTime2(7), 1 microsecond for DateTime2(6), and so on. The way in which you interpret the structure does not change, but the unit you multiply the time portion by alters based on the accuracy.
You could construct 2 dates that are identical at a binary level, but due to the field meta-data on accuracy, they do not represent the same date time.
declare @dt1 datetime2(6)
set @dt1 = '0001/01/01 00:00:00.000001'
declare @dt2 datetime2(7)
set @dt2 = '0001/01/01 00:00:00.0000001'

select CONVERT(varbinary(max), @dt1), CONVERT(varbinary(max), @dt2)

0x060100000000000000
0x070100000000000000
And that is perhaps why on output they automatically have prefixed the binary value with the datetime accuracy, so that they are not entirely identical? I’m not sure but would be interested to find out.
Finding Next Identity Value, and a Wild Goose Chase.
A question asked on Stack Overflow was how to find the next identity value that will occur on a table, without having to add a record to work it out. The problem is that if the highest row is deleted, the number is not reused, so any answer using only the existing rows can be incorrect.
Logically the value must be stored somewhere, so the first place I checked was the DMVs. They store the currently used value, but do not directly show the next value.
The wild goose chase started there…
- I used the dedicated admin console to pull all the system tables, expecting it to be in sys.syshobtcolumns, no joy. Dumped the whole system table contents before and after an insert looking for the difference and didn’t spot it.
- Took a dump of every page in the file before and after inserting a new row and compared the textual dumps in a text comparison application – still no joy.
- Started dumping out log records using the following script and pulled the identity calls:
select * from ::fn_dblog(null, null) where operation = 'LOP_IDENT_NEWVAL'
- After a couple of hours running around the pages trying to find it, I realised I should have stuck with the DMV – I really went the wrong way around it.
The DMV has the right answer it seems, but as two fields that you have to combine to get it.
create table foo (MyID int identity not null, MyField char(10))

-- Insert 10 rows
insert into foo values ('test')
go 10

select Convert(varchar(8),increment_value) as IncrementValue,
       Convert(varchar(8),last_value) as LastValue
from sys.identity_columns where name = 'myid'

-- insert another row
insert into foo values ('test')

-- check the values again
select Convert(varchar(8),increment_value) as IncrementValue,
       Convert(varchar(8),last_value) as LastValue
from sys.identity_columns where name = 'myid'

-- delete the rows
delete from foo

-- check the DMV again
select Convert(varchar(8),increment_value) as IncrementValue,
       Convert(varchar(8),last_value) as LastValue
from sys.identity_columns where name = 'myid'

-- value is currently 11 and increment is 1, so the next insert gets 12
insert into foo values ('test')
select * from foo

Result:

MyID        MyField
----------- ----------
12          test

(1 row(s) affected)
So adding the increment to the last value will predict the next value correctly – assuming someone else does not grab it in the meantime, which is why it is not a good idea to use this in code. But if you need to investigate a table and want to know what it thinks is next, without actually inserting a row and affecting the table, then it is useful.
All that and the easy way is then:
select ident_current('foo') + ident_incr('foo')
Ah well, it was fun investigating – but what a wild goose chase to find there was an easy answer.
Do Filtered Indexes Have Different Tipping Points?
I’ve been taking a look at filtered indexes and they do initially look particularly useful, but I was wondering what effect a filtered index has on the tipping point. The tipping point was described by Kimberly L. Tripp in an excellent post – it is the point at which SQL Server opts to change the query plan strategy from an NC index seek to a scan of some kind (table / clustered index).
So I was wondering whether filtered indexes choose a different strategy, or whether they ‘tip’ from a targeted index seek to a table scan in the same way.
First off was to create a dummy table:
create table tblPersonnel
(
    PersonnelID Int Identity,
    FirstName Char(30),
    LastName Char(30),
    Department Char(30),
    SomePadding Char(250)
)
This was deliberately created as a heap, since data had to be generated and loaded:
insert into tblPersonnel(firstname,lastname,Department)
values (dbo.randomstring(20), dbo.randomstring(20), dbo.randomstring(20));
go 1000000
The dbo.randomstring function is a user-created function I added to generate a random string of [A-Z] characters of the length given by the parameter, so the insert statement is inserting random strings of length 20 into the first name, last name and department fields.
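For completeness, a rough sketch of what such a function can look like – NEWID() cannot be called directly inside a user function, hence the helper view, and both object names here are made up:

create view dbo.vw_RandomGuid
as
select newid() as guid_value
GO
create function dbo.randomstring (@length int)
returns varchar(255)
as
begin
    declare @result varchar(255)
    set @result = ''
    while len(@result) < @length
    begin
        -- append one random upper-case letter [A-Z] per iteration
        select @result = @result + char(65 + abs(checksum(guid_value) % 26))
        from dbo.vw_RandomGuid
    end
    return @result
end
GO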
Once the data was loaded, a clustered index was applied.
alter table dbo.tblPersonnel
add constraint PK_tblPersonnel primary key clustered (PersonnelID)
with (statistics_norecompute = off, ignore_dup_key = off,
      allow_row_locks = on, allow_page_locks = on) on [PRIMARY]
GO
A quick check of the table:
select COUNT(*) from tblpersonnel

-----------
1000000

(1 row(s) affected)
To get a rough idea of the tipping point we need to know the number of pages at the leaf level of the table.
select * from sys.objects where name = 'tblpersonnel'
-- grab the object id and query the physical stats
select * from sys.dm_db_index_physical_stats(DB_ID('FilteredIndexTest'), 1117247035, 1, 0, 'DETAILED')
-- 45455 pages shown in the table at leaf level 0, i.e. the clustered index
So create an NC index on the first name, last name.
create nonclustered index [IX_Normal] ON [dbo].[tblPersonnel]
(
    [FirstName] asc,
    [LastName] asc
)
with (pad_index = off, statistics_norecompute = off, ignore_dup_key = off,
      allow_row_locks = on, allow_page_locks = on) on [PRIMARY]
GO
For reference, we will also check how many pages the NC index has:
select * from sys.dm_db_index_physical_stats(DB_ID('FilteredIndexTest'), 1117247035, 2, 0, 'DETAILED')
-- Level 0 of the index shows 8696 pages
The tipping point is expected somewhere between 25% and 33% of the number of pages, expressed as rows, so somewhere between 11,363 and 15,000 rows is where to expect it. Since the data was random when inserted, a bit of human binary chop is needed to find the tipping point. After some experimentation, the dividing point was between firstname like ‘a[a-h]%’ and ‘a[a-i]%’.
select * from tblpersonnel where firstname like 'a[a-h]%'
-- 11898 rows - seeks

select * from tblpersonnel where firstname like 'a[a-i]%'
-- 13299 rows - scans
So the tipping point showed up within the expected range. Next I created the filtered index based on FirstName >= 'a' and FirstName <= 'ak', since a LIKE clause cannot be used in the filtered index WHERE clause.
create nonclustered index [IX_Filter] ON [dbo].[tblPersonnel]
(
    [FirstName] asc,
    [LastName] asc
)
where ([FirstName] >= 'a' and [FirstName] <= 'ak')
with (pad_index = off, statistics_norecompute = off, ignore_dup_key = off,
      allow_row_locks = on, allow_page_locks = on) on [PRIMARY]
GO
With the filtered index in play, I disabled the non-filtered index and re-ran the same queries.
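For reference, disabling the normal index is a single statement:

alter index [IX_Normal] on [dbo].[tblPersonnel] disable
GO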
select * from tblpersonnel where firstname like 'a[a-h]%'
-- 11898 rows - seeks

select * from tblpersonnel where firstname like 'a[a-i]%'
-- 13299 rows - scans
Conclusion: The tipping point did not alter based on the filtered index – it still tips based on the ratio of rows to total pages in the leaf level of the clustered index.