07 | November | 2009 | SQL Fascination

Can a Covering NC Index be Tipped?

November 7, 2009 Andrew Hogg Leave a comment

Non-clustered indexes normally have a ‘tipping point’, which is the point at which the query engine decides to change strategies from seeking the index with a nested loop operator back to a seek on the underlying table or choosing to just scan the underlying table and ignore the index. Kimberley Tripp wrote a great article about ‘The Tipping Point‘ , and the guidance is at about the 25-33% the query engine will change strategies.

If the non-clustered index is a covering index (it contains all the fields within the query) the query engine does not take the same decision – it makes sense that if any change in strategy occurs, it would have to be at a far higher figure, and as we are about to see, it will not take that decision and tip.

To test what strategy the engine would use I created a test situation of 2 separate tables, with different page counts, due to the padding column forcing the second table to use far more pages (5953 pages vs 9233)

CREATE TABLE [dbo].[tblIxTest1]( [PersonnelID] [int] IDENTITY(1,1) NOT NULL, [FirstName] [char](30) NULL, [LastName] [char](30) NULL,
   [Department] [char](30) NULL, [SomePadding] [char](10) NULL
) ON [PRIMARY]

And,

CREATE TABLE [dbo].[tblIxTest2]( [PersonnelID] [int] IDENTITY(1,1) NOT NULL, [FirstName] [char](30) NULL, [LastName] [char](30) NULL,
   [Department] [char](30) NULL, [SomePadding] [char](1000) NULL
) ON [PRIMARY]

Next step was to insert some data, I needed random data to be able to ensure the index was not unbalanced in some way, so I broke out my useful little random string generation function. I should mention how to create this, a SQL function will not directly support the inclusion of a Rand() call within them, any attempt to do this results in the error:

Msg 443, Level 16, State 1, Procedure test, Line 13
Invalid use of a side-effecting operator 'rand' within a function.

However, there is nothing stopping a view from using this, and the function from using the view to get around the limitation:

Create View [dbo].[RandomHelper] as Select Rand() as r

And then the function can be generated to use this, it is not necessarily the most efficient random string generation function, but it works nicely.

CREATE FUNCTION [dbo].[RandomString] (@Length int) RETURNS varchar(100)
WITH EXECUTE AS CALLER
AS
BEGIN
  DECLARE @Result Varchar(100)
  SET @Result = ''
  DECLARE @Counter int
  SET @Counter = 0
  WHILE @Counter <= @Length
  BEGIN
     SET @Result = @Result + Char(Ceiling((select R from randomhelper) * 26) + 64)       
     SET @Counter = @Counter + 1   END
  RETURN(@Result)
END

This now allows me to generate random data and insert it into the tables to get a nice data distribution, and this was run for both of the tables.

insert into tblIxTest1 values (dbo.RandomString(20),dbo.RandomString(20),dbo.RandomString(20),'')
go 1000000

Two NC indexes are now needed, one for each table and both are identical and cover just the FirstName and PersonnelID fields within the table.

CREATE NONCLUSTERED INDEX [IX_Test1] ON [dbo].[tblIxTest1] ( [FirstName] ASC, [PersonnelID] ASC
)WITH (STATISTICS_NORECOMPUTE  = OFF, SORT_IN_TEMPDB = OFF, IGNORE_DUP_KEY = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS  = ON, ALLOW_PAGE_LOCKS  = ON) ON [PRIMARY]
GO
CREATE NONCLUSTERED INDEX [IX_Test2] ON [dbo].[tblIxTest2] ( [FirstName] ASC, [PersonnelID] ASC
)WITH (STATISTICS_NORECOMPUTE  = OFF, SORT_IN_TEMPDB = OFF, IGNORE_DUP_KEY = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS  = ON, ALLOW_PAGE_LOCKS  = ON) ON [PRIMARY]
GO

The setup is complete and it is pretty easy to now show the NC covering index is not going to tip, the most extreme where clause is where I am allowing every record to be returned:

select personnelid , firstname from tblixtest1 where firstname >= 'a' and firstname <= 'zzzzzzzzzzzzzzzzzzzzz'

This still produces a query plan with a seek strategy, regardless of which of my two tables it was executed on:

select personnelid , firstname from tblixtest1  where firstname >= 'a' and firstname <= 'zzzzzzzzzzzzzzzzzzzzz'  
    |--Index Seek(OBJECT:([FilteredIndexTest].[dbo].[tblIxTest1].[IX_Test1]), SEEK:([FilteredIndexTest].[dbo].[tblIxTest1].[FirstName] >= [@1] AND [FilteredIndexTest].[dbo].[tblIxTest1].[FirstName] <= [@2]) ORDERED FORWARD)

If we just select the entire table, unsurprisingly at that point it chooses to perform an index scan.

select personnelid , firstname from tblixtest1

Results in the following plan:

select personnelid , firstname from tblixtest1   |--Index Scan(OBJECT:([FilteredIndexTest].[dbo].[tblIxTest1].[IX_Test1]))

The row counts on both queries were identical at 1 million. Slightly more interesting is that if I use a Like clause instead of a direct string evaluation, the behaviour alters slightly when selecting all the values:

select personnelid , firstname from tblixtest1 where firstname like '[a-z]%'

Gives the query plan:

select personnelid , firstname from tblixtest1  where firstname like '[a-z]%'  
   |--Index Scan(OBJECT:([FilteredIndexTest].[dbo].[tblIxTest1].[IX_Test1]),  WHERE:([FilteredIndexTest].[dbo].[tblIxTest1].[FirstName] like '[a-z]%'))

So the query engine is potentially making an optimisation that it knows the like clause covers 100% and adopts an automatic scan, but it is not really very clear why it has this optimisation path. If the like clause changes to [a-y] then it reverts back to a seek, so it looks specific to covering all the values within the like statement. If a between statement is used, it remains a seek regardless.

So the result is that a Non-clustered covering index is very unlikely to tip, you either have to not give it a where clause, or use a like statement across all the values available, it will steadfastly refuse to seek and choose the scan.

Why?

Well the I/O cost of the operation remains the same, it has to read every page in the table and it considered the cost of traversing the B-Tree negligible, so the difference between seek and scan is not very great. Running the seek based query and scan based query in the same batch the relative percentages are 48% vs 52% – that is the scan scoring 52% even though they read the same number of rows.

Outputting the IO statistics when they are run side by side shows the same number of pages being read, but the seek is clearly being favoured and is slightly faster as far as SQL is concerned – it is quite weird to consider a seek of an entire index is more efficient than a scan of the index.

(1000000 row(s) affected)
Table 'tblIxTest1'. Scan count 1, logical reads 5995, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
(1000000 row(s) affected)
Table 'tblIxTest1'. Scan count 1, logical reads 5995, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

So if you come across a covering index in a query plan that is scanning, it would be worth investigating as to whether it is intended. The chances are more likely the index field order is not supporting the predicates being used, than engine has chosen to tip the index like it would for the non-covering non-clustered indexes.

Categories: SQL Server Tags: Indexes, RandomString, SQL Server 2005, SQL Server 2008, Tipping Point

M	T	W	T	F	S	S
						1
2	3	4	5	6	7	8
9	10	11	12	13	14	15
16	17	18	19	20	21	22
23	24	25	26	27	28	29
30

SQL Fascination

Archive

Can a Covering NC Index be Tipped?

Recent Posts

Blogroll

Email Subscription

Archives