<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	xmlns:georss="http://www.georss.org/georss" xmlns:geo="http://www.w3.org/2003/01/geo/wgs84_pos#" xmlns:media="http://search.yahoo.com/mrss/"
	>

<channel>
	<title>SQL Fascination &#187; RandomString</title>
	<atom:link href="http://sqlfascination.com/tag/randomstring/feed/" rel="self" type="application/rss+xml" />
	<link>http://sqlfascination.com</link>
	<description>Weirdness and oddities within SQL</description>
	<lastBuildDate>Mon, 06 Feb 2012 16:13:56 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.com/</generator>
<cloud domain='sqlfascination.com' port='80' path='/?rsscloud=notify' registerProcedure='' protocol='http-post' />
<image>
		<url>http://s2.wp.com/i/buttonw-com.png</url>
		<title>SQL Fascination &#187; RandomString</title>
		<link>http://sqlfascination.com</link>
	</image>
	<atom:link rel="search" type="application/opensearchdescription+xml" href="http://sqlfascination.com/osd.xml" title="SQL Fascination" />
	<atom:link rel='hub' href='http://sqlfascination.com/?pushpress=hub'/>
		<item>
		<title>Can a Covering NC Index be Tipped?</title>
		<link>http://sqlfascination.com/2009/11/07/can-a-covering-nc-index-be-tipped/</link>
		<comments>http://sqlfascination.com/2009/11/07/can-a-covering-nc-index-be-tipped/#comments</comments>
		<pubDate>Sat, 07 Nov 2009 17:56:03 +0000</pubDate>
		<dc:creator>Andrew Hogg</dc:creator>
				<category><![CDATA[SQL Server]]></category>
		<category><![CDATA[Indexes]]></category>
		<category><![CDATA[RandomString]]></category>
		<category><![CDATA[SQL Server 2005]]></category>
		<category><![CDATA[SQL Server 2008]]></category>
		<category><![CDATA[Tipping Point]]></category>

		<guid isPermaLink="false">http://sqlfascination.com/?p=198</guid>
		<description><![CDATA[Non-clustered indexes normally have a &#8216;tipping point&#8217;, which is the point at which the query engine decides to change strategies from seeking the index and looking up each row in the underlying table with a nested loop operator, to just scanning the underlying table and ignoring the index. Kimberly Tripp wrote a great article [...]]]></description>
			<content:encoded><![CDATA[<p>Non-clustered indexes normally have a &#8216;tipping point&#8217;, which is the point at which the query engine decides to change strategies from seeking the index and looking up each row in the underlying table with a nested loop operator, to just scanning the underlying table and ignoring the index. <a href="http://www.sqlskills.com/blogs/kimberly/">Kimberly Tripp</a> wrote a great article about <a href="http://www.sqlskills.com/BLOGS/KIMBERLY/category/The-Tipping-Point.aspx">&#8216;The Tipping Point&#8217;</a>, and the guidance is that the query engine will change strategies once the number of rows requested reaches roughly 25-33% of the number of pages in the table.</p>
<p>If the non-clustered index is a covering index (it contains all the fields within the query) the query engine does not take the same decision. It makes sense that if any change in strategy occurs, it would have to be at a far higher figure, and as we are about to see, it will not take that decision and tip.</p>
<p>To test what strategy the engine would use I created a test situation of 2 separate tables with different page counts, the padding column forcing the second table to use far more pages (5953 pages vs 9233):</p>
<pre><span style="color:#0000ff;">CREATE TABLE</span> [dbo].[tblIxTest1]( [PersonnelID] [int] <span style="color:#0000ff;">IDENTITY</span>(1,1)<span style="color:#808080;"> NOT NULL</span>, [FirstName] [char](30) <span style="color:#808080;">NULL</span>, [LastName] [char](30) <span style="color:#808080;">NULL</span>,
   [Department] [char](30) <span style="color:#808080;">NULL</span>, [SomePadding] [char](10) <span style="color:#808080;">NULL</span>
) <span style="color:#0000ff;">ON </span>[PRIMARY]</pre>
<p>And,</p>
<pre><span style="color:#0000ff;">CREATE TABLE</span> [dbo].[tblIxTest2]( [PersonnelID] [int] <span style="color:#0000ff;">IDENTITY</span>(1,1) <span style="color:#808080;">NOT NULL</span>, [FirstName] [char](30) <span style="color:#808080;">NULL</span>, [LastName] [char](30) <span style="color:#808080;">NULL</span>,
   [Department] [char](30) <span style="color:#808080;">NULL</span>, [SomePadding] [char](1000) <span style="color:#808080;">NULL</span>
) <span style="color:#0000ff;">ON</span> [PRIMARY]</pre>
<p>The next step was to insert some data. I needed random data to ensure the index was not unbalanced in some way, so I broke out my useful little random string generation function. I should mention how to create this: a SQL function does not directly support a Rand() call within it, and any attempt to include one results in the error:</p>
<pre><span style="color:#ff0000;">Msg 443, Level 16, State 1, Procedure test, Line 13
Invalid use of a side-effecting operator 'rand' within a function.</span></pre>
<p>However, there is nothing stopping a view from using Rand(), and the function can use the view to get around the limitation:</p>
<pre><span style="color:#0000ff;">Create View</span> [dbo].[RandomHelper] <span style="color:#0000ff;">as Select</span> <span style="color:#ff00ff;">Rand</span>() <span style="color:#0000ff;">as</span> r</pre>
<p>The function can then be created to use this view. It is not necessarily the most efficient random string generation function, but it works nicely.</p>
<pre><span style="color:#0000ff;">CREATE FUNCTION</span> [dbo].[RandomString] (@Length <span style="color:#0000ff;">int</span>) <span style="color:#0000ff;">RETURNS varchar</span>(100)
<span style="color:#0000ff;">WITH EXECUTE AS CALLER
AS
BEGIN
</span>  <span style="color:#0000ff;">DECLARE</span> @Result <span style="color:#0000ff;">Varchar</span>(100)
  <span style="color:#0000ff;">SET</span> @Result = <span style="color:#ff0000;">''
</span>  <span style="color:#0000ff;">DECLARE</span> @Counter <span style="color:#0000ff;">int</span>
  <span style="color:#0000ff;">SET</span> @Counter = 0
  -- loop @Length times so the result is exactly @Length characters long
  <span style="color:#0000ff;">WHILE</span> @Counter &lt; @Length
  <span style="color:#0000ff;">BEGIN</span>
     -- Rand() read through the view is mapped onto the letters A to Z
     <span style="color:#0000ff;">SET </span>@Result = @Result + <span style="color:#0000ff;">Char</span>(<span style="color:#ff00ff;">Ceiling</span>((<span style="color:#0000ff;">select</span> R <span style="color:#0000ff;">from</span> dbo.RandomHelper) * 26) + 64)
     <span style="color:#0000ff;">SET </span>@Counter = @Counter + 1
  <span style="color:#0000ff;">END</span>
  <span style="color:#0000ff;">RETURN</span>(@Result)
<span style="color:#0000ff;">END</span></pre>
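<p>As a quick sanity check before loading a million rows, the function can be called directly. This is only a sketch: the column aliases are made up for illustration, and the two calls should return different values because each call reads Rand() through the view separately.</p>
<pre>-- Generate one sample value and report the length of a second one
select dbo.RandomString(20) as SampleValue,
       len(dbo.RandomString(20)) as SampleLength</pre>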
<p>This now allows me to generate random data and insert it into the tables to get a nice data distribution. The following was run for each of the two tables:</p>
<pre><span style="color:#0000ff;">insert into </span>tblIxTest1 <span style="color:#0000ff;">values</span> (dbo.RandomString(20),dbo.RandomString(20),dbo.RandomString(20),<span style="color:#ff0000;">''</span>)
<span style="color:#0000ff;">go </span>1000000</pre>
<p>Two NC indexes are now needed, one for each table. Both are identical and cover just the FirstName and PersonnelID fields within the table.</p>
<pre><span style="color:#0000ff;">CREATE NONCLUSTERED INDEX</span> [IX_Test1] <span style="color:#0000ff;">ON</span> [dbo].[tblIxTest1] ( [FirstName] <span style="color:#0000ff;">ASC</span>, [PersonnelID] <span style="color:#0000ff;">ASC</span>
)<span style="color:#0000ff;">WITH</span> (<span style="color:#0000ff;">STATISTICS_NORECOMPUTE</span>  = <span style="color:#0000ff;">OFF</span>, <span style="color:#0000ff;">SORT_IN_TEMPDB</span> = <span style="color:#0000ff;">OFF</span>, <span style="color:#0000ff;">IGNORE_DUP_KEY</span> = <span style="color:#0000ff;">OFF</span>, <span style="color:#0000ff;">DROP_EXISTING</span> = <span style="color:#0000ff;">OFF</span>, <span style="color:#0000ff;">ONLINE</span> = <span style="color:#0000ff;">OFF</span>, <span style="color:#0000ff;">ALLOW_ROW_LOCKS</span>  = <span style="color:#0000ff;">ON</span>, <span style="color:#0000ff;">ALLOW_PAGE_LOCKS</span>  = <span style="color:#0000ff;">ON</span>) <span style="color:#0000ff;">ON</span> [PRIMARY]
GO
<span style="color:#0000ff;">CREATE NONCLUSTERED INDEX</span> [IX_Test2] <span style="color:#0000ff;">ON</span> [dbo].[tblIxTest2] ( [FirstName] <span style="color:#0000ff;">ASC</span>, [PersonnelID] <span style="color:#0000ff;">ASC</span>
)<span style="color:#0000ff;">WITH</span> (<span style="color:#0000ff;">STATISTICS_NORECOMPUTE</span>  = <span style="color:#0000ff;">OFF</span>, <span style="color:#0000ff;">SORT_IN_TEMPDB</span> = <span style="color:#0000ff;">OFF</span>, <span style="color:#0000ff;">IGNORE_DUP_KEY</span> = <span style="color:#0000ff;">OFF</span>, <span style="color:#0000ff;">DROP_EXISTING</span> = <span style="color:#0000ff;">OFF</span>, <span style="color:#0000ff;">ONLINE</span> = <span style="color:#0000ff;">OFF</span>, <span style="color:#0000ff;">ALLOW_ROW_LOCKS</span>  = <span style="color:#0000ff;">ON</span>,<span style="color:#0000ff;"> ALLOW_PAGE_LOCKS</span>  = <span style="color:#0000ff;">ON</span>) <span style="color:#0000ff;">ON</span> [PRIMARY]
GO</pre>
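<p>To confirm the page counts of the heaps and the new indexes, something along these lines can be used. It is a sketch that assumes SQL Server 2005 or later and reads the sys.dm_db_partition_stats DMV, and it is not necessarily how the page figures quoted earlier were gathered.</p>
<pre>-- index_id 0 is the heap itself, a NC index has an index_id of 2 or higher
select object_name(ps.object_id) as TableName,
       i.name as IndexName,
       ps.index_id,
       ps.used_page_count,
       ps.row_count
from sys.dm_db_partition_stats ps
join sys.indexes i on i.object_id = ps.object_id and i.index_id = ps.index_id
where ps.object_id in (object_id('dbo.tblIxTest1'), object_id('dbo.tblIxTest2'))
order by TableName, ps.index_id</pre>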
<p>The setup is complete, and it is now pretty easy to show that the NC covering index is not going to tip. The most extreme where clause is one that allows every record to be returned:</p>
<pre><span style="color:#0000ff;">select</span> personnelid , firstname <span style="color:#0000ff;">from</span> tblixtest1 <span style="color:#0000ff;">where</span> firstname &gt;= 'a' <span style="color:#0000ff;">and</span> firstname &lt;= <span style="color:#ff0000;">'zzzzzzzzzzzzzzzzzzzzz'</span></pre>
<p>This still produces a query plan with a seek strategy, regardless of which of my two tables it was executed on:</p>
<pre>select personnelid , firstname from tblixtest1  where firstname &gt;= 'a' and firstname &lt;= 'zzzzzzzzzzzzzzzzzzzzz'  
    |--Index Seek(OBJECT:([FilteredIndexTest].[dbo].[tblIxTest1].[IX_Test1]), SEEK:([FilteredIndexTest].[dbo].[tblIxTest1].[FirstName] &gt;= [@1] AND [FilteredIndexTest].[dbo].[tblIxTest1].[FirstName] &lt;= [@2]) ORDERED FORWARD)</pre>
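<p>For anyone wanting to reproduce the text plans, one way to capture them is SET SHOWPLAN_TEXT, which returns the estimated plan without executing the query. This is a sketch of that approach against the second table, run from Management Studio where go is the batch separator:</p>
<pre>set showplan_text on
go
select personnelid , firstname from tblixtest2 where firstname &gt;= 'a' and firstname &lt;= 'zzzzzzzzzzzzzzzzzzzzz'
go
set showplan_text off
go</pre>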
<p>If we just select the entire table, unsurprisingly at that point it chooses to perform an index scan.</p>
<pre><span style="color:#0000ff;">select</span> personnelid , firstname <span style="color:#0000ff;">from </span>tblixtest1</pre>
<p>Results in the following plan:</p>
<pre>select personnelid , firstname from tblixtest1   |--Index Scan(OBJECT:([FilteredIndexTest].[dbo].[tblIxTest1].[IX_Test1])) </pre>
<p>The row counts on both queries were identical at 1 million. Slightly more interesting is that if I use a Like clause instead of a direct string evaluation, the behaviour alters slightly when selecting all the values:</p>
<pre><span style="color:#0000ff;">select </span>personnelid , firstname <span style="color:#0000ff;">from </span>tblixtest1 <span style="color:#0000ff;">where</span> firstname like '<span style="color:#ff0000;">[a-z]%</span>'</pre>
<p>Gives the query plan:</p>
<pre>select personnelid , firstname from tblixtest1  where firstname like '[a-z]%'  
   |--Index Scan(OBJECT:([FilteredIndexTest].[dbo].[tblIxTest1].[IX_Test1]),  WHERE:([FilteredIndexTest].[dbo].[tblIxTest1].[FirstName] like '[a-z]%'))</pre>
<p>So the query engine is potentially making an optimisation that it knows the like clause covers 100% and adopts an automatic scan, but it is not really very clear why it has this optimisation path. If the like clause changes to [a-y] then it reverts back to a seek, so it looks specific to covering all the values within the like statement. If a between statement is used, it remains a seek regardless.</p>
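<p>For reference, the two variations just described would look something like the following. The exact literals are an assumption about the shape of the test, the important parts are the [a-y] range and the between predicate:</p>
<pre>-- like over a slightly narrower range, reported above to revert to a seek
select personnelid , firstname from tblixtest1 where firstname like '[a-y]%'
-- between over the full range, reported above to remain a seek
select personnelid , firstname from tblixtest1 where firstname between 'a' and 'zzzzzzzzzzzzzzzzzzzzz'</pre>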
<p>So the result is that a non-clustered covering index is very unlikely to tip. You either have to not give it a where clause, or use a like statement across all the values available, otherwise it will steadfastly refuse to scan and will choose the seek.</p>
<p>Why?</p>
<p>Well, the I/O cost of the operation remains the same: it has to read every page in the index either way, and it considers the cost of traversing the B-Tree negligible, so the difference between seek and scan is not very great. Running the seek based query and the scan based query in the same batch, the relative percentages are 48% vs 52%, that is the scan scoring 52% even though they read the same number of rows.</p>
<p>Outputting the IO statistics when they are run side by side shows the same number of pages being read, but the seek is clearly being favoured and is slightly faster as far as SQL is concerned &#8211; it is quite weird to consider a seek of an entire index is more efficient than a scan of the index.</p>
<pre>(1000000 row(s) affected)
Table 'tblIxTest1'. Scan count 1, logical reads 5995, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
(1000000 row(s) affected)
Table 'tblIxTest1'. Scan count 1, logical reads 5995, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.</pre>
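<p>Figures like these can be reproduced by switching on the IO statistics and running the two queries in one batch, roughly as follows. This is a sketch, and the output appears on the Messages tab in Management Studio:</p>
<pre>set statistics io on
-- seek based query
select personnelid , firstname from tblixtest1 where firstname &gt;= 'a' and firstname &lt;= 'zzzzzzzzzzzzzzzzzzzzz'
-- scan based query
select personnelid , firstname from tblixtest1
set statistics io off</pre>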
<p>So if you come across a covering index in a query plan that is scanning, it would be worth investigating whether it is intended. The chances are that the index field order is not supporting the predicates being used, rather than that the engine has chosen to tip the index like it would for a non-covering non-clustered index.</p>
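<p>As an illustration of that last point, a predicate on PersonnelID alone cannot seek on IX_Test1 because FirstName is the leading key column, so the covering index would be expected to be scanned with the predicate applied as a filter. The range values are arbitrary and the plan should be checked on your own system:</p>
<pre>-- covered by IX_Test1, but the predicate does not match the leading key column
select personnelid , firstname from tblixtest1 where personnelid between 1000 and 2000</pre>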
]]></content:encoded>
			<wfw:commentRss>http://sqlfascination.com/2009/11/07/can-a-covering-nc-index-be-tipped/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
	
		<media:content url="http://0.gravatar.com/avatar/8215e290861f1c44a457d26c4f24af70?s=96&#38;d=identicon&#38;r=G" medium="image">
			<media:title type="html">andrewhogg</media:title>
		</media:content>
	</item>
	</channel>
</rss>
