
Posts Tagged ‘SQL Server 2005’

Rolling a Partition Forward – Part 1

December 9, 2009

I have covered how to lay out a partitioned table across filegroups previously, but have not gone through the steps of rolling a partitioned window. It sounds like a simple process, but with all the filegroup preparation and prerequisites needed for it to run smoothly, anyone starting with partitioned tables could probably use a little guide. As you are about to see, the process is quite intricate, so I will go through the load process in this post and the unload in the next.

Because no one case fits all, I have made some assumptions and imposed some limitations to keep this a guide, specifically:

  • The main partitioned table has a clustered index.
  • The layout is following the mechanism of keeping a staging filegroup and spare filegroup as detailed in the layout post.
  • The rollout process intends to remove the oldest data / partition.
  • The process is designed for large loads, not single inserts.

So let’s see what it takes to prepare and get data into a partitioned table:

  • Create a staging table on your dedicated ETL filegroup, of an identical column schema to your partitioned table.
  • Load the data into the staging table.
  • Move the staging table to the spare filegroup, using a clustered index creation. (The need for the spare was covered in the layout post)
  • Add any additional Non-Clustered indexes required to match the partitioned table indexes.
  • Constrain the data so that it is considered trusted – the constraint must ensure all values are within the partition boundary you intend to place it within.
  • Set the Partition Scheme Next Used Filegroup.
  • Split the Partition Function.
  • Switch the staging table into the main partitioned table.

That was all just to bulk load data into a partitioned table – a long list with plenty of opportunity for it to go wrong. Most of these steps use T-SQL that you will be very familiar with; it is only the last three items that use less common SQL and are harder to automate, since there are no built-in tools to do the work for you.
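Before getting to those last three items, here is a minimal sketch of the earlier steps. All of the names and boundary values (dbo.Sales, dbo.Sales_Staging, the ETL_FG and SpareFG filegroups, the December date range) are purely illustrative – adjust them to your own schema, and remember the clustered and non-clustered indexes must match those on the partitioned table for the switch to be allowed.

-- 1. Staging table on the dedicated ETL filegroup, same column schema as the partitioned table
CREATE TABLE dbo.Sales_Staging
(
 SaleID int NOT NULL,
 SaleDate datetime NOT NULL,
 Amount money NOT NULL
) ON [ETL_FG]

-- 2. Bulk load the data into dbo.Sales_Staging here (BULK INSERT / SSIS etc.)

-- 3. Move the table to the spare filegroup by building the clustered index there
CREATE CLUSTERED INDEX IX_Sales_Staging_SaleDate ON dbo.Sales_Staging (SaleDate) ON [SpareFG]

-- 4. Add any non-clustered indexes required to match the partitioned table here

-- 5. Constrain the data so it is trusted and falls entirely within the target partition boundary
ALTER TABLE dbo.Sales_Staging WITH CHECK
 ADD CONSTRAINT CK_Sales_Staging_SaleDate
 CHECK (SaleDate >= '20091201' AND SaleDate < '20100101')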

Setting the Next Used Filegroup

The intention when setting the filegroup is to declare where the partition scheme should locate data for the partition function split that is about to occur. Whilst you can discover what the previous setting might be, it is not advisable to rely on it – set it explicitly every time, just before performing the split. The syntax for the command is:

ALTER PARTITION SCHEME YourPartitionSchemeName NEXT USED [YourSpareFG]

Splitting the Partition Function

Splitting the partition function is in effect the creation of an extra dividing point on the number line / date line representing the partitioned table. If you split a partition that already has data, the operation will be quite expensive since SQL can be forced to move data between filegroups, so in a rolling window scenario it is common to split only to handle incoming data, which is always in advance of your existing data. For example, if you are storing sales data partitioned by the month/year of the sale date and currently only hold data up to November, you would not insert any data for December until the partition for December had been created.

The syntax for the split is:

ALTER PARTITION FUNCTION YourPartitionFunctionName() SPLIT RANGE (YourBoundaryValue)

But when importing new data in an automated fashion, you might not know whether the new partition split has already been performed or not, so how can you check whether the new boundary value has already been created in the partition function? The DMVs can provide the answer:

SELECT count(value) as ValueExists FROM sys.partition_range_values
INNER JOIN sys.partition_functions ON sys.partition_functions.function_id = sys.partition_range_values.function_id
WHERE name = 'YourPartitionFunctionName' AND value = YourBoundaryValue

A returned value of 0 would indicate it did not exist, whilst a 1 would indicate a boundary value had already been created.
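Putting the check and the split together, an automated load can guard the split along these lines – a sketch only, with the partition function, scheme, filegroup and boundary value all placeholders to be substituted with your own:

DECLARE @Boundary datetime
SET @Boundary = '20100101'   -- illustrative boundary value for the incoming data

IF NOT EXISTS ( SELECT 1
                FROM sys.partition_range_values prv
                INNER JOIN sys.partition_functions pf ON pf.function_id = prv.function_id
                WHERE pf.name = 'YourPartitionFunctionName' AND prv.value = @Boundary )
BEGIN
    -- always set the next used filegroup immediately before the split
    ALTER PARTITION SCHEME YourPartitionSchemeName NEXT USED [YourSpareFG]
    ALTER PARTITION FUNCTION YourPartitionFunctionName() SPLIT RANGE (@Boundary)
END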

Switching the Staging Table In

Switching the staging table into the newly created partition looks relatively easy but needs the partition number:

ALTER TABLE yourStagingTable SWITCH TO YourPartitionedTable PARTITION PartitionNumber

Where do you get the partition number from? The partition number corresponds to the boundary ID, and partitions are numbered sequentially starting at 1 from the furthest left. If you know the boundary value you have set for the partition, you can get the boundary ID using the DMVs again:

SELECT boundary_id
FROM sys.partition_range_values
INNER JOIN sys.partition_functions ON sys.partition_functions.function_id  = sys.partition_range_values.function_id
WHERE  name = 'YourPartitionFunctionName' AND value= YourBoundaryValue

These DMVs give you access to the data you need to automate the process in stored procedures – finding out the boundary IDs in one step to be used in the next, and so on.
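Alternatively, if you know a value that falls inside the target partition, the built-in $PARTITION function will return the partition number for that value directly, which saves mapping from the boundary ID by hand (the function name and date here are illustrative):

SELECT $PARTITION.YourPartitionFunctionName('20091215') AS PartitionNumber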

These are the trickier parts of the process to automate that need the help of the DMVs. In the next post I will go through the unloading of the old data.

How Can You Spot the Procedure Cache Being Flooded?

December 3, 2009

This comes from a question I had a couple of days ago – the SQL Server: Buffer Manager: Page Life Expectancy performance counter indicates the current lifetime of a page within memory. As data pages and query object pages are added to the buffer pool they build up, and SQL will come under memory pressure as a result. The normal advice is that this figure should be above 300 seconds, indicating that a page stays in memory for at least 5 minutes.

This figure, however, includes both the data cache and the procedure cache – which means you cannot determine whether the pages being flushed are the result of churning data pages, or whether ad hoc queries are flooding the procedure cache. You can of course look at the procedure cache using the DMVs and watch the number of objects grow and then shrink, but this is not particularly scientific, nor is it measurable within a trace.
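For reference, this is the kind of DMV check I mean – a sketch that counts the cached plans by type, where a large and growing number of single-use ad hoc plans is the tell-tale sign:

SELECT objtype,
       COUNT(*) AS plans,
       SUM(CASE WHEN usecounts = 1 THEN 1 ELSE 0 END) AS single_use_plans,
       SUM(CAST(size_in_bytes AS bigint)) / 1024 / 1024 AS size_mb
FROM sys.dm_exec_cached_plans
GROUP BY objtype
ORDER BY size_mb DESC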

The page life expectancy can easily be traced within Perfmon, but how do you measure the procedure cache? Well, there are a couple of events you can trace in SQL Profiler; the primary one I would like to be using does not seem to register the event properly, whilst the secondary one does at least work. The two events are SP:CacheRemove and SP:CacheInsert.

SP:CacheRemove has two event subclasses listed in documentation produced by the SQL Programmability team: subclass 2 is for a deliberate procedure cache flush, such as a DBCC FREEPROCCACHE command, whilst subclass 1 is for when a compiled plan is removed due to memory pressure. In testing, the deliberate procedure cache flush does show up in the profiler traces with an event subclass value of ‘2 – Proc Cache Flush’ – but after a number of tests I could not ever get the event to be raised when the procedure cache was under memory pressure. If it did, we would have exactly what I was after: an easy, traceable and recordable way to show a procedure cache under too much pressure.

SP:CacheInsert is more of a backup mechanism for showing that the procedure cache is being flooded, on the basis that you count the number of times the event shows up within a trace over a period of time. In essence, an SP:CacheInsert is only going to occur if a query does not have a matching query plan within the cache, so a large number of these within a short period is an indication that the procedure cache is potentially being flooded.

Combine a large number of SP:CacheInsert events with a low Page Life Expectancy and you can be fairly confident you have a procedure cache flooding problem.

So there is a mechanism of sorts to determine whether a low page life expectancy is from data page churn or query plan churn, but if the SP:CacheRemove subclass 1 event actually worked it would be a lot easier. Once you know your plan cache is being flooded, the next step is to check whether forced parameterization is worth using to eliminate the issue.
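For completeness, forced parameterization is a database level setting, and switching it (and switching it back) is a one-liner – though as discussed elsewhere it should only be changed after proper testing; the database name here is a placeholder:

ALTER DATABASE YourDatabase SET PARAMETERIZATION FORCED
-- and to revert:
ALTER DATABASE YourDatabase SET PARAMETERIZATION SIMPLE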

Prodata SQL Academy Events

December 2, 2009

If you haven’t seen them advertised, Bob Duffy from Prodata is running a series of SQL Academy half-day training sessions in Dublin, hosted at the Microsoft Auditorium in their offices in Leopardstown. The events are level 300, which suits the half-day slot allocated for the sessions. Yesterday’s was about performance tuning and optimisation, so a colleague and I took a short flight over and enjoyed the excellent Irish hospitality. The talk was recorded, so there will no doubt be a webcast published at some point by Technet in Ireland. The talk primarily went through using perfmon counters and wait states – and the available tools that can make this a lot easier by wrapping up and correlating results from different logging mechanisms.

I would recommend keeping an eye out for the webcast when it appears, since troubleshooting a production environment is all about using non-intrusive means to understand what is crippling the system – memory, CPU, IO etc. If you are not practised at this form of troubleshooting it is very difficult to know which performance counters and wait states to observe amongst the thousands that exist – as well as which DMVs can give you the critical information to diagnose the problems. (It was quite interesting that the demonstration performance issue he was looking at was fundamentally a combination of a missing index and, more critically, a lack of query parameterisation since the database was in simple mode. I have previously written about the counters used to diagnose this problem and the symptoms you might encounter.)

The wait-state side of the talk was very interesting. I often use a combination of DMVs and perfmon in the field to diagnose problems, but have only used a certain amount of the wait-state information and have not delved into it as deeply – I will definitely be adding a few more wait states to the list for the future.

The next event is on February 16th and covers SQL Analysis Services – registration is already open.

Inconsistent Date Literal Parameterization Behaviour

November 12, 2009

I have mentioned query parameterization before – the process by which SQL extracts literal values from a query and in effect rewrites the query to use parameters, so it can get a query plan cache hit and avoid recompiling the plan, which costs both time and CPU. There are a lot of good articles and book chapters that cover the topic.

What has confused me for a while is seeing date literals being parameterized in one query and not in another, even though both databases have parameterization set to ‘Simple’ mode. When a date literal is not parameterized the chance of getting a query plan cache hit is obviously very low, which has performance impacts. The problem to date has been that I was unable to ascertain what allowed one query to be parameterized while another just kept the literal – I was paying too much attention to the query and, as it turns out, not enough to the table.

Well, after a number of hours getting to know the brick wall very well, I finally tracked it down. Oddly it had nothing to do with the query I was submitting: I could reproduce it reliably by adding an unrelated non-clustered index, which is confusing to say the least, and I cannot yet think of any reason why – is it a bug, or just ‘weird and odd’?

The following steps reproduce the issue in both SQL Server 2005 and 2008.

Create a database with default settings, nothing specific, then create our test table – a simple structure will suffice.

CREATE TABLE [dbo].[paramtest](
 [id] [int] IDENTITY(1,1) NOT NULL,
 [somedate] [datetime] NOT NULL,
 [somefield] [varchar](50) COLLATE SQL_Latin1_General_CP1_CI_AS NULL,
 CONSTRAINT [PK_paramtest] PRIMARY KEY CLUSTERED
(
 [id] ASC
)WITH (IGNORE_DUP_KEY = OFF) ON [PRIMARY]
) ON [PRIMARY]

We need some data to work against just to make sure we are selecting results, and we can insert these trivially as follows: 

insert into [paramtest] values (getdate(),'a')
go 10000

So we now have 10k rows within the table, and a clustered primary index on the identity column.

The test starts with freeing the procedure cache and then running the select statement. The datetime I used was roughly in the middle of the range of values I had inserted, but it is not a deciding factor in the query plan results.

dbcc freeproccache
select * from paramtest where somedate > '2009-11-12 21:14:50.000'
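To pull the resulting plan back out of the cache, a query along these lines will do – this is a sketch rather than the exact extraction query I used, built on the standard plan cache DMVs:

SELECT st.text, qp.query_plan, cp.usecounts
FROM sys.dm_exec_cached_plans cp
CROSS APPLY sys.dm_exec_sql_text(cp.plan_handle) st
CROSS APPLY sys.dm_exec_query_plan(cp.plan_handle) qp
WHERE st.text LIKE '%paramtest%'
AND st.text NOT LIKE '%dm_exec_cached_plans%' -- exclude this extraction query itself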

From the extracted XML plan, the specific line we are interested in is the SQL statement:

<StmtSimple StatementText="(@1 varchar(8000))SELECT * FROM [paramtest] WHERE [somedate]>@1" StatementId="1" StatementCompId="1" StatementType="SELECT" StatementSubTreeCost="0.0379857" StatementEstRows="3831.48" StatementOptmLevel="TRIVIAL">

From it you can see the literal varchar value was extracted as @1 with a type of varchar(8000) and the query was altered to use this parameter – this is exactly the behaviour we would expect from simple parameterization.

The next step is to create a non-clustered index on the varchar ‘somefield’ – completely unrelated to the date literal being used, and it should have no impact on the query at all.

CREATE NONCLUSTERED INDEX [ix_test] ON [dbo].[paramtest] ([somefield] ASC
)WITH (SORT_IN_TEMPDB = OFF, DROP_EXISTING = OFF, IGNORE_DUP_KEY = OFF, ONLINE = OFF) ON [PRIMARY]

Free the procedure cache up again and rerun the query

dbcc freeproccache
select * from paramtest where somedate > '2009-11-12 21:14:50.000'

Extract the query plan again from the cache, and this time it is noticeably different: the parameterisation has not occurred and the literal has been left in the statement text.

<StmtSimple StatementText="select * from paramtest  where somedate > '2009-11-12 21:14:50.000'" StatementId="1" StatementCompId="1" StatementType="SELECT" />

To revert to the old plan, drop the index:

DROP INDEX [ix_test] ON [dbo].[paramtest] WITH (ONLINE = OFF)

Then clear the cache again and run the query

select * from paramtest where somedate > '2009-11-12 21:14:50.000'

And we are back to being parameterized.

So the addition of a single non-clustered index, on a separate field to the one being queried, prevents simple parameterization from extracting the date literal – this makes absolutely no sense, and you can play around with it a lot more once you know what is causing the effect on the query plan. Even placing the additional non-clustered index on the identity field, which already has the clustered index, results in the parameterization failing. If this behaviour is by design, then it makes for an interesting design decision or limitation on the parameterization.

As soon as the database is put in ‘Forced’ parameterization mode, the literal is converted every time, so this looks specific to simple mode – not explainable, just demonstrable.

Can a Covering NC Index be Tipped?

November 7, 2009

Non-clustered indexes normally have a ‘tipping point’: the point at which the query engine decides to change strategy from seeking the index and looking up the remaining columns in the underlying table via a nested loop operator, to just scanning the underlying table and ignoring the index. Kimberly Tripp wrote a great article about ‘The Tipping Point’, and the guidance is that at roughly the 25-33% mark the query engine will change strategies.

If the non-clustered index is a covering index (it contains all the fields within the query), the query engine does not take the same decision – it makes sense that if any change in strategy occurred it would have to be at a far higher figure, and as we are about to see, it does not take that decision and tip at all.

To test what strategy the engine would use, I created two separate tables with different page counts, the padding column forcing the second table to use far more pages (5,953 pages vs 9,233):

CREATE TABLE [dbo].[tblIxTest1]( [PersonnelID] [int] IDENTITY(1,1) NOT NULL, [FirstName] [char](30) NULL, [LastName] [char](30) NULL,
   [Department] [char](30) NULL, [SomePadding] [char](10) NULL
) ON [PRIMARY]

And,

CREATE TABLE [dbo].[tblIxTest2]( [PersonnelID] [int] IDENTITY(1,1) NOT NULL, [FirstName] [char](30) NULL, [LastName] [char](30) NULL,
   [Department] [char](30) NULL, [SomePadding] [char](1000) NULL
) ON [PRIMARY]

The next step was to insert some data. I needed random data to ensure the index was not unbalanced in some way, so I broke out my useful little random string generation function. I should mention how to create this: a SQL function will not directly support a Rand() call within it, and any attempt to do so results in the error:

Msg 443, Level 16, State 1, Procedure test, Line 13
Invalid use of a side-effecting operator 'rand' within a function.

However, there is nothing stopping a view from using this, and the function from using the view to get around the limitation: 

Create View [dbo].[RandomHelper] as Select Rand() as r

The function can then be written to use the view; it is not necessarily the most efficient random string generator, but it works nicely.

CREATE FUNCTION [dbo].[RandomString] (@Length int) RETURNS varchar(100)
WITH EXECUTE AS CALLER
AS
BEGIN
  DECLARE @Result Varchar(100)
  SET @Result = ''
  DECLARE @Counter int
  SET @Counter = 0
  -- build the string one random upper-case character at a time, via the view
  WHILE @Counter < @Length
  BEGIN
     SET @Result = @Result + Char(Ceiling((select R from randomhelper) * 26) + 64)
     SET @Counter = @Counter + 1
  END
  RETURN(@Result)
END

This now allows me to generate random data and insert it into the tables to get a nice data distribution, and this was run for both of the tables.

insert into tblIxTest1 values (dbo.RandomString(20),dbo.RandomString(20),dbo.RandomString(20),'')
go 1000000

Two NC indexes are now needed, one for each table and both are identical and cover just the FirstName and PersonnelID fields within the table.

CREATE NONCLUSTERED INDEX [IX_Test1] ON [dbo].[tblIxTest1] ( [FirstName] ASC, [PersonnelID] ASC
)WITH (STATISTICS_NORECOMPUTE  = OFF, SORT_IN_TEMPDB = OFF, IGNORE_DUP_KEY = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS  = ON, ALLOW_PAGE_LOCKS  = ON) ON [PRIMARY]
GO
CREATE NONCLUSTERED INDEX [IX_Test2] ON [dbo].[tblIxTest2] ( [FirstName] ASC, [PersonnelID] ASC
)WITH (STATISTICS_NORECOMPUTE  = OFF, SORT_IN_TEMPDB = OFF, IGNORE_DUP_KEY = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS  = ON, ALLOW_PAGE_LOCKS  = ON) ON [PRIMARY]
GO

The setup is complete, and it is now pretty easy to show that the NC covering index is not going to tip. The most extreme where clause is one that allows every record to be returned:

select personnelid , firstname from tblixtest1 where firstname >= 'a' and firstname <= 'zzzzzzzzzzzzzzzzzzzzz'

This still produces a query plan with a seek strategy, regardless of which of my two tables it was executed on:

select personnelid , firstname from tblixtest1  where firstname >= 'a' and firstname <= 'zzzzzzzzzzzzzzzzzzzzz'  
    |--Index Seek(OBJECT:([FilteredIndexTest].[dbo].[tblIxTest1].[IX_Test1]), SEEK:([FilteredIndexTest].[dbo].[tblIxTest1].[FirstName] >= [@1] AND [FilteredIndexTest].[dbo].[tblIxTest1].[FirstName] <= [@2]) ORDERED FORWARD)

If we just select the entire table, unsurprisingly at that point it chooses to perform an index scan.

select personnelid , firstname from tblixtest1

Results in the following plan:

select personnelid , firstname from tblixtest1
   |--Index Scan(OBJECT:([FilteredIndexTest].[dbo].[tblIxTest1].[IX_Test1]))

The row counts on both queries were identical at 1 million. Slightly more interesting is that if I use a Like clause instead of a direct string evaluation, the behaviour alters slightly when selecting all the values:

select personnelid , firstname from tblixtest1 where firstname like '[a-z]%'

Gives the query plan:

select personnelid , firstname from tblixtest1  where firstname like '[a-z]%'  
   |--Index Scan(OBJECT:([FilteredIndexTest].[dbo].[tblIxTest1].[IX_Test1]),  WHERE:([FilteredIndexTest].[dbo].[tblIxTest1].[FirstName] like '[a-z]%'))

So the query engine potentially makes an optimisation where it knows the like clause covers 100% of the values and adopts a scan automatically, but it is not really clear why it has this optimisation path. If the like clause changes to [a-y] then it reverts back to a seek, so it looks specific to the like covering every possible value. If a between statement is used, it remains a seek regardless.

So the result is that a non-clustered covering index is very unlikely to tip: unless you give it no where clause at all, or use a like statement spanning every available value, it will steadfastly stick to the seek rather than choose the scan.

Why?

Well, the I/O cost of the operation remains the same – it has to read every page in the index either way, and the cost of traversing the B-tree is considered negligible – so the difference between seek and scan is not very great. Running the seek-based query and the scan-based query in the same batch, the relative percentages are 48% vs 52% – that is the scan scoring 52%, even though they read the same number of rows.

Outputting the IO statistics when they are run side by side shows the same number of pages being read, but the seek is clearly being favoured and is considered slightly cheaper by SQL – it is quite odd to think that a seek of an entire index is more efficient than a scan of that same index.
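The side-by-side run is nothing more than the two queries in a single batch with the IO statistics switched on – something along these lines:

SET STATISTICS IO ON

-- seek: range predicate that still covers every value
select personnelid , firstname from tblixtest1 where firstname >= 'a' and firstname <= 'zzzzzzzzzzzzzzzzzzzzz'

-- scan: no predicate at all
select personnelid , firstname from tblixtest1

SET STATISTICS IO OFF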

(1000000 row(s) affected)
Table 'tblIxTest1'. Scan count 1, logical reads 5995, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
(1000000 row(s) affected)
Table 'tblIxTest1'. Scan count 1, logical reads 5995, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

So if you come across a covering index in a query plan that is scanning, it is worth investigating whether that is intended. The chances are that the index field order is not supporting the predicates being used, rather than that the engine has chosen to tip the index as it would for a non-covering non-clustered index.

When is Bulk Logged Mode Not What it Says?

November 6, 2009

Somewhat of a ‘doh’ moment when dealing with a SQL Server in bulk logged mode: the transaction log was behaving very strangely and growing very large for relatively small index operations. When performing an additional operation the actual percentage of log space used went down, and it was a repeatable test. At no point would you expect an operation to result in the log space showing as less space used, so something was up with the scenario and I was asked to check it.

In bulk logged mode, index creation is classed as a minimally logged operation whether online or offline, and a drop index is a bit of a mixture: the index page deallocation is fully logged, but the heap rebuild is listed as minimally logged. There was no obvious reason why the database was behaving as it was.

The steps to recreate this situation on your own computer / server are rather simple: create a database for testing purposes and preallocate the log space at 200 MB, with the database set to bulk logged mode. I then created a basic table:

create table [tblOnlineIndexTest]
(
  id int,
  padding char(500)
)

The table does not have a primary key, so when the non-unique clustered index is created later a uniquifier will be added – not something we are worried about here. To create a number of rows, I simply ran the following SQL to generate some data, just over 60 MB of it.

insert into tblOnlineIndexTest (id, padding) values (1, REPLICATE('a',100))
go 100000

To check the log space, I used the DBCC command:

dbcc sqlperf(logspace)

26.28% is used. The next step was the creation of a clustered index on the table:

create clustered index [PK_tblOnlineIndexTest] on [tblOnlineIndexTest] (id)
WITH (PAD_INDEX  = OFF, STATISTICS_NORECOMPUTE  = OFF, SORT_IN_TEMPDB = OFF,
IGNORE_DUP_KEY = OFF, ONLINE = ON, ALLOW_ROW_LOCKS  = ON, ALLOW_PAGE_LOCKS  = ON) ON [primary]

The log was checked again and the space used was 21.34%. I twigged the underlying issue at this point, but continued to test a bit more to be sure.

Next I dropped the index and the log space jumped up to 53.27% of the total log space, and the log had grown in size, even though it was not at 100% of the space. The log has to reserve undo space in advance of the transaction to make sure it can always record an undo, so that a rollback does not get stopped by the transaction log being full / having no space left.

The last step was to recreate the same index, and this time the log space dropped to 2.23%, but the actual log file had grown by 66 MB at this point. So the log got larger, but the actual amount of data in it was smaller.

The problem was a simple one: the database was not actually in bulk logged mode. Just like a database set to ‘full’ recovery, until the first full database backup is taken the database still behaves as if it were in simple mode. A transaction log has to be replayed against the starting position of a full backup, so the lack of a full backup on the database automatically prevents the bulk logged mode from actually being active.

When I was presented with the initial problem, I did not ask whether the initial database backup had been taken – I assumed it had, which was clearly a bad assumption. Whilst the database identified itself as being in bulk logged mode, the reality was that it was behaving as if in simple mode, so the transaction log was automatically clearing committed transactions and reusing the space. Just because the database says it is in bulk logged or full mode does not mean it actually is yet.
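A quick sanity check for this situation is to compare what each database claims its recovery model is against whether a full backup has ever actually been taken – a sketch, relying on the backup history that msdb keeps:

SELECT d.name,
       d.recovery_model_desc,
       MAX(b.backup_finish_date) AS last_full_backup   -- NULL = never backed up
FROM sys.databases d
LEFT JOIN msdb.dbo.backupset b
    ON b.database_name = d.name AND b.type = 'D'       -- 'D' = full database backup
GROUP BY d.name, d.recovery_model_desc

A database showing FULL or BULK_LOGGED with a NULL last full backup is in exactly the state described above.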

Simple vs Forced – Query Parameterization

October 31, 2009

On the drive back from a relaxing week off I decided to write about query parameterization – the process where the query optimizer takes ad-hoc queries, extracts literal values used within the query predicates, and turns them into parameters on the fly.

This optimised query is then the one used to check the query cache, and not the original query string, so it can result in a far higher percentage of query plan cache hits than would otherwise occur on ad-hoc queries.

There are two modes to this feature, and neither of them is ‘off’: you automatically have ‘simple’ parameterization turned on, and can increase this to ‘forced’ mode for your database if you desire. The difference between the two modes is that simple mode will only deal with relatively simple literal values within the predicates, and will not consider every literal a candidate for parameterization.

So overall it sounds like a good thing, but there are some known problems with parameterization – most notably parameter sniffing: the query plan generated for one value of a parameter is not necessarily the optimal plan for another. Consider a search on a personnel database where the EmployeeAge is between 95 and 99 – assuming an index is in place, you can already expect the matching data quantity to be low to nil, so the index will not tip and an index seek is the expected behaviour.

If the next time the query is executed the value is from 20 to 60, the number of people matching that value would be very high, yet the query plan in cache has already chosen the method by which the query will be executed, and it is no longer the most optimal.

From a user’s perspective this leads to what appears to be random / non-deterministic behaviour – sometimes the query will be very fast and other times it will be slow. There will not necessarily be a pattern to it, because the plan might be removed from cache under memory pressure, or invalidated for a wide variety of other reasons, the simplest being a statistics invalidation.

So with the knowledge that parameter sniffing can cause issues, when should you consider investing the time in testing this setting? The time and effort to test could be considerable, so here is how I would come to that decision.

The first thing I check is the CPU utilisation on the SQL Server – in theory there should be some kind of issue prompting this to be considered, and one of the issues that insufficient parameterization can cause is an increase in CPU utilisation due to a high query compilation rate. There are a lot of other reasons for high CPU, such as incorrect indexing leading to table scans, but for the purpose of this explanation we can assume those have already been checked and did not explain it.

The second performance counters I would check are:

SQL Server: SQL Statistics: Auto-Param Attempts/sec
SQL Server: SQL Statistics: Batch Requests/sec

I could also check the plan cache hit ratio, or the SQL Compilations/sec figure, but having the Auto-Param and Batch figures per second allows a rough calculation of the ratio of incoming batches to the number of queries that auto-parameterization is being attempted on (successfully or not). The higher the ratio, the more ad-hoc queries are already being affected by ‘simple’ mode parameterization.

If I see a high ratio, the next check would be on the plan cache statistics, primarily:

SQL Server : Plan Cache : Cache Object Counts for SQL Plans
SQL Server : Plan Cache : Cache Pages for SQL Plans

This is to try to judge how many plans are sitting in the cache and whether the cache is under memory pressure. If you have a high number of SQL plan objects / cached pages then you can calculate whether you are under memory pressure. You can also look at the buffer manager page life expectancy counter, but this can be a bit misleading since data pages also affect it.

At this point I would start pulling the query plan cache and checking the query text for a sample of plans, to see whether a high number of literals remain within the submitted queries, since these are the ones that simple parameterisation failed to convert to parameters. When pulling the query plan cache, sorting by the SQL text allows you to easily spot when near-identical queries, differing only in unconverted literals, have been submitted but resulted in different query plans.
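A sketch of that kind of check – list the ad hoc plans with their text and sort by the text, so that near-duplicates differing only in their literals sit next to each other:

SELECT cp.usecounts, cp.size_in_bytes, st.text
FROM sys.dm_exec_cached_plans cp
CROSS APPLY sys.dm_exec_sql_text(cp.plan_handle) st
WHERE cp.objtype = 'Adhoc'
ORDER BY st.text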

There is no prescriptive guidance on how many of these near-identical queries it takes before you consider forced parameterization, but clearly a single duplicate is not an issue, whilst thousands of them could be, resulting in a high number of unnecessary SQL plan compilations and increasing the CPU utilisation of the server.

As you can tell, this setting is certainly not one that should be changed without considerable forethought and thorough testing to ensure your situation actually warrants the switch – preferably well-instrumented testing in a realistic environment, to ensure the benefits gained from the increased level of parameterisation are not being lost to occurrences of non-optimal plan cache hits.

The Sequence of an Index Uniquifier

October 20, 2009

During a training session today I was asked about the structure of the uniquifier, and whether it was a straight identity column. Off the top of my head I couldn’t remember the exact structure – I considered it a 4 byte int, but was not sure whether it acted as a pure identity value or incremented in a more complex manner – so I decided to investigate it tonight.

To start from the beginning, ‘index uniquifier’ is the term given to the field that is automatically generated by SQL Server when you create a clustered index whose key is not specified as unique. Since each record in the table has to be uniquely identifiable, SQL will automatically assign a 4 byte field to the row to make it unique, commonly called the ‘uniquifier’. At this point I am sure English scholars will be frowning, pondering whether the word qualifies as English; however, that is the term used, so we will run with it.

It is actually quite easy to see this field in action, let’s create a simple table:

CREATE TABLE dbo.unique_test
(
 firstname char(20) NOT NULL,
 surname char(20) NOT NULL
) ON [PRIMARY]
GO
CREATE CLUSTERED INDEX [ix_test] ON [dbo].[unique_test]
 (
[firstname] ASC
)
ON [PRIMARY]

The clustered index is not unique, by design, so let’s start adding duplicate rows to see the effect:

insert into unique_test values ('John', 'Smith')
go 10

The table now contains 10 rows, each with the same details. This does not cause any undue concern, because each row is actually still unique – the way to show this is using the DBCC IND and DBCC PAGE commands; I’ve cut the output down since it is so wide.

dbcc ind ('testdb','unique_test',1)
PageFID PagePID     IAMFID IAMPID      PageType
------- ----------- ------ ----------- --------
1       41          NULL   NULL        10      
1       174         1      41          1       

The output shows a data page numbered 174 for my example and the IAM page with an ID of 41. We can crack open the page and view the contents very easily using DBCC Page.

dbcc traceon(3604)
dbcc page ('testdb',1,174,3)

The output is quite large, but in essence, the first record is stored with the following details: 

UNIQUIFIER = [NULL]                 
Slot 0 Column 1 Offset 0x4 Length 20
firstname = John                    
Slot 0 Column 2 Offset 0x18 Length 20

 The second record:

Slot 1 Column 0 Offset 0x33 Length 4
UNIQUIFIER = 1                      
Slot 1 Column 1 Offset 0x4 Length 20
firstname = John                    
Slot 1 Column 2 Offset 0x18 Length 20
surname = Smith  

 The third record:

Slot 2 Column 0 Offset 0x33 Length 4
UNIQUIFIER = 2                      
Slot 2 Column 1 Offset 0x4 Length 20
firstname = John                    
Slot 2 Column 2 Offset 0x18 Length 20
surname = Smith                     

And so forth. The first record’s uniquifier is visible and clearly named within the data page, but set to null. The second copy of the same value receives the uniquifier of one, the third copy receives a 2 etc.  This count is maintained separately for each duplication, so the insert of a new name multiple times will also receive its own counter, beginning at null and working upwards, 1,2,3 etc. So just because the uniquifier is 4 bytes, this does not limit the total number of rows in the table to ~2.1 billion, but does logically limit the total number of duplicates to 2.1 billion. I must confess to not having tested that limit, generating 2.1 billion rows of duplicate data is not trivial and a scrapbook calculation predicts 435 hours of processing on a virtual pc. I suspect the error message it raises when it hits the limit would be interesting.

If we remove all the rows from the table and then add 10 more does the uniquifier reset? Easy to test but the short answer was no, the uniquifier continued to rise, 10 thru 19.

I was a bit suspicious of this, since any requirement for the uniquifier to keep rising / remember what existed before requires it to be stored somewhere – it has to survive a crash after all – but there is no apparent place the current count is stored. If there was, you wouldn’t be storing just one value; you would be forced to store a value for each record key that had duplicates. This could run into thousands of separate counters being maintained per clustered key, so it just doesn’t make sense for it to be stored – it would be a very noticeable overhead.

When checking DBCC IND for the empty table, it insisted it still had a data page, but the only content of the data page was a single ghost record – a row that has been marked as deleted. The ghost record was for the ‘John Smith’ with the highest uniquifier from before – was this a coincidence? The other ghost records had not hung around, so why did this one persist?

I dropped and recreated the table, inserted 10 rows and then deleted them. Checking DBCC IND, the table still showed a HoBT IAM allocation page and a data page, and the data page contained a single ghost record – the one with a uniquifier of 9, the highest given out when the 10 duplicates were added. Even after waiting a considerable time the ghost record was not cleaned up, so it appears it will not be deleted.

If I added another duplicate row, it picked up the next number in the sequence (10) and shortly after the ghost record was removed from the page. Very convenient and not a coincidence at all – the memory of the last uniquifier given out  persists as a ghost record, even if all the duplicates for the table have been removed.  What seems strange is this ghost record hanging about, persisting an entire record, to keep the duplicate count for that key, when no instances of it remain on the table.

It cannot possibly do this for every key, since the space overhead would become very noticeable again, so how does it choose what to persist – the last entry? Unfortunately it doesn’t appear to be that simple at all; after a number of tests it appeared to only be interested in keeping the ghost entry for the record with the highest key value – alphabetically, the one closest to ‘Z’ in my example.

Conclusion? On the evidence, whilst the other ghost records persist for a short time, deleting and then adding more duplicates can see the numbering continue from where it left off, but once the ghost records have been removed the uniquifier will restart the sequence back at NULL, 1, 2 etc. The exception is the highest entry from the index’s perspective: that ghost record sticks around until there is another entry using the same key, continuing the sequence, at which point the ghost record finally disappears.

I can not think of any sensible reason why it would do this, can you?

Overall, the uniquifier is the cost of not having a unique index, and at the same 4 bytes an int identity column makes a lot of sense – it acts and serves the same purpose but in a far more visible manner – so it really does not make much sense to rely on the uniquifier provided for you; take control and create your own.
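In other words, something along these lines (a sketch reusing the test table from above, with the rowid name being arbitrary) gives you the same effect as the uniquifier, but with a column you can see and use:

CREATE TABLE dbo.unique_test2
(
 firstname char(20) NOT NULL,
 surname char(20) NOT NULL,
 rowid int IDENTITY(1,1) NOT NULL   -- visible replacement for the uniquifier
) ON [PRIMARY]
GO
-- unique clustered index: same firstname ordering, rowid makes every key unique
CREATE UNIQUE CLUSTERED INDEX [ix_test2] ON [dbo].[unique_test2] ([firstname] ASC, [rowid] ASC) ON [PRIMARY]
GO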

Guidance on How to Layout a Partitioned Table across Filegroups

October 15, 2009

SQL Server’s table partitioning was one of my favourite features of the 2005 release. It really brought SQL into the mainstream when it came to holding very large data quantities, and allowed us to talk with confidence about large tables containing ‘billions’ of rows rather than being limited to ‘millions’. From extensive use, and a couple of SQL Labs specifically around the partitioning feature, there are some rules and best practices I use to try to maximise the benefit and flexibility of a partitioned table, without causing unnecessary drops in performance or increases in disk space requirements.

Many examples of table partitioning focus heavily on date range based partition functions and schemes, with a common layout mechanism of striping the weeks / months across a number of filegroups. The filegroups are re-used, and in the example picture you can see 2 years’ worth of data striped across 12 filegroups.

[Image: InitialLayout]

This is pretty common and has an alluring charm of simplicity, but it is going to hurt when you start rolling the window. The underlying problem is that there is no gap.

For a good number of systems it would be unacceptable to remove a month of data unless the new month had already been inserted successfully. So as the system starts to roll a new month of data in, and is required to use the same filegroups, the layout transforms into the following:

[Image: AddedMonth]

The filegroup has to expand by 50% to accommodate the new data before the old data can be removed – and once the old data is removed, the filegroups look like this:

[Image: AfterRemoval]

So that 50% of space is now wasted unless you use a shrink, which is probably the worst thing you can do to your filegroup and data files at that point in time – shrink can fragment the data to the extreme and is to be avoided at all costs. Which means you will have to maintain a 50% space penalty for the layout on every filegroup. That might not sound a lot, but on a large database in an enterprise with mirrored SANs, that additional 50% is going to cost a substantial amount.

There are also other issues: SQL allows you to back up at a filegroup level, and since the bulk of the data is historic and will not alter, you are forced to re-back up historic data (Jan 08) when you back up the recently inserted Jan 09 data. So there is an impact on backup space, backup times and restore times.

The simplicity of the initial layout makes it seem like a good idea, but the side-effects are not pleasant. You can alter the layout and choose to have 6 filegroups, each storing 4 months of data, and then the expansion is only from 4 months to 5, so a 25% overhead. That is better, but still a cost overhead. The ultimate extreme is to place it all in one filegroup, but there are a number of difficulties and contention points with that.

A better approach is to use a different filegroup per month, but also create an additional spare filegroup so that no filegroup is forced to expand, as shown:

[Image: BetterLayout]

The difference here is that we have one free filegroup that is not a part of the current partition scheme / function definition, but will be allocated as the ‘Next Used’ filegroup for the partition scheme, so that when we split the partition function that filegroup is brought into the scheme and used. The old data that is going to be removed, will provide an empty filegroup that will be used for the next month’s import. In essence the head of the partition function is chasing the tail and will never catch up.

The expansion of size for this is 1/n where n is the number of partitions to be stored, so for our 24 month example 1/24th expansion – a considerable saving. Even better is that you can choose to utilise file group backups for the older static months of data.

This though is not perfect, for a couple of reasons – primarily data loading and indexing. To make the ‘scene’ better there should be a further filegroup dedicated to the data staging / loading that brings the data into the partitioned table.

[Image: WithStaging]

The purpose of this is twofold:

  • The disks used for loading are separate from the disks used for the actual data, so that we can maintain quality of service on the main table.
  • The data wants to be loaded as a heap for speed, but is likely to need sorting prior to insertion; an in-place sort would expand one of the main filegroups, which is to be avoided.

By loading into a staging filegroup, you give yourself the opportunity to use a clustered index creation to sort the data and move it to the appropriate filegroup, prior to it being constrained and switched in to the main partitioned table. If you had loaded the data into FG25 and then tried to apply a clustered index, it would have doubled the size of the filegroup again, as it needs to write all the new rows out and commit them before the heap can be deleted. That would put you back at square one, wasting 50% of the disk space.

The staging filegroup does cost us another file group worth of space, so the 24 initial filegroups has grown to 26, which is still a smaller expansion than the potential 50%.

So some simple guidelines are (a minimal sketch follows the list):

  • For a partition function that is going to have N partitions, create N+1 Filegroups. 
  • ETL / Stage your data into a dedicated staging file group
  • Move your data using a clustered index creation to the ‘spare’ filegroup you always have.
  • Switch new data in, then switch old out creating the new ‘spare’
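
A minimal sketch of that layout – the names, dates and the assumption that filegroups FG1 to FG5 and ETL_FG already exist in the database are all illustrative. Three boundary values on a RANGE RIGHT function give four partitions, so the scheme maps four filegroups and FG5 is kept back as the spare, with ETL_FG reserved for staging:

CREATE PARTITION FUNCTION pf_MonthlySales (datetime)
AS RANGE RIGHT FOR VALUES ('20091001', '20091101', '20091201')

CREATE PARTITION SCHEME ps_MonthlySales
AS PARTITION pf_MonthlySales
TO ([FG1], [FG2], [FG3], [FG4])

-- FG5 stays outside the scheme until it is needed for the next split:
-- ALTER PARTITION SCHEME ps_MonthlySales NEXT USED [FG5]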

It is not entirely flawless, however – there is a problem with such a design if the quantity of data per partition varies by a large amount; you then have to provision each filegroup with enough space to cope with the fluctuations, which in itself can result in wasted space and starts chipping away at the gains made.

It works best as a technique on datasets that remain relatively consistent in terms of the number of rows per partition and as the number of partitions goes up, the savings increase.

Finding Next Identity Value, and a Wild Goose Chase.

October 9, 2009

A question asked on Stack Overflow was how to find the next identity value that will occur on a table, without being required to add a record to work it out. The problem is that if the highest row is deleted the number is not reused, so any answer based on the existing rows can be incorrect.

Logically the value must be stored somewhere, so the first place I checked was the DMVs. They store the currently used value, but not the next value.

The wild goose chase started there…

  • I used the dedicated admin console to pull all the system tables, expecting it to be in sys.syshobtcolumns, no joy. Dumped the whole system table contents before and after an insert looking for the difference and didn’t spot it.
  • Took a dump of every page in the file before and after inserting a new row and having the textural dumps compared in a text comparison application, still no joy.
  • Started dumping out log records using the following script and pulled the identity calls:
select *
from ::fn_dblog(null, null)
where operation = 'LOP_IDENT_NEWVAL'
  • After a couple of hours running around the pages trying to find it, I realised I should have stuck with the DMV – I really went the wrong way around it.

The DMV has the right answer it seems, but as two fields that you have to combine to get it.

create table foo (MyID int identity not null, MyField char(10))
insert into foo values ('test')
go 10

-- Inserted 10 rows
select Convert(varchar(8),increment_value) as IncrementValue,
   Convert(varchar(8),last_value) as LastValue
from sys.identity_columns where name ='myid'

-- insert another row
insert into foo values ('test')

-- check the values again
select Convert(varchar(8),increment_value) as IncrementValue,
   Convert(varchar(8),last_value) as LastValue
from sys.identity_columns where name ='myid'

-- delete the rows
delete from foo

-- check the DMV again
select Convert(varchar(8),increment_value) as IncrementValue,
   Convert(varchar(8),last_value) as LastValue
from sys.identity_columns where name ='myid'

-- value is currently 11 and increment is 1, so the next insert gets 12
insert into foo values ('test')
select * from foo

Result:
MyID        MyField
----------- ----------
12          test      

(1 row(s) affected)

So adding the increment to the last value will predict the next value correctly, assuming someone else does not grab it in the meantime – which is why it is not a good idea to use in code – but if you need to investigate a table and want to know what it thinks is next, without actually inserting a row and affecting the table, then it is useful.

All that and the easy way is then:

select ident_current('foo') + ident_incr('foo')

Ah well, was fun investigating it – but what a wild goose chase to find it was an easy answer.