It is often said that a primary key should be immutable, and this advice is echoed on a multitude of sites some of which strengthen it to a ‘law’ – but we know with databases that absolutes are rare and it is very difficult to be 100% prescriptive. There is then no mention of the clustering key being immutable alongside it, which strikes me as strange since it is just as important.
What happens within SQL Server to the row if you change the clustered key?
- If you change the clustered key value the row must be physically moved and can result in page splits / fragmentation.
- A change of the clustered key requires all the non-clustered indexes to be updated to reflect the clustering key choice.
And if you change the primary key?
- A change of the primary key, has to be reflected in each of the other tables that use the key as a linking mechanism.
- The primary key must still uniquely identify the row within the table.
Clearly different issues, but why does the primary key immutability get so much attention and not the clustering key? The default behaviour of SQL Server is that the primary key becomes the clustering key, so in essence all 4 points get applied, but you can choose a different primary key to the clustering key.
What sort of expense are we talking about if we allow the clustering key to not be immutable and started altering a row’s clustered key value? To get a better understanding of what is going on under the hood, I’ll construct an example and check the transaction log.
Creating a simple table in SQL and inserting a few rows to set up the test is pretty easy:
CREATE TABLE [dbo].[MyTable]( [MyID] [int] IDENTITY(1,1) NOT NULL, [FirstName] [varchar](20) COLLATE SQL_Latin1_General_CP1_CI_AS NOT NULL, [SecondName] [varchar](30) COLLATE SQL_Latin1_General_CP1_CI_AS NOT NULL, CONSTRAINT [PK_Table_1] PRIMARY KEY CLUSTERED ( [FirstName] ASC )WITH (IGNORE_DUP_KEY = OFF) ON [PRIMARY] ) ON [PRIMARY]
Insert a few rows and then issue a simple update statement.
update mytable set firstname = 'TestFirstName', Secondname = 'TestSecondName' where MyID = 2
Inspect the transaction log and it is noticable that the log does not contain a LOP_MODIFY_ROW or LOW_MODIFY_COLUMNS within it, but contains a LOP_DELETE_ROWS and a LOP_INSERT_ROWS. Instead of just modifying the data, SQL has removed the row and reinserted it. A few other items appear with the transaction, LOP_BEGIN_XACT and LOP_COMMIT_XACT as you would expect to start and commit the transaction. There is also a LOP_SET_BITS on the LCX_PFS which is not surprising to see either, since we have potentially affected the free space level of the page the data was inserted into.
That maps to exactly what we expect from a high level logical perspective – the row has to be moved and there is no LOP_MOVE_ROW operation. This results in the row being placed into the transaction log twice, as a before and after.
What happens if we perform an update that does not include the clustered key?
update mytable set Secondname = 'AnotherTestSecondName' where MyID = 3
This time the log only includes 2 entries, the LOP_BEGIN_XACT / LOP_COMMIT_XACT and a single LOP_MODIFY_ROW which is more as you would expect.
Size wise, the transaction log entry length for the first alteration was 96 + 148 + 56 + 160 + 52 = 512 bytes. For the second entry it was only 96 + 144 + 52 = 292. So the alteration used more log space and due to write ahead logging it must be committed to the disk, but the actual difference for a single row does not look too significant.
Well, whilst it does not look significant, you have to remember that the row being modified was very small. As previous examples have shown the LOP_DELETE_ROWS and LOP_INSERT_ROWS include the entire contents of the row being removed / added, so with a larger row the entire contents of the row would be added to the log twice, compared to the simple modification. That would start to get expensive.
So altering the clustering key is clearly expensive for the transaction log in comparison to a normal update, and this example did not have additional non-clustered indexes added to the table, which would also then require even more entries to deal with the removal and re-insertion of the non-clustered index values.
Given a choice I would make both immutable; the primary key shouldn’t be the only one to get special treatment and be designed to be immutable, make the clustering key immutable as well.
One of the long time annoyances about the Date functions in SQL is that a number of them do not function as developer would wish, the prime example I’m going to focus on is the DateDiff function. The way it operates is very contrary to what a developer might expect or want from it – the function counts the number of boundaries that crossed for the specific units selected, not the number of whole units between the dates, as an example:
declare @date1 datetime declare @date2 datetime set @date1 = '20100101 23:00:00' set @date2 = '20100102 01:00:00' select datediff(d, @date1, @date2)
And the result is 1, since the units selected was days, the boundary line is predictably at midnight, so even though the time span is only 2 hours, it would count as 1 day – that is not intuiative. Now this is all documented so we cannot complain or grumble. If you wanted to know whether a full day has passed, you used hourly units instead and made sure you had the logic to deal with this.
All of this leaves you with a pretty bad resolution however, you can get the hours difference, but minutes and seconds are not available – so you have to datediff on those units and do some maths. It really makes for a ham-fisted way of getting a duration.
So people work around the problem by converting the date to the numeric equivalent and manipulating that directly.
declare @date1 datetime set @date1 = '20100101 12:15:30' select convert(float,@date1) 40177.5107638889
The decimal part represents the percentage through the day, which is not really how the underlying binary storage stores it, it uses a number to represent the number of 1/300ths of a second since the day started.
This format was very forgiving though, if you wanted to add a day, instead of using DateAdd, you could just add 1 to the number, very convienient.
It does however make it easier to create a pseudo-timespan by deducting one date’s numeric representation from another dates, although the code is somewhat long-winded. As a side note, make sure you convert to float and not real – real has not go sufficient accuracy for this to work.
declare @date1 datetime declare @date2 datetime set @date1 = '20100101 12:00:00' set @date2 = '20100102 13:15:35' declare @result float set @result = convert(float,@date2) - convert(float,@date1) declare @DurationDays float declare @DurationTime float declare @DurationHours float declare @DurationMinutes float declare @DurationSeconds float set @DurationDays = floor(@result) set @DurationTime = (@result - floor(@result) ) set @DurationTime = @DurationTime * 86400 set @DurationHours = floor(@DurationTime / 3600) set @DurationTime = @DurationTime - @DurationHours * 3600 set @DurationMinutes = floor(@DurationTime/60) set @DurationTime = @DurationTime - @DurationMinutes * 60 set @DurationSeconds = @DurationTime select @DurationDays as Days, @DurationHours as Hours , @DurationMinutes as Minutes, @DurationSeconds as Seconds Days Hours Minutes Seconds ----------------- ------------------ ----------------- ----------------- 1 1 15 35.0000002188608
Bit of a hack and was it really any shorter or better? Debatable. Whilst it can get time span information out, when used within SQL 2008 using the new datetime2 types, the wheels fall off:
declare @date1 datetime2(7) set @date1 = '20100101 12:00:00' select convert(float,@date1) Msg 529, Level 16, State 2, Line 3 Explicit conversion from data type datetime2 to float is not allowed.
And that is where the problem comes in – the new datetime2 types will not allow the date to be converted to a number, and a number of these developer tricks no longer work.
Most if not all the tricks can be re-written to use multiple date functions with some mathematical logic – and it can be done without the code ever knowing if it had been embedded within a function / stored procedure. Where you would see a less transparent move to the datetime2 data types would be where developers had embedded some of the tricks directly into ad-hoc SQL, that will fail if the type is altered. In an ideal world, is that the code never contains these techniques of course, but we do not all live in that nirvana.
So on the one hand Datetime2 gives great accuracy and can reduce storage, but on the other hands, the tricks used in the past to deal with the inadequacies of the built-in Date functions no longer work.
What I would really like is a revamp of the Date functions and the introduction of a time span type – could be a long wait.
On the drive back from a relaxing week off I had decided to write about Query Parameterization – this is the process where the query optimizer works on ad-hoc queries and chooses to take literal values used within a query predicate, and turn it into a parameter on the fly.
This optimised query is then the one used to check the query cache, and not the original query string, so it can result in a far higher percentage of query plan cache hits than would otherwise occur on ad-hoc queries.
There are two modes to this feature, and neither of them is ‘off’, you automatically have ‘simple’ parameterization turned on, and can increase this to ‘forced’ mode for your database if you desire. The difference in the two modes is that the simple mode will only deal with relatively simple literal values within the predicates, and will not consider every literal a candidate for parameterization.
So, overall it sounds a good thing but there are some known problems to parameterization, or parameter sniffing as it is oft called – the query plan generated for a given value of the parameter is not necessarily the optimal plan for another. Consider a search on a personnel database where the EmployeeAge is between 95 and 99 – Assuming an index is in place, you can already expect the data quantities to be low to nill, so the index will not tip and an index seek is the expected behaviour.
If the next time the query is executed the value is from 20 to 60, the number of people matching that value would be very high, yet the query plan in cache has already chosen the method by which the query will be executed, and it is no longer the most optimal.
From a user’s perspective this will lead what appears to be random / non-deterministic behaviour – sometimes the query will be very fast and other times it will be slow. There will not necessarily be a pattern to the behaviour because the query might be removed from cache when it is under memory pressure, or for a wide variety of other reasons the plan might be invalidated, the simplest being a data statistics invalidation.
So with the knowledge that parameter sniffing can cause issues, when should you consider investing the time into testing this setting? The time and effort to test could be considerable so I have written how I would come to that decision.
The first thing I check is the CPU utilisation on the SQL Server, in theory there should be some kind of issue forcing this to be considered and one of the issues that can occur from insufficient parameterization is an increase in the CPU utilisation due to a high query compilation rate. There are a lot of other reasons that can cause a high CPU such as incorrect indexing leading to table scans etc, but for the purpose of this explanation, we can assume that has already been checked and was not enough.
The second performance counters I would check are:
SQLServer : SQL Statistics: Auto-Param Attempts/Sec SQLServer : SQL Statistics: Batch Request / Sec
I could also check the plan cache hit ratio, or the SQL Compilations / sec figures, but having the Auto-Params and Batch figure per sec allows you to do a rough calculation on the ratio of incoming batches compared to the number of queries that auto-paramerterization is being attempted on (successfully or not). The higher the ratio, then the more ad-hoc queries are already being affected by the ‘simple’ mode parameterization.
If I see a high ratio, the next check would be on the plan cache statistics, primarily:
SQL Server : Plan Cache : Cache Object Counts for SQL Plans SQL Server : Plan Cache : Cache Pages for SQL Plans
This is to try get a judge of how many plans are sitting in the cache / whether the cache is under memory pressure. If you have a high number of SQL plan objects / cached pages then you can calculate whether you are under memory pressure. You can also look at the buffer manager performance counter for the page life expectancy value, but this could be a bit misleading since data pages will also be affecting this.
At this point I would start pulling the query plan cache, and checking the query text for a sample of plans, to check whether there was a high number of literals remaining within the submitted query, since these are the ones that the simple parameterisation failed to convert to parameters. When pulling the query plan cache, sorting them in SQL text order, allows you to easily spot when a near identical queries except for unconverted literals have been submitted but resulted in a different query plan.
There is no prescriptive guidance on how many of these near identical queries it takes before you consider it, but clearly a single duplicate is not an issue, whilst thousands of them could be quite an issue and be resulting in a high number of unnecessary SQL plan compilations, which is increasing the CPU utilisation of the server.
As you can tell, this setting is certainly not one that should be changed without some considerable forethought and then thorough testing to ensure the situation you have got is actually worth the switch, and preferably some very good instrumented testing in a realistic test environment to ensure the benefits you are getting from the increased level of parameterisation, are not being lost by occurences of non-optimal plan cache hits.
SQL Server’s Table Partitioning was one of my favourite features within the 2005 release. It really brought SQL into the mainstream when it came to holding very large data quantities and allowed us to talk with confidence about large tables containing ‘Billions’ of rows and not be limited to the ‘Millions’. From extensive use and a couple of SQL Labs specifically surrounding the partitioning feature there are some rules and best practises I use to try maximise the benefit / flexibility of the partitioned table, without causing uneccesary drops in performance or increases in the disk space requirements.
Many examples of table partitions focus heavily on using date range based partition functions and schema, with a common layout mechanism of stripping the weeks / months across a number of file groups. The file groups are re-used and in the example picture, you can see 2 years worth of data stripped across 12 filegroups.
This is pretty common and has an alluring charm of simplicity, but it is going to hurt when you start rolling the window. The underlying problem is that there is no gap.
For a good number of systems it would be unacceptable to remove a month of data unless you had already inserted the new month sucessfully. So the system starts to roll a new month of data in, and is required to use the same filegroups, the layout transforms into the following:
The file group has to expand by 50% to accomodate the new data, before the old data can be removed – and once the old data is removed the filegroups now look like:
So the 50% of space is now wasted unless you use a shrink, which is probably the worst thing you can do to your filegroup and data files at that point in time. Shrink can fragment the data to the extreme and is to be avoided at all costs. Which means you will have to maintain a 50% space penalty for the layout on every filegroup. That might not sound a lot, but on a large database in an enterprise with mirrored SAN’s,that additional 50% is going to cost a substancial amount.
There are also other issues, SQL allows you to backup at a filegroup level and since the bulk of the data is historic and will not alter, you are forced to re-back up historic data (Jan 08) when you backup the recently inserted Jan 09 data. So there is an impact on backup space, backup times and restore times.
The simplicity of the initial layout makes it seem like a good idea, but the side-effects are not pleasant. You can alter the layout and choose to have 6 Filegroups, each storing 4 months of data, and then the expansion is only from 4 to 5, so a 25% overhead. It is better, but still is a cost overhead, The ultimate extreme is to then place it all in one filegroup, but there are a number of difficulties and contention points with that.
A better approach is to use a different file group per month, but then also create an additional spare filegroup, so that no filegroup is forced to expand, as shown:
The difference here is that we have one free filegroup that is not a part of the current partition scheme / function definition, but will be allocated as the ‘Next Used’ filegroup for the partition scheme, so that when we split the partition function that filegroup is brought into the scheme and used. The old data that is going to be removed, will provide an empty filegroup that will be used for the next month’s import. In essence the head of the partition function is chasing the tail and will never catch up.
The expansion of size for this is 1/n where n is the number of partitions to be stored, so for our 24 month example 1/24th expansion – a considerable saving. Even better is that you can choose to utilise file group backups for the older static months of data.
This though is not perfect for a couple of reasons, primarily data loading and indexing. To make the ‘scene’ better there should be a further filegroup dedicated to the data staging / loading that is going to occur to bring the data into the partitioned table.
The purpose of this is twofold:
- The disks used for loading are seperate from the disks used for the actual data so that we can maintain Quality of Service on the main table.
- The data wants to be loaded as a heap for speed, but is likely to be sorted prior to insertion, an in place sort would expand one of the main file groups which is to be avoided.
By loading into a staging filegroup, you give yourself the opportunity to then use a clustered index to sort the data and move it to the appropriate file group, prior to being constrained and switched in to the main partitioned table. If you had loaded the data into FG25 and then tried to apply a clustered index, it would of doubled the size of the filegroup again as it needs to write all the new rows out and commit them, before the heap could be deleted. That would put you back at square one wasting 50% of the disk space.
The staging filegroup does cost us another file group worth of space, so the 24 initial filegroups has grown to 26, which is still a smaller expansion than the potential 50%.
So some simple guidlines are:
- For a partition function that is going to have N partitions, create N+1 Filegroups.
- ETL / Stage your data into a dedicated staging file group
- Move your data using a clustered index creation to the ‘spare’ filegroup you always have.
- Switch new data in, then switch old out creating the new ‘spare’
It is not entirely flawless however – there is a problem in using such a design if the quantity of data per partition is going to vary by a large amount; you then have to provision each filegroup to have enough space to cope with the fluctations, which in itself can result in wasted space and starts chipping away at the gains made.
It works best as a technique on datasets that remain relatively consistent in terms of the number of rows per partition and as the number of partitions goes up, the savings increase.