Archive

Posts Tagged ‘Table Partitioning’

Guidance on How to Layout a Partitioned Table across Filegroups

October 15, 2009 4 comments

SQL Server’s Table Partitioning was one of my favourite features within the 2005 release. It really brought SQL into the mainstream when it came to holding very large data quantities and allowed us to talk with confidence about large tables containing ‘Billions’ of rows and not be limited to the ‘Millions’. From extensive use and a couple of SQL Labs specifically surrounding the partitioning feature there are some rules and best practises I use to try maximise the benefit / flexibility of the partitioned table, without causing uneccesary drops in performance or increases in the disk space requirements.

Many examples of table partitions focus heavily on using date range based partition functions and schema, with a common layout mechanism of stripping the weeks / months across a number of file groups. The file groups are re-used and in the example picture, you can see 2 years worth of data stripped across 12 filegroups. InitialLayout

This is pretty common and has an alluring charm of simplicity, but it is going to hurt when you start rolling the window. The underlying problem is that there is no gap.

For a good number of systems it would be unacceptable to remove a month of data unless you had already inserted the new month sucessfully. So the system starts to roll a new month of data in, and is required to use the same filegroups, the layout transforms into the following:

AddedMonth

The file group has to expand by 50% to accomodate the new data, before the old data can be removed – and once the old data is removed the filegroups now look like:

AfterRemoval

So the 50% of space is now wasted unless you use a shrink, which is probably the worst thing you can do to your filegroup and data files at that point in time. Shrink can fragment the data to the extreme and is to be avoided at all costs. Which means you will have to maintain a 50% space penalty for the layout on every filegroup. That might not sound a lot, but on a large database in an enterprise with mirrored SAN’s,that additional 50% is going to cost a substancial amount.

There are also other issues, SQL allows you to backup at a filegroup level and since the bulk of the data is historic and will not alter, you are forced to re-back up historic data (Jan 08) when you backup the recently inserted Jan 09 data. So there is an impact on backup space, backup times and restore times.

The simplicity of the initial layout makes it seem like a good idea, but the side-effects are not pleasant. You can alter the layout and choose to have 6 Filegroups, each storing 4 months of data, and then the expansion is only from 4 to 5, so a 25% overhead. It is better, but still is a cost overhead, The ultimate extreme is to then place it all in one filegroup, but there are a number of difficulties and contention points with that.

A better approach is to use a different file group per month, but then also create an additional spare filegroup, so that no filegroup is forced to expand, as shown:BetterLayout

The difference here is that we have one free filegroup that is not a part of the current partition scheme / function definition, but will be allocated as the ‘Next Used’ filegroup for the partition scheme, so that when we split the partition function that filegroup is brought into the scheme and used. The old data that is going to be removed, will provide an empty filegroup that will be used for the next month’s import. In essence the head of the partition function is chasing the tail and will never catch up.

The expansion of size for this is 1/n where n is the number of partitions to be stored, so for our 24 month example 1/24th expansion – a considerable saving. Even better is that you can choose to utilise file group backups for the older static months of data.

This though is not perfect for a couple of reasons, primarily data loading and indexing. To make the ‘scene’ better there should be a further filegroup dedicated to the data staging / loading that is going to occur to bring the data into the partitioned table.WithStaging

The purpose of this is twofold:

  • The disks used for loading are seperate from the disks used for the actual data so that we can maintain Quality of Service on the main table.
  • The data wants to be loaded as a heap for speed, but is likely to be sorted prior to insertion, an in place sort would expand one of the main file groups which is to be avoided.

By loading into a staging filegroup, you give yourself the opportunity to then use a clustered index to sort the data and move it to the appropriate file group, prior to being constrained and switched in to the main partitioned table. If you had loaded the data into FG25 and then tried to apply a clustered index, it would of doubled the size of the filegroup again as it needs to write all the new rows out and commit them, before the heap could be deleted. That would put you back at square one wasting 50% of the disk space.

The staging filegroup does cost us another file group worth of space, so the 24 initial filegroups has grown to 26, which is still a smaller expansion than the potential 50%.

So some simple guidlines are:

  • For a partition function that is going to have N partitions, create N+1 Filegroups. 
  • ETL / Stage your data into a dedicated staging file group
  • Move your data using a clustered index creation to the ‘spare’ filegroup you always have.
  • Switch new data in, then switch old out creating the new ‘spare’

It is not entirely flawless however – there is a problem in using such a design if the quantity of data per partition is going to vary by a large amount; you then have to provision each filegroup to have enough space to cope with the fluctations, which in itself can result in wasted space and starts chipping away at the gains made.

It works best as a technique on datasets that remain relatively consistent in terms of the number of rows per partition and as the number of partitions goes up, the savings increase.

How to Remember the Next Used Filegroup in a Partition Scheme

September 30, 2009 6 comments

Within SQL, the partitioned table feature provides an excellent way to store and roll forward data windows across large datasets without incurring huge loading and archiving penalties to the main table.

The process of loading, manipulating and decommissioning data from a partitioned table will need posts of their own – due to the depth of the information available and required as well as the detail needed to understand some of the best practises I have learnt from the field.

This entry is specifically relating to the ‘Next Used’ aspect of dealing with a partitioned table – for some time I wanted to know how could I tell what partition had been set as next used? There seems to be no record of the value – the BoL lists the Next Used facility as:

Specify the filegroup to be marked by the partition scheme as NEXT USED.

This is true in a simplistic sense, but the marking can not be at the filegroup level since the relation from partition schemes to filegroups can be many-to-many, so no single marking on a file group could suffice. I had never been able to figure out where to find that marking, and find a way to read what had been set.

I should add, that you really should not set the next used in one location of the code / stored proc and then perform the split of the partition function in another, it would be far safer to do them together, so the need to actually find it out is really borne out of investigative necessity in trying to understand for a given partition scheme that implements a rolling window that started to go bad, where does it think it should be splitting the partition to?

So the problem remained; it was given to Paul S. Randal  to figure out how we could see / infer this information outside of the dedicated admin console. He figured out the starting point in terms of which value in which DMV to start the solution with, and I ran with it from there to create a relatively easy way to get to it. So credit to Paul for finding out where to start on the issue.

So down to the scripts and some test cases:

There are 2 types of partition scheme to test:

  • Partition Function as Left.
  • Partition Function as Right.

When defining the scheme initially, you can also define it in two ways:

  • Partition scheme defined with the next used pre-set value.
  • Partition scheme defined with no pre-set next used value.

So there are 4 combinations to test initially, since we do not need data for this or are worried about the query plan, the main work is pure schema creation and checking.

First script is to just create a test database, nothing significant:

CREATE DATABASE[NextUsedTest] ON PRIMARY
( 
NAME = N'NextUsedTest', 
FILENAME = N'C:\Program Files\Microsoft SQL Server\MSSQL.1\MSSQL
\DATA\NextUsedTest.mdf' , 
SIZE = 3072KB , 
FILEGROWTH = 1024KB 
), 
FILEGROUP[PartFG1] ( 
NAME = N'PartFile1', 
FILENAME = N'C:\Program Files\Microsoft SQL Server\MSSQL.1\MSSQL
\DATA\PartFile1.ndf' , 
SIZE = 3072KB , 
FILEGROWTH = 1024KB ), 
FILEGROUP[PartFG2] ( 
NAME = N'PartFile2', 
FILENAME = N'C:\Program Files\Microsoft SQL Server\MSSQL.1\MSSQL
\DATA\PartFile2.ndf' , 
SIZE = 3072KB , 
FILEGROWTH = 1024KB 
), 
FILEGROUP[PartFG3] ( 
NAME = N'PartFile3', 
FILENAME = N'C:\Program Files\Microsoft SQL Server\MSSQL.1\MSSQL
\DATA\PartFile3.ndf' , 
SIZE = 3072KB , 
FILEGROWTH = 1024KB 
), 
FILEGROUP[PartFG4] ( NAME = N'PartFile4', 
FILENAME = N'C:\Program Files\Microsoft SQL Server\MSSQL.1\MSSQL
\DATA\PartFile4.ndf' , 
SIZE = 3072KB , 
FILEGROWTH = 1024KB 
), 
FILEGROUP[PartFG5] ( 
NAME = N'PartFile5', 
FILENAME = N'C:\Program Files\Microsoft SQL Server\MSSQL.1\MSSQL
\DATA\PartFile5.ndf' , 
SIZE = 3072KB , 
FILEGROWTH = 1024KB 
)
 LOG ON
( NAME = N'NextUsedTest_log', 
FILENAME = N'C:\Program Files\Microsoft SQL Server\MSSQL.1\MSSQL
\DATA\NextUsedTest_log.ldf', 
SIZE = 1024KB , 
FILEGROWTH = 10%
) GO

Now the database is in place, lets create the partition functions /schemes:

CREATE PARTITION FUNCTION [pfLeftNoSpare](int) 
AS RANGE LEFT FOR VALUES (200801, 200802, 200803, 200804)

CREATE PARTITION SCHEME [psLeftNoSpare] 
AS PARTITION[pfLeftNoSpare] 
TO([PartFG1], [PartFG2], [PartFG3], [PartFG4], [Primary])

CREATE PARTITION FUNCTION [pfLeftWithNextUsedSet](int) 
AS RANGE LEFT FOR VALUES (200801, 200802, 200803, 200804)

CREATE PARTITION SCHEME [psLeftWithNextUsedSet] 
AS PARTITION [pfLeftWithNextUsedSet] 
TO([PartFG1], [PartFG2], [PartFG3], [PartFG4], [Primary], [PartFG5])

CREATE PARTITION FUNCTION [pfRightNoSpare](int) 
AS RANGE RIGHT FOR VALUES (200801, 200802, 200803, 200804)

CREATE PARTITION SCHEME [psRightNoSpare] 
AS PARTITION[pfRightNoSpare] 
TO([Primary], [PartFG1], [PartFG2], [PartFG3], [PartFG4])

CREATE PARTITION FUNCTION [pfRightWithSpareNextUsedSet](int) 
AS RANGE RIGHT FOR VALUES (200801, 200802, 200803, 200804)

CREATE PARTITION SCHEME [psRightWithSpareNextUsedSet] 
AS PARTITION[pfRightWithSpareNextUsedSet] 
TO([Primary],[PartFG1], [PartFG2], [PartFG3], [PartFG4],  [PartFG5])
GO

Four partition functions and schemes, for the left schemes the next used file group appears on the far right when the next used it pre-set. When you use this mechanisms to preset, it issues a message for the operation:

Partition scheme 'psLeftWithNextUsedSet' has been created successfully. 
'PartFG5' is marked as the next used filegroup 
in partition scheme 'psLeftWithNextUsedSet'. 
Partition scheme 'psRightWithSpareNextUsedSet' has been created successfully. 
'PartFG5' is marked as the next used filegroup 
in partition scheme 'psRightWithSpareNextUsedSet'.

The trick to finding the next used is to look for the mismatching record – if you join the partition function values to the partition scheme there are scheme entries that do not have a corresponding partition function value, since it has not been set by performing the split.

Lets start with the Left based partition and compare the two.

 select FG.Nameas FileGroupName
    , dds.destination_id
    , dds.data_space_id
    , prv.value, ps.Name
 from sys.partition_schemes PS
 inner join sys.destination_data_spaces as DDS 
    on DDS.partition_scheme_id = PS.data_space_id
 inner join sys.filegroups as FG 
    on FG.data_space_id = DDS.data_space_ID 
 left join sys.partition_range_values as PRV 
    on PRV.Boundary_ID = DDS.destination_id and prv.function_id=ps.function_id 
 where PS.name = 'psLeftNoSpare'

The output is:

FileGroupName destination_id data_space_id value Name
PartFG1 1 2 200801 psLeftNoSpare
PartFG2 2 3 200802 psLeftNoSpare
PartFG3 3 4 200803 psLeftNoSpare
PartFG4 4 5 200804 psLeftNoSpare
PRIMARY 5 1 NULL psLeftNoSpare

And check the other Left defined partition:

 select FG.Name asFileGroupName
    , dds.destination_id
    , dds.data_space_id
    , prv.value, ps.Name  
from sys.partition_schemes PS
 inner join sys.destination_data_spaces as DDS 
    on DDS.partition_scheme_id = PS.data_space_id
 inner join sys.filegroups as FG 
    on FG.data_space_id = DDS.data_space_ID 
 left join sys.partition_range_values as PRV 
    on PRV.Boundary_ID = DDS.destination_id and prv.function_id=ps.function_id 
 where PS.name = 'psLeftWithNextUsedSet'

Results:

FileGroupName destination_id data_space_id value Name
PartFG1 1 2 200801 psLeftNoSpare
PartFG2 2 3 200802 psLeftNoSpare
PartFG3 3 4 200803 psLeftNoSpare
PartFG4 4 5 200804 psLeftNoSpare
PRIMARY 5 1 NULL psLeftNoSpare
PartFG5 6 6 NULL psLeftNoSpare

And the difference appears, the next used appears as the highest destination ID, but only when there are two data spaces assigned to the partition that do not have a value. The ‘primary’ entry that shows up as null is due to the partition scheme running from -ve infinity to +ve infinity, and whilst 4 lines are drawn on the number line, that divides the number line into 5 sections.

Running the same statements for the partitions declared using the right shows similar results, but the Primary entry is given destination 5 whilst the spare shows on 6 again.

The query is slightly awkward in that it must pick up the second entry of the list if it exists, using ‘orderings and tops’ will not pull the result we need.

select FileGroupName, Destination_ID, Data_Space_ID, Name  from
(
  select  FG.Name as FileGroupName
   , dds.destination_id
   , dds.data_space_id
   , prv.value
   , ps.Name
   , RANK() OVER (PARTITION BY ps.name order by dds.destination_Id) as dest_rank
  from sys.partition_schemes PS
  inner join sys.destination_data_spaces as DDS 
    on DDS.partition_scheme_id = PS.data_space_id
  inner join sys.filegroups as FG 
    on FG.data_space_id = DDS.data_space_ID 
  left join sys.partition_range_values as PRV 
    on PRV.Boundary_ID = DDS.destination_id and prv.function_id=ps.function_id 
  where prv.Value is null
 ) as a
 where dest_rank = 2

Results:

FileGroupName destination_id data_space_id Name
PartFG5 6 6 psLeftWithNextUsedSet
PartFG5 6 6 psRightWithSpareNextUsedSet

To test whether it picks up setting the next used, let’s set it on the partitions that did not previously have it.

ALTER PARTITION SCHEME psLeftNoSpare NEXT USED [PartFG5]
ALTER PARTITION SCHEME psRightNoSpare NEXT USED [PartFG5]

And re-run the query

FileGroupName destination_id data_space_id Name
PartFG5 6 6 psLeftNoSpare
PartFG5 6 6 psLeftWithNextUsedSet
PartFG5 6 6 psRightNoSpare
PartFG5 6 6 psRightWithSpareNextUsedSet

To make it re-usable, I switched the query into a view

create view NextUseFileGroups
as
select FileGroupName, Destination_ID, Data_Space_ID, Name
 from
 (
   select  FG.Name as FileGroupName
    , dds.destination_id
    , dds.data_space_id, prv.value, ps.Name,
   RANK() OVER (PARTITION BY ps.name order by dds.destination_Id) as dest_rank
   from sys.partition_schemes PS
   inner join sys.destination_data_spaces as DDS
     on DDS.partition_scheme_id = PS.data_space_id
   inner join sys.filegroups as FG
     on FG.data_space_id = DDS.data_space_ID 
   left join sys.partition_range_values as PRV
     on PRV.Boundary_ID = DDS.destination_id and prv.function_id=ps.function_id 
   where prv.Value is null
 ) as a
 where dest_rank = 2

And a final check with removing the setting – you can blank a set next used value by specifying no value in the statement.

ALTER PARTITION SCHEME psLeftNoSpare NEXT USED
ALTER PARTITION SCHEME psRightNoSpare NEXT USED

Select from the view and the two file partition schemes / file groups no longer show up in the list as intended.

So finding out the ‘next used’ setting is possible, although there really is no need in normal operation of the partition window to have to find out, but as an investigative tool it could be useful.

The scripts were tested on both 2005 and 2008, so are good for both in terms of testing, or using the view.