I have covered how to layout a partitioned table across filegroups previously, but have not gone through the steps of rolling a partitioned window – it sounds a simple process but with all the file group and pre-requisites for it to run smoothly anyone starting with partitioned tables could probably use a little guide. As you are about to see the process is quite intricate so I will go through the load process on this post and the unload on the next.
Because no one case fits all, I have made some assumptions / limitations to provide a guide, specifically:
- The main partitioned table has a clustered index.
- The layout is following the mechanism of keeping a staging filegroup and spare filegroup as detailed in the layout post.
- The rollout process intends to remove the oldest data / partition.
- The process is designed for large loads, not single inserts.
So let’ s see what it takes to prepare and get data into a partitioned table:
- Create a staging table on your dedicated ETL filegroup, of an identical column schema to your partitioned table.
- Load the data into the staging table.
- Move the staging table to the spare filegroup, using a clustered index creation. (The need for the spare was covered in the layout post)
- Add any additional Non-Clustered indexes required to match the partitioned table indexes.
- Constrain the data so that it is considered trusted – the constraint must ensure all values are within the partition boundary you intend to place it within.
- Set the Partition Schema Next Used Filegroup
- Split the Partition Function
- Switch the staging table into the main partitioned table
That was all just to bulk load data into a partitioned table – a long list and plenty of opportunity for it to go wrong, but most of these steps use T-SQL that you will be very familiar with - it is only the last 3 items that use less common SQL and are harder to automate, since there is no built-in tools to do the work for you.
Setting the Next Used Filegroup
The intention when setting the filegroup is to declare where the partition should locate data for the partition function split that is about to occur. Whilst you can discover what the previous setting might be, it is not advisable to rely on it but set it every time, just before performing a partition function split. The syntax for the command is:
ALTER PARTITION SCHEME YourPartitionSchemeName NEXT USED [YourSpareFG]
Splitting the Partition Function
This splitting of the partition function is in effect the creation of an extra dividing section on the number line / date line representing the partitioned table. If you split a partition that already has data the operation will be quite expensive since can be forced to move data between filegroups, so it is common in a rolling window scenario that you split to handle the incoming data, which is always in advance of your existing data, e.g. If you are storing sales data partitioned by the month/year of the sales date, and currently only hold up until November, you would not insert any data for December until the partition for December had been created.
The syntax forward:
ALTER PARTITION FUNCTION YourPartitionFunctionName() SPLIT RANGE (YourBoundaryValue)
But when importing new data in an automated fashion, you might not know whether the new partition split has already been performed or not, so how can you check whether the new boundary value is already created in the partition function? DMV’s can provide the answer:
SELECT count(value) as ValueExists FROM sys.partition_range_values INNER JOIN sys.PARTITION_FUNCTIONS ON sys.PARTITION_FUNCTIONS.function_id = sys.partition_range_values.function_id WHERE name = 'YourPartitionFunctionName' AND value = YourBoundaryValue
A returned value of 0 would indicate it did not exist, whilst a 1 would indicate a boundary value had already been created.
Switching the Staging Table In
Switching the staging table into the newly created partition looks relatively easy but needs the partition number:
ALTER TABLE yourStagingTable SWITCH TO YourPartitionedTable PARTITION PartitionNumber
Where do you get the partition number from? The partition number is the boundary ID, and is numbered starting at 1 from the furthers left partition sequentially upwards. If you know the boundary value you have set for the partition, you can get the boundary id using the DMV’s again
SELECT boundary_id FROM sys.partition_range_values INNER JOIN sys.partition_functions ON sys.partition_functions.function_id = sys.partition_range_values.function_id WHERE name = 'YourPartitionFunctionName' AND value= YourBoundaryValue
These additional DMVs allow you to get access to the data you need to automate the process in stored procedures, finding out the boundary IDs in one step, to be used in the next step etc.
These are the trickier parts of the process to automate that need the help of the DMVs. In the next post I will go through the unloading of the old data.