Archive

Archive for September 12, 2010

Interval Partitioning in SQL Server 2008

September 12, 2010 2 comments

Another one of those features in Oracle that we do not natively have in SQL Server is interval partitioning, where you can automatically have partitions generated based on an interval automatically generated for you. The interval partitioning is yet again, another form of dynamic partitioning, so the thought was, could this be achieved within SQL Server?

My initial thought would be to use an instead of trigger, which would intercept the incoming values and take action on them if appropriate to extend the partition function. An initial look into the documentation suggests it will not be trivial – the BoL states:

"Additionally, the following Transact-SQL statements are not allowed inside the
body of a DML trigger when it is used against the table or view that is the
target of the triggering action. CREATE INDEX, ALTER INDEX, DROP INDEX, DBCC
REINDEX, ALTER PARTITION FUNCTION, DROP TABLE..."

The issue there being the alter partition function – any response to an incoming piece of data that is not properly covered by the existing partition mechanism will need to alter the partition information. Of course there are still ways around such restrictions but when experimenting, it seems the BoL is not correct to list the ‘ALTER PARTITION FUNCTION’ command in the restrictions.

There are a few caveats to the following snippets of code, I am not attempting to deal with the rolling window affect, or complex storage pattern of a partitioned table of which I have written about in other posts. This is purely designed to demonstrate that it could be achieved, not to provide a total solution to the problem. The partitions created on the fly will all go to the Primary filegroup as well etc.

So start with a very basic partition function and scheme:

CREATE PARTITION FUNCTION pfTest (datetime)
AS RANGE LEFT FOR VALUES ('20100104' , '20100111' , '20100118', '20100125')
CREATE PARTITION SCHEME psTest AS
PARTITION pfTest ALL TO ([PRIMARY])

And generate a table on the partition scheme:

CREATE TABLE IntervalTest (
MyID int identity(1,1) not null,   
MyField Varchar(200),
MyDate datetime
) ON psTest(MyDate)

The next step is an ‘Instead of’ trigger, which has to intercept the incoming data from the inserted table, and extend the partition function if required:

CREATE TRIGGER tr_IntervalTest ON IntervalTest INSTEAD OF INSERT
AS
 BEGIN     

  -- get the current maximum partition value
  DECLARE @max_part_dt datetime;   DECLARE @max_inserted_dt datetime;
  DECLARE @weeks_to_add int;

  SET @max_inserted_dt = (SELECT max(MyDate) FROM inserted);

  SET @max_part_dt = (
SELECT max(convert(datetime,value))
from sys.partition_functions f
  inner join sys.partition_range_values rv on f.function_id = rv.function_id
  where name = 'pfTest');      
IF (@max_inserted_dt > dateadd(D,7,@max_part_dt))
  BEGIN
   -- need to potentially add multiple partition splits, it depends on
   -- how many weeks in advance the new data is.
   -- get a whole number of the weeks to add to ensure that we cover 
   -- the new data
   SET @weeks_to_add = ceiling(datediff(D,@max_part_dt, @max_inserted_dt) / 7.0)

   -- loop around splitting the partition function with the new weekly values
   -- that need to be covered
   WHILE @weeks_to_add > 0
   BEGIN
    -- increase the maximum partition date by 7 days and split the function
    SET @max_part_dt = dateadd(D,7,@max_part_dt);

    ALTER PARTITION SCHEME psTest NEXT USED [Primary];

    ALTER PARTITION FUNCTION pfTest() SPLIT RANGE (@max_part_dt);
    SET @weeks_to_add = @weeks_to_add - 1
   END;
  END;

  -- finally insert the values
  INSERT INTO IntervalTest (MyField, MyDate)
  SELECT MyField, MyDate
  FROM inserted;  
END     

The code is pretty self-explanatory, but I would point out that it is only covering an insert, not an update – and this is not production code but an experiment to see if it could be done (contrary to the BoL). To ‘productionize’ this would require significant work on exception handling, performance tuning, handling multiple filegroup partitioning, the list goes on, but all of it achievable.

A little test to insert a couple of values set ahead of the partition

insert into IntervalTest (MyField, MyDate)
select 'b', '20100505'
union
select 'c', '20100606'

And a check of the partition function values now show 23 values in the partition function, instead of the original  4 as follows:

2010-01-04 00:00:00.000, 2010-01-11 00:00:00.000, 2010-01-18 00:00:00.000,
2010-01-25 00:00:00.000, 2010-02-01 00:00:00.000, 2010-02-08 00:00:00.000,
2010-02-15 00:00:00.000, 2010-02-22 00:00:00.000, 2010-03-01 00:00:00.000,
2010-03-08 00:00:00.000, 2010-03-15 00:00:00.000, 2010-03-22 00:00:00.000,
2010-03-29 00:00:00.000, 2010-04-05 00:00:00.000, 2010-04-12 00:00:00.000,
2010-04-19 00:00:00.000, 2010-04-26 00:00:00.000, 2010-05-03 00:00:00.000,
2010-05-10 00:00:00.000, 2010-05-17 00:00:00.000, 2010-05-24 00:00:00.000,
2010-05-31 00:00:00.000, 2010-06-07 00:00:00.000

It has clearly created the partitions to cover the data being inserted and then performed that insertion.

So it can be done, but the constant cost of intercepting every insertion and update to provide this kind of dynamic partition is really not ideal, whether it could be made sufficiently efficient to work at the sort of scale that partitioning tends to be used at is debatable. I have it feeling that it would struggle – I would need to be lent a suitably sized server and SAN to test that one and see whether it could be made efficient enough.