Archive for October 13, 2009

What is the SQL Server 2008 DateTimeOffset Internal Structure?

October 13, 2009 Leave a comment

After decoding the DateTime2 internal structure I thought I would take a quick look at the DateTimeOffset structure, since it should not present too many difficulties and post it up quickly, but was surprised at the initial result of a test. It follows the same basic premise that the time portion is followed by the date portion and the time / date is based on a day count from the epoch time of 0001/01/01 and the time the number of intervals since midnight, where the interval is defined by the accuracy.

I was expecting the datetime offset value itself to occupy the additional 2 bytes quoted, and not affect the other values. Bad assumption, as soon as I cracked open a few examples I could immediately see that setting the offset also alters the underlying time / date component values as well.

Using the same comparison methods as before the underlying time value is clearly being adjusted:

'0001/01/01 00:00:00 -0:00' => 0x0700000000000000000000
'0001/01/01 00:00:00 -12:00' => 0x0700E034956400000030FD

So, even though an offset is being stored, the underlying time is also being altered to match, and the internal storage is using UTC as the reference point. This makes sense as the most valid reference that you could use.

'0001-01-01 12:00:00.0000000 +00:00' => 0x0700E03495640000000000
'0001-01-01 00:00:00.0000000 -12:00' => 0x0700E034956400000030FD

The same time / date is generated for the two values, but the last two bytes hold the offset and have stored the offset used when the date was initially stored. The underlying storage of the time though is clearly identical in both,  so UTC is the common ground they each get stored again.

The final 2 bytes for the offset are pretty easy to decode, since the pattern is the same as before with a slight twist. The offset time records the number of minutes for the offset in hex, with the first byte of the two being the least significant as before with the time, so you end up reading the two bytes left-to-right and then decode that byte right-to-left.

The twist is that for positive offsets, the value increments 1,2,3, in hex as appropriate, but for negative values, it starts decrementing by considering -1 equal to ‘FFFF’, I’ve split the hex output into the individual components by adding some spaces to make it easier to read. (Accuracy Code, Time Value, Date Value, Offset Used)

'2001-01-01 12:00:00.0000000 +00:01' => 0x07   009A717164   75250B  0100
'2001-01-01 12:00:00.0000000 +00:00' => 0x07   00E0349564   75250B  0000
'2001-01-01 12:00:00.0000000 -00:01' => 0x07   0026F8B864   75250B  FFFF

Since the offsets supported at only +14 hours to -14 hours, there is no risk of the two ranges overlapping. When I think about this a bit more, it is acting as a signed number, -1 being 11111111111 etc. So the 2 bytes at the end is a signed int of the number of minutes offset.

There are a number of time zones in the world that do not occur at exact hourly intervals from UTC, some are on the half hour mark such as Caracas (-4:30) or Delhi (+5:30) to name a few, whilst Kathmandu (+5:45) is even more specific.  In theory the format allows offsets specified to even greater levels of distinction, although I am not sure as to why you would wish to use it. Do you really want an offset of +3:17 minutes? That is potentially scary to consider that as a valid input to the value.

That has made me wonder as to why the accuracy was set so high, when in reality 15 minute intervals would of been sufficient, with a -14 to +14 range, that is 113 different values inclusively, which could be accomodated within a single byte.

So why spend 2 bytes on the format, when 1 was enough? Was it to just be compatible to the ISO format in some way that required it? Not sure.