The PDC is just about over; the final sessions have finished and the place is emptying rapidly. The third day included a lot of good information about SQL Azure – the progress made to date as well as the overall direction – including a new announcement by David Robinson, Senior PM on the Azure team, of a project codenamed ‘Houston’.
During the sessions today the 10GB limit on a SQL Azure database was mentioned a number of times, but each mention was caveated with the suggestion that this is purely the limit right now and that it will be increased. To get around the limit you can partition your data across multiple SQL Azure databases, as long as your application logic understands which database to get the data from. There is no intrinsic way of creating a view across the databases, but it immediately made me wonder: if you were able to use the linked server feature of the full SQL Server, could you link to multiple Azure databases and create a partitioned view across them? Got to try that out when I get back to the office, although I do not expect it to work.
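As a sketch of what I have in mind – all server, database, table and column names here are hypothetical, and SQL Azure may simply refuse the linked server connection:

```sql
-- From an on-premise SQL Server, link two SQL Azure databases and
-- union them behind a view. Entirely untested against SQL Azure.
EXEC sp_addlinkedserver
    @server     = N'AzureShard1',
    @srvproduct = N'',
    @provider   = N'SQLNCLI',
    @datasrc    = N'myserver.database.windows.net',
    @catalog    = N'OrdersShard1';

EXEC sp_addlinkedserver
    @server     = N'AzureShard2',
    @srvproduct = N'',
    @provider   = N'SQLNCLI',
    @datasrc    = N'myserver.database.windows.net',
    @catalog    = N'OrdersShard2';
GO

-- A poor man's partitioned view over the two shards
CREATE VIEW dbo.AllOrders
AS
SELECT OrderId, OrderDate, Amount FROM AzureShard1.OrdersShard1.dbo.Orders
UNION ALL
SELECT OrderId, OrderDate, Amount FROM AzureShard2.OrdersShard2.dbo.Orders;
GO
```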
SQL Azure handles all of the resilience, backup, DR modes etc., and it all remains hidden from you – although when connected to a SQL Azure database you do see a ‘Master’ database present. It is not really a ‘Master’ in the sense we normally mean, and it quickly becomes apparent how limited that version of ‘Master’ really is – it exists purely to give you a place to create logins and databases. It could have been called something else to make that clearer, but one of the SQL Azure team said the name was kept for compatibility with 3rd party applications that expect a master to be present.
SQL Azure supports transactions as mentioned before, but given the current 10GB limit on a database you will be partitioning your data across databases. That will be a problem, because the system does not support distributed transactions, so any atomic work that is to be committed on multiple databases at once is going to have to be controlled manually / crufted in code – not ideal, and a limitation to be aware of.
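To illustrate the sort of hand-rolled code this limitation forces on you – a sketch only, with hypothetical procedure names; in practice the two calls would be made over two separate connections from the application, shown here together for brevity, and the compensation logic is the fragile part:

```sql
-- No distributed transactions across SQL Azure databases, so each
-- database commits independently and failures must be compensated by hand.
BEGIN TRY
    -- work against shard 1 (its own local transaction)
    EXEC Shard1.dbo.DebitAccount  @AccountId = 42, @Amount = 100;
    -- work against shard 2 (a second, unrelated local transaction)
    EXEC Shard2.dbo.CreditAccount @AccountId = 99, @Amount = 100;
END TRY
BEGIN CATCH
    -- shard 1 has already committed; undo it with a compensating action
    EXEC Shard1.dbo.CreditAccount @AccountId = 42, @Amount = 100;
END CATCH
```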
Equally, cross-database joins came up as an area with problems – they can be made, but it appears there are performance issues. I am interested to start running some more tests there and see whether you can mimic a partitioned view across databases using joins. The recommendation was to duplicate reference data between databases to avoid joins, so lookup tables would in effect appear in each database, removing the cross-database join.
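In other words, something like this in every database, with table and column names purely illustrative:

```sql
-- The same lookup table is created and populated in each SQL Azure
-- database, so the join never needs to leave the local database.
CREATE TABLE dbo.CountryLookup
(
    CountryId   int           NOT NULL PRIMARY KEY CLUSTERED,
    CountryName nvarchar(100) NOT NULL
);

-- The join is now database-local instead of cross-database:
SELECT o.OrderId, c.CountryName
FROM dbo.Orders AS o
INNER JOIN dbo.CountryLookup AS c
    ON c.CountryId = o.CountryId;
```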
On the futures list:
- The ability to have dynamic partition splits looked interesting; regular SQL Server does not have this facility within a partitioned table, so if Azure can do it across databases then it might appear on the SQL roadmap as a feature – although that could be wishful thinking.
- Better tooling for developers and administrators – that is a standard future roadmap entry.
- Ability to Merge database partitions.
- Ability to Split database partitions.
So SQL Azure has grown up considerably and continues to grow. In the hands-on labs today I got to have more of a play with it and start testing more of the subtle limitations and boundaries that are in place. Connecting to an Azure database via SQL Server Management Studio is trivial, and the object explorer contains a cut-down version of the normal object tree, but includes all the things you would expect such as tables, views and stored procedures.
Some limitations of the lack of a real master and real admin access become apparent pretty fast: no DMV support, no ability to check your current database size, and no ability to change a significant number of options – in fact, the bulk of the options are not even exposed.
Two of my personal favourites that I took an immediate look at: MaxDop and parameterisation.
- MaxDop is set at 1, although you cannot see it, and attempting to override it throws an error from the query window telling you that it is not permitted. Do not plan on parallel query execution – you will not get it.
- I attempted to test the query parameterisation using the date literal trick, and the query appeared to remain parameterised, as though the database is in ‘forced’ parameterisation mode (and so more likely to suffer parameter sniffing problems). I have not been able to prove it concretely as yet, but the early indication is that the setting is ‘forced’.
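The two probes looked roughly like this – table names are illustrative, and note that the DMV check at the end is what you would run on a full SQL Server; SQL Azure’s lack of DMVs is exactly why I could not prove the ‘forced’ setting:

```sql
-- Probe 1: the MAXDOP hint. Standard T-SQL, but SQL Azure rejects it
-- with an error rather than honouring it.
SELECT COUNT(*)
FROM dbo.SomeLargeTable
OPTION (MAXDOP 4);

-- Probe 2: the date-literal trick. Under 'simple' parameterisation a
-- query like this normally keeps its literal; under 'forced' the
-- cached plan shows @0-style parameters instead.
SELECT OrderId
FROM dbo.Orders
WHERE OrderDate = '20091101';

-- On full SQL Server you would confirm via the cached plan text:
SELECT st.text
FROM sys.dm_exec_cached_plans AS cp
CROSS APPLY sys.dm_exec_sql_text(cp.plan_handle) AS st
WHERE st.text LIKE '%Orders%';
```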
One other interesting point is that a table has to have a clustered index – it is not optional if you want to get data into the table. That said, it did not stop me from creating a table without a clustered index; I just did not get time to attempt to populate it and see the limit in action – a case of too much to do and so little time.
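What I had run, roughly – names illustrative, and the INSERT is the step I did not get to:

```sql
-- Creating the heap succeeds on SQL Azure...
CREATE TABLE dbo.HeapTest
(
    Id   int          NOT NULL,
    Name nvarchar(50) NOT NULL
);

-- ...the expectation is that this INSERT is what gets refused:
INSERT INTO dbo.HeapTest (Id, Name) VALUES (1, N'test');

-- Adding a clustered index first should make the table usable:
CREATE CLUSTERED INDEX CIX_HeapTest ON dbo.HeapTest (Id);
```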
In one of the final talks about SQL Azure, David Robinson announced a project codenamed ‘Houston’ (there will be so many ‘we have a problem’ jokes on that one), which is basically a Silverlight equivalent of SQL Server Management Studio. The thinking is that SQL Azure lives in the cloud, and if the only way to interact with it is by installing SSMS locally then it does not feel like a consistent story.
From the limited preview it only contains the basics, but it clearly let you create tables, stored procedures and views, edit them, and even add data to tables in a grid view reminiscent of Microsoft Access. The UI was based around the standard ribbon bar, with an object window on the left and a working pane on the right. It was lo-fi to say the least, but you could see conceptually where it could go – given enough time it could become a very good SSMS replacement, although I doubt it will be taken that far. There were Import and Export buttons on the ribbon with what looked to be Excel-like icons, but nothing was said or shown of them. Date-wise it is ‘targeting sometime in 2010’, so it has some way to go and is not even in beta as yet.
So that was PDC09, excellent event, roll on the next one!
MaxDop is in some senses a bit of a misnomer, in that you would think ‘Max Degree of Parallelism’, set by the system administrator, would be the last and final word on the matter: that is your maximum, and there are no choices to be made.
However, whilst at the SQL Immersion event in Dublin hosted by Prodata, I made an offhand comment about increasing the thread count beyond the MaxDop setting whilst creating an online index on an OLTP-based system that had MaxDop 1 set.
That gained me some quizzical looks, in that most assume MaxDop is set and has to be adhered to – surely what I was describing was not possible? Well, yes it is, and there is even a KB article that covers it and the differences between SQL Server 2000, 2005 and 2008: http://support.microsoft.com/default.aspx/kb/329204
I should also mention that BoL is less than precise about the situation; for SQL 2008 the BoL entry for the ‘max degree of parallelism’ option states:
Set max degree of parallelism to 1 to suppress parallel plan generation.
And against the ‘Degree of Parallelism’ page:
For example, you can use the MAXDOP option to control, by extending or reducing, the number of processors dedicated to an online index operation. In this way, you can balance the resources used by an index operation with those of the concurrent users.
So that’s entirely clear…
You can infer (aside from the KB) that something is not what it seems when it says ‘by extending or reducing’ – how can you extend it? That would not make logical sense if it were a hard limit.
So we have a slightly bizarre situation in which you can override the server-level setting with your own value. The initial thought is: this is a bit dangerous, isn’t it?
Would you want the average user or developer to start adding the appropriate query hint and overriding you? They would get more CPU, so they would not hesitate (although with no guarantee of better performance, since they would all be doing it).
To perform a test I used a large partitioned table, since the parallelism that occurs when querying a partitioned table is far easier to predict and to engineer specific scenarios for. The table had ~688 million rows in it, and the query asked for a simple row count from 2 of the available partitions. The server used was an 8-core machine.
Select Count(*) From LargePartitionedTable Where MyPartitionColumn in (200801, 200901)
The baseline case was setting the server to a MaxDop of 1:
sp_configure 'max degree of parallelism', 1
go
reconfigure
go
The query plan is not particularly complex; the key part is the clustered index scan of the requested partitions.
The comparison case of MaxDop 0 was then set up and the same query run.
sp_configure 'max degree of parallelism', 0
go
reconfigure
go
As expected, the query ran up 8 threads (an 8-core machine), but only 2 of the threads did any work, since the SQL 2005 threading model assigns 1 thread per partition when more than 1 partition is requested.
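One way to see this for yourself on SQL 2005 is to watch the task count from a second connection while the count is running (the session id is obviously whatever your test query’s spid happens to be):

```sql
-- Counts the tasks (worker threads) attached to the test session.
SELECT r.session_id,
       COUNT(t.task_address) AS task_count
FROM sys.dm_exec_requests AS r
INNER JOIN sys.dm_os_tasks AS t
    ON t.session_id = r.session_id
WHERE r.session_id = 55        -- replace with the session id of the test query
GROUP BY r.session_id;
```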
So the base scenarios are in place, now the query is altered slightly to override the MaxDop upwards.
sp_configure 'max degree of parallelism', 1
go
reconfigure
go

Select Count(*) From LargePartitionedTable Where MyPartitionColumn in (200801, 200901) Option (maxdop 4)
If the override works, the thread count will go up from 1 – and, as per the KB article, it does.
So the MaxDop set by an administrator does not have to be obeyed – but does this apply to everyone? That is really the key question, and unfortunately it does not fall the way you would wish. Placing the SQL Server into Mixed Authentication mode, I created a user account and gave it only the permission to select from the table and the Showplan permission – nothing else.
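For reference, the locked-down account was created along these lines (the login name and password are placeholders):

```sql
CREATE LOGIN MaxDopTest WITH PASSWORD = 'ChangeMe!123';
CREATE USER MaxDopTest FOR LOGIN MaxDopTest;

-- Only these two permissions, nothing else:
GRANT SELECT ON dbo.LargePartitionedTable TO MaxDopTest;
GRANT SHOWPLAN TO MaxDopTest;
```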
The result? The query went parallel again, indicating there is no special permission required for a user to override the setting.
The facility to override is useful – on an OLTP system where you want to increase indexing speed by using more threads it is essential – but not requiring it to be a granted permission for the service accounts / jobs performing those tasks seems slightly bizarre.
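For completeness, the scenario behind the Dublin comment looks like this – the server capped at MaxDop 1, yet the online rebuild told to use more threads (index and table names are illustrative):

```sql
-- Server-wide cap of 1, overridden upwards for the index operation only.
ALTER INDEX CIX_LargePartitionedTable
ON dbo.LargePartitionedTable
REBUILD WITH (ONLINE = ON, MAXDOP = 4);
```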
These tests were performed using SQL 2005, so I need to test against SQL 2008 to see whether the same lack of control exists.