Sunday, March 4, 2012

MS SQL Timezone Function Performance Testing

In some previous posts I provided different methods for approaching the problem of having a timezone conversion function in MS SQL.  I looked at an all-SQL solution, a C# CLR function that looked up parms from a table, a C# CLR function that looked up parms from an XML file, and a C# CLR function that had the parm info hard-coded into the DLL.  Since all four methodologies have their own overhead, I wanted to do some stress testing to see which approach is most efficient.  I came across a very useful tool for performing the stress tests on each of these functions.  The tool is called SQLQueryStress and was created by Adam Machanic.  You can download the tool here.  This tool allows you to specify a query along with the number of iterations and threads you want to test with.
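To give a feel for the all-SQL style of solution, here is a minimal sketch of what such a function might look like.  The function and table names (dbo.fn_ConvertToLocal, dbo.TimezoneOffsets) are hypothetical stand-ins, not the ones from the earlier posts, and it ignores daylight saving rules for brevity:

CREATE FUNCTION dbo.fn_ConvertToLocal (@utc DATETIME, @tzId INT)
RETURNS DATETIME
AS
BEGIN
    DECLARE @offset INT;

    -- Look up the UTC offset (in minutes) for the requested timezone
    -- from a hypothetical one-row-per-timezone lookup table.
    SELECT @offset = OffsetMinutes
    FROM dbo.TimezoneOffsets
    WHERE TimezoneId = @tzId;

    RETURN DATEADD(MINUTE, @offset, @utc);
END

The CLR versions do the equivalent lookup in C#, pulling the offset from a table, an XML file, or constants compiled into the DLL.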

In order to do a true analysis I first created a table containing 2,000,000 records of date information.  In the SQL statements I used the “TOP” qualifier to control the number of records returned in each batch so that I could see how performance degraded as the sample size grew.  The sample sizes I used were 1, 100, 1000 and 10000 (I had tried 1000000, but for some of the functions performance was so slow that it took a LONG time to return the results, so I focused on the smaller sample sizes).  The thread/iteration combinations I used for the tests were 1 thread x 100 iterations, 10 threads x 100 iterations, and 200 threads x 100 iterations.
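If you want to reproduce this kind of setup, something along these lines will build a comparable test table and test query.  The names and the date range are illustrative assumptions, not the originals:

-- Build a 2,000,000-row table of sample dates by cross joining a
-- couple of system views to generate enough rows.
CREATE TABLE dbo.DateSamples
(
    Id INT IDENTITY(1,1) PRIMARY KEY,
    SampleDate DATETIME NOT NULL
);

INSERT INTO dbo.DateSamples (SampleDate)
SELECT TOP (2000000)
    DATEADD(MINUTE, ABS(CHECKSUM(NEWID()) % 5000000), '1990-01-01')
FROM sys.all_objects AS a
CROSS JOIN sys.all_objects AS b;

-- The statement pasted into SQLQueryStress; change TOP (n) to
-- control the sample size for each batch.
SELECT TOP (1000) dbo.fn_ConvertToLocal(SampleDate, 1) AS LocalDate
FROM dbo.DateSamples;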

I tried not to have any assumptions going into the testing, though to be honest I could not help it.  I assumed that the all-SQL solution would not perform as efficiently as the CLR versions since the CLR versions are compiled, and I thought that the solution with the parms hard-coded into the DLL would perform the best.  The stress testing sure opened my eyes, since it completely destroyed some of those assumptions.

NOTE: The platform I used to perform these tests is a Sony Vaio laptop (VGN-NS225J) running MS SQL 2008 Express.  You will get very different results on a real server with a full version of MS SQL, and results will also differ depending on whether you are using a physical or a virtual server.  I would assume, however, that the relative distribution of the results will be similar to what I found no matter what platform you are using.

The first round of tests used 1 thread and 100 iterations (the values are the average elapsed time per iteration, in seconds).

sample size    baseline    SQL       API       CLR       XML
100            0.0123      0.081     0.1394    0.4195    0.8973
1000           0.0144      1.1343    1.2519    3.9662    8.3642
10000          0.0542      8.736     11.744    41.9087   82.6566

As you can see, the baseline (no functions used) was very fast.  I was really surprised that the all-SQL function was the fastest, and I certainly did not anticipate that it would be so much faster than the others.

The next set of tests was a little more interesting and may showcase the reason I believe the SQL approach is the fastest.  This set of tests used 10 threads and 100 iterations.

sample size    baseline    SQL       API       CLR       XML
100            0.0055      0.1023    0.7798    0.5397    2.1959
1000           0.0064      1.2121    10.053    5.7433    18.4518
10000          0.0259      13.759    71.777    66.4025   211.254

As you would expect, the baseline is still fast enough that the user does not even notice a delay.  The all-SQL approach is still the fastest and is only a little slower than in the previous set of tests, while the other approaches were drastically affected.  I believe the SQL approach holds up across all the tests because of SQL Server caching: if you are not familiar with this process, MS SQL keeps data pages and execution plans in memory for performance reasons in case the same statement is requested multiple times, and serving a request from memory is much faster than any of the other approaches.  This also shows that there is a bit of overhead added to CLR functions that slows those processes down.  So even though that code is compiled, it seems the CLR is not as fast as native T-SQL.
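If you want to check how much caching is helping, you can clear the caches between runs and compare the cold numbers against the warm ones.  These are standard SQL Server commands, but only run them on a test box, never in production:

CHECKPOINT;               -- flush dirty pages to disk first
DBCC DROPCLEANBUFFERS;    -- clear cached data pages
DBCC FREEPROCCACHE;       -- clear cached execution plans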

Out of curiosity I did one additional test to see what the impact would be for a large number of iterations and threads, to simulate high usage.  This test focused on user load as opposed to data load.  This is the test where I used “TOP 1” to simulate the conversion of a single record by a large number of people within a short period of time.  The largest iteration count the tool would allow was 100 and the largest number of threads it allowed was 200 (which results in 20,000 samples in a short period of time).
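Using the illustrative names from the sketch above, the statement for this scenario is just the single-record version of the earlier test query, run at 100 iterations across 200 threads in SQLQueryStress:

-- Simulates many users each converting one record at a time.
SELECT TOP (1) dbo.fn_ConvertToLocal(SampleDate, 1) AS LocalDate
FROM dbo.DateSamples;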

sample size    baseline    SQL       API       CLR       XML
1              0.0284      0.0413    0.3885    0.2342    0.4595

As you can see, the all-SQL function performed the best in this scenario as well.  In fact the delay would be nearly undetectable by the user (on average).  All of the approaches returned in under a second, so if this is the scenario you are looking at then any of them would be fine, since the difference would not be noticeable to the user.

So the final analysis is this: if you need a function that will process a large number of records per query (usually for reporting or data analysis), then you want to use the all-SQL approach.  If you are looking to implement a solution where the function will only be used on a small number of records at a time, then any of the approaches would be fine, depending on your comfort level with each of the different architectures and the administration for them.  The SQL function is still the fastest, but that may not matter since the response times of the others are also not noticeable.

With that being said, you may get different results by modifying the architecture of the code.  You should stress test whatever solution you decide upon against the specific scenario in which it is likely to be used.
