March 2012 : Rob's Tech Notes

Wednesday, March 21, 2012

Database Normalization

For those IT professionals that know me know that I have a passion for data and data architecture. I also do many other IT things (for example I have a small web development company). One of my things that drives my passion for data is that it is the basis that drives efficiencies in almost all areas of IT and business. I find it very frustrating when we pay a bunch of money for an application and install it and discover that the data model is so poorly designed that the application performs poorly or it is difficult to get good info out of it. For this reason I put a lot of stress on data models on ever IT project that I work on.

The processes where by a person would go through tuning a data model is called normalization and denormalization.

Normalization is the process where you organize data using different steps or levels.

First Normal Form (1NF) contains the following steps:

Organize data set(s) into data sets specific content (e.g. data relative to persons and then data related to address).
Identify each record in each data set with a unique ID (primary key).
Remove duplicate columns with the same information

Second Normal Form (2NF) contains the following steps:

Meets 1NF criteria.
Remove subsets of data that create multiple rows (e.g. an employee is in multiple divisions, create a table that links the employee to the division with foreign keys).
Create a a table and foreign key relationships for data that is duplicated across data sets (e.g. the account, contact and order data sets all have address fields. The address info should be put into a table and then a foreign key to that table should then be stored in the account, contact and order data sets).

Third Normal Form (3NF) contains the following steps:

Meets 2NF criteria.
Any data element that could be repeated in multiple records for a single field should be put into its own table and then referenced via foreign key (e.g. The university name in a list of national students should be put into a university data set and then a reference should be used in the national student list).

Boyce-Codd Normal Form (BCNF) contains the following steps:

Meets 3NF criteria.
Create separate tables for data that is dependent upon a value other than the primary key (e.g. a table with address information might have street address, city, state and zip. The city and state are dependent on the zip code which should then be the key to another table that contains the city and state info to reduce duplication of data).

There are other Normal forms though most processes do not typically go beyond 3NF.

Denormalization is the process of violating Normal Forms for the purpose of performance. For those not familiar with the concept of "Query Cost" there is a performance hit for each table joined together in a query. For this reason there are reasons to violate these forms to improve query performance. One common example is to include a full address in the address table instead of just having the street address and zip code and another table with the zip code to state and city associations.

I hope that others find this information useful.

Monday, March 19, 2012

Playing with brandify.com

Today I received an email from Microsoft regarding a new service that is in BETA testing called Brandify. For those with small businesses that are still trying to get a strong online presence started and are not quite sure where you stand this service can help to provide some insight. I am aware of other similar services that are at a cost (you can search online for them). For now this service is currently free (there is nothing indicating that this will change though you never know what will happen).

When you first start the start-up wizard the process will ask you for for accounts to log into LinkedIn, Facebook and add search parms for Google. Then it allows you to select relevant entities in those environments. Once everything is selected the process then analyzes those environments checking for other sites that are linked to your page, how many people have "liked" your page, how many friends you have and so forth.

Once the analysis is done it provides a nice dashboard with a "social score" that rates how well your social identity is setup for your business and provides tips on what you can work on to help make things better.

I am still starting out and I have done a lot of things right though I still have a lot of work to help make my on-line identity better.

It is a cool services and I highly recommend playing with it.

Sunday, March 4, 2012

MS SQL Timezone Function Performance Testing

In some previous posts I had provided different methods for approaching the problem of having a timezone conversion function in MS SQL. I looked at an all SQL solution, a C# CLR solution that looked up parms from a table, a C# CLR function that looked up parms from a XML file and a C# function that had the parm info hard coded into the dll. Since all four methodologies have their own over-head associated to them I wanted to do some stress testing to see which approach is most efficient. I came across a very useful tool in helping me to perform the stress tests on each of these functions. The tool is called SQLQueryStress and was created by Adam Mechanic. You can download the tool here. This tool allows you to have specify a query and put in the number of iterations and threads you want to test with.

In order to do a true analysis I first created a table that contained 2,000,000 records of date information. In the SQL statements I used the “TOP” qualifier to control the number of records to return for each batch so that I can see how the performance degraded over sample size. The sample sizes that I used were, 1, 100, 1000 and 10000 (I had tried using 1000000 though for some of the functions the performance was so slow that it took a LONG time to return the results so I focused on the smaller sample sizes). The iteration/threads samples that I used for the tests included 1/100, 10/100 and 100/200.

I tried not to have any assumptions prior to the testing though to be honest I could not help it. I had assumed that the all SQL solution would not perform as efficiently as the CLR versions since the CLR versions would be compiled. I had thought that the solution where the parms were hard coded into the dll would perform the best. The stress testing sure opened my eyes since it completely destroyed some of my assumptions going into this.

NOTE: The platform that I used to perform these tests is a Sony Vaio laptop (VGN-NS225J). The version of MS SQL that I am using is MS SQL 2008 Express. You will get very different results by using a server and full versions of MS SQL. Also depending upon whether you are using a physical server vs. virtual server you will get different results. I would assume that the distribution of the results will be similar to what I found no matter what platform you are using.

The first round of tests I started with were the 1 thread 100 iterations (the values are the average return per iteration).

sample size	baseline	SQL	API	CLR	XML
100	0.0123	0.081	0.1394	0.4195	0.8973
1000	0.0144	1.1343	1.2519	3.9662	8.3642
10000	0.0542	8.736	11.744	41.9087	82.6566

As you can see the baseline (no functions used) was very fast. I was really surprised that the all SQL functions were the fastest though I did not anticipate that it would be so much faster than the others.

The next set of tests were a little more interesting and may be show casing the reason why I believe the SQL approach is the fastest. This set of tests contained using 10 threads and 100 iterations.

sample size	baseline	SQL	API	CLR	XML
100	0.0055	0.1023	0.7798	0.5397	2.1959
1000	0.0064	1.2121	10.053	5.7433	18.4518
10000	0.0259	13.759	71.777	66.4025	211.254

As you would expect the baseline is still fast enough that the user does not even notice a delay. The all SQL approach is still the fastest and is only a little slower than the previous set of tests. The other approaches were drastically affected. I believe that the SQL approach is similar throughout all the tests because of SQL statement caching. if you are not familiar with this process MS SQL caches SQL returns for performance reasons in case the same statement is requested multiple times. Accessing from memory is much faster then any of the other approaches. This also shows that there is a bit of over head added to CLR functions that slows down those processes. So even though that code is compiled it seems that the CLR is not as fast as native T-SQL.

For curiosity I did one additional test to see what the impact would be for a large number of iterations and threads to simulate high usage. This test focused more on user load as oppose to data load. This test is where I used the “TOP 1” to simulate the conversion of a single record by a large number of people within a short period of time. The largest iteration set that the tool would allow me to use was 100 and the largest number of threads it allowed me to use was 200 (which would result in 20000 samples in a short period of time).

sample size	baseline	SQL	API	CLR	XML
1	0.0284	0.0413	0.3885	0.2342	0.4595

As you can see the all SQL function performed the best in this scenario. In fact the delay in return would be nearly undetectable by the user (by average). All of the processes were sub second in return so if this is the scenario you are looking at then any of the approaches would be fine since they would not be noticeable by the user.

So the final analysis is if you are in need of a function that will need to process for a large number of records for each query (usually for reporting or data analysis) then you want to use an all SQL approach. if you are looking to implement a solution where the function will only be used for a small number of records at a time then any of the approaches would be fine depending upon your comfort level of each of the different architectures and administration for them. The SQL function is still the fastest nut that may not be since the return for the others are also not noticeable.

With that being said, you may be able to get different results based upon modifying the architecture of the code. You should stress test what ever solution you decide upon for the specific scenario that is a likely use for that solution.

Creating the Timezone function as a C# CLR function with hard coded parms

In my effort to find the most efficient method for creating a timezone conversion function in MS SQL I have tried a pure T-SQL solution, CLR function that queries SQL tables, a CLR function that uses an XML file and this article will show a CLR function where the timezone parms are hard coded in the dll.

I approached this CLR function in the same manner as the other processes. I have the administrative class to manage the timezone parms for the functions.

     public class TimeZoneParm

    {

        private string timeZoneCd;

        private string timeZoneName;

        private int offSetHr;

        private int offSetMi;

        private int dstOffSetHr;

        private int dstOffSetMi;

        private string dstEffDt;

        private string dstEndDt;

        public TimeZoneParm()

        {

        }

        public string TimeZoneCd

        {

            get { return timeZoneCd; }

            set { timeZoneCd = value; }

        }

        public string TimeZoneName

        {

            get { return timeZoneName; }

            set { timeZoneName = value; }

        }

        public int OffSetHr

        {

            get { return offSetHr; }

            set { offSetHr = value; }

        }

        public int OffSetMi

        {

            get { return offSetMi; }

            set { offSetMi = value; }

        }

        public int DstOffSetHr

        {

            get { return dstOffSetHr; }

            set { dstOffSetHr = value; }

        }

        public int DstOffSetMi

        {

            get { return dstOffSetMi; }

            set { dstOffSetMi = value; }

        }

        public string DstEffDt

        {

            get { return dstEffDt; }

            set { dstEffDt = value; }

        }

        public string DstEndDt

        {

            get { return dstEndDt; }

            set { dstEndDt = value; }

        }

    }

One of the divergences with this process from the other processes is that I have another class that contains all the parm information in it. One of the bad elements of this process is that anytime you want to add or change the parms you would have to recompile and reregister the dll and functions on the SQL server. The class definition for the parm data is below.

    public static class TimeZoneData

    {

        public static List<TimeZoneParm> GetList()

        {

            List<TimeZoneParm> lsTZP = new List<TimeZoneParm>();

            TimeZoneParm tzp = new TimeZoneParm();

            tzp.TimeZoneCd = "AT";

            tzp.TimeZoneName = "Atlantic Time";

            tzp.OffSetHr =-4;

            tzp.OffSetMi = 0;

            tzp.DstOffSetHr = -3;

            tzp.DstOffSetMi = 0;

            tzp.DstEffDt = "03210200";

            tzp.DstEndDt = "11110100";

            lsTZP.Add(tzp);

            tzp = new TimeZoneParm();

            tzp.TimeZoneCd = "AZ";

            tzp.TimeZoneName = "Arizona Time";

            tzp.OffSetHr =-7;

            tzp.OffSetMi = 0;

            tzp.DstOffSetHr = -7;

            tzp.DstOffSetMi = 0;

            tzp.DstEffDt = "03210200";

            tzp.DstEndDt = "11110100";

            lsTZP.Add(tzp);

            tzp = new TimeZoneParm();

            tzp.TimeZoneCd = "CT";

            tzp.TimeZoneName = "Central Time";

            tzp.OffSetHr =-6;

            tzp.OffSetMi = 0;

            tzp.DstOffSetHr = -5;

            tzp.DstOffSetMi = 0;

            tzp.DstEffDt = "03210200";

            tzp.DstEndDt = "11110100";

            lsTZP.Add(tzp);

            tzp = new TimeZoneParm();

            tzp.TimeZoneCd = "ET";

            tzp.TimeZoneName = "Eastern Time";

            tzp.OffSetHr =-5;

            tzp.OffSetMi = 0;

            tzp.DstOffSetHr = -4;

            tzp.DstOffSetMi = 0;

            tzp.DstEffDt = "03210200";

            tzp.DstEndDt = "11110100";

            lsTZP.Add(tzp);

            tzp = new TimeZoneParm();

            tzp.TimeZoneCd = "HT";

            tzp.TimeZoneName = "Hawaii Time";

            tzp.OffSetHr =-10;

            tzp.OffSetMi = 0;

            tzp.DstOffSetHr = -10;

            tzp.DstOffSetMi = 0;

            tzp.DstEffDt = "03210200";

            tzp.DstEndDt = "11110100";

            lsTZP.Add(tzp);

            tzp = new TimeZoneParm();

            tzp.TimeZoneCd = "MT";

            tzp.TimeZoneName = "Mountain Time";

            tzp.OffSetHr =-7;

            tzp.OffSetMi = 0;

            tzp.DstOffSetHr = -6;

            tzp.DstOffSetMi = 0;

            tzp.DstEffDt = "03210200";

            tzp.DstEndDt = "11110100";

            lsTZP.Add(tzp);

            tzp = new TimeZoneParm();

            tzp.TimeZoneCd = "NT";

            tzp.TimeZoneName = "Newfoundland Time";

            tzp.OffSetHr =-4;

            tzp.OffSetMi = 30;

            tzp.DstOffSetHr = -2;

            tzp.DstOffSetMi = 30;

            tzp.DstEffDt = "03210200";

            tzp.DstEndDt = "11110100";

            lsTZP.Add(tzp);

            tzp = new TimeZoneParm();

            tzp.TimeZoneCd = "PT";

            tzp.TimeZoneName = "Pacific Time";

            tzp.OffSetHr =-8;

            tzp.OffSetMi = 0;

            tzp.DstOffSetHr = -7;

            tzp.DstOffSetMi = 0;

            tzp.DstEffDt = "03210200";

            tzp.DstEndDt = "11110100";

            lsTZP.Add(tzp);

            tzp = new TimeZoneParm();

            tzp.TimeZoneCd = "YT";

            tzp.TimeZoneName = "Yukon Time";

            tzp.OffSetHr =-8;

            tzp.OffSetMi = 0;

            tzp.DstOffSetHr = -8;

            tzp.DstOffSetMi = 0;

            tzp.DstEffDt = "03210200";

            tzp.DstEndDt = "11110100";

            lsTZP.Add(tzp);

            return lsTZP;

        }

        public static TimeZoneParm GetTimeZoneParm(string TimeZoneCode)

        {

            List<TimeZoneParm> lsTZP = GetList();

            TimeZoneParm tzp = new TimeZoneParm();

            foreach(TimeZoneParm t in lsTZP)

            {

                if (t.TimeZoneCd == TimeZoneCode)

                {

                    tzp = t;

                    break;

                }

            }

            return tzp;

        }

    }

The next divergence in this process is that since all of the information is compiled there is no need to query a table or file for the timezone parm info. You just need to make the api function call. The logic for the function that converts a date/time value to UTC time is the same as the other processes. It gets the parms, determines DST start and end dates, converts the provided date/time value to DST and standard for the provided timezone code and then determines whether to use the DST or standard date/time value.

    [SqlFunction(DataAccess = DataAccessKind.Read, IsDeterministic = true, Name = "API_UTC_TIME")]

    public static SqlDateTime fnApiUtcTime(SqlDateTime OrigDate, SqlString CurrentTimeZone)

    {

        try

        {

            TimeZoneParm tzp = TimeZoneData.GetTimeZoneParm(CurrentTimeZone.ToString());

            int offsethr = 0;

            int offsetmi = 0;

            int dst_offsethr = 0;

            int dst_offsetmi = 0;

            string dst_eff_key = "01110000";

            string dst_end_key = "01110000";

            offsethr = tzp.OffSetHr;

            offsetmi = tzp.OffSetMi;

            dst_offsethr = tzp.DstOffSetHr;

            dst_offsetmi = tzp.DstOffSetMi;

            dst_eff_key = tzp.DstEffDt;

            dst_end_key = tzp.DstEndDt;

            DateTime origDate = DateTime.Parse(OrigDate.ToString());

            int dst_eff_mon = int.Parse(dst_eff_key.Substring(0, 2));

            int dst_eff_wk = int.Parse(dst_eff_key.Substring(2, 1));

            int dst_eff_dw = int.Parse(dst_eff_key.Substring(3, 1));

            int dst_eff_hr = int.Parse(dst_eff_key.Substring(4, 2));

            int dst_eff_mi = int.Parse(dst_eff_key.Substring(6, 2));

            int dst_end_mon = int.Parse(dst_end_key.Substring(0, 2));

            int dst_end_wk = int.Parse(dst_end_key.Substring(2, 1));

            int dst_end_dw = int.Parse(dst_end_key.Substring(3, 1));

            int dst_end_hr = int.Parse(dst_end_key.Substring(4, 2));

            int dst_end_mi = int.Parse(dst_end_key.Substring(6, 2));

            DateTime dst_eff_dt_test = new DateTime(origDate.Year, dst_eff_mon, 1);

            dst_eff_dt_test = dst_eff_dt_test.AddDays((double)(-1 * DayOfWeekId(dst_eff_dt_test.DayOfWeek)));

            DateTime dst_eff_dt = dst_eff_dt_test.AddDays(dst_eff_wk * 7);

            dst_eff_dt = dst_eff_dt.AddHours(dst_eff_hr);

            dst_eff_dt = dst_eff_dt.AddMinutes(dst_eff_mi);

            dst_eff_dt = dst_eff_dt.AddDays(dst_eff_dw - 1);

            if (dst_eff_dt.Month != dst_eff_mon)

            {

                dst_eff_dt = dst_eff_dt.AddDays(-7);

            }

            DateTime dst_end_dt_test = new DateTime(origDate.Year, dst_end_mon, 1);

            dst_end_dt_test = dst_end_dt_test.AddDays((double)(-1 * DayOfWeekId(dst_end_dt_test.DayOfWeek)));

            DateTime dst_end_dt = dst_end_dt_test.AddDays(dst_end_wk * 7);

            dst_end_dt = dst_end_dt.AddHours(dst_end_hr);

            dst_end_dt = dst_end_dt.AddMinutes(dst_end_mi);

            dst_end_dt = dst_end_dt.AddDays(dst_end_dw - 1);

            if (dst_end_dt.Month != dst_end_mon)

            {

                dst_end_dt = dst_end_dt.AddDays(-7);

            }

            DateTime newDate = origDate.AddHours(offsethr * -1).AddMinutes(offsetmi * -1);

            DateTime newDateDst = origDate.AddHours(dst_offsethr * -1).AddMinutes(dst_offsetmi * -1);

            bool isDST = false;

            if (origDate >= dst_eff_dt)

            {

                if (origDate <= dst_end_dt)

                {

                    isDST = true;

                }

            }

            if (isDST)

            {

                return (SqlDateTime)newDateDst;

            }

            else

            {

                return (SqlDateTime)newDate;

            }

        }

        catch (Exception ex)

        {

            return OrigDate;

        }

    }

The next function is the one that converts a date/time value from UTC to a provided timezone code. This is similar to the previous function so I will not go into all the details of the function.

    [SqlFunction(DataAccess = DataAccessKind.Read, IsDeterministic = true, Name = "API_TZ_TIME")]

    public static SqlDateTime fnApiTzTime(SqlDateTime OrigDate, SqlString NewTimeZone)

    {

        try

        {

            TimeZoneParm tzp = TimeZoneData.GetTimeZoneParm(NewTimeZone.ToString());

            int offsethr = 0;

            int offsetmi = 0;

            int dst_offsethr = 0;

            int dst_offsetmi = 0;

            string dst_eff_key = "01110000";

            string dst_end_key = "01110000";

            offsethr = tzp.OffSetHr;

            offsetmi = tzp.OffSetMi;

            dst_offsethr = tzp.DstOffSetHr;

            dst_offsetmi = tzp.DstOffSetMi;

            dst_eff_key = tzp.DstEffDt;

            dst_end_key = tzp.DstEndDt;

            DateTime origDate = DateTime.Parse(OrigDate.ToString());

            int dst_eff_mon = int.Parse(dst_eff_key.Substring(0, 2));

            int dst_eff_wk = int.Parse(dst_eff_key.Substring(2, 1));

            int dst_eff_dw = int.Parse(dst_eff_key.Substring(3, 1));

            int dst_eff_hr = int.Parse(dst_eff_key.Substring(4, 2));

            int dst_eff_mi = int.Parse(dst_eff_key.Substring(6, 2));

            int dst_end_mon = int.Parse(dst_end_key.Substring(0, 2));

            int dst_end_wk = int.Parse(dst_end_key.Substring(2, 1));

            int dst_end_dw = int.Parse(dst_end_key.Substring(3, 1));

            int dst_end_hr = int.Parse(dst_end_key.Substring(4, 2));

            int dst_end_mi = int.Parse(dst_end_key.Substring(6, 2));

            DateTime dst_eff_dt_test = new DateTime(origDate.Year, dst_eff_mon, 1);

            dst_eff_dt_test = dst_eff_dt_test.AddDays((double)(-1 * DayOfWeekId(dst_eff_dt_test.DayOfWeek)));

            DateTime dst_eff_dt = dst_eff_dt_test.AddDays(dst_eff_wk * 7);

            dst_eff_dt = dst_eff_dt.AddHours(dst_eff_hr);

            dst_eff_dt = dst_eff_dt.AddMinutes(dst_eff_mi);

            dst_eff_dt = dst_eff_dt.AddDays(dst_eff_dw - 1);

            if (dst_eff_dt.Month != dst_eff_mon)

            {

                dst_eff_dt = dst_eff_dt.AddDays(-7);

            }

            DateTime dst_end_dt_test = new DateTime(origDate.Year, dst_end_mon, 1);

            dst_end_dt_test = dst_end_dt_test.AddDays((double)(-1 * DayOfWeekId(dst_end_dt_test.DayOfWeek)));

            DateTime dst_end_dt = dst_end_dt_test.AddDays(dst_end_wk * 7);

            dst_end_dt = dst_end_dt.AddHours(dst_end_hr);

            dst_end_dt = dst_end_dt.AddMinutes(dst_end_mi);

            dst_end_dt = dst_end_dt.AddDays(dst_end_dw - 1);

            if (dst_end_dt.Month != dst_end_mon)

            {

                dst_end_dt = dst_end_dt.AddDays(-7);

            }

            DateTime newDate = origDate.AddHours(offsethr).AddMinutes(offsetmi);

            DateTime newDateDst = origDate.AddHours(dst_offsethr).AddMinutes(dst_offsetmi);

            bool isDST = false;

            if (newDate >= dst_eff_dt)

            {

                if (newDate <= dst_end_dt)

                {

                    isDST = true;

                }

            }

            if (isDST)

            {

                return (SqlDateTime)newDateDst;

            }

            else

            {

                return (SqlDateTime)newDate;

            }

        }

        catch (Exception ex)

        {

            return OrigDate;

        }

    }

The last function is the main function that will be used to convert a date/time value from a provided timezone code to a provided timezone code. The process for this function is pretty simple. It uses the previous two functions to convert the date/time value to UTC from a provided timezone code and then uses the function to convert from UTC to the provided timezone code.

    [SqlFunction(DataAccess = DataAccessKind.Read, IsDeterministic = true, Name = "API_NEW_TIME")]

    public static SqlDateTime fnApiNewTime(SqlDateTime OrigDate, SqlString CurrentTimeZone, SqlString NewTimeZone)

    {

        try

        {

            SqlDateTime utcDate = fnApiUtcTime(OrigDate, CurrentTimeZone);

            SqlDateTime newDate = fnApiTzTime(utcDate, NewTimeZone);

            return newDate;

        }

        catch (Exception ex)

        {

            return OrigDate;

        }

    }

The next post will include performance results for all the processes to see which process is most efficient.