Sunday, 1 March 2009

Speeding up SQL Server Inserts by using transactions

When ADO.NET 2.0 was released, one of the new features being touted was the ability for SqlDataAdapter to submit updates/inserts in batches. Having spent time recently optimizing an Oracle batch solution, I decided to investigate the batch functionality provided by the SqlDataAdapter (The SQL Server specific DataAdapter).

Firing up Reflector I discovered this functionality is encapsulated in the internal class SqlCommandSet, which resides in the System.Data DLL. If you are not using the SqlDataAdapter/Datasets in your project then you are out of luck (or so I thought).

A quick Google for "SqlCommandSet" allowed me to find two posts by Oren Eini, who had exposed the functionality of this internal class via delegates (see here and here). I hadn't considered leveraging the functionality of internal types using delegates before (I have done similar things using reflection, but not delegates).

It turns out Oren's code has been included in Rhino Commons, in a wrapper class called SqlCommandSet.

Oren fails to mention the performance increase gained by using transactions in a batch solution. This post discusses the increases that can be gained by using a transaction across the whole insert operation.

Not wanting to have a dependency on Rhino Commons (I am sure it an excellent framework), and wanting to leverage the inbuilt-generic delegates of Func<> and Action<> (this is now best practise). I decided to write my own version:

#region license

// Modified by Richard OD to exploit .NET 3.5 08 March 2009

// Copyright (c) 2005 - 2007 Ayende Rahien (ayende@ayende.com)

// All rights reserved.

//

// Redistribution and use in source and binary forms, with or without modification,

// are permitted provided that the following conditions are met:

//

//     * Redistributions of source code must retain the above copyright notice,

//     this list of conditions and the following disclaimer.

//     * Redistributions in binary form must reproduce the above copyright notice,

//     this list of conditions and the following disclaimer in the documentation

//     and/or other materials provided with the distribution.

//     * Neither the name of Ayende Rahien nor the names of its

//     contributors may be used to endorse or promote products derived from this

//     software without specific prior written permission.

//

// THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND

// ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED

// WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE

// DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE

// FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL

// DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR

// SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER

// CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY,

// OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF

// THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.

#endregion

 

 

using System;

using System.Data.SqlClient;

using System.Reflection;

 

namespace BatchUpdater

{

    public sealed class SqlCommandSetWrapper : IDisposable

    {

        private static readonly Type commandSetType;

        private readonly object commandSet;

        private readonly Action<SqlCommand> appenderDel;

        private readonly Action disposeDel;

        private readonly Func<int> executeNonQueryDel;

        private readonly Func<SqlConnection> connectionGetDel;

        private readonly Action<SqlConnection> connectionSetDel;

        private readonly Action<SqlTransaction> transactionSetDel;

 

        private int commandCount;

 

        static SqlCommandSetWrapper()

        {

            Assembly systemData = Assembly.Load("System.Data, Version=2.0.0.0, Culture=neutral, PublicKeyToken=b77a5c561934e089");

            commandSetType = systemData.GetType("System.Data.SqlClient.SqlCommandSet");

        }

 

        public SqlCommandSetWrapper()

        {

            commandSet = Activator.CreateInstance(commandSetType, true);

            appenderDel = (Action<SqlCommand>)Delegate.CreateDelegate(typeof(Action<SqlCommand>), commandSet, "Append");

            disposeDel = (Action)Delegate.CreateDelegate(typeof(Action), commandSet, "Dispose");

            executeNonQueryDel = (Func<int>)Delegate.CreateDelegate(typeof(Func<int>), commandSet, "ExecuteNonQuery");

            connectionGetDel = (Func<SqlConnection>)Delegate.CreateDelegate(typeof(Func<SqlConnection>), commandSet, "get_Connection");

            connectionSetDel = (Action<SqlConnection>)Delegate.CreateDelegate(typeof(Action<SqlConnection>), commandSet, "set_Connection");

            transactionSetDel = (Action<SqlTransaction>)Delegate.CreateDelegate(typeof(Action<SqlTransaction>), commandSet, "set_Transaction");

        }

 

        public void Append(SqlCommand command)

        {

            commandCount++;

            appenderDel.Invoke(command);

        }

 

        public int ExecuteNonQuery()

        {

            return executeNonQueryDel.Invoke();

        }

 

        public SqlConnection Connection

        {

            get

            {

                return connectionGetDel.Invoke();

            }

            set

            {

                connectionSetDel.Invoke(value);

            }

        }

 

        public SqlTransaction Transaction

        {

            set

            {

                transactionSetDel.Invoke(value);

            }

        }

 

        public int CommandCount

        {

            get

            {

                return commandCount;

            }

        }

 

        public void Dispose()

        {

            disposeDel.Invoke();

        }

    }

}



Of course this class is by its very nature risky- you should only use it in your own solutions after performing adequate testing. Also note that any changes to future versions of the .NET framework could make this class fail at runtime.

I wanted to see the performance difference between batching SQL statements (via the SqlCommandSet class) and executing multiple commands. I also wanted to see what difference using transactions would make, and also the performance impact of using SQLBulkCopy.

I then wrote a test program to compare non batched inserts against SqlCommandSet and the SqlBulkCopy class.

Here are the results (note in my case the DB is on the same machine as the C# app- not the usual separate DB server):



By far the fastest was the SqlBulkCopy. This is unsurprising as this class is very similar to SQL Server's BCP program allowing it to bypass normal integrity checks and perform minimal logging. The next fastest technique is using the SqlCommandSet with a transaction open throughout the operation, followed by a transaction open throughout the operation using seperate SqlCommand objects.

The reason why the batched updates increases performance is due to every interaction with SQL Server requiring two-way communication with the database (handshaking). However sending too many statements in one batch can also hinder performance. In Programming Microsoft® ADO.NET 2.0 Core Reference David Sceppa recommends using a batch size of between 100 and 1000 for optimum performance.

Whilst it can be important to batch multiple updates to SQL Server, it is more important for performance to wrap them in a transaction. By default without specifying an explicit transaction, SQL Server will execute each statement in a separate transaction, regardless of whether or not the query is sent as a batch or each statement. Each transaction requires the log to be written to disk- writing this out in one go is going to be a lot more efficient than doing it for each individual statement.

Here is the client test code (it could do with a bit of a cleanup):

using System;

using System.Configuration;

using System.Data;

using System.Data.SqlClient;

using System.Diagnostics;

 

namespace BatchUpdater

{

    class Program

    {

        private static string connectionString = ConfigurationManager.ConnectionStrings["sqlServerDB"].ConnectionString;

        private const string INSERT_STATEMENT = "INSERT INTO SomeTable(SomeColumn) VALUES ('This is nice')";

 

        static void Main(string[] args)

        {

            // Truncate just to be sure

            TruncateTable();

            Stopwatch sw = new Stopwatch();

            int batchSize = 1000;

 

            sw.Start();

            PerformInsertsWithCommandSetWrapper(batchSize, true);

            sw.Stop();

            Console.WriteLine("Time with command set wrapper and transactions {0}", sw.Elapsed);

            sw.Reset();

 

            TruncateTable();

 

            sw.Start();

            PerformInsertsUsingSingleCommands(true);

            sw.Stop();

            Console.WriteLine("Time with single commands and transactions {0}", sw.Elapsed);

            sw.Reset();

 

            TruncateTable();

 

            sw.Start();

            PerformInsertsWithCommandSetWrapper(batchSize, false);

            sw.Stop();

            Console.WriteLine("Time with command set wrapper without transactions {0}", sw.Elapsed);

            sw.Reset();

 

            TruncateTable();

 

            sw.Start();

            PerformInsertsUsingSingleCommands(false);

            sw.Stop();

            Console.WriteLine("Time with single commands without transactions {0}", sw.Elapsed);

            sw.Reset();

 

            TruncateTable();

 

            DataTable testData = GetTestData();

            sw.Start();

            UseBcp(testData);

            sw.Stop();

            Console.WriteLine("Time with BCP {0}", sw.Elapsed);

 

            Console.ReadLine();

        }

 

        private static void PerformInsertsUsingSingleCommands(bool useTransactions)

        {

            using (SqlConnection con = new SqlConnection(connectionString))

            {

                con.Open();

                SqlTransaction tran = null;

                if(useTransactions) tran = con.BeginTransaction();

                for (int i = 0; i < 50000; i++)

                {

 

                    SqlCommand cmd2 = new SqlCommand(INSERT_STATEMENT, con);

                    if(useTransactions) cmd2.Transaction = tran;

                    cmd2.ExecuteNonQuery();

 

                }

                if (useTransactions)

                {

                    tran.Commit();

                    tran.Dispose();

                }

            }

        }

 

        private static void PerformInsertsWithCommandSetWrapper(int batchSize, bool useTransactions)

        {

            for (int i = 0; i < 50000; i = i + batchSize)

            {

                using (SqlConnection con = new SqlConnection(connectionString))

                using (SqlCommandSetWrapper wrapper = new SqlCommandSetWrapper())

                {

                    for (int j = 0; j < batchSize; j++)

                    {

                        SqlCommand cmd = new SqlCommand(INSERT_STATEMENT);

                        wrapper.Append(cmd);

                    }

 

                    wrapper.Connection = con;

                    con.Open();

                    SqlTransaction tran = null;

                    if(useTransactions)

                    {

                        tran = con.BeginTransaction();

                        wrapper.Transaction = tran;

                    }

                    wrapper.ExecuteNonQuery();

                    if (useTransactions)

                    {

                        tran.Commit();

                        tran.Dispose();

                    }

                }

            }

        }

 

        private static void TruncateTable()

        {

            using (SqlConnection cona = new SqlConnection(connectionString))

            {

                cona.Open();

                SqlCommand cmd = new SqlCommand("TRUNCATE TABLE SomeTable", cona);

                cmd.ExecuteNonQuery();

            }

        }

 

        private static DataTable GetTestData()

        {

            DataTable dt = new DataTable();

            dt.BeginLoadData();

            dt.Columns.Add("SomeColumn");

            for (int i = 0; i < 50000; i++)

            {

                dt.Rows.Add("This is nice");

            }

            dt.EndLoadData();

 

            return dt;

        }

 

        private static void UseBcp(DataTable dt)

        {

            SqlBulkCopy bcp = new SqlBulkCopy(connectionString);

            bcp.DestinationTableName = "dbo.SomeTable";

 

            bcp.WriteToServer(dt);

        }

    }

}



Of course it would be interesting to look at solutions when using LINQ to SQL or LINQ to Entities.

2 comments:

  1. Hey rich thanks for this.
    One question comes straight to mind - why use Delegate.invoke?

    Could you not just invoke the lambda directly?

    e.g.

    Action<T> caller;
    .
    .
    caller = x => {};
    .
    .
    .
    T foo = default(T);
    caller(foo);

    ReplyDelete
  2. Preet,

    I assume you meaning doing this:

    executeNonQueryDel();

    Instead of:

    executeNonQueryDel.Invoke();

    I considered doing that. The reason why I called Invoke, is because I prefer the syntax (it makes it more obvious from a casual glance that I'm using a delegate). Had I not done it this way, the resultant CIL would still be the same.

    ReplyDelete