Reliable Retry Aware BCP for SQL Azure
Introduction – Motivation
Microsoft has a number of tools to import data into SQL Azure but as of now none of those are retry-aware. In other words, if a transient network fault occurs during the data import process, an error is raised and the executable is terminated. Cleanup of the partially completed import and re-execution of the process is left to the user. This cleanup is often met with a level of frustration. I was thusly motivated to create a BCP type program (I call it bcpWithRetry) to import data into SQL Azure in a reliable manner by responding to those transient fault conditions and resubmitting any unsuccessful batches.
The second motivation for writing such an application was to support the forthcoming release of the SQL Azure Federations feature. SQL Azure Federations provide a number of connectivity enhancements that provide better connection pool management and cache coherency. The cost is that a ‘USE FEDERATION’ statement must be executed to route queries to the appropriate Federated Member which contains the desired shard of data. This T-SQL statement must be performed prior to executing any queries at the backend SQL Azure database. Currently none of our products support SQL Azure Federations nor offer options or extensions points upon which import data into a Federated Member. In the future I will provide a follow-up to this blog post with an update to the bcpWithRetry client in which I include support for SQL Azure Federations. You can learn more about SQL Azure Federations by reading Cihan Biyiloglu’s SQL Azure posts.
In Closing
I will finish off the blog with a section describing the code and walk-through of some of the application arguments. I have tested the executable with a number of the input dimensions and was able to achieve 200k rows/sec importing a simple comma separated file with 10 million lines into my local SQL Server database. The throughput is 14k/sec when importing the same file into a SQL Azure database. You can find the code here.
Code Walk-Through
The bcpWithRetry program will import data directly into SQL Azure in a reliable manner. The foundation for this program is the SqlBulkCopy class and a custom implementation of the IDataReader interface which provides the ability to read directly from a flat text file. The scope is simple, the program (DataReader) reads data from a flat file line-by-line and bulk imports (SqlBulkCopy) this data into SQL Azure using fixed size transactionally scoped batches. If a transient fault is detected, retry logic is initiated and the failed batch is resubmitted.
The TransientFaultHandling Framework from the AppFabricCat Team was used to implement the retry logic. It provides a highly extensible retry policy framework with custom retry policies and fault handling paradigms for the Microsoft cloud offerings like Windows Azure AppFabric Service Bus and SQL Azure. It catches ‘9’ different SQL Azure exceptions with a default retry policy out-of-the-box. There was no reason for me to either implement my own or shop anywhere else.
The snippet of the code below shows the technique used to import data into SQL Azure. The CSVReader implements the IDataReader interface while the SqlAzureRetryPolicy class offers the necessary retry policy to handle transient faults between the client and SQL Azure. The ExecuteAction method of the SqlAzureRetryPolicy class wraps a delegate which implements the bulk copy import of data into SQL Azure. The SqlRowsCopied event handler will output progress information to the user. The RetryOccurred event provides notification of a transient fault condition and provided the hooks upon which to inform the CSVReader to reread rows from an internal ring buffer.
using (CSVReader sr = new CSVReader(CSVFileName, columns, batchSize, Separator)) { _csvReader = sr; SqlAzureRetryPolicy.RetryOccurred += RetryOccurred; SqlAzureRetryPolicy.ExecuteAction(() => { using (SqlConnection conn = new SqlConnection(ConnStr)) { conn.Open(); using (SqlBulkCopy bulkCopy = new SqlBulkCopy(conn, SqlBulkCopyOptions.UseInternalTransaction | Options, null)) { bulkCopy.DestinationTableName = TableName; bulkCopy.BatchSize = batchSize; bulkCopy.NotifyAfter = batchSize; bulkCopy.SqlRowsCopied += new SqlRowsCopiedEventHandler(OnSqlRowsCopied); foreach (var x in columns) bulkCopy.ColumnMappings.Add(x, x); sw.Start(); Console.WriteLine("Starting copy..."); bulkCopy.WriteToServer(sr); } } }); }
The import facets of the CSVReader implementation is shown below. SqlBulkCopy will make a call to the IDataReader. Read () method to read a row of data and continue to do so until it fills its internal data buffer. The SqlBulkCopy API will then submit a batch to SQL Azure with the contents of the buffer. If a transient fault occurs during this process, the entirety of the batch will be rolled back. It is at this point that the transient fault handling framework will notify the bcpWithRetry client through its enlistment in the RetryOccurred event. The handler for this event merely toggles the CSVReader. BufferRead property to true. The CSVReader then read results from its internal buffer upon which the batch can be resubmitted to SQL Azure.
private bool ReadFromBuffer() { _row = _buffer[_bufferReadIndex]; _bufferReadIndex++; if (_bufferReadIndex == _currentBufferIndex) BufferRead = false; return true; } private bool ReadFromFile() { if (this._str == null) return false; if (this._str.EndOfStream) return false; // Successful batch sent to SQL if we have already filled the buffer to capacity if (_currentBufferIndex == BufferSize) _currentBufferIndex = 0; this._row = this._str.ReadLine().Split(Separator); _numRowsRead++; this._buffer[_currentBufferIndex] = this._row; _currentBufferIndex++; return true; }
Application Usage
The bcpWithRetry application provides fundamental but yet sufficient options to bulk import data into SQL Azure using retry aware logic. The parameters mirror those of the bcp.exe tool which ships with SQL Server. Trusted connections are enabled so as to test against an on-premise SQL Server database. The usage is shown below.
C:\> bcpWithRetry.exe
usage: bcpWithRetry {dbtable} { in } datafile
[-b batchsize] [-t field terminator]
[-S server name] [-T trusted connection]
[-U username] [-P password]
[-k keep null values] [-E keep identity values]
[-d database name] [-a packetsize]
...
c:\>bcpWithRetry.exe t1 in t1.csv –S qa58em2ten.database.windows.net –U mylogin@qa58em2ten –P Sql@zure –t , –d Blog –a 8192
10059999 rows copied.
Network packet size (bytes): 8192
Clock Time (ms.) Total : 719854 Average : (13975 rows per sec.)
Press any key to continue . . .
Reviewers : Curt Peterson, Jaime Alva Bravo, Christian Martinez, Shaun Tinline-Jones
6 Comments
Leave a Reply
You must be logged in to post a comment.














Pingback: Dew Drop – May 12, 2011 | Alvin Ashcraft's Morning Dew
Cihan Biyikoglu and now you are writing about SQL Azure Federations as if it was a real product.
What is the timeline for an SQL Azure Federations CTP?
–rj
Hey Curt,
Thanks for the very cool app! I am having a “bit” of trouble:
Starting copy…
Row 1: The given value of type String from the data source cannot be converted to type bit of the specified target column.
The “-f” switch doesn’t seem to work, and the bit is not nullable. Any ideas?
Thanks!
Joe
Hey Joe. This is James. When I wrote the program I didn’t support format files, only character separation. Try ‘/?’ to get alist of parameters. I don’t know the effort to include that capability. I created a DataReader and fed that to SqlBulkCopy. The SqlBulkCopy only includes a ColumnMappings property to set the input/output ordinals. I have not looked into it, but SqlBulkCopy can also take a DataTable as input. The formatting may be applied in that matter.
Concerning bits, if I recall. SqlBulkCopy will call the GetValue(int index) from the IDataReader and perform the conversion. The native BCP that ships with SQL Sever export bits as ’1′ or ’0′ while the SqlBulkCopy API only accepts and converts strings ‘true’ ‘false’ to a bit. Thus try to change those ’1′, ’0′ values to true/false in your text file or augment the GetValue method to determine ordinal/type and convert on the fly.
Hi Roger -> Cihan is from the SQL Azure team. His blog will contain the most current information.
Hi James, altering the column from BIT to TINYINT was a quick fix. Awesome tool, thanks!