Best Practices for Handling Transient Conditions in SQL Azure Client Applications

The following post is intended to offer you a set of best practices centered around the development of reliable SQL Azure client applications. The primary focus of this paper is positioned towards handling transient conditions, namely, those intermittent faults, errors and exceptions that need to be accounted for when developing reliable applications for high-density multi-tenant environments such as SQL Azure.

Background

The developers who have already had the opportunity to start working with the Microsoft’s cloud-based relational database service, widely known as SQL Azure, may know that SQL Azure has introduced some specific techniques and approaches to implementing data access service layer in the applications leveraging the SQL Azure infrastructure.

One of the important considerations is the way how client connections are to be handled. SQL Azure comes with throttling behavior the attributes of which can manifest themselves when a client is establishing connections to a SQL Azure database or running queries against it. The database connections can be throttled internally by the SQL Azure fabric for several reasons, such as excessive resource usage, long-running transactions, and possible failover and load balancing actions, leading to termination of a client session or temporary inability to establish new connections while a transient condition persists. The database connections may also be dropped due to the variety of reasons related to network connectivity between the client and distant Microsoft data centers: quality of network, intermittent network faults in the client’s LAN or WAN infrastructure and other transient technical reasons.

The behavior in question was discussed in the article posted on the SQL Azure team blog back in May 2010. The article articulates the need for implementing retry logic in the client code in order to provide reliable connectivity to the SQL Azure databases. In one of our recent Azure customer projects, we have faced with multiple challenges related to this behavior. This experience has led to creating a generic, reusable framework for handling transient conditions using an extensible retry policy model. We hope that our learnings can be of use for many .NET developers working with SQL Azure.

Double Quote  Note
The information provided in this article reflects the real-world experience with the SQL Azure to date. It is likely that some of the transient conditions discussed below may never surface in a given client application. It is in the nature of a transient condition to be dependent on  and be driven by variable technical, environmental, behavioral and other unique characteristics of a particular application or its surrounding infrastructure.

Transient Conditions in SQL Azure

When handling exceptions in the client applications accessing the SQL Azure databases, it is important to differentiate between general errors and faults that require special treatment. Not every exception would be considered as a transient error. The client applications need to ensure that the application code will enter into retry state only when it’s strictly necessary.

Below are some examples of the transient conditions that may occur in the SQL Azure infrastructure and result in the termination of a client connection or query:

Error Number Error Message
40197 The service has encountered an error processing your request. Please try again.
40501 The service is currently busy. Retry the request after 10 seconds.
10053 A transport-level error has occurred when receiving results from the server. An established connection was aborted by the software in your host machine.
10054 A transport-level error has occurred when sending the request to the server. (provider: TCP Provider, error: 0 – An existing connection was forcibly closed by the remote host.)
10060 A network-related or instance-specific error occurred while establishing a connection to SQL Server. The server was not found or was not accessible. Verify that the instance name is correct and that SQL Server is configured to allow remote connections. (provider: TCP Provider, error: 0 – A connection attempt failed because the connected party did not properly respond after a period of time, or established connection failed because connected host has failed to respond.)
40613 Database XXXX on server YYYY is not currently available. Please retry the connection later. If the problem persists, contact customer support, and provide them the session tracing ID of ZZZZZ.
40143 The service has encountered an error processing your request. Please try again.
233 The client was unable to establish a connection because of an error during connection initialization process before login. Possible causes include the following: the client tried to connect to an unsupported version of SQL Server; the server was too busy to accept new connections; or there was a resource limitation (insufficient memory or maximum allowed connections) on the server. (provider: TCP Provider, error: 0 – An existing connection was forcibly closed by the remote host.)
64 A connection was successfully established with the server, but then an error occurred during the login process. (provider: TCP Provider, error: 0 – The specified network name is no longer available.)
20 The instance of SQL Server you attempted to connect to does not support encryption.

For a complete list of errors, please refer to the Error Messages (SQL Azure Database) section in the MSDN documentation.

In order to determine whether or not a specific exception should be treated as “transient” when working with SQL Azure, the following guidelines must be adhered to:

  • Check the exception type first. The main exception type which user code will need to treat as an indicator of a transient fault is SqlException.
  • Filter out those SQL exceptions which do not indicate a transient error. The SqlException.Number property helps assert whether or not an exception should be considered as transient. Do not attempt to parse the exception text as it may vary between different releases of the .NET Framework Data Provider for SQL Server.
  • Verify if the error number belongs to the family of transient errors by checking it against a set of well-known error codes. The main error codes that need to be accounted for are listed above. In addition, check the up-to-date list of error codes indicating a loss of connection.

This guidance could easily be packaged into a fully reusable framework for handling connection loss and failed SQL commands due to transient conditions.

Transient Conditions Handling Framework

The framework that we have developed takes into account the end requirements for handling the possible transient conditions. Internally, the framework relies on the implementation of a “retry policy” which makes sure that only valid transient errors will be handled. The policy verifies whether or not an exception belongs to the legitimate category of transient faults before the client application enters into retry state.

At a glance, our implementation of the transient condition handling framework:

  • Provides the foundation for building highly extensible retry logic for handling a variety of transient conditions, not limited to SQL Azure.
  • Supports a range of pre-defined retry policies (fixed retry interval, progressive retry interval, random exponential backoff).
  • Supports separate retry policies for SQL connections and SQL commands for additional flexibility.
  • Supports retry callbacks to notify the user code whenever a retry condition is encountered.
  • Supports the fast retry mode whereby the very first retry attempt will be made immediately thus not imposing delays when recovering from short-lived transient faults.
  • Enables to define the retry policies in the application configuration files.
  • Provides extension methods to support retry capabilities directly in SqlConnection and SqlCommand objects.
  • Supports both synchronous and asynchronous invocation patterns.

The next sections drill down into specific implementation details and are intended to help the developers understand when and how they should make use of the transient condition handling framework referenced above. To follow along, download the full sample code from the MSDN Code Gallery.

Technical Implementation

The following class diagram depicts the underlying technical implementation highlighting all core components and their dependencies:

p1

The key components in the framework are the RetryPolicy<T> and ReliableSqlConnection classes and the ITransientErrorDetectionStrategy interface.

The RetryPolicy<T> class along with its abstract RetryPolicy counterpart encapsulate all the essential logic responsible for iterative execution of developer-defined actions which may result in a transient exception.

The ReliableSqlConnection class is implemented as a look-a-like of SqlConnection and provides a set of value-add methods to ensure that connections could be reliably established and commands could reliably executed against a SQL Azure database.

The ITransientErrorDetectionStrategy interface provides the base mechanism upon which different types of transient conditions can be described and packaged into a reusable policy object that performs validation on a given .NET exception against a well-known set of transient faults. Along with a transient error detection policy for SQL Azure, the framework also includes the transient condition detection strategies for AppFabric Service Bus, AppFabric Message Buffers and Windows Azure storage.

// Defines an interface which must be implemented by custom components responsible for detecting specific transient conditions.
public interface ITransientErrorDetectionStrategy
{
    /// <summary>
    /// Determines whether the specified exception represents a transient failure that can be compensated by a retry.
    /// </summary>
    /// <param name="ex">The exception object to be verified.</param>
    /// <returns>True if the specified exception is considered as transient, otherwise false.</returns>
    bool IsTransient(Exception ex);
}

In addition, the class library provides a set of C# extension methods enabling the .NET developers to open SQL Azure database connections and invoke the SQL commands from within a retry policy-aware scope. The extension methods can be useful in the event when the developers are unable to adopt their code to take advantage of the ReliableSqlConnection class. For instance, a developer might be using an existing data access framework (e.g. Enterprise Library) which returns the pre-initialized instances of SqlConnection class. In this case, the extension methods could help add the retry capabilities into the existing code without major re-work.

Usage Patterns

The following sections illustrate some common usage patterns that apply when building reliable SQL Azure client applications using the transient condition handling framework discussed above.

Configuring Retry Policies

There are two primary ways of setting up a retry policy in the transient condition handling framework:

  1. Create an instance of the RetryPolicy<T> class with required transient error detection strategy and appropriate configuration parameters specified at construction time.
  2. Describe the retry policy definitions in the application configuration file and use the provided configuration APIs to instantiate and return an instance of the appropriate retry policy.

The RetryPolicy<T> class allows creating different policies depending on particular needs. The class constructors accept variable input and return an instance of the respective retry policy configured as per specified initialization parameters:

public class RetryPolicy<T> : RetryPolicy where T : ITransientErrorDetectionStrategy, new()
{
    /// <summary>
    /// Initializes a new instance of the RetryPolicy class with the specified number of retry attempts and default
    /// fixed time interval between retries.
    /// </summary>
    /// <param name="retryCount">The number of retry attempts.</param>
    public RetryPolicy(int retryCount) : this(retryCount, DefaultRetryInterval) {  /* ... */ }

    /// <summary>
    /// Initializes a new instance of the RetryPolicy class with the specified number of retry attempts and time
    /// interval between retries.
    /// </summary>
    /// <param name="retryCount">The number of retry attempts.</param>
    /// <param name="intervalBetweenRetries">The interval between retries.</param>
    public RetryPolicy(int retryCount, TimeSpan intervalBetweenRetries) {  /* ... */ }

    /// <summary>
    /// Initializes a new instance of the RetryPolicy class with the specified number of retry attempts and backoff
    /// parameters for calculating the exponential delay between retries.
    /// </summary>
    /// <param name="retryCount">The number of retry attempts.</param>
    /// <param name="minBackoff">The minimum backoff time.</param>
    /// <param name="maxBackoff">The maximum backoff time.</param>
    /// <param name="deltaBackoff">The delta value in the exponential delay between retries.</param>
    public RetryPolicy(int retryCount, TimeSpan minBackoff, TimeSpan maxBackoff, TimeSpan deltaBackoff) {  /* ... */ }

    /// <summary>
    /// Initializes a new instance of the RetryPolicy class with the specified number of retry attempts and
    /// parameters defining the progressive delay between retries.
    /// </summary>
    /// <param name="retryCount">The number of retry attempts.</param>
    /// <param name="initialInterval">The initial interval which will apply for the first retry.</param>
    /// <param name="increment">The incremental time value for calculating progressive delay between retries.</param>
    public RetryPolicy(int retryCount, TimeSpan initialInterval, TimeSpan increment) {  /* ... */ }
}

The retry policies can also be defined in the application configuration. Each retry policy definition is accompanied with a friendly name and a set of parameters such as retry count and interval:

<?xml version="1.0" encoding="utf-8" ?>
<configuration>
  <configSections>
    <section name="RetryPolicyConfiguration" type="Microsoft.AppFabricCAT.Samples.Azure.TransientFaultHandling.Configuration.RetryPolicyConfigurationSettings, Microsoft.AppFabricCAT.Samples.Azure.TransientFaultHandling" />
  </configSections>

  <RetryPolicyConfiguration defaultPolicy="FixedIntervalDefault" defaultSqlConnectionPolicy="FixedIntervalDefault" defaultSqlCommandPolicy="FixedIntervalDefault" defaultStoragePolicy="IncrementalIntervalDefault" defaultCommunicationPolicy="IncrementalIntervalDefault">
    <add name="FixedIntervalDefault" maxRetryCount="10" retryInterval="100" />
    <add name="IncrementalIntervalDefault" maxRetryCount="10" retryInterval="100" retryIncrement="50" />
    <add name="ExponentialIntervalDefault" maxRetryCount="10" minBackoff="100" maxBackoff="1000" deltaBackoff="100" />
  </RetryPolicyConfiguration>
</configuration>

Once the retry policy configuration is defined, a policy can be instantiated using the following 3 simple lines of code:

// Retrieve the retry policy settings from the application configuration file.
RetryPolicyConfigurationSettings retryPolicySettings = ApplicationConfiguration.Current.GetConfigurationSection<RetryPolicyConfigurationSettings>(RetryPolicyConfigurationSettings.SectionName);

// Retrieve the required retry policy definition by its friendly name.
RetryPolicyInfo retryPolicyInfo = retryPolicySettings.Policies.Get("FixedIntervalDefault");

// Create an instance of the respective retry policy using the transient error detection strategy for SQL Azure.
RetryPolicy sqlAzureRetryPolicy = retryPolicyInfo.CreatePolicy<SqlAzureTransientErrorDetectionStrategy>();

The RetryPolicy instances carry all the necessary “intellect” capable of recognizing the legitimate transient conditions when executing the user code as shown in the next two sections.

Reliably Opening SQL Azure Database Connections

In order to ensure that a connection to a SQL Azure database can be reliably established, one of the following approaches can be adopted:

  • Use the Open method from the ReliableSqlConnection class. Should the connection fail to be established from the first attempt, the associated retry policy will take effect and a request will be retried as per the specified retry policy;
  • Use the OpenWithRetry extension method against an instance of the SqlConnection class. Its behavior is similar to the above. Behind the scene, the specified retry policy will kick in and will re-try the request should a transient error be encountered.

Below are some examples of using the above approaches:

using (ReliableSqlConnection conn = new ReliableSqlConnection(connString))
{
     // Attempt to open a connection using the specified retry policy.    
     conn.Open(sqlAzureRetryPolicy);
     // ... execute SQL queries against this connection ...
}

using (ReliableSqlConnection conn = new ReliableSqlConnection(connString, sqlAzureRetryPolicy))
{
     // Attempt to open a connection using the retry policy specified at construction time.    
     conn.Open();
    // ... execute SQL queries against this connection ...
}

using (SqlConnection conn = new SqlConnection(connString))
{
    // Attempt to open a connection using the specified retry policy.
    // The extension method is used in this context since we are dealing with a SqlConnection instance.
    conn.OpenWithRetry(sqlAzureRetryPolicy);
    // ... execute SQL queries against this connection ...
}

Note that both approaches deliver the same end result. In comparison to the standard SqlConnection class, ReliableSqlConnection provides a few value-add capabilities such as retrieving the current session’s CONTEXT_INFO value for tracing purposes and executing SQL commands using general purpose ExecuteCommant<T> method.

Reliably Executing Queries Against SQL Azure Databases

When executing queries against a SQL Azure database, it is also important to handle situations when a connection may be terminated due to transient reasons previously discussed, .e.g query throttling. Should this occur, an attempt to retry the query may become a necessity. Note that not all queries could be safely retried, most importantly those which do not leave the data in a consistent state, for instance, when making updates to multiple tables without an outer transaction ensuring atomicity of the overall operation.

For those queries that could be safely retried, one of the following approaches can be adopted:

  • Use the ExecuteCommand or ExecuteCommand<T> method in the ReliableSqlConnection class. A failed command will be automatically retried as per the specified policy. The retry operation will also ensure that a SQL connection will be re-opened (if required) before attempting to re-run the failed SQL command.
  • Use the appropriate extension method available for the SqlCommand class such as ExecuteNonQueryWithRetry, ExecuteReaderWithRetry, etc.

Below are some examples of using the above approaches:

using (ReliableSqlConnection conn = new ReliableSqlConnection(connString, sqlAzureRetryPolicy))
{
    conn.Open();

    SqlCommand selectCommand = new SqlCommand("select name, object_id from sys.objects where name = 'Application'", conn.Current);

    // Execute the above query using a retry-aware ExecuteCommand method which will
    // automatically retry if the query has failed (or connection was dropped)
    using (IDataReader dataReader = conn.ExecuteCommand<IDataReader>(selectCommand))
    {
        if (dataReader.Read())
        {
            string objectName = dataReader.GetString(dataReader.GetOrdinal("name"));
            tableObjectID = dataReader.GetInt32(dataReader.GetOrdinal("object_id"));
        }
    }
}

using (ReliableSqlConnection conn = new ReliableSqlConnection(connString))
{
    conn.Open(sqlAzureRetryPolicy);

    IDbCommand selectCommand = conn.CreateCommand();
    selectCommand.CommandText = "UPDATE Application SET [DateUpdated] = getdate()";

    // Execute the above query using a retry-aware ExecuteCommand method which will
    // automatically retry if the query has failed (or connection was dropped)
    int recordsAffected = conn.ExecuteCommand(selectCommand, sqlAzureRetryPolicy);
}

using (SqlConnection conn = new SqlConnection(connString))
{
    conn.Open();

    SqlCommand selectCommand = conn.CreateCommand();
    selectCommand.CommandText = "select * from sys.objects where name = 'Application'";

    int tableObjectID = Int32.MinValue;

    // Execute the above query using a retry-aware ExecuteReaderWithRetry method which will
    // automatically retry if the query has failed (or connection was dropped)
    using (IDataReader dataReader = selectCommand.ExecuteReaderWithRetry(sqlAzureRetryPolicy))
    {
        if (dataReader.Read())
        {
            string objectName = dataReader.GetString(dataReader.GetOrdinal("name"));
            tableObjectID = dataReader.GetInt32(dataReader.GetOrdinal("object_id"));
        }
    }

    selectCommand = conn.CreateCommand();
    selectCommand.CommandText = "select object_id from sys.objects where name = 'Application'";

    // Execute the above query using a retry-aware ExecuteScalarWithRetry method which will
    // automatically retry if the query has failed (or connection was dropped)
    object objectID = selectCommand.ExecuteScalarWithRetry(sqlAzureRetryPolicy);

    selectCommand = conn.CreateCommand();
    selectCommand.CommandText = "UPDATE Application SET [DateUpdated] = getdate()";

    // Execute the above query using a retry-aware ExecuteNonQueryWithRetry method which
    // will automatically retry if the query has failed (or connection was dropped)
    int recordsAffected = selectCommand.ExecuteNonQueryWithRetry(sqlAzureRetryPolicy);
}

The usage patterns have so far been focused on fairly primitive ADO.NET examples. The following section offers slightly more advanced scenarios where the transient error handling framework could increase the reliability of SQL Azure client applications regardless how these applications access their data.

Advanced Usage Patterns

It is fair to expect that modern data-oriented software would not always be going down the path of using the plain ADO.NET APIs when accessing the application data. Many alternative technologies have been developed over the past few years to support advanced data access scenarios: Entity Framework, WCF Data Services, LINQ to SQL, ASP.NET Dynamic Data, just to name a few. All these technologies are intended to significantly reduce the complexity of data management and greatly simplify the way how rich data is modeled, queried and projected to the application domain-specific space.

Whenever SQL Azure is chosen as relational data platform interoperable with any of the above technologies, handling transient conditions will immediately become a requirement. Given that the data access is heavily abstracted with the use of these technologies, the approach to adding resistance against transient faults differs from what has been discussed up to this point.

Fortunately, the implementation of the retry policy model in the transient condition handling framework makes it easier to wrap any user code into a retryable scope. Should a transient fault be encountered, the entire scope will be re-run. This capability is delivered by the ExecuteAction and ExecuteAction<T> methods:

sqlAzureRetryPolicy.ExecuteAction(() =>
{
    // Invoke a LINQ2SQL query.
});

return sqlAzureRetryPolicy.ExecuteAction<IEnumerable<string>>(() =>
{
    // Invoke a LINQ query against the Entity Framework model.
    return result;
});

Note that retryable scope should present itself as an atomic unit of work. The scope could be invoked multiple times and it is therefore important to ensure  that it leaves the underlying data in a transactionally consistent state. In addition, the scope should not swallow exceptions, these are required for detecting transient conditions.

The following sample is borrowed from the MSDN Library and enriched with retry-aware logic where appropriate. This will increase the overall reliability of the client code making it robust and more resistant to potential connection or query throttling should the application database be hosted in SQL Azure.

// Define the order ID for the order we want.
int orderId = 43680;

// Create an EntityConnection.
EntityConnection conn = new EntityConnection("name=AdventureWorksEntities");

// Create a long-running context with the connection.
AdventureWorksEntities context = new AdventureWorksEntities(conn);

try
{
    // Explicitly open the connection inside a retry-aware scope.
    sqlAzureRetryPolicy.ExecuteAction(() =>
    {
        if (conn.State != ConnectionState.Open)
        {
            conn.Open();
        }
    });

    // Execute a query to return an order. Use a retry-aware scope for reliability.
    SalesOrderHeader order = sqlAzureRetryPolicy.ExecuteAction<SalesOrderHeader>(() =>
    {
        return context.SalesOrderHeaders.Where("it.SalesOrderID = @orderId",
                new ObjectParameter("orderId", orderId)).Execute(MergeOption.AppendOnly).First();
    });

    // Change the status of the order.
    order.Status = 1;

    // Delete the first item in the order.
    context.DeleteObject(order.SalesOrderDetails.First());

    // Save changes inside a retry-aware scope.
    sqlAzureRetryPolicy.ExecuteAction(() => { context.SaveChanges(); });

    SalesOrderDetail detail = new SalesOrderDetail
    {
        SalesOrderID = 1,
        SalesOrderDetailID = 0,
        OrderQty = 2,
        ProductID = 750,
        SpecialOfferID = 1,
        UnitPrice = (decimal)2171.2942,
        UnitPriceDiscount = 0,
        LineTotal = 0,
        rowguid = Guid.NewGuid(),
        ModifiedDate = DateTime.Now
    };

    order.SalesOrderDetails.Add(detail);

    // Save changes again inside a retry-aware scope.
    sqlAzureRetryPolicy.ExecuteAction(() => { context.SaveChanges(); });
}
finally
{
    // Explicitly dispose of the context and the connection. 
    context.Dispose();
    conn.Dispose();
}

In summary, the versatility of the transient condition handling framework comes with the ability to perform retry-aware operations in a variety of contexts, whether it’s a single SQL statement or a large unit of work. In all cases, the way how the transient faults are detected will be consistently similar.

Conclusion

The underlying fabric managing the SQL Azure nodes comes with specific elements of behavior which need to be fully understood by the client applications accessing the SQL Azure databases. The throttling behavior of SQL Azure forces to come up with a better way of handling connections and executing queries. This includes the need for handling transient exceptions to ensure that the client code is able to behave reliably in the event of SQL Azure database connections being throttled by the Resource Manager. There are also other intermittent conditions which need to be accounted for. Consequently, having a robust retry mechanism in the SQL Azure client applications becomes imperative.

We aimed to provide the community with validated best practices to help the .NET developers build a reliable data access layer taking into account these specific behavioral attributes of our cloud-based database infrastructure. Our best practices were presented in a form a reusable framework which developers could easily plug in and adopt in their solutions.

The accompanying sample code is available for download from the MSDN Code Gallery.  Note that all source code files are governed by the Microsoft Public License (Ms-PL) as explained in the corresponding legal notices.

Additional Resources/References

For more information on the topic discussed in this paper, please check out the following resources:

Authored by: Valery Mizonov
Reviewed by: James Podgorski, Michael Thomassy

1 Star2 Stars3 Stars4 Stars5 Stars (1 votes, average: 5.00 out of 5)

37 Comments

  1. August 18, 2011 at 2:44 pm

    Where can I find information on using this library for Azure storage? I can see the settings, but I see no extension methods for blob operations.

  2. Valery Mizonov
    August 18, 2011 at 3:48 pm

    Hi Oliver,

    We have a plenty of real-world examples of how to use this framework with Windows Azure queues and blobs in order to make storage operations more resilient to transient faults.

    Specifically, please review the following post and the accompanying sample code.

    http://windowsazurecat.com/2011/06/implementing-storage-abstraction-layer-to-support-very-large-messages-in-windows-azure-queues/

  3. August 18, 2011 at 8:27 pm

    Hi Valery,

    Thanks for the quick response (again). I had seen that article, but ignored it because I didn’t want larger messages in my queues :-) Anyway, along with adding the stuff from this article to app.config, this code seems to work for me (although only time will tell!).

    using Microsoft.AppFabricCAT.Samples.Azure.TransientFaultHandling.Configuration;
    using Microsoft.AppFabricCAT.Samples.Azure.TransientFaultHandling.AzureStorage;
    using Microsoft.AppFabricCAT.Samples.Azure.TransientFaultHandling;

    // Fetch retry policy appropriate to blob storage (see article above).
    RetryPolicyConfigurationSettings retryPolicySettings = ApplicationConfiguration.Current.GetConfigurationSection(RetryPolicyConfigurationSettings.SectionName);
    RetryPolicyInfo retryPolicyInfo = retryPolicySettings.Policies.Get(retryPolicySettings.DefaultStoragePolicy);
    strategy for SQL Azure.
    RetryPolicy blob_retry = retryPolicyInfo.CreatePolicy();

    // Retry blob operations like this:
    blob_retry.ExecuteAction(() => {
    blob.Delete();
    });

  4. Valery Mizonov
    August 18, 2011 at 9:29 pm

    Your code is correct, Oliver.

    I would also recommend to configure the storage API not to enforce any retry policies internally. Since you wrap a Delete operation into a retry-aware scope, the blob client should not attempt to perform any retries on its own.

    var storageAccount = new CloudStorageAccount(….);
    var blobStorage = storageAccount.CreateCloudBlobClient();
    blobStorage.RetryPolicy = RetryPolicies.NoRetry();

  5. Valery Mizonov
    August 18, 2011 at 9:36 pm

    Should you feel the need for more guidance or code samples, please let us know by submiting your proposal via http://windowsazurecat.com/submit-content-idea/

    Thanks.

  6. Jan Ove Halvorsen
    August 24, 2011 at 6:31 am

    Hi!
    Is there any particular things to do in order to use this framework in an azure worker role when connecting to sql azure? Is there something I need to do with the connection string, for example? After I implemented this framework in my worker role the “old” connection string, which was working 100% before, gives me an exception:
    System.Data.SqlClient.SqlException (0×80131904): A connection was successfully established with the server, but then an error occurred during the pre-login handshake. (provider: TCP Provider, error: 0 – The requested name is valid, but no data of the requested type was found.)

  7. Valery Mizonov
    August 24, 2011 at 10:32 pm

    Jan,

    I don’t want to make any unsafe assumptions when responding to your question, so let me ask for some clarifications. What do you refer to as “old connection string”? How did you integrate the framework with your code?

    • Jan Ove Halvorsen
      August 24, 2011 at 11:18 pm

      Hi!
      Before I implemented the framework I was using a simple SqlConnection approach with a connection string like this: “server=xxxxxxx.database.windows.net;database=yyyy;uid=zzzzzz;pwd=**********”
      This was working fine, except for some occasional exceptions due to transient errors.
      I wanted to use the mentioned framework and replaced my SqlConnection with the ReliableSqlConnection class, with approrpiate retry policies. I had to change the connection string like this: “Data Source=xxxxxxx.database.windows.net;Initial Catalog=yyyy;User ID=zzzzzzzz;Password=**********”. The original format is not supported by this framework. Anyway, since I implemented the framework I am not able to connect after publishing the worker role. The worker role works fine when running locally. Could it be that the “safe connection string” approach behaves differently from within azure than from outside?
      Thanks!

  8. Valery Mizonov
    August 25, 2011 at 7:28 am

    Thanks for the clarifications. Let me ask you some further questions:

    1. Have you configured the SQL Azure firewall? Before you can connect to your SQL Azure server for the first time, you must use the Windows Azure Platform Management Portal to configure the SQL Azure firewall. You will need to create a firewall rule that enables connection attempts from your local computer (which you obviously have already done as you are able to connect from DevFarbic) and from clients running Windows Azure (e.g. worker roles). You can enable connection attempts from Windows Azure services by clicking “Allow Microsoft Services access to this server” on the Firewall Settings tab.

    2. Did you include the server name into the User ID property in the connection string? You will need to append the SQL Azure server name to the login in the connection string using the @ notation. For example, if your login was named login1 and the fully qualified name of your SQL Azure server is servername.database.windows.net, the username parameter of your connection string should be: login1@servername.

    I will take a look why the original connection string was not accepted by ReliableSqlConnection.

  9. Valery Mizonov
    August 25, 2011 at 10:51 pm

    Jan,

    I double-checked and can confirm that ReliableSqlConnection supports your original connection string format. I found no issues during testing. Please advise on the error message that you are getting when using the old connection string.

  10. bryn
    September 7, 2011 at 11:29 pm

    Roughly once a day a database operation in our Azure application fails with a SQL -2 timeout error. We have a retry approach (our app pre-dates appfabric CAT samples) which looks at the exception to see if it is transient or not.

    At the time we wrote our retry code it was suggested that -2 error shouldn’t be retired but I have noticed that some retry implementations do

    (http://social.technet.microsoft.com/wiki/contents/articles/retry-logic-for-transient-failures-in-sql-azure.aspx
    http://code.msdn.microsoft.com/SQL-Azure-Retry-Logic-2d0a8401/sourcecode?fileId=41427&pathId=2020874875

    and some don’t

    http://code.msdn.microsoft.com/Reliable-Retry-Aware-BCP-a5ae8e40/sourcecode?fileId=22213&pathId=925802491

    What would be considered the correct approach?

    The sproc that is failing is looking up reference data (so it is very unlikely that contention is an issue) to re-populate data which has been expired from a cache.

  11. Jan Ove Halvorsen
    September 8, 2011 at 1:47 am

    Hi!
    In my original connection string I did not use the “best practice” user id format (@) since it was not really required. Since the ReliableSqlConnection class rewrites the connection string to use the ip-address this format is required. That was why it did’nt work in my case. My bad!
    Concerning the format: The issue I was seeing must have been related to the wrong user id format. I can confirm I am not anymore having a problem the original format now when using the proper user id format. Thanks for pointing me in the right direction!

  12. Valery Mizonov
    September 12, 2011 at 6:08 am

    Hi Bryn,

    As things stand today, we do not recommend compensating a SQL timeout error code -2 by a retry. Instead, it’s advisable to look into query plan, command timeout settings and reasons for long-running queries. We have briefly outlined our recommendations in this post:

    http://social.msdn.microsoft.com/Forums/en-US/ssdsgetstarted/thread/7a50985d-92c2-472f-9464-a6591efec4b3/

    Having said that, we have recently encountered the timeout errors when stress-testing a Windows Azure-based solution that uses SQL Azure. We have been unable to isolate these issues to a particular query as many of our queries were timing out. We will be doing more testing over the next few weeks to see if we can repro these timeouts on demand. Once it’s achieved, we will share our findings with the community.

  13. September 15, 2011 at 4:31 pm

    Can this library work with WebMatrix.Data.Database? I cannot see how to hook it in.

    • bryn
      September 22, 2011 at 3:56 pm

      Hi Valery,

      Our timeouts were occuring on a simple select statement

      SELECT @Password = Merchant.Password
      FROM Merchant
      WHERE Merchant.ID = @ID

      The system was lightly loaded at the time and based on what I see in our EntLib logging the problem seems to happen in clusters every so often.
      I have also noticed some other oddness with the performance of SQL queries. Have details if you’re interested.
      Thanks
      Bryn

  14. Valery Mizonov
    September 17, 2011 at 6:23 pm

    Hello again, Oliver!

    One option would be to create an extension method for Execute, Query, QuerySingle and QueryValue. Each extension method will accept an additional parameter of type RetryPolicy. Inside the extension methods, you can expect to call the respective database operations as follows:

    public static Object QuerySingle(this Database db, string commandText, RetryPolicy retryPolicy, params Object[] args)
    {
    return retryPolicy.ExecuteAction<Object>(() => { return db.QuerySingle(commandText, args); });
    }

    However, there is a potential gotcha in the above code. If a SQL connection that is being maintained internally by the Database class becomes unusable, it needs to be recreated (closed and re-opened). We can’t retry a call to QuerySingle if the underlying connection is invalid.

    What I would do is enrich the extension method with something like this:

    public static Object QuerySingle(this Database db, string commandText, RetryPolicy retryPolicy, params Object[] args)
    {
    return retryPolicy.ExecuteAction<Object>(() =>
    {
    try
    {
    return db.QuerySingle(commandText, args); });
    }
    catch
    {
    // Ensure the underlying connection is explicitly closed in the event of any exception (both transient and non-transient)
    db.Connection.Close();
    // Let the exception bubble up so that it can be analyzed and the above action is retried if necessary.
    throw;
    }
    }

    A catch here is that the Database class will always ensure that there is a valid connection under the hood. It contains an internal method called EnsureConnectionOpen() that gets called every time you run a query or execute a non-query SQL statement. By closing a connection upon encountering an exception, we force the Database class to open a new connection. Since it’s all done in the context of a retry policy, all transient errors will be taken care of.

    Does this sound as a reasonable solution for you?

  15. September 18, 2011 at 8:08 pm

    This looks like a good solution. Thanks! I also have some legacy code containing multiple SQL statements within transactions. (Not using WebMatrix.) I guess I should I should wrap the code for each transaction in a call to retryPolicy.ExecuteAction().

  16. Wouter
    September 19, 2011 at 1:38 am

    Can someone give me a code example of how to use the SqlCommand.Begin- and EndExecuteNonQuery construction in combination with the Transient Fault Handling Framework?

    • Valery Mizonov
      October 1, 2011 at 12:48 pm

      Hi Wouter,

      Here is the example of the async invocation of the SQL commands using the Transient Fault Handling Framework:

      string connectionString = "Data Source=tcp:[YOUR-SQLAZURE-SERVERNAME].database.windows.net;Initial Catalog=[YOUR-SQLAZURE-DBNAME];User ID=[YOUR-SQLAZURE-USERNAME];Password=[YOUR-SQLAZURE-USERPASSWORD];Asynchronous Processing=True;MultipleActiveResultSets=True";

      RetryPolicy retryPolicy = new RetryPolicy(RetryPolicy.DefaultClientRetryCount);

      using(var conn = new SqlConnection(connectionString))
      {
      var command1 = new SqlCommand("Update app.PersistenceQueue Set QueueItemLastUpdated = getdate()", conn);
      var command2 = new SqlCommand("Update app.PersistenceQueue Set QueueItemStatus = 2", conn);

      var waitObject = new CountdownEvent(2);
      int result1 = 0, result2 = 0;

      retryPolicy.ExecuteAction
      (
      (cb) =>
      {
      if (conn.State != ConnectionState.Open) conn.OpenWithRetry();
      command1.BeginExecuteNonQuery(cb, null);
      },
      (ar) =>
      {
      try
      {
      result1 = command1.EndExecuteNonQuery(ar);
      }
      finally
      {
      waitObject.Signal();
      }
      },
      (ex) =>
      {
      Trace.TraceError(ex.Message);
      if (waitObject.CurrentCount > 0) waitObject.Signal();
      }
      );

      retryPolicy.ExecuteAction
      (
      (cb) =>
      {
      if (conn.State != ConnectionState.Open) conn.OpenWithRetry();
      command2.BeginExecuteNonQuery(cb, null);
      },
      (ar) =>
      {
      try
      {
      result2 = command2.EndExecuteNonQuery(ar);
      }
      finally
      {
      waitObject.Signal();
      }
      },
      (ex) =>
      {
      Trace.TraceError(ex.Message);
      if (waitObject.CurrentCount > 0) waitObject.Signal();
      }
      );

      waitObject.Wait();
      }

      Hope this helps.

      PS: Code snippets don’t look good when entered in comments. I tried to fix formatting with no luck.

  17. Pingback: Handling Transient Connection Failures in SQL Azure | Convective

  18. October 4, 2011 at 9:29 am

    How do we handle transient conditions with SQL Azure when using EF 4.1?
    http://bit.ly/r4jKPw

    does the post cover it all? or can we expect more explicit features in the “TransientFaultHandling” DLL from WindowsAzureCAT team? or from Microsoft?
    Thanks

  19. Pingback: Handy Library for Dealing with Transient Connectivity in the Cloud « Michael Washam Technical Evangelist – Azure | Server | Web

  20. Pingback: Windows Azure and Cloud Computing Posts for 11/15/2011+ - Windows Azure Blog

  21. Pingback: Transient Connections in Windows Azure: Useful Resources | MSDN Blogs

  22. Don Zoeggerle
    November 30, 2011 at 8:06 am

    Hi Valery,

    We are assessing the options to incorporate a retry logic (SQL Azure specific) to our Azure project. I have to admit that TransientFaultHandling framework provides all functionality we need (great work btw). One question though (I couldn’t find an example of that anywhere) -
    How do I handle failures after the retry logic has ended with no luck ?
    For example:
    using (ReliableSqlConnection conn = new ReliableSqlConnection(connString))
    {
    // Attempt to open a connection using the specified retry policy.
    conn.Open(sqlAzureRetryPolicy);
    // … execute SQL queries against this connection …
    }

    Should I put the call conn.Open(sqlAzureRetryPolicy); in an exception handling block (a typed exception perhaps) ? The genera question should be – how to determine if the retry succeeded or failed ?

  23. Pingback: Enterprise Library Windows Azure Integration Pack | Windows Azure Community Deutschland

  24. Valery Mizonov
    December 2, 2011 at 5:16 pm

    Hi Don,

    The answer is simple – after we exhaust all retry attempts, the last exception will be re-thrown to your application code. You should expect to handle it as per your established exception handling practices, e.g. write the detailed exception information into trace/event log for offline analysis, trigger an alert, recycle the role instance, etc.

    If it was possible to recover from a transient condition, your code will continue to run as normal.

    I would not wrap a call to conn.Open(sqlAzureRetryPolicy) into a try/catch block. The try/catch should sit outside of the scope in question.

    Valery

  25. Pingback: A SQL Azure tip a day (12) – KILL « SQL Service – Specialiserade Microsoft SQL Server konsulter

  26. William
    February 2, 2012 at 6:51 am

    Valery,

    Is there any way to incorporate transient retries when using the ASP.NET Universal Providers (System.Web.Providers) or the default ASP.NET Providers? We’re getting a large number of connection related errors when the system is accessing the session or membership data. I know we could add a retry policy to code that accesses session data and membership infomation, but we also get errors when the session state is accessed outside our own code e.g. on page load etc.

    Regards.

    Example stack trace:
    System.Web.HttpException (0×80004005): Exception of type ‘System.Web.HttpException’ was thrown. —> System.Data.UpdateException: An error occurred while updating the entries. See the inner exception for details. —> System.Data.SqlClient.SqlException: Timeout expired. The timeout period elapsed prior to completion of the operation or the server is not responding.
    The statement has been terminated.
    at System.Data.SqlClient.SqlConnection.OnError(SqlException exception, Boolean breakConnection)
    at System.Data.SqlClient.TdsParser.ThrowExceptionAndWarning()
    at System.Data.SqlClient.TdsParser.Run(RunBehavior runBehavior, SqlCommand cmdHandler, SqlDataReader dataStream, BulkCopySimpleResultSet bulkCopyHandler, TdsParserStateObject stateObj)
    at System.Data.SqlClient.SqlCommand.FinishExecuteReader(SqlDataReader ds, RunBehavior runBehavior, String resetOptionsString)
    at System.Data.SqlClient.SqlCommand.RunExecuteReaderTds(CommandBehavior cmdBehavior, RunBehavior runBehavior, Boolean returnStream, Boolean async)
    at System.Data.SqlClient.SqlCommand.RunExecuteReader(CommandBehavior cmdBehavior, RunBehavior runBehavior, Boolean returnStream, String method, DbAsyncResult result)
    at System.Data.SqlClient.SqlCommand.InternalExecuteNonQuery(DbAsyncResult result, String methodName, Boolean sendToPipe)
    at System.Data.SqlClient.SqlCommand.ExecuteNonQuery()
    at System.Data.Mapping.Update.Internal.DynamicUpdateCommand.Execute(UpdateTranslator translator, EntityConnection connection, Dictionary`2 identifierValues, List`1 generatedValues)
    at System.Data.Mapping.Update.Internal.UpdateTranslator.Update(IEntityStateManager stateManager, IEntityAdapter adapter)
    — End of inner exception stack trace —
    at System.Data.Mapping.Update.Internal.UpdateTranslator.Update(IEntityStateManager stateManager, IEntityAdapter adapter)
    at System.Data.Objects.ObjectContext.SaveChanges(SaveOptions options)
    at System.Web.Providers.DefaultSessionStateProvider.DoGet(HttpContext context, String id, Boolean exclusive, Boolean& locked, TimeSpan& lockAge, Object& lockId, SessionStateActions& actions)
    at System.Web.Providers.DefaultSessionStateProvider.GetItemExclusive(HttpContext context, String id, Boolean& locked, TimeSpan& lockAge, Object& lockId, SessionStateActions& actions)
    at System.Web.SessionState.SessionStateModule.GetSessionStateItem()
    at System.Web.SessionState.SessionStateModule.PollLockedSessionCallback(Object state)
    at System.Web.SessionState.SessionStateModule.EndAcquireState(IAsyncResult ar)
    at System.Web.HttpApplication.AsyncEventExecutionStep.OnAsyncEventCompletion(IAsyncResult ar)

  27. Pingback: DataSets, TableAdapters, and Transient Retry Logic For SqlAzure

  28. Pingback: Best Practices: Implementing Auto-Retry for SQL Azure Connections « Zeros, Ones and a Few Twos

  29. Pingback: NHibernate.SqlAzure: Transient fault-handling NHibernate driver for use with Windows Azure SQL Database | Robert Daniel Moore's Blog

  30. Pingback: HOW TO: Use Reusable and Configurable Mock for SQL Azure Transient Fault Handling | Aditi Technologies

  31. Pingback: HOW TO: Use Reusable and Configurable Mock for SQL Azure Transient Fault Handling | Aditi Technologies Blog

  32. Pingback: Oscar Meszar | Breeze SQL Database (Azure) and Transient Errors (Part 1)

Leave a Reply