AppFabric Cache – Compressing at the Client
Introduction
The Azure AppFabric Cache price for a particular cache size includes three quotas, each prorated over an hour: the number of transactions, the network bandwidth measured in bytes, and the number of concurrent connections. The three quotas are bundled into baskets, and each basket corresponds to a total cache size purchased. As the baskets grow in size, so do the transaction, network throughput and concurrent connection limits. See here for pricing details.
It is apparent that if one ‘squeezes’ the data before putting it into the AppFabric Cache, one can fit more objects into the cache for a given usage quota. Interestingly enough, the Azure AppFabric Cache API has a property on the DataCache class called IsCompressionEnabled, which implies that the Azure AppFabric Cache provides this capability out of the box. Upon further inspection with Red Gate .NET Reflector, we find that the IsCompressionEnabled property is a no-op.
But what is the impact of compressing the data before placing it into Azure AppFabric Cache? Compression algorithms are available in the .NET Framework; the tools to ascertain this impact are accessible to any .NET developer.
In this post we look at that very topic. Inquiring minds (customers) have asked our team about it, and I am curious as well. We will take the AdventureWorks database, add some compressed and non-compressed Product and ProductCategory data into cache, and perform the associated retrieval. Measurements will be taken to compare the baseline non-compressed data size and access durations to those of the compressed data. An educated guess is that the total cache used will probably be smaller, but that the overall duration of the operations will be longer due to the overhead of compressing the CLR objects. Let’s find out.
Implementation
Note: These tests are not meant to provide exhaustive coverage, but rather a probing of the feasibility of compression to shave bytes from the data before it is sent to AppFabric Cache.
To keep things simple, static methods were created to perform serialization of the data similar to the paradigms implemented by AppFabric Cache and compress the data using the standard compression algorithms exposed by the DeflateStream class. In short, the CLR object was serialized and then compressed.
Serialization
The Azure AppFabric cache API utilizes the NetDataContractSerializer to serialize any CLR object marked with the Serializable attribute. The XmlDictionaryWriter and XmlDictionaryReader pair will encode/decode serialized data to/from an internal byte stream using binary XML format. The combination of the NetDataContractSerializer and binary XML formatting provides the flexibility of shared types and performance of binary encoding. On the other hand, byte arrays bypass these serialization techniques and are written directly to an internal stream.
Armed with this knowledge, the following static methods were created to serialize and deserialize the objects so as to be able to compare the relative size of encoded objects before and after the bytes are passed to the compression algorithm.
Serialization is performed by the NetDataContractSerializer and encoded in binary XML format using the XmlDictionaryWriter.
// Requires: using System.IO; using System.Runtime.Serialization; using System.Xml;
public static byte[] SerializeXmlBinary(object obj)
{
    using (MemoryStream ms = new MemoryStream())
    {
        using (XmlDictionaryWriter wtr = XmlDictionaryWriter.CreateBinaryWriter(ms))
        {
            NetDataContractSerializer serializer = new NetDataContractSerializer();
            serializer.WriteObject(wtr, obj);
            // Flush the writer so every encoded byte reaches the MemoryStream.
            wtr.Flush();
        }
        // ToArray still works after the writer has been disposed.
        return ms.ToArray();
    }
}
The DeSerializeXmlBinary method below is used to reverse the process and decode the original object. The XmlDictionaryReader is passed the data to be decoded. It is created with XmlDictionaryReaderQuotas.Max, which sets every reader quota to its maximum value, so the read size and node depth of the XML are effectively not limited.
// Requires: using System.Runtime.Serialization; using System.Runtime.Serialization.Formatters; using System.Xml;
public static object DeSerializeXmlBinary(byte[] bytes)
{
    using (XmlDictionaryReader rdr = XmlDictionaryReader.CreateBinaryReader(bytes, XmlDictionaryReaderQuotas.Max))
    {
        NetDataContractSerializer serializer = new NetDataContractSerializer();
        serializer.AssemblyFormat = FormatterAssemblyStyle.Simple;
        return serializer.ReadObject(rdr);
    }
}
Compression
As stated earlier, the DeflateStream class implements the industry-standard combination of the LZ77 algorithm and Huffman coding to provide lossless compression and decompression of the serialized data. It also ships with .NET 4, which greatly simplifies the effort. The code to implement the compression is straightforward and requires very little annotation. The only note is that the object is serialized before it is compressed, and decompressed before it is deserialized. Placing the serialization/deserialization code inside the CompressData/DecompressData methods is for simplification and illustration purposes only. Ideally one would create an extension method for the DataCache class to perform similar operations.
// Requires: using System.IO; using System.IO.Compression;
public static byte[] CompressData(object obj)
{
    byte[] inb = SerializeXmlBinary(obj);
    byte[] outb;
    using (MemoryStream ostream = new MemoryStream())
    {
        // leaveOpen = true keeps ostream usable after the DeflateStream is disposed.
        using (DeflateStream df = new DeflateStream(ostream, CompressionMode.Compress, true))
        {
            df.Write(inb, 0, inb.Length);
        }
        outb = ostream.ToArray();
    }
    return outb;
}

public static object DecompressData(byte[] inb)
{
    byte[] outb;
    using (MemoryStream istream = new MemoryStream(inb))
    {
        using (MemoryStream ostream = new MemoryStream())
        {
            using (DeflateStream sr = new DeflateStream(istream, CompressionMode.Decompress))
            {
                sr.CopyTo(ostream);
            }
            outb = ostream.ToArray();
        }
    }
    return DeSerializeXmlBinary(outb);
}
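The extension-method idea mentioned above might look roughly like the sketch below. It is only an illustration under stated assumptions: IByteCache and InMemoryByteCache are hypothetical stand-ins for the cache client so that the code runs without a cache service, PutCompressed and GetCompressed are made-up names, and the payload is assumed to be already serialized to a byte array (for example by SerializeXmlBinary).

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.IO.Compression;

// Hypothetical stand-in for the cache client; only the two calls used here.
public interface IByteCache
{
    void Put(string key, byte[] value);
    byte[] Get(string key);
}

// In-memory implementation so the sketch runs without a cache service.
public sealed class InMemoryByteCache : IByteCache
{
    private readonly Dictionary<string, byte[]> store = new Dictionary<string, byte[]>();

    public void Put(string key, byte[] value) { store[key] = value; }

    public byte[] Get(string key)
    {
        byte[] value;
        return store.TryGetValue(key, out value) ? value : null;
    }
}

public static class CompressedCacheExtensions
{
    // Deflates the already-serialized bytes and stores the result under the key.
    public static void PutCompressed(this IByteCache cache, string key, byte[] serialized)
    {
        using (var output = new MemoryStream())
        {
            using (var deflate = new DeflateStream(output, CompressionMode.Compress, true))
            {
                deflate.Write(serialized, 0, serialized.Length);
            }
            cache.Put(key, output.ToArray());
        }
    }

    // Reads the stored bytes and inflates them; returns null when the key is absent.
    public static byte[] GetCompressed(this IByteCache cache, string key)
    {
        byte[] stored = cache.Get(key);
        if (stored == null) return null;

        using (var input = new MemoryStream(stored))
        using (var deflate = new DeflateStream(input, CompressionMode.Decompress))
        using (var output = new MemoryStream())
        {
            deflate.CopyTo(output);
            return output.ToArray();
        }
    }
}
```

Keeping the extension methods on byte arrays means the caller decides how objects are serialized, and the cache only ever sees a byte array.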
Data/Model
The next challenge was to ‘populate’ some CLR objects that mimic a real-world scenario. For this purpose the readily available AdventureWorks Community Sample Database for SQL Azure was used. A number of foreign keys were created so that a query for a Product can easily associate and return the corresponding ProductDescription and ProductModel data. In this way it was possible to fashion CLR objects of varying size by including or excluding associated entities from the Product instance. ProductDescription was included in the test because it contains a relatively large amount of textual data, and therefore a higher compression ratio is likely. The ProductCategory entity is relatively small, and its inclusion gives a more rounded mix of data sizes to compare.
Figure 1 Data Model
The pre-populated database was created on a SQL Azure instance. An Entity Framework data model was generated from the database to ease the burden of writing data access code and materializing objects for serialization and compression. In fact all Entity Framework entity types are marked Serializable, thus they meet the sole requirement for an object to be added to Azure AppFabric Cache. LINQ to Entities queries were created to retrieve data from the SQL Azure database. There is one little catch: make sure that lazy loading is not enabled on the ObjectContext, so that unexpected data is not serialized. See this article for more information.
Test
A LoadTest project was created in Visual Studio 2010 to capture and record the number of bytes for a compressed and non-compressed object after accessing either the database or AppFabric Cache. As a preparation phase, the LoadTest project was run in an on-premises Hyper-V instance and verified to be operational against the Azure cloud services. The base Hyper-V instance was created and prepared per the guidelines set in Getting Started with Developing a Service Image for a VM Role. The prepared base VHD was then uploaded to the same Windows Azure data center as the SQL Azure and AppFabric Cache services used for the tests. This gives the following topology.
Figure 2 Test Environment
Test Cases
All tests were run in a single thread of operation. The time to retrieve (from SQL Azure or AppFabric Cache) or put (into AppFabric Cache) was measured together with a calculation of the object size in bytes as serialized by the NetDataContractSerializer and compressed by the DeflateStream class. The following tests were run.
1. Retrieve Product entity from SQL Azure. Do this for all Products.
2. Retrieve Product entity from SQL Azure including the ProductModel. Do this for all Products.
3. Retrieve Product entity from SQL Azure including the ProductModel and ProductDescription. Do this for all Products.
4. Add all Products in steps 1-3 into AppFabric Cache.
5. Add all Products in steps 1-3 into AppFabric Cache after compression.
6. Repeat steps 1-5 with ProductCategory objects.
7. Get all Products and ProductCategory objects added in steps 5-6. Decompress as appropriate.
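The kind of measurement described above, a duration plus a byte count per operation, can be sketched as follows. This is not the LoadTest project used for the post. Measurement and Harness.Measure are hypothetical names, and the operation is assumed to return the bytes whose size should be recorded.

```csharp
using System;
using System.Diagnostics;

// One recorded observation: which test case, how long it took, how many bytes.
public sealed class Measurement
{
    public string TestCase { get; set; }
    public double Milliseconds { get; set; }
    public int Bytes { get; set; }
}

public static class Harness
{
    // Times a single operation on the calling thread and records the size
    // of the byte array it returns.
    public static Measurement Measure(string testCase, Func<byte[]> operation)
    {
        Stopwatch watch = Stopwatch.StartNew();
        byte[] payload = operation();
        watch.Stop();

        return new Measurement
        {
            TestCase = testCase,
            Milliseconds = watch.Elapsed.TotalMilliseconds,
            Bytes = payload.Length
        };
    }
}
```

A call such as Harness.Measure("Product", () => CompressData(product)) would then time the serialization and compression of one object, assuming CompressData and a product instance are in scope.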
Results
The test cases were run 3 times. The values were grouped by test case. The average duration in milliseconds and average bytes were computed. Results may vary depending upon the activity in the Windows Azure data center. The keys for the tables are as follows.
- ProdCat: A ProductCategory object
- Product: A Product object
- ProdMode: A Product object which includes the ProductModel
- ProdDes: A Product object which includes ProductModel and ProductDescriptions
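Grouping the recorded values by test case and averaging them can be expressed with LINQ, as in this minimal sketch. Sample, Summary and ResultSummary are hypothetical types introduced only for illustration; they are not taken from the test project.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// One raw observation from a test run.
public sealed class Sample
{
    public string TestCase { get; set; }
    public double Milliseconds { get; set; }
    public int Bytes { get; set; }
}

// Averages for all observations that share a test case key.
public sealed class Summary
{
    public string TestCase { get; set; }
    public double AverageMilliseconds { get; set; }
    public double AverageBytes { get; set; }
}

public static class ResultSummary
{
    // Groups samples by test case and computes the mean duration and size.
    public static List<Summary> Summarize(IEnumerable<Sample> samples)
    {
        return samples
            .GroupBy(s => s.TestCase)
            .Select(g => new Summary
            {
                TestCase = g.Key,
                AverageMilliseconds = g.Average(s => s.Milliseconds),
                AverageBytes = g.Average(s => s.Bytes)
            })
            .ToList();
    }
}
```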
Data Compression
Figure 3 shows the size in bytes after object serialization for both the compressed and non-compressed data. As expected, the advantage of compressing the data grows as the size of the object increases. The ProductDescriptions entries in the database have large text fields and therefore compress quite well.
Figure 3 Byte Size of Object after Serialization
Time to ‘Get’ an Object
Figure 4 displays the time to retrieve an object from the AppFabric Cache. The AppFabric SDK call executed was DataCache.Get. The chart shows that retrieval performance is similar between the compressed and non-compressed data transfer.
Figure 4 Time to Get an Object from Cache
Time to ‘Add’ an Object
Figure 5 displays the time to put an object of a particular type into the AppFabric Cache. The call made was DataCache.Add. From the chart it is apparent that the numbers are quite comparable.
Figure 5 Time to Add an Object to Cache
Conclusion
The tests surprised me because I expected the data compression to have a more negative impact on the overall time to add and get objects from cache. Apparently, transferring less data over the wire offsets the additional weight imposed by the compression algorithm. In some cases there was a 4x compression ratio with tolerable response differences, but at the expense of CPU on the client side. Keep in mind that text, such as the product descriptions in this write-up, compresses quite well, while image or binary data will not. Your mileage will vary, so test first to determine if compression makes sense for your unique application. Overall, I would say that more comprehensive tests are required before a general endorsement for this technique can be made, but the results look promising.
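A quick way to check how compressible a given payload is before committing to this technique is to compare its size before and after DeflateStream. The sketch below makes two assumptions for illustration only: the repeated sentence is invented sample text, not AdventureWorks data, and seeded random bytes stand in for image or already-compressed binary data. CompressibilityCheck and its members are hypothetical names.

```csharp
using System;
using System.IO;
using System.IO.Compression;
using System.Text;

public static class CompressibilityCheck
{
    // Returns the number of bytes produced by deflating the input.
    public static int DeflatedLength(byte[] input)
    {
        using (var output = new MemoryStream())
        {
            using (var deflate = new DeflateStream(output, CompressionMode.Compress, true))
            {
                deflate.Write(input, 0, input.Length);
            }
            return (int)output.Length;
        }
    }

    // Invented, highly repetitive text standing in for descriptive fields.
    public static byte[] RepetitiveText()
    {
        var builder = new StringBuilder();
        for (int i = 0; i < 200; i++)
        {
            builder.Append("Lightweight aluminum frame with a comfortable saddle. ");
        }
        return Encoding.UTF8.GetBytes(builder.ToString());
    }

    // Seeded random bytes standing in for image or already-compressed data.
    public static byte[] RandomBytes(int count)
    {
        var buffer = new byte[count];
        new Random(42).NextBytes(buffer);
        return buffer;
    }
}
```

Comparing DeflatedLength(RepetitiveText()) with DeflatedLength(RandomBytes(n)) for an input of the same length n gives a rough feel for the difference; the exact numbers depend on the .NET version in use.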
Reviewers : Jaime Alva Bravo, Rama Raman, Curt Peterson
3 Comments
Great work! The serialization and compression technique was extremely helpful to our application. We were able to fix the ‘Out Of memory’ exceptions caused by the default XML serialization done by WCF while performing Put and Get operations on the DataCache object. Also worth considering is implementing the DataCacheFactory as a singleton, or at least throttling/minimizing the number of instances created at runtime. This would help in preventing the ‘AppFabric memory low’ errors.
Thanks.
Would using isCompressionEnabled=”true” make any difference to these tests? e.g.