Wednesday, March 10, 2010

NCacheD (Part 1)

 

NCacheD (pronounced N-cache-D) is live on CodePlex at http://NCacheD.codeplex.com. The original intent of this project was to create a very simple, bare bones distributed cache system. It was meant to be an exploration into how such a system works and how it could be implemented in .NET. And to provide a skeleton for a more powerful cache system in the future.

As far as distributed cache systems go, Memcached and Velocity are the main options in .NET and NCacheD is intended to be another alternative. Memcached is known for it's raw speed and Velocity (or I think it's now called AppFabric caching) seems more intended for cloud computing (this might be inaccurate but at this time it's still only RC.)

Memcached gets it's speed from being small and being written in C. That has it's advantages for sure but for a .NET shop and .NET developers it can also be problematic. It can be hard for non-C developers to understand exactly what's going on with C code. And it can be even harder to modify and fit to your specific needs. If you want to run the code as-is, I think this is fine but if you want to integrate it into your system and especially into your code base, you want to use the same platform.

When I started thinking about how a distributed cache system must work I saw two important pieces or hurdles to overcome: the cache server must know the size of the objects it's holding so that the total size can be controlled to ensure as little paging as possible (we want to keep it all in RAM), and it must ensure that different clients will communicate with the same server for the same key value. If client 1 has an object with key "products2010" on server 1 and client two has the same object with key "products2010" then not only are we wasting space, but we run into serious problems when client 1 updates the cache and client 2 is unaware of any changes to "products2010".

Determining the size of an object in C# is not a trivial task. Because it is all managed code, there is no way of counting the bytes on the stack or heap. But I realized that determining the size of the objects in a distributed cache system would be pretty easy. In order to send the objects over the wire, they must first be serialized. And the length of the resultant byte array was in effect, the "size" of the object. I could also store the object as a byte array on the server since the serve would never care about the actual type of the object.

Ensuring that every cached item's key goes to the same server every time was also not too difficult. NCacheD uses a pretty simple method for doing this. It will check the total available size of each server (assuming all clients use all the same servers) and will "weigh" the servers by their sizes. It will then create a small lookup table of servers giving out slots based on weight. And when a request is made to determine the server by the key, the key is hashed, modulo, and used to determine the particular server.

With these two parts figured out, the next step was to create a simple cache class for the server (which turned out to be a wrapper around a ConcurrentDictionary<>), a client/server networking system for which I used WCF to keep it simple, and then the rest was plumbing which was greatly simplified by the TPL (Task Parallel Library.)

Next time I'll talk more about how the server is determined and the way the cache itself works.

Thursday, February 18, 2010

Consolidation, the Lost Art

While working with the Google AdWords API, I discovered a forgotten principle (or lost art) of software engineering. It is called “Consolidation”. This principle, I believe, must have somehow fallen through the cracks when people started talking about DRY, KISS, SRP, etc. But I believe it will revolutionize the way we write programs.

Up till now it was commonly thought that breaking logic apart into easily manageable pieces was a good way to build software. “Divide and conquer,” we said. We thought that having well-named, meaningful blocks of code was the best approach to take for creating flexible, maintainable applications. We always thought that two very simple lines of code would always be better than six that did the same thing.

We thought wrong.

The following description of “Consolidation” is taken from http://code.google.com/apis/adwords/articles/migrating.html#consolidation

image

Monday, January 25, 2010

Inspiration

 

"the first chess program put into practice was designed by legendary British mathematician Alan Turing in 1952, and he didn't even have a computer! He processed the algorithm on pieces of paper and this "paper machine" played a competent game." - Garry Kasparov