Wednesday, March 10, 2010

NCacheD (Part 1)


NCacheD (pronounced N-cache-D) is live on CodePlex at http://NCacheD.codeplex.com. The original intent of this project was to create a very simple, bare-bones distributed cache system: an exploration into how such a system works and how it could be implemented in .NET, and a skeleton for a more powerful cache system in the future.

As far as distributed cache systems go, Memcached and Velocity are the main options for .NET, and NCacheD is intended to be another alternative. Memcached is known for its raw speed, while Velocity (now called AppFabric caching, I believe) seems aimed more at cloud computing (this might be inaccurate, but at this time it's still only a release candidate).

Memcached gets its speed from being small and written in C. That has its advantages for sure, but for a .NET shop and .NET developers it can also be problematic. It can be hard for non-C developers to understand exactly what's going on in C code, and even harder to modify it to fit your specific needs. If you want to run the code as-is, I think this is fine, but if you want to integrate it into your system, and especially into your code base, you want to use the same platform.

When I started thinking about how a distributed cache system must work, I saw two important hurdles to overcome: the cache server must know the size of the objects it's holding so that the total size can be controlled and paging kept to a minimum (we want to keep it all in RAM), and different clients must communicate with the same server for the same key. If client 1 stores an object with key "products2010" on server 1 and client 2 stores the same object under the same key on server 2, then not only are we wasting space, but we run into serious problems when client 1 updates the cache and client 2 is unaware of any changes to "products2010".

Determining the size of an object in C# is not a trivial task. Because it is all managed code, there is no way of counting the bytes on the stack or heap. But I realized that determining the size of the objects in a distributed cache system would be pretty easy. In order to send the objects over the wire, they must first be serialized, and the length of the resulting byte array is, in effect, the "size" of the object. I could also store the object as a byte array on the server, since the server never cares about the actual type of the object.
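
To illustrate the idea, here's a rough sketch. It's not NCacheD's actual code, and the post doesn't say which serializer the project uses, so BinaryFormatter and the class/method names here are just illustrative assumptions:

```csharp
// Sketch of the "size by serialization" idea. BinaryFormatter and these names
// are assumptions for illustration, not NCacheD's actual implementation.
using System.IO;
using System.Runtime.Serialization.Formatters.Binary;

public static class CacheSerializer
{
    // Serialize the object into the byte array that would be sent over the wire.
    // The object must be marked [Serializable] for this to work.
    public static byte[] ToBytes(object value)
    {
        var formatter = new BinaryFormatter();
        using (var stream = new MemoryStream())
        {
            formatter.Serialize(stream, value);
            return stream.ToArray();
        }
    }

    // The array's length doubles as the item's "size" for capacity accounting,
    // and the array itself is what the server stores; the server never needs
    // to know the real type.
    public static int SizeOf(object value)
    {
        return ToBytes(value).Length;
    }
}
```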

Ensuring that every cached item's key goes to the same server every time was also not too difficult. NCacheD uses a pretty simple method: it checks the total available size of each server (assuming all clients are configured with the same set of servers), "weighs" the servers by their sizes, and builds a small lookup table in which each server is given slots in proportion to its weight. When a request needs to determine the server for a key, the key is hashed, the hash is taken modulo the table size, and the result is used to pick the particular server.
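
A rough sketch of that selection logic might look like the following. The slot count, the use of GetHashCode (a real implementation would want a stable hash that every client computes identically), and all names here are assumptions, not NCacheD's actual code:

```csharp
// Sketch of weighted server selection: build a lookup table sized by each
// server's capacity, then map a key to a slot by hash modulo table size.
using System;
using System.Collections.Generic;

public class ServerSelector
{
    private readonly List<string> _slots = new List<string>();

    // capacities: server address -> available cache size in bytes.
    public ServerSelector(IDictionary<string, long> capacities)
    {
        long total = 0;
        foreach (var size in capacities.Values) total += size;

        const int totalSlots = 100; // small fixed table; assumed value
        foreach (var server in capacities)
        {
            // Each server gets slots in proportion to its capacity ("weight").
            int slots = (int)Math.Max(1L, totalSlots * server.Value / total);
            for (int i = 0; i < slots; i++)
                _slots.Add(server.Key);
        }
    }

    // Hash the key, take it modulo the table size, and use the result to pick
    // a server. Clients that share the same server list and capacities build
    // the same table, so a given key always maps to the same server.
    public string ServerFor(string key)
    {
        int index = (key.GetHashCode() & 0x7FFFFFFF) % _slots.Count;
        return _slots[index];
    }
}
```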

With these two parts figured out, the next step was to create a simple cache class for the server (which turned out to be a wrapper around a ConcurrentDictionary<>) and a client/server networking layer, for which I used WCF to keep things simple. The rest was plumbing, greatly simplified by the TPL (Task Parallel Library).
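
To give a feel for the server-side cache, here's a minimal sketch of a wrapper around ConcurrentDictionary<> that stores raw byte arrays and tracks the total size. The member names and the locking strategy are assumptions for illustration, not NCacheD's actual implementation (and no eviction is shown):

```csharp
// Minimal sketch: byte-array cache wrapping ConcurrentDictionary<> with a
// running size total. Names and details are illustrative assumptions.
using System.Collections.Concurrent;
using System.Threading;

public class ByteCache
{
    private readonly ConcurrentDictionary<string, byte[]> _items =
        new ConcurrentDictionary<string, byte[]>();
    private readonly object _writeLock = new object();
    private long _totalBytes;

    // Total size of all cached values, usable for capacity checks/eviction.
    public long TotalBytes { get { return Interlocked.Read(ref _totalBytes); } }

    // Reads go straight to the ConcurrentDictionary; no locking needed.
    public byte[] Get(string key)
    {
        byte[] value;
        return _items.TryGetValue(key, out value) ? value : null;
    }

    // Writes are serialized with a lock so the size total stays consistent.
    public void Set(string key, byte[] value)
    {
        lock (_writeLock)
        {
            byte[] old;
            if (_items.TryGetValue(key, out old))
                Interlocked.Add(ref _totalBytes, -old.LongLength);
            _items[key] = value;
            Interlocked.Add(ref _totalBytes, value.LongLength);
        }
    }

    public bool Remove(string key)
    {
        lock (_writeLock)
        {
            byte[] removed;
            if (!_items.TryRemove(key, out removed)) return false;
            Interlocked.Add(ref _totalBytes, -removed.LongLength);
            return true;
        }
    }
}
```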

Next time I'll talk more about how the server is determined and the way the cache itself works.