Wire Speed Storage | Dataram Blog

~written by Jason Caulkins, Dataram Storage Blog Team Member

In my last blog I discussed how storage would evolve into something that looks a lot like main memory today. The challenge of storing data inside a server (for use in a scale-out datacenter, for example) is that you need to maintain data coherency, availability, performance and scalability despite the inevitable hardware, communications and human errors (oh, yeah, and don’t forget cost and usability).

This is a very tall order.

So, while we cannot solve this equation today, we can at least break it down into some byte-sized bits.

Let’s start with coherence and performance. These are nearly mutually exclusive in large, geographically dispersed environments. The issue is really about latency. In addition to the latencies associated with connection speeds between remote sites, there are other, programmatic latencies to deal with. These are associated with ensuring atomicity.

Today, a file or block of data is locked when accessed by a node or application, to avoid conflicting changes and data corruption if another node or application tries to access the same data at the same time. While this ensures coherency, the performance of the waiting node can suffer greatly, depending on how long the first application holds the lock and on the efficiency of the locking mechanism itself. In the worst cases, you can end up with locks that are never released (a deadlock), which is a bad thing.

A better approach is to implement a journaling system where both nodes or applications are allowed to view/modify the data at the same time, but the system keeps track of the changes so that another process can check for collisions and roll back the changes as required. The idea here is that collisions are rare, so why pay the expensive locking penalty for every transaction? The downside of the journaling system is that it takes up storage space keeping track of redundant data changes. However, all one needs to do is compare the price of capacity vs. the price/benefit of performance. For critical transactions (stock trades, financial transactions), the collision detection/repair mechanism must find and fix collisions before a transaction can be committed, adding yet another variable to the equation.
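To make the journaling idea concrete, here is a minimal sketch in Python. It is only an illustration under stated assumptions: the names `JournaledStore`, `JournalEntry` and `reconcile` are hypothetical, versions are simple counters, and "roll back" just means a colliding change is rejected and handed back to the caller. A real system would also have to deal with durability, ordering across sites and retries.

```python
from dataclasses import dataclass, field


@dataclass
class JournalEntry:
    writer: str        # node or application that made the change
    key: str           # block or file being modified
    base_version: int  # version the writer read before modifying
    new_value: str


@dataclass
class JournaledStore:
    # Hypothetical in-memory store: key -> (version, value)
    data: dict = field(default_factory=dict)
    # Pending changes that have not been checked for collisions yet
    journal: list = field(default_factory=list)

    def read(self, key):
        """Return (version, value) without taking any lock."""
        return self.data.get(key, (0, None))

    def write(self, writer, key, base_version, new_value):
        """Record the intended change in the journal; nobody is blocked."""
        self.journal.append(JournalEntry(writer, key, base_version, new_value))

    def reconcile(self):
        """Apply journaled changes in order and reject any that collide.

        A change collides when the data moved on after the writer read it.
        Returns the list of rejected (rolled back) entries.
        """
        rolled_back = []
        for entry in self.journal:
            current_version, _ = self.read(entry.key)
            if entry.base_version == current_version:
                self.data[entry.key] = (current_version + 1, entry.new_value)
            else:
                rolled_back.append(entry)
        self.journal.clear()
        return rolled_back


store = JournaledStore()
store.data["block-7"] = (1, "original")

# Both nodes read the same version and modify it at the same time.
version_a, _ = store.read("block-7")
version_b, _ = store.read("block-7")
store.write("node-a", "block-7", version_a, "from A")
store.write("node-b", "block-7", version_b, "from B")

rejected = store.reconcile()  # node-a's change is applied, node-b's is rejected
```

Neither writer waits on the other. The cost shows up as the extra journal storage and as the reconcile step, which for critical transactions would have to run before the commit is acknowledged.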

The fundamental issue here is tradeoff. Physics and economics are at odds. There is no free lunch. To get low latency, you have to pay for things like solid state and very fast connections between sites. These systems have great price/performance, but their price/GB is much higher. If you just need bulk, local storage, then slow, cheap, high-capacity mechanical drives provide the best price/GB, but they have terrible price/performance.
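The two metrics are easy to compare with a little arithmetic. The sketch below is illustrative only: the prices, capacities and IOPS figures are made-up round numbers chosen to show the shape of the tradeoff, not measurements of any real product, and `cost_metrics` is a hypothetical helper.

```python
def cost_metrics(price_usd, capacity_gb, iops):
    """Return price per GB and price per IOPS for one device."""
    return {
        "usd_per_gb": price_usd / capacity_gb,
        "usd_per_iops": price_usd / iops,
    }


# Hypothetical devices; every figure here is an assumption for illustration.
solid_state = cost_metrics(price_usd=400, capacity_gb=1_000, iops=100_000)
mechanical = cost_metrics(price_usd=200, capacity_gb=8_000, iops=200)

for name, metrics in (("solid state", solid_state), ("mechanical", mechanical)):
    print(f"{name}: ${metrics['usd_per_gb']:.3f}/GB, "
          f"${metrics['usd_per_iops']:.3f}/IOPS")
```

With these assumed numbers the solid state device costs far more per GB but far less per IOPS, which is the tension described above.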

As long as there are different storage workloads and different fundamental storage technologies, it makes sense to tailor the storage system to the workload.

To further muddy the waters, storage workloads are vastly different in the same compute environment. Even the same applications change their workload characteristics depending on the number of users, the amount of non-storage system resources, and even on the version of the application being used.

So the trick to solving this problem is to create a storage environment that has all the characteristics required to address the various workload, performance and economic challenges presented by the applications. This means that an advanced storage infrastructure must have elements with strong price/performance as well as elements with low price per capacity. It must be intelligent enough to assign resources dynamically as the workloads demand. It must also support adding performance elements and capacity elements modularly, not only on demand but predictively, so that the system always provides just the right price/performance for a dynamic application workload driven by ever-changing business needs.
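One very simple way to picture "assigning resources as the workloads demand" is a placement rule based on access density (IOPS per GB). The sketch below is an assumption-laden toy: the `Workload` type, the `choose_tier` function, the two tier names and the threshold value are all hypothetical, and it only reacts to the current workload rather than predicting future demand.

```python
from dataclasses import dataclass


@dataclass
class Workload:
    name: str
    capacity_gb: float  # how much data the workload holds
    iops: float         # how hard it currently hits that data


# Hypothetical cutoff: above this many IOPS per GB, pay for performance.
DENSITY_THRESHOLD = 1.0


def choose_tier(workload, threshold=DENSITY_THRESHOLD):
    """Pick the performance tier for dense access, the capacity tier otherwise."""
    density = workload.iops / workload.capacity_gb
    return "performance" if density >= threshold else "capacity"


def place(workloads):
    """Return a mapping of workload name to its currently assigned tier."""
    return {w.name: choose_tier(w) for w in workloads}


orders = Workload("orders-db", capacity_gb=500, iops=20_000)
archive = Workload("archive", capacity_gb=50_000, iops=100)

before = place([orders, archive])

# The same application can change character; re-run placement when it does.
archive.iops = 100_000
after = place([orders, archive])
```

Here `archive` starts on the capacity tier and moves to the performance tier once its access density rises. A predictive system would make that move, and order the extra performance elements, before the demand arrived.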