Recently, I’ve been looking at changes that are unfolding for the ways personal users store and manage data, and server applications use, store and manage data. I see some significant trends that radically change the way new types of storage systems are structured.
When we think of storage architectures, we typically separate our thoughts about personal data stored on our local systems, and the big enterprise storage systems used to store all that “mission critical” data. I’m going to walk through some trends that will substantially affect both of these now disparate areas, and the inevitable merger of the systems that store and interact with both types of this data.
In the first installment of this short series, I will discus the changing needs for data in the enterprise, and the growing shift aware from central shared storage systems, which has the potential to radically change the cost of storage.
Enterprise Application Data
Enterprise data systems are designed to reliably and performantly store and retrieve data for applications. Historically, most enterprise application data goes ether into a database or a file store, depending if it is entity data or some sort of document or media object. The storage of these data types is typically on a high end shared storage NAS or SAN device that provides reliable, managed storage.
Application data consistency and reliability
To make it easier for the application developer, databases manage the consistency of the application data entities by providing specific guarantees for correctness and recoverability of the data — using a scheme of ACID transaction properties first described by Jim Gray in the late ’70s. (link: http://en.wikipedia.org/wiki/ACID). ACID promises to provide the following semantics to application developers:
Atomicity. All of the operations in the transaction will complete, or none will.
Consistency. The database will be in a consistent state when the transaction begins and ends.
Isolation. The transaction will behave as if it is the only operation being performed upon the database.
Durability. Upon completion of the transaction, the operation will not be reversed.
These properties have been the hallmark of application data storage and retrieval for the last 30 years, and are the principles by which all major databases operate – including MySQL, Oracle RDBMS, Postgres and other relational transaction systems.
Must we use expensive storage?
Most databases do not themselves make the data reliable, however. That is a task often left to the lower layers of storage, forcing the underlying storage systems to be absolute about guarantees of persistence and super low latency. The premise here is that if we can make the storage 100% reliable and recoverable, then the database can implement consistent and reliable data storage for the application.
Figure.1 Typical SAN for shared storage database
For manageability, we typically also need these systems to be shared (to centralize management) and recoverable (to survive failure of devices and to be able to rewind the clock for those unwanted application data mishaps). As a result, these storage systems are often very expensive, ranging from $2-$15 per gigabyte stored.
Scaling the database
The rigid consistency properties of ACID cause database systems to be hard to scale (up or out across multiple nodes). To maintain a full relation model, the only practical options for scaling are to add more hardware to a single instance of the database, or use a complex shared-storage cluster like Oracle RAC where multiple nodes connect and mediate access to a central shared SAN storage system. Both of these solutions require the storage to be scaled up, and only increase the importance of making the shared storage component super-reliable. The result is further increase in cost and complexity of the storage.
There are a growing number of options for scaling up the database layer. I’ll just touch on it here since while we are discussing the journey to cloud storage, and I’ll discuss these options in another post on database scaling. The main trend that is relevant here is about the changing mode of scaling databases by relaxing one of the three CAP criteria of data. (These were first conjectured by Eric Brewer of UCB, that you can only have at most two of the three properties of Consistency, Availability or Partitionability). Traditional relational databases provide all three properties, resulting in an expensive system. More modern distributed web systems approach database scaling by relaxing one of the three criteria, following a pattern referred to as BASE (basically available, soft state, eventually consistent) where some consistency is traded to allow horizontal scaling.
Caches and data grids replace databases for some (many) types of data in use
The characteristics of many applications we see today increasingly have allowed for a new storage tier between the place where the data is mostly at rest and the layer that is accessing it (i.e. the application runtime). Typically, one or more of the following attributes is present which allows this:
Read-mostly: much of the data is queried repetitively, so the result can be cached
The application can tolerate relaxed consistency: caches may not need to synchronously interlock with update mechanisms in many cases
The data is transient – or it’s life time is not infinite
The data is persisted for performance reasons: it is persisted but can be reconstructed if needed
For these reasons, there is a new set of caching technologies in between the backend store and the application. The first and most popular example is memcached — which provides a simple network based key-value store. Memcache is most commonly used as a cache of results from recent queries to a database, which significantly reduces the load on the database, reducing the need to scale up the backend system. Memcache is typically organized as a write-through cache — so updates are written into both memcache and the backend database, and query logic first checks memcache to see if the result is already available.
The other type of cache that is growing in popularity is a distributed transactional cache. This allows both read and update to go directly to the cache, and updates are destaged by the cache to the backend store. An example of this type of cache is the Gemfire data grid.
Figure.3 Gemfire Data Grid
In another post, I will show some comparisons between the popular caching and data grid engines.
I’m only going to touch on the subject of the attributes, options and architectures of distributed caches here as it pertains to the affect on the overall storage model. I hope to come back and do a deeper comparison of the new types of NoSQL data, and how those accesses map to caching and key-value storage technologies, including Redis, Gemfire and others later.
The main point is that there are new technologies which are increasingly taking advantage of local high performance abstractions (such as memory) for the majority of data-in-use, and then persisting them eventually to a backend store for mostly data-at-rest. A new layer of the hierarchy is appearing — the distributed RAM layer, underneath the application and in front of the data-at-rest service.
Also note that some of the roles of storage are now being separated into these layers. For example, “performance” can be mapped to the layer close to compute (e.g. the caching or local persistence layer), and other key storage capabilities like snapshot/resume are now moving to the lower data-at-rest layer.
Eliminating the need for expensive high performance shared storage
Most of these newer database architectures are specifically designed around the relaxed consistency models — which can allow them to eliminate the need for complex and expensive shared storage architecture. For example, the Casandra database is a scale-out multi-node database which no-longer needs to have shared storage, rather it uses low cost local storage to hold data and has built-in replication between nodes so that can survive a failure of a single host.
The storage systems used for these new database systems are very different to what we’ve seen in the past since they eliminate the need for low level uber-reliable, centralized storage and put the focus on low cost, low latency, high bandwidth local storage in the compute node. We’ll note later that this pattern of simple storage under managed data systems re-emerges consistently, and makes low cost local storage a key storage primitive. This is a significant industry changing event – since it will make traditional SAN an expensive legacy of the past (and in later post’s well see the preduction for NAS too).
Applications use of non-entity storage
While enterprise applications can often store most or all of their data in the database, some applications need to store documents or media, some do-so by embedding that data into their relational model (as binary blob objects) or use a shared NAS system. This data is traditionally a small component of the application’s storage requirements – the priority is typically the transaction throughput capacity and performance of the relational database and underlying storage system.
More modern web applications however have exactly the reverse — here, the majority of storage (measured in bytes) is often that used by media objects, such as images, audio, movies etc. This storage may often start out at low scale being stored in a traditional shared storage system (likely a file system based model, like NFS) — but are increasingly being stored in a large-scale blob oriented cloud-scale storage system.
Figure.4 Growth rate of new vs. legacy data
Figure.4 shows the growth rate of traditional shared SAN storage vs. the amount of storage for all media types. It’s not hard to see that in several years, there will be more blob-style storage managed than traditional entity.
Why not store all my new media in a POSIX file system, like NFS?
The lessons of the last 30 years tell us that a generic abstraction of a hierarchical file system with specific semantics about access can become a universal way of for applications and users to interact with the underlying storage subsystem. The POSIX hierarchical file system as we know it today has become the mainstay of storing and retrieving persistent objects. It provides a namespace that can be used to organize and find the objects, and methods for interacting with the namespace and objects themselves.
The POSIX hierarchical file systems provide fine grained access and specific mandates about access, concurrency, locking. This allows different implementations of file systems to be provided behind a common interface layer, and applications can communicate with different file system implementations without any change to code. There is virtually no restriction on the size, scalability or implementation artifacts of a file system.
So why are the new cloud storage abstractions are becoming so popular — in fact so popular, that you can argue that cloud blob storage is the new utility storage abstraction? The reason is that the stringent constraints of POSIX make the access methods, availability and scale-out of storage a significant challenge – and relaxing these semantics makes it much easier to built a cloud-scale storage solution, that can be accessed from any http connected client.
For example, consider building a NFS storage solution, which mandates semantics about directory access, such that if two applications attempt to create a file with the same name in the same directory, one of the applications will fail. In a single node system, this can be achieved with a simple mutually exclusive lock around the directory operation, but if we want to scale an NFS server to two nodes, then we have a distributed system concurrency problem, requiring the NFS servers on each node to hold distributed locks, requiring complex and hard-to-scale protocols between each server node. If we eliminate the namespace and give each object a unique ID as a name, then this situation can never occur – making it much easier to build a scalable server solution.
The start of a new storage hierarchy
As we can see, the relaxed consistency and many other new data semantics new applications is having a profound affect on the storage hierarchy. I think the common patterns of the future will be based on storage close to the compute, consisting of local disk, RAM and flash. Replication and scaleout will be handled by the upper layers of the data management systems — such as with Casandra or Gemfire. In the future, we can say goodbye to those expensive SAN architectures.
Figure.5 The emerging storage hiearchy
In the next post, I will build the case for cloud storage, covering the economics and architectural foundation for this new type of storage, and predict the some rather controversial futures for existing technologies, including NFS.
I’d like to invite you to comment!