In the past, enterprise storage and data management were driven by the aspiration to be “one-size-fits-all.” These systems were designed to cater to a wide range of workloads and access patterns, and to provide interoperability across platforms by complying with standards such as POSIX. Then came Web 2.0, an era of specialized solutions built by Web 2.0 companies to solve problems that otherwise had no known solutions (especially at scale). The Google File System is my personal favorite example: a distributed filesystem custom-designed for read-dominated workloads with at-least-once update semantics (instead of exactly-once semantics or byte-ordered replica fidelity). In recent years these Web 2.0 technologies have matured into hardened open-source projects with growing mainstream adoption, driven in part by the promise of business differentiation through Big Data analytics.

Somewhere along the way, we also dropped the requirement of interoperability, i.e., POSIX, and essentially opened the floodgates to design permutations that no longer had to deliver a common set of guarantees to the application developer. While specialized “one-size-fits-one” works well for Web 2.0 companies with an army of engineers at their disposal, will a typical SMB customer really end up operating five to eight different data management solutions (deployed on-premise or in the cloud) for their production workloads? The short answer: it eventually boils down to the tradeoff between ease of unified manageability (“one-size-fits-many”) and squeezing the best QoS out of specialized (“one-size-fits-one”) solutions. The rest of this post is the long answer.
Let’s start with the analogy of buying a car: cars vary in mileage, horsepower, seating capacity, transmission, safety features, and so on. In a similar way, enterprise data solutions are designed to provide different guarantees and tradeoffs. Let’s make the discussion concrete by focusing on one such guarantee, data consistency, which I break down into five dimensions:
- Update Ordering: Defines the granularity at which the storage system serializes read and write operations. POSIX itself does not define ordering semantics. There are interesting taxonomy proposals for ordering IO operations at per-object, per-replica, or namespace-wide granularity.
- Read-Write Coherence: Defines the behavior when concurrent read and write operations are issued for the same record. Leslie Lamport defined a classic taxonomy of wait-free coherence with three register semantics: Safe, Regular, and Atomic.
- Write-Write Serialization: Defines how concurrent write-write operations are handled. POSIX today mandates strict mutual-exclusion semantics; relaxed alternatives include Last-Writer-Wins semantics, versioned updates, etc.
- Replica Consistency: Defines whether all replicas have been updated by the time the write is acknowledged; typically divided into strong, eventual, and weak consistency.
- Transaction Guarantees: Typically arise in the context of multi-object operations. Defines whether the storage system supports ACID-like semantics across multiple storage objects. The key aspects are the atomicity of updates and the isolation guarantee (linearizable, serializable, repeatable read, read committed, read uncommitted). Transactions can be further specialized into read-only transactions, etc.
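To make one of these dimensions concrete, here is a toy sketch of the relaxed Last-Writer-Wins alternative to POSIX-style mutual exclusion for write-write serialization. The class and method names are invented for illustration; no real system’s API is implied:

```python
import time

class LWWRegister:
    """Toy Last-Writer-Wins register (hypothetical, for illustration):
    concurrent writes are resolved by comparing timestamps rather than
    by taking a lock, as POSIX-style mutual exclusion would."""

    def __init__(self):
        self.value = None
        self.timestamp = -1.0

    def write(self, value, timestamp=None):
        ts = time.time() if timestamp is None else timestamp
        # A write takes effect only if it is newer than the stored one;
        # a concurrent "older" write is silently discarded.
        if ts > self.timestamp:
            self.value, self.timestamp = value, ts

    def read(self):
        return self.value

# Two writers racing on the same record:
reg = LWWRegister()
reg.write("from-writer-A", timestamp=100.0)
reg.write("from-writer-B", timestamp=99.0)  # older timestamp: write loses
print(reg.read())  # -> from-writer-A
```

Note how the losing write is simply dropped with no lock ever taken: that is precisely the guarantee LWW trades away relative to strict mutual exclusion.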
Back to the car-buying analogy: if you need a sports car that goes really fast, you will probably be willing to compromise on mileage, cost, seating capacity, etc. It essentially comes down to the tradeoffs that are most appropriate for your use-case. To illustrate, I have pictorially represented typical consistency tradeoffs for three use-cases: Publish-Subscribe (similar to Kafka), Batch Oriented (similar to Hadoop), and a relational SQL database. The inherent assumption in the one-size-fits-one world is that a stronger guarantee comes at the expense of one of the observable parameters, namely IOPS, latency, RPO, RTO, $/IOPS, $/GB, IOPS/GB, etc. For instance, if the application can tolerate stale reads for RW coherence, then a system that nevertheless ensures atomic freshness will do so either by incurring higher latency (checking for the latest updates during the read operation) or by continuously polling in the background (generating network chattiness and higher resource usage per IO served, i.e., $/IOPS).
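The freshness-versus-latency tradeoff can be sketched with a toy replicated store in which each replica contacted costs one unit of simulated latency. All names here are hypothetical and the model is deliberately minimal; it only illustrates the tradeoff, not any real system:

```python
class ReplicatedStore:
    """Toy model: a write is acknowledged after updating only `ack` of the
    replicas; reads can trade freshness for latency (replica hops)."""

    def __init__(self, n=3):
        self.replicas = [dict() for _ in range(n)]  # key -> (version, value)
        self.version = 0

    def write(self, key, value, ack=1):
        self.version += 1
        # Acknowledge after updating only `ack` replicas; the rest lag
        # behind until some background anti-entropy catches them up.
        for replica in self.replicas[:ack]:
            replica[key] = (self.version, value)

    def read_eventual(self, key):
        # One replica hop: cheap, but may return a stale (or missing) value.
        entry = self.replicas[-1].get(key)
        return (entry[1] if entry else None), 1  # (value, latency units)

    def read_strong(self, key):
        # Contact every replica and return the freshest copy: always
        # up-to-date, at n times the read latency.
        entries = [r[key] for r in self.replicas if key in r]
        best = max(entries) if entries else None
        return (best[1] if best else None), len(self.replicas)

store = ReplicatedStore(n=3)
store.write("key", "v1", ack=1)      # acknowledged after 1 of 3 replicas
print(store.read_eventual("key"))    # -> (None, 1): cheap but stale
print(store.read_strong("key"))      # -> ('v1', 3): fresh, at 3x the hops
```

The same dial appears in real systems as quorum or acknowledgment settings (e.g., how many replicas must confirm a write before it is acknowledged).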
To sum it up, one-size-fits-one was a conscious choice of the Web 2.0 era, and the phenomenon is now at the cusp of mainstream adoption. For a typical enterprise customer, getting multiple systems up and running and managing their lifecycle and compliance is highly nontrivial, to say the least. Addressing this pain point by turning to the cloud is a lucrative short-term option that typically comes with API lock-in (with some exceptions, such as Google Bigtable exposing the HBase API). For mainstream adoption, the intuition is that unified manageability will trump the need to squeeze out the last drop of QoS. The open question is how much of this tradeoff is feasible, both in terms of technical design and customer adoption.