Consistency, Availability and Partition Tolerance:
Why CAP is important –
One can only guarantee two out of three: Consistency, Availability and Partition Tolerance. This is real and evidenced by the most successful websites. If it works for them, there is no reason why the same trade-offs shouldn't be considered in everyday design in corporate environments.
If the business explicitly doesn't want to scale then fine, simpler solutions are available, but it's a conversation worth having. In any case these discussions will be about appropriate designs for specific operations.
Examples –
What makes the following two products great examples is that they are modern designs and implementations of distributed, shared-data systems, but with two different philosophies regarding CAP:
Google BigTable is a CA system: it is strongly consistent and highly available, but can be unavailable under network partitions. BigTable has no replication at the database level; rather, replication is handled underneath by GFS. (Any comments on Google Fusion Tables?)
Amazon Dynamo is an AP system: it is highly available, even under network partitions, but eventually consistent. Data is replicated within a single cluster, so even under partitions most data is available. However, one node’s latest version might not match that of another, so every reader is only guaranteed to see every write eventually.
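To make "eventually consistent" concrete, here is a minimal sketch of two replicas that diverge during a partition and converge once they can talk again. Everything in it is an assumption for illustration: the Replica class, the sync function and the plain integer version are hypothetical and are not Dynamo's actual API or its conflict-resolution mechanism.

```python
class Replica:
    """One node's local copy of the data: key -> (version, value)."""

    def __init__(self, name):
        self.name = name
        self.store = {}

    def write(self, key, value, version):
        # Accept the write only if it is newer than what this node holds.
        current = self.store.get(key)
        if current is None or version > current[0]:
            self.store[key] = (version, value)

    def read(self, key):
        entry = self.store.get(key)
        return entry[1] if entry else None


def sync(a, b):
    """Exchange entries so both replicas end up with the newest version of each key."""
    for key in set(a.store) | set(b.store):
        for src, dst in ((a, b), (b, a)):
            if key in src.store:
                version, value = src.store[key]
                dst.write(key, value, version)


node_a, node_b = Replica("a"), Replica("b")
node_a.write("cart", ["book"], version=1)
sync(node_a, node_b)                 # both nodes agree

# Partition: only node_a sees the next update, yet node_b keeps serving reads.
node_a.write("cart", ["book", "pen"], version=2)
stale = node_b.read("cart")          # old value, but the system stayed available

sync(node_a, node_b)                 # partition heals
fresh = node_b.read("cart")          # the write is now visible everywhere
```

The point of the sketch is the trade-off itself: node_b answers during the partition (availability) at the cost of returning an older version until the replicas synchronise.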
Original Presentation by Dr. Eric A. Brewer:
CAP Theorem (Good explanation):
Possible CAP Solution?:
Comments?
Applications that need to avoid collisions must generate unique timestamps themselves to achieve consistency. Different versions of a cell are stored in decreasing timestamp order, so that the most recent versions can be read.
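A rough sketch of that versioning idea, for illustration only: the Cell class, its max_versions limit and the counter-based timestamp source are hypothetical names, not BigTable's API. Rejecting a duplicate timestamp with an error is a choice made by this sketch.

```python
import itertools


class Cell:
    """A single cell holding several timestamped versions, newest first."""

    def __init__(self, max_versions=2):
        self.max_versions = max_versions   # assumed retention limit
        self.versions = []                 # list of (timestamp, value)

    def put(self, timestamp, value):
        # The application is responsible for supplying a unique timestamp.
        if any(ts == timestamp for ts, _ in self.versions):
            raise ValueError("timestamp collision: %r" % timestamp)
        self.versions.append((timestamp, value))
        # Keep versions in decreasing timestamp order, then drop the oldest.
        self.versions.sort(key=lambda v: v[0], reverse=True)
        del self.versions[self.max_versions:]

    def latest(self):
        return self.versions[0][1] if self.versions else None


# Hypothetical application-side source of unique, increasing timestamps.
next_ts = itertools.count(1).__next__

cell = Cell(max_versions=2)
for value in ("v1", "v2", "v3"):
    cell.put(next_ts(), value)
```

Because the list is kept newest first, reading the most recent version is just a look at the first entry, and older versions fall off the end once the limit is reached.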
Chubby lock management is used to stay coordinated; it is clustered but still a single point of failure because of the network.
GFS imparts all kinds of performance and availability advantages without costly RAID arrays; that’s how Google can build a big database at very small expense and still scale to petabytes and thousands of servers. This is where Google wins over Amazon, as versions are consistent throughout the cluster and only the 2 latest versions are kept.
I also read something about these databases being forever; I don’t know what Google is planning.
Oracle still wins among commercial products, being more robust over time, but it does not use the local disk type. They are also building a non-RDBMS, which might rule the world!!