NoSQL databases are highly scalable and flexible database management systems that store and process unstructured as well as semi-structured data, which relational database management systems (RDBMS) handle poorly.
What was the problem earlier, and how does NoSQL resolve it?
To better understand what NoSQL is, we should compare it with relational databases:
Criteria | Relational Database Management | NoSQL Database Management |
Data model | Tables with fixed schemas | Flexible models (key-value, document, column, graph), often retrieved by partition key |
ACID properties | Strictly followed | No strict adherence |
Scalability | Vertical scalability | Horizontal scalability |
Data manipulation | Using SQL queries executed by the RDBMS | Using object-based APIs |
Velocity of data | Moderate | Very high |
Suitability | Structured data | Structured, semi-structured and unstructured data |
- NoSQL is agile because it neither requires a schema up front nor statically defines the data model
- Instead of tables it uses objects, collections and nested collections
- Deployed across multiple inexpensive commodity servers
- Immediate failover through uni-directional and bi-directional replication of data
- Well suited to big data, cloud, mobile and web technologies
- Trades conventional ACID guarantees for more flexibility and agility
Relational databases were developed many years ago, before the internet, when a database was meant to be deployed on a single big server. With the advent of the internet and the digital economy, this technology fell short of meeting dynamic requirements, and that is when NoSQL systems came into the limelight.
When applications used relational databases, developers had difficulty matching the data structures of the two platforms: the application's in-memory objects and the database's tables. They had to convert in-memory data structures into relational ones to move data to and from the database, which reduced the agility and performance of the systems.
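To make this mismatch concrete, here is a minimal Python sketch. The `order` record and the two implied tables (orders and order items) are hypothetical, chosen only for illustration. It shows how one nested in-memory object has to be split into flat rows for a relational store and reassembled on the way back, whereas a document-style store can take it as a single JSON value.

```python
import json

# Hypothetical in-memory object: one order with nested line items.
order = {
    "order_id": 1001,
    "customer": "alice",
    "items": [
        {"sku": "A-1", "qty": 2},
        {"sku": "B-7", "qty": 1},
    ],
}


def to_relational_rows(obj):
    """Flatten the nested object into rows for two hypothetical tables."""
    orders_row = (obj["order_id"], obj["customer"])
    item_rows = [(obj["order_id"], item["sku"], item["qty"]) for item in obj["items"]]
    return orders_row, item_rows


def from_relational_rows(orders_row, item_rows):
    """Rebuild the nested object from the flat rows (the reverse mapping)."""
    order_id, customer = orders_row
    return {
        "order_id": order_id,
        "customer": customer,
        "items": [{"sku": sku, "qty": qty} for (_, sku, qty) in item_rows],
    }


def to_document(obj):
    """A document store can persist the object as one JSON document."""
    return json.dumps(obj)


orders_row, item_rows = to_relational_rows(order)
document = to_document(order)
```

The two mapping functions are the extra conversion layer the paragraph above describes; the document path needs only serialization.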
Types of NoSQL databases
NoSQL databases come in four main types:
Columnar Databases – Read and write data by column rather than by row. Each column family is comparable to a container of rows in an RDBMS table, where a key identifies a row and a single row has multiple columns. Databases in this category include Accumulo, Cassandra, Druid, HBase and Vertica.
Document Databases – Store and retrieve semi-structured data as documents in formats such as XML and JSON. Some popular document databases, such as MongoDB, provide a rich query language for easy access and smooth evolution of data models. Databases in this category include Apache CouchDB, ArangoDB, BaseX, Clusterpoint, Couchbase, Cosmos DB, IBM Domino, MarkLogic, MongoDB, OrientDB, Qizx and RethinkDB.
Graph Databases – Store data as entities and the relationships between them, which allows fast traversal and join-like operations. Such graphs can, however, be built on SQL as well as NoSQL databases. Databases in this category include AllegroGraph, ArangoDB, InfiniteGraph, Apache Giraph, MarkLogic, Neo4J, OrientDB and Virtuoso.
Key-Value Stores – Suitable for read-heavy and compute-intensive workloads. Many of these databases keep critical data in memory, which improves the performance of the systems. Databases in this category include Aerospike, Apache Ignite, ArangoDB, Berkeley DB, Couchbase, Dynamo, FairCom c-treeACE, FoundationDB, InfinityDB, MemcacheDB, MUMPS, Oracle NoSQL Database, OrientDB, Redis, Riak, SciDB, SDBM/Flat File dbm and ZooKeeper.
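As a rough illustration of how the four models differ, the sketch below represents the same hypothetical user data in each style using plain Python structures. All names (`user:42`, `follows`, and so on) are made up for the example, and real databases add storage, indexing and query layers that are not shown here.

```python
# Key-value: an opaque value looked up by a single key.
key_value_store = {"user:42": '{"name": "Ada", "city": "London"}'}

# Document: the value is a structured document that can be queried by field.
document_store = {
    "users": [
        {"_id": 42, "name": "Ada", "city": "London", "langs": ["en", "fr"]},
    ]
}

# Wide-column: row key -> column family -> columns; rows may have different columns.
wide_column_store = {
    "42": {"profile": {"name": "Ada", "city": "London"}, "stats": {"logins": 7}},
    "43": {"profile": {"name": "Bob"}},
}

# Graph: nodes plus explicit edges (relationships) between them.
graph_nodes = {42: {"name": "Ada"}, 43: {"name": "Bob"}}
graph_edges = [(42, "follows", 43)]


def find_documents(collection, field, value):
    """Query a document collection by a field inside the documents."""
    return [doc for doc in collection if doc.get(field) == value]


def neighbours(node_id, relation):
    """Follow outgoing edges of one relation type from a node."""
    return [dst for (src, rel, dst) in graph_edges if src == node_id and rel == relation]
```

For instance, `find_documents(document_store["users"], "city", "London")` looks inside the stored documents, while the key-value store can only return the whole value for `user:42`.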
The History of Cassandra
Apache Cassandra was developed at Facebook by Avinash Lakshman (one of the authors of Amazon’s Dynamo) and Prashant Malik to power the Inbox Search feature. It was released as an open source project on Google Code in July 2008. In March 2009, it became an Apache Incubator project. On February 17, 2010, it graduated to a top-level project.
Releases after graduation include:
- 0.6, released Apr 12 2010, added support for integrated caching and Apache Hadoop MapReduce
- 0.7, released Jan 08 2011, added secondary indexes and online schema changes
- 0.8, released Jun 2 2011, added the Cassandra Query Language (CQL), self-tuning memtables, and support for zero-downtime upgrades
- 1.0, released Oct 17 2011, added integrated compression, leveled compaction, and improved read performance
- 1.1, released Apr 23 2012, added self-tuning caches, row-level isolation, and support for mixed ssd/spinning disk deployments
- 1.2, released Jan 2 2013, added clustering across virtual nodes, inter-node communication, atomic batches, and request tracing
- 2.0, released Sep 4 2013, added lightweight transactions (based on the Paxos consensus protocol), triggers, improved compactions, CQL paging support, prepared statement support, SELECT column alias support
- 2.1, released Sep 10 2014
- 2.2, released Jul 20 2015
- 3.0, released Nov 11 2015
- 3.1 through 3.10 were monthly releases using a tick-tock-like release model, with even-numbered releases providing both new features and bug fixes and odd-numbered releases providing bug fixes only
- 3.11, released Jun 23 2017 as a stable release series, with bug fixes from the last tick-tock feature release
Apache Cassandra
Apache Cassandra is a free and open-source, distributed, wide-column-store NoSQL database management system designed to handle large amounts of data across many commodity servers, providing high availability with no single point of failure. Cassandra offers robust support for clusters spanning multiple datacenters, with asynchronous masterless replication allowing low-latency operations for all clients.
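The masterless design can be pictured as a ring of peers in which any node can work out which nodes own a given key. The following is a simplified sketch, not Cassandra's actual implementation: the node names are hypothetical, MD5 stands in for the real partitioner, and each node gets a single token rather than many virtual nodes.

```python
import bisect
import hashlib


def token(value: str) -> int:
    """Map a string to a position on the ring (MD5 used only for illustration)."""
    return int(hashlib.md5(value.encode("utf-8")).hexdigest(), 16)


class Ring:
    def __init__(self, nodes, replication_factor):
        if replication_factor > len(nodes):
            raise ValueError("replication factor cannot exceed node count")
        self.replication_factor = replication_factor
        # Sort nodes by token so the ring can be searched with bisect.
        self.entries = sorted((token(n), n) for n in nodes)
        self.tokens = [t for t, _ in self.entries]

    def replicas(self, partition_key: str):
        """Return the nodes that store this key: the owner and its successors."""
        # First node whose token is >= the key's token, wrapping around the ring.
        start = bisect.bisect_left(self.tokens, token(partition_key)) % len(self.entries)
        return [
            self.entries[(start + i) % len(self.entries)][1]
            for i in range(self.replication_factor)
        ]


ring = Ring(["node-a", "node-b", "node-c", "node-d"], replication_factor=3)
owners = ring.replicas("user:42")
```

Because every peer can run the same computation, no dedicated master is needed to route a request, which is one way to avoid a single point of failure.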
Features and Benefits
- Written in Java (Cassandra 1.x: Java 6 or later; Cassandra 2.x: Oracle Java 7 or later)
- Blends the models of Amazon’s Dynamo and Google’s Bigtable
- Flexible column-family data model (a Bigtable feature)
- Decentralized and distributed (a Dynamo feature)
- Peer to Peer Architecture
- Multi-data center replication
- Location Transparent
- Cloud-enabled
- Fault-Tolerant
- No single point of failure
- Designed to handle failures
- Failover happens automatically or manually
- Elastic and Linearly Scalable
- High Performance
- Built-in data compression (Google’s Snappy algorithm)
- Built-in caching layer
- Write optimised
- Tunable Consistency
- Consistency can be chosen per operation, from strong consistency to various levels of eventual consistency; the CAP theorem describes the trade-off among consistency, availability and partition tolerance
- CQL
- Open source (Apache Software Foundation); commercial versions are available from DataStax
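The tunable consistency item in the list above comes down to simple arithmetic on replica counts. As a minimal sketch (the function names are hypothetical and not part of any Cassandra API): with replication factor RF, a quorum is floor(RF / 2) + 1 replicas, and a read is guaranteed to overlap the latest acknowledged write when the number of replicas written plus the number read exceeds RF.

```python
def quorum(replication_factor: int) -> int:
    """Smallest majority of replicas."""
    return replication_factor // 2 + 1


def is_strongly_consistent(write_replicas: int, read_replicas: int,
                           replication_factor: int) -> bool:
    """True when every read set must intersect every write set (R + W > RF)."""
    return write_replicas + read_replicas > replication_factor


rf = 3
# QUORUM writes with QUORUM reads: 2 + 2 > 3, so a read overlaps the latest write.
quorum_ok = is_strongly_consistent(quorum(rf), quorum(rf), rf)
# ONE write with ONE read: 1 + 1 <= 3, so a read may hit a stale replica.
one_one_ok = is_strongly_consistent(1, 1, rf)
```

Lower replica counts reduce latency and keep the system available when nodes are down, at the cost of possibly reading stale data; that is the trade-off being tuned.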
When Not to Use Apache Cassandra
- Lack of full ACID transaction support
- Lack of JOIN support and range queries (search method)
- Lack of enterprise security support (transparent data encryption, Kerberos and LDAP integration not available)
- Anti-use cases:
- Applications handling business-critical data
- Applications requiring transactional commit/rollback capabilities
- Applications requiring granular security features