What is NoSQL?

Samit Baghla

6 years ago

NoSQL databases are highly scalable, flexible database management systems that let you store and process unstructured as well as semi-structured data, which traditional RDBMS tools handle poorly.

What was the problem earlier, and how does NoSQL resolve it?

To better understand what NoSQL is, we should compare it with relational databases:

Criteria	Relational Database Management	NoSQL Database Management
Data model	Tables with predefined schemas	Flexible models (key-value, document, column-family, graph), with data typically retrieved by partition key
ACID properties	Strictly followed	No strict adherence
Scalability	Vertical scalability	Horizontal scalability
Data manipulation	Using SQL queries executed by the RDBMS	Using object-based APIs or database-specific query languages
Velocity of data	Moderate	Very high
Suitability	Structured data	Structured, semi-structured and unstructured data

NoSQL is agile because it neither requires predefined schemas nor statically defines data models
Instead of tables, it uses objects, collections and nested collections
Deployed across multiple inexpensive commodity (typically Intel-based) servers
Immediate failover through uni-directional and bi-directional replication of data
Designed to work with big data, cloud, mobile and web technologies
Trades conventional ACID properties for more flexibility and agility

Relational databases were developed many years ago, before the internet, when systems were meant to be deployed on a single large server. With the advent of the internet and the digital economy, this technology fell short of meeting dynamic requirements, and that is when NoSQL systems came into the limelight.

When applications first used relational databases, developers found it difficult to match the data structures supported by the two platforms. They had to convert in-memory data structures into relational ones in order to transfer data to and from the database. This conversion considerably reduced the agility and performance of the systems.
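The conversion described above can be illustrated with a small sketch. The `order` object, its field names and the two target "tables" are hypothetical and exist only for illustration; the point is that a nested in-memory object has to be split into flat rows for a relational store, while a document-style store can keep its shape.

```python
import json

# Hypothetical in-memory object, as an application might hold it.
order = {
    "order_id": 1001,
    "customer": {"name": "Asha", "city": "Pune"},
    "items": [
        {"sku": "A-1", "qty": 2},
        {"sku": "B-7", "qty": 1},
    ],
}


def to_relational_rows(order):
    """Flatten the nested object into rows for two hypothetical tables."""
    order_row = (
        order["order_id"],
        order["customer"]["name"],
        order["customer"]["city"],
    )
    item_rows = [
        (order["order_id"], item["sku"], item["qty"]) for item in order["items"]
    ]
    return order_row, item_rows


def to_document(order):
    """Serialize the same object as one JSON document, keeping its shape."""
    return json.dumps(order, sort_keys=True)


order_row, item_rows = to_relational_rows(order)
document = to_document(order)
```

Reading the order back from the relational form would need a join across the two sets of rows, whereas the JSON document can be loaded back into the original structure in one step.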

Types of NoSQL databases

NoSQL databases come in four main types:

Columnar Databases – Read and write columns of data rather than rows. Each column family is comparable to a container (table) in an RDBMS: a key identifies a row, and a single row can have multiple columns. Column-based databases include Accumulo, Cassandra, Druid, HBase and Vertica.
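As a rough mental model only, the row-key-to-columns idea can be sketched with nested dictionaries. The names `put` and `get_column` are made up for this illustration and do not correspond to the API of any of the databases listed; real column stores also organize data on disk very differently.

```python
from collections import defaultdict

# row key -> {column name -> value}; rows need not share the same columns
table = defaultdict(dict)


def put(row_key, column, value):
    """Write a single column value for the given row key."""
    table[row_key][column] = value


def get_column(column):
    """Read one column across all rows, skipping rows that lack it."""
    return {key: cols[column] for key, cols in table.items() if column in cols}


put("user:1", "name", "Asha")
put("user:1", "email", "asha@example.com")
put("user:2", "name", "Ravi")  # this row has no email column at all

names = get_column("name")
emails = get_column("email")
```

Here `names` holds a value for both rows, while `emails` only contains `user:1`, which shows that rows under the same key space can carry different sets of columns.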

Document Databases – Store and retrieve semi-structured data as documents in formats such as XML and JSON. Some popular document databases, such as MongoDB, provide a rich query language for easy access and smooth evolution of data models. Document-based databases include Apache CouchDB, ArangoDB, BaseX, Clusterpoint, Couchbase, Cosmos DB, IBM Domino, MarkLogic, MongoDB, OrientDB, Qizx and RethinkDB.
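To give a feel for querying semi-structured documents, here is a toy in-memory sketch. The `find` and `get_path` helpers and the sample `users` data are hypothetical; they loosely imitate matching on dotted field paths and are not the query language of any particular product.

```python
_MISSING = object()  # sentinel for "this document has no such field"


def get_path(doc, path):
    """Follow a dotted path such as 'address.city' through nested dicts."""
    current = doc
    for part in path.split("."):
        if not isinstance(current, dict) or part not in current:
            return _MISSING
        current = current[part]
    return current


def find(collection, criteria):
    """Return the documents whose fields equal every value in criteria."""
    return [
        doc
        for doc in collection
        if all(get_path(doc, path) == expected for path, expected in criteria.items())
    ]


# Documents in one collection may have different shapes.
users = [
    {"name": "Asha", "address": {"city": "Pune"}, "tags": ["admin"]},
    {"name": "Ravi", "address": {"city": "Delhi"}},
    {"name": "Meera"},
]

in_pune = find(users, {"address.city": "Pune"})
```

Note that the third document has no address at all, and the query simply skips it instead of failing, which is the kind of flexibility a fixed table schema does not offer.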

Graph Databases – Store data as entities and the relationships between them, allowing fast traversal and join-like operations. Such graphs can, however, be built on SQL as well as NoSQL databases. Graph-based databases include AllegroGraph, ArangoDB, InfiniteGraph, Apache Giraph, MarkLogic, Neo4J, OrientDB and Virtuoso.
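The traversal idea can be sketched with a plain adjacency list and a breadth-first search. The `edges` data and the `reachable_within` function are assumptions made for this example; graph databases expose their own query languages and storage engines.

```python
from collections import deque

# Hypothetical "follows" relationships: entity -> entities it points to.
edges = {
    "alice": ["bob", "carol"],
    "bob": ["dave"],
    "carol": ["dave"],
    "dave": [],
}


def reachable_within(start, max_hops):
    """Breadth-first traversal: nodes reachable in at most max_hops steps."""
    hops = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if hops[node] == max_hops:
            continue  # do not expand beyond the hop limit
        for neighbour in edges.get(node, []):
            if neighbour not in hops:
                hops[neighbour] = hops[node] + 1
                queue.append(neighbour)
    hops.pop(start)
    return hops


friends = reachable_within("alice", 1)
friends_of_friends = reachable_within("alice", 2)
```

In a relational schema, each extra hop would typically mean another self-join on a relationships table; following stored relationships directly is what makes this kind of query natural in a graph model.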

Key-Value Stores – Suitable for read-heavy and compute-intensive workloads. Many of these databases keep critical data in memory, which in turn improves system performance. Key-value NoSQL databases include Aerospike, Apache Ignite, ArangoDB, Berkeley DB, Couchbase, Dynamo, FairCom c-treeACE, FoundationDB, InfinityDB, MemcacheDB, MUMPS, Oracle NoSQL Database, OrientDB, Redis, Riak, SciDB, SDBM/Flat File dbm and ZooKeeper.
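A minimal sketch of why keeping data in memory helps read-heavy workloads: the hypothetical `ReadThroughCache` class below calls a slow loader only on the first read of a key and serves later reads from an in-memory dictionary. The class and its `loader` parameter are invented for illustration and are not modelled on any specific product.

```python
class ReadThroughCache:
    """Toy in-memory key-value store that loads missing keys on demand."""

    def __init__(self, loader):
        self._loader = loader  # stands in for a slow backing store
        self._store = {}
        self.misses = 0

    def get(self, key):
        if key not in self._store:
            self.misses += 1
            self._store[key] = self._loader(key)
        return self._store[key]

    def put(self, key, value):
        self._store[key] = value


# The lambda is a placeholder for an expensive lookup or computation.
cache = ReadThroughCache(lambda key: key.upper())
first = cache.get("session:1")
second = cache.get("session:1")  # served from memory, loader not called again
```

After the two reads, `cache.misses` is 1, so only the first read paid the cost of the loader.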

The History of Cassandra

Apache Cassandra was developed at Facebook by Avinash Lakshman (one of the authors of Amazon’s Dynamo) and Prashant Malik to power the Inbox Search feature. It was released as an open-source project on Google Code in July 2008. In March 2009, it became an Apache Incubator project, and on February 17, 2010, it graduated to a top-level project.

Releases after graduation include:

0.6, released Apr 12, 2010, added support for integrated caching and Apache Hadoop MapReduce
0.7, released Jan 8, 2011, added secondary indexes and online schema changes
0.8, released Jun 2, 2011, added the Cassandra Query Language (CQL), self-tuning memtables, and support for zero-downtime upgrades
1.0, released Oct 17, 2011, added integrated compression, leveled compaction, and improved read performance
1.1, released Apr 23, 2012, added self-tuning caches, row-level isolation, and support for mixed SSD/spinning disk deployments
1.2, released Jan 2, 2013, added clustering across virtual nodes, inter-node communication, atomic batches, and request tracing
2.0, released Sep 4, 2013, added lightweight transactions (based on the Paxos consensus protocol), triggers, improved compactions, CQL paging support, prepared statement support, and SELECT column alias support
2.1, released Sep 10, 2014
2.2, released Jul 20, 2015
3.0, released Nov 11, 2015
3.1 through 3.10 were monthly releases using a tick-tock-like release model, with even-numbered releases providing both new features and bug fixes and odd-numbered releases including bug fixes only
3.11, released Jun 23, 2017, as a stable release series containing bug fixes on top of the last tick-tock feature release

Apache Cassandra

Apache Cassandra is a free and open-source distributed wide-column store NoSQL database management system designed to handle large amounts of data across many commodity servers, providing high availability with no single point of failure. Cassandra offers robust support for clusters spanning multiple datacenters, with asynchronous masterless replication allowing low-latency operations for all clients.
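To make the idea of masterless replication more concrete, here is a heavily simplified sketch of placing a partition key on a ring of peer nodes. Everything in it is an assumption for illustration: the `Ring` class, the node names, the use of MD5 as the hash, and the "next N nodes on the ring" replica rule. It is not Cassandra's actual partitioner or replication strategy.

```python
import hashlib
from bisect import bisect_right


def token(value):
    """Map a string to a large integer position on the ring (MD5 for the demo)."""
    return int(hashlib.md5(value.encode("utf-8")).hexdigest(), 16)


class Ring:
    """Toy ring of equal peers: no node is special, any can own any key range."""

    def __init__(self, nodes, replication_factor):
        self.replication_factor = min(replication_factor, len(nodes))
        self._ring = sorted((token(node), node) for node in nodes)
        self._tokens = [t for t, _ in self._ring]

    def replicas(self, partition_key):
        """Return the nodes that would store a copy of this partition key."""
        start = bisect_right(self._tokens, token(partition_key)) % len(self._ring)
        return [
            self._ring[(start + i) % len(self._ring)][1]
            for i in range(self.replication_factor)
        ]


nodes = ["node-a", "node-b", "node-c", "node-d"]
ring = Ring(nodes, replication_factor=3)
owners = ring.replicas("user:42")
```

Because every key is stored on several peers and any of them can serve it, losing one node does not make the data unavailable, which is the intuition behind "no single point of failure".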

Features and Benefits

Written in Java (Cassandra 1.x – Java 6 or later; Cassandra 2.x – Oracle Java 7 or later)
Blends the models of Amazon’s Dynamo and Google’s Bigtable
Flexible column-family data model (a Bigtable feature)
Decentralized and distributed (a Dynamo feature)
Peer to Peer Architecture
Multi-data center replication
Location Transparent
Cloud-enabled
Fault-Tolerant
No single point of failure
Designed to handle failures
Failover happens automatically or manually
Elastic and Linearly Scalable
High Performance
Built-in data compression (Google’s Snappy algorithm)
Built-in caching layer
Write optimised
Tunable Consistency
Choices range from very strong consistency to different levels of eventual consistency; the trade-off is described by the CAP theorem, which states that a distributed system cannot guarantee consistency, availability and partition tolerance all at once.
CQL
Open source (Apache Software Foundation); commercial versions are available from DataStax
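The tunable consistency item in the list above can be made concrete with a little arithmetic. A commonly cited rule is that a read is guaranteed to overlap the latest successful write when the number of replicas read plus the number of replicas written exceeds the replication factor. The sketch below covers only three levels (ONE, QUORUM, ALL), and the function names are made up for this example.

```python
def replicas_required(level, replication_factor):
    """Number of replicas that must respond for a given consistency level."""
    if level == "ONE":
        return 1
    if level == "QUORUM":
        return replication_factor // 2 + 1
    if level == "ALL":
        return replication_factor
    raise ValueError(f"unsupported level in this sketch: {level}")


def reads_see_latest_write(read_level, write_level, replication_factor):
    """True when the read and write replica sets must overlap (R + W > N)."""
    r = replicas_required(read_level, replication_factor)
    w = replicas_required(write_level, replication_factor)
    return r + w > replication_factor


strong = reads_see_latest_write("QUORUM", "QUORUM", replication_factor=3)
eventual = reads_see_latest_write("ONE", "ONE", replication_factor=3)
```

With a replication factor of 3, QUORUM reads and writes each touch 2 replicas, so the sets must overlap and `strong` is true. ONE and ONE touch 1 replica each, so `eventual` is false: such reads are faster but may briefly return stale data.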

When Not to Use Apache Cassandra

❖ Lack of full ACID transaction support
❖ Lack of JOIN support and limited range-query (search) support
❖ Lack of enterprise security support (transparent data encryption, Kerberos and LDAP integration are not available)
❖ Anti-use cases:
❖ Applications handling business-critical data
❖ Applications requiring transactional commit/rollback capabilities
❖ Applications requiring granular level of security features