NoSQL - CSVeda

NoSQL (Not Only SQL) refers to non-relational databases that differ from traditional relational databases. NoSQL is ideal for distributed data stores where huge amounts of data need to be stored. Companies like Facebook and Google use NoSQL in their daily operations. Some NoSQL systems store their tables in ASCII files, though storage formats vary widely between products.

Most NoSQL databases do not use SQL as their query language. NoSQL is still in the early phases of usage and development. NoSQL databases are schema-less and able to handle large amounts of data. NoSQL focuses on high-performance, scalable data storage and provides low-level access to the data management layer, which makes it easy to use from different programming languages.

NoSQL can be used to handle applications that require a high degree of scalability, data distribution and availability.
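As a rough illustration of data distribution, the sketch below spreads keys across a small cluster by hashing each key. The node names and the helper functions node_for_key and distribute are hypothetical, chosen only for this example; real NoSQL products typically use more elaborate schemes such as consistent hashing.

```python
import hashlib

# Hypothetical cluster of three storage nodes (names are made up).
NODES = ["node-a", "node-b", "node-c"]


def node_for_key(key: str, nodes: list[str]) -> str:
    """Pick a node for a key using simple hash-modulo partitioning."""
    digest = hashlib.md5(key.encode("utf-8")).hexdigest()
    return nodes[int(digest, 16) % len(nodes)]


def distribute(keys: list[str], nodes: list[str]) -> dict[str, list[str]]:
    """Group keys by the node that would store them."""
    placement = {node: [] for node in nodes}
    for key in keys:
        placement[node_for_key(key, nodes)].append(key)
    return placement


placement = distribute([f"user:{i}" for i in range(12)], NODES)
for node, keys in placement.items():
    print(node, keys)
```

The same key always maps to the same node, so any client can locate a record without a central lookup. One known weakness of plain modulo partitioning is that adding or removing a node moves most keys.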

The interaction of layers in NoSQL is shown below:

Benefits of NoSQL

Scalable
Real Time analysis
Lower cost and low administration overhead
Wide variety of data can be stored with the help of NoSQL
Bulk Upload
Flexible
Highly Reliable
Schema less
Distributed storage capabilities

Drawbacks of NoSQL

Data Loss can be an issue
Cannot use Structured Query Language
Lack of expertise and trained individuals
Limited or no support for ACID transactions
No referential integrity

Characteristics of NoSQL

Does not follow the norms of RDBMS
Many NoSQL databases are open source and their availability is high
Can be run on clusters
Because NoSQL is schema-less, fields can be added to the database without any change to its structure
Useful in handling non uniform data
Big data has created a need to implement data storage using the NoSQL database methods
It simplifies database access and increases productivity.
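The schema-less characteristic listed above can be sketched with plain Python dictionaries. This is only an in-memory illustration, not the API of any particular NoSQL product; the users collection, its records and the field_values helper are all made up for the example.

```python
# A "collection" here is just a list of dictionaries; no schema is declared.
users = []

users.append({"name": "Asha", "city": "Delhi"})
# A later record adds an "email" field that the first record does not have.
# No ALTER TABLE-style structural change is needed.
users.append({"name": "Li", "city": "Beijing", "email": "li@example.com"})


def field_values(collection, field, default=None):
    """Read one field from every record, tolerating records that lack it."""
    return [record.get(field, default) for record in collection]


print(field_values(users, "email"))  # [None, 'li@example.com']
```

The trade-off is that the application, not the database, has to cope with records that are missing a field.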

NoSQL Data Models

NoSQL has four main types of data models:

Key-Value

In the key-value data model, the client retrieves a value by supplying its key. The user can add, get or delete the value for a key. All access goes through the primary key.

Examples of key-value databases include Couchbase, Riak, Project Voldemort, Amazon DynamoDB and HamsterDB. Note that key-value databases can differ from one another in several ways.
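The add, get and delete operations described above can be sketched with a small in-memory class. KeyValueStore and its method names are hypothetical and do not correspond to the client API of any of the databases listed; they only show the shape of primary-key access.

```python
class KeyValueStore:
    """In-memory stand-in for a key-value database."""

    def __init__(self):
        self._data = {}

    def put(self, key, value):
        """Add a value, or overwrite the existing value for the key."""
        self._data[key] = value

    def get(self, key, default=None):
        """Return the value stored under the key, or a default if absent."""
        return self._data.get(key, default)

    def delete(self, key):
        """Remove a key; return True if it existed, False otherwise."""
        if key in self._data:
            del self._data[key]
            return True
        return False


store = KeyValueStore()
store.put("session:42", {"user": "asha", "cart": ["book"]})
print(store.get("session:42"))
print(store.delete("session:42"))
print(store.get("session:42"))
```

The store never looks inside the value; it is an opaque blob found only by its key.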

Column-Oriented

These databases organize data by column rather than by row, and each column is stored and handled individually. Data is stored in a column-specific file format. Performance on data aggregation queries is very high in column-oriented databases, because a query reads only the columns it needs. Examples include SimpleDB, HBase, Cassandra and BigTable.
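To see why aggregation suits a column layout, the sketch below pivots a few row records into one list per column and then sums a single column. The flights figures are made-up illustrative values, and to_columns is a hypothetical helper that assumes every row has the same fields; real column stores use on-disk formats far more involved than Python lists.

```python
# Row-oriented view: each record keeps all of its fields together.
# The "flights" numbers are invented purely for illustration.
rows = [
    {"airline": "air china", "hub": "beijing", "flights": 120},
    {"airline": "air india", "hub": "delhi", "flights": 80},
    {"airline": "aeroflot", "hub": "moscow", "flights": 60},
]


def to_columns(records):
    """Pivot row records into one list per column name."""
    columns = {}
    for record in records:
        for name, value in record.items():
            columns.setdefault(name, []).append(value)
    return columns


columns = to_columns(rows)

# The aggregate touches only the "flights" column, not whole rows.
total_flights = sum(columns["flights"])
print(total_flights)  # 260
```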

Document

A document database stores self-describing documents: hierarchical tree data structures that can contain collections and scalar values. Documents are typically encoded in formats such as XML, JSON or BSON. Document databases store each document as the value in a key/value store, but unlike in a plain key-value store the value is examinable, so they also provide indexing and searching capabilities. Their structure is very different from that of a relational database. An example is given below (JSON data):

        [
                {
                        "airline": "air china",
                        "hub": "beijing"
                },
                {
                        "airline": "air china",
                        "hub": "beijing"
                },
                {
                        "airline": "air india",
                        "hub": "delhi"
                },
                {
                        "airline": "aeroflot",
                        "hub": "moscow"
                }
        ]

Document databases offer high performance and scalability. Some relational database properties, such as data integrity constraints, locks and transactions, are not available here.
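Because the stored values are examinable, a document database can search and index on fields inside each document. The sketch below imitates that on the airline documents using only the Python standard library; find and build_index are hypothetical helpers, not the query API of any real document database.

```python
import json

# The airline documents from the example, as a JSON array.
documents = json.loads("""
[
    {"airline": "air china", "hub": "beijing"},
    {"airline": "air china", "hub": "beijing"},
    {"airline": "air india", "hub": "delhi"},
    {"airline": "aeroflot", "hub": "moscow"}
]
""")


def find(docs, **criteria):
    """Return the documents whose fields match every given criterion."""
    return [
        doc for doc in docs
        if all(doc.get(field) == value for field, value in criteria.items())
    ]


def build_index(docs, field):
    """Map each value of a field to the positions of documents holding it."""
    index = {}
    for position, doc in enumerate(docs):
        if field in doc:
            index.setdefault(doc[field], []).append(position)
    return index


print(find(documents, hub="delhi"))
print(build_index(documents, "hub"))
```

A scan like find reads every document, while an index like the one built here lets a lookup jump straight to the matching positions.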

Graph

Graph databases use graph structures with nodes, edges and properties to store and retrieve data, and they support semantic queries over those structures. They are mainly used to store entities and the relationships among them. An entity is a node of the graph, which holds a value along with its own properties. Relationships are represented by edges, which can also have their own properties. A graph is highly organized, and the stored data can be interpreted in different ways based on its relationships. Traversing relationships is very easy in graph databases. Some popular graph databases are Neo4j, FlockDB and InfiniteGraph.
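The node, edge and property model described above can be sketched in a few lines. The Graph class, its method names and the FOLLOWS relation are hypothetical and exist only for this illustration; graph databases expose their own query languages and APIs.

```python
from collections import deque


class Graph:
    """Tiny in-memory property graph with directed, labelled edges."""

    def __init__(self):
        self.nodes = {}  # node id -> properties dict
        self.edges = {}  # node id -> list of (relation, target id, properties)

    def add_node(self, node_id, **properties):
        self.nodes[node_id] = properties
        self.edges.setdefault(node_id, [])

    def add_edge(self, source, relation, target, **properties):
        self.edges.setdefault(source, []).append((relation, target, properties))

    def neighbours(self, node_id, relation=None):
        """Targets of outgoing edges, optionally filtered by relation name."""
        return [
            target
            for rel, target, _props in self.edges.get(node_id, [])
            if relation is None or rel == relation
        ]

    def reachable(self, start, relation=None):
        """Breadth-first traversal: nodes reachable from start, in visit order."""
        seen = {start}
        order = []
        queue = deque([start])
        while queue:
            current = queue.popleft()
            for target in self.neighbours(current, relation):
                if target not in seen:
                    seen.add(target)
                    order.append(target)
                    queue.append(target)
        return order


g = Graph()
for person in ("alice", "bob", "carol"):
    g.add_node(person, kind="person")
g.add_edge("alice", "FOLLOWS", "bob", since=2020)
g.add_edge("bob", "FOLLOWS", "carol", since=2021)

print(g.reachable("alice", "FOLLOWS"))  # ['bob', 'carol']
```

Following relationships is a direct walk along stored edges, with no join step, which is why traversal is cheap in this model.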

CAP Theorem

The CAP theorem states that a distributed system can guarantee at most two of Consistency, Availability and Partition tolerance at the same time; no system can achieve all three.

The three aspects of the CAP theorem in distributed databases are:

Consistency

Indicates that the data in the database remains consistent after an operation executes, so every client sees the same data.

Availability

Indicates that the system is always available and responds to requests without downtime.

Partition Tolerance

Implies that the system continues to work even when communication between servers becomes unreliable.
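The trade-off can be made concrete with a toy simulation of two replicas. This is a deliberately simplified sketch under stated assumptions: the Replica and Cluster classes and the "CP" and "AP" mode labels are hypothetical, and real systems handle partitions with far more nuance (quorums, conflict resolution, and so on).

```python
class Replica:
    """One copy of a single stored value."""

    def __init__(self, name):
        self.name = name
        self.value = None


class Cluster:
    """Two replicas; mode is 'CP' or 'AP' (labels used only in this sketch)."""

    def __init__(self, mode):
        self.mode = mode
        self.a = Replica("a")
        self.b = Replica("b")
        self.partitioned = False  # True when the replicas cannot communicate

    def write(self, replica, value):
        other = self.b if replica is self.a else self.a
        if self.partitioned:
            if self.mode == "CP":
                # Refuse the write: consistency is kept, availability is lost.
                raise RuntimeError("write rejected during partition")
            # AP: accept the write locally; the replicas may now disagree.
            replica.value = value
            return
        # No partition: the write reaches both replicas.
        replica.value = value
        other.value = value


cp = Cluster("CP")
cp.write(cp.a, "v1")
cp.partitioned = True
try:
    cp.write(cp.a, "v2")
except RuntimeError as error:
    print("CP:", error, "| a =", cp.a.value, "| b =", cp.b.value)

ap = Cluster("AP")
ap.write(ap.a, "v1")
ap.partitioned = True
ap.write(ap.a, "v2")
print("AP: a =", ap.a.value, "| b =", ap.b.value)
```

During the partition, the CP cluster keeps both replicas at the same value but turns the write away, while the AP cluster accepts the write and lets the replicas diverge until they can reconcile.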