NoSQL (Not Only SQL) refers to a class of non-relational databases that differ from traditional relational databases. NoSQL is ideal for distributed data stores where huge amounts of data need to be stored. Companies like Facebook and Google use NoSQL in their daily operations. In the original Strozzi NoSQL database, from which the name derives, tables were stored in ASCII files; modern NoSQL systems use a variety of storage formats.
Most NoSQL databases do not use SQL as their query language. NoSQL is still relatively young compared with relational technology. A NoSQL database is schemaless and can handle large amounts of data. NoSQL focuses on high-performance, scalable data storage and provides low-level access to the data management layer, which makes it easy to work with from different programming languages.
NoSQL can be used to handle applications that require a high degree of scalability, data distribution and availability.
[Figure: interaction of layers in NoSQL]
Benefits of NoSQL
- Scalable
- Real-time analysis
- Cheaper, with low administration overhead
- Wide variety of data can be stored with the help of NoSQL
- Bulk upload
- Flexible
- Highly reliable
- Schemaless
- Distributed storage capabilities
Drawbacks of NoSQL
- Data loss can be an issue
- Cannot use Structured Query Language
- Lack of expertise and trained individuals
- Limited or no support for ACID transactions
- No referential integrity
Characteristics of NoSQL
- Does not follow the norms of RDBMS
- Many NoSQL databases are open source and their availability is high
- Can be run on clusters
- Fields can be added to the database without any change in structure, because NoSQL is schemaless
- Useful in handling non-uniform data
- Big data has created a need to implement data storage using NoSQL database methods
- Simplifies database access and increases productivity
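The schemaless behaviour described in the list above can be sketched in a few lines of Python. This is only an illustration: the `users` list stands in for a collection, and the field names and values are made up for the example, not taken from any particular NoSQL product.

```python
# A plain Python list stands in for a schemaless collection (an assumption
# for illustration; a real NoSQL store would persist and index the records).
users = []

# The first record has only two fields.
users.append({"id": 1, "name": "Asha"})

# A later record adds "email" and "tags" without altering earlier records
# and without declaring the new fields anywhere.
users.append(
    {"id": 2, "name": "Boris", "email": "boris@example.com", "tags": ["admin"]}
)

# Readers cope with non-uniform records by supplying defaults for missing fields.
emails = [user.get("email", "unknown") for user in users]
print(emails)
```

The trade-off is that the application, rather than the database, becomes responsible for handling records that lack a field.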
NoSQL Data Models
NoSQL has four main types of data models:
Key-Value
The key-value data model is one in which the client retrieves a value by using its key. The user can add, get, or delete the value for a key. All access goes through the primary key.
Examples of key-value databases include Couchbase, Riak, Project Voldemort, Amazon DynamoDB, HamsterDB, etc. Note that key-value databases can differ from one another in several ways.
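The add, get, and delete operations of the key-value model can be sketched with an in-memory dictionary. The `KeyValueStore` class and its method names are hypothetical and chosen for illustration; they are not the API of any of the databases listed above.

```python
class KeyValueStore:
    """Hypothetical in-memory key-value store, for illustration only."""

    def __init__(self):
        self._data = {}

    def put(self, key, value):
        # Adding a key that already exists overwrites its value.
        self._data[key] = value

    def get(self, key, default=None):
        # The value is opaque to the store; it is only looked up by key.
        return self._data.get(key, default)

    def delete(self, key):
        # Return True if the key existed and was removed, False otherwise.
        if key in self._data:
            del self._data[key]
            return True
        return False


store = KeyValueStore()
store.put("session:42", {"user": "asha", "cart": ["book"]})
session = store.get("session:42")
removed = store.delete("session:42")
missing = store.get("session:42")
```

Every operation takes a key, which is why such stores are simple to partition across many servers: the key alone decides where a value lives.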
Column-Oriented
These databases store data by column rather than by row, and each column is treated individually. Data is stored in a column-specific file format. Performance on data aggregation queries is very high in column-oriented databases. Examples include SimpleDB, HBase, Cassandra, and Bigtable.
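To see why aggregation is cheap in a column-oriented layout, the sketch below stores the same records row by row and column by column. The order data is made up for the example, and real column stores add compression and on-disk formats that are not modelled here.

```python
# Row-oriented layout: each record keeps all of its fields together.
rows = [
    {"order_id": 1, "region": "north", "amount": 120},
    {"order_id": 2, "region": "south", "amount": 80},
    {"order_id": 3, "region": "north", "amount": 50},
]

# Column-oriented layout: one list per column, with values in row order.
columns = {name: [row[name] for row in rows] for name in rows[0]}

# An aggregate over one column reads only that column's values and never
# touches "order_id" or "region".
total_amount = sum(columns["amount"])


def row_at(index):
    """Rebuild a full record by taking position `index` from every column."""
    return {name: values[index] for name, values in columns.items()}


print(total_amount)
print(row_at(1))
```

The cost of this layout shows up in `row_at`: reading one complete record means visiting every column.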
Document
Document databases store self-describing, hierarchical tree data structures that can contain collections and scalar values. Documents are typically encoded in formats such as XML, JSON, or BSON. A document database stores each document as the value in a key-value store, but unlike a plain key-value store, the value is examinable. Document databases also provide indexing and searching capabilities. Their structure is very different from that of a relational database. An example is given below (JSON data):
{ "airline": "air china", "hub": "beijing" }
{ "airline": "air china", "hub": "beijing" }
{ "airline": "air india", "hub": "delhi" }
{ "airline": "aeroflot", "hub": "moscow" }
Document databases offer high performance and scalability. Some relational database properties, such as data integrity, locks, and transactions, are often limited or unavailable.
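Because document values are examinable, they can be searched and indexed by field. The sketch below loads the airline documents from the example above and shows both operations. The `find` and `build_index` helpers are hypothetical names written for this illustration and do not correspond to any specific database's query API.

```python
import json

# The airline documents from the example, written as a JSON array.
raw = """
[
  {"airline": "air china", "hub": "beijing"},
  {"airline": "air china", "hub": "beijing"},
  {"airline": "air india", "hub": "delhi"},
  {"airline": "aeroflot", "hub": "moscow"}
]
"""
documents = json.loads(raw)


def find(collection, **criteria):
    """Return the documents whose fields match every keyword argument."""
    return [
        doc
        for doc in collection
        if all(doc.get(field) == value for field, value in criteria.items())
    ]


def build_index(collection, field):
    """Map each value of `field` to the positions of the documents holding it."""
    index = {}
    for position, doc in enumerate(collection):
        index.setdefault(doc.get(field), []).append(position)
    return index


delhi = find(documents, hub="delhi")
hub_index = build_index(documents, "hub")
print(delhi)
print(hub_index)
```

`find` scans every document, while the index answers the same question with a single dictionary lookup. That difference is the reason document databases maintain indexes on frequently queried fields.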
Graph
Graph databases use graph structures with nodes, edges, and properties to store and retrieve data, and they support semantic queries. They are mainly used to store entities and the relationships among them. An entity is a node of the graph, which holds a value along with its own properties. Relationships are represented by edges, which can also have their own properties. Because the data is organized by relationships, it can be interpreted in different ways depending on which relationships are followed. Traversing relationships is very easy in graph databases. Some popular graph databases are Neo4j, FlockDB, InfiniteGraph, etc.
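A minimal sketch of the node, edge, and property model, with a breadth-first traversal over relationships, might look like the following. The people, company, and relationship names are invented for the example, and the in-memory lists are an assumption; a real graph database would provide its own storage and query language.

```python
from collections import deque

# Nodes carry their own properties.
nodes = {
    "alice": {"kind": "person"},
    "bob": {"kind": "person"},
    "carol": {"kind": "person"},
    "acme": {"kind": "company"},
}

# Each edge is (source, relation, target, properties).
edges = [
    ("alice", "FRIEND_OF", "bob", {"since": 2015}),
    ("bob", "FRIEND_OF", "carol", {"since": 2019}),
    ("carol", "WORKS_AT", "acme", {"role": "engineer"}),
]


def neighbors(node, relation=None):
    """Targets of edges leaving `node`, optionally filtered by relation type."""
    return [
        target
        for source, rel, target, _props in edges
        if source == node and (relation is None or rel == relation)
    ]


def reachable(start):
    """Breadth-first traversal following edges in their stored direction."""
    seen = [start]
    queue = deque([start])
    while queue:
        current = queue.popleft()
        for nxt in neighbors(current):
            if nxt not in seen:
                seen.append(nxt)
                queue.append(nxt)
    return seen


print(neighbors("alice", "FRIEND_OF"))
print(reachable("alice"))
```

Filtering by relation type in `neighbors` is what lets the same stored data be read in different ways, for example as a friendship network or as an employment network.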
CAP Theorem
The CAP theorem states that a distributed system can guarantee at most two of Consistency, Availability, and Partition tolerance, as no system can achieve all three at the same time.
In distributed databases, the three aspects of the CAP theorem are:
Consistency
Indicates that the data in the database remains consistent after an operation executes, so every client sees the same data.
Availability
Indicates that the system remains available and responds to requests, without any downtime.
Partition Tolerance
Implies that the system continues to work even when communication between servers becomes unreliable.
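The trade-off can be illustrated with a toy model of two replicas joined by a network link. The `Cluster` class below is a simplified assumption made for this sketch, not a real replication protocol: in "CP" mode it refuses writes during a partition to stay consistent, and in "AP" mode it accepts them and lets the replicas diverge.

```python
class Replica:
    """One server holding a copy of the data."""

    def __init__(self):
        self.data = {}


class Cluster:
    """Two replicas with a switchable network link (a toy model)."""

    def __init__(self, mode):
        self.mode = mode  # "CP" or "AP"
        self.a = Replica()
        self.b = Replica()
        self.partitioned = False

    def write(self, key, value):
        """Write through replica a; return True if the write was accepted."""
        if self.partitioned:
            if self.mode == "CP":
                # Refuse the write: consistency is kept, availability is lost.
                return False
            # Accept the write locally: availability is kept, replicas diverge.
            self.a.data[key] = value
            return True
        # No partition: the write reaches both replicas.
        self.a.data[key] = value
        self.b.data[key] = value
        return True

    def consistent(self):
        return self.a.data == self.b.data


cp = Cluster("CP")
cp.write("x", 1)
cp.partitioned = True
cp_accepted = cp.write("x", 2)

ap = Cluster("AP")
ap.write("x", 1)
ap.partitioned = True
ap_accepted = ap.write("x", 2)

print(cp_accepted, cp.consistent())
print(ap_accepted, ap.consistent())
```

While the link is healthy, both modes behave identically; the choice between consistency and availability only has to be made once a partition occurs.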