Normalization is one of the key concepts in Data Base Management System (DBMS). After an initial design has been developed for a database structure, we need to refine it for further development. This step of refining our initial database is known as Normalization in DBMS.
Introduction to Normalization
The key idea of Normalization in DBMS is to reduce or eliminate the chances of redundancy (i.e., repetition of useful data) which may otherwise lead to multiple versions of the same data. Hence, Normalization forms the basis for designing of tables in a Relational Data Base Management System (RDBMS).
Normalization was first developed in 1972 by E.F. Codd. Normalization is based on a concept called Normal Forms (i.e., NFs).
Normalization is basically a process which involves two steps. These are:
- Put the data into a tabular form so as to eliminate repeating groups.
- Remove any kind of duplication from the related tables.
Normalization in DBMS- forms
A table is said to be in a particular normal form (NF) only if it satisfies a set of conditions (constraints) for that normal form and is already in the previous normal form.
There are currently five normal forms that are defined in DBMS. These are 1NF, 2NF, 3NF, 4NF and 5NF (where NF stands for Normal Form). Generally, a database must essentially follow first three normal forms. The 4NF and 5NF are rarely followed due to performance reasons and over optimization.
1NF, 2NF, 3NF were proposed by E.F. Codd. But later on, a stronger definition of 3NF is known as BCNF was proposed by Boyce and EF Codd hence the name BCNF (Boyce Codd Normal Form).You can go ahead and learn about all the normal forms in detail one by one in the articles that follow.
Need of Normalization in DBMS
Normalization of data is required so as to remove database anomalies. Anomalies are glitches in data base which lead to errors during insertion, modification, and/or deletion of data in tables.
Three types of anomalies that occur are as follows:
- Insertion Anomaly
- Modification Anomaly
- Deletion Anomaly
To better understand these anomalies let us take a sample database.
In simple terms, Insertion Anomaly is a problem when we are not allowed to enter data into our table until all non-related data is also not stored into the table. To explain it further let us take the help of the above example.
Here if we want to add a new sport to the above table, we will not be able to do so until we have a student who takes up that new sport. This is because of the fact that we will not be allowed to add data until we have a valid student number attached to it.
In simple terms, Modification/Update Anomaly is a problem which arises when we modify data in our table but, due to repetition of data in tables, data at some places reflect old values leading to ambiguity of data. To explain it further let us take the help of the same example.
Suppose we want to modify the fees of the sport “swimming” for students and increase it to Rs 750. Now, if we observe there are two records having the sport column values “swimming”. Say we update only the record of student with stu_no 105 then there will be an ambiguity in future about the fee of the sport.
Deletion Anomaly arises when we perform a delete operation on tables. In this other useful data related to the tuple also gets deleted which we would have otherwise needed.
For example, in the above table if a student with student id 244 quits his/her respective sport and we issue a delete command, then the information regarding the fees of that course (Squash in this case) will also get deleted. So in future we might not get information about the fees of this course when a new student might opt for it.
To avoid all these anomalies, the process of Normalization is carried out on data.
Advantages of Normalization
Normalization has a number of advantages. Some of them are as follows:
- Normalization helps eliminate or minimise data redundancy.
- It helps in better understanding of relations and data stored in them thereof.
- Data structure is easier to maintain.
- Normalization is a reversible process; hence no information is lost during transformation.
- It helps in efficient use of storage space.
- It provides flexibility of data due to efficient structuring.
Disadvantages of Normalization in DBMS
Some disadvantages of Normalization in DBMS are as follows:
- You should have good knowledge of your users’ requirements.
- Normal Forms of higher end i.e., 4NF and 5NF have performance issues.
- It is a time consuming and a complicated process to follow. You require a good amount of experience to create and optimally normalized database.
- Decomposition of data needs to be done carefully else it may lead to a poor database design.