Normalization and denormalization are both methods used in database design, but they serve opposite goals. Normalization is a technique for minimizing insertion, deletion and update anomalies by eliminating redundant data. Denormalization is the inverse process: redundancy is deliberately added back to the data to improve the performance of a specific application.
Normalization also prevents wasted disk space by minimizing or eliminating redundancy.
Comparison Chart: Normalization Vs Denormalization
| Basis for comparison | Normalization | Denormalization |
|---|---|---|
| Basic | Normalization is the process of creating a schema that stores non-redundant, consistent data. | Denormalization is the process of combining data so that it can be queried quickly. |
| Purpose | To reduce data redundancy and inconsistency. | To achieve faster query execution by introducing redundancy. |
| Used in | OLTP systems, where the emphasis is on making inserts, deletes and updates fast while keeping the data consistent. | OLAP systems, where the emphasis is on making search and analysis fast. |
| Data integrity | Maintained | May not be retained |
| Number of tables | Increases | Decreases |
| Disk space | Optimized usage | Wasted |
Definition of Normalization
Normalization is the process of organizing the data in a database efficiently. It involves constructing tables and establishing relationships between those tables according to certain rules. These rules are designed to make the database more flexible by removing redundancy and inconsistent dependencies.
Redundant data wastes disk space, increases data inconsistency and slows down DML queries. If the same data is present in more than one place, any update committed to that data must be reflected in every location; otherwise the copies diverge. Inconsistent data also makes searching and accessing data harder, because there is no single authoritative copy of it.
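The update anomaly described above can be demonstrated with a minimal sketch using Python's built-in sqlite3 module. The table and the customer data are illustrative assumptions, not taken from the source:

```python
import sqlite3

# Hypothetical un-normalized table: the customer's name is repeated
# on every order row, so the same fact lives in more than one place.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE orders (order_id INTEGER, cust_name TEXT, item TEXT)")
con.executemany("INSERT INTO orders VALUES (?, ?, ?)",
                [(1, "Alice", "pen"), (2, "Alice", "book")])

# Update anomaly: the name is changed on only one row, so the
# database is now internally inconsistent.
con.execute("UPDATE orders SET cust_name = 'Alicia' WHERE order_id = 1")
names = {row[0] for row in con.execute("SELECT cust_name FROM orders")}
print(names)  # two different spellings of the same customer now coexist
```

Because the name was stored redundantly, a single-row update left the two copies disagreeing, which is exactly the anomaly normalization is meant to prevent.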
There are several reasons for performing normalization: avoiding redundancy and update anomalies, avoiding unnecessary coding, keeping the data in a form that can accommodate change easily and accurately, and enforcing data constraints.
Normalization includes the analysis of functional dependencies between attributes. Relations (tables) that contain anomalies are decomposed to generate smaller, well-structured relations. This analysis helps in deciding which attributes should be grouped together in a relation.
Normalization is based on the concept of normal forms. A relation (table) is said to be in a normal form if it satisfies a certain set of constraints. Six normal forms are commonly defined: 1NF, 2NF, 3NF, BCNF, 4NF and 5NF. Normalization should eliminate redundancy, but not at the cost of integrity.
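The decomposition described above can be sketched with sqlite3: the redundant orders table is split into a customers relation and an orders relation, so each fact is stored exactly once. The schema and data are hypothetical examples:

```python
import sqlite3

con = sqlite3.connect(":memory:")
# Normalized schema: the customer's name is stored only in customers;
# orders refers to it through a foreign key instead of repeating it.
con.execute("CREATE TABLE customers (cust_id INTEGER PRIMARY KEY, name TEXT)")
con.execute("""CREATE TABLE orders (
                 order_id INTEGER PRIMARY KEY,
                 cust_id  INTEGER REFERENCES customers(cust_id),
                 item     TEXT)""")
con.execute("INSERT INTO customers VALUES (1, 'Alice')")
con.executemany("INSERT INTO orders VALUES (?, ?, ?)",
                [(1, 1, "pen"), (2, 1, "book")])

# One update now changes the name everywhere it is seen: no anomaly.
con.execute("UPDATE customers SET name = 'Alicia' WHERE cust_id = 1")
rows = con.execute("""SELECT o.order_id, c.name
                      FROM orders o JOIN customers c USING (cust_id)
                      ORDER BY o.order_id""").fetchall()
print(rows)  # [(1, 'Alicia'), (2, 'Alicia')]
```

The trade-off is visible in the last query: reading a complete order now requires a join, which is precisely the cost denormalization later tries to remove.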
Definition of Denormalization
Denormalization is the inverse process of normalization: the normalized schema is converted into a schema that holds redundant information. Performance is improved by exploiting that redundancy while keeping the redundant copies consistent. The usual reason for performing denormalization is the overhead that an over-normalized structure imposes on the query processor.
Denormalization can also be defined as the process of storing the join of higher-normal-form relations as a base relation in a lower normal form. It reduces the number of tables and the complexity of table joins, because a large number of joins can slow down query processing. Common denormalization techniques include storing derivable values, pre-joining tables, hard-coded values and keeping details with the master record.
The denormalization approach emphasizes that placing all the data in one place eliminates the need to search multiple tables to collect it. The basic strategy is to select the most dominant process and analyse which modifications will improve its performance. The most common modification is adding attributes to an existing table to reduce the number of joins.
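The pre-joining technique mentioned above can be sketched with sqlite3: the join of the two normalized relations is materialized as one wide table, so reads no longer pay for the join. The table and column names are illustrative assumptions:

```python
import sqlite3

con = sqlite3.connect(":memory:")
# Start from a normalized pair of relations.
con.execute("CREATE TABLE customers (cust_id INTEGER PRIMARY KEY, name TEXT)")
con.execute("CREATE TABLE orders (order_id INTEGER PRIMARY KEY, "
            "cust_id INTEGER, item TEXT)")
con.execute("INSERT INTO customers VALUES (1, 'Alice')")
con.executemany("INSERT INTO orders VALUES (?, ?, ?)",
                [(1, 1, "pen"), (2, 1, "book")])

# Denormalize: store the join as a base relation. The customer's name is
# now redundant (repeated per order), but each read is a single-table scan.
con.execute("""CREATE TABLE orders_wide AS
               SELECT o.order_id, c.name AS cust_name, o.item
               FROM orders o JOIN customers c USING (cust_id)""")
rows = con.execute("SELECT order_id, cust_name, item FROM orders_wide "
                   "ORDER BY order_id").fetchall()
print(rows)  # [(1, 'Alice', 'pen'), (2, 'Alice', 'book')]
```

This mirrors the OLAP use case from the comparison chart: reads get faster, while writes now carry the burden of keeping the redundant copies in orders_wide consistent.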
Key Differences Between Normalization and Denormalization
- Normalization is the technique of dividing data into multiple tables to reduce redundancy and inconsistency and to achieve data integrity. Denormalization, on the other hand, is the technique of combining data into a single table to make data retrieval faster.
- Normalization is used in OLTP systems, where the emphasis is on making inserts, deletes and updates fast. In contrast, denormalization is used in OLAP systems, where the emphasis is on making search and analysis fast.
- Data integrity is maintained in the normalization process, whereas in denormalization data integrity is harder to retain.
- Redundant data is eliminated when normalization is performed whereas denormalization increases the redundant data.
- Normalization increases the number of tables and joins. In contrast, denormalization reduces the number of tables and joins.
- Disk space is wasted in denormalization because the same data is stored in different places. In a normalized database, by contrast, disk space usage is optimized.
Normalization and denormalization are each useful, depending on the situation. Normalization is used when fast inserts, deletes and updates, together with data consistency, are required. Denormalization is used when faster search matters more and read performance must be optimized; it also lessens the overhead created by an over-normalized schema and complicated table joins.