Rules of Data Normalization

It’s been quite long now, we never looked at the RDBMS basics. I guarantee most of us can’t tell all the normalization rules at a single go even after having years and years of experience in any database. I myself is an example to this, I don’t remember all the 12 CODD rules. So lets get started with the topic. I would be giving examples till 3NF because these three NFs are very popular and important.

1NF: [Eliminate Repeating Groups]

  • Create a separate table for each set of related attributes, and assign a primary key to them
EmpID Name Skills
1 Dhaval Vora Php, MySQL
2 Khushbu Shah .Net, SQL Server

 

EmpID Name
1 Dhaval Vora
2 Khushbu Shah

 

SkillID EmpID SkillName
1 1 Php
2 2 SQL Server
3 2 .Net
4 1 MySQL

2NF: [Eliminate Redundant Data]

  • It must satisfy 1NF
  • If an attribute dependents on only part of a multivalued key, then move them to a separate table. In simple a mapping table.
  •  Here skills are multivalued key, means it would be repeated in the 3rd table above.
SkillID EmpID SkillName
1 1 Php
2 2 SQL Server
3 2 .Net
4 1 MySQL

 

SkillID SkillName
1 Php
2 SQL Server
3 .Net
4 MySQL

 

SkillEmpID SkillID EmpID
1 1 1
2 4 1
3 2 2
4 3 2

3NF: [Eliminate Columns Not Dependent On Key]

  • Must satisfy 2NF.
  • If an attribute do not contribute to a description of the key or can be independent then move them out to a different tables.
  • Best example is zip code, city, state and country. These attributes are most unique and can never be repeated or changed. So they do not contribute anything to EmpID. Means they are not dependent on EmpID, but EmpID is dependent on them.
EmpID Name ZipCode Area City State
1 Dhaval Vora 380009 CG Road Ahmedabad Gujarat
2 Khushbu Shah 380054 Thaltej Ahmedabad Gujarat

 

LocID ZipCode Area City State
10 380009 CG Road Ahmedabad Gujarat
20 380054 Thaltej Ahmedabad Gujarat

 

EmpID Name LocID
1 Dhaval Vora 10
2 Khushbu Shah 20

BCNF: [Boyee-Codd Normal Form]

  • Must satisfy 3NF.
  • If there are no significant dependencies between candidate attributes, then move them out into a different tables.
  • Note: Any relation is in 3NF is also in BCNF.
  • BCNF is very specific, when 3NF misses inter dependencies between non-key attributes.

4NF: [Isolate independent Multiple Relationships]

  • No table should contain two or more 1:N or N:M relationships that are not directly related.

 5NF: [Isolate Semantically Related Multiple Relationships]

  • There may be practical constraints  on information that justifying logically related M:M relationships.

ONF: [Optimal Normal Form]

  • A model limited to only simple facts.

 DKNF: [Domain Key Normal Form]

  • This level is simply a model taken to the point, where there are no opportunities for modifications.

Note: In this BigData and NoSQL days, we like to keep the data in de-normalized form for better get and set operation. I am sure you must have come across where you had to de-normalize the data to improve the performance. We will discuss about de-normalization in coming posts.

 

Prasad Sahoo

Prasad Sahoo, is an enthusiast of Microsoft technologies in general and a passionate database professional, mainly focusing on SQL Server performance tuning and business intelligence. He has contributed a long time of his career working on SQL Server and other RDBMS. If he is not busy with SQL stuff then he must be watching some Hollywood movies.

Leave a Reply

%d bloggers like this: