Normalization Concept

 

The concept of normalization was created by Edgar F. Codd, the founder of the relational database model. During the design of the tables, we can achieve performance improvement by using normalization at an appropriate level. However, data consistency and integrity are maintained.

Normalization procedures consist of certain levels. It does not go to the other level before the operations in one level are completed.

Table and column based operations are performed at each level in order to maintain more atomic data. Basically there are 6 normalization levels. However, we will describe the 3 most commonly used levels of normalization.

Normalization Level 1:

Normalization Level 1 can be considered as the lowest level. For a database to have normalization at this level, the following rules must be followed.

  • The absence of repetitive columns in the same table
  • Only one value in each column
  • An identifier key column for each row

We can see the reasons for these requirements in the examples below.

 

Name Surname Phone 1 Phone 2 Phone 3
Nurullah Cakir (123) 4567890 (123) 1234567 (123)1231231
Hakan Gurbaslar (123) 1111111

 

In the above example, the person named Nurullah Cakir has 3 phone information. However, because the person named Hakan Gürbaşlar has only one phone information, the “Phone 2” and “Phone 3” fields are empty. If a person with 4 or more phones is added to the Table, the new columns will have to be opened. This will unnecessarily increase the number of columns in the table. Therefore, there should be no repetitive columns in the same table.

 

Name Surname Phone
Nurullah Cakir (123) 4567890 , (123) 1234567 , (123)1231231
Hakan Gurbaslar (123) 1111111

 

If we create the table as above, the number of columns will not be increased. However, if we want to find the person with the phone “(123) 1234567” or to delete the “(123) 1234567” phone, the operation will be more complicated.

 

Name Surname Phone
Nurullah Cakir (123) 4567890
Nurullah Cakir (123) 1234567
Nurullah Cakir (123) 1231231
Hakan Gurbaslar (123) 1111111

 

When we create the table as above, the phone information will be in one column and a single phone information is stored on each row. However, we do not know whether the records belonging to Nurullah Cakir belong to the same person. Therefore, we can add a new column and determine if these records belong to same people.

 

ID Name Surname Phone
1 Nurullah Çakır (123) 4567890
1 Nurullah Çakır (123) 1234567
1 Nurullah Çakır (123)1231231
2 Hakan Gürbaşlar (123) 1111111

 

Normalization Level 2:

In order to be able to follow the steps in this level of normalization, Normalization Level 1 procedures must be completed.

Then we have to do the following checks:

  • Creating different tables for records that contain the same subset
  • Identify the foreign key to define the relationship between the sub table and the actual table.

When we look at the table in our first normalization level, we see that there are repeated records. Then we have to keep the contact information in a separate table, and the phone information of the contacts in a separate table.

Let’s create the contact table as follows.

 

ID Name Surname
101 Nurullah Cakir
102 Hakan Gurbaslar

 

Convert the phone table to the following.

ID Phone
1 (123) 4567890
2 (123) 1234567
3 (123)1231231
4 (123) 1111111

 

In this table, it is not clear which phone belongs to which person. When we add the ID value that we created for the contact table, the table is as follows.

 

ID Contact ID Phone
1 101 (123) 4567890
2 101 (123) 1234567
3 101 (123)1231231
4 102 (123) 1111111

 

This means that we can access the ID in the Contact table from the phone number. We also access all of the contact’s information from the ID value in the contact table.

 

Normalization Level 3:

In order to be able to follow the steps in this level of normalization, Normalization Level 2 procedures must be completed.

At this level, it is also checked that each non-primary column is not connected to another column.

Add the city and county columns to the contact table.

 

ID Name Surname City County
101 Nurullah Cakir Ankara Yenimahalle
102 Hakan Gurbaslar Ankara Yenimahalle

 

City column(its not primary key) is related to the County column(its not primary key too).

Therefore we need to create a separate table for City and County information.

In OLTP systems, performance will increase as the level of normalization increases.

In OLAP systems, performance will decrease as the level of normalization increases.

dbtut
Author: dbtut

We are a team with over 10 years of database management and BI experience. Our Expertises: Oracle, SQL Server, PostgreSQL, MySQL, MongoDB, Elasticsearch, Kibana, Grafana.

Leave a Reply

Your email address will not be published. Required fields are marked *