The concept of normalization was created by Edgar F. Codd, the founder of the relational database model. During the design of the tables, we can achieve performance improvement by using normalization at an appropriate level. However, data consistency and integrity are maintained.
Normalization procedures consist of certain levels. It does not go to the other level before the operations in one level are completed.
Table and column based operations are performed at each level in order to maintain more atomic data. Basically there are 6 normalization levels. However, we will describe the 3 most commonly used levels of normalization.
Normalization Level 1:
Normalization Level 1 can be considered as the lowest level. For a database to have normalization at this level, the following rules must be followed.
- The absence of repetitive columns in the same table
- Only one value in each column
- An identifier key column for each row
We can see the reasons for these requirements in the examples below.
Name | Surname | Phone 1 | Phone 2 | Phone 3 |
Nurullah | Cakir | (123) 4567890 | (123) 1234567 | (123)1231231 |
Hakan | Gurbaslar | (123) 1111111 |
In the above example, the person named Nurullah Cakir has 3 phone information. However, because the person named Hakan Gürbaşlar has only one phone information, the “Phone 2” and “Phone 3” fields are empty. If a person with 4 or more phones is added to the Table, the new columns will have to be opened. This will unnecessarily increase the number of columns in the table. Therefore, there should be no repetitive columns in the same table.
Name | Surname | Phone |
Nurullah | Cakir | (123) 4567890 , (123) 1234567 , (123)1231231 |
Hakan | Gurbaslar | (123) 1111111 |
If we create the table as above, the number of columns will not be increased. However, if we want to find the person with the phone “(123) 1234567” or to delete the “(123) 1234567” phone, the operation will be more complicated.
Name | Surname | Phone |
Nurullah | Cakir | (123) 4567890 |
Nurullah | Cakir | (123) 1234567 |
Nurullah | Cakir | (123) 1231231 |
Hakan | Gurbaslar | (123) 1111111 |
When we create the table as above, the phone information will be in one column and a single phone information is stored on each row. However, we do not know whether the records belonging to Nurullah Cakir belong to the same person. Therefore, we can add a new column and determine if these records belong to same people.
ID | Name | Surname | Phone |
1 | Nurullah | Çakır | (123) 4567890 |
1 | Nurullah | Çakır | (123) 1234567 |
1 | Nurullah | Çakır | (123)1231231 |
2 | Hakan | Gürbaşlar | (123) 1111111 |
Normalization Level 2:
In order to be able to follow the steps in this level of normalization, Normalization Level 1 procedures must be completed.
Then we have to do the following checks:
- Creating different tables for records that contain the same subset
- Identify the foreign key to define the relationship between the sub table and the actual table.
When we look at the table in our first normalization level, we see that there are repeated records. Then we have to keep the contact information in a separate table, and the phone information of the contacts in a separate table.
Let’s create the contact table as follows.
ID | Name | Surname |
101 | Nurullah | Cakir |
102 | Hakan | Gurbaslar |
Convert the phone table to the following.
ID | Phone |
1 | (123) 4567890 |
2 | (123) 1234567 |
3 | (123)1231231 |
4 | (123) 1111111 |
In this table, it is not clear which phone belongs to which person. When we add the ID value that we created for the contact table, the table is as follows.
ID | Contact ID | Phone |
1 | 101 | (123) 4567890 |
2 | 101 | (123) 1234567 |
3 | 101 | (123)1231231 |
4 | 102 | (123) 1111111 |
This means that we can access the ID in the Contact table from the phone number. We also access all of the contact’s information from the ID value in the contact table.
Normalization Level 3:
In order to be able to follow the steps in this level of normalization, Normalization Level 2 procedures must be completed.
At this level, it is also checked that each non-primary column is not connected to another column.
Add the city and county columns to the contact table.
ID | Name | Surname | City | County |
101 | Nurullah | Cakir | Ankara | Yenimahalle |
102 | Hakan | Gurbaslar | Ankara | Yenimahalle |
City column(its not primary key) is related to the County column(its not primary key too).
Therefore we need to create a separate table for City and County information.
In OLTP systems, performance will increase as the level of normalization increases.
In OLAP systems, performance will decrease as the level of normalization increases.