What
is Database NORMALIZATION? What are different techniques to normalize DB
Tables?
Database
Normalization is a technique of organizing the data in the database. Normalization is a systematic approach of decomposing tables
to eliminate data redundancy and undesirable characteristics like Insertion,
Update and Deletion Anomalies. It is
a multi-step process that puts data into tabular form by removing duplicated
data from the relation tables.
Normalization
is used for mainly two purpose,
·
Eliminating
redundant(useless) data.
·
Ensuring
data dependencies make sense i.e. data is logically stored.
Problem Without Normalization
Without
Normalization, it becomes difficult to handle and update the database, without
facing data loss. Insertion, Updation and Deletion anomalies are very frequent
if Database is not Normalized. To understand these anomalies let us take an example
of Student table.
S_id
|
S_Name
|
S_Address
|
Subject_opted
|
401
|
Adam
|
Noida
|
Bio
|
402
|
Alex
|
Panipat
|
Maths
|
403
|
Stuart
|
Jammu
|
Maths
|
404
|
Adam
|
Noida
|
Physics
|
·
Updation
Anamoly : To
update address of a student who occurs twice or more than twice in a table, we will
have to update S_Address column in all the rows, else data will
become inconsistent.
·
Insertion
Anamoly : Suppose
for a new admission, we have a Student id(S_id), name and address of a student
but if student has not opted for any subjects yet then we have to insert NULL there, leading to Insertion Anamoly.
·
Deletion
Anamoly : If
(S_id) 401 has only one subject and temporarily he drops it, when we delete
that row, entire student record will be deleted along with it.
Let’s understand each of them with the help of below Example:
Consider the below Table:
Let’s understand each of them with the help of below Example:
Consider the below Table:
EMPLOYEE_DEPARTMENT
with Attributes:
EMPNO(PK)
NAME
DEPTNO
DEPT_LOC
DEPT_MANAGER
Table
is having some records with Entries to each of the column.
Insertion Anomaly:
Whenever
we will insert new Employee will have to assign it to some Department as well
which
will work fine. Now suppose you want to introduce New Department and will
think about
Employee Assignment later.
So,
in above case as EMPNO is defined as Primary Key- will not allow you to insert
any record
without Employee initiation which leads to Insert Anomaly and
further needs Normalization.
So,
if we have Normalized Tables then we can insert Department Details in separate
table and
later can assign Employees to that department whenever there is
requirement.
EMPLOYEE(EMPNO,NAME,DEPTNO) DEPARTMENT(DEPTNO,DEPT_LOC,
DEPT_MANAGER)
Delete Anomaly:
Consider
there are some Employees in one Department and all have resigned So, on their
last
day Details needs to be deleted from the table. But Department detail will
also be removed from
your Table which leads to Deletion Anomaly where deletion
of some data lead to loss of other
information which might require in future.
So,
if we have Normalized Tables we can actually Delete Employee details separately
from
Employee Table but your Department Details will still be available in
Department Table.
EMPLOYEE(EMPNO,NAME,DEPTNO) DEPARTMENT(DEPTNO,DEPT_LOC,
DEPT_MANAGER)
Update Anomaly:
Suppose
Manager of a Department has changed. This requires updation of multiple entries
against that Department number. If any case we will fail to update some of the
records then we
will have two managers for the same department and one is not
available which leads to
inconsistency in the DB(Update Anomaly).
So,
if we have Normalized Tables then we can actually update Department Manager in
one table.
EMPLOYEE(EMPNO,NAME,DEPTNO) DEPARTMENT(DEPTNO,DEPT_LOC,
DEPT_MANAGER)
Normalization Rule
Normalization
rule are divided into following normal form.
1.
First
Normal Form
2. Second Normal Form
3. Third Normal Form
4. BCNF
First Normal Form (1NF)
As per
First Normal Form, no two Rows of data must contain repeating group of
information i.e each set of column must have a unique value, such that multiple
columns cannot be used to fetch the same row. Each table should be organized
into rows, and each row should have a primary key that distinguishes it as
unique.
The Primary key is usually a single column, but
sometimes more than one column can be combined to create a single primary key.
For example consider a table which is not in First normal form
Student
Table :
Student
|
Age
|
Subject
|
Adam
|
15
|
Biology, Maths
|
Alex
|
14
|
Maths
|
Stuart
|
17
|
Maths
|
In
First Normal Form, any row must not have a column in which more than one value
is saved, like separated with commas. Rather than that, we must separate such
data into multiple rows.
Student
Table following 1NF will be :
Student
|
Age
|
Subject
|
Adam
|
15
|
Biology
|
Adam
|
15
|
Maths
|
Alex
|
14
|
Maths
|
Stuart
|
17
|
Maths
|
Using
the First Normal Form, data redundancy increases, as there will be many columns
with same data in multiple rows but each row as a whole will be unique.
Second Normal Form (2NF)
As per
the Second Normal Form there must not be any partial dependency of any column
on primary key. It means that for a table that has concatenated primary key,
each column in the table that is not part of the primary key must depend upon
the entire concatenated key for its existence. If any column depends only on
one part of the concatenated key, then the table fails Second normal form.
In
example of First Normal Form there are two rows for Adam, to include multiple
subjects that he has opted for. While this is searchable, and follows First
normal form, it is an inefficient use of space. Also in the above Table in
First Normal Form, while the candidate key is {Student, Subject}, Age of Student only depends on Student
column, which is incorrect as per Second Normal Form. To achieve second normal
form, it would be helpful to split out the subjects into an independent table,
and match them up using the student names as foreign keys.
New
Student Table following 2NF will be :
Student
|
Age
|
Adam
|
15
|
Alex
|
14
|
Stuart
|
17
|
In
Student Table the candidate key will be Student column, because all other column i.e Age is dependent on it.
New
Subject Table introduced for 2NF will be :
Student
|
Subject
|
Adam
|
Biology
|
Adam
|
Maths
|
Alex
|
Maths
|
Stuart
|
Maths
|
In
Subject Table the candidate key will be {Student, Subject} column. Now, both the
above tables qualifies for Second Normal Form and will never suffer from Update
Anomalies. Although there are a few complex cases in which table in Second
Normal Form suffers Update Anomalies, and to handle those scenarios Third
Normal Form is there.
Third Normal Form (3NF)
Third
Normal form applies
that every non-prime attribute of table must be dependent on primary key, or we
can say that, there should not be the case that a non-prime attribute is
determined by another non-prime attribute. So this transitive
functional dependency should
be removed from the table and also the table must be in Second Normal form. For
example, consider a table with following fields.
Student_Detail
Table :
Student_id
|
Student_name
|
DOB
|
Street
|
city
|
State
|
Zip
|
In
this table Student_id is Primary key, but street, city and state depends upon
Zip. The dependency between zip and other fields is called transitive dependency. Hence to
apply 3NF, we need to move
the street, city and state to new table, with Zip as primary key.
New
Student_Detail Table :
Student_id
|
Student_name
|
DOB
|
Zip
|
Address
Table :
Zip
|
Street
|
city
|
state
|
The
advantage of removing transtive dependency is,
·
Amount
of data duplication is reduced.
·
Data
integrity achieved.
Boyce and Codd Normal Form (BCNF)
Boyce
and Codd Normal Form is a
higher version of the Third Normal form. This form deals with certain type of
anamoly that is not handled by 3NF. A 3NF table which does not have multiple
overlapping candidate keys is said to be in BCNF. For a table to be in BCNF,
following conditions must be satisfied:
·
R must
be in 3rd Normal Form
·
and,
for each functional dependency ( X -> Y ), X should be a super Key.
It's RUDE to
Read and Run!
Get involved
and leave your Comments in the Box Below. The more people get involved, the
more we all benefit. So, leave your thoughts before you leave the page.
hi ,
ReplyDeletecould you please explain me insert, update and delete anomolies in detail
Hi Vicky,
DeleteI have updated in post itself. Thanks
When you or your company need help with QuickBooks or any other aspect of your business, dial Quickbooks Suppport Phone Number +1 615-510-6154.
ReplyDeleteJust call QuickBooks customer service at Quickbooks Customer Service +1 855-675-3194 or talk to them live on their website.
ReplyDeleteThis comment has been removed by the author.
ReplyDelete