Day 17: Normalization in DBMS

Hello Dear Students,
  Hope you all are doing good...

Aaj hum normalization topic cover karenge jo ki most important topic hai exam mein aane ke liye. Normalization karne se pehle hum functional dependency ko understand karenge, and then further topics.

Let's get started....

FUNCTIONAL DEPENDENCY - Dependency means kisi cheez par depend hona, and functional ek mathematical term hai jo ki functions ko define karti hai. Functional dependency basically same table ke attributes ke relations ko consider karti hai. Ek attribute par jab doosre attributes depend karein to wo uski functional dependency hoti hai. 
Ek table mein multiple attributes hote hain. Ek attribute jo ki unique values ko hi store karta hai and thus primary key hai, to baaki attributes us primary key wale attribute par depend honge, simple language mein ye functional dependency ki example hai.

Relation R1
 IdName Salary 
001 Ram 15,000 
002 Shyam 16,000 
003 Ram 15,500 

Ye example ke liye ek table hai, jisme ID unique hai so ye primary key hai. Ab baaki attributes, Name and Salary unique nahi hain, toh ye dono attributes ID attribute(primary key because unique) par depend karenge. Simply bhi agar dekha jaye to hum Shyam ki value ko fetch karne ke liye ID hi query mein likhenge. Jaise ki ID=002, so Shyam 16,000 result a jayega. So, yeh ek bahut basic cheez hai but consider karni bhi zaroori hai. 
Above table mein Name attribute mein Ram value same hai and repeat ho rahi hai, but value fetch karni easy hai because ID alag alag hai. So, functional dependency mein atleast ek attribute same hona chaiye jo ki primary key ho and us attribute ko use karke baaki values fetch ki ja sake.

Functional dependency is an association between two attributes of the same tables.
X--->Y


NORMALIZATION - Normalization ek technique hai jo ki data ko multiple related tables main organize karta hai, to minimize data redundancy. Normalization basically large table ko small tables mein divide kar deta hai jisse data redundancy ko minimize kiya jata hai. Data redundancy means data repetition, means ek table mein multiple times same values ho. 

Relation Student
 rollnonamedepartment hoddept_phoneno 
 01Ram CSA Mr. A 987654 
 02Shyam CSA Mr. A987654 
 03RishiCSAMr. A987654
 04JohnCSA Mr. A987654

In above table, values are repeated multiple times on attributes department, hod, and dept_phoneno. This is known as Data Redundancy means data is redundant i.e. repeated.

ANOMALIES - Anomalies basically means the problems. 
  • INSERT ANOMALY - Insert anomaly means kuch new value ko add karne mein jo problems aati hai. Jaise above table mein agar new values insert kartein hain to problem ayegi because data redundant hai and just ek student ko add karne ke liye bhi saari redundant values ko dubara likhna padega. Suppose, ek new student Ashraf ko add karna hai to department, hod, dept_phoneno sab kuch repeat kiya jayega. So, ye Insert Anomaly hai means insertion mein problems aana.
  • DELETE ANOMALY - Delete anomaly means already existing data ko remove karne ya delete karne mein aane wali problems. Suppose, session khatam hone ke baad sabhi values ko delete karna hai to sabhi values delete ho jayengi and department, hod and dept_phoneno bhi delete ho jayega, but ye data aage ke sessions ko add karne ke liye zaroori tha but wo bhi saath mein hi delete ho jayega. So, this is Delete anomaly means deletion mein problems aana.
  • UPDATE ANOMALY - Update anomaly means data ko update ya modify karne mein aane wali problems. Jab data ko change karenge to data redundant hone ki wajah se problems ayengi. Suppose, due to any reason dept_phoneno change so har ek tuples mein values ko modify karna difficult task hai and isme bahut sari problems aane ke bhi chances hai such as galti se just kisi bhi student ke data mein changes na kiye jayein to it leads inaccuracy. 

So, normalization mein hum kya karte hain ek long table ko small tables mein divide karte hain so that data redundancy ko minimize kiya jaa sake. Ye poori tarah se data redundancy ko eliminate nahi karta but minimize karta hai. For example,

Relation Student
rollno name department
 01 Ram CSA
 02 Shyam CSA
 03 Rishi CSA
 04 John CSA

Relation Department
 department hod dept_phoneno 
 CSAMr. A 987654 
 CSAMr. A987654
 CSAMr. A 987654 
 CSAMr. A 987654 


Long table ko small tables mein divide kiya gya hai jisse data redundancy minimize huyi hai. Relations, student and department dono aapas mein linked hain. If department table mein koi bhi changes honge to automatically student table usse fetch kar sakta hai. So, Normalization minimizes the data redundancy as well as anomalies. 

So that was normalization basic introduction for your understandability. Now lets study types of normalization.

  • FIRST NORMAL FORM (1NF) - 1st normal form is very easy. 1NF mein basically 4 rules hote hain jo ki satisfied hone chaiye ek database ke liye. Agar hum ek database bana rahe hain to hamara database atleast 1NF ke 4 rules ko satisfy karna chaiye, and agar hamara database yeh 4 rules ko satisfy nahi karta hai to wo database ko bad design database consider nahi kiya jata. So, the 4 rules are as follows-
  1. Each column should have atomic values, means har ek data value individual units mein hona chaiye. For example, X,Y this is not individual unit because it contains 2 values. 
  2. Each column must contain the data of same data type. For example, agar ek column char data type ko specify karta hai to usme number data type fill nahi ho sakta.
  3. Each column should have a unique name means koi bhi column name same nahi hona chaiye different hone chahiye.
  4. Data ko store karne ke liye uske order ko consider nahi kiya jata. User kisi bhi order mein data ko store kra sakta hai and SQL queries se easily result ko fetch kar sakta hai.

  • SECOND NORMAL FORM (2NF) - 2nd normal form is the normalization form which satisfies the 1NF and the fully functional dependency of all non-key attributes. Means 2NF ke liye 1NF ko satisfy karna hoga and jo bhi attributes hain wo fully functional dependent hone chahiye. Ek table mein jo primary key hai us par functional dependent honge sabhi other attributes, and koi bhi aisa attribute nahi hona chahiye jo ki partially dependent ho kise aur attribute par, sabhi attributes only primary key par hi fully dependent hone chahiye.For example,
Relation Student
 rollnonamedepartment dept_phoneno 
 01Ram CSA 987654 
 02Shyam Law987644 
 03RishiEducation987655
 04JohnCSA 987654


Is table mein rollno is primary key. Name and department dependent hain rollno par, means agar hum 01 search karnge to name and department aa jayega but dept_phoneno department attribute par depend karta hai, means dept_phoneno is partially dependent on department attribute. So, 2NF ke liye hume ise remove karna hoga. So, uske liye hum ye table ko divide kar denge, like- 

Relation Student
rollno name department
 01 RamCSA
 02 ShyamLaw
 03 RishiEducation
 04 JohnCSA

Relation Department
 department dept_phoneno 
 CSA987654 
 Law987644
 Education987655 
 CSA987654 


So, by doing this we achieve second normal form of normalization.


  • THIRD NORMAL FORM (3NF) - 3rd normal form is the normalization form which satisfies 2NF and the not any attribute is transitive dependency. Sabhi attributes fully functional dependent hone chaiye and koi bhi non-key attribute transitive dependent nahi hona chahiye. Transitive dependency alag hai partial dependency se. Maths mein bhi transitive property ki hai ki if 
A-->B and B-->C thus A-->C.

Same yehi property hai transitive property normalization mein bhi. 

Relation Student
 rollnoname subject teacherdepartment 
 01Ram  C ACSA 
 02Shyam  C++ ACSA 
 03Rishi Java CCSA
 04John CN DCSA 

In above table, name is functionally dependent on rollno, the subject is functionally dependent on teacher and the teacher depends on department thus subject depends on department. 
Student-->Teacher
Teacher-->Department, thus
Student-->Department

This is the transitive dependency so to remove this, we divide the table into small tables so as to remove the transitive dependency. 


  • BOYCE-CODD NORMAL FORM (BCNF) - BCNF wo form hai jisme har ek determinant(ek attribute on which other attribute full functionally dependent) ek candidate key honi chaiye. BCNF mein multiple candidate keys hone chaihiye. For BCNF-
  1. Relation should be in 2NF.
  2. No transitive dependency.
  3. When X-->A, then X is a superkey or candidate key. 
Relation Student
 idname subject grade 
001  Ram C A
 002 Shyam C C
 003 Amit CN A
 001 Ram Java B
003 Amit DBMS A

In this table, id cannot be primary key because it is not unique and repeats. Let us assume that each student have different name and id, so it contains 2 candidate keys(name,subject) and (id,subject). Here functional dependency is name-->id, and id-->name. Thus there is overlapping of candidate keys. So to remove this problem we will breake this table to achieve BCNF.


Relation Students_St
 nameid 
Ram 001 
Shyam002 
Amit 003

Relation Student_Subject
 idsubject grade 
001  C A
 002 C C
 003 CN A
 001 Java B
003 DBMS A


Thus, here it satisfies BCNF.


  • FOURTH NORMAL FORM (4NF) - 4NF wo form hai jisme multivalued dependency nahi honi chaiye. Multivalue dependency means ki multiple attributes ek hi attribute par fully functionally dependent ho. For example,
 courseteacher topics 
 DBMSnormalization
 DBMSSQL 
 DBMS Y Data Warehousing
 CNOSI 
 CN A Medium
 CN A Internet


So, here, course-->teacher and also course-->topics so this states that, course-->{teacher/topics} which means there is multivalued dependency here. So we have to remove it by breaking the table.

 courseteacher 
 DBMS
 DBMS Y
 CN A

 coursetopics 
 DBMSnormalization
 DBMSSQL 
 DBMS Data Warehousing
 CNOSI 
 CN Medium
 CN Internet
 
Both are now in 4NF.



Best of luck students,
  Do share, subscribe and comments if you like our efforts..