Day 17: Normalization in DBMS
Hello Dear Students,
Hope you all are doing good...
Aaj hum normalization topic cover karenge jo ki most important topic hai exam mein aane ke liye. Normalization karne se pehle hum functional dependency ko understand karenge, and then further topics.
Let's get started....
FUNCTIONAL DEPENDENCY - Dependency means kisi cheez par depend hona, and functional ek mathematical term hai jo ki functions ko define karti hai. Functional dependency basically same table ke attributes ke relations ko consider karti hai. Ek attribute par jab doosre attributes depend karein to wo uski functional dependency hoti hai.
Ek table mein multiple attributes hote hain. Ek attribute jo ki unique values ko hi store karta hai and thus primary key hai, to baaki attributes us primary key wale attribute par depend honge, simple language mein ye functional dependency ki example hai.
Relation R1
Id | Name | Salary |
001 | Ram | 15,000 |
002 | Shyam | 16,000 |
003 | Ram | 15,500 |
Ye example ke liye ek table hai, jisme ID unique hai so ye primary key hai. Ab baaki attributes, Name and Salary unique nahi hain, toh ye dono attributes ID attribute(primary key because unique) par depend karenge. Simply bhi agar dekha jaye to hum Shyam ki value ko fetch karne ke liye ID hi query mein likhenge. Jaise ki ID=002, so Shyam 16,000 result a jayega. So, yeh ek bahut basic cheez hai but consider karni bhi zaroori hai.
Above table mein Name attribute mein Ram value same hai and repeat ho rahi hai, but value fetch karni easy hai because ID alag alag hai. So, functional dependency mein atleast ek attribute same hona chaiye jo ki primary key ho and us attribute ko use karke baaki values fetch ki ja sake.
Functional dependency is an association between two attributes of the same tables.
X--->Y
NORMALIZATION - Normalization ek technique hai jo ki data ko multiple related tables main organize karta hai, to minimize data redundancy. Normalization basically large table ko small tables mein divide kar deta hai jisse data redundancy ko minimize kiya jata hai. Data redundancy means data repetition, means ek table mein multiple times same values ho.
Relation Student
rollno | name | department | hod | dept_phoneno |
01 | Ram | CSA | Mr. A | 987654 |
02 | Shyam | CSA | Mr. A | 987654 |
03 | Rishi | CSA | Mr. A | 987654 |
04 | John | CSA | Mr. A | 987654 |
In above table, values are repeated multiple times on attributes department, hod, and dept_phoneno. This is known as Data Redundancy means data is redundant i.e. repeated.
ANOMALIES - Anomalies basically means the problems.
- INSERT ANOMALY - Insert anomaly means kuch new value ko add karne mein jo problems aati hai. Jaise above table mein agar new values insert kartein hain to problem ayegi because data redundant hai and just ek student ko add karne ke liye bhi saari redundant values ko dubara likhna padega. Suppose, ek new student Ashraf ko add karna hai to department, hod, dept_phoneno sab kuch repeat kiya jayega. So, ye Insert Anomaly hai means insertion mein problems aana.
- DELETE ANOMALY - Delete anomaly means already existing data ko remove karne ya delete karne mein aane wali problems. Suppose, session khatam hone ke baad sabhi values ko delete karna hai to sabhi values delete ho jayengi and department, hod and dept_phoneno bhi delete ho jayega, but ye data aage ke sessions ko add karne ke liye zaroori tha but wo bhi saath mein hi delete ho jayega. So, this is Delete anomaly means deletion mein problems aana.
- UPDATE ANOMALY - Update anomaly means data ko update ya modify karne mein aane wali problems. Jab data ko change karenge to data redundant hone ki wajah se problems ayengi. Suppose, due to any reason dept_phoneno change so har ek tuples mein values ko modify karna difficult task hai and isme bahut sari problems aane ke bhi chances hai such as galti se just kisi bhi student ke data mein changes na kiye jayein to it leads inaccuracy.
So, normalization mein hum kya karte hain ek long table ko small tables mein divide karte hain so that data redundancy ko minimize kiya jaa sake. Ye poori tarah se data redundancy ko eliminate nahi karta but minimize karta hai. For example,
Relation Student
rollno | name | department |
01 | Ram | CSA |
02 | Shyam | CSA |
03 | Rishi | CSA |
04 | John | CSA |
Relation Department
department | hod | dept_phoneno |
CSA | Mr. A | 987654 |
CSA | Mr. A | 987654 |
CSA | Mr. A | 987654 |
CSA | Mr. A | 987654 |
Long table ko small tables mein divide kiya gya hai jisse data redundancy minimize huyi hai. Relations, student and department dono aapas mein linked hain. If department table mein koi bhi changes honge to automatically student table usse fetch kar sakta hai. So, Normalization minimizes the data redundancy as well as anomalies.
So that was normalization basic introduction for your understandability. Now lets study types of normalization.
- FIRST NORMAL FORM (1NF) - 1st normal form is very easy. 1NF mein basically 4 rules hote hain jo ki satisfied hone chaiye ek database ke liye. Agar hum ek database bana rahe hain to hamara database atleast 1NF ke 4 rules ko satisfy karna chaiye, and agar hamara database yeh 4 rules ko satisfy nahi karta hai to wo database ko bad design database consider nahi kiya jata. So, the 4 rules are as follows-
- Each column should have atomic values, means har ek data value individual units mein hona chaiye. For example, X,Y this is not individual unit because it contains 2 values.
- Each column must contain the data of same data type. For example, agar ek column char data type ko specify karta hai to usme number data type fill nahi ho sakta.
- Each column should have a unique name means koi bhi column name same nahi hona chaiye different hone chahiye.
- Data ko store karne ke liye uske order ko consider nahi kiya jata. User kisi bhi order mein data ko store kra sakta hai and SQL queries se easily result ko fetch kar sakta hai.
- SECOND NORMAL FORM (2NF) - 2nd normal form is the normalization form which satisfies the 1NF and the fully functional dependency of all non-key attributes. Means 2NF ke liye 1NF ko satisfy karna hoga and jo bhi attributes hain wo fully functional dependent hone chahiye. Ek table mein jo primary key hai us par functional dependent honge sabhi other attributes, and koi bhi aisa attribute nahi hona chahiye jo ki partially dependent ho kise aur attribute par, sabhi attributes only primary key par hi fully dependent hone chahiye.For example,
Relation Student
rollno | name | department | dept_phoneno |
01 | Ram | CSA | 987654 |
02 | Shyam | Law | 987644 |
03 | Rishi | Education | 987655 |
04 | John | CSA | 987654 |
Is table mein rollno is primary key. Name and department dependent hain rollno par, means agar hum 01 search karnge to name and department aa jayega but dept_phoneno department attribute par depend karta hai, means dept_phoneno is partially dependent on department attribute. So, 2NF ke liye hume ise remove karna hoga. So, uske liye hum ye table ko divide kar denge, like-
Relation Student
rollno | name | department |
01 | Ram | CSA |
02 | Shyam | Law |
03 | Rishi | Education |
04 | John | CSA |
Relation Department
department | dept_phoneno |
CSA | 987654 |
Law | 987644 |
Education | 987655 |
CSA | 987654 |
So, by doing this we achieve second normal form of normalization.
- THIRD NORMAL FORM (3NF) - 3rd normal form is the normalization form which satisfies 2NF and the not any attribute is transitive dependency. Sabhi attributes fully functional dependent hone chaiye and koi bhi non-key attribute transitive dependent nahi hona chahiye. Transitive dependency alag hai partial dependency se. Maths mein bhi transitive property ki hai ki if
Same yehi property hai transitive property normalization mein bhi.
Relation Student
rollno | name | subject | teacher | department |
01 | Ram | C | A | CSA |
02 | Shyam | C++ | A | CSA |
03 | Rishi | Java | C | CSA |
04 | John | CN | D | CSA |
In above table, name is functionally dependent on rollno, the subject is functionally dependent on teacher and the teacher depends on department thus subject depends on department.
Student-->Teacher
Teacher-->Department, thus
Student-->Department
This is the transitive dependency so to remove this, we divide the table into small tables so as to remove the transitive dependency.
- BOYCE-CODD NORMAL FORM (BCNF) - BCNF wo form hai jisme har ek determinant(ek attribute on which other attribute full functionally dependent) ek candidate key honi chaiye. BCNF mein multiple candidate keys hone chaihiye. For BCNF-
- Relation should be in 2NF.
- No transitive dependency.
- When X-->A, then X is a superkey or candidate key.
Relation Student
id | name | subject | grade |
001 | Ram | C | A |
002 | Shyam | C | C |
003 | Amit | CN | A |
001 | Ram | Java | B |
003 | Amit | DBMS | A |
In this table, id cannot be primary key because it is not unique and repeats. Let us assume that each student have different name and id, so it contains 2 candidate keys(name,subject) and (id,subject). Here functional dependency is name-->id, and id-->name. Thus there is overlapping of candidate keys. So to remove this problem we will breake this table to achieve BCNF.
Relation Students_St
name | id |
Ram | 001 |
Shyam | 002 |
Amit | 003 |
Relation Student_Subject
id | subject | grade |
001 | C | A |
002 | C | C |
003 | CN | A |
001 | Java | B |
003 | DBMS | A |
Thus, here it satisfies BCNF.
- FOURTH NORMAL FORM (4NF) - 4NF wo form hai jisme multivalued dependency nahi honi chaiye. Multivalue dependency means ki multiple attributes ek hi attribute par fully functionally dependent ho. For example,
course | teacher | topics |
DBMS | X | normalization |
DBMS | X | SQL |
DBMS | Y | Data Warehousing |
CN | A | OSI |
CN | A | Medium |
CN | A | Internet |
So, here, course-->teacher and also course-->topics so this states that, course-->{teacher/topics} which means there is multivalued dependency here. So we have to remove it by breaking the table.
course | teacher |
DBMS | X |
DBMS | Y |
CN | A |
course | topics |
DBMS | normalization |
DBMS | SQL |
DBMS | Data Warehousing |
CN | OSI |
CN | Medium |
CN | Internet |
Both are now in 4NF.
Best of luck students,
Do share, subscribe and comments if you like our efforts..