Day 20: Data Warehousing, OLAP/OLTP, Data Mining, Big Data, Hadoop

Hello Dear Students,
  Hope you are all doing well...

This is supposed to be our last DBMS topic. After this, we will start another subject. The test is ready for you, and students who follow our lectures by email need to visit our website for the daily tests. Now, we are going to cover data warehousing, OLAP/OLTP, data mining, big data, and Hadoop.

Let's get started...

DATA WAREHOUSING - A 'warehouse' is a very large place where goods are stored. A data warehouse is the same idea: a place where a big amount of data is stored, and this technique is called data warehousing. We store data in a database and in a data warehouse too, so the major differences are-
  • A data warehouse stores a much larger amount of data.
  • A database is usually enough for small/medium organizations. For example, shops, companies, etc. A data warehouse is used in large organizations that hold a large amount of data. For example, insurers such as LIC, governments, etc.
  • Google, for example, has a data warehouse product (BigQuery) built for very large amounts of data.
  • A data warehouse holds data from many databases, which means a data warehouse is basically a collection of data drawn from various databases.
  • A data warehouse stores historical (time-stamped) data.
Data Warehouse Components
  1. Data Warehouse Database
  2. Tools for transformation and cleanup
  3. Meta Data
  4. Data Marts 
  5. Data Warehouse Administration and Management
  6. Information Delivery System
  7. Data Mining, etc.
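
To see how component 2 (tools for transformation and cleanup) feeds the warehouse database, here is a small Python sketch. The two source "databases", the customer rows, and the function names are all made up for illustration; a real warehouse would use proper ETL tools, so treat this only as the idea in miniature.

```python
# Hypothetical rows from two separate source databases (made-up data).
shop_db = [
    {"customer": " Asha ", "amount": "250", "date": "2023-01-05"},
    {"customer": "Ravi", "amount": "100", "date": "2023-01-06"},
]
online_db = [
    {"customer": "asha", "amount": "400", "date": "2023-02-01"},
    {"customer": "Meena", "amount": None, "date": "2023-02-03"},  # unusable row
]


def clean(row, source):
    """Transformation and cleanup: fix names, convert types, tag the source."""
    if row["amount"] is None:
        return None  # drop rows that cannot be repaired
    return {
        "customer": row["customer"].strip().title(),
        "amount": int(row["amount"]),
        "date": row["date"],
        "source": source,
    }


def load_warehouse(sources):
    """Combine the cleaned rows of every source into one historical table."""
    warehouse = []
    for name, rows in sources.items():
        for row in rows:
            cleaned = clean(row, name)
            if cleaned is not None:
                warehouse.append(cleaned)
    return warehouse


warehouse = load_warehouse({"shop": shop_db, "online": online_db})
for row in warehouse:
    print(row)
```

Note that both spellings of the customer name end up as "Asha" after cleanup, and the row with a missing amount never reaches the warehouse.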

OLAP and OLTP - OLAP stands for Online Analytical Processing. The term OLAP was introduced by Dr. E. F. Codd. In OLAP we basically analyse the data held in the data warehouse. It is a software technology that data analysts and managers use to analyse data. A data warehouse stores a very large amount of data, so the data is complex, and the technique we use to analyse this complex data is called OLAP.
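
A typical OLAP operation is a roll-up, where a measure is totalled over some chosen dimensions. The Python sketch below shows the idea on a tiny, made-up sales table; the rows, the dimension names, and the roll_up function are assumptions for illustration, not any real OLAP tool's API.

```python
from collections import defaultdict

# Hypothetical warehouse fact rows: (year, region, product, sales)
facts = [
    (2022, "North", "Pen", 100),
    (2022, "North", "Book", 300),
    (2022, "South", "Pen", 150),
    (2023, "North", "Pen", 120),
    (2023, "South", "Book", 400),
]
DIMENSIONS = ("year", "region", "product")


def roll_up(rows, by):
    """Total the sales measure over the chosen dimensions."""
    positions = [DIMENSIONS.index(name) for name in by]
    totals = defaultdict(int)
    for row in rows:
        key = tuple(row[i] for i in positions)
        totals[key] += row[3]  # index 3 holds the sales figure
    return dict(totals)


sales_by_year = roll_up(facts, ["year"])
sales_by_year_region = roll_up(facts, ["year", "region"])
print(sales_by_year)
print(sales_by_year_region)
```

Grouping by year alone gives the coarse view; adding region drills down into more detail using the same data.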

OLTP stands for Online Transaction Processing. In OLTP, transactions are processed, which means queries are performed on the data. Queries/transactions can be insert, update, delete, etc. OLTP normally runs on the operational (day-to-day) database rather than on the data warehouse; the warehouse is then loaded from these operational databases. So the technique that processes these day-to-day transactions is called OLTP.
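
Here is a small sketch of OLTP-style work using Python's built-in sqlite3 module with an in-memory database. The accounts table and its rows are made up for illustration. Each "with conn:" block is one transaction: it is committed if everything succeeds and rolled back if an error occurs.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # throwaway in-memory database
conn.execute(
    "CREATE TABLE accounts (id INTEGER PRIMARY KEY, name TEXT, balance INTEGER)"
)

# INSERT: add three hypothetical accounts in one transaction.
with conn:
    conn.execute("INSERT INTO accounts (id, name, balance) VALUES (1, 'Asha', 500)")
    conn.execute("INSERT INTO accounts (id, name, balance) VALUES (2, 'Ravi', 200)")
    conn.execute("INSERT INTO accounts (id, name, balance) VALUES (3, 'Temp', 0)")

# UPDATE: move 100 from account 1 to account 2 as a single transaction,
# so either both updates happen or neither does.
with conn:
    conn.execute("UPDATE accounts SET balance = balance - 100 WHERE id = 1")
    conn.execute("UPDATE accounts SET balance = balance + 100 WHERE id = 2")

# DELETE: remove the unused account.
with conn:
    conn.execute("DELETE FROM accounts WHERE id = 3")

rows = conn.execute("SELECT id, name, balance FROM accounts ORDER BY id").fetchall()
print(rows)
```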

DATA MINING - Data mining means extracting valuable information from data: finding the useful patterns in it and leaving out what is not needed. The data is usually cleaned first, and then the useful knowledge is extracted from it. Data mining uses various methods to discover these patterns. Data mining can be done on any type of data, which means it can be done on-
  • database
  • data warehouse
  • text data
  • spatial data
  • multimedia data
  • WWW, and so on

Data mining methods are as follows-
  1. Bayesian Classification (It is based on Bayes' theorem, i.e. hypotheses and their probabilities)
  2. Back Propagation (It is a neural network learning algorithm)
  3. k-nearest neighbor (It classifies a point by its nearest neighbors, using distance functions such as Euclidean or Hamming distance)
  4. Case based reasoning (4 R's: Retrieve, Reuse, Revise, Retain)
  5. Genetic Algorithms (Search algorithms modelled on natural selection in a population)
  6. Rough set approach (Works with sets of similar objects that cannot be told apart; used in AI, alongside fuzzy sets, in material sciences, etc.)
  7. Fuzzy Set Approach (has truth values between 0 and 1), and
  8. Clustering (Divides data into groups of similar items, known as clusters), and so on.
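
To make method 3 concrete, here is a minimal k-nearest neighbor sketch that uses Hamming distance on categorical data. The weather-style training rows and the function names are made up for illustration; it is only a sketch of the idea, not a library implementation.

```python
from collections import Counter


def hamming(a, b):
    """Hamming distance: the number of positions where two records differ."""
    return sum(x != y for x, y in zip(a, b))


def knn_predict(train, query, k=3):
    """Label the query by majority vote among its k nearest training rows."""
    nearest = sorted(train, key=lambda item: hamming(item[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Hypothetical training data: (outlook, temperature, humidity) -> play?
train = [
    (("sunny", "hot", "high"), "no"),
    (("sunny", "mild", "high"), "no"),
    (("rainy", "mild", "normal"), "yes"),
    (("overcast", "hot", "normal"), "yes"),
    (("overcast", "mild", "high"), "yes"),
]

prediction = knn_predict(train, ("sunny", "hot", "high"), k=3)
print(prediction)
```

With k=3 the two closest rows say "no" and the third says "yes", so the majority vote for this query is "no".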
Clustering Methods-
  • Partition method (It moves data points from one group to another until the groups settle; the 2 algorithms are K-means and K-modes)
  • Grid-based method (It divides the data space into a grid of cells (rows and columns) and works on the cells, so processing is faster)
  • Model-based method (It fits a model to each cluster; approaches include decision trees and neural networks)
  • Hierarchical method (builds a tree-like structure of clusters)
  • Density-based method (It finds dense regions of data points; the methods are DBSCAN, OPTICS, and DENCLUE)
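
The partition method can be sketched with a tiny K-means in plain Python. The points and the deliberately poor starting centroids are made up so that you can see a point move from one group to the other; real libraries pick starting centroids more carefully.

```python
def kmeans(points, centroids, iterations=10):
    """Plain K-means on 2-D points, starting from the given centroids."""
    clusters = []
    for _ in range(iterations):
        # Assignment step: put each point in the group of its nearest centroid.
        clusters = [[] for _ in centroids]
        for p in points:
            distances = [
                (p[0] - c[0]) ** 2 + (p[1] - c[1]) ** 2 for c in centroids
            ]
            clusters[distances.index(min(distances))].append(p)

        # Update step: move each centroid to the mean of its group.
        new_centroids = []
        for cluster, old in zip(clusters, centroids):
            if cluster:
                mean_x = sum(x for x, _ in cluster) / len(cluster)
                mean_y = sum(y for _, y in cluster) / len(cluster)
                new_centroids.append((mean_x, mean_y))
            else:
                new_centroids.append(old)  # keep the centroid of an empty group

        if new_centroids == centroids:
            break  # nothing moved, so the groups are stable
        centroids = new_centroids
    return centroids, clusters


# Hypothetical data: three points near (1, 1) and three near (8, 8).
points = [(1, 1), (1, 2), (2, 1), (8, 8), (9, 8), (8, 9)]
centroids, clusters = kmeans(points, [(1, 1), (1, 2)])
print(clusters)
```

In the first pass the point (1, 2) is grouped with the far-away points because it started as a centroid; in the second pass it moves to the group near (1, 1), which is exactly the "moving data points from one group to another" behaviour.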

BIG DATA - Big data means a very large amount of data. Big data is not measured in MB or GB; it is in TB (terabytes) and PB (petabytes). Collecting, storing and maintaining big data is very difficult because the amount of data is so large. Big data also has high velocity: for example, Twitter generates a huge number of tweets at any one time, which means its velocity is very high. The data can be of any type: text, audio, video, queries, etc.

Hadoop - Hadoop is basically a framework used for distributed processing of data. Hadoop works on a master/slave architecture, which means there is one host (master) PC, and the rest of the cluster consists of slave PCs connected to the host PC. Hadoop 1 and Hadoop 2 are the two basic versions of Hadoop, and their components are as follows-
Hadoop 1 - HDFS (Hadoop Distributed File System), MapReduce. HDFS basically stores the data, and then MapReduce processes that stored data.
Hadoop 2 - HDFS (Hadoop Distributed File System), MapReduce version 2, YARN (Yet Another Resource Negotiator).
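
To get a feel for how MapReduce processes stored data, here is the classic word count written in plain Python. This is only a single-machine sketch of the map, shuffle and reduce steps; it does not use the real Hadoop API, and the function names and sample lines are made up for illustration.

```python
from collections import defaultdict


def map_phase(line):
    """Map: emit a (word, 1) pair for every word in one line of input."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Shuffle: group all the emitted values by their key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: combine the values of one key into a single result."""
    return key, sum(values)


# Hypothetical input lines, as if read from files stored in HDFS.
lines = [
    "big data needs hadoop",
    "hadoop stores big data",
    "hadoop processes data",
]

mapped = [pair for line in lines for pair in map_phase(line)]
counts = dict(reduce_phase(key, values) for key, values in shuffle(mapped).items())
print(counts)
```

In a real cluster the map and reduce functions run on many machines at once, and the framework does the shuffle between them.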

Apache Hadoop - It is a framework that processes distributed files across clusters. Hadoop is basically used to solve big data problems.

Big data contains so much data that managing it is very difficult, so Hadoop is used. In Hadoop there is 1 master node to which various slave nodes are connected; the data is split up and distributed across the slave nodes, so it becomes easy to manage big data.
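
The idea of splitting data across slave nodes can be sketched in a few lines of Python. This is a simplification with made-up node names and a tiny block size: real HDFS uses much larger blocks and also keeps copies (replicas) of each block on several nodes, which this sketch leaves out.

```python
def split_into_blocks(data, block_size):
    """Cut the data into fixed-size blocks (the last one may be shorter)."""
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]


def assign_blocks(blocks, nodes):
    """Hand the blocks out to the slave nodes in round-robin order."""
    placement = {node: [] for node in nodes}
    for index, block in enumerate(blocks):
        placement[nodes[index % len(nodes)]].append(block)
    return placement


# Hypothetical "file" and slave node names, for illustration only.
data = "abcdefghijklmnopqrstuvwxyz"
blocks = split_into_blocks(data, 4)
placement = assign_blocks(blocks, ["slave1", "slave2", "slave3"])

for node, held in placement.items():
    print(node, held)
```

The master node only needs to remember which node holds which block; the slave nodes hold the actual data.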

Top 3 Largest Hadoop Users
  1. Yahoo
  2. LinkedIn
  3. Facebook
Hadoop is developed as the open-source Apache Hadoop project, which is written mainly in Java.

Best of Luck Students,
 Do share, subscribe and comment if you like our efforts...

Do visit our website UGC NET EXPERTS regularly for more content and for daily tests.