Powered By Blogger

Saturday 10 March 2012

Data Warehousing


DWH  Is a RDBMS used for analytical propose.

Characteristics of DWH
·          
Su subject oriented
·         Time variant
·         Integrated data
·         Non volatile
·         Consistent DB.

Four stages of DWH
·         Data profiling
·         Data Clancy
·         Data quality
·         Data sampling
Data mining
It is used to dig the DWH to detect hidden pattern and relationship
Data mart
Data mart is subset of DWH. It contains departmental data. RALPH KIMBALL.
ì  Business data mart
ì  Technical data mart
Approaches of DWH
1.       BILL INMON / TOP DOWN -                             DWH to  data mart
2.       RALPH KIMBALL/ BOTTOM UP-                       Data mart to DWH
Data modeling
Data modeling refers to the process data structured and organized’
E-R modeling-
·         Transactional table – grow rapidly (sales)                                                                      [fact]
·         Semi transitional table- grow over a period of time (customer & product)             [dimension]
·         Non transactional table- never change (region, sub region)                                        [dimension]

Dimension modeling
Fact table
·         Fact table will have measure
·         Every column may have corresponding table relation.
·         They will not be any critical column.
·         Only numeric data.
·         Contain million of traction.
Dimension table
A.      Detail 
B.      Level
C.      Critical column
D.      Surrogate key
E.       No measures
F.       It contains more columns.
Types of schema                                           
Snow flake- A dimension table is splitting into one or more dimension which result is partially normalized or fully normalized.
A single fact table surrounded by multiple dimension tables.
Star schema- the fact table consist of numeric measurement is join to set dimension table field which distractive attributes.
Star schema is DB design which contain the centrally located fact table surrounded by dimension table.
Galaxy schema-a dimension table is shared by multiple fact table is known as confirmed dimension.

Fact less fact -   a fact table without numeric fact columns is called fact less fact fact table.
Non additive fact- fact that cannot be summed up for any of the dimension.
Additive fact- fact that can be summed up for any dimension.







Erwin (entity relation window)- It provide user friendly relation with database

·         ER win is computer associate
·         It is a product of logical words
·         It support 22 different type of DB

Logical model- logical model handled by business person. (The terminology is entity, attributes, relationship).
Physical model- it is handled by technical person. (Table, Column, Constraint).

Modeling technique
Forward Engineering- We create the model in the modeling tool and which is not present  in DB. If we want to this model to be present in db then we have to perform forward engineering.
Reverse engineering- It is a reverse process of forward engineering. We pull the structure from Db to model to make some modification.
 Whenever you perform reverse engineering you will get the model in physical window.
Complete compare- after the modification the table move in the DB through the complete compare.
Tab- to switch one area to another area.
Enter- to move within the area.

Active transformation- Number of record  in the source table may not be equal to the number of records in the target table . filter and aggregator
Passive transformation- Number of records should be same in both table.
SCD ( Slowly changing dimension)
Dimension that change over a date of time are called SCD
SCD Type 1-
·         We are not maintain any historical data . here the data will get over write. Hence there is no way to find out historical data.
·         The dimension table will contain only current data.
SCD Type 2-
·         In this old value will not be replaced but a new row contain the new value  will be added to the table.
SCD Type 3-
·         In this we can be able to see the latest update.
·          Only the problem with type 3 is we can maintain only partial history.