Powered By Blogger

Monday, 27 February 2012

Candidate & Composite Key


Candidate key - A candidate key is any key that could be used as the primary key. This comes with two requirements: it cannot be null, and its values must be unique.
Example,
In a table Customer, suppose you have fields Customer ID, Customer Name, Address, Mobile Number here Customer ID and Mobile

Number is called as candidate Key because those fields can be unique,


Composite key - A composite key contains more than one column Sometimes you may create primary key with the combination of two or more fileds in such cases it is called composite key.
Example,
In an Order table, suppose you have OrderID, ItemID, price. Here Primary key field will be the combination of OrderID and

ItemID because in a single order you can have multiple items.

Star & Snowflake Schema


Star schema
A star schema is the simplest form of a dimensional model, in which data is organized into facts and dimensions.  A fact is an event that is counted or measured, such as a sale.  A dimension contains reference information about the fact, such as date, product, or customer. 
For example, a simple Sales fact with millions of individual clothing sale records might contain a Product Key, Promotion Key, Customer Key, and Date Key, along with Units Sold and Revenue measures.  The Product dimension would hold reference information such as product name, description, size, and color.  The Promotion dimension would hold information such as promotion name and price.  The Customer dimension would hold information such as first and last name, birth date, gender, address, etc.  The Date dimension would include calendar date, week of year, month, quarter, year, etc. 

The star schema supports rapid aggregations (such as count, sum, and average) of many fact records, and these aggregations can be easily filtered and grouped (“sliced & diced”) by the dimensions. 
 A star schema may be partially normalized (snowflake), with related information stored in multiple related dimension tables, to support specific data warehousing needs.

Snowflake schema
In data warehousing, snow flaking is a form of dimensional modeling in which dimensions are stored in multiple related dimension tables.
Snow flaking is used to improve the performance of certain queries.  The schema is diagramed with each fact surrounded by its associated dimensions (as in a star schema), and those dimensions are further related to other dimensions, branching out into a snowflake pattern.
Cases for snow flaking includes:
Sparsely populated attributes, where most dimension member records have a NULL value for the attribute, are moved to a sub-dimension.
Low cardinality attributes that are queried independently.  For example, a product dimension may contain thousands of products, but only a handful of product types.  Moving the product type attribute to its own dimension table can improve performance when the product types are queried independently.
Attributes that are part of a hierarchy and are queried independently.  Examples include the year, quarter, and month attributes of
                   Star schema
                       Snowflake Schema
1. De-Normalized Data Structure

2. Category wise Single Dimension Table

3. More data dependency and redundancy

4. No need to use complicated join

5. Query Results Faster

6. No Parent Table

7. Simple DB Structure


1. Normalized Data Structure

2. Dimension table split into many pieces

3. less data dependency and No redundancy

4. Complicated Join

5. Some delay in Query Processing

6. It May contain Parent Table

7. Complicated DB Structure


Star schema: In this star schema fact table in normalized format and dimension table is in de normalized format.  It also known as basic star schema.
Snow flake schema: In this both dimension and fact table is in normalized format only. It is also known as extended star schema.
If u r taking the snow flake it requires more dimensions, more foreign keys, and it will reduce the query performance, but it normalizes the records.

Data Modeling


Data modeling
£  Data modeling techniques to easily understood representations of the data flows and processes.
£  A data model can be thought of as a diagram or flowchart that illustrates the relationships between data.
Component of data model
E-R diagram

Which represents the data strucures in a pictorial form? Because the diagram is easily learned, it is valuable tool to communicate the model to the end-user.

Data documentation

This a document that describes in detail the data objects, relationships, And rules required by the database.

Basic style of data model

Conceptual data models-

Identifies the highest-level relationships between different entities. Sometimes called
Domain models, are typically used to explore domain concepts with project stakeholders
Logical Data Modeling 
Illustrates the specific entities, attributes and relationships involved in a business function. Serves as the basis for the creation of the physical data model.
Physical Data Modeling
 Represents an application and database-specific implementation of a logical data model. PDMs are used to design the internal schema of a database, depicting the data tables, the data columns of those tables, and the relationships between the tables.














Figure 1. A simple logical data model.
Simple LDM
 
Figure 2. A simple physical data model.
Simple PDM
It is critical for an application developer to have a grasp of the fundamentals of data modeling so they can not only read data models but also work effectively with Agile DBAs who are responsible for the data-oriented aspects of your project.  Your goal reading this section is not to learn how to become a data modeler, instead it is simply to gain an appreciation of what is involved.
The following tasks are performed in an iterative manner:

Normalization


Level
Rule
An entity type is in 1NF when it contains no repeating groups of data.
An entity type is in 2NF when it is in 1NF and when all of its non-key attributes are fully dependent on its primary key.
An entity type is in 3NF when it is in 2NF and when all of its attributes are directly dependent on the primary key.