
Saturday, 5 May 2012

Find the Difference Among the Columns in Different Tables

INTRODUCTION

Suppose you have two tables: a yearly (permanent) table and a temporary table (a daily transaction table). The yearly table has 450 columns and the temporary table has 446 columns. If you try to move the temporary table's data into the yearly table (for example with INSERT INTO ... SELECT *), Oracle raises the error "ORA-00947: not enough values". You then need to know which columns are missing from the temporary table. The following query finds them:


> select column_name from user_tab_columns where table_name='EMP'
   MINUS
   select column_name from user_tab_columns where table_name='EMPTEMP'

It returns the 4 columns that are not present in your temporary table. You can then add those columns to it with ALTER TABLE. Here EMP is the yearly (permanent) table and EMPTEMP is the temporary table.

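The same set-difference idea can be tried without an Oracle instance. The following is a minimal sketch that uses SQLite from Python as a stand-in: `pragma_table_info` plays the role of `user_tab_columns` and `EXCEPT` plays the role of `MINUS`. The table layouts and the helper name `missing_columns` are assumptions made up for illustration.

```python
import sqlite3


def missing_columns(conn, full_table, partial_table):
    """Return the columns of full_table that partial_table does not have."""
    query = (
        "SELECT name FROM pragma_table_info(?) "
        "EXCEPT "
        "SELECT name FROM pragma_table_info(?) "
        "ORDER BY name"
    )
    return [row[0] for row in conn.execute(query, (full_table, partial_table))]


conn = sqlite3.connect(":memory:")
# Hypothetical, much smaller versions of the yearly and temporary tables.
conn.execute("CREATE TABLE EMP (empno INTEGER, ename TEXT, sal REAL, deptno INTEGER)")
conn.execute("CREATE TABLE EMPTEMP (empno INTEGER, ename TEXT)")

print(missing_columns(conn, "EMP", "EMPTEMP"))
```

Swapping the two arguments lists columns that exist only in the temporary table, which is worth checking as well before moving data.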
Wednesday, 2 May 2012

Clustered and Non-Clustered Index With Example

1. Introduction

We all know that data entered in tables is persisted on the physical drive in the form of database files. Think about a table, say Customer (for any leading bank in India), that has around 16 million records. When we try to retrieve records for two or three customers based on their customer id, without an index all 16 million records are read and compared against the supplied customer ids to find a match. Think about how much time that will take if it is a web application and 25 to 30 customers want to access their data over the internet at the same time. Does the database server do 16 million x 30 comparisons? The answer is no, because all modern databases use the concept of an index.

2. What is an Index

An index is a database object that can be created on one or more columns (the maximum number of columns per index is product-specific, for example 16 in older SQL Server versions and 32 in Oracle). When the index is created, the database reads the column(s) and forms a data structure that minimizes the number of data comparisons. The index improves the performance of data retrieval but adds some overhead on data modification such as insert, update and delete. So whether an index pays off depends on how much data retrieval is performed on the table versus how many DML (Insert, Delete and Update) operations.
In this article, we will see how to create an index. The next two sections are taken from my previous article, as they are required here. If your database already has the changes from the next two sections, you can go directly to section 5.

3. First Create Two Tables

To explain these constraints, we need two tables. First, let us create them: copy and paste the script below into a new Query Editor window, then execute it.

CREATE TABLE Student(StudId NUMBER(4), StudName VARCHAR2(50), Class VARCHAR2(15));
CREATE TABLE TotalMarks(StudId NUMBER(4), TotalMarks NUMBER(5));

Note that there are no constraints at present on these tables. We will add the constraints one by one.

4. Primary Key Constraint

A table column with this constraint is called the key column of the table. This constraint makes sure that values in the column are not repeated and that there are no null entries. We will mark the StudId column of the Student table as the primary key. Follow these steps:
  1. Right click the student table and click on the modify button.
  2. From the displayed layout, select the StudId row by clicking the small square button on the left side of the row.
  3. Click on the Set Primary Key toolbar button to set the StudId column as primary key column.
Pic01.JPG
Now this column does not allow null values or duplicate values. You can try inserting values that violate these conditions and see what happens. A table can have only one primary key, but multiple columns can participate in it. In that case, uniqueness is evaluated on the combination of the values of all participating columns.

5. Clustered Index

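The primary key created for the StudId column also creates a clustered index on that column (this is the default behavior in SQL Server; Oracle creates a unique index for the primary key, and its closest counterpart to a clustered index is an index-organized table). A table can have only one clustered index.
When creating the clustered index, the database reads the StudId column and forms a tree on it. Real databases use a B-tree, whose nodes hold many keys each; a simple binary tree is used here to explain the idea. This tree information is stored on disk. Expand the table Student and then expand Indexes. You will see the following index, created for you when the primary key was created: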

Pic02.jpg

With the use of the binary tree, a search for a student based on StudId needs far fewer comparisons. Let us assume that you had entered the following data in the Student table:
Pic03.jpg
The index will form the binary tree shown below. Note that a given parent has at most two children. The left side always has a lesser value and the right side always has a greater value when compared to the parent. The tree can also be constructed the other way round, that is, left side higher and right side lower.
Pic04.JPG
Now let us assume that we have written queries like the ones below:
Select * from student where studid = 103;
Select * from student where studid = 107;

Without the index, the first query returns its value after the third comparison.
Without the index, the second query returns its value at the eighth comparison.

With the index, the first query returns its value at the first comparison.
With the index, the second query returns its value at the third comparison. Look below:
  1. Compare 107 vs 103 : Move to right node
  2. Compare 107 vs 106 : Move to right node
  3. Compare 107 vs 107 : Matched, return the record
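The comparison counts above can be reproduced with a small sketch. Pic03 and Pic04 are not reproduced here, so the row order and the tree shape below are assumptions chosen to match the counts in the text (103 is the third row and the root of the tree, 107 is the eighth row). The names `Node`, `insert`, `tree_comparisons` and `scan_comparisons` are illustrative only.

```python
class Node:
    """One node of a plain (unbalanced) binary search tree."""

    def __init__(self, key):
        self.key = key
        self.left = None
        self.right = None


def insert(root, key):
    """Insert key below root and return the (possibly new) root."""
    if root is None:
        return Node(key)
    if key < root.key:
        root.left = insert(root.left, key)
    else:
        root.right = insert(root.right, key)
    return root


def tree_comparisons(root, key):
    """Count the nodes visited until key is found (None if it is absent)."""
    count = 0
    node = root
    while node is not None:
        count += 1
        if key == node.key:
            return count
        node = node.left if key < node.key else node.right
    return None


def scan_comparisons(rows, key):
    """Count the rows read by a top-to-bottom scan until key is found."""
    for position, value in enumerate(rows, start=1):
        if value == key:
            return position
    return None


# Assumed physical row order of the Student table.
rows = [105, 101, 103, 108, 102, 104, 106, 107]

# Assumed insertion order, which makes 103 the root and 103 -> 106 -> 107 a path.
root = None
for stud_id in [103, 101, 106, 102, 105, 107, 104, 108]:
    root = insert(root, stud_id)

for stud_id in (103, 107):
    print(stud_id, "scan:", scan_comparisons(rows, stud_id),
          "tree:", tree_comparisons(root, stud_id))
```

With only eight rows the saving is small; it grows quickly as the table grows, as long as the tree stays balanced.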
If the number of records is small, you will not see much difference. Now apply this technique to Yahoo email user accounts stored in a table called, say, YahooLogin. Let us assume there are 33 million users around the world that have a Yahoo email id stored in YahooLogin. When a user logs in by giving the user name and password, the number of comparisons required is between 1 and 25 with the binary tree, that is, the clustered index, provided the tree is balanced.
Look at the above picture and guess for yourself how fast you will reach level 25. Without the clustered index, the number of comparisons required is between 1 and 33 million.
Got the usage of the clustered index? Let us move on to the non-clustered index.
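The figure of 25 levels can be checked with a line of arithmetic: a perfectly balanced binary tree with n keys has floor(log2 n) + 1 levels. The sketch below assumes such a perfectly balanced tree, and the helper name `min_levels` is made up for illustration.

```python
def min_levels(row_count):
    """Levels of a perfectly balanced binary tree holding row_count keys.

    For a positive integer n, n.bit_length() equals floor(log2(n)) + 1.
    """
    return row_count.bit_length()


print(min_levels(33_000_000))  # the YahooLogin example
print(min_levels(16_000_000))  # the Customer table from the introduction
```

So a lookup in the 33 million row table needs at most 25 comparisons, and one in the 16 million row Customer table at most 24.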

6. Non Clustered Index

A non-clustered index is useful for columns that have some repeated values. For example, the AccountType column of a bank database may have 10 million rows, but only 10 to 15 distinct account types. A clustered index is created automatically when we create the primary key for the table, whereas we have to create non-clustered indexes ourselves.
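Creating such an index by hand can be sketched as follows. This uses SQLite from Python as a stand-in database, so the syntax and the query plan output differ from SQL Server and Oracle; the table `Account`, its columns, the sample rows and the index name `idx_account_type` are assumptions for illustration. In SQLite, an ordinary `CREATE INDEX` produces a secondary index that plays the role of the non-clustered index described here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Account (AccountId INTEGER PRIMARY KEY, AccountType TEXT, Balance REAL)"
)

# Hypothetical data: many rows, only three distinct account types.
account_types = ["Savings", "Current", "Loan"]
conn.executemany(
    "INSERT INTO Account (AccountId, AccountType, Balance) VALUES (?, ?, ?)",
    [(i, account_types[i % 3], 100.0 * i) for i in range(1, 1001)],
)

# The secondary index has to be created explicitly.
conn.execute("CREATE INDEX idx_account_type ON Account (AccountType)")


def query_plan(conn, sql, params=()):
    """Return SQLite's textual query plan for the given statement."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    return " ".join(row[-1] for row in rows)


plan = query_plan(conn, "SELECT * FROM Account WHERE AccountType = ?", ("Loan",))
print(plan)  # the plan should mention idx_account_type
```

Looking at the plan before and after creating the index is a quick way to confirm that the database actually uses it for the query.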

7. How Does a Non-Clustered Index Work?

A table can have more than one non-clustered index, but only one clustered index, which works on the binary tree concept. A non-clustered index always depends on the clustered index of the table: its entries lead to the clustered key, which is then used to locate the row.
This can be easily explained with the concept of a book and its index pages at the end. Let us assume that you go to a bookshop and find a big 1500-page C# book that says all about C#. When you glance at the book, it has beautiful color pages and shiny paper. But that is not the only criterion for a good book, right? Once you are impressed, you want to see your favorite topic, Regular Expressions, and how it is explained in the book. What will you do? I just peeped at you from behind and recorded what you did, as below:
  1. You went to the index (25 pages in total). It is already sorted, so you easily found Regular Expressions on index page 17.
  2. Next, you noted down the numbers displayed next to it, which are 407, 816, 1200-1220.
  3. Your first target is page 407. You opened the book in the middle; the page number was greater than 500.
  4. Then you moved to a somewhat lower page. But this time it read 310.
  5. Then you moved to a higher page. You are very lucky, you got exactly page 407. [Yes man you got it. Otherwise I need to write more. OK?]
  6. That’s all. You started exploring what is written about Regular Expressions on that page, keeping in mind that you need to find page 816 also.
In the above scenario, the index pages are the non-clustered index and the page numbers are the clustered index arranged in a binary tree. See how quickly you came to page 407. Your mind actually traversed the binary tree, left and right, to reach page 407 quickly.
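The two-step lookup from the book analogy can be sketched like this: first the topic is looked up in the book index (the non-clustered index), then each page number is located by repeatedly halving the range of pages (the clustered index). The dictionary `book_index` and the function `binary_search` are illustrative names, and the range 1200-1220 is represented by its first page only.

```python
def binary_search(pages, target):
    """Return (position, probes) for target in the sorted list pages."""
    low, high = 0, len(pages) - 1
    probes = 0
    while low <= high:
        middle = (low + high) // 2
        probes += 1
        if pages[middle] == target:
            return middle, probes
        if pages[middle] < target:
            low = middle + 1   # target is in the upper half
        else:
            high = middle - 1  # target is in the lower half
    return None, probes


# Step 1: the book index maps a topic to page numbers.
book_index = {"Regular Expressions": [407, 816, 1200]}

# Step 2: the pages themselves are sorted by page number.
pages = list(range(1, 1501))

for page_number in book_index["Regular Expressions"]:
    position, probes = binary_search(pages, page_number)
    print("page", page_number, "found after", probes, "probes")
```

For a 1500-page book, this halving never needs more than 11 probes, which is why flipping back and forth reaches page 407 so quickly.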

Tuesday, 1 May 2012

Normalization with Example

Why do we need to do normalization?

To eliminate redundancy of data, i.e. having the same information stored in multiple places, which eventually becomes difficult to maintain and also increases the size of our database.
With normalization we will have tables with fewer columns, which can make data retrieval and insert, update and delete operations more efficient.


What do we mean when we say a table is not in normalized form?


Let’s take an example to understand this.
Say I want to create a database that stores my friends' names and their top three favorite artists.
This database would be quite simple, so initially I’ll have only one table in it, say the Friends table. Here FID is the primary key.


FID  FNAME    FavoriteArtist
1    Srihari  Akon, The Corrs, Robbie Williams
2    Arvind   Enigma, Chicane, Shania Twain

This table is not in normal form. Why?

The FavoriteArtist column is not atomic, i.e. it does not hold a scalar value: it holds more than one value.
Let’s modify this table.

FID  FNAME    FavoriteArtist1  FavoriteArtist2  FavoriteArtist3
1    Srihari  Akon             The Corrs        Robbie Williams
2    Arvind   Enigma           Chicane          Shania Twain

This table is also not in normal form. Why?

We have changed our table and now each column has only one value! (So what’s left?)
The problem is that we now have multiple columns holding the same kind of value, i.e. a repeating group of data, or repeating columns.

So what do we need to do to bring it at least into First Normal Form?
  1. We’ll first break our single table into two.
  2. Each table should hold information about only one entity, so we store our friends' information in one table and their favorite artists' information in another.
(For simplicity we are working with few columns, but in a real-world scenario there could be columns like a friend’s phone number, email and address, and an artist's albums, awards received, country etc. In that case having two different tables makes complete sense.)


Friends table:
FID  FNAME
1    Srihari
2    Arvind

FavoriteArtist table:
FID  FavoriteArtist
1    Akon
1    The Corrs
1    Robbie Williams
2    Enigma
2    Chicane
2    Shania Twain

FID in the FavoriteArtist table is a foreign key that refers to FID in our Friends table.

Now we can say that our tables are in first normal form.
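The two-table design can be tried out with a short sketch. It uses SQLite from Python as a stand-in database; the column types and the helper `artists_of` are assumptions for illustration, not part of the original design.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when asked
conn.execute("CREATE TABLE Friends (FID INTEGER PRIMARY KEY, FNAME TEXT NOT NULL)")
conn.execute(
    "CREATE TABLE FavoriteArtist ("
    " FID INTEGER NOT NULL REFERENCES Friends(FID),"
    " FavoriteArtist TEXT NOT NULL)"
)

conn.executemany("INSERT INTO Friends VALUES (?, ?)", [(1, "Srihari"), (2, "Arvind")])
conn.executemany(
    "INSERT INTO FavoriteArtist VALUES (?, ?)",
    [(1, "Akon"), (1, "The Corrs"), (1, "Robbie Williams"),
     (2, "Enigma"), (2, "Chicane"), (2, "Shania Twain")],
)


def artists_of(conn, friend_name):
    """Return the favorite artists of one friend, sorted by name."""
    rows = conn.execute(
        "SELECT a.FavoriteArtist FROM FavoriteArtist a"
        " JOIN Friends f ON f.FID = a.FID"
        " WHERE f.FNAME = ? ORDER BY a.FavoriteArtist",
        (friend_name,),
    )
    return [row[0] for row in rows]


print(artists_of(conn, "Srihari"))
```

Each artist now sits in its own row, so a friend can have any number of favorites without adding columns, and the foreign key rejects artists attached to a friend that does not exist.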

Remember, for First Normal Form:

1. Column values should be atomic, i.e. scalar, holding a single value.
2. No repetition of information or values across multiple columns.

So what does Second Normal Form mean?


For second normal form, our database should already be in first normal form, and every non-key column must depend on the entire primary key.

Here we can say that our Friends database was already in second normal form.
Why?

Because we don’t have a composite primary key in our Friends and FavoriteArtist tables.

Composite primary keys are primary keys made up of more than one column. But there is no such thing in our database.
Still, let’s try to understand second normal form with another example.
This is our new table:
Gadgets     Supplier  Cost  Supplier Address
Headphone   Abaci     123$  New York
Mp3 Player  Sagas     250$  California
Headphone   Mayas     100$  London

In the above table, Gadgets+Supplier together form a composite primary key.

Let’s check for dependency

If I know gadget can I know the cost?

No, the same gadget is provided by different suppliers at different rates.

If I know supplier can I know about the cost?

No because same supplier can provide me with different gadgets.

If I know both gadget and supplier can I know cost?

Yes, then we can.

So cost is fully dependent (functionally dependent) on our composite primary key (Gadgets+Supplier)

Let’s start with another non-key column Supplier Address.

If I know gadget will I come to know about supplier address?

Obviously no.

If I know who the supplier is, can I know its address?

Yes.

So here Supplier Address is not completely dependent on our composite primary key (Gadgets+Supplier); it is only partially dependent on it.

This table is surely not in Second Normal Form.

So what do we need to do to bring it in second normal form?

Here again we’ll break the table in two.

Gadgets     Supplier  Cost
Headphone   Abaci     123$
Mp3 Player  Sagas     250$
Headphone   Mayas     100$

Supplier  Supplier Address
Abaci     New York
Sagas     California
Mayas     London
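The dependency questions asked above can also be checked mechanically against the sample rows. The sketch below is only an illustration: the function name `holds` is made up, and a check on three sample rows can refute a dependency but cannot prove that it holds for all future data.

```python
def holds(rows, determinant, dependent):
    """Return True if no two rows agree on determinant but differ on dependent."""
    seen = {}
    for row in rows:
        key = tuple(row[column] for column in determinant)
        value = row[dependent]
        # setdefault stores the first value seen for this key and returns it.
        if seen.setdefault(key, value) != value:
            return False
    return True


gadgets = [
    {"Gadgets": "Headphone", "Supplier": "Abaci", "Cost": "123$", "Supplier Address": "New York"},
    {"Gadgets": "Mp3 Player", "Supplier": "Sagas", "Cost": "250$", "Supplier Address": "California"},
    {"Gadgets": "Headphone", "Supplier": "Mayas", "Cost": "100$", "Supplier Address": "London"},
]

# Cost needs the whole key: the gadget alone is not enough.
print(holds(gadgets, ["Gadgets"], "Cost"))
print(holds(gadgets, ["Gadgets", "Supplier"], "Cost"))

# Supplier Address depends on only part of the key (a partial dependency).
print(holds(gadgets, ["Supplier"], "Supplier Address"))
```

The last check is the reason Supplier Address was moved into its own table keyed by Supplier.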

We now know how to normalize up to second normal form.

But let’s take a break over here and learn some definitions and terms.

Composite Key: A composite key is a primary key composed of multiple columns.

Functional Dependency: When the value of one column is determined by the value of another column, so that rows with the same value in the first column always have the same value in the second.

e.g. Supplier Address is functionally dependent on the supplier name. If the supplier’s name is changed in a record, we need to change the supplier address as well.

S.Supplier → S.SupplierAddress

“In our Supplier table (S), the Supplier Address column is functionally dependent on the Supplier column.”

Partial Functional Dependency: A non-key column is dependent on some, but not all, of the columns in a composite primary key.

In our above example, Supplier Address was partially dependent on our composite key columns (Gadgets+Supplier).

Transitive Dependency: A transitive dependency is a type of functional dependency in which the value in a non-key column is determined by the value in another non-key column.

With these definitions in mind let’s move to Third Normal Form.
For a table to be in third normal form:
  • It should already be in Second Normal Form.
  • There should be no transitive dependency, i.e. we shouldn’t have any non-key column depending on any other non-key column.
Again we need to make sure that the non-key columns depend upon the primary key and not on any other non-key column.

Album                 Artist           No. of tracks  Country
Come on over          Shania Twain     11             Canada
History               Michael Jackson  15             USA
Up                    Shania Twain     11             Canada
MCMXC A.D.            Enigma           8              Spain
The cross of changes  Enigma           10             Spain

Although the above table looks fine, there is still something in it that makes us normalize it further.

Album is the primary key of the above table.

Artist and No. of tracks are functionally dependent on the Album(primary key).

But can we say the same of Country as well?

In the above table, the Country value is repeated because of the artist.

So the Country column depends on the Artist column, which is a non-key column.

So we move that information into another table, which saves the table from redundancy, i.e. repeating values in the Country column.

Album                 Artist           No. of tracks
Come on over          Shania Twain     11
History               Michael Jackson  15
Up                    Shania Twain     11
MCMXC A.D.            Enigma           8
The cross of changes  Enigma           10

Artist           Country
Shania Twain     Canada
Michael Jackson  USA
Enigma           Spain
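That this split loses no information can be checked with a small sketch: the original rows are projected into the two tables and then joined back on Artist. The variable names are illustrative, and the data is simply the sample table from above.

```python
# The original single table: (Album, Artist, No. of tracks, Country).
albums = [
    ("Come on over", "Shania Twain", 11, "Canada"),
    ("History", "Michael Jackson", 15, "USA"),
    ("Up", "Shania Twain", 11, "Canada"),
    ("MCMXC A.D.", "Enigma", 8, "Spain"),
    ("The cross of changes", "Enigma", 10, "Spain"),
]

# Projection 1: everything that depends directly on the key (Album).
album_table = [(album, artist, tracks) for album, artist, tracks, _ in albums]

# Projection 2: the transitive dependency Artist -> Country, stored once per artist.
artist_table = sorted({(artist, country) for _, artist, _, country in albums})

# Joining the two tables on Artist reconstructs the original rows.
country_of = dict(artist_table)
rejoined = [
    (album, artist, tracks, country_of[artist])
    for album, artist, tracks in album_table
]

print(artist_table)
print(rejoined == albums)
```

Country is now stored three times instead of five, and correcting an artist's country means changing a single row.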


Normally this is considered enough, and we don’t really go on to apply the other normal forms.

Most real-world applications have databases that are in third normal form.