Powered By Blogger

Monday 5 November 2012

Bitmap Index vs. B-tree Index: Which and When?


Bitmap Index vs. B-tree Index: Which and When?

The classical approach for  the bitmap indexes are most appropriate for columns having low distinct values--such as GENDER, MARITAL_STATUS, and RELATION. This assumption is not completely accurate, however. In reality, a bitmap index is always advisable for systems in which data is not frequently updated by many concurrent systems. In fact, as I'll demonstrate here, a bitmap index on a column with 100-percent unique values (a column candidate for primary key) is as efficient as a B-tree index.
In this article I'll provide some examples, along with optimizer decisions, that are common for both types of indexes on a low-cardinality column as well as a high-cardinality one. These examples will help DBAs understand that the usage of bitmap indexes is not in fact cardinality dependent but rather application dependent.

Cardinality

In the context of databases, cardinality refers to the uniqueness of data values contained in a column. High cardinality means that the column contains a large percentage of totally unique values. Low cardinality means that the column contains a lot of “repeats” in its data range.

It is not common, but cardinality also sometimes refers to the relationships between tables. Cardinality between tables can be one-to-one, many-to-one, or many-to-many.

Techopedia explains Cardinality


High cardinality columns are those with very unique or uncommon data values. For example, in a database table that stores bank account numbers, the “Account Number” column should have very high cardinality – by definition, every item of data in this column should be totally unique.

Normal cardinality columns are those with a somewhat unique percentage of data values. For instance, if a table holds customer information, the “Last Name” column will have normal cardinality. Not every last name will be unique (for example, there will be several occurrences of “Smith”) but on the whole, the data will be fairly non-repetitive.

Low cardinality columns are those with very few unique values. In a customer table, a low cardinality column will be the “Gender” column. This column will likely only have “M” and “F” as the range of values to choose from, and all the thousands or millions of records in the table can only pick one of these two values for this column.

Cardinality relationships between tables can take the form of one-to-one, one-to-many (whose reversal is many-to-one), or many-to-many. These terms simply refer to the relationships of data between the tables. For example, the relationship between the “Customers” table and the “Bank Accounts” table is one-to-many, that is, one customer can have several accounts, but one account cannot belong to more than one customer. That is, of course, assuming the bank has never heard of joint accounts!
Comparing the Indexes
There are several disadvantages to using a bitmap index on a unique column--one being the need for sufficient space (and Oracle does not recommend it). However, the size of the bitmap index depends on the cardinality of the column on which it is created as well as the data distribution. Consequently, a bitmap index on the GENDER column will be smaller than a B-tree index on the same column. In contrast, a bitmap index on EMPNO (a candidate for primary key) will be much larger than a B-tree index on this column. But because fewer users access decision-support systems (DSS) systems than would access transaction-processing (OLTP) ones, resources are not a problem for these applications.
To illustrate this point, I created two tables, TEST_NORMAL and TEST_RANDOM. I inserted one million rows into the TEST_NORMAL table using a PL/SQL block, and then inserted these rows into the TEST_RANDOM table in random order:

                              
Create table test_normal (empno number(10), ename varchar2(30), sal number(10));

Begin
For i in 1..1000000
Loop
   Insert into test_normal
   values(i, dbms_random.string('U',30), dbms_random.value(1000,7000));
   If mod(i, 10000) = 0 then
   Commit;
  End if;
End loop;
End;
/
 
Create table test_random
as
select /*+ append */ * from test_normal order by dbms_random.random;

SQL> select count(*) "Total Rows" from test_normal;

Total Rows
----------
   1000000

Elapsed: 00:00:01.09

SQL> select count(distinct empno) "Distinct Values" from test_normal;

Distinct Values
---------------
        1000000

Elapsed: 00:00:06.09
SQL> select count(*) "Total Rows" from test_random;

Total Rows
----------
   1000000

Elapsed: 00:00:03.05
SQL> select count(distinct empno) "Distinct Values" from test_random;

Distinct Values
---------------
        1000000

Elapsed: 00:00:12.07
                           
Note that the TEST_NORMAL table is organized and that the TEST_RANDOM table is randomly created and hence has disorganized data. In the above table, column EMPNO has 100-percent distinct values and is a good candidate to become a primary key. If you define this column as a primary key, you will create a B-tree index and not a bitmap index because Oracle does not support bitmap primary key indexes.
To analyze the behavior of these indexes, we will perform the following steps:
1.                    On TEST_NORMAL:
A.                  Create a bitmap index on the EMPNO column and execute some queries with equality predicates.
B.                   Create a B-tree index on the EMPNO column, execute some queries with equality predicates, and compare the logical and physical I/Os done by the queries to fetch the results for different sets of values.
2.                    On TEST_RANDOM:
A.                  Same as Step 1A.
B.                   Same as Step 1B.
3.                    On TEST_NORMAL:
A.                  Same as Step 1A, except that the queries are executed within a range of predicates.
B.                   Same as Step 1B, except that the queries are executed within a range of predicates. Now compare the statistics.
4.                    On TEST_RANDOM:
A.                  Same as Step 3A.
B.                   Same as Step 3B.
5.                    On TEST_NORMAL:
A.                  Create a bitmap index on the SAL column, and then execute some queries with equality predicates and some with range predicates.
B.                   Create a B-tree index on the SAL column, and then execute some queries with equality predicates and some with range predicates (same set of values as in Step 5A). Compare the I/Os done by the queries to fetch the results.
6.                  Add a GENDER column to both of the tables, and update the column with three possible values: M for male, F for female, and nullfor N/A. This column is updated with these values based on some condition.
7.                    Create a bitmap index on this column, and then execute some queries with equality predicates.
8.                    Create a B-tree index on the GENDER column, and then execute some queries with equality predicates. Compare to results from Step 7.
Steps 1 to 4 involve a high-cardinality (100-percent distinct) column, Step 5 a normal-cardinality column, and Steps 7 and 8 a low-cardinality column.

Step 1A (on TEST_NORMAL)

In this step, we will create a bitmap index on the TEST_NORMAL table and then check for the size of this index, its clustering factor, and the size of the table. Then we will run some queries with equality predicates and note the I/Os of these queries using this bitmap index.
                              
SQL> create bitmap index normal_empno_bmx on test_normal(empno);

Index created.

Elapsed: 00:00:29.06
SQL> analyze table test_normal compute statistics for table for all indexes for all indexed columns;


Table analyzed.

Elapsed: 00:00:19.01
SQL> select substr(segment_name,1,30) segment_name, bytes/1024/1024 "Size in MB"
  2  from user_segments
  3* where segment_name in ('TEST_NORMAL','NORMAL_EMPNO_BMX');

SEGMENT_NAME             Size in MB
------------------------------------       ---------------
TEST_NORMAL              50
NORMAL_EMPNO_BMX         28

Elapsed: 00:00:02.00
SQL> select index_name, clustering_factor from user_indexes;

INDEX_NAME              CLUSTERING_FACTOR
------------------------------         ---------------------------------
NORMAL_EMPNO_BMX        1000000

Elapsed: 00:00:00.00
                           
You can see in the preceding table that the size of the index is 28MB and that the clustering factor is equal to the number of rows in the table. Now let's execute the queries with equality predicates for different sets of values:
                              
SQL> set autotrace only
SQL> select * from test_normal where empno=&empno;
Enter value for empno: 1000
old   1: select * from test_normal where empno=&empno
new   1: select * from test_normal where empno=1000

Elapsed: 00:00:00.01

Execution Plan
----------------------------------------------------------
   0      SELECT STATEMENT Optimizer=CHOOSE (Cost=4 Card=1 Bytes=34)
   1    0   TABLE ACCESS (BY INDEX ROWID) OF 'TEST_NORMAL' (Cost=4 Car
          d=1 Bytes=34)
   2    1     BITMAP CONVERSION (TO ROWIDS)
   3    2       BITMAP INDEX (SINGLE VALUE) OF 'NORMAL_EMPNO_BMX'

Statistics
----------------------------------------------------------
          0  recursive calls
          0  db block gets
          5  consistent gets
          0  physical reads
          0  redo size
        515  bytes sent via SQL*Net to client
        499  bytes received via SQL*Net from client
          2  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
          1  rows processed
                           

Step 1B (on TEST_NORMAL)

Now we will drop this bitmap index and create a B-tree index on the EMPNO column. As before, we will check for the size of the index and its clustering factor and execute the same queries for the same set of values, to compare the I/Os.
                              
SQL> drop index NORMAL_EMPNO_BMX;

Index dropped.

SQL> create index normal_empno_idx on test_normal(empno);
Index created.

SQL> analyze table test_normal compute statistics for table for all indexes for all indexed columns;
Table analyzed.

SQL> select substr(segment_name,1,30) segment_name, bytes/1024/1024 "Size in MB"
  2  from user_segments
  3  where segment_name in ('TEST_NORMAL','NORMAL_EMPNO_IDX');

SEGMENT_NAME              Size in MB
----------------------------------         ---------------
TEST_NORMAL               50
NORMAL_EMPNO_IDX          18

SQL> select index_name, clustering_factor from user_indexes;

INDEX_NAME           CLUSTERING_FACTOR
----------------------------------    ----------------------------------
NORMAL_EMPNO_IDX     6210
                           
It is clear in this table that the B-tree index is smaller than the bitmap index on the EMPNO column. The clustering factor of the B-tree index is much nearer to the number of blocks in a table; for that reason, the B-tree index is efficient for range predicate queries.
Now we'll run the same queries for the same set of values, using our B-tree index.
                               
SQL> set autot trace
SQL> select * from test_normal where empno=&empno;
Enter value for empno: 1000
old   1: select * from test_normal where empno=&empno
new   1: select * from test_normal where empno=1000

Elapsed: 00:00:00.01

Execution Plan
----------------------------------------------------------
   0      SELECT STATEMENT Optimizer=CHOOSE (Cost=4 Card=1 Bytes=34)
   1    0   TABLE ACCESS (BY INDEX ROWID) OF 'TEST_NORMAL' (Cost=4 Car
          d=1 Bytes=34)
   2    1     INDEX (RANGE SCAN) OF 'NORMAL_EMPNO_IDX' (NON-UNIQUE) (C
          ost=3 Card=1)
Statistics
----------------------------------------------------------
         29  recursive calls
          0  db block gets
          5  consistent gets
          0  physical reads
          0  redo size
        515  bytes sent via SQL*Net to client
        499  bytes received via SQL*Net from client
          2  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
          1  rows processed
                   
As you can see, when the queries are executed for different set of values, the number of consistent gets and physical reads are identical for bitmap and B-tree indexes on a 100-percent unique column.

BITMAP
EMPNO
B-TREE
Consistent Reads
Physical Reads
Consistent Reads
Physical Reads
5
0
1000
5
0
5
2
2398
5
2
5
2
8545
5
2
5
2
98008
5
2
5
2
85342
5
2
5
2
128444
5
2
5
2
858
5
2

Step 2A (on TEST_RANDOM)

Now we'll perform the same experiment on TEST_RANDOM:
                              
SQL> create bitmap index random_empno_bmx on test_random(empno);

Index created.

SQL> analyze table test_random compute statistics for table for all indexes for all indexed columns;

Table analyzed.

SQL> select substr(segment_name,1,30) segment_name, bytes/1024/1024 "Size in MB"
  2  from user_segments
  3* where segment_name in ('TEST_RANDOM','RANDOM_EMPNO_BMX');

SEGMENT_NAME             Size in MB
------------------------------------       ---------------
TEST_RANDOM              50
RANDOM_EMPNO_BMX         28

SQL> select index_name, clustering_factor from user_indexes;

INDEX_NAME              CLUSTERING_FACTOR
------------------------------         ---------------------------------
RANDOM_EMPNO_BMX        1000000
                           
Again, the statistics (size and clustering factor) are identical to those of the index on the TEST_NORMAL table:
                              
SQL> select * from test_random where empno=&empno;
Enter value for empno: 1000
old   1: select * from test_random where empno=&empno
new   1: select * from test_random where empno=1000

Elapsed: 00:00:00.01
Execution Plan
----------------------------------------------------------
   0      SELECT STATEMENT Optimizer=CHOOSE (Cost=4 Card=1 Bytes=34)
   1    0   TABLE ACCESS (BY INDEX ROWID) OF 'TEST_RANDOM' (Cost=4 Card=1 Bytes=34)
   2    1     BITMAP CONVERSION (TO ROWIDS)
   3    2       BITMAP INDEX (SINGLE VALUE) OF 'RANDOM_EMPNO_BMX'

Statistics
----------------------------------------------------------
          0  recursive calls
          0  db block gets
          5  consistent gets
          0  physical reads
          0  redo size
        515  bytes sent via SQL*Net to client
        499  bytes received via SQL*Net from client
          2  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
          1  rows processed
                           

Step 2B (on TEST_RANDOM)

Now, as in Step 1B, we will drop the bitmap index and create a B-tree index on the EMPNO column.
                              
SQL> drop index RANDOM_EMPNO_BMX;

Index dropped.

SQL> create index random_empno_idx on test_random(empno);

Index created.

SQL> analyze table test_random compute statistics for table for all indexes for all indexed columns;

Table analyzed.

SQL> select substr(segment_name,1,30) segment_name, bytes/1024/1024 "Size in MB"
  2  from user_segments
  3  where segment_name in ('TEST_RANDOM','RANDOM_EMPNO_IDX');

SEGMENT_NAME              Size in MB
----------------------------------         ---------------
TEST_RANDOM               50
RANDOM_EMPNO_IDX          18

SQL> select index_name, clustering_factor from user_indexes;

INDEX_NAME           CLUSTERING_FACTOR
----------------------------------    ----------------------------------
RANDOM_EMPNO_IDX     999830
                           
This table shows that the size of the index is equal to the size of this index on TEST_NORMAL table but the clustering factor is much nearer to the number of rows, which makes this index inefficient for range predicate queries (which we'll see in Step 4). This clustering factor will not affect the equality predicate queries because the rows have 100-percent distinct values and the number of rows per key is 1.
Now let's run the queries with equality predicates and the same set of values.
                              
SQL> select * from test_random where empno=&empno;
Enter value for empno: 1000
old   1: select * from test_random where empno=&empno
new   1: select * from test_random where empno=1000

Elapsed: 00:00:00.01

Execution Plan
----------------------------------------------------------
   0      SELECT STATEMENT Optimizer=CHOOSE (Cost=4 Card=1 Bytes=34)
   1    0   TABLE ACCESS (BY INDEX ROWID) OF 'TEST_RANDOM' (Cost=4 Card=1 Bytes=34)
   2    1     INDEX (RANGE SCAN) OF 'RANDOM_EMPNO_IDX' (NON-UNIQUE) (Cost=3 Card=1)

Statistics
----------------------------------------------------------
          0  recursive calls
          0  db block gets
          5  consistent gets
          0  physical reads
          0  redo size
        515  bytes sent via SQL*Net to client
        499  bytes received via SQL*Net from client
          2  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
          1  rows processed
                           
Again, the results are almost identical to those in Steps 1A and 1B. The data distribution did not affect the amount of consistent gets and physical reads for a unique column.

Step 3A (on TEST_NORMAL)

In this step, we will create the bitmap index (similar to Step 1A). We know the size and the clustering factor of the index, which equals the number of rows in the table. Now let's run some queries with range predicates.
                              
SQL> select * from test_normal where empno between &range1 and &range2;
Enter value for range1: 1
Enter value for range2: 2300
old   1: select * from test_normal where empno between &range1 and &range2
new   1: select * from test_normal where empno between 1 and 2300

2300 rows selected.

Elapsed: 00:00:00.03

Execution Plan
----------------------------------------------------------
   0      SELECT STATEMENT Optimizer=CHOOSE (Cost=451 Card=2299 Bytes=78166)
   1    0   TABLE ACCESS (BY INDEX ROWID) OF 'TEST_NORMAL' (Cost=451 Card=2299 Bytes=78166)
   2    1     BITMAP CONVERSION (TO ROWIDS)
   3    2       BITMAP INDEX (RANGE SCAN) OF 'NORMAL_EMPNO_BMX'

Statistics
----------------------------------------------------------
          0  recursive calls
          0  db block gets
        331  consistent gets
          0  physical reads
          0  redo size
     111416  bytes sent via SQL*Net to client
       2182  bytes received via SQL*Net from client
        155  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
       2300  rows processed
                           

Step 3B (on TEST_NORMAL)

In this step, we'll execute the queries against the TEST_NORMAL table with a B-tree index on it.
                              
SQL> select * from test_normal where empno between &range1 and &range2;
Enter value for range1: 1
Enter value for range2: 2300
old   1: select * from test_normal where empno between &range1 and &range2
new   1: select * from test_normal where empno between 1 and 2300

2300 rows selected.
Elapsed: 00:00:00.02

Execution Plan
----------------------------------------------------------
   0      SELECT STATEMENT Optimizer=CHOOSE (Cost=23 Card=2299 Bytes=78166)
   1    0   TABLE ACCESS (BY INDEX ROWID) OF 'TEST_NORMAL' (Cost=23 Card=2299 Bytes=78166)
   2    1     INDEX (RANGE SCAN) OF 'NORMAL_EMPNO_IDX' (NON-UNIQUE) (Cost=8 Card=2299)

Statistics
----------------------------------------------------------
          0  recursive calls
          0  db block gets
        329  consistent gets
         15  physical reads
          0  redo size
     111416  bytes sent via SQL*Net to client
       2182  bytes received via SQL*Net from client
        155  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
       2300  rows processed
                           
When these queries are executed for different sets of ranges, the results below show:

BITMAP
EMPNO (Range)
B-TREE
Consistent Reads
Physical Reads
Consistent Reads
Physical Reads
331
0
1-2300
329
0
285
0
8-1980
283
0
346
19
1850-4250
344
16
427
31
28888-31850
424
28
371
27
82900-85478
367
23
2157
149
984888-1000000
2139
35
As you can see, the number of consistent gets and physical reads with both indexes is again nearly identical. The last range (984888-1000000) returned almost 15,000 rows, which was the maximum number of rows fetched for all the ranges given above. So when we asked for a full table scan (by giving the hint /*+ full(test_normal) */ ), the consistent read and physical read counts were 7,239 and 5,663, respectively.

Step 4A (on TEST_RANDOM) 

In this step, we will run the queries with range predicates on the TEST_RANDOM table with bitmap index and check for consistent gets and physical reads. Here you'll see the impact of the clustering factor.
                              
SQL>select * from test_random where empno between &range1 and &range2;
Enter value for range1: 1
Enter value for range2: 2300
old   1: select * from test_random where empno between &range1 and &range2
new   1: select * from test_random where empno between 1 and 2300

2300 rows selected.

Elapsed: 00:00:08.01

Execution Plan
----------------------------------------------------------
   0      SELECT STATEMENT Optimizer=CHOOSE (Cost=453 Card=2299 Bytes=78166)
   1    0   TABLE ACCESS (BY INDEX ROWID) OF 'TEST_RANDOM' (Cost=453 Card=2299 Bytes=78166)
   2    1     BITMAP CONVERSION (TO ROWIDS)
   3    2       BITMAP INDEX (RANGE SCAN) OF 'RANDOM_EMPNO_BMX'






Statistics
----------------------------------------------------------
          0  recursive calls
          0  db block gets
       2463  consistent gets
       1200  physical reads
          0  redo size
     111416  bytes sent via SQL*Net to client
       2182  bytes received via SQL*Net from client
        155  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
       2300  rows processed

                           

Step 4B (on TEST_RANDOM)

In this step, we will execute the range predicate queries on TEST_RANDOM with a B-tree index on it. Recall that the clustering factor of this index was very close to the number of rows in a table (and thus inefficient). Here's what the optimizer has to say about that:
                              
SQL> select * from test_random where empno between &range1 and &range2;
Enter value for range1: 1
Enter value for range2: 2300
old   1: select * from test_random where empno between &range1 and &range2
new   1: select * from test_random where empno between 1 and 2300

2300 rows selected.

Elapsed: 00:00:03.04

Execution Plan
----------------------------------------------------------
   0      SELECT STATEMENT Optimizer=CHOOSE (Cost=613 Card=2299 Bytes=78166)
   1    0   TABLE ACCESS (FULL) OF 'TEST_RANDOM' (Cost=613 Card=2299 Bytes=78166)

Statistics
----------------------------------------------------------
          0  recursive calls
          0  db block gets
       6415  consistent gets
       4910  physical reads
          0  redo size
     111416  bytes sent via SQL*Net to client
       2182  bytes received via SQL*Net from client
        155  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
       2300  rows processed
                           
The optimizer opted for a full table scan rather than using the index because of the clustering factor:
BITMAP
EMPNO (Range)
B-TREE
Consistent Reads
Physical Reads
Consistent Reads
Physical Reads
2463
1200
1-2300
6415
4910
2114
31
8-1980
6389
4910
2572
1135
1850-4250
6418
4909
3173
1620
28888-31850
6456
4909
2762
1358
82900-85478
6431
4909
7254
3329
984888-1000000
7254
4909
For the last range (984888-1000000) only, the optimizer opted for a full table scan for the bitmap index, whereas for all ranges, it opted for a full table scan for the B-tree index. This disparity is due to the clustering factor: The optimizer does not consider the value of the clustering factor when generating execution plans using a bitmap index, whereas for a B-tree index, it does. In this scenario, the bitmap index performs more efficiently than the B-tree index.
The following steps reveal more interesting facts about these indexes.

Step 5A (on TEST_NORMAL)

Create a bitmap index on the SAL column of the TEST_NORMAL table. This column has normal cardinality.
                              
SQL> create bitmap index normal_sal_bmx on test_normal(sal);

Index created.

SQL> analyze table test_normal compute statistics for table for all indexes for all indexed columns;

Table analyzed.
                           
Now let's get the size of the index and the clustering factor.
                              
SQL>select substr(segment_name,1,30) segment_name, bytes/1024/1024 "Size in MB"
  2* from user_segments
  3* where segment_name in ('TEST_NORMAL','NORMAL_SAL_BMX');

SEGMENT_NAME                 Size in MB
------------------------------              --------------
TEST_NORMAL                  50
NORMAL_SAL_BMX               4

SQL> select index_name, clustering_factor from user_indexes;

INDEX_NAME              CLUSTERING_FACTOR
------------------------------         ----------------------------------
NORMAL_SAL_BMX          6001

                           
Now for the queries. First run them with equality predicates:
                              
SQL> set autot trace
SQL> select * from test_normal where sal=&sal;
Enter value for sal: 1869
old   1: select * from test_normal where sal=&sal
new   1: select * from test_normal where sal=1869

164 rows selected.

Elapsed: 00:00:00.08

Execution Plan
----------------------------------------------------------
   0      SELECT STATEMENT Optimizer=CHOOSE (Cost=39 Card=168 Bytes=4032)
   1    0   TABLE ACCESS (BY INDEX ROWID) OF 'TEST_NORMAL' (Cost=39 Card=168 Bytes=4032)
   2    1     BITMAP CONVERSION (TO ROWIDS)
   3    2       BITMAP INDEX (SINGLE VALUE) OF 'NORMAL_SAL_BMX'




Statistics
----------------------------------------------------------
          0  recursive calls
          0  db block gets
        165  consistent gets
          0  physical reads
          0  redo size
       8461  bytes sent via SQL*Net to client
        609  bytes received via SQL*Net from client
         12  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
        164  rows processed
                           
and then with range predicates:
                              
SQL> select * from test_normal where sal between &sal1 and &sal2;
Enter value for sal1: 1500
Enter value for sal2: 2000
old   1: select * from test_normal where sal between &sal1 and &sal2
new   1: select * from test_normal where sal between 1500 and 2000

83743 rows selected.

Elapsed: 00:00:05.00

Execution Plan
----------------------------------------------------------
   0      SELECT STATEMENT Optimizer=CHOOSE (Cost=601 Card=83376 Bytes
          =2001024)
   1    0   TABLE ACCESS (FULL) OF 'TEST_NORMAL' (Cost=601 Card=83376
          Bytes=2001024)

Statistics
----------------------------------------------------------
          0  recursive calls
          0  db block gets
      11778  consistent gets
       5850  physical reads
          0  redo size
    4123553  bytes sent via SQL*Net to client
      61901  bytes received via SQL*Net from client
       5584  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
      83743  rows processed
                           
Now drop the bitmap index and create a B-tree index on TEST_NORMAL.
                              
SQL> create index normal_sal_idx on test_normal(sal);

Index created.

SQL> analyze table test_normal compute statistics for table for all indexes for all indexed columns;

Table analyzed.
                           
Take a look at the size of the index and the clustering factor.
                              
SQL> select substr(segment_name,1,30) segment_name, bytes/1024/1024 "Size in MB"
  2  from user_segments
  3  where segment_name in ('TEST_NORMAL','NORMAL_SAL_IDX');

SEGMENT_NAME          Size in MB
------------------------------       ---------------
TEST_NORMAL           50
NORMAL_SAL_IDX        17

SQL> select index_name, clustering_factor from user_indexes;

INDEX_NAME            CLUSTERING_FACTOR
------------------------------       ----------------------------------
NORMAL_SAL_IDX        986778

                           
In the above table, you can see that this index is larger than the bitmap index on the same column. The clustering factor is also near the number of rows in this table.
Now for the tests; equality predicates first:
                              
SQL> set autot trace
SQL> select * from test_normal where sal=&sal;
Enter value for sal: 1869
old   1: select * from test_normal where sal=&sal
new   1: select * from test_normal where sal=1869

164 rows selected.

Elapsed: 00:00:00.01

Execution Plan
----------------------------------------------------------
   0      SELECT STATEMENT Optimizer=CHOOSE (Cost=169 Card=168 Bytes=4032)
   1    0   TABLE ACCESS (BY INDEX ROWID) OF 'TEST_NORMAL' (Cost=169 Card=168 Bytes=4032)
   2    1     INDEX (RANGE SCAN) OF 'NORMAL_SAL_IDX' (NON-UNIQUE) (Cost=3 Card=168)

Statistics
----------------------------------------------------------
          0  recursive calls
          0  db block gets
        177  consistent gets
          0  physical reads
          0  redo size
       8461  bytes sent via SQL*Net to client
        609  bytes received via SQL*Net from client
         12  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
        164  rows processed
                            
...and then, range predicates:
                              
SQL> select * from test_normal where sal between &sal1 and &sal2;
Enter value for sal1: 1500
Enter value for sal2: 2000
old   1: select * from test_normal where sal between &sal1 and &sal2
new   1: select * from test_normal where sal between 1500 and 2000

83743 rows selected.

Elapsed: 00:00:04.03

Execution Plan
----------------------------------------------------------
   0      SELECT STATEMENT Optimizer=CHOOSE (Cost=601 Card=83376 Bytes
          =2001024)
   1    0   TABLE ACCESS (FULL) OF 'TEST_NORMAL' (Cost=601 Card=83376
          Bytes=2001024)


Statistics
----------------------------------------------------------
          0  recursive calls
          0  db block gets
      11778  consistent gets
       3891  physical reads
          0  redo size
    4123553  bytes sent via SQL*Net to client
      61901  bytes received via SQL*Net from client
       5584  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
      83743  rows processed
                           
When the queries were executed for different set of values, the resulting output, as shown in the tables below, reveals that the numbers of consistent gets and physical reads are identical.

BITMAP
SAL (Equality)
B-TREE
Rows Fetched
Consistent Reads
Physical Reads
Consistent Reads
Physical Reads
165
0
1869
177
164

169
163
3548
181
167

174
166
6500
187
172

75
69
7000
81
73

177
163
2500
190
175


BITMAP
SAL (Range)
B-TREE
Rows Fetched
Consistent Reads
Physical Reads
Consistent Reads
Physical Reads
11778
5850
1500-2000
11778
3891
83743
11765
5468
2000-2500
11765
3879
83328
11753
5471
2500-3000
11753
3884
83318
17309
5472
3000-4000
17309
3892
166999
39398
5454
4000-7000
39398
3973
500520
For range predicates the optimizer opted for a full table scan for all the different set of values—it didn't use the indexes at all—whereas for equality predicates, the optimizer used the indexes. Again, the consistent gets and physical reads are identical.
Consequently, you can conclude that for a normal-cardinality column, the optimizer decisions for the two types of indexes were the same and there were no significant differences between the I/Os.

Step 6 (add a GENDER column)

Before performing the test on a low-cardinality column, let's add a GENDER column to this table and update it with MF, and null values.
                              
SQL> alter table test_normal add GENDER varchar2(1);

Table altered.

SQL> select GENDER, count(*) from test_normal group by GENDER;

S     COUNT(*)
-     ----------
F     333769
M     499921
     166310

3 rows selected.
                           
The size of the bitmap index on this column is around 570KB, as indicated in the table below:
                              
SQL> create bitmap index normal_GENDER_bmx on test_normal(GENDER);

Index created.

Elapsed: 00:00:02.08

SQL> select substr(segment_name,1,30) segment_name, bytes/1024/1024 "Size in MB"
  2  from user_segments
  3  where segment_name in ('TEST_NORMAL','NORMAL_GENDER_BMX');

SEGMENT_NAME         Size in MB
------------------------------      ---------------
TEST_NORMAL          50
NORMAL_GENDER_BMX    .5625

2 rows selected.
                           
In contrast, the B-tree index on this column is 13MB in size, which is much bigger than the bitmap index on this column.
                              
SQL> create index normal_GENDER_idx on test_normal(GENDER);

Index created.

SQL> select substr(segment_name,1,30) segment_name, bytes/1024/1024 "Size in MB"
  2  from user_segments
  3  where segment_name in ('TEST_NORMAL','NORMAL_GENDER_IDX');

SEGMENT_NAME        Size in MB
------------------------------     ---------------
TEST_NORMAL         50
NORMAL_GENDER_IDX   13

2 rows selected.
                           
Now, if we execute a query with equality predicates, the optimizer will not make use of this index, be it a bitmap or a B-tree. Rather, it will prefer a full table scan.
                              
SQL> select * from test_normal where GENDER is null;

166310 rows selected.

Elapsed: 00:00:06.08

Execution Plan
----------------------------------------------------------
   0      SELECT STATEMENT Optimizer=CHOOSE (Cost=601 Card=166310 Bytes=4157750)
   1    0   TABLE ACCESS (FULL) OF 'TEST_NORMAL' (Cost=601 Card=166310 Bytes=4157750)

SQL> select * from test_normal where GENDER='M';

499921 rows selected.

Elapsed: 00:00:16.07

Execution Plan
----------------------------------------------------------
   0      SELECT STATEMENT Optimizer=CHOOSE (Cost=601 Card=499921 Bytes=12498025)
   1    0   TABLE ACCESS (FULL) OF 'TEST_NORMAL' (Cost=601 Card=499921Bytes=12498025)

SQL>select * from test_normal where GENDER='F'
 /

333769 rows selected.

Elapsed: 00:00:12.02

Execution Plan
----------------------------------------------------------
   0      SELECT STATEMENT Optimizer=CHOOSE (Cost=601 Card=333769 Byte
          s=8344225)
   1    0   TABLE ACCESS (FULL) OF 'TEST_NORMAL' (Cost=601 Card=333769
           Bytes=8344225)
                           

Conclusions

Now that we understood how the optimizer reacts to these techniques, let's examine a scenario that clearly demonstrates the best respective applications of bitmap indexes and B-tree indexes.
With a bitmap index on the GENDER column in place, create another bitmap index on the SAL column and then execute some queries. The queries will be re-executed with B-tree indexes on these columns.
From the TEST_NORMAL table, you need the employee number of all the male employees whose monthly salaries equal any of the following values:
1000
1500
2000
2500
3000
3500
4000
4500

Thus:
                              
SQL>select * from test_normal
where sal in (1000,1500,2000,2500,3000,3500,4000,4500,5000) and GENDER='M';
                           
This is a typical data warehouse query, which, of course, you should never execute on an OLTP system. Here are the results with the bitmap index in place on both columns:
                               
SQL>select * from test_normal
where sal in (1000,1500,2000,2500,3000,3500,4000,4500,5000) and GENDER='M';

1453 rows selected.

Elapsed: 00:00:02.03

Execution Plan
----------------------------------------------------------
   0      SELECT STATEMENT Optimizer=CHOOSE (Cost=198 Card=754 Bytes=18850)
   1    0   TABLE ACCESS (BY INDEX ROWID) OF 'TEST_NORMAL' (Cost=198 Card=754 Bytes=18850)
   2    1     BITMAP CONVERSION (TO ROWIDS)
   3    2       BITMAP AND
   4    3         BITMAP OR
   5    4           BITMAP INDEX (SINGLE VALUE) OF 'NORMAL_SAL_BMX'
   6    4           BITMAP INDEX (SINGLE VALUE) OF 'NORMAL_SAL_BMX'
   7    4           BITMAP INDEX (SINGLE VALUE) OF 'NORMAL_SAL_BMX'
   8    4           BITMAP INDEX (SINGLE VALUE) OF 'NORMAL_SAL_BMX'
   9    4           BITMAP INDEX (SINGLE VALUE) OF 'NORMAL_SAL_BMX'
  10    4           BITMAP INDEX (SINGLE VALUE) OF 'NORMAL_SAL_BMX'
  11    4           BITMAP INDEX (SINGLE VALUE) OF 'NORMAL_SAL_BMX'
  12    4           BITMAP INDEX (SINGLE VALUE) OF 'NORMAL_SAL_BMX'
  13    4           BITMAP INDEX (SINGLE VALUE) OF 'NORMAL_SAL_BMX'
  14    3         BITMAP INDEX (SINGLE VALUE) OF 'NORMAL_GENDER_BMX'

Statistics
----------------------------------------------------------
          0  recursive calls
          0  db block gets
       1353  consistent gets
        920  physical reads
          0  redo size
      75604  bytes sent via SQL*Net to client
       1555  bytes received via SQL*Net from client
         98  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
       1453  rows processed

                           
And with the B-tree index in place:
                              
SQL>select * from test_normal
where sal in (1000,1500,2000,2500,3000,3500,4000,4500,5000) and GENDER='M';

1453 rows selected.

Elapsed: 00:00:03.01

Execution Plan
----------------------------------------------------------
   0      SELECT STATEMENT Optimizer=CHOOSE (Cost=601 Card=754 Bytes=18850)
   1    0   TABLE ACCESS (FULL) OF 'TEST_NORMAL' (Cost=601 Card=754 Bytes=18850)

Statistics
----------------------------------------------------------
          0  recursive calls
          0  db block gets
       6333  consistent gets
       4412  physical reads
          0  redo size
      75604  bytes sent via SQL*Net to client
       1555  bytes received via SQL*Net from client
         98  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
       1453  rows processed
                           
As you can see here, with the B-tree index, the optimizer opted for a full table scan, whereas in the case of the bitmap index, it used the index to answer the query. You can deduce performance by the number of I/Os required to fetch the result.
In summary, bitmap indexes are best suited for DSS regardless of cardinality for these reasons:
                      With bitmap indexes, the optimizer can efficiently answer queries that include AND, OR, or XOR. (Oracle supports dynamic B-tree-to-bitmap conversion, but it can be inefficient.)
                      With bitmaps, the optimizer can answer queries when searching or counting for nulls. Null values are also indexed in bitmap indexes (unlike B-tree indexes).
                      Most important, bitmap indexes in DSS systems support ad hoc queries, whereas B-tree indexes do not. More specifically, if you have a table with 50 columns and users frequently query on 10 of them—either the combination of all 10 columns or sometimes a single column—creating a B-tree index will be very difficult. If you create 10 bitmap indexes on all these columns, all the queries can be answered by these indexes, whether they are queries on all 10 columns, on 4 or 6 columns out of the 10, or on a single column. The AND_EQUAL hint provides this functionality for B-tree indexes, but no more than five indexes can be used by a query. This limit is not imposed with bitmap indexes.
In contrast, B-tree indexes are well suited for OLTP applications in which users' queries are relatively routine (and well tuned before deployment in production), as opposed to ad hoc queries, which are much less frequent and executed during nonpeak business hours. Because data is frequently updated in and deleted from OLTP applications, bitmap indexes can cause a serious locking problem in these situations.
The data here is fairly clear. Both indexes have a similar purpose: to return results as fast as possible. But your choice of which one to use should depend purely on the type of application, not on the level of cardinality.





Saturday 15 September 2012

Merge Query (More than two tables)


MERGE

CREATE EMP1 TABLE WITH FOLLOWING ROWS AND COLUMNS.


    CREATE TABLE EMP1
       (
        EMPNO NUMBER(4,0),
ENAME VARCHAR2(10),
JOB VARCHAR2(9),
MGR NUMBER(4,0),
HIREDATE DATE,
SAL NUMBER(10,5),
COMM NUMBER(7,2),
DEPTNO NUMBER(2,0)
       );


Insert into EMP1 (EMPNO,ENAME,JOB,MGR,HIREDATE,SAL,COMM,DEPTNO)
values (7566,'JONES','sr_MANGER',7839,to_date('02-APR-81','DD-MON-RR'),2975,null,60);
Insert into EMP1 (EMPNO,ENAME,JOB,MGR,HIREDATE,SAL,COMM,DEPTNO)
values (7788,'SCOTT','accountt',7566,to_date('09-DEC-82','DD-MON-RR'),3000,null,60);
Insert into EMP1 (EMPNO,ENAME,JOB,MGR,HIREDATE,SAL,COMM,DEPTNO)
values (7902,'FORD','accountt',7566,to_date('03-DEC-81','DD-MON-RR'),3000,null,60);
Insert into EMP1 (EMPNO,ENAME,JOB,MGR,HIREDATE,SAL,COMM,DEPTNO)
values (7369,'SMITH','SR_CLERK',7902,to_date('17-DEC-80','DD-MON-RR'),800,null,60);
Insert into EMP1 (EMPNO,ENAME,JOB,MGR,HIREDATE,SAL,COMM,DEPTNO)
values (7782,'CLARK','sr_MANGER',7839,to_date('09-JUN-81','DD-MON-RR'),2450,null,50);
Insert into EMP1 (EMPNO,ENAME,JOB,MGR,HIREDATE,SAL,COMM,DEPTNO)
values (7934,'MILLER','SR_CLERK',7782,to_date('23-JAN-82','DD-MON-RR'),1300,null,50);
Insert into EMP1 (EMPNO,ENAME,JOB,MGR,HIREDATE,SAL,COMM,DEPTNO)
values (7698,'BLAKE','sr_MANGER',7839,to_date('01-MAY-81','DD-MON-RR'),2850,null,70);
Insert into EMP1 (EMPNO,ENAME,JOB,MGR,HIREDATE,SAL,COMM,DEPTNO)
values (7521,'WARD','accountt',7698,to_date('22-FEB-81','DD-MON-RR'),1250,500,70);
Insert into EMP1 (EMPNO,ENAME,JOB,MGR,HIREDATE,SAL,COMM,DEPTNO)
values (7654,'MARTIN','accountt',7698,to_date('28-SEP-81','DD-MON-RR'),1250,1400,70);
Insert into EMP1 (EMPNO,ENAME,JOB,MGR,HIREDATE,SAL,COMM,DEPTNO)
values (7844,'TURNER','accountt',7698,to_date('08-SEP-81','DD-MON-RR'),1500,0,70);
Insert into EMP1 (EMPNO,ENAME,JOB,MGR,HIREDATE,SAL,COMM,DEPTNO)
 values (7900,'JAMES','SR_CLERK',7698,to_date('03-DEC-81','DD-MON-RR'),950,null,70);
Insert into EMP1 (EMPNO,ENAME,JOB,MGR,HIREDATE,SAL,COMM,DEPTNO)
 values (7876,'ADAMS','SR_CLERK',7788,to_date('12-JAN-83','DD-MON-RR'),1100,null,60);
Insert into EMP1 (EMPNO,ENAME,JOB,MGR,HIREDATE,SAL,COMM,DEPTNO)
values (7499,'ALLEN','accountt',7698,to_date('20-FEB-81','DD-MON-RR'),1600,300,70);
INSERT INTO EMP1 (EMPNO,ENAME,JOB,MGR,HIREDATE,SAL,COMM,DEPTNO)
VALUES (7839,'KING','PRESIDENT',NULL,TO_DATE('17-NOV-81','DD-MON-RR'),0.5,NULL,10);


EMPNO   ENAME    JOB            MGR      HIREDATE        SAL    COMM    DEPTNO
--------------------------------------------------------------------------------  
7566 JONES sr_MANGER 7839 02-APR-81 2975 60
7788 SCOTT accountt 7566 09-DEC-82 3000 60
7902 FORD accountt 7566 03-DEC-81 3000 60
7369 SMITH SR_CLERK 7902 17-DEC-80 800 60
7782 CLARK sr_MANGER 7839 09-JUN-81 2450 50
7934 MILLER SR_CLERK 7782 23-JAN-82 1300 50
7698 BLAKE sr_MANGER 7839 01-MAY-81 2850 70
7521 WARD accountt 7698 22-FEB-81 1250 500 70
7654 MARTIN accountt 7698 28-SEP-81 1250 1400 70
7844 TURNER accountt 7698 08-SEP-81 1500 0 70
7900 JAMES SR_CLERK 7698 03-DEC-81 950 70
7876 ADAMS SR_CLERK 7788 12-JAN-83 1100 60
7499 ALLEN accountt 7698 20-FEB-81 1600 300 70
7839 KING PRESIDENT 17-NOV-81 0.5 10


--firstly create the table mybonus with given syntax
CREATE TABLE MYBONUS
(
EMPNO NUMBER,
BONUS NUMBER DEFAULT 100
);

--insert the data into the table
insert into mybonus(empno,sal) select e.empno,e.sal from emp1 e where job='SR_CLERK';

After insert our MYBONUS table becomes like.....

empno   bonus   sal
---------------------
7369 100 800
7934 100 1300
7900 100 950
7876 100 1100

NOTE:= Bonus has value 100 because it is define default at the time of table creation.

Now when we are fire the below merge query then the result like that

EXAMPLE 1
-------------
MERGE INTO MYBONUS B
USING (SELECT EMPNO,SAL,DEPTNO FROM EMP1 WHERE DEPTNO=70) S
ON (B.EMPNO=S.EMPNO)
WHEN MATCHED THEN
UPDATE SET B.BONUS=B.BONUS+200
WHEN NOT MATCHED THEN
INSERT (B.EMPNO,B.BONUS,B.SAL) VALUES (S.EMPNO,S.SAL*.10,S.SAL) WHERE (S.SAL<=4000);

then the result like that

OUTPUT:=  6 rows merged.
-------

and result become like that in MYBONUS Table....


EMPNO   BONUS   SAL
--------------------
7369 100 800
7934 100 1300
7900 300 950
7876 100 1100
7521 125 1250
7844 150 1500
7698 285 2850
7654 125 1250
7499 160 1600

EXPLANATION:= 
when merge will fire then using clause will fire that contain the query....

SQL>SELECT EMPNO,SAL,DEPTNO FROM EMP1 WHERE DEPTNO=70;

and result like.......

EMPNO   SAL    DEPTNO
--------------------
7521 1250 70
7698 2850 70
7654 1250 70
7844 1500 70
7900 950 70
7499 1600 70


then it will check the on(B.EMPNO=S.EMPNO) condition.

if the ON condtion will meet, then it will  update the existing data in MYBONUS table. Other wise insert.
In given example EMPNO (7369,7934,7876) are not matched any condition of output of second table (means S table which is
output of >>SELECT EMPNO,SAL,DEPTNO FROM EMP1 WHERE DEPTNO=70;).
Only EMPNO 7900 is satisfying the condition, that means only 7900 empno is updated in MYBONUS table.

Apart of this all the data of second table S (result of query >>SELECT EMPNO,SAL,DEPTNO FROM EMP1 WHERE DEPTNO=70;)
will insert into the MYBONUS table.
and result become such as....


EMPNO   BONUS   SAL
--------------------
7369 100 800
7934 100 1300
7900 300 950
7876 100 1100
7521 125 1250
7844 150 1500
7698 285 2850
7654 125 1250
7499 160 1600
 


EXAMPLE 2
-----------

MERGE INTO MYBONUS B
USING (SELECT EMPNO,SAL,DEPTNO FROM EMP1 WHERE DEPTNO=70) S
ON (B.EMPNO=S.EMPNO)
WHEN MATCHED THEN
UPDATE SET B.BONUS=B.BONUS+200
DELETE WHERE b.SAL in (800,950)
WHEN NOT MATCHED THEN
INSERT (B.EMPNO,B.BONUS,B.SAL) VALUES (S.EMPNO,S.SAL*.10,S.SAL) WHERE (S.SAL<=4000);

The difference Between first merge and second merge is only that, the DELETE statement is added in second one.

and the result become like that...


EMPNO   BONUS   SAL
--------------------
7369 100 800
7934 100 1300
7876 100 1100
7521 125 1250
7844 150 1500
7698 285 2850
7654 125 1250
7499 160 1600



EXPLANATION:=
--------------

The Delete statement is delete the record  which satisfy the ON condition as well as DELETE condition.
For example, merge query delete the EMPNO 7900 (has sal 950) because if we notice in first merge query
only EMPNO 7900 is satisfied the on condition as well as DELETE condition here.EMPNO 7369 which has sal 800
is not  deleted from the table bescause empno 7369 is not satisfied the on condition of the merge query.


NOTE:=   If your EMP1 table contain the duplicate row in table then it will not merge and give the error
-------

ERROR: IT WILL COME WHEN WE ARE FIRE THE QUERY AND WHEN WE ARE GETTING THE DUPLICATE RECORD IN SOURCE TABLE for above query.
here if emp1 table contain the duplicate records.


ORA-30926: unable to get a stable set of rows in the source tables
30926. 00000 -  "unable to get a stable set of rows in the source tables"
*Cause:    A stable set of rows could not be got because of large dml
           ACTIVITY OR A NON-DETERMINISTIC WHERE CLAUSE.
*ACTION:   REMOVE ANY NON-DETERMINISTIC WHERE CLAUSES AND REISSUE THE DML.


SOLUTION OF ERROR:
-------------------

Only we have to write the distinct keywords in using clause after the.
and our merge query become like that.....

MERGE INTO MYBONUS B
USING (SELECT distinct EMPNO,SAL,DEPTNO FROM EMP1 WHERE DEPTNO=70) S
ON (B.EMPNO=S.EMPNO)
WHEN MATCHED THEN
UPDATE SET B.BONUS=B.BONUS+200
DELETE WHERE b.SAL>2000
WHEN NOT MATCHED THEN
INSERT (B.EMPNO,B.BONUS,B.SAL) VALUES (S.EMPNO,S.SAL*.10,S.SAL) WHERE (S.SAL<=4000);


--create table by following syntax

CREATE TABLE EXAMTIMETABLE(EXAMNAME VARCHAR2(40),EXAMTIME VARCHAR2(12),CONSTRAINT EXAMNAMEPK PRIMARY KEY(EXAMNAME));
insert into EXAMTIMETABLE values('PHYSICAL SCIENCE','9:00 AM');


--merge the table one table TO itself.

MERGE INTO  EXAMTIMETABLE E1
USING EXAMTIMETABLE E2
ON (E2.EXAMNAME=E1.EXAMNAME AND E1.EXAMNAME='PHYSICAL SCIENCE')
WHEN MATCHED THEN
UPDATE SET E1.EXAMTIME='10:30 AM'
WHEN NOT MATCHED THEN
insert (e1.EXAMNAME,e1.EXAMTIME) values ('PHYSICAL SCIENCE','10:00 AM');


Inline view:=
--------------
An inline view is term given to sub query in FROM clause of query which can be used as table.
Inline view effectively is a named sub query.

EXAMPLE:=
---------

SELECT DNAME,ENAME,SAL FROM EMP,
(SELECT DNAME, DEPTNO FROM DEPT) D
WHERE A.DEPTNO = B.DEPTNO;

In the above query (SELECT DNAME, DEPTNO FROM DEPT) D is the inline view.

Inline views are determined at runtime, and in contrast to normal view they are not stored in the data dictionary.


Advantage of using inline views:
--------------------------------

1. Better query performance
2. Better visibility of code


EXAMPLE 2:
-----------

select emp.empno,
EMP.ENAME,
DEPT.DEPTNO
from
(SELECT DEPTNO FROM DEPT) DEPT,
(SELECT EMPNO, ENAME,DEPTNO FROM EMP) EMP
WHERE EMP.DEPTNO = DEPT.DEPTNO;

In the above example,first the compiler forms 2 inline views emp1 and dept and then
joins these 2 in the where clause.

Hashing Algorithm:
-------------------
Difference Between Hash Join & Merge Join
-------------------------------------------
Merge Join :
-------------
Oracle performs a join between two sets of row data  using the merge
join algorithm. The inputs are two separate sets of row data. Output is
the results of the join.  Oracle reads rows from both inputs in an
alternating fashion and merges together matching rows in order to
generate output. The two inputs are sorted on join column.

Hash Join :
-------------

Oracle performs a join between two sets of row data using hash join
algorithm.  Input and Output same as Merge Join.  Oracle reads all rows
from the second input and builds a hash structure (like has table in
java), before reading each row from the first input one at a time. For
each row from the first input, the hash structure is probed and matching
rows generate output.


Hashing and Hash table

Before seeing javaĆ¢€™s Hashtable in detail you should understand hashing in general.
 Assume that, v is a value to be stored and k is the key used for storage / retrieval,
 then h is a hash function where v is stored at h(k) of table. To retrieve a value compute
 h(k) so that you can directly get the position of v. So in a key-value pair table, you need
 not sequentially scan through the keys to identify a value.

h(k) is the hashing function and it is used to find the location to store the corresponding value v.
h(k) cannot compute to a indefinite space. Storage allocated for a Hashtable is limited within a
program. So, the hasing function h(k) should return a
number within that allocated spectrum (logical address space).

1. Definition of a Hash Table

Before we get into the definition of Hash Tables, it is good to introduce WHY to use Hash tables.

Hash tables are good for doing a quick search on things.

For instance if we have an array full of data (say 100 items). If we knew the position that a specific item is stored in an array, then we could quickly access it. For instance, we just happen to know that the item we want it is at position 3; I can apply:
myitem=myarray[3];

With this, we don't have to search through each element in the array, we just access position 3.

The question is, how do we know that position 3 stores the data that we are interested in?

This is where hashing comes in handy. Given some key, we can apply a hash function to it to find an index or position that we want to access.



1.1 What is the hash function?

There are many different hash functions. Some hash functions will take an integer key and turn it into an index. A common one is the division method.

Let's learn through an example:

1.2 Division method (one hash method for integers)

Let's say you had the following numbers or keys that you wanted to map into an array of 10 elements:

123456
123467
123450
To apply the division method, you could divide the number by 10 (or the maximum number of elements in the array) and use the remainder (the modulo) as an index. The following would result:

123456 % 10 = 6 (the remainder is 6 when dividing by 10)
123467 % 10 = 7 (the remainder is 7)
123450 % 10 = 0 (the remainder is 0)
These numbers would be inserted into the array at positions 6, 7, and 0 respectively. It might look something like this:

The important thing with the division method is that the keys are integers.

1.3 What happens when the keys aren't integers?

You have to apply another hash function to turn them into integers. Effectively, you get two hash functions in one:

function to get an integer
function to apply a hash method from above to get an index to an array


What do we mean that the keys aren't integers? Well, let's say that the keys are people's names. Such as:

Sarah Jones
Tony Balognie
Tom Katz
The goal is to type in one of these names and get an index to an array in order to access that information. How do we do this?
The hash function has to do two things:

Convert the names into integers
For instance, we have a function which turns a string into an integer. The results will be as follows:
Sarah Jones --> 1038
Tony Balognie --> 1259
Tom Katz --> 746
Apply a hash method to get an index
We can now apply the division method to get an index for an array of 10 elements
Sarah Jones --> 1038 % 10 --> 8
Tony Balognie --> 1259 % 10 --> 9
Tom Katz --> 746 % 10 --> 6
1.4 What would that look like in the array?

The array is known as a hash table. It stores the key (used to find the index) along with associated values. In the above example, we might have a hash table that looked something like this:



Again, the idea is that we will insert items into the hash table using the key and applying the hash function(s) to get the index.

A problem occurs when two keys yield the same index. For Instance, say we wanted to include:
John Smith --> 948 % 10 --> 8

We have a collision because Sarah Jones is already stored at array index 8.

We need a method to resolve this. The resolution comes in how you create your hash table. There two major approaches given in the book:

Linear Probing
Chaining
The approach used in this lab is referred to as chaining.
The details are left as class material, but recognize that in chaining you have an array of linked lists. All the data in the "same link", have colliding index values.

Consider a diagram of the above example. Remember, there was a collision with Sarah Jones and John Smith. Notice that John Smith is "chained" or "linked" after Sarah Jones.







Saturday 4 August 2012

Important Query



TO GET THE DATA_LENGTH AND DATA_TYPE OF TABLE

 SELECT  COLUMN_NAME, DATA_TYPE, DATA_LENGTH, DATA_PRECISION, DATA_SCALE FROM USER_TAB_COLUMNS WHERE COLUMN_NAME='INTERCHANGE' AND TABLE_NAME='EMI_TRAN_DETAIL';


 > SELECT * FROM USER_OBJECTS
WHERE OBJECT_NAME = 'APP_MAST_CURRENCYUPDATE';



 SELECT DBMS_METADATA.GET_DDL('VIEW','APP_MAST_CURRENCYUPDATE') FROM DUAL;


CONVERT FROM TIMESTAMP TO DATE FORMAT

> SELECT TO_DATE(TO_CHAR (SYSTIMESTAMP, 'YYYY-MON-DD HH24:MI:SS'),'YYYY-MON-DD HH24:MI:SS') AS MY_DATE FROM DUAL;


CONVERT FROM VARCHAR2 TO DATE FORMAT

>SELECT TO_DATE(LOG_DATETIME, 'DDMMYYYYHH24MISS') FROM app_switchin;



DIFFERENCES OF THE COLUMN

>SELECT COLUMN_NAME FROM USER_TAB_COLS WHERE TABLE_NAME='EMP'
MINUS
SELECT COLUMN_NAME FROM USER_TAB_COLS WHERE TABLE_NAME='EMP1';



CONCAT QUERY FOR SELECT STATEMENT

SELECT  EMPNO||'
        '||      ENAME   ||'
        '||       SAL    ||'
        '||      DEPTNO  AS EMPLOYEE FROM EMP;




>SELECT 'EMPNO ='|| EMPNO ||',
        ENAME ='|| ENAME ||',
        SAL   ='|| SAL   ||', ,
        DEPTNO='|| DEPTNO  AS EMPLOYEE FROM EMP;  

PASS THE DBMS_OUTPUT.PUT_LINE ARGUMENT

> DBMS_OUTPUT.PUT_LINE('v_id = ' || v_id || ', v_name = ' || v_name ||', v_salary = ' || v_salary);


ALTER THE TYPE IN TABLE 

> ALTER TABLE EMP ADD ADRESS ADDRESS_TYPE;

 REPLACE THE SECOND POSITION OF ENAME BY OTHER CHARACTER 

SELECT ENAME,SUBSTR(ENAME,2,1),REPLACE(ENAME,SUBSTR(ENAME,2,1),'T') FROM EMP1;

INSERT ONE COLUMN'S DATA INTO ANOTHER COLUMN WITHIN THE SAME TABLE


UPDATE EMP1 SET DESTINATION_COLUMN_NAME = SOURCE_COLUMN_NAME;

INSERT INTO TABLE WITH ALIAS NAME
  
>INSERT INTO EMP1 (EMPLOYEE_ID,SALARY)

SELECT EMPNO AS EMPLOYEE_ID, SAL AS SALARY FROM EMP;

 SUB QUERY IN SELECT CLAUSE

>SELECT
(SELECT SUM(SAL) FROM EMP) TOT_SAL,
(SELECT COUNT(*) FROM EMP) CN,
(SELECT MAX(SAL) FROM EMP) MSAL,
(SELECT MIN(SAL) FROM EMP) LEAST_SAL,
(SELECT AVG(SAL) FROM EMP) AVERAGE_SAL
FROM

EMP;


 >SELECT INCR_SAL,MODIFIED_ENAME,LPAD_EMPNO,HIREDATE FROM
(SELECT SAL+1000 AS INCR_SAL, RPAD(ENAME,10,'@') AS MODIFIED_ENAME, LPAD(EMPNO,5,'0') AS LPAD_EMPNO,HIREDATE FROM EMP)
EMPLOYEE WHERE INCR_SAL>3000;


>SELECT E.ENAME, E.SAL,
(SELECT AVG(SAL) FROM EMP WHERE DEPTNO = E.DEPTNO) AVG_SAL_DEPT,
(SELECT MAX(SAL) FROM EMP WHERE DEPTNO =E.DEPTNO) MAX_SAL
FROM EMP E
ORDER BY 1;


>SELECT AVG(SUM_COLUMN1)
  FROM (SELECT SUM(SAL) AS SUM_COLUMN1

        FROM EMP GROUP BY DEPTNO) A ;


>SELECT ENAME,
(SELECT MAX(SAL) FROM EMP) MAXSAL ,
SAL,
((SELECT MAX(SAL) FROM EMP ) - SAL ) DIFFERENCE
FROM EMP

ORDER BY ((SELECT MAX(SAL) FROM EMP ) - SAL);


TOP 2 SAL FROM EMP TABLE 

>SELECT * FROM (SELECT * FROM EMP  ORDER BY SAL DESC) WHERE ROWNUM <3;

 NTH HIGHEST SAL FROM EMP   


>SELECT SAL FROM (SELECT ROWNUM AS SALINDEX, SAL FROM (SELECT DISTINCT SAL FROM EMP ORDER BY SAL DESC)) WHERE SALINDEX=&N;

 MERGE THREE TABLES

MERGE INTO APP_MAST_MDSFC FC USING (SELECT FC.ROWID FC_RW, EPST.ADJ_AMT_SETT, CARDMASTER.CARD_NUMBER
          from
          APP_MAST_MDSFC FC,APP_MAST_MDSEPST EPST,APP_CARDMASTER CARDMASTER
          where
          FC.PROCESSOR='I'
          AND
          EPST.ADJ_SETT_IND = 'c'
          AND
          FC.TTUM_FLAG is NULL
          AND
          FC.CPD BETWEEN to_date('10-07-2012','dd-mm-rrrr') AND to_date('10-07-2012','dd-mm-rrrr')
          AND
          FC.SWT_SRL_NO = EPST.ORG_SWT_SRL_NO
          AND
          CARDMASTER.CARD_NUMBER(+) = FC.CRD_NUM)
ON (FC.ROWID=FC_RW)
WHEN MATCHED THEN
UPDATE SET EPST_AMOUNT = (ADJ_AMT_SETT / 100);


COUNT THE EMP WHO IS WORKING IN ALL (10,20,30) DEPT

SELECT D.DEPTNO, D.DNAME,
(SELECT COUNT(*) FROM EMP E
WHERE E.DEPTNO = D.DEPTNO) EMPL_CNT

FROM DEPT D;


COUNT THE EMP OF 10TH DEPT

SELECT D.DEPTNO, D.DNAME,
(SELECT COUNT(*) FROM EMP E
WHERE E.DEPTNO = D.DEPTNO) EMPL_CNT

FROM DEPT D WHERE D.DEPTNO=10;