Bitmap Index vs. B-tree Index: Which and When?
The classical approach for
the bitmap indexes are most appropriate for columns having low distinct
values--such as GENDER, MARITAL_STATUS, and RELATION. This assumption is not
completely accurate, however. In reality, a bitmap index is always advisable
for systems in which data is not frequently updated by many concurrent systems.
In fact, as I'll demonstrate here, a bitmap index on a column with 100-percent
unique values (a column candidate for primary key) is as efficient as a B-tree
index.
In this
article I'll provide some examples, along with optimizer decisions, that are
common for both types of indexes on a low-cardinality column as well as a
high-cardinality one. These examples will help DBAs understand that the usage
of bitmap indexes is not in fact cardinality dependent but rather application
dependent.
Cardinality
In the context of databases, cardinality refers to
the uniqueness of data values contained in a column. High cardinality means
that the column contains a large percentage of totally unique values. Low
cardinality means that the column contains a lot of “repeats” in its data
range.
It is not common, but cardinality also sometimes refers to the relationships between tables. Cardinality between tables can be one-to-one, many-to-one, or many-to-many.
It is not common, but cardinality also sometimes refers to the relationships between tables. Cardinality between tables can be one-to-one, many-to-one, or many-to-many.
Techopedia
explains Cardinality
High
cardinality columns are those with very unique or uncommon data values. For
example, in a database table that stores bank account numbers, the “Account
Number” column should have very high cardinality – by definition, every item of
data in this column should be totally unique.
Normal cardinality columns are those with a somewhat unique percentage of data values. For instance, if a table holds customer information, the “Last Name” column will have normal cardinality. Not every last name will be unique (for example, there will be several occurrences of “Smith”) but on the whole, the data will be fairly non-repetitive.
Low cardinality columns are those with very few unique values. In a customer table, a low cardinality column will be the “Gender” column. This column will likely only have “M” and “F” as the range of values to choose from, and all the thousands or millions of records in the table can only pick one of these two values for this column.
Cardinality relationships between tables can take the form of one-to-one, one-to-many (whose reversal is many-to-one), or many-to-many. These terms simply refer to the relationships of data between the tables. For example, the relationship between the “Customers” table and the “Bank Accounts” table is one-to-many, that is, one customer can have several accounts, but one account cannot belong to more than one customer. That is, of course, assuming the bank has never heard of joint accounts!
Normal cardinality columns are those with a somewhat unique percentage of data values. For instance, if a table holds customer information, the “Last Name” column will have normal cardinality. Not every last name will be unique (for example, there will be several occurrences of “Smith”) but on the whole, the data will be fairly non-repetitive.
Low cardinality columns are those with very few unique values. In a customer table, a low cardinality column will be the “Gender” column. This column will likely only have “M” and “F” as the range of values to choose from, and all the thousands or millions of records in the table can only pick one of these two values for this column.
Cardinality relationships between tables can take the form of one-to-one, one-to-many (whose reversal is many-to-one), or many-to-many. These terms simply refer to the relationships of data between the tables. For example, the relationship between the “Customers” table and the “Bank Accounts” table is one-to-many, that is, one customer can have several accounts, but one account cannot belong to more than one customer. That is, of course, assuming the bank has never heard of joint accounts!
Comparing the Indexes
There are
several disadvantages to using a bitmap index on a unique column--one being the
need for sufficient space (and Oracle does not recommend it). However, the size
of the bitmap index depends on the cardinality of the column on which it is
created as well as the data distribution. Consequently, a bitmap index on the
GENDER column will be smaller than a B-tree index on the same column. In contrast,
a bitmap index on EMPNO (a candidate for primary key) will be much larger than
a B-tree index on this column. But because fewer users access decision-support
systems (DSS) systems than would access transaction-processing (OLTP) ones,
resources are not a problem for these applications.
To illustrate
this point, I created two tables, TEST_NORMAL and TEST_RANDOM. I inserted one
million rows into the TEST_NORMAL table using a PL/SQL block, and then inserted
these rows into the TEST_RANDOM table in random order:
Create table
test_normal (empno number(10), ename varchar2(30), sal number(10));
Begin
For i in
1..1000000
Loop
Insert into
test_normal
values(i,
dbms_random.string('U',30), dbms_random.value(1000,7000));
If mod(i, 10000)
= 0 then
Commit;
End if;
End loop;
End;
/
Create table
test_random
as
select /*+ append
*/ * from test_normal order by dbms_random.random;
SQL> select
count(*) "Total Rows" from test_normal;
Total Rows
----------
1000000
Elapsed:
00:00:01.09
SQL> select
count(distinct empno) "Distinct Values" from test_normal;
Distinct Values
---------------
1000000
Elapsed:
00:00:06.09
SQL> select
count(*) "Total Rows" from test_random;
Total Rows
----------
1000000
Elapsed:
00:00:03.05
SQL> select
count(distinct empno) "Distinct Values" from test_random;
Distinct Values
---------------
1000000
Elapsed:
00:00:12.07
Note that the
TEST_NORMAL table is organized and that the TEST_RANDOM table is randomly
created and hence has disorganized data. In the above table, column EMPNO has
100-percent distinct values and is a good candidate to become a primary key. If
you define this column as a primary key, you will create a B-tree index and not
a bitmap index because Oracle does not support bitmap primary key indexes.
To analyze the
behavior of these indexes, we will perform the following steps:
1.
On TEST_NORMAL:
A.
Create a
bitmap index on the EMPNO column and execute some queries with equality
predicates.
B.
Create a
B-tree index on the EMPNO column, execute some queries with equality
predicates, and compare the logical and physical I/Os done by the queries to
fetch the results for different sets of values.
2.
On
TEST_RANDOM:
A.
Same as Step
1A.
B.
Same as Step
1B.
3.
On
TEST_NORMAL:
A.
Same as Step
1A, except that the queries are executed within a range of predicates.
B.
Same as Step
1B, except that the queries are executed within a range of predicates. Now
compare the statistics.
4.
On
TEST_RANDOM:
A.
Same as Step
3A.
B.
Same as Step
3B.
5.
On
TEST_NORMAL:
A.
Create a
bitmap index on the SAL column, and then execute some queries with equality
predicates and some with range predicates.
B.
Create a
B-tree index on the SAL column, and then execute some queries with equality
predicates and some with range predicates (same set of values as in Step 5A).
Compare the I/Os done by the queries to fetch the results.
6.
Add a GENDER
column to both of the tables, and update the column with three possible
values: M for male, F for
female, and nullfor N/A. This column
is updated with these values based on some condition.
7.
Create a
bitmap index on this column, and then execute some queries with equality
predicates.
8.
Create a
B-tree index on the GENDER column, and then execute some queries with equality
predicates. Compare to results from Step 7.
Steps 1 to 4
involve a high-cardinality (100-percent distinct) column, Step 5 a
normal-cardinality column, and Steps 7 and 8 a low-cardinality column.
Step 1A (on
TEST_NORMAL)
In this step,
we will create a bitmap index on the TEST_NORMAL table and then check for the
size of this index, its clustering factor, and the size of the table. Then we
will run some queries with equality predicates and note the I/Os of these
queries using this bitmap index.
SQL> create
bitmap index normal_empno_bmx on test_normal(empno);
Index created.
Elapsed:
00:00:29.06
SQL> analyze
table test_normal compute statistics for table for all indexes for all indexed
columns;
Table analyzed.
Elapsed:
00:00:19.01
SQL> select
substr(segment_name,1,30) segment_name, bytes/1024/1024 "Size in MB"
2 from user_segments
3* where segment_name
in ('TEST_NORMAL','NORMAL_EMPNO_BMX');
SEGMENT_NAME Size in MB
------------------------------------ ---------------
TEST_NORMAL 50
NORMAL_EMPNO_BMX 28
Elapsed:
00:00:02.00
SQL> select
index_name, clustering_factor from user_indexes;
INDEX_NAME CLUSTERING_FACTOR
------------------------------ ---------------------------------
NORMAL_EMPNO_BMX 1000000
Elapsed:
00:00:00.00
You can see in
the preceding table that the size of the index is 28MB and that the clustering
factor is equal to the number of rows in the table. Now let's execute the
queries with equality predicates for different sets of values:
SQL> set
autotrace only
SQL> select *
from test_normal where empno=&empno;
Enter value for
empno: 1000
old 1: select * from test_normal where
empno=&empno
new 1: select * from test_normal where
empno=1000
Elapsed:
00:00:00.01
Execution Plan
----------------------------------------------------------
0 SELECT STATEMENT Optimizer=CHOOSE (Cost=4
Card=1 Bytes=34)
1 0
TABLE ACCESS (BY INDEX ROWID) OF 'TEST_NORMAL' (Cost=4 Car
d=1 Bytes=34)
2 1
BITMAP CONVERSION (TO ROWIDS)
3
2 BITMAP INDEX (SINGLE
VALUE) OF 'NORMAL_EMPNO_BMX'
Statistics
----------------------------------------------------------
0 recursive calls
0 db block gets
5 consistent gets
0 physical reads
0
redo size
515 bytes sent via SQL*Net to client
499 bytes received via SQL*Net from client
2 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
1 rows processed
Step 1B (on
TEST_NORMAL)
Now we will
drop this bitmap index and create a B-tree index on the EMPNO column. As
before, we will check for the size of the index and its clustering factor and
execute the same queries for the same set of values, to compare the I/Os.
SQL> drop
index NORMAL_EMPNO_BMX;
Index dropped.
SQL> create
index normal_empno_idx on test_normal(empno);
Index created.
SQL> analyze
table test_normal compute statistics for table for all indexes for all indexed
columns;
Table analyzed.
SQL> select
substr(segment_name,1,30) segment_name, bytes/1024/1024 "Size in MB"
2 from user_segments
3 where segment_name in
('TEST_NORMAL','NORMAL_EMPNO_IDX');
SEGMENT_NAME Size in MB
---------------------------------- ---------------
TEST_NORMAL 50
NORMAL_EMPNO_IDX 18
SQL> select
index_name, clustering_factor from user_indexes;
INDEX_NAME CLUSTERING_FACTOR
---------------------------------- ----------------------------------
NORMAL_EMPNO_IDX 6210
It is clear in
this table that the B-tree index is smaller than the bitmap index on the EMPNO
column. The clustering factor of the B-tree index is much nearer to the number
of blocks in a table; for that reason, the B-tree index is efficient for range
predicate queries.
Now we'll run
the same queries for the same set of values, using our B-tree index.
SQL> set autot
trace
SQL> select *
from test_normal where empno=&empno;
Enter value for
empno: 1000
old 1: select * from test_normal where
empno=&empno
new 1: select * from test_normal where
empno=1000
Elapsed:
00:00:00.01
Execution Plan
----------------------------------------------------------
0 SELECT STATEMENT Optimizer=CHOOSE (Cost=4
Card=1 Bytes=34)
1 0
TABLE ACCESS (BY INDEX ROWID) OF 'TEST_NORMAL' (Cost=4 Car
d=1 Bytes=34)
2 1
INDEX (RANGE SCAN) OF 'NORMAL_EMPNO_IDX' (NON-UNIQUE) (C
ost=3 Card=1)
Statistics
----------------------------------------------------------
29 recursive calls
0 db block gets
5 consistent gets
0 physical reads
0 redo size
515 bytes sent via SQL*Net to client
499 bytes received via SQL*Net from client
2 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
1 rows processed
As you can
see, when the queries are executed for different set of values, the number of
consistent gets and physical reads are identical for bitmap and B-tree indexes
on a 100-percent unique column.
BITMAP
|
EMPNO
|
B-TREE
|
||
Consistent Reads
|
Physical Reads
|
Consistent Reads
|
Physical Reads
|
|
5
|
0
|
1000
|
5
|
0
|
5
|
2
|
2398
|
5
|
2
|
5
|
2
|
8545
|
5
|
2
|
5
|
2
|
98008
|
5
|
2
|
5
|
2
|
85342
|
5
|
2
|
5
|
2
|
128444
|
5
|
2
|
5
|
2
|
858
|
5
|
2
|
Step 2A (on
TEST_RANDOM)
Now we'll
perform the same experiment on TEST_RANDOM:
SQL> create
bitmap index random_empno_bmx on test_random(empno);
Index created.
SQL> analyze
table test_random compute statistics for table for all indexes for all indexed
columns;
Table analyzed.
SQL> select
substr(segment_name,1,30) segment_name, bytes/1024/1024 "Size in MB"
2 from user_segments
3* where
segment_name in ('TEST_RANDOM','RANDOM_EMPNO_BMX');
SEGMENT_NAME Size in MB
------------------------------------ ---------------
TEST_RANDOM 50
RANDOM_EMPNO_BMX 28
SQL> select
index_name, clustering_factor from user_indexes;
INDEX_NAME CLUSTERING_FACTOR
------------------------------ ---------------------------------
RANDOM_EMPNO_BMX 1000000
Again, the statistics
(size and clustering factor) are identical to those of the index on the
TEST_NORMAL table:
SQL> select *
from test_random where empno=&empno;
Enter value for
empno: 1000
old 1: select * from test_random where
empno=&empno
new 1: select * from test_random where
empno=1000
Elapsed:
00:00:00.01
Execution Plan
----------------------------------------------------------
0 SELECT STATEMENT Optimizer=CHOOSE (Cost=4
Card=1 Bytes=34)
1 0
TABLE ACCESS (BY INDEX ROWID) OF 'TEST_RANDOM' (Cost=4 Card=1 Bytes=34)
2 1
BITMAP CONVERSION (TO ROWIDS)
3 2
BITMAP INDEX (SINGLE VALUE) OF 'RANDOM_EMPNO_BMX'
Statistics
----------------------------------------------------------
0 recursive calls
0 db block gets
5 consistent gets
0 physical reads
0 redo size
515 bytes sent via SQL*Net to client
499 bytes received via SQL*Net from client
2 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
1 rows processed
Step 2B (on TEST_RANDOM)
Now, as in
Step 1B, we will drop the bitmap index and create a B-tree index on the EMPNO
column.
SQL> drop
index RANDOM_EMPNO_BMX;
Index dropped.
SQL> create
index random_empno_idx on test_random(empno);
Index created.
SQL> analyze
table test_random compute statistics for table for all indexes for all indexed
columns;
Table analyzed.
SQL> select
substr(segment_name,1,30) segment_name, bytes/1024/1024 "Size in MB"
2 from user_segments
3 where segment_name in
('TEST_RANDOM','RANDOM_EMPNO_IDX');
SEGMENT_NAME Size in MB
---------------------------------- ---------------
TEST_RANDOM 50
RANDOM_EMPNO_IDX 18
SQL> select
index_name, clustering_factor from user_indexes;
INDEX_NAME CLUSTERING_FACTOR
---------------------------------- ----------------------------------
RANDOM_EMPNO_IDX 999830
This table
shows that the size of the index is equal to the size of this index on
TEST_NORMAL table but the clustering factor is much nearer to the number of
rows, which makes this index inefficient for range predicate queries (which
we'll see in Step 4). This clustering factor will not affect the equality
predicate queries because the rows have 100-percent distinct values and the
number of rows per key is 1.
Now let's run
the queries with equality predicates and the same set of values.
SQL> select *
from test_random where empno=&empno;
Enter value for
empno: 1000
old 1: select * from test_random where
empno=&empno
new 1: select * from test_random where
empno=1000
Elapsed:
00:00:00.01
Execution Plan
----------------------------------------------------------
0 SELECT STATEMENT Optimizer=CHOOSE (Cost=4
Card=1 Bytes=34)
1 0
TABLE ACCESS (BY INDEX ROWID) OF 'TEST_RANDOM' (Cost=4 Card=1 Bytes=34)
2 1
INDEX (RANGE SCAN) OF 'RANDOM_EMPNO_IDX' (NON-UNIQUE) (Cost=3 Card=1)
Statistics
----------------------------------------------------------
0 recursive calls
0 db block gets
5 consistent gets
0 physical reads
0 redo size
515 bytes sent via SQL*Net to client
499 bytes received via SQL*Net from client
2 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
1 rows processed
Again, the
results are almost identical to those in Steps 1A and 1B. The data distribution
did not affect the amount of consistent gets and physical reads for a unique
column.
Step 3A (on
TEST_NORMAL)
In this step,
we will create the bitmap index (similar to Step 1A). We know the size and the
clustering factor of the index, which equals the number of rows in the table.
Now let's run some queries with range predicates.
SQL> select *
from test_normal where empno between &range1 and &range2;
Enter value for
range1: 1
Enter value for
range2: 2300
old 1: select * from test_normal where empno
between &range1 and &range2
new 1: select * from test_normal where empno
between 1 and 2300
2300 rows selected.
Elapsed:
00:00:00.03
Execution Plan
----------------------------------------------------------
0 SELECT STATEMENT Optimizer=CHOOSE
(Cost=451 Card=2299 Bytes=78166)
1 0
TABLE ACCESS (BY INDEX ROWID) OF 'TEST_NORMAL' (Cost=451 Card=2299
Bytes=78166)
2 1
BITMAP CONVERSION (TO ROWIDS)
3 2
BITMAP INDEX (RANGE SCAN) OF 'NORMAL_EMPNO_BMX'
Statistics
----------------------------------------------------------
0 recursive calls
0 db block gets
331 consistent gets
0 physical reads
0 redo size
111416 bytes sent via SQL*Net to client
2182 bytes received via SQL*Net from client
155 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
2300 rows processed
Step 3B (on
TEST_NORMAL)
In this step,
we'll execute the queries against the TEST_NORMAL table with a B-tree index on
it.
SQL> select *
from test_normal where empno between &range1 and &range2;
Enter value for
range1: 1
Enter value for
range2: 2300
old 1: select * from test_normal where empno
between &range1 and &range2
new 1: select * from test_normal where empno
between 1 and 2300
2300 rows
selected.
Elapsed:
00:00:00.02
Execution Plan
----------------------------------------------------------
0 SELECT STATEMENT Optimizer=CHOOSE
(Cost=23 Card=2299 Bytes=78166)
1 0
TABLE ACCESS (BY INDEX ROWID) OF 'TEST_NORMAL' (Cost=23 Card=2299
Bytes=78166)
2 1
INDEX (RANGE SCAN) OF 'NORMAL_EMPNO_IDX' (NON-UNIQUE) (Cost=8 Card=2299)
Statistics
----------------------------------------------------------
0 recursive calls
0 db block gets
329 consistent gets
15 physical reads
0 redo size
111416 bytes sent via SQL*Net to client
2182 bytes received via SQL*Net from client
155 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
2300 rows processed
When these queries
are executed for different sets of ranges, the results below show:
BITMAP
|
EMPNO (Range)
|
B-TREE
|
||
Consistent Reads
|
Physical Reads
|
Consistent Reads
|
Physical Reads
|
|
331
|
0
|
1-2300
|
329
|
0
|
285
|
0
|
8-1980
|
283
|
0
|
346
|
19
|
1850-4250
|
344
|
16
|
427
|
31
|
28888-31850
|
424
|
28
|
371
|
27
|
82900-85478
|
367
|
23
|
2157
|
149
|
984888-1000000
|
2139
|
35
|
As you can
see, the number of consistent gets and physical reads with both indexes is again
nearly identical. The last range (984888-1000000) returned almost 15,000 rows,
which was the maximum number of rows fetched for all the ranges given above. So
when we asked for a full table scan (by giving the hint /*+ full(test_normal) */ ), the consistent read and physical read counts were 7,239
and 5,663, respectively.
Step 4A (on
TEST_RANDOM)
In this step,
we will run the queries with range predicates on the TEST_RANDOM table with
bitmap index and check for consistent gets and physical reads. Here you'll see
the impact of the clustering factor.
SQL>select *
from test_random where empno between &range1 and &range2;
Enter value for
range1: 1
Enter value for
range2: 2300
old 1: select * from test_random where empno between
&range1 and &range2
new 1: select * from test_random where empno
between 1 and 2300
2300 rows
selected.
Elapsed:
00:00:08.01
Execution Plan
----------------------------------------------------------
0 SELECT STATEMENT Optimizer=CHOOSE (Cost=453
Card=2299 Bytes=78166)
1 0
TABLE ACCESS (BY INDEX ROWID) OF 'TEST_RANDOM' (Cost=453 Card=2299
Bytes=78166)
2 1
BITMAP CONVERSION (TO ROWIDS)
3 2
BITMAP INDEX (RANGE SCAN) OF 'RANDOM_EMPNO_BMX'
Statistics
----------------------------------------------------------
0 recursive calls
0
db block gets
2463 consistent gets
1200 physical reads
0 redo size
111416 bytes sent via SQL*Net to client
2182 bytes received via SQL*Net from client
155 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
2300 rows processed
Step 4B (on
TEST_RANDOM)
In this step,
we will execute the range predicate queries on TEST_RANDOM with a B-tree index
on it. Recall that the clustering factor of this index was very close to the
number of rows in a table (and thus inefficient). Here's what the optimizer has
to say about that:
SQL> select *
from test_random where empno between &range1 and &range2;
Enter value for
range1: 1
Enter value for
range2: 2300
old 1: select * from test_random where empno
between &range1 and &range2
new 1: select * from test_random where empno
between 1 and 2300
2300 rows
selected.
Elapsed:
00:00:03.04
Execution Plan
----------------------------------------------------------
0 SELECT STATEMENT Optimizer=CHOOSE
(Cost=613 Card=2299 Bytes=78166)
1 0
TABLE ACCESS (FULL) OF 'TEST_RANDOM' (Cost=613 Card=2299 Bytes=78166)
Statistics
----------------------------------------------------------
0 recursive calls
0 db block gets
6415 consistent gets
4910 physical reads
0 redo size
111416 bytes sent via SQL*Net to client
2182 bytes received via SQL*Net from client
155 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
2300 rows processed
The optimizer
opted for a full table scan rather than using the index because of the clustering
factor:
BITMAP
|
EMPNO (Range)
|
B-TREE
|
||
Consistent Reads
|
Physical Reads
|
Consistent Reads
|
Physical Reads
|
|
2463
|
1200
|
1-2300
|
6415
|
4910
|
2114
|
31
|
8-1980
|
6389
|
4910
|
2572
|
1135
|
1850-4250
|
6418
|
4909
|
3173
|
1620
|
28888-31850
|
6456
|
4909
|
2762
|
1358
|
82900-85478
|
6431
|
4909
|
7254
|
3329
|
984888-1000000
|
7254
|
4909
|
For the last
range (984888-1000000) only, the optimizer opted for a full table scan for the
bitmap index, whereas for all ranges, it opted for a full table scan for the
B-tree index. This disparity is due to the clustering factor: The optimizer
does not consider the value of the clustering factor when generating execution
plans using a bitmap index, whereas for a B-tree index, it does. In this
scenario, the bitmap index performs more efficiently than the B-tree index.
The following
steps reveal more interesting facts about these indexes.
Step 5A (on
TEST_NORMAL)
Create a
bitmap index on the SAL column of the TEST_NORMAL table. This column has normal
cardinality.
SQL> create
bitmap index normal_sal_bmx on test_normal(sal);
Index created.
SQL> analyze
table test_normal compute statistics for table for all indexes for all indexed
columns;
Table analyzed.
Now let's get
the size of the index and the clustering factor.
SQL>select
substr(segment_name,1,30) segment_name, bytes/1024/1024 "Size in MB"
2* from
user_segments
3* where
segment_name in ('TEST_NORMAL','NORMAL_SAL_BMX');
SEGMENT_NAME Size in MB
------------------------------ --------------
TEST_NORMAL 50
NORMAL_SAL_BMX 4
SQL> select
index_name, clustering_factor from user_indexes;
INDEX_NAME CLUSTERING_FACTOR
------------------------------ ----------------------------------
NORMAL_SAL_BMX 6001
Now for the queries.
First run them with equality predicates:
SQL> set autot
trace
SQL> select *
from test_normal where sal=&sal;
Enter value for
sal: 1869
old 1: select * from test_normal where
sal=&sal
new 1: select * from test_normal where sal=1869
164 rows
selected.
Elapsed:
00:00:00.08
Execution Plan
----------------------------------------------------------
0 SELECT STATEMENT Optimizer=CHOOSE
(Cost=39 Card=168 Bytes=4032)
1 0
TABLE ACCESS (BY INDEX ROWID) OF 'TEST_NORMAL' (Cost=39 Card=168
Bytes=4032)
2 1
BITMAP CONVERSION (TO ROWIDS)
3 2
BITMAP INDEX (SINGLE VALUE) OF 'NORMAL_SAL_BMX'
Statistics
----------------------------------------------------------
0 recursive calls
0 db block gets
165 consistent gets
0 physical reads
0 redo size
8461 bytes sent via SQL*Net to client
609 bytes received via SQL*Net from client
12 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
164 rows processed
and then with range
predicates:
SQL> select *
from test_normal where sal between &sal1 and &sal2;
Enter value for
sal1: 1500
Enter value for
sal2: 2000
old 1: select * from test_normal where sal
between &sal1 and &sal2
new 1: select * from test_normal where sal
between 1500 and 2000
83743 rows
selected.
Elapsed:
00:00:05.00
Execution Plan
----------------------------------------------------------
0 SELECT STATEMENT Optimizer=CHOOSE
(Cost=601 Card=83376 Bytes
=2001024)
1 0
TABLE ACCESS (FULL) OF 'TEST_NORMAL' (Cost=601 Card=83376
Bytes=2001024)
Statistics
----------------------------------------------------------
0 recursive calls
0 db block gets
11778 consistent gets
5850 physical reads
0 redo size
4123553 bytes sent via SQL*Net to client
61901 bytes received via SQL*Net from client
5584 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
83743 rows processed
Now drop the bitmap
index and create a B-tree index on TEST_NORMAL.
SQL> create
index normal_sal_idx on test_normal(sal);
Index created.
SQL> analyze
table test_normal compute statistics for table for all indexes for all indexed
columns;
Table analyzed.
Take a look at the
size of the index and the clustering factor.
SQL> select
substr(segment_name,1,30) segment_name, bytes/1024/1024 "Size in MB"
2 from user_segments
3 where segment_name in
('TEST_NORMAL','NORMAL_SAL_IDX');
SEGMENT_NAME Size in MB
------------------------------ ---------------
TEST_NORMAL 50
NORMAL_SAL_IDX 17
SQL> select
index_name, clustering_factor from user_indexes;
INDEX_NAME CLUSTERING_FACTOR
------------------------------ ----------------------------------
NORMAL_SAL_IDX 986778
In the above table,
you can see that this index is larger than the bitmap index on the same column.
The clustering factor is also near the number of rows in this table.
Now for the
tests; equality predicates first:
SQL> set autot
trace
SQL> select *
from test_normal where sal=&sal;
Enter value for
sal: 1869
old 1: select * from test_normal where
sal=&sal
new 1: select * from test_normal where sal=1869
164 rows
selected.
Elapsed:
00:00:00.01
Execution Plan
----------------------------------------------------------
0 SELECT STATEMENT Optimizer=CHOOSE
(Cost=169 Card=168 Bytes=4032)
1 0
TABLE ACCESS (BY INDEX ROWID) OF 'TEST_NORMAL' (Cost=169 Card=168
Bytes=4032)
2 1
INDEX (RANGE SCAN) OF 'NORMAL_SAL_IDX' (NON-UNIQUE) (Cost=3 Card=168)
Statistics
----------------------------------------------------------
0 recursive calls
0 db block gets
177 consistent gets
0 physical reads
0 redo size
8461 bytes sent via SQL*Net to client
609 bytes received via SQL*Net from client
12 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
164 rows processed
...and then, range
predicates:
SQL> select *
from test_normal where sal between &sal1 and &sal2;
Enter value for
sal1: 1500
Enter value for
sal2: 2000
old 1: select * from test_normal where sal
between &sal1 and &sal2
new 1: select * from test_normal where sal
between 1500 and 2000
83743 rows
selected.
Elapsed:
00:00:04.03
Execution Plan
----------------------------------------------------------
0 SELECT STATEMENT Optimizer=CHOOSE
(Cost=601 Card=83376 Bytes
=2001024)
1 0
TABLE ACCESS (FULL) OF 'TEST_NORMAL' (Cost=601 Card=83376
Bytes=2001024)
Statistics
----------------------------------------------------------
0 recursive calls
0 db block gets
11778 consistent gets
3891 physical reads
0 redo size
4123553 bytes sent via SQL*Net to client
61901 bytes received via SQL*Net from client
5584 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
83743 rows processed
When the queries were
executed for different set of values, the resulting output, as shown in the
tables below, reveals that the numbers of consistent gets and physical reads
are identical.
BITMAP
|
SAL (Equality)
|
B-TREE
|
Rows Fetched
|
||
Consistent Reads
|
Physical Reads
|
Consistent Reads
|
Physical Reads
|
||
165
|
0
|
1869
|
177
|
164
|
|
169
|
163
|
3548
|
181
|
167
|
|
174
|
166
|
6500
|
187
|
172
|
|
75
|
69
|
7000
|
81
|
73
|
|
177
|
163
|
2500
|
190
|
175
|
BITMAP
|
SAL (Range)
|
B-TREE
|
Rows Fetched
|
||
Consistent Reads
|
Physical Reads
|
Consistent Reads
|
Physical Reads
|
||
11778
|
5850
|
1500-2000
|
11778
|
3891
|
83743
|
11765
|
5468
|
2000-2500
|
11765
|
3879
|
83328
|
11753
|
5471
|
2500-3000
|
11753
|
3884
|
83318
|
17309
|
5472
|
3000-4000
|
17309
|
3892
|
166999
|
39398
|
5454
|
4000-7000
|
39398
|
3973
|
500520
|
For range
predicates the optimizer opted for a full table scan for all the different set
of values—it didn't use the indexes at all—whereas for equality predicates, the
optimizer used the indexes. Again, the consistent gets and physical reads are
identical.
Consequently,
you can conclude that for a normal-cardinality column, the optimizer decisions
for the two types of indexes were the same and there were no significant
differences between the I/Os.
Step 6 (add a GENDER
column)
Before
performing the test on a low-cardinality column, let's add a GENDER column to
this table and update it with M, F,
and null values.
SQL> alter
table test_normal add GENDER varchar2(1);
Table altered.
SQL> select
GENDER, count(*) from test_normal group by GENDER;
S COUNT(*)
- ----------
F 333769
M 499921
166310
3 rows selected.
The size of the
bitmap index on this column is around 570KB, as indicated in the table below:
SQL> create
bitmap index normal_GENDER_bmx on test_normal(GENDER);
Index created.
Elapsed:
00:00:02.08
SQL> select
substr(segment_name,1,30) segment_name, bytes/1024/1024 "Size in MB"
2 from user_segments
3 where segment_name in
('TEST_NORMAL','NORMAL_GENDER_BMX');
SEGMENT_NAME Size in MB
------------------------------ ---------------
TEST_NORMAL 50
NORMAL_GENDER_BMX .5625
2 rows selected.
In contrast, the
B-tree index on this column is 13MB in size, which is much bigger than the
bitmap index on this column.
SQL> create
index normal_GENDER_idx on test_normal(GENDER);
Index created.
SQL> select
substr(segment_name,1,30) segment_name, bytes/1024/1024 "Size in MB"
2 from user_segments
3 where segment_name in
('TEST_NORMAL','NORMAL_GENDER_IDX');
SEGMENT_NAME Size in MB
------------------------------ ---------------
TEST_NORMAL 50
NORMAL_GENDER_IDX 13
2 rows selected.
Now, if we execute a
query with equality predicates, the optimizer will not make use of this index,
be it a bitmap or a B-tree. Rather, it will prefer a full table scan.
SQL> select *
from test_normal where GENDER is null;
166310 rows
selected.
Elapsed:
00:00:06.08
Execution Plan
----------------------------------------------------------
0 SELECT STATEMENT Optimizer=CHOOSE (Cost=601
Card=166310 Bytes=4157750)
1 0
TABLE ACCESS (FULL) OF 'TEST_NORMAL' (Cost=601 Card=166310
Bytes=4157750)
SQL> select *
from test_normal where GENDER='M';
499921 rows
selected.
Elapsed:
00:00:16.07
Execution Plan
----------------------------------------------------------
0 SELECT STATEMENT Optimizer=CHOOSE
(Cost=601 Card=499921 Bytes=12498025)
1 0
TABLE ACCESS (FULL) OF 'TEST_NORMAL' (Cost=601
Card=499921Bytes=12498025)
SQL>select *
from test_normal where GENDER='F'
/
333769 rows
selected.
Elapsed:
00:00:12.02
Execution Plan
----------------------------------------------------------
0 SELECT STATEMENT Optimizer=CHOOSE
(Cost=601 Card=333769 Byte
s=8344225)
1 0
TABLE ACCESS (FULL) OF 'TEST_NORMAL' (Cost=601 Card=333769
Bytes=8344225)
Conclusions
Now that we
understood how the optimizer reacts to these techniques, let's examine a
scenario that clearly demonstrates the best respective applications of bitmap
indexes and B-tree indexes.
With a bitmap
index on the GENDER column in place, create another bitmap index on the SAL
column and then execute some queries. The queries will be re-executed with
B-tree indexes on these columns.
From the
TEST_NORMAL table, you need the employee number of all the male employees whose
monthly salaries equal any of the following values:
1000
1500
2000
2500
3000
3500
4000
4500
Thus:
SQL>select *
from test_normal
where sal in
(1000,1500,2000,2500,3000,3500,4000,4500,5000) and GENDER='M';
This is a typical
data warehouse query, which, of course, you should never execute on an OLTP
system. Here are the results with the bitmap index in place on both columns:
SQL>select *
from test_normal
where sal in
(1000,1500,2000,2500,3000,3500,4000,4500,5000) and GENDER='M';
1453 rows
selected.
Elapsed:
00:00:02.03
Execution Plan
----------------------------------------------------------
0 SELECT STATEMENT Optimizer=CHOOSE
(Cost=198 Card=754 Bytes=18850)
1 0
TABLE ACCESS (BY INDEX ROWID) OF 'TEST_NORMAL' (Cost=198 Card=754
Bytes=18850)
2 1
BITMAP CONVERSION (TO ROWIDS)
3 2
BITMAP AND
4 3
BITMAP OR
5 4
BITMAP INDEX (SINGLE VALUE) OF 'NORMAL_SAL_BMX'
6 4
BITMAP INDEX (SINGLE VALUE) OF 'NORMAL_SAL_BMX'
7 4
BITMAP INDEX (SINGLE VALUE) OF 'NORMAL_SAL_BMX'
8 4
BITMAP INDEX (SINGLE VALUE) OF 'NORMAL_SAL_BMX'
9 4
BITMAP INDEX (SINGLE VALUE) OF 'NORMAL_SAL_BMX'
10 4
BITMAP INDEX (SINGLE VALUE) OF 'NORMAL_SAL_BMX'
11 4
BITMAP INDEX (SINGLE VALUE) OF 'NORMAL_SAL_BMX'
12 4
BITMAP INDEX (SINGLE VALUE) OF 'NORMAL_SAL_BMX'
13 4
BITMAP INDEX (SINGLE VALUE) OF 'NORMAL_SAL_BMX'
14 3
BITMAP INDEX (SINGLE VALUE) OF 'NORMAL_GENDER_BMX'
Statistics
----------------------------------------------------------
0 recursive calls
0 db block gets
1353 consistent gets
920 physical reads
0 redo size
75604 bytes sent via SQL*Net to client
1555 bytes received via SQL*Net from client
98 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
1453 rows processed
And with the B-tree
index in place:
SQL>select *
from test_normal
where sal in (1000,1500,2000,2500,3000,3500,4000,4500,5000)
and GENDER='M';
1453 rows
selected.
Elapsed:
00:00:03.01
Execution Plan
----------------------------------------------------------
0 SELECT STATEMENT Optimizer=CHOOSE
(Cost=601 Card=754 Bytes=18850)
1 0
TABLE ACCESS (FULL) OF 'TEST_NORMAL' (Cost=601 Card=754 Bytes=18850)
Statistics
----------------------------------------------------------
0 recursive calls
0 db block gets
6333 consistent gets
4412 physical reads
0 redo size
75604 bytes sent via SQL*Net to client
1555 bytes received via SQL*Net from client
98 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
1453 rows processed
As you can see
here, with the B-tree index, the optimizer opted for a full table scan, whereas
in the case of the bitmap index, it used the index to answer the query. You can
deduce performance by the number of I/Os required to fetch the result.
In summary,
bitmap indexes are best suited for DSS regardless of cardinality for these
reasons:
•
With bitmap
indexes, the optimizer can efficiently answer queries that include AND, OR, or
XOR. (Oracle supports dynamic B-tree-to-bitmap conversion, but it can be
inefficient.)
•
With bitmaps,
the optimizer can answer queries when searching or counting for nulls. Null
values are also indexed in bitmap indexes (unlike B-tree indexes).
•
Most
important, bitmap indexes in DSS systems support ad hoc queries, whereas B-tree
indexes do not. More specifically, if you have a table with 50 columns and
users frequently query on 10 of themeither the combination of all 10 columns
or sometimes a single columncreating a B-tree index will be very difficult. If
you create 10 bitmap indexes on all these columns, all the queries can be
answered by these indexes, whether they are queries on all 10 columns, on 4 or
6 columns out of the 10, or on a single column. The AND_EQUAL hint provides
this functionality for B-tree indexes, but no more than five indexes can be
used by a query. This limit is not imposed with bitmap indexes.
In contrast,
B-tree indexes are well suited for OLTP applications in which users' queries
are relatively routine (and well tuned before deployment in production), as
opposed to ad hoc queries, which are much less frequent and executed during
nonpeak business hours. Because data is frequently updated in and deleted from
OLTP applications, bitmap indexes can cause a serious locking problem in these
situations.
The data here
is fairly clear. Both indexes have a similar purpose: to return results as fast
as possible. But your choice of which one to use should depend purely on the
type of application, not on the level of cardinality.