Andreas Schmidt - DBKDA 2013 Keynote 1/55

The Fifth International Conference on Advances in Databases, Knowledge, and Data Applications

January 27 - February 1, 2013 - Seville, Spain

Keynote: About Bitmap Indexes

Andreas Schmidt

([email protected])

Institute for Applied Sciences

Karlsruhe Institute of Technology

PO-box 3640

76021 Karlsruhe

Germany

Department of Informatics and Business Information Systems

University of Applied Sciences Karlsruhe

Moltkestraße 30

76133 Karlsruhe

Germany


Outline

• Motivation
• What is a bitmap index?
• Possible encodings
  • equality encoding
  • range encoding
  • interval encoding
• High cardinality attributes (algorithms, ...)
  • Encoding
  • Binning
  • Compression
• Bitmap Indexes in Column-Stores
• Summary


Motivation: Access speed

[Figure: Comparison of random and sequential access on disk, SSD, and memory; logarithmic axis from 10^1 to 10^9. Bar categories: random, disk; sequential, disk; random, SSD; sequential, SSD; random, memory; sequential, memory. Bar labels in mbytes/sec, in extraction order: 1.26, 212.8, 168.8, 1468, 14328, 7.69.]

Source: Adam Jacobs: The Pathologies of Big Data, acmqueue, July 2009


CPU Memory Gap

Source: Hennessy, J.L.; Patterson, D.A. Computer Organization and Design, 2nd ed. San Francisco: Morgan Kaufmann Publishers, 1997.

Memory Hierarchy

[Figure: memory hierarchy with the levels Register, L1-Cache, L2-Cache, L3-Cache, Main Memory, Disk, NAS, ...]

typical access times:
register: 0.5 ns
L1 cache: 2 ns
L2 cache: 6 ns
Main Memory: 60 ns

[Car02] Carlos Carvalho: The Gap between Processor and Memory Speeds, ICCA, 2002


Motivation: Cache Consciousness

Motivation experiment:

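// clock (a timer object), mem, x, steps and stepwide are defined elsewhere;
// mem must hold more than (steps - 1) * stepwide elements.
clock->start();
uint i = 0;
for (uint k = 0; k < steps; k++) {
    x += mem[i];     // touch one element ...
    i += stepwide;   // ... then jump stepwide elements ahead
}
clock->stop();
cout << stepwide << "\t" << clock->CPUTime() << endl;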
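The experiment above depends on a timer object and an array that the slide does not show. A minimal self-contained sketch of the same idea, assuming `std::chrono::steady_clock` as a stand-in for the timer (it measures wall-clock time, not CPU time) and a hypothetical helper name `measure_stride`, could look like this. The index wraps around at the end of the array, which adds a modulo per access that the original loop does not have.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <cstdint>
#include <vector>

// Result of one measurement. The checksum is returned so that the
// compiler cannot optimize the read loop away.
struct StrideResult {
    std::uint64_t checksum;
    double seconds;
};

// Reads `steps` elements of `mem`, `stride` elements apart, wrapping
// around at the end of the array so every access stays in bounds.
StrideResult measure_stride(const std::vector<std::uint32_t>& mem,
                            std::size_t steps, std::size_t stride) {
    assert(!mem.empty());
    std::uint64_t x = 0;
    std::size_t i = 0;
    const auto start = std::chrono::steady_clock::now();
    for (std::size_t k = 0; k < steps; ++k) {
        x += mem[i];
        i = (i + stride) % mem.size();
    }
    const auto stop = std::chrono::steady_clock::now();
    return {x, std::chrono::duration<double>(stop - start).count()};
}
```

Calling `measure_stride` with the same array and step count but growing stride values, and printing the stride next to `seconds`, gives one data point per stride, as in the loop on the slide.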

Access time with respect to gap width

Cache & Cacheline

[Figure: CPU, cache manager, cache, and main memory, with numbered arrows (1) to (6). Arrow labels: memory request (from CPU); request; cacheline load; load into CPU; another memory request; load into CPU. The cache holds the values x and y.]

steps (1)...(4): cache miss
steps (5) + (6): cache hit

Bitmap Index: Basic Idea

• A bitmap consists of a sequence of 0/1 values (1 bit each).

• The length of a bitmap is defined by the number of datasets that must be indexed (n).

• For an attribute with m possible values (cardinality), the bitmap index consists of m bitmaps.

• Each possible value of an attribute is associated with one bitmap.

• A 1-bit at position k means that dataset k has the value associated with that bitmap.
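The basic idea can be sketched in a few lines of C++. This is only an illustration under assumptions not taken from the slides: bitmaps are stored as arrays of 64-bit words, attribute values are strings, and the names `Bitmap`, `BitmapIndex`, `build_index`, and `test_bit` are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <string>
#include <vector>

// One bitmap: bit k (counting from 0) belongs to dataset k.
using Bitmap = std::vector<std::uint64_t>;

// One bitmap per distinct attribute value (m bitmaps of n bits each).
using BitmapIndex = std::map<std::string, Bitmap>;

BitmapIndex build_index(const std::vector<std::string>& column) {
    const std::size_t words = (column.size() + 63) / 64;
    BitmapIndex index;
    for (std::size_t k = 0; k < column.size(); ++k) {
        Bitmap& bm = index[column[k]];
        if (bm.empty()) bm.assign(words, 0);   // first occurrence of this value
        bm[k / 64] |= std::uint64_t{1} << (k % 64);
    }
    return index;
}

// True if dataset k has the value that bitmap `bm` stands for.
bool test_bit(const Bitmap& bm, std::size_t k) {
    assert(k / 64 < bm.size());
    return ((bm[k / 64] >> (k % 64)) & 1u) != 0;
}
```

For a column with the values M, F, F, F, F, M, M, M this yields two bitmaps, and `test_bit(index["F"], 1)` reports that dataset 1 has the value F.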

Bitmap Index: Memory Consumption

• Table with 1 * 10^6 datasets, Attribute A, cardinality: 2 (gender)

Bitmap:
A = {male, female}
memory consumption:
mem = (2 * 10^6)/8 = 2.5 * 10^5 bytes
Test for specific value (e.g. 'male'):
memtest = 125000 bytes, sequential

B+-Tree:
A = {male, female}
memory consumption (only leaves):
mem = 4 * 10^6
Test for specific value (e.g. 'male'):
memtest = at least 2 * 10^6, sequential/random

Simple Bitmap Example (equality encoding)

[Table: example relation with the columns name, sex, family_state, and age (8 records). sex: M, F, F, F, F, M, M, M. age: 47, 22, 77, 35, 41, 53, 34, 81. family_state takes the values single, widowed, and married. Next to the table: one bitmap per family_state value (single, widowed, married) and one bitmap per age value (81, 77, 53, 47, 41, 35, 34, 22). Each bitmap has a 1 in exactly the rows that hold its value.]


Equality encoded bitmaps

• One bitmap per distinct attribute value

• Optimized for exact match queries like: Qe : ai = vj

• Instead of reading the plain data and evaluating the query, the matching bitmap is

used to find the matching tuples

• Advantage:

• Much less data to read

• Bitmap is cache conscious

• Hardware support for finding next ’1’ bit

• Queries like Q : a1 = v1 OR a1 = v2 OR a1 = v3 are mapped on hardware supported

OR-operation over multiple bitmaps

• Multidimensional queries like Q : a1 = v1 AND a2 = v2 AND a3 = v3 are mapped on

hardware supported AND-operation over multiple bitmaps
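How OR and AND queries map onto word-wise bit operations can be sketched as follows. The word-array representation and the names `bitmap_and`, `bitmap_or`, and `matching_rows` are assumptions made for this illustration. The inner loop of `matching_rows` scans bit by bit for portability; a hardware bit-scan instruction (for example via `std::countr_zero` from C++20) could take its place.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

using Bitmap = std::vector<std::uint64_t>;

// a1 = v1 AND a2 = v2: one AND instruction per 64 tuples.
Bitmap bitmap_and(const Bitmap& a, const Bitmap& b) {
    assert(a.size() == b.size());
    Bitmap r(a.size());
    for (std::size_t w = 0; w < a.size(); ++w) r[w] = a[w] & b[w];
    return r;
}

// a1 = v1 OR a1 = v2: one OR instruction per 64 tuples.
Bitmap bitmap_or(const Bitmap& a, const Bitmap& b) {
    assert(a.size() == b.size());
    Bitmap r(a.size());
    for (std::size_t w = 0; w < a.size(); ++w) r[w] = a[w] | b[w];
    return r;
}

// Positions of all 1-bits, i.e. the matching tuples.
std::vector<std::size_t> matching_rows(const Bitmap& bm) {
    std::vector<std::size_t> rows;
    for (std::size_t w = 0; w < bm.size(); ++w) {
        std::uint64_t word = bm[w];
        for (std::size_t b = 0; word != 0; ++b, word >>= 1) {
            if (word & 1u) rows.push_back(w * 64 + b);
        }
    }
    return rows;
}
```

With the bitmaps for sex = F (rows 1 to 4), age = 35 (row 3), and age = 41 (row 4), the query sex = F AND (age = 35 OR age = 41) becomes `bitmap_and(f, bitmap_or(a35, a41))`, and `matching_rows` returns rows 3 and 4 (counting from 0).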

Range Encoding

• What about queries of type Qr : age >= 40?

• OR-operations over all the bitmaps for values 41, 47, 53, 77, 81

• use of another encoding schema

• Range Encoding

• Encoding: like equality encoding, but additionally set all bits at position k for bitmaps with lower values also to '1'

• At most two bitmaps need to be accessed per query

[CY98] Chee-Yong Chan and Yannis Ioannidis: Bitmap Index Design and Evaluation. Proceedings of the 1998 ACM SIGMOD Conference.

Equality Encoding

Bitmaps for the age column of the example table (one row per record, one column per bitmap):

age   [22] [34] [35] [41] [47] [53] [77] [81]
47     0    0    0    0    1    0    0    0
22     1    0    0    0    0    0    0    0
77     0    0    0    0    0    0    1    0
35     0    0    1    0    0    0    0    0
41     0    0    0    1    0    0    0    0
53     0    0    0    0    0    1    0    0
34     0    1    0    0    0    0    0    0
81     0    0    0    0    0    0    0    1

Range Encoding [CY98]

Bitmaps for the age column of the example table. Bit k of bitmap [v] is 1 if the age of record k is >= v:

age   [22] [34] [35] [41] [47] [53] [77] [81]
47     1    1    1    1    1    0    0    0
22     1    0    0    0    0    0    0    0
77     1    1    1    1    1    1    1    0
35     1    1    1    0    0    0    0    0
41     1    1    1    1    0    0    0    0
53     1    1    1    1    1    1    0    0
34     1    1    0    0    0    0    0    0
81     1    1    1    1    1    1    1    1

Query: age >= 41

Range-encoded bitmaps [34] ... [81] of the age column; the bitmap [22] contains only 1-bits and is not shown. The result is the bitmap [41]:

age     47 22 77 35 41 53 34 81
[41]     1  0  1  0  1  1  0  1

Query: age < 41

The result is the negated bitmap [41]:

age        47 22 77 35 41 53 34 81
[41]        1  0  1  0  1  1  0  1
NOT [41]    0  1  0  1  0  0  1  0

Query: 35 <= age < 53

The result is [35] AND NOT [53]:

age        47 22 77 35 41 53 34 81
[35]        1  0  1  1  1  1  0  1
NOT [53]    1  1  0  1  1  0  1  0
AND =       1  0  0  1  1  0  0  0
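The three range-encoded queries can be reproduced with a short sketch. It reads the encoding as "bit k of bitmap [v] is set if record k has a value >= v", which is an interpretation of the slides, and it assumes at most 64 records so that each bitmap fits into one machine word. The names `build_range_index`, `at_least`, and `in_range` are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <vector>

// Simplification: at most 64 records, so each bitmap fits in one word.
using Word = std::uint64_t;

// Range encoding: bit k of bitmap [v] is set if record k has a value >= v.
std::map<int, Word> build_range_index(const std::vector<int>& column) {
    assert(column.size() <= 64);
    std::map<int, Word> index;
    for (int v : column) index[v] = 0;           // one bitmap per distinct value
    for (std::size_t k = 0; k < column.size(); ++k)
        for (auto& entry : index)
            if (column[k] >= entry.first) entry.second |= Word{1} << k;
    return index;
}

// Records with value >= v: the bitmap of the smallest indexed value
// that is >= v (no records if there is none).
Word at_least(const std::map<int, Word>& index, int v) {
    auto it = index.lower_bound(v);
    return it == index.end() ? 0 : it->second;
}

// lo <= value < hi needs two bitmaps: [lo] AND NOT [hi].
Word in_range(const std::map<int, Word>& index, int lo, int hi) {
    return at_least(index, lo) & ~at_least(index, hi);
}
```

For the ages 47, 22, 77, 35, 41, 53, 34, 81, `at_least(index, 41)` answers age >= 41, `in_range(index, 0, 41)` answers age < 41 (ages are non-negative), and `in_range(index, 35, 53)` answers 35 <= age < 53. Each query reads at most two bitmaps.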

Interval Encoding

Chee-Yong Chan and Yannis E. Ioannidis. 1999. An efficient bitmap encoding scheme for selection queries. In Proceedings of the 1999 ACM SIGMOD international conference on Management of data (SIGMOD '99). ACM, New York, NY, USA

Basic Idea:

• Find |A| / 2 intervals, so that every equality, range or interval query can be answered using at most 2 bitmaps

Sorted age values: 22, 34, 35, 41, 47, 53, 77, 81. Each bitmap covers an interval of four consecutive values:

age   [22;41] [34;47] [35;53] [41;77]
47       0       1       1       1
22       1       0       0       0
77       0       0       0       1
35       1       1       1       0
41       1       1       1       1
53       0       0       1       1
34       1       1       0       0
81       0       0       0       0

Interval Encoding

• Queries:

• age = 22: [22;41] AND NOT [34;47]

• age <= 77: [22;41] OR [41;77]

• 35 < age <= 77: [41;77]

Sorted age values: 22, 34, 35, 41, 47, 53, 77, 81

Bitmaps: [22;41] [34;47] [35;53] [41;77]
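A small sketch can be used to check the interval-encoded queries on the example ages. It assumes an even number C of distinct values, builds C/2 bitmaps that each cover C/2 consecutive values (as in the four intervals above), and assumes at most 64 records per bitmap word. The equality query for the smallest value is evaluated as the first bitmap AND NOT the second one, since only the smallest value lies in the first interval but not in the second. The function name `build_interval_index` is hypothetical, and a general query planner for this encoding is not shown.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

using Word = std::uint64_t;   // simplification: at most 64 records

// Builds C/2 interval bitmaps for a column with C distinct values
// (C assumed even). Bitmap j covers the sorted distinct values with
// ranks j .. j + C/2 - 1.
std::vector<Word> build_interval_index(const std::vector<int>& column) {
    assert(column.size() <= 64);
    std::vector<int> values(column);
    std::sort(values.begin(), values.end());
    values.erase(std::unique(values.begin(), values.end()), values.end());
    assert(values.size() % 2 == 0);
    const std::size_t m = values.size() / 2;

    std::vector<Word> index(m, 0);
    for (std::size_t k = 0; k < column.size(); ++k) {
        // Rank of this record's value among the sorted distinct values.
        const std::size_t rank = static_cast<std::size_t>(
            std::lower_bound(values.begin(), values.end(), column[k]) -
            values.begin());
        for (std::size_t j = 0; j < m; ++j)
            if (rank >= j && rank < j + m) index[j] |= Word{1} << k;
    }
    return index;
}
```

With `index = build_interval_index(ages)`, the three queries become `index[0] & ~index[1]` (age = 22), `index[0] | index[3]` (age <= 77), and `index[3]` (35 < age <= 77).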

Multidimensional Ad-Hoc Queries

• Bitmap indexes and B-trees are each built for one attribute

• Multi-dimensional queries require access to multiple indexes.

• Combining indexes (multidimensional query):

• B-Tree: The final result consists of the intersection/union/... of the preliminary lists. Construction of the final result is probably expensive (union semantics, duplicate elimination)

• Bitmap-Index: Construction of the final result is cheap and can be done in constant time (small, fixed size, hardware supported AND/OR operations, no sorting necessary)

Sounds intuitive ... but what about high cardinality attributes, e.g. floating point numbers?

High dimensional data

• Number of bitmaps depends on the cardinality of an attribute.

=> High cardinality leads to a high number of bitmaps for an attribute

• Example: 1 Million datasets with floating point numbers in ascending order

• Number of bitmaps: 10^6

• Size of a bitmap: 10^6/8 = 125000 bytes

• Size of bitmap index: 10^6 * 125000 = 1.25 * 10^11 bytes - oops!!

• Solutions:

• More compact encoding strategies (e.g. Wu and Buchmann, log2(|A|), [WB98])

• Binning

• Compression

Binning

• One bitmap (bin) per value range

• Problem: not all results from the bitmap check belong to the final result

• Solution: Subsequent candidate check for the ambiguous values

• Goal: keep the number of values in the candidate check small

• Example: Algorithm GenericRangeEval [Stock01]

[Stock01] Stockinger, K.: Design and implementation of bitmap indices for scientific data, International Symposium on Database Engineering and Applications, 2001.

Binning Example

Bins over the age column of the example table (range encoded, one bitmap per bin):

age   [0;20) [0;40) [0;60) [0;80) [0;100)
47      0      0      1      1      1
22      0      1      1      1      1
77      0      0      0      1      1
35      0      1      1      1      1
41      0      0      1      1      1
53      0      0      1      1      1
34      0      1      1      1      1
81      0      0      0      0      1

Query: age > 71

Bcandhits = NOT [0;60) = [60;100)
Bcands = [0;60) XOR [0;80)

age          47 22 77 35 41 53 34 81
Bcandhits     0  0  1  0  0  0  0  1
Bcands        0  0  1  0  0  0  0  0

Candidate check: for every 1-bit in Bcands the age value itself is compared with the query; the bit is set to '0' if the query is satisfied. Here 77 > 71, so Bcands becomes all '0'.

Bhits = Bcandhits XOR Bcands (after the candidate check):

age          47 22 77 35 41 53 34 81
Bhits         0  0  1  0  0  0  0  1

Result: 77, 81
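The evaluation of a binned query with a candidate check can be sketched as follows. This is not the GenericRangeEval algorithm from [Stock01], only a small illustration of the three steps in the example: candidate hits, candidates, and the check against the raw values. It assumes cumulative bins [0; upper), at most 64 records, and hypothetical function names `build_bins` and `greater_than`.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

using Word = std::uint64_t;   // simplification: at most 64 records

// Cumulative bins: bit k of bin b is set if record k lies in [0; upper[b]).
std::vector<Word> build_bins(const std::vector<int>& column,
                             const std::vector<int>& upper) {
    assert(column.size() <= 64);
    std::vector<Word> bins(upper.size(), 0);
    for (std::size_t k = 0; k < column.size(); ++k)
        for (std::size_t b = 0; b < upper.size(); ++b)
            if (column[k] < upper[b]) bins[b] |= Word{1} << k;
    return bins;
}

// Evaluates "value > threshold". `below` is the bin that ends at or
// before the threshold, `edge` is the next larger bin. Only records in
// `edge` but not in `below` need a look at the raw data.
Word greater_than(const std::vector<int>& column, Word all_rows,
                  Word below, Word edge, int threshold) {
    const Word cand_hits = all_rows & ~below;   // e.g. NOT [0;60) = [60;100)
    Word cands = below ^ edge;                  // e.g. [0;60) XOR [0;80)
    for (std::size_t k = 0; k < column.size(); ++k)
        if (((cands >> k) & 1u) && column[k] > threshold)
            cands &= ~(Word{1} << k);           // candidate satisfies the query
    return cand_hits ^ cands;                   // remove candidates that failed
}
```

For the query age > 71 with upper bounds 20, 40, 60, 80, 100, the call is `greater_than(ages, bins[4], bins[2], bins[3], 71)`. Only the record with age 77 is compared against the raw value; the record with age 81 is accepted from the bitmaps alone.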

Compression

• Run Length Encoding allows operations (AND, OR, NOT, XOR) on compressed bitmaps

• Algorithms: Byte Aligned Bitmap Code (BBC), Word Aligned Hybrid Code (WAH), ...

• Studies [WU04] showed that even for high cardinality attributes a compressed bitmap index is only about half the size of a B-tree.

[WU04] Kesheng Wu and Ekow Otoo and Arie Shoshani: On the Performance of Bitmap Indices for High Cardinality Attributes, 2004
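Why logical operations work directly on run-length encoded bitmaps can be shown with a deliberately simple run format. This sketch is neither BBC nor WAH; the `Run` struct and the functions `append_run` and `rle_and` are assumptions made for illustration. The AND walks both run lists in parallel and never expands a run into individual bits.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// One run: `length` consecutive bits that all have the value `bit`.
struct Run {
    bool bit;
    std::size_t length;
};
using RleBitmap = std::vector<Run>;

// Appends a run, merging it with the previous run if the bit value matches.
void append_run(RleBitmap& out, bool bit, std::size_t length) {
    if (length == 0) return;
    if (!out.empty() && out.back().bit == bit) out.back().length += length;
    else out.push_back(Run{bit, length});
}

// ANDs two run-length encoded bitmaps of equal total length.
RleBitmap rle_and(const RleBitmap& a, const RleBitmap& b) {
    RleBitmap out;
    std::size_t i = 0, j = 0;             // current run in a and in b
    std::size_t used_a = 0, used_b = 0;   // bits already consumed of that run
    while (i < a.size() && j < b.size()) {
        const std::size_t n =
            std::min(a[i].length - used_a, b[j].length - used_b);
        append_run(out, a[i].bit && b[j].bit, n);
        used_a += n;
        used_b += n;
        if (used_a == a[i].length) { ++i; used_a = 0; }
        if (used_b == b[j].length) { ++j; used_b = 0; }
    }
    assert(i == a.size() && j == b.size());   // inputs must have equal length
    return out;
}
```

The cost of `rle_and` depends on the number of runs, not on the number of bits, which is why long runs of equal bits make compressed bitmaps both small and fast to combine. OR, XOR, and NOT follow the same pattern with a different operator in the `append_run` call.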

OLAP vs. OLTP

• Uncompressed bitmaps are also suitable for OLTP

• Update/Insert/Delete: O(1)

• Binning with high cardinality attributes

• Compression disqualifies bitmaps for use in OLTP

• Updates are too expensive

• But: Append mode with immutable records is still possible

Bitmaps in Column-Stores

Bitmap Indexes and Column Stores

ID  Name      Firstname  date-of-birth  sex
31  Waits     Tom        1949-12-07     M
45  Benigni   Roberto    1952-10-27     M
65  Jarmusch  Jim        1953-01-22     M
77  Ryder     Winona     1971-10-29     F
81  Rowlands  Gena       1930-06-19     F
82  Perez     Rosa       1964-09-06     F

Row-Store (each record stored as one unit):

31 Waits Tom 1949-12-07 M
45 Benigni Roberto 1952-10-27 M
65 Jarmusch Jim 1953-01-22 M
77 Ryder Winona 1971-10-29 F
81 Rowlands Gena 1930-06-19 F
82 Perez Rosa 1964-09-06 F

Column-Store (each column stored as one unit):

31 45 65 77 81 82
Waits Benigni Jarmusch Ryder Rowlands Perez
Tom Roberto Jim Winona Gena Rosa
1949-12-07 1952-10-27 1953-01-22 1971-10-29 1930-06-19 1964-09-06
M M M F F F

PositionLists

select name
from person
where birthdate < '1960-01-01'
and sex='F'

pos  name      birthdate   sex
1    Waits     1949-12-07  M
2    Benigni   1952-10-27  M
3    Jarmusch  1953-01-22  M
4    Ryder     1971-10-29  F
5    Rowlands  1930-06-19  F
6    Perez     1964-09-06  F

birthdate < '1960-01-01':  PL1 = 1, 2, 3, 5
sex='F':                   PL2 = 4, 5, 6
PL1 and PL2:               PL3 = 5
extract name at PL3:       Rowlands

PositionLists

• PositionLists store Tuple IDs in ascending order (no duplicates)

• Implementation variants:

• Dynamic array

• uncompressed bitmap

• compressed bitmap

• Best implementation (performance/storage consumption) heavily depends on selectivity

For more information about this, see Daniel Kimmig's talk on Wednesday after lunch.

[SK03] Schmidt, A., Kimmig, D.: Considerations about implementation variants for position-lists. Accepted paper for the Fifth International Conference on Advances in Databases, Knowledge, and Data Applications, Sevilla, Spain, 2013.


Summary

• Sequential read characteristics and a small footprint make bitmap indexes an interesting alternative to traditional B-Trees.

• For low-cardinality attributes, bitmap indexes are superior in both space and time requirements.

• Bitmap indexes are very well suited for multidimensional ad-hoc queries.

• With the concepts of

  • binning

  • compression

  bitmap indexes can also be used for attributes with high cardinality, such as floating-point numbers.

• Bitmaps and compressed bitmaps can also be used in column stores to implement position lists.


Literature

• [Car02] Carlos Carvalho: The Gap between Processor and Memory Speeds. ICCA, 2002.

• [CI98] Chee-Yong Chan and Yannis Ioannidis: Bitmap Index Design and Evaluation. In Proceedings of the 1998 ACM SIGMOD Conference, 1998.

• [CI99] Chee-Yong Chan and Yannis E. Ioannidis: An efficient bitmap encoding scheme for selection queries. In Proceedings of the 1999 ACM SIGMOD International Conference on Management of Data (SIGMOD '99), 1999.

• [Jac09] Adam Jacobs: The Pathologies of Big Data. ACM Queue, 2009.

• [Joh99] Theodore Johnson: Performance Measurements of Compressed Bitmap Indices. In Proceedings of the 25th International Conference on Very Large Data Bases, 1999.

• [Koud00] Nick Koudas: Space efficient bitmap indexing. In Proceedings of the Ninth International Conference on Information and Knowledge Management, 2000.


• [NQ97] Patrick O'Neil and Dallan Quass: Improved query performance with variant indexes. In Proceedings of the 1997 ACM SIGMOD International Conference on Management of Data, 1997.

• [RSW04] D. Rotem, K. Stockinger, and K. Wu: Efficient binning for bitmap indices on high-cardinality attributes, 2004.

• [SK03] A. Schmidt and D. Kimmig: Considerations about implementation variants for position-lists. Accepted paper for the Fifth International Conference on Advances in Databases, Knowledge, and Data Applications, Seville, Spain, 2013.

• [Stoc01] K. Stockinger: Design and implementation of bitmap indices for scientific data. International Symposium on Database Engineering and Applications, 2001.

• [Stoc02] K. Stockinger: Bitmap Indices for Speeding Up High-Dimensional Data Analysis, 2002.


• [SWS04] Kurt Stockinger, Kesheng Wu, and Arie Shoshani: Evaluation Strategies for Bitmap Indices with Binning, 2004.

• [WB98] Ming-Chuan Wu and A. P. Buchmann: Encoded bitmap indexing for data warehouses. 14th International Conference on Data Engineering, 1998.

• [WOS01] Kesheng Wu, Ekow J. Otoo, and Arie Shoshani: A performance comparison of bitmap indexes. In Proceedings of the Tenth International Conference on Information and Knowledge Management, 2001.

• [WOS04] K. Wu, E. Otoo, and A. Shoshani: On the Performance of Bitmap Indices for High Cardinality Attributes, 2004.

• [WY98] Kun-Lung Wu and P. S. Yu: Range-based bitmap indexing for high cardinality attributes with skew. Computer Software and Applications Conference, 1998.

• [YJ00] Sihem Amer-Yahia and Theodore Johnson: Optimizing Queries on Compressed Bitmaps. In Proceedings of the 26th International Conference on Very Large Data Bases, 2000.