Indexing Techniques

Upload: alana-fulton

Post on 03-Jan-2016

Page 1: Indexing Techniques

Advanced Databases Indexing Techniques 2

The Problem

• What can we introduce to make search more efficient?
  – Indices!
• What is an index?

Definitions

• Index: an auxiliary data structure that speeds up record retrieval
• Search key: the field(s) of a table that are indexed
• Storage: index files that contain index records
  – each entry stores one of:
    • the actual data record
    • the search key value k and a record ID, <k, rid>
    • the search key value k and a list of record IDs, <k, rid list>
• Types: ordered and unordered (hash) indices

[Figure: index entries pointing to the records Paul, Anna and Tim on pages i and i+1]
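A minimal Python sketch of the three entry alternatives, assuming a toy data file in which a record ID (rid) is a hypothetical (page, slot) pair and the name field is the search key. The names and IDs come from the slides; the page layout and the second Anna record are invented so that alternative 3 has a duplicate to group:

```python
from collections import defaultdict

# Hypothetical data file: rid = (page, slot) -> (name, id).
# The second "Anna" record is invented to show a duplicate search key.
data_file = {
    (0, 0): ("Paul", "00112233"),
    (0, 1): ("Anna", "00112234"),
    (1, 0): ("Tim", "00112236"),
    (1, 1): ("Anna", "00112239"),
}

def search_key(record):
    return record[0]  # the name field is the search key

# Alternative 1: the index entries are the data records themselves.
alt1 = sorted(data_file.values(), key=search_key)

# Alternative 2: one <k, rid> entry per data record.
alt2 = sorted((search_key(rec), rid) for rid, rec in data_file.items())

# Alternative 3: one <k, rid list> entry per distinct search key value.
groups = defaultdict(list)
for rid, rec in sorted(data_file.items()):
    groups[search_key(rec)].append(rid)
alt3 = sorted(groups.items())
```

Alternative 3 stores each distinct key once, so it is more compact than alternative 2 when search key values repeat.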

Types of Ordered Indices (1/3)

• Assuming ordered data files
• Depending on which field is indexed:
  – Primary index: the search key is the ordering key field
    • one pointer for each page
  – Secondary index: the search key is a non-ordering field

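[Figure: a data file ordered by ID over four pages (Paul 00112233, Anna 00112234 | Matt 00112235 | Tim 00112236, Carol 00112237 | Rob 00112238); a primary index with one entry per page (00112233, 00112235, 00112236, 00112238); a secondary index on name (Anna, Carol, Paul, Tim)]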
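A minimal sketch contrasting the two index types on the figure's data, assuming (as the figure suggests) a file ordered by ID over four pages and record pointers written as hypothetical (page, slot) pairs. Because the file is not ordered by name, the sketch gives every record a secondary entry, while the primary index needs only one entry per page:

```python
from bisect import bisect_left

# Data file ordered on the ID field, laid out in pages as in the figure.
pages = [
    [("00112233", "Paul"), ("00112234", "Anna")],
    [("00112235", "Matt")],
    [("00112236", "Tim"), ("00112237", "Carol")],
    [("00112238", "Rob")],
]

# Primary index on the ordering field: one <first ID on page, page> entry per page.
primary = [(page[0][0], p) for p, page in enumerate(pages)]

# Secondary index on the non-ordering name field: the file is not sorted by
# name, so every record needs its own <name, (page, slot)> entry.
secondary = sorted(
    (name, (p, s))
    for p, page in enumerate(pages)
    for s, (_, name) in enumerate(page)
)

def lookup_by_name(name):
    """Binary-search the secondary index, then follow the record pointer."""
    names = [n for n, _ in secondary]
    i = bisect_left(names, name)
    if i == len(names) or names[i] != name:
        return None
    p, s = secondary[i][1]
    return pages[p][s][0]  # the record's ID
```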

Types of Ordered Indices (2/3)

• Depending on the density of index records:
  – Dense index: an index record for each distinct search key value, i.e., for every record when the search key is unique
  – Sparse index: index records for only some search key values
    • the search key value of the first record in each page
    • a pointer to that page

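[Figure: the data file ordered by ID (pages: Paul 00112233, Anna 00112234 | Matt 00112235 | Tim 00112236, Carol 00112237 | Rob 00112238); sparse index: 00112233, 00112235, 00112236, 00112238 (one entry per page); dense index: 00112233 to 00112238 (one entry per record)]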
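A sketch of equality search with each kind of index on the same data, under the same assumptions (ID-ordered pages, hypothetical (page, slot) record pointers). The dense lookup answers from the index alone, while the sparse lookup must read the data page it lands on:

```python
from bisect import bisect_left, bisect_right

pages = [
    [("00112233", "Paul"), ("00112234", "Anna")],
    [("00112235", "Matt")],
    [("00112236", "Tim"), ("00112237", "Carol")],
    [("00112238", "Rob")],
]

# Dense index: one <ID, (page, slot)> entry per record (IDs are unique here).
dense = [(key, (p, s)) for p, page in enumerate(pages) for s, (key, _) in enumerate(page)]
dense_keys = [k for k, _ in dense]

# Sparse index: one <first ID on page, page> entry per page.
sparse = [(page[0][0], p) for p, page in enumerate(pages)]
sparse_keys = [k for k, _ in sparse]

def dense_lookup(key):
    """Exact match in the index; a missing key is detected without reading data pages."""
    i = bisect_left(dense_keys, key)
    if i < len(dense_keys) and dense_keys[i] == key:
        return dense[i][1]
    return None

def sparse_lookup(key):
    """Find the last page whose first ID is <= key, then scan that data page."""
    i = bisect_right(sparse_keys, key) - 1
    if i < 0:
        return None
    page_no = sparse[i][1]
    for s, (k, _) in enumerate(pages[page_no]):
        if k == key:
            return (page_no, s)
    return None
```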

Types of Ordered Indices (3/3)

• Ordering field is a nonkey field (may have duplicates)
  – Clustered index
  – Unclustered index

[Figure: records in which some names repeat (Paul 00112233, Paul 01112233; Tim 00112236, Tim 01112236, Tim 02112236); one index holds an entry per distinct name (Anna, Carol, Matt, Paul, Rob, Tim), the other an entry per record ID (00112233 to 02112236); panels labelled "clustered" and "unclustered"]

Indices Exercise

• $2^{15}$ records
• 128 bytes/record
• $2^{10}$ bytes/page
• Ordered file, unspanned organization; equality search on the ordering field
  – without an index
  – with a primary index
    • on a field of size 12 bytes
    • assume a pointer is 4 bytes long
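One way to work the exercise, as a sketch under assumptions the slide leaves open: the sizes are $2^{15}$ records and $2^{10}$-byte pages, a search on an ordered file is a binary search costing $\lceil \log_2 b \rceil$ page reads for b pages, and the primary index is sparse (one entry per data page, as on the primary index slide) and is itself binary-searched:

```python
import math

r = 2 ** 15        # records
R = 128            # bytes per record
B = 2 ** 10        # bytes per page
V, P = 12, 4       # indexed field size and pointer size, in bytes

# Without an index: binary search over the ordered data file.
bfr = B // R                                # records per page (unspanned): 8
b = math.ceil(r / bfr)                      # data pages: 4096
cost_no_index = math.ceil(math.log2(b))     # page reads: 12

# With a primary (sparse) index: one <key, pointer> entry per data page.
bfr_i = B // (V + P)                        # index entries per page: 64
b_i = math.ceil(b / bfr_i)                  # index pages: 64
cost_index = math.ceil(math.log2(b_i)) + 1  # binary search the index, then 1 data page: 7
```

Under these assumptions, the primary index reduces the cost from 12 to 7 page reads.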

Multi-level Indices (1/2)

• If access using the first-level index is still expensive
• Build a sparse index on the first-level index
  – multi-level index
• Fan-out: the index blocking factor (index records per page)

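[Figure: data pages (Paul 00112233, Anna 00112234 | Matt 00112235 | Tim 00112236, Carol 00112237 | Rob 00112238); first-level index with one entry per record, stored in pages (00112233, 00112234 | 00112235 | 00112236, 00112237, 00112238); second-level index with one entry per first-level page (00112233, 00112235, 00112236)]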

Multi-level Indices (2/2)

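• $2^6$ index records/page (fan-out)
• $2^{15}$ index records
• 1st level
  – $2^9$ pages
• 2nd level
  – $2^9$ index records
  – $2^3$ pages
• 3rd level
  – $2^3$ index records
  – 1 page
• The top level fits in one page: $2^{15} / (2^6)^t \le 1$
• $t = \lceil \log_{2^6} 2^{15} \rceil = \lceil 15/6 \rceil = 3$
• In general: $t = \lceil \log_{fo} r_1 \rceil$, where $fo$ is the fan-out and $r_1$ is the number of first-level index records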
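The level count can also be checked by building the levels bottom-up until one fits in a single page. A small sketch; the function name `index_levels` is illustrative:

```python
import math

def index_levels(first_level_entries, fan_out):
    """Pages per index level, bottom level first, stopping at a one-page top level."""
    levels = []
    entries = first_level_entries
    while True:
        pages = math.ceil(entries / fan_out)
        levels.append(pages)
        if pages == 1:
            return levels
        entries = pages  # the next level has one entry per page of this level
```

With the slide's numbers, `index_levels(2**15, 2**6)` gives `[512, 8, 1]`, i.e. $2^9$, $2^3$ and 1 pages, so t = 3.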

Dynamic multi-level indices

• So far, we assumed that indices are physically ordered files
  – insertions and deletions are expensive
• Dynamic multi-level indices
  – B trees
  – B+ trees

Tree-structured Indices

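• For each node: $K_1 < K_2 < \dots < K_{q-1}$
• For each value X in the subtree pointed to by $P_i$:
  – $K_{i-1} < X < K_i$, for $1 < i < q$
  – $X < K_1$, for $i = 1$
  – $K_{q-1} < X$, for $i = q$

[Figure: node layout $P_1, K_1, \dots, K_{i-1}, P_i, K_i, \dots, K_{q-1}, P_q$, where each $P_i$ points to a subtree of values X]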

B tree

• Problems with plain search trees: empty nodes, unbalanced trees
  – solution: B trees

B tree: Definition

• Each node: $\langle P_1, \langle K_1, Pr_1 \rangle, P_2, \dots, \langle K_{q-1}, Pr_{q-1} \rangle, P_q \rangle$
  – $P_i$: tree pointer; $K_i$: search value; $Pr_i$: data pointer
• For each node: $K_1 < K_2 < \dots < K_{q-1}$
• For each value X in the subtree pointed to by $P_i$:
  – $K_{i-1} < X < K_i$, for $1 < i < q$
  – $X < K_1$, for $i = 1$
  – $K_{q-1} < X$, for $i = q$
• Each node has at most q tree pointers
  – the B tree is of order q
• Each node, except the root, has at least $\lceil q/2 \rceil$ tree pointers
• An internal node with p tree pointers has p - 1 search values
• All leaves are at the same level
  – balanced tree

B tree: Example

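[Figure: B tree with root (5, 8) and leaves (1, 3), (6, 7), (9, 12); every key has a data pointer, and the leaves' tree pointers are null. Legend: tree pointer, data pointer, ø = null pointer]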
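A minimal sketch of searching the example B tree, assuming integer keys and hypothetical record ids ("r1", "r3", and so on) in place of the data pointers. Because every key carries its own data pointer, the search can stop at an internal node:

```python
from bisect import bisect_left
from dataclasses import dataclass, field

@dataclass
class BNode:
    keys: list                                     # K1 < K2 < ... < Kq-1
    data: list                                     # data pointer Pr_i for each key
    children: list = field(default_factory=list)   # tree pointers; empty in a leaf

def btree_search(node, k):
    """Return the data pointer for k, or None if k is not in the tree."""
    while True:
        i = bisect_left(node.keys, k)
        if i < len(node.keys) and node.keys[i] == k:
            return node.data[i]          # found: every B-tree node stores data pointers
        if not node.children:
            return None                  # at a leaf: k is absent
        node = node.children[i]          # child i holds values between keys[i-1] and keys[i]

# The example tree: root (5, 8) over leaves (1, 3), (6, 7), (9, 12)
leaves = [BNode([1, 3], ["r1", "r3"]), BNode([6, 7], ["r6", "r7"]), BNode([9, 12], ["r9", "r12"])]
root = BNode([5, 8], ["r5", "r8"], leaves)
```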

B+ tree

• Most implementations of B trees are actually B+ trees
• Data pointers are stored only in the leaves
  – internal nodes hold more entries than in regular B trees
  – fewer internal nodes
  – fewer levels
  – faster access

B+ tree: Definition

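• Internal nodes: $\langle P_1, K_1, P_2, \dots, P_{q-1}, K_{q-1}, P_q \rangle$
• Leaf nodes: $\langle \langle K_1, Pr_1 \rangle, \langle K_2, Pr_2 \rangle, \dots, \langle K_{p-1}, Pr_{p-1} \rangle, P_{next} \rangle$, where $P_{next}$ points to the next leaf
• $Pr_i$ points to a data record, or to a block of pointers to such records
• Leaf order

[Figure: B+ tree with internal node (120, 150, 180) over leaves (100, 101, 110), (120, 130), (150, 156, 179), (180, 200)]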

B+ tree: Search

• At each level, find the smallest $K_i$ larger than the search key
• Follow the associated pointer $P_i$ (if no key is larger, follow the last pointer $P_q$)

[Figure: root (100); left child (30) over leaves (3, 5, 11), (30, 35); right child (120, 150, 180) over leaves (100, 101, 110), (120, 130), (150, 156, 179), (180, 200)]
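A minimal sketch of the search rule on the tree in the figure; data pointers and the $P_{next}$ leaf links are omitted, and the class names are illustrative. Python's `bisect_right` returns the number of keys less than or equal to the search key, which is exactly the 0-based position of the pointer $P_i$ paired with the smallest $K_i$ larger than it:

```python
from bisect import bisect_right

class Leaf:
    def __init__(self, keys):
        self.keys = keys            # data pointers and P_next omitted for brevity

class Internal:
    def __init__(self, keys, children):
        self.keys = keys            # K_1 < ... < K_{q-1}
        self.children = children    # P_1 ... P_q

def find_leaf(node, k):
    """Follow P_i for the smallest K_i > k (the last pointer if there is none)."""
    while isinstance(node, Internal):
        node = node.children[bisect_right(node.keys, k)]
    return node

def search(root, k):
    return k in find_leaf(root, k).keys

# The tree from the figure
root = Internal([100], [
    Internal([30], [Leaf([3, 5, 11]), Leaf([30, 35])]),
    Internal([120, 150, 180], [Leaf([100, 101, 110]), Leaf([120, 130]),
                               Leaf([150, 156, 179]), Leaf([180, 200])]),
])
```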

B+ tree: Insert

• Nodes may overflow or underflow; for now, ignore both
• Inserting a data record with search key value k:
  – find the leaf node
  – if k is found
    • add the record to the file; create an indirect block if there isn't one
    • add the record pointer to the indirect block
  – if k is not found
    • add the data record to the file
    • insert the record pointer in the leaf node (keeping all search keys in order)

B+ tree: Delete

• Ignoring overflow or underflow:
  – find the leaf node with search key value k
  – find the data record pointer and delete the record
  – delete the index record, together with its indirect block, if any, once that block is empty
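A sketch of the two procedures above at the leaf level, ignoring overflow and underflow as the slides do. It assumes the data record has already been added to (or is being removed from) the file by the caller and is identified by a hypothetical rid; a pointer slot holds a single rid, or a Python list playing the role of the indirect block:

```python
from bisect import bisect_left

class Leaf:
    """Sorted search keys; ptrs[i] is a record id, or a list of ids (an indirect block)."""
    def __init__(self):
        self.keys = []
        self.ptrs = []

def leaf_insert(leaf, k, rid):
    """Add the pointer for a record already appended to the data file (overflow ignored)."""
    i = bisect_left(leaf.keys, k)
    if i < len(leaf.keys) and leaf.keys[i] == k:
        if not isinstance(leaf.ptrs[i], list):
            leaf.ptrs[i] = [leaf.ptrs[i]]        # create the indirect block
        leaf.ptrs[i].append(rid)                 # add the record pointer to it
    else:
        leaf.keys.insert(i, k)                   # keeps all search keys in order
        leaf.ptrs.insert(i, rid)

def leaf_delete(leaf, k, rid):
    """Remove one record pointer for k (underflow ignored); return True if found."""
    i = bisect_left(leaf.keys, k)
    if i == len(leaf.keys) or leaf.keys[i] != k:
        return False
    ptr = leaf.ptrs[i]
    if isinstance(ptr, list):
        if rid not in ptr:
            return False
        ptr.remove(rid)
        if ptr:                                  # indirect block not yet empty
            return True
    elif ptr != rid:
        return False
    del leaf.keys[i]                             # drop the index record
    del leaf.ptrs[i]                             # (and its now-empty indirect block)
    return True
```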

B+ tree: Simple Insert

• Insert 42

[Figure: in the tree from the search example, 42 < 100 leads to the left subtree, and 42 is added to leaf (30, 35), giving (30, 35, 42)]

B+ tree: Leaf Overflow (1/2)

• Insert 9

[Figure: the tree after inserting 42; 9 < 100 leads through internal node (30) to leaf (3, 5, 11), which is already full]

B+ tree: Leaf Overflow (2/2)

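• Keep the first $\lceil n/2 \rceil$ entries in the existing node and move the rest to a new leaf node
  – here $n = 3 + 1 = 4$, so $\lceil 4/2 \rceil = 2$ entries stay
• The first key of the new leaf (9) is copied into the parent

[Figure: leaf (3, 5, 11) plus 9 splits into (3, 5) and (9, 11); the parent becomes (9, 30); the remaining leaf holds (30, 35, 42)]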
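A sketch of the leaf split rule, assuming the leaf capacity of 3 implied by n = 3 + 1 = 4. In a B+ tree the first key of the new leaf is copied into the parent, because every key must remain in a leaf:

```python
import math
from bisect import insort

LEAF_CAPACITY = 3   # assumed from the slide: n = 3 + 1 = 4 entries on overflow

def split_leaf(keys, k, capacity=LEAF_CAPACITY):
    """Insert k into a leaf; if it overflows, split it.

    Returns (left keys, right keys, key copied up to the parent);
    the last two are None when no split is needed.
    """
    keys = sorted(keys)
    insort(keys, k)
    if len(keys) <= capacity:
        return keys, None, None
    cut = math.ceil(len(keys) / 2)     # first ceil(n/2) entries stay in the existing leaf
    left, right = keys[:cut], keys[cut:]
    return left, right, right[0]       # the separator is copied (it also stays in the leaf)

parent = [30]
left, right, sep = split_leaf([3, 5, 11], 9)
insort(parent, sep)                    # the parent becomes [9, 30]
```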

B+ tree: Internal Node Overflow (1/3)

• Insert 210, insert 205

[Figure: the left subtree is as after inserting 9 (internal node (9, 30) over leaves (3, 5), (9, 11), (30, 35, 42)); under the root (100), 210 is added to leaf (180, 200), giving (180, 200, 210)]

B+ tree: Internal Node Overflow (2/3)

• Leaf split

[Figure: inserting 205 overflows leaf (180, 200, 210), which splits into (180, 200) and (205, 210); 205 must be copied into the parent (120, 150, 180), which is already full]

B+ tree: Internal Node Overflow (3/3)

[Figure: the internal node, now (120, 150, 180, 205), splits into (120) and (180, 205); the middle key 150 moves up, so the root becomes (100, 150)]
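A sketch of splitting an internal node, assuming a capacity of 3 keys (4 tree pointers), which matches the figures. Unlike the leaf split, the middle key is moved up rather than copied, since internal keys only guide the search; the child handles "L1" to "L5" are hypothetical:

```python
from bisect import bisect_right, insort

MAX_KEYS = 3   # assumed internal-node capacity: 3 keys, 4 tree pointers

def insert_into_internal(keys, children, sep, new_child, max_keys=MAX_KEYS):
    """Add `sep` and the child to its right; split the node if it overflows.

    Returns (left keys, left children, key moved up, right keys, right children);
    the last three are None when no split is needed.
    """
    i = bisect_right(keys, sep)
    keys = keys[:i] + [sep] + keys[i:]
    children = children[:i + 1] + [new_child] + children[i + 1:]
    if len(keys) <= max_keys:
        return keys, children, None, None, None
    j = len(children) // 2           # half the pointers (rounded down) stay on the left
    # Unlike a leaf split, the key between the halves is moved up, not copied.
    return keys[:j - 1], children[:j], keys[j - 1], keys[j:], children[j:]

# Hypothetical leaf handles L1..L5; L5 is the new leaf (205, 210)
left_k, left_c, up, right_k, right_c = insert_into_internal(
    [120, 150, 180], ["L1", "L2", "L3", "L4"], 205, "L5")
root = [100]
insort(root, up)   # the root becomes [100, 150]
```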

B+ tree: New Root (1/2)

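• Insert 210, insert 205

[Figure: the same insertions into a tree whose root is the internal node (120, 150, 180), over leaves (100, 101, 110), (120, 130), (150, 156, 179), (180, 200); the leaf split produces (205, 210)]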

B+ tree: New Root (2/2)

[Figure: the full root (120, 150, 180, 205) splits into (120) and (180, 205); 150 moves up into a new root (150), adding one level to the tree]
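Putting the two split rules together gives a recursive insert; when the root itself splits, a new root with a single key is created and the tree grows by one level. A minimal sketch, assuming capacities of 3 keys for leaves and internal nodes and omitting data pointers and $P_{next}$ links; it reproduces the example above:

```python
import math
from bisect import bisect_right, insort

LEAF_MAX = 3      # assumed: a leaf holds at most 3 keys (n = 3 + 1 on overflow)
INTERNAL_MAX = 3  # assumed: an internal node holds at most 3 keys (4 pointers)

class Node:
    def __init__(self, keys, children=None):
        self.keys = keys
        self.children = children  # None for a leaf; data pointers and P_next omitted

def _insert(node, k):
    """Insert k below node. Return (separator, new right node) if node split, else None."""
    if node.children is None:                      # leaf
        insort(node.keys, k)
        if len(node.keys) <= LEAF_MAX:
            return None
        cut = math.ceil(len(node.keys) / 2)        # first ceil(n/2) keys stay
        right = Node(node.keys[cut:])
        node.keys = node.keys[:cut]
        return right.keys[0], right                # copy the new leaf's first key up
    i = bisect_right(node.keys, k)
    split = _insert(node.children[i], k)
    if split is None:
        return None
    sep, new_child = split
    node.keys.insert(i, sep)
    node.children.insert(i + 1, new_child)
    if len(node.keys) <= INTERNAL_MAX:
        return None
    j = len(node.children) // 2                    # pointers kept on the left
    up = node.keys[j - 1]                          # middle key moves up (not copied)
    right = Node(node.keys[j:], node.children[j:])
    node.keys, node.children = node.keys[:j - 1], node.children[:j]
    return up, right

def insert(root, k):
    """Insert k and return the (possibly new) root."""
    split = _insert(root, k)
    if split is None:
        return root
    sep, right = split
    return Node([sep], [root, right])              # root split: new root, one more level

# Slide example: a root (120, 150, 180) over four leaves
root = Node([120, 150, 180], [Node([100, 101, 110]), Node([120, 130]),
                              Node([150, 156, 179]), Node([180, 200])])
for k in (210, 205):
    root = insert(root, k)
```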

Index Insert Exercise

• Insert 8, 7, 41

[Figure: starting subtree: internal node (9, 30) over leaves (3, 5), (9, 11), (30, 35, 42)]

B+ tree: Delete

• Simple delete case
• Underflow case:
  – redistribute records
  – coalesce with siblings
  – update parents

B+ tree: Simple Delete (1/2)

• Delete 110

[Figure: root (150); internal nodes (120) and (180, 205); leaves (100, 101, 110), (120, 130), (150, 156, 179), (180, 200), (205, 210, 215)]

B+ tree: Simple Delete (2/2)

• Leaf updated

[Figure: leaf (100, 101, 110) becomes (100, 101); no other node changes]

B+ tree: Delete Redistribution (1/2)

• Delete 180

[Figure: the tree after deleting 110; 180 is in leaf (180, 200)]

B+ tree: Delete Redistribution (2/2)

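• Redistribute entries with the left or right sibling

[Figure: removing 180 leaves (200); it borrows 179 from its left sibling, which becomes (150, 156); the leaf is now (179, 200), and the parent's key 180 is replaced by 179, giving (179, 205)]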
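A sketch of delete with redistribution among the leaves of one parent, assuming a minimum leaf occupancy of 2 keys (half of the capacity of 3, rounded up). It tries the left sibling first, then the right; the function name is illustrative:

```python
LEAF_MIN = 2   # assumed minimum leaf occupancy: ceil(3/2) keys for a leaf of capacity 3

def delete_with_redistribution(parent_keys, leaves, i, k):
    """Delete k from leaves[i]; on underflow, borrow one key from a sibling.

    parent_keys[j] separates leaves[j] and leaves[j + 1] (all under one parent).
    Returns False if neither sibling can spare a key, i.e. the leaves must be coalesced.
    """
    leaf = leaves[i]
    leaf.remove(k)
    if len(leaf) >= LEAF_MIN:
        return True                                    # simple delete
    if i > 0 and len(leaves[i - 1]) > LEAF_MIN:        # left sibling spares its last key
        leaf.insert(0, leaves[i - 1].pop())
        parent_keys[i - 1] = leaf[0]
        return True
    if i + 1 < len(leaves) and len(leaves[i + 1]) > LEAF_MIN:  # right sibling: first key
        leaf.append(leaves[i + 1].pop(0))
        parent_keys[i] = leaves[i + 1][0]
        return True
    return False

# The internal node (180, 205) and its leaves, before "Delete 180"
parent = [180, 205]
leaves = [[150, 156, 179], [180, 200], [205, 210, 215]]
ok = delete_with_redistribution(parent, leaves, 1, 180)
```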

B+ tree: Delete Coalesce (1/4)

• Delete 101

[Figure: root (150); internal nodes (120) and (179, 205); leaves (100, 101), (120, 130), (150, 156), (179, 200), (205, 210, 215)]

B+ tree: Delete Coalesce (2/4)

• Leaf updated
• No redistribution
  – sibling coalesce

[Figure: leaf (100, 101) drops to (100); its sibling (120, 130) has no entry to spare]

B+ tree: Delete Coalesce (3/4)

• Leaf updated
• No redistribution
  – sibling coalesce

[Figure: the two leaves coalesce into (100, 120, 130), and separator 120 is removed from their parent, which is left with no keys]

B+ tree: Delete Coalesce (4/4)

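• Redistribution

[Figure: the empty internal node borrows through the root: 150 moves down into it, giving (150) over leaves (100, 120, 130) and (150, 156); 179 moves up to become the root; the right internal node becomes (205) over (179, 200) and (205, 210, 215)]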
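A sketch of the coalesce step and the internal-node redistribution that follows it, replaying the example above. It assumes a simple node layout (keys plus child list); `coalesce_leaves` handles leaf children only (merging internal nodes would also pull the separator down), and the function names are illustrative:

```python
class Node:
    def __init__(self, keys, children=None):
        self.keys = keys
        self.children = children        # None for a leaf

def coalesce_leaves(parent, i):
    """Merge leaf child i+1 into leaf child i and drop the separator between them."""
    parent.children[i].keys.extend(parent.children[i + 1].keys)
    del parent.children[i + 1]
    del parent.keys[i]

def borrow_from_right(parent, i):
    """Fix an underfull internal child i by rotating through the parent:
    the parent's separator moves down, the right sibling's first key moves up."""
    node, sibling = parent.children[i], parent.children[i + 1]
    node.keys.append(parent.keys[i])
    node.children.append(sibling.children.pop(0))
    parent.keys[i] = sibling.keys.pop(0)

# Tree before "Delete 101"
left = Node([120], [Node([100, 101]), Node([120, 130])])
right = Node([179, 205], [Node([150, 156]), Node([179, 200]), Node([205, 210, 215])])
root = Node([150], [left, right])

left.children[0].keys.remove(101)   # the leaf underflows: (100)
coalesce_leaves(left, 0)            # sibling (120, 130) has no spare key: merge
borrow_from_right(root, 0)          # the left internal node has no keys: redistribute
```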

Hashing Techniques

Static Hashing (1/2)

• Store records in buckets with overflow chains
• Allocate a fixed number of buckets M
• Problems:
  – small M
    • long overflow chains; slow search, delete and insert

[Figure: hash function h mapping keys to buckets, each followed by an overflow chain ending in null]

Static Hashing (2/2)

• Problems:
  – large M
    • wasted space, slow scan

[Figure: hash function h mapping keys to a large number of buckets, many of them with empty (null) chains]
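A minimal sketch of static hashing with overflow chains, assuming integer keys, the hash function h(k) = k mod M, and pages of 2 keys (all choices made for illustration). It shows both problems: with a small M a search walks a long chain, while with a large M most buckets sit empty:

```python
class StaticHashFile:
    """M fixed buckets; each bucket is a chain of pages holding up to `capacity` keys."""

    def __init__(self, m, capacity=2):
        self.m = m
        self.capacity = capacity
        self.buckets = [[[]] for _ in range(m)]   # bucket -> [primary page, overflow pages...]

    def _chain(self, key):
        return self.buckets[key % self.m]          # assumed hash: h(k) = k mod M

    def insert(self, key):
        chain = self._chain(key)
        if len(chain[-1]) == self.capacity:
            chain.append([])                       # last page full: add an overflow page
        chain[-1].append(key)

    def search(self, key):
        """Return (found, pages read); a search may have to walk the whole chain."""
        chain = self._chain(key)
        for reads, page in enumerate(chain, start=1):
            if key in page:
                return True, reads
        return False, len(chain)

small = StaticHashFile(m=2)     # small M: long overflow chains
large = StaticHashFile(m=16)    # large M: many empty buckets
for k in range(10):
    small.insert(k)
    large.insert(k)
```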

Dynamic Hashing

• Split and coalesce buckets as the database grows and shrinks
• One scheme: extendible hashing
• The hash function generates large values, e.g., 32 bits
  – use only i bits; change i as the database size changes
• On overflow, double the number of buckets
  – use i + 1 bits of the hash function
  – but this is expensive: read all M pages and redistribute the records over 2M pages
• Solution: use a directory and double the size of the directory instead
  – only split the bucket that overflowed

Extendible Hashing (1/4)

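h(18) = 10010

[Figure: directory with global depth 2 and entries 00, 01, 10, 11 pointing to buckets A = {16, 20}, B = {1}, C and D = {3, 7}, each with local depth 2; 18 (last two bits 10) goes to bucket C]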

Extendible Hashing (2/4)

h(4) = 00100

[Figure: the same directory, with 18 now in bucket C; 4 (last two bits 00) goes to bucket A, which already holds 16 and 20]

Extendible Hashing (3/4)

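[Figure: bucket A splits on the third bit: A keeps 16 (ending in 000) and its split image A1 receives 20 and 4 (ending in 100); both now have local depth 3, while the directory still has global depth 2]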

Extendible Hashing (4/4)

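[Figure: the directory doubles to global depth 3 with entries 000 to 111; 000 points to A (16) and 100 to A1 (20, 4), while each of B, C and D (local depth 2) is shared by two entries, e.g. 001 and 101 both point to B]

• Global depth (GD): the number of hash bits used to index the directory
• Local depth (LD): the number of hash bits shared by all keys in a bucket
• If the bucket is full:
  – split the bucket
  – increment its LD
• If GD = LD before the split:
  – increment GD
  – double the directory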
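A minimal sketch of the insert rules above. It assumes h(k) = k (consistent with h(18) = 10010 and h(4) = 00100 on the slides), a directory indexed by the low GD bits of h(k), and a bucket capacity of 2; the class and method names are hypothetical. Starting from an empty file, inserting 16, 20, 1, 3, 7, 18 happens to reproduce the directory shown after inserting 18, and inserting 4 then doubles the directory:

```python
class Bucket:
    def __init__(self, depth):
        self.depth = depth          # local depth (LD)
        self.keys = []

class ExtendibleHash:
    """Directory of 2**GD entries; entry i serves keys whose low GD bits equal i."""

    def __init__(self, capacity=2):
        self.capacity = capacity
        self.global_depth = 0       # GD
        self.directory = [Bucket(0)]

    def _slot(self, key):
        return key & ((1 << self.global_depth) - 1)   # low GD bits of h(k) = k

    def insert(self, key):
        bucket = self.directory[self._slot(key)]
        if len(bucket.keys) < self.capacity:
            bucket.keys.append(key)
            return
        if bucket.depth == self.global_depth:          # GD = LD: double the directory
            self.directory = self.directory + self.directory
            self.global_depth += 1
        self._split(bucket)
        self.insert(key)   # retry; recurses forever if capacity + 1 keys share all hash bits

    def _split(self, bucket):
        bucket.depth += 1                              # increment LD
        image = Bucket(bucket.depth)                   # the split image
        bit = 1 << (bucket.depth - 1)                  # the newly used hash bit
        old = bucket.keys
        bucket.keys = [k for k in old if not k & bit]
        image.keys = [k for k in old if k & bit]
        for i, b in enumerate(self.directory):         # redirect half of the entries
            if b is bucket and i & bit:
                self.directory[i] = image

eh = ExtendibleHash(capacity=2)
for k in (16, 20, 1, 3, 7, 18):
    eh.insert(k)
eh.insert(4)   # bucket 00 (16, 20) is full and GD = LD = 2
```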

Extendible Hashing: Delete

• If a deletion makes a bucket empty
  – merge it with its split image
• If each directory entry points to the same bucket as its split image
  – halve the directory

Extendible Hashing: Summary

• Avoids overflow pages
• The directory can get large
• A key search requires only 2 page reads: one for the directory and one for the bucket
• Space utilization fluctuates
  – between 59% and 90% for uniformly distributed records

Extendible Hashing: Exercise

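• Initially GD = LD = 1
• M = 2 buckets
• Hash function: $h(k) = k \bmod 2^i$
• Inserts: 14, 18, 22, 3, 9
• Deletes: 9, 22, 3

[Figure: initial state with global depth 1: a two-entry directory; one bucket holds 12 and 8, the other holds 5; both have local depth 1]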