
22-09-2007 NOEA/IT FEN Databases/PhysicalDB


Physical DB

On hardware, disc etc.

File Structures

Hashing

Index Structures

Search Trees and B trees revisited

Query Processing and Optimisation


Storage Hierarchy

• Internal storage:
  – Static RAM (cache)
  – Dynamic RAM (main memory)

• External memory (secondary storage):
  – Flash memory (memory sticks)
  – Hard disc
  – DVD
  – CD-ROM
  – Tape
  – Floppy disc


Hard Disc Drive Principle

Fig 5.1


Sorted Files

• Fixed record size (direct access)
• Records are kept sorted on a key field
• Binary search may be applied:

    examine middle element
    IF (NOT found)
        IF (search_element.key > middle_element.key)
            Search upper half of the file
        ELSE
            Search lower half of the file

• Search takes about log2(n) record accesses, compared with n/2 on average for a linear search (n = number of records)
• (log2(1024) = 10)
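
The binary search outlined above could be sketched in Python as follows. This is only an illustration: the in-memory list `records`, the field name `key` and the probe counter are assumptions made for the example, not part of the slides; in a real sorted file each probe would be a direct-access read of one fixed-size record.

```python
def binary_search(records, key):
    """Search a list of records sorted on 'key'.

    Returns (index, probes); index is -1 if the key is not present.
    """
    low, high = 0, len(records) - 1
    probes = 0
    while low <= high:
        mid = (low + high) // 2          # examine the middle element
        probes += 1
        mid_key = records[mid]["key"]
        if mid_key == key:
            return mid, probes           # found
        if key > mid_key:
            low = mid + 1                # search the upper half
        else:
            high = mid - 1               # search the lower half
    return -1, probes


# Hypothetical file with 1024 records, sorted on the key field.
records = [{"key": k, "data": f"record {k}"} for k in range(1024)]
index, probes = binary_search(records, 700)
missing_index, missing_probes = binary_search(records, -5)
```

With 1024 records no search needs more than 11 probes, in line with log2(1024) = 10.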


Hashing

• Compute a direct-access index from a key
• IF (collision) insert in a list
• Average number of positions examined per insertion (a common estimate, usually derived for open addressing with uniform hashing):
  – 1/(1 - LF), where LF is the load factor as a decimal fraction. For instance, if 80% of the entries are in use, LF = 0.8 and the estimate is 1/(1 - 0.8) = 1/0.2 = 5

Note: The number of collisions does not depend on the size of the hash table, only on the load factor.
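
As a small worked example of the load-factor formula, the sketch below evaluates 1/(1 - LF) for two tables of very different size but the same load factor. The function names `load_factor` and `expected_probes` and the table sizes are assumptions chosen for illustration.

```python
def load_factor(entries_in_use, table_size):
    """Fraction of the table's entries that are occupied."""
    return entries_in_use / table_size


def expected_probes(lf):
    """Estimate 1/(1 - LF) from the slide; only defined for 0 <= LF < 1."""
    if not 0 <= lf < 1:
        raise ValueError("load factor must be in [0, 1)")
    return 1.0 / (1.0 - lf)


# Two hypothetical tables, both 80% full.
small = expected_probes(load_factor(80, 100))
large = expected_probes(load_factor(80_000, 100_000))
```

Both values come out at about 5, which illustrates that only the load factor matters, not the table size.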


Hashing on Disc

Fig 5.12


Index

• Sorted files have problems with insertion and deletion of records:
  – Records have to be moved around to keep the file sorted

• No possibility of fast search on alternative keys:
  – If the file is sorted on SSN, how about searching on name, for instance?

• Hence an index, especially a multilevel index


2 Level Index

Fig. 6.6


Binary Search Tree

• A Binary Tree is a tree structure which is either empty or has a non-empty root element with a left and a right sub-tree, which are themselves binary trees.

• For a Binary Search Tree it also holds that if it is not empty:
  – all elements in the left sub-tree are less than the root element
  – all elements in the right sub-tree are greater than the root element
  – this property holds recursively down through the tree


Binary Search Tree - Ex.:

Insert Z1, Z2, Z3,…?


Binary Search Tree - Ex.:

• Searching for the key value k:
  – Examine the root r:
    • r.key == k – got it!
    • r.key < k – search the right sub-tree
    • r.key > k – search the left sub-tree

• Insertion of element x:
  – Search down the tree to an empty position and insert there
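
The search and insertion rules above might look like this in Python. It is a minimal sketch: the class name `Node`, the choice to ignore duplicate keys and the sample keys are assumptions for the example.

```python
class Node:
    """One element of a binary search tree."""

    def __init__(self, key):
        self.key = key
        self.left = None
        self.right = None


def insert(root, key):
    """Search down to an empty position and insert there; returns the root.

    Duplicate keys are ignored (an assumption of this sketch).
    """
    if root is None:
        return Node(key)
    if key < root.key:
        root.left = insert(root.left, key)
    elif key > root.key:
        root.right = insert(root.right, key)
    return root


def search(root, key):
    """Return the node holding key, or None if it is not in the tree."""
    node = root
    while node is not None:
        if node.key == key:
            return node                      # got it
        # node.key < key: go right; node.key > key: go left
        node = node.right if node.key < key else node.left
    return None


root = None
for k in [50, 30, 70, 20, 40, 60, 80]:
    root = insert(root, k)
found = search(root, 60)
not_found = search(root, 65)
```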


Binary Search Tree - Efficiency

• If the tree is balanced:
  – Searching takes log2(n) steps (n = number of elements in the tree)
    (ex.: n = 1024 => log2(1024) = 10 elements are accessed)

• But binary search trees have a tendency to become unbalanced
  – (if, for instance, the input is sorted, or if insertions and deletions are made in an order that is not uniformly distributed)

• It is expensive in running time to keep an ordinary binary search tree balanced, hence Balanced Search Trees…
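
To illustrate how sorted input unbalances a plain binary search tree, the sketch below inserts the same 200 keys once in sorted order and once in shuffled order and compares the resulting heights. The nested-dict representation, the helper names and the fixed random seed are assumptions made for this example.

```python
import random


def insert(tree, key):
    """Insert key into a BST stored as nested dicts; returns the root."""
    new = {"key": key, "left": None, "right": None}
    if tree is None:
        return new
    node = tree
    while True:
        side = "left" if key < node["key"] else "right"
        if node[side] is None:
            node[side] = new
            return tree
        node = node[side]


def height(tree):
    """Number of levels in the tree, counted level by level."""
    level = [tree] if tree is not None else []
    levels = 0
    while level:
        levels += 1
        level = [child
                 for node in level
                 for child in (node["left"], node["right"])
                 if child is not None]
    return levels


def build(keys):
    tree = None
    for key in keys:
        tree = insert(tree, key)
    return tree


keys = list(range(1, 201))
sorted_height = height(build(keys))       # every node has only a right child

shuffled = keys[:]
random.Random(0).shuffle(shuffled)        # fixed seed, for repeatability
shuffled_height = height(build(shuffled))
```

With sorted input the tree degenerates into a list of height 200, so a search costs up to n steps instead of about log2(n).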


Multi-way Search Trees

Fig. 6.8


B-Trees

Fig. 6.10


B-Trees - Principles

• Node size (number of keys in a node) is adjusted according to the block size of the file system

• A node (except the root) is always kept at least half-full:
  – If an element is inserted so that the node overflows, the node is split into two and the middle element is moved up one level. If this causes overflow in the node on the next level, that node is split and its middle element is moved up, and so on recursively, until eventually a new root is created
  – If an element is deleted and this causes the node to become less than half-full, the node is merged with a sibling and elements are redistributed between the new node and the parent. If this causes the parent to become less than half-full, the process continues recursively up the tree, until eventually the root is deleted

• Hence a B-tree is always balanced, and searches, insertions and deletions can be performed in logarithmic time (log(n))
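
The split step described above can be sketched for a single node as follows. This shows only the local operation, not a complete B-tree: the capacity `MAX_KEYS = 4`, the helper name `split_node` and the sample keys are assumptions for illustration, and propagating the middle key into the parent is left out.

```python
from bisect import insort

MAX_KEYS = 4  # hypothetical node capacity; in practice derived from the block size


def split_node(keys):
    """Split an overflowing, sorted key list into (left, middle, right).

    The middle key is the one that would be moved up one level.
    """
    mid = len(keys) // 2
    return keys[:mid], keys[mid], keys[mid + 1:]


node = [10, 20, 30, 40]      # a full node
insort(node, 25)             # inserting 25 makes the node overflow
overflowed = len(node) > MAX_KEYS
left, middle, right = split_node(node)
```

After the split both halves hold two keys, so each is at least half-full, and the key 25 is the one handed to the parent.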


B+ Trees

All data pointers are kept at the leaf level, and the leaves are chained together. Hence a total order is defined by following the leaf chain.
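
One consequence of the chained leaves is that a range query can walk along the leaf level. The sketch below assumes a hypothetical `Leaf` class holding sorted keys and a `next` pointer, and it starts from the first leaf for simplicity; a real B+ tree would first descend from the root to the leaf containing the lower bound.

```python
class Leaf:
    """A leaf node: sorted keys plus a pointer to the next leaf."""

    def __init__(self, keys, next_leaf=None):
        self.keys = keys
        self.next = next_leaf


def range_scan(first_leaf, low, high):
    """Collect all keys k with low <= k <= high by following the chain."""
    result = []
    leaf = first_leaf
    while leaf is not None:
        for key in leaf.keys:
            if key > high:
                return result        # keys are in total order, so stop here
            if key >= low:
                result.append(key)
        leaf = leaf.next
    return result


# Three hypothetical leaves chained left to right.
leaf3 = Leaf([50, 60])
leaf2 = Leaf([30, 40], leaf3)
leaf1 = Leaf([10, 20], leaf2)
keys_in_range = range_scan(leaf1, 20, 50)
```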


Insertion in a B+ Tree

Fig. 6.12


Deletion in a B+ Tree

Fig. 6.13


Query-Optimisation

• Nested sub-selects are handled in separate query blocks

• Each query block is transformed into an equivalent sequence of relational algebra operations, represented as a tree structure

• This tree structure can be optimised using standard compiler optimisation techniques
  – For instance, row selection before join
  – Keep track of estimates of the size of cross-table relations
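
The effect of doing row selection before the join can be illustrated with a small experiment. The tables `employees` and `departments`, their columns and the comparison counter are invented for this sketch; a brute-force join is used so that the cost is easy to count.

```python
def nested_loop_join(left, right, key):
    """Brute-force equi-join; returns (rows, number of comparisons)."""
    rows = []
    comparisons = 0
    for l_row in left:
        for r_row in right:
            comparisons += 1
            if l_row[key] == r_row[key]:
                rows.append({**l_row, **r_row})
    return rows, comparisons


# Hypothetical data: 100 employees spread over 10 departments.
employees = [{"dno": i % 10, "lname": f"name{i}"} for i in range(100)]
departments = [{"dno": d, "dname": f"dept{d}"} for d in range(10)]

# Plan 1: join first, then select the rows with dno = 3.
joined, cost_join_first = nested_loop_join(employees, departments, "dno")
plan1 = [row for row in joined if row["dno"] == 3]

# Plan 2: select first, then join the much smaller intermediate relation.
selected = [e for e in employees if e["dno"] == 3]
plan2, cost_select_first = nested_loop_join(selected, departments, "dno")
```

Both plans return the same rows, but the second one performs 100 comparisons in the join instead of 1000.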


Join Algorithms

• Nested-loop or brute force
  – O(|A|*|B|)

• Single-loop: requires an index on at least one of the join attributes (in table B)
  – O(|A|*log(|B|))

• Sort-merge: only if both tables are physically sorted on the join attribute
  – minimises disc access
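
The three join strategies can be compared on small in-memory tables. In this sketch each table is a list of (key, value) tuples, and a sorted key list searched with `bisect_left` stands in for the index on table B; both choices, and the sample data, are assumptions for illustration.

```python
from bisect import bisect_left


def nested_loop_join(a, b):
    """Brute force: compare every row of A with every row of B."""
    return [(ka, va, vb) for ka, va in a for kb, vb in b if ka == kb]


def single_loop_join(a, b_sorted):
    """One pass over A; each lookup in B is a binary search (the 'index')."""
    b_keys = [k for k, _ in b_sorted]
    out = []
    for ka, va in a:
        i = bisect_left(b_keys, ka)
        while i < len(b_keys) and b_keys[i] == ka:
            out.append((ka, va, b_sorted[i][1]))
            i += 1
    return out


def sort_merge_join(a_sorted, b_sorted):
    """Merge two tables that are both sorted on the join attribute."""
    out = []
    i = j = 0
    while i < len(a_sorted) and j < len(b_sorted):
        ka, kb = a_sorted[i][0], b_sorted[j][0]
        if ka < kb:
            i += 1
        elif ka > kb:
            j += 1
        else:
            j_start = j                      # start of B's group for this key
            while i < len(a_sorted) and a_sorted[i][0] == ka:
                j = j_start
                while j < len(b_sorted) and b_sorted[j][0] == ka:
                    out.append((ka, a_sorted[i][1], b_sorted[j][1]))
                    j += 1
                i += 1
    return out


# Hypothetical tables, both already sorted on the join key.
A = [(1, "a1"), (2, "a2"), (2, "a2b"), (4, "a4")]
B = [(2, "b2"), (3, "b3"), (4, "b4"), (4, "b4b")]
r_nested = nested_loop_join(A, B)
r_single = single_loop_join(A, B)
r_merge = sort_merge_join(A, B)
```

All three functions produce the same result; they differ only in how many rows of B have to be inspected for each row of A.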


Query-Optimisation

Lots of other tricks:

DB2, for instance: conditions on attributes with an index are executed first.

Assume an index on lname:

SELECT *
FROM Employee
WHERE fname='Kurt'
AND lname='Jensen'

It will be much more efficient to find 'Jensen' using the index and then 'Kurt' using linear search among the matches than the other way around.
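
The evaluation order described above can be mimicked in Python. Here a dictionary from lname to row positions plays the role of the index; the sample rows and the names `lname_index` and `find` are assumptions for the sketch and say nothing about how DB2 implements it.

```python
from collections import defaultdict

# Hypothetical Employee table.
employees = [
    {"fname": "Kurt", "lname": "Jensen"},
    {"fname": "Anna", "lname": "Jensen"},
    {"fname": "Kurt", "lname": "Hansen"},
    {"fname": "Ole", "lname": "Nielsen"},
]

# Index on lname: maps each last name to the positions of matching rows.
lname_index = defaultdict(list)
for pos, row in enumerate(employees):
    lname_index[row["lname"]].append(pos)


def find(fname, lname):
    """Use the index for lname first, then scan only those rows for fname."""
    candidates = lname_index.get(lname, [])
    return [employees[p] for p in candidates if employees[p]["fname"] == fname]


result = find("Kurt", "Jensen")
```

Only the two 'Jensen' rows are examined for the first name, rather than the whole table.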


Exercise

• Investigate how much of the preceding material is supported by MS SQL Server (and, if it is, how).