file structures dale-marie wilson, ph.d.. basic concepts primary storage main memory inappropriate...

File Structures Dale-Marie Wilson, Ph.D.

Post on 22-Dec-2015

Page 1: File Structures Dale-Marie Wilson, Ph.D.. Basic Concepts Primary storage Main memory Inappropriate for storing database Volatile Secondary storage Physical

File Structures

Dale-Marie Wilson, Ph.D.

Basic Concepts

Primary storage
• Main memory
• Inappropriate for storing database
• Volatile

Secondary storage
• Physical storage, e.g. magnetic disks
• Nonvolatile
• Cheaper

Basic Concepts

Secondary (2°) storage organized into files
• Each file has one or more records
• Each record has one or more fields

Process
• User requests tuple, e.g. SG37
• DBMS maps logical record to physical record
• Physical record moved to DBMS buffers

N.B. Physical record is unit of transfer between disk and primary storage

Basic Concepts

• Physical record typically consists of more than 1 logical record
• Logical record can correspond to more than 1 physical record
• Physical records are also referred to as blocks or pages

staffNo  lName  position    branchNo
SL21     White  Manager     B005
SG37     Beech  Assistant   B003
SG14     Ford   Supervisor  B003
SA9      Howe   Assistant   B007
SG5      Brand  Manager     B003
SL41     Lee    Assistant   B005

[Figure: the Staff records above stored across pages 1 and 2]

Basic Concepts

File organization
• Physical arrangement of data in file into records and pages in secondary (2°) storage
• Determines the order in which records are stored and accessed

Types
• Heap (unordered)
  • Records placed on disk in no specific order
• Sequential (ordered)
  • Records ordered by value of specific field
• Hash
  • Record placement determined by hash function

Basic Concepts

Access method
• Steps involved in storing and retrieving records from file

Heap Files

Unordered files
• Aka heap files
• Simplest organization
• Records placed in the same order as inserted
• Linear search for retrieval
• Insertion efficient; retrieval not efficient

Deletion process
• Relevant page identified
• Record marked as deleted
• Page rewritten to disk
• N.B. deleted record space not reused → performance deterioration

Best suited for bulk loading data
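The heap-file behaviour above can be sketched in a few lines of Python. This is a toy in-memory model, not a DBMS implementation: the class name `HeapFile`, the `PAGE_SIZE` of 3 slots, and the `wasted_slots` helper are all assumptions made for illustration.

```python
from dataclasses import dataclass, field

PAGE_SIZE = 3  # hypothetical number of record slots per page


@dataclass
class HeapFile:
    """Toy heap file: a list of pages, each page a list of slots."""
    pages: list = field(default_factory=list)

    def insert(self, record: dict) -> None:
        # New records always go at the end of the file, in arrival order.
        if not self.pages or len(self.pages[-1]) >= PAGE_SIZE:
            self.pages.append([])
        self.pages[-1].append({"deleted": False, "data": record})

    def find(self, field_name: str, value):
        # Linear search: scan page by page until a live match is found.
        for page_no, page in enumerate(self.pages):
            for slot in page:
                if not slot["deleted"] and slot["data"][field_name] == value:
                    return page_no, slot["data"]
        return None

    def delete(self, field_name: str, value) -> bool:
        # Mark the record as deleted; the slot itself is not reclaimed.
        for page in self.pages:
            for slot in page:
                if not slot["deleted"] and slot["data"][field_name] == value:
                    slot["deleted"] = True
                    return True
        return False

    def wasted_slots(self) -> int:
        # Slots still occupied by deleted records.
        return sum(slot["deleted"] for page in self.pages for slot in page)
```

Inserting after a deletion still appends at the end of the file, so `wasted_slots()` only grows until the file is reorganized, which is the deterioration the slide warns about.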

Ordered Files

Ordered files
• Aka sequential files
• Sorted on a field: the ordering field
• If ordering field = key → ordering key
• Binary search for retrieval
• Insertion and deletion problematic
  • Need to maintain order of records
• Rarely used unless 1° (primary) index exists
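A minimal sketch of retrieval and insertion on an ordered file, assuming the file is modelled as a sorted Python list of ordering-key values. The function names are hypothetical; `bisect.insort` is the standard-library call that keeps a list sorted on insert.

```python
import bisect


def binary_search(sorted_keys, target):
    """Return (index, probes); index is None if target is absent."""
    lo, hi = 0, len(sorted_keys) - 1
    probes = 0
    while lo <= hi:
        mid = (lo + hi) // 2
        probes += 1
        if sorted_keys[mid] == target:
            return mid, probes
        if sorted_keys[mid] < target:
            lo = mid + 1
        else:
            hi = mid - 1
    return None, probes


def ordered_insert(sorted_keys, key):
    # Finding the position is cheap, but every later entry must shift
    # to make room: this is why insertion is problematic on ordered files.
    bisect.insort(sorted_keys, key)
```

On a file of n records the search needs about log2(n) probes, whereas the insert may move up to n entries to preserve the order.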

Hash Files

Hash files
• Aka random/direct files
• Hash function used to determine page address for storing record
  • Chosen to provide most even distribution of records, i.e. minimum collisions
  • Examples:
    • Folding: applying arithmetic function to hash field, e.g. + 7
    • Division-remainder: applies mod function to the field value to determine the address
  • Each address corresponds to a page/bucket
  • Each bucket has slots for multiple records, placed in order of arrival
• Base field: hash field
• If hash field = key → hash key

Collision
• Hash function calculates the same address for 2 or more records
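The two hash function families can be illustrated as follows. Everything here is an assumption for the sake of the example: hashing on the numeric part of `staffNo`, the choice of 3 buckets, and the particular folding variant (summing groups of digits), which is one common textbook form and not necessarily the one the slide has in mind.

```python
def numeric_part(staff_no: str) -> int:
    # Hypothetical helper: "SG37" -> 37
    return int("".join(ch for ch in staff_no if ch.isdigit()))


def division_remainder(key: int, num_buckets: int) -> int:
    # The remainder of the division is used as the bucket address.
    return key % num_buckets


def folding(key: int, num_buckets: int, chunk_digits: int = 2) -> int:
    # Split the key into groups of digits, add the groups together,
    # then reduce the sum to a valid bucket address.
    digits = str(key)
    total = sum(
        int(digits[i:i + chunk_digits])
        for i in range(0, len(digits), chunk_digits)
    )
    return total % num_buckets


def has_collision(keys, num_buckets: int) -> bool:
    # A collision occurs when two keys map to the same address.
    addresses = [division_remainder(k, num_buckets) for k in keys]
    return len(set(addresses)) < len(addresses)
```

With 3 buckets, the keys 5 and 14 both leave remainder 2, so `has_collision([5, 14], 3)` is true even though the keys differ.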

Hash Files

Collision management techniques
• Open addressing
• Unchained overflow
• Chained overflow
• Multiple hashing

Collision Management

Open addressing
• Linear search performed to locate 1st available slot
• Same procedure for searching for record
  • Record doesn’t exist if empty slot found before record located
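A minimal sketch of open addressing, assuming a simplified file with one record per slot (the slides describe buckets with several slots) and a division-remainder hash. The class and method names are hypothetical.

```python
class OpenAddressingTable:
    """Toy hash file with one record per slot and linear probing."""

    def __init__(self, num_slots: int):
        self.slots = [None] * num_slots

    def _home(self, key: int) -> int:
        return key % len(self.slots)

    def insert(self, key: int, record) -> int:
        n = len(self.slots)
        start = self._home(key)
        # Linear search from the home address for the first free slot.
        for step in range(n):
            i = (start + step) % n
            if self.slots[i] is None:
                self.slots[i] = (key, record)
                return i
        raise RuntimeError("file is full")

    def search(self, key: int):
        n = len(self.slots)
        start = self._home(key)
        # Same probe sequence as insert.
        for step in range(n):
            i = (start + step) % n
            if self.slots[i] is None:
                return None  # empty slot reached first: record does not exist
            if self.slots[i][0] == key:
                return self.slots[i][1]
        return None
```

The early `return None` is only safe while slots are never emptied again; supporting deletion would need a marker for deleted slots, which this sketch leaves out.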

Collision Management

Unchained overflow
• Overflow area maintained for collisions
• Improves over open addressing by minimizing collisions

[Figure: buckets numbered 0 to 4 holding Staff records SA9, SL21, SG5, SG14 and SG37, with Staff record SL41 placed in an overflow bucket]

Collision Management

Chained overflow
• Overflow area maintained for collisions
• Uses synonym pointer
  • Additional field that indicates whether collision occurred
  • If collision, contains bucket address of overflow area

[Figure: buckets numbered 0 to 4 holding Staff records SA9, SL21, SG5, SG14 and SG37, with Staff record SG7 in an overflow bucket; synonym pointer values 0, 0 and 3]
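Chained overflow can be sketched as below. The assumptions are mine: buckets hold 2 records, the hash is division-remainder on a numeric key, overflow buckets are appended after the main buckets, and `None` (not 0) marks "no collision" in the synonym pointer to keep it distinct from bucket address 0.

```python
BUCKET_CAPACITY = 2  # hypothetical number of slots per bucket


class Bucket:
    def __init__(self):
        self.records = []    # (key, record) pairs in order of arrival
        self.synonym = None  # address of overflow bucket, None if no collision


class ChainedOverflowFile:
    def __init__(self, num_main: int):
        self.num_main = num_main
        self.buckets = [Bucket() for _ in range(num_main)]

    def insert(self, key: int, record) -> int:
        addr = key % self.num_main
        while True:
            bucket = self.buckets[addr]
            if len(bucket.records) < BUCKET_CAPACITY:
                bucket.records.append((key, record))
                return addr
            if bucket.synonym is None:
                # Collision on a full bucket: allocate an overflow bucket
                # and record its address in the synonym pointer.
                self.buckets.append(Bucket())
                bucket.synonym = len(self.buckets) - 1
            addr = bucket.synonym

    def search(self, key: int):
        addr = key % self.num_main
        while addr is not None:
            bucket = self.buckets[addr]
            for k, record in bucket.records:
                if k == key:
                    return record
            addr = bucket.synonym  # follow the chain into the overflow area
        return None
```

A search touches only the home bucket and the buckets chained from it, so records that hash elsewhere are never scanned.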

Collision Management

Multiple hashing
• If collision occurs, a second hash function is applied
• 2nd hash function typically used to place record in overflow area

Indexes

Index
• Data structure that allows DBMS to locate particular records in file more quickly
• Similar to index in book
• Main types of indices:
  • Primary index
    • Index on the ordering key field
  • Clustering index
    • File sequentially ordered on non-key field, i.e. more than one record can correspond to an index entry
  • Secondary index
    • Index defined on non-ordering field of data file

Indexes

File can have:
• At most 1 primary index or 1 clustering index
• Several secondary indices

Index may be:
• Dense
  • Index record for every search key value
• Sparse
  • Index record for only some search key values
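The difference between a dense and a sparse index can be sketched on a small sorted file. The data, the three-records-per-page layout, and the choice of "first key of each page" for the sparse index are assumptions for illustration only.

```python
import bisect

# Hypothetical sorted data file: three pages of ordering-key values.
pages = [[5, 9, 14], [21, 37, 41], [50, 62, 70]]

# Dense index: one entry per search key value -> (page, slot).
dense_index = {
    key: (p, s) for p, page in enumerate(pages) for s, key in enumerate(page)
}

# Sparse index: one entry per page, holding the first key on that page.
sparse_keys = [page[0] for page in pages]


def sparse_lookup(key):
    """Locate key via the sparse index, then scan the chosen page."""
    p = bisect.bisect_right(sparse_keys, key) - 1
    if p < 0:
        return None  # key is smaller than every indexed key
    if key in pages[p]:
        return p, pages[p].index(key)
    return None
```

The sparse index has 3 entries against 9 for the dense one, at the cost of a scan inside the page; it only works because the file is ordered on the indexed field.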

Indexed Sequential Files

Indexed sequential file
• Sorted data file with primary index
• Has:
  • Primary storage area
  • Separate index
  • Overflow area

Multilevel Index

Multilevel index
• Index treated as file and split into smaller indices
• Overcomes problems with large indices that span several pages
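One way to picture a multilevel index is as a stack of sparse indices, each built over the blocks of the level below until the top level fits in a single block. The sketch assumes a non-empty sorted key list and a fixed `fanout` (entries per index block); both function names are hypothetical.

```python
import bisect


def build_levels(keys, fanout):
    """Return index levels, bottom (level 0) first; keys must be sorted and non-empty."""
    level = [keys[i:i + fanout] for i in range(0, len(keys), fanout)]
    levels = [level]
    while len(level) > 1:
        # The next level up indexes the first key of each block below it.
        firsts = [block[0] for block in level]
        level = [firsts[i:i + fanout] for i in range(0, len(firsts), fanout)]
        levels.append(level)
    return levels


def lookup(levels, key, fanout):
    """Return (found, blocks_read), reading one block per level."""
    block_no = 0
    reads = 0
    for depth in range(len(levels) - 1, 0, -1):
        block = levels[depth][block_no]
        reads += 1
        pos = bisect.bisect_right(block, key) - 1
        if pos < 0:
            return False, reads  # key is below the smallest indexed key
        # Entry pos of this block points at block (block_no * fanout + pos) below.
        block_no = block_no * fanout + pos
    reads += 1
    return key in levels[0][block_no], reads
```

With 50 keys and a fanout of 5 there are three levels, so a lookup reads 3 blocks where scanning the flat index could read up to 10.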

B+ Trees

Search tree
• Used to guide search for a record, given the value of one of its fields
• Two types of nodes
  • Internal nodes contain key values and node pointers
  • Leaf nodes contain <key, record-pointer> pairs

Degree/order
• Max # children allowed

B-tree: balanced tree
• Depth from root to leaf same for every leaf

B+ Trees

The structure of internal nodes in a B+ tree of order p:
• Each internal node is of the form <P_1, K_1, P_2, K_2, ..., P_{q-1}, K_{q-1}, P_q>, where q <= p and each P_i is a tree pointer
• Within each internal node, K_1 < K_2 < ... < K_{q-1}
• For all values of X in the subtree pointed at by P_i, we have K_{i-1} < X <= K_i for 1 < i < q, X <= K_i for i = 1, and K_{i-1} < X for i = q
• Each internal node has at most p tree pointers
• Each internal node, except the root, has at least ⌈p/2⌉ tree pointers. The root node has at least two tree pointers if it is an internal node.
• An internal node with q pointers, q <= p, has q-1 search field values.
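The internal-node rules translate into two small checks. The sketch assumes the common convention that the subtree at pointer P_i holds values X with K_{i-1} < X <= K_i, uses 0-based pointer positions, and rounds p/2 up for the minimum pointer count; the function names are hypothetical.

```python
import bisect
import math


def child_index(keys, x):
    """0-based position of the tree pointer to follow when searching for x.

    bisect_left returns the first position i with keys[i] >= x, so the
    chosen subtree satisfies keys[i-1] < x <= keys[i].
    """
    return bisect.bisect_left(keys, x)


def is_valid_internal(keys, num_pointers, p, is_root=False):
    """Check the occupancy and ordering rules for an internal node of order p."""
    min_pointers = 2 if is_root else math.ceil(p / 2)
    return (
        min_pointers <= num_pointers <= p
        and len(keys) == num_pointers - 1
        and all(a < b for a, b in zip(keys, keys[1:]))
    )
```

For keys `[10, 20, 30]`, a search for 15 follows pointer position 1 and a search for 31 follows the last pointer, position 3.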

B+ Trees

The structure of leaf nodes in a B+ tree of order p:
• Each leaf node is of the form <<K_1, Pr_1>, <K_2, Pr_2>, ..., <K_{q-1}, Pr_{q-1}>, P_next>, where q <= p and each Pr_i is a data pointer that points to a record or block of records
• Within each leaf node, K_1 < K_2 < ... < K_{q-1}
• Each leaf node has at least ⌈p/2⌉ values
• All leaf nodes are at the same level
• The P_next pointer points to the next leaf node in the tree
  • This gives efficient sequential access to data
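The sequential access that the next-leaf pointer provides can be sketched as a range scan along the leaf chain. The `Leaf` class and `range_scan` function are hypothetical, and the record pointers are plain strings standing in for disk addresses.

```python
class Leaf:
    """Toy B+ tree leaf: sorted (key, record_pointer) pairs plus a next pointer."""

    def __init__(self, entries, next_leaf=None):
        self.entries = entries
        self.next = next_leaf


def range_scan(start_leaf, low, high):
    """Collect entries with low <= key <= high by walking the leaf chain."""
    result = []
    leaf = start_leaf
    while leaf is not None:
        for key, pointer in leaf.entries:
            if key > high:
                return result  # keys are sorted, so nothing further can match
            if key >= low:
                result.append((key, pointer))
        leaf = leaf.next  # follow the next-leaf pointer, no return to the root
    return result
```

In a full tree the scan would start at the leaf found by searching for `low`; here it is started from a given leaf to keep the sketch short.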

B+ Trees

Insertion example for B+ Tree:
• When you insert into a leaf node that is full, you split it and pass the rightmost value of the left half up to the parent
• When you insert into a full root, the root splits and a new root is created with the middle value from the child nodes
• Otherwise, values are inserted into openings at the lowest level
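The leaf-level case can be sketched as follows. This covers only a single leaf: it assumes the left half keeps the extra key when the count is odd and that the rightmost key of the left half is the value copied to the parent. Splitting internal nodes and the root is left out, and the function name is hypothetical.

```python
import bisect


def insert_into_leaf(keys, key, max_keys):
    """Insert key into a sorted leaf.

    Returns (left, right, promoted). If the leaf does not overflow,
    right and promoted are None and left is the updated leaf.
    """
    keys = list(keys)  # work on a copy, leaving the caller's list untouched
    bisect.insort(keys, key)
    if len(keys) <= max_keys:
        return keys, None, None
    mid = (len(keys) + 1) // 2
    left, right = keys[:mid], keys[mid:]
    # The promoted key stays in the left leaf as well: in a B+ tree
    # every key still appears at the leaf level.
    return left, right, left[-1]
```

For a leaf `[1, 3, 5]` with room for 3 keys, inserting 7 yields leaves `[1, 3]` and `[5, 7]` with 3 sent to the parent, so searches for values up to 3 go left and larger values go right.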


Appendix F Assignment #7