file structures dale-marie wilson, ph.d.. basic concepts primary storage main memory inappropriate...
Post on 22-Dec-2015
215 views
TRANSCRIPT
File Structures
Dale-Marie Wilson, Ph.D.
Basic Concepts
Primary storageMain memoryInappropriate for storing databaseVolatile
Secondary storagePhysical storage e.g. magnetic disksNonvolatileCheaper
Basic Concepts
2° storage organized into files Each file has one or more records Each record has one or more fields Process
User requests tuple e.g. SG37DBMS maps logical record to physical recordPhysical record moved to DBMS buffers
N.B. Physical record is unit of transfer between disk and primary storage
Basic Concepts Physical record typically consists of more than 1 logical
record Logical record can correspond to more than 1 physical
record Refer to physical record as blocks and pages
staffNo lName position branchNo
SL21 White manager B005
SG37 Beech Assistant B003
SG14 Ford Supervisor B003
SA9 Howe Assistant B007
SG5 Brand Manager B003
SL41 Lee Assistant B005
Page
1
2
Basic Concepts
File organization Physical arrangement of data in file into records and
pages in 2° storage Determines order records stored and accessed
Types Heap (unordered)
• Records place on disk in no specific order Sequential (ordered)
• Records ordered by value of specific field Hash
• Records placement determined by hash function
Basic Concepts
Access methodSteps involved in storing and retrieving
records from file
Heap Files
Unordered files Aka heap files Simplest organization Records placed in same order inserted Linear search for retrieval Insertion efficient; retrieval not efficient
Deletion process Relevant page identified Record marked as deleted Page rewritten to disk N.B. deleted record space not reused → performance
deterioration Best suited for bulk loading data
Ordered Files
Ordered filesAka sequential filesSorted on field – ordering fieldIf ordering field = key → ordering keyBinary search for retrievalInsertion and deletion problematic
• Need to maintain order of recordsRarely used unless 1° index exists
Hash Files
Hash files Aka random/direct files Hash function used to det. page address for storing record
• Chosen to provide most even distribution of records – min. collisions• Examples:
• Folding – applying arithmetic function to hash field e.g. + 7• Division-remainder – uses mod function to det. field value
• Each address corresponds to a page/bucket• Each bucket has slots for multiple records – placed in order of arrival
Base field – hash field If hash field = key → hash key
Collision Hash function does not calculate unique address for 2 or more
records
Hash Files
Collision management techniquesOpen addressingUnchained overflowChained overflowMultiple hashing
Collision Management
Open addressingLinear search performed to locate 1st
available slotSame procedure for searching for record
• Record doesn’t exist if empty slot found before record located
Collision Management Unchained overflow
Overflow area maintained for collisions Improves over open addressing by minimizing collisions
Staff SA9 recordStaff SL21 record
Staff SG5 recordStaff SG14 record
Staff SG37 record
Bucket
Staff SL41 record
Bucket
0
1
2
3
4
Collision Management
Chained overflow Overflow area maintained for collisions Uses synonym pointer
• Additional field that indicates whether collision occurred• If collision, contains bucket address of overflow area
Staff SA9 recordStaff SL21 record
Staff SG5 recordStaff SG14 record
Staff SG37 record
Bucket
Staff SG7 record
Bucket
0
1
2
3
4
0
0
3
Collision Management
Multiple hashingIf collision occurs, new hash function
performed2nd hash function typically used to place
record in overflow area
Indexes
Index Data structure that allows DBMS to locate particular
records in file more quickly Similar to index in book Main types of indices:
• Primary index• Index a key field
• Clustering index• File sequentially ordered on non-key field i.e. more than
record can correspond with index
• Secondary index• Index defined on non-ordering field of data file
Indexes
File can have:At most 1 primary or 1 clustering indexSeveral secondary indices
Index may be:Dense
• Index record for every search key valueSparse
• Index record for some key search values
Indexed Sequential Files
Indexed sequential fileSorted data file with primary indexHas:
• Primary storage area• Separate index• Overflow area
Multilevel Index
Multilevel indexIndex treated as file and split into smaller
indicesOvercomes problems with large indices that
span several pages
B+ Trees
Search Tree Used to guide search for a record, given the value of
one of its fields Two types of Nodes
• Internal Nodes contain Key values and node pointers
• Leaf Nodes contain Key, Record-Pointer pairs Degree/order
Max # children allowed B-tree – balanced tree
Depth from root to leaf same for every leaf
B+ Trees
The structure of internal nodes in a B+ tree of order p: Each internal node is of the form
<P1, K1, P2, K2, ..., Pq-1, Kq-1, Pq > , where q <= p , each Pi is a tree pointer
Within each internal node, K1 < K2 < ... < Kq-1 For all values of X in the subtree pointed at by Pi , we have
Ki-1 < X < Ki for 1 < i < q , X < Ki for i=q, and Ki-1 < X for i=q Each internal node has at most p tree pointers Each internal node, except the root, has at least (p/2) tree
pointers. The root node has at least two tree pointers if it is an internal node.
An internal node with q pointers, q <= p, has q-1 search field values.
B+ Trees
B+ Trees
The structure of leaf nodes in a B+ tree of order p: Each leaf node is of the form < <K1,Pr1>, <K2,Pr2>, ..., <Kq-1,Prq-1>, Pnext > , where
q <= p , each Pri is a data pointer that points to a record or block of records
Within each internal node, K1 < K2 < ... < Kq-1 Each leaf node, has at least (p/2) values All leaf nodes are at the same level The Pnext pointer points to the next leaf node in
the tree• This give efficient sequential access to data
B+ Trees
B+ Trees
Insertion example for B+ Tree: When you insert into a leaf node that is
full, you split and pass the rightmost value up to the parent
When you insert into a full root, the root splits and a new root is created with the middle value from the child nodes
Otherwise, values are inserted into openings at the lowest level
Appendix F Assignment #7