icom 5016 – introduction to database systems lecture 13- file structures dr. bienvenido vélez...

30
ICOM 5016 – Introduction to ICOM 5016 – Introduction to Database Systems Database Systems Lecture 13- File Structures Dr. Bienvenido Vélez Electrical and Computer Engineering Department Slides by Dr. Manuel Rodríguez

Upload: joy-cobb

Post on 18-Jan-2018

236 views

Category:

Documents


0 download

DESCRIPTION

ICOM 5016Dr. Manuel Rodriguez Martinez 3 Relational DBMS Architecture Disk Space Management Buffer Management File and Access Methods Relational Operators Query Optimizer Query Parser Client API Client DB Execution Engine Concurrency and Recovery

TRANSCRIPT

Page 1: ICOM 5016 – Introduction to Database Systems Lecture 13- File Structures Dr. Bienvenido Vélez Electrical and Computer Engineering Department Slides by

ICOM 5016 – Introduction to ICOM 5016 – Introduction to Database SystemsDatabase Systems

Lecture 13- File StructuresDr. Bienvenido Vélez

Electrical and Computer Engineering Department

Slides by Dr. Manuel Rodríguez

Page 2: ICOM 5016 – Introduction to Database Systems Lecture 13- File Structures Dr. Bienvenido Vélez Electrical and Computer Engineering Department Slides by

ICOM 5016 Dr. Manuel Rodriguez Martinez 2

ReadingsReadings

• Read– New Book: Chapter 13

Page 3: ICOM 5016 – Introduction to Database Systems Lecture 13- File Structures Dr. Bienvenido Vélez Electrical and Computer Engineering Department Slides by

ICOM 5016 Dr. Manuel Rodriguez Martinez 3

Relational DBMS ArchitectureRelational DBMS Architecture

Disk Space Management

Buffer Management

File and Access Methods

Relational Operators

Query Optimizer

Query Parser

Client API

Client

DB

ExecutionEngine Concurrency

and Recovery

Page 4: ICOM 5016 – Introduction to Database Systems Lecture 13- File Structures Dr. Bienvenido Vélez Electrical and Computer Engineering Department Slides by

ICOM 5016 Dr. Manuel Rodriguez Martinez 4

Disk Space ManagemetDisk Space Managemet

• Disk Space Manager– DBMS module in charge of managing the disk space used to

store relations– Duties

• Allocate space• Write data• Read data• De-allocate space

• Disk Space Manager supplies a stream of data pages.– Minimal unit of I/O– Often the size of a block (sector, several sectors, or more)

Page 5: ICOM 5016 – Introduction to Database Systems Lecture 13- File Structures Dr. Bienvenido Vélez Electrical and Computer Engineering Department Slides by

ICOM 5016 Dr. Manuel Rodriguez Martinez 5

Relational DBMS ArchitectureRelational DBMS Architecture

Disk Space Management

Buffer Management

File and Access Methods

Relational Operators

Query Optimizer

Query Parser

Client API

Client

DB

ExecutionEngine Concurrency

and Recovery

Page 6: ICOM 5016 – Introduction to Database Systems Lecture 13- File Structures Dr. Bienvenido Vélez Electrical and Computer Engineering Department Slides by

ICOM 5016 Dr. Manuel Rodriguez Martinez 6

Buffer ManagementBuffer Management

• Buffer Manager supplies a stream of memory data pages.– Often the size of a block (sector, several sectors, or more)– Must provide in-memory access to more pages than

physically fit in memory– Must implement a page replacement policy– Implements a cache of disk blocks

Page 7: ICOM 5016 – Introduction to Database Systems Lecture 13- File Structures Dr. Bienvenido Vélez Electrical and Computer Engineering Department Slides by

ICOM 5016 Dr. Manuel Rodriguez Martinez 7

Relational DBMS ArchitectureRelational DBMS Architecture

Disk Space Management

Buffer Management

File and Access Methods

Relational Operators

Query Optimizer

Query Parser

Client API

Client

DB

ExecutionEngine Concurrency

and Recovery

Page 8: ICOM 5016 – Introduction to Database Systems Lecture 13- File Structures Dr. Bienvenido Vélez Electrical and Computer Engineering Department Slides by

ICOM 5016 Dr. Manuel Rodriguez Martinez 8

Disk PageDisk Page

• Disk page is simply an array of bytes• We impose the logic of an array of records!

123 Bob NY $1200

2178 Jil LA $9202

8273 Ned FL $2902

723 Al PR $300

Disk Page Records

Reading a Disk Page should be one I/O

Page 9: ICOM 5016 – Introduction to Database Systems Lecture 13- File Structures Dr. Bienvenido Vélez Electrical and Computer Engineering Department Slides by

ICOM 5016 Dr. Manuel Rodriguez Martinez 9

Storage arrangement optionsStorage arrangement options

• Suppose we need to create 10 GB of space to store a database. Each page is 4 KB is size.– How to organize the disk to accomplish this.

• Option #1: Cooked File System– User the file system provide by OS– Create a file “mydb.dat”– Write to this file N pages of size 4KB

• N must be enough to reach the size of 10GB• Page are full of bytes with zeros.

– Have a has table somewhere to store the information about this file “mydb.dat”.

– Now you can start writing pages with actual data.

Page 10: ICOM 5016 – Introduction to Database Systems Lecture 13- File Structures Dr. Bienvenido Vélez Electrical and Computer Engineering Department Slides by

ICOM 5016 Dr. Manuel Rodriguez Martinez 10

Option #2: Raw Disk PartitionOption #2: Raw Disk Partition

• Don’t use the file provided by the DBMS• Instead create a parition on the disk, but don’t format

it with OS formats (e.g. FAT, FAT32, NTFS, LINUX)• Make your own file system on the disk

– Create a directory of pages– Need to implement all operation such as read, write, check,

etc. – Need to implement you own files…– Faster and more efficient that OS files, but more complex.– Provided by more advanced (and expensive) DBMS’s

Page 11: ICOM 5016 – Introduction to Database Systems Lecture 13- File Structures Dr. Bienvenido Vélez Electrical and Computer Engineering Department Slides by

Alternative DBMS Views of the Alternative DBMS Views of the Storage SystemStorage System

• Single File– Each database is stored as a single file– Example: SQLite– Pro: Easy to store and exchange databases– Cons: Concurrent/Transactional access very hard

• File System– Each database is stored as multiple files– Example: MySQL– Pro: Concurrent/Transactional access good, DBMS portable and simpler– Cons: DBMS must rely on OS for performance

• Block System (Raw Disk)– The DBMS implements its own storage system on raw disk partition– Example: Oracle 10g– Pro: RDBMS can achieve maximal performance– Cons: RDBMS complex/expensive

Page 12: ICOM 5016 – Introduction to Database Systems Lecture 13- File Structures Dr. Bienvenido Vélez Electrical and Computer Engineering Department Slides by

ICOM 5016 Dr. Manuel Rodriguez Martinez 12

File and Access Methods LayerFile and Access Methods Layer

• Buffer Manager provides a stream of pages• But higher layers of DBMS need to see a stream of

records• A DBMS file layers provides this abstraction

– File is a collection of records that belong to a relation R.• For example: Relation students might be stored in DBMS

internal file students.dat. This is internal to DBMS database!!!– File is made out of pages, and records are taken from pages

• File and Access Methods Layer implements various types of files to access the records– Access method – mechanism by which the records are

extracted from the DBMS

Page 13: ICOM 5016 – Introduction to Database Systems Lecture 13- File Structures Dr. Bienvenido Vélez Electrical and Computer Engineering Department Slides by

ICOM 5016 Dr. Manuel Rodriguez Martinez 13

File TypesFile Types

• Heap File - Unordered collection of records– Records within a page a not ordered– Pages are not ordered– Simple to use and implement

• Sorted File – sorted collection or records– Within a page, records are ordered– Pages are ordered based on record contents– Efficient access to data, but expensive to maintain

• Index File – combines storage + data structure for fast access and lookups– Index entries – store value of attributes as search keys– Data entries – hold the data in the index file

Page 14: ICOM 5016 – Introduction to Database Systems Lecture 13- File Structures Dr. Bienvenido Vélez Electrical and Computer Engineering Department Slides by

ICOM 5016 Dr. Manuel Rodriguez Martinez 14

Heap FileHeap File

123 Bob NY $102

8387 Ned SJ $73

121 Jil NY $5595

81982 Tim MIA $4000

2381 Bill LA $500

4882 Al SF $52303

9403 Ned NY $3333

1237 Pat WI $30

Page 0

Page 1

Page 2

Page 15: ICOM 5016 – Introduction to Database Systems Lecture 13- File Structures Dr. Bienvenido Vélez Electrical and Computer Engineering Department Slides by

ICOM 5016 Dr. Manuel Rodriguez Martinez 15

Sorted FileSorted File

121 Jil NY $5595

123 Bob NY $102

1237 Pat WI $30

2381 Bill LA $500

4882 Al SF $52303

8387 Ned SJ $73

9403 Ned NY $3333

81982 Tim MIA $4000

Page 0

Page 1

Page 2

Page 16: ICOM 5016 – Introduction to Database Systems Lecture 13- File Structures Dr. Bienvenido Vélez Electrical and Computer Engineering Department Slides by

ICOM 5016 Dr. Manuel Rodriguez Martinez 16

Index FileIndex File

121 Jil NY $5595

123 Bob NY $102

1237 Pat WI $30

2381 Bill LA $500

4882 Al SF $52303

8387 Ned SJ $73

9403 Ned NY $3333

81982 Tim MIA $4000

100

2000

9000

Index entry

Dataentries

Page 17: ICOM 5016 – Introduction to Database Systems Lecture 13- File Structures Dr. Bienvenido Vélez Electrical and Computer Engineering Department Slides by

ICOM 5016 Dr. Manuel Rodriguez Martinez 17

Index files structureIndex files structure

• Index entries – Store search keys– Search key – a set of attributes in a tuple can be used to

guide a search• Ex. Student id

– Search key do not necessarily have to be candidate keys• For example: gpa can be a search key on relation: Students(sid, name, login, age, gpa)

• Data entries– Store the data records in the index file– Data record can have

• Actual tuples for the table on which index is defined• Record identifier for tuples that match a given search key

Page 18: ICOM 5016 – Introduction to Database Systems Lecture 13- File Structures Dr. Bienvenido Vélez Electrical and Computer Engineering Department Slides by

ICOM 5016 Dr. Manuel Rodriguez Martinez 18

Issues with Index filesIssues with Index files

• Index files for a relation R can occur in three forms:– Data entries store the actual data for relation R.

• Index file provides both indexing and storage.– Data entries store pairs <k, rid>:

• k – value for a search key.• rid – rid of record having search key value k.• Actual data record is stored somewhere else, perhaps on a

heap file or another index file .– Data entries store pairs <k, rid-list>

• K – value for a search key• Rid-list – list of rid for all records having search key value k• Actual data record is stored somewhere else, perhaps on a

heap file or another index file.

Page 19: ICOM 5016 – Introduction to Database Systems Lecture 13- File Structures Dr. Bienvenido Vélez Electrical and Computer Engineering Department Slides by

ICOM 5016 Dr. Manuel Rodriguez Martinez 19

Operations on filesOperations on files

• Allocate file• Scan operations

– Grab each records one after one • Can be used to step through all records

• Insert record– Adds a new record to the file

• Each record as a unique identifier called the record id (rid)

• Update record• Find record with a given rid• Delete record with a given rid• De-allocate file

Page 20: ICOM 5016 – Introduction to Database Systems Lecture 13- File Structures Dr. Bienvenido Vélez Electrical and Computer Engineering Department Slides by

ICOM 5016 Dr. Manuel Rodriguez Martinez 20

Implementing Heap FilesImplementing Heap Files

• Heap file links a collection of pages for a given relation R.

• Heap files are built on top of Buffer Manager.• Each page has a page id

– Often, we need to know the page size (e.g. 4KB)• All pages for a given file have the same size.

– Page id and page size can be used to compute an offset in a cooked file where the page is located.

– In raw disk partition, page id should enable DBMS to find block in disk where the page is located.

Page 21: ICOM 5016 – Introduction to Database Systems Lecture 13- File Structures Dr. Bienvenido Vélez Electrical and Computer Engineering Department Slides by

ICOM 5016 Dr. Manuel Rodriguez Martinez 21

Linked Implementation of Heap FilesLinked Implementation of Heap Files

HeaderPage

DataPage

DataPage

DataPage

DataPage

DataPage

Linked List of pagesWith free space

Linked List of full pages

Page 22: ICOM 5016 – Introduction to Database Systems Lecture 13- File Structures Dr. Bienvenido Vélez Electrical and Computer Engineering Department Slides by

ICOM 5016 Dr. Manuel Rodriguez Martinez 22

Linked List of pagesLinked List of pages• Each page has:

– records – pointer to next page – pointer to previous page

• Pointer here means the integer with the page id of the next page.

• Header has two pointers– First page in the list of pages with free space– First page in the list of full pages

• Tradeoffs– Easy to use, good for fixed sized records– Complex to find space for variable length records

• need to iterate over list with space

Page 23: ICOM 5016 – Introduction to Database Systems Lecture 13- File Structures Dr. Bienvenido Vélez Electrical and Computer Engineering Department Slides by

ICOM 5016 Dr. Manuel Rodriguez Martinez 23

Directory of pagesDirectory of pagesData

Page 1

DataPage 2

DataPage 3

DataPage N

.

.

.

header

Page 24: ICOM 5016 – Introduction to Database Systems Lecture 13- File Structures Dr. Bienvenido Vélez Electrical and Computer Engineering Department Slides by

ICOM 5016 Dr. Manuel Rodriguez Martinez 24

Directory of pagesDirectory of pages

• Linked list of directory pages • Directory page has

– Pointer to a given page– Bit indicating if page is full or not– Alternatively, have amount of space that is available

• More complex to implement• Makes it easier to find page with enough room to

store a new record

Page 25: ICOM 5016 – Introduction to Database Systems Lecture 13- File Structures Dr. Bienvenido Vélez Electrical and Computer Engineering Department Slides by

ICOM 5016 Dr. Manuel Rodriguez Martinez 25

Page formatsPage formats

• Each page holds– records– Optional metadata for finding records within the page

• Page can be visualized as a collection of slots where records can be placed

• Each record has a record id in the form:– <page_id, slot number>– page_id – id of the page where the record is located.– slot number – slot where the record is located.

Page 26: ICOM 5016 – Introduction to Database Systems Lecture 13- File Structures Dr. Bienvenido Vélez Electrical and Computer Engineering Department Slides by

ICOM 5016 Dr. Manuel Rodriguez Martinez 26

Packed Fixed-Length RecordPacked Fixed-Length Record

. . .

Slot 1Slot 2Slot 3

N

Slot N

Free Space

Page header

numberof records

Page 27: ICOM 5016 – Introduction to Database Systems Lecture 13- File Structures Dr. Bienvenido Vélez Electrical and Computer Engineering Department Slides by

ICOM 5016 Dr. Manuel Rodriguez Martinez 27

Unpacked Fixed-Length RecordUnpacked Fixed-Length Record

. . .

Slot 1Slot 2Slot 3

N

Slot N

Free Space

Page header

numberof slots

1013 2 1

Slot bit vector

Page 28: ICOM 5016 – Introduction to Database Systems Lecture 13- File Structures Dr. Bienvenido Vélez Electrical and Computer Engineering Department Slides by

ICOM 5016 Dr. Manuel Rodriguez Martinez 28

Variable-Length recordsVariable-Length records

rid=(i,N)

Page i

rid=(2,N)rid=(1,N)

12 … 15 24 N

FreeSpace

N … 2 1entries

12 bytes

Page 29: ICOM 5016 – Introduction to Database Systems Lecture 13- File Structures Dr. Bienvenido Vélez Electrical and Computer Engineering Department Slides by

ICOM 5016 Dr. Manuel Rodriguez Martinez 29

Fixed-Length records for variable-Fixed-Length records for variable-length attributeslength attributes• Size of each record is determined by maximum size

of the data type in each column

F1 F2 F3 F4

2 6 2 4Size in bytes

Offset of F1: 0Offset of F2: 2Offset of F3: 8Offset of F4: 10

Need to understand the schema and sizes to finda given column

Page 30: ICOM 5016 – Introduction to Database Systems Lecture 13- File Structures Dr. Bienvenido Vélez Electrical and Computer Engineering Department Slides by

ICOM 5016 Dr. Manuel Rodriguez Martinez 30

Variable-length records and attributesVariable-length records and attributes

• Either 1. use a special symbol to separate fields2. use a header to indicate offset of each field

F1 $ F2 $ F3 $ F4Option 1

F1 F2 F3 F4Option 2

Option 1 has the problem of determining a good $Option 2 handles NULL easily