Chapter 3 Representing Data Elements: 1. How to lay out data on disk 2. How to move it to memory

Upload: reynard-scot-rodgers

Post on 02-Jan-2016

Page 1: Chapter 3 Representing Data Elements 1.How to lay out data on disk 2.How to move it to memory

Chapter 3

Representing Data Elements

1. How to lay out data on disk

2. How to move it to memory

Page 2

Data Elements

Attributes

Records

Blocks

Files

Memory

Page 3

Representing Relational Database Elements

CREATE TABLE MovieStar(
    name CHAR(30) PRIMARY KEY,
    salary INTEGER,
    address VARCHAR(255),
    gender CHAR(1),
    birthdate DATE
);

Page 4

Representing Objects

interface Star {
    attribute string name;
    attribute Struct Addr {string street, string city} address;
    relationship Set<Movie> starredIn inverse Movie::stars;
};

Page 5

Some Differences

• Objects can have methods.

• Objects have an object identifier.

• Objects can have relationships to other objects.

So we need a way to represent addresses (pointers), since objects have OIDs and may have relationships to other objects. We also need the ability to represent arbitrarily long lists of objects, such as the arbitrarily long list of movies for a given star.

Page 6

What are the data attributes we want to store?

• a salary

• a name

• a date

• a picture

What we have available: bytes (8 bits each)

Page 7

To represent:

• Integer: 2 bytes (short) or 4 bytes (long), e.g., 35 as a 2-byte integer is

00000000 00100011

• Float: 4 bytes; Double Float: 8 bytes
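
As a minimal sketch of these sizes, Python's standard struct module can produce the byte strings directly. Big-endian byte order is an assumption here; the slide does not specify one. The helper name bits is made up for this example.

```python
import struct

def bits(data: bytes) -> str:
    """Render a byte string as space-separated groups of eight bits."""
    return " ".join(f"{b:08b}" for b in data)

# ">h" is a 2-byte integer, ">i" a 4-byte integer (big-endian assumed).
short_35 = struct.pack(">h", 35)
long_35 = struct.pack(">i", 35)

# ">f" is a 4-byte float, ">d" an 8-byte double float.
float_size = len(struct.pack(">f", 1.5))
double_size = len(struct.pack(">d", 1.5))

print(bits(short_35))            # 00000000 00100011
print(len(long_35))              # 4
print(float_size, double_size)   # 4 8
```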

Page 8

To represent:

• Characters

Various coding schemes have been suggested; the most popular is ASCII.

Example (7-bit ASCII codes):

A: 1000001

a: 1100001

5: 0110101

LF: 0001010
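
The four codes in the example can be checked with a short sketch. The function name ascii7 is made up for this illustration.

```python
def ascii7(ch: str) -> str:
    """Return the 7-bit ASCII code of a single character as a bit string."""
    return format(ch.encode("ascii")[0], "07b")

# "\n" is the line feed (LF) character.
for ch in ["A", "a", "5", "\n"]:
    print(repr(ch), ascii7(ch))
```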

Page 9

To represent:

• Fixed-Length Character Strings

Example: attribute A is declared CHAR(5), and its value is 'cat'.

A: c a t ⊥ ⊥

Here ⊥ is the “pad” character, whose 8-bit code is not one of the legal characters for SQL strings.
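
A minimal sketch of padding a CHAR(n) value. It assumes the NUL byte plays the role of the pad character; the slide does not say which code is actually used, and the function names are made up for this example.

```python
PAD = b"\x00"  # assumption: NUL stands in for the pad character

def encode_char(value: str, n: int) -> bytes:
    """Store value in exactly n bytes, padding on the right."""
    raw = value.encode("ascii")
    if len(raw) > n:
        raise ValueError(f"value does not fit in CHAR({n})")
    return raw.ljust(n, PAD)

def decode_char(field: bytes) -> str:
    """Strip the padding to recover the original value."""
    return field.rstrip(PAD).decode("ascii")

field = encode_char("cat", 5)
print(field, len(field))
print(decode_char(field))
```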

Page 10

To represent:

• Variable-Length Character Strings

– Null terminated, e.g., c a t followed by a null character

– Length given, e.g., 3 followed by c a t

The SQL type VARCHAR(n) actually represents a field of fixed length, even though the length of its value varies: n+1 bytes are dedicated to the string regardless of how long it is.
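
Both encodings can be sketched in n+1 bytes. This is an illustration under assumptions: a 1-byte length count (so n is at most 255), a NUL byte as the terminator and filler, and function names invented for this example.

```python
def encode_length_given(value: str, n: int) -> bytes:
    """One length byte, then the characters, filled out to n+1 bytes."""
    raw = value.encode("ascii")
    if n > 255 or len(raw) > n:
        raise ValueError("value does not fit")
    return bytes([len(raw)]) + raw.ljust(n, b"\x00")

def decode_length_given(field: bytes) -> str:
    length = field[0]
    return field[1:1 + length].decode("ascii")

def encode_null_terminated(value: str, n: int) -> bytes:
    """The characters, a terminating NUL, filled out to n+1 bytes."""
    raw = value.encode("ascii")
    if len(raw) > n or b"\x00" in raw:
        raise ValueError("value does not fit or contains NUL")
    return (raw + b"\x00").ljust(n + 1, b"\x00")

def decode_null_terminated(field: bytes) -> str:
    return field[:field.index(b"\x00")].decode("ascii")

# Either way, a VARCHAR(10) field occupies 11 bytes.
print(len(encode_length_given("cat", 10)), len(encode_null_terminated("cat", 10)))
```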

Page 11

To represent:

SQL2 Standard:

Date: YYYY-MM-DD, e.g., 1958-05-14

Time: HH:MM:SS, e.g., 20:19:02

Time including fractions of a second, e.g., '20:19:02.25'

Page 12

To represent:

• Bag of bits

Layout: Length | Bits

The sequence of bits is packed eight to a byte, preceded by its length.
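
A minimal sketch of the length-plus-bits layout. The 2-byte big-endian length (counted in bits) and the zero fill of the last byte are assumptions made for this example, as are the function names.

```python
def pack_bits(bits: str) -> bytes:
    """Length in bits (2 bytes), then the bits packed eight to a byte."""
    n = len(bits)
    padded = bits.ljust((n + 7) // 8 * 8, "0")  # zero-fill the last byte
    body = bytes(int(padded[i:i + 8], 2) for i in range(0, len(padded), 8))
    return n.to_bytes(2, "big") + body

def unpack_bits(data: bytes) -> str:
    """Recover exactly the original bits, dropping the fill."""
    n = int.from_bytes(data[:2], "big")
    all_bits = "".join(f"{b:08b}" for b in data[2:])
    return all_bits[:n]

packed = pack_bits("1011")
print(packed.hex())          # 0004b0
print(unpack_bits(packed))   # 1011
```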

Page 13

To represent:

• Boolean

e.g., TRUE: 1111 1111, FALSE: 0000 0000

or TRUE: 1000 0000, FALSE: 0000 0000

Page 14

To represent:

• Enumerated Types

e.g., RED → 1, BLUE → 2, GREEN → 3, YELLOW → 4, …

Can we use less than 1 byte per code? Yes, when the type has few enough values: one byte distinguishes up to 256 values, while 4 values need only 2 bits.
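
The number of bits a code needs follows from the number of values. A small sketch, with a made-up function name:

```python
def bits_needed(num_values: int) -> int:
    """Smallest number of bits that can distinguish num_values codes."""
    if num_values < 1:
        raise ValueError("an enumerated type needs at least one value")
    return max(1, (num_values - 1).bit_length())

for n in (4, 128, 256, 257):
    print(n, bits_needed(n))
```

Four colors fit in 2 bits and 256 values fill a byte exactly; anything larger needs more than one byte per code.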

Page 15

Representing Records

• Fixed-Length Records

• Variable-Length Records

• Record Addresses

• Record Modification

Page 16

Record - Collection of related data items (called FIELDS)

E.g.: Employee record:

name field,

salary field,

date-of-hire field, ...

Page 17

Types of records:

• Main choices:

– FIXED vs VARIABLE FORMAT

– FIXED vs VARIABLE LENGTH

Page 18

Fixed-Length Records

A MovieStar record:

name: bytes 0-29

address: bytes 30-285

gender: byte 286

birthdate: bytes 287-296 (record length 297)

The layout of MovieStar tuples when fields are required to start at a multiple of 4 bytes:

name: starts at byte 0

address: starts at byte 32

gender: starts at byte 288

birthdate: starts at byte 292 (record length 304)
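
Both layouts above can be reproduced by a short offset calculation. The field sizes (30, 256, 1 and 10 bytes) are read off the offsets on the slide; the function name layout is made up for this sketch.

```python
FIELDS = [("name", 30), ("address", 256), ("gender", 1), ("birthdate", 10)]

def layout(fields, align=1):
    """Return ({field: offset}, record_length), rounding each field
    boundary up to a multiple of `align` bytes."""
    offsets = {}
    pos = 0
    for name, size in fields:
        offsets[name] = pos
        pos += size
        pos = (pos + align - 1) // align * align  # round up to alignment
    return offsets, pos

print(layout(FIELDS))           # packed: record length 297
print(layout(FIELDS, align=4))  # 4-byte boundaries: record length 304
```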

Page 19

Fixed format

A SCHEMA (not the record) contains the following information:

- # fields

- type of each field

- order in record

- meaning of each field

Page 20

Example: fixed format and length

Employee record

Schema:

(1) E#, 2-byte integer

(2) E.name, 10 char.

(3) Dept, 2-byte code

Records:

55 | s m i t h | 02

83 | j o n e s | 01
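
A minimal sketch of this fixed-format record using Python's struct module. Big-endian byte order and NUL padding of the name are assumptions; the function names are made up for this example.

```python
import struct

# E# (2-byte integer), E.name (10 characters), Dept (2-byte code)
EMPLOYEE = struct.Struct(">H10sH")

def pack_employee(eno: int, name: str, dept: int) -> bytes:
    # The "10s" format pads short names with NUL bytes.
    return EMPLOYEE.pack(eno, name.encode("ascii"), dept)

def unpack_employee(record: bytes):
    eno, name, dept = EMPLOYEE.unpack(record)
    return eno, name.rstrip(b"\x00").decode("ascii"), dept

records = [pack_employee(55, "smith", 2), pack_employee(83, "jones", 1)]
print(EMPLOYEE.size)                          # 14
print([unpack_employee(r) for r in records])
```

Because every record has the same 14-byte length, record i in a packed file starts at byte 14 * i.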

Page 21

Adding header information to records representing tuples of the MovieStar relation

header (bytes 0-11): pointer to schema, length, timestamp

name: starts at byte 12

address: starts at byte 44

gender: starts at byte 300

birthdate: starts at byte 304 (record length 316)

The schema, reached through the pointer in the header, records:

• The attributes of the relation

• Their types

• The order in which attributes appear in the tuple

• Constraints on the attributes and the relation itself

Page 22

Packing Fixed-Length Records into Blocks

A typical block holding records:

Header | Record 1 | Record 2 | … | Record n

The header may contain:

• Links to one or more other blocks that are part of a network of blocks.

• Information about the role played by this block in such a network.

• Information about which relation the tuples of this block belong to.

• A “directory” giving the offset of each record in the block.

• A “block ID”.

• Timestamp(s) indicating the time of the block’s last modification and/or access.

Page 23

Block and Record Addresses

Client: application address space, which uses memory addresses

Server: database address space, which uses logical addresses and physical addresses

Page 24

Indirection

• How does one refer to records?

[Figure: a pointer to record Rx]

Many options, ranging from physical to indirect addresses

Page 25

Purely Physical

E.g., record address or ID = Block ID + offset in block,

where Block ID = Device ID, Cylinder #, Track #, Block #

Page 26

Physical Addresses:

(1) Host Name

(2) Device ID

(3) Cylinder #

(4) Track #

(5) Block #

(6) Offset

Logical Addresses: byte strings of some fixed length

A map table translates logical to physical addresses:

logical address | physical address
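
A map table can be sketched as a dictionary from logical to physical addresses. The field names follow the six components listed above; the sample values and the function names are made up for this illustration.

```python
from typing import NamedTuple

class PhysicalAddress(NamedTuple):
    host: str
    device: int
    cylinder: int
    track: int
    block: int
    offset: int

# logical address (fixed-length byte string) -> PhysicalAddress
map_table = {}

def resolve(logical: bytes) -> PhysicalAddress:
    """Translate a logical address through the map table."""
    return map_table[logical]

def move_record(logical: bytes, new_location: PhysicalAddress) -> None:
    """Moving a record only changes its map-table entry;
    pointers that hold the logical address stay valid."""
    map_table[logical] = new_location

rid = b"\x00\x00\x00\x2a"  # hypothetical 4-byte logical address
map_table[rid] = PhysicalAddress("db1", 0, 17, 3, 5, 128)
move_record(rid, PhysicalAddress("db1", 0, 17, 3, 9, 0))
print(resolve(rid))
```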

Page 27

Fully Indirect

E.g., Record ID is an arbitrary bit string

map: Rec ID | Physical addr.

(the entry for rec ID r holds its address a)

Page 28

Tradeoff

Flexibility to move records (for deletions, insertions)

versus

Cost of indirection

Page 29

Physical ↔ Indirect

Many options in between …

Page 30

Logical and Structured Addresses

• A physical address of the block + the key value for the record being referred to.

• A physical address of the block + the offset of the entry in the block’s offset table for the record being referred to.

Page 31

Ex #1 Indirection in block

[Figure: a block containing a header, free space, and records R1, R2, R3, R4]

Page 32

Block header - data at the beginning that describes the block

May contain:

- File ID (or RELATION or DB ID)

- This block ID

- Record directory

- Pointer to free space

- Type of block (e.g., contains recs type 4; is overflow, …)

- Pointer to other blocks “like it”

- Timestamp ...

Page 33

A block with a table of offsets telling us the position of each record within the block

Layout: header | offset table | unused | record4 | record3 | record2 | record1

The offset table makes it possible to:

1. Move a record around within the block by changing the record’s entry in the offset table.

2. Move a record to another block by holding a “forwarding address” in its offset-table entry.

3. Delete a record by leaving a tombstone in its offset-table entry.
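
The three operations listed above can be sketched with a small in-memory model. This is an illustration only: the class and method names are made up, and a dictionary stands in for the block's byte area.

```python
class Block:
    def __init__(self):
        self.offset_table = []  # one (kind, value) entry per slot
        self.data = {}          # offset -> record bytes (stand-in for the byte area)

    def insert(self, offset: int, record: bytes) -> int:
        """Place a record and return its slot number in the offset table."""
        self.data[offset] = record
        self.offset_table.append(("offset", offset))
        return len(self.offset_table) - 1

    def _current_offset(self, slot: int) -> int:
        kind, value = self.offset_table[slot]
        if kind != "offset":
            raise LookupError("record is not in this block")
        return value

    def move_within(self, slot: int, new_offset: int) -> None:
        """1. Move inside the block: only the table entry changes."""
        old = self._current_offset(slot)
        self.data[new_offset] = self.data.pop(old)
        self.offset_table[slot] = ("offset", new_offset)

    def move_out(self, slot: int, forwarding_address) -> bytes:
        """2. Move to another block: leave a forwarding address."""
        record = self.data.pop(self._current_offset(slot))
        self.offset_table[slot] = ("forward", forwarding_address)
        return record

    def delete(self, slot: int) -> None:
        """3. Delete: leave a tombstone in the table entry."""
        kind, value = self.offset_table[slot]
        if kind == "offset":
            del self.data[value]
        self.offset_table[slot] = ("tombstone", None)

    def fetch(self, slot: int) -> bytes:
        return self.data[self._current_offset(slot)]
```

Pointers from outside refer to a record by block and slot number, so none of the three operations invalidates them.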

Page 34

Database Address and Memory Address

The translation table turns database addresses into their equivalents in memory.

DBaddr (database address: logical or physical) | mem-addr (memory address)

Page 35

Pointer Swizzling

[Figure: Block 1 and Block 2 on disk; Block 1 is read into memory; pointers are marked as swizzled or unswizzled]

Page 36

Strategies to Swizzle Pointers

• Automatic Swizzling

• Swizzling on Demand

• No Swizzling

• Programmer Control of Swizzling

Page 37

Returning Blocks to Disk

When a block is moved from memory back to disk, any pointers within the block must be “unswizzled”.

Swizzling looks up a memory address by database address (supported by an index on dbAddr):

SELECT memAddr
FROM TranslationTable
WHERE dbAddr = x;

Unswizzling looks up a database address by memory address (supported by an index on memAddr):

SELECT dbAddr
FROM TranslationTable
WHERE memAddr = y;
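
The two lookups above, one per direction, can be sketched with a pair of dictionaries standing in for the two indexes. The class and method names are made up for this example.

```python
class TranslationTable:
    def __init__(self):
        self._by_db = {}   # dbAddr -> memAddr (the index on dbAddr)
        self._by_mem = {}  # memAddr -> dbAddr (the index on memAddr)

    def add(self, db_addr, mem_addr) -> None:
        """Register an item that has been read into memory."""
        self._by_db[db_addr] = mem_addr
        self._by_mem[mem_addr] = db_addr

    def swizzle(self, db_addr):
        """Memory address for db_addr, or None if it is not in memory."""
        return self._by_db.get(db_addr)

    def unswizzle(self, mem_addr):
        """Database address to write back in place of a swizzled pointer."""
        return self._by_mem[mem_addr]

    def remove(self, db_addr) -> None:
        """Drop the entry when the item leaves memory."""
        mem_addr = self._by_db.pop(db_addr)
        del self._by_mem[mem_addr]

table = TranslationTable()
table.add("x", 0x1000)
print(table.swizzle("x"), table.unswizzle(0x1000))
```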

Page 38

A block in memory is said to be pinned if it is referred to by a swizzled pointer from somewhere else. Before a pinned block can be written back to disk, we must unpin it by unswizzling every pointer that refers to it; otherwise the block must remain in memory.

[Figure: the translation table maps database address x to memory address y; the swizzled pointers hold y]

Page 39

Variable-Length Records

• Data items whose size varies

• Repeating fields

• Variable-format records

• Enormous fields

Page 40

Records With Variable-Length Fields

Layout: other header information | record length | pointer to address | birthdate | gender | name | address

The first variable-length field (name) needs no pointer, because it starts right after the fixed-length fields.
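
A minimal sketch of such a record. Several details are assumptions made for the example: a 2-byte record length, a 2-byte pointer to address, a 10-byte birthdate, a 1-byte gender, and no "other header information"; the function names are made up.

```python
import struct

HEADER = struct.Struct(">HH")    # record length, offset of the address field
FIXED = struct.Struct(">10s1s")  # birthdate, gender (fixed-length fields)
NAME_START = HEADER.size + FIXED.size  # name always begins here, so no pointer

def pack_star(name: str, address: str, gender: str, birthdate: str) -> bytes:
    name_b = name.encode("ascii")
    addr_b = address.encode("ascii")
    addr_start = NAME_START + len(name_b)
    total = addr_start + len(addr_b)
    return (HEADER.pack(total, addr_start)
            + FIXED.pack(birthdate.encode("ascii"), gender.encode("ascii"))
            + name_b + addr_b)

def unpack_star(record: bytes):
    total, addr_start = HEADER.unpack_from(record, 0)
    birthdate, gender = FIXED.unpack_from(record, HEADER.size)
    name = record[NAME_START:addr_start]   # ends where address begins
    address = record[addr_start:total]     # ends at the record length
    return (name.decode("ascii"), address.decode("ascii"),
            gender.decode("ascii"), birthdate.decode("ascii"))

rec = pack_star("Clint Eastwood", "Carmel, CA", "M", "1930-05-31")
print(len(rec), unpack_star(rec))
```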

Page 41

Records With Repeating Fields

A record with a repeating group of references to movies:

Layout: other header information | record length | pointer to address | pointer to movie pointers | name | address | pointers to movies

Page 42

Storing variable-length fields separately from the record

Record: record header information | pointer to name | length of name | pointer to address | length of address | pointer to movie references | number of references

Additional space: holds the name and address values

Page 43

Variable-Format Records

A record with tagged fields:

N | S | 14 | Clint Eastwood | R | S | 16 | Hog’s Breath Inn | …

N: code for name

S: code for string type

14: length

R: code for restaurant owned

S: code for string type

16: length
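
A minimal sketch of tagged fields. It handles only the string type and assumes one byte each for the field code, the type code and the length; the function names are made up for this example.

```python
def pack_tagged(fields) -> bytes:
    """fields: list of (field_code, type_code, value) with 1-character codes."""
    out = bytearray()
    for field_code, type_code, value in fields:
        raw = value.encode("ascii")
        if len(raw) > 255:
            raise ValueError("value too long for a 1-byte length")
        out += field_code.encode("ascii") + type_code.encode("ascii")
        out.append(len(raw))
        out += raw
    return bytes(out)

def unpack_tagged(data: bytes):
    """Walk the record tag by tag; no schema is needed to find the fields."""
    fields = []
    pos = 0
    while pos < len(data):
        field_code = chr(data[pos])
        type_code = chr(data[pos + 1])
        length = data[pos + 2]
        value = data[pos + 3:pos + 3 + length].decode("ascii")
        fields.append((field_code, type_code, value))
        pos += 3 + length
    return fields

star = [("N", "S", "Clint Eastwood"), ("R", "S", "Hog's Breath Inn")]
record = pack_tagged(star)
print(record[2], record[19])   # the two length bytes: 14 16
print(unpack_tagged(record))
```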

Page 44

Spanned Records

Storing spanned records across blocks:

block 1: block header | record 1 | record 2-a

block 2: block header | record 2-b | record 3

(each record or record fragment is preceded by a record header)
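
A minimal sketch of splitting a record into fragments. The sizes count only bytes available for record data, the fragment header is reduced to two flags, and the pointers that would link fragments together are omitted; all names are made up for this example.

```python
def span_record(record: bytes, free_in_block: int, block_size: int):
    """Split a record into fragments: the first fills the space left in the
    current block, the remaining ones fill further blocks."""
    if free_in_block <= 0 or block_size <= 0:
        raise ValueError("need positive space to place a fragment")
    pieces = []
    pos, room = 0, free_in_block
    while pos < len(record):
        pieces.append(record[pos:pos + room])
        pos += room
        room = block_size
    # The fragment header says whether this is the first or last fragment.
    return [
        {"first": i == 0, "last": i == len(pieces) - 1, "data": piece}
        for i, piece in enumerate(pieces)
    ]

for fragment in span_record(b"abcdefghij", free_in_block=4, block_size=5):
    print(fragment)
```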

Page 45

BLOBS

• Storage of BLOBS

Stored on a sequence of blocks; Striped across several disks.

• Retrieval of BLOBS

Several blocks at a time; Suitable index structure.

Page 46

Record Modification

• Insertion

• Deletion

• Update

Page 47

Insert

Easy case: records not in sequence

Insert the new record at the end of the file or in a deleted slot

If records are variable size, not as easy...

Page 48

Insert

Hard case: records in sequence

If free space is “close by”, not too bad...

Or use the overflow idea...

Page 49

Deletion

[Figure: a block containing record Rx, which is to be deleted]

Page 50

Options:

(a) Immediately reclaim space

(b) Mark deleted

– May need chain of deleted records (for re-use)

– Need a way to mark:

• special characters

• delete field

• in map

Page 51

As usual, many tradeoffs...

• How expensive is it to move valid records to free space for immediate reclaim?

• How much space is wasted?

– e.g., deleted records, delete fields, free space chains, ...

Page 52

Concern with deletions

Dangling pointers

[Figure: a pointer to R1, which has been deleted; what does it refer to now?]

Page 53

Solution: Tombstones

E.g., Leave “MARK” in old location

[Figure: a block; the space holding the mark is never re-used, while the rest of the deleted record’s space can be re-used]

Page 54

Solution: Tombstones

E.g., Leave “MARK” in map

map: ID | LOC

The entry for ID 7788 holds the mark.

Never reuse ID 7788 nor its space in the map...
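
A minimal sketch of a tombstone kept in the map. The class and method names are made up, and record IDs are simply handed out in increasing order so that none is ever reused.

```python
TOMBSTONE = object()  # the "MARK" left in the map

class RecordMap:
    def __init__(self, first_id: int = 1):
        self._loc = {}
        self._next_id = first_id

    def insert(self, location) -> int:
        rid = self._next_id
        self._next_id += 1          # IDs only grow, so none is reused
        self._loc[rid] = location
        return rid

    def delete(self, rid: int) -> None:
        # Keep the map entry, but replace the location with the mark.
        self._loc[rid] = TOMBSTONE

    def lookup(self, rid: int):
        """Location of the record, or None if it has been deleted."""
        location = self._loc[rid]
        return None if location is TOMBSTONE else location

records = RecordMap(first_id=7788)
rid = records.insert(("block 12", 40))
records.delete(rid)
print(rid, records.lookup(rid))    # 7788 None
print(records.insert(("block 12", 40)))  # 7789: same space, new ID
```

A stale pointer that still holds ID 7788 finds the mark rather than whichever record now occupies the old space.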

Page 55

Update

• Fixed-length record: no effect on the storage system

• Variable-length record: the same problems as insertion and deletion, except that no tombstone is needed

Page 56

Interesting problems:

• How much free space to leave in each block, track, cylinder?

• How often do I reorganize file + overflow?

Page 57

Comparison

• There are 10,000,000 ways to organize my data on disk…

Which is right for me?

Page 58

Issues:

Flexibility Space Utilization

Complexity Performance

Page 59

To evaluate a given strategy, compute the following parameters:

-> space used for expected data

-> expected time to

- fetch record given key

- fetch record with next key

- insert record

- append record

- delete record

- update record

- read all file

- reorganize file

Page 60

Example

How would you design the Megatron 3000 storage system? (for a relational DB, low end)

– Variable length records?

– Spanned?

– What data types?

– Fixed format?

– Record IDs?

– Sequencing?

– How to handle deletions?

Page 61

Summary

• How to lay out data on disk

Data Items

Records

Blocks

Files

Memory

DBMS

Page 62

Next

How to find a record quickly, given a key

Page 63

Exercises of Chapter 2, 3

• EX 2.2.1

• EX 2.2.2

• EX 2.3.1

• EX 2.6.7

• EX 3.2.2

• EX 3.3.4