Chapter 3
Representing Data Elements
1. How to lay out data on disk
2. How to move it to memory
Attributes
Records
Blocks
Files
Memory
Data Elements
Representing Relational Database Elements
CREATE TABLE MovieStar(
name CHAR(30) PRIMARY KEY,
salary INTEGER,
address VARCHAR(255),
gender CHAR(1),
birthdate DATE
);
Representing Objects
interface Star {
attribute string name;
attribute Struct Addr {string street, string city} address;
relationship Set<Movie> starredIn
inverse Movie::stars;
};
Some Differences
• Objects can have methods.
• Objects have an object identifier.
• Objects can have relationships to other objects.
So we need a way to represent addresses (pointers), because objects have OIDs and may have relationships to other objects. We also need the ability to represent arbitrarily long lists of objects, such as the arbitrarily long list of movies for a given star.
What are the data attributes we want to store?
• a salary
• a name
• a date
• a picture
What we have available: bytes (8 bits each)
To represent:
• Integer: 2 bytes (short), 4 bytes (long)
e.g., 35 as a 2-byte integer is 00000000 00100011
• Float: 4 bytes, Double Float: 8 bytes
• Characters
Various coding schemes have been suggested; the most popular is ASCII.
Example (7-bit ASCII codes):
A: 1000001
a: 1100001
5: 0110101
LF: 0001010
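The byte-level encodings above can be checked with a short sketch. It uses Python's standard struct module; the big-endian byte order and the helper name bits are assumptions made for illustration, not something the slides specify.

```python
import struct

def bits(data: bytes) -> str:
    """Render bytes as space-separated groups of 8 bits (helper for display)."""
    return " ".join(f"{b:08b}" for b in data)

# 35 as a 2-byte (short) and a 4-byte (long) integer.
# Assumption: big-endian byte order, chosen so the bits read left to right.
short_bytes = struct.pack(">h", 35)
long_bytes = struct.pack(">i", 35)

# A 4-byte float and an 8-byte double float.
float_bytes = struct.pack(">f", 1.5)
double_bytes = struct.pack(">d", 1.5)

# 7-bit ASCII codes for the characters in the example ("\n" is LF).
ascii_codes = {ch: f"{ord(ch):07b}" for ch in ["A", "a", "5", "\n"]}

print(bits(short_bytes))
print(len(long_bytes), len(float_bytes), len(double_bytes))
print(ascii_codes)
```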
• Fixed-Length Character Strings
e.g., the string “cat” stored as CHAR(5):
c a t ⊥ ⊥
Here ⊥ is the “pad” character, whose 8-bit code is not one of the legal characters for SQL strings.
• Variable-Length Character Strings
– Null terminated, e.g., c a t followed by a null character
– Length given, e.g., 3 c a t (the length comes first)
With either scheme, the SQL type VARCHAR(n) can actually be represented as a field of fixed length, although its value has a length that varies: n+1 bytes are dedicated to the value of the string regardless of how long it is.
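The string representations can be sketched as follows. This is an illustration under stated assumptions: the pad character is modeled as the NUL byte, strings are ASCII, the length prefix is a single byte (so n is at most 255), and all function names are hypothetical.

```python
PAD = b"\x00"  # assumption: NUL stands in for the pad character

def encode_char(value: str, n: int) -> bytes:
    """CHAR(n): exactly n bytes, padded on the right."""
    raw = value.encode("ascii")
    if len(raw) > n:
        raise ValueError(f"value too long for CHAR({n})")
    return raw + PAD * (n - len(raw))

def encode_varchar_null(value: str, n: int) -> bytes:
    """VARCHAR(n), null terminated, stored in a fixed n+1 bytes."""
    raw = value.encode("ascii")
    if len(raw) > n:
        raise ValueError(f"value too long for VARCHAR({n})")
    return raw + b"\x00" + PAD * (n - len(raw))

def encode_varchar_len(value: str, n: int) -> bytes:
    """VARCHAR(n), one length byte first, stored in a fixed n+1 bytes."""
    raw = value.encode("ascii")
    if n > 255 or len(raw) > n:
        raise ValueError(f"value does not fit VARCHAR({n}) with a 1-byte length")
    return bytes([len(raw)]) + raw + PAD * (n - len(raw))

def decode_varchar_null(field: bytes) -> str:
    """Read up to (not including) the first null byte."""
    return field.split(b"\x00", 1)[0].decode("ascii")

def decode_varchar_len(field: bytes) -> str:
    """Read the length byte, then that many bytes of content."""
    return field[1:1 + field[0]].decode("ascii")

print(encode_char("cat", 5))
print(encode_varchar_null("cat", 10))
print(encode_varchar_len("cat", 10))
```

Both VARCHAR encodings occupy n+1 bytes whatever the value, which is what makes the field fixed length from the record's point of view.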
• Dates and Times (SQL2 standard)
Date: YYYY-MM-DD, e.g., 1958-05-14
Time: HH:MM:SS, e.g., 20:19:02
Time including fractions of a second: e.g., ‘20:19:02.25’
• Bag of bits
Length | Bits
A sequence of bits is packed eight to a byte.
• Boolean
e.g., TRUE: 1111 1111, FALSE: 0000 0000
or TRUE: 1000 0000, FALSE: 0000 0000
• Enumerated Types
e.g., RED 1, BLUE 2, GREEN 3, YELLOW 4, …
Can we use less than 1 byte per code? Yes, but only when there are few values: one byte already covers up to 256 values, and, for example, 4 bits cover up to 16.
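As a rough sketch of the arithmetic behind sub-byte codes: the function below gives the number of bits needed for a given number of distinct values, assuming codes are assigned from 0 upward, and two 4-bit codes are packed into one byte. The names bits_needed, pack_pair and unpack_pair are hypothetical.

```python
def bits_needed(num_values: int) -> int:
    """Smallest number of bits giving each of num_values a distinct code.

    Assumption: codes run from 0 to num_values - 1.
    """
    if num_values < 1:
        raise ValueError("need at least one value")
    return max(1, (num_values - 1).bit_length())

# Codes from the slide; they start at 1, so they fit in 4 bits with room to spare.
COLORS = {"RED": 1, "BLUE": 2, "GREEN": 3, "YELLOW": 4}

def pack_pair(first: int, second: int) -> int:
    """Pack two 4-bit codes into a single byte."""
    if not (0 <= first < 16 and 0 <= second < 16):
        raise ValueError("codes must fit in 4 bits")
    return (first << 4) | second

def unpack_pair(byte: int) -> tuple:
    """Recover the two 4-bit codes from one byte."""
    return byte >> 4, byte & 0x0F

packed = pack_pair(COLORS["RED"], COLORS["YELLOW"])
print(bits_needed(256), bits_needed(16), unpack_pair(packed))
```

The saving comes at the price of shifting and masking on every access, which is one reason a full byte per code is the common choice.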
Representing Records
• Fixed-Length Records
• Variable-Length Records
• Record Addresses
• Record Modification
Record: collection of related data items (called FIELDS)
E.g.: Employee record:
name field,
salary field,
date-of-hire field, ...
Types of records:
• Main choices:
– FIXED vs VARIABLE FORMAT
– FIXED vs VARIABLE LENGTH
Fixed-Length Records
A MovieStar record (offsets in bytes): name at 0, address at 30, gender at 286, birthdate at 287; total length 297.
The layout of MovieStar tuples when fields are required to start at a multiple of 4 bytes: name at 0, address at 32, gender at 288, birthdate at 292; total length 304.
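The two layouts can be reproduced with a small offset calculation. This is a sketch: the field sizes (30, 256, 1 and 10 bytes) are read off the offsets in the figure, and the function name layout is hypothetical.

```python
def layout(field_sizes, align=1):
    """Return (offsets, total_length) for fields laid out in order.

    Each field starts at the next multiple of `align`, and the total
    length is rounded up to a multiple of `align` as well.
    """
    offsets = []
    pos = 0
    for size in field_sizes:
        pos = -(-pos // align) * align  # round pos up to a multiple of align
        offsets.append(pos)
        pos += size
    total = -(-pos // align) * align
    return offsets, total

# Field sizes inferred from the figure: name, address, gender, birthdate.
MOVIESTAR = [("name", 30), ("address", 256), ("gender", 1), ("birthdate", 10)]
sizes = [size for _, size in MOVIESTAR]

packed_offsets, packed_total = layout(sizes)
aligned_offsets, aligned_total = layout(sizes, align=4)
print(packed_offsets, packed_total)
print(aligned_offsets, aligned_total)
```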
A SCHEMA (not the record) contains the following information
- # fields
- type of each field
- order in record
- meaning of each field
Fixed format
Example: fixed format and length
Employee record schema:
(1) E#, 2-byte integer
(2) E.name, 10 characters
(3) Dept, 2-byte code
Records:
55 s m i t h 02
83 j o n e s 01
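A minimal sketch of the Employee record as a 14-byte fixed-format record, using Python's struct module. The big-endian order, the NUL padding of the name and the function names are assumptions for illustration.

```python
import struct

# Schema: E# (2-byte integer), E.name (10 characters), Dept (2-byte code).
# ">" means big-endian with no alignment padding, so the record is 14 bytes.
EMPLOYEE = struct.Struct(">H10sH")

def pack_employee(eno: int, name: str, dept: int) -> bytes:
    """Build one fixed-length record; struct pads the name with NUL bytes."""
    raw = name.encode("ascii")
    if len(raw) > 10:
        raise ValueError("name longer than 10 characters")
    return EMPLOYEE.pack(eno, raw, dept)

def unpack_employee(record: bytes):
    """Decode a record using the schema, stripping the name padding."""
    eno, raw_name, dept = EMPLOYEE.unpack(record)
    return eno, raw_name.rstrip(b"\x00").decode("ascii"), dept

smith = pack_employee(55, "smith", 2)
jones = pack_employee(83, "jones", 1)
print(EMPLOYEE.size, unpack_employee(smith), unpack_employee(jones))
```

Because every record has the same format, the schema is stored once and the records themselves carry no field names or types.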
Adding header information to records representing tuples of the MovieStar relation (offsets in bytes): header at 0 (pointer to schema, length, timestamp), name at 12, address at 44, gender at 300, birthdate at 304; total length 316.
The schema pointed to from the header gives:
• The attributes of the relation
• Their types
• The order in which attributes appear in the tuple
• Constraints on the attributes and the relation itself
Packing Fixed-Length Records into Blocks
A typical block holding records:
Header | Record 1 | Record 2 | … | Record n
The block header may hold:
• Links to one or more other blocks that are part of a network of blocks.
• Information about the role played by this block in such a network.
• Information about which relation the tuples of this block belong to.
• A “directory” giving the offset of each record in the block.
• A “block ID”.
• Timestamp(s) indicating the time of the block’s last modification and/or access.
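How many fixed-length records fit in a block follows from simple integer division. The sketch below uses the 316-byte MovieStar record from the earlier figure; the 4096-byte block size and 12-byte block header are assumed values chosen only for illustration.

```python
import math

def records_per_block(block_size: int, header_size: int, record_size: int) -> int:
    """Whole records that fit after the block header (no spanning)."""
    return (block_size - header_size) // record_size

def blocks_needed(num_records: int, per_block: int) -> int:
    """Blocks required to hold num_records at per_block records each."""
    return math.ceil(num_records / per_block)

# Assumed sizes: 4096-byte block, 12-byte block header, 316-byte records.
per_block = records_per_block(4096, 12, 316)
wasted = 4096 - 12 - per_block * 316
print(per_block, wasted, blocks_needed(1000, per_block))
```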
Block and Record Addresses
Client Server
Application Address Space Database Address Space
Logical Addresses
Physical Address
Memory Addresses
• How does one refer to records (e.g., to record Rx)?
Many options, from purely physical to fully indirect.
Purely Physical
E.g., record address or ID = (Device ID, Cylinder #, Track #, Block #, Offset in block); the components up to Block # form the Block ID.
Physical Addresses: (1) Host Name, (2) Device ID, (3) Cylinder #, (4) Track #, (5) Block #, (6) Offset
Logical Addresses: byte strings of some fixed length
A map table with columns “logical address” and “physical address” translates logical addresses to physical addresses.
Fully Indirect
E.g., the record ID is an arbitrary bit string.
The map holds one entry per record: Rec ID r | physical address a.
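A minimal in-memory sketch of such a map table. The class and method names are hypothetical, and a real system would keep the map on disk; the point is only that moving a record changes one map entry while every reference that holds the record ID stays valid.

```python
from typing import Dict, NamedTuple

class PhysicalAddress(NamedTuple):
    """The six components of a physical address listed earlier."""
    host: str
    device: int
    cylinder: int
    track: int
    block: int
    offset: int

class MapTable:
    """Translates record IDs (arbitrary byte strings) to physical addresses."""

    def __init__(self) -> None:
        self._map: Dict[bytes, PhysicalAddress] = {}

    def insert(self, rec_id: bytes, addr: PhysicalAddress) -> None:
        self._map[rec_id] = addr

    def lookup(self, rec_id: bytes) -> PhysicalAddress:
        """One extra lookup per access: the cost of indirection."""
        return self._map[rec_id]

    def move(self, rec_id: bytes, new_addr: PhysicalAddress) -> None:
        """Relocate a record by updating only its map entry."""
        if rec_id not in self._map:
            raise KeyError(rec_id)
        self._map[rec_id] = new_addr

table = MapTable()
old_addr = PhysicalAddress("hostA", 1, 10, 2, 5, 128)
table.insert(b"\x00\x2a", old_addr)
new_addr = old_addr._replace(block=9, offset=0)
table.move(b"\x00\x2a", new_addr)
print(table.lookup(b"\x00\x2a"))
```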
Tradeoff
Flexibility to move records (for deletions, insertions) vs. cost of indirection
Many options in between physical and indirect …
Logical and Structured Addresses
• A physical address of the block + the key value for the record being referred to.
• A physical address of the block + the offset of the entry in the block’s offset table for the record being referred to.
Ex #1: Indirection in block
A block: header, records R1, R2, R3, R4, and free space
Block header: data at the beginning of the block that describes it
May contain:
- File ID (or RELATION or DB ID)
- This block ID
- Record directory
- Pointer to free space
- Type of block (e.g., contains records of type 4; is overflow, …)
- Pointer to other blocks “like it”
- Timestamp ...
A block with a table of offsets telling us the position of each record within the block:
header | offset table | unused | record4 | record3 | record2 | record1
The offset table allows:
1. Moving the record around within the block by changing the record’s entry in the offset table.
2. Moving the record to another block by holding a “forwarding address” in its offset-table entry.
3. Deleting the record by leaving a tombstone in its offset-table entry.
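The three operations on the offset table can be sketched with a toy block. This is a simplification under stated assumptions: the offset table lives in a Python list rather than inside the block's bytes, space is never compacted, the caller picks a non-overlapping target when moving, and all names are hypothetical.

```python
class Block:
    """Toy block: records are placed from the end, slots index the offset table."""

    def __init__(self, size: int) -> None:
        self.data = bytearray(size)
        self.offset_table = []  # entries: ("record", start, length), ("forward", addr), ("tombstone",)
        self.free_end = size    # records grow from the end toward the front

    def insert(self, record: bytes) -> int:
        start = self.free_end - len(record)
        if start < 0:
            raise ValueError("no room in block")
        self.data[start:self.free_end] = record
        self.free_end = start
        self.offset_table.append(("record", start, len(record)))
        return len(self.offset_table) - 1  # the slot number is the stable handle

    def read(self, slot: int):
        entry = self.offset_table[slot]
        if entry[0] == "record":
            _, start, length = entry
            return bytes(self.data[start:start + length])
        return entry  # forwarding address or tombstone

    def move_within(self, slot: int, new_start: int) -> None:
        """1. Move the bytes, then change only the offset-table entry."""
        entry = self.offset_table[slot]
        if entry[0] != "record":
            raise ValueError("slot holds no record")
        _, start, length = entry
        if new_start < 0 or new_start + length > len(self.data):
            raise ValueError("target outside block")
        self.data[new_start:new_start + length] = bytes(self.data[start:start + length])
        self.offset_table[slot] = ("record", new_start, length)

    def forward(self, slot: int, address) -> None:
        """2. The record moved to another block: keep a forwarding address."""
        self.offset_table[slot] = ("forward", address)

    def delete(self, slot: int) -> None:
        """3. Leave a tombstone so old references do not find a new record."""
        self.offset_table[slot] = ("tombstone",)

block = Block(64)
s0 = block.insert(b"alpha")
s1 = block.insert(b"beta")
block.move_within(s0, 0)
moved = block.read(s0)
block.forward(s1, ("block", 7, "slot", 2))
block.delete(s0)
print(moved, block.read(s1), block.read(s0))
```

References from outside the block name the slot, not the byte offset, which is why none of the three operations invalidates them.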
Database Address and Memory Address
Translation table columns: DBaddr (database address: logical or physical address) | mem-addr (memory address)
The translation table turns database addresses into their equivalents in memory.
Pointer Swizzling
[Figure: pointer swizzling. Blocks 1 and 2 on disk; Block 1 read into memory; pointers labeled swizzled and unswizzled.]
Strategies to Swizzle Pointers
• Automatic Swizzling
• Swizzling on Demand
• No Swizzling
• Programmer Control of Swizzling
When a block is moved from memory back to disk, any pointers within the block must be “unswizzled”.
SELECT memaddr
FROM translationtable
WHERE dbaddr = x; -- swizzling; create an index on dbaddr
SELECT dbaddr
FROM translationtable
WHERE memaddr = y; -- unswizzling; create an index on memaddr
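The two lookups can be mimicked in memory with a pair of dictionaries, one per direction, playing the role of the two indexes. This is a sketch only; the class name, method names and the sample addresses are hypothetical.

```python
class TranslationTable:
    """Maps database addresses to memory addresses and back."""

    def __init__(self) -> None:
        self._to_mem = {}  # plays the role of the index on dbaddr
        self._to_db = {}   # plays the role of the index on memaddr

    def add(self, dbaddr, memaddr) -> None:
        """Record that the item at dbaddr now lives at memaddr."""
        self._to_mem[dbaddr] = memaddr
        self._to_db[memaddr] = dbaddr

    def swizzle(self, dbaddr):
        """Memory address for dbaddr, or None if the item is not in memory."""
        return self._to_mem.get(dbaddr)

    def unswizzle(self, memaddr):
        """Database address to write back in place of a memory pointer."""
        return self._to_db[memaddr]

    def remove(self, dbaddr) -> None:
        """Drop the entry when the block goes back to disk."""
        memaddr = self._to_mem.pop(dbaddr)
        del self._to_db[memaddr]

tt = TranslationTable()
tt.add(("block", 1, 40), 0x7F00)   # hypothetical addresses
in_memory = tt.swizzle(("block", 1, 40))
on_disk = tt.swizzle(("block", 2, 8))
back = tt.unswizzle(0x7F00)
print(in_memory, on_disk, back)
```

A pointer whose target is not in the table simply stays unswizzled, which corresponds to the swizzling-on-demand strategy.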
Returning Blocks to Disk
A block in memory is said to be pinned if it is referred to by a swizzled pointer from somewhere else. Before a pinned block can be written back to disk, we must unpin it by unswizzling every pointer to it; otherwise the block must remain in memory.
[Figure: a translation table entry pairing database address x with memory address y; each swizzled pointer to the block holds y.]
Variable-Length Records
• Data items whose size varies
• Repeating fields
• Variable-format records
• Enormous fields
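Records With Variable-Length Fields
Record layout: header (other header information, record length, pointer to address), then the fixed-length fields (birthdate, gender), then the variable-length fields (name, address).
The first variable-length field (name) needs no pointer, because it always starts right after the fixed-length fields.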
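A sketch of this layout with Python's struct module. The assumptions are labeled: 2-byte record length and 2-byte pointer, a 10-character birthdate, a 1-character gender, ASCII text, and no "other header information"; the function names and the sample address are hypothetical.

```python
import struct

# Fixed part: record length, offset of address, birthdate, gender.
FIXED = struct.Struct(">HH10s1s")
NAME_OFFSET = FIXED.size  # name always starts here, so it needs no pointer

def pack_star(name: str, address: str, gender: str, birthdate: str) -> bytes:
    name_b = name.encode("ascii")
    addr_b = address.encode("ascii")
    addr_offset = NAME_OFFSET + len(name_b)  # the pointer "to address"
    total = addr_offset + len(addr_b)        # the record length
    fixed = FIXED.pack(total, addr_offset,
                       birthdate.encode("ascii"), gender.encode("ascii"))
    return fixed + name_b + addr_b

def unpack_star(record: bytes):
    total, addr_offset, birthdate, gender = FIXED.unpack_from(record, 0)
    name = record[NAME_OFFSET:addr_offset].decode("ascii")  # ends where address begins
    address = record[addr_offset:total].decode("ascii")     # ends at the record length
    return name, address, gender.decode("ascii"), birthdate.decode("ascii")

rec = pack_star("Clint Eastwood", "123 Maple St", "M", "1958-05-14")
print(len(rec), unpack_star(rec))
```

The length of each variable field is never stored: it falls out of the difference between neighboring offsets.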
Records With Repeating Fields
A record with a repeating group of references to movies:
header (other header information, record length, pointer to address, pointer to movie pointers) | name | address | pointers to movies
Storing variable-length fields separately from the record
Record: record header information | pointer to name, length of name | pointer to address, length of address | pointer to movie references, number of references
Additional space: name, address
Variable-Format Records
A record with tagged fields:
N S 14 Clint Eastwood R S 16 Hog’s Breath Inn …
N: code for name
S: code for string type
14: length
R: code for restaurant owned
S: code for string type
16: length
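The tagged-field record can be sketched as a sequence of (field code, type code, length, value) groups. The one-byte width of each tag component and the function names are assumptions made for illustration; only the string type S is handled.

```python
def encode_tagged(fields) -> bytes:
    """Encode (field_code, value) pairs as: code, type 'S', 1-byte length, value."""
    out = bytearray()
    for code, value in fields:
        raw = value.encode("ascii")
        if len(code) != 1 or len(raw) > 255:
            raise ValueError("code must be one character, value at most 255 bytes")
        out += code.encode("ascii") + b"S" + bytes([len(raw)]) + raw
    return bytes(out)

def decode_tagged(data: bytes):
    """Walk the record tag by tag; no schema is needed to find the fields."""
    fields = []
    pos = 0
    while pos < len(data):
        code = chr(data[pos])
        type_code = chr(data[pos + 1])
        length = data[pos + 2]
        if type_code != "S":
            raise ValueError(f"unsupported type code {type_code!r}")
        value = data[pos + 3:pos + 3 + length].decode("ascii")
        fields.append((code, value))
        pos += 3 + length
    return fields

record = encode_tagged([("N", "Clint Eastwood"), ("R", "Hog's Breath Inn")])
print(record[2], record[19], decode_tagged(record))
```

Fields that a record does not have are simply absent, which is what makes this format attractive for sparse records.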
Spanned Records
Storing spanned records across blocks:
block 1: block header | record 1 | record 2-a
block 2: block header | record 2-b | record 3
Each record or record fragment begins with a record header.
BLOBS
• Storage of BLOBS
Stored on a sequence of blocks; Striped across several disks.
• Retrieval of BLOBS
Several blocks at a time; Suitable index structure.
Record Modification
• Insertion
• Deletion
• Update
Insert
Easy case: records not in sequence
Insert new record at end of file or in a deleted slot
If records are variable size, not as easy...
Hard case: records in sequence
If free space is “close by”, not too bad...
Or use overflow idea...
Deletion
Options for deleting a record Rx:
(a) Immediately reclaim space
(b) Mark deleted
– May need chain of deleted records (for re-use)
– Need a way to mark:
• special characters
• delete field
• in map
As usual, many tradeoffs...
• How expensive is it to move valid records to free space for immediate reclaim?
• How much space is wasted?
– e.g., deleted records, delete fields, free space chains, ...
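Concern with deletions: dangling pointers
A pointer to a deleted record R1 no longer refers to anything valid.
Solution: Tombstones
E.g., leave a “MARK” in the old location (physical IDs). In the block, the space holding the mark is never re-used; the rest of the record’s space can be re-used.
E.g., leave a “MARK” in the map (logical IDs). For a map entry (ID, LOC) with ID 7788: never reuse ID 7788 nor its space in the map...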
Update
• Fixed-length record: the update has no effect on the storage system; the record is overwritten in place.
• Variable-length record: the same problems as insertion and deletion, except that no tombstone is needed for the old version.
Interesting problems:
• How much free space to leave in each block, track, cylinder?
• How often do I reorganize file + overflow?
• There are 10,000,000 ways to organize my data on disk…
Which is right for me?
Comparison
Issues:
Flexibility vs. Space Utilization
Complexity vs. Performance
To evaluate a given strategy, compute the following parameters:
-> space used for expected data
-> expected time to
- fetch record given key
- fetch record with next key
- insert record
- append record
- delete record
- update record
- read all file
- reorganize file
Example
How would you design the Megatron 3000 storage system? (for a relational DB, low end)
– Variable length records?
– Spanned?
– What data types?
– Fixed format?
– Record IDs?
– Sequencing?
– How to handle deletions?
Summary
• How to lay out data on disk: Data Items, Records, Blocks, Files, Memory, DBMS
Next
How to find a record quickly, given a key
Exercises of Chapter 2, 3
• EX 2.2.1
• EX 2.2.2
• EX 2.3.1
• EX 2.6.7
• EX 3.2.2
• EX 3.3.4