01 whirlwind tour
© Jim Gray, Andreas Reuter Transaction Processing - Concepts and Techniques WICS August 2 - 6, 1999
The Whirlwind Tour
Chapter 1a
Time   Aug. 2               Aug. 3              Aug. 4                   Aug. 5               Aug. 6
9:00   Intro & terminology  TP mons & ORBs      Logging & res. Mgr.      Files & Buffer Mgr.  Structured files
11:00  Reliability          Locking theory      Res. Mgr. & Trans. Mgr.  COM+                 Access paths
13:30  Fault tolerance      Locking techniques  CICS & TP & Internet     CORBA/EJB + TP       Groupware
15:30  Transaction models   Queueing            Advanced Trans. Mgr.     Replication          Performance & TPC
18:00  Reception            Workflow            Cyberbricks              Party                FREE
Transactions: Where It All Started
[Cuneiform] documents now number about half a million, three-quarters of them more or less directly related to the history of law - dealing, as they do, with contracts, acknowledgment of debts, receipts, inventories, and accounts, as well as containing records and minutes of judgments rendered in courts, business letters, administrative and diplomatic correspondence, laws, international treaties, and other official transactions. The total evidence enables the historian to reach back as far as the beginnings of writing, to the dawn of history. [...] Moreover, because of the inconvenience of writing in stone or clay, Mesopotamians wrote only when economic or political necessity demanded it.
(Encyclopaedia Britannica, 1974 edition)
From Transactions to Transaction Processing Systems - I
The Sumerian way of doing business involved two components:
Database. An abstract system state, represented as marks on clay tablets, was maintained. Today, we would call this the database.
Transactions. Scribes recorded state changes with new records (clay tablets) in the database. Today, we would call these state changes transactions.
From Transactions to Transaction Processing Systems - II
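[Figure: on the Reality side, a Change transforms the real state; on the Abstraction side, a Transaction transforms DB into DB'. A Query against the database returns an Answer.]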
The real state is represented by an abstraction, called the database, and the transformation of the real state is mirrored by the execution of a program, called a transaction, that transforms the database.
Transactions Are In ...
Communications:
Each time you make a phone call, there is a call setup transaction that allocates some resources to your conversation; the call teardown is a second transaction, freeing those resources. The call setup increasingly involves complex algorithms to find the callee (800 numbers could be anywhere in the world) and to decide who is to be billed (800 and 900 numbers have complex billing). The system must deal with features like call forwarding, call waiting, and voice mail. After the call teardown, billing may involve many phone companies.
Transactions Are In ...
Finance:
Each time you purchase gas using a credit card, the point-of-sale terminal connects to the credit card company's computer. In case that fails, it may alternatively try to debit the amount to your account by connecting to your bank.
This generalizes to all kinds of point-of-sale terminals such as cash registers, ATMs, etc.
When banks balance their accounts with each other (electronic fund transfer), they use transactions for reliability and recoverability.
Transactions Are In ...
Travel:
Making reservations for a trip requires many related bookings and ticket purchases from airlines, hotels, rental car companies, and so on.
From the perspective of the customer, the whole trip package is one purchase. From the perspective of the multiple systems involved, many transactions are executed: one per airline reservation (at least), one for each hotel reservation, one for each car rental, one for each ticket to be printed, one for setting up the bill, etc.
Along the way, each inquiry that may not have resulted in a reservation is a transaction, too.
Transactions Are In ...
Manufacturing:
Order entry, job and inventory planning and scheduling, accounting, and so on are classical application areas of transaction processing. Computer integrated manufacturing (CIM) is a key technique for improving industrial productivity and efficiency. Just-in-time inventory control, automated warehouses, and robotic assembly lines each require a reliable data storage system to represent the factory state.
Transactions Are In ...
Real-Time Systems:
This application area includes all kinds of physical machinery that needs to interact with the real world, either as a sensor or as an actuator. Traditionally, such systems were custom made for each individual plant, starting from the hardware. The usual reason was that 20 years ago off-the-shelf systems could not guarantee the real-time behavior that is critical in these applications. This has changed, and so has the feasibility of building entire systems from scratch. Standard software is now used to ensure that the application will be portable.
A Transaction Processing System
A transaction processing system (TP-system) provides tools to ease or automate application programming, execution, and administration of complex, distributed applications.
Transaction processing applications typically support a network of devices that submit queries and updates to the application.
Based on these inputs, the application maintains a database representing some real-world state.
Application responses and outputs typically drive real-world actuators and transducers that alter or control the state.
The applications, database, and network tend to evolve over several decades.
Increasingly, the systems are geographically distributed, heterogeneous (they involve equipment and software from many different vendors), continuously available (there is no scheduled downtime), and have stringent response time requirements.
ACID Properties: First Definition
Atomicity: A transaction’s changes to the state are atomic: either all happen or none happen. These changes include database changes, messages, and actions on transducers.
Consistency: A transaction is a correct transformation of the state. The actions taken as a group do not violate any of the integrity constraints associated with the state. This requires that the transaction be a correct program.
Isolation: Even though transactions execute concurrently, it appears to each transaction T, that others executed either before T or after T, but not both.
Durability: Once a transaction completes successfully (commits), its changes to the state survive failures.
Structure of a Transaction Program
The application program declares the start of a new transaction by invoking BEGIN_WORK().
All subsequent operations will be covered by the transaction. Eventually, the application program will call COMMIT_WORK(), if a new consistent state has been reached. This makes sure the new state becomes durable.
If the application program cannot complete properly (violation of consistency constraints), it will invoke ROLLBACK_WORK(), which appeals to the atomicity of the transaction, thus removing all effects the program might have had so far.
If for some reason the application fails to call either commit or rollback (there could be an endless loop, a crash, a forced process termination), the transaction system will automatically invoke ROLLBACK_WORK() for that transaction.
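The bracketing described on this slide can be sketched with Python's built-in sqlite3 module. This is only an illustration, not the interface of any particular TP system: the SQL statements BEGIN, COMMIT, and ROLLBACK stand in for BEGIN_WORK(), COMMIT_WORK(), and ROLLBACK_WORK(), and the account table, the transfer() function, and the "no negative balance" constraint are assumptions made for the example.

```python
import sqlite3


def transfer(conn, src, dst, amount):
    """Move `amount` from account `src` to `dst` as one transaction."""
    cur = conn.cursor()
    cur.execute("BEGIN")  # plays the role of BEGIN_WORK()
    try:
        cur.execute("UPDATE account SET balance = balance - ? WHERE id = ?", (amount, src))
        cur.execute("UPDATE account SET balance = balance + ? WHERE id = ?", (amount, dst))
        # Hypothetical consistency constraint: no account may go negative.
        (lowest,) = cur.execute("SELECT MIN(balance) FROM account").fetchone()
        if lowest < 0:
            raise ValueError("negative balance")
        cur.execute("COMMIT")  # COMMIT_WORK(): the new state becomes durable
        return True
    except Exception:
        cur.execute("ROLLBACK")  # ROLLBACK_WORK(): undo every change made so far
        return False


def balances(conn):
    """Return the current state as {account id: balance}."""
    return dict(conn.execute("SELECT id, balance FROM account ORDER BY id"))


# isolation_level=None leaves transaction control to the explicit statements.
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute("CREATE TABLE account (id INTEGER PRIMARY KEY, balance INTEGER)")
conn.executemany("INSERT INTO account VALUES (?, ?)", [(1, 100), (2, 50)])

ok = transfer(conn, 1, 2, 30)       # reaches a consistent state and commits
failed = transfer(conn, 1, 2, 500)  # would overdraw account 1, so it rolls back
print(ok, failed, balances(conn))
```

The second call changes both rows before the check fails, yet neither change is visible afterwards; that is the atomicity the rollback appeals to.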
The End User’s View of a Transaction Processing System
[Figure: Mailboxes and Mail, and Operations on Mail and Mailboxes. Mailboxes: Andreas, Jim, Bruce, Chris, Betty. Operations: Logon (Name, Password), Headers (From/Subject list, e.g. Jim "hi", Chris "it's raining", Betty "more bugs"), Read Message (from: Jim, subject: hi, <text>), Send Message (to: Jim, subject: dinner, <text, sound, image>), Delete Message, Cancel Message.]
The Administrator's/Operator’s View of a TP System
[Figure: the view of the Administrator & Operator. Labels: Data Base, Data Comm, Application; sites Hong Kong, Berlin, New York; Mail Gateway to Other Mail Systems; Repository.]
Performance Measures of Interactive Transactions
Performance / Transaction   Small/Simple   Medium        Complex
_______________________________________________________________
Instr./transaction          100k           1M            100M
Disk I/O / TA               1              10            1000
Local msgs. (B)             10 (5 KB)      100 (50 KB)   1000 (1 MB)
Remote msgs. (B)            2 (300 B)      2 (4 KB)      100 (1 MB)
Cost/TA/second              10k$/tps       100k$/tps     1M$/tps
Peak tps/site               1000           100           1
Client-Server Computing: The Classical Idea
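[Figure: Workstation Client and Host Server(s). Presentation runs in the workstation (Logon, Headers, Read, Send, Delete). Data communications carries Transactional Remote Procedure Calls to the TP Monitor on the host, which dispatches them to the services Logon, Headers, Read, Send, and Delete; the services use the Data Base.]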
Client-Server Computing: The CORBA Idea
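[Figure: a Client on the workstation (Presentation, Services, etc.) sends the Request "Delete" through an IDL Stub to the Object Request Broker, which delivers it through an IDL Skeleton to the Object Implementation: Jim's Mailbox.]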
Client-Server Computing: The WWW Idea
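[Figure: a WWW browser runs a Java applet plus Java Database Connection (JDBC) driver code; the HTTP server supplies the Java applet and the JDBC driver code. Three paths lead to the database server: a JDBC driver using a proprietary protocol; a JDBC-ODBC bridge with an ODBC driver using a proprietary protocol; and a JDBC network driver using a public protocol (e.g. TCP/IP).]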
Using Transactional Remote Procedure Calls (TRPCs)
[Figure: message flow over time among User (screen), Client, network, TP Monitor, Service (server), and Database, with a further network hop to another TP monitor and server.]
Terms We Have Introduced So Far
Resource manager: The system comes with an array of transactional resource managers that provide ACID operations on the objects they implement. Database systems, persistent programming languages, and queue managers are typical examples.
Durable state: Application state represented as durable data stored by the resource managers.
TRPC: Transactional remote procedure calls allow the application to invoke local and remote resource managers as though they were local. They also allow the application designer to decompose the application into client and server processes on different computers.
Transaction program: Inquiries and state transformations are written as programs in conventional or specialized programming languages. The programmer brackets the successful execution of the program with a Begin-Commit pair and brackets a failed execution with a Begin-Rollback pair.
Terms We Have Introduced So Far
Atomicity: At any point before the commit, the application or the system may abort the transaction, invoking rollback. If the transaction is aborted, all of its changes to durable objects will be undone (reversed), and it will be as though the transaction never ran.
Consistency: The work within a Begin-Commit pair must be a correct transformation.
Isolation: While the transaction is executing, the resource managers ensure that all objects the transaction reads are isolated from the updates of concurrent transactions.
Durability: Once the commit has been successfully executed, all the state transformations of that transaction are made durable and public.
The World According to the Resource Manager
[Figure: layers labeled Application, Application Servers, Resource Managers, and Transaction Manager.]
Where To Split Client/Server?
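[Figure: four layers (Presentation, Flow Control, Application Logic (= business objects), Data Access) with the client/server split drawn at different heights, ranging from thin client and fat server to fat client and thin server.]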
Client/Server Infrastructure
[Figure: three columns. Client: GUI, OOUI, System Mgmt., OS. Middleware: SQL, ORB, TRPC, Security, Transport, WWW, Files, etc. Server: Objects, Groupware, TP-Mon., DBMS, OS.]
Transactional Core Services
[Figure: the Application calls Begin_Work() and Commit_Work() and sends Work Requests to the Resource Manager, which carries the transid, performs its Normal Functions, and calls Join_Work. The Resource Manager sends Lock Requests to the Lock Manager and Log Records to the Log Manager. The Recovery Manager provides the Transaction Recovery Functions: Commit Phase 1? (Yes/No), Write Commit Log Record & Force Log, Commit Phase 2 (ack).]
The X/Open TP-Model
[Figure: the Application sends Requests to the RM (Resource Manager) and Begin, Commit, Abort to the TM (Transaction Manager). The RM sends Join to the TM; the TM sends Prepare, Commit, Abort to the RM.]
The X/Open Distributed Transaction Processing Model
[Figure: two sites. On the first, the Application sends Requests to its RM (Resource Manager), Begin, Commit, Abort to its TM (Transaction Manager), and Remote Requests through the CM (Communications Manager), which reports the Outgoing work. On the second, the CM reports the Incoming work and Starts the Server, which sends Requests to the local RM. Each TM sends Prepare, Commit, Abort to its RM.]
The OTS Model
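[Figure: a transaction originator, a recoverable server, and the Transaction service. The originator handles creation and termination; the TA-context is transmitted with each request; the recoverable server's invocation leads to commit coordination with the Transaction service.]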
Transaction Processing System Feature List
Application development features
Application generators; graphical programming interfaces; screen painters; compilers; CASE tools; test data generators; starter system with a complete set of administrative and operations functions, security, and accounting.
Repository features
Description of all components of the system, both hardware and software. Description of the dependencies among components (bill-of-material). Description of all changes to all components to keep track of different versions. The repository is a database. Its role in the system must be complete, extensible, active and allow for local autonomy.
TP-Monitor Features
Process management; server classes; transactional remote procedure calls; request-based authentication and authorization; support for applications and resource managers in implementing ACID operations on durable objects.
Transaction Processing System Feature List
Data communications features
Uniform I/O interfaces; device independence; virtual terminal; screen painter support; support for RPC and TRPC; support for context-oriented communication (peer-to-peer).
Database features
Data independence; data definition; data manipulation; data control; data display; database operations.
Operations features
Archiving; reorganization; diagnosis; recovery; disaster recovery; change control; security; system extension.
Education and testing features
Imbedded education; online documentation; training systems; national language features; test database generators; test drivers.
Data Communications Protocols
[Figure: Applications sit on a Standard Interface To All Networks, which covers SNA LU0, SNA LU6.2 PU2.1, OSI, X.25, TCP, IP, and Named Pipes.]
Add: transactions, RPC, naming, security, reliable messages, and a uniform interface.
Presentation Management
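[Figure: Presentation Management (PM) sits between the Application and the terminal. The Repository holds the Form Description (1 LOGON, 2 NAME PIC X(20), 2 PIN PIC 9(4)) and the Device Description. The Application logic is: READ TERMINAL, CHECK PIN, DISPLAY HELLO OR NO. The screen shows "OUR BANK" with NAME and PASSWORD fields.]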
SQL Data Definition
TABLE (= file)
TUPLE (= record)
COLUMN (= field)
DOMAIN (= type)
VIEW
[Figure: table employee (name, dept, loc) and view emp_view (dept, loc).]
DEFINE VIEW emp_view AS
SELECT dept, loc FROM employee WHERE loc = 7;
SQL Data Manipulation
[Figure: three operations on the table employee (name, dept, loc).]
PROJECT (column subset)
SELECT (row subset)
JOIN (matching values): employee (name, dept, loc) joined with a second table (dept, mgr)
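The three operations can be tried out with a small sketch in Python's sqlite3 module. The employee columns follow the slide; the table name department for the (dept, mgr) table and all rows are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE employee (name TEXT, dept TEXT, loc INTEGER);
CREATE TABLE department (dept TEXT, mgr TEXT);  -- hypothetical table name
INSERT INTO employee VALUES ('Jim', 'db', 7), ('Betty', 'os', 3), ('Chris', 'db', 7);
INSERT INTO department VALUES ('db', 'Andreas'), ('os', 'Bruce');
""")

# PROJECT: keep a subset of the columns.
project = conn.execute("SELECT name, dept FROM employee ORDER BY name").fetchall()

# SELECT: keep a subset of the rows.
select = conn.execute("SELECT * FROM employee WHERE loc = 7 ORDER BY name").fetchall()

# JOIN: combine rows of two tables whose dept values match.
join = conn.execute(
    "SELECT e.name, d.mgr FROM employee e JOIN department d ON e.dept = d.dept "
    "ORDER BY e.name"
).fetchall()

print(project)
print(select)
print(join)
```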
Summary of Chapter 1
A transaction processing system is a large web of application generators, system design and operation tools, and the more mundane language, database, network, and operations software.
The repository and the applications that maintain it are the mechanisms needed to manage the TP system. The repository is a transaction processing application.
It represents the system configuration as a database and supplies change control by transactions that manipulate the configuration and the repository.
The transaction concept, like contract law, is intended to resolve the situation when exceptions arise. The first order of business in designing a system is, therefore, to have a clear model of system failure modes. What breaks? How often do things break?
Chapter 1b
Basic Terminology
A Word About Words (Chapter 2)
Humpty Dumpty: “When I use a word, it means exactly what I chose it to mean; nothing more nor less.”
Alice: “The question is, whether you can make words mean so many different things.”
Humpty Dumpty: “The question is, which is to be master, that’s all.”
Lewis Carroll
Basic Computer Terms
To get any confusion that might be caused by the many synonyms in our field out of the way, let us adopt the following conventions for the rest of this class:
domain = data type = ...
field = column = attribute = ...
record = tuple = object = entity = ...
block = page = frame = slot = ...
file = data set = table = ...
process = task = thread = actor = ...
function = request = method = ...
All the other terms and definitions we need will be briefly introduced and explained during the session.
Basic Hardware Architecture I
In Bell and Newell’s classic taxonomy, hardware consists of three types of modules: Processors, memory, and communications (switches or wires).
Processors execute instructions from a program, read and write memory, and send data via communication lines.
Computers are generally classified as supercomputers, mainframes, minicomputers, workstations, and personal computers. However, these distinctions are becoming fuzzy with current shifts in technology.
Basic Hardware Architecture II
Today’s workstation has the power of yesterday’s mainframe. Similarly, today’s WAN (wide area network) has the communications bandwidth of yesterday’s LAN (local area network). In addition, electronic memories are growing in size to include much of the data formerly stored on magnetic disk.
These technology trends have deep implications for transaction processing.
Basic Hardware Architecture III
Distributed processing: Processing is moving closer to the producers and consumers of the data (workstations, intelligent sensors, robots, and so on).
Client-server: These computers interact with each other via request-reply protocols. One machine, called the client, makes requests to another, called the server. Of course, the server may in turn be a client to other machines.
Clusters: Powerful servers consist of clusters of many processors and memories, cooperating in parallel to perform common tasks.
Basic Hardware Architecture IV
[Figure: several processors and memories attached to The Network.]
Memories - The Economic Perspective I
The processor executes instructions from virtual memory, and it reads and alters bytes from the virtual memory. The mapping between virtual memory and real memory includes electronic memory, which is close to the processor, volatile, fast, and expensive, and magnetic memory, which is "far away" from the processor, non-volatile, slow, and cheap. The mapping process is handled by the operating system with some hardware assistance.
Memory performance is measured by its access time: Given an address, the memory presents the data at some later time. The delay is called the memory access time. Access time is a combination of latency (the time to deliver the first byte), and transfer time (the time to move the data). Transfer time, in turn, is determined by the transfer size and the transfer rate. This produces the following overall equation:
memory access time = latency + (transfer size / transfer rate)
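A short Python sketch of the access time equation shows how the balance between latency and transfer time shifts with transfer size. The device parameters (10 ms latency, 10 MB/s transfer rate) are assumptions chosen for illustration, not measurements of a particular device.

```python
def memory_access_time(latency_s, transfer_size_bytes, transfer_rate_bytes_per_s):
    """memory access time = latency + (transfer size / transfer rate)."""
    return latency_s + transfer_size_bytes / transfer_rate_bytes_per_s


# Assumed device: 10 ms latency, 10 MB/s transfer rate.
LATENCY_S = 0.010
RATE_BYTES_PER_S = 10_000_000

small = memory_access_time(LATENCY_S, 1_000, RATE_BYTES_PER_S)      # 1 KB transfer
large = memory_access_time(LATENCY_S, 1_000_000, RATE_BYTES_PER_S)  # 1 MB transfer

for label, total in (("1 KB", small), ("1 MB", large)):
    share = LATENCY_S / total
    print(f"{label}: {total * 1000:.1f} ms total, latency is {share:.0%} of it")
```

For the small transfer nearly all of the time is latency; for the large one the transfer term dominates.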
Memories - The Economic Perspective II
Memory price-performance is measured in one of two ways:
Cost/byte. The cost of storing a byte of data in that medium.
Cost/access. The cost of reading a block of data from that medium. This is computed by dividing the device cost by the number of accesses per second that the device can perform. The actual units are cost/access/second, but the time unit is implicit in the metric’s name.
These two cost measures reflect the two different views of a memory’s purpose: it stores data, and it receives and retrieves data.
Memories - The Economic Perspective III
[Figure: Size vs Speed. Typical large system capacity (10**3 to 10**15 bytes, i.e. kilobyte to petabyte) plotted against access time (10**-9 to 10**3 seconds) for cache, electronic main, electronic secondary (RAM disc), magnetic and optical discs, online tape, nearline tape and optical disc, and offline tape.]
Memories - The Economic Perspective IV
[Figure: Price vs Speed. Price ($/MB, 10**-4 to 10**6) plotted against access time (10**-9 to 10**3 seconds) for electronic main, electronic secondary, magnetic and optical discs, online tape, nearline tape and optical disc, and offline tape.]
Magnetic Memory
There are two types of magnetic storage media: disk and tape. Disks rotate, passing the data in the cylinder by the electronic read-write heads every few milliseconds. This gives low access latency. The disk arm can move among cylinders in tens of milliseconds. Tapes have approximately the same storage density and transfer rate, but they must move long distances if random access is desired. Consequently, tapes have large random access latencies—on the order of seconds.
Disk Access Time = Seek_Time + Rotational_Latency + (Transfer_Size / Transfer_Rate)
Magnetic Memory
Compare the times required for two access patterns to 1MB stored in 1000 blocks on disk:
Sequential access: Read or write sectors [x, x + 1, ..., x + 999] in ascending order. This requires one seek (10 ms) and half a rotation (5 ms) before the data starts to transfer at 10 MBps; moving the megabyte then takes 100 ms (ignoring one-cylinder seeks).
The total access time is 115 ms.
Random access: Read the 1000 sectors [x, ..., x + 999] in random order. In this case, each read requires a seek (10 ms), half a rotation (5 ms), and then the 1 KB transfer (.1 ms). Since there are 1000 of these events, the total access time is 15.1 seconds.
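The two totals can be reproduced with a few lines of Python. The sketch uses the slide's figures (10 ms seek, 5 ms half rotation, 10 MBps) and assumes decimal units, so that 1 MB is 10**6 bytes and each of the 1000 sectors holds 1000 bytes.

```python
SEEK_MS = 10.0
HALF_ROTATION_MS = 5.0
RATE_BYTES_PER_MS = 10_000  # 10 MBps, assuming 1 MB = 10**6 bytes


def disk_access_ms(transfer_bytes):
    """Seek_Time + Rotational_Latency + Transfer_Size / Transfer_Rate."""
    return SEEK_MS + HALF_ROTATION_MS + transfer_bytes / RATE_BYTES_PER_MS


# Sequential: one positioning delay, then the whole megabyte in one transfer.
sequential_ms = disk_access_ms(1_000_000)

# Random: every 1000-byte sector pays its own seek and rotational latency.
random_ms = 1000 * disk_access_ms(1_000)

print(f"sequential: {sequential_ms:.0f} ms")
print(f"random:     {random_ms / 1000:.1f} s")
print(f"ratio:      {random_ms / sequential_ms:.0f}x")
```

Random access is slower by a factor of roughly 130 here, almost all of it positioning time.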
Memory Hierarchies
[Figure: the memory hierarchy as a pyramid. Levels from top to bottom: processor registers; cache; main memory (electronic storage); online external storage (block addressed, non-volatile, electronic or magnetic); near line (archive) storage (tape or disc robots); off line. Side labels: memory capacity, current data.]
Memory Hierarchies
The hierarchy uses small, fast, expensive cache memories to cache some data present in larger, slower, cheaper memories.
If hit ratios are good, the overall memory speed approximates the speed of the cache.
At any level of the memory hierarchy, the hit ratio is defined as:
hit ratio = references satisfied by cache / all references to cache
Suppose a cache memory with access time C has hit rate H, and suppose that on a miss the secondary memory access time is S. Further, suppose that C = .01 • S. The effective access time of the cache will be as follows:
Effective memory access time = H • C + (1 - H) • S
= H • (.01 • S) + (1 - H) • S
= (1 - .99 • H) • S
≈ (1 - H) • S
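A small Python sketch evaluates the effective access time for a few hit ratios, with S normalized to 1 and C = .01 • S as on the slide. The particular hit ratios are arbitrary choices for illustration.

```python
def effective_access_time(hit_ratio, cache_time, secondary_time):
    """H * C + (1 - H) * S."""
    return hit_ratio * cache_time + (1 - hit_ratio) * secondary_time


S = 1.0       # secondary memory access time, normalized
C = 0.01 * S  # cache access time, as assumed on the slide

for h in (0.5, 0.9, 0.99):
    exact = effective_access_time(h, C, S)
    approx = (1 - h) * S  # the slide's approximation
    print(f"H = {h:.2f}: exact {exact:.4f} * S, approx {approx:.4f} * S")
```

The approximation drops the H • C term, so it is close for moderate hit ratios and becomes relatively less accurate as H approaches 1, where the cache time itself starts to matter.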
The Five Minute Rule
Assume there are no special response time (real-time) requirements; the decision to keep something in cache is, therefore, purely economic. To make things simple, suppose that data blocks are 10 KB. At 1995 prices, 10 KB of main memory cost about $1. Thus, we could keep the data in main memory forever if we were willing to spend a dollar. With 10 KB of disk costing only $.10, we could save $.90 if we kept the 10 KB on disk. In reality, the savings are not so great; if the disk data is accessed, it must be moved to main memory, and that costs something. How much, then, does a disk access cost?
A disk, along with all its supporting hardware, costs about $3,000 (in 1995) and delivers about 30 accesses per second; the cost, therefore, is about $100 per access per second. At this rate, if the data is accessed once a second, it costs $100.10 to store it on disk (disk storage and disk access costs). That is considerably more than the $1 to store it in main memory.
The break-even point is about one access per 100 seconds. At that rate, the main memory cost is about the same as the disk storage cost plus the disk access costs. At a more frequent access rate, disk storage is more expensive. At a less frequent rate, disk storage is cheaper. Anticipating the cheaper main memory that will result from technology changes, this observation is called the five-minute rule rather than the two-minute rule.
The Five Minute Rule
Keep a data item in electronic memory if it is accessed at least once every five minutes; otherwise keep it in magnetic memory.
Similar arguments apply to objects stored on tape and cached on disk. Given the object size, the cost of cache, the cost of secondary memory, and the cost of accessing the object in secondary memory once per second, the frequency at the break-even point in units of accesses per second (a/s) is given by the following formula:
Frequency ≈ ((Cache_Cost/Byte - Secondary_Cost/Byte) • Object_Bytes) / (Object_Access_Per_Second_Cost) a/s
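Plugging the 1995 prices quoted for the five-minute rule into the break-even formula gives the "about one access per 100 seconds" figure. The Python sketch below assumes 10 KB = 10,000 bytes; the function and variable names are illustrative.

```python
def break_even_frequency(cache_cost_per_byte, secondary_cost_per_byte,
                         object_bytes, access_per_second_cost):
    """Break-even access frequency in accesses per second (a/s)."""
    storage_saving = (cache_cost_per_byte - secondary_cost_per_byte) * object_bytes
    return storage_saving / access_per_second_cost


OBJECT_BYTES = 10_000                   # one 10 KB block (decimal units assumed)
MEMORY_COST_PER_BYTE = 1.00 / OBJECT_BYTES   # $1 per 10 KB of main memory
DISK_COST_PER_BYTE = 0.10 / OBJECT_BYTES     # $.10 per 10 KB of disk
ACCESS_PER_SECOND_COST = 3000 / 30      # $3,000 disk delivering 30 accesses/s

frequency = break_even_frequency(MEMORY_COST_PER_BYTE, DISK_COST_PER_BYTE,
                                 OBJECT_BYTES, ACCESS_PER_SECOND_COST)
interval_s = 1 / frequency

print(f"break-even: {frequency:.4f} a/s, i.e. one access every {interval_s:.0f} s")
```

With these prices the interval comes out near two minutes; cheaper main memory lengthens it, which is the argument for calling it the five-minute rule.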
The Rules of Exponential Growth
Electronic memory:
MemoryChipCapacity(year) = 4**((year-1970)/3) Kb/chip
for year in [1970...2000] Moore’s Law
Magnetic memory:
MagneticAreaDensity(year) = 10**((year-1970)/10) Mb/inch²
for year in [1970...2000] Hoagland’s Law
Processors:
SunMips(year) = 2**(year-1984) MIPS
for year in [1984...2000] Joy’s Law
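The three growth rules can be tabulated with a few Python functions. The sketch reads each law as a base (4, 10, 2) raised to the exponent listed for it on the slide; the function names and the range checks are additions for illustration.

```python
def memory_chip_capacity_kb(year):
    """Moore's Law as stated on the slide: x4 every three years from 1 Kb in 1970."""
    if not 1970 <= year <= 2000:
        raise ValueError("rule is stated for 1970...2000")
    return 4 ** ((year - 1970) / 3)


def magnetic_area_density_mb_per_inch2(year):
    """Hoagland's Law: x10 every ten years from 1 Mb/inch^2 in 1970."""
    if not 1970 <= year <= 2000:
        raise ValueError("rule is stated for 1970...2000")
    return 10 ** ((year - 1970) / 10)


def sun_mips(year):
    """Joy's Law: doubling every year from 1 MIPS in 1984."""
    if not 1984 <= year <= 2000:
        raise ValueError("rule is stated for 1984...2000")
    return 2 ** (year - 1984)


for year in (1984, 1990, 2000):
    print(year,
          f"{memory_chip_capacity_kb(year):,.0f} Kb/chip",
          f"{magnetic_area_density_mb_per_inch2(year):,.0f} Mb/inch^2",
          f"{sun_mips(year):,} MIPS")
```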
Communication Hardware
The four kinds of networks are defined by their diameters. These diameters imply certain latencies (based on the speed of light). In 1990, Ethernet (at 10 Mbps) was the dominant LAN. Metropolitan networks typically are based on 1 Mbps public lines. Such lines are too expensive for transcontinental links at present; most long-distance lines are therefore 50 Kbps or less. As the news will tell you, these things are changing fast.
The early 90s
Type of Network            Diameter    Latency   Bandwidth  Send 1 KB
Cluster                    100 m       .5 µs     1 Gbps     10 µs
LAN (local area network)   1 km        5. µs     10 Mbps    1 ms
MAN (metro area network)   100 km      .5 ms     1 Mbps     10 ms
WAN (wide area network)    10,000 km   50. ms    50 Kbps    210 ms
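The last column of the early-90s table follows from the other two: the time to send 1 KB is the latency plus the message size divided by the bandwidth. The Python sketch below assumes 1 KB = 1000 bytes = 8000 bits; the table's own figures are rounded.

```python
def send_time_s(latency_s, bandwidth_bps, message_bytes=1000):
    """Latency plus transmission time for one message."""
    return latency_s + message_bytes * 8 / bandwidth_bps


# (latency in seconds, bandwidth in bits per second), from the early-90s table
networks = {
    "Cluster": (0.5e-6, 1e9),
    "LAN": (5e-6, 10e6),
    "MAN": (0.5e-3, 1e6),
    "WAN": (50e-3, 50e3),
}

times = {name: send_time_s(lat, bw) for name, (lat, bw) in networks.items()}

for name, t in times.items():
    print(f"{name:8s} {t * 1000:10.4f} ms")
```

In the cluster, LAN, and MAN rows most of the time is transmission; only over the 10,000 km WAN does the speed-of-light latency contribute a substantial share.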
Communication Hardware
Scenario 2000: point-to-point bandwidth likely to be common among computers by the year 2000.
Type of Network            Diameter    Latency   Bandwidth  Send 1 KB
Cluster                    100 m       .5 µs     1 Gbps     5 µs
LAN (local area network)   1 km        5. µs     1 Gbps     10 µs
MAN (metro area network)   100 km      .5 ms     100 Mbps   .6 ms
WAN (wide area network)    10,000 km   50. ms    100 Mbps   50 ms
Processor Architectures
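[Figure: three designs. Private Memories: each processor has its own memory, and processors are connected by The Network. Global Memory: processors with Private Memory share global memory or Shared Disks / tapes. Shared Memory: several processors attached to one memory.]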
Processor Architectures
Shared nothing: In a shared-nothing design, each memory is dedicated to a single processor. All accesses to that data must pass through that processor. Processors communicate by sending messages to each other via the communications network.
Shared global: In a shared-global design, each processor has some private memory not accessible to other processors. There is, however, a pool of global memory shared by the collection of processors. This global memory is usually addressed in blocks (units of a few kilobytes or more) and is RAM disk or disk.
Shared memory: In a shared-memory design, each processor has transparent access to all memory. If multiple processors access the data concurrently, the underlying hardware regulates the access to the shared data and provides each processor a current view of the data.
Address Spaces
[Figure: three processes, each with its own address space made up of segments; shared code segments and shared data segments appear in more than one address space.]
Address Spaces
Memory segmentation and sharing: A process executes in an address space—a paged, segmented array of bytes. Some segments may be shared with other address spaces. The sharing may be execute-only, read-only, or read-write. Most of the segment slots are empty (lightly shaded boxes), and most of the occupied segments are only partially full of programs or data.
To simplify memory addressing, the virtual address space is divided into fixed-size segment slots, and each segment partially fills a slot.
Typical slot sizes range from 2**24 to 2**32 bytes. This gives a two-dimensional address space, where addresses are {segment_number, byte}. Again, segments are often partitioned into virtual memory pages, which are the unit of transfer between main and secondary memory. If an object is bigger than a segment, it can be mapped into consecutive segments of the address space.
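The {segment_number, byte} addressing can be illustrated with a little bit arithmetic. This is a minimal sketch assuming a slot size of 2**24 bytes (the low end of the range above); the function names are hypothetical.

```python
from typing import Tuple

SLOT_BITS = 24  # assumed slot size: 2**24 bytes


def split_address(virtual_address: int, slot_bits: int = SLOT_BITS) -> Tuple[int, int]:
    """Split a flat virtual address into (segment_number, byte)."""
    segment_number = virtual_address >> slot_bits
    byte = virtual_address & ((1 << slot_bits) - 1)
    return segment_number, byte


def join_address(segment_number: int, byte: int, slot_bits: int = SLOT_BITS) -> int:
    """Combine (segment_number, byte) back into a flat virtual address."""
    if not 0 <= byte < (1 << slot_bits):
        raise ValueError("byte offset does not fit in one segment slot")
    return (segment_number << slot_bits) | byte
```

An object larger than one slot simply runs on into the next segment number: one slot size past the start of segment 3 is byte 0 of segment 4.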
Processes
A process is a virtual processor. It has an address space that contains the program the process is executing and the memory the process reads and writes. One can imagine a process executing Java programs statement by statement, with each statement reading and writing bytes in the address space or sending messages to other processes.
Processes provide an ability to execute programs in parallel; they provide a protection entity; and they provide a way of structuring computations into independent execution streams. So they provide a form of fault containment in case a program fails.
Processes are building blocks for transactions, but the two concepts are orthogonal. A process can execute many different transactions over time, and parts of a single transaction may be executed by many processes.
Each process executes on behalf of some user, or authority, and with some priority. The authority determines what the process can do: which other processes, devices, and files the process can address and communicate with. The process priority determines how quickly the process’s demand for resources will be serviced if other processes make competing demands. Short tasks typically run with high priority, while large tasks are given lower priority.
Protection Domains
There are two ways to provide protection:
Process = protection domain: Each subsystem executes as a separate process with its own private address space. Applications execute subsystem requests by switching processes, that is, by sending a message to a process.
Address space = protection domain: A process has many address spaces: one for each protected subsystem and one for the application. Applications execute subsystem requests by switching address spaces. The address space protection domain of a subsystem is just an address space that contains some of the caller’s segments; in addition, it contains program and data segments belonging to the called subsystem. A process connects to the domain by asking the subsystem or OS kernel to add the segment to the address space. Once connected, the domain is callable from other domains in the process by using a special instruction or kernel call.
Protection Domains
[Figure: a single process containing the protection domains Application, DataBase, Network, and OS Kernel.]
A process may have many protection domains.
Threads
There is a need for multiple processes per address space:
For example, to scan through a data stream, one process is appointed the producer, which reads the data from an external source, while the second process processes the data. Further examples of cooperating processes are file read-ahead, asynchronous buffer flushing, and other housekeeping chores in the system.
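The producer/consumer arrangement can be sketched with two threads that share one address space and a bounded buffer. This is an illustration only: the data source is a plain Python iterable, the "processing" merely doubles each item, and all names are hypothetical.

```python
import queue
import threading

END_OF_STREAM = None  # sentinel the producer uses to signal that the source is exhausted


def producer(source, buffer):
    """Read items from the external source and hand them to the consumer."""
    for item in source:
        buffer.put(item)
    buffer.put(END_OF_STREAM)


def consumer(buffer, results):
    """Process items as they arrive; doubling stands in for real work."""
    while True:
        item = buffer.get()
        if item is END_OF_STREAM:
            break
        results.append(item * 2)


def scan(source):
    """Run producer and consumer concurrently over a small bounded buffer."""
    buffer = queue.Queue(maxsize=4)
    results = []
    threads = [
        threading.Thread(target=producer, args=(source, buffer)),
        threading.Thread(target=consumer, args=(buffer, results)),
    ]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return results
```

Both threads see the same `buffer` and `results` objects because they live in the same address space; no messages cross a process boundary.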
Processes can share the same address space simply by having all their address spaces point to the same segments. Most operating systems do not make a clean distinction between address spaces and processes. Thus a new concept, called a thread or a task, is introduced.
But note: Several operating systems do not use the term process at all. For example, in the Mach operating system, thread means process, and task means address space; in MVS, task means process, and so on.
Threads
The term thread often implies a second property: inexpensive to create and dispatch. Threads are commonly provided by some software that found the operating system processes to be too expensive to create or dispatch. The thread software multiplexes one big operating system process among many threads, which can be created and dispatched hundreds of times faster than a process.
The term thread is used in the following to connote these light-weight processes. Unless this light-weight property is intended, “process” is used. Several threads usually share a common address space. Typically, all the threads have the same authorization identifier, since they are part of the same address space domain, but they may have different scheduling priorities.
Messages and Sessions
There are two styles of communication among processes:
Datagrams: The sender of a message determines the recipient's address (e.g. the process name) and constructs an envelope consisting of the sender's name and address, the recipient's name and address, and the message text. This envelope is delivered to the capable hands of the communication system. It is analogous to sending letters by mail.
Sessions: Before any messages are sent, a fixed connection is established between sender and receiver, a so-called session. Once it has been established, both parties can send and receive messages via this session. This symmetry is often referred to as "peer-to-peer". Establishing a session requires a datagram. A session must at some point be closed down explicitly. It is analogous to a phone conversation.
Advantages of Sessions
Shared state: A session represents shared state between the client and the server. A datagram might go to any process with the designated name, but a session goes to a particular instance of that name.
Authorization: Processes do not always trust each other. The server often checks the client’s credentials to see that the client is authorized to perform the requested function. The authentication protocols require multi-message exchanges. Once the session key is established, it is shared state.
Error correction: Messages flowing in each session direction are numbered sequentially. These sequence numbers can detect lost messages and duplicate messages.
Performance: The operations described are fairly costly. Each of the steps often involves several messages. By establishing a session, this information is cached.
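The error-correction point can be made concrete with a small sketch of the receiving end of one session direction. It assumes sequence numbers start at 0 and only classifies each arrival; what a real session layer does next (retransmission requests, buffering) is not covered by the slides and is left out. The class name and return values are hypothetical.

```python
class SessionReceiver:
    """Tracks the next expected sequence number for one direction of a session."""

    def __init__(self):
        self.expected = 0    # sequence number of the next in-order message
        self.delivered = []  # payloads accepted so far, in order

    def receive(self, seq, payload):
        """Classify an arriving message as 'deliver', 'duplicate', or 'gap'."""
        if seq == self.expected:
            self.delivered.append(payload)
            self.expected += 1
            return "deliver"
        if seq < self.expected:
            return "duplicate"  # already seen: discard
        return "gap"            # at least one earlier message is missing
```

The counter `expected` is exactly the kind of shared state that a session keeps and a datagram exchange does not.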
Clients and Servers
The question of how computations consisting of many interacting processes should be structured has no simple answer. Currently, two styles are particularly popular: peer-to-peer and client-server.
The debate about which style is "better" often creates the impression that they are radically different. But in reality, peer-to-peer is more general and more complex, and it subsumes client-server. Here is a brief characterization:
Peer-to-peer: The two processes are independent peers, each executing its computation and occasionally exchanging data with the other.
Client-server: The two processes interact via request-reply exchanges in which one process, the client, makes a request to a second process, the server, which performs this request and replies to the client.
Clients and Servers
The limitation of the client-server model lies in the fact that it implies a synchronous pattern of one request/one response.
There are, however, cases in which one request generates thousands of replies, or where thousands of requests generate one reply. Operations that have this property include transferring a file between the client and server or bulk reading and writing of databases. In other situations, a client request generates a request to a second server, which, in turn, replies to the client. Parallelism is a third area where simple RPC is inappropriate. Because the client-server model postulates synchronous remote procedure calls, the computation uses one processor at a time. However, there is growing interest in schemes that allow many processes to work on problems in parallel. The RPC model in its simplest form does not allow any parallelism.
Remote Procedure Calls (RPCs)
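[Figure: local versus remote procedure call.
LOCAL PROCEDURE CALL: the caller executes z = add(x,y); add(int x,y) { return x + y } runs in the same process and returns z.
REMOTE PROCEDURE CALL: the client executes z = add(x,y); the client side packs and sends (add, x, y); the server unpacks and calls add(int x,y) { return x + y }; the result x + y is packed and sent back; the client side unpacks it and returns z.]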
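The pack/send/unpack steps of a remote procedure call can be sketched in a few lines. This is an in-process illustration, not a real RPC system: JSON is an assumed wire format, the "network" is a direct function call, and the names `server_handle`, `remote_call`, and `add_stub` are hypothetical.

```python
import json


def add(x, y):
    """The procedure that lives on the server."""
    return x + y


PROCEDURES = {"add": add}  # procedures the server is willing to execute


def server_handle(request_bytes: bytes) -> bytes:
    """Server side: unpack the request, call the procedure, pack the reply."""
    request = json.loads(request_bytes.decode("utf-8"))
    result = PROCEDURES[request["proc"]](*request["args"])
    return json.dumps({"result": result}).encode("utf-8")


def remote_call(proc, *args, transport=server_handle):
    """Client side: pack and send the request, then unpack the reply."""
    request_bytes = json.dumps({"proc": proc, "args": list(args)}).encode("utf-8")
    reply_bytes = transport(request_bytes)  # stands in for the network round trip
    return json.loads(reply_bytes.decode("utf-8"))["result"]


def add_stub(x, y):
    """Client stub: looks like a local add() to the caller."""
    return remote_call("add", x, y)
```

The caller writes `z = add_stub(x, y)` just as it would write a local call; only the stub knows that the work happens elsewhere.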
Naming
Naming has to do with the problem of how a client denotes a server it wants to invoke. Typical naming schemes distinguish between an object's name, its address, and its location. The name is an abstract identifier for the object, the address is the path to the object, and the location is where the object is.
An object can have several names. Some of these names may be synonyms, called aliases. Let us say that Bruce and Lindsay are two aliases for Bruce Lindsay. For this to be explicit, all names, addresses, and locations must be interpreted in some context, called a directory. For example, in our RPC context, Bruce means Bruce Nelson, and in our publishing context, Bruce means Bruce Spatz. Within the 408 telephone area, Bruce Lindsay’s address is 927-1747, and outside the United States it is +1-408-927-1747.
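The role of the directory as the context of interpretation can be shown with a nested lookup table. The "RPC" and "publishing" entries come from the example above; the "colleagues" context, which holds the two aliases, is a hypothetical name added for illustration.

```python
# Each directory (context) maps a name to the object it denotes.
DIRECTORIES = {
    "RPC": {"Bruce": "Bruce Nelson"},
    "publishing": {"Bruce": "Bruce Spatz"},
    # Hypothetical context in which two aliases denote the same object.
    "colleagues": {"Bruce": "Bruce Lindsay", "Lindsay": "Bruce Lindsay"},
}


def resolve(context: str, name: str) -> str:
    """Interpret a name relative to a directory; a name has no meaning without one."""
    try:
        return DIRECTORIES[context][name]
    except KeyError:
        raise LookupError(f"{name!r} is not defined in context {context!r}") from None
```

The same name resolves to different objects in different contexts, and different names can resolve to the same object within one context.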
Name Servers
Names are grouped into a hierarchy called the name space. An international commission has defined a universal name space standard, X.500, for computer systems. The commission administers the root of that name space. Each interior node of the hierarchy is a directory. A sequence of names delimited by a period (.) gives a path name from the directory to the object.
No one stores the entire name space—it is too big, and it is changing too rapidly. Certain processes, called name servers, store parts of the name space local to their neighborhood; in addition, they store a directory of more global name servers.
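A name server that knows its neighborhood and forwards everything else to a more global server might be sketched as follows. This is a simplified illustration: each server holds whole path names in a dictionary, there is a single more-global server rather than a directory of them, and the class and path names used in the usage note are hypothetical.

```python
class NameServer:
    """Stores part of the name space and knows one more global name server."""

    def __init__(self, local_entries, more_global=None):
        self.local_entries = dict(local_entries)  # path name -> address
        self.more_global = more_global            # next server to ask, if any

    def lookup(self, path_name):
        """Answer from the local part of the name space, else ask the more global server."""
        if path_name in self.local_entries:
            return self.local_entries[path_name]
        if self.more_global is not None:
            return self.more_global.lookup(path_name)
        raise KeyError(path_name)
```

For example, a neighborhood server created as `NameServer({"lab.printer": "node-3"}, more_global=root)` answers `lab.printer` itself and passes any other path name on to `root`.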
Authentication Techniques
Passwords are the simplest technique. The client has a secret password, a string of bytes known only to it and the server. The client sends its password to the server to prove the client’s identity. A second password is then needed to authenticate the server to the client. Thus, two passwords are required, and they must be sent across the wire.
Challenge-response uses only one password or key. In this scheme, the client and the server share a secret encryption key. The server picks a random number, N, and encrypts it with the key as EN. The server sends EN to the client and challenges the client to decrypt it using the secret key. If the client responds with N, the server believes the client knows the secret encryption key. The client can also authenticate the server by challenging it to decrypt a second random number. The shared secret is stored at both ends, but random numbers are sent across the wire.
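The challenge-response exchange can be walked through in code. The cipher below is a toy (a SHA-256-derived keystream XORed with the data), chosen only so that the sketch runs with the standard library; it is an assumption of this example, it is not secure, and it must not be used for real authentication. All function names are hypothetical.

```python
import hashlib
import secrets


def _keystream(key: bytes, length: int) -> bytes:
    """Derive `length` pseudo-random bytes from the key (toy construction)."""
    out = b""
    counter = 0
    while len(out) < length:
        out += hashlib.sha256(key + counter.to_bytes(4, "big")).digest()
        counter += 1
    return out[:length]


def toy_encrypt(key: bytes, data: bytes) -> bytes:
    """XOR the data with the keystream. NOT secure; illustration only."""
    return bytes(a ^ b for a, b in zip(data, _keystream(key, len(data))))


toy_decrypt = toy_encrypt  # XOR with the same keystream undoes the encryption


def server_challenge(key: bytes):
    """Server picks a random N and returns (N, EN); only EN goes to the client."""
    n = secrets.token_bytes(16)
    return n, toy_encrypt(key, n)


def client_respond(key: bytes, encrypted_n: bytes) -> bytes:
    """Client proves knowledge of the key by decrypting the challenge."""
    return toy_decrypt(key, encrypted_n)


def server_verify(n: bytes, response: bytes) -> bool:
    """Server accepts the client only if the response equals the original N."""
    return secrets.compare_digest(n, response)
```

A client holding the shared key returns N and is accepted; a client with a different key returns some other byte string and is rejected. The key itself never crosses the wire.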
Authentication Techniques
Public key system: Each authid has a pair of keys—a public encryption key, EK, and a private decryption key, DK. The keys are chosen so that DK(EK(X)) = X, but knowing only EK and EK(X) it is hard to compute X. Thus, a process’s ability to compute X from EK(X) is proof that the process knows the secret DK. Each authid publishes its public key to the world. Anyone wanting to authenticate the process as that authid goes through the challenge protocol: The challenger picks a random number X, encrypts it with the authid’s public key EK, and challenges the process to compute X from EK(X). Secrets are stored in one place only, and they do not go across the wire.
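The property DK(EK(X)) = X can be tried out with textbook RSA on deliberately tiny numbers. RSA is an assumed choice here (the slide does not name a particular public key system), and the parameters are far too small to offer any security; the function names are hypothetical.

```python
import secrets

# Toy key pair from two small primes (illustration only, trivially breakable).
P, Q = 61, 53
N = P * Q   # 3233, the public modulus
E = 17      # public encryption exponent
D = 2753    # private decryption exponent: (E * D) % ((P - 1) * (Q - 1)) == 1


def encrypt_public(x: int) -> int:
    """EK: anyone can compute this from the published key (E, N)."""
    return pow(x, E, N)


def decrypt_private(y: int) -> int:
    """DK: only the holder of the private exponent D can compute this."""
    return pow(y, D, N)


def authenticate(prover_decrypt) -> bool:
    """Challenger picks a random X, sends EK(X), and checks that X comes back."""
    x = secrets.randbelow(N - 2) + 2
    return prover_decrypt(encrypt_public(x)) == x
```

`authenticate(decrypt_private)` succeeds because the prover holds DK. The private exponent is stored in one place only and is never transmitted.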
Scheduling
The purpose of scheduling is to make sure all requests get processed, i.e. are assigned to a specific server process. There are basically two additional constraints:
Short response times: The requests should not wait longer than necessary before they get serviced.
Economic usage of resources: The required throughput should be achieved with the minimum number of resources (processors, nodes, links, etc.).
Throughput and response time at resource utilization r are related by the following formula:
Average_Response_Time(r) = (1/ (1 - r)) • Service_Time
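Evaluating the formula for a few utilizations shows why the two constraints pull against each other. This is a direct transcription of the formula above; the function name and the sample utilizations are illustrative.

```python
def average_response_time(utilization: float, service_time: float = 1.0) -> float:
    """Average response time at resource utilization r: service_time / (1 - r)."""
    if not 0 <= utilization < 1:
        raise ValueError("utilization must satisfy 0 <= r < 1")
    return service_time / (1.0 - utilization)


if __name__ == "__main__":
    for r in (0.1, 0.5, 0.8, 0.9, 0.95):
        multiple = average_response_time(r)
        print(f"utilization {r:.2f}: response time is {multiple:.1f} x service time")
```

At 50% utilization a request takes about twice its service time; at 90% it takes about ten times as long, so squeezing the last capacity out of a resource costs response time.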
The Scheduling Problem
[Figure: Response Time vs Utilization. Horizontal axis: utilization, from 0 to 1. Vertical axis: response time in multiples of service time, from 0 to 30.]
File Organizations
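[Figure: taxonomy of file organizations. A file is either unstructured or structured; structured files are entry sequenced, relative, key sequenced, or hash. The figure also carries the labels "direct" and "associative".]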
SQL in a Distributed Environment
[Figure: a client and several SQL servers. The client runs the application program on top of three layers (SQL: set oriented logic; File System: record logic; Network: message transport). Each SQL server has the layers SQL: set oriented logic; File Server: records and files; Network: message transport.]
Software Performance
[Figure: cost of common operations on two logarithmic scales, INSTRUCTIONS (1 to 1,000,000) and MICROSECONDS (.1 to 100,000, with 10 mips and Ethernet). Operations shown: procedure call, domain switch, local rpc, LAN rpc, WAN rpc, process create, process dispatch, 1KB memory copy, 1KB on Ethernet, disc access, WAN transmit delay, sequential read record, sequential write record, random read memory record, random write memory record, random read/write disc record, null transaction, main memory transaction, simple database transaction.]
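Protocol Standards
[Figure: two parts. Porting and Installation Steps: a portable program written against an API passes through the compiler and the linker/loader to become a "local" compiled program. Operation and Inter-Operation: the client process on the client machine and the server on the server machine each run a protocol machine on top of their operating system (Unix and VMS are shown) and exchange message formats defined by the FAP.]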
Relevant FAP-Standards
CSMA/CD, Token Ring, etc.: Low-level protocols that specify how bits are physically transmitted across a shared medium.
IP/TCP, NetBIOS, HTTP: Transport level protocols.
LU6.2: SNA's peer-to-peer protocol that allows both session oriented and client-server-style communication under transaction protection.
OSI-TP: ISO's rendering of a protocol that provides a functionality very similar to LU6.2.
ASN.1: Protocol for exchanging data formatting and structuring information. Required for RPCs in a heterogeneous environment.
DRDA: Interoperability standard for IBM SQL-systems.
ODBC, JDBC: Interoperability standards for general SQL-systems.
Relevant API-Standards
SQL: Portability standard for accessing relational databases (lots of proprietary extensions).
APPC, CPI-C: Two of IBM's APIs for the LU6.2 protocol.
X/Open-XA, X/Open-XA+, etc.: APIs by the X/Open consortium on ISO's OSI-TP protocols.
IDL: OMG's interface definition language to let objects be integrated through an object request broker.
STDL: Language for programming TP-applications; based on the ACMS TP-monitor.
Java: The web's favorite programming language; comes with its own FAP-component.
OSI Standards and X/Open APIs
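[Figure: two nodes, each with a Communications Manager (CM), a Transaction Manager (TM), and a Resource Manager (RM). On the first node, the application sends requests to the RM and begin, commit, abort to the TM; the TM sends prepare, commit, abort to the RM; the CM signals "transid is leaving this node". On the second node, the server sends requests to the RM; the CM signals "new transid is arriving"; the TM sends prepare, commit, abort to the RM. Remote requests flow between the two nodes. Using the OSI/TP and CCR protocols, the nodes exchange start, prepare, commit, abort and +ack, -ack, restart.]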
A Last Glance at TP-Standards
PARTICIPANTS           PROTOCOL/API                      DEFINER
application : TM       TX                                X/Open DTP
application : RM       RM specific (e.g. SQL, Queues)    various
application : server   RPC or ROSE                       OSI + application
TM : RM                XA                                X/Open DTP
TM : CM                XA+                               X/Open DTP
TM : TM                OSI-TP + CCR                      OSI
Each resource manager (RM) registers with its local transaction manager (TM). Applications start and commit transactions by calling their local TM. At commit, the TM invokes every participating RM. If the transaction is distributed, the communications manager informs the local and remote TM about the incoming or outgoing transaction, so that the two TMs can use the OSI-TP protocol to commit the transaction.
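The local part of this interplay (registration, then prepare followed by commit or abort) can be sketched as below. It is a single-node illustration under strong assumptions: no logging, no failures, no communications manager, and a simple two-phase flow in which one "no" vote aborts everybody. The class and method names are hypothetical and are not the X/Open TX or XA interfaces.

```python
class ResourceManager:
    """A participant that votes in the prepare phase and then commits or aborts."""

    def __init__(self, name, vote_yes=True):
        self.name = name
        self.vote_yes = vote_yes
        self.state = "active"

    def prepare(self):
        """Phase 1: say whether this RM is able to commit."""
        self.state = "prepared" if self.vote_yes else "active"
        return self.vote_yes

    def commit(self):
        self.state = "committed"

    def abort(self):
        self.state = "aborted"


class TransactionManager:
    """Local TM: RMs register with it; at commit it invokes every participating RM."""

    def __init__(self):
        self.resource_managers = []

    def register(self, rm):
        self.resource_managers.append(rm)

    def commit(self):
        """Commit only if every RM votes yes in the prepare phase; otherwise abort all."""
        if all(rm.prepare() for rm in self.resource_managers):
            for rm in self.resource_managers:
                rm.commit()
            return True
        for rm in self.resource_managers:
            rm.abort()
        return False
```

In the distributed case the remote TM would appear to the local TM as one more participant to prepare and commit, with the OSI-TP protocol carrying those calls between the nodes.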
Summary
Transaction processing systems comprise all parts of a system, software and hardware.
Building such a system requires considering end-to-end arguments at all levels of abstraction.
The performance of distributed TP systems is influenced by the hardware architecture (what is shared), by software issues (which protocols are used), and by configuration aspects (what limits scalability).
The multitude of those influences gives rise to a constant dilemma: Should one restrict the variety to few (proprietary) components for better tuning and performance, or should one embrace all the standards for openness - at the risk of poor scalability and performance?