Life Beyond Distributed Transactions: An Apostate’s Opinion
Pat Helland
Partner Architect
Microsoft Corporation
Apostate: noun “One who renounces a previously held belief.”
Slide 2
Outline
• Introduction
• Data, Transactions, and Scalability
• Messaging across Items
• Partner-State and State-Machines
• Accountants Don’t Use Erasers
• Accurate Representations of Historical Facts
• Managing Uncertainty across Items
• Bounds of Uncertainty in Loosely-Coupled Systems
• Conclusion
Slide 3
Session Objectives And Takeaways
• Distributed transactions aren’t used much in practice
• They are fragile and impede availability
• Local transactions are wonderful!
• Designing for scalability requires planning
• Need to think about separate items as the scope of transactions
• Even when separate items are on the same machine (today), you must plan for them to be repartitioned later
• Interacting across items requires messaging
• Managing the messaging is complex
• Each partner must track the state of its interaction with partner items
• Scalable applications become state-driven workflow
• Surprise: the fine granularity of the participants in the workflow
Today’s Goal: Offer hopefully insightful (incite-ful) opinions about scalable apps.
Slide 4
Pointer to Paper
• Paper delivered at CIDR-2007
• Conference on Innovative Data Systems Research
• http://www-db.cs.wisc.edu/cidr/cidr2007/papers/cidr07p15.pdf
• Terminology changes from Paper:
• Entity → Item
• Activity → Partner-State-Machine
Slide 5
Slide 6
Assumptions (Don’t Have to Prove These… Just Plain Believe Them)
Want Almost-Infinite Scaling
• More of everything… Year by year, bigger and bigger
• If it fits on your machines, multiply by 10, if that fits, multiply by 1000…
• Strive to scale almost linearly (N log N for some big log).
Grown-Ups Don’t Use Distributed Transactions
• The apps using distributed transactions become too fragile…
• Let’s just consider local transactions. Multiple disjoint scopes of serializability
Want Scale-Agnostic Apps
• Two layers to the application: scale-agnostic and scale-aware
• Consider scale-agnostic API
[Figure: application layering. Upper layer: scale-agnostic code. Lower layer: scale-aware code. The scale-agnostic API separates the two. Other labels: Data, Transaction.]
Disjoint Scopes of Serializability
• Assume transactions only within a single machine
• OK, I’ll give you a small cluster… but not all the machines!
• Repartitioning moves data
• To expand the app, some data moves to a new data-store
• Which data can you count on for a transaction?
• Remember, it might get moved…
• What’s on one machine today may get moved to another tomorrow!
• Recall, no transactions may cross machines
• What CAN you tie into a single transaction?
Slide 7
Slide 8
Slide 9
Uniquely Keyed Items
• Not all data may be in a single transaction
• We must collect the data into pieces
• We must annotate the boundaries of the data guaranteed to be transactional
• Must remain transactional even if we repartition!
• An item:
• A collection of data that fits on a single machine
• Identified by a unique key
• Assume the scale-aware-code never partitions an item
• The unique key defines the data that can’t be partitioned
The application’s data is factored into items, each of which has a unique key
Each item will reside on a single machine (ignoring replication & H/A)
[Figure: four items, with ItemKey = “ABC”, ItemKey = “WPB”, ItemKey = “QLA”, and ItemKey = “UNB”.]
Slide 10
Transactions and Items
• A transaction may update a single item
• The scale-aware-code (and API) guarantee it
• The item is never partitioned
• A transaction must not ever update two items
• Even if the two live on one machine today
• Tomorrow, they may repartition to different machines…
[Figure: a transaction scoped to a single item, shown among many keyed items (“ABC”, “DEF”, “XYZ”, “RST”, “JKL”, “GHI”, “LMN”, …) spread across machines.]
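One way to picture the rule that a transaction touches exactly one item is the minimal sketch below. The `ItemStore` class, its `transact` and `read` methods, and the `balance` field are hypothetical names invented for illustration; they do not come from the talk or the paper.

```python
import copy


class ItemStore:
    """Hypothetical scale-aware store: every transaction is scoped to one item key."""

    def __init__(self):
        self._items = {}  # unique key -> the item's data (a plain dict here)

    def transact(self, key, update):
        """Run update(item) on a private copy; commit only if it returns normally."""
        working = copy.deepcopy(self._items.get(key, {}))
        update(working)              # an exception here aborts: nothing is written
        self._items[key] = working   # commit: the whole item changes at once

    def read(self, key):
        return copy.deepcopy(self._items.get(key, {}))


store = ItemStore()
store.transact("ABC", lambda item: item.update(balance=100))


def failing_update(item):
    item["balance"] = -1
    raise ValueError("business rule violated")


try:
    store.transact("ABC", failing_update)
except ValueError:
    pass  # the aborted transaction left item "ABC" untouched
```

The API deliberately offers no call that names two keys, so the application cannot come to depend on two items sharing a machine.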
Repartitioning and Items
• Items allow scaling
• Items remain intact even when repartitioning
• The application can count on the integral nature of each item
• It is OK to know that the entire item is local
• It is OK to work on anything in the item at once
No Promise that Two Different Items Stay on the Same Machine!!
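As a rough illustration of repartitioning, the sketch below places whole items on machines by hashing their keys. The hash-based placement rule and the names `machine_for` and `placement` are assumptions made for this example; the talk does not say how the scale-aware code chooses a machine.

```python
import hashlib


def machine_for(key, machine_count):
    """Hypothetical placement rule: hash the item key to pick a machine."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % machine_count


def placement(keys, machine_count):
    """Map each machine number to the set of item keys it hosts."""
    machines = {m: set() for m in range(machine_count)}
    for key in keys:
        machines[machine_for(key, machine_count)].add(key)
    return machines


keys = ["ABC", "DEF", "GHI", "JKL", "LMN", "RST", "XYZ"]
before = placement(keys, 2)   # today: two machines
after = placement(keys, 5)    # tomorrow: repartitioned across five
```

Placement works on keys only, so an item always moves as a unit, while two items that share a machine in `before` may well be separated in `after`.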
Slide 11
Frequently the work won’t fit on one computer!
Slide 12
Thinking about Queries
• Queries just got HARD!
• Certainly can’t do cross-item transactional queries
• The items aren’t in the same scope of serializability
• Perhaps can query on stale versions of the data
• Very useful… just different than classic DB
• Can do distributed queries
• Send partial queries around the network
• Hard as the dataset explodes in size
• Can filter copies of old versions
• Keep a subset on a machine for ad-hoc queries
• Subset becomes a smaller percentage as we scale…
Many Traditional Queries Are Used Today to Implement Items: Gotta Join to Overcome the Normalization of Rows!
Ad-Hoc Queries Get Harder… Scaling Means It Won’t All Fit
Slide 13
Thinking about Alternate Indices
• Items must have a unique key
• Unless you begin with the same key, you aren’t the same
• CANNOT guarantee the alternate index will co-locate with the item’s primary key
• By definition, alternate indices don’t have the same key!
• We must index them with a different key…
• Alternate indices CANNOT be updated in the same transaction as the primary data
• There is no way to guarantee they are on the same machine
• They must be updated in different transactions…
[Figure: items keyed by primary key (PK:123, PK:217, PK:332, PK:589, PK:719), alongside item keys indexed by 1st alternate key (A1:ABC, A1:DEF, A1:GHI, A1:JKL, A1:MNO) and item keys indexed by 2nd alternate key (A2:abc, A2:def, A2:fgh, A2:ghu, A2:klw).]
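The consequence for alternate indices can be sketched as two separate transactions joined by a message. Everything named here (`primary`, `email_index`, `index_queue`, and the choice of an email address as the alternate key) is a hypothetical stand-in, not something the talk specifies.

```python
from collections import deque

primary = {}            # primary key -> item data
email_index = {}        # alternate key (email) -> primary key; may live elsewhere
index_queue = deque()   # messages from primary items to the index


def update_email(pk, email):
    """Transaction 1: change the item and record the intent to fix the index."""
    old = primary.get(pk, {}).get("email")
    primary[pk] = {"email": email}
    index_queue.append((pk, old, email))


def apply_index_message():
    """Transaction 2 (later, possibly on another machine): update the index."""
    pk, old, new = index_queue.popleft()
    if old is not None and email_index.get(old) == pk:
        del email_index[old]
    email_index[new] = pk


update_email("PK:123", "a@example.com")
stale_lookup = email_index.get("a@example.com")   # index has not caught up yet
apply_index_message()
fresh_lookup = email_index.get("a@example.com")
```

Between the two transactions a lookup through the index misses the item, which is the sense in which alternate indices are not transactionally consistent with the primary data.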
Slide 14
Slide 15
Items Are Connected by Messaging
• Items are key-named boundaries for transactional work
• Transactions never span items
• The scale-aware-code may move them to repartition
• The only way to communicate across items is with messaging!
• The scale-aware-code is responsible for finding the correct item (by key-name) and for routing the message to it
[Figure: Item-X sends a message (“Send To: Item-Y”) to Item-Y; each item is its own boundary of transactions.]
“Messaging” Is in Quotes… Work Is Invoked -- Potentially across Machines -- Definitely across Transactions!
Slide 16
Keeping Notes Before You Speak
• Transactions update the data within an item
• They also update the intent to send a message
• Must not send a message unless the intent commits
• Otherwise, the message could arrive and the intent to send the message aborts with the sending transaction
• Output queues are frequently transactional
• Otherwise even more confusing things can happen
[Figure: a transaction within an item, covering the item’s private data and app logic.]
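A minimal sketch of committing the intent to send together with the data follows. The `Item` class, its `outbox` list, and the message fields are hypothetical; real platforms would provide a transactional output queue rather than an in-memory list.

```python
class Item:
    """Hypothetical item whose outgoing messages commit with its data."""

    def __init__(self):
        self.data = {}
        self.outbox = []   # committed intents to send, drained by a sender later

    def transact(self, work):
        """Run work(data, send); data changes and queued messages commit together."""
        data = dict(self.data)       # shallow copy is enough for this flat example
        pending = []
        work(data, pending.append)   # an exception aborts both data and messages
        self.data = data
        self.outbox.extend(pending)


order = Item()


def place_order(data, send):
    data["status"] = "placed"
    send({"to": "Item-Y", "body": "reserve stock"})


def bad_order(data, send):
    send({"to": "Item-Y", "body": "reserve more stock"})
    raise RuntimeError("abort")


order.transact(place_order)
try:
    order.transact(bad_order)
except RuntimeError:
    pass  # the aborted transaction's message was never queued
```

Only messages in `outbox` are ever handed to the network, so a partner can never receive a message from a transaction that aborted.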
Slide 17
At-Least-Once Delivery Semantics
• Each message is sent at-least-once
• Given infinite time…
• The sender tries and tries and tries until acked
• Eventually, the message is delivered
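The retry loop behind at-least-once delivery might look like the sketch below. `send_until_acked`, the `transmit` callback, and the `max_attempts` cap are assumptions for illustration; a real sender would keep retrying with backoff rather than give up.

```python
def send_until_acked(message, transmit, max_attempts=10):
    """Retry transmit(message) until it returns True (an ack).

    max_attempts is a hypothetical cap so that this sketch always terminates.
    """
    for attempt in range(1, max_attempts + 1):
        if transmit(message):
            return attempt
    raise TimeoutError("no ack after %d attempts" % max_attempts)


received = []
calls = {"n": 0}


def flaky_transmit(message):
    """Simulated channel: delivers every time, but loses the first two acks."""
    calls["n"] += 1
    received.append(message)
    return calls["n"] >= 3


attempts = send_until_acked("Msg-A", flaky_transmit)
```

Because the sender cannot tell a lost message from a lost ack, the receiver in this run sees the same message three times.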
Dialogs and Exactly-Once Delivery
• It is Possible to Implement Exactly-Once Delivery Within a Relationship
• Dialog:
• Similar to TCP/IP but Long-Running
• Can Guarantee Exactly-Once Delivery OR Failure-Notification
• Requires Interesting Platform Support
• Not the Topic of this Talk
• See Microsoft SQL Server 2005 – SQL Service Broker
Slide 18
Idempotence: It’s Not a Medical Condition
• Requests get lost…
• Gotta retry them to handle lost requests
• Requests arrive more than once…
• Those pesky retries may actually arrive
• Idempotent means it’s OK to arrive multiple times
• As long as the request is processed at least once, the correct stuff occurs
• In today’s world, you must design your requests to be idempotent
Examples:
• Not idempotent: Baking a cake starting from ingredients
• Idempotent: Baking a cake starting from the shopping list (if money doesn’t matter)
• Naturally idempotent: Sweeping the floor
• Naturally idempotent: Read record “X”
• Not idempotent: Withdrawing $1 billion
• Idempotent: If haven’t yet done Withdrawal #XYZ for $1 billion, then withdraw $1 billion and label as #XYZ
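The withdrawal example can be sketched in a few lines. The `Account` class and its method names are hypothetical, and the amounts are scaled down from the slide’s $1 billion.

```python
class Account:
    """Hypothetical account item that remembers which withdrawals it has applied."""

    def __init__(self, balance):
        self.balance = balance
        self.applied = set()   # labels of withdrawals already processed

    def withdraw(self, amount):
        """Not idempotent: every arrival of the request changes the balance."""
        self.balance -= amount

    def withdraw_once(self, withdrawal_id, amount):
        """Idempotent: a repeated withdrawal_id is recognized and ignored."""
        if withdrawal_id in self.applied:
            return
        self.balance -= amount
        self.applied.add(withdrawal_id)


naive = Account(100)
naive.withdraw(30)
naive.withdraw(30)                 # a retry arrives: money is taken twice

safe = Account(100)
safe.withdraw_once("#XYZ", 30)
safe.withdraw_once("#XYZ", 30)     # the same retry: no further effect
```

Labeling the request is what turns a non-idempotent operation into an idempotent one; the label and the balance change would be recorded in the same item-local transaction.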
Slide 19
Out of Order Arrival
• Any message may arrive multiple times
• Even after a long while
• This can be very confusing…
• Lots of possible message deliveries
Applications find it difficult to ensure there are no latent bugs
------------------
Esoteric late retries of messages may find untested windows…
[Figure: two items exchanging messages; one sequence arrives as A, B, C, B and another as A, C, A, A. “Arrg!”]
Slide 20
Messages Connect Items
• Messages are the only way into and out of items
• They are produced by transactions
• They are consumed by transactions
• Transactions are local to the item
Slide 21
[Figure: Item-X with incoming messages (From: Item-A, From: Item-B, From: Item-C, From: Item-D) and outgoing messages (Send To: Item-A, Send To: Item-B, Send To: Item-C).]
Slide 22
Items Connected by Partnerships
• Mostly, messaging occurs between two partner items
• Usually, a two-way exchange moving both items’ state
• Each keeps data about how far its state has advanced…
[Figure: Item-W, Item-X, Item-Y, and Item-Z connected by two-party partnerships.]
Slide 23
Tracking with Partner-State-Machines
• Partner-state-machine refers to the knowledge about a partner item
• Descriptions of what messages have been received
• Descriptions of what obligations exist to the partner
• The foundation for workflow to replace distributed transactions
• Two basic observations wrapped up in the “partner-state-machine”
• Work across items is workflow based on two-party relationships
• The granularity of the workflow participant is an item (fine-grained)
[Figure: Item-W, Item-X, Item-Y, and Item-Z, each holding a partner-state-machine (PSM-W, PSM-X, PSM-Z, …) for each of its partner items.]
Slide 24
Idempotence, Partners, and Partner-State-Machines
• Partner-state-machines manage idempotence
• They keep track of what’s been seen
• If it’s a repeat, ignore it
• Repeated messages eliminated via partnership
[Figure: Item-X and Item-Y exchange messages. Item-Y’s PSM-X has seen Msg-A, B, C…; Item-X’s PSM-Y has seen Msg-1, 2, 3…]
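A partner-state-machine that filters repeats could be sketched as follows. The `PartnerState` and `Item` classes, the `receive` method, and the use of a set of message ids are assumptions for this example; the talk does not prescribe a representation.

```python
class PartnerState:
    """Hypothetical per-partner record kept inside an item."""

    def __init__(self):
        self.seen = set()      # ids of messages already processed from this partner
        self.processed = []    # bodies handled, in the order they were accepted


class Item:
    def __init__(self, key):
        self.key = key
        self.partners = {}     # partner key -> PartnerState (one PSM per partner)

    def receive(self, partner_key, msg_id, body):
        """Process a message once; return False if it was a repeat."""
        psm = self.partners.setdefault(partner_key, PartnerState())
        if msg_id in psm.seen:
            return False
        psm.seen.add(msg_id)
        psm.processed.append(body)
        return True


item_y = Item("Item-Y")
results = [
    item_y.receive("Item-X", "A", "first"),
    item_y.receive("Item-X", "B", "second"),
    item_y.receive("Item-X", "A", "first"),   # late retry of A
    item_y.receive("Item-W", "A", "other"),   # same id, different partner
]
```

State is tracked per partner, so the same message id from a different partner is not mistaken for a repeat.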
Slide 25
Retirement of Items
• It is normal for items to retire
• The shipment is shipped
• The order completes
• Activities advance to completion
• Incoming messages are accepted
• No new messages are needed
• Typical for the work of an item to complete…
• Retirement usually means “become read-only”
• Sometimes old items are deleted
Sometimes Items Exist for Long-Lived Purposes:
-- Inventory, Bank-Balance, Customer
-- Called “Resource-Items”
Not the topic for this talk… another talk is needed!
Slide 26
“Append-Only” Data
• Many Kinds of Computing are “Append-Only”
• Lots of observations are made about the world
• Debits, credits, Purchase-Orders, Customer-Change-Requests, etc.
• As time moves on, more observations are added
• You can’t change the history but you can add new observations
• Derived Results May Be Calculated
• Estimate of the “current” inventory
• Frequently inaccurate
• Historic Rollups Are Calculated
• Monthly bank statements
Slide 27
Databases and Transaction Logs
• Transaction Logs Are the Truth
• High-performance & write-only
• Describe ALL the changes to the data
• Data-Base: the Current Opinion
• Describes the latest value of the data as perceived by the application
The Database Is a Caching of the Transaction Log! It is the subset of the latest committed values represented in the transaction log…
[Figure: Log and DB.]
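The idea that the database is a cache of the log can be illustrated by replaying a log into a dictionary. The record layout (`op`, `key`, `value`) and the `replay` function are hypothetical and far simpler than a real transaction log.

```python
def replay(log):
    """Rebuild the 'current opinion' by applying committed log records in order."""
    db = {}
    for record in log:
        if record["op"] == "set":
            db[record["key"]] = record["value"]
        elif record["op"] == "delete":
            db.pop(record["key"], None)
    return db


log = [
    {"op": "set", "key": "widgets", "value": 10},
    {"op": "set", "key": "gadgets", "value": 4},
    {"op": "set", "key": "widgets", "value": 7},
    {"op": "delete", "key": "gadgets"},
]
db = replay(log)
```

The log keeps all four changes; the database keeps only the latest committed value for each surviving key.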
Slide 28
Accountants, Erasers, and Jail
• Accountants Go to Jail if They Use Erasers!!!
• The normal accounting practices allow for corrections but not updates
• Corrections are added to the information
• The derived values are recalculated
• It is a common application paradigm to keep almost all data as append-only
• The transactions themselves are append-only
• Sometimes they are eventually retired.
• The rollup (derived) summary may be recalculated
• Periodic snapshots of the rollup (derived) data are appended to the record
• E.g. a monthly bank statement
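A correction-by-append ledger might be sketched like this. The `ledger` list, the `post` and `balance` helpers, and the sample amounts are all invented for illustration.

```python
ledger = []   # append-only: entries are added, never edited or removed


def post(amount, memo):
    ledger.append({"amount": amount, "memo": memo})


def balance():
    """Derived value: recalculated from the full history."""
    return sum(entry["amount"] for entry in ledger)


post(500, "deposit")
post(-120, "rent")   # mistake: the rent was actually 100
post(20, "correction: rent overcharged by 20")

# Periodic snapshot of the derived data, like a monthly statement.
statement = {"entries": len(ledger), "balance": balance()}
```

The erroneous entry stays in the history; the correction is a new fact, and the derived balance is simply recomputed.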
Slide 29
Slide 30
Versions and Distributed Systems
• Can’t have “the same” data at many locations
• Unless it is a snapshot
• Changing distributed data needs versions
• Creates a snapshot…
Slide 31
[Figure: a data-owning service publishes successive versions of its price-list (Monday’s, Tuesday’s, Wednesday’s); listening partner services 1, 5, 7, and 8 each hold the versions that have reached them.]
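Versioned snapshots of the price-list can be sketched as immutable, labeled values. `PriceListOwner`, its `publish` method, and the prices are hypothetical.

```python
class PriceListOwner:
    """Hypothetical data-owning service that publishes immutable, labeled versions."""

    def __init__(self):
        self.versions = {}   # version label -> frozen price list

    def publish(self, label, prices):
        if label in self.versions:
            raise ValueError("a published version is never changed")
        self.versions[label] = tuple(sorted(prices.items()))
        return label, self.versions[label]


owner = PriceListOwner()
monday = owner.publish("Monday", {"widget": 10})
tuesday = owner.publish("Tuesday", {"widget": 12})

# Each listening partner holds whichever snapshot last reached it.
partner_1 = tuesday
partner_5 = monday   # older, but still an accurate copy of Monday's price-list
```

Two partners can hold different data without contradiction because each copy is labeled with the version it represents.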
DAGs of History
[Figure: a directed acyclic graph of data versions: A1, A1.1, A2; B1, B2, B3; C1, C2, C2.1, C3; D1, D1.1, D1.2, D2, D2.1, D3.]
Slide 33
Tentative Operations
• Items don’t share transactions
• Now what can we do?
• Items may accept tentative operations
• Like a reservation; may be cancelled later
• If cancelled, the receiving item must cope
• Special logic to deal with cancellations
Slide 34
[Figure: Item-A and Item-B.]
Slide 35
Semantics of Tentative Operations
• Tentative operations must be reorderable
• When cancelled, a compensation must occur
• Other operations may have occurred since
• Operations and cancellations must be reorderable!
[Figure: Item-B receiving two tentative operations and a cancellation (steps 1, 2, 3) from Item-A and Item-C.]
Slide 36
Semantics of Cancellation and Confirmation
• Cancellation
• Cope with not doing tentative operation
• Not undo
• New operation to “make things right”
• Accepting tentative means it’s OK to cancel
• Confirmation
• Relinquish the right to cancel tentative op
• Sometimes time driven
• Hotel rooms confirm in the morning
• Every tentative op confirms or cancels
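The life cycle of a tentative operation can be sketched with the hotel example. The `Hotel` class, its state names, and the reservation ids are assumptions made for illustration.

```python
class Hotel:
    """Hypothetical item that accepts tentative room reservations."""

    def __init__(self, rooms):
        self.rooms = rooms
        self.state = {}   # reservation id -> "tentative" | "confirmed" | "cancelled"

    def held(self):
        """Rooms tied up by reservations that are tentative or confirmed."""
        return sum(1 for s in self.state.values() if s != "cancelled")

    def reserve(self, res_id):
        if self.held() >= self.rooms:
            return False
        self.state[res_id] = "tentative"
        return True

    def confirm(self, res_id):
        if self.state.get(res_id) == "tentative":
            self.state[res_id] = "confirmed"   # right to cancel is relinquished

    def cancel(self, res_id):
        if self.state.get(res_id) == "tentative":
            self.state[res_id] = "cancelled"   # a new fact, not an erased one


hotel = Hotel(rooms=2)
hotel.reserve("r1")
hotel.reserve("r2")
full = hotel.reserve("r3")     # refused: both rooms are held
hotel.cancel("r1")
hotel.confirm("r2")
hotel.cancel("r2")             # too late: already confirmed
retry = hotel.reserve("r3")
```

Cancellation here records a new state rather than undoing history, and once a reservation is confirmed a later cancellation has no effect.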
Slide 37
Increasing & Decreasing Uncertainty
• Each tentative operation increases your uncertainty
• You get more and more confused each time you accept a tentative operation
• Each confirmation or cancellation decreases your uncertainty
• It resolves the confusion imparted by the tentative operation it is confirming or canceling
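[Figure: an uncertainty scale running from “Less Uncertain” to “More Uncertain”. A tentative operation moves toward more uncertain; a cancellation or confirmation moves toward less uncertain.]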
Slide 38
Bounded Uncertainty
• You can track the worst case situations for data values you are managing
• If you keep inventory, you can know the lowest possible and highest possible values
• Tentative operations move lowest and highest values apart
• This increases uncertainty
• Confirmations and cancellations move lowest and highest values together
• This decreases uncertainty
• Knowing the bounds, you have Bounded Uncertainty
[Figure: probability plotted against Widget Inventory, bounded by Minimum Widgets Possible and Maximum Widgets Possible.]
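Tracking the lowest and highest possible inventory could look like the sketch below. `WidgetInventory`, its methods, and the convention of signed quantities (positive for inbound, negative for outbound) are hypothetical.

```python
class WidgetInventory:
    """Hypothetical inventory item tracking worst-case bounds."""

    def __init__(self, on_hand):
        self.on_hand = on_hand
        self.tentative = {}   # op id -> signed quantity (+ inbound, - outbound)

    def accept(self, op_id, quantity):
        self.tentative[op_id] = quantity

    def confirm(self, op_id):
        self.on_hand += self.tentative.pop(op_id)

    def cancel(self, op_id):
        self.tentative.pop(op_id)

    def bounds(self):
        """(lowest, highest) possible stock over every confirm/cancel outcome."""
        low = self.on_hand + sum(q for q in self.tentative.values() if q < 0)
        high = self.on_hand + sum(q for q in self.tentative.values() if q > 0)
        return low, high


inv = WidgetInventory(100)
start = inv.bounds()
inv.accept("sale-1", -30)
inv.accept("delivery-1", 50)
widened = inv.bounds()
inv.confirm("sale-1")
inv.cancel("delivery-1")
resolved = inv.bounds()
```

Accepting the two tentative operations widens the bounds from (100, 100) to (70, 150); resolving them brings the bounds back together at (70, 70).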
Slide 39
Acting on Bounded Uncertainty
• Knowing bounds on uncertainty allows many different business rules:
• Refuse an order which may (in the worst case) result in widgets overflowing the warehouse
• Calculate probability of worst case overflowing the warehouse
• Cost of temporary storage vs. value of accepting order…
• Order food for hotel restaurant based on reservations and probabilities
• May result in interesting work by applying risk management algorithms…
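The first business rule, refusing an order that could overflow the warehouse in the worst case, might be sketched as follows. The function name, its parameters, and the numbers are assumptions for illustration only.

```python
def can_accept_delivery(on_hand, tentative_inbound, new_delivery, capacity):
    """Refuse if, in the worst case, every pending delivery arrives and overflows."""
    worst_case = on_hand + sum(tentative_inbound) + new_delivery
    return worst_case <= capacity


ok = can_accept_delivery(
    on_hand=70, tentative_inbound=[50], new_delivery=60, capacity=200
)
too_much = can_accept_delivery(
    on_hand=70, tentative_inbound=[50], new_delivery=90, capacity=200
)
```

This is the most conservative rule; a probabilistic rule would weigh the chance of overflow and the cost of temporary storage against the value of the order.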
Slide 40
Slide 41
Slide 42
Vocabulary and Assertions
New vocabulary for discussing scale
• Scale-agnostic app: An application that does not need to change to support almost-infinite scaling
• Almost-infinite scaling: An environment demanding rapidly increasing data and computation over time
• Item: A collection of data referenced by a single key; transactional scope of the scale-agnostic app
• Partner-State-Machine: Data used inside one item to describe its workflow state with a single partner item
Assertions about large scale apps
• Alternate indices aren’t transactionally consistent: As scale increases, the primary and alternate indices cannot be guaranteed to live together
• Items cooperate using fine-grained two-party workflow: No dist-txs workflow; workflow participants are items; work coordinated across pairs
Takeaways
• Scale agnostic application design
• Designing for scale leads you away from distributed transactions
• Local transactions are great; distributed transactions suck
• Programming for scale leads to separate pieces of data called items
• Items must live in separate transactions
• Items are only connected with messaging
• “Classic” workflow but fine-grained
• Separate items need messaging… but messaging is hard!
• Messages get lost and need retries
• Retries give at-least-once delivery
• Must have idempotent processing of messages
• Coping with idempotent messaging requires “partner-state-machines”
• One PSM per-partner per-side holds the state of the relationship
• The scale-agnostic app uses activities to cope with retries
• PSMs can compose to mask complexity
Slide 43