SMILE: A Data Sharing Platform for Mobile Apps in the Cloud
Mohamed Sarwat (UMN)
Haopeng Zhang (UMass, Amherst)
Jagan Sankaranarayanan, Hakan Hacıgümüş (NEC Labs America)
Motivation for Sharing in the Cloud
• Mobile apps run their databases in the cloud
– Often small databases
– Often hosted in the same cloud infrastructure
– Often need “fresh” data from other apps
• e.g., a calendar app wants the itinerary from an airline booking app
• Need a declarative way for apps to share data
[Figure: Database as a Service. App 1 DB, App 2 DB, …, App n DB are hosted in a multitenant database, together with the Sharing MiddLe warE (SMILE).]
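Declarative Sharing
[Figure: datasets D1, D2, D3 in the CloudDB are combined by a transformation into a shared result.]
Each sharing (S1, S2, …, Sn) specifies:
– Datasets: e.g., D1, D2, D3
– Transformation: select-project-join (SPJ)
– Staleness SLA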
Three Ways of Enabling Sharing
• Direct sharing: App Alice data and App Bob data are connected through SQL
• Sharing via API: App Alice data and App Bob data are connected through a web service API
• Sharing using a materialized shared space (i.e., view): SMILE maintains a materialized shared space between App Alice data and App Bob data, accessed through SQL
Service provider’s cost in keeping shared space consistent
What requirements does materialization satisfy?
Sharing Example: Simple Sharing Scenario
SP = Stock Price
UP = User Portfolio
Sharing (S1):
1. Sources: SP, UP
2. Transformation: πσ(SP ⋈ UP)
3. Staleness: <= 5 seconds
[Figure: SP and UP feed the shared result πσ(SP ⋈ UP), annotated "< 5 seconds".]
Sharing Example (Contd.)
[Figure: three alternative plans for maintaining SP ⋈ UP, built from COPY, JOIN, and DISTRIBUTED JOIN operators. The plans differ in cost and staleness:]
• $, 10 second staleness
• $$, 3 second staleness
• $$$, 1 second staleness
Problem Formulation
• Given n sharings S:
– S = {S1, …, Sn}
– Each sharing specifies a staleness requirement in seconds
• e.g., 5 seconds
• Datasets are relations in an RDBMS
– Updated asynchronously (i.e., independently)
• Goal: Enable all sharings such that
– The MVs used are always consistent
– All MVs are under the staleness SLA
– The cost for the service provider is the cheapest
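The goal above can be restated as a constrained minimization. This is only a sketch of the formulation: the symbols P (the global sharing plan), cost_$ (its dollar cost), time (the critical time path of the part of P that serves sharing S_i), and SLA_i (the staleness requirement of S_i) are notation introduced here, not taken from the slides. The consistency requirement on the MVs is not captured by the inequality and has to hold in addition.

```latex
\begin{aligned}
\min_{P} \quad & \mathrm{cost}_{\$}(P) \\
\text{subject to} \quad & \mathrm{time}(P, S_i) < \mathrm{SLA}_i, \qquad i = 1, \dots, n
\end{aligned}
```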
SMILE System Architecture
[Figure: system architecture. Labels: input sharings, SMILE sharing plan optimizer, sharing plan, gateway, updates, PostgreSQL databases (Postgres 1 to Postgres 4), relation R, log, Capture Delta, ΔR, Copy Delta.]
Sharing Plan Optimizer
• R*-style optimizer
– Varies join ordering and operator placement
– Uses a dynamic programming formulation
• Uses four operators to express SPJ transformations in sharings
– DeltaToRel, Join, Union, CopyDelta
• Two cost models:
– Dollar cost of a plan
– Time cost of a plan
Sharing Plan
[Figure: example sharing plan spanning machines m1, m2, and m3. COPYDELTA operators move ΔA and ΔB between machines, DELTATOREL operators apply deltas to the relations A, B, and A⋈B, two JOIN operators produce Δ(ΔA⋈B) and Δ(A⋈ΔB), and a UNION combines them into Δ(A⋈B).]
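The way the JOIN and UNION operators combine deltas can be illustrated with a small sketch. It assumes insert-only deltas, set semantics, and both deltas applied at the same instant; the asynchronous case that SMILE targets needs additional machinery that is not shown here. The function names `join` and `delta_join` are hypothetical and not part of SMILE.

```python
def join(left, right):
    """Equi-join two relations of (key, value) tuples on the key."""
    return {(k, a, b) for (k, a) in left for (k2, b) in right if k == k2}


def delta_join(a_old, delta_a, b_old, delta_b):
    """Return the new tuples of A⋈B caused by inserting ΔA and ΔB.

    Assumption: insert-only deltas. ΔA is joined with the updated B,
    and ΔB is joined with the old A, so no result tuple is missed.
    """
    b_new = b_old | delta_b
    from_delta_a = join(delta_a, b_new)   # corresponds to Δ(ΔA⋈B)
    from_delta_b = join(a_old, delta_b)   # corresponds to Δ(A⋈ΔB)
    return from_delta_a | from_delta_b    # UNION


A = {(1, "AAPL"), (2, "MSFT")}
B = {(1, "alice")}
dA = {(3, "NEC")}
dB = {(2, "bob"), (3, "carol")}

view = join(A, B)
view |= delta_join(A, dA, B, dB)          # DeltaToRel: apply the delta to the MV

# The incrementally maintained view matches a full recomputation.
assert view == join(A | dA, B | dB)
```

The point of the decomposition is that only the deltas, which are usually much smaller than the relations, have to be copied between machines.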
Cost Models: Dollar and Time
• Dollar cost is the expense to the provider of executing the sharing plan, in $/second
– Uses Amazon EC2 pricing
• Time cost is the critical data path time, in seconds
– Uses a synthetic time model for each operator type
[Figure: plot with axes labeled staleness and $.]
Time Cost Model
[Figure: time cost plots for the CopyDelta, DeltaToRel, Join, and Union operators.]
We use a simple linear cost model to estimate the time taken by each operator
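A linear per-operator model and the resulting critical time path might look like the sketch below. The coefficients are made-up placeholders, not measurements from the talk, and the plan encoding (a dict from vertex name to operator type, tuple count, and input vertices) is an assumption chosen for illustration.

```python
# Hypothetical coefficients: seconds = fixed + per_tuple * tuples
LINEAR_COEFFS = {
    "CopyDelta": (0.5, 0.001),
    "DeltaToRel": (0.2, 0.0005),
    "Join": (1.0, 0.002),
    "Union": (0.1, 0.0001),
}


def operator_time(op_type, tuples):
    """Estimated seconds for one operator under the linear model."""
    fixed, per_tuple = LINEAR_COEFFS[op_type]
    return fixed + per_tuple * tuples


def critical_time_path(plan, sink):
    """Longest source-to-sink time in a plan DAG.

    plan maps vertex -> (op_type, tuples, list of input vertices).
    """
    memo = {}

    def finish(vertex):
        if vertex not in memo:
            op_type, tuples, inputs = plan[vertex]
            slowest_input = max((finish(u) for u in inputs), default=0.0)
            memo[vertex] = slowest_input + operator_time(op_type, tuples)
        return memo[vertex]

    return finish(sink)


example_plan = {
    "copy_a": ("CopyDelta", 1000, []),
    "copy_b": ("CopyDelta", 2000, []),
    "join": ("Join", 500, ["copy_a", "copy_b"]),
    "apply": ("DeltaToRel", 500, ["join"]),
}
total_seconds = critical_time_path(example_plan, "apply")
```

Here the slower of the two copies (`copy_b`) determines when the join can start, which is what makes the path through it the critical one.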
Generating Global Sharing Plan
• Input: Set of n sharings
• Step 1: For each sharing, generate a sharing plan so that:
– Plan is admissible
• Means that its critical time path is less than the staleness SLA
– Generate two plans
• DPD: Cheapest dollar cost plan
• DPT: Smallest critical time path plan
– Discard plans that are not admissible; choose DPD if both are admissible
• Step 2: Make the plan cheaper by merging commonalities with other sharing plans, in the style of multi-query optimization
– We call the merging operation “plumbing”
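The selection rule in Step 1 can be sketched as follows. The `Plan` record and the function names are hypothetical; only the rule itself (admissible means critical time path below the SLA, prefer DPD when it is admissible) comes from the slide. Returning `None` when neither plan is admissible is an assumption about how an unsatisfiable sharing would be reported.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Plan:
    name: str
    dollar_cost: float    # $/second
    critical_time: float  # seconds


def is_admissible(plan: Plan, sla_seconds: float) -> bool:
    """A plan is admissible if its critical time path is below the SLA."""
    return plan.critical_time < sla_seconds


def choose_plan(dpd: Plan, dpt: Plan, sla_seconds: float) -> Optional[Plan]:
    """Prefer the cheapest plan (DPD); fall back to the fastest (DPT)."""
    if is_admissible(dpd, sla_seconds):
        return dpd
    if is_admissible(dpt, sla_seconds):
        return dpt
    return None  # assumption: no admissible plan exists for this sharing


dpd = Plan("DPD", dollar_cost=0.01, critical_time=8.0)
dpt = Plan("DPT", dollar_cost=0.05, critical_time=2.0)
```

With these example numbers, a 10 second SLA selects DPD, a 5 second SLA forces the more expensive DPT, and a 1 second SLA cannot be met by either plan.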
Plumbing Operation
[Figure: plumbing operation. Labels: SRC(pi), DST(pi), pi, JOIN, COPYDELTA; the duplicated operators are marked "Remove".]
Plumbing increases the critical time path of the left plan, so it is valid only as long as the left plan stays under its staleness SLA
Perform plumbing greedily, one operation at a time, starting with the one that yields the most cost savings
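A simplified sketch of the greedy loop is shown below. It assumes each candidate plumbing operation has a fixed dollar saving and adds a fixed delay to one plan; in a real optimizer the savings of the remaining candidates may change after each merge, which this sketch ignores. All names and the candidate encoding are hypothetical.

```python
def greedy_plumbing(critical_time, sla, candidates):
    """Apply plumbing candidates in order of decreasing savings.

    critical_time, sla: dict of plan id -> seconds.
    candidates: list of (savings, plan_id, added_time) tuples, where
    added_time is the delay the merge adds to that plan's critical path.
    Returns the applied candidates and the updated critical times.
    """
    time_now = dict(critical_time)
    applied = []
    for candidate in sorted(candidates, key=lambda c: c[0], reverse=True):
        savings, plan_id, added_time = candidate
        if savings <= 0:
            break  # remaining candidates save nothing
        # Valid only if the plan stays under its staleness SLA.
        if time_now[plan_id] + added_time < sla[plan_id]:
            time_now[plan_id] += added_time
            applied.append(candidate)
    return applied, time_now


critical_time = {"S1": 3.0, "S2": 8.0}
sla = {"S1": 5.0, "S2": 10.0}
candidates = [(0.20, "S1", 1.0), (0.30, "S1", 1.5), (0.10, "S2", 1.0)]
applied, new_times = greedy_plumbing(critical_time, sla, candidates)
```

In this example the largest saving is applied first and uses up most of the slack in S1, so the second candidate for S1 is rejected while the one for S2 still fits.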
SMILE System Architecture
[Figure: system architecture extended with the sharing executor. Labels: sharings, SMILE sharing plan optimizer, sharing plan, sharing executor, pub/sub, push, heartbeat, agents, gateway, updates, PostgreSQL databases (Postgres 1 to Postgres 4), relation R, log, Capture Delta, ΔR, Copy Delta.]
Sharing Executor
• Accounts for runtime variations in the system
– Change in the input update rate
– Machine or resource contention or unavailability
• Obtains the current timestamp of vertices and issues a “push” operation
– Push operation specifies how much to “synchronously” advance the timestamp of each vertex in the sharing plan
– Tries to combine work as much as possible
• Uses a feedback loop to automatically account for runtime variations
Staleness and Push
• Current STALENESS = MAX_TS(SRCS) - TS(DEST)
• PUSH: How much to advance TS(DEST)?
– Cannot be more than MIN_TS(SRCS) - TS(DEST)
• See the paper for a sharing executor that is lazy by design and refreshes MVs just as it is about to miss the staleness SLA
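The two formulas above translate directly into code. This sketch assumes timestamps are plain numbers in seconds and that the sources of a destination are given as a list; the function names are hypothetical, and the lazy scheduling policy from the paper is not reproduced here.

```python
def staleness(src_timestamps, dest_timestamp):
    """Current staleness: newest source timestamp minus destination timestamp."""
    return max(src_timestamps) - dest_timestamp


def max_push(src_timestamps, dest_timestamp):
    """Upper bound on how far a push may advance the destination timestamp."""
    return min(src_timestamps) - dest_timestamp


def violates_sla(src_timestamps, dest_timestamp, sla_seconds):
    """True if the destination is currently staler than its SLA allows."""
    return staleness(src_timestamps, dest_timestamp) > sla_seconds


sources = [100, 104, 97]   # timestamps of the source vertices
dest = 90                  # timestamp of the destination MV
```

With these numbers the staleness is 14 seconds, but a push can advance the destination by at most 7 seconds, because the slowest source has only reached timestamp 97.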
Experiments
• Twitter GardenHose stream
• 6 machines
– One machine generates updates and hosts base relations
– 5 machines for hosting sharing plan operators
• Rate: 50 to 10k tweets/sec
• Sharings: 5 to 50 sharings
• SLA: 10 to 60 seconds
Base relations
• Unpack incoming Tweets into 9 base relations
Sharing Arrangements
• 25 sharing arrangements as SPJ transformations on base relations
Sharing Plan
25 Sharings, 6 machines
Staleness: 10k tweets/sec, 25 Sharings
Why Do Some Sharings Have a Large Gap?
Tuples Moved across Sharing Plan
Staleness before vs. after push
For Varying Update Rates
• SLA violation is low even for large update rates
Actual Running Cost
Related Work
• View maintenance
• View selection
• Cache placement
• Data quality/staleness
• Data integration
• Distributed databases
• Multi-query optimization
• Other data sharing efforts
View Maintenance
• When sources are not always at a consistent snapshot
– Need to use compensation [Zhuge et al., SIGMOD 1995]
– Rolling join [Salem et al., SIGMOD 2000]
• Shows how to compose n-way asynchronous propagation queries
• Sharing plan is based on this work
• How to reduce maintenance cost?
– Merge common sub-expressions in the update mechanism of different MVs to reduce cost [Ross et al., SIGMOD 1996] [Mistry et al., SIGMOD 2001]
• Staleness in data warehouse setup:
– [Labrinidis et al., UMD CS TR, 1998]
Summary
• SMILE is a declarative data sharing platform in the cloud
• Sharings can specify a transformation and a staleness SLA
• SMILE uses both static and runtime optimizations
• Experimental results show that it can handle high update rates and a large number of sharings