class 18 updates 2 - harvard seasdaslab.seas.harvard.edu/.../cs165fall2016class18.pdf ·...
TRANSCRIPT
updates 2.0prof. Stratos Idreos
HTTP://DASLAB.SEAS.HARVARD.EDU/CLASSES/CS165/
class 18
/40CS165, Fall 2016 Stratos Idreos 2
updates
UPDATE table_name SET column1=value1,column2=value2,... WHERE some_column=some_value
INSERT INTO table_name VALUES (value1,value2,value3,...)
/40CS165, Fall 2016 Stratos Idreos 3
traditional applications e.g., banking
how many times per day do you send update queries to your bank account
/40CS165, Fall 2016 Stratos Idreos 4
the world has changed a little bit by now…
updates
/40CS165, Fall 2016 Stratos Idreos 5
still we spy Facebook more than the # of photos we upload or # of our twitter posts, etc…
/40CS165, Fall 2016 Stratos Idreos 6
which kind of update is more common
update, insert, delete
/40CS165, Fall 2016 Stratos Idreos 6
which kind of update is more common
update, insert, delete
so our new challenge is: reads and inserts + variable read/write ratio
/40CS165, Fall 2016 Stratos Idreos 7
monitor CPU utilization
monitor memory hierarchy utilization
monitor clicks (frequency, locations, specific links, sequences)
/40CS165, Fall 2016 Stratos Idreos 8
insert new entry (a,b,c,d,…) on table x
table xA B C D
…
update N columns, K trees, statistics, …
A
B
D
/40CS165, Fall 2016 Stratos Idreos 9
what can go wrong?failures & >1 queries
/40CS165, Fall 2016 Stratos Idreos 10
transaction any database query that can/should be seen as a single task
parse optimize
plan( read X write Z … )
/40CS165, Fall 2016 Stratos Idreos 11
AtomicityConsistencyIsolationDurability
Jim Gray, IBM, Tandem, DEC, Microsoft ACM Turing award ACM SIGMOD Edgar F. Codd Innovations award
/40CS165, Fall 2016 Stratos Idreos 12
AtomicityConsistencyIsolationDurability
>1 queries
all or nothing
correct
persistent
/40CS165, Fall 2016 Stratos Idreos 13
update all rows where A=v1 & B=v2
to (a=a/2,b=b/4,c=c-3,d=d+2)
/40CS165, Fall 2016 Stratos Idreos 13
A B C Dsearch (scan/index) to find row to update
select+project actions
update all rows where A=v1 & B=v2
to (a=a/2,b=b/4,c=c-3,d=d+2)
/40CS165, Fall 2016 Stratos Idreos 13
A B C Dsearch (scan/index) to find row to update
select+project actions
list of rowIDs (positions)
A B C D
update all rows where A=v1 & B=v2
to (a=a/2,b=b/4,c=c-3,d=d+2)
(sort)
/40CS165, Fall 2016 Stratos Idreos 13
A B C Dsearch (scan/index) to find row to update
select+project actions
list of rowIDs (positions)
A B C D
update all rows where A=v1 & B=v2
to (a=a/2,b=b/4,c=c-3,d=d+2)
we know what to update but nothing happened yet
(sort)
/40CS165, Fall 2016 Stratos Idreos 14
level 1
CPU
A B C Dlevel 2
read page in L1 update
persist to L2
if problem (power/abort) before we write all pages
we are left with an inconsistent state
update all rows where A=v1 & B=v2 to (a=a/2,b=b/4,c=c-3,d=d+2)
/40CS165, Fall 2016 Stratos Idreos 15
recoveryclassic examplejoe owes mike 100$
both joe and mike have a Bank of Bla account
possible actionsjoe -100 mike + 100
mike + 100 joe - 100
what if there is a failure here?
/40CS165, Fall 2016 Stratos Idreos 15
recoveryclassic examplejoe owes mike 100$
both joe and mike have a Bank of Bla account
possible actionsjoe -100 mike + 100
mike + 100 joe - 100
what if there is a failure here?
actually1)read mike; 2) mike+100; 3) write new mike;
/40CS165, Fall 2016 Stratos Idreos 16
level 1
CPU
A B C Dlevel 2
read page in L1 update
persist to L2
update all rows where A=v1 & B=v2 to (a=a/2,b=b/4,c=c-3,d=d+2)
what do we need to remember (log)
WAL: keep persistent notes as we go so we can resume or undo
/40CS165, Fall 2016 Stratos Idreos 17
A B C Dsearch (scan/index) to find row to update
select+project actions
list of rowIDs (positions)
A B C D
update all rows where A=v1 & B=v2
to (a=a/2,b=b/4,c=c-3,d=d+2)
(sort)
restart: get set of positions again, and either resume from last written page or undo all previously written pages
/40CS165, Fall 2016 Stratos Idreos 18
level 1
CPU
A B C Dlevel 2
read page in L1 update
persist to L2
update all rows where A=v1 & B=v2 to (a=a/2,b=b/4,c=c-3,d=d+2)
(update in place/out of place) problems when failure
/40CS165, Fall 2016 Stratos Idreos 19
memory
disk
buffer updateslog
memory
disk
buffer updates
log
SSD
memory
SSD
buffer updates
log
non volatile memory
disk
/40CS165, Fall 2016 Stratos Idreos 20
ARIES: A Transaction Recovery Method Supporting Fine-Granularity Locking and Partial Rollbacks Using Write-Ahead LoggingC. Mohan, Donald J. Haderle, Bruce G. Lindsay, Hamid Pirahesh, Peter M. Schwarz ACM Transactions on Database Systems, 1992
C Mohan, IBM Research ACM SIGMOD Edgar F. Codd Innovations award 1993
(rumor has it he has his own Facebook server) (also check his noSQL lecture)
/40CS165, Fall 2016 Stratos Idreos 21
query 1
parse optimize
plan( read X write Z … )
query 2
parse optimize
plan( read Z write Z … )
/40CS165, Fall 2016 Stratos Idreos 22
locks
… read X …
… get_rlock(X) read X release_rlock(X) …
… write X …
… get_wlock(X) write X release_wlock(X) … commit
R WR yes noW no no
wait until lock is free
/40CS165, Fall 2016 Stratos Idreos 23
lock granularity
…
databasetable1 table2
…
C1 C2
A survey of B-tree logging and recovery techniquesGoetz Graefe ACM Trans. Database Systems, 2012
/40CS165, Fall 2016 Stratos Idreos 24
2 phase locking1. get all locks
2. do all tasks
3. commit
4. release all locks
(and variations)
be positive! - optimistic concurrency control
HT
/40CS165, Fall 2016 Stratos Idreos 24
2 phase locking1. get all locks
2. do all tasks
3. commit
4. release all locks
(and variations)
be positive! - optimistic concurrency control
HT
3 sections about ACID properties, logging, locking and recovery
online next few days
/40CS165, Fall 2016 Stratos Idreos 25
classic examplejoe owes mike 100$
both joe and mike have a Bank of Bla accountΧwhat about logging and recovery during e-shopping
e.g., shopping cart, wish list, check out
recovery
Quantifying eventual consistency with PBSPeter Bailis, Shivaram Venkataraman, Michael J. Franklin, Joseph M. Hellerstein, Ion Stoica Communications of the ACM, 2014
/40CS165, Fall 2016 Stratos Idreos 26
why noSQLdb noSQL
as apps become more complex
as apps need to be more scalable
complex legacy tuning
expensive …
simple clean
just enough …
newSQL
Mesa: Geo-Replicated, Near Real-Time, Scalable Data WarehousingAshish Gupta et al. International Conference on Very Large Databases (VLDB), 2014
F1: A Distributed SQL Database That ScalesJeff Shute et al. International Conference on Very Large Databases (VLDB), 2013
(more in 265)
/40CS165, Fall 2016 Stratos Idreos 27
remember: it all starts with how we store the data
IDEAL
a data structure that is best for reads, best for writes and does not require too much metadata
/40CS165, Fall 2016 Stratos Idreos 28
Reads
UpdatesMemory
Designing Access Methods: The RUM ConjectureM. Athanassoulis, M. Kester, L. Maas, R. Stoica, Radu, S. Idreos, A. Ailamaki, M Callaghan International Conference on Extending Database Technology (EDBT), 2016
you can’t always get what you want
/33CS165, Fall 2016 Stratos Idreos 29
a perfect access method for reads (point queries)
oracle
x
find(x)
/33CS165, Fall 2016 Stratos Idreos 29
a perfect access method for reads (point queries)
oracle
x
find(x)reads
updates
memory
/33CS165, Fall 2016 Stratos Idreos 29
a perfect access method for reads (point queries)
oracle
x
find(x)reads
updates
memory
/33CS165, Fall 2016 Stratos Idreos 29
a perfect access method for reads (point queries)
oracle
x
find(x)reads
updates
memory
/33CS165, Fall 2016 Stratos Idreos 29
a perfect access method for reads (point queries)
oracle
x
find(x)reads
updates
memory
/33CS165, Fall 2016 Stratos Idreos 30
a perfect access method for reads (point queries)
binary search to find(x)
but with no memory overhead
sorted
/33CS165, Fall 2016 Stratos Idreos 30
a perfect access method for reads (point queries)
binary search to find(x)
reads
updates
memory
but with no memory overhead
sorted
/33CS165, Fall 2016 Stratos Idreos 30
a perfect access method for reads (point queries)
binary search to find(x)
reads
updates
memory
but with no memory overhead
sorted
/33CS165, Fall 2016 Stratos Idreos 30
a perfect access method for reads (point queries)
binary search to find(x)
reads
updates
memory
but with no memory overhead
sorted
/33CS165, Fall 2016 Stratos Idreos 30
a perfect access method for reads (point queries)
binary search to find(x)
reads
updates
memory
but with no memory overhead
sorted
/33CS165, Fall 2016 Stratos Idreos 31
a perfect access method for writes (point writes)
x
update(x)
x xupdate log
/33CS165, Fall 2016 Stratos Idreos 31
a perfect access method for writes (point writes)
x
update(x)
reads
updates
memoryx x
update log
/33CS165, Fall 2016 Stratos Idreos 31
a perfect access method for writes (point writes)
x
update(x)
reads
updates
memoryx x
update log
/33CS165, Fall 2016 Stratos Idreos 31
a perfect access method for writes (point writes)
x
update(x)
reads
updates
memoryx x
update log
/33CS165, Fall 2016 Stratos Idreos 31
a perfect access method for writes (point writes)
x
update(x)
reads
updates
memoryx x
update log
/33CS165, Fall 2016 Stratos Idreos 32
to structure or not to structure
sorted fixed-width & dense
no order fixed-width & dense
sorted fixed-width with holes
insert v, delete v, update v to v’
/40CS165, Fall 2016 Stratos Idreos 33
The Transaction Concept:Virtues and LimitationsJim Gray, Tandem TR 81.3, 1981
in-place updates: the cardinal sin
A(t1) A(t2) A(t3)
…
/40CS165, Fall 2016 Stratos Idreos 34
content vs structure update
insert tuple(a1, b1, c1, …) insert(A,a1), insert(B, b1), …
say there is a secondary index on A
(1) append a1 anywhere to index (any node/buffer) (2) reorganize index to maintain structure
…
/40CS165, Fall 2016 Stratos Idreos 35
great for inserts/updates great for “recent” reads
optimize as much as possible querying “old” data
give read/write cost put(key, value) - get(key, value)
design a key-value store (265 systems project)
/40CS165, Fall 2016 Stratos Idreos 36
L1
L2
all writes go to a small buffer so updates are fast and
reads can find recent updates fast
data is ordered on ingestion order and no in place updates
getting old data is a full scan and we may have duplicates to clean up
/40CS165, Fall 2016 Stratos Idreos 37
L1
L2multi-level L2 structure
push to next level when full remove duplicates with every merge
/40CS165, Fall 2016 Stratos Idreos 38
L1
L2to improve reads:
each level may be sorted, it may be a b-tree
have a bloom filter, etc
worst case scenario: have to scan the whole thing: empty queries
/40CS165, Fall 2016 Stratos Idreos 38
L1
L2to improve reads:
each level may be sorted, it may be a b-tree
have a bloom filter, etc
worst case scenario: have to scan the whole thing: empty queries
/40CS165, Fall 2016 Stratos Idreos 38
L1
L2to improve reads:
each level may be sorted, it may be a b-tree
have a bloom filter, etc
worst case scenario: have to scan the whole thing: empty queriesmore about this design space in 265
/40CS165, Fall 2016 Stratos Idreos 39
textbook: chapters 16, 17, 18
DATA SYSTEMSprof. Stratos Idreos
class 18
updates 2.0