Practical PostgreSQL at myYearbook.com
[email protected]

CS 500 Database Theory
Department of Computer Science
Drexel University
Thursday, August 5, 2010


· myYearbook.com
· mind/body
· failure
· myb topology
· myb ecology
· measurement
· admin/maintenance

ecology: proxy, pool, cache, queue
questions any time

myYearbook.com


by gender
52% female
48% male

by age
37% 13–17
27% 18–24
13% 25–34
12% 35–44
11% 45+

myYearbook.com: casual social network, founded in 2006
source: Google Analytics

myYearbook Teens and Twitter Survey (August 2009, ~%)

myYearbook                          Facebook
70% meet people                     80% keep in touch with friends
50% play games                      35% meet people
40% keep in touch with friends      30% share photos
35% flirt/date                      25% play games

comScore Teens Category (April 2009)

rank  site                 visits (K)  uniques (K)  minutes (M)  page views (M)
1     myYearbook               55,808        4,604          851           1,630
2     MEEZ                      8,629        1,407          226             469
3     Zwinky                   11,558        3,691          153             108
4     Hearst Teen Network       6,940        2,314           53             117
5     Quizilla                  7,924        2,058           75              75

comScore page views, July 2009

rank  site         views (M)
20    GaiaOnline       1,105
21    Chase            1,056
22    ESPN               984
23    myYearbook         953
24    Wikipedia          903
25    Onemanga           840
26    Mapquest           833
27    Foxsports          815

comScore time spent, July 2009

rank  site         minutes (M)
20    Amazon               806
21    CNN                  744
22    GaiaOnline           709
23    Bing                 707
24    MSNBC                691
25    myYearbook           678
26    Iwin                 670
27    NickJr               665

mind/body


relational theory
values & operators

normalization
prevent update anomalies

logical

I do declare!

logical
versus
physical

Hive: SQL over Hadoop
MonetDB: column store
Truviso: streaming data

planner
compiler

thought
word
deed

EXPLAIN

denormalization/materialization

natural
versus
surrogate

surrogate keys:
+ size on disk, fast comparison, don’t need ON UPDATE CASCADE
- additional lookups

Relative data access latencies

            1  CPU register
          1–2  L1 cache
         6–10  L2 cache
        25–50  main memory (RAM)
   10,000,000  hard disk
  100,000,000  LAN
1,000,000,000  WAN

hard disk 1E7, LAN 1E7-1E8, WAN 1E9-2E9
source: http://www.slideshare.net/guest22d4179/latency-trumps-all
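To make the ratios in the latency table easier to feel, here is a minimal sketch that rescales them so one CPU register access takes one second. The figures come from the table (upper bound of each range); the `humanize` helper and its unit cutoffs are illustrative assumptions, not part of the talk.

```python
# Relative latencies from the slide (upper bound of each range).
RELATIVE_LATENCY = {
    "CPU register": 1,
    "L1 cache": 2,
    "L2 cache": 10,
    "main memory (RAM)": 50,
    "hard disk": 10_000_000,
    "LAN": 100_000_000,
    "WAN": 1_000_000_000,
}

# Hypothetical helper: express a number of seconds in the largest fitting unit.
UNITS = (("years", 365 * 86400), ("days", 86400), ("hours", 3600), ("minutes", 60))


def humanize(seconds):
    for unit, size in UNITS:
        if seconds >= size:
            return f"{seconds / size:.1f} {unit}"
    return f"{seconds} seconds"


def scaled_table():
    """Pretend a register access takes 1 second and scale everything else."""
    return {name: humanize(ratio) for name, ratio in RELATIVE_LATENCY.items()}


if __name__ == "__main__":
    for name, human in scaled_table().items():
        print(f"{name:>18}: {human}")
```

On that scale a RAM access is under a minute, while a single disk access is months and a WAN round trip is decades, which is the argument for keeping the working set in memory.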

[diagram labels: main memory, disk, kernel buffer, postgres, postgres buffers, temp, data, WAL]

persistence

Fusion-io
solid state cards
http://www.fusionio.com/
Steve Wozniak is Chief Scientist

[diagram: Fusion-io, direct IO versus buffered IO. Labels: Postgres buffer, kernel buffer, application buffer, CPU, Fusion-io card]

SSD: solid state drives
look promising

DAS: direct attached storage
SAN: storage area network

fsync

checkpoint

MVCC

rollback optimized
VACUUM

VLAMF
vacuum like a mo-fo
no autovacuum: scheduled for predictability
autovacuum is good for general case, more difficult at scale

reindexing

failure

failure happens
plan for it

backup
warm standby

pg_dump
disk IO
network IO

active
[diagram: two columns, active and recovery. Active: postgres writes (1) WAL, (2) data. Recovery: postgres applies WAL to data]

recovery
[diagram: the same two columns as the previous slide]

warm standby
[diagram: two columns, active and standby. Active: postgres writes (1) WAL, (2) data. The WAL is copied to the standby, where postgres in recovery applies it to data]

log shipping

SAN
for warm standby

[diagram: warm standby on a SAN. Labels: postgres 1–3, each with log shipping; VM with warm standby 1–3; SAN with data 1–3; postgres spare]

[diagram: the same layout during failover. Log shipping is shown for postgres 1 and postgres 2 only; "failover" is marked at the postgres spare]

Postgres performs single IO writes

kernel IO scheduler can’t schedule writes

writes queued in kernel

HA: high availability
automate failover
equivalent standby hardware
Red Hat Cluster, Steeleye, Veritas

financial companies
If it’s not automated, it’s wrong.

myb topology

[diagram: app-server 1–3, postgres 1, postgres 2]

more activity? views/month: 2007 100M ➙ 2009 1.5G
more TPS
more servers
more connections
more configuration
more pain!

Skype
PL/Proxy
PgBouncer

PL/Proxy

[diagram: app-server 1–3, PL/Proxy, postgres 1, postgres 2]
less configuration
fewer connections

CREATE FUNCTION pref.set_member_preference(in_member_id BIGINT,
                                           in_preference_name TEXT,
                                           in_preference_value_name TEXT)
RETURNS BOOLEAN
LANGUAGE plpgsql STRICT VOLATILE AS $$
DECLARE
  v_preference_id pref.preferences.preference_id%TYPE
    := pref.preference_id(in_preference_name);
  v_preference_value_id pref.preference_values.preference_value_id%TYPE
    := pref.preference_value_id(v_preference_id, in_preference_value_name);
  v_did_update BOOLEAN DEFAULT FALSE;
  v_did_loop BOOLEAN DEFAULT FALSE;
BEGIN
  << upsert >>
  LOOP
    -- try the update first
    UPDATE pref.member_preferences
       SET preference_value_id = v_preference_value_id
     WHERE (member_id, preference_id) = (in_member_id, v_preference_id);
    v_did_update := FOUND;
    EXIT upsert WHEN v_did_update OR v_did_loop;
    -- no row yet: insert, and retry the update if another session got there first
    BEGIN
      INSERT INTO pref.member_preferences
        (member_id, preference_id, preference_value_id)
      VALUES (in_member_id, v_preference_id, v_preference_value_id);
      EXIT upsert WHEN FOUND;
    EXCEPTION
      WHEN unique_violation THEN
        v_did_loop := TRUE; -- loop to update
    END;
  END LOOP upsert;
  RETURN v_did_update;
END;
$$;

CREATE FUNCTION pref.member_preference(in_member_id BIGINT,
                                       in_preference_name TEXT,
                                       OUT preference_value_name TEXT)
RETURNS TEXT -- must match the type of the single OUT parameter
LANGUAGE plpgsql STRICT STABLE AS $$
DECLARE
  v_preference_id pref.preferences.preference_id%TYPE
    := pref.preference_id(in_preference_name);
BEGIN
  SELECT INTO preference_value_name
         pref.preference_value_name(v_preference_id, mp.preference_value_id)
    FROM pref.member_preferences AS mp
   WHERE (mp.member_id, mp.preference_id) = (in_member_id, v_preference_id);

  -- fall back to the default when the member has not set this preference
  IF NOT FOUND THEN
    preference_value_name := pref.default_preference_value(v_preference_id, in_preference_name);
  END IF;

  RETURN;
END;
$$;

CREATE FUNCTION pref.member_preference(in_member_id BIGINT,
                                       in_preference_name TEXT)
RETURNS TEXT
LANGUAGE plproxy STRICT VOLATILE AS $$
  CLUSTER 'pref';
  RUN ON pref.partition_by_member_id(in_member_id);
$$;

CREATE FUNCTION pref.set_member_preference(in_member_id BIGINT,
                                           in_preference_name TEXT,
                                           in_preference_value_name TEXT)
RETURNS BOOLEAN
LANGUAGE plproxy STRICT VOLATILE AS $$
  CLUSTER 'pref';
  RUN ON pref.partition_by_member_id(in_member_id);
$$;

SELECT get_cluster_partitions FROM plproxy.get_cluster_partitions('pref');
              get_cluster_partitions
---------------------------------------------------
 host=10.10.10.10 port=6543 dbname=pref01 user=web
 host=10.10.10.10 port=6543 dbname=pref02 user=web
 host=10.10.10.10 port=6543 dbname=pref03 user=web
 host=10.10.10.10 port=6543 dbname=pref04 user=web
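The RUN ON clause above sends each call to one partition chosen from an integer hash. A rough sketch of that routing, outside the database: the body of `pref.partition_by_member_id` is not shown in the slides, so the CRC32 hash below is a hypothetical stand-in, and the bitmask mapping assumes PL/Proxy's rule that the partition count is a power of two.

```python
import zlib

# Connection strings as listed by plproxy.get_cluster_partitions('pref').
PARTITIONS = [
    "host=10.10.10.10 port=6543 dbname=pref01 user=web",
    "host=10.10.10.10 port=6543 dbname=pref02 user=web",
    "host=10.10.10.10 port=6543 dbname=pref03 user=web",
    "host=10.10.10.10 port=6543 dbname=pref04 user=web",
]


def partition_by_member_id(member_id):
    """Hypothetical stand-in for pref.partition_by_member_id: a stable int hash."""
    return zlib.crc32(str(member_id).encode("ascii"))


def route(member_id, partitions=PARTITIONS):
    """Pick the partition for a member by masking the hash with (count - 1)."""
    count = len(partitions)
    if count == 0 or count & (count - 1):
        raise ValueError("partition count must be a power of two")
    return partitions[partition_by_member_id(member_id) & (count - 1)]


if __name__ == "__main__":
    for member_id in (1, 2, 3, 4, 5):
        print(member_id, "->", route(member_id))
```

Because the hash is stable, every call for the same member lands on the same database, so the application only needs to know the proxy function's signature, not the partition layout.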

[diagram: app-server 1–3, postgres 1, postgres 2–4]
fewer connections!

connection overhead
1 process per backend
lock management

[diagram: app server 1–n, postgres]

[diagram: app server 1–n, pgbouncer, postgres]
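The point of putting a pooler between the app servers and postgres is that many clients share a small, fixed set of server connections. The toy model below illustrates only that idea; it is not how PgBouncer is implemented, and `TinyPool`, `simulate`, and the connection names are made up for the example.

```python
import queue
import threading


class TinyPool:
    """Toy pool: hands out one of `size` server connections per unit of work."""

    def __init__(self, size):
        self._free = queue.Queue()
        for i in range(size):
            self._free.put(f"server-conn-{i}")

    def run(self, work):
        conn = self._free.get()      # blocks until a server connection is free
        try:
            return work(conn)
        finally:
            self._free.put(conn)     # hand the connection to the next client


def simulate(clients=50, pool_size=4):
    """Run `clients` concurrent clients; return the server connections used."""
    pool = TinyPool(pool_size)
    used = set()
    lock = threading.Lock()

    def client():
        def work(conn):
            with lock:
                used.add(conn)
        pool.run(work)

    threads = [threading.Thread(target=client) for _ in range(clients)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return used


if __name__ == "__main__":
    print(len(simulate()), "server connections served 50 clients")
```

However many clients arrive, postgres sees at most `pool_size` backends, which is what keeps per-connection process and lock overhead bounded.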

[diagram: app-server 1–3, pgbouncer 1, postgres 1, postgres 2–4]

[diagram: app-server 1–3, pgbouncer 1, postgres 1 with internal pgbouncer, postgres 2–4]

[diagram: app-server 1–3, pgbouncer 1, pgproxy, internal pgbouncer, postgres 1–4]
reduce TPS/server
less configuration

[diagram: app-server 1–n, each with its own app-pgbouncer 1–n; load balancer; external pgbouncer tier (pgbouncer 1–3); pgproxy; internal pgbouncer; postgres 1–n]

[diagram: as above, with an internal pgbouncer tier (load balancer, pgbouncer 4–5) in place of the single internal pgbouncer]

[diagram: as above, with pgproxy 2 added]

[diagram: as above, with pgproxy 1–3]

Fall 2009
28 servers, avg 90% idle
464 cores
3.3 TB memory
3.8 TB on disk
35 TB total disk
15K avg TPS (> 27K)

sharding
inter and intra server

roll off old data
log.events_201005
log.events_201006
log.events_201007
log.events_201008
log.events_201009

truncate, drop is better than delete
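With one table per month, rolling off old data becomes a cheap DROP TABLE instead of a large DELETE. A minimal sketch of the bookkeeping, assuming the `log.events_YYYYMM` naming shown on the slide; the function names and the retention cutoff are hypothetical.

```python
from datetime import date


def month_partitions(first, last, prefix="log.events_"):
    """Names of the monthly tables from `first` through `last`, inclusive."""
    names = []
    year, month = first.year, first.month
    while (year, month) <= (last.year, last.month):
        names.append(f"{prefix}{year:04d}{month:02d}")
        year, month = (year + 1, 1) if month == 12 else (year, month + 1)
    return names


def roll_off(existing, keep_from):
    """DROP statements for every monthly table older than `keep_from`."""
    cutoff = f"{keep_from.year:04d}{keep_from.month:02d}"
    return [
        f"DROP TABLE {table};"
        for table in sorted(existing)
        if table.rsplit("_", 1)[1] < cutoff   # YYYYMM compares correctly as text
    ]


if __name__ == "__main__":
    tables = month_partitions(date(2010, 5, 1), date(2010, 9, 1))
    for statement in roll_off(tables, keep_from=date(2010, 7, 1)):
        print(statement)
```

Dropping a whole table leaves no dead rows behind, so there is nothing for VACUUM to clean up afterwards.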

connection pooling
➙ PgBouncer

simplify interface
➙ PL/Proxy
function API

myb ecology

reduce TPS!
➙ memcached

1 TB
get 140K/s, set 15K/s
PgFouine

memcached
set
get
clear

[diagram: application, memcached, postgres. (1) get from memcached, (2) get from postgres, (3) set in memcached]
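The numbered steps in the memcached diagram (get from cache, get from postgres, set in cache) can be sketched as follows. A plain dict stands in for memcached and a callable stands in for the postgres query; both, along with the class name, are assumptions made for the example.

```python
class CacheAside:
    """Read-through helper following the slide's 1 get, 2 get, 3 set steps."""

    def __init__(self, load_from_db):
        self._cache = {}            # stand-in for memcached
        self._load = load_from_db   # stand-in for a postgres query
        self.db_reads = 0           # how often we had to go to the database

    def get(self, key):
        if key in self._cache:      # 1: get from the cache
            return self._cache[key]
        self.db_reads += 1
        value = self._load(key)     # 2: on a miss, get from postgres
        self._cache[key] = value    # 3: set the cache for the next reader
        return value

    def clear(self, key):
        """Drop a cached entry, e.g. after the row changes in postgres."""
        self._cache.pop(key, None)


if __name__ == "__main__":
    prefs = CacheAside(lambda member_id: f"preferences for member {member_id}")
    prefs.get(7)
    prefs.get(7)
    print("database reads:", prefs.db_reads)
```

Every hit in step 1 is a query postgres never sees, which is how the cache reduces TPS on the database tier.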

asynchronous
➙ message queues

message queues
consumer count
queue depth

[diagram: application writes to postgres directly (sync) or through a queue broker (async); the broker holds queue 1–n, read by consumer 1–n]

measurement

measure

tuning (troubleshooting)
one variable at a time, people

benchmarks
repeatability
statistics
practical usage

OLTP
versus
OLAP
online transaction processing
online analytical processing

pgreplay
replay log files

7
analyze
host
bloat
DTrace/SystemTap
logs
contrib
statistics collector

cpu
memory
io

SNMP (Simple Network Management Protocol)
RRDtool
Staplr

logs

log_min_duration_statement

log_duration

log_lock_waits

deadlock_timeout

log_temp_files

log_connections

log_disconnections

LOG: EXECUTOR STATISTICS
DETAIL: ! system usage stats:
! 0.017621 elapsed 0.004762 user 0.000816 system sec
! [6.012501 user 0.336354 sys total]
! 0/0 [0/0] filesystem blocks in/out
! 0/0 [0/0] page faults/reclaims, 0 [0] swaps
! 0 [1] signals rcvd, 0/10 [4/14944] messages rcvd/sent
! 2/0 [210/0] voluntary/involuntary context switches
! buffer usage stats:
! Shared blocks: 9 read, 0 written, buffer hit rate = 0.00%
! Local blocks: 0 read, 0 written, buffer hit rate = 0.00%
! Direct blocks: 0 read, 0 written
STATEMENT: select * from posuta.index_statistics limit 1000;
LOG: duration: 42.422 ms

log_statement_stats
log_parser_stats
log_planner_stats
log_executor_stats

18.7. Error Reporting and Logging
18.8. Run-Time Statistics

csv

2009-05-19 10:25:35.470 EDT,"grzm","posuta_production",99595,"[local]",4a12c078.1850b,28,"SELECT",2009-05-19 10:21:44 EDT,2/30525,0,LOG,00000,"EXECUTOR STATISTICS","! system usage stats:
! 1.786288 elapsed 0.065964 user 0.074493 system sec
! [6.079580 user 0.412469 sys total]
! 2/0 [2/0] filesystem blocks in/out
! 0/0 [0/0] page faults/reclaims, 0 [0] swaps
! 0 [1] signals rcvd, 0/13 [5/14960] messages rcvd/sent
! 1008/0 [1230/0] voluntary/involuntary context switches
! buffer usage stats:
! Shared blocks: 1073 read, 0 written, buffer hit rate = 0.00%
! Local blocks: 0 read, 0 written, buffer hit rate = 0.00%
! Direct blocks: 0 read, 0 written",,,,,"select * from posuta.index_statistics where index_id = 265 limit 1000;",,
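The csv log format is convenient because a standard CSV parser handles the quoted, multi-line detail field. A minimal sketch follows; the sample is the slide's entry with the detail shortened, and the column names assume the 22-column csvlog layout documented for PostgreSQL 8.3/8.4 (later versions append more columns).

```python
import csv
import io

# Assumed column layout: csvlog as documented for PostgreSQL 8.3/8.4.
COLUMNS = [
    "log_time", "user_name", "database_name", "process_id", "connection_from",
    "session_id", "session_line_num", "command_tag", "session_start_time",
    "virtual_transaction_id", "transaction_id", "error_severity",
    "sql_state_code", "message", "detail", "hint", "internal_query",
    "internal_query_pos", "context", "query", "query_pos", "location",
]

# The entry from the slide, with the statistics detail cut to two lines.
SAMPLE = (
    '2009-05-19 10:25:35.470 EDT,"grzm","posuta_production",99595,"[local]",'
    '4a12c078.1850b,28,"SELECT",2009-05-19 10:21:44 EDT,2/30525,0,LOG,00000,'
    '"EXECUTOR STATISTICS","! system usage stats:\n'
    '! 1.786288 elapsed 0.065964 user 0.074493 system sec",,,,,'
    '"select * from posuta.index_statistics where index_id = 265 limit 1000;",,\n'
)


def parse_csvlog(text):
    """Return one dict per log entry; quoted fields may span several lines."""
    return [dict(zip(COLUMNS, row)) for row in csv.reader(io.StringIO(text))]


if __name__ == "__main__":
    for entry in parse_csvlog(SAMPLE):
        print(entry["log_time"], entry["user_name"], entry["message"])
        print(entry["query"])
```

From here it is a short step to loading the entries into a table or a log indexer and grouping by query.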

Splunk
http://www.splunk.com/

admin/maintenance

bloat
pg_database_size
pg_relation_size
pg_total_relation_size
pg_column_size
pg_size_pretty

bloat report

key indexes
key constraints
➙ unique indexes
cached plans (≤8.2)

reindex

snapshot

Chapter 26. Monitoring Database Activity

pgStatIO Heavy Index Hitters - 10/20/09 20:02:06 - Interval: 5s
----------------------------------------------------------------
Table                   Last Difference      Total Hits
----------------------------------------------------------------
1. schema_a.table_a              146502    710013840879
2. schema_b.table_b               92171     55473259607
3. schema_a.table_c               38684    106950242690
4. schema_a.table_d               32797    110541228095
5. schema_a.table_e               25096     43939803663
6. schema_a.table_f               20940     75501250234
7. schema_a.table_g               10982    126558207896
8. schema_a.table_h                9337    147866373045

topHeapHitters
topIndexHitters
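A report like the one above can be produced by taking two snapshots of cumulative hit counters one interval apart and ranking tables by the difference. A minimal sketch; the function name is hypothetical, and the assumption that the counters come from something like `idx_blks_hit` in `pg_statio_user_tables` is mine, not stated in the slides.

```python
def top_hitters(previous, current, limit=8):
    """Rank tables by how much their cumulative hit counter grew.

    `previous` and `current` map table name to total hits at two points
    in time. Returns (table, difference, total) tuples, biggest first.
    """
    rows = [
        (table, total - previous.get(table, 0), total)
        for table, total in current.items()
    ]
    rows.sort(key=lambda row: row[1], reverse=True)
    return rows[:limit]


if __name__ == "__main__":
    # Totals from the report; the earlier snapshot is derived from its differences.
    current = {
        "schema_a.table_a": 710013840879,
        "schema_b.table_b": 55473259607,
        "schema_a.table_c": 106950242690,
    }
    previous = {
        "schema_a.table_a": 710013840879 - 146502,
        "schema_b.table_b": 55473259607 - 92171,
        "schema_a.table_c": 106950242690 - 38684,
    }
    for rank, (table, diff, total) in enumerate(top_hitters(previous, current), 1):
        print(f"{rank}. {table:<20} {diff:>10} {total:>14}")
```

Ranking by the difference rather than the total matters: schema_a.table_h has more total hits than schema_b.table_b in the report, yet far less activity in the last five seconds.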

0300

schema management
transactional DDL

upgrade
slony
pg_upgrade

configuration
version control

look around you

shoulders of giants

Oracle · IBM DB2
InnoDB · BerkeleyDB
Hadoop · Cassandra
BOOM · MongoDB

Stonebraker
Ingres
Postgres, Illustra
c-store, Vertica
h-store, VoltDB

ActiveMQ http://activemq.apache.org/

Fusion-io http://www.fusionio.com/

MonetDB http://monetdb.cwi.nl/

myYearbook http://myyearbook.com/

memcached http://memcached.org/

nagios http://www.nagios.org/

PgBouncer http://wiki.postgresql.org/wiki/PgBouncer

pgFouine http://pgfouine.projects.postgresql.org/

pgreplay http://pgreplay.projects.postgresql.org/

PL/Proxy http://pgfoundry.org/projects/plproxy/

RabbitMQ http://www.rabbitmq.com/

Reconnoiter https://labs.omniti.com/labs/reconnoiter

RRDtool http://oss.oetiker.ch/rrdtool/

Slony http://www.slony.info/

Splunk http://www.splunk.com/

Truviso http://www.truviso.com/
