Scaling PostgreSQL with Skytools

Scaling with SkyTools & More

Scaling-Out Postgres with Skype’s Open-Source Toolset

Gavin M. Roy, September 14th, 2011

About Me

• PostgreSQL user since ~6.5

• CTO @myYearbook.com

• Scaled initial infrastructure

• Not as involved in day-to-day database operations and development

• Twitter: @Crad

Scaling?

Concurrency

[Chart: hourly breakdown of requests per second, 6am through 6am the next day]

Increasing Size-On-Disk

[Chart: size on disk in GB by month, January through December]

Scaling and PostgreSQL Behavior

Size on Disk

Tuples, Indexes, Overhead

Table Size + Size of All Combined Indexes

[Diagram: relations and their indexes]
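
The formula can be made concrete with a small sketch. It assumes you already have per-table byte counts (for example from PostgreSQL's pg_relation_size() and pg_indexes_size(); pg_total_relation_size() returns the same total plus TOAST data in one call). The Relation class and the sizes below are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Relation:
    """Hypothetical size figures for one table, in bytes."""
    name: str
    table_bytes: int
    index_bytes: int  # all indexes on the table, combined

    @property
    def size_on_disk(self) -> int:
        # Table size + size of all combined indexes
        return self.table_bytes + self.index_bytes


def total_size(relations: list[Relation]) -> int:
    """Sum the on-disk footprint of every relation."""
    return sum(r.size_on_disk for r in relations)


def fits_in_memory(relations: list[Relation], memory_bytes: int) -> bool:
    """Rough check: could the whole data set stay cached in RAM?"""
    return total_size(relations) <= memory_bytes


GIB = 1024 ** 3
relations = [
    Relation("users", table_bytes=12 * GIB, index_bytes=4 * GIB),
    Relation("messages", table_bytes=40 * GIB, index_bytes=16 * GIB),
]
print(total_size(relations) // GIB)         # 72
print(fits_in_memory(relations, 64 * GIB))  # False
```

Indexes count against memory just like table data, so a data set can outgrow RAM through index growth alone.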

Constraints

• Available Memory

• Disk Speed

• I/O Bus Speed

Keep it in memory.

Get Fast Disks & I/O.

Process Forking + Locks

Client Connections

One Connection per Concurrent Request

Apache + PHP

One connection per backend for each pg_connect

Python

One connection per connection*

ODBC

One connection to Postgres per ODBC connection

[Diagram: the master process with its stats collector, autovacuum, and WAL writer processes, plus one connection backend per client connection]

Lock Contention?

Each backend serving a connected client has to check for locks.

[Diagram: the same server processes, now with a second connection backend for a new client connection]

New Client Connection?

PostgreSQL table-level lock modes:

• Access Share

• Row Share

• Row Exclusive

• Share Update Exclusive

• Share

• Share Row Exclusive

• Exclusive

• Access Exclusive
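
A simplified sketch of the check each backend makes, using the lock-conflict table from the PostgreSQL documentation ("Explicit Locking"). It models only the conflict rules; the real lock manager (shared lock tables, wait queues, deadlock detection) is far more involved, and can_grant is a hypothetical helper.

```python
MODES = [
    "ACCESS SHARE", "ROW SHARE", "ROW EXCLUSIVE", "SHARE UPDATE EXCLUSIVE",
    "SHARE", "SHARE ROW EXCLUSIVE", "EXCLUSIVE", "ACCESS EXCLUSIVE",
]  # lock modes 1 through 8, weakest to strongest

# For each requested mode, the held modes that block it.
CONFLICTS: dict[str, set[str]] = {
    "ACCESS SHARE": {"ACCESS EXCLUSIVE"},
    "ROW SHARE": {"EXCLUSIVE", "ACCESS EXCLUSIVE"},
    "ROW EXCLUSIVE": set(MODES[4:]),           # SHARE and stronger
    "SHARE UPDATE EXCLUSIVE": set(MODES[3:]),  # itself and stronger
    "SHARE": {"ROW EXCLUSIVE", "SHARE UPDATE EXCLUSIVE", *MODES[5:]},
    "SHARE ROW EXCLUSIVE": set(MODES[2:]),     # ROW EXCLUSIVE and stronger
    "EXCLUSIVE": set(MODES[1:]),               # all but ACCESS SHARE
    "ACCESS EXCLUSIVE": set(MODES),            # every mode
}


def can_grant(requested: str, held_by_others: list[str]) -> bool:
    """True if no lock held by another backend conflicts with `requested`."""
    blocked_by = CONFLICTS[requested]
    # One comparison per held lock: more backends holding locks, more work.
    return all(mode not in blocked_by for mode in held_by_others)


print(can_grant("ROW EXCLUSIVE", ["ACCESS SHARE", "ROW EXCLUSIVE"]))  # True
print(can_grant("ACCESS EXCLUSIVE", ["ACCESS SHARE"]))                # False
```

Ordinary reads (Access Share) and writes (Row Exclusive) do not block each other; the cost that grows with connection count is the checking itself.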

[Diagram: the same server processes with many connection backends, one per client connection]

Too many connections?

Slow performance

250 Apache Backends × 1 Connection per Backend × 250 Servers = 62,500 Connections
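
The multiplication generalizes to any fleet size. A minimal sketch, assuming every web backend holds exactly one persistent connection; the function name is hypothetical.

```python
def direct_connections(backends_per_server: int,
                       conns_per_backend: int,
                       web_servers: int) -> int:
    """PostgreSQL connections when every web backend connects directly.

    Each one is served by its own forked PostgreSQL backend process.
    """
    return backends_per_server * conns_per_backend * web_servers


print(f"{direct_connections(250, 1, 250):,}")  # 62,500
```

Connection count grows linearly with every tier multiplied in, so doubling the web servers doubles the backends PostgreSQL must fork and lock-check.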

Solvable Problems!

The Trailblazers

Solving Concurrency

pgBouncer

Session Pooling

Transactional Pooling

Statement Pooling
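
The three pgBouncer modes differ in when a server connection goes back to the pool: when the client disconnects (session), when each transaction ends (transaction), or when each statement ends (statement, which disallows multi-statement transactions). A toy model of that difference, not pgBouncer's code; server_checkouts is a hypothetical helper.

```python
from typing import Literal

PoolMode = Literal["session", "transaction", "statement"]


def server_checkouts(mode: PoolMode, session: list[list[str]]) -> int:
    """Count how often one client session borrows a server connection.

    `session` is a list of transactions, each a list of SQL statements.
    The server connection returns to the pool:
      session     -> when the client disconnects
      transaction -> when each transaction finishes
      statement   -> when each statement finishes
    """
    if mode == "session":
        return 1
    if mode == "transaction":
        return len(session)
    if mode == "statement":
        if any(len(tx) > 1 for tx in session):
            # pgBouncer disallows multi-statement transactions in this mode.
            raise ValueError("multi-statement transaction in statement pooling")
        return sum(len(tx) for tx in session)
    raise ValueError(f"unknown pool_mode: {mode!r}")


autocommit = [["SELECT 1"], ["SELECT 2"], ["SELECT 3"]]
print([server_checkouts(m, autocommit)
       for m in ("session", "transaction", "statement")])  # [1, 3, 3]
```

More checkouts mean each server connection is held for less time, so the same few connections can be shared among more clients.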

Connection Pooling

[Diagram: hundreds of client connections per group feed a single pgBouncer, which keeps tens of connections open to each of Postgres Servers #1, #2, and #3]

Add Local Pooling

[Diagram: each group of clients connects through its own local pgBouncer, the local pgBouncers connect to the central pgBouncer, and it connects to Postgres Servers #1, #2, and #3; connection counts across the tiers are labeled tens, hundreds, and tens]

Easy to run

Usage: pgbouncer [OPTION]... config.ini
  -d, --daemon           Run in background (as a daemon)
  -R, --restart          Do a online restart
  -q, --quiet            Run quietly
  -v, --verbose          Increase verbosity
  -u, --user=<username>  Assume identity of <username>
  -V, --version          Show version
  -h, --help             Show this help screen and exit

userlist.txt

"username" "password"
"foo" "bar"
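
Each userlist.txt line holds a quoted user name and a quoted secret; the secret can be a plain-text password or an MD5 hash in PostgreSQL's format ("md5" followed by md5(password + username)). A sketch of a hypothetical generator for such lines; it assumes pgBouncer's rule that an embedded double quote is escaped by doubling it.

```python
import hashlib


def md5_secret(username: str, password: str) -> str:
    """PostgreSQL-style MD5 secret: 'md5' + md5(password + username)."""
    digest = hashlib.md5((password + username).encode("utf-8")).hexdigest()
    return "md5" + digest


def quote(field: str) -> str:
    """Wrap a field in double quotes, doubling any embedded quotes."""
    return '"' + field.replace('"', '""') + '"'


def userlist_line(username: str, password: str, hashed: bool = True) -> str:
    """Render one userlist.txt entry, hashing the password by default."""
    secret = md5_secret(username, password) if hashed else password
    return f"{quote(username)} {quote(secret)}"


print(userlist_line("foo", "bar", hashed=False))  # "foo" "bar"
print(userlist_line("foo", "bar"))                # "foo" "md5..."
```

Storing the MD5 form keeps plain-text passwords out of the file on disk.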

pgbouncer.ini

Specifying Connections

[databases]
; foodb over unix socket
foodb =
; redirect bardb to bazdb on localhost
bardb = host=localhost dbname=bazdb
; access to dest database will go with single user
forcedb = host=127.0.0.1 port=300 user=baz password=foo client_encoding=UNICODE datestyle=ISO connect_query='SELECT 1'
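
Each value in [databases] is a libpq-style key=value connection string. A sketch of a hypothetical inspection helper using Python's configparser and shlex; shlex only approximates libpq's quoting rules, so treat it as a quick check rather than a full parser.

```python
import configparser
import shlex

SAMPLE = """
[databases]
; foodb over unix socket
foodb =
; redirect bardb to bazdb on localhost
bardb = host=localhost dbname=bazdb
; access to dest database will go with single user
forcedb = host=127.0.0.1 port=300 user=baz password=foo client_encoding=UNICODE datestyle=ISO connect_query='SELECT 1'
"""


def parse_connstr(value: str) -> dict[str, str]:
    """Split 'key=value key=value' into a dict (assumes well-formed input).

    shlex strips the single quotes around connect_query='SELECT 1'.
    """
    pairs = (token.split("=", 1) for token in shlex.split(value))
    return {key: val for key, val in pairs}


config = configparser.ConfigParser(interpolation=None)
config.read_string(SAMPLE)
databases = {name: parse_connstr(value)
             for name, value in config["databases"].items()}
print(databases["bardb"])  # {'host': 'localhost', 'dbname': 'bazdb'}
```

An empty value such as foodb's yields an empty dict, matching the "connect over the default unix socket" case.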

Base Daemon Config

[pgbouncer]
logfile = pgbouncer.log
pidfile = pgbouncer.pid
; ip address or * which means all ip-s
listen_addr = 127.0.0.1
listen_port = 6432
; unix socket is also used for -R.
;unix_socket_dir = /tmp

Authentication

; any, trust, plain, crypt, md5
auth_type = trust
#auth_file = 8.0/main/global/pg_auth
auth_file = etc/userlist.txt
admin_users = user2, someadmin, otheradmin
stats_users = stats, root

Stats Users?

SHOW HELP|CONFIG|DATABASES|POOLS|CLIENTS|SERVERS|VERSION
SHOW FDS|SOCKETS|ACTIVE_SOCKETS|LISTS|MEM

pgbouncer=# SHOW CLIENTS;
 type | user  | database  | state  | addr      | port  | local_addr | local_port | connect_time
------+-------+-----------+--------+-----------+-------+------------+------------+---------------------
 C    | stats | pgbouncer | active | 127.0.0.1 | 47229 | 127.0.0.1  | 6000       | 2011-09-13 17:55:46

* Truncated columns for display purposes

psql 9.0+ Problem?

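psql -U stats -p 6432 pgbouncer
psql: ERROR:  Unknown startup parameter

psql 9.0 and later send an application_name startup parameter, which older pgBouncer releases reject unless told to ignore it.

Add to pgbouncer.ini: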

ignore_startup_parameters = application_name

Pooling Behavior

pool_mode = statement

server_check_query = select 1
server_check_delay = 10

max_client_conn = 1000
default_pool_size = 20

server_connect_timeout = 15
server_lifetime = 1200
server_idle_timeout = 60
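
pgBouncer keeps a separate pool for each database/user pair, each capped at default_pool_size, while max_client_conn caps client connections. Its documentation gives the worst-case file-descriptor use as max_client_conn + (max pool_size × total databases × total users). A small calculator for those bounds; the function names are hypothetical, and the example assumes one database and one user.

```python
def max_server_connections(default_pool_size: int,
                           databases: int, users: int) -> int:
    """Upper bound on PostgreSQL connections opened by one pgBouncer."""
    # One pool per (database, user) pair, each capped at default_pool_size.
    return default_pool_size * databases * users


def max_file_descriptors(max_client_conn: int, default_pool_size: int,
                         databases: int, users: int) -> int:
    """Worst case: every client socket plus every server connection."""
    return max_client_conn + max_server_connections(
        default_pool_size, databases, users)


print(max_server_connections(20, 1, 1))      # 20
print(max_file_descriptors(1000, 20, 1, 1))  # 1020
```

With these settings up to 1,000 clients share at most 20 PostgreSQL backends per database/user pair; the descriptor bound tells you how high to set the OS file limit for pgBouncer.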

Skytools

Scale-Out Reads

[Diagram: clients connect through pgBouncer and a load balancer to four read-only copies of the canonical database]
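
The load balancer's job here is to keep writes on the canonical database and spread reads across the read-only copies. A toy router that sketches the idea; the class, its naive SELECT test, and the DSNs are all hypothetical.

```python
import itertools
from typing import Iterator, Optional


class ReadWriteRouter:
    """Plain SELECTs rotate across read-only copies; all else goes canonical."""

    def __init__(self, canonical: str, read_only_copies: list[str]) -> None:
        self.canonical = canonical
        # Round-robin over the copies; None when there are no copies.
        self._copies: Optional[Iterator[str]] = (
            itertools.cycle(read_only_copies) if read_only_copies else None
        )

    def dsn_for(self, sql: str) -> str:
        # Naive classification: only statements starting with SELECT are reads.
        is_read = sql.lstrip().upper().startswith("SELECT")
        if is_read and self._copies is not None:
            return next(self._copies)
        return self.canonical


router = ReadWriteRouter(
    "host=canonical dbname=app",                        # hypothetical DSNs
    ["host=copy1 dbname=app", "host=copy2 dbname=app"],
)
print(router.dsn_for("SELECT * FROM users"))     # host=copy1 dbname=app
print(router.dsn_for("SELECT 1"))                # host=copy2 dbname=app
print(router.dsn_for("UPDATE users SET x = 1"))  # host=canonical dbname=app
```

Replication to the copies is asynchronous, so a read issued right after a write may not see it yet; reads that must be current belong on the canonical database.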

PGQ

The Ticker

ticker.ini

[pgqadm]
job_name = pgopen_ticker
db = dbname=pgopen
# how often to run maintenance [seconds]
maint_delay = 600
# how often to check for activity [seconds]
loop_delay = 0.1
logfile = ~/Source/pgopen_skytools/%(job_name)s.log
pidfile = ~/Source/pgopen_skytools/%(job_name)s.pid
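
The %(job_name)s placeholders expand from the job_name key in the same section. Skytools reads these files through its own Python config handling; Python's standard configparser implements the same %(name)s interpolation, so it can preview the expanded values. This is a sketch for checking a config, not how the ticker itself loads it.

```python
import configparser
import os

TICKER_INI = """
[pgqadm]
job_name = pgopen_ticker
db = dbname=pgopen
# how often to run maintenance [seconds]
maint_delay = 600
# how often to check for activity [seconds]
loop_delay = 0.1
logfile = ~/Source/pgopen_skytools/%(job_name)s.log
pidfile = ~/Source/pgopen_skytools/%(job_name)s.pid
"""

parser = configparser.ConfigParser()  # default BasicInterpolation: %(name)s
parser.read_string(TICKER_INI)
section = parser["pgqadm"]

print(section["logfile"])  # ~/Source/pgopen_skytools/pgopen_ticker.log
print(section.getfloat("loop_delay"))  # 0.1
print(os.path.expanduser(section["pidfile"]))  # path with ~ expanded
```

Deriving file names from job_name lets several tickers or replay jobs share one directory without clashing.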

Getting PGQ Running

Set up our ticker:

pgqadm.py ticker.ini install

Run the ticker daemon:

pgqadm.py ticker.ini ticker -d

Londiste

replication.ini

[londiste]
job_name = pgopen_to_destination
provider_db = dbname=pgopen
subscriber_db = dbname=destination
# it will be used as sql ident so no dots/spaces
pgq_queue_name = pgopen
logfile = ~/Source/pgopen_skytools/%(job_name)s.log
pidfile = ~/Source/pgopen_skytools/%(job_name)s.pid

Install Londiste

londiste.py replication.ini provider install

londiste.py replication.ini subscriber install

Start Replication Daemon

londiste.py replication.ini replay -d

DDL?

Add the Provider Tables and Sequences

londiste.py replication.ini provider add public.auth_user

Add the Subscriber Tables and Sequences

londiste.py replication.ini subscriber add public.auth_user

Great Success!

PL/Proxy

Scale-Out Reads & Writes

[Diagram: a plProxy server in front of four partition servers holding users A-F, G-L, M-R, and S-Z]

How does it work?

Simple Remote Connection

CREATE FUNCTION get_user_email(username text)
RETURNS SETOF text AS $$
    CONNECT 'dbname=remotedb';
    SELECT email FROM users WHERE username = $1;
$$ LANGUAGE plproxy;

Sharded Request

CREATE FUNCTION get_user_email(username text)
RETURNS SETOF text AS $$
    CLUSTER 'usercluster';
    RUN ON hashtext(username);
$$ LANGUAGE plproxy;
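
RUN ON hashtext(username) turns the username into an integer, and PL/Proxy picks a partition by masking that integer with (partition count - 1), which is why the partition count must be a power of two. A sketch of that selection rule, with zlib.crc32 standing in for PostgreSQL's hashtext(); this is an assumption for illustration, so the partition chosen here will not match PL/Proxy's for the same key.

```python
import zlib


def pick_partition(key: str, partitions: list[str]) -> str:
    """Select a partition as hash & (n - 1), PL/Proxy-style.

    zlib.crc32 is a stand-in for hashtext(); results differ from PL/Proxy.
    """
    n = len(partitions)
    if n == 0 or n & (n - 1):
        raise ValueError("partition count must be a power of two")
    return partitions[zlib.crc32(key.encode("utf-8")) & (n - 1)]


parts = ["dbname=part00 host=127.0.0.1", "dbname=part01 host=127.0.0.1"]
print(pick_partition("alice", parts))  # one of the two DSNs, always the same
```

Unlike alphabetical ranges such as A-F or G-L, a hash spreads keys without piling common initial letters onto one server, at the cost of making range scans across users span every partition.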

Sharding Setup

• Need 3 Functions:

• plproxy.get_cluster_partitions(cluster_name text)

• plproxy.get_cluster_version(cluster_name text)

• plproxy.get_cluster_config(in cluster_name text, out key text, out val text)

get_cluster_partitions

CREATE OR REPLACE FUNCTION plproxy.get_cluster_partitions(cluster_name text)
RETURNS SETOF text AS $$
BEGIN
    IF cluster_name = 'usercluster' THEN
        RETURN NEXT 'dbname=part00 host=127.0.0.1';
        RETURN NEXT 'dbname=part01 host=127.0.0.1';
        RETURN;
    END IF;
    RAISE EXCEPTION 'Unknown cluster';
END;
$$ LANGUAGE plpgsql;

get_cluster_version

CREATE OR REPLACE FUNCTION plproxy.get_cluster_version(cluster_name text)
RETURNS int4 AS $$
BEGIN
    IF cluster_name = 'usercluster' THEN
        RETURN 1;
    END IF;
    RAISE EXCEPTION 'Unknown cluster';
END;
$$ LANGUAGE plpgsql;
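
PL/Proxy calls plproxy.get_cluster_version() on every request and re-reads the cluster's partitions and config only when the returned number changes, so the function should be cheap, and bumping its value is how a new layout gets picked up. A sketch of that caching pattern; ClusterConfigCache and the stub callables are hypothetical.

```python
from typing import Callable


class ClusterConfigCache:
    """Re-fetch a cluster's partition list only when its version changes."""

    def __init__(self,
                 get_version: Callable[[str], int],
                 get_partitions: Callable[[str], list[str]]) -> None:
        self._get_version = get_version
        self._get_partitions = get_partitions
        self._cache: dict[str, tuple[int, list[str]]] = {}

    def partitions(self, cluster: str) -> list[str]:
        version = self._get_version(cluster)  # cheap call, made every time
        cached = self._cache.get(cluster)
        if cached is None or cached[0] != version:
            # Version is new or changed: reload the (expensive) partition list.
            self._cache[cluster] = (version, self._get_partitions(cluster))
        return self._cache[cluster][1]


versions = {"usercluster": 1}  # stand-in for get_cluster_version()
cache = ClusterConfigCache(
    versions.__getitem__,
    lambda name: ["dbname=part00 host=127.0.0.1",
                  "dbname=part01 host=127.0.0.1"],
)
print(cache.partitions("usercluster"))
```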

get_cluster_config

CREATE OR REPLACE FUNCTION plproxy.get_cluster_config(
    in cluster_name text,
    out key text,
    out val text)
RETURNS SETOF record AS $$
BEGIN
    -- let's use the same config for all clusters
    key := 'connection_lifetime';
    val := 30*60; -- 30m
    RETURN NEXT;
    RETURN;
END;
$$ LANGUAGE plpgsql;

get_cluster_config values

• connection_lifetime

• query_timeout

• disable_binary

• keepalive_idle

• keepalive_interval

• keepalive_count

SQL/MED

SQL/MED Cluster Definition

CREATE SERVER a_cluster FOREIGN DATA WRAPPER plproxy
OPTIONS (
    connection_lifetime '1800',
    disable_binary '1',
    p0 'dbname=part00 host=127.0.0.1',
    p1 'dbname=part01 host=127.0.0.1',
    p2 'dbname=part02 host=127.0.0.1',
    p3 'dbname=part03 host=127.0.0.1'
);
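
With many partitions, writing p0 through pN by hand gets error-prone. A hypothetical helper that renders the statement from a list of connection strings, quoting each value as a SQL string literal and enforcing PL/Proxy's power-of-two partition count. It assumes the server name is already a safe identifier and does not quote it.

```python
def quote_literal(value: str) -> str:
    """SQL string literal: wrap in single quotes, doubling embedded quotes."""
    return "'" + value.replace("'", "''") + "'"


def create_server_sql(name: str, partitions: list[str], **options: str) -> str:
    """Build CREATE SERVER ... FOREIGN DATA WRAPPER plproxy with p0..pN."""
    n = len(partitions)
    if n == 0 or n & (n - 1):
        raise ValueError("partition count must be a power of two")
    opts = [f"{key} {quote_literal(val)}" for key, val in options.items()]
    opts += [f"p{i} {quote_literal(dsn)}" for i, dsn in enumerate(partitions)]
    body = ",\n    ".join(opts)
    return (f"CREATE SERVER {name} FOREIGN DATA WRAPPER plproxy\n"
            f"OPTIONS (\n    {body}\n);")


dsns = [f"dbname=part{i:02d} host=127.0.0.1" for i in range(4)]
print(create_server_sql("a_cluster", dsns,
                        connection_lifetime="1800", disable_binary="1"))
```

The printed statement reproduces the a_cluster definition, which makes adding partitions a matter of extending the DSN list.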

PL/Proxy + SQL/MED Behavior

• PL/Proxy will prefer SQL/MED cluster definitions over the plproxy.get_* functions

• PL/Proxy will fall back to the plproxy.get_* functions if there are no SQL/MED clusters

SQL/MED User Mapping

CREATE USER MAPPING FOR bob SERVER a_cluster OPTIONS (user 'bob', password 'secret');

CREATE USER MAPPING FOR public SERVER a_cluster OPTIONS (user 'plproxy', password 'foo');

plproxyrc

https://github.com/myYearbook/plproxyrc

• PL/pgSQL-based API for table-based management of PL/Proxy

• Used to manage complicated PL/Proxy infrastructure @myYearbook

• BSD Licensed

“Server-to-Server”

[Diagram: Postgres Servers #1, #2, and #3 connecting to one another through pgBouncer]

Complex PL/Proxy and pgBouncer Environment

[Diagram: clients connect through local pgBouncers and load balancers to a pair of plProxy servers, which reach Postgres Servers #1 and #3 through pgBouncer]

Other Tools and Methods?

Questions?