On Building Public Cloud Service for PostgreSQL and MySQL
Guangzhou Zhang, Alibaba Cloud [email protected]
Agenda
• Resource isolation
  • CPU/IO/MEM
  • Disk
• Privilege management
• Critical issues
  • Handling IO stalls
  • Handling OOM
• Enhance the service with Proxy
  • Transparent switch-over with Proxy
  • Transparent connection pool with Proxy
  • Read-write auto routing
Self-Intro
• Guangzhou Zhang, developer on the Alibaba Cloud RDS team
• The Alibaba Cloud RDS team:
  • Covers MySQL/PostgreSQL/SQL Server
  • Handles infrastructure, database kernel (MySQL/PG), operations, customer support, etc.
  • Runs one of the world's largest database instance clusters, still growing by more than 100% year over year
Architecture
[diagram: user requests arrive through a VIP at the Server Load Balancer, pass through the proxy layer, and reach the master/slave database instances, which are monitored by an HA component]
Resource isolation – CPU/IO/MEM
• Use bare-metal machines (no VMs, no network storage)
• Locate up to hundreds of instances on the same host
• Take full control over the whole stack; minimize external dependencies for a predictable SLA and predictable performance
• Control the resource allocation layout
Resource isolation – CPU/IO/MEM
• Use cgroups for CPU/IO/MEM isolation
[diagram: on one host machine, instance A's master (and its backend processes) runs inside one cgroup, while instance B's slave (and its backends) runs inside another]
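A minimal sketch of how such a per-instance cgroup (v1) could be populated. The cgroup root, instance name, and concrete limits are illustrative, not Alibaba's actual tooling; on a real host this runs as root against /sys/fs/cgroup.

```python
from pathlib import Path

def setup_instance_cgroup(root: str, instance: str,
                          cpu_cores: int, mem_bytes: int) -> None:
    """Create cpu and memory cgroups for one instance and set hard limits.

    `root` would be /sys/fs/cgroup on a real host; a scratch directory
    works for a dry run. cgroup-v1 interface files are assumed.
    """
    cpu = Path(root) / "cpu" / instance
    mem = Path(root) / "memory" / instance
    cpu.mkdir(parents=True, exist_ok=True)
    mem.mkdir(parents=True, exist_ok=True)

    period = 100_000                                    # 100 ms scheduling period
    (cpu / "cpu.cfs_period_us").write_text(str(period))
    (cpu / "cpu.cfs_quota_us").write_text(str(cpu_cores * period))
    (mem / "memory.limit_in_bytes").write_text(str(mem_bytes))
    # The instance's postmaster PID would then be written to cgroup.procs
    # so every forked backend inherits the limits automatically.
```

Writing the postmaster's PID into `cgroup.procs` is what makes the whole instance (all current and future backends) share the one budget.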
Resource isolation – CPU/IO/MEM
• Use cgroups for CPU/IO/MEM isolation
• Trick: put the data and write-ahead-log (WAL) files on separate disks, and loosen the IOPS limit on the WAL disk
• Why do we have to loosen the limit?
  • Maintenance work inside the database requires heavy IO and should finish as soon as possible
  • A strict limit on WAL IOPS on a slave can result in severe replication lag
  • A strict limit on WAL IOPS on a master can easily trigger a failover switch
  • It gives better write performance
• Why are we able to loosen the limit?
  • No one writes to the database at a high pace constantly, since database storage has an upper limit (2 TB) and is expensive
  • The storage used by WAL files is charged to the user
  • 'Hot' instances can be relocated to different hosts as needed
[diagram: instance A's cgroup, with the master and its backends writing to a WAL disk limited to 10,000 IOPS and a data disk limited to 500 IOPS]
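The asymmetric limits above could be expressed with cgroup-v1 blkio throttling, roughly as below. The device major:minor numbers, function name, and cgroup root are placeholders; only the 10,000/500 figures come from the slide.

```python
from pathlib import Path

# Slide's figures: generous WAL-disk limit, strict data-disk limit.
WAL_IOPS, DATA_IOPS = 10_000, 500

def throttle_instance_io(root: str, instance: str,
                         wal_dev: str, data_dev: str) -> None:
    """Record per-device write-IOPS throttles for one instance's blkio cgroup.

    `wal_dev`/`data_dev` are "major:minor" strings, e.g. "8:16". The real
    kernel interface takes one "major:minor iops" rule per write(); both
    rules are written together here for brevity.
    """
    blkio = Path(root) / "blkio" / instance
    blkio.mkdir(parents=True, exist_ok=True)
    rules = f"{wal_dev} {WAL_IOPS}\n{data_dev} {DATA_IOPS}\n"
    (blkio / "blkio.throttle.write_iops_device").write_text(rules)
```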
Resource isolation – Disk
• Regularly collect the disk usage of each database instance, and mark those with excessive usage as locked down
• Change the database kernel to add a parameter that controls user access in the locked-down state:
  • Allow read access
  • Disallow any writes to the instance
  • Still allow DROP/TRUNCATE commands, so users can free space
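The periodic usage sweep could look roughly like this. The directory layout and quota handling are hypothetical; the real control plane presumably tracks per-instance quotas and then flips the kernel-side access parameter.

```python
import os

def dir_size(path: str) -> int:
    """Total bytes of all regular files under `path`."""
    total = 0
    for dirpath, _dirs, files in os.walk(path):
        for name in files:
            total += os.path.getsize(os.path.join(dirpath, name))
    return total

def find_lockdown_candidates(datadir_root: str, quota_bytes: int) -> list[str]:
    """Return instance directory names whose data exceeds the quota."""
    over = []
    for inst in sorted(os.listdir(datadir_root)):
        path = os.path.join(datadir_root, inst)
        if os.path.isdir(path) and dir_size(path) > quota_bytes:
            over.append(inst)
    return over
```

Instances returned here would be locked down to read-only; DROP and TRUNCATE remain allowed so the user can free space and be unlocked on the next sweep.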
Privilege Management
• The superuser role is withheld from end users for security reasons
  • Prevents users from messing up critical configurations for HA or maintenance
  • Makes it convenient for us to handle end users' roles and maintenance roles differently in code
• Users keep asking for more of the superuser privilege set, for managing data, monitoring, extensions, roles, etc.
• We have to loosen the privilege checks in some cases to keep users happy
Privilege Management
• Create a new role – rds_superuser
  • Allowed to manage all other non-superusers' data, kill sessions, etc.
  • Allowed to set specific configuration parameters
Handling OOM
• OOM (out-of-memory) cases were frequently observed:
  • Cloud users tend to buy a small instance class first, then regularly upgrade to a higher class
  • Large objects / big temporary memory allocations are widely used in some instances
  • The number of concurrent connections is not suitably configured by the application
• When an instance goes OOM, one of its processes is picked and killed by the Linux kernel, and the whole instance is then restarted with all connections lost
• How can we minimize the impact of OOM on users?
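One way to act before the kernel's OOM killer does (the public-cgroup + SIGUSR2 approach shown on the next slides) is to watch each instance cgroup's memory usage and signal the fattest backend ourselves. The 90% threshold and the helpers below are illustrative; only the SIGUSR2 choice and the public cgroup come from the talk.

```python
import os
import signal

def over_threshold(usage_bytes: int, limit_bytes: int, pct: int = 90) -> bool:
    """True when the cgroup has reached `pct`% of its memory limit."""
    return usage_bytes * 100 >= limit_bytes * pct

def pick_victim(rss_by_pid: dict[int, int]) -> int:
    """Choose the backend with the largest resident set as the victim."""
    return max(rss_by_pid, key=rss_by_pid.get)

def defuse(rss_by_pid: dict[int, int]) -> None:
    """Move the victim into the shared 'public' cgroup and signal it, so
    only one session dies instead of the kernel OOM-killing a process and
    forcing a full instance restart."""
    victim = pick_victim(rss_by_pid)
    # echo <victim> > /sys/fs/cgroup/memory/public/cgroup.procs  (needs root)
    os.kill(victim, signal.SIGUSR2)
```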
Handling OOM
[diagram: a Linux box with two per-instance memory cgroups, each containing several backend processes]
Handling OOM
[diagram: the same Linux box with an extra 'public' cgroup added; when an instance's cgroup nears its memory limit, a victim backend is moved into the public cgroup and sent 'kill -USR2', instead of the kernel OOM-killer taking the whole instance down]
Handling IO stalls
• Symptoms of IO stalls (ext4 filesystem mounted with data=ordered on Linux):
  • Long checkpoint fsync times
  • Nearly all operations, including setting up a new connection, hang for > 10 s; all instances on the same machine are affected
  • Yet IO utilization is low (< 10%) for the problematic disk
Handling IO stalls – fsync can be slow
[diagram: the checkpointer calls fsync(); with data=ordered, the ext4 journal commit for file metadata must first flush the dirty data queued for those files, so fsync can be slow]
Handling IO stalls – write() can be blocked by fsync()
[diagram: while the checkpointer's fsync() is committing the ext4 journal, an ordinary backend write() that needs to update file metadata blocks behind that same journal commit]
Handling IO stalls
• Mount ext4 with the option data=writeback
• Change the database kernel to call sync_file_range() before calling fsync() in the checkpointer process, so most dirty pages are already under writeback by the time fsync() runs
• Linux kernel enhancements
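The sync_file_range() trick can be sketched from user space with ctypes (Linux-only; the flag value is from the kernel headers). In the real fix this call lives in the checkpointer, in C; the function name here is made up.

```python
import ctypes
import os

libc = ctypes.CDLL("libc.so.6", use_errno=True)
libc.sync_file_range.argtypes = [ctypes.c_int, ctypes.c_int64,
                                 ctypes.c_int64, ctypes.c_uint]

SYNC_FILE_RANGE_WRITE = 2    # start writeback now, don't wait for it

def checkpoint_fsync(fd: int) -> None:
    """Kick off writeback for the whole file first (offset 0, nbytes 0 =
    'to end of file'), so the later fsync() -- and the ext4 journal commit
    it triggers -- has little left to flush and blocks concurrent write()
    calls only briefly."""
    if libc.sync_file_range(fd, 0, 0, SYNC_FILE_RANGE_WRITE) != 0:
        raise OSError(ctypes.get_errno(), "sync_file_range failed")
    os.fsync(fd)
```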
Architecture
[diagram: the same architecture as before – user requests through a VIP reach the Server Load Balancer and proxy layer, then the master/slave database instances monitored by HA]
Transparent connection pool with Proxy
[diagram: each client connection to the Proxy maps to its own backend thread/process in the instance]
• Connection establishment is costly
• Performance is poor for short-connection applications
Transparent connection pool with Proxy
[diagram: the Proxy now keeps a connection pool in front of the DB instance, so many client connections share a smaller set of backend connections]
Transparent connection pool with Proxy
• The Proxy needs to restore all the resources held by a connection before putting it into the connection pool
• Authentication needs to be performed for the incoming user when a pooled connection is reused
• Change the database kernel to support "SET AUTH_REQ = 'user/database/md5password/salt'" for that authentication
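A toy model of the proxy-side pool logic. The SET AUTH_REQ string follows the talk; the use of PostgreSQL's DISCARD ALL as the "restore resources" step, and the class and its API, are my illustration, not the real proxy.

```python
from collections import defaultdict

class ProxyPool:
    """Minimal sketch: pool server connections per (user, database) and
    re-authenticate on reuse via the talk's SET AUTH_REQ kernel extension."""

    def __init__(self):
        self._idle = defaultdict(list)           # (user, db) -> [conn, ...]

    def checkin(self, conn, user: str, db: str) -> None:
        # Restore the connection to a pristine state (temp tables,
        # prepared statements, session GUCs, ...) before pooling it.
        conn.execute("DISCARD ALL")
        self._idle[(user, db)].append(conn)

    def checkout(self, user: str, db: str, md5pw: str, salt: str):
        bucket = self._idle[(user, db)]
        if not bucket:
            return None          # pool miss: the proxy opens a fresh connection
        conn = bucket.pop()
        # Re-run authentication for the incoming client on the reused
        # server connection, as the kernel change allows.
        conn.execute(f"SET AUTH_REQ = '{user}/{db}/{md5pw}/{salt}'")
        return conn
```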
Transparent switch-over with Proxy
• Quite a few cases require an instance restart, and we implement such restarts as a switch-over:
  • Upgrading or downgrading an instance's class (e.g. from 2 GB to 4 GB of memory)
  • Moving an instance from one machine to another
  • Database kernel version upgrades
• A transparent switch-over performs the switch-over without breaking existing user connections. This is key to the SLA and to customer satisfaction.
Transparent switch-over with Proxy
[diagram: the proxy layer sits between clients and the master/slave DB instances; during a switch-over, HA promotes the slave and the Proxy reconnects to the new master]
Transparent switch-over with Proxy
• The Proxy re-establishes connections outside transaction boundaries
• Only for connections that hold no state that cannot be re-created, e.g. no:
  • Temporary tables
  • Prepared statements
• Change the kernel to support a command like "set connection_user to xxx", so the Proxy can re-establish a connection without an explicit authentication process
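The eligibility check the Proxy must make before moving a session can be sketched as a predicate over tracked per-connection state; the dataclass fields are illustrative bookkeeping, not the proxy's real data model.

```python
from dataclasses import dataclass, field

@dataclass
class SessionState:
    in_transaction: bool = False
    temp_tables: set = field(default_factory=set)
    prepared_statements: set = field(default_factory=set)

def can_switch_over(s: SessionState) -> bool:
    """A session may be transparently re-established on the new master only
    at a transaction boundary, and only if it holds no server-side state
    the Proxy cannot re-create after reconnecting."""
    return (not s.in_transaction
            and not s.temp_tables
            and not s.prepared_statements)
```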
Transparent switch-over with Proxy
[diagram: the application user stays connected to the proxy layer; after reconnecting to the new master, the Proxy switches the session back to the application user ('change user to app user')]
Read-write auto routing
[diagram: the proxy layer routes the application user's writes to the master and reads to a read-only replica, with HA managing the master/slave pair]
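A naive version of the routing decision the proxy makes. A real proxy parses SQL properly, pins transactions to the master, and handles cases like SELECT ... FOR UPDATE; the verb classification here is deliberately simplistic.

```python
READ_VERBS = {"SELECT", "SHOW"}

def route(sql: str, in_transaction: bool) -> str:
    """Send reads outside explicit transactions to the read-only replica,
    everything else to the master."""
    if in_transaction:
        return "master"          # keep transactional consistency on one node
    tokens = sql.lstrip().split(None, 1)
    verb = tokens[0].upper() if tokens else ""
    return "replica" if verb in READ_VERBS else "master"
```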