Cloud Messaging Service Technical Overview PRESENTED BY Matteo Merli September 21, 2015

Upload: messaging-meetup

Post on 15-Feb-2017


Page 1: Cloud Messaging Service: Technical Overview

Cloud Messaging Service Technical Overview

Presented by Matteo Merli | September 21, 2015

Page 2: Cloud Messaging Service: Technical Overview

Sections

1. Introduction
2. Architecture
3. Bookkeeper
4. Future
5. Q & A

CMS - Technical Overview

Page 3: Cloud Messaging Service: Technical Overview

What is CMS

• Hosted Pub / Sub
• Multi tenant (Auth / Quotas / Load Balancer)
• Horizontally scalable
• Highly available, durable and consistent storage
• Geo Replication
• In production since 2013

[Diagram: CMS Cluster with Producer, Consumer, Broker, Bookie, ZK, Global ZK and Replication]

Page 4: Cloud Messaging Service: Technical Overview

CMS key features

• Multi-tenancy / hosted
  • Operating a system at scale is hard and requires deep understanding of internals
  • Authentication / Self service provisioning / Quotas
• SLAs (Write latency 2ms avg - 5ms 99pct)
  • Maintain the same latencies and throughput under backlog draining scenarios
• Simple high level API with clear ordering, durability and consistency semantics
• Geo-replication
  • Single API call to configure regions to replicate to
• Load balancer: Dynamically optimize topics assignment to brokers
• Support large number of topics
• Store subscription position
  • Apps don’t need to store it
• Able to delete data as soon as it's consumed
• Support round-robin distribution across multiple consumers

Page 5: Cloud Messaging Service: Technical Overview

Work load examples

Challenge              # Topics   # Producers / topic   # Subscriptions / topic   Produced msg rate / s / topic
Fan-out                1          1                     1 K                       1 K
Throughput & latency   1          1                     1                         100 K
# Topics & latency     1 M        1                     10                        10
Fan-in                 1          1 K                   1                         > 100 K

• Designed to support a wide range of use cases
• Needs to be cost effective in every case

Page 6: Cloud Messaging Service: Technical Overview

2. Architecture

Page 7: Cloud Messaging Service: Technical Overview

Messaging model

• Producers can attach to a topic and send messages to it
• A subscription is a durable resource that is the recipient of all messages sent to the topic after its creation
• Subscriptions have a type:
  • “Exclusive” means that only one consumer is allowed to attach to this subscription. First consumer decides the type.
  • “Shared” allows multiple consumers. Messages are sent in round-robin distribution. No ordering guarantees.
  • “Failover” allows multiple consumers, though only one is receiving messages at a given point, while others are in standby mode.
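The three subscription types can be illustrated with a small in-memory model. This is only a sketch: `SubscriptionSketch`, its methods and the consumer names are hypothetical and are not the CMS broker implementation. In particular, failover is modeled simply as "the first attached consumer is the active one", which is an assumption.

```java
import java.util.ArrayList;
import java.util.List;

public class SubscriptionSketch {
    enum Type { EXCLUSIVE, SHARED, FAILOVER }

    private final Type type;
    private final List<String> consumers = new ArrayList<>();
    private int next = 0; // round-robin cursor, used only by SHARED

    SubscriptionSketch(Type type) {
        this.type = type;
    }

    // Attach a consumer; an exclusive subscription rejects a second one.
    boolean attach(String consumer) {
        if (type == Type.EXCLUSIVE && !consumers.isEmpty()) {
            return false;
        }
        consumers.add(consumer);
        return true;
    }

    // Simulate a consumer disconnecting.
    void detach(String consumer) {
        consumers.remove(consumer);
    }

    // Name of the consumer that would receive the next message, or null if none is attached.
    String dispatch() {
        if (consumers.isEmpty()) {
            return null;
        }
        if (type == Type.SHARED) {
            String target = consumers.get(next % consumers.size());
            next++;
            return target;
        }
        // EXCLUSIVE and FAILOVER: a single active consumer at any given point.
        return consumers.get(0);
    }

    public static void main(String[] args) {
        SubscriptionSketch shared = new SubscriptionSketch(Type.SHARED);
        shared.attach("Consumer-2");
        shared.attach("Consumer-3");
        System.out.println(shared.dispatch() + ", " + shared.dispatch() + ", " + shared.dispatch());

        SubscriptionSketch failover = new SubscriptionSketch(Type.FAILOVER);
        failover.attach("Consumer-4");
        failover.attach("Consumer-5");
        System.out.println(failover.dispatch()); // active consumer
        failover.detach("Consumer-4");
        System.out.println(failover.dispatch()); // standby takes over
    }
}
```

In the shared case no single consumer sees every message, which is why that type gives no ordering guarantees.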

[Diagram: Producer-X and Producer-Y publish to a Topic; Subscription-A, Subscription-B and Subscription-C (types Exclusive, Shared and Failover) deliver to Consumer-1 through Consumer-5]

Page 8: Cloud Messaging Service: Technical Overview

Client API

▪ Expose messaging model concepts (producer/consumer)
▪ C++ and Java
▪ Connection pooling
▪ Handle recoverable failures transparently (reconnect / resend messages) without compromising ordering guarantees
▪ Sync / async version of every operation
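One common way to offer both a sync and an async version of an operation is to implement the async one and let the sync one block on its result. The sketch below shows that idea with the JDK's `CompletableFuture`. `SketchProducer` and its in-memory list are hypothetical stand-ins, not the CMS client, and the sketch does not model reconnects, resends or ordering.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;

public class SketchProducer {
    // Stand-in for the broker: messages are "persisted" once added here.
    private final List<byte[]> persisted = new ArrayList<>();

    // Async version: returns immediately; the future completes once the message is stored.
    public CompletableFuture<Void> sendAsync(byte[] payload) {
        return CompletableFuture.runAsync(() -> {
            synchronized (persisted) {
                persisted.add(payload);
            }
        });
    }

    // Sync version: the same operation, blocking until the future completes.
    public void send(byte[] payload) {
        sendAsync(payload).join();
    }

    public int persistedCount() {
        synchronized (persisted) {
            return persisted.size();
        }
    }

    public static void main(String[] args) {
        SketchProducer producer = new SketchProducer();
        producer.send("first".getBytes());
        producer.sendAsync("second".getBytes())
                .thenRun(() -> System.out.println("second message persisted"))
                .join();
        System.out.println(producer.persistedCount());
    }
}
```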


Page 9: Cloud Messaging Service: Technical Overview

Java producer example

CmsClient client = CmsClient.create("http://<broker vip>:4080");

Producer producer = client.createProducer("my-topic");

// Handles retries in case of failure
producer.send("my-message".getBytes());

// Async version:
producer.sendAsync("my-message".getBytes()).thenRun(() -> {
    // Message was persisted
});

Page 10: Cloud Messaging Service: Technical Overview

Java consumer example

CmsClient client = CmsClient.create("http://<broker vip>:4080");

Consumer consumer = client.subscribe(
    "my-topic", "my-subscription-name", SubscriptionType.Exclusive);

// Blocks until message available
Message msg = consumer.receive();
// Do something...
consumer.acknowledge(msg);

Page 11: Cloud Messaging Service: Technical Overview

System overview

Broker
• State-less
• Maintain in memory cache of messages
• Read from Bookkeeper when cache miss

Bookkeeper
• Distributed write-ahead log
• Create many ledgers
  • Append entries
  • Read entries
  • Delete ledger
• Consistent reads
• Single writer (the broker)
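The broker's "serve from the in-memory cache, read from Bookkeeper on a miss" behavior is a read-through cache. A minimal sketch, assuming a hypothetical `BrokerCacheSketch` class and a plain function standing in for the Bookkeeper read path (eviction and concurrency are left out):

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.LongFunction;

public class BrokerCacheSketch {
    private final Map<Long, byte[]> cache = new HashMap<>();
    private final LongFunction<byte[]> store; // stand-in for a Bookkeeper read
    int storeReads = 0;                       // counts cache misses, for illustration

    BrokerCacheSketch(LongFunction<byte[]> store) {
        this.store = store;
    }

    // Return the entry from the cache, falling back to the store on a miss.
    byte[] read(long entryId) {
        byte[] data = cache.get(entryId);
        if (data == null) {
            storeReads++;
            data = store.apply(entryId);
            if (data != null) {
                cache.put(entryId, data);
            }
        }
        return data;
    }

    public static void main(String[] args) {
        BrokerCacheSketch broker = new BrokerCacheSketch(id -> ("entry-" + id).getBytes());
        broker.read(7); // miss: goes to the store
        broker.read(7); // hit: served from memory
        System.out.println(broker.storeReads);
    }
}
```

Because the cache holds only copies of data that also lives in the store, a broker built this way can stay stateless: losing the cache costs latency, not messages.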

[Diagram: Producer App and Consumer App, each with a CMS client, connect to a Broker in the CMS Cluster. Broker components: Native dispatcher, Managed Ledger, BK Client, Global replicators, Cache, Load Balancer. Other cluster components: Bookie, ZK, Global ZK, Replication]

Page 12: Cloud Messaging Service: Technical Overview

System overview

Native dispatcher
• Async Netty server

Global replicators
• If topic is global, republish messages in other regions

Global Zookeeper
• ZK instance with participants in multiple US regions
• Consistent data store for customer configuration
• Accept writes with one region down

[Diagram: same CMS Cluster system diagram as on the previous slide]

Page 13: Cloud Messaging Service: Technical Overview

Partitioned topics

▪ Client lib has a wrapper producer/consumer implementation
▪ No API changes
▪ Producers can decide how to assign messages to partitions:
  ▪ Single partition
  ▪ Round robin
  ▪ Provide a key on the message
    ▪ Hash of the key determines the partition
  ▪ Custom routing
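Two of the routing modes above, round robin and key hashing, can be sketched in a few lines. `PartitionRouterSketch` is a hypothetical name, and using `String.hashCode()` as the hash function is an assumption; the slides do not say which hash the client library uses.

```java
import java.util.concurrent.atomic.AtomicLong;

public class PartitionRouterSketch {
    private final int numPartitions;
    private final AtomicLong counter = new AtomicLong();

    public PartitionRouterSketch(int numPartitions) {
        if (numPartitions <= 0) {
            throw new IllegalArgumentException("numPartitions must be positive");
        }
        this.numPartitions = numPartitions;
    }

    // Keyed routing: the same key always maps to the same partition.
    // floorMod keeps the result non-negative even for negative hash codes.
    public int forKey(String key) {
        return Math.floorMod(key.hashCode(), numPartitions);
    }

    // Round-robin routing: cycle through partitions 0..numPartitions-1.
    public int roundRobin() {
        return (int) (counter.getAndIncrement() % numPartitions);
    }

    public static void main(String[] args) {
        PartitionRouterSketch router = new PartitionRouterSketch(5);
        System.out.println(router.forKey("user-42") == router.forKey("user-42"));
        for (int i = 0; i < 6; i++) {
            System.out.print(router.roundRobin() + " ");
        }
        System.out.println();
    }
}
```

Keyed routing preserves per-key ordering because all messages for a key land on one partition; round robin gives that up in exchange for an even spread.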

[Diagram: an App with a Producer for topic T1 (partitions P0 to P4); partitions T1-P0 through T1-P4 are spread across Broker 1, Broker 2 and Broker 3 in the CMS Cluster]

Page 14: Cloud Messaging Service: Technical Overview

Partitioned topics

▪ Consumers can use all subscription types with the same semantics
▪ In “Failover” subscription type, the election is done per partition
▪ Evenly spread the partitions assignment across all available consumers
▪ No need for ZK coordination
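To make "election per partition, spread evenly" concrete, the sketch below assigns the active consumer of each partition by taking the partition index modulo the number of consumers. This modulo rule and the name `FailoverAssignmentSketch` are assumptions made for illustration; the slides do not describe the actual election algorithm.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class FailoverAssignmentSketch {
    // Map each partition index to the consumer that is active for it.
    static Map<Integer, String> assign(int numPartitions, List<String> consumers) {
        Map<Integer, String> active = new TreeMap<>();
        if (consumers.isEmpty()) {
            return active;
        }
        for (int p = 0; p < numPartitions; p++) {
            active.put(p, consumers.get(p % consumers.size()));
        }
        return active;
    }

    public static void main(String[] args) {
        System.out.println(assign(5, Arrays.asList("Consumer-1", "Consumer-2")));
    }
}
```

With 5 partitions and 2 consumers, one consumer is active for 3 partitions and the other for 2, while each remains on standby for the rest.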

[Diagram: two Apps, Consumer-1 and Consumer-2, each subscribed to topic T1 with per-partition consumers C0 to C4; partitions T1-P0 through T1-P4 are spread across Broker 1, Broker 2 and Broker 3 in the CMS Cluster]

Page 15: Cloud Messaging Service: Technical Overview

3. Bookkeeper

Page 16: Cloud Messaging Service: Technical Overview

CMS Bookkeeper usage

▪ CMS uses Bookkeeper through a higher level interface of ManagedLedger:
  › A single managed ledger represents the storage of a single topic
  › Maintains list of currently active BK ledgers
  › Maintains the subscription positions using an additional ledger to checkpoint the last acknowledged message in the stream
  › Cache data
  › Deletes ledgers when all cursors are done with them
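The rule "delete ledgers when all cursors are done with them" amounts to finding the slowest cursor and dropping every ledger before it. A minimal sketch, with hypothetical names (`LedgerTrimSketch`, `deletable`) and the added assumption that the ledger currently being written to is never deleted:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.List;

public class LedgerTrimSketch {
    // ledgerIds: ascending ledger ids of one topic; the last one is still being written to.
    // cursorLedgerIds: for each subscription, the ledger holding its first unacknowledged entry.
    static List<Long> deletable(List<Long> ledgerIds, Collection<Long> cursorLedgerIds) {
        List<Long> result = new ArrayList<>();
        if (ledgerIds.isEmpty()) {
            return result;
        }
        long slowest = ledgerIds.get(ledgerIds.size() - 1); // never go past the current ledger
        for (long cursor : cursorLedgerIds) {
            slowest = Math.min(slowest, cursor);
        }
        for (long id : ledgerIds) {
            if (id < slowest) {
                result.add(id); // every cursor has moved past this ledger
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<Long> ledgers = Arrays.asList(1L, 2L, 3L, 4L);
        System.out.println(deletable(ledgers, Arrays.asList(2L, 4L))); // [1]
        System.out.println(deletable(ledgers, Arrays.asList(4L, 4L))); // [1, 2, 3]
    }
}
```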


Page 17: Cloud Messaging Service: Technical Overview

Bookie internal structure

• Writes are written both to the journal and to ledger storage (on a different device)

• Ledger storage writes are fsynced periodically

• Reads only come from ledger storage

• Entries are interleaved in entry log files

• Ledger indexes are used to find entry offsets
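The relationship between an interleaved entry log and a ledger index can be sketched as a single append-only buffer plus a lookup table from (ledger, entry) to an offset. `EntryLogSketch` is a hypothetical in-memory model, not the bookie's on-disk format.

```java
import java.io.ByteArrayOutputStream;
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

public class EntryLogSketch {
    // Stand-in for an entry log file: entries of all ledgers appended in arrival order.
    private final ByteArrayOutputStream log = new ByteArrayOutputStream();
    // Stand-in for the ledger indexes: "ledgerId:entryId" -> {offset, length}.
    private final Map<String, int[]> index = new HashMap<>();

    void append(long ledgerId, long entryId, byte[] data) {
        index.put(ledgerId + ":" + entryId, new int[] { log.size(), data.length });
        log.write(data, 0, data.length);
    }

    // Look up the offset in the index, then read that slice of the log.
    byte[] read(long ledgerId, long entryId) {
        int[] location = index.get(ledgerId + ":" + entryId);
        if (location == null) {
            return null;
        }
        return Arrays.copyOfRange(log.toByteArray(), location[0], location[0] + location[1]);
    }

    public static void main(String[] args) {
        EntryLogSketch entryLog = new EntryLogSketch();
        entryLog.append(1, 0, "a".getBytes());
        entryLog.append(2, 0, "bb".getBytes()); // interleaved with ledger 1
        entryLog.append(1, 1, "ccc".getBytes());
        System.out.println(new String(entryLog.read(1, 1)));
    }
}
```

Reading a whole ledger back from such a log jumps between offsets, since its entries are not adjacent.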

Page 18: Cloud Messaging Service: Technical Overview

Bookkeeper issues

▪ Performance degrades when writing to many ledgers at the same time
▪ When there are heavy reads, the ledger storage device gets slow and will impact writes
▪ Ledger storage flushes need to fsync many ledger index files each time

Page 19: Cloud Messaging Service: Technical Overview

Bookie storage improvements

• Writes are written both to the journal and to an in-memory write cache

• Entries are periodically flushed
• Entries are sorted by ledger to be sequential on disk (per flush period)

• Since entries are sequential, we added a read-ahead cache

• Location index is mostly kept in memory and only updated during flush
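The "sort by ledger at flush time" step can be sketched as a write cache that accepts entries in arrival order and hands them back ordered by ledger, then by entry. `WriteCacheSketch` is a hypothetical model that tracks only ids, not payloads or the location index.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

public class WriteCacheSketch {
    static final class Entry {
        final long ledgerId;
        final long entryId;

        Entry(long ledgerId, long entryId) {
            this.ledgerId = ledgerId;
            this.entryId = entryId;
        }

        @Override
        public String toString() {
            return ledgerId + ":" + entryId;
        }
    }

    private final List<Entry> cache = new ArrayList<>();

    // Entries from many ledgers arrive interleaved.
    void add(long ledgerId, long entryId) {
        cache.add(new Entry(ledgerId, entryId));
    }

    // Drain the cache in the order it would be written to disk:
    // grouped by ledger, so each ledger's entries are contiguous.
    List<Entry> flush() {
        List<Entry> sorted = new ArrayList<>(cache);
        sorted.sort(Comparator.comparingLong((Entry e) -> e.ledgerId)
                .thenComparingLong(e -> e.entryId));
        cache.clear();
        return sorted;
    }

    public static void main(String[] args) {
        WriteCacheSketch writeCache = new WriteCacheSketch();
        writeCache.add(2, 0);
        writeCache.add(1, 0);
        writeCache.add(2, 1);
        writeCache.add(1, 1);
        System.out.println(writeCache.flush()); // [1:0, 1:1, 2:0, 2:1]
    }
}
```

Contiguous entries per ledger are what make a read-ahead cache worthwhile: reading one entry makes the next ones cheap to prefetch.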

Page 20: Cloud Messaging Service: Technical Overview

Bookkeeper write latency

▪ After hardware, the next limit to achieving low latency is JVM GC
▪ GC pauses are unavoidable. Try to keep them around ~50ms and as infrequent as possible
  › Switched BK client and servers to use Netty pooled ref-counted buffers and direct memory to hide it from GC and eliminate payload copies
  › Extensively profiled allocations and substantially reduced per-entry object allocations
    • Use Recycler pattern to pool objects (very efficient for same thread allocate/release)
    • Primitive collections
    • Array queue instead of linked queues in executors
    • Open hash maps instead of linked hash maps
    • BTree instead of ConcurrentSkipList
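The idea behind the Recycler pattern is to reuse objects from a per-thread pool instead of allocating a new one per entry. The sketch below uses only the JDK and is not Netty's `Recycler` class; `RecyclerSketch` and `EntryHolder` are hypothetical names.

```java
import java.util.ArrayDeque;

public class RecyclerSketch {
    // One pool per thread: acquire and recycle on the same thread need no locking.
    private static final ThreadLocal<ArrayDeque<EntryHolder>> POOL =
            ThreadLocal.withInitial(ArrayDeque::new);

    static final class EntryHolder {
        long ledgerId;
        long entryId;

        // Reset the fields and return this object to the current thread's pool.
        void recycle() {
            ledgerId = 0;
            entryId = 0;
            POOL.get().push(this);
        }
    }

    // Reuse a pooled object when one is available, otherwise allocate.
    static EntryHolder acquire(long ledgerId, long entryId) {
        EntryHolder holder = POOL.get().poll();
        if (holder == null) {
            holder = new EntryHolder();
        }
        holder.ledgerId = ledgerId;
        holder.entryId = entryId;
        return holder;
    }

    public static void main(String[] args) {
        EntryHolder first = acquire(1, 2);
        first.recycle();
        EntryHolder second = acquire(3, 4);
        System.out.println(first == second); // true: the same instance was reused
    }
}
```

The caller must not touch an object after recycling it, since another user of the pool may already own it.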


Page 21: Cloud Messaging Service: Technical Overview

Bookie ledgers scalability

[Chart: single bookie, 15K writes/s. Y axis: BK write latency (ms), 0 to 4. X axis: ledgers / bookie (1, 1000, 5000, 10000, 20000, 50000). Series: Avg and 99pct]

Page 22: Cloud Messaging Service: Technical Overview

4. Future

Page 23: Cloud Messaging Service: Technical Overview

Auto batching

▪ Send messages in batches throughout the system
▪ Transparent to application
▪ Configure group timing and size: e.g.: 1ms / 128Kb
▪ For the same byte/s throughput, lower the txn/s through the system
  › Less CPU usage in broker/bookies
  › Lower GC pressure
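A batcher driven by a time limit and a size limit can be sketched as follows. `BatcherSketch` is hypothetical, since the feature is described here only as future work. The clock is passed in explicitly to keep the example deterministic; a real client would also need a timer to flush a batch when no further message arrives.

```java
import java.util.ArrayList;
import java.util.List;

public class BatcherSketch {
    private final int maxBytes;
    private final long maxDelayMillis;
    private final List<byte[]> pending = new ArrayList<>();
    private int pendingBytes = 0;
    private long firstAddMillis = 0;

    BatcherSketch(int maxBytes, long maxDelayMillis) {
        this.maxBytes = maxBytes;
        this.maxDelayMillis = maxDelayMillis;
    }

    // Add a message. Returns the batch to send once the size or time limit is reached,
    // otherwise null (the message stays pending).
    List<byte[]> add(byte[] message, long nowMillis) {
        if (pending.isEmpty()) {
            firstAddMillis = nowMillis;
        }
        pending.add(message);
        pendingBytes += message.length;

        boolean full = pendingBytes >= maxBytes;
        boolean timedOut = nowMillis - firstAddMillis >= maxDelayMillis;
        if (!full && !timedOut) {
            return null;
        }
        List<byte[]> batch = new ArrayList<>(pending);
        pending.clear();
        pendingBytes = 0;
        return batch;
    }

    public static void main(String[] args) {
        // 1 ms / 128 KiB, reading the slide's "128Kb" as kibibytes (an assumption).
        BatcherSketch batcher = new BatcherSketch(128 * 1024, 1);
        batcher.add(new byte[1024], 0);
        batcher.add(new byte[1024], 0);
        List<byte[]> batch = batcher.add(new byte[1024], 1); // time limit reached
        System.out.println(batch.size()); // 3 messages travel as one unit
    }
}
```

Three messages sent as one batch cost the broker and bookies one transaction instead of three, which is where the CPU and GC savings come from.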


Page 24: Cloud Messaging Service: Technical Overview

Low durability

▪ Current throughput bottleneck for bookie writes is journal syncs
▪ Could add more bookies but at a bigger cost
▪ Some use cases are ok with losing data on rare occasions
▪ Solution
  › Store data in bookies
    • No memory limitation, can build big backlog
  › Don’t write to bookie journal
    • Data is stored in write cache in 2 bookies + broker cache
  › Can lose < 1min data in case 1 broker & 2 bookies crash
▪ Higher throughput with fewer bookies
▪ Lower publish latency

Page 25: Cloud Messaging Service: Technical Overview

5. Q & A