continuous data streams scalable joining of photon: fault ...cloud.berkeley.edu/data/photon.pdf ·...

35
Google Confidential and Proprietary Photon: fault-tolerant and scalable joining of continuous data streams Manpreet ([email protected])

Upload: others

Post on 06-Jul-2020

15 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: continuous data streams scalable joining of Photon: fault ...cloud.berkeley.edu/data/photon.pdf · Photon: fault-tolerant and scalable joining of continuous data streams Manpreet

Google Confidential and Proprietary

Photon: fault-tolerant and scalable joining of continuous data streams

Manpreet ([email protected])

Page 2: continuous data streams scalable joining of Photon: fault ...cloud.berkeley.edu/data/photon.pdf · Photon: fault-tolerant and scalable joining of continuous data streams Manpreet

Google Confidential and Proprietary

Agenda

● Problem and motivation● Systems challenges● Design● Production deployment● Future work

Page 3: continuous data streams scalable joining of Photon: fault ...cloud.berkeley.edu/data/photon.pdf · Photon: fault-tolerant and scalable joining of continuous data streams Manpreet

Google Confidential and Proprietary

The Stats Loop

Ads Serving

Stats Engine

Logs Persistent Storage

Users on Google.com Advertisers + Publishers

queries, clicks adsaccounts, campaigns,budgets

events

reports, billing, …

Ads inventory, budgets, …

Continuously update stats

Continuously aggregate logs

Hourly/Daily/Monthly sums of

Clicks, Impressions, Cost… per …

Advertiser, Publisher, Keyword etc

Page 4: continuous data streams scalable joining of Photon: fault ...cloud.berkeley.edu/data/photon.pdf · Photon: fault-tolerant and scalable joining of continuous data streams Manpreet

Google Confidential and Proprietary

Problem

● Join two continuous streams of events○ Based on a shared id○ Stored in the Google File System○ Replicated in multiple datacenters

● Motivation:○ Join click with query○ Logged by multiple servers at different times

■ Cannot put query inside the click

Page 5: continuous data streams scalable joining of Photon: fault ...cloud.berkeley.edu/data/photon.pdf · Photon: fault-tolerant and scalable joining of continuous data streams Manpreet

Google Confidential and Proprietary

Systems challenges

● Exactly-once semantics○ Output used for billing / internal monitoring

● Reliability○ Fault-tolerance in the cloud○ Automatically handle single data-center disaster

● Scalability○ Millions of joins per minute

● Latency○ O(seconds)

Page 6: continuous data streams scalable joining of Photon: fault ...cloud.berkeley.edu/data/photon.pdf · Photon: fault-tolerant and scalable joining of continuous data streams Manpreet

Google Confidential and Proprietary

Singly-homed system

● With patched-on failover tool○ Ran in production for several years

● Very high maintenance cost● Datacenters have downtimes:

○ planned○ unplanned○ random hiccups

● Cannot provide very high up-time SLA

Page 7: continuous data streams scalable joining of Photon: fault ...cloud.berkeley.edu/data/photon.pdf · Photon: fault-tolerant and scalable joining of continuous data streams Manpreet

Google Confidential and Proprietary

How about running a MapReduce?

● Read both query logs and click logs● Mapper output key is query_id● Reducer outputs the joined event● Issues:

○ batch job○ high latency (setup cost, stragglers, etc)

Page 8: continuous data streams scalable joining of Photon: fault ...cloud.berkeley.edu/data/photon.pdf · Photon: fault-tolerant and scalable joining of continuous data streams Manpreet

Google Confidential and Proprietary

Challenges of running in multiple datacenters

● Stateful system○ Need to ensure exactly-once semantics

● State needs to be replicated synchronously to multiple datacenters○ ==> Paxos (guarantees majority group members have all the updates)

Page 9: continuous data streams scalable joining of Photon: fault ...cloud.berkeley.edu/data/photon.pdf · Photon: fault-tolerant and scalable joining of continuous data streams Manpreet

Google Confidential and Proprietary

High-level idea

Paxos-backed storage

Pipelinein DataCenter 2

DCDCB DCE

PaxosA

Pipelinein DataCenter 1

● Run the pipeline in multiple data-centers in parallel:○ Each pipeline processes every event

● Paxos-backed storage is shared across pipelines○ Used to dedup events (at-most-once semantics)○ Maintain persistent state at event-level

Page 10: continuous data streams scalable joining of Photon: fault ...cloud.berkeley.edu/data/photon.pdf · Photon: fault-tolerant and scalable joining of continuous data streams Manpreet

Google Confidential and Proprietary

PaxosDB: key building block

● Key-value store built on top of raw Paxos● Runs paxos algorithm to guarantee consistent replication

○ Multi-row transactions with conditional updates○ Auto-elects new master

● Key challenge: scalability○ Need to join Millions of events/minute○ With cross-country replicas, less than 10 paxos transactions/sec

RPC server

PaxosDB library

RPC clientRPC request

DCDCB DCE

PaxosA

Page 11: continuous data streams scalable joining of Photon: fault ...cloud.berkeley.edu/data/photon.pdf · Photon: fault-tolerant and scalable joining of continuous data streams Manpreet

Google Confidential and Proprietary

Architecture of a single IdRegistry server

Page 12: continuous data streams scalable joining of Photon: fault ...cloud.berkeley.edu/data/photon.pdf · Photon: fault-tolerant and scalable joining of continuous data streams Manpreet

Google Confidential and Proprietary

IdRegistry made scalable: sharding

● Run paxos transactions in parallel for independent keys

RPC server

PaxosDB library

RPC client

RPC request1

DCDCB DCE

PaxosA

RPC server

PaxosDB library

DCDCB DCE

PaxosA

RPC request2

Page 13: continuous data streams scalable joining of Photon: fault ...cloud.berkeley.edu/data/photon.pdf · Photon: fault-tolerant and scalable joining of continuous data streams Manpreet

Google Confidential and Proprietary

IdRegistry made scalable: server-side batching

● Batch multiple RPC requests into a single Paxos transaction

RPC server

PaxosDB library

RPC client

RPC request 1, 2, 3

DCDCB DCE

PaxosA

RPC server

PaxosDB library

DCDCB DCE

PaxosA

RPC request 4

Batch multiple RPC requests at server

Page 14: continuous data streams scalable joining of Photon: fault ...cloud.berkeley.edu/data/photon.pdf · Photon: fault-tolerant and scalable joining of continuous data streams Manpreet

Google Confidential and Proprietary

IdRegistry made scalable: client-side batching

● Batch multiple client requests into a single RPC

RPC server

PaxosDB library

RPC client

RPC request 1, 2, 3

DCDCB DCE

PaxosA

RPC server

PaxosDB library

DCDCB DCE

PaxosA

RPC request 4

Batch multiple RPC requests at server

Batch multiple client requests into a single RPC

Page 15: continuous data streams scalable joining of Photon: fault ...cloud.berkeley.edu/data/photon.pdf · Photon: fault-tolerant and scalable joining of continuous data streams Manpreet

Google Confidential and Proprietary

IdRegistry made scalable

● Sharding○ Paxos transactions happen in parallel for each shard○ Load-balance amongst shards:

■ Shard by hash(event_id) mod num_shards○ Built automated support for resharding

■ Attach timestamp to each key■ Use timestamp-based sharding■ GC old keys

● Server-side batching○ RPC thread adds input request to an in-memory queue○ Single paxos thread extracts multiple requests, performs a single

multi-row paxos transaction, sends rpc response

● Client-side batching○ Client batches multiple RPC requests into a single RPC request.

Page 16: continuous data streams scalable joining of Photon: fault ...cloud.berkeley.edu/data/photon.pdf · Photon: fault-tolerant and scalable joining of continuous data streams Manpreet

Google Confidential and Proprietary

JoinerEventStore Paxos-based IdRegistry

Joined Click Logs

Query LogsLog Dispatcher

Click Logs

Retry

query_idquery

click

Photon architecture in a single data-center

Step1

Step2Step3

Step4

Page 17: continuous data streams scalable joining of Photon: fault ...cloud.berkeley.edu/data/photon.pdf · Photon: fault-tolerant and scalable joining of continuous data streams Manpreet

Google Confidential and Proprietary

Photon architecture in multiple datacenters

Joiner

EventStore

Paxos-based IdRegistry

Joiner

Joined Click LogsJoined Click Logs

Query Logs

DataCenter 1 DataCenter 2

Log Dispatcher

Click Logs

Log Dispatcher

Click Logs

EventStore

Query Logs

Retry Retry

Page 18: continuous data streams scalable joining of Photon: fault ...cloud.berkeley.edu/data/photon.pdf · Photon: fault-tolerant and scalable joining of continuous data streams Manpreet

Google Confidential and Proprietary

PaxosDB rpc semantics

● InsertKey:○ Returns false if the key already exists○ Else inserts the key

● What if the PaxosDB inserts the key but Joiner does not get the response in time?○ Observed 0.01% loss in production

● Write unique id for joiner along with the event_id:○ event_id is key○ Joiner id is value○ Handles Joiner rpc retries gracefully

● If Joiner crashes after writing to Paxos, run offline recovery

Page 19: continuous data streams scalable joining of Photon: fault ...cloud.berkeley.edu/data/photon.pdf · Photon: fault-tolerant and scalable joining of continuous data streams Manpreet

Google Confidential and Proprietary

Reading input events continuously

● Events stored in Google File System● Periodically stat the directory:

○ identify new files○ check the growth update on existing files

● Keep track of <filename, next_read_offset>

Page 20: continuous data streams scalable joining of Photon: fault ...cloud.berkeley.edu/data/photon.pdf · Photon: fault-tolerant and scalable joining of continuous data streams Manpreet

Google Confidential and Proprietary

EventStore

● Sequentially read the query logs and store a mapping from query_id to query○ In a distributed hash table (e.g. bigtable)

● CacheEventStore:○ Majority of clicks happen within few minutes of query○ Cache mapping from query_id ---> query for recent queries in RAM○ Use distributed Cache Servers (similar to MemCache)

● LogsEventStore○ Performance optimization since our query logs are sorted○ Fallback in case of cache miss○ Use binary search in sorted event stream in disks

Page 21: continuous data streams scalable joining of Photon: fault ...cloud.berkeley.edu/data/photon.pdf · Photon: fault-tolerant and scalable joining of continuous data streams Manpreet

Google Confidential and Proprietary

Purgatory (in Log Dispatcher)

● What if a subset of query logs are delayed?● Log Dispatcher reads every event from the log, keeps retrying until

successful join○ Maintains state in peristent storage (Google File System)○ Ensures at-least-once semantics

● Exponential backoff in case of failure

Page 22: continuous data streams scalable joining of Photon: fault ...cloud.berkeley.edu/data/photon.pdf · Photon: fault-tolerant and scalable joining of continuous data streams Manpreet

Google Confidential and Proprietary

Operational challenges

● Need 24x7 running system○ Low maintenance

● Volume of input traffic fluctuates by 2x within a day● Bursts of traffic in case of network issues● Predicting resource requirements● Auto-resize the jobs to handle traffic growth and spike● Reliable monitoring

Page 23: continuous data streams scalable joining of Photon: fault ...cloud.berkeley.edu/data/photon.pdf · Photon: fault-tolerant and scalable joining of continuous data streams Manpreet

Google Confidential and Proprietary

Production deployment

● System running in production for several months○ O(700)-way sharded PaxosDB○ 5 PaxosDB replicas spread across East Coast, West Coast, Mid-West○ 2 Photon pipelines running in East Coast, West Coast○ Order of magnitude higher uptime SLA than singly-homed system○ Survived multiple planned / unplanned datacenter downtimes○ Much less noisy alerts

Page 24: continuous data streams scalable joining of Photon: fault ...cloud.berkeley.edu/data/photon.pdf · Photon: fault-tolerant and scalable joining of continuous data streams Manpreet

Google Confidential and Proprietary

Photon withstanding a real datacenter disaster

Page 25: continuous data streams scalable joining of Photon: fault ...cloud.berkeley.edu/data/photon.pdf · Photon: fault-tolerant and scalable joining of continuous data streams Manpreet

Google Confidential and Proprietary

End-to-end latency of Photon

Page 26: continuous data streams scalable joining of Photon: fault ...cloud.berkeley.edu/data/photon.pdf · Photon: fault-tolerant and scalable joining of continuous data streams Manpreet

Google Confidential and Proprietary

Effectiveness of server-side batching in IdRegistry

Page 27: continuous data streams scalable joining of Photon: fault ...cloud.berkeley.edu/data/photon.pdf · Photon: fault-tolerant and scalable joining of continuous data streams Manpreet

Google Confidential and Proprietary

IdRegistry dynamic time-based upsharding

Page 28: continuous data streams scalable joining of Photon: fault ...cloud.berkeley.edu/data/photon.pdf · Photon: fault-tolerant and scalable joining of continuous data streams Manpreet

Google Confidential and Proprietary

Dispatcher in one datacenter catching up after downtime

Page 29: continuous data streams scalable joining of Photon: fault ...cloud.berkeley.edu/data/photon.pdf · Photon: fault-tolerant and scalable joining of continuous data streams Manpreet

Google Confidential and Proprietary

Photon wasted joins minimized

Page 30: continuous data streams scalable joining of Photon: fault ...cloud.berkeley.edu/data/photon.pdf · Photon: fault-tolerant and scalable joining of continuous data streams Manpreet

Google Confidential and Proprietary

Photon EventStore lookups in a single datacenter

Page 31: continuous data streams scalable joining of Photon: fault ...cloud.berkeley.edu/data/photon.pdf · Photon: fault-tolerant and scalable joining of continuous data streams Manpreet

Google Confidential and Proprietary

How we addressed the systems challenges

● Exactly-once semantics○ PaxosDB ensures at-most-once○ Dispatcher retry ensures at-least-once

● Reliability○ PaxosDB needs only majority members to be up.○ Storing global state in PaxosDB allows run in multiple datacenters

● Scalability○ Reshard the number of PaxosDB servers○ All the other workers are stateless

● Latency○ Mostly RPC-based communication amongst jobs○ Most data transfers in RAM

Page 32: continuous data streams scalable joining of Photon: fault ...cloud.berkeley.edu/data/photon.pdf · Photon: fault-tolerant and scalable joining of continuous data streams Manpreet

Google Confidential and Proprietary

Next BIG challenges

● Better resource utilization● Join multiple log sources

Page 33: continuous data streams scalable joining of Photon: fault ...cloud.berkeley.edu/data/photon.pdf · Photon: fault-tolerant and scalable joining of continuous data streams Manpreet

Google Confidential and Proprietary

More cool problems we are solving in AdWords Backend

● Real-time logs processing○ Read 100T logs per day○ Compute stats over 200+ dimensions with low latency

● Petabyte scale database storage engine○ 32M rows updated per sec, 140T total rows○ 200K read queries/sec with latency of 90ms

● Efficiently backfill stats for the last N years○ O(50B) rows per day○ Too big to store in a database

● And many more cool projects...○ Join us and find out more!○ Manpreet Singh ([email protected])

Page 34: continuous data streams scalable joining of Photon: fault ...cloud.berkeley.edu/data/photon.pdf · Photon: fault-tolerant and scalable joining of continuous data streams Manpreet

Google Application Process

1

2 Resume Review and Qualification

3 First Round Interviews - 2 Technical Phone Screens

4 Full-time Positions: 3-5 Onsite InterviewsInternships: 1 Host Matching Interview

5 Hiring Committee - Offer

Apply Now! Full-time: google.com/students/eng

Internships: google.com/students/intern

Page 35: continuous data streams scalable joining of Photon: fault ...cloud.berkeley.edu/data/photon.pdf · Photon: fault-tolerant and scalable joining of continuous data streams Manpreet

Google Confidential and Proprietary

Questions