Upload: jiang-ming-yang

Post on 12-Aug-2015

Payments System Jiang-Ming Yang @ Airbnb

P A Y M E N T S S Y S T E M

Outline

•Payment basics

•Availability

•Scalability

•Security

Jiang-Ming Yang @ 2015.04

Payment Basics

Credit Card Information

• Account Number
• BIN
• Brand Mark
• Expiration Date
• Chip
• Hologram
• Signature
• Security Code

Card Number Information

Track 1:
%B1234567890123456^CHRISTNER/JOEL ^1504101100001100000000447000000?
• CCN: 1234567890123456
• Exp: 04/15
• CVV: 447
• Cardholder: CHRISTNER/JOEL

Track 2:
;1234567890123456=150410110000447?
• CCN: 1234567890123456
• Exp: 04/15
• PIN: 000
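The two example strings above follow the magnetic-stripe track 1 layout (starting with %B) and track 2 layout (starting with ;). A minimal sketch of how such strings might be split into fields is shown below. The function names `parse_track1` and `parse_track2` and the regular expressions are illustrative assumptions, not a production parser, and the layout of the trailing discretionary data is issuer-specific, so it is returned unparsed.

```python
import re

# Assumed, simplified patterns: card number, expiry as YYMM, a 3-digit
# service code, then discretionary data up to the '?' end sentinel.
TRACK1 = re.compile(
    r"^%B(?P<pan>\d{1,19})\^(?P<name>[^^]{2,26})\^"
    r"(?P<yy>\d{2})(?P<mm>\d{2})(?P<service>\d{3})(?P<disc>[^?]*)\?$"
)
TRACK2 = re.compile(
    r"^;(?P<pan>\d{1,19})=(?P<yy>\d{2})(?P<mm>\d{2})"
    r"(?P<service>\d{3})(?P<disc>[^?]*)\?$"
)


def _common_fields(match):
    """Build the fields shared by both tracks from a regex match."""
    d = match.groupdict()
    return {
        "ccn": d["pan"],
        "exp": f"{d['mm']}/{d['yy']}",  # the stripe stores YYMM; show MM/YY
        "service_code": d["service"],
        "discretionary": d["disc"],     # issuer-defined, left unparsed
    }


def parse_track1(raw):
    match = TRACK1.match(raw)
    if match is None:
        raise ValueError("not a track 1 string")
    fields = _common_fields(match)
    fields["cardholder"] = match.group("name").strip()
    return fields


def parse_track2(raw):
    match = TRACK2.match(raw)
    if match is None:
        raise ValueError("not a track 2 string")
    return _common_fields(match)
```

Applied to the slide's examples, both parsers return the same card number and a 04/15 expiry; only track 1 carries the cardholder name.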

Credit Card Transaction Flow

Transaction Terminologies

•Authorization

•Capture

•Void

•Refund

•Chargeback
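As a rough illustration of how these five operations relate to one another, the sketch below models a single payment as a small state machine: an authorization can be captured or voided, and a captured payment can be refunded or charged back. The state names, the `ALLOWED` table, and `apply_operation` are assumptions made for this example. Real processors handle more cases, such as partial captures, partial refunds, and expiring authorizations.

```python
# Hypothetical states and transitions, for illustration only.
ALLOWED = {
    ("new", "authorize"): "authorized",          # hold funds on the card
    ("authorized", "capture"): "captured",       # collect the held funds
    ("authorized", "void"): "voided",            # cancel before capture
    ("captured", "refund"): "refunded",          # merchant returns funds
    ("captured", "chargeback"): "charged_back",  # dispute reverses the payment
}


def apply_operation(state, operation):
    """Return the next state, or raise if the operation is not allowed."""
    try:
        return ALLOWED[(state, operation)]
    except KeyError:
        raise ValueError(f"cannot {operation} a payment in state {state}") from None
```

For example, `apply_operation("authorized", "void")` returns `"voided"`, while attempting to void a captured payment raises `ValueError`, since a captured payment has to be refunded instead.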

Availability

Database Solution

•Database replication

•Database cluster

Limitations:
• Auto failover?
• Cross-datacenter retries?

Active/Active

What

• Resilient to datacenter-level failure

• Resilient to Internet routing problems

• Transparent to the merchant

• No human intervention

Why

• Every second of uptime matters to our merchants. The goal is five nines (99.999%) of availability.

• Much easier and safer to perform datacenter-level maintenance.

Challenges

Inconsistent state between datacenters

Datacenters can’t tell if a transaction has already been processed elsewhere.

Limited idempotence

Payment networks can’t reliably guarantee idempotence on retries.

Real-time latency requirements

We can’t just wait until our datacenters get in sync.

Concepts

• Client idempotence key
• Server transaction
• Transaction progression
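One way to read these three concepts together: the client attaches an idempotence key to each request, the server maps that key to exactly one server transaction, and a retry returns the existing transaction instead of charging twice. The sketch below is an in-memory illustration under that assumption. `PaymentServer`, `charge`, `furthest`, and the state names are hypothetical and not taken from the talk. In particular, treating "transaction progression" as a transaction that only moves forward through an ordered list of states is an interpretation.

```python
import uuid

# Assumed ordering of states; a transaction only ever moves forward.
PROGRESSION = ["created", "authorized", "captured"]


def furthest(state_a, state_b):
    """Pick whichever state is further along the progression."""
    return max(state_a, state_b, key=PROGRESSION.index)


class PaymentServer:
    def __init__(self):
        self._by_key = {}  # client idempotence key -> server transaction

    def charge(self, idempotence_key, amount_cents):
        existing = self._by_key.get(idempotence_key)
        if existing is not None:
            # A retry: return the original transaction, do not charge again.
            if existing["amount_cents"] != amount_cents:
                raise ValueError("idempotence key reused with a different amount")
            return existing
        txn = {
            "id": uuid.uuid4().hex,
            "amount_cents": amount_cents,
            "state": "authorized",
        }
        self._by_key[idempotence_key] = txn
        return txn
```

With this shape, two datacenters that each hold a copy of the same transaction could reconcile by keeping `furthest(local_state, remote_state)`, which does not depend on the order in which updates arrive.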

Card Processing: Multi-DC Resolution

Multi-Tender: Multi-DC Resolution

Scenario

When a merchant sells items or products to a customer, the customer has the option to pay with multiple tenders.

APIs

1. CreateBill

2. AddTender

3. CompleteBill / CancelBill

Challenges

1. Each time we receive a tender request, we need to process the tender immediately. Thus, different tenders for the same bill may be processed at different data centers.

2. When we receive the CompleteBill request, we may need to wait for tender information from a remote data center.
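The second challenge can be illustrated with a toy model in which each data center only sees the tenders it processed itself until replication catches up. The `DataCenter` class, its method names, and the "pending"/"completed" results below are assumptions made for this sketch. They mirror the CreateBill / AddTender / CompleteBill calls but are not the real API.

```python
class DataCenter:
    """Toy model of one data center's local view of bills and tenders."""

    def __init__(self, name):
        self.name = name
        self.bills = {}    # bill_id -> total amount in cents
        self.tenders = {}  # bill_id -> {tender_id: amount in cents}

    def create_bill(self, bill_id, total_cents):
        self.bills[bill_id] = total_cents
        self.tenders.setdefault(bill_id, {})

    def add_tender(self, bill_id, tender_id, amount_cents):
        # Processed immediately and locally, even if the bill was created elsewhere.
        self.tenders.setdefault(bill_id, {})[tender_id] = amount_cents

    def replicate_from(self, other):
        # Stand-in for asynchronous cross-data-center replication.
        for bill_id, total in other.bills.items():
            self.bills.setdefault(bill_id, total)
        for bill_id, tenders in other.tenders.items():
            self.tenders.setdefault(bill_id, {}).update(tenders)

    def complete_bill(self, bill_id):
        total = self.bills.get(bill_id)
        if total is None:
            return "pending"  # the bill itself has not replicated here yet
        paid = sum(self.tenders.get(bill_id, {}).values())
        # Locally known tenders may not cover the bill until remote ones arrive.
        return "completed" if paid >= total else "pending"
```

If a 1000-cent bill receives a 600-cent tender in one data center and a 400-cent tender in another, `complete_bill` stays "pending" in the first data center until `replicate_from` brings in the remote tender.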

State Machine

Tender state machine

Bill state machine

Correctness

1. A formal proof

2. Simulate all the possible operational combinations and verify the results
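The simulation approach can be sketched as a brute-force check: enumerate every order in which a set of updates might reach a data center, replay each order, and verify that all orders end in the same state. The state names, `RANK`, and both step functions below are hypothetical stand-ins for the real tender and bill state machines.

```python
from itertools import permutations

# Assumed forward-only ordering of tender states.
RANK = {"created": 0, "authorized": 1, "captured": 2, "refunded": 3}


def keep_furthest(state, incoming):
    """Order-independent rule: never move backwards."""
    return incoming if RANK[incoming] > RANK[state] else state


def last_writer_wins(state, incoming):
    """Order-dependent rule, included to show what the check catches."""
    return incoming


def final_states(updates, step, start="created"):
    """Replay every permutation of updates and collect the end states."""
    results = set()
    for order in permutations(updates):
        state = start
        for incoming in order:
            state = step(state, incoming)
        results.add(state)
    return results
```

A rule is safe under reordering when `final_states` returns a single state. For the three updates `["authorized", "captured", "refunded"]`, `keep_furthest` converges on one state while `last_writer_wins` produces three different outcomes.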

Caveats

Eventually consistent

Asynchronous, eventually consistent systems are harder to reason about.

Complex

Active/active systems are harder to design, implement, and test.

Data Loss

If the original data center goes down and never comes back, we may not be able to perform the capture because the original auth is lost.

Downstream effects

Not all downstream effects are reversible.

Is this the ideal solution?

Scalability

Database Sharding

• Shard key
  • User ID / Merchant ID / Transaction ID / …
• Shard function
  • Hash / logical->physical mapping / dynamic / …

[Diagram: backend database (MySQL) split into Shard 1, Shard 2, ..., Shard X; in each shard a primary replicates to a secondary.]

Shard Key

Select a shard key per component

• Pros:
  • Flexible for each component
• Cons:
  • Hard for data migration, e.g. archiving/migrating all data for a given user
  • An issue on a single DB instance may impact the whole system

[Diagram: each component (Subscription / membership, Billing, PI) has its own set of shards: Shard 1, Shard 2, ..., Shard X.]

Use a master key for all components and organize the system by “scale-out unit”

• Pros:
  • Isolate the impact of a single shard
  • Minimize cross-shard accesses
  • Optimize for deployment roll-out
  • Dependency control
  • Capacity planning
• Cons:
  • Load balancing

[Diagram: scale-out units Shard 1, Shard 2, ..., Shard X, each with primary and secondary databases linked by replication.]

Shard Function

• Hash

• Logical Shard / Physical Shard mapping

• Dynamic Sharding
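The first two options can be combined: hash the shard key to a fixed number of logical shards, then map ranges of logical shards to physical databases. The sketch below assumes 4096 logical shards and made-up database names (`db-1`, `db-2`, `db-3`). The function names are also hypothetical.

```python
import hashlib

NUM_LOGICAL_SHARDS = 4096  # assumed; fixed for the lifetime of the data

# Assumed mapping: inclusive ranges of logical shards -> physical database.
PHYSICAL_RANGES = [
    (0, 2047, "db-1"),
    (2048, 4095, "db-2"),
]


def logical_shard(shard_key):
    """Hash a shard key (e.g. a user ID string) to a stable logical shard."""
    digest = hashlib.sha256(shard_key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % NUM_LOGICAL_SHARDS


def physical_shard(logical, ranges=PHYSICAL_RANGES):
    """Look up which physical database currently owns a logical shard."""
    for low, high, name in ranges:
        if low <= logical <= high:
            return name
    raise LookupError(f"no physical shard owns logical shard {logical}")
```

Because the hash only decides the logical shard, moving data means editing the range table, for example splitting `db-1` by assigning logical shards 1024 to 2047 to a new `db-3`, without changing where any key hashes.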

ID Generation

• ID range per shard

• Encode logical partition inside ID

• Benefits:
  • Adjust the shard function without impacting existing IDs
  • Route new traffic to new partitions
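Encoding the logical partition inside the ID might look like the sketch below, which packs a logical shard number into the high bits of an integer ID. The bit widths (12 bits for the shard, 40 bits for a per-shard sequence) and the function names are assumptions chosen for illustration.

```python
SHARD_BITS = 12     # assumed: up to 4096 logical shards
SEQUENCE_BITS = 40  # assumed: per-shard sequence number


def make_id(logical_shard, sequence):
    """Pack the logical shard into the high bits of the ID."""
    if not 0 <= logical_shard < (1 << SHARD_BITS):
        raise ValueError("logical shard out of range")
    if not 0 <= sequence < (1 << SEQUENCE_BITS):
        raise ValueError("sequence out of range")
    return (logical_shard << SEQUENCE_BITS) | sequence


def shard_of(record_id):
    """Recover the logical shard from an ID without any lookup."""
    return record_id >> SEQUENCE_BITS
```

Since routing reads the shard from the ID itself, existing IDs keep resolving to their original logical shard even if the function that picks a shard for new records changes.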

Re-Sharding

• Scenarios for Re-Sharding

• Active shard vs. archive shard

• Load balance

• Scale out

• Route new traffic to new partitions

• Split existing partitions

• Dynamic sharding

Split Existing Shards

Split one shard into two shards

Read/write from new shards

Data clean up

Dynamic Sharding

Re-sharding by lookup:

• Challenge: the scalability and availability of the lookup layer
  • Database replication
    • Lookup and fallback in case of replication latency
  • Consistent hashing
  • Using a key/value store
• Data migration
  • Lease (if the migration cannot be done in a single transaction)

Re-sharding by buckets:

• Group multiple records into a small bucket and migrate them together.

• For example, if we group 1k records into a bucket, then for 10 million records:

10,000,000 / 1,000 * (8 bytes (shard key) + 4 bytes (shard ID)) = 120,000 bytes ≈ 117.2 KB
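Working through the arithmetic for this example: 10 million records in buckets of 1,000 give 10,000 buckets, and at 12 bytes per bucket entry the mapping takes 120,000 bytes, roughly 117.2 KB. The helper below is a small sketch for trying other bucket sizes. The function name and its default byte widths are assumptions that simply restate the slide's figures.

```python
def bucket_map_bytes(records, records_per_bucket, key_bytes=8, shard_id_bytes=4):
    """Size of a bucket -> shard lookup table, one entry per bucket."""
    buckets = -(-records // records_per_bucket)  # ceiling division
    return buckets * (key_bytes + shard_id_bytes)


size = bucket_map_bytes(10_000_000, 1_000)
print(size, "bytes =", round(size / 1024, 1), "KB")  # 120000 bytes = 117.2 KB
```

With one entry per record instead of per bucket, the same table would need 120,000,000 bytes, which is the cost that bucketing avoids.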

Tips for logical shard

• To route new traffic to new shards: please reserve a big range of unused logical shards.

• For shard splitting: please reserve a big range of logical shards for a single physical shard.

• To support re-sharding by buckets: please keep the number of records per logical shard small enough.

Data Migration

• The typical four steps for data migration (we can roll forward or roll back at each step):
  1. Source (read/write) | Target
  2. Source (read/write) | Target (write)
     <---migration--->
  3. Source (write) | Target (read/write)
  4. Source | Target (read/write)
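The four steps can be sketched as a routing table that says, for each phase, where reads go and which stores receive writes. The `MigratingStore` class, its method names, and the use of in-memory dictionaries as the source and target databases are assumptions made for this illustration.

```python
# One entry per step: where reads are served from, and where writes go.
PHASES = [
    {"read": "source", "write": ["source"]},            # 1. Source (read/write) | Target
    {"read": "source", "write": ["source", "target"]},  # 2. Source (read/write) | Target (write)
    {"read": "target", "write": ["source", "target"]},  # 3. Source (write) | Target (read/write)
    {"read": "target", "write": ["target"]},            # 4. Source | Target (read/write)
]


class MigratingStore:
    def __init__(self):
        self.stores = {"source": {}, "target": {}}
        self.phase = 0  # index into PHASES

    def write(self, key, value):
        for name in PHASES[self.phase]["write"]:
            self.stores[name][key] = value

    def read(self, key):
        return self.stores[PHASES[self.phase]["read"]].get(key)

    def backfill(self):
        # The "migration" between steps 2 and 3: copy old rows to the target
        # without overwriting values that dual writes already placed there.
        for key, value in self.stores["source"].items():
            self.stores["target"].setdefault(key, value)

    def roll_forward(self):
        self.phase = min(self.phase + 1, len(PHASES) - 1)

    def roll_back(self):
        self.phase = max(self.phase - 1, 0)
```

Rolling back from step 3 to step 2 is safe in this model because step 3 still writes to the source, so the source never falls behind while the target serves reads.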

Tests for Data Migration

• Sanity check of Get-API results
• Compare the Write-API behaviors
• Check the batch jobs
• Dry-run with real production data
• Tip: be careful with SQL foreign keys

NoSQL

• Motivation
  • Schema change
  • Storage-level sharding
• Options
  • Using MySQL as a key/value store
  • Riak/Cassandra
  • Others
• Limitations:
  • Transactions
  • Dealing with conflicts
  • Consistency for secondary indexes

Ideal Solution

• What do we need?
  • Scalability and availability
  • Consistency across datacenters
  • Secondary indexes
  • Transactions
• What can we compromise on?
  • Slightly higher latency

NewSQL

• Google Spanner
  • https://research.google.com/archive/spanner.html
• FoundationDB
  • https://foundationdb.com/
• CockroachDB
  • https://github.com/cockroachdb/cockroach

Security

Security Flow

• Data collection
  • Encrypt right after collecting the data
  • Use an iframe for most web integrations
  • Audit data access permissions
• Data persistence
  • Key rolling
  • Token rolling

What we didn’t cover

• Risk
• Reconciliation
• Finance reporting
• Inventory management
• Fulfillment
• Virtual currency
• Cross-currency transactions
• More business logic:
  • Subscription / recurring billing
  • Bundle offers
  • Reservations

Q?
