payments system
Outline
• Payment basics
• Availability
• Scalability
• Security
P A Y M E N T S S Y S T E M
Jiang-Ming Yang @ 2015.04
Credit Card Information
• Account Number
• Brand Mark
• Expiration Date
• BIN
• Chip
• Hologram
• Signature
• Security Code
Card Number Information
Track 1: %B1234567890123456^CHRISTNER/JOEL ^1504101100001100000000447000000?
CCN: 1234567890123456 | Exp: 04/15 | CVV: 447 | Cardholder: CHRISTNER/JOEL
Track 2: ;1234567890123456=150410110000447?
CCN: 1234567890123456 | Exp: 04/15 | PIN: 000
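The two sample strings above follow the standard magnetic-stripe layouts: track 1 is `%B<PAN>^<NAME>^<YYMM><service code><discretionary>?` and track 2 is `;<PAN>=<YYMM><service code><discretionary>?`. A minimal parsing sketch follows; the names `TrackData` and `parse_track` are illustrative only, and the discretionary data is returned unparsed because its internal layout is issuer-defined.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Track 1 (format B): %B<PAN>^<NAME>^<YYMM><service code><discretionary>?
TRACK1 = re.compile(r"^%B(\d{1,19})\^([^^]{2,26})\^(\d{2})(\d{2})(\d{3})([^?]*)\?$")
# Track 2: ;<PAN>=<YYMM><service code><discretionary>?
TRACK2 = re.compile(r"^;(\d{1,19})=(\d{2})(\d{2})(\d{3})([^?]*)\?$")


@dataclass
class TrackData:
    pan: str                 # card number (CCN)
    expiry: str              # formatted as MM/YY
    service_code: str
    discretionary: str       # issuer-defined, left unparsed here
    cardholder: Optional[str] = None  # only present on track 1


def parse_track(raw: str) -> TrackData:
    """Parse a raw track 1 or track 2 string into its fields."""
    m = TRACK1.match(raw)
    if m:
        pan, name, yy, mm, svc, disc = m.groups()
        return TrackData(pan, f"{mm}/{yy}", svc, disc, name.strip())
    m = TRACK2.match(raw)
    if m:
        pan, yy, mm, svc, disc = m.groups()
        return TrackData(pan, f"{mm}/{yy}", svc, disc)
    raise ValueError("unrecognized track format")
```

Raw track data like this is sensitive; a real system would encrypt it right after reading rather than keep it in plain strings.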
Credit Card Transaction Flow
Transaction Terminologies
• Authorization
• Capture
• Void
• Refund
• Chargeback
Database Solution
•Database replication
•Database cluster
Limitations:
• Auto failover?
• Cross-datacenter retries?
Active/Active
What
• Resilient to datacenter-level failure
• Resilient to Internet routing problems
• Transparent to the merchant
• No human intervention
Why
• Every second of uptime matters to our merchants. The goal is five nines (99.999% availability).
• Much easier and safer to perform datacenter-level maintenance.
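To make the availability goal concrete, the short calculation below converts a number of nines into a yearly downtime budget. It is plain arithmetic; the 365-day year is an assumption.

```python
SECONDS_PER_YEAR = 365 * 24 * 3600  # non-leap year, an assumption


def downtime_budget_seconds(nines: int, period_seconds: float = SECONDS_PER_YEAR) -> float:
    """Allowed downtime for an availability target of `nines` nines."""
    unavailability = 10.0 ** -nines  # five nines -> 0.00001
    return period_seconds * unavailability


if __name__ == "__main__":
    budget = downtime_budget_seconds(5)
    print(f"{budget:.1f} s/year, about {budget / 60:.2f} minutes")
```

At five nines the budget is about 315 seconds per year, which is little time for a human to notice and react to a failure, hence the "no human intervention" requirement.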
Challenges
Inconsistent state between datacenters
Datacenters can’t tell if a transaction has already been processed elsewhere.
Limited idempotence
Payment networks can’t reliably guarantee idempotence on retries.
Real-time latency requirements
We can’t just wait until our datacenters get in sync.
Concepts
• Client idempotence key
• Server transaction
• Transaction progression
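A minimal sketch of how these three concepts can fit together, under assumptions: the client chooses the idempotence key, the server maps each key to at most one server transaction, and the transaction records its progression as a list of states. The class names, state names, and the `network_calls` counter are hypothetical and stand in for a real payment-network call.

```python
import uuid
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class ServerTransaction:
    txn_id: str
    amount_cents: int
    # Transaction progression: the ordered states this transaction went through.
    progression: List[str] = field(default_factory=lambda: ["CREATED"])


class PaymentServer:
    def __init__(self) -> None:
        self._by_key: Dict[str, ServerTransaction] = {}
        self.network_calls = 0  # stand-in counter for calls to the payment network

    def authorize(self, idempotence_key: str, amount_cents: int) -> ServerTransaction:
        existing = self._by_key.get(idempotence_key)
        if existing is not None:
            # A retry with the same key returns the same transaction
            # and does not contact the payment network again.
            return existing
        txn = ServerTransaction(txn_id=str(uuid.uuid4()), amount_cents=amount_cents)
        self._by_key[idempotence_key] = txn
        self.network_calls += 1
        txn.progression.append("AUTHORIZED")
        return txn
```

This sketch covers a single server only. Across datacenters the key-to-transaction map is itself replicated with delay, which is the inconsistency the multi-DC resolution has to handle.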
Card Processing: Multi-DC Resolution
Multi-Tender: Multi-DC Resolution
Scenario
When a merchant sells items/products to a customer, the customer has the option to pay with multiple tenders.
APIs
1. CreateBill
2. AddTender
3. CompleteBill / CancelBill
Challenges
1. Each tender request must be processed as soon as we receive it, so different tenders for the same bill may be processed at different datacenters.
2. When we receive the CompleteBill request, we may need to wait for tender information from a remote datacenter.
State Machine
Tender state machine
Bill state machine
Correctness
1. A formal proof
2. Simulate all the possible operational combinations and verify the results
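The second correctness approach can be sketched as follows. The state names and the forward-only merge rule below are assumptions for illustration, not the states from the original diagrams. Each state machine only moves forward, so applying replicated events in any order should converge to the same bill, and the check enumerates every ordering to confirm it.

```python
from itertools import permutations

# Hypothetical, forward-only state orderings.
TENDER_ORDER = ["PENDING", "AUTHORIZED", "CAPTURED"]
BILL_ORDER = ["OPEN", "COMPLETED"]


def advance(current, proposed, order):
    """Move forward only: keep whichever state is further along."""
    return max(current, proposed, key=order.index)


def apply_event(bill, event):
    """Return a new bill with one replicated event applied."""
    kind, arg = event
    tenders = dict(bill["tenders"])
    status = bill["status"]
    if kind == "tender":
        tender_id, state = arg
        tenders[tender_id] = advance(tenders.get(tender_id, "PENDING"), state, TENDER_ORDER)
    elif kind == "bill":
        status = advance(status, arg, BILL_ORDER)
    return {"status": status, "tenders": tenders}


def replay(events):
    """Apply events in the given order, starting from an empty open bill."""
    bill = {"status": "OPEN", "tenders": {}}
    for event in events:
        bill = apply_event(bill, event)
    return bill


EVENTS = [
    ("tender", ("t1", "AUTHORIZED")),
    ("tender", ("t2", "AUTHORIZED")),
    ("tender", ("t1", "CAPTURED")),
    ("bill", "COMPLETED"),
]


def all_orders_converge(events):
    """True if every arrival order of the events yields the same final bill."""
    results = [replay(order) for order in permutations(events)]
    return all(result == results[0] for result in results)
```

Exhaustive enumeration grows factorially with the number of events, so it is practical only for small scenarios. Branching states such as void or cancel would need an explicit conflict rule before the same check applies.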
Caveats
Eventually consistent
Asynchronous, eventually consistent systems are harder to reason about.
Complex
Active/active systems are harder to design, implement, and test.
Data Loss
If the original datacenter is down and never comes back, we may not be able to perform the capture because the original auth is lost.
Downstream effects
Not all downstream effects are reversible.
Database Sharding
• Shard key
  • User Id / Merchant Id / Transaction Id / …
• Shard function
  • Hash / logical->physical mapping / dynamic / …
[Diagram: backend database (MySQL) partitioned into Shard 1, Shard 2, ..., Shard X; within each shard a primary replicates to a secondary.]
Select shard key per component
• Pros:
  • Flexible for each component
• Cons:
  • Hard for data migration, e.g. archiving/migrating all data for a given user.
  • An issue on a single DB instance may impact the whole system.
[Diagram: each component (Subscription / membership, Billing, PI) has its own set of shards: Shard 1, Shard 2, ..., Shard X.]
Use a master key for all components and organize the system by “scale-out unit”
• Pros:
  • Isolate the impact of a single shard
  • Minimize cross-shard accesses
  • Optimize for deployment roll-out
  • Dependency control
  • Capacity planning
• Cons:
  • Load balancing
[Diagram: Shard 1, Shard 2, ..., Shard X, each with its own primary replicating to a secondary.]
Shard Function
• Hash
• Logical Shard / Physical Shard mapping
• Dynamic Sharding
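The first two options can be combined: hash the shard key into a fixed space of logical shards, then map ranges of logical shards to physical databases. The sketch below assumes 4096 logical shards and two hypothetical hosts (`mysql-shard-1`, `mysql-shard-2`); none of these values come from the slides.

```python
import zlib

NUM_LOGICAL_SHARDS = 4096  # assumed; chosen once and never changed

# Logical -> physical mapping as ranges of logical shards (hypothetical hosts).
PHYSICAL_RANGES = [
    (range(0, 2048), "mysql-shard-1"),
    (range(2048, 4096), "mysql-shard-2"),
]


def logical_shard(shard_key: str) -> int:
    """Hash a shard key (e.g. a merchant id) to a logical shard number."""
    # crc32 is stable across processes, unlike Python's built-in hash() for strings.
    return zlib.crc32(shard_key.encode("utf-8")) % NUM_LOGICAL_SHARDS


def physical_shard(logical: int) -> str:
    """Look up which physical database currently owns a logical shard."""
    for logical_range, host in PHYSICAL_RANGES:
        if logical in logical_range:
            return host
    raise KeyError(logical)


def route(shard_key: str) -> str:
    return physical_shard(logical_shard(shard_key))
```

Because keys hash to logical shards rather than directly to hosts, moving data to a new host only changes `PHYSICAL_RANGES`; the hash of every existing key stays the same.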
ID Generation
• ID range per shard
• Encode logical partition inside ID
• Benefits:
  • Adjust the shard function without impacting existing IDs
  • Route new traffic to new partitions
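One way to encode the logical partition inside an ID is to reserve the high bits for it. The bit widths below (12 bits of logical shard, 40 bits of per-shard sequence) are assumptions chosen for illustration.

```python
LOGICAL_BITS = 12   # assumed width: up to 4096 logical shards
SEQUENCE_BITS = 40  # assumed width: per-shard sequence number


def make_id(logical_shard: int, sequence: int) -> int:
    """Pack the logical shard into the high bits and the sequence into the low bits."""
    if not 0 <= logical_shard < (1 << LOGICAL_BITS):
        raise ValueError("logical shard out of range")
    if not 0 <= sequence < (1 << SEQUENCE_BITS):
        raise ValueError("sequence out of range")
    return (logical_shard << SEQUENCE_BITS) | sequence


def logical_shard_of(record_id: int) -> int:
    """Recover the logical shard from an ID without any lookup."""
    return record_id >> SEQUENCE_BITS


def sequence_of(record_id: int) -> int:
    return record_id & ((1 << SEQUENCE_BITS) - 1)
```

Routing an existing ID only needs `logical_shard_of`, so the function that assigns new records to shards can change without touching IDs that were already issued.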
Re-Sharding
• Scenarios for Re-Sharding
• Active shard vs. archive shard
• Load balance
• Scale out
• Route new traffic to new partitions
• Split existing partitions
• Dynamic sharding
Split Existing Shards
Split one shard into two shards
Read/write from new shards
Data clean up
Dynamic Sharding
Re-sharding by lookup:
• Challenge: the scalability and availability of the lookup layer
  • Database replication
    • Lookup and fallback in case of replication latency
  • Consistent hashing
  • Using a key/value store
• Data migration
  • Lease (if migration cannot be done in a single transaction)
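A minimal sketch of "lookup and fallback in case of replication latency", with in-memory dicts standing in for the primary and replica lookup databases. The class and method names are hypothetical.

```python
from typing import Dict, Optional


class ShardLookup:
    """Key -> shard lookup served from a replica, with fallback to the primary."""

    def __init__(self) -> None:
        self.primary: Dict[str, str] = {}  # authoritative mapping
        self.replica: Dict[str, str] = {}  # read copy, may lag behind

    def assign(self, key: str, shard: str) -> None:
        self.primary[key] = shard  # the replica sees this only after replicate()

    def replicate(self) -> None:
        self.replica = dict(self.primary)  # stand-in for asynchronous replication

    def find(self, key: str) -> str:
        shard: Optional[str] = self.replica.get(key)
        if shard is None:
            # The replica has not caught up yet: fall back to the primary.
            shard = self.primary.get(key)
        if shard is None:
            raise KeyError(key)
        return shard
```

The fallback only covers entries that are missing from the replica. A key that has moved between shards can still be read stale from the replica, which is one reason a lease is needed while a migration is in flight.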
Dynamic Sharding
Re-sharding by buckets:
• Group multiple records into a small bucket and migrate them together.
• For example, if we group 1k records into a bucket, then for 10 million records the lookup table needs:
10,000,000 / 1,000 * (8 bytes (shard key) + 4 bytes (shard ID)) = 120,000 bytes ≈ 117.2 KB
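The sizing can be checked with a few lines of arithmetic using the same inputs: 1,000 records per bucket and 12 bytes per lookup entry. The helper `bucket_of` assumes numeric, densely allocated record IDs, which is an illustration rather than something the slides specify.

```python
RECORDS = 10_000_000
RECORDS_PER_BUCKET = 1_000
BYTES_PER_ENTRY = 8 + 4  # 8-byte shard key + 4-byte shard ID


def lookup_table_bytes(records: int, per_bucket: int, entry_bytes: int = BYTES_PER_ENTRY) -> int:
    """Size of a bucket -> shard lookup table with one entry per bucket."""
    buckets = -(-records // per_bucket)  # ceiling division
    return buckets * entry_bytes


def bucket_of(record_id: int, per_bucket: int = RECORDS_PER_BUCKET) -> int:
    """Bucket that a record belongs to; all records in a bucket migrate together."""
    return record_id // per_bucket


if __name__ == "__main__":
    size = lookup_table_bytes(RECORDS, RECORDS_PER_BUCKET)
    print(f"{size} bytes = {size / 1024:.1f} KiB")
```

With 10,000 buckets the table comes to 120,000 bytes, small enough to cache in memory on every routing node.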
Tips for logical shard
• To route new traffic to new shards: reserve a big range of unused logical shards.
• For shard splitting: reserve a big range of logical shards for a single physical shard.
• To support re-sharding by buckets: keep the number of records per logical shard small enough.
Data Migration
• The typical four steps for data migration (we can roll forward or roll back at each step):
  1. Source (read/write) | Target
  2. Source (read/write) | Target (write)
     <--- migration --->
  3. Source (write) | Target (read/write)
  4. Source | Target (read/write)
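The four steps can be sketched as a small router whose phase decides where reads and writes go. The dicts stand in for the source and target databases, and the class and phase names are hypothetical.

```python
from enum import IntEnum


class Phase(IntEnum):
    SOURCE_ONLY = 1  # Source (read/write) | Target
    DUAL_WRITE = 2   # Source (read/write) | Target (write)
    READ_TARGET = 3  # Source (write)      | Target (read/write)
    TARGET_ONLY = 4  # Source              | Target (read/write)


class MigratingStore:
    def __init__(self) -> None:
        self.source = {}
        self.target = {}
        self.phase = Phase.SOURCE_ONLY

    def write(self, key, value) -> None:
        if self.phase <= Phase.READ_TARGET:
            self.source[key] = value  # source keeps receiving writes until step 4
        if self.phase >= Phase.DUAL_WRITE:
            self.target[key] = value

    def read(self, key):
        store = self.target if self.phase >= Phase.READ_TARGET else self.source
        return store[key]

    def backfill(self) -> None:
        """Copy rows written before dual-write began, without overwriting newer target rows."""
        for key, value in self.source.items():
            self.target.setdefault(key, value)

    def roll_forward(self) -> None:
        self.phase = Phase(min(self.phase + 1, Phase.TARGET_ONLY))

    def roll_back(self) -> None:
        self.phase = Phase(max(self.phase - 1, Phase.SOURCE_ONLY))
```

Because the source still receives every write through step 3, rolling back from step 3 to step 2 loses no data. After step 4 the source stops receiving writes, so a rollback from there would first need the source refreshed from the target.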
Tests for Data Migration
• Sanity check of Get-API results
• Compare the Write-API behaviors
• Check the batch jobs
• Dry-run with the real production data
• Tip: be careful with SQL foreign keys
NoSQL
• Motivation
  • Schema change
  • Storage-level sharding
• Options
  • Using MySQL as a key/value store
  • Riak/Cassandra
  • Others
• Limitations
  • Transactions
  • Dealing with conflicts
  • Consistency for secondary indexes
Ideal Solution
• What do we need?
  • Scalability and availability
  • Consistency across datacenters
  • Secondary indexes
  • Transactions
• What can we compromise on?
  • Slightly higher latency
NewSQL
• Google Spanner
  • https://research.google.com/archive/spanner.html
• FoundationDB
  • https://foundationdb.com/
• CockroachDB
  • https://github.com/cockroachdb/cockroach
Security Flow
• Data collection
  • Encryption right after collecting the data
  • Use iFrame for most web integration
  • Audit the data access permission
• Data persistence
  • Key rolling
  • Token rolling
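Token rolling can be illustrated with a toy token vault: callers hold an opaque token instead of the card number, and rolling swaps the token without changing the underlying data. This is a sketch under assumptions; `TokenVault` is a hypothetical name, and the in-memory dict stands in for storage that a real system would keep encrypted under a rotating key.

```python
import secrets
from typing import Dict


class TokenVault:
    """Maps opaque random tokens to card numbers (in-memory stand-in)."""

    def __init__(self) -> None:
        # Assumption: a real vault would store the PAN encrypted, not in plain text.
        self._pan_by_token: Dict[str, str] = {}

    def tokenize(self, pan: str) -> str:
        token = secrets.token_urlsafe(16)  # random, carries no card information
        self._pan_by_token[token] = pan
        return token

    def detokenize(self, token: str) -> str:
        return self._pan_by_token[token]

    def roll(self, old_token: str) -> str:
        """Issue a fresh token for the same card and retire the old one."""
        pan = self._pan_by_token.pop(old_token)
        return self.tokenize(pan)
```

After `roll`, the old token no longer resolves, so a leaked token stops being useful once it has been rolled.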
What we didn’t cover
• Risk
• Reconciliation
• Finance reporting
• Inventory management
• Fulfillment
• Virtual currency
• Cross-currency transactions
• More business logic:
  • Subscription / recurring billing
  • Bundle offer
  • Reservation