PREDICTABLE PERFORMANCE FOR BIG DATA IN REAL-TIME

Upload: doannhan

Post on 11-Apr-2018


© 2012 Aerospike. All rights reserved. Confidential | Corporate Overview | Pg. 2

Aerospike aer·o·spike [air-oh-spahyk] noun, 1. tip of a rocket that enhances speed and stability

Interactions - faster & better decisions

1.  Know whom the interaction is with
    •  200 M US consumers, 5 Billion mobile devices

2.  Anticipate intent based on current context
    •  Page views, search terms, ads served, game state, last move, friends list, location info, pre-computed data like audience segments, location patterns

3.  Respond fast
    •  Display the most relevant advertisement
    •  Deliver the richest gaming experience
    •  Detect the latest attack vector
    •  Recommend the best product
    •  Treat special customers like VIPs…

4.  NEVER go down!


The only option for Real-time Big Data


Aerospike fuels AdTech; Ads fund the Internet

➤  #2 Ad Network
➤  #1 Ad Agency
➤  #1 Indep Ad Exchange
➤  #1 Video Ad Network
➤  #1 Data Aggregator
➤  #2 Mobile Ad Platform
➤  #2, #6 Publisher Network


➤  #1 Search Syndicator
➤  #1 Recommendation Engine - ATT, Tesco, Ticketmaster,
➤  #1 DSP - Canada
➤  #1 Ad server - China
➤  #1 Mobile Ads - Asia
➤  #1 Pub Net - SE Asia
➤  #3 ISP - Japan

WHY AEROSPIKE?

Why Aerospike?

➤  Fast
    §  Predictable performance: 99.9% in less than 1 ms
    §  For balanced read/write transactions
    §  Even with synchronous replication for immediate consistency

➤  Scales
    §  Manages 100+ Billion objects, 10+ Terabytes of data
    §  Processes 500k+ TPS per node; 50k+ TPS for writes
    §  Scales out linearly on commodity hardware

➤  Never Fails
    §  Reliably stores data with immediate consistency, replication
    §  Cross data center multi-master replication ensures business continuity and geographic proximity
    §  No performance degradation
        •  Even during re-balancing/data migration, rolling software/hardware upgrades and background backups/restores!


Built by experts in Databases & Distributed systems

➤  Donald J. Haderle, “father of DB2”

➤  Srini V. Srinivasan – Database expert
    §  Responsible for Yahoo! Mobile’s global operations serving Millions of users 24x7
    §  M.S. and Ph.D. Computer Science (in Databases), University of Wisconsin – Madison and B.Tech Computer Science, IIT Chennai

➤  Russell Sullivan – High Performance expert
    §  Founder of AlchemyDB, “Performance Man” for Redis, 20+ years experience in web scale systems at Lycos, 24/7 Real Media, top European dating site BE2
    §  B.S. Computer Science, Michigan University

➤  Brian Bulkowski – Networking expert
    §  20+ years developing web scale infrastructures at Aggregate Knowledge, Liberate and Novell
    §  B.S. Mathematics/Computer Science, Brown University

➤  Roger Sippl, founder of Informix


Zero downtime in 2+ yrs


➤  Real-Time Bidding Platform for…
    §  $22B market by 2015
➤  27 Billion auctions per day
    §  Doubling every year
➤  1 Million TPS
➤  12 TB
➤  140 servers in 3 data centers

“Aerospike has operated without interruptions and easily scaled to meet our performance demands.” – Mike Nolet, CTO, AppNexus

2 Billion objects, 8 TB data

➤  Ad Serving platform for Yahoo!, MSN, AOL, comscore sites
    §  Yahoo! User data for 76% of the U.S. population
    §  Yahoo! Search data for 300 million+ searches per day
    §  50,000+ user attributes from 25+ data providers
        •  “Genome takes in more data from more sources than other solutions.” - Peter Foster, Yahoo’s GM, Audience & Performance Advertising


2 Trillion Transactions per month

➤  BlueKai - largest data management platform on the Internet
    §  2 Trillion Transactions per month
    §  100,000 attributes and user profiles for e-commerce, recommendation engines, video traffic and ad targeting

➤  eXelate - 16 TB of data on 400 million consumers
    §  60 Billion Transactions per month
    §  4 data centers across US and Europe for geographic proximity & high availability
    §  “Scale. Real-time performance. Real-time replication at each of our four datacenters. Aerospike delivered on all of these requirements.” - Elad Efraim, CTO, eXelate


FAST KVS

Shared-Nothing Architecture


Data Center 1, Data Center 2, Data Center 3

Every cluster node is Identical and handles both transactions and long running tasks

Replication supported with immediate consistency

Fast Key Value Store

➤  Taking advantage of modern commodity servers
    §  New multi-processor, multi-core machines
    §  Lower DRAM and SSD price points

➤  High Throughput, Elastic Scaling & ACID


Vertical Scaling Maximizes TPS
    Handles traffic spikes, ensures predictable performance

Horizontal (Elastic) Scaling Maximizes Data Volumes
    Ensures 100% Uptime


Intelligent Client API Shields Your Applications from the Complexity of the Cluster

➤  Implements Aerospike API
    §  Easy primary key pattern
    §  Row with typed columns
    §  Optimistic row locking

➤  Optimized binary protocol

➤  Cluster tracking
    §  Client / server gossip protocol
    §  Continually learn cluster changes
    §  Learn and update data partition map

➤  Transaction semantics
    §  Global transaction ID
    §  Retransmit and timeout
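As an illustration of the "optimistic row locking" item above, here is a minimal sketch in Python. It assumes each row carries a generation counter that is checked on write; the slide does not say how the lock is implemented, and `Store`, `Row`, `GenerationMismatch` and `increment` are hypothetical names for an in-memory stand-in, not the Aerospike client API.

```python
from dataclasses import dataclass, field


class GenerationMismatch(Exception):
    """Raised when a row changed between the read and the write."""


@dataclass
class Row:
    bins: dict = field(default_factory=dict)  # typed columns, keyed by name
    generation: int = 0                       # bumped on every successful write


class Store:
    """Hypothetical in-memory stand-in for a cluster node."""

    def __init__(self):
        self._rows = {}

    def get(self, key):
        # Return a copy of the columns plus the generation seen at read time.
        row = self._rows.get(key, Row())
        return dict(row.bins), row.generation

    def put(self, key, bins, expected_generation):
        # The write succeeds only if nobody else wrote since our read.
        row = self._rows.setdefault(key, Row())
        if row.generation != expected_generation:
            raise GenerationMismatch(key)
        row.bins.update(bins)
        row.generation += 1
        return row.generation


def increment(store, key, column, retries=3):
    """Read-modify-write that retries if another writer got in first."""
    for _ in range(retries):
        bins, gen = store.get(key)
        try:
            store.put(key, {column: bins.get(column, 0) + 1},
                      expected_generation=gen)
            return
        except GenerationMismatch:
            continue
    raise RuntimeError("too much contention")
```

No lock is held between the read and the write; a stale writer is simply rejected and retries, which is what makes the scheme optimistic.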


No Sharding! Data is Distributed Randomly, using Hash technology

➤  Every key is hashed into a 20 byte (fixed length) string using the RIPEMD160 hash function

➤  This hash + additional data (fixed 64 bytes) are stored in RAM in the index

➤  4 bytes of this hash are used to compute the partition id

➤  There are 4096 partitions

➤  Partition id maps to node id based on cluster membership


Key:   cookie-abcdefg-12345678
Hash:  182023kh15hh3kahdjsh

Partition ID   Master node   Replica node
…              1             4
1820           2             3
1821           3             2
4096           4             1
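The key-to-node lookup described on this slide can be sketched in Python as follows. Several details are assumptions, because the slide does not specify them: which 4 bytes of the hash are used, their byte order, and the reduction to 4096 partitions (here: first 4 bytes, little-endian, modulo 4096). `key_digest`, `partition_id` and `locate` are hypothetical helper names. Some Python/OpenSSL builds do not ship RIPEMD-160, so the sketch falls back to SHA-1 (also 20 bytes) purely as a stand-in.

```python
import hashlib

N_PARTITIONS = 4096  # number of partitions stated on the slide


def key_digest(key: bytes) -> bytes:
    """Return a fixed-length 20-byte digest of the key."""
    try:
        return hashlib.new("ripemd160", key).digest()
    except ValueError:
        # RIPEMD-160 is unavailable in this build; SHA-1 is a stand-in only.
        return hashlib.sha1(key).digest()


def partition_id(digest: bytes) -> int:
    """Assumption: first 4 bytes, little-endian, reduced modulo 4096."""
    return int.from_bytes(digest[:4], "little") % N_PARTITIONS


def locate(key: bytes, partition_map: dict) -> tuple:
    """Look up (master, replica) for a key in a partition -> nodes table."""
    return partition_map[partition_id(key_digest(key))]
```

A client holding a copy of the partition table can call `locate(b"cookie-abcdefg-12345678", partition_map)` and send the request straight to the master node, with no intermediate routing hop.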

Cross Data Center Replication (XDR)



➤  Asynchronous replication for long link delays and outages

➤  Namespace is configured to replicate to a destination cluster – master / slave, including star and ring

➤  Replication process
    §  Transaction journal on partition master and replica
    §  XDR process writes batches to destination
    §  Transmission state shared with source replica
    §  Retransmission in case of network fault
    §  When data arrives back at originating cluster, transaction ID matching prevents subsequent application and forwarding

➤  In master / master replication, conflict resolution via multiple versions, or timestamp
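Two of the ideas above, transaction ID matching to stop a write from circulating forever and timestamp-based conflict resolution, can be sketched in Python. This is a toy model under stated assumptions, not the XDR implementation: `Write`, `Cluster` and `ship` are hypothetical names, timestamps are assumed to come from roughly synchronized clocks, and on a timestamp tie the existing value is kept.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Write:
    txn_id: str       # globally unique transaction ID (format is assumed)
    key: str
    value: object
    timestamp: float  # writer's clock


class Cluster:
    def __init__(self, name):
        self.name = name
        self.data = {}     # key -> (value, timestamp)
        self.seen = set()  # transaction IDs already applied here
        self.outbox = []   # journal of writes awaiting shipment

    def local_write(self, w):
        self._apply(w)

    def receive(self, w):
        """Apply a shipped write; return False if it was already seen."""
        if w.txn_id in self.seen:
            return False   # it came back around: neither apply nor forward
        self._apply(w)
        return True

    def _apply(self, w):
        self.seen.add(w.txn_id)
        self.outbox.append(w)  # journal it so it is forwarded once
        current = self.data.get(w.key)
        if current is None or w.timestamp > current[1]:
            self.data[w.key] = (w.value, w.timestamp)  # later timestamp wins


def ship(source, destination):
    """Send the source's journal to the destination as one batch."""
    batch, source.outbox = source.outbox, []
    for w in batch:
        destination.receive(w)
```

With two clusters writing the same key, shipping in both directions converges on the later write, and the second round trip is dropped by the `seen` check instead of bouncing between clusters.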


SSD-optimized Storage Layer

➤  Direct device access, i.e. raw, bypassing file system
    §  Data written in SSD optimal large block patterns
    §  All indexes in RAM for low wear
    §  Continuous background defragmentation
    §  Clean restart through shared memory

➤  Random distribution using hash does not require RAID hardware


SSD performance varies widely
•  Aerospike has a certified hardware list
•  Free SSD certification tool, CIO, is also available

Self-configuring Clusters!

➤  Automatic multicast gossip protocol for node discovery
➤  Paxos consensus algorithm determines nodes in cluster
➤  Ordered list of nodes determines data location
➤  Data partitions balanced for minimal data motion
➤  Vote initiated and terminated in 100 milliseconds
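One way an ordered node list can determine data location with minimal data motion is rendezvous (highest-random-weight) hashing. The slide does not name the algorithm Aerospike uses, so the Python sketch below is a swapped-in technique chosen only to illustrate the property; `succession`, `partition_map` and `moved_masters` are hypothetical names.

```python
import hashlib

N_PARTITIONS = 4096


def succession(partition: int, nodes: list) -> list:
    """Order the nodes for one partition by a per-(node, partition) score."""
    def score(node):
        return hashlib.sha256(f"{node}:{partition}".encode()).digest()
    return sorted(nodes, key=score)


def partition_map(nodes: list, replicas: int = 2) -> dict:
    """partition -> [master, replica, ...], taken from the ordered list."""
    return {p: succession(p, nodes)[:replicas] for p in range(N_PARTITIONS)}


def moved_masters(before: dict, after: dict) -> int:
    """Count partitions whose master differs between two maps."""
    return sum(1 for p in before if before[p][0] != after[p][0])
```

Because every node computes the same map from the same agreed node list, no central directory is needed once the vote settles membership. Adding a node leaves the relative order of the existing nodes untouched, so a partition's master changes only when the new node ranks first for it.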


Adding a new node

1.  Cluster discovers new node via gossip protocol

2.  Paxos vote determines new data organization

3.  Partition migrations scheduled

4.  When a partition migration starts, write journal starts on destination

5.  Partition moves atomically

6.  Journal is applied and source data deleted
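Steps 4 to 6 can be sketched as a small Python model. It is one possible reading of the slide, not the actual migration code: the source keeps serving the partition while its contents are copied, writes arriving in the meantime are also journaled for the destination, and the journal is replayed after the copy is installed. `Node` and `Migration` are hypothetical names.

```python
class Node:
    def __init__(self, name):
        self.name = name
        self.partitions = {}  # partition id -> {key: value}


class Migration:
    """Hypothetical migration of one partition from source to destination."""

    def __init__(self, pid, source, destination):
        self.pid = pid
        self.source = source
        self.destination = destination
        # Step 4: the write journal (kept for the destination) opens
        # as soon as the migration starts.
        self.journal = []
        # The copy in flight reflects the partition as of the start.
        self.snapshot = dict(source.partitions[pid])

    def write(self, key, value):
        """A write arriving mid-migration: served by the source, journaled."""
        self.source.partitions[self.pid][key] = value
        self.journal.append((key, value))

    def finish(self):
        # Step 5: the copied partition is installed in a single assignment.
        self.destination.partitions[self.pid] = self.snapshot
        # Step 6: replay the journal in order, then drop the source copy.
        for key, value in self.journal:
            self.destination.partitions[self.pid][key] = value
        del self.source.partitions[self.pid]
        self.journal = []
```

Writes never block in this model, which matches the "transactions continue" label on the slide; the cost is replaying a short journal at the end.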


transactions continue

Consistency: Writing data safely

1.  Write sent to row master

2.  Latch against simultaneous writes

3.  Apply write to master memory

4.  Apply write synchronously to replica(s) memory

5.  Queue operations to disk

6.  Signal completed transaction (optional storage commit wait)

7.  Master applies conflict resolution policy – rollback / rollforward
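Steps 1 to 6 of the write path can be sketched in Python. This is a single-process toy under stated assumptions: `Master` and `Replica` are hypothetical classes, the latch is modeled as one lock per key, replication is an in-process call rather than a network round trip, and step 7 (conflict resolution after a failure) is left out.

```python
import threading
from collections import deque


class Replica:
    def __init__(self):
        self.memory = {}

    def apply(self, key, value):
        self.memory[key] = value
        return True


class Master:
    def __init__(self, replicas):
        self.memory = {}
        self.replicas = replicas
        self.disk_queue = deque()  # would be drained by a background writer
        self._latches = {}         # key -> lock
        self._latches_guard = threading.Lock()

    def _latch(self, key):
        # Create or fetch the per-key latch under a guard lock.
        with self._latches_guard:
            return self._latches.setdefault(key, threading.Lock())

    def write(self, key, value):
        # Step 1 is the caller routing the write to this (master) node.
        with self._latch(key):                    # 2. latch against concurrent writes
            self.memory[key] = value              # 3. apply to master memory
            for replica in self.replicas:         # 4. apply synchronously to replicas
                replica.apply(key, value)
            self.disk_queue.append((key, value))  # 5. queue the operation for disk
        return "ok"                               # 6. signal completion
```

The acknowledgement is returned after the replicas hold the write in memory but before it reaches disk; the slide's "optional storage commit wait" would correspond to waiting for the disk queue entry to be flushed before returning.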


master replica

Per Node Optimization

➤  Right Architecture
    §  Shared nothing
    §  In-memory (or multiple SSDs)
    §  Tight code loop
    §  Lock free isolation

➤  OS, Programming Language, Libraries
    §  Modern Linux kernel
    §  C language
    §  Use epoll

➤  Tweaks
    §  Pin threads to processor cores
    §  IRQ affinity settings for NIC
    §  CPU Socket Isolation via pairing of CPU to NIC


Russ’s 10 Ingredient Recipe for Making 1 Million TPS on $5K Hardware

Fast, Scales, Never Fails

➤  Cluster-aware Client Layer (linear scale, avoids hot spots)
    §  Tracks nodes, ensures 1 hop transactions by routing transactions directly to the node with the data
    §  Accelerates transactions with TCP/IP connection pooling
    §  No need to restart clients when nodes go up or down

➤  Self-managing Distribution Layer (100% uptime, immediate consistency, real-time prioritization)
    §  Reliably stores Terabytes of data with immediate consistency, automatic fail-over and replication
    §  No cluster master, no SPOF, no sharding
    §  Paxos-like voting algorithm dynamically detects when nodes go up/down
    §  Automatic partitioning (hash) algorithm assigns R/W masters and replicas
    §  Intelligent re-balancing and data migration
    §  Cross data center synchronization with complex ring/star topologies

➤  SSD-optimized Storage Layer (low latency, linear scale, low TCO)
    §  Memory efficient Index in DRAM
    §  100 Million keys of any size require only 6.4 GB
    §  Native, multi-threaded, multi-core SSD I/O
    §  Log structured file system
    §  Built-in smart evictor and defragmenter
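The 6.4 GB figure follows from the fixed 64-byte index entry mentioned earlier in the deck: 100 million keys times 64 bytes. A short Python check, assuming decimal gigabytes (10^9 bytes) and that 64 bytes is the whole per-key cost; `index_ram_bytes` and `to_gb` are hypothetical helper names.

```python
INDEX_ENTRY_BYTES = 64  # fixed per-key index entry size stated in the deck


def index_ram_bytes(n_keys: int, entry_bytes: int = INDEX_ENTRY_BYTES) -> int:
    """RAM needed for the index alone; independent of key or record size."""
    return n_keys * entry_bytes


def to_gb(n_bytes: int) -> float:
    """Convert bytes to decimal gigabytes (10**9 bytes)."""
    return n_bytes / 10**9


ram_gb = to_gb(index_ram_bytes(100_000_000))  # 100 million keys
```

Because the key is hashed to a fixed-length digest before it is indexed, the estimate does not change with key length, which is why the slide can say "keys of any size".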


Want: 1) Faster & better decisions on Hot Data 2) Unified Operations & Analytics

STRUCTURED DATA
    TRANSACTIONS (OLTP): Response time: Seconds; Gigabytes of data; Balanced Reads/Writes
    ANALYTICS (OLAP): Response time: Seconds; Terabytes of data; Read Intensive

UNSTRUCTURED DATA
    BIG DATA ANALYTICS: Response time: Hours, Weeks; TB to PB; Read Intensive
    REAL-TIME BIG DATA: Real-time Transactions; Response time: < 10 ms; 1-20 TB; Balanced Reads/Writes; 24x7x365 Availability

Interactics: Focus on Velocity and $$$ 1) Faster & better decisions on hot data 2) Unified Operations & Analytics


Storage tiers: Fast – Flash; Expensive – DRAM; Slow – HDD

Products shown: Mongo; Couch, Riak; VoltDB, Hana; Cassandra; Hadoop/HBase

Transactions: Reads, Variety, Flex Data
Interactics: Reads & Writes, Velocity, Hot Data
Analytics: Writes, Volume, Historical Data

Chart labels: Velocity; Volume; # Apps - Variety; Mission Critical - $$$; Research; Flexibility

Customers moving to Aerospike from…

Mongo - adMarketplace, Sitescout
    §  Too hard to scale, tune, make reliable
    §  Poor SSD support; not multicore

Couch - Brilig, Chango, x+1, adMeta
    §  Low performance, repartition unacceptable
    §  SSD support lacking

Cassandra - Acxiom, BlueKai, EQAds
    §  Low performance, fragility in production
    §  Java is not realtime; glitches and uncertainty
    §  No support from DataStax on core function


Faster: Independent Benchmarks


•  YCSB++
•  Preliminary

Cheaper: 17x lower TCO

“…data-in-DRAM implementations such as HANA from SAP ..should be bypassed… the current leading data-in-flash database for transactional analytic applications is Aerospike.” - David Floyer, Founder & CTO, Wikibon


$$$

10x Better TCO*

➤  SSD-optimized Architecture requires fewer servers

                                        Aerospike             Other
Storage type                            SSD                   DRAM
Storage per server                      1.2 TB (4 x 300 GB)   80 GB (on 96 GB server)
Cost per server                         $7,000 USD            $15,000 USD
# Servers for 1.5 TB (2x Replication)   3                     16
Total costs (USD)                       $21,000               $240,000
+ No Manual Operations                  0                     + $200,000 at least per year
    No need to re-configure, restart servers when adding or taking down nodes
+ No DIY development                    0                     + $200,000 at least per year
    No caching, sharding, replication code to write; Developers write business logic, not middleware

*Actual results calculated by customer
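The hardware totals in the table are simply server count times cost per server. The Python sketch below reproduces only that step; the server counts (3 and 16) are taken from the table as given rather than derived, and `hardware_cost` is a hypothetical helper name.

```python
def hardware_cost(servers: int, cost_per_server_usd: int) -> int:
    """Total server purchase cost in USD."""
    return servers * cost_per_server_usd


aerospike_total = hardware_cost(3, 7_000)   # SSD-based cluster from the table
other_total = hardware_cost(16, 15_000)     # DRAM-based cluster from the table
ratio = other_total / aerospike_total       # hardware-only cost ratio
```

On hardware alone the ratio comes out a little above 11; the two recurring "+ $200,000 at least per year" rows for operations and development are not included in this calculation.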


Comparing NoSQL Databases

                      Aerospike     MemBase       MongoDB         Cassandra
APIs                  Simple (KVS)  Simple (KVS)  Rich (JSON)     Medium (Column)
Read & Write          ✔             ✔             Read optimized  Write optimized
Latency               < 1 ms        < 2 ms        5 ms ~ 20 ms    10 ms ~ 30 ms
TPS / node            250K          30K           50K             50K
Optimized for SSD     ✔             ✗             ✗               ✗
Automatic Clustering  ✔             ✗             Complicated     Inconsistent
MemCache support      ✔


No other DB exists for the Internet of Things


                                                        RDBMS   NoSQL         Aerospike
Variety: Flex Schema                                    ✗       ✓             ✓
Volume: Web Scale                                       ✗       ✓             ✓
Velocity: Predictable Performance with Zero Downtime    ✗       ✗             ✓
Transactions – Single Row ACID                          ✓       ✗             ✓
Velocity: Faster and Better Decisions with QMR          ✗       ✗             ✓
Event Driven Architectures: Pub Sub, Streams            ✓       ✗             Roadmap
Transactions - Multi Row ACID                           ✓       ✗             Roadmap
    Serializable updates, Read Committed
Security                                                ✓       ✗             Roadmap
Multi-Tenancy                                           ✗       (only Riak)   Roadmap

The only option for Real-time Big Data


Fueling

Dave Pickles, Founder & CTO, The Trade Desk

“Aerospike handled all challenges smoothly! Large datasets.. millisecond response times.. an ever increasing load, node outages caused by unauthorized upgrades by a managed data center provider, changes to the underlying data structure… This is real software, purpose built, lean and mean.”

First true Demand Side Platform (DSP)


Fueling

Dag Liodden, Co-founder & CTO, Tapad

“We looked at a lot of open source products and eventually we went with a commercial product because it has much better predictability and low latency… Aerospike took a lot of the jitter in our performance just right out of the equation… So, very, very simple yet very capable NoSQL solution that performs insanely well…the throughput is awesome.”

First digital advertising solution for real-time mobile audience buying and cross-device targeting


Fueling

Andrei Duncan, CTO, Liverail

“We liked the performance. Everything worked as advertised... With Aerospike, we’ve been adding new services, like auditing and reporting, that are enabling us to land deals we wouldn’t have otherwise. That’s the most important metric of all.”

➤  Predictable (99%) response times under 5 ms
➤  3 Billion impressions per month
➤  25% of all video advertising
➤  2 data centers

Video advertising platform with ad serving and real-time bidding


Dag Liodden, Co-founder & CTO, Tapad

“A very, very simple yet very capable NoSQL solution that performs insanely well.”


Elad Efraim, CTO, eXelate

“Scale, real-time performance, real-time replication across 4 datacenters. Aerospike delivered.”


The only option for Real-time Big Data
