The Big Data Revolution is an Evolution

Eric Lubow @elubow

Upload: planet-cassandra

Post on 05-Dec-2014

Category:

Technology


DESCRIPTION

Dealing with data requires more than a data store; it requires an infrastructure. At SimpleReach, we have 5 data storage layers to service all of our data needs. These range from high-volume, high-velocity data ingestion with real-time analytics to ad-hoc historical analysis with search capabilities. To communicate effectively between applications, the data stores sit behind a service architecture that provides consistent data access patterns and failover/redundancy. This talk is the story of how we came to this architecture and some of the lessons we learned along the way.

TRANSCRIPT

Page 1: The Big Data Revolution is an Evolution

Eric Lubow

@elubow

Page 2: The Big Data Revolution is an Evolution

Big Data Revolution is an Evolution

Eric Lubow @elubow #NYCassandra2013

Overview

• Evolution

• SimpleReach

• Data Stores / Languages

• Architecture Implementation

Page 3: The Big Data Revolution is an Evolution

We're in the midst of an evolution, not a revolution.

Page 4: The Big Data Revolution is an Evolution

The 2 Truths

Page 5: The Big Data Revolution is an Evolution

The Real Truth

Even with the right tools, 80% of the work of building a big data system is acquiring and refining

Page 6: The Big Data Revolution is an Evolution

30m plays/day + 4m user ratings + 75k movies metadata + 24.4m users metadata =

David Fincher + Kevin Spacey + British House of Cards

Mitch Hurwitz + Will Arnett + Jason Bateman + Arrested Development

Page 7: The Big Data Revolution is an Evolution

BRING IT TOGETHER

Page 8: The Big Data Revolution is an Evolution

evolution

revolution

Insufficient Capabilities

Scale/Need Changes

Development & Integration

New Products

Page 9: The Big Data Revolution is an Evolution

Page 10: The Big Data Revolution is an Evolution

Page 11: The Big Data Revolution is an Evolution

SimpleReach

• Millions of URLs per day

• Over 1 billion pageviews per month

• 250m events per day (~3k events/second)

• Auto-scale 90-130 machines depending on traffic
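
As a quick sanity check on the throughput figure on this slide, the per-second rate follows directly from the daily total. This is only a sketch: it assumes events are spread evenly across the day (real traffic is not), and the constant and function names are illustrative.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400

def events_per_second(events_per_day: int) -> float:
    """Average rate, assuming events are spread evenly over 24 hours."""
    return events_per_day / SECONDS_PER_DAY

rate = events_per_second(250_000_000)
print(f"{rate:,.0f} events/second on average")
```

The average lands a little under 2.9k events/second, consistent with the "~3k" on the slide; peak rates would be higher.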

Page 12: The Big Data Revolution is an Evolution

HUMBLE BEGINNINGS

Page 13: The Big Data Revolution is an Evolution

Scale

Page 14: The Big Data Revolution is an Evolution

AND THEN...

C*

Page 15: The Big Data Revolution is an Evolution

Cassandra C*

• Large data volume ingestion at high velocity

• Really fast writes to many locations (eventual consistency)

• Query by column groups within rows (slicing)

• TTLs for small group aggregation

• Wrote Helenus, Node.js driver for Cassandra
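
To make "slicing" and TTLs concrete, here is a toy in-memory model of a single wide row. It is not the Cassandra API or the Helenus driver: the `WideRow` class, its methods, and the injectable clock are all hypothetical names made up for illustration. Columns are kept sorted by name so a contiguous range can be read in order, and a cell written with a TTL stops appearing once it expires.

```python
import bisect
import time

class WideRow:
    """Toy model of one wide row: sorted column names, optional per-cell expiry."""

    def __init__(self, clock=time.time):
        self._names = []   # column names, kept sorted
        self._cells = {}   # name -> (value, expires_at or None)
        self._clock = clock

    def insert(self, name, value, ttl=None):
        """Write a column; ttl is in seconds, None means the cell never expires."""
        if name not in self._cells:
            bisect.insort(self._names, name)
        expires_at = self._clock() + ttl if ttl is not None else None
        self._cells[name] = (value, expires_at)

    def slice(self, start, finish):
        """Return live (name, value) pairs with start <= name <= finish, in order."""
        lo = bisect.bisect_left(self._names, start)
        hi = bisect.bisect_right(self._names, finish)
        now = self._clock()
        live = []
        for name in self._names[lo:hi]:
            value, expires_at = self._cells[name]
            if expires_at is None or expires_at > now:
                live.append((name, value))
        return live

# Usage with a fake clock so the expiry is deterministic.
now = [1000.0]
row = WideRow(clock=lambda: now[0])
row.insert("2013-03-01T10:00", 12, ttl=60)  # per-minute bucket, short-lived
row.insert("2013-03-01T10:01", 7, ttl=60)
row.insert("2013-03-01T10:05", 3)           # no TTL
print(row.slice("2013-03-01T10:00", "2013-03-01T10:01"))
now[0] += 120                               # two minutes later the TTL cells are gone
print(row.slice("2013-03-01T10:00", "2013-03-01T10:09"))
```

Using time-bucketed column names, as in the usage lines, is one way short-lived aggregates could be laid out; the slides do not say which layout SimpleReach used.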

Page 16: The Big Data Revolution is an Evolution

MongoDB

• Fast atomic increments (Node.js is native JSON)

• Sharding

• Solid ORM for Rails (Mongoid)

• B-Tree Indexes

• Document based via JSON

• TTLs for ephemeral data
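
The value of atomic increments is easiest to see with concurrent writers. The sketch below does not talk to MongoDB (whose real counterpart is the `$inc` update operator); it is an in-process stand-in in which a lock makes each read-modify-write step indivisible. `CounterStore`, the document ID, and the field name are hypothetical.

```python
import threading
from collections import defaultdict

class CounterStore:
    """In-memory stand-in for a document store with atomic field increments."""

    def __init__(self):
        self._lock = threading.Lock()
        self._docs = defaultdict(lambda: defaultdict(int))

    def inc(self, doc_id, field, amount=1):
        """Atomically add amount to a field and return the new value."""
        with self._lock:
            self._docs[doc_id][field] += amount
            return self._docs[doc_id][field]

    def get(self, doc_id, field):
        with self._lock:
            return self._docs[doc_id][field]

store = CounterStore()

def record_pageviews(n):
    for _ in range(n):
        store.inc("article-123", "pageviews")

# Eight writers hammer the same counter; no increment is lost.
threads = [threading.Thread(target=record_pageviews, args=(1000,)) for _ in range(8)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(store.get("article-123", "pageviews"))  # 8000
```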

Page 17: The Big Data Revolution is an Evolution

Redis

• Supports hundreds of thousands of transactions per second

• Great caching engine

• Supports useful variable types like sets, sorted sets, lists

• Everything is guaranteed to be Memory Mapped
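
Sorted sets are the type that makes "top N" questions cheap. The sketch below is a pure-Python approximation of the idea, not the Redis client API (the real commands are `ZINCRBY` and `ZREVRANGE`); the `Leaderboard` class and the URLs are made up, and ties are broken alphabetically here as a simplification.

```python
class Leaderboard:
    """Toy sorted set: members ranked by a numeric score."""

    def __init__(self):
        self._scores = {}

    def incr(self, member, amount=1.0):
        """Add amount to a member's score, creating the member at 0 if needed."""
        self._scores[member] = self._scores.get(member, 0.0) + amount
        return self._scores[member]

    def top(self, n):
        """Return the n highest-scoring (member, score) pairs, best first."""
        ranked = sorted(self._scores.items(), key=lambda kv: (-kv[1], kv[0]))
        return ranked[:n]

board = Leaderboard()
for url, shares in [("/a", 3), ("/b", 5), ("/a", 4), ("/c", 1)]:
    board.incr(url, shares)
print(board.top(2))
```

This version re-sorts on every read, which is fine for a sketch; a real sorted set keeps members ordered as scores change.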

Page 18: The Big Data Revolution is an Evolution

Infobright

• Works with standard MySQL driver

• Column Stores for ad-hoc analytics queries in SQL

• Heavy compression of data (avg 12:1)
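
One intuition for why column stores compress so well: values within a single column tend to repeat, so even a very simple scheme shrinks them. The sketch below uses run-length encoding purely as an illustration. It is not Infobright's actual compression algorithm, and the sample column is invented.

```python
from itertools import groupby

def rle_encode(column):
    """Collapse consecutive repeats into (value, run_length) pairs."""
    return [(value, sum(1 for _ in run)) for value, run in groupby(column)]

def rle_decode(pairs):
    """Expand (value, run_length) pairs back into the original column."""
    return [value for value, count in pairs for _ in range(count)]

# A low-cardinality column, stored contiguously as a column store would.
country = ["US"] * 6 + ["UK"] * 4 + ["US"] * 2
encoded = rle_encode(country)
print(encoded)
print(f"{len(country)} values stored as {len(encoded)} runs")
```

The same twelve values interleaved with other fields in a row-oriented layout would not form runs, which is part of why columnar layouts tend to compress better.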

Page 19: The Big Data Revolution is an Evolution

The c0dez

• Polyglottany doesn’t only apply to data stores

• Each language has its own benefit to each stack layer

• Each language has its own individual benefits

• Each language has its own development benefits

Page 20: The Big Data Revolution is an Evolution

Page 21: The Big Data Revolution is an Evolution

Cons

• Redis - Can only utilize a single core. SerDe price.

• Infobright - DELETE/UPDATEs are VERY expensive

• Cassandra - No btree indexes or probabilistic counters

• Mongo - Indexes must fit in memory. Forced Replica ping times

• Python - Whitespace. Community

• Ruby - Not high performance enough for our standards

Page 22: The Big Data Revolution is an Evolution

Evolution Takes Work

• Service Oriented Architecture (Internal API)

• Data accuracy checks: visual and programmatic

• Built framework for testing out engines (Storage, Queueing, etc)

• Access to many toolsets (for all languages, DBs, Engines)
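
The idea of putting data stores behind an internal API for consistent access and failover can be sketched in a few lines. This is not SimpleReach's implementation: `InternalAPI`, `InMemoryStore`, and `StoreUnavailable` are hypothetical names, and the stores are plain dictionaries standing in for real engines. Callers see one `get`/`put` interface, and reads fall through to the next replica when one is down.

```python
class StoreUnavailable(Exception):
    """Raised when a backing store cannot serve a request."""

class InMemoryStore:
    """Stand-in for a real storage engine, with a switch to simulate an outage."""

    def __init__(self, name):
        self.name = name
        self.healthy = True
        self._data = {}

    def put(self, key, value):
        if not self.healthy:
            raise StoreUnavailable(self.name)
        self._data[key] = value

    def get(self, key):
        if not self.healthy:
            raise StoreUnavailable(self.name)
        return self._data.get(key)

class InternalAPI:
    """Single access point in front of several replicas of one logical store."""

    def __init__(self, replicas):
        self._replicas = list(replicas)

    def put(self, key, value):
        """Write to every reachable replica; fail only if none accepted the write."""
        written = 0
        for store in self._replicas:
            try:
                store.put(key, value)
                written += 1
            except StoreUnavailable:
                continue
        if written == 0:
            raise StoreUnavailable("no replica accepted the write")
        return written

    def get(self, key):
        """Read from the first replica that answers."""
        for store in self._replicas:
            try:
                return store.get(key)
            except StoreUnavailable:
                continue
        raise StoreUnavailable("no replica could serve the read")

primary, secondary = InMemoryStore("primary"), InMemoryStore("secondary")
api = InternalAPI([primary, secondary])
api.put("url:42", {"pageviews": 10})
primary.healthy = False      # simulate losing a node
print(api.get("url:42"))     # still served, from the secondary
```

Because every engine sits behind the same small interface, swapping in a different backend for evaluation only requires another class with `get` and `put`.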

Page 23: The Big Data Revolution is an Evolution

Service

[Diagram labels: Internal API, Solr, Real-time C*, C*]

Page 24: The Big Data Revolution is an Evolution

Path of a Packet

[Diagram labels: Internet, EP, Internal API, Solr, C*, Mongo, Redis, IB, API, Fire Hose, SC, Consumers, Queue]
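
The diagram's labels include a queue and consumers, but its arrows did not survive in the transcript, so the flow below is an assumption: events arrive, are placed on a queue, and a pool of consumers writes aggregates to a store. The function name, the event shape, and the dictionary standing in for a data store are all hypothetical.

```python
import queue
import threading

def run_pipeline(events, store, workers=2):
    """Push events through a queue to consumer threads that count hits per URL."""
    q = queue.Queue()
    lock = threading.Lock()

    def consume():
        while True:
            event = q.get()
            if event is None:          # sentinel: no more work for this consumer
                q.task_done()
                break
            with lock:                 # the dict stands in for a data store
                store[event["url"]] = store.get(event["url"], 0) + 1
            q.task_done()

    threads = [threading.Thread(target=consume) for _ in range(workers)]
    for t in threads:
        t.start()
    for event in events:
        q.put(event)
    for _ in threads:
        q.put(None)                    # one sentinel per consumer
    q.join()
    for t in threads:
        t.join()
    return store

events = [{"url": "/a"}, {"url": "/b"}, {"url": "/a"}, {"url": "/a"}]
print(run_pipeline(events, {}))
```

The queue decouples the ingest rate from the write rate, so a slow store delays processing rather than dropping incoming events.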

Page 25: The Big Data Revolution is an Evolution

Architecture Distribution

US-EAST-1a

MONGO-SHARD-0001-B

MONGO-SHARD-0000-A

CASSANDRA-0001

CASSANDRA-0010

REDIS-0001A

INFOBRIGHT-0001

iAPI-0001

US-EAST-1b

MONGO-SHARD-0002-B

MONGO-SHARD-0001-A

CASSANDRA-0002

CASSANDRA-0011

REDIS-0001B

iAPI-0002

US-EAST-1e

MONGO-SHARD-0002-A

MONGO-SHARD-0000-B

CASSANDRA-0003

CASSANDRA-0012

INFOBRIGHT-0002

iAPI-0003
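
The host list on this slide can be checked mechanically. The sketch below copies the layout from the slide and verifies that each Mongo shard has members in more than one availability zone. Reading the trailing `-A`/`-B` as two members of the same shard is an assumption based on the naming, and the function name is illustrative.

```python
from collections import defaultdict

# Host placement as listed on the slide.
LAYOUT = {
    "US-EAST-1a": ["MONGO-SHARD-0001-B", "MONGO-SHARD-0000-A", "CASSANDRA-0001",
                   "CASSANDRA-0010", "REDIS-0001A", "INFOBRIGHT-0001", "iAPI-0001"],
    "US-EAST-1b": ["MONGO-SHARD-0002-B", "MONGO-SHARD-0001-A", "CASSANDRA-0002",
                   "CASSANDRA-0011", "REDIS-0001B", "iAPI-0002"],
    "US-EAST-1e": ["MONGO-SHARD-0002-A", "MONGO-SHARD-0000-B", "CASSANDRA-0003",
                   "CASSANDRA-0012", "INFOBRIGHT-0002", "iAPI-0003"],
}

def zones_per_mongo_shard(layout):
    """Map each Mongo shard name to the set of zones that host one of its members."""
    zones = defaultdict(set)
    for zone, hosts in layout.items():
        for host in hosts:
            if host.startswith("MONGO-SHARD-"):
                shard = host.rsplit("-", 1)[0]   # drop the trailing -A / -B
                zones[shard].add(zone)
    return dict(zones)

for shard, zones in sorted(zones_per_mongo_shard(LAYOUT).items()):
    print(shard, sorted(zones))
```

Every shard turns out to span two zones, so losing a single zone leaves one member of each shard reachable.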

Page 26: The Big Data Revolution is an Evolution

The Schrute of the Problem

Page 27: The Big Data Revolution is an Evolution

Evolving Amazon Tools

• Full Featured API

• Simple Queue Service

• Data Pipelining

• OpsWorks

• CloudFormation

• Redshift Analytics

• CloudSearch

• Elastic Beanstalk

• Elastic MapReduce

• Simple Workflow Service

• S3 / Glacier

Page 28: The Big Data Revolution is an Evolution

DevOps Wizardry

• Extensive use of AWS

• Monitor: Nagios, StatsD, and Graphite

• Manage: Chef, OpsWorks, csshX

• Deployments
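
For the monitoring bullet, emitting a metric to StatsD is deliberately lightweight: a counter is a short text line sent over UDP. The sketch below formats such a line and shows how it could be sent. The metric name, function names, and the local host address are hypothetical; port 8125 is the StatsD default. Nothing here reflects how SimpleReach actually instrumented its services.

```python
import socket

def statsd_counter(metric: str, value: int = 1) -> bytes:
    """Format a StatsD counter line: <metric>:<value>|c"""
    return f"{metric}:{value}|c".encode("ascii")

def send_metric(payload: bytes, host: str = "127.0.0.1", port: int = 8125) -> None:
    """Fire-and-forget UDP send; the sender gets no acknowledgement."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(payload, (host, port))

print(statsd_counter("events.ingested", 3))
```

Because the send is UDP and unacknowledged, instrumentation adds very little latency to the code path being measured, at the cost of occasionally losing a data point.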

Page 29: The Big Data Revolution is an Evolution

Summary

• Solutions Require Evolution

• Build, Use, and Integrate Tools

• Abstraction

• Distribution

• Monitoring & Automation

Page 30: The Big Data Revolution is an Evolution

Evolution Takes Time

A revolution only lasts fifteen years, a period which coincides with the

Page 31: The Big Data Revolution is an Evolution

We’re (Ask us about Food Coma Fridays)

Page 32: The Big Data Revolution is an Evolution

Questions are guaranteed in life. Answers aren’t.

Eric Lubow

@elubow

Thank you.