DataStax & 451 Group Webinar - Real NoSQL Applications in the Enterprise Today
Apache Cassandra
Jonathan Ellis, CTO, DataStax
Matt Aslett, 451 Group
Dec 7, 2011
Real NoSQL Applications in the Enterprise Today.
Welcome and Housekeeping
We will email the presentation after the webinar
Please ask questions using the Q&A panel. I will ask the panelists at the end of the presentation.
You can contact me at [email protected]
Our presenters
Matt Aslett - Senior Analyst, 451 Group
Matthew covers data management software for The 451 Group's Information Management practice, including relational and non-relational databases, data warehousing and data caching. Matthew is also an expert in open source software and contributes regularly to reports produced through the 451 Commercial Adoption of Open Source (CAOS) Research Service, as well as to the 451 CAOS Theory blog.
Jonathan Ellis – CTO, DataStax
Jonathan is CTO and co-founder at DataStax. Prior to DataStax, Jonathan worked extensively with Apache Cassandra while employed at Rackspace. Prior to Rackspace, Jonathan built a multi-petabyte, scalable storage system based on Reed-Solomon encoding for backup provider Mozy. In addition to his work with DataStax, Jonathan is project chair of Apache Cassandra.
© 2011 by The 451 Group. All rights reserved
451 Research is focused on the business of enterprise IT innovation. The company’s analysts provide critical and timely insight into the competitive dynamics of innovation in emerging technology segments.
The 451 Group
Tier1 Research is a single-source research and advisory firm covering the multi-tenant datacenter, hosting, IT and cloud-computing sectors, blending the best of industry and financial research.
The Uptime Institute is ‘The Global Data Center Authority’ and a pioneer in the creation and facilitation of end-user knowledge communities to improve reliability and uninterruptible availability in datacenter facilities.
TheInfoPro is a leading IT advisory and research firm that provides real-world perspectives on the customer and market dynamics of the enterprise information technology landscape, harnessing the collective knowledge and insight of leading IT organizations worldwide.
ChangeWave Research is a research firm that identifies and quantifies ‘change’ in consumer spending behavior, corporate purchasing, and industry, company and technology trends.
451 Research
Matthew Aslett • Senior analyst, enterprise software • With The 451 Group since 2007 • www.twitter.com/maslett
Relevant reports
NoSQL, NewSQL and Beyond
Assessing the drivers behind the development and adoption of NoSQL and NewSQL databases, as well as data grid/caching technologies
Released April 2011
Role of open source in driving innovation
[email protected]
NoSQL, NewSQL and Beyond
NoSQL
• New breed of non-relational database products
• Rejection of fixed table schema and join operations
• Designed to meet scalability requirements of distributed architectures
• And/or schema-less data management requirements
NewSQL
• New breed of relational database products
• Retain SQL and ACID
• Designed to meet scalability requirements of distributed architectures
• Or improve performance so horizontal scalability is no longer a necessity
… and Beyond
• In-memory data grid/cache products
• Potential primary platform for distributed data management
NoSQL, NewSQL and Beyond
NoSQL
• Big tables - data mapped by row key, column key and time stamp
• Key-value stores - store keys and associated values
• Document store - stores all data as a single document
• Graph databases - use nodes, properties and edges to store data and the relationships between entities
NewSQL
• MySQL storage engines - scale-up and scale-out
• Transparent sharding - reduce the manual effort required to scale
• Appliances - take advantage of improved hardware performance, solid state drives
• New databases - designed specifically for scale-out
Data grid/cache
• Spectrum of data management capabilities, from non-persistent data caching to persistent caching, replication, and distributed data and compute grid
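The four NoSQL data models named on this slide can be made concrete with a small sketch. It uses plain Python containers as stand-ins for each store type; all keys, values and helper names (`latest`, `neighbours`) are hypothetical sample data, not the API of any product mentioned in the deck.

```python
# Illustrative only: plain Python containers standing in for each store type.
# All sample data and helper names are hypothetical.

# Big table: a value is addressed by (row key, column key, timestamp).
big_table = {
    ("user:42", "profile:email", 1323216000): "old@example.com",
    ("user:42", "profile:email", 1323302400): "new@example.com",
}


def latest(table, row_key, column_key):
    """Return the value with the highest timestamp for a row/column pair."""
    versions = [
        (ts, value)
        for (row, column, ts), value in table.items()
        if row == row_key and column == column_key
    ]
    if not versions:
        return None
    return max(versions, key=lambda pair: pair[0])[1]


# Key-value store: an opaque value looked up by its key.
key_value = {"session:abc123": b"opaque-bytes"}

# Document store: all data for an entity held as one nested document.
documents = {
    "user:42": {
        "name": "Ada",
        "emails": ["new@example.com"],
        "address": {"city": "Austin"},
    }
}

# Graph database: nodes with properties, plus typed edges between nodes.
nodes = {"ada": {"kind": "person"}, "bob": {"kind": "person"}}
edges = [("ada", "FOLLOWS", "bob")]


def neighbours(edge_list, node, relation):
    """Return the nodes reachable from `node` over edges of type `relation`."""
    return [dst for src, rel, dst in edge_list if src == node and rel == relation]
```

The point of the comparison is the access path: the big table needs three coordinates to find a value, the key-value store one, the document store returns a whole nested record, and the graph is queried by following edges.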
Photo credit: Foxtongue on Flickr http://www.flickr.com/photos/foxtongue/4844016087/
SPRAIN
Scalability - Hardware economics
Example project/service/vendor:
• BigTable, HBase, Riak, MongoDB, Couchbase, Hadoop, Cassandra
• Amazon RDS, Xeround, SQL Azure, NuoDB
• Data grid/cache
Associated use case:
• Large-scale distributed data storage
• Analysis of continuously updated data
• Multi-tenant PaaS data layer
SPRAIN
Performance - MySQL limitations
Example project/service/vendor:
• Hypertable, Couchbase, Riak, Membrain, MongoDB, Redis
• Data grid/cache
• VoltDB, Clustrix
Associated use case:
• Real-time data processing of mixed read/write workloads
• Data caching
• Large-scale data ingestion
SPRAIN
Relaxed consistency - CAP Theorem
Example project/service/vendor:
• Dynamo, Voldemort, Cassandra, Riak
• Amazon SimpleDB
Associated use case:
• Multi-data center replication
• Service availability
• Non-transactional data off-load
SPRAIN
Agility - polyglot persistence, schema-less
Example project/service/vendor:
• MongoDB, CouchDB, Cassandra, Riak
• Google App Engine, SimpleDB, SQL Azure
Associated use case:
• Mobile/remote device synchronization
• Agile development
• Data caching
SPRAIN
Intricacy - big data, total data
Example project/service/vendor:
• Neo4j, GraphDB, InfiniteGraph
• Apache Cassandra, Hadoop, Riak
• VoltDB, Clustrix
Associated use case:
• Social networking applications
• Geo-locational applications
• Configuration management database
SPRAIN
Necessity - open source
The failure of existing suppliers to address emerging requirements
Example projects:
• BigTable: Google
• Dynamo: Amazon
• Cassandra: Facebook
• HBase: Powerset
• Voldemort: LinkedIn
• Hypertable: Zvents
• Neo4j: Windh Technologies
Use cases – database types
Use cases – new applications
Web applications • social games • SaaS • e-commerce systems • clickstream analysis • ad and offer targeting
Use cases – new requirements
Web applications • social games • SaaS • e-commerce systems • clickstream analysis • ad and offer targeting
Requirements
Data analysis • read heavy • batch processing • analytics-optimized • data locality model
Use cases – new solutions
Data analysis • read heavy • batch processing • analytics-optimized • data locality model
Requirements
Data analysis • batch processing • aggregation of mixed data sources • structured and un/semi-structured data • transform and load
Use cases
Data analysis • batch processing • aggregation of mixed data sources • structured and un/semi-structured data • transform and load
Target markets
Web applications • social games • SaaS • e-commerce systems • clickstream analysis • ad and offer targeting
APACHE CASSANDRA JONATHAN ELLIS
Real NoSQL Applications in the Enterprise Today.
Today’s Database Challenge
Navigating the NoSQL waters
Distributed • Horizontally scalable • Eventually consistent • Non-relational
Column store • Document stores • Key-value • Graph • … and more
Cassandra: the best for “big data”
Elegant architecture • Operational flexibility • Industry-leading performance
You should be using Cassandra for applications requiring:
• high-performance, realtime queries
• scalability past one machine
• bulletproof reliability
Bigtable, 2006 • Dynamo, 2007 • OSS, 2008 • Incubator, 2009 • TLP, 2010 • 1.0, October 2011
Cassandra Highlights
• Multi-master, multi-DC
• Linearly scalable
• Larger-than-memory datasets
• High performance
• Full durability
• Integrated caching
• Tuneable consistency
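The “tuneable consistency” highlight can be illustrated with the usual replica-overlap arithmetic. The sketch below assumes a replication factor N and per-request read and write replica counts R and W. The function names are hypothetical and are not part of any Cassandra driver. A read is guaranteed to overlap the latest acknowledged write when R + W > N, and a quorum is a majority of the N replicas.

```python
def quorum(replication_factor: int) -> int:
    """Smallest majority of replicas: floor(N / 2) + 1."""
    if replication_factor < 1:
        raise ValueError("replication factor must be at least 1")
    return replication_factor // 2 + 1


def reads_see_latest_write(n: int, r: int, w: int) -> bool:
    """True when any R replicas read must include one of the W replicas written.

    n: replication factor (copies of each row)
    r: replicas that must answer a read
    w: replicas that must acknowledge a write
    """
    if not (1 <= r <= n and 1 <= w <= n):
        raise ValueError("r and w must be between 1 and n")
    # If r + w > n, the read set and the write set cannot be disjoint.
    return r + w > n
```

With three replicas, quorum reads and quorum writes (2 + 2 > 3) overlap, while reading and writing a single replica (1 + 1 <= 3) trades that guarantee for lower latency and higher availability. Each application can choose where on that range it sits, per request.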
A single four-core machine; one million inserts + one million updates
Performance
The Cassandra Difference
Scalable Performance
Oracle Exadata ✖ ✔ ✔
MySQL ✖ ✔ ✔
Sharding ✔ ✔ ✖
MongoDB ✔
Operational Ease
Cost Effective
Cassandra ✔ ✔ ✔
HBase ✔ ✖ ✔
And when it comes to Performance, we’re unmatched.
*
*
✖ ✔
Why Businesses Choose Cassandra
Vertical • Big-Data Scale • Never Down • Very Fast • Easy to Operate • Non-Structured Data • Flexible Schema • Multi-DC / Cloud • Cost Effective
Media / Advertising ✔ ✔ ✔ ✔ ✔ ✔ ✔
Telecomm ✔ ✔ ✔ ✔ ✔ ✔ ✔
Financial ✔ ✔ ✔ ✔ ✔ ✔
Social ✔ ✔ ✔ ✔ ✔ ✔ ✔ ✔
IT (DaaS) ✔ ✔ ✔ ✔ ✔ ✔ ✔ ✔
Healthcare ✔ ✔ ✔ ✔ ✔
Online Retail ✔ ✔ ✔ ✔ ✔ ✔
The most popular types of applications that use Cassandra are those that…
• Are web/SaaS-based, and/or
• Collect high volumes of “Data Exhaust” from machine-generated sources
“With Cassandra, we get better business agility, and we don’t have to plan capacity in advance, we don’t need to ask permission of other people to build things for us, and we don’t worry about running out of space or power.”
Adrian Cockcroft, Cloud Architect
Netflix’s problems
• Could not build datacenters fast enough
• Made decision to go to cloud (AWS)
• Cassandra on AWS is a key infrastructure component of its globally distributed streaming product.
Applications include Netflix’s subscriber system, A/B testing, and viewing history service (including positions at which members stopped watching a streaming program).
Netflix on Cassandra
• Fast
• Cheap
• Scalable
• Flexible
• No SPOF
“Without Cassandra, our engineers would’ve had to create something that could scale to our needs, that would’ve prevented us from focusing on building product and solving problems for Backupify’s users, which are far more important tasks.”
Matt Conway, VP Engineering
Backupify’s problem
• Cloud-based utility that enables businesses and consumers to back up, search and restore the content of popular online applications such as Google Apps, Gmail, Facebook, Twitter, and Blogger
Needs:
• Horizontal scaling
• Ability to handle high write loads
• Elasticity with no manual sharding
Backupify on Cassandra
• Ease of scale enabled engineers to focus on building great applications
• DataStax OpsCenter made it easy to monitor the health and perf of their cluster
• Reliable, redundant and scalable low-balance data storage helped eliminate down-time
• Ability to offer both backup and storage, but also analysis of data eventually
“You can seamlessly add new nodes and expand your total capacity without deteriorating the performance of the data store. Cassandra has allowed us to scale very effectively.”
Harry Robertson, Tech Lead
Ooyala’s problem
• Ooyala provides a suite of technologies and services that support content owners in managing, analyzing and monetizing the digital video they publish online
Needs:
• Elasticity, to respond to spikes in data scale
• Ability to respond to increasingly sophisticated analytic needs of customers
Ooyala on Cassandra
• Classic “Big Data” problem did not require re-architecting
• Application agility was enabled – developers spend time building cool apps, not figuring out how to scale
• Enabled more powerful and granular analytics to their customers
“Cassandra has allowed us to build bigger features faster and more reliably, while using less money and without needing to expand our staff.”
Kyle Ambroff, Sr. Engineer
Formspring’s problem
Users of Formspring engage with and learn more about each other by asking and responding to questions. With close to 4B responses in the system and 30M unique users, they needed:
• To support explosive growth
• To seamlessly syndicate user content
• To avoid sharding
• Application flexibility
Formspring on Cassandra
• No sharding needed – just add nodes to scale
• Performance – the popular users with many followers saw no speed reduction. No more memcached!
• Flexibility of a schema-optional architecture is very developer friendly
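“Just add nodes to scale” rests on consistent hashing: each node owns a range of a hash ring, so a new node takes over part of one neighbour's range and the rest of the data stays where it is. The sketch below assumes one token per node and MD5 as the hash. The `Ring` class, node names and keys are hypothetical, and this is an illustration of the idea, not Cassandra's code.

```python
import bisect
import hashlib


def token(value: str) -> int:
    """Map a string onto the ring as a 128-bit integer (MD5 digest)."""
    return int(hashlib.md5(value.encode("utf-8")).hexdigest(), 16)


class Ring:
    """Minimal consistent-hash ring with one token per node."""

    def __init__(self, nodes=()):
        self._tokens = []  # node tokens, kept sorted
        self._owners = {}  # token -> node name
        for node in nodes:
            self.add_node(node)

    def add_node(self, node: str) -> None:
        t = token(node)
        bisect.insort(self._tokens, t)
        self._owners[t] = node

    def owner(self, key: str) -> str:
        """Return the first node clockwise from the key's token."""
        i = bisect.bisect_left(self._tokens, token(key))
        if i == len(self._tokens):
            i = 0  # wrap around the ring
        return self._owners[self._tokens[i]]


def moved_keys(ring: Ring, keys, new_node: str):
    """Add `new_node` and return {key: (old_owner, new_owner)} for keys that moved."""
    before = {k: ring.owner(k) for k in keys}
    ring.add_node(new_node)
    after = {k: ring.owner(k) for k in keys}
    return {k: (before[k], after[k]) for k in keys if before[k] != after[k]}
```

Calling `moved_keys(Ring(["node-a", "node-b", "node-c"]), keys, "node-d")` shows that every key that changes owner lands on the new node, which is why the application needs no manual re-sharding when capacity is added.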
Why DataStax? DataStax delivers database products and services based on Apache Cassandra from experts who are at the forefront of today's data revolution.
Database Software & Tools: DataStax Enterprise • DataStax Community • DataStax OpsCenter • Drivers & Connectors
Support & Services: Production Support • Consultative Help • Professional Training • Online Documentation
DataStax Overview
• Founded in April 2010
• Commercial leader in Apache Cassandra™, the popular open-source “big data” database
• Headquartered in San Francisco Bay area
• 100+ customers
• 35+ employees (split between San Fran and Austin)
• Home to Apache Cassandra Chair & most committers
• Secured $11M in Series B funding in Sep 2011
DataStax Value
• The simplest way to get started with Apache Cassandra: DataStax Community Edition
• A smart, integrated platform that provides Analytics and Real-Time capabilities in the same database, without any resource contention: DataStax Enterprise
• The backing of the Cassandra Experts
DataStax Enterprise
1. DataStax Enterprise Database Server
2. OpsCenter Enterprise Management solution
3. Expert production support & consultative services
Enterprise Database Server
Leverages resources on-premise or in the cloud
Guarantees uptime with a master-less distributed architecture
Allows for fast application changes via flexible schemas
Handles structured, semi-structured, and unstructured data
Provides advanced security
Eliminates the need for separate analytics system
[Figure: diagram labeled Real-Time, Analytics, Replication]
Enterprise-class database built to handle today’s big-data needs in a cost-effective, easy, and reliable way.
OpsCenter Enterprise
Visual, browser-based user interface
Administration tasks carried out in point-and-click fashion
Allows for visual rebalance of data across a cluster when new nodes are added
Proactive alerts that warn of impending issues
Built-in external notification abilities
OpsCenter Enterprise supplies management, monitoring, and control over DataStax Enterprise
Expert Production Support
DataStax Enterprise includes production support and consultative services from the Cassandra experts.
Support service level agreements that range from business hours to 24x7x365
Consultative support for assistance on architecture, design, and tuning
Certified quarterly service packs
Hot-fix support
DataStax Enterprise Compared
Scalable Performance
Oracle Exadata ✖ ✔ ✔
MySQL ✖ ✔ ✔
Sharding ✔ ✔ ✖
MongoDB ✔
Operational Ease
Cost Effective
DataStax Enterprise ✔ ✔ ✔
HBase ✔ ✖ ✔
✖ ✔
Real-Time + Analytics
✔
✖
✔
✖
✖
✖
Oracle NoSQL DB ✔ ✔ ? ✖
DataStax – Your One-Stop Shop
• DataStax Enterprise and Community Editions
• Professional Training, Expert Consulting
• Documentation and Dev Center: http://www.datastax.com/docs http://www.datastax.com/dev
• Whitepapers, Case Studies, FAQs and more: http://www.datastax.com/resources/whitepapers http://www.datastax.com/resources/casestudies
Thank you!