Fit for Purpose: The New Database Revolution (Findings Webcast)
TRANSCRIPT
One Size Doesn’t Fit All: The Database Revolution
April 25, 2012
Mark R. Madsen, http://ThirdNature.net
Robin Bloor, http://Bloorgroup.com
Wednesday, April 25, 12
Your Hosts

Analyst hosts: Robin Bloor and Mark Madsen
Introduction
Significant and revolutionary changes are taking place in database technology.
In order to investigate and analyze these changes and where they may lead, The Bloor Group has teamed up with Third Nature to launch an Open Research project.
This is the final webinar in a series of webinars and research activities that have comprised part of the project.
All published research will be made available through our web site: Databaserevolution.com
Sponsors of This Research
General Webinar Structure
Market Changes, Database Changes (Some Of The Findings)
Let’s Talk About Performance
How to Select A Database
Market Changes, Database Changes
Database Performance Bottlenecks
CPU saturation
Memory saturation
Disk I/O channel saturation
Locking
Network saturation
Parallelism – inefficient load balancing
Multiple Database Roles
[Diagram: database roles across the enterprise. Transactional systems (apps backed by files or DBMSs) feed BI and analytics systems: staging area, data warehouse, operational data store, data marts, OLAP cubes, and personal data stores, spanning both structured and unstructured data (content DBMS). Now there are more...]
The Origin of Big Data
+ Embedded Systems Data
+ Social Network Data
+ Web Data
+ Supply Chain & Cust. Data
+ Personal Data
+ Unstructured Data
Corporate Databases
Big Data = Scale Out
[Diagram: the columnar database scales up and out by adding more servers. Data is compressed and partitioned on disk by column and by range. The query is decomposed into a sub-query for each node; each server has its own CPUs, common memory, and cache over its share of the table data.]
Let’s Stop Using the Term NoSQL
As the graph indicates, it’s just not helpful. In fact it’s downright confusing.

[Chart: “nosql”, “newsql”, and “oldsql” products plotted against data volume and data model complexity: single table, star schema, snowflake, TNF schema, nested data, graph data, complex data, OLAP.]
NoSQL Directions

Some NDBMS do not attempt to provide all ACID properties (Atomicity, Consistency, Isolation, Durability).
Some NDBMS deploy a distributed scale-out architecture with data redundancy.
XML DBMS using XQuery are NDBMS.
Some document stores are NDBMS (OrientDB, Terrastore, etc.)
Object databases are NDBMS (Gemstone, Objectivity, ObjectStore, etc.)
Key-value stores = schema-less stores (Cassandra, MongoDB, Berkeley DB, etc.)
Graph DBMS (DEX, OrientDB, etc.) are NDBMS
Large data pools (BigTable, HBase, Mnesia, etc.) are NDBMS
The Joys of SQL?
SQL: very good for set manipulation. Works for OLTP and many query environments.
Not good for nested data structures (documents, web pages, etc.)
Not good for ordered data sets
Not good for data graphs (networks of values)
The “Impedance Mismatch”
The RDBMS stores data organized according to table structures
The OO programmer manipulates data organized according to complex object structures, which may have specific methods associated with them.
The data does not simply map to the structure it has within the database
Consequently a mapping activity is necessary to get and put data
Basically: hierarchies, types, result sets, crappy APIs, language bindings, tools
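To make the mismatch concrete, here is a minimal sketch of the mapping activity the slide describes, with a hypothetical `Employee` class and `emp` table (not from the webcast). The object carries behavior; the database only sees flat column values, so the application needs explicit get/put mapping code.

```python
# A minimal sketch of hand-written object-relational mapping.
# The Employee class and emp table are illustrative assumptions.
import sqlite3
from dataclasses import dataclass

@dataclass
class Employee:
    emp_id: int
    name: str
    salary: int

    def give_raise(self, pct: float) -> None:
        # Behavior lives on the object; the database never sees this method.
        self.salary = int(self.salary * (1 + pct))

def load_employee(conn: sqlite3.Connection, emp_id: int) -> Employee:
    # "Get": map a flat table row back into an object.
    row = conn.execute(
        "SELECT id, name, salary FROM emp WHERE id = ?", (emp_id,)
    ).fetchone()
    return Employee(*row)

def save_employee(conn: sqlite3.Connection, e: Employee) -> None:
    # "Put": flatten the object back into table columns.
    conn.execute(
        "UPDATE emp SET name = ?, salary = ? WHERE id = ?",
        (e.name, e.salary, e.emp_id),
    )

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE emp (id INTEGER PRIMARY KEY, name TEXT, salary INTEGER)")
conn.execute("INSERT INTO emp VALUES (1, 'Marge Inovera', 150000)")

emp = load_employee(conn, 1)
emp.give_raise(0.10)
save_employee(conn, emp)
```

An ORM automates the two mapping functions, but the translation work (and its cost) does not go away.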
The SQL Barrier
SQL has: DDL (for data definition) and DML (for Select, Project and Join)
But it has no MML (for math) or TML (for time)
Usually result sets are brought to the client for further analytical manipulation, but this creates problems
Alternatively doing all analytical manipulation in the database creates problems
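The first pattern can be sketched as follows: SQL fetches the ordered rows, and the math (here a 3-point moving average, something plain SQL of the era had no operator for) runs client-side. The `sales` table and numbers are invented for illustration.

```python
# Sketch of the "bring the result set to the client" pattern.
# The sales table and values are illustrative assumptions.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (day INTEGER, amount REAL)")
conn.executemany("INSERT INTO sales VALUES (?, ?)",
                 [(1, 10.0), (2, 20.0), (3, 30.0), (4, 40.0)])

# DML can deliver the ordered result set...
rows = conn.execute("SELECT amount FROM sales ORDER BY day").fetchall()
amounts = [r[0] for r in rows]

# ...but the analytical manipulation happens outside the database,
# which is where the problems start once the result set gets large.
window = 3
moving_avg = [sum(amounts[i - window + 1 : i + 1]) / window
              for i in range(window - 1, len(amounts))]
print(moving_avg)
```

Pushing the same moving average into the database instead (the alternative the slide mentions) trades network shipping for awkward, engine-specific SQL.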
Hadoop/MapReduce
Hadoop is a parallel processing environment
Map/Reduce is a parallel processing framework
HBase turns Hadoop into a database of a kind
Hive adds an SQL capability
Pig adds analytics
[Diagram: a scheduler dispatches mapping processes on HDFS nodes 1..i and reducing processes on nodes i+1..k, each with backup/recovery. Stages: Map, Partition, Combine, Reduce.]
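The stages in the diagram can be mimicked in a few functions. This is a toy single-process word count, not Hadoop code: real MapReduce distributes each stage across nodes, but the data flow (map emits pairs, a partitioner routes keys to reducers, reducers aggregate) is the same.

```python
# Toy map / partition / reduce pipeline mirroring the diagram's stages.
from collections import defaultdict

def map_phase(lines):
    # Map: emit (word, 1) pairs from each input line.
    for line in lines:
        for word in line.split():
            yield (word, 1)

def partition_phase(pairs, n_reducers):
    # Partition: route each key to a reducer bucket by hash.
    buckets = [defaultdict(list) for _ in range(n_reducers)]
    for key, value in pairs:
        buckets[hash(key) % n_reducers][key].append(value)
    return buckets

def reduce_phase(bucket):
    # Reduce: sum the values collected for each key.
    return {key: sum(values) for key, values in bucket.items()}

lines = ["big data big hype", "big databases"]
buckets = partition_phase(map_phase(lines), n_reducers=2)
result = {}
for bucket in buckets:
    result.update(reduce_phase(bucket))
print(result)
```

A combiner would simply run the reduce function on each mapper's local output first to cut the data shipped between stages.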
Market Forces
A new set of products appears
They include some fundamental innovations
A few are sufficiently popular to last
Fashion and marketing drive greater adoption
Product defects begin to be addressed
They eventually challenge the dominant products
Let’s Talk About Performance
Performance and Scalability

Scalability and performance are not the same thing.

Throughput: the number of tasks completed in a given time period. A measure of how much work is or can be done by a system in a set amount of time, e.g. TPM or data loaded per hour.

It’s easy to increase throughput without improving response time.
Performance measures

Response time: the speed of a single task.

Response time is usually the measure of an individual’s experience using a system.

Response time = time interval / throughput
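The distinction is easy to see with made-up numbers: a system that completes 120 tasks in 60 seconds has a throughput of 2 tasks/second, and the slide's formula gives the mean time per task.

```python
# Throughput vs. (mean) response time, with illustrative numbers.
tasks_completed = 120            # tasks finished in the interval
interval_seconds = 60.0

throughput = tasks_completed / interval_seconds          # tasks per second
mean_response_time = interval_seconds / tasks_completed  # the slide's formula

print(throughput, mean_response_time)
```

Doubling the number of concurrent workers could double throughput while leaving each individual task just as slow, which is the point of the previous slide.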
Scalability vs. throughput vs. response time

Scalability = consistent performance for a task over an increase in a scale factor.

Three possible scale factors:
▪ Number of users
▪ Computations
▪ Amount of data
Scale: Data Volume

The different ways people count make establishing rules of thumb for sizing hard.

How do you measure it?
▪ Row counts
▪ Transaction counts
▪ Data size
▪ Raw data vs. loaded data
▪ Schema objects

People still have trouble scaling for databases as large as a single PC hard drive.
Scale: Concurrency (active and passive)

Scalability relationships

As concurrency increases, response time (usually) suffers.
This can be addressed somewhat via workload management tools.
When a system hits a bottleneck, response time and throughput will often get worse, not just level off.
“Linear Scalability”

This is the part of the chart most vendors show.
If you’re lucky they leave the bottom axis on so you know where their system flatlines.
Scale: Computational Complexity

A key point worth remembering:
Performance over size <> performance over complexity.
Analytics performance is about the intersection of both.
Database performance for BI is mostly related to size and query complexity.
SOME TECHNOLOGY STUFF
Large Memories and Large Databases

Not as fast as you expect because of how databases were designed (optimized for small memories and disk access). For example: sequential scans and cache serialization.

512 GB DB buffer cache; 1B rows at 100 rows/block = 640 GB of table unread; LRU overwrites older blocks.
In-Memory Databases Today

1. Maybe not as fast as you think. Depends entirely on the database (e.g. VectorWise).
2. Applied mainly to shared-everything systems.
3. Very large memories are more applicable to shared-nothing than shared-memory systems.
4. Still an expensive way to get performance:
   Box-limited (e.g. 2 TB max), or limited by node scaling (e.g. 16 nodes at 512 GB each = 8 TB).
Hardware changes enable new software models

The extra CPU allows us to do things in software that we avoided in the past because of scarce resources.
Compression techniques and columnar database architectures that once consumed too much CPU are now possible.
Improving Query Performance: Columnar Databases

In a row-store model these rows would be stored in sequential order, packed into a block. In a column store they would be divided into columns and stored in different blocks.

ID | Name           | Salary   | Position
1  | Marge Inovera  | $150,000 | Statistician
2  | Anita Bath     | $120,000 | Sewer inspector
3  | Ivan Awfulitch | $160,000 | Dermatologist
4  | Nadia Geddit   | $36,000  | DBA
Inserting data into a columnar database

Each column is stored in its own set of blocks, written to disk separately.
Extra work for writes over a row store: update complexity, delete complexity.

[Diagram: the same four-row table split into separate column blocks for ID, name, salary, and position.]
Reading from a columnar database

SELECT * FROM emp WHERE ID = 1
4 reads, extract & stitch

[Diagram: retrieving one row requires a read from each of the four column blocks, then stitching the values back together into a row.]
Column elimination and I/O

SELECT AVG(salary) FROM emp
1 read

[Diagram: only the salary column’s blocks are read; the ID, name, and position columns are eliminated from the scan.]
How do we scale performance for queries?

Three options: make the CPU faster, parallelize query execution, or add CPUs.
Faster CPUs mean quicker response time and increased throughput.
Parallel query execution improves response time but consumes more resources, reducing concurrency and possibly throughput.
More CPUs mean more throughput.
Early query performance scaling: table partitioning

Table partitioning distributes rows across table partitions by range, hash, or round robin when you insert or load the data.

[Diagram: a partitioning function fn routes rows into Q1, Q2, Q3, and Q4 sales tables.]
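The diagram's range partitioning can be sketched as a routing function over the sale date's quarter. The rows and dates are made up for illustration.

```python
# Sketch of range partitioning: fn routes each row to a quarterly table.
from datetime import date

partitions = {"Q1": [], "Q2": [], "Q3": [], "Q4": []}

def fn(row):
    # Range partitioning on the sale date's quarter.
    quarter = (row["sale_date"].month - 1) // 3 + 1
    return f"Q{quarter}"

def insert(row):
    partitions[fn(row)].append(row)

insert({"sale_date": date(2012, 2, 14), "amount": 100})
insert({"sale_date": date(2012, 4, 25), "amount": 250})
insert({"sale_date": date(2012, 11, 3), "amount": 75})

# A query restricted to Q2 touches one partition instead of the whole table.
q2_total = sum(r["amount"] for r in partitions["Q2"])
print(q2_total)
```

Partition pruning like this was the early win: the optimizer skips whole partitions when the predicate matches the partitioning key.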
Scale-up vs. Scale-out Parallelism

Uniprocessor environments required chip upgrades.
SMP servers can grow to a point, then it’s a forklift upgrade to a bigger box.
MPP servers grow by adding more nodes.

(a) Scaling up with a larger server. (b) Scaling out with many small servers.

Copyright Third Nature, Inc.
Sharding, aka Partitioning at the Node Level

Sharding is basically horizontal partitioning applied across multiple database servers.
Each node holds a (hopefully) self-consistent portion of the database.
Good as long as queried data lives on a single node.

One large database = several smaller databases, fronted by a query redirect.
Sharding, Databases and Queries

What happens when you need to scan a full table or join tables across nodes? Multiple queries and stitching at the application level.
Sharding works well for fixed access paths, uniform query plans, and data sets that can be isolated. Mainly this describes an OLTP-style workload.
Cloud Hardware Architecture

It’s a scale-out model: uniform virtual node building blocks.
This is the future of software deployments, albeit with increasing node sizes, so paying attention to early adopters today will pay off.
This implies that an MPP database architecture will be needed for scale.
MPP Database Architecture

Worker nodes connected by a high-speed interconnect; leader node(s) used by some; some use separate loader nodes.
Some databases are symmetric (all nodes are the same). Some allow mixed worker node sizes. Some are leaderless.
Leaders and loaders bring some problems, e.g. less automated management of the environment and treating bottlenecks.
Key to MPP: data distribution

Table data is evenly spread across all nodes, presenting a single logical view of a table.
The good: scalability to the petabyte range; much faster filtering and selection on scans.
The bad: data skew (values, not rowcounts), aggregate function bottlenecks, concurrency challenges, complex multi-table joins with unlike distributions.
MPP challenges mostly hinge on data distribution

Imagine fact & dim tables spread across all nodes.
You need to get dim data to each node to join with the fact rows stored there.
Cross-node joins result in data shipping. This is where inter-node latency, data skew, and node skew can bog down query performance.
The real test of an MPP database is not how fast it can scan data. That’s easy. Test joins in a PoC.

[Diagram: fact and dim tables spread across Node 1 and Node 2.]
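One common form of the data shipping is a broadcast join: fact rows stay put on their nodes and the (small) dimension table is shipped to every node so each can join locally. A sketch with an invented two-node layout:

```python
# Sketch of a broadcast join in an MPP layout: dim data ships, fact data stays.
nodes = {
    1: {"fact": [("sale1", "p1", 100), ("sale2", "p2", 50)]},
    2: {"fact": [("sale3", "p1", 200)]},
}
dim_products = {"p1": "Widget", "p2": "Gadget"}  # small dimension table

shipped_rows = 0
results = []
for node in nodes.values():
    # Broadcast: ship the whole dim table over the interconnect to this node.
    local_dim = dict(dim_products)
    shipped_rows += len(dim_products)
    # Each node joins its local fact rows against the shipped dim data.
    for sale_id, product_id, amount in node["fact"]:
        results.append((sale_id, local_dim[product_id], amount))

print(shipped_rows, sorted(results))
```

Shipping grows with node count times dim size; when neither table is small, engines must instead redistribute both tables by the join key, and that repartitioning is where skew and latency bite.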
MATCHING PROBLEMS TO TECHNOLOGIES

Solving the Problem Depends on the Diagnosis
Three General Workloads

Online Transaction Processing
▪ Read, write, update
▪ User concurrency is the common performance limiter
▪ Low data and compute complexity

Business Intelligence / Data Warehousing
▪ Assumed to be read-only, but really read-heavy and write-heavy, usually separated in time
▪ Data size is the common performance limiter
▪ High data complexity, low compute complexity

Analytics
▪ Read, write
▪ Data size and complexity of algorithm are the limiters
▪ Moderate data, high compute complexity
Three General Workloads

But…
BI is not read-only
OLTP is not write-only
Analytics is not purely computation
Types of workloads

Write-biased:
▪ OLTP
▪ OLTP, batch
▪ OLTP, lite
▪ Object persistence
▪ Data ingest, batch
▪ Data ingest, real-time

Read-biased:
▪ Query
▪ Query, simple retrieval
▪ Query, complex
▪ Query: hierarchical / object / network
▪ Analytic

Mixed:
▪ Inline analytic execution, operational BI
What you need depends on workload & need

Optimizing for:
▪ Response time?
▪ Throughput?
▪ Both?

Concerned about rapid growth in data?
Unpredictable spikes in use?
Bulk loads, or incremental inserts and/or updates?
Important workload parameters to know

• Read-intensive vs. write-intensive
• Mutable vs. immutable data
• Immediate vs. eventual consistency
• Short vs. long data latency
• Predictable vs. unpredictable data access patterns
• Simple vs. complex data types
You"must"understand"your"workload"mix"?"throughput"and"response"5me"requirements"aren’t"enough."▪ 100"simple"queries"accessing"month?to?date"data"
▪ 90"simple"queries"accessing"month?to?date"data"and"10"complex"queries"using"two"years"of"history"
▪ Hazard"calcula5on"for"the"en5re"customer"master"
▪ Performance"problems"are"rarely"due"to"a"single"factor.""
Two useful concepts to characterize queries

Selectivity: the restrictiveness of a query when accessing data. A highly selective query filters out most rows. A low-selectivity query reads most of the rows.

High: SELECT SUM(salary) FROM emp WHERE ID = 1
Low:  SELECT SUM(salary) FROM emp
Two useful concepts to characterize queries

Retrieval: the restrictiveness of a query when returning data. High retrieval brings back most of the rows. Low retrieval brings back relatively few rows.

High: SELECT name, salary FROM emp
Low:  SELECT SUM(salary) FROM emp
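The two concepts can be made crudely numeric against the toy emp table: selectivity as the fraction of rows a query touches, retrieval as how many rows come back. The fractions are illustrative, not a formal definition.

```python
# Crude numeric versions of selectivity and retrieval for the slides' queries.
emp = [(1, "Marge Inovera", 150000), (2, "Anita Bath", 120000),
       (3, "Ivan Awfulitch", 160000), (4, "Nadia Geddit", 36000)]

# SELECT SUM(salary) FROM emp WHERE ID = 1: highly selective, low retrieval.
rows_read = [r for r in emp if r[0] == 1]
selectivity_fraction = len(rows_read) / len(emp)   # touches 1 of 4 rows
rows_returned = 1                                  # a single SUM value

# SELECT name, salary FROM emp: low selectivity, high retrieval.
scan = [(name, salary) for _, name, salary in emp]
print(selectivity_fraction, rows_returned, len(scan))
```

The workload tables that follow classify each workload by where its typical queries fall on these two axes.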
Selectivity and number of columns queried

Row store or column store, indexed or not?

Chart from “The Mimicking Octopus: Towards a one-size-fits-all Database Architecture”, Alekh Jindal
Characteristics of query workloads

Workload                  | Selectivity | Retrieval       | Repetition  | Complexity
Reporting / BI            | Moderate    | Low             | Moderate    | Moderate
Dashboards / scorecards   | Moderate    | Low             | High        | Low
Ad-hoc query and analysis | Low to high | Moderate to low | Low         | Low to moderate
Analytics (batch)         | Low         | High            | Low to high | Low*
Analytics (inline)        | High        | Low             | High        | Low*
Operational / embedded BI | High        | Low             | High        | Low

* Low for retrieving the data, high if doing analytics in SQL
Characteristics of read-write workloads

Workload           | Selectivity     | Retrieval        | Repetition | Complexity
Online OLTP        | High            | Low              | High       | Low
Batch OLTP         | Moderate to low | Moderate to high | High       | Moderate to high
Object persistence | High            | Low              | High       | Low
Bulk ingest        | Low (write)     | n/a              | High       | Low
Real-time ingest   | High (write)    | n/a              | High       | Low

With ingest workloads we’re dealing with write-only, so selectivity and retrieval don’t apply in the same way; instead it’s write volume.
Workload parameters and DB types at data scale

[Matrix: workload parameters (write-biased, read-biased, updateable data, eventual consistency OK?, unpredictable query path, compute-intensive) against database types (standard RDBMS, parallel RDBMS, NoSQL (kv, dht, obj), Hadoop*, streaming database).]

You see the problem: it’s an intersection of multiple parameters, and this chart only includes the first tier of parameters. Plus, workload factors can completely invert these general rules of thumb.
Workload parameters and DB types at data scale

[Matrix: workload parameters (complex queries, selective queries, low-latency queries, high concurrency, high ingest rate) against database types (standard RDBMS, parallel RDBMS, NoSQL (kv, dht, obj), Hadoop, streaming database).]

You have to look at the combination of workload factors: data scale, concurrency, latency & response time, then chart the parameters.
Problem: Architecture Can Define Options

A general rule for the read-write axes

As workloads increase in both intensity and complexity, we move into a realm of specialized databases adapted to specific workloads.

[Chart: OldSQL, NewSQL, and NoSQL positioned along axes of write intensity and read intensity.]
In general…

Relational row-store databases for conventionally tooled low- to mid-scale OLTP.
Relational databases for ACID requirements.
Parallel databases (row or column) for unpredictable or variable query workloads.
Specialized databases for complex-data query workloads.
NoSQL (KVS, DHT) for high-scale OLTP.
NoSQL (KVS, DHT) for low-latency, read-mostly data access.
Parallel databases (row or column) for analytic workloads over tabular data.
NoSQL / Hadoop for batch analytic workloads over large data volumes.
How To Select A Database
How To Select A Database - (1)

1. What are the data management requirements and policies (if any) in respect of:
- Data security (including regulatory requirements)?
- Data cleansing?
- Data governance?
- Deployment of solutions in the cloud?
- If a deployment environment is mandated, what are its technical characteristics and limitations? (Best of breed with no standards for anything, i.e. “polyglot persistence”, means silos on steroids: data integration challenges and shifting data movement architectures.)
2.What kind of data will be stored and used?
- Is it structured or unstructured?
- Is it likely to be one big table or many tables?
How To Select A Database - (2)

3. What are the data volumes expected to be?
- What is the expected daily ingest rate?
- What will the data retention/archiving policy be?
- How big do we expect the database to grow to? (estimate a range)

4. What are the applications that will use the database?
- Estimate by user numbers and transaction numbers
- Roughly classify transactions as OLTP, short query, long query, long query with analytics.
- What are the expectations in respect of growth of usage (per user) and growth of user population?
5.What are the expected service levels?
- Classify according to availability service levels
- Classify according to response time service levels
- Classify on throughput where appropriate
How To Select A Database - (3)

6. What is the budget for this project and what does that cover?
7. What is the outline project plan?
- Timescales
- Delivery of benefits
- When are costs incurred?
8.Who will make up the project team?
- Internal staff
- External consultants
- Vendor consultants
9.What is the policy in respect of external support, possibly including vendor consultancy for the early stages of the project?
How To Select A Database - (4)

10. What are the business benefits?
- Which ones can be quantified financially?
- Which ones can only be guessed at (financially)?
- Are there opportunity costs?
A random selection of databases

Sybase IQ, ASE; Teradata, Aster Data; Oracle, RAC; Microsoft SQLServer, PDW; IBM DB2s, Netezza; Paraccel; Kognitio; EMC/Greenplum; Oracle Exadata; SAP HANA; Infobright; MySQL; MarkLogic; Tokyo Cabinet; EnterpriseDB; LucidDB; Vectorwise; MonetDB; Exasol; Illuminate; Vertica; InfiniDB; 1010 Data; SAND; Endeca; Xtreme Data; IMS; Hive; Algebraix; Intersystems Caché; Streambase; SQLStream; Coral8; Ingres; Postgres; Cassandra; CouchDB; Mongo; Hbase; Redis; RainStor; Scalaris

And a few hundred more…
Product selection options

The Subtraction Model
▪ Start with a full set, remove what’s bad, evaluate the remainder
▪ Conventional analyst model
▪ Works best with a stable market

The Addition Model
▪ Start with an empty set, add what’s good, evaluate the results
▪ The designer model
▪ Works best in an emerging or changing market
Product Selection
Preliminary investigation
Short-list (usually arrived at by elimination)
Be sure to set the goals and control the process.
Evaluation by technical analysis and modeling
Evaluation by proof of concept.
Do not be afraid to change your mind
Negotiation
Conclusion
Wherein all is revealed, or ignorance exposed
Thank You For Your Attention