next generation databases july2010
1
© 2010 Quest Software, Inc. ALL RIGHTS RESERVED
This is Not Your Father’s Database: Everything You Need to Know Now About Cloud Computing and Emerging Database Technology
Guy Harrison
Director Research and Development, Melbourne
www.guyharrison.net
2
Introductions
3
4
Mainframes → Minicomputers → Client Server → Internet/Y2K Boom → After the gold rush
6
Current Day Trends
• Big Data
• Cloud computing
• Solid State Disk
7
Big Data
• The Industrial Revolution of data*
  – User generated data: Twitter, Facebook, Amazon
  – Machine generated data: RFID, POS, cell phones, GPS
• Traditional RDBMS neither economical nor capable
* http://radar.oreilly.com/2008/11/the-commoditization-of-massive.html
8
Big Data 1: Google
9
Map Reduce
[Diagram: Start → many parallel Map tasks → Reduce]
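The fan-out of many Map tasks into a Reduce step can be illustrated with a small single-process sketch. This is not Google's or Hadoop's implementation; the function names (`map_phase`, `shuffle`, `reduce_phase`, `word_count`) and the word-count task are assumptions chosen for illustration, and a real cluster would run each map call on a different node.

```python
from collections import defaultdict
from itertools import chain


def map_phase(chunk):
    """Map: emit a (word, 1) pair for every word in one input chunk."""
    return [(word.lower(), 1) for word in chunk.split()]


def shuffle(pairs):
    """Shuffle: group all emitted values by key."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(grouped):
    """Reduce: collapse each key's list of values into a single total."""
    return {key: sum(values) for key, values in grouped.items()}


def word_count(chunks):
    # Each chunk is independent, so the map calls could run in parallel;
    # this sketch simply runs them one after another.
    mapped = chain.from_iterable(map_phase(chunk) for chunk in chunks)
    return reduce_phase(shuffle(mapped))


counts = word_count(["big data big", "data cloud", "big cloud cloud"])
print(counts)
```

Because no map call depends on another, adding nodes adds map capacity, which is what makes the pattern attractive for very large inputs.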
10
Hadoop: Open source Map-reduce
• Yahoo! Hadoop cluster:
  – 4,000 nodes
  – 16 PB disk
  – 64 TB of RAM
  – 32,000 cores
  – Very low $/TB
11
Hive
[Diagram: SQL → Java → Results]
12
Big Data 2: Web 2.0
13
Twitter Growth
14
The fail whale
15
[Diagram: Web Servers; Memcached Servers; Database Servers split into Shard (A-F), Shard (G-O), Shard (P-Z); Read Only Slaves]
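The sharded layout on this slide can be sketched as a routing function plus a cache-aside lookup. This is a minimal illustration, not any particular site's code: the shard names, the `ShardedStore` class, and the choice of the first letter of a username as the shard key are assumptions, and plain dictionaries stand in for the memcached and database servers.

```python
# Letter ranges taken from the slide; the shard names are hypothetical.
SHARDS = [("A", "F", "shard_a_f"), ("G", "O", "shard_g_o"), ("P", "Z", "shard_p_z")]


def shard_for(username):
    """Pick a shard from the first letter of the key."""
    first = username[:1].upper()
    for low, high, name in SHARDS:
        if low <= first <= high:
            return name
    raise ValueError(f"no shard for key {username!r}")


class ShardedStore:
    def __init__(self):
        self.cache = {}  # stands in for the memcached tier
        self.shards = {name: {} for _, _, name in SHARDS}  # database tier
        self.db_reads = 0

    def put(self, username, profile):
        self.shards[shard_for(username)][username] = profile
        self.cache.pop(username, None)  # drop any stale cached copy

    def get(self, username):
        if username in self.cache:  # cache hit: no database work
            return self.cache[username]
        self.db_reads += 1
        profile = self.shards[shard_for(username)].get(username)
        if profile is not None:
            self.cache[username] = profile
        return profile


store = ShardedStore()
store.put("guy", {"city": "Melbourne"})
store.get("guy")
store.get("guy")
print(shard_for("guy"), store.db_reads)
```

The routing logic lives in the application, which is part of why this approach is operationally awkward: changing the ranges means moving data between shards.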
16
Clouds and Elastic provisioning
[Chart: Capacity and Demand over Time. Capacity steps up at each Hardware upgrade, leaving periods that are Over provisioned or Under provisioned relative to Demand]
17
CAP Theorem
[Diagram: CAP triangle with corners Consistency, Availability, Partition Tolerance. Labels: RDBMS, NoSQL, NO GO]
18
In search of the elastic database
• Big Web sites AND Cloud applications need servers that scale up (and down) on demand
• Elastic provisioning works fine for web servers, application servers, etc.
• However, RDBMS does not scale easily:
  – SQL Azure limited to one database <50GB on a single host
  – Oracle’s RAC not supported in cloud environments
  – MySQL sharding “obnoxious”
• Many are willing to sacrifice relational database features for scalability and operational simplicity
19
The NoSQL movement
20
NoSQL (A.K.A. Cloud databases)
• Generally DO NOT support:
  – SQL
  – Transactions
  – Immediate consistency
• Usually DO support:
  – Elasticity (scale out AND in)
  – Eventual consistency
  – Inherent redundancy and fault tolerance
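The difference between immediate and eventual consistency can be shown with a toy simulation. This sketch does not model any specific NoSQL product: the `Replica` and `Cluster` classes, the single global version counter, and the last-write-wins rule are all simplifying assumptions made for illustration.

```python
class Replica:
    def __init__(self):
        self.data = {}  # key -> (version, value)

    def apply(self, key, version, value):
        # Last-write-wins: keep only the highest version seen for a key.
        current = self.data.get(key)
        if current is None or version > current[0]:
            self.data[key] = (version, value)

    def read(self, key):
        entry = self.data.get(key)
        return entry[1] if entry else None


class Cluster:
    def __init__(self, replica_count):
        self.replicas = [Replica() for _ in range(replica_count)]
        self.pending = []  # updates not yet delivered to other replicas
        self.version = 0

    def write(self, key, value):
        # The write is acknowledged after reaching one replica only.
        self.version += 1
        self.replicas[0].apply(key, self.version, value)
        for index in range(1, len(self.replicas)):
            self.pending.append((index, key, self.version, value))

    def sync(self):
        # Background propagation: deliver every queued update.
        for index, key, version, value in self.pending:
            self.replicas[index].apply(key, version, value)
        self.pending.clear()


cluster = Cluster(3)
cluster.write("status", "v1")
stale = cluster.replicas[2].read("status")  # not yet propagated
cluster.sync()
fresh = cluster.replicas[2].read("status")  # replicas have converged
print(stale, fresh)
```

A reader that hits a lagging replica sees old data for a while, but all replicas agree once propagation completes.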
21
NoSQL Data Models
[Diagram labels]
• Models: Key Value Stores, Amazon Dynamo, Google BigTable, Document DB, JSON/XML DB, Graph Databases
• Products: MemcacheDB, Azure Table Services, Redis, Tokyo Cabinet, SimpleDB, Riak, Voldemort, Cassandra, HBase, Hypertable, CouchDB, MongoDB, Neo4J, FlockDB
23
Not so easy to get the data out....
[Diagram: a Data Hub connecting, via SQL, data stores across Amazon AWS Cloud, Microsoft Azure Cloud, and On-Premise (AKA private Cloud). Data stores shown: MySQL, HBase, SimpleDB, SQL Azure, Table Services, SQL Server, Oracle]
26
Big Data 3: Data Warehousing
[Chart: TB (axis 0 to 600) by year, 1996 to 2010]
27
Data Warehouse players
28
DATAllegro architecture
29
Column Databases (Vertica, Sybase)
• Data is stored together in columns
• Very fast answers to analytic aggregate queries
• Better compression
• Not write optimized
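The bullets above can be illustrated by pivoting a handful of rows into columns. This is a simplified sketch, not how Vertica or Sybase store data: the sample rows, `to_columns`, and `run_length_encode` are assumptions, and run-length encoding is just one example of a compression scheme that benefits from column layout.

```python
rows = [
    {"id": 1, "region": "EU", "amount": 10.0},
    {"id": 2, "region": "EU", "amount": 20.0},
    {"id": 3, "region": "EU", "amount": 5.0},
    {"id": 4, "region": "US", "amount": 7.0},
    {"id": 5, "region": "US", "amount": 8.0},
]


def to_columns(rows):
    """Pivot row-oriented records into one list per column."""
    columns = {}
    for row in rows:
        for name, value in row.items():
            columns.setdefault(name, []).append(value)
    return columns


def run_length_encode(values):
    """Compress runs of repeated values into (value, count) pairs."""
    encoded = []
    for value in values:
        if encoded and encoded[-1][0] == value:
            encoded[-1][1] += 1
        else:
            encoded.append([value, 1])
    return [tuple(pair) for pair in encoded]


columns = to_columns(rows)

# An aggregate touches a single column instead of every field of every row.
total = sum(columns["amount"])

# Values within one column tend to repeat, so they compress well.
regions_rle = run_length_encode(columns["region"])
print(total, regions_rle)
```

The write penalty is visible in the same sketch: inserting one new row means appending to every column list rather than writing one contiguous record.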
30
Disk drives and Moore’s law
• Transistor density doubles every 18 months
• Exponential growth is observed in most electronic components:
  – CPU clock speeds
  – RAM
  – Hard Disk Drive storage density
• But not in mechanical components:
  – Service time (seek latency): limited by actuator arm speed and disk circumference
  – Throughput (rotational latency): limited by speed of rotation, circumference and data density
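A short worked calculation shows why rotational latency does not improve the way electronic components do. On average the platter must turn half a revolution before the wanted sector passes under the head, so the delay depends only on spindle speed. The RPM values below are common drive speeds used for illustration; they are not taken from the slide.

```python
def avg_rotational_latency_ms(rpm):
    """Average wait for a sector: half of one revolution, in milliseconds."""
    ms_per_revolution = 60_000 / rpm
    return ms_per_revolution / 2


for rpm in (7_200, 15_000):
    print(rpm, round(avg_rotational_latency_ms(rpm), 2))
```

Doubling storage density leaves this figure unchanged; only spinning the platter faster reduces it.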
31
Big Data vs. Fast Data
[Chart: Disk trends 2001-2009, percentage change. IO Rate: 260; Disk Capacity: 1,635; IO/GB: -630; CPU: 1,013; IO/CPU: -390]
32
SSD to the rescue?
[Chart: Seek time (us). Solid State Disk DDR-RAM: 15; Solid State Disk Flash: 200; Magnetic Disk: 4,000]
33
Power consumption
[Chart: power consumption in Watts (logarithmic scale, 1 to 100) for Flash SSD vs. SATA HDD when Idle, Seek, and Start up. Data labels shown: 8, 10, 20]
34
Economics of SSD
[Chart: $/GB and $/IOPs (logarithmic scale, $0.10 to $1,000.00) for Capacity HDDs, Performance HDDs, Flash SSDs (read), DRAM SSDs. Data labels shown: $13.30, $16.60, $1.40, $0.50, $3.00, $28.00, $100.00, $400.00]
35
Fast reads but slow writes
[Chart: flash operation times in microseconds. 256 page block erase: 2000; 4k page write: 250; 4k page seek: 25]
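The three timings on this slide can be combined into a rough cost comparison. Flash pages cannot be overwritten in place: the containing block has to be erased first. The sketch below assumes the simplest case, where a write either lands on an already erased page or pays for one block erase first; it ignores copying other live pages out of the block, so real worst cases can be higher. The constant and function names are illustrative.

```python
SEEK_US = 25     # 4k page seek, from the slide
WRITE_US = 250   # 4k page write, from the slide
ERASE_US = 2000  # 256 page block erase, from the slide


def write_cost_us(needs_erase):
    """Time for one page write, optionally preceded by a block erase."""
    return WRITE_US + (ERASE_US if needs_erase else 0)


ratio_clean = write_cost_us(False) / SEEK_US  # write to a pre-erased page
ratio_dirty = write_cost_us(True) / SEEK_US   # erase required first
print(ratio_clean, ratio_dirty)
```

Under these assumptions a write is roughly ten times slower than a read when a clean page is available, and far slower when an erase is needed.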
36
Hierarchical storage management
[Diagram: storage tiers Main Memory, DDR SSD, Flash SSD, Disk, Tape, arranged between $/IOP and $/GB]
37
In Memory Databases: VoltDB & H-Store
• In Memory Distributed (“Sharded”) Database
• No transactional IO
• ACID transactions (k-safety)
• Single Threaded (no latches or locks)
• Java Stored Procedure transactions
• Hierarchical data model
• Double Shared Nothing (disk OR CPU)
• Spool out to DW for ad-hoc analysis
• Very high TPS for suitable applications
38
Oracle EXADATA
• RAC clusters provide MPP
• Dedicated storage servers
• High speed InfiniBand channels
• Smart storage reduces data transfer requirements
• Hybrid Flash & spinning disk storage system
• Flash caching in the database systems
39
The Next Generation?
40
너를 감사하십시요 Thank You Danke Schön
Gracias 有難う御座いました Merci
Grazie Obrigado 谢谢