next generation databases july2010


Page 1: Next generation databases july2010

1

© 2010 Quest Software, Inc. ALL RIGHTS RESERVED

This is Not Your Father’s Database: Everything You Need to Know Now About Cloud Computing and Emerging Database Technology 

Guy Harrison

Director Research and Development, Melbourne

[email protected]

www.guyharrison.net

Page 2: Next generation databases july2010

2

Introductions

Page 3: Next generation databases july2010

3

Page 4: Next generation databases july2010

4

Page 5: Next generation databases july2010

• Mainframes
• Minicomputers
• Client Server
• Internet/Y2K Boom
• After the gold rush

Page 6: Next generation databases july2010

6

Current Day Trends
• Big Data
• Cloud computing
• Solid State Disk

Page 7: Next generation databases july2010

7

Big Data
• The Industrial Revolution of data*
– User generated data: Twitter, Facebook, Amazon
– Machine generated data: RFID, POS, cell phones, GPS
• Traditional RDBMS is neither economical nor capable

* http://radar.oreilly.com/2008/11/the-commoditization-of-massive.html

Page 8: Next generation databases july2010

8

Big data 1: Google

Page 9: Next generation databases july2010

9

Map Reduce

[Diagram: Start fans out to many parallel Map tasks, whose output feeds Reduce]
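The many-maps-then-reduce shape can be illustrated with a minimal single-process sketch. This is not Google's or Hadoop's implementation; the word-count task and the names `map_phase`, `shuffle`, and `reduce_phase` are assumptions chosen for illustration.

```python
from collections import defaultdict
from itertools import chain


def map_phase(document):
    """Map: emit a (word, 1) pair for every word in one input split."""
    return [(word.lower(), 1) for word in document.split()]


def shuffle(pairs):
    """Group intermediate values by key, as a framework does between phases."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Reduce: collapse all values for one key into a single result."""
    return key, sum(values)


def word_count(documents):
    # Each map_phase call depends only on its own split, which is what
    # lets a real cluster run the map tasks in parallel on many nodes.
    mapped = chain.from_iterable(map_phase(doc) for doc in documents)
    return dict(reduce_phase(k, v) for k, v in shuffle(mapped).items())


splits = ["big data big clusters", "big data small budgets"]
counts = word_count(splits)
```

Here the map calls run one after another; in a distributed setting each split would be handled by a separate Map task before the grouped results reach Reduce.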

Page 10: Next generation databases july2010

10

Hadoop: Open source Map-reduce

• Yahoo! Hadoop cluster:
– 4000 nodes
– 16 PB disk
– 64 TB of RAM
– 32,000 Cores
– Very Low $/TB

Page 11: Next generation databases july2010

11

Hive

[Diagram labels: SQL, Java, Results]

Page 12: Next generation databases july2010

12

Big Data 2: Web 2.0

Page 13: Next generation databases july2010

13

Twitter Growth

Page 14: Next generation databases july2010

14

The fail whale

Page 15: Next generation databases july2010

15

[Diagram: Web Servers; Memcached Servers; Database Servers split into Shard (A-F), Shard (G-O), and Shard (P-Z); Read Only Slaves]
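A rough sketch of how an application tier might route requests in this kind of layout: keys are sent to a shard by first letter (the A-F / G-O / P-Z split on the slide) and reads go through a cache first. The in-memory dicts stand in for the database shards and the memcached tier, and all function names are hypothetical.

```python
SHARD_RANGES = [
    ("A", "F", "shard_a_f"),
    ("G", "O", "shard_g_o"),
    ("P", "Z", "shard_p_z"),
]


def shard_for(username):
    """Pick a shard from the first letter of the key."""
    first = username[:1].upper()
    for low, high, shard in SHARD_RANGES:
        if low <= first <= high:
            return shard
    raise ValueError(f"no shard covers key {username!r}")


# Plain dicts stand in for the database shards and the cache servers.
shards = {name: {} for _, _, name in SHARD_RANGES}
cache = {}


def write_user(username, profile):
    shards[shard_for(username)][username] = profile
    cache.pop(username, None)  # invalidate so readers do not see a stale copy


def read_user(username):
    if username in cache:  # cache hit: no database work
        return cache[username]
    profile = shards[shard_for(username)].get(username)
    if profile is not None:
        cache[username] = profile  # populate the cache on a miss
    return profile


write_user("alice", {"followers": 10})
write_user("zoe", {"followers": 3})
first_read = read_user("alice")
```

The routing logic lives in the application, which is one reason this approach is operationally awkward: changing the shard ranges means moving data and updating every client.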

Page 16: Next generation databases july2010

16

Clouds and Elastic provisioning

[Chart: Capacity / Demand over Time. Capacity rises at each Hardware upgrade while Demand varies, leaving Over provisioned and Under provisioned periods]

Page 17: Next generation databases july2010

17

CAP Theorem

[Diagram: triangle of Consistency, Availability, and Partition Tolerance, with regions labeled RDBMS, NoSQL, and NO GO]

Page 18: Next generation databases july2010

18

In search of the elastic database
• Big Web sites AND Cloud applications need servers that scale up (and down) on demand
• Elastic provisioning works fine for web servers, application servers, etc.
• However RDBMS does not scale easily:
– SQL Azure limited to one database <50GB on a single host
– Oracle’s RAC not supported in cloud environments
– MySQL sharding “obnoxious”
• Many are willing to sacrifice relational database features for scalability and operational simplicity

Page 19: Next generation databases july2010

19

The NoSQL movement

Page 20: Next generation databases july2010

20

NoSQL (A.K.A. Cloud databases)
• Generally DO NOT support:
– SQL
– Transactions
– Immediate consistency
• Usually DO support:
– Elasticity (scale out AND in)
– Eventual consistency
– Inherent redundancy and fault tolerance
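To make "eventual consistency" concrete, here is a toy sketch of two replicas that accept writes independently and converge later. The last-write-wins rule, the `Replica` class, and the `anti_entropy` function are assumptions for illustration; real systems use a range of reconciliation schemes.

```python
class Replica:
    """A toy replica that keeps the highest-versioned value per key."""

    def __init__(self):
        self.data = {}  # key -> (version, value)

    def write(self, key, value, version):
        current = self.data.get(key)
        # Last-write-wins: ignore writes older than what we already hold.
        if current is None or version > current[0]:
            self.data[key] = (version, value)

    def read(self, key):
        entry = self.data.get(key)
        return entry[1] if entry else None


def anti_entropy(a, b):
    """Exchange entries so both replicas end up with the newest versions."""
    for key, (version, value) in list(a.data.items()):
        b.write(key, value, version)
    for key, (version, value) in list(b.data.items()):
        a.write(key, value, version)


r1, r2 = Replica(), Replica()
r1.write("status", "hello", version=1)
stale = r2.read("status")      # r2 has not yet seen the write
anti_entropy(r1, r2)
converged = r2.read("status")  # after syncing, both replicas agree
```

Between the write and the sync, a reader hitting `r2` gets an out-of-date answer. That window is the price paid for keeping every replica available for writes.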

Page 21: Next generation databases july2010

21

NoSQL Data Models

Page 22: Next generation databases july2010

Key Value Stores

Amazon Dynamo

Google BigTable

Document DB

JSON/XML DB

Graph Databases

MemcacheDB

Azure Table Services

Redis

Tokyo Cabinet

SimpleDB

Riak

Voldemort

Cassandra

HBase

Hypertable

CouchDB

MongoDB

Neo4J

FlockDB

Page 23: Next generation databases july2010

23

Not so easy to get the data out....

Page 24: Next generation databases july2010

[Diagram labels: Amazon AWS Cloud; Microsoft Azure Cloud; On-Premise (AKA private Cloud); Data Hub; MySQL; HBase; SimpleDB; SQL Azure; Table Services; SQL Server; Oracle; SQL]

Page 25: Next generation databases july2010
Page 26: Next generation databases july2010

26

Big Data 3: Data Warehousing

[Chart: TB (0 to 600) by year, 1996 to 2010]

Page 27: Next generation databases july2010

27

Data Warehouse players

Page 28: Next generation databases july2010

28

DATAllegro architecture

Page 29: Next generation databases july2010

29

Column Databases (Vertica, Sybase)

• Data is stored together in columns

• Very fast answers to analytic aggregate queries

• Better compression

• Not write optimized
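A small sketch of why a column layout suits aggregates and compression. It does not reflect how Vertica or Sybase store data internally; the sample rows, the helper names, and the use of run-length encoding are illustrative assumptions.

```python
rows = [
    {"order_id": 1, "region": "APAC", "amount": 120.0},
    {"order_id": 2, "region": "EMEA", "amount": 80.0},
    {"order_id": 3, "region": "APAC", "amount": 50.0},
]

# Column layout: one contiguous list per column instead of one record per row.
columns = {name: [row[name] for row in rows] for name in rows[0]}


def total_amount_row_store(rows):
    # Each row dict stands in for a whole record that must be fetched
    # even though the query needs only one of its fields.
    return sum(row["amount"] for row in rows)


def total_amount_column_store(columns):
    # Only the "amount" column is scanned.
    return sum(columns["amount"])


def run_length_encode(values):
    """Compress runs of repeated values, which sorted columns tend to have."""
    encoded = []
    for value in values:
        if encoded and encoded[-1][0] == value:
            encoded[-1][1] += 1
        else:
            encoded.append([value, 1])
    return encoded


region_rle = run_length_encode(sorted(columns["region"]))
```

The write penalty follows from the same layout: inserting one order means appending to every column list rather than writing one record.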

Page 30: Next generation databases july2010

30

Disk drives and Moore’s law
• Transistor density doubles every 18 months
• Exponential growth is observed in most electronic components:
– CPU clock speeds
– RAM
– Hard Disk Drive storage density
• But not in mechanical components:
– Service time (Seek latency): limited by actuator arm speed and disk circumference
– Throughput (rotational latency): limited by speed of rotation, circumference and data density
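The mechanical limit can be put in numbers with a short back-of-the-envelope sketch. On average the platter must turn half a revolution before the target sector arrives, so rotational latency depends only on spindle speed. The 3.0 ms seek time and the function names below are hypothetical, chosen only to make the arithmetic easy to follow.

```python
def avg_rotational_latency_ms(rpm):
    """Average wait for the sector to rotate under the head: half a revolution."""
    ms_per_revolution = 60_000.0 / rpm
    return 0.5 * ms_per_revolution


def random_iops(avg_seek_ms, rpm):
    """Rough ceiling on random IOs per second for a single spindle."""
    return 1000.0 / (avg_seek_ms + avg_rotational_latency_ms(rpm))


latency_15k = avg_rotational_latency_ms(15_000)  # half of a 4 ms revolution
latency_7200 = avg_rotational_latency_ms(7_200)
iops_15k = random_iops(avg_seek_ms=3.0, rpm=15_000)
```

Because spindle speeds cannot keep doubling the way transistor counts do, this per-drive ceiling stays roughly flat while capacity keeps growing.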

Page 31: Next generation databases july2010

31

Big Data vs. Fast Data

Disk trends 2001-2009 (%age change)
• IO Rate: 260
• Disk Capacity: 1,635
• IO/GB: -630
• CPU: 1,013
• IO/CPU: -390

Page 32: Next generation databases july2010

32

SSD to the rescue?

Seek time (microseconds)
• Solid State Disk DDR-RAM: 15
• Solid State Disk Flash: 200
• Magnetic Disk: 4,000

Page 33: Next generation databases july2010

33

Power consumption

[Chart: Watts (logarithmic scale, 1 to 100) for Flash SSD and SATA HDD when Idle, Seek, and Start up; labeled values: 8, 10, 20]

Page 34: Next generation databases july2010

34

Economics of SSD

[Chart: $/GB and $/IOPs (logarithmic scale, $0.10 to $1,000.00) for Capacity HDDs, Performance HDDs, Flash SSDs (read), and DRAM SSDs; labeled values: $13.30, $16.60, $1.40, $0.50, $3.00, $28.00, $100.00, $400.00]

Page 35: Next generation databases july2010

35

Fast reads but slow writes

• 256 page block erase: 2000 microseconds
• 4k page write: 250 microseconds
• 4k page seek: 25 microseconds
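The three timings on this slide (25, 250, and 2000 microseconds) show why flash writes can be far slower than reads. Flash pages cannot be overwritten in place, so an update with no free page available forces a block erase. The cost model below is a deliberate simplification, and it assumes the "4k page seek" figure can be used as the cost of reading a page; the function names are hypothetical.

```python
PAGE_READ_US = 25      # "4k page seek" from the slide, used here as read cost
PAGE_WRITE_US = 250    # "4k page write"
BLOCK_ERASE_US = 2000  # "256 page block erase"
PAGES_PER_BLOCK = 256


def write_to_free_page_us():
    """Best case: a pre-erased page is available."""
    return PAGE_WRITE_US


def update_in_full_block_us(live_pages):
    """Simplified worst case: copy out the live pages, erase the block,
    then write the live pages plus the updated page back."""
    if not 0 <= live_pages < PAGES_PER_BLOCK:
        raise ValueError("live_pages must be between 0 and 255")
    read_cost = live_pages * PAGE_READ_US
    rewrite_cost = (live_pages + 1) * PAGE_WRITE_US
    return read_cost + BLOCK_ERASE_US + rewrite_cost


best_case = write_to_free_page_us()
empty_block_case = update_in_full_block_us(0)
worst_case = update_in_full_block_us(PAGES_PER_BLOCK - 1)
```

Real controllers hide much of this with spare capacity and background garbage collection, so sustained write performance depends heavily on how full the device is.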

Page 36: Next generation databases july2010

36

Hierarchical storage management

[Diagram: storage tiers Main Memory, DDR SSD, Flash SSD, Disk, and Tape, compared by $/IOP and $/GB]

Page 37: Next generation databases july2010

37

In Memory Databases: VoltDB & H-Store
• In Memory Distributed (“Sharded”) Database
• No transactional IO
• ACID transactions (k-safety)
• Single Threaded (no latches or locks)
• Java Stored Procedure transactions
• Hierarchical data model
• Double Shared Nothing (disk OR CPU)
• Spool out to DW for ad-hoc analysis
• Very high TPS for suitable applications
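The "single threaded, no latches or locks" idea can be sketched as a partition that runs whole transactions one at a time from a queue. This is a toy model, not VoltDB's or H-Store's engine (which run Java stored procedures); the `Partition` class, the snapshot-based rollback, and the two procedures are hypothetical.

```python
from collections import deque


class Partition:
    """One in-memory shard whose transactions run serially, in arrival order."""

    def __init__(self):
        self.rows = {}
        self.queue = deque()

    def submit(self, procedure, *args):
        self.queue.append((procedure, args))

    def run(self):
        results = []
        while self.queue:
            procedure, args = self.queue.popleft()
            snapshot = dict(self.rows)  # cheap here because values are ints
            try:
                # Nothing else touches self.rows while this runs,
                # so no locks or latches are needed.
                results.append(procedure(self.rows, *args))
            except Exception as exc:
                self.rows = snapshot  # roll the failed transaction back
                results.append(exc)
        return results


def deposit(rows, account, amount):
    rows[account] = rows.get(account, 0) + amount
    return rows[account]


def transfer(rows, source, target, amount):
    rows[source] = rows.get(source, 0) - amount
    if rows[source] < 0:
        raise ValueError("insufficient funds")
    rows[target] = rows.get(target, 0) + amount
    return rows[source], rows[target]


partition = Partition()
partition.submit(deposit, "a", 100)
partition.submit(transfer, "a", "b", 150)  # fails and is rolled back
partition.submit(transfer, "a", "b", 40)
results = partition.run()
```

Throughput in this model comes from running many such partitions side by side, which works best when each transaction touches a single partition.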

Page 38: Next generation databases july2010

38

Oracle EXADATA

• RAC clusters provide MPP
• Dedicated storage servers
• High Speed InfiniBand channels
• Smart storage reduces data transfer requirements
• Hybrid Flash & spinning disk storage system
• Flash caching in the database systems

Page 39: Next generation databases july2010

39

The Next Generation?

Page 40: Next generation databases july2010

40


너를 감사하십시요 Thank You Danke Schön

Gracias 有難う御座いました Merci

Grazie Obrigado 谢谢