Analysis of Cloud Data Management Systems Student: Miro Szydlowski Supervisor: Prof. Mehmet Orgun Date: 11.11.11


Analysis of Cloud Data Management Systems

Student: Miro Szydlowski
Supervisor: Prof. Mehmet Orgun
Date: 11.11.11


INTRODUCTION

[Timeline, 1970 to 2011: Relational Database Management Systems, Distributed Databases, NoSQL, Cloud Data Stores, ?]


Presentation Plan

• Origins of Database Management Systems
  • Rise to power
  • ACID qualities
• Problems and Solutions
  • Consequences of being popular
  • Partitioning, Replication, Load Balancing
  • Distributed Database Management Systems
• Challenges of the Connected World
• Cloud Computing
  • Definition, Types
  • Place of DBMS in the Cloud
• Cloud Data Management Systems
  • CAP, BASE, NoSQL and a few other concepts
  • NoSQL by implementation type
  • Example: Amazon SimpleDB
• Which one to choose?


Database Management Systems

“…set of software programs that control the organisation, storage, management and data retrieval”

Database Models:
• Hierarchical
• Network
• Relational
• Object-relational


Origins of Relational Database Management Systems

• 1970, University of California
• In the following 20 years they became not only accepted and essential, but were considered the only solution for enterprise data storage
• Why?
  • Data normalisation
  • Metadata reuse
  • User Views <-> Community View <-> Storage
  • SQL!
  • Guarantees data integrity - ACID


ACID

• Atomicity
• Consistency
• Isolation
• Durability

• Provides consistent state of the database
• …but at a cost
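
A minimal sketch of atomicity in practice, using Python's built-in `sqlite3` module. The `accounts` table, the `transfer` helper and the "no negative balance" rule are assumptions made up for this illustration; they are not part of the presentation.

```python
import sqlite3

def transfer(conn, src, dst, amount):
    """Move `amount` between two accounts as one atomic transaction."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            conn.execute("UPDATE accounts SET balance = balance - ? WHERE name = ?",
                         (amount, src))
            conn.execute("UPDATE accounts SET balance = balance + ? WHERE name = ?",
                         (amount, dst))
            # Hypothetical consistency rule: balances must never go negative.
            (balance,) = conn.execute(
                "SELECT balance FROM accounts WHERE name = ?", (src,)).fetchone()
            if balance < 0:
                raise ValueError("insufficient funds")
        return True
    except ValueError:
        return False

def balances(conn):
    """Return all balances as a {name: balance} dictionary."""
    return dict(conn.execute("SELECT name, balance FROM accounts"))

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (name TEXT PRIMARY KEY, balance INTEGER)")
conn.executemany("INSERT INTO accounts VALUES (?, ?)", [("alice", 100), ("bob", 50)])
conn.commit()

ok = transfer(conn, "alice", "bob", 30)       # both updates are committed together
failed = transfer(conn, "alice", "bob", 500)  # both updates are rolled back together
print(ok, failed, balances(conn))
```

The failed transfer leaves no partial update behind, which is the guarantee that becomes expensive once the data is spread over many machines.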


Problems and Solutions

Very successful solution, but the businesses were growing…
• Data volume
• Data warehousing, business intelligence
• Mergers and acquisitions
• WWW

New Solutions:
• Partitioning
  • Hardware
  • Horizontal
  • Vertical
• Replication
  • Multi-master
  • Master-Slave
• Load Balancing
• …and finally, Distributed Database Management Systems

…but the challenges kept coming…
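
To make the difference between horizontal and vertical partitioning concrete, here is a small sketch on an in-memory table. The `customers` data and both helper functions are hypothetical, chosen only for illustration.

```python
from collections import defaultdict

# Hypothetical table: one dictionary per row.
customers = [
    {"id": 1, "name": "Ann", "region": "EU", "notes": "prefers email"},
    {"id": 2, "name": "Bob", "region": "US", "notes": "key account"},
    {"id": 3, "name": "Cho", "region": "EU", "notes": "new customer"},
]

def horizontal_partition(rows, column):
    """Split by rows: each partition holds whole rows sharing one column value."""
    partitions = defaultdict(list)
    for row in rows:
        partitions[row[column]].append(row)
    return dict(partitions)

def vertical_partition(rows, key, column_groups):
    """Split by columns: each partition keeps the key plus one group of columns."""
    return [
        [{key: row[key], **{c: row[c] for c in group}} for row in rows]
        for group in column_groups
    ]

by_region = horizontal_partition(customers, "region")
core, extra = vertical_partition(customers, "id", [["name", "region"], ["notes"]])
print(by_region["EU"])
print(core[0], extra[0])
```

Horizontal partitions can live on different servers and each still answers whole-row queries; vertical partitions have to be joined back on the key to rebuild a full row.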


Challenges of the Connected World

• Search Engines
• Mobile Devices
• Business-To-Business (Web Services)
• Stream Processing
• Data Warehousing
• Directory Services

Current example: 2011 Twitter statistics:
• 1 billion Tweets per week
• 140 million Tweets per day on average
• 177 million Tweets sent on March 11, 2011
• Current record: 6,939 TPS, set 4 seconds after midnight in Japan on New Year’s Day

New Solutions needed ASAP
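
The Twitter figures above can be put on a common scale with a little arithmetic. The sketch below uses only the numbers quoted on the slide; the variable names are illustrative.

```python
SECONDS_PER_DAY = 24 * 60 * 60

tweets_per_day = 140_000_000      # daily average quoted on the slide
tweets_per_week = 1_000_000_000   # weekly figure quoted on the slide
peak_tps = 6_939                  # record quoted on the slide

average_tps = tweets_per_day / SECONDS_PER_DAY  # average tweets per second
weekly_as_daily = tweets_per_week / 7           # weekly figure as a daily rate
peak_to_average = peak_tps / average_tps        # how far the peak exceeds the average

print(f"average load: {average_tps:.0f} TPS")
print(f"weekly figure per day: {weekly_as_daily / 1e6:.0f} million")
print(f"peak / average: {peak_to_average:.1f}x")
```

The average works out to roughly 1,620 tweets per second, so the record burst is a little over four times the normal load. A data store sized for the average would not survive the peak.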


What is Cloud Computing?

• Lots of definitions, one of them below:

“…a pool of highly scalable, abstracted infrastructure, capable of hosting end-customer applications, that is billed by consumption” (James Staten)

• Automation
• Virtualization
• Scalability
• Pay-as-you-go pricing model


Cloud Computing Types

[Figure: the cloud stack, from top to bottom: Application Software, Infrastructure Software, Operating System, Virtualisation Layer, Server Hardware, Network and Firewalls, Data Centre Infrastructure. Service types marked alongside the stack: Infrastructure as a Service, Platform as a Service, Software as a Service. Panel titles: By Deployment Type, By Service Type.]

Cloud Data Management Systems? IaaS or PaaS


Dark Cloud

Beginning of the 21st century – open critique of relational database management systems:

• Too complex for an average user
• Can’t cope with data volumes
• Relational mapping is overkill
• One size doesn’t fit all – we want to prioritize some features
• Why do we need to build the ORM?
• Distributed RDBMSs are fake!
• Scalability!

Why don’t we re-engineer and rebuild instead of constantly ‘patching’ RDBMS?


CAP and BASE

Eric Brewer made a statement at the ACM Symposium on Principles of Distributed Computing (PODC) in 2000: it is unachievable to implement all three qualities of a “shared-data system” at once:

• Consistency
• Availability
• Partition Tolerance

…so – pick any two!

Since we can’t guarantee ACID, let’s BASE our systems on another principle:

• Basically Available
• Soft State
• Eventually Consistent

These two ideas changed the approach to database design…
…and gave birth to the ‘NoSQL’ movement
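
A toy sketch of what "eventually consistent" means in practice. The `EventuallyConsistentStore` class, its version counter and the explicit `sync` step are assumptions invented for this illustration; real systems propagate updates in the background and resolve conflicts in more elaborate ways.

```python
class EventuallyConsistentStore:
    """Toy model: a write lands on one replica and reaches the others later."""

    def __init__(self, replica_count):
        self.replicas = [{} for _ in range(replica_count)]
        self.pending = []  # updates not yet delivered: (replica index, key, versioned value)
        self.clock = 0     # simple version counter used for last-write-wins

    def write(self, key, value, replica=0):
        self.clock += 1
        versioned = (self.clock, value)
        self.replicas[replica][key] = versioned  # acknowledged immediately (available)
        for index in range(len(self.replicas)):
            if index != replica:
                self.pending.append((index, key, versioned))

    def read(self, key, replica):
        entry = self.replicas[replica].get(key)
        return entry[1] if entry else None  # may be stale (soft state)

    def sync(self):
        """Deliver pending updates; the newest version wins on every replica."""
        for index, key, versioned in self.pending:
            current = self.replicas[index].get(key)
            if current is None or current[0] < versioned[0]:
                self.replicas[index][key] = versioned
        self.pending.clear()

store = EventuallyConsistentStore(replica_count=3)
store.write("status", "online", replica=0)
stale = store.read("status", replica=2)  # replica 2 has not seen the write yet
store.sync()
fresh = store.read("status", replica=2)  # now every replica agrees
print(stale, fresh)
```

The write is accepted without waiting for the other replicas, so a reader may briefly see old data; once the updates are delivered, all replicas converge on the same value.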


A few new concepts

Hash-based partitioning
• a certain property of each entity is used to calculate a hash value, which is used to determine which database server to use to store the entity

‘Shared nothing’ architecture
• cluster of independent machines that communicate over a high-speed network

Sharding
• splitting up a database across multiple machines
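
Hash-based partitioning and sharding can be sketched in a few lines. The `shard_for` function and the `ShardedStore` class are hypothetical; each in-memory dictionary stands in for one independent database server.

```python
import hashlib

def shard_for(key, num_shards):
    """Hash a property of the entity (here its key) and map it to a server index."""
    digest = hashlib.md5(key.encode("utf-8")).hexdigest()
    return int(digest, 16) % num_shards

class ShardedStore:
    """Each dictionary stands in for one independent ('shared nothing') server."""

    def __init__(self, num_shards):
        self.shards = [{} for _ in range(num_shards)]

    def put(self, key, value):
        self.shards[shard_for(key, len(self.shards))][key] = value

    def get(self, key):
        return self.shards[shard_for(key, len(self.shards))].get(key)

store = ShardedStore(num_shards=4)
for user_id in ("user:1", "user:2", "user:3", "user:42"):
    store.put(user_id, {"id": user_id})

print(store.get("user:42"), [len(shard) for shard in store.shards])
```

A stable hash (MD5 here) is used instead of Python's built-in `hash`, which is randomised per process for strings. Note that with plain modulo placement, changing the number of shards moves most keys to a different server.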

MapReduce
• not a database system, but a programming framework
• every job sent is divided into two parts: a ‘Map’, and a ‘Reduce’
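
The two parts of a MapReduce job can be illustrated with the classic word count, run here in a single process. The function names are illustrative; a real framework would spread the map and reduce calls over many machines and perform the grouping step itself.

```python
from collections import defaultdict

def map_phase(document):
    """'Map': emit a (word, 1) pair for every word in one input document."""
    return [(word.lower(), 1) for word in document.split()]

def shuffle(pairs):
    """Group emitted values by key (the framework does this between the phases)."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped

def reduce_phase(key, values):
    """'Reduce': combine all values emitted for one key into a single result."""
    return key, sum(values)

def map_reduce(documents):
    mapped = [pair for doc in documents for pair in map_phase(doc)]
    return dict(reduce_phase(key, values) for key, values in shuffle(mapped).items())

counts = map_reduce(["the cloud stores data", "the cloud scales"])
print(counts)
```

Because each map call looks at one document and each reduce call at one key, both phases can run in parallel without shared state.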


NoSQL Movement

• Their main objection: unnecessary complexity of the relational databases
• Motto: “select the right tool for the job”
• “Tool in the box” approach

• Principles of NoSQL data stores:
  • Built for performance
  • Built for real scalability
  • Built for high availability
  • Typically use a very specific data access pattern
  • Either schemaless or implementing very simple schemas
  • Weak consistency guarantees
  • Declarative query language (such as SQL) replaced with simple APIs


NoSQL Databases by Implementation Type

• Key/Value Stores
• BigTable
• Document-based
• Columnar

(also graph, object-oriented, distributed object stores and dozens of others…)


Key/Value Stores

• Data is stored as a key/value pair
• Basic APIs – Put/Get/Remove
• Scalability: Sharding or Replicating data items
• Advantages: Performance and scalability
• Best For: High-performance systems that deal with one type of object
• Examples: HBase, SimpleDB, Cassandra
• Potential Issues: Data Integrity has to be supported by application, supports only one type of query
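
A minimal sketch of the Put/Get/Remove interface. The `KeyValueStore` class and the session data are hypothetical and do not reproduce the API of any of the products listed.

```python
class KeyValueStore:
    """Toy key/value store exposing only Put/Get/Remove (plus a full scan)."""

    def __init__(self):
        self._items = {}

    def put(self, key, value):
        self._items[key] = value

    def get(self, key, default=None):
        return self._items.get(key, default)

    def remove(self, key):
        """Delete a key; return True if it existed."""
        if key in self._items:
            del self._items[key]
            return True
        return False

    def values(self):
        return list(self._items.values())

sessions = KeyValueStore()
sessions.put("session:abc", {"user": "ann", "cart": ["book"]})
sessions.put("session:def", {"user": "bob", "cart": []})

by_key = sessions.get("session:abc")  # the one query the store is built for
# Anything else (e.g. "who has items in the cart?") is a scan done by the application.
with_items = [v["user"] for v in sessions.values() if v["cart"]]
removed = sessions.remove("session:def")
print(by_key, with_items, removed)
```

The store never inspects the values, which is why lookups by key are fast and why integrity rules and every other kind of query fall to the application.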


‘BigTable’ Databases

• Named after Google’s ‘BigTable’ implementation
• Each row can have a different set of columns
• A row can have thousands of columns
• Records can have multiple fields
• Records are indexed by [row-key, column-key, timestamp]
• Usually sharded
• Advantages: Highly optimized for write operations, highly scalable, (quoted) extremely even performance
• Examples: Google Analytics, Google Docs, Microsoft Azure Tables
• Potential Issues: Lack of text search, very difficult to import and export data – query times out after 30 sec
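
The [row-key, column-key, timestamp] indexing can be sketched with a nested dictionary. `SparseTable` and the sample rows are assumptions made for illustration; this is not Google's API.

```python
class SparseTable:
    """Toy model: every cell is addressed by (row key, column key, timestamp)."""

    def __init__(self):
        self._cells = {}  # (row_key, column_key) -> {timestamp: value}

    def put(self, row_key, column_key, value, timestamp):
        self._cells.setdefault((row_key, column_key), {})[timestamp] = value

    def get(self, row_key, column_key, timestamp=None):
        """Return the newest version, or the newest one at or before `timestamp`."""
        versions = self._cells.get((row_key, column_key), {})
        candidates = [t for t in versions if timestamp is None or t <= timestamp]
        return versions[max(candidates)] if candidates else None

    def columns(self, row_key):
        """Columns present in one row; rows do not have to share a column set."""
        return sorted({c for (r, c) in self._cells if r == row_key})

table = SparseTable()
table.put("com.example/index", "contents:html", "<html>v1</html>", timestamp=1)
table.put("com.example/index", "contents:html", "<html>v2</html>", timestamp=2)
table.put("com.example/index", "anchor:news.org", "Example", timestamp=1)
table.put("org.sample/home", "contents:html", "<html>home</html>", timestamp=5)

latest = table.get("com.example/index", "contents:html")
older = table.get("com.example/index", "contents:html", timestamp=1)
print(latest, older, table.columns("com.example/index"), table.columns("org.sample/home"))
```

Writes only ever add a new (timestamp, value) entry, which hints at why this family of stores is optimised for write-heavy workloads.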


Document Databases

• Completely schemaless
• All document data is stored in the document itself
• Document usually encoded in JSON, BSON, XML
• Scalability: good, implementing asynchronous replication
• Advantages: client application can store data in its final form; support custom views
• Examples: CouchDB, MongoDB, Terrastore
• Best For: wikis, blogs, document management systems
• Potential Issues: They actually don’t outperform RDBMS, not well supported
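
A sketch of schemaless JSON documents and a custom view, using only the standard `json` module. The `save`, `load` and `view` helpers are hypothetical and only loosely inspired by map-style views; they are not the API of any product named above.

```python
import json

documents = {}  # document id -> JSON text, standing in for the database

def save(doc_id, document):
    """Store the document in its final form; no schema is declared anywhere."""
    documents[doc_id] = json.dumps(document)

def load(doc_id):
    return json.loads(documents[doc_id])

def view(map_fn):
    """A custom view: run map_fn over every document and collect what it emits."""
    rows = []
    for doc_id in documents:
        rows.extend(map_fn(doc_id, load(doc_id)))
    return sorted(rows)

# Two documents with entirely different fields live side by side.
save("post-1", {"type": "blog", "title": "Hello", "tags": ["cloud", "nosql"]})
save("page-7", {"type": "wiki", "title": "CAP", "revisions": 3})

def posts_by_tag(doc_id, doc):
    """Emit one (tag, document id) row per tag; documents without tags emit nothing."""
    return [(tag, doc_id) for tag in doc.get("tags", [])]

tag_index = view(posts_by_tag)
print(tag_index, load("page-7")["revisions"])
```

The view has to tolerate missing fields because nothing guarantees that two documents share a structure.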


Columnar Databases

• ‘Between’ SQL and NoSQL – can use SQL syntax, but use wide columns
• Each column is stored separately at a different disk location
• Scalability and Performance: both good because rows and columns can be split across multiple nodes: rows – sharding, columns – column groups
• Advantages – great when you need data aggregation
• Examples: Vertica, HBase
• Best At: Data warehousing, data mining
• Potential Issues: Not great at handling complex relationships, better than RDBMS only when row size is big and not many columns of a single row are required
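
The trade-off can be seen by storing the same hypothetical `orders` data row by row and column by column. The helper names below are illustrative.

```python
# Row-oriented layout: one record per order.
rows = [
    {"order_id": 1, "customer": "ann", "amount": 120.0},
    {"order_id": 2, "customer": "bob", "amount": 80.0},
    {"order_id": 3, "customer": "ann", "amount": 50.0},
]

def to_columns(rows):
    """Column-oriented layout: one list per column, stored separately."""
    columns = {}
    for row in rows:
        for name, value in row.items():
            columns.setdefault(name, []).append(value)
    return columns

def fetch_row(columns, index):
    """Rebuilding one full row has to touch every column."""
    return {name: values[index] for name, values in columns.items()}

columns = to_columns(rows)

# Aggregation reads a single column and never touches 'customer' or 'order_id'.
total_from_columns = sum(columns["amount"])
# The row layout has to walk every record to pick out one field.
total_from_rows = sum(row["amount"] for row in rows)

print(total_from_columns, total_from_rows, fetch_row(columns, 1))
```

Aggregates over a few columns are cheap in the column layout, while reassembling complete rows is the expensive operation, which matches the potential issue listed above.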


Example: Amazon SimpleDB

Data Store Type: Entity-Attribute-Value
Data Model: Document Store/Big Table
Cloud Type: Platform as a Service

• The data model is based on domains, items, attributes and values:
  • Domains are currently limited to 10 GB each, and each account is limited to 100 domains
  • Domains are collections of items that are described by attribute-value pairs
• Doesn’t have the concept of schema – everything is a string
• Designed for reads rather than writes
• Updates done to central database ONLY and distributed to ‘slaves’
• Client interface: SOAP and REST
• Availability: multiple geographically distributed copies of each data item
• Scalability: Great
• Pay-as-you-go model: Clients are charged by data storage, data transfer and machine utilization
• Potential Issues: eventual consistency, no data types or constraints
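
The domain/item/attribute model and the "everything is a string" consequence can be sketched with plain dictionaries. This is not the SimpleDB client API; `put_attributes`, `select` and `encode_number` are hypothetical helpers, and zero-padding is shown as one common way to make string comparison behave like numeric comparison.

```python
domains = {"products": {}}  # domain -> item name -> {attribute: string value}

def put_attributes(domain, item_name, attributes):
    """Attach attribute-value pairs to an item; values are stored as strings."""
    item = domains[domain].setdefault(item_name, {})
    for name, value in attributes.items():
        item[name] = str(value)

def encode_number(number, width=10):
    """Zero-pad so that string ordering matches numeric ordering."""
    return str(number).zfill(width)

def select(domain, attribute, predicate):
    """Return the names of items whose attribute value satisfies the predicate."""
    return sorted(
        name for name, item in domains[domain].items()
        if attribute in item and predicate(item[attribute])
    )

put_attributes("products", "item1", {"title": "Mouse", "price": encode_number(9)})
put_attributes("products", "item2", {"title": "Keyboard", "price": encode_number(10)})
put_attributes("products", "item2", {"colour": "black"})  # no schema: add attributes freely

naive = "9" < "10"  # plain strings compare character by character
cheap = select("products", "price", lambda value: value < encode_number(10))
print(naive, cheap)
```

With no data types or constraints in the store, conventions such as the padding width have to be enforced by every client application.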


Summary – RDBMS or NoSQL?

• if you have a low-volume, medium-complexity suite of applications, don’t change it – this is what RDBMSs are good at

• if your data is normalized and you rely on joins – don’t move to schemaless NoSQL

• if you’re looking for an off-the-shelf system and don’t want to get involved in customized development – choose RDBMS

• if your problem can’t be resolved using RDBMS [e.g. you have serious scalability issues] and you’re determined to fix it at any cost – go ‘NoSQL’

• if you have access to sufficient quantities of sufficiently smart people - choose NoSQL.

It depends…


Summary – RDBMS or NoSQL?

‘choose the right tool for the job’


Questions?
