NoSQL introduction v1.1.1
Description: NoSQL technical introduction
Technical overview of cloud storage
NoSQL: Not Only SQL
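Disclaimer:
1. This document is for personal study and exchange only. If you find errors or omissions, discussion is welcome.
2. Most of the content was prepared while I was working at Nokia Siemens Networks, but it does not involve any Nokia Siemens Networks products or technologies. My thanks to the company.
3. Much of the content in this document comes from the Internet. If any copyright is infringed, please notify me: [email protected]; @胖悟空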
Agenda
• Background: What's NoSQL, Why NoSQL
• How to select a NoSQL system: data type, data model, architecture, key technologies
• Summary
What is NoSQL
Definition: NoSQL, sometimes expanded to "not only SQL", is a broad class of database management systems that differ from classic relational database management systems (RDBMSs).
These data stores may not require fixed table schemas, usually avoid join operations, and typically scale horizontally.
Academia typically refers to these databases as structured storage, a term that would include classic relational databases as a subset.
Refer to the Wikipedia page: http://en.wikipedia.org/wiki/NoSQL
SQL vs. NoSQL
NoSQL is not good at everything, and neither is SQL.
Aspect | SQL | NoSQL
Transactional semantics | ACID | Restricted ACID
Query model | Complex & full-featured | Simple & application-oriented
Data model | Relational & row storage | Key-value, column-oriented, document-oriented & graph
Schemas | Fixed | Schema-free/schemaless
Data storage | Limited & costly | Horizontally scalable & massive
Hardware | Reliable & expensive | Commodity & inexpensive
Failure tolerance | Slow failure recovery | Native & fast recovery
Why NoSQL?
NoSQL became notable with Internet players and applications, because some of their requirements could not be met by an RDBMS.
It comes from requirements: fast growth & development (all about INCREASING)
• Increasing number of servers: scale out on inexpensive & unreliable servers
• Increasing data volume: big data scalability
• Increasing number of users: high throughput, high workload
It comes from requirements: different applications & ecosystem
• Rapid change, always beta: flexible data schema
• Abundant web applications: complex data, larger record size, typically more reads than writes, low transaction and consistency requirements
• Online services: failure tolerance, fast recovery, high availability
How to select a NoSQL system?
What kinds of data can I store?
Data type classification
• Structured
• Unstructured
• Semi-structured
Data type classification: what kind of data should be stored?
Unstructured data
• Does not have a pre-defined data model
• And/or does not fit well into relational tables
Structured data
• Entities belonging to the same class have the same attributes, in the same order
• The data structure is predefined and cannot be changed
Semi-structured data
• Is a form of structured data
• Entities belonging to the same class may have different attributes even though they are grouped together, and attribute order is not important
• Contains tags or other markers to separate semantic elements and enforce hierarchies of records and fields within the data
• Is also known as a schemaless or self-describing structure
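To make the distinction concrete, here is a small illustrative Python sketch; the record types and field names are made up for the example. Structured records share one predefined schema, while semi-structured records in the same collection carry their own, possibly different, attributes and serialize naturally to self-describing JSON.

```python
import json
from dataclasses import dataclass, astuple

# Structured: every entity of the class has the same attributes in the same
# order, fixed in advance by the schema (here, the dataclass definition).
@dataclass
class Customer:
    id: int
    name: str
    city: str

rows = [Customer(1, "Alice", "Berlin"), Customer(2, "Bob", "Oslo")]

# Semi-structured: entities grouped together may have different attributes;
# each record describes itself through its keys (its "tags").
posts = [
    {"id": 1, "title": "Hello", "tags": ["intro"]},
    {"id": 2, "title": "Logs", "author": {"name": "Bob"}, "attachments": 2},
]

def attribute_sets(records):
    """Return the set of attribute names used by each record."""
    return [set(r) for r in records]

print([astuple(r) for r in rows])
print(attribute_sets(posts))
print(json.dumps(posts[1], sort_keys=True))
```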
(Figure: data types and typical stores, plotted against "Store" and "Query")
• STRUCTURED, e.g. CRM, ERP: MySQL, Oracle
• SEMI-STRUCTURED, e.g. logs, mails, web pages, blogs: MongoDB, CouchDB, Cassandra, HBase, Hypertable, BigTable
• UNSTRUCTURED, e.g. documents, videos, audio, images: Dynamo, Voldemort, Tokyo Cabinet, Redis, Berkeley DB, MemcacheDB
Summary
(Figure: radar chart comparing unstructured, structured and semi-structured data on flexibility, record size, efficiency, transactional support and scalability.)
How can I express my business model?
Data model classification
• Key-Value pair based
• Column Oriented store
• Document Oriented store
• Graph database
Key-Value pair based
Simple reads and writes: each data item is uniquely identified by a key.
Key-value stores allow the application to store its data in a schemaless way. The data can be stored as a data type of a programming language or as an object.
A key identifies a unique value. Anything can be stored in a value: an image, a document, or even a complex data structure (array, list, …).
• Advantages
  – Efficiency
  – Easy to use
  – Flexible data storage
• Disadvantages
  – Simple query model
Many cloud-based databases can be classified as key-value stores, including most column-oriented databases.
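The key-value model boils down to put, get and delete by key, with the value treated as an opaque object. A minimal in-memory sketch in Python follows; `KVStore` is a hypothetical class for illustration, not any particular product's API.

```python
from typing import Any, Dict, Optional

class KVStore:
    """Toy in-memory key-value store: schemaless, values are opaque objects."""

    def __init__(self) -> None:
        self._data: Dict[str, Any] = {}

    def put(self, key: str, value: Any) -> None:
        # A key identifies exactly one value; writing again overwrites it.
        self._data[key] = value

    def get(self, key: str) -> Optional[Any]:
        # Lookup is by exact key only: no secondary indexes, no range scans.
        return self._data.get(key)

    def delete(self, key: str) -> bool:
        # Returns True if the key existed.
        if key in self._data:
            del self._data[key]
            return True
        return False

store = KVStore()
store.put("user:42", {"name": "joe", "langs": ["go", "python"]})  # complex structure
store.put("img:7", b"\x89PNG...")                                  # raw bytes
print(store.get("user:42")["name"])
```

The only query here is an exact key lookup, which is exactly the "simple query model" listed as the disadvantage.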
Graph database examples
Name | Language | Notes
Neo4j | Java | High-performance, scalable, distributed graph database
OrientDB | Java |
FlockDB | Scala |
Sones GraphDB | C# | Graph database with a query language called GraphQL

Column Oriented store
A simple comparison: column store vs. row store
(Figure: Query 1 and Query 2 run against a column store and a row store. Column store: null is free. Row store: empty cells are stored.)
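Column Oriented store: BigTable data model
(Figure: row keys "com.cnn.www" and "com.cnn.www/index.htm"; column families "Content:" and "Anchor:", with columns "Anchor: cnnsi.com" and "Anchor: my.look.ca"; cell values "<HTML>…", "CNN" and "CNN.COM"; versions stamped with timestamps t3 and t5 to t9.)
Cell contents are addressed by (row, column, timestamp) and are versioned.
Rows are sorted by row key, so storing the row key as a reversed domain name keeps pages from the same domain near each other.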
Column Oriented store: one-to-many relationship
RDBMS model (two tables related 1 : 0…n, combined with a JOIN; grows vertically by adding rows):
Row Key | Content
com.cnn.www | <HTML>…
… | …

Row Key | Anchor | Reference text
com.cnn.www | cnnsi.com | CNN
com.cnn.www | my.look.ca | CNN.COM
com.cnn.www | … | …

BigTable model (one row, no JOIN; grows horizontally by adding columns to the "anchor" family):
Row Key | content: | anchor:cnnsi.com | anchor:my.look.ca | anchor:…
com.cnn.www | <HTML>… | CNN | CNN.COM | …
• Stores content by column rather than by row.
• A key identifies a row, which contains data stored in one or more column families (CFs).
• Within a CF, each row can contain multiple columns.
• Columns can be added dynamically.
• A distributed, multi-dimensional, sparse map: (row, column, timestamp) → cell contents
Column Oriented store: BigTable-like data model
• Advantages
  – Versioned
  – Query oriented
  – Good for OLAP applications
  – Null is free
  – Efficient compression
  – Dynamic columns
• Disadvantages
  – Reading an entire row is not efficient
  – Contains tags or other markers to separate semantic elements
  – Not well suited for OLTP-like workloads
  – Simple query model
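The (row, column, timestamp) → cell contents map can be illustrated with a toy sparse, versioned table in Python, populated with the com.cnn.www example. `SparseTable` and its methods are hypothetical names for this sketch, not the BigTable or HBase client API, and the HTML versions are made-up placeholders.

```python
import bisect
from collections import defaultdict

class SparseTable:
    """Toy BigTable-like map: (row, "family:qualifier", timestamp) -> value."""

    def __init__(self):
        # row -> column -> list of (timestamp, value), kept sorted by timestamp
        self._rows = defaultdict(lambda: defaultdict(list))

    def put(self, row, column, ts, value):
        # Columns appear on first write; absent cells cost nothing ("null is free").
        bisect.insort(self._rows[row][column], (ts, value))

    def get(self, row, column, ts=None):
        """Newest version at or before ts (newest overall if ts is None)."""
        versions = self._rows.get(row, {}).get(column, [])
        candidates = [v for t, v in versions if ts is None or t <= ts]
        return candidates[-1] if candidates else None

    def family(self, row, family):
        """Latest value of every column in one column family of a row."""
        cols = self._rows.get(row, {})
        return {c: vs[-1][1] for c, vs in cols.items() if c.startswith(family + ":")}

    def scan(self, start, end):
        """Row keys in [start, end), in sorted order."""
        return [r for r in sorted(self._rows) if start <= r < end]

t = SparseTable()
t.put("com.cnn.www", "content:", 3, "<html>v3")
t.put("com.cnn.www", "content:", 5, "<html>v5")
t.put("com.cnn.www", "anchor:cnnsi.com", 9, "CNN")
t.put("com.cnn.www", "anchor:my.look.ca", 8, "CNN.COM")
t.put("com.cnn.www/index.htm", "content:", 7, "<html>index")
t.put("org.example.www", "content:", 6, "<html>other")
print(t.get("com.cnn.www", "content:"))        # <html>v5
print(t.get("com.cnn.www", "content:", ts=4))  # <html>v3
print(t.scan("com.cnn", "com.cno"))            # rows of one domain are adjacent
```

Because row keys are sorted, a scan over the "com.cnn" prefix touches only the pages of that domain, which is the point of reversed-domain row keys.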
Document Oriented store
The idea is to replace the concept of a "row" with a more flexible model, the "document". By allowing embedded documents and arrays, the document-oriented approach makes it possible to represent complex hierarchical relationships with a single record.
Documents share some information and differ in other information.
Documents are usually stored in JSON or a JSON-like format.
• Advantages
  – Rich RDBMS-like functions
  – Freedom in modeling documents
• Disadvantages
  – Complex query logic
  – Documents are limited in size
Document Oriented store: examples
RDBMS model (two tables related 1 : 0…n):
Row Key | Content
com.cnn.www | <HTML>…
… | …

Row Key | Anchor | Reference text
com.cnn.www | cnnsi.com | CNN
com.cnn.www | my.look.ca | CNN.COM
com.cnn.www | … | …

Document model:
Document 1
{
  "Rowkey": "com.cnn.www",
  "content": "<HTML>…",
  "Anchor": {
    "cnnsi.com": "CNN",
    "my.look.ca": "CNN.COM"
  }
}
// MongoDB shell filters, run as db.<collection>.find(...)
// Rowkey == "com.cnn.www"
find({"Rowkey": "com.cnn.www"})
// 20 < age < 30
find({"age": {"$gt": 20, "$lt": 30}})
// id_num % 5 == 1
find({"id_num": {"$mod": [5, 1]}})
// id_num % 5 != 1
find({"id_num": {"$not": {"$mod": [5, 1]}}})
// regular expression: name contains "joe", case-insensitive
find({"name": /joe/i})
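To spell out what these filters mean, here is a toy Python evaluator for exactly the forms used in the queries above: plain equality, `$gt`, `$lt`, `$mod`, `$not` and a regex value. `matches` is a hypothetical helper written for illustration; it is not MongoDB's or PyMongo's implementation and ignores everything else the real query language supports.

```python
import re

def matches(doc, query):
    """True if doc satisfies every field condition in query (toy subset)."""
    return all(_field_ok(doc.get(field), cond) for field, cond in query.items())

def _field_ok(value, cond):
    if isinstance(cond, re.Pattern):              # like {"name": /joe/i}
        return isinstance(value, str) and cond.search(value) is not None
    if isinstance(cond, dict):                    # operator document
        return all(_op_ok(value, op, arg) for op, arg in cond.items())
    return value == cond                          # plain equality

def _op_ok(value, op, arg):
    if op == "$not":                              # negates the inner condition
        return not _field_ok(value, arg)
    if value is None:                             # missing field fails comparisons
        return False
    if op == "$gt":
        return value > arg
    if op == "$lt":
        return value < arg
    if op == "$mod":
        divisor, remainder = arg
        return value % divisor == remainder
    raise ValueError(f"unsupported operator {op}")

docs = [
    {"Rowkey": "com.cnn.www", "age": 25, "id_num": 6, "name": "Joe Smith"},
    {"Rowkey": "org.example", "age": 31, "id_num": 7, "name": "Ann"},
]
print([d["Rowkey"] for d in docs if matches(d, {"age": {"$gt": 20, "$lt": 30}})])
print([d["Rowkey"] for d in docs if matches(d, {"name": re.compile("joe", re.IGNORECASE)})])
```

Note that `/joe/i` matches "joe" anywhere in the string; an exact case-insensitive match would need an anchored pattern such as `^joe$`.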
Graph database
TBD
Summary
Aspect | Key-Value | Column oriented | Document oriented | Graph
Schema | Schemaless | Dynamic columns | Complex, hierarchical data model in a JSON-like format |
Query model | Key-value pair | Key-value | Rich and complex |
Data type | Unstructured | Semi-structured | Semi-structured |
Advantages | Efficient, easy to use | Query oriented, null is free | Functionality, freedom in modeling |
Disadvantages | Simple query model | Simple query model | Complex |
Systems | | | |
How can I deploy and administer the system?
Data model classification
• Key-Value pair based
• Column Oriented store
• Document Oriented store
• Graph database
Architecture classification
• Master-Slave architecture
• P2P architecture
• Hierarchy architecture
Master-Slave architecture
An example: HBase architecture
(Figure: HMaster, ZooKeeper, HDFS and several Region Servers, with control flow and data flow drawn separately.)
• One master and many slaves
• The master manages metadata and is in charge of all slaves: it dispatches tasks, does load balancing, and so on
• Slaves report their status to the master and take over the actual data management
• Data flow and control flow are usually separated
• Typically uses a global storage system (e.g. a DFS) for data durability and fast recovery
• Some systems also use a distributed coordination mechanism for master election, configuration maintenance, failure detection and synchronization
Master-Slave architecture
A model of communication in which one device or process has unidirectional control over one or more other devices. In some systems a master is elected from a group of eligible devices, with the other devices acting in the role of slaves.
• Advantages
  – Clear architecture
  – Easy to provide strong consistency
  – Easy to manage
  – Easy to scale
• Disadvantages
  – Single point of failure risk
  – Hotspot problems
P2P architecture
An example: Cassandra
(Figure: a client connected to a ring of peer nodes.)
• Peers are equally privileged
• Data is replicated across nodes according to a replication factor
• A gossip protocol is used for failure detection and for maintaining cluster membership (nodes joining and leaving)
• Every member acts as a proxy for one-hop routing
P2P architecture
Peer-to-peer computing or networking is a distributed application architecture.
Peers are equally privileged, equipotent participants in the application.
Peers make a portion of their resources, such as processing power, disk storage or network bandwidth, directly available to other network participants, without the need for central coordination by servers or stable hosts.
Usually used in conjunction with consistent hashing.
• Advantages
  – High availability
  – Efficient random reads/writes
  – Natural data distribution
  – Usually one-hop lookup
  – Minimal administration
• Disadvantages
  – Weak view of global status
  – More network communication to maintain the cluster (log(n))
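Gossip-based failure detection can be sketched as follows: every node keeps a heartbeat counter per member, bumps its own counter each round, exchanges tables with a random live peer, and suspects members whose heartbeat has not advanced for a while. This is a simplified round-based heartbeat-gossip sketch under our own assumptions (a global round counter and a fixed timeout); Cassandra itself uses a more elaborate accrual failure detector, and the class and function names are hypothetical.

```python
import random

class Node:
    """Toy gossip member: peer -> (heartbeat counter, round when it last increased)."""

    def __init__(self, name, peers):
        self.name = name
        self.table = {p: (0, 0) for p in peers}
        self.table[name] = (0, 0)

    def tick(self, now):
        # A live node increments its own heartbeat every round.
        hb, _ = self.table[self.name]
        self.table[self.name] = (hb + 1, now)

    def merge(self, other_table, now):
        # Keep the higher heartbeat per member and note when it last advanced locally.
        for peer, (hb, _) in other_table.items():
            known, _ = self.table.get(peer, (-1, 0))
            if hb > known:
                self.table[peer] = (hb, now)

    def suspects(self, now, timeout):
        # Members whose heartbeat has not advanced for more than `timeout` rounds.
        return sorted(p for p, (_, seen) in self.table.items()
                      if p != self.name and now - seen > timeout)


def gossip_round(nodes, alive, now, rng):
    for name in sorted(alive):
        nodes[name].tick(now)
    for name in sorted(alive):
        partner = nodes[rng.choice([p for p in sorted(alive) if p != name])]
        # Push-pull exchange: both sides merge each other's tables.
        partner.merge(nodes[name].table, now)
        nodes[name].merge(partner.table, now)


names = ["A", "B", "C", "D"]
nodes = {n: Node(n, [p for p in names if p != n]) for n in names}
alive = set(names)
rng = random.Random(7)
for now in range(1, 6):
    gossip_round(nodes, alive, now, rng)
alive.discard("D")                      # D crashes: it stops ticking and gossiping
for now in range(6, 20):
    gossip_round(nodes, alive, now, rng)
print(nodes["A"].suspects(now=19, timeout=5))
```

No node is in charge: every peer reaches the same conclusion about D from its own table, which is why this fits the "peers are equally privileged" model.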
Hierarchy architecture
An example: MongoDB architecture
(Figure: clients connect to mongos routers; mongos routes to shard1, shard2 and shard3, each a replica set of a mongod primary, a mongod secondary and a mongod arbiter; three config servers hold the metadata.)
• Clients send queries to mongos servers
• mongos instances act as routing servers; queries are automatically routed to the appropriate shard
• Each shard consists of multiple replicated servers to ensure availability and automated failover. The set of servers within a shard forms a replica set.
• The config servers store the cluster's metadata. Each config server has a complete copy of all metadata; when metadata changes, the change is sent to the mongos servers so they can update their routing information.
Hierarchy architecture
An example: MongoDB architecture (2)
(Figure: layers of clients, routing servers, metadata storage and data storage; each data storage unit is a replica set of a mongod primary, a mongod secondary and a mongod arbiter.)
• Distinct hierarchy dependency
• Routing servers are scalable and store nothing; they can be deployed up at the client/application or down at the data storage
• Metadata storage is not a single point of failure: two-phase commit is used, and the responsibilities of the metadata servers are reduced
• The data storage layer is grouped into replica sets, which not only serve data but also act as the availability mechanism for data and service
Hierarchy architecture
• Distinct hierarchy dependency
• Especially with a routing layer
• Less responsibility for the client
• No clear separation of data flow and control flow
• Advantages
  – High availability
  – No single point of failure
  – Each layer scales independently
  – Flexible routing layer
• Disadvantages
  – Lower efficiency
  – Complex administration
Summary
(Figure: radar chart comparing the Master-Slave, P2P and Hierarchy architectures on availability, scalability, efficiency, functionality and concise administration.)
Summary: failover
• Master-slave architecture: master fails → master election; slave fails → its work is reassigned by the master
• P2P architecture: replication factor; hinted handoff
• Hierarchy architecture: master election & hinted handoff; multiple routing processes
What about the performance of the system? What are the key features of the system?
Key features classification
• CAP classification
• Consistency mechanism
• Availability mechanism
• Partitioning & scalability mechanism
• Data durability mechanism
CAP classification
• Consistency: all nodes see the same data at the same time
• Availability: a guarantee that every request receives a response about whether it succeeded or failed
• Partition tolerance: the system continues to operate despite arbitrary message loss
All about redundancy: where do the problems come from?
Redundancy is everywhere in distributed systems, especially with commodity hardware, and it raises issues of consistency, availability, partitioning, reliability, concurrency, throughput …
(Figure: several services, each receiving requests and backed by its own data storage.)
Consistency mechanism
Mechanism | Consistency
Two-phase commit | Strong consistency
Master-slave | Eventual consistency or strong consistency
Quorum | Eventual consistency or strong consistency
Paxos | Strong consistency
• Consistency is in tension with performance and availability
• Master-slave systems (such as HBase, BigTable) accept lower availability in exchange for strong consistency
• Hierarchy & P2P systems choose strong consistency at the expense of lower read performance
Two-phase commit
An example: GFS lease implementation
• The commit-request phase: the client pushes all data to the replicas (step 3) and sends a commit request to the primary replica (step 4)
• The commit phase: the primary replica asks replica A and replica B to commit the data (step 5); replica A and replica B respond "yes" (step 6); the commit succeeds (step 7)
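The two phases can be sketched in Python as a coordinator (here, the primary) that commits only if every participant votes yes in the first phase. This is a generic two-phase commit toy under simplifying assumptions (in-process replicas, no timeouts, no coordinator crash); it is not the actual GFS lease protocol, and `Replica` and `two_phase_commit` are hypothetical names.

```python
class Replica:
    def __init__(self, name, healthy=True):
        self.name = name
        self.healthy = healthy
        self.staged = None      # data pushed but not yet committed
        self.committed = []     # durable, visible data

    def push(self, data):
        # Data flow: the client pushes data to every replica first.
        self.staged = data

    def prepare(self):
        # Phase 1 vote: "yes" only if the data is staged and the replica can commit.
        return self.healthy and self.staged is not None

    def commit(self):
        self.committed.append(self.staged)
        self.staged = None

    def abort(self):
        self.staged = None


def two_phase_commit(primary, secondaries, data):
    replicas = [primary] + secondaries
    for r in replicas:
        r.push(data)
    # Phase 1 (commit-request): collect a vote from every participant.
    votes = [r.prepare() for r in replicas]
    # Phase 2 (commit): commit everywhere only if all votes were yes, else abort.
    if all(votes):
        for r in replicas:
            r.commit()
        return True
    for r in replicas:
        r.abort()
    return False


p, a, b = Replica("primary"), Replica("A"), Replica("B")
print(two_phase_commit(p, [a, b], "record-1"))   # True
b.healthy = False
print(two_phase_commit(p, [a, b], "record-2"))   # False: one "no" vote aborts all
```

A single "no" vote (or, in a real system, a missing vote) aborts the write everywhere, which is what gives strong consistency at the cost of availability.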
Master-slave
An example: MongoDB replica sets
(Figure: a master and two replicas; writes go to the master, which syncs them to the replicas.)
Mode 1:
• The master can be read and written
• Replicas/slaves are read-only
→ Eventual consistency, but higher performance and availability
Mode 2:
• Only the master can be read and written
• Replicas/slaves are only for backup
→ Strong consistency
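The difference between the two modes can be shown with a master that applies writes immediately and replicates them later. In this toy Python model an explicit `sync()` call stands in for replication lag (our assumption, not MongoDB's replication protocol): reading from a replica may return stale data, while reading only from the master always returns the latest write.

```python
class ReplicaSet:
    def __init__(self, n_replicas=2):
        self.master = {}
        self.replicas = [dict() for _ in range(n_replicas)]
        self.oplog = []            # writes not yet applied to the replicas

    def write(self, key, value):
        # All writes go to the master; replication happens later.
        self.master[key] = value
        self.oplog.append((key, value))

    def sync(self):
        # Replicas apply the master's pending operations in order.
        for key, value in self.oplog:
            for r in self.replicas:
                r[key] = value
        self.oplog.clear()

    def read(self, key, from_replica=None):
        # Mode 2 (strong): read from the master only.
        # Mode 1 (eventual): read from a replica, which may lag behind.
        if from_replica is None:
            return self.master.get(key)
        return self.replicas[from_replica].get(key)


rs = ReplicaSet()
rs.write("x", 1)
rs.sync()
rs.write("x", 2)
print(rs.read("x"), rs.read("x", from_replica=0))   # 2 1
rs.sync()
print(rs.read("x", from_replica=0))                  # 2
```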
Quorum
• Configurable consistency
• Usually combined with anti-entropy using Merkle trees for replica synchronization, and read repair to maintain consistency
• (N, R, W): a trade-off between consistency and performance
  – Typical configuration: N = 3, R = 2, W = 2, so R + W > N
  – R + W > N yields a quorum-like system, which ensures that an application can always read the newest data
N: number of replicas; R: minimum number of successful reads; W: minimum number of successful writes
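Why R + W > N guarantees fresh reads can be checked by brute force: every set of R replicas read must share at least one replica with every set of W replicas written. The small Python check below enumerates all such sets; `always_overlaps` and `quorum_read` are illustrative helpers, and the versioned replica values are made-up data.

```python
from itertools import combinations

def always_overlaps(n, r, w):
    """True if every read set of size r shares a replica with every write set of size w."""
    nodes = range(n)
    return all(set(read) & set(write)
               for write in combinations(nodes, w)
               for read in combinations(nodes, r))

def quorum_read(replicas, read_set):
    """replicas: list of (version, value); return the newest value among those read."""
    return max((replicas[i] for i in read_set), key=lambda vv: vv[0])[1]

# R + W > N is exactly the condition under which read and write sets must overlap.
for n, r, w in [(3, 2, 2), (3, 1, 2), (3, 1, 3)]:
    print(f"N={n} R={r} W={w}: R+W>N is {r + w > n}, overlap guaranteed is {always_overlaps(n, r, w)}")

# Version 2 was written to replicas 0 and 1 (W=2); replica 2 still holds version 1.
replicas = [(2, "new"), (2, "new"), (1, "old")]
print([quorum_read(replicas, s) for s in combinations(range(3), 2)])  # every R=2 read sees "new"
```

With R = 1 a read may hit only the stale replica, which is the eventual-consistency end of the trade-off.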
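Quorum
An example: Cassandra read repair
(Figure: the client sends a query to the Cassandra cluster. The closest replica, Replica A, returns the full result; digest queries go to Replica B and Replica C, which return digest responses. If the digests differ, a read repair is performed; the result is returned to the client.)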
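The read path in the figure can be sketched as: read the full value from the closest replica, compare it against digests (hashes) from the other replicas, and if any digest differs, fetch the full versions, return the newest and write it back to the stale replicas. In this Python toy the replicas are plain dicts, and picking the newest version by timestamp (last-write-wins) is our assumption; `read_with_repair` is a hypothetical helper, not Cassandra's code.

```python
import hashlib

def digest(version):
    ts, value = version
    return hashlib.sha256(f"{ts}:{value}".encode()).hexdigest()

def read_with_repair(replicas, key, closest=0):
    """replicas: list of dicts key -> (timestamp, value). Returns the newest value."""
    data = replicas[closest].get(key)                    # full data read
    expected = digest(data) if data is not None else None
    others = [i for i in range(len(replicas)) if i != closest]
    digests = [digest(replicas[i][key]) if key in replicas[i] else None for i in others]
    if all(d == expected for d in digests):
        return data[1] if data is not None else None
    # Digests differ: read full versions, pick the newest, repair stale replicas.
    versions = [r[key] for r in replicas if key in r]
    newest = max(versions, key=lambda v: v[0])
    for r in replicas:
        if r.get(key) != newest:
            r[key] = newest
    return newest[1]


reps = [{"k": (1, "old")}, {"k": (2, "new")}, {}]
print(read_with_repair(reps, "k"))   # new
print(reps)                          # all three replicas now hold (2, 'new')
```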
Availability mechanism
• Routing mechanism: typically used in hierarchy architecture. See the MongoDB mongos implementation, which hides changes to the back-end servers.
• Failure detection
  – Distributed coordination: usually used in master-slave architecture, such as ZooKeeper in HBase and Chubby in BigTable
  – Gossip protocol: usually used in P2P architecture, e.g. Dynamo & Cassandra
• Master election
• Hinted handoff
Availability mechanism: master election
Master election is used for failover when a cluster consists of a group of n nodes and one of them acts as the master/primary node. If that node fails, the cluster elects a new master/primary node.
Example: MongoDB replica set
(Figure: a replica set of mongod primary, mongod secondary and mongod arbiter. When the primary goes down, the remaining members negotiate a new master, and the failed node later rejoins in the recovering state.)
• Each data-bearing node can become primary
• Secondary nodes act as data nodes; the arbiter only votes in elections
HBase master election
(Figure: ZooKeeper, HMaster, secondary HMaster and Region Servers.)
• ZooKeeper acts as an arbiter and holds a "token" for the HBase master; the node that obtains the token acts as master.
• If the HMaster fails, the token it took from ZooKeeper is released, and the secondary HMaster takes over as HMaster.
• ZooKeeper then notifies every node in the cluster of the change.
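The token idea amounts to an exclusive lease: whoever holds it is master, the token is released when the holder's session ends, and watchers are told about the change. Below is a toy in-process model in Python. `Coordinator` is a hypothetical stand-in for ZooKeeper (real HBase talks to ZooKeeper through its client library, roughly via an ephemeral node), and the server names are made up.

```python
class Coordinator:
    """Toy stand-in for ZooKeeper: holds one master token and notifies watchers."""

    def __init__(self):
        self.holder = None
        self.watchers = []

    def try_acquire(self, node):
        # The first node to ask while the token is free becomes master.
        if self.holder is None:
            self.holder = node
            self._notify()
            return True
        return False

    def session_expired(self, node):
        # A failed holder's token is released (like an ephemeral node disappearing).
        if self.holder == node:
            self.holder = None
            self._notify()

    def _notify(self):
        for callback in self.watchers:
            callback(self.holder)


class Server:
    def __init__(self, name, coord):
        self.name = name
        self.known_master = None
        coord.watchers.append(self.on_change)   # watch for master changes

    def on_change(self, master):
        self.known_master = master


zk = Coordinator()
hmaster, backup = Server("hmaster", zk), Server("backup-hmaster", zk)
rs1 = Server("regionserver-1", zk)
print(zk.try_acquire("hmaster"))          # True: hmaster becomes master
print(zk.try_acquire("backup-hmaster"))   # False: token already held
zk.session_expired("hmaster")             # HMaster fails, token released
print(zk.try_acquire("backup-hmaster"))   # True: the secondary takes over
print(rs1.known_master)                   # backup-hmaster
```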
Hinted handoff: for temporary failures
• Writes are performed on the first N healthy nodes found by the coordinator.
• If a node is down, the data is sent to the next node in the ring. That node keeps track of the intended recipient and delivers the data later.
• Replicas are stored in multiple data centers to handle the failure of an entire data center.
• This is what is called "always writable" in Cassandra.
(Figure: a ring of nodes A to G, with Hash(k) marking the key's position.)
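A toy Python model of that write path: starting from the key's position, the coordinator walks the ring and writes to the first N healthy nodes; a stand-in node that is not one of the intended replicas stores a hint naming the down owner and hands the write off once that node recovers. The `Ring` class, the node names and passing the key's first node directly (instead of hashing) are assumptions for illustration, not Cassandra's implementation.

```python
class Ring:
    def __init__(self, nodes, n=3):
        self.nodes = nodes                    # ring order, e.g. ["A", ..., "G"]
        self.n = n
        self.up = set(nodes)
        self.store = {x: {} for x in nodes}   # node -> key -> value
        self.hints = {x: [] for x in nodes}   # node -> [(intended_owner, key, value)]

    def preference_list(self, start):
        # Intended replicas: the n nodes clockwise from the key's position.
        i = self.nodes.index(start)
        return [self.nodes[(i + k) % len(self.nodes)] for k in range(self.n)]

    def write(self, start, key, value):
        owners = self.preference_list(start)
        i = self.nodes.index(start)
        clockwise = [self.nodes[(i + k) % len(self.nodes)] for k in range(len(self.nodes))]
        targets = [x for x in clockwise if x in self.up][: self.n]  # first n healthy nodes
        down_owners = [o for o in owners if o not in self.up]
        for node in targets:
            self.store[node][key] = value
            if node not in owners and down_owners:
                # Stand-in node: remember who should really hold this write.
                self.hints[node].append((down_owners.pop(0), key, value))
        return targets

    def recover(self, node):
        self.up.add(node)
        for holder in self.nodes:
            keep = []
            for owner, key, value in self.hints[holder]:
                if owner == node:
                    self.store[node][key] = value     # hand the write off
                    self.store[holder].pop(key, None)
                else:
                    keep.append((owner, key, value))
            self.hints[holder] = keep


ring = Ring(list("ABCDEFG"))
ring.up.discard("C")                  # C is temporarily down
print(ring.write("A", "k", "v"))      # ['A', 'B', 'D']: D stands in for C
ring.recover("C")                     # D replays the hinted write to C
```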
Data partitioning & scalability mechanism: hierarchical structure
Multi-level hierarchical organization: 3 levels in BigTable, HBase and Hypertable (root → meta → user); 2 levels in MongoDB (meta → user)
Key-range splitting/auto-sharding for data partitioning
• Advantages
  – Automatic balancing as the data distribution changes
  – High performance for range queries
  – Nearly unlimited data storage
• Disadvantages
  – Sequential writes are not efficient
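Key-range partitioning can be sketched as a sorted list of split points that a router (in the spirit of mongos, or the META lookup in BigTable/HBase) searches with binary search; when a range grows too large it is split in two. The Python below is a toy with made-up shard names and split threshold; assigning the upper half to a new shard stands in for the balancer's chunk migration.

```python
import bisect

class RangeRouter:
    """Toy key-range partitioner: range i holds splits[i-1] <= key < splits[i]."""

    def __init__(self, shard, max_keys=4):
        self.splits = []                      # sorted lower bounds of ranges 1..k
        self.ranges = [{"shard": shard, "keys": []}]
        self.max_keys = max_keys

    def _index(self, key):
        # Binary search over the split points.
        return bisect.bisect_right(self.splits, key)

    def route(self, key):
        return self.ranges[self._index(key)]["shard"]

    def insert(self, key, new_shard=None):
        i = self._index(key)
        keys = self.ranges[i]["keys"]
        bisect.insort(keys, key)
        if len(keys) > self.max_keys:         # auto-split an oversized range
            mid = len(keys) // 2
            upper = {"shard": new_shard or self.ranges[i]["shard"], "keys": keys[mid:]}
            self.ranges[i]["keys"] = keys[:mid]
            self.splits.insert(i, keys[mid])
            self.ranges.insert(i + 1, upper)

    def range_scan(self, lo, hi):
        # A range query touches only the ranges that overlap [lo, hi).
        first, last = self._index(lo), self._index(hi)
        return [k for r in self.ranges[first:last + 1] for k in r["keys"] if lo <= k < hi]


router = RangeRouter("shard1", max_keys=2)
for k in ["apple", "banana", "cherry"]:
    router.insert(k, new_shard="shard2")
print(router.splits)                   # ['banana']
print(router.route("blueberry"))       # shard2
print(router.range_scan("b", "c"))     # ['banana']
```

Because neighbouring keys live in the same range, range queries are cheap; the flip side is that monotonically increasing keys all land in the last range, which is why sequential writes are listed as a weakness.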
Scalability mechanism: consistent hash
(Figure: a hash ring from 0 to 1, with 1/2 marked, holding nodes A to F; h(key1) and h(key2) are placed on the ring; N = 3.)
• Advantages
  – Natural balancing for data partitioning & distribution
  – High performance for random operations
• Disadvantages
  – Non-uniform data/load distribution
  – Ignores differences in node performance
  – Data must be moved when nodes join or leave
  – Not good for sequential operations and range queries
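A minimal consistent-hash ring in Python, assuming one position per node and MD5 as the hash function (both choices are ours; production systems typically add virtual nodes to smooth out the non-uniform load noted above). A key's N = 3 replicas are the next three nodes clockwise from h(key), and adding a node only moves the keys that now fall just before it.

```python
import bisect
import hashlib

def h(s):
    """Map a string to a point on the ring [0, 2**32)."""
    return int.from_bytes(hashlib.md5(s.encode()).digest()[:4], "big")

class ConsistentHashRing:
    def __init__(self, nodes, replicas=3):
        self.replicas = replicas
        self.points = sorted((h(n), n) for n in nodes)

    def preference_list(self, key):
        # Walk clockwise from h(key) and take the next `replicas` nodes.
        hashes = [p for p, _ in self.points]
        start = bisect.bisect_right(hashes, h(key)) % len(self.points)
        count = min(self.replicas, len(self.points))
        return [self.points[(start + i) % len(self.points)][1] for i in range(count)]

    def add(self, node):
        bisect.insort(self.points, (h(node), node))

    def remove(self, node):
        self.points.remove((h(node), node))


ring = ConsistentHashRing(list("ABCDEF"), replicas=3)
keys = [f"key{i}" for i in range(1000)]
before = {k: ring.preference_list(k)[0] for k in keys}
ring.add("G")
after = {k: ring.preference_list(k)[0] for k in keys}
moved = [k for k in keys if before[k] != after[k]]
print(len(moved), "of", len(keys), "keys changed owner")   # only keys now owned by G
```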
Data durability mechanism
• Write-ahead logging (WAL) is a family of techniques for providing atomicity and durability (two of the ACID properties) in database systems. In a system using WAL, all modifications are written to a log before they are applied. Usually both redo and undo information is stored in the log.
• Data replicas: DFS (HBase, Hypertable, BigTable); embedded redundancy (Cassandra, MongoDB)
Data durability mechanism
An example: HBase WAL
• Log flushing: data streams are written to a file system
• Log rolling: check the database persistence point against the logs, then remove all logs that precede the last persistence operation
• Log replaying: replaying a log means reading it, adding its entries to the database and then flushing the data to disk. It can be used for fault recovery.
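The WAL rule (log first, then apply) together with log rolling and replay can be shown in a small Python toy. `WalStore` keeps redo records only, as JSON lines in a file, and writes a JSON snapshot as its persistence point; the file names and format are assumptions for the sketch, and HBase's real WAL is organized differently.

```python
import json
import os
import tempfile

class WalStore:
    """Toy key-value store with a write-ahead log (redo records only)."""

    def __init__(self, directory):
        self.log_path = os.path.join(directory, "wal.log")
        self.snapshot_path = os.path.join(directory, "snapshot.json")
        self.data = {}
        self._recover()

    def put(self, key, value):
        # 1. Append the change to the log and force it to disk (log flushing) ...
        with open(self.log_path, "a", encoding="utf-8") as log:
            log.write(json.dumps({"key": key, "value": value}) + "\n")
            log.flush()
            os.fsync(log.fileno())
        # 2. ... and only then apply it to the in-memory state.
        self.data[key] = value

    def checkpoint(self):
        # Persist the state, then drop log entries older than the snapshot (log rolling).
        tmp = self.snapshot_path + ".tmp"
        with open(tmp, "w", encoding="utf-8") as f:
            json.dump(self.data, f)
        os.replace(tmp, self.snapshot_path)
        open(self.log_path, "w").close()

    def _recover(self):
        # Load the last snapshot, then replay every logged change after it (log replaying).
        if os.path.exists(self.snapshot_path):
            with open(self.snapshot_path, encoding="utf-8") as f:
                self.data = json.load(f)
        if os.path.exists(self.log_path):
            with open(self.log_path, encoding="utf-8") as log:
                for line in log:
                    if line.strip():
                        entry = json.loads(line)
                        self.data[entry["key"]] = entry["value"]


with tempfile.TemporaryDirectory() as d:
    s = WalStore(d)
    s.put("a", 1)
    s.checkpoint()
    s.put("b", 2)                 # only in the log, not yet in the snapshot
    print(WalStore(d).data)       # simulated restart: {'a': 1, 'b': 2}
```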
Summary
• Consistency: two-phase commit; master-slave; quorum
• Availability: routing mechanism; failure detection; master election; hinted handoff
• Data partitioning: table split/auto-sharding; consistent hash
• Data durability: DFS; data redundancy
• Scalability: hierarchical structure; consistent hash
• Failover: reassignment; master election; multiple routing processes; hinted handoff; replica set/group; replication factor