SQL, NoSQL, and Next Generation DBMSs
Shahram Ghandeharizadeh
Director of the USC Database Lab
Outline
A brief history of DBMSs.
1960/70s: OSs
1980+: SQL
2000+: NoSQL
Before Computers
Database
DBMS/Data Store
Digital Era
Database
File System/
Data Store
0011101000000101110101
Application
programs
Before DBMSs: 1960/70s
Data
Data
Application
programs
Developer 1
Developer 2
Application
programs
After DBMSs
Application
programs
Developer 1
Developer 2
DBMS
Physical Data Independence.
SQL as a “what”-oriented language.
SQL Data Stores
Manage records/tuples
A record/tuple is a row in a table where attribute names are pre-defined in a schema.
Alternative physical designs:
Column-store versus Row-store.
Transactions with ACID properties
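The points above can be illustrated with a minimal sketch using Python's built-in sqlite3 module. The `account` table, its columns, and the `transfer` helper are hypothetical names chosen for illustration and are not from the talk. The schema pre-defines the attribute names, the SELECT states what is wanted rather than how to fetch it, and the transaction either applies both updates or neither.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Attribute names and types are fixed in a schema before any row is stored.
conn.execute(
    "CREATE TABLE account ("
    " id INTEGER PRIMARY KEY,"
    " owner TEXT NOT NULL,"
    " balance INTEGER NOT NULL CHECK (balance >= 0))"
)
conn.executemany(
    "INSERT INTO account VALUES (?, ?, ?)",
    [(1, "alice", 100), (2, "bob", 50)],
)
conn.commit()


def transfer(conn, src, dst, amount):
    """Move `amount` from src to dst atomically; return False if rolled back."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            conn.execute(
                "UPDATE account SET balance = balance + ? WHERE id = ?",
                (amount, dst),
            )
            conn.execute(
                "UPDATE account SET balance = balance - ? WHERE id = ?",
                (amount, src),
            )
        return True
    except sqlite3.IntegrityError:
        return False


ok = transfer(conn, 1, 2, 30)
failed = transfer(conn, 1, 2, 500)  # debit violates the CHECK; credit is undone
# A "what"-oriented query: no access path or index is named.
balances = conn.execute(
    "SELECT owner, balance FROM account ORDER BY id"
).fetchall()
```

In the failed transfer, the credit to the destination row is executed first and is still undone, which is the atomicity part of ACID.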
SQL IS OVERHYPED
Why?
Marketing campaigns have become too exaggerated!
Relational vendors claim RDBMS is the answer to all data management needs.
What are some counter examples?
Seltzer. Beyond Relational Databases. Communications of the ACM, July 2008.
Web Search
Semi-structured data
HTML pages instead of raw data.
Queries are keyword lookups and the desired response is a sorted list of possible answers.
Need for efficient inverted indices.
Bulk updates, read mostly.
Need for nontraditional indexing.
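A minimal sketch of the inverted index idea, assuming whitespace tokenization and a simple term-frequency score. Real search engines use far richer ranking, and the page ids and texts here are made up. It shows why a keyword query returns a sorted list of candidate pages rather than an exact result set.

```python
from collections import defaultdict


def build_inverted_index(pages):
    """Map each lowercase term to {page_id: term frequency}."""
    index = defaultdict(dict)
    for page_id, text in pages.items():
        for term in text.lower().split():
            index[term][page_id] = index[term].get(page_id, 0) + 1
    return index


def search(index, query):
    """Return page ids ordered by summed term frequency, highest first."""
    scores = defaultdict(int)
    for term in query.lower().split():
        for page_id, tf in index.get(term, {}).items():
            scores[page_id] += tf
    # Ties are broken by page id so the ordering is deterministic.
    return sorted(scores, key=lambda p: (-scores[p], p))


pages = {
    "p1": "database systems and database design",
    "p2": "web search engines",
    "p3": "search in database systems",
}
index = build_inverted_index(pages)  # built in bulk, then read mostly
results = search(index, "database search")
```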
Directory Services
International organizations with distributed resources and personnel.
Requirement: fast lookup of entities arranged in a hierarchical structure that corresponds to a hierarchy of the organization.
LDAP standard.
Core of identification and authentication system from a number of vendors, e.g., IBM Tivoli, Microsoft Active Directory Server, SUN ONE Directory Server.
Bulk updates similar to data warehousing.
Multi-valued attributes.
Queries are single-row retrieval or lookups based on attribute values.
Other Examples
Mobile device caching
Your cell phone’s directory as a transient cache of a global directory.
Stream management
Real-time filtering of streams for interesting patterns. Example: identify hotly traded stock, or a stock that is not traded as heavily as expected.
Filters look like SQL selection predicates, causing developers to mistake an RDBMS for the right choice.
XML management
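The stream management example can be sketched as a standing filter that inspects each trade once, as it arrives, instead of querying stored data. This is a minimal sketch under assumptions: the `(timestamp, symbol, quantity)` tuple layout, the window length, and the threshold are all hypothetical.

```python
from collections import defaultdict, deque


def hot_stocks(trades, window, threshold):
    """Yield (timestamp, symbol) whenever a symbol's traded volume over the
    last `window` time units reaches `threshold`."""
    recent = defaultdict(deque)  # symbol -> (timestamp, quantity) in window
    volume = defaultdict(int)    # symbol -> total quantity in window
    for ts, symbol, qty in trades:
        q = recent[symbol]
        q.append((ts, qty))
        volume[symbol] += qty
        # Expire trades that have fallen out of the sliding window.
        while q and q[0][0] <= ts - window:
            _, old_qty = q.popleft()
            volume[symbol] -= old_qty
        if volume[symbol] >= threshold:
            yield ts, symbol


trades = [
    (1, "AAA", 100),
    (2, "BBB", 50),
    (3, "AAA", 400),
    (4, "AAA", 600),
    (20, "AAA", 100),
]
alerts = list(hot_stocks(trades, window=10, threshold=1000))
```

The predicate resembles a SQL selection, but the state it needs is a small sliding window, not a persistent table.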
Summary
Relational DBMSs have been designed for transaction processing and workloads consisting of ad hoc queries and a significant amount of updates.
25 years ago, there was one market for DBMSs: business data processing.
This has changed to include different applications with different requirements.
Example applications are read-dominated: No need for transactional guarantees.
SQL is the wrong choice for stream processing.
One software architecture will not support the diverse needs of these applications.
Possible solutions:
1) each application re-builds its own storage manager from scratch,
2) provide a flexible solution that can be tailored to the needs of a particular application.
Past 25 Years
Two trends:
1. Bloated systems.
Need for a specialist, a trained DBA, to keep a system and its applications running.
2. Few applications need all the features available in today’s RDBMSs.
The application must pay for all the features even though it requires a small subset.
NOSQL DATA STORES
NoSQL Data Stores
Scale horizontally for “simple operations” using many servers.
Replicate and distribute (partition) data across many servers.
Provide a simple call level interface or protocol.
A weaker concurrency model than ACID:
Basically Available, Soft state, Eventually consistent (BASE).
Efficient use of distributed indexes and DRAM for data storage.
Ability to dynamically add new attributes to data records.
Cattell. Scalable SQL and NoSQL Data Stores. SIGMOD Record 39(4), 2010.
Ghandeharizadeh, Boghrati, and Barahmand. An Evaluation of Graph Data Models. TPCTC 2014.
NoSQL Data Model
A “key-value” store:
A distributed hash table,
A key/value may be an arbitrary sequence of bytes,
E.g., memcached, Voldemort, Riak, Redis, Tokyo Cabinet, Membase, Membrain.
A “document” store:
A value may be a scalar, lists, nested documents,
Attribute names might be dynamically defined at runtime,
E.g., SimpleDB, CouchDB, MongoDB, Terrastore.
An “Extensible record” store:
A hybrid between a SQL store and a document store,
Families of attributes are defined in a schema and new attributes can be added,
Attributes may be list-valued,
E.g., BigTable, HBase, HyperTable, Cassandra, PNUTS.
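The three data models can be contrasted with a minimal in-memory sketch. Plain Python dictionaries stand in for the stores, and `ExtensibleRecordStore` with its `put`/`get` methods is a hypothetical class for illustration, not the API of any system listed above.

```python
import json

# 1. Key-value store: key and value are opaque byte strings to the store.
kv_store = {}
kv_store[b"user:42"] = json.dumps({"name": "Ada"}).encode("utf-8")

# 2. Document store: a value may hold scalars, lists, and nested documents,
#    and attribute names can appear at runtime.
doc_store = {
    "user:42": {
        "name": "Ada",
        "emails": ["ada@example.org"],
        "address": {"city": "Los Angeles"},
    }
}
doc_store["user:42"]["nickname"] = "ada"  # new attribute, no schema change


# 3. Extensible record store: attribute families are fixed in a schema,
#    while attributes inside a family can be added freely.
class ExtensibleRecordStore:
    def __init__(self, families):
        self.families = frozenset(families)
        self.rows = {}

    def put(self, row_key, family, column, value):
        if family not in self.families:
            raise KeyError(f"unknown attribute family: {family}")
        self.rows.setdefault(row_key, {}).setdefault(family, {})[column] = value

    def get(self, row_key, family, column):
        return self.rows[row_key][family][column]


ers = ExtensibleRecordStore(families=["profile", "social"])
ers.put("user:42", "profile", "name", "Ada")
ers.put("user:42", "social", "friends", ["user:7", "user:9"])  # list-valued
ers.put("user:42", "profile", "nickname", "ada")  # new attribute, same family

try:
    ers.put("user:42", "billing", "card", "hypothetical")
    rejected = False
except KeyError:
    rejected = True  # families, unlike attributes, must be declared up front
```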
MIDDLEWARE: CACHE AUGMENTED DATA STORES
Simple Operations
Operations that read and write a small amount of data.
Challenge: High volume of requests with a low latency requirement.
Person-to-person service providers in 1 minute:
147K page views
100M queries
7K user visits
347K Tweets
Facebook, http://thenextweb.com/facebook/2014/10/28/facebook-1-35-billion-users/
Google, http://expandedramblings.com/index.php/google-plus-statistics/
Twitter, https://about.twitter.com/company
Wikipedia, http://stats.wikimedia.org/EN/Sitemap.htm
How?
Look up query result instead of query processing.
Ideal for applications with workloads that exhibit a high read to write ratio.
Key-value store as the cache manager.
Query result caching:
Key: query string, Value: result set
Trillions of cached key-value pairs.
Cache Augmented DBMSs
1. Value = Get (Key)
2. If Value is found, go to Step 6.
3. SQL queries
4. Query results
Application constructs Value using the results
5. Put(Key, Value)
6. Use Value to generate HTML result page
[Figure: RDBMS Server and Cache Server (KVS, e.g., memcached); arrows are labeled with step numbers 1 through 5.]
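The six steps above can be sketched as follows. This is a minimal sketch under assumptions: a Python dictionary stands in for the key-value store (e.g., memcached), sqlite3 stands in for the RDBMS, and the `profile` table, the key format, and `view_profile` are hypothetical names. The row is assumed to exist.

```python
import json
import sqlite3

db = sqlite3.connect(":memory:")  # stand-in for the RDBMS server
db.execute("CREATE TABLE profile (id INTEGER PRIMARY KEY, name TEXT)")
db.execute("INSERT INTO profile VALUES (1, 'Ada')")
db.commit()

cache = {}  # stand-in for the cache server
stats = {"hits": 0, "misses": 0}


def view_profile(user_id):
    key = f"profile:{user_id}"
    value = cache.get(key)                      # Step 1: Value = Get(Key)
    if value is None:                           # Step 2: not found, continue
        stats["misses"] += 1
        row = db.execute(                       # Step 3: SQL query
            "SELECT id, name FROM profile WHERE id = ?", (user_id,)
        ).fetchone()                            # Step 4: query results
        value = json.dumps({"id": row[0], "name": row[1]})  # construct Value
        cache[key] = value                      # Step 5: Put(Key, Value)
    else:
        stats["hits"] += 1
    doc = json.loads(value)
    return f"<h1>{doc['name']}</h1>"            # Step 6: generate HTML


first = view_profile(1)   # miss: runs the SQL query and fills the cache
second = view_profile(1)  # hit: served from the cache, no query processing
```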
CADBMS: Update
1. SQL DML Command: Insert, Delete, Update
2. Invalidate key-value pairs: Delete
Alternatives to invalidate include Refill/Refresh and incremental update
[Figure: RDBMS Server and Cache Server (KVS, e.g., memcached); arrows are labeled with step numbers 1 and 2.]
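The two update steps can be sketched in the same style, again with a dictionary standing in for the cache server and sqlite3 for the RDBMS. `read_name` and `rename` are hypothetical helpers. The sketch is single-threaded and ignores the races that concurrent readers and writers can introduce.

```python
import sqlite3

db = sqlite3.connect(":memory:")  # stand-in for the RDBMS server
db.execute("CREATE TABLE profile (id INTEGER PRIMARY KEY, name TEXT)")
db.execute("INSERT INTO profile VALUES (1, 'Ada')")
db.commit()

cache = {}  # stand-in for the cache server


def read_name(user_id):
    """Read path: look up the cached value, fall back to SQL on a miss."""
    key = f"profile:{user_id}"
    if key not in cache:
        row = db.execute(
            "SELECT name FROM profile WHERE id = ?", (user_id,)
        ).fetchone()
        cache[key] = row[0]
    return cache[key]


def rename(user_id, new_name):
    with db:  # Step 1: SQL DML command (committed on exit)
        db.execute(
            "UPDATE profile SET name = ? WHERE id = ?", (new_name, user_id)
        )
    cache.pop(f"profile:{user_id}", None)  # Step 2: invalidate (delete)


before = read_name(1)         # fills the cache with 'Ada'
rename(1, "Grace")            # update the RDBMS, then delete the cached pair
after = read_name(1)          # miss, so the fresh value is read and cached
```

Without Step 2 the second read would return the stale cached value.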
Developer 1
Developer 2
Data
Store
memcached
Cache
Server
Application
programs
Persistent
Data
In-memory
Copy of
Data
Application
programs
Stale
CADBMS Today
Physical Data Independence.
A “what”-oriented language.
Future CADBMSs
Application
programs
Application
programs
CADBMS
Data
Store
Key Value
Cache Server
Developer 1
Developer 2
Physical Data Independence.
SQL as a “what”-oriented language.
KOSAR
Application
programs
Application
programs
KOSAR
RDBMS
Key Value
Cache Server
Developer 1
Developer 2
Ghandeharizadeh et al. A Demonstration of KOSAR. Middleware 2014.
Architecture
A database driven application:
Data Store Server
…
Data Store Client
Application
Architecture: Example
An RDBMS driven application authored using Java:
MySQL Server
…
JDBC
Application
SQL Result Set
KOSAR: Transparent Caching
Simply replace the client component of your application with KOSAR and see it run much faster.
Data Store Server
…
Data Store Client
Application
Ghandeharizadeh, Yap, and Nguyen. Strong Consistency in Cache Augmented SQL Systems. Middleware 2014.
Ghandeharizadeh, Irani, Lam, Yap. CAMP: A Multi-Queue Eviction Policy for Key-Value Stores. Middleware 2014.
How?
1. Look up query result instead of query processing.
Data Store Server
…
Data Store Client
Application
memcached Servers
…
Ideal for workloads that exhibit a high read to write ratio.
Client-Server Architecture
[Bar chart: SoAR (Actions/Second), y-axis 0 to 12000, comparing SQL-X and CADBMS under 0.1% Write and 10% Write workloads.]
SLA: 95% of actions to observe a response time faster than 100 msec.
Barahmand and Ghandeharizadeh. BG: A Social Networking Benchmark. CIDR 2013.
Barahmand and Ghandeharizadeh. Expedited Benchmarking of Social Network Actions. CIKM 2013.
BG Benchmark, http://bgbenchmark.org
BG is a macro benchmark for interactive social networking actions.
BG quantifies the Social Action Rating (SoAR) of a data store:
For a given workload, the maximum number of simultaneous actions performed by a data store while satisfying a pre-specified SLA.
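The SoAR definition can be made concrete with a small sketch that applies the SLA used in the charts (95% of actions faster than 100 msec) to a set of measurements. The `runs` data is made up, and the code only illustrates the definition; it is not how the BG benchmark itself searches for the rating.

```python
def satisfies_sla(latencies_ms, fraction=0.95, limit_ms=100.0):
    """True if at least `fraction` of actions finished faster than `limit_ms`."""
    if not latencies_ms:
        return False
    fast = sum(1 for t in latencies_ms if t < limit_ms)
    return fast / len(latencies_ms) >= fraction


def soar(runs):
    """Highest throughput (actions/second) among runs that satisfy the SLA,
    or 0 if no run does. Each run is (throughput, observed latencies)."""
    return max(
        (throughput for throughput, lat in runs if satisfies_sla(lat)),
        default=0,
    )


# Hypothetical measurements at three offered loads.
runs = [
    (1000, [20.0] * 100),                # all actions fast: meets the SLA
    (2000, [40.0] * 96 + [150.0] * 4),   # 96% fast: meets the SLA
    (3000, [60.0] * 90 + [200.0] * 10),  # 90% fast: violates the SLA
]
rating = soar(runs)
```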
Barahmand and Ghandeharizadeh. BG: A Social Networking Benchmark. CIDR 2013.
Barahmand and Ghandeharizadeh. D-Zipfian: A Decentralized Implementation of Zipfian. SIGMOD DBTest 2013.
Barahmand and Ghandeharizadeh. Expedited Benchmarking of Social Network Actions. CIKM 2013.
Alabdulkarim, Barahmand and Ghandeharizadeh. A Scalable Benchmark for Interactive Social Networking Actions.
Ph.D. Fellowship
Shared Address Space
1. Avoid overhead of serialization and network communication
Data Store Server
…
Data Store Client
Application
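A minimal sketch of the serialization part of this overhead. Both classes are hypothetical: `RemoteStyleCache` imitates a client-server cache by converting every value to bytes and back with pickle, while `InProcessCache` imitates a cache in the application's address space by storing object references. Network communication is not modeled.

```python
import pickle


class RemoteStyleCache:
    """Imitates a client-server cache: values cross the boundary as bytes."""

    def __init__(self):
        self._data = {}

    def put(self, key, value):
        self._data[key] = pickle.dumps(value)  # serialize on every write

    def get(self, key):
        raw = self._data.get(key)
        # Deserialize on every read, building a fresh copy of the value.
        return None if raw is None else pickle.loads(raw)


class InProcessCache:
    """Imitates a shared-address-space cache: stores object references."""

    def __init__(self):
        self._data = {}

    def put(self, key, value):
        self._data[key] = value

    def get(self, key):
        return self._data.get(key)


profile = {"id": 1, "friends": list(range(1000))}
remote, local = RemoteStyleCache(), InProcessCache()
remote.put("p1", profile)
local.put("p1", profile)

remote_copy = remote.get("p1")  # equal content, but a newly built object
local_ref = local.get("p1")     # the very same object, no copying
```

The flip side of returning a reference is that the application and the cache share one mutable object, so a caller that modifies it changes the cached value too.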
Shared Address Space
[Bar chart: SoAR (Actions/Second), y-axis 0 to 140000, comparing SQL-X and CADBMS under 0.1% Write and 10% Write workloads.]
SLA: 95% of actions to observe a response time faster than 100 msec.
Why?
1. CPU overhead of query processing is more than 85% [1, 2].
Data Store Server
…
Data Store Client
Application
Cache Servers
…
[1] Harizopoulos et al. OLTP: Through the Looking Glass and What We Found There. SIGMOD 2008.
[2] Stonebraker and Cattell. 10 Rules for Scalable Performance in Simple Operation Datastores. CACM 2011.
Architectures
Client-Server, Shared-Address Space, and Hybrids.
[Figure: two panels, Client-Server and Shared-Address Space.]
Ghandeharizadeh and Yap. Cache Augmented Data Stores. SIGMOD DBSocial 2013.
NON VOLATILE MEMORY
Non Volatile Memory
[Figure: memory and storage hierarchies built from CPU, DRAM, Flash, HDD, and NVM, shown for three periods: Traditional, 2010, and 2017 (late 2016).]
Non-Volatile Memory
Byte-addressable
Time to rewrite the key-value stores & database engine!
Configurable:
Time to re-design algorithms
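[Figure: two configurations of a machine with NVM. Labels in the first: CPU, NVM, Emulated Flash, Emulated HDD. Labels in the second: CPU, DRAM, NVM, Emulated DRAM, Emulated Flash, Emulated HDD.]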
Digital Era
Database
File System/
Data Store
0011101000000101110101
Future (Biological) Computers
Database
DBMS/Data Store