1-1 B561 Advanced Database Concepts Qin Zhang §2. Models

Post on 31-Jan-2018


Page 1: 2. Models - Indiana University Bloomington, homes.soic.indiana.edu/qzhangcs/B561-15-fall-advanceDB/Models.pdf · B561 Advanced Database Concepts, Qin Zhang, §2. Models

B561 Advanced Database Concepts

Qin Zhang

§2. Models

History

1. Hierarchical (IMS): late 1960s and 1970s
2. Network (CODASYL): 1970s
3. Relational: 1970s and early 1980s
4. Entity-Relationship: 1970s
5. Extended Relational: 1980s
6. Semantic: late 1970s and 1980s
7. Object-oriented: late 1980s and early 1990s
8. Object-relational: late 1980s and early 1990s
9. XML
10. ...

Models

Read “What Goes Around Comes Around” by Michael Stonebraker and Joseph M. Hellerstein

Different Types of Data

Structured data: all data conforms to a schema. Ex: business data. This is the focus of traditional databases.

Semistructured data: some structure in the data, but implicit and irregular. Ex: resumes, ads

Unstructured data: no structure in the data. Ex: text, sound, video, images

Model 1: IMS

Hierarchical data model: record types organized in a tree

Two alternative trees (parent above child):

Supplier (sno, sname, scity, sstate)
    Part (pno, pname, psize, pcolor, qty, price)

Part (pno, pname, psize, pcolor)
    Supplier (sno, sname, scity, sstate, qty, price)

Issues

– Record types must be arranged in a tree
– Information is repeated
– Existence depends on parents
– Work with DL/1, a record-at-a-time language

(is this an inherent problem?)
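The issues listed for the hierarchical model can be illustrated with a small sketch. This is not IMS or DL/1 code. It is plain Python with nested records standing in for a tree rooted at Supplier, and the sample rows and helper names (`copies_of_part`, `delete_supplier`, `part_numbers`) are assumptions made up for illustration.

```python
# Hypothetical sample data; only the field names come from the slide's schema.
# Tree rooted at Supplier: each supplier record owns its Part child records.
suppliers = [
    {"sno": 16, "sname": "General Supply", "scity": "Boston", "sstate": "MA",
     "parts": [
         {"pno": 27, "pname": "Power saw", "psize": 7, "pcolor": "silver",
          "qty": 100, "price": 20.0},
     ]},
    {"sno": 24, "sname": "Special Supply", "scity": "Detroit", "sstate": "MI",
     "parts": [
         {"pno": 27, "pname": "Power saw", "psize": 7, "pcolor": "silver",
          "qty": 50, "price": 22.0},
         {"pno": 42, "pname": "Bolts", "psize": 12, "pcolor": "gray",
          "qty": 500, "price": 0.1},
     ]},
]


def copies_of_part(pno):
    """Count how many times a part's description is stored in the tree."""
    return sum(1 for s in suppliers for p in s["parts"] if p["pno"] == pno)


def delete_supplier(sno):
    """Deleting a parent record also removes all of its child records."""
    suppliers[:] = [s for s in suppliers if s["sno"] != sno]


def part_numbers():
    """Record-at-a-time walk over the tree, collecting distinct part numbers."""
    seen = set()
    for s in suppliers:        # visit each parent record
        for p in s["parts"]:   # then each of its child records
            seen.add(p["pno"])
    return seen
```

In this sketch the description of part 27 is stored once per supplier (repeated information), and calling `delete_supplier(24)` makes part 42 disappear because it had no other parent (existence depends on parents).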

Model 2: CODASYL

Networked data model: record types organized in a network

Supplier (sno, sname, scity, sstate)

Part (pno, pname, psize, pcolor)

Supply (qty, price)

Links between record types: Supplies, Supplied by

Issues

– Very complex; programs must navigate the hyperspace
– Load and recover as one gigantic object (the whole network)
– Work with a record-at-a-time language

Any advantages?

Efficiency!
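A rough sketch of what "navigating" a network looks like, again in plain Python and not in any CODASYL language. It assumes that each Supply record is linked from one Supplier through "Supplies" and from one Part through "Supplied by"; the sample rows and the helpers `add_supply` and `suppliers_of` are hypothetical.

```python
# Hypothetical records. Supplier and Part records are stored once each;
# Supply records are reachable from both sides through explicit links.
supplier = {16: {"sname": "General Supply", "supplies": []},
            24: {"sname": "Special Supply", "supplies": []}}
part = {27: {"pname": "Power saw", "supplied_by": []},
        42: {"pname": "Bolts", "supplied_by": []}}


def add_supply(sno, pno, qty, price):
    """Create a Supply record and hook it into both link sets."""
    rec = {"qty": qty, "price": price, "supplier": sno, "part": pno}
    supplier[sno]["supplies"].append(rec)   # the "Supplies" links
    part[pno]["supplied_by"].append(rec)    # the "Supplied by" links
    return rec


add_supply(16, 27, 100, 20.0)
add_supply(24, 27, 50, 22.0)
add_supply(24, 42, 500, 0.1)


def suppliers_of(pno):
    """Enter at a Part, follow "Supplied by", then hop to each Supplier."""
    names = []
    for rec in part[pno]["supplied_by"]:   # one Supply record at a time
        names.append(supplier[rec["supplier"]]["sname"])
    return names
```

The program, not the system, decides where to enter the network and which links to follow. That is the source of both the complexity and the efficiency noted on the slide: following a stored link avoids searching, but every query is a hand-written traversal.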

Model 3: Relational

You have already seen

– No specification of what storage looks like (in my personal opinion: structure is encoded in the tables / implicit links)
– Use set-at-a-time language: algebra or calculus

Any disadvantages?
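For contrast with the record-at-a-time style, here is a minimal set-at-a-time sketch using Python's built-in `sqlite3` module. The Supplier and Part columns follow the earlier slides; the `supply` table with `sno` and `pno` columns as implicit links is an assumption, as are the sample rows and the helper `supplier_names_for_part`.

```python
import sqlite3

# In-memory database; nothing here says how the rows are physically stored.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE supplier (sno INTEGER PRIMARY KEY, sname TEXT, scity TEXT, sstate TEXT);
    CREATE TABLE part (pno INTEGER PRIMARY KEY, pname TEXT, psize INTEGER, pcolor TEXT);
    -- Assumed schema: sno and pno act as implicit links to the other tables.
    CREATE TABLE supply (sno INTEGER, pno INTEGER, qty INTEGER, price REAL);
""")
conn.executemany("INSERT INTO supplier VALUES (?, ?, ?, ?)", [
    (16, "General Supply", "Boston", "MA"),
    (24, "Special Supply", "Detroit", "MI"),
])
conn.executemany("INSERT INTO part VALUES (?, ?, ?, ?)", [
    (27, "Power saw", 7, "silver"),
    (42, "Bolts", 12, "gray"),
])
conn.executemany("INSERT INTO supply VALUES (?, ?, ?, ?)", [
    (16, 27, 100, 20.0),
    (24, 27, 50, 22.0),
    (24, 42, 500, 0.1),
])


def supplier_names_for_part(pno):
    """One declarative query returns the whole answer set."""
    rows = conn.execute(
        "SELECT s.sname FROM supplier AS s "
        "JOIN supply AS sp ON sp.sno = s.sno "
        "WHERE sp.pno = ? ORDER BY s.sno",
        (pno,),
    )
    return [sname for (sname,) in rows]
```

The query names the result it wants and leaves the access path to the system, which is what makes optimization possible without changing the program.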

Great Debate

Pro relational

– CODASYL is too complex
– Trees/networks not flexible to represent common cases
– Record-at-a-time languages are too hard to optimize

Against relational

– COBOL programmers cannot understand relational languages
– Impossible to implement the relational model efficiently
– CODASYL can represent tables

Ultimately settled by the marketplace

Other models and NoSQL

Next we will discuss (and compare them with relational):

1. E/R Model

2. XML

NoSQL models, including key-value stores, document stores, graph DB systems, and column stores

Big Data Models (storage and computational issues)

1. I/O Model (SQL)

2. Data Stream Model (NoSQL)

3. MapReduce and ActiveDHT (NoSQL)

Why NoSQL?

My personal opinion: challenges in Big Data (data types, storage, and computational issues)

E/R Model and XML (see separate slides)

NoSQL by Prof. Jennifer Widom:

https://www.youtube.com/watch?v=3pS1MF9mYJE&list=PLB142989A709CF0B8

https://www.youtube.com/watch?v=e5VLYZ_0tjg&list=PLB142989A709CF0B8&index=2

(Not in class)

Column Oriented Database: https://www.youtube.com/watch?v=mRvkikVuojU

Key-Value Stores

Extremely simple interface

– Data model: (key, value) pairs

– Operations: Insert(key, value), Fetch(key), Update(key), Delete(key)

– Some allow Fetch on range of keys

Example systems

– Google BigTable, Amazon Dynamo, Cassandra, Voldemort, HBase, ...

Looks like a dictionary :)
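Taking the dictionary remark literally, the interface above can be sketched in a few lines of Python. This is a single-process toy, not how any of the listed systems work; the class name `KeyValueStore`, the method names, and the choice to give `update` a new value argument are assumptions for illustration.

```python
import bisect


class KeyValueStore:
    """Toy in-memory store exposing the four operations plus a range fetch."""

    def __init__(self):
        self._data = {}   # key -> value
        self._keys = []   # keys kept sorted so that range fetches are cheap

    def insert(self, key, value):
        if key in self._data:
            raise KeyError(f"key already exists: {key!r}")
        self._data[key] = value
        bisect.insort(self._keys, key)

    def fetch(self, key):
        return self._data[key]

    def update(self, key, value):
        if key not in self._data:
            raise KeyError(key)
        self._data[key] = value

    def delete(self, key):
        del self._data[key]   # raises KeyError if the key is absent
        self._keys.pop(bisect.bisect_left(self._keys, key))

    def fetch_range(self, lo, hi):
        """Return (key, value) pairs with lo <= key <= hi, in key order."""
        i = bisect.bisect_left(self._keys, lo)
        j = bisect.bisect_right(self._keys, hi)
        return [(k, self._data[k]) for k in self._keys[i:j]]
```

The value is opaque to the store: there is no way to ask for all pairs whose value has some property without scanning everything, which is the main thing given up in exchange for the simple interface.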

Document Stores

Like Key-Value Stores, except the value is a document

– Data model: (key, document) pairs

– Document: JSON, XML, other semistructured formats

– Operations: Insert(key, document), Fetch(key), Update(key), Delete(key)

– Also Fetch based on document contents

Example systems

– CouchDB, MongoDB, SimpleDB, ...
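A minimal sketch of the document-store idea, assuming JSON documents and a very simple content query (exact match on top-level fields). The class `DocumentStore` and its `find` method are hypothetical and do not mirror the API of CouchDB, MongoDB, or SimpleDB.

```python
import json


class DocumentStore:
    """Toy (key, document) store with a fetch based on document contents."""

    def __init__(self):
        self._docs = {}   # key -> JSON text

    def insert(self, key, document):
        if key in self._docs:
            raise KeyError(f"key already exists: {key!r}")
        self._docs[key] = json.dumps(document)

    def fetch(self, key):
        return json.loads(self._docs[key])

    def update(self, key, document):
        if key not in self._docs:
            raise KeyError(key)
        self._docs[key] = json.dumps(document)

    def delete(self, key):
        del self._docs[key]

    def find(self, **criteria):
        """Return keys of documents whose top-level fields match all criteria."""
        hits = []
        for key, text in self._docs.items():
            doc = json.loads(text)
            if all(f in doc and doc[f] == v for f, v in criteria.items()):
                hits.append(key)
        return hits
```

Unlike the plain key-value case, the store can look inside the value, and two documents need not share the same fields, which fits the semistructured data described earlier.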

Graph Database Systems

Data organized as a graph

– Data model: nodes and edges

– Nodes may have properties (including ID)

– Edges may have labels or roles

Does this remind you of CODASYL?
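The node-and-edge data model can be sketched as follows. This is a hypothetical in-memory `Graph` class, not the interface of any particular graph database; the node IDs, property names, and the `supplies` edge label are made up for illustration.

```python
class Graph:
    """Toy property graph: nodes carry properties, edges carry a label."""

    def __init__(self):
        self.nodes = {}   # node ID -> dict of properties (including the ID)
        self.edges = []   # (source ID, label, target ID) triples

    def add_node(self, node_id, **properties):
        self.nodes[node_id] = {"id": node_id, **properties}

    def add_edge(self, src, label, dst):
        if src not in self.nodes or dst not in self.nodes:
            raise KeyError("both endpoints must exist before adding an edge")
        self.edges.append((src, label, dst))

    def neighbors(self, node_id, label):
        """Follow outgoing edges with the given label from one node."""
        return [dst for src, lab, dst in self.edges
                if src == node_id and lab == label]
```

Answering a question here means starting at a node and following labeled edges, which is close in spirit to the navigation in the CODASYL sketch of the network model.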

A Quick Summary of NoSQL

Target specific tasks; use specific algorithms instead of SQL queries

Relax various requirements such as “consistency”

Exact answers are not always necessary; good approximations are often enough.

Now let’s talk about efficiency

§2.1 Input-Output Model (see separate slides)

§2.2 Data Stream Model (see separate slides)

§2.3 MapReduce and ActiveDHT (see separate slides)