the “online” edition *** lecture #11 (relational languages i) · 2020. 4. 22. · database...

18
Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke 1 *** The “Online” Edition *** Introduction to Data Management Lecture #11 (Relational Languages I) Instructor: Mike Carey [email protected] SQL

Upload: others

Post on 25-Feb-2021

2 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: The “Online” Edition *** Lecture #11 (Relational Languages I) · 2020. 4. 22. · Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke 1 *** The “Online” Edition

Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke 1

*** The “Online” Edition ***

Introduction to Data Management

Lecture #11(Relational Languages I)

Instructor: Mike Carey [email protected]

SQL

Page 2: The “Online” Edition *** Lecture #11 (Relational Languages I) · 2020. 4. 22. · Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke 1 *** The “Online” Edition

Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke 2

Today’s Notices

v HW#3 is under way!§ Normalization (in part for ShopALot.com)§ Due at 11 PM on Friday w/usual 24-hr late window

v Web resource(s) for FDs and normalization:§ http://www.ict.griffith.edu.au/~jw/normalization/ind.php#1NFTo3NF§ http://www.ict.griffith.edu.au/~jw/normalization/ind.php#resources

v Today’s lecture plan:§ Relational languages – the next frontier…

§ Note: Today’s material won’t be on Midterm 1! (J)§ Please weigh in on the exam geo-poll on Piazza!

Page 3: The “Online” Edition *** Lecture #11 (Relational Languages I) · 2020. 4. 22. · Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke 1 *** The “Online” Edition

Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke 3

Quick Time Check!

Page 4: The “Online” Edition *** Lecture #11 (Relational Languages I) · 2020. 4. 22. · Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke 1 *** The “Online” Edition

Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke 4

First: A Word on Learning...v ”When *I*was your age…” (J)

§ 8-bit !processors, LEDs, hex keypad, 16-bit for its address space, …

§ FORTRAN, Pascal, C, Lisp, Snobol, APL, Quel …§ Unix first appeared, Unix/INGRES DB 64K process design

point, and 1MB of memory was amazing!§ No web! (Just the early ArpaNet)

v Fast forward 35 years…§ Your cell phones dwarf our ~1980 computing platforms§ Python, Java, Go, C++, Ruby, SQL, SQL++, …§ Twitter… (K)§ WHAT THIS MEANS: It’s critical to learn how to read and

how to learn, how to search for info/resources, etc…!

Z(1) = Y + W(1)

Page 5: The “Online” Edition *** Lecture #11 (Relational Languages I) · 2020. 4. 22. · Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke 1 *** The “Online” Edition

Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke 5

First: A Word on Learning...v ”When *I*was your age…” (J)

§ 8-bit !processors, LEDs, hex keypad, 16-bit for its address space, …

§ FORTRAN, Pascal, C, Lisp, Snobol, APL, Quel …§ Unix first appeared, Unix/INGRES DB 64K process design

point, and 1MB of memory was amazing!§ No web! (Just the early ArpaNet)

v Fast forward 35 years…§ Your cell phones dwarf our ~1980 computing platforms§ Python, Java, Go, C++, Ruby, SQL, SQL++, …§ Twitter… (K)§ WHAT THIS MEANS: It’s critical to learn how to read and

how to learn, how to search for info/resources, etc…!

Page 6: The “Online” Edition *** Lecture #11 (Relational Languages I) · 2020. 4. 22. · Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke 1 *** The “Online” Edition

Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke 6

On to Relational Query Languages!

v Query languages: Allow manipulation and retrievalof data from a database.

v Relational model supports simple, powerful QLs:§ Strong formal foundation based on logic.§ Allows for much optimization.

v Query Languages ≠ programming languages!§ QLs not expected to be �Turing complete.�§ QLs not intended to be used for complex calculations.§ QLs support easy, efficient access to large data sets.

Page 7: The “Online” Edition *** Lecture #11 (Relational Languages I) · 2020. 4. 22. · Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke 1 *** The “Online” Edition

Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke 7

Formal Relational Query Languages

v Two mathematical Query Languages form the basis for �real� languages (e.g., SQL), and for their implementation:§ Relational Algebra: More operational, very useful

for representing execution plans.§ Relational Calculus: Lets users describe what they

want, rather than how to compute it. (Non-operational, or declarative.)

Page 8: The “Online” Edition *** Lecture #11 (Relational Languages I) · 2020. 4. 22. · Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke 1 *** The “Online” Edition

Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke 8

Preliminaries

v A query is applied to relation instances, and the result of a query is also a relation instance.§ Schemas of input relations for a query are fixed (but

query will run regardless of instance!)§ The schema for the result of a given query is also

fixed! Determined by applying the definitions of the query language’s constructs.

v Positional vs. named-field notation: § Positional notation easier for formal definitions, but

named-field notation far more readable. § Both used in SQL (but try to avoid positional stuff!)

Page 9: The “Online” Edition *** Lecture #11 (Relational Languages I) · 2020. 4. 22. · Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke 1 *** The “Online” Edition

Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke 9

Example Instances

sid sname rating age22 dustin 7 45.031 lubber 8 55.558 rusty 10 35.0

sid sname rating age28 yuppy 9 35.031 lubber 8 55.544 guppy 5 35.058 rusty 10 35.0

sid bid day22 101 10/10/9658 103 11/12/96

R1

S1

S2

v �Sailors� and �Reserves�relations for our examples.

v We’ll use positional or named field notation, and assume that names of fields in query results are “inherited” from names of fields in query input relations (when possible).

Page 10: The “Online” Edition *** Lecture #11 (Relational Languages I) · 2020. 4. 22. · Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke 1 *** The “Online” Edition

Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke 10

Relational Algebra

v Basic operations:§ Selection ( ) Selects a subset of rows from relation.§ Projection ( ) Omits unwanted columns from relation.§ Cross-product ( ) Allows us to combine two relations.§ Set-difference ( ) Tuples in reln. 1, but not in reln. 2.§ Union ( ) Tuples in reln. 1 and in reln. 2.

v Additional operations:§ Intersection, join, division, renaming: Not essential, but

(very!) useful. (I.e., don’t add expressive power, but…)v Since each operation returns a relation, operations

can be composed! (Algebra is �closed�.)

s

!

π

Page 11: The “Online” Edition *** Lecture #11 (Relational Languages I) · 2020. 4. 22. · Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke 1 *** The “Online” Edition

Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke 11

Projectionsname ratingyuppy 9lubber 8guppy 5rusty 10

πsname,rating(S2)

age35.055.5

page S( )2

v Removes attributes that are not in projection list.

v Schema of result contains exactly the fields in the projection list, with the same names that they had in the (only) input relation.

v Relational projection operator has to eliminate duplicates! (Why??)§ Note: real systems typically

don’t do duplicate elimination unless the user explicitly asks for it. (Q: Why not?)

Page 12: The “Online” Edition *** Lecture #11 (Relational Languages I) · 2020. 4. 22. · Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke 1 *** The “Online” Edition

Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke 12

Selection

s rating S>8 2( )

sid sname rating age28 yuppy 9 35.058 rusty 10 35.0

sname ratingyuppy 9rusty 10

p ssname rating rating S, ( ( ))>8 2

v Selects rows that satisfy a selection condition.

v No duplicates in result! (Why?)

v Schema of result identical to schema of its (only) input relation.

v Result relation can be the input for another relational algebra operation! (This is operator composition.)

Page 13: The “Online” Edition *** Lecture #11 (Relational Languages I) · 2020. 4. 22. · Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke 1 *** The “Online” Edition

Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke 13

Union, Intersection, Set-Difference

v All of these operations take two input relations, which must be union-compatible:§ Same number of fields.§ “Corresponding” fields

are of the same type.v What is the schema of result?

sid sname rating age22 dustin 7 45.031 lubber 8 55.558 rusty 10 35.044 guppy 5 35.028 yuppy 9 35.0

sid sname rating age31 lubber 8 55.558 rusty 10 35.0

S S1 2È

S S1 2Ç

sid sname rating age22 dustin 7 45.0

S S1 2- Q: Any issues w/duplicates?

Page 14: The “Online” Edition *** Lecture #11 (Relational Languages I) · 2020. 4. 22. · Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke 1 *** The “Online” Edition

Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke 14

Cross-Productv S1 x R1: Each S row is paired with each R1 row.v Result schema has one field per field of S1 and R1,

with field names “inherited” if possible.§ Conflict: Both S1 and R1 have a field called sid.

r ( ( , ), )C sid sid S R1 1 5 2 1 1® ® ´

(sid) sname rating age (sid) bid day22 dustin 7 45.0 22 101 10/10/9622 dustin 7 45.0 58 103 11/12/9631 lubber 8 55.5 22 101 10/10/9631 lubber 8 55.5 58 103 11/12/9658 rusty 10 35.0 22 101 10/10/9658 rusty 10 35.0 58 103 11/12/96

§ Renaming operator:

Result relation name

Source expression (anything!)

Attribute renaming list

Page 15: The “Online” Edition *** Lecture #11 (Relational Languages I) · 2020. 4. 22. · Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke 1 *** The “Online” Edition

Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke 15

Renamingv Conflict: S1 and R1 both had sid fields, giving:

v Several renaming notations available:

ρ (S1R1(1→sid1), S1×R1)

(sid) sname rating age (sid) bid day 22 dustin 7 45.0 22 101 10/10/96 … … … … … … … 58 rusty 10 35.0 58 103 11/12/96

ρ (TempS1(sid→sid1), S1)TempS1×R1

(πsid→sid1,sname,rating,age(S1))×R1

Positional renaming

Name-based renaming

Generalized projection(I like this notation best! J)

Page 16: The “Online” Edition *** Lecture #11 (Relational Languages I) · 2020. 4. 22. · Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke 1 *** The “Online” Edition

Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke 16

Joinsv Condition Join:

v Result schema same as that of cross-product.v Fewer tuples than cross-product, so might be

able to compute more efficientlyv Sometimes (often!) called a theta-join.

R c S c R S!" = ´s ( )

(sid) sname rating age (sid) bid day22 dustin 7 45.0 58 103 11/12/9631 lubber 8 55.5 58 103 11/12/96

S1⨝S1.sid < R1.sid R1

Page 17: The “Online” Edition *** Lecture #11 (Relational Languages I) · 2020. 4. 22. · Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke 1 *** The “Online” Edition

Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke 17

More Joins

v Equi-Join: A special case of condition join where the condition c contains only equalities.

v Result schema similar to cross-product, but only one copy of fields for which equality is specified.

v Natural Join: An equijoin on all commonly named fields.

sid sname rating age bid day22 dustin 7 45.0 101 10/10/9658 rusty 10 35.0 103 11/12/96

S1⨝S1.sid = R1.sid R1

Page 18: The “Online” Edition *** Lecture #11 (Relational Languages I) · 2020. 4. 22. · Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke 1 *** The “Online” Edition

Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke 18

To Be Continued….