the “online” edition *** lecture #11 (relational languages i) · 2020. 4. 22. · database...
TRANSCRIPT
Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke 1
*** The “Online” Edition ***
Introduction to Data Management
Lecture #11(Relational Languages I)
Instructor: Mike Carey [email protected]
SQL
Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke 2
Today’s Notices
v HW#3 is under way!§ Normalization (in part for ShopALot.com)§ Due at 11 PM on Friday w/usual 24-hr late window
v Web resource(s) for FDs and normalization:§ http://www.ict.griffith.edu.au/~jw/normalization/ind.php#1NFTo3NF§ http://www.ict.griffith.edu.au/~jw/normalization/ind.php#resources
v Today’s lecture plan:§ Relational languages – the next frontier…
§ Note: Today’s material won’t be on Midterm 1! (J)§ Please weigh in on the exam geo-poll on Piazza!
Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke 3
Quick Time Check!
Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke 4
First: A Word on Learning...v ”When *I*was your age…” (J)
§ 8-bit !processors, LEDs, hex keypad, 16-bit for its address space, …
§ FORTRAN, Pascal, C, Lisp, Snobol, APL, Quel …§ Unix first appeared, Unix/INGRES DB 64K process design
point, and 1MB of memory was amazing!§ No web! (Just the early ArpaNet)
v Fast forward 35 years…§ Your cell phones dwarf our ~1980 computing platforms§ Python, Java, Go, C++, Ruby, SQL, SQL++, …§ Twitter… (K)§ WHAT THIS MEANS: It’s critical to learn how to read and
how to learn, how to search for info/resources, etc…!
Z(1) = Y + W(1)
Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke 5
First: A Word on Learning...v ”When *I*was your age…” (J)
§ 8-bit !processors, LEDs, hex keypad, 16-bit for its address space, …
§ FORTRAN, Pascal, C, Lisp, Snobol, APL, Quel …§ Unix first appeared, Unix/INGRES DB 64K process design
point, and 1MB of memory was amazing!§ No web! (Just the early ArpaNet)
v Fast forward 35 years…§ Your cell phones dwarf our ~1980 computing platforms§ Python, Java, Go, C++, Ruby, SQL, SQL++, …§ Twitter… (K)§ WHAT THIS MEANS: It’s critical to learn how to read and
how to learn, how to search for info/resources, etc…!
Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke 6
On to Relational Query Languages!
v Query languages: Allow manipulation and retrievalof data from a database.
v Relational model supports simple, powerful QLs:§ Strong formal foundation based on logic.§ Allows for much optimization.
v Query Languages ≠ programming languages!§ QLs not expected to be �Turing complete.�§ QLs not intended to be used for complex calculations.§ QLs support easy, efficient access to large data sets.
Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke 7
Formal Relational Query Languages
v Two mathematical Query Languages form the basis for �real� languages (e.g., SQL), and for their implementation:§ Relational Algebra: More operational, very useful
for representing execution plans.§ Relational Calculus: Lets users describe what they
want, rather than how to compute it. (Non-operational, or declarative.)
Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke 8
Preliminaries
v A query is applied to relation instances, and the result of a query is also a relation instance.§ Schemas of input relations for a query are fixed (but
query will run regardless of instance!)§ The schema for the result of a given query is also
fixed! Determined by applying the definitions of the query language’s constructs.
v Positional vs. named-field notation: § Positional notation easier for formal definitions, but
named-field notation far more readable. § Both used in SQL (but try to avoid positional stuff!)
Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke 9
Example Instances
sid sname rating age22 dustin 7 45.031 lubber 8 55.558 rusty 10 35.0
sid sname rating age28 yuppy 9 35.031 lubber 8 55.544 guppy 5 35.058 rusty 10 35.0
sid bid day22 101 10/10/9658 103 11/12/96
R1
S1
S2
v �Sailors� and �Reserves�relations for our examples.
v We’ll use positional or named field notation, and assume that names of fields in query results are “inherited” from names of fields in query input relations (when possible).
Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke 10
Relational Algebra
v Basic operations:§ Selection ( ) Selects a subset of rows from relation.§ Projection ( ) Omits unwanted columns from relation.§ Cross-product ( ) Allows us to combine two relations.§ Set-difference ( ) Tuples in reln. 1, but not in reln. 2.§ Union ( ) Tuples in reln. 1 and in reln. 2.
v Additional operations:§ Intersection, join, division, renaming: Not essential, but
(very!) useful. (I.e., don’t add expressive power, but…)v Since each operation returns a relation, operations
can be composed! (Algebra is �closed�.)
s
-´
!
π
Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke 11
Projectionsname ratingyuppy 9lubber 8guppy 5rusty 10
πsname,rating(S2)
age35.055.5
page S( )2
v Removes attributes that are not in projection list.
v Schema of result contains exactly the fields in the projection list, with the same names that they had in the (only) input relation.
v Relational projection operator has to eliminate duplicates! (Why??)§ Note: real systems typically
don’t do duplicate elimination unless the user explicitly asks for it. (Q: Why not?)
Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke 12
Selection
s rating S>8 2( )
sid sname rating age28 yuppy 9 35.058 rusty 10 35.0
sname ratingyuppy 9rusty 10
p ssname rating rating S, ( ( ))>8 2
v Selects rows that satisfy a selection condition.
v No duplicates in result! (Why?)
v Schema of result identical to schema of its (only) input relation.
v Result relation can be the input for another relational algebra operation! (This is operator composition.)
Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke 13
Union, Intersection, Set-Difference
v All of these operations take two input relations, which must be union-compatible:§ Same number of fields.§ “Corresponding” fields
are of the same type.v What is the schema of result?
sid sname rating age22 dustin 7 45.031 lubber 8 55.558 rusty 10 35.044 guppy 5 35.028 yuppy 9 35.0
sid sname rating age31 lubber 8 55.558 rusty 10 35.0
S S1 2È
S S1 2Ç
sid sname rating age22 dustin 7 45.0
S S1 2- Q: Any issues w/duplicates?
Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke 14
Cross-Productv S1 x R1: Each S row is paired with each R1 row.v Result schema has one field per field of S1 and R1,
with field names “inherited” if possible.§ Conflict: Both S1 and R1 have a field called sid.
r ( ( , ), )C sid sid S R1 1 5 2 1 1® ® ´
(sid) sname rating age (sid) bid day22 dustin 7 45.0 22 101 10/10/9622 dustin 7 45.0 58 103 11/12/9631 lubber 8 55.5 22 101 10/10/9631 lubber 8 55.5 58 103 11/12/9658 rusty 10 35.0 22 101 10/10/9658 rusty 10 35.0 58 103 11/12/96
§ Renaming operator:
Result relation name
Source expression (anything!)
Attribute renaming list
Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke 15
Renamingv Conflict: S1 and R1 both had sid fields, giving:
v Several renaming notations available:
ρ (S1R1(1→sid1), S1×R1)
(sid) sname rating age (sid) bid day 22 dustin 7 45.0 22 101 10/10/96 … … … … … … … 58 rusty 10 35.0 58 103 11/12/96
ρ (TempS1(sid→sid1), S1)TempS1×R1
(πsid→sid1,sname,rating,age(S1))×R1
Positional renaming
Name-based renaming
Generalized projection(I like this notation best! J)
Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke 16
Joinsv Condition Join:
v Result schema same as that of cross-product.v Fewer tuples than cross-product, so might be
able to compute more efficientlyv Sometimes (often!) called a theta-join.
R c S c R S!" = ´s ( )
(sid) sname rating age (sid) bid day22 dustin 7 45.0 58 103 11/12/9631 lubber 8 55.5 58 103 11/12/96
S1⨝S1.sid < R1.sid R1
Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke 17
More Joins
v Equi-Join: A special case of condition join where the condition c contains only equalities.
v Result schema similar to cross-product, but only one copy of fields for which equality is specified.
v Natural Join: An equijoin on all commonly named fields.
sid sname rating age bid day22 dustin 7 45.0 101 10/10/9658 rusty 10 35.0 103 11/12/96
S1⨝S1.sid = R1.sid R1
Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke 18
To Be Continued….