introduction to relational database systems€¦ · ra is expressive : all sql (dml) queries as we...

40
INTRODUCTION TO RELATIONAL DATABASE SYSTEMS DATENBANKSYSTEME 1 (INF 3131) Torsten Grust Universität Tübingen Winter 2015/16 1

Upload: others

Post on 04-Jun-2020

26 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: INTRODUCTION TO RELATIONAL DATABASE SYSTEMS€¦ · RA is expressive : all SQL (DML) queries as we have studied them in this course have an equivalent RA form (if we omit ORDER BY

INTRODUCTION TO

RELATIONAL DATABASE SYSTEMS

DATENBANKSYSTEME 1 (INF 3131)

Torsten GrustUniversität Tübingen

Winter 2015/16

1

Page 2: INTRODUCTION TO RELATIONAL DATABASE SYSTEMS€¦ · RA is expressive : all SQL (DML) queries as we have studied them in this course have an equivalent RA form (if we omit ORDER BY

THE RELATIONAL ALGEBRA

The Relational Algebra (RA) is a query language for the relational data model.

The definition of RA is concise: the core of RA consist of five basic operators.More operators can be defined in terms of the core but this does not add to RA’s expressiveness.

RA is expressive: all SQL (DML) queries as we have studied them in this course have an equivalentRA form (if we omit ORDER BY, GROUP BY, aggregate functions).

RA is the original query language for the relational model. Proposed by E.F. Codd in 1970. Querylanguages that are as expressive as RA are called relationally complete.(SQL is relationally complete.)

There are no RDBMSs that expose RA as a user-level query language but almost all systems useRA as an internal representation of queries.

Knowledge of RA will help in understanding SQL, relational query processing, and theperformance of relational queries in general (→ course “Datenbanksysteme II”).

‐‐

2

Page 3: INTRODUCTION TO RELATIONAL DATABASE SYSTEMS€¦ · RA is expressive : all SQL (DML) queries as we have studied them in this course have an equivalent RA form (if we omit ORDER BY

THE RELATIONAL ALGEBRA

RA is an algebra in the mathematical sense: an algebra is a system that comprises a

a set (the carrier) and

operations, of given arity, that are closed with respect to the set.

Example: (ℕ, {*, +}) forms an algebra with two binary (arity 2) operations.

In the case of RA,

the carrier is the set of all finite relations (= sets of tuples ⚠),

the five operations are (selection), (projection), (Cartesian product), (set union), and (set difference).

Closure: Any RA operator

takes as input one or two relations (the unary operators , take additional parameters) and

returns one relation.

Relations and operators may be composed to form complex expression (or queries).

‐1.

2.

‐‐

1.

2. σ π × ∪ ∖

‐‐ σ π

‐‐

3

Page 4: INTRODUCTION TO RELATIONAL DATABASE SYSTEMS€¦ · RA is expressive : all SQL (DML) queries as we have studied them in this course have an equivalent RA form (if we omit ORDER BY

RELATIONAL ALGEBRA: SELECTION ( )

SelectionIf the unary selection operator is applied to input relation , the output relation holds thesubset of tuples in that satisfy predicate .

Example: apply (also consider: , , , ⚠ ) to relation

AA BB CC1 true 201 true 102 false 10

to obtainAA BB CC1 true 201 true 10

Selection does not alter the input relation schema, i.e., .

σ

σp R

R p

‐ σA=1 σB σA=A σC>20 σD=0R

‐ sch( (R)) = sch(R)σp

4

Page 5: INTRODUCTION TO RELATIONAL DATABASE SYSTEMS€¦ · RA is expressive : all SQL (DML) queries as we have studied them in this course have an equivalent RA form (if we omit ORDER BY

RELATIONAL ALGEBRA: SELECTION ( )

Predicate in is restricted: is evaluated for each input tuple in isolation and thus must exclusivelybe expressed in terms of

literals,

attribute references (occurring in of input relation ),

arithmetic, comparison ( , , , …), and Boolean operators ( , , ).

In particular, quantitifiers ( , ) or nested algebra expressions are not allowed in .

In PyQL, has the following implementation:

def select(p, r): """Return those rows of relation r that satisfy predicate p.""" return [ row for row in r if p(row) ]

select(), and thus , are a higher-order functions.

σ

‐ p σp p

1.

2. sch(R) R

3. = < ⩽ ∧ ∨ ¬

‐ ∃ ∀ p

‐ (r)σp

‐ σ

5

Page 6: INTRODUCTION TO RELATIONAL DATABASE SYSTEMS€¦ · RA is expressive : all SQL (DML) queries as we have studied them in this course have an equivalent RA form (if we omit ORDER BY

RELATIONAL ALGEBRA: PROJECTION ( )

ProjectionIf the unary operator is applied to input relation , it applies function to all tuples in .The resulting tuples form the output relation.

Function argument (the “projection list”) computes one output tuple from one input tuple. constructs new tuples that may

discard input attributes (DB slang: “attributes are projected away”)

contain newly created output attributes whose value is derived in terms of expressions over inputattributes, literals, and pre-defined (arithmetic, string, comparison, Boolean) operators.

Note:

in general

⚠ (Why?)

π

πℓ R ℓ R

‐ ℓ ℓ

1.

2.

‐‐ sch( (R)) ≠ sch(R)πℓ

‐ | (R)| ⩽ |R|πℓ

6

Page 7: INTRODUCTION TO RELATIONAL DATABASE SYSTEMS€¦ · RA is expressive : all SQL (DML) queries as we have studied them in this course have an equivalent RA form (if we omit ORDER BY

RELATIONAL ALGEBRA: PROJECTION ( )

PyQL implementation of :

def project(l, r): """Apply function l to relation r and return resulting relation.""" return dupe([ l(row) for row in r ]) # dupe(): eliminate duplicate list elements

is a higher-order function (in functional programming languages, is known as map).

Common cases and notation for (refer to relation shown above):

Retain some attributes of the input relation, i.e., project (throw) away all others:

Rename the attributes of the input relation, leaving their value unchanged:

Compute new attribute values:

π

‐ πℓ

‐ πl π

‐ π R(A, B, C)

‐(R)πC,A

‐(R)πX←A,Y←B

‐(R)πX←A+C, Y←¬B, Z←"LEGO"

7

Page 8: INTRODUCTION TO RELATIONAL DATABASE SYSTEMS€¦ · RA is expressive : all SQL (DML) queries as we have studied them in this course have an equivalent RA form (if we omit ORDER BY

RELATIONAL ALGEBRA: PROJECTION ( )

Examples:

Apply to relation

AA BB CC1 true 201 true 102 false 10

to obtainXX YY ZZ21 false LEGO11 false LEGO12 true LEGO

Apply to obtainXX YY1 true2 false

π

‐1. πX←A+C, Y←¬B, Z←"LEGO"

R

2. πX←A,Y←B

8

Page 9: INTRODUCTION TO RELATIONAL DATABASE SYSTEMS€¦ · RA is expressive : all SQL (DML) queries as we have studied them in this course have an equivalent RA form (if we omit ORDER BY

REL. ALGEBRA: CARTESIAN PRODUCT ( )

Cartesian ProductIf the binary operator is applied to two input relations , , the output relation containsall possible concatenations of all tuples in both inputs.

The schemata of inputs and must not share any attribute names. (This is no real restrictionbecause we have .) We have:

PyQL implementation:

def cross(r1, r2): """Return the Cartesian product of relations r1, r2.""" assert(not(schema(r1) & schema(r2))) # &: set intersection return [ row1 ⨁ row2 for row1 in r1 for row2 in r2 ] # ⨁: dict merging

×

× R1 R2

‐ R1 R2

π

sch(R×S) = sch(R) sch(S) |R×S| = |R| ⋅ |S|∪⋅

9

Page 10: INTRODUCTION TO RELATIONAL DATABASE SYSTEMS€¦ · RA is expressive : all SQL (DML) queries as we have studied them in this course have an equivalent RA form (if we omit ORDER BY

REL. ALGEBRA: CARTESIAN PRODUCT ( )

Example: Given graph adjacency (edge) relation , compute the paths of length two(“Where can I go in two hops?”):

fromfrom totoA BB AB CA D

Algebraic query:

Result:fromfrom totoA AA CB BB D

×

‐ G(from, to)

( (G× (G)))πfrom,to←to2 σto=from2 πfrom2←from,to2←to

10

Page 11: INTRODUCTION TO RELATIONAL DATABASE SYSTEMS€¦ · RA is expressive : all SQL (DML) queries as we have studied them in this course have an equivalent RA form (if we omit ORDER BY

RELATIONAL ALGEBRA: JOIN ( )

The algebraic two-hop query relied on a combination of - that is typical: (1) generate all possible(arbitrary) combinations of tuples, then (2) filter for the interesting/sensible combinations.

JoinThe join of input relations , with respect to predicate is defined as

Note:

Join does not add to the expressiveness of RA ( is a derived operator or an “RA macro” in asense).

comes with the same preconditions as and : and may onlyrefer to attributes in .

⋈p

‐ σ ×

R1 R2 p

:= ( × )R1 ⋈p R2 σp R1 R2

‐‐ ⋈p

‐ ⋈p σ × sch( ) ∩ sch( ) = ∅R1 R2 p

sch( ) sch( )R1 ∪⋅

R2

11

Page 12: INTRODUCTION TO RELATIONAL DATABASE SYSTEMS€¦ · RA is expressive : all SQL (DML) queries as we have studied them in this course have an equivalent RA form (if we omit ORDER BY

RELATIONAL ALGEBRA: JOIN ( )

RDBMS are equipped with an entire family of algorithms that efficiently compute joins. In particular,the potentially large intermediate result (after ) is not materialized.

Consider a join implementation in PyQL. Equational reasoning:

join(p, r1, r2)

# definition of join ≣ select(p, cross(r1, r2))# definition of cross ≣ select(p, [ row1 ⨁ row2 for row1 in r1 for row2 in r2 ])# definition of select ≣ [ row for row in [ row1 ⨁ row2 for row1 in r1 for row2 in r2 ] if p(row) ]# [ e1(y) for y in [ e2(x) for x in xs ] ] = [ e1(y) for x in xs for y in [e2(x)] ] ≣ [ row for row1 in r1 for row2 in r2 for row in [row1 ⨁ row2] if p(row) ]# [ e1(y) for x in xs for y in [e2(x)] ] = [ e1(e2(x)) for x in xs ]

≣ [ row1 ⨁ row2 for row1 in r1 for row2 in r2 if p(row1 ⨁ row2) ]

⋈p

‐×

12

Page 13: INTRODUCTION TO RELATIONAL DATABASE SYSTEMS€¦ · RA is expressive : all SQL (DML) queries as we have studied them in this course have an equivalent RA form (if we omit ORDER BY

PLANS (OPERATOR TREES)

Depict RA queries (or plans) as data-flow trees.Tuples flow from the leaves (relations) to the root which represents the final result.

Example: Two-hop query using and :

    

‐ × ⋈p

13

Page 14: INTRODUCTION TO RELATIONAL DATABASE SYSTEMS€¦ · RA is expressive : all SQL (DML) queries as we have studied them in this course have an equivalent RA form (if we omit ORDER BY

RELATIONAL ALGEBRA: NATURAL JOIN ( )

Idiomatic relational database design often leads to joins between input relations , in which thejoin predicate performs equality comparisons of attributes of the same name (think of key–foreign keyjoins).

Example: Consider the LEGO database:setssetset namename catcat xx yy zz weightweight yearyear imgimg containssetset piecepiece colorcolor extraextra quantityquantity 

We have sets contains set and attribute set exactly determines the joincondition.

Associated RA key–foreign key join query:

‐ R1 R2

‐ sch( )∩sch( )= { }

(πset,name,cat,x,y,z,weight,year,img,piece,color,extra,quantity

sets (contains))⋈set2=set πset2←set,piece,color,extra,quantity14

Page 15: INTRODUCTION TO RELATIONAL DATABASE SYSTEMS€¦ · RA is expressive : all SQL (DML) queries as we have studied them in this course have an equivalent RA form (if we omit ORDER BY

RELATIONAL ALGEBRA: NATURAL JOIN ( )

Natural JoinThe natural join of input relations , performs a operation where is a conjunctionof equality comparisons between the attributes :

Note: the final (top-most) projection ensures that the shared attributes only occur oncein the result schema (we have anyway).

Terminology:Joins with a conjunctive all-equalities predicates are also known as equi-joins. Otherwise, isalso referred to as -join (theta-join).

R1 R2 ⋈p p

{ , … , } = sch( ) ∩ sch( )a1 an R1 R2

⋈ :=R1 R2 (πsch( )∪sch( )R1 R2

( ))R1 ⋈ = ∧⋯∧ =a1 a′1 an a′

nπ ← ,…, ← ,sch( )∖{ ,…, }a′

1 a1 a′n an R2 a1 an

R2

‐ { , … , }a1 an

=ai a′i

‐⋈p p ⋈p

θ

15

Page 16: INTRODUCTION TO RELATIONAL DATABASE SYSTEMS€¦ · RA is expressive : all SQL (DML) queries as we have studied them in this course have an equivalent RA form (if we omit ORDER BY

RELATIONAL ALGEBRA: NATURAL JOIN ( )

Natural Join Quiz: Consider relations

AA BB CC1 true 201 true 102 false 10

BB CC DDtrue 20 Xfalse 10 Ytrue 30 Z

and natural joins

‐R1

R2

1. ⋈ R1 R2

2. ( ) ⋈ ( )πB,C R1 πB,C R2

3. ⋈ R1 R1

4. ( ) ⋈ ( )πA,C R1 πB,D R216

Page 17: INTRODUCTION TO RELATIONAL DATABASE SYSTEMS€¦ · RA is expressive : all SQL (DML) queries as we have studied them in this course have an equivalent RA form (if we omit ORDER BY

RELATIONAL ALGEBRA: LAWS

Much like for the algebra (ℕ, {*, +}), the operators of RA obey laws, i.e. strict equivalences that holdregardless of the state of the input relations.

RA Laws (⚠ Excerpt Only)

(and ) are associative and commutative: and

may be pushed down into , provided that … ⟨fill in precondition⟩ …:

and may be merged:

Among these laws, selection pushdown is considered essential for query optimization. Why?

‐ ⋈ ⋈p

( ⋈ ) ⋈ = ⋈ ( ⋈ )R1 R2 R3 R1 R2 R3 ⋈ = ⋈ R1 R2 R2 R1

‐ σp ⋈q

( ) = ( )σp R1 ⋈q R2 R1 ⋈q σp R2

‐ σp σq

( (R)) = (R)σp σq σp∧q

17

Page 18: INTRODUCTION TO RELATIONAL DATABASE SYSTEMS€¦ · RA is expressive : all SQL (DML) queries as we have studied them in this course have an equivalent RA form (if we omit ORDER BY

REL. ALGEBRA: A COMMON QUERY PATTERN

This plan has the SQL equivalent:

   

18

Page 19: INTRODUCTION TO RELATIONAL DATABASE SYSTEMS€¦ · RA is expressive : all SQL (DML) queries as we have studied them in this course have an equivalent RA form (if we omit ORDER BY

RELATIONAL ALGEBRA: - - QUERIES

Quiz: Find piece ID and name of all LEGO bricks that belong to a category related to animals.

Relevant relations:brickspiecepiece namename catcat weightweight imgimg xx yy zz 

categoriescatcat namename

Animal

Algebraic plan:

π σ ⋈

‐‐

19

Page 20: INTRODUCTION TO RELATIONAL DATABASE SYSTEMS€¦ · RA is expressive : all SQL (DML) queries as we have studied them in this course have an equivalent RA form (if we omit ORDER BY

REL. ALGEBRA: SET OPERATIONS ( , )

Relations are sets of tuples. The usual family of binary set operations applies:

Union ( ), Difference ( )The binary set operations and compute the union and difference of two input relations

, . The schemata of the input relations must match:

The two set operations complete the operator core of RA ( , , , , ). Any query language that isexpressive as this core is relationally complete.

⚠ Set intersection ( ) is not considered a core RA operator. There is more than one way to expressintersection as an RA macro:

∪ ∖

∪ ∖∪ ∖

R1 R2

sch( ) sch( ) = sch( ∪ ) = sch( ∖ ).R1 =!R2 R1 R2 R1 R2

‐ σ π × ∪ ∖

‐ ∩

∩ :=R1 R2

20

Page 21: INTRODUCTION TO RELATIONAL DATABASE SYSTEMS€¦ · RA is expressive : all SQL (DML) queries as we have studied them in this course have an equivalent RA form (if we omit ORDER BY

REL. ALGEBRA: SET OPERATIONS ( , )

def union(r1, r2): """Return the union of relations r1, r2.""" assert(matches(schema(r1), schema(r2))) return dupe([ row1 for row1 in r1 ] + [ row2 for row2 in r2 ])

def difference(r1, r2): """Return the difference of r1, r2 (a subset of r1).""" assert(matches(schema(r1), schema(r2))) return [ row1 for row1 in r1 if row1 not in r2 ] # ⚠ Note the negation (not)

More RA Laws

is associative and commutative: and

The empty relation is the neutral element for :

∪ ∖

‐ ∪( ∪ ) ∪ = ∪ ( ∪ )R1 R2 R3 R1 R2 R3 ∪ = ∪ R1 R2 R2 R1

‐ ∅ ∪R ∪ ∅ = ∅ ∪ R = R

21

Page 22: INTRODUCTION TO RELATIONAL DATABASE SYSTEMS€¦ · RA is expressive : all SQL (DML) queries as we have studied them in this course have an equivalent RA form (if we omit ORDER BY

REL. ALGEBRA: SET OPERATIONS ( , )

In RA queries, can straightforwardly express case distinction.

Example: Categorize LEGO sets according to their size (volume measured in stud³).setssetset namename catcat xx yy zz weightweight yearyear imgimg

x y z

∪ ∖

‐ ∪

22

Page 23: INTRODUCTION TO RELATIONAL DATABASE SYSTEMS€¦ · RA is expressive : all SQL (DML) queries as we have studied them in this course have an equivalent RA form (if we omit ORDER BY

SQL: CASECASE…WHENWHEN (MULTI-WAY CONDITIONAL)

Conditional Expression (CASECASE…WHENWHEN)

The SQL expression CASE…WHEN implements case distinction ("multi-way if…else"):

CASE WHEN ⟨condition₁⟩ THEN ⟨expression₁⟩ [WHEN …] [ELSE ⟨expression₀⟩]END

CASE…WHEN evaluates to the first ⟨expressionᵢ⟩ whose Boolean ⟨conditionᵢ⟩ evaluatesto TRUE. If no WHEN clause is satisfied return the value of the fall-back ⟨expression₀⟩, ifpresent (otherwise return NULL).

Expression ⟨expressionᵢ⟩ never evaluated if ⟨conditionᵢ⟩ evaluates to FALSE.

The types of the ⟨expressionᵢ⟩ must be convertible into a single output type.

‐‐

23

Page 24: INTRODUCTION TO RELATIONAL DATABASE SYSTEMS€¦ · RA is expressive : all SQL (DML) queries as we have studied them in this course have an equivalent RA form (if we omit ORDER BY

SQL: SET OPERATIONS

The family of set operations is available in SQL as well. Since SQL operates over unordered lists (or:bags) of rows, modifiers control the inclusion/removal of duplicates:

UNIONUNION, EXCEPTEXCEPT, INTERSECTINTERSECT

The binary set (bag) operations connect two SQL SFW blocks. Schemata must match(columns of the same name have convertible types). Modifier DISTINCT (i.e. setsemantics) is the default:

⟨SFW⟩ { UNION | EXCEPT | INTERSECT } [ ALL | DISTINCT ] ⟨SFW⟩

Bag semantics (ALL) with , duplicate rows contributed by the two SFW blocks:

OperationOperation Duplicates in resultDuplicates in resultUNION ALLEXCEPT ALL

INTERSECT ALL

‐ m n

m + nmax(m − n, 0)

min(m, n)24

Page 25: INTRODUCTION TO RELATIONAL DATABASE SYSTEMS€¦ · RA is expressive : all SQL (DML) queries as we have studied them in this course have an equivalent RA form (if we omit ORDER BY

REL. ALGEBRA: SET OPERATIONS ( , )

With , (and ) now being available, we may be even more restrictive with respect to theadmissable predicates in :

literals, attribute references, arithmetics, comparisons are OK,

the Boolean connectives ( , , ) are not allowed.

=

=

=

∪ ∖

‐ ∪ ∖ ∩p σp

1.

2. ∧ ∨ ¬

‐ (R)σp∧q

‐ (R)σp∨q

‐ (R)σ¬p

25

Page 26: INTRODUCTION TO RELATIONAL DATABASE SYSTEMS€¦ · RA is expressive : all SQL (DML) queries as we have studied them in this course have an equivalent RA form (if we omit ORDER BY

RELATIONAL ALGEBRA: MONOTONICITY

Monotonic Operators and QueriesAn algebraic operator is monotonic if a growing input relation implies that the outputrelation grows, too: .

An RA query is monotonic if it exclusively uses monotonic operators.

Operator (Operator ( : Input): Input) Monotonic?Monotonic?✓✓

, ✓, ✓

✓❌

def difference(r1, r2): ⋮ return [ row1 for row1 in r1 if row1 not in r2 ] # ⚠ If r2 grows, result may shrink

⊗ R ⊆ S ⇒ ⊗(R) ⊆ ⊗(S)

□(□)σp

(□)πℓ□ × _ _ × □□ ∪ _ _ ∪ □

□ ∖ __ ∖ □

26

Page 27: INTRODUCTION TO RELATIONAL DATABASE SYSTEMS€¦ · RA is expressive : all SQL (DML) queries as we have studied them in this course have an equivalent RA form (if we omit ORDER BY

RELATIONAL ALGEBRA: MONOTONICITY

It follows that we require the difference operator whenever a non-monotonic query is to be answered:

“Find those LEGO sets that do not contain LEGO bricks with stickers.”

“Find those LEGO sets in which all bricks are colored yellow.”

“Find the LEGO set that is the heaviest.”

More general: Typical non-monotonic query problems are those that …

… check for the non-existence of an element in a set ,

… check that a condition holds for all elements in a set ,

… find an element that is minimal/maximal among all other elements in a set .

⚠ Note that cases 2 and 3 in fact indeed are instances of case 1. How?

Insertion into may invalidate tuples that were valid query responses before grew.

‐‐‐‐

‐1. S

2. S

3. S

‐ S S

27

Page 28: INTRODUCTION TO RELATIONAL DATABASE SYSTEMS€¦ · RA is expressive : all SQL (DML) queries as we have studied them in this course have an equivalent RA form (if we omit ORDER BY

REL. ALGEBRA: NON-MONOTONIC QUERY“Find those LEGO sets that do not contain LEGO bricks with stickers.”

setssetset namename

containssetset piecepiece

brickspiecepiece namename

Sticker

Identify the LEGO bricks with stickers. (bricks)

Find the sets that contain these bricks. (contains)

These are exactly those sets that do not interest us. (contains)

Attach set name and other set information of interest. (sets)

catcat xx yy zz weightweight yearyear imgimgs

colorcolor extraextra quantityquantitys p

catcat weightweight imgimg xx yy zzp

1.

2.

3.

4.28

Page 29: INTRODUCTION TO RELATIONAL DATABASE SYSTEMS€¦ · RA is expressive : all SQL (DML) queries as we have studied them in this course have an equivalent RA form (if we omit ORDER BY

REL. ALGEBRA: NON-MONOTONIC QUERY“Find those LEGO sets in which all bricks are colored yellow.”

setssetset namename imgimg

containssetset colorcolor

colorscolorcolor namename

Yellow

Query/problem indeed is non-monotonic:Insertion into relation ____________ can invalidate a formerly valid result tuple.

Plan of attack resembles the no stickers query (see following slide).

catcat xx yy zz weightweight yearyears

piecepiece extraextra quantityquantitys c

finishfinish rgbrgb from_yearfrom_year to_yearto_yearc

29

Page 30: INTRODUCTION TO RELATIONAL DATABASE SYSTEMS€¦ · RA is expressive : all SQL (DML) queries as we have studied them in this course have an equivalent RA form (if we omit ORDER BY

REL. ALGEBRA: NON-MONOTONIC QUERY“Find those LEGO sets in which all bricks are colored yellow.”

➊ Lookup the colors of the individual bricks (identify yellow in relation colors).

➋ Select the bricks that are not yellow.

➊setset colorcolor

██████

➋setset colorcolor

███

‐‐

s1s1s2s2s3s3

s1s3s3

30

Page 31: INTRODUCTION TO RELATIONAL DATABASE SYSTEMS€¦ · RA is expressive : all SQL (DML) queries as we have studied them in this course have an equivalent RA form (if we omit ORDER BY

REL. ALGEBRA: NON-MONOTONIC QUERY“Find those LEGO sets in which all bricks are colored yellow.”

➌ Identify the sets that contain (at least one) such non-yellow brick.

Among all sets ➍, the sets of ➌ are exactly those we are not interested in.

➎ Thus, return all other sets.

➌setset

➍setset

➎setset

‐‐‐

s1s3

s1s2s3

s2 31

Page 32: INTRODUCTION TO RELATIONAL DATABASE SYSTEMS€¦ · RA is expressive : all SQL (DML) queries as we have studied them in this course have an equivalent RA form (if we omit ORDER BY

RELATIONAL ALGEBRA: SEMIJOIN ( )

Join used as a filter: above , only attributes of contains are relevant.

(Left) SemijoinThe left semijoin of input relations , returns those tuples of for which at leastone join partner in exists:

acts like a filter on : . Can be used to implement the semantics of existentialquantitifcation ( ) in RA.

⋉p

⋉p R1 R2 R1

R2

:= ( )R1 ⋉p R2 πsch( )R1 R1 ⋈p R2

‐ ⋉p R1 ⊆R1 ⋉p R2 R1

∃32

Page 33: INTRODUCTION TO RELATIONAL DATABASE SYSTEMS€¦ · RA is expressive : all SQL (DML) queries as we have studied them in this course have an equivalent RA form (if we omit ORDER BY

RELATIONAL ALGEBRA: ANTIJOIN ( )

(Left) AntijoinThe left antijoin of input relations , returns those tuples of for which there doesnot exist any join partner in :

can be used to implement the semantics of universal quantification:

Example: Use self-left-antijoin on to compute :

⊳p

⊳p R1 R2 R1

R2

:= ∖ ( ) = ∖ ( )R1 ⊳p R2 R1 R1 ⋉p R2 R1 πsch( )R1 R1 ⋈p R2

‐ ⊳p

= {x | x ∈ , ¬∃y ∈ : p(x, y)} = {x | x ∈ , ∀y ∈ : ¬p(x, y)}R1 ⊳p R2 R1 R2 R1 R2

‐ S(A) max(S)

S (S) ⊳A<A′ π ←AA

′ = {x | x ∈ S, ¬∃y ∈ (S) : x. A < y. }π ←AA′ A′

= {x | x ∈ S, ∀y ∈ (S) : x. A ⩾ y. }π ←AA′ A′

= {max(S)}33

Page 34: INTRODUCTION TO RELATIONAL DATABASE SYSTEMS€¦ · RA is expressive : all SQL (DML) queries as we have studied them in this course have an equivalent RA form (if we omit ORDER BY

RELATIONAL ALGEBRA: DIVISION ( )

Certain query scenarios involving quantifiers can be concisely formulated using the derived RAoperator division ( ):

DivisionThe relational division ( ) of input relation by returns those values of

such that for every value in there exists a tuple in . Let :

Notes:

Schemata in general: and .

Division? Division!If then and .

÷

‐÷

÷ (A, B)R1 (B)R2 A a

R1 B b R2 (a, b) R1

s = sch( ) ∖ sch( )R1 R2

÷ := ( ) ∖ (( ( ) × ) ∖ )R1 R2 πs R1 πs πs R1 R2 R1

‐‐ sch( ) ⊂ sch( )R2 R1 sch( ÷ ) = sch( ) ∖ sch( )R1 R2 R1 R2

‐ × = SR1 R2 S ÷ =R1 R2 S ÷ =R2 R1

34

Page 35: INTRODUCTION TO RELATIONAL DATABASE SYSTEMS€¦ · RA is expressive : all SQL (DML) queries as we have studied them in this course have an equivalent RA form (if we omit ORDER BY

RELATIONAL ALGEBRA: DIVISION ( )

Example: Divide by :

 AA   BB 1 a1 c2 b2 a2 c3 b3 c3 a3 d

BBac

              

BBabc

÷

‐ (A, B)R1 (B)R2R1

R2 R2

35

Page 36: INTRODUCTION TO RELATIONAL DATABASE SYSTEMS€¦ · RA is expressive : all SQL (DML) queries as we have studied them in this course have an equivalent RA form (if we omit ORDER BY

RELATIONAL ALGEBRA: OUTER JOINFind the bricks of LEGO Set 336–1 along with possible replacement bricks

containssetset piecepiece quantityquantity (= 336-1)

matchespiece1piece1 piece2piece2 setset

brickspiecepiece namename

Relation matches: In set , piece is considered a replacement for original piece .

Query (⚠ unexpectedly returns way too few rows…):

colorcolor extraextras p1

p1 p2 s

catcat weightweight imgimg xx yy zzpi

‐ s p2 p1

(πpiece,name,piece2

(bricks) ⋈ ( (contains)) ⋈ (matches))πpiece,name πset,piece σset=336–1 πpiece←piece1,piece2,set

36

Page 37: INTRODUCTION TO RELATIONAL DATABASE SYSTEMS€¦ · RA is expressive : all SQL (DML) queries as we have studied them in this course have an equivalent RA form (if we omit ORDER BY

RELATIONAL ALGEBRA: OUTER JOIN

(Left) Outer JoinThe (left) outer join of input relations and returns all tuples of the join of , plus those tuples of that did not find a join partner (padded with ▢). Let

:

Notes:

A Cartesian product with a singleton relation can conveniently implement the required padding.

is non-monotonic: insertion into or may invalidate a former ▢-padded result tuple.

The variants right ( ) and full outer join ( ) are defined in the obvious fashion.

⟕p R1 R2 R1 R2

R1

sch( ) = { , … , }R2 a1 an

:= ( ) ∪ (( ) × {( : ▢, … , : ▢)})R1 ⟕p R2 R1 ⋈p R2 R1 ⊳p R2 a1 an

‐1.

2. ⟕p R1 R2

3. ⟖ ⟗

37

Page 38: INTRODUCTION TO RELATIONAL DATABASE SYSTEMS€¦ · RA is expressive : all SQL (DML) queries as we have studied them in this course have an equivalent RA form (if we omit ORDER BY

SQL: ALTERNATIVE JOIN SYNTAX

Alternative SQL Join Syntax

In the FROM clause, two ⟨from_item⟩s may be joined using an RA-inspired syntax as follows:

⟨from_item⟩ ⟨join_type⟩ JOIN ⟨from_item⟩ [ ON ⟨condition⟩ | USING (⟨column⟩ [, …]) ]

where ⟨join_type⟩ is defined as

{ [NATURAL] { [INNER] | { LEFT | RIGHT | FULL } [OUTER] } | CROSS }

to indicate , , , , and respectively.

For all join types but CROSS, exactly one of NATURAL, ON, or USING must be specified.

USING ( ,…, ) abbreviates a conjunctive equality condition over the columns.

⋈ ⟕ ⟖ ⟗ ×

‐‐ a1 an n

38

Page 39: INTRODUCTION TO RELATIONAL DATABASE SYSTEMS€¦ · RA is expressive : all SQL (DML) queries as we have studied them in this course have an equivalent RA form (if we omit ORDER BY

SQL: USING OUTER JOINOUTER JOIN

List all LEGO colors ordered by popularity (= number of bricks available in that color)

colorscolorcolor namename finishfinish

available_incolorcolor brickbrick

Recall: the relationship between colors and bricks was described in the LEGO database ER diagram asfollows, i.e., there may be unpopular colors that are not used (anymore):

[brick]&&(0,*)&&⟨available in⟩&&(0,*)&&[color]

Formulate the query with output columns: name, finish, popularity (= ⟨brick count⟩ ⩾ 0) …

… using SQL’s alternative join syntax,

… using all SQL constructs but the alternative join syntax.

rgbrgb from_yearfrom_year to_yearto_yearc

c

‐1.

2. 39

Page 40: INTRODUCTION TO RELATIONAL DATABASE SYSTEMS€¦ · RA is expressive : all SQL (DML) queries as we have studied them in this course have an equivalent RA form (if we omit ORDER BY

40