databases 1 seventh lecture. topics of the lecture extended relational algebra normalization normal...

48
Databases 1 Seventh lecture

Upload: maude-simpson

Post on 28-Dec-2015

231 views

Category:

Documents


8 download

TRANSCRIPT

Page 1: Databases 1 Seventh lecture. Topics of the lecture Extended relational algebra Normalization Normal forms 2

Databases 1Seventh lecture

Page 2: Databases 1 Seventh lecture. Topics of the lecture Extended relational algebra Normalization Normal forms 2

Topics of the lecture

•Extended relational algebra•Normalization•Normal forms

2

Page 3: Databases 1 Seventh lecture. Topics of the lecture Extended relational algebra Normalization Normal forms 2

3

Relational Algebra on Bags•A bag is like a set, but an element may

appear more than once.▫Multiset is another name for “bag.”

•Example: {1,2,1,3} is a bag. {1,2,3} is also a bag that happens to be a set.

•Bags also resemble lists, but order in a bag is unimportant.▫Example: {1,2,1} = {1,1,2} as bags, but

[1,2,1] != [1,1,2] as lists.

Page 4: Databases 1 Seventh lecture. Topics of the lecture Extended relational algebra Normalization Normal forms 2

4

Why Bags?

•SQL, the most important query language for relational databases is actually a bag language.▫SQL will eliminate duplicates, but usually

only if you ask it to do so explicitly.

•Some operations, like projection, are much more efficient on bags than sets.

Page 5: Databases 1 Seventh lecture. Topics of the lecture Extended relational algebra Normalization Normal forms 2

5

Operations on Bags•Selection applies to each tuple, so its

effect on bags is like its effect on sets.•Projection also applies to each tuple, but

as a bag operator, we do not eliminate duplicates.

•Products and joins are done on each pair of tuples, so duplicates in bags have no effect on how we operate.

Page 6: Databases 1 Seventh lecture. Topics of the lecture Extended relational algebra Normalization Normal forms 2

6

Example: Bag Selection

A B B C

1 2 3 45 6 7 81 2

SELECTA+B<5 (R) = A B

1 21 2

R S

Page 7: Databases 1 Seventh lecture. Topics of the lecture Extended relational algebra Normalization Normal forms 2

7

Example: Bag Projection

A B B C

1 2 3 45 6 7 81 2

PROJECTA (R) = A

151

R S

Page 8: Databases 1 Seventh lecture. Topics of the lecture Extended relational algebra Normalization Normal forms 2

8

Example: Bag Product

A B B C

1 2 3 45 6 7 81 2

R * S = A R.B S.B C

1 2 3 41 2 7 85 6 3 45 6 7 81 2 3 41 2 7 8

R S

Page 9: Databases 1 Seventh lecture. Topics of the lecture Extended relational algebra Normalization Normal forms 2

9

Example: Bag Theta-Join

A B B C

1 2 3 45 6 7 81 2

R JOIN R.B<S.B S = A R.B S.B C

1 2 3 41 2 7 85 6 7 81 2 3 41 2 7 8

R S

Page 10: Databases 1 Seventh lecture. Topics of the lecture Extended relational algebra Normalization Normal forms 2

10

Bag Union

•Union, intersection, and difference need new definitions for bags.

•An element appears in the union of two bags the sum of the number of times it appears in each bag.

•Example: {1,2,1} UNION {1,1,2,3,1} = {1,1,1,1,1,2,2,3}

Page 11: Databases 1 Seventh lecture. Topics of the lecture Extended relational algebra Normalization Normal forms 2

11

Bag Intersection

•An element appears in the intersection of two bags the minimum of the number of times it appears in either.

•Example: {1,1,2,1} INTER {1,1,2,3} = {1,1,2}.

Page 12: Databases 1 Seventh lecture. Topics of the lecture Extended relational algebra Normalization Normal forms 2

12

Bag Difference

•An element appears in the difference A – B of bags as many times as it appears in A, minus the number of times it appears in B.▫But never less than 0 times.

•Example: {1,2,1} – {1,2,3} = {1}.

Page 13: Databases 1 Seventh lecture. Topics of the lecture Extended relational algebra Normalization Normal forms 2

13

Beware: Bag Laws <> Set Laws

•Not all algebraic laws that hold for sets also hold for bags.

•For one example, the commutative law for union (R UNION S = S UNION R ) does hold for bags.▫Since addition is commutative, adding the

number of times x appears in R and S doesn’t depend on the order of R and S.

Page 14: Databases 1 Seventh lecture. Topics of the lecture Extended relational algebra Normalization Normal forms 2

14

An Example of Inequivalence

•Set union is idempotent, meaning that S UNION S = S.

•However, for bags, if x appears n times in S, then it appears 2n times in S UNION S.

•Thus S UNION S <> S in general.

Page 15: Databases 1 Seventh lecture. Topics of the lecture Extended relational algebra Normalization Normal forms 2

15

The Extended Algebra1. DELTA = eliminate duplicates from

bags.2. TAU = sort tuples.3. Extended projection : arithmetic,

duplication of columns.4. GAMMA = grouping and aggregation.5. OUTERJOIN: avoids “dangling tuples” =

tuples that do not join with anything.

Page 16: Databases 1 Seventh lecture. Topics of the lecture Extended relational algebra Normalization Normal forms 2

16

Duplicate Elimination

•R1 := DELTA(R2).•R1 consists of one copy of each tuple that

appears in R2 one or more times.

Page 17: Databases 1 Seventh lecture. Topics of the lecture Extended relational algebra Normalization Normal forms 2

17

Example: Duplicate Elimination

R = A B

1 23 41 2

DELTA(R) = A B

1 23 4

Page 18: Databases 1 Seventh lecture. Topics of the lecture Extended relational algebra Normalization Normal forms 2

18

Sorting•R1 := TAUL (R2).

▫L is a list of some of the attributes of R2.

•R1 is the list of tuples of R2 sorted first on the value of the first attribute on L, then on the second attribute of L, and so on.▫Break ties arbitrarily.

•TAU is the only operator whose result is neither a set nor a bag.

•ORDER BY in SQL

Page 19: Databases 1 Seventh lecture. Topics of the lecture Extended relational algebra Normalization Normal forms 2

19

Example: SortingR = A B

1 23 45 2

TAUB (R) = [(5,2), (1,2), (3,4)]

Page 20: Databases 1 Seventh lecture. Topics of the lecture Extended relational algebra Normalization Normal forms 2

20

Extended Projection

• Using the same PROJL operator, we allow the list L to contain arbitrary expressions involving attributes, for example:

1. Arithmetic on attributes, e.g., A+B.2. Duplicate occurrences of the same

attribute.

Page 21: Databases 1 Seventh lecture. Topics of the lecture Extended relational algebra Normalization Normal forms 2

21

Example: Extended ProjectionR = A B

1 23 4

PROJA+B,A,A (R) = A+B A1 A2

3 1 17 3 3

Page 22: Databases 1 Seventh lecture. Topics of the lecture Extended relational algebra Normalization Normal forms 2

22

Aggregation Operators

•Aggregation operators are not operators of relational algebra.

•Rather, they apply to entire columns of a table and produce a single result.

•The most important examples: SUM, AVG, COUNT, MIN, and MAX.

Page 23: Databases 1 Seventh lecture. Topics of the lecture Extended relational algebra Normalization Normal forms 2

23

Example: AggregationR = A B

1 33 43 2

SUM(A) = 7COUNT(A) = 3MAX(B) = 4AVG(B) = 3

Page 24: Databases 1 Seventh lecture. Topics of the lecture Extended relational algebra Normalization Normal forms 2

24

Grouping Operator

• R1 := GAMMAL (R2). L is a list of elements that are either:

1. Individual (grouping ) attributes.2. AGG(A ), where AGG is one of the

aggregation operators and A is an attribute.

Page 25: Databases 1 Seventh lecture. Topics of the lecture Extended relational algebra Normalization Normal forms 2

25

Applying GAMMAL(R)•Group R according to all the grouping

attributes on list L.▫That is, form one group for each distinct list of

values for those attributes in R.•Within each group, compute AGG(A ) for

each aggregation on list L.•Result has grouping attributes and

aggregations as attributes. One tuple for each list of values for the grouping attributes and their group’s aggregations.

Page 26: Databases 1 Seventh lecture. Topics of the lecture Extended relational algebra Normalization Normal forms 2

26

Example: Grouping/Aggregation

R = A B C

1 2 34 5 61 2 5

GAMMAA,B,AVG(C) (R) = ??

First, group R :

A B C

1 2 31 2 54 5 6

Then, average C withingroups:

A B AVG(C)

1 2 44 5 6

Page 27: Databases 1 Seventh lecture. Topics of the lecture Extended relational algebra Normalization Normal forms 2

27

Outerjoin•Suppose we join R JOINC S.•A tuple of R that has no tuple of S with

which it joins is said to be dangling.▫Similarly for a tuple of S.

•Outerjoin preserves dangling tuples by padding them with a special NULL symbol in the result.

Page 28: Databases 1 Seventh lecture. Topics of the lecture Extended relational algebra Normalization Normal forms 2

28

Example: OuterjoinR = A B S = B C

1 2 2 34 5 6 7

(1,2) joins with (2,3), but the other two tuplesare dangling.

R OUTERJOIN S = A B C

1 2 34 5 NULLNULL 6 7

Page 29: Databases 1 Seventh lecture. Topics of the lecture Extended relational algebra Normalization Normal forms 2

29

Normalization: Anomalies

•Goal of relational schema design is to avoid anomalies and redundancy.▫Update anomaly : one occurrence of a fact

is changed, but not all occurrences.▫Deletion anomaly : valid fact is lost when a

tuple is deleted.

Page 30: Databases 1 Seventh lecture. Topics of the lecture Extended relational algebra Normalization Normal forms 2

30

Example of Bad Design

Data is redundant, because each of the ???’s can be figured out by using the FD’s name -> addr favBeer and beersLiked -> manf.

name addr BeersLiked manf FavBeer

Janeway Voyager Bud A.B. WickedAle

Janeway ??? WickedAle Pete’s ???

Spock Enterprise Bud ??? Bud

Drinkers(name, addr, beersLiked, manf, favBeer)

Page 31: Databases 1 Seventh lecture. Topics of the lecture Extended relational algebra Normalization Normal forms 2

31

This Bad Design AlsoExhibits Anomalies

• Update anomaly: if Janeway is transferred to Intrepid, will we remember to change each of her tuples?• Deletion anomaly: If nobody likes Bud, we lose track of the fact that Anheuser-Busch manufactures Bud.

name addr BeersLiked manf FavBeer

Janeway Voyager Bud A.B. WickedAle

Janeway Voyager WickedAle Pete’s WickedAle

Spock Enterprise Bud A.B. Bud

Page 32: Databases 1 Seventh lecture. Topics of the lecture Extended relational algebra Normalization Normal forms 2

32

Boyce-Codd Normal Form

•We say a relation R is in BCNF : if whenever X ->A is a nontrivial FD that holds in R, X is a superkey.▫Remember: nontrivial means A is not a

member of set X.▫Remember, a superkey is any superset of a

key (not necessarily a proper superset).

Page 33: Databases 1 Seventh lecture. Topics of the lecture Extended relational algebra Normalization Normal forms 2

33

Example• Drinkers(name, addr, beersLiked, manf, favBeer)• FD’s: name->addr favBeer, beersLiked->manf•Only key is {name, beersLiked}.•In each FD, the left side is not a

superkey.•Any one of these FD’s shows Drinkers is

not in BCNF

Page 34: Databases 1 Seventh lecture. Topics of the lecture Extended relational algebra Normalization Normal forms 2

34

Another Example•Beers(name, manf, manfAddr)•FD’s: name->manf, manf->manfAddr•Only key is {name}.•name->manf does not violate BCNF, but

manf->manfAddr does.

Page 35: Databases 1 Seventh lecture. Topics of the lecture Extended relational algebra Normalization Normal forms 2

35

Decomposition into BCNF•Given: relation R with FD’s F.•Look among the given FD’s for a BCNF

violation X ->B.▫If any FD following from F violates

BCNF, then there will surely be an FD in F itself that violates BCNF.

•Compute X +.▫Not all attributes, or else X is a superkey.

Page 36: Databases 1 Seventh lecture. Topics of the lecture Extended relational algebra Normalization Normal forms 2

36

Decompose R Using X -> B

• Replace R by relations with schemas:1. R1 = X +.

2. R2 = (R – X +) U X.

Project given FD’s F onto the two new relations.

1. Compute the closure of F = all nontrivial FD’s that follow from F.

2. Use only those FD’s whose attributes are all in R1 or all in R2.

Page 37: Databases 1 Seventh lecture. Topics of the lecture Extended relational algebra Normalization Normal forms 2

37

Decomposition Picture

R-X + X X +-X

R2

R1

R

Page 38: Databases 1 Seventh lecture. Topics of the lecture Extended relational algebra Normalization Normal forms 2

38

Example• Drinkers(name, addr, beersLiked, manf,

favBeer)• F = name->addr, name -> favBeer,

beersLiked->manf• Pick BCNF violation name->addr.• Close the left side: {name}+ = {name, addr,

favBeer}.• Decomposed relations:

1. Drinkers1(name, addr, favBeer)2. Drinkers2(name, beersLiked, manf)

Page 39: Databases 1 Seventh lecture. Topics of the lecture Extended relational algebra Normalization Normal forms 2

39

Example, Continued•We are not done; we need to check

Drinkers1 and Drinkers2 for BCNF.•Projecting FD’s is complex in general,

easy here.•For Drinkers1(name, addr, favBeer),

relevant FD’s are name->addr and name->favBeer.▫Thus, name is the only key and Drinkers1 is

in BCNF.

Page 40: Databases 1 Seventh lecture. Topics of the lecture Extended relational algebra Normalization Normal forms 2

40

Example, Continued• For Drinkers2(name, beersLiked, manf),

the only FD is beersLiked->manf, and the only key is {name, beersLiked}.

▫ Violation of BCNF.• beersLiked+ = {beersLiked, manf}, so

we decompose Drinkers2 into:1. Drinkers3(beersLiked, manf)2. Drinkers4(name, beersLiked)

Page 41: Databases 1 Seventh lecture. Topics of the lecture Extended relational algebra Normalization Normal forms 2

41

Example, Concluded

• The resulting decomposition of Drinkers :

1. Drinkers1(name, addr, favBeer)2. Drinkers3(beersLiked, manf)3. Drinkers4(name, beersLiked)

Notice: Drinkers1 tells us about drinkers, Drinkers3 tells us about beers, and Drinkers4 tells us the relationship between drinkers and the beers they like.

Page 42: Databases 1 Seventh lecture. Topics of the lecture Extended relational algebra Normalization Normal forms 2

42

Third Normal Form - Motivation

•There is one structure of FD’s that causes trouble when we decompose.

•AB ->C and C ->B.▫Example: A = street address, B = city,

C = zip code.•There are two keys, {A,B } and {A,C }.•C ->B is a BCNF violation, so we must

decompose into AC, BC.

Page 43: Databases 1 Seventh lecture. Topics of the lecture Extended relational algebra Normalization Normal forms 2

43

We Cannot Enforce FD’s

•The problem is that if we use AC and BC as our database schema, we cannot enforce the FD AB ->C by checking FD’s in these decomposed relations.

•Example with A = street, B = city, and C = zip on the next slide.

Page 44: Databases 1 Seventh lecture. Topics of the lecture Extended relational algebra Normalization Normal forms 2

44

An Unenforceable FD street zip

545 Tech Sq. 02138545 Tech Sq. 02139

city zip

Cambridge 02138Cambridge 02139

Join tuples with equal zip codes.

street city zip

545 Tech Sq. Cambridge 02138545 Tech Sq. Cambridge 02139

Although no FD’s were violated in the decomposed relations,FD street city -> zip is violated by the database as a whole.

Page 45: Databases 1 Seventh lecture. Topics of the lecture Extended relational algebra Normalization Normal forms 2

45

3NF Let’s Us Avoid This Problem

•3rd Normal Form (3NF) modifies the BCNF condition so we do not have to decompose in this problem situation.

•An attribute is prime if it is a member of any key.

•X ->A violates 3NF if and only if X is not a superkey, and also A is not prime.

Page 46: Databases 1 Seventh lecture. Topics of the lecture Extended relational algebra Normalization Normal forms 2

46

Example

•In our problem situation with FD’s AB ->C and C ->B, we have keys AB and AC.

•Thus A, B, and C are each prime.•Although C ->B violates BCNF, it does not

violate 3NF.

Page 47: Databases 1 Seventh lecture. Topics of the lecture Extended relational algebra Normalization Normal forms 2

47

What 3NF and BCNF Give You• There are two important properties of a

decomposition:1. Recovery : it should be possible to

project the original relations onto the decomposed schema, and then reconstruct the original.

2. Dependency preservation : it should be possible to check in the projected relations whether all the given FD’s are satisfied.

Page 48: Databases 1 Seventh lecture. Topics of the lecture Extended relational algebra Normalization Normal forms 2

48

3NF and BCNF, Continued•We can get (1) with a BCNF decompsition.

▫Explanation needs to wait for relational algebra.

•We can get both (1) and (2) with a 3NF decomposition.

•But we can’t always get (1) and (2) with a BCNF decomposition.▫street-city-zip is an example.