schema refinement - umass amherstavid.cs.umass.edu/courses/445/s2015/lectures/lec13-fds.pdf · 22...

23
1 Schema Refinement Yanlei Diao UMass Amherst Slides Courtesy of R. Ramakrishnan and J. Gehrke

Upload: dodan

Post on 04-Jun-2018

227 views

Category:

Documents


0 download

TRANSCRIPT

1

Schema Refinement

Yanlei Diao UMass Amherst

Slides Courtesy of R. Ramakrishnan and J. Gehrke

2

Revisit a Previous Example

v  Consider relation obtained from Hourly_Emps: §  Hourly_Emps(ssn, name, lot, rating, hrly_wages, hrs_worked) §  Denote the schema by listing all its attributes: SNLRWH

Contract_Emps

name ssn

Employees

Lot

hourly_wages ISA

Hourly_Emps

contractid

hours_worked rating

3

Example (Contd.)

v  Rating (R) determines hourly wages (W): §  Redundant storage §  Update: Can we change W in just the 1st tuple of rating 8? §  Insertion: Insert an employee without knowing the hourly wage for

his rating? Insert the hourly wage for rating 10 with no employee? §  Deletion: What if we have deleted all employees with rating 5.

S N L R W H123-22-3666 Attishoo 48 8 10 40231-31-5368 Smiley 22 8 10 30131-24-3650 Smethurst 35 5 7 30434-26-3751 Guldu 35 5 7 32612-67-4134 Madayan 35 8 10 40

4

Will Two Smaller Tables be Better? S N L R W H123-22-3666 Attishoo 48 8 10 40231-31-5368 Smiley 22 8 10 30131-24-3650 Smethurst 35 5 7 30434-26-3751 Guldu 35 5 7 32612-67-4134 Madayan 35 8 10 40

S N L R H123-22-3666 Attishoo 48 8 40231-31-5368 Smiley 22 8 30131-24-3650 Smethurst 35 5 30434-26-3751 Guldu 35 5 32612-67-4134 Madayan 35 8 40

Hourly_Emps2 R W8 105 7

Wages

5

The Evils of Redundancy v  Redundant storage causes several operation anomalies:

§  Insert and delete anomalies

v  Functional dependencies, a new type of integrity constraint, can be used to identify schemas with such problems.

§  IC’s we have seen: domain constraints, key constraints, foreign key constraints, general constraints (checks or assertions)

§  A new type of IC, functional dependencies, which can be checked using SQL checks or assertions.

6

1. Functional Dependencies (FDs)

v  A functional dependency X à Y holds over relation R if: §  X and Y are two sets of attributes of R; §  ∀ allowable instance r of R:

t1 ∈ r, t2 ∈ r, πX (t1) = πX (t2) implies πY (t1) = πY (t2)

§  E.g., X = {R}, Y = {W}, and R à W

v  An FD holds for all allowable instances of a schema.

v  Key constraint is a special from of FD: §  K is a super key for R means that K à R. §  K à R does not require K to be minimal!

7

FDs in the Hourly_Emps Example

v  Hourly_Emps(ssn, name, office, rating, hrly_wages, hrs_worked) §  Denoted by SNLRWH

v  Some FDs on Hourly_Emps: §  ssn is the key: S à SNLRWH §  rating determines hrly_wages: R à W

8

Reasoning About FDs

v  Given some FDs, we can usually infer additional FDs: §  {ssn à did, did à building} implies ssn à building

v  Given a set of FDs F, closure of F (F+) is the set of all FDs that are implied by F. §  All FDs in F+ hold over the relation R.

9

Axioms and Rules

v  Armstrong’s Axioms (X, Y, Z are sets of attributes): §  Reflexivity: If X ⊆ Y, then Y à X §  Augmentation: If X à Y, then XZ à YZ for any Z §  Transitivity: If X à Y and Y à Z, then X à Z

v  Couple of additional rules (that follow from AA): §  Union: If X à Y and X à Z, then X à YZ §  Decomposition: If X à YZ, then X à Y and X à Z

v  Computing the closure F+ using the axioms/rules: §  Compute for all FD’s. §  Size of closure is exponential in number of attributes!

10

Attribute Closure v  What if we just want to check if a given FD X à Y is in F+? v  Simple algorithm for attribute closure X+:

§  X+ := {X} §  DO if there is U à V in F, s.t. U ⊆ X+, then X+= X+∪V UNTIL no change §  Check if a given FD X à Y is in F+: Simply check if Y ⊆ X+.

v  Does F = {A à B, B à C, C D à E } imply A à E? §  Is A à E in the closure F+? §  Equivalently, is E in A+?

11

2. Normal Forms v  Role of FDs in detecting redundancy: R(A,B,C)

–  Given A à B:

–  No FDs (except key constraints) hold:

v  Normal forms: If a reln does not have certain kinds of FDs, certain redundancy related problems are known not to occur.

No redundancy here.

Two tuples have the same A value will have the same B value!

12

Boyce-Codd Normal Form (BCNF)

v  Rewrite every FD in the form of X à A, X is a set of attributes, A is a single attribute §  Use the decomposition rule

v  Reln R with FDs F is in BCNF if ∀ X à A in F+: 1.  A ∈ X (called a trivial FD), or 2.  X is a superkey (i.e., contains a key) for R.

v  In BCNF, the only non-trivial FDs are key constraints!

13

Boyce-Codd Normal Form (contd.)

v  Can we infer the value marked by ‘?’ §  If X → A, then the relation is not in BCNF §  A reln. in BCNF can’t have X → A

X Y A

x y1 a x y2 ?

v  Relation in BCNF:

§  Every field of every tuple records information that can’t be inferred using FD’s from other fields.

§  No redundancy can be detected using FDs!

14

Another Example

v  When is redundancy possible? §  Reserves( Sailor, Boat, Date, Credit_card) with

S à C, C à S §  Keys are SBD and CBD. §  It is not in BCNF. §  For each reservation of sailor S, same (S, C) is stored.

18

3. Decomposing a Relation Scheme

v  A decomposition of R breaks R into two or more relns s.t. §  Each new reln contains a subset of the attributes of R. §  Every attribute of R appears in at least one new reln.

v  Decompositions should be used only when: §  R has redundancy related problems (not in BCNF), §  We can afford the joins in queries later (performance penalty).

19

Example Decomposition

v  Hourly_Emps (SNLRWH) §  FDs: S à SNLRWH and R à W. §  R à W violates BCNF. §  And it causes repeated (R,W) storage.

v  To fix this, create a relation RW, remove W from the main schema. (SNLRWH) → (SNLRH) and (RW).

20

(1) Lossless Join Decompositions

v  Decomposition of schema R into R1 and R2 is lossless-join w.r.t. a set of FDs F if ∀ reln. instance r that satisfies F:

§  r = π R1 (r) π R2 (r)

v  It is always true that r ⊆ π R1 (r) π R2 (r).

§  A bad decomposition can cause r ⊂ π R1 (r) π R2 (r).

A B C1 2 34 5 67 2 8

A B1 24 57 2

B C2 35 62 8

A B C1 2 34 5 67 2 81 2 87 2 3

21

A Simple Test for Lossless Join

v  Theorem: Decomposition of R into R1 and R2 is lossless-join wrt F iff the F+ contains: §  R1 ∩ R2 à R1 or R1 ∩ R2 à R2 (intersection of R1 and R2 is a (super) key of one of them.)

v  Algorithm for a lossless join is based on the above result: §  If U àV holds over R and violates a BCNF definition, the

decomposition into UV and R - V is lossless-join.

22

(2) Dependency Preserving Decomposition

v  Contracts(Contractid, Supplierid, proJectid, Deptid, Partid, Qty, Value), CSJDPQV, with FDs: §  C is key. §  JP à C: a project buys a given part using a single contract. §  SD à P: a department buys at most one part from a supplier.

v  What are the keys? Is it in BCNF? §  Keys: C, JP, SDJ. §  Not in BCNF due to SDàP

v  Lossless-join BCNF decomposition: SDP, CSJDQV §  Problem: Checking JP à C requires an assertion (using join) §  How to write this assertion in SQL?

23

Dependency Preserving Decomposition

v  The projection of a FD set F onto a decomposed reln R1, FR1 = F+

R1, contains all U àV s.t. §  (i) U, V are both in R1, §  (ii) U àV is in closure F+.

v  Decomposition of R into R1, R2 is dependency preserving if (FR1 UNION FR2 ) + = F +

v  Important to consider F + (not F!) in this definition: §  ABC, A à B, B à C, C à A, decomposed into AB and BC. §  Is this dependency preserving? Is C à A preserved?

24

Algorithm for Decomposition into BCNF

v  Relation R with FDs F. If X à Y violates BCNF, decompose R into R1 = XY and R2 = R – Y. §  For each Ri, compute FRi and check if it is in BCNF. §  If not, pick a FD violating BCNF and keep composing Ri.

v  Repeated application of this process yields a lossless join decomposition into BCNF relations. §  But not necessarily dependency-preserving! So need to check

whether some FDs are lost.

25

Steps of BCNF Decomposition v  Contracts(CSJDPQV), key C, JP à C, SD à P, J à S.

1. Keys. C, JP, DJ. 2. Normal form. Not in BCNF; SD à P and J à S violate BCNF. 3. Decomposition. To deal with SD à P, decompose into SDP, CSJDQV.

n  SDP is in BCNF. But CSJDQV is not because: 1. Projection of FDs and keys. Projection of FDs: keys C and DJ, J à S. 2. Normal form. Not BCNF; J à S violates BCNF. 3. Decomposition. For J à S, decompose CSJDQV into JS and CJDQV.

n  JS is in BCNF. So is CJDQV.

v  If several FDs violate BCNF, the order of “dealing with” them could lead to very different sets of relations!

26

BCNF and Dependency Preservation

v  Is a lossless-join BCNF decomposition dependency-preserving? §  CSJDPQV with key C, JP à C, SD à P and J à S. §  CSJDPQV decomposed into SDP, JS, and CJDQV. §  What about the lost FD, JP à C?

a)  Use assertion. Bad performance on inserts/updates. b)  Adding JPC as a new relation to preserve JP à C introduces

redundancy across relations. In general, the new relation may not be in BCNF.

c)  Don’t decompose. Deal with insertion/deletion anomalies. DBA should examine the tradeoffs.

v  There may not exist a lossless join, dependency-preserving decomposition into BCNF. §  But there is always a lossless join, dependency-preserving decomposition

into 3NF (more see the textbook)…