design theory | eecs-3421m: introductioystems | winter ... · the designer presribes them by naming...

49
1.1 Design Theory Design Theory 1. Keys & FDs 2. The Normal Forms 3. Reasoning with FDs 4. Normalization

Upload: others

Post on 30-May-2020

2 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Design Theory | EECS-3421M: Introductioystems | Winter ... · The designer presribes them by naming keys. From real-world “constraints” that the designer knows ... We can break

1 . 1

Design TheoryDesign Theory1. Keys & FDs2. The Normal Forms3. Reasoning with FDs4. Normalization

Page 2: Design Theory | EECS-3421M: Introductioystems | Winter ... · The designer presribes them by naming keys. From real-world “constraints” that the designer knows ... We can break

1 . 2

Table of ContentsTable of ContentsDesign Theory 0

Table of Contents 0/1Parke Godfrey 0/2Acknowledgments 0/3

1. Keys & FDs 1

Notation: functionally determines 1/1Notation: keys & superkeys 1/2Notation: functional dependencies 1/3Canonical form 1/4Shorthand for FDs 1/5

Page 3: Design Theory | EECS-3421M: Introductioystems | Winter ... · The designer presribes them by naming keys. From real-world “constraints” that the designer knows ... We can break

1 . 3

Parke GodfreyParke Godfrey2016-10-05 initial [v1] 2016-10-17 [v2]

Page 4: Design Theory | EECS-3421M: Introductioystems | Winter ... · The designer presribes them by naming keys. From real-world “constraints” that the designer knows ... We can break

1 . 4

AcknowledgmentsAcknowledgmentsThanks

to Jeffrey D. Ullman for initial slidedeck

to Jarek Szlichta for the slidedeck with significant refinements on whichthis is derived

Page 5: Design Theory | EECS-3421M: Introductioystems | Winter ... · The designer presribes them by naming keys. From real-world “constraints” that the designer knows ... We can break

2 . 1

1. Keys & FDs1. Keys & FDs

Page 6: Design Theory | EECS-3421M: Introductioystems | Winter ... · The designer presribes them by naming keys. From real-world “constraints” that the designer knows ... We can break

2 . 2

Notation: Notation: functionally determinesfunctionally determinesLet be the set of attr's of table .

If subset of attr's is “the” key of , then there is atmost one tuple with given values for the attr's in in(any instance of) table .

Another way to think of this is that, given values for ,there is a distinct value for each attr. in .

In this case, we say that functionally determines .We denote this by

Page 7: Design Theory | EECS-3421M: Introductioystems | Winter ... · The designer presribes them by naming keys. From real-world “constraints” that the designer knows ... We can break

2 . 3

Notation: Notation: keys & superkeyskeys & superkeysGiven — the set of all attr's of table — we call anysubset such that a superkey of table .

If no proper subset of a superkey for is also asuperkey for — that is, — then wecall a key of table .

Page 8: Design Theory | EECS-3421M: Introductioystems | Winter ... · The designer presribes them by naming keys. From real-world “constraints” that the designer knows ... We can break

Notation: Notation: functional dependenciesfunctional dependenciesWe might happen to know that, in some domain,

holds, where , but ; that is,

.

Thus, is not a superkey of ! But such things as

will be important.

We will call a functional dependency (“FD”).

Note. Call an FD a superkey FD if its left-hand side is

a superkey; call it a key FD if its LHS is a key.

Not all FDs are superkey FDs!

Page 9: Design Theory | EECS-3421M: Introductioystems | Winter ... · The designer presribes them by naming keys. From real-world “constraints” that the designer knows ... We can break

2 . 42 . 5

Canonical formCanonical formsplitting right-hand sidessplitting right-hand sidesConsider where . Then thatFD is equivalent to the set of FDs

We generally express FD’s with singleton right-handsides.

There is no splitting rule for left-hand sides! Thatwould be incorrect.

Page 10: Design Theory | EECS-3421M: Introductioystems | Winter ... · The designer presribes them by naming keys. From real-world “constraints” that the designer knows ... We can break

2 . 6

Shorthand for FDsShorthand for FDsAs shorthand, instead of using “{”'s and “}”'s everywhere,when we are using single letters for attr's, we just mungethem together.

E.g., we write as .

Page 11: Design Theory | EECS-3421M: Introductioystems | Winter ... · The designer presribes them by naming keys. From real-world “constraints” that the designer knows ... We can break

3 . 1

Example: Drinker & FDsExample: Drinker & FDs

Here, we mean the drinker ( ) likes that , and it

is manufactured by .

Reasonable FDs to assert:

Equivalent to

Page 12: Design Theory | EECS-3421M: Introductioystems | Winter ... · The designer presribes them by naming keys. From real-world “constraints” that the designer knows ... We can break

3 . 2

Example: key of DrinkerExample: key of Drinker is a key of because neither

nor is a superkey.

In this case, there are no other keys.

But there are lots of superkeys! Namely, any supersetof .

Page 13: Design Theory | EECS-3421M: Introductioystems | Winter ... · The designer presribes them by naming keys. From real-world “constraints” that the designer knows ... We can break

3 . 3

VisualizingVisualizingWe sometimes draw out FDs to see what is going on.

N A B M F

Page 14: Design Theory | EECS-3421M: Introductioystems | Winter ... · The designer presribes them by naming keys. From real-world “constraints” that the designer knows ... We can break

4

Where do FDs come from?Where do FDs come from?The designer presribes them by naming keys.

From real-world “constraints” that the designer knowsand prescribes.

E.g., “no two classes can meet in the same room atthe same time”

Because we may have FDs in addition to the prescribed key FDs, additional key FDs mightexist. We may have to assert our FDs, then deduce the keys by systematic exploration!

Page 15: Design Theory | EECS-3421M: Introductioystems | Winter ... · The designer presribes them by naming keys. From real-world “constraints” that the designer knows ... We can break

5 . 1

Problem: non-key FDsProblem: non-key FDsNon-key FDs are problematic, because they allow for thepotential of anomalies.

our game: Design to have only superkey FDs.

Page 16: Design Theory | EECS-3421M: Introductioystems | Winter ... · The designer presribes them by naming keys. From real-world “constraints” that the designer knows ... We can break

5 . 2

Anomalies / redundanciesAnomalies / redundanciesupdate anomaly: one occurrence of a fact ischanged, but not all occurrences are.

deletion anomaly: a valid fact is lost when a tuple isdeleted.

The goal of relational schema design is to avoid suchanomalies and redundancy.

Non-key FDs cause these problems because theseviolate our “single source of truth” mandate.

Page 17: Design Theory | EECS-3421M: Introductioystems | Winter ... · The designer presribes them by naming keys. From real-world “constraints” that the designer knows ... We can break

5 . 3

Example: anomaliesExample: anomaliesdeletion anomaly.

No drinker likes the beer Bud.

Thus, no tuple appears in with .

Then we do not have the information that and .

insertion anomaly.

Say there is a tuple in with and (with, say,

). correct

Someone accidentally adds another tuple later with and

(with, say, ). incorrect

Note. Our key of has not been violated.

Who manufactures Bud?

Page 18: Design Theory | EECS-3421M: Introductioystems | Winter ... · The designer presribes them by naming keys. From real-world “constraints” that the designer knows ... We can break

6 . 1

2. The Normal Forms2. The Normal Forms

Page 19: Design Theory | EECS-3421M: Introductioystems | Winter ... · The designer presribes them by naming keys. From real-world “constraints” that the designer knows ... We can break

6 . 2

First passFirst passWhat the normal forms doWhat the normal forms do1. 1NF: Every table (relation) has a key prescribed.

(Trivial to achieve. Why?)2. 2NF: Every table is in 1NF and no table has a partial-

key dependency.3. 3NF: Every table is in 2NF and no table has a

transitive dependency.4. BCNF: Every table is in 3NF and no table has a back

dependency.

If we can achieve BCNF, we should be good to go!

Page 20: Design Theory | EECS-3421M: Introductioystems | Winter ... · The designer presribes them by naming keys. From real-world “constraints” that the designer knows ... We can break

6 . 3

Partial key dependencies (Partial key dependencies ( 2NF)2NF)Consider ( ).

S C G N

“The” key FD is clearly .

We also have the non-key FD .

We call this a partial-key dependency because its

LHS is a proper subset of a key.

E.g., .

Page 21: Design Theory | EECS-3421M: Introductioystems | Winter ... · The designer presribes them by naming keys. From real-world “constraints” that the designer knows ... We can break

Transitive dependencies (Transitive dependencies ( 3NF)3NF)Consider ( ).

E N L W

“The” key FD is .

We also have the non-key FD .

We call this a transitive dependency because its

LHS and RHS are not subsets of any keys.

E.g., and .

Page 22: Design Theory | EECS-3421M: Introductioystems | Winter ... · The designer presribes them by naming keys. From real-world “constraints” that the designer knows ... We can break

6 . 46 . 5

Back dependencies (Back dependencies ( BCNF)BCNF)Consider ( ).

S F C G

“The” key FD is .

We also have the non-key FD .

We call this a back dependency because its LHS

seems to not be a subset of a key but its RHS is

part of a key.

E.g., .

Page 23: Design Theory | EECS-3421M: Introductioystems | Winter ... · The designer presribes them by naming keys. From real-world “constraints” that the designer knows ... We can break

7 . 1

Second passSecond passFormally defining thingsFormally defining thingsBut your “definitions” above seem fuzzy!

“Seems to not be a subset of a key?!”

The examples above give you a feel for what is going on.

Why are they not formal definitions?

We did not say what the situation is when

the LHS or RHS of a non-key FD overlaps with a key,

orthere is more than one key.

Page 24: Design Theory | EECS-3421M: Introductioystems | Winter ... · The designer presribes them by naming keys. From real-world “constraints” that the designer knows ... We can break

7 . 2

What were the keys in What were the keys in ??Well, . That was stated!

But also !

But then… wouldn't be a partial-keydependency?!

Page 25: Design Theory | EECS-3421M: Introductioystems | Winter ... · The designer presribes them by naming keys. From real-world “constraints” that the designer knows ... We can break

Formal definitionsFormal definitions1NF: Each attr. is an elementary type. A key is definedfor each relation.2NF: Whenever holds in and ,

is prime, or is not a proper subset of any key of .

3NF: Whenever holds in and , is prime, or is a superkey of .

BCNF: Whenever holds in and , is a superkey of .

An attr. is prime if it is part of any key of . E.g., in .

Page 26: Design Theory | EECS-3421M: Introductioystems | Winter ... · The designer presribes them by naming keys. From real-world “constraints” that the designer knows ... We can break

7 . 38 . 1

3. Reasoning with FDs3. Reasoning with FDsSome FDs that hold for a relation may be implicit.

That is, an implicit FD may logically follow from the set ofknown FDs.

We will need to infer the implicit ones. For example, weneed to know a relation's keys to test whether it is inBCNF.

Is this hard? Well…yes and no.

Page 27: Design Theory | EECS-3421M: Introductioystems | Winter ... · The designer presribes them by naming keys. From real-world “constraints” that the designer knows ... We can break

8 . 2

Why might it be hard?Why might it be hard?How many different keys can a single-key rel'nHow many different keys can a single-key rel'n

have?have?

Exponential!

Consider a rel'n with attr's. Then, there are possible

different keys.

How many possible keys of length ?

Page 28: Design Theory | EECS-3421M: Introductioystems | Winter ... · The designer presribes them by naming keys. From real-world “constraints” that the designer knows ... We can break

8 . 3

Why might it be hard?Why might it be hard?How many keys might a rel'n have simultaneously?How many keys might a rel'n have simultaneously?

There can be lots of FDs! And lots of key FDs too.

Consider I know that , , …, , .

There are keys!

There can be an exponential number of key FDs.

Page 29: Design Theory | EECS-3421M: Introductioystems | Winter ... · The designer presribes them by naming keys. From real-world “constraints” that the designer knows ... We can break

8 . 4

What is easy?What is easy?From a set of attr's , I can find the closure, , of thatset of attr's easily.

(In fact, there is a linear runtime algorithm known forthis!)

Page 30: Design Theory | EECS-3421M: Introductioystems | Winter ... · The designer presribes them by naming keys. From real-world “constraints” that the designer knows ... We can break

9 . 1

FD axiomatizationFD axiomatizationA sound and complete set of axioms for FDs is asfollows.

1. Reflexivity. if .

These are called trivial FDs.

2. Augmentation. If then for any .

3. Transitivity. If and then .

Page 31: Design Theory | EECS-3421M: Introductioystems | Winter ... · The designer presribes them by naming keys. From real-world “constraints” that the designer knows ... We can break

9 . 2

Computing Computing ��Computing the closure of a set of attr's is just thefixpoint with respect to transitivity then.

Let .

Define .

where .

Then for which .

Page 32: Design Theory | EECS-3421M: Introductioystems | Winter ... · The designer presribes them by naming keys. From real-world “constraints” that the designer knows ... We can break

9 . 3

Ever need to find Ever need to find allall keys of a rel'n? keys of a rel'n?

Unfortunately, we might have to.

For instance, to check a schema for meeting normalform.

See Exercise #9 from (and ).Exercises for Study with answers

Page 33: Design Theory | EECS-3421M: Introductioystems | Winter ... · The designer presribes them by naming keys. From real-world “constraints” that the designer knows ... We can break

10 . 1

4. Normalization4. NormalizationGoal: A “correct” schema in BCNF.

Or in 3NF, if that is not possible.

So…how to achieve BCNF (or 3NF)?

Page 34: Design Theory | EECS-3421M: Introductioystems | Winter ... · The designer presribes them by naming keys. From real-world “constraints” that the designer knows ... We can break

10 . 2

Two approachesTwo approachesdecomposition: a top-down approach

Start with our schema and refine it.

synthesis: a bottom-up approach

Start with the prescribed FDs and create a schemafrom them.

Page 35: Design Theory | EECS-3421M: Introductioystems | Winter ... · The designer presribes them by naming keys. From real-world “constraints” that the designer knows ... We can break

11 . 1

DecompositionDecompositionMethod Sketch

1. Find a problematic FD; i.e., one that breaks BCNF.

2. Make the problematic FD “go away”. How?

Let the FD be key of its own rel'n.

That is, split the non-BCNF rel'n into two rel'ns in a

lossless way.

3. Repeat until no more problematic FDs!

See Exercise #16 from (and ).Exercises for Study with answers

Page 36: Design Theory | EECS-3421M: Introductioystems | Winter ... · The designer presribes them by naming keys. From real-world “constraints” that the designer knows ... We can break

11 . 2

Two goalsTwo goals1. lossless decomposition

We must ensure that the final schema can“reproduce” our original schema. That nothing islost.

2. dependency preservation

All of our FDs are ensured by the final schema (byrel'n's keys).

Page 37: Design Theory | EECS-3421M: Introductioystems | Winter ... · The designer presribes them by naming keys. From real-world “constraints” that the designer knows ... We can break

11 . 3

Lossless join decompositionLossless join decompositionSay we have a rel'n schema of and an FD

that violates BCNF for .

We can break into two rel'ns:

1.

This is just the “FD” itself!

Its key can be , the LHS of the FD.

2.

And the rest of what is left over from

…repeating the LHS of the FD!

This will server as a foreign key from 2) to 1).

The key can be whatever was key for .

Page 38: Design Theory | EECS-3421M: Introductioystems | Winter ... · The designer presribes them by naming keys. From real-world “constraints” that the designer knows ... We can break

11 . 4

Lossless join decomposition (general)Lossless join decomposition (general)Given and an FD violating — assume that

is non-trivial, that — replace bytwo rel'ns:

1.

2.

Page 39: Design Theory | EECS-3421M: Introductioystems | Winter ... · The designer presribes them by naming keys. From real-world “constraints” that the designer knows ... We can break

11 . 5

What goes wrong if not What goes wrong if not losslesslossless??We cannot reproduce the “database” with respect to theoriginal schema from our decomposed schema.

That is, if we “joined” our two decomposed rel'ns, wemight not recover the original rel'n.

Page 40: Design Theory | EECS-3421M: Introductioystems | Winter ... · The designer presribes them by naming keys. From real-world “constraints” that the designer knows ... We can break

11 . 6

Example of a bad decompositionExample of a bad decompositionConsider with no FDS and I break it into and . Let our table be

A B C1 2 34 2 5

Page 41: Design Theory | EECS-3421M: Introductioystems | Winter ... · The designer presribes them by naming keys. From real-world “constraints” that the designer knows ... We can break

11 . 7

Example of a bad decomposition (2)Example of a bad decomposition (2)This decomposes into

A B1 24 2

 

and

B C2 32 5

Page 42: Design Theory | EECS-3421M: Introductioystems | Winter ... · The designer presribes them by naming keys. From real-world “constraints” that the designer knows ... We can break

11 . 8

Example of a bad decomposition (3)Example of a bad decomposition (3)But “joining” these back together ( ) does notgive us the same thing that was in !

A B C1 2 31 2 54 2 34 2 5

The extra resulting tuples here from the “lossy” join are called spurious.

Page 43: Design Theory | EECS-3421M: Introductioystems | Winter ... · The designer presribes them by naming keys. From real-world “constraints” that the designer knows ... We can break

12 . 1

But what about But what about dependency preservationdependency preservation??

good news

Lossless join decomposition steps can always get useventually to BCNF!

bad news

the resulting schema may not be dependencypreserving.

See Exercise #17 from (and ).Exercises for Study with answers

Page 44: Design Theory | EECS-3421M: Introductioystems | Winter ... · The designer presribes them by naming keys. From real-world “constraints” that the designer knows ... We can break

12 . 2

DecompositionDecompositionRevised Method

1. Find a problematic FD for a rel'n, decompose the rel'nlosslessly with respect to the FD.

2. Repeat until no more problematic FDs!

3. Add back any FDs that are not covered in the resultingschema as rel'ns.

Page 45: Design Theory | EECS-3421M: Introductioystems | Winter ... · The designer presribes them by naming keys. From real-world “constraints” that the designer knows ... We can break

12 . 3

Can we always have BCNF?Can we always have BCNF?The method above often works, but does not always.

What can go wrong? Some of the non-covered FDs thatwe add back in as rel'ns may not be in BCNF!

Well, we could decompose an added, non-BCNFrel'n…But then the corresponding FD is not covered again!Stuck.

The result does arrive always to a 3NF, dependencypreserving schema!

Page 46: Design Theory | EECS-3421M: Introductioystems | Winter ... · The designer presribes them by naming keys. From real-world “constraints” that the designer knows ... We can break

12 . 4

Is there just Is there just oneone lossless decomposition? lossless decomposition?

Of course not! The decomposition depends on the orderof the decomposition steps we apply.

There may be an exponential number of lossless-join

decompositions.

One of them might be BCNF and dependency preserving

(after we add back in the non-covered)!

But finding this one might be hard.

And it might not even exist!

Page 47: Design Theory | EECS-3421M: Introductioystems | Winter ... · The designer presribes them by naming keys. From real-world “constraints” that the designer knows ... We can break

13 . 1

Synthesis MethodSynthesis MethodFind a minimal basis of the set of declared (explicit) FDs.

That's it!

Page 48: Design Theory | EECS-3421M: Introductioystems | Winter ... · The designer presribes them by naming keys. From real-world “constraints” that the designer knows ... We can break

13 . 2

Is polynomial!Is polynomial!It is polynomial to find a minimal basis!

The resulting schema is in 3NF and it is dependency

preserving.

Note. There is no notion of “lossless” here; there is no

“original” schema to compare against.

Finding all minimal bases, though, is exponential.

One of them might be in BCNF too!

But this is NP-complete to find.

Page 49: Design Theory | EECS-3421M: Introductioystems | Winter ... · The designer presribes them by naming keys. From real-world “constraints” that the designer knows ... We can break

Design Theory | EECS-3421M: Introduction to Database Systems | Winter 2018 13 . 3

Minimal basisMinimal basis1. Throw away any FD that can be derived from the

others (the remaining ones)

Repeat until no such FD remains.

2. Throw away any attr. on the LHS of an FD if the

resulting FD plus the others can derive the original FD.

Repeat until no such FD remains.

See Exercise #18 from (and ).Exercises for Study with answers