the relational data model - epfllsir...2 3 the relational data model data modeling relational schema...
TRANSCRIPT
1
1
The Relational Data Model
Lecture 6
2
Outline
• Relational Data Model• Functional Dependencies• Logical Schema Design
ReadingChapter 8
2
3
The Relational Data Model
DataModeling
Relational Schema
Physicalstorage
E/R diagrams Tables: column names: attributes rows: tuples
Complexfile organizationand index structures.
Have seenthis in SQL
Have seenthis too
Discuss next
4
Terminology
Name Price Category Manufacturer
gizmo $19.99 gadgets GizmoWorks
Power gizmo $29.99 gadgets GizmoWorks
SingleTouch $149.99 photography Canon
MultiTouch $203.99 household Hitachi
Tuples or rows or records
Attribute namesTable name or relation name
Products:
3
5
SchemasRelational Schema:
– Relation name plus attribute names– E.g. Product(Name, Price, Category, Manufacturer)– In practice we add the domain for each attribute
Database Schema– Set of relational schemas– E.g. Product(Name, Price, Category, Manufacturer),
Company(Name, Address, Phone), . . . . . . .
6
Instances• Relational schema = R(A1,…,Ak)
Instance = relation with k attributes (of “type” R)– values of corresponding domains
• Database schema = R1(…), R2(…), …, Rn(…)Instance = n relations, of types R1, R2, ..., Rn
This is all mathematics, not to be confused with SQL tables !(What's a difference?)
4
7
Example
Name Price Category Manufacturer
gizmo $19.99 gadgets GizmoWorks
Power gizmo $29.99 gadgets GizmoWorks
SingleTouch $149.99 photography Canon
MultiTouch $203.99 household Hitachi
Relational schema:Product(Name, Price, Category, Manufacturer)Instance:
8
First Normal Form (1NF)• A database schema is in First Normal
Form if all tables are flat
3.9Carol
3.7Bob
3.8Alice
CoursesGPAName
OS
DB
Math
OS
DB
OS
Math
Student
3.9Carol3.7Bob3.8Alice
GPAName
Student
Course
OSDBMath
OSCarolOSAliceDBBob
AliceCarolAliceStudent Course
DBMathMath
Takes Course
May needto add keys
5
9
Functional Dependencies
• A form of constraint– hence, part of the schema
• Finding them is part of the database design• Also used in normalizing the relations
• Warning: this is the most abstract, and“hardest” part of the course.
10
Functional DependenciesDefinition:
If two tuples agree on the attributes
then they must also agree on the attributes
Formally:
A1, A2, …, An B1, B2, …, Bm
A1, A2, …, An
B1, B2, …, Bm
6
11
Examples
• EmpID Name, Phone, Position• Position Phone• but Phone Position
EmpID Name Phone PositionE0045 Smith 1234 ClerkE1847 John 9876 SalesrepE1111 Smith 9876 SalesrepE9999 Mary 1234 Lawyer
12
In General• To check A B, erase all other columns
• check if the remaining relation is many-one(called functional in mathematics)
… A … B
X1 Y1
X2 Y2
… …
7
13
Example
EmpID Name Phone PositionE0045 Smith 1234 ClerkE1847 John 9876 SalesrepE1111 Smith 9876 SalesrepE9999 Mary 1234 Lawyer
Position Phone
14
Typical Examples of FDs
Product: name price, manufacturer
Person: ssn name, age
Company: name stockprice, president
8
15
Example
Product(name, category, color, department, price)
name colorcategory departmentcolor, category price
Consider these FDs:
What do they say ?
16
ExampleFD’s are constraints on relations:• On some instances they hold• On others they don’t
99ToysGreenGadgetTweaker
49ToysGreenGadgetGizmo
pricedepartmentcolorcategoryname
Does this instance satisfy all the FDs ?
name colorcategory departmentcolor, category price
9
17
Example
59Office-supp.GreenStationaryGizmo
99ToysBlackGadgetTweaker
49ToysGreenGadgetGizmo
pricedepartmentcolorcategoryname
What about this one ?
name colorcategory departmentcolor, category price
18
Example
If some FDs are satisfied, thenothers are satisfied too
If all these FDs are true:name colorcategory departmentcolor, category price
Then this FD also holds: name, category price
Why ??
10
19
Inference Rules for FD’s
Is equivalent to
Splitting rule and Combining rule
Bm...B1Am...A1
A1, A2, …, An B1, B2, …, Bm
A1, A2, …, An B1A1, A2, …, An B2 . . . . .A1, A2, …, An Bm
20
Inference Rules for FD’s(continued)
Trivial Rule
Why ?Am…A1
where i = 1, 2, ..., n
A1, A2, …, An Ai
11
21
Inference Rules for FD’s(continued)
Transitive Closure Rule
If
and
then
Why ?
A1, A2, …, An B1, B2, …, Bm
B1, B2, …, Bm C1, C2, …, Cp
A1, A2, …, An C1, C2, …, Cp
22
...C1 CpBm…B1Am…A1
12
23
Example (continued)
Start from the following FDs:
Infer the following FDs:
1. name color2. category department3. color, category price
8. name, category price7. name, category color, category6. name, category category5. name, category color4. name, category name
Which Ruledid we apply ?Inferred FD
24
Example (continued)Answers:
Transitivity on 3, 78. name, category priceSplit/combine on 5, 67. name, category color, categoryTrivial rule6. name, category categoryTransitivity on 4, 15. name, category colorTrivial rule4. name, category name
Which Ruledid we apply ?Inferred FD
1. name color2. category department3. color, category price
13
25
Another Example• Enrollment(student, major, course, room, time)
student majormajor, course roomcourse time
What else can we infer ?
26
Another Rule
If
then
Augmentation follows from trivial rules and transitivityHow ?
A1, A2, …, An B
A1, A2, …, An , C1, C2, …, Cp B
Augmentation
14
27
Problem: infer ALL FDsGiven a set of FDs, infer all possible FDs
How to proceed ?• Try all possible FDs, apply all 3 rules
– E.g. R(A, B, C, D): how many FDs are possible ?• Drop trivial FDs, drop augmented FDs
– Still way too many• Better: use the Closure Algorithm (next)
28
Closure of a set of AttributesGiven a set of attributes A1, …, An
The closure, {A1, …, An}+ , is the set of attributes Bs.t. A1, …, An B
name colorcategory departmentcolor, category price
Example:
Closures: name+ = {name, color} {name, category}+ = {name, category, color, department, price} color+ = {color}
15
29
Closure AlgorithmStart with X={A1, …, An}.
Repeat until X doesn’t change do:
if B1, …, Bn C is a FD and B1, …, Bn are all in X then add C to X. {name, category}+ =
{name, category, color, department, price}
name colorcategory departmentcolor, category price
Example:
30
Example
Compute {A,B}+ X = {A, B, }
Compute {A, F}+ X = {A, F, }
R(A,B,C,D,E,F) A, B CA, D EB DA, F B
16
31
Using Closure to Infer ALL FDsA, B CA, D BB D
Example:
Step 1: Compute X+, for every X:
A+ = A, B+ = BD, C+ = C, D+ = DAB+ = ABCD, AC+ = AC, AD+ = ABCDABC+ = ABD+ = ACD+ = ABCD (no need to compute– why ?)BCD+ = BCD, ABCD+ = ABCD
Step 2: Enumerate all FD’s X Y, s.t. Y ⊆ X+ and X∩Y = ∅:
AB CD, ADBC, ABC D, ABD C, ACD B
32
Problem: Finding FDs• Approach 1: During Database Design
– Designer derives them from real-world knowledge ofusers
– Problem: knowledge might not be available• Approach 2: From a Database Instance
– Analyze given database instance and find all FD’ssatisfied by that instance
– Useful if designers don’t get enough information fromusers
– Problem: FDs might be artifical for the given instance
17
33
Find All FDs
020CircuitsEEFrank045DBCSEElsa050JavaCSEDan045DBCSECarol040HWEEAlice020C++CSEBob020C++CSEAlice
RoomCourseDeptStudent
Do all FDsmake sensein practice ?
34
Answer
Course Dept, RoomDept, Room CourseStudent, Dept Course, RoomStudent, Course Dept, RoomStudent, Room Dept, Course
Do all FDsmake sensein practice ?
18
35
Keys
• A key is a set of attributes A1, ..., An s.t. forany other attribute B, we have A1, ..., An B
• A minimal key is a set of attributes which isa key and for which no subset is a key
• Note: book calls them superkey and key
36
Computing Keys
• Compute X+ for all sets X• If X+ = all attributes, then X is a key• List only the minimal keys
Note: there can be many minimal keys !• Example: R(A,B,C), ABC, BCA
Minimal keys: AB and BC
19
37
Examples of Keys• Product(name, price, category, color)
name, category pricecategory color
Keys are: {name, category} and all supersets
• Enrollment(student, address, course, room, time)student addressroom, time coursestudent, course room, time
38
Relational Schema Design(or Logical Schema Design)
Main idea:• Start with some relational schema• Find out its FD’s• Use them to design a better relational
schema
20
39
Data AnomaliesWhen a database is poorly designed we get
anomalies:
Redundancy: data is repeated
Update anomalies: need to change in several places
Delete anomalies: may lose data when we don’t want
40
Relational Schema Design
Anomalies:• Redundancy = repeat data• Update anomalies = Fred moves to “Bellevue”• Deletion anomalies = Joe deletes his phone number:
what is his city ?
Example: Persons with several phones
SSN Name, City
Westfield908-555-2121987-65-4321JoeSeattle206-555-6543123-45-6789FredSeattle206-555-1234123-45-6789FredCityPhoneNumberSSNName
but not SSN PhoneNumber
21
41
Relation DecompositionBreak the relation into two:
Westfield987-65-4321JoeSeattle123-45-6789FredCitySSNName
908-555-2121987-65-4321206-555-6543123-45-6789206-555-1234123-45-6789PhoneNumberSSN
Anomalies have gone:• No more repeated data• Easy to move Fred to “Bellevue” (how ?)• Easy to delete all Joe’s phone number (how ?)
Westfield908-555-2121987-65-4321Joe
Seattle206-555-6543123-45-6789Fred
Seattle206-555-1234123-45-6789Fred
CityPhoneNumberSSNName
42
Relational Schema DesignPersonbuysProduct
name
price name ssn
Conceptual Model:
Relational Model:plus FD’s
Normalization:Eliminates anomalies
22
43
Decompositions in General
R1 = projection of R on A1, ..., An, B1, ..., Bm R2 = projection of R on A1, ..., An, C1, ..., Cp
R(A1, ..., An, B1, ..., Bm, C1, ..., Cp)
R1(A1, ..., An, B1, ..., Bm) R2(A1, ..., An, C1, ..., Cp)
44
Decomposition
• Sometimes it is correct:
Camera19.99GizmoCamera24.99OneClickGadget19.99Gizmo
CategoryPriceName
19.99Gizmo24.99OneClick19.99Gizmo
PriceName
CameraGizmoCameraOneClickGadgetGizmo
CategoryName
Lossless decomposition
23
45
Incorrect Decomposition
• Sometimes it is not:
Camera19.99GizmoCamera24.99OneClickGadget19.99Gizmo
CategoryPriceName
CameraGizmoCameraOneClickGadgetGizmo
CategoryName
Camera19.99Camera24.99Gadget19.99
CategoryPrice
What’sincorrect ??
Lossy decomposition
46
Decompositions in GeneralR(A1, ..., An, B1, ..., Bm, C1, ..., Cp)
If A1, ..., An B1, ..., Bm Then the decomposition is lossless
R1(A1, ..., An, B1, ..., Bm) R2(A1, ..., An, C1, ..., Cp)
Example: name price, hence the first decomposition is lossless
Note: don’t need necessarily A1, ..., An C1, ..., Cp
24
47
Normal FormsFirst Normal Form = all attributes are atomic
Second Normal Form (2NF) = old and obsolete
Third Normal Form (3NF) = this lecture
Boyce Codd Normal Form (BCNF) = this lecture
Others...
48
Boyce-Codd Normal FormA simple condition for removing anomalies from relations:
In English (though a bit vague):
Whenever a set of attributes of R is determining another attribute, it should determine all the attributes of R.
A relation R is in BCNF if:
If A1, ..., An B is a non-trivial dependency
in R , then {A1, ..., An} is a key for R
25
49
BCNF Decomposition Algorithm
A’s OthersB’s
R1
Is there a 2-attribute relation that isnot in BCNF ?
Repeat choose A1, …, Am B1, …, Bn that violates the BNCF condition split R into R1(A1, …, Am, B1, …, Bn) and R2(A1, …, Am, [others]) continue with both R1 and R2Until no more violations
R2
50
Example
What are the dependencies?SSN Name, City
What are the keys?{SSN, PhoneNumber}
Is it in BCNF?
Westfield908-555-1234987-65-4321JoeWestfield908-555-2121987-65-4321JoeSeattle206-555-6543123-45-6789FredSeattle206-555-1234123-45-6789FredCityPhoneNumberSSNName
26
51
Decompose it into BCNF
Westfield987-65-4321JoeSeattle123-45-6789FredCitySSNName
908-555-1234987-65-4321908-555-2121987-65-4321206-555-6543123-45-6789206-555-1234123-45-6789PhoneNumberSSN
SSN Name, City
Let’s check anomalies:• Redundancy ?• Update ?• Delete ?
52
Summary of BCNFDecomposition
Find a dependency that violates the BCNF condition:
A’sOthers B’s
R1 R2
Heuristics: choose B , B , … B “as large as possible”1 2 m
Decompose:
2-attribute relations are BCNF
Continue untilthere are noBCNF violationsleft.
A1, A2, …, An B1, B2, …, Bm
27
53
Example DecompositionPerson(name, SSN, age, hairColor, phoneNumber)
SSN name, ageage hairColor
Decompose in BCNF (in class):
Step 1: find all keys (How ? Compute S+, for various sets S)
Step 2: now decompose
54
Other Example
• R(A,B,C,D) A B, B C
• Key: AD• Violations of BCNF: A B, A C, ABC• Pick A BC: split into R1(A,BC) R2(A,D)• What happens if we pick A B first ?
28
55
Lossless Decompositions A decomposition is lossless if we can recover: R(A,B,C)
R1(A,B) R2(A,C)
R’(A,B,C) should be the same as R(A,B,C)
R’ is in general larger than R. Must ensure R’ = R
Decompose
Recover
56
Lossless Decompositions
• Given R(A,B,C) s.t. AB, thedecomposition into R1(A,B), R2(A,C) islossless
29
57
3NF: A Problem with BCNFUnit Company Product
Unit Company
Unit Product
FD’s: Unit → Company; Company, Product → UnitSo, there is a BCNF violation, and we decompose.
Unit → Company
No FDs
Notice: we loose the FD: Company, Product Unit
58
So What’s the Problem?
Unit Company Product
Unit Company Unit ProductGalaga99 UW Galaga99 databasesBingo UW Bingo databases
No problem so far. All local FD’s are satisfied.Let’s put all the data back into a single table again (anomalies?):
Galaga99 UW databasesBingo UW databases
Violates the dependency: company, product -> unit!
30
59
Solution: 3rd Normal Form (3NF)A simple condition for removing anomalies from relations:
A relation R is in 3rd normal form if :
Whenever there is a nontrivial dependency A1, A2, ..., An → Bfor R , then {A1, A2, ..., An } is a key for R, or B is part of a key.
Tradeoff:BCNF = no anomalies, but may lose some FDs3NF = keeps all FDs, but may have some anomalies