functional dependencies (autosaved)


Upload: janani2229

Post on 25-Nov-2014


Page 1: Functional Dependencies (Autosaved)

Functional Dependencies

Functional dependencies play a key role in differentiating good database designs from bad database designs.

Basic Concepts

Functional dependencies are constraints on the set of legal relations. They allow us to express facts about the enterprise that we are modeling with our database.

In Chapter we defined the notion of a superkey as follows. Let R be a relation schema. A subset K of R is a superkey of R if, in any legal relation r(R), for all pairs t1 and t2 of tuples in r such that t1 ≠ t2, we have t1[K] ≠ t2[K]. That is, no two tuples in any legal relation r(R) may have the same value on attribute set K.

The notion of functional dependency generalizes the notion of superkey. Consider a relation schema R, and let α ⊆ R and β ⊆ R. The functional dependency

α → β

holds on schema R if, in any legal relation r(R), for all pairs of tuples t1 and t2 in r such that t1[α] = t2[α], it is also the case that t1[β] = t2[β].

Using the functional-dependency notation, we say that K is a superkey of R if K → R. That is, K is a superkey if, whenever t1[K] = t2[K], it is also the case that t1[R] = t2[R] (that is, t1 = t2).

Functional dependencies allow us to express constraints that we cannot express with superkeys. Consider the schema

Loan-info-schema = (loan-number, branch-name, customer-name, amount)

which is a simplification of the Lending-schema that we saw earlier. The set of functional dependencies that we expect to hold on this relation schema is

loan-number → amount

loan-number → branch-name

We would not, however, expect the functional dependency

loan-number → customer-name

to hold, since in general, a given loan can be made to more than one customer (for example, to both members of a husband-wife pair).

We shall use functional dependencies in two ways.

1. To test relations to see whether they are legal under a given set of functional dependencies. If a relation r is legal under a set F of functional dependencies, we say that r satisfies F.

2. To specify constraints on the set of legal relations. We shall thus concern ourselves with only those relations that satisfy a given set of functional dependencies. If we wish to constrain ourselves to relations on schema R that satisfy a set F of functional dependencies, we say that F holds on R.

Let us consider the relation r of Figure to see which functional dependencies are satisfied. Observe that A → C is satisfied. There are two tuples that have an A value of a1. These tuples have the same C value, namely c1. Similarly, the two tuples with an A value of a2 have the same C value, c2. There are no other pairs of distinct tuples that have the same A value. The functional dependency C → A is not satisfied, however. To see that it is not, consider the tuples t1 = (a2, b3, c2, d3) and t2 = (a3, b3, c2, d4). These two tuples have the same C value, c2, but they have different A values, a2 and a3, respectively. Thus, we have found a pair of tuples t1 and t2 such that t1[C] = t2[C], but t1[A] ≠ t2[A].

Figure Sample relation r.

A    B    C    D
a1   b1   c1   d1
a1   b2   c1   d2
a2   b2   c2   d2
a2   b3   c2   d3
a3   b3   c2   d4
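The pairwise definition of satisfaction used above can be checked mechanically. The following is a minimal Python sketch, not part of the original notes: the function name `satisfies` and the dictionary encoding of tuples are illustrative assumptions, and the rows are those of the sample relation r, with the fourth row taken as (a2, b3, c2, d3) to agree with tuple t1 in the text.

```python
from itertools import combinations

def satisfies(rows, lhs, rhs):
    """Return True if every pair of rows that agrees on all attributes
    in lhs also agrees on all attributes in rhs (the FD lhs -> rhs)."""
    for t1, t2 in combinations(rows, 2):
        agree_lhs = all(t1[a] == t2[a] for a in lhs)
        agree_rhs = all(t1[b] == t2[b] for b in rhs)
        if agree_lhs and not agree_rhs:
            return False
    return True

# Sample relation r (hypothetical encoding: one dict per tuple).
r = [
    {"A": "a1", "B": "b1", "C": "c1", "D": "d1"},
    {"A": "a1", "B": "b2", "C": "c1", "D": "d2"},
    {"A": "a2", "B": "b2", "C": "c2", "D": "d2"},
    {"A": "a2", "B": "b3", "C": "c2", "D": "d3"},
    {"A": "a3", "B": "b3", "C": "c2", "D": "d4"},
]

a_to_c = satisfies(r, ["A"], ["C"])   # A -> C is satisfied by r
c_to_a = satisfies(r, ["C"], ["A"])   # C -> A is violated by t1 and t2
a_to_a = satisfies(r, ["A"], ["A"])   # trivial dependencies always hold
```

Note that such a check only tells us whether one particular relation satisfies a dependency; whether the dependency should hold on the schema is a modelling decision.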

Some functional dependencies are said to be trivial because they are satisfied by all relations. For example, A → A is satisfied by all relations involving attribute A. Reading the definition of functional dependency literally, we see that, for all tuples t1 and t2 such that t1[A] = t2[A], it is the case that t1[A] = t2[A]. Similarly, AB → A is satisfied by all relations involving attribute A. In general, a functional dependency of the form α → β is trivial if β ⊆ α.

Figure The loan relation.

loan-number   branch-name   amount
L-17          Downtown      1000
L-23          Redwood       2000
L-15          Perryridge    1500
L-14          Downtown      1500
L-93          Mianus        500
L-11          Round Hill    900
L-29          Pownal        1200
L-16          North Town    1300
L-18          Downtown      2000
L-25          Perryridge    2500
L-10          Brighton      2200

In the loan relation (on Loan-schema) of Figure, we see that the dependency loan-number → amount is satisfied. In contrast to the case of customer-city and customer-street in Customer-schema, we do believe that the real-world enterprise that we are modeling requires each loan to have only one amount. Therefore, we want to require that loan-number → amount be satisfied by the loan relation at all times. In other words, we require that the constraint loan-number → amount hold on Loan-schema.

Closure of a Set of Functional Dependencies

Given a set F of functional dependencies, there are certain other functional dependencies that are logically implied by F. For example: if A → B and B → C, then we can infer that A → C.

The set of all functional dependencies logically implied by F is the closure of F. We denote the closure of F by F+. We can find all of F+ by applying Armstrong’s Axioms:

if β ⊆ α, then α → β (reflexivity)
if α → β, then γα → γβ (augmentation)
if α → β and β → γ, then α → γ (transitivity)

These rules are sound (generate only functional dependencies that actually hold) and complete (generate all functional dependencies that hold).

We can further simplify manual computation of F+ by using the following additional rules.

If α → β holds and α → γ holds, then α → βγ holds (union)
If α → βγ holds, then α → β holds and α → γ holds (decomposition)
If α → β holds and γβ → δ holds, then αγ → δ holds (pseudotransitivity)

Example

R = (A, B, C, G, H, I)

F = {A → B, A → C, CG → H, CG → I, B → H}

Some members of F+:

A → H, by transitivity from A → B and B → H

AG → I, by augmenting A → C with G to get AG → CG, and then transitivity with CG → I

CG → HI, by augmenting CG → I to infer CG → CGI, augmenting CG → H to infer CGI → HI, and then transitivity

Procedure for Computing F+

To compute the closure of a set of functional dependencies F:

F+ = F
repeat
    for each functional dependency f in F+
        apply reflexivity and augmentation rules on f
        add the resulting functional dependencies to F+
    for each pair of functional dependencies f1 and f2 in F+
        if f1 and f2 can be combined using transitivity
            then add the resulting functional dependency to F+
until F+ does not change any further

Closure of Attribute Sets

Given a set of attributes α, define the closure of α under F (denoted by α+) as the set of attributes that are functionally determined by α under F:

α → β is in F+ ⟺ β ⊆ α+

Algorithm to compute α+, the closure of α under F:

result := α;
while (changes to result) do
    for each β → γ in F do
        begin
            if β ⊆ result then result := result ∪ γ
        end

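Example of Attribute Set Closure

R = (A, B, C, G, H, I)

F = {A → B, A → C, CG → H, CG → I, B → H}

(AG)+

1. result = AG
2. result = ABCG (A → C and A → B)
3. result = ABCGH (CG → H and CG ⊆ AGBC)
4. result = ABCGHI (CG → I and CG ⊆ AGBCH)

Is AG a candidate key? (AG)+ includes all attributes of R, so AG is a superkey. Neither A+ = ABCH nor G+ = G includes all attributes of R, so no proper subset of AG is a superkey. Hence AG is a candidate key.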
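The attribute-closure algorithm and the (AG)+ example above can be sketched in a few lines of Python. This is an illustrative sketch rather than a reference implementation: the function name `closure` and the encoding of each dependency as a pair of strings of single-letter attributes are assumptions made here for brevity.

```python
def closure(attrs, fds):
    """Compute the closure of the attribute set attrs under fds.
    fds is a list of (lhs, rhs) pairs; each side is an iterable of
    attribute names (here, strings of single-letter attributes)."""
    result = set(attrs)
    changed = True
    while changed:                      # "while (changes to result)"
        changed = False
        for lhs, rhs in fds:
            # If lhs is contained in result, add rhs to result.
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

R = set("ABCGHI")
F = [("A", "B"), ("A", "C"), ("CG", "H"), ("CG", "I"), ("B", "H")]

ag_plus = closure("AG", F)   # all six attributes of R
a_plus = closure("A", F)     # A, B, C, H
g_plus = closure("G", F)     # G only

# AG is a superkey because (AG)+ = R; it is a candidate key because
# neither proper subset A nor G is itself a superkey.
is_superkey = ag_plus == R
is_candidate_key = is_superkey and a_plus != R and g_plus != R
```

Each pass of the outer loop either adds at least one attribute or terminates, so the loop runs at most |R| + 1 times.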

Uses of Attribute Closure

There are several uses of the attribute closure algorithm:

Testing for superkey: To test if α is a superkey, we compute α+, and check if α+ contains all attributes of R.

Testing functional dependencies: To check if a functional dependency α → β holds (in other words, is in F+), just check if β ⊆ α+. That is, we compute α+ by using attribute closure, and then check if it contains β. This is a simple and cheap test, and very useful.

Computing closure of F: For each γ ⊆ R, we find the closure γ+, and for each S ⊆ γ+, we output a functional dependency γ → S.
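As a rough illustration of the last two uses, the sketch below tests membership in F+ through attribute closure and enumerates F+ for a very small schema. The helper names (`implies`, `nonempty_subsets`, `fd_closure`) are hypothetical, and empty left and right sides are skipped for simplicity. The enumeration is exponential in the number of attributes, so it is only practical for toy schemas.

```python
from itertools import combinations

def closure(attrs, fds):
    """Attribute closure of attrs under the dependencies in fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

def implies(fds, lhs, rhs):
    """lhs -> rhs is in F+ exactly when rhs is contained in lhs+."""
    return set(rhs) <= closure(lhs, fds)

def nonempty_subsets(attrs):
    items = sorted(attrs)
    for k in range(1, len(items) + 1):
        for combo in combinations(items, k):
            yield frozenset(combo)

def fd_closure(schema, fds):
    """Enumerate F+ as (gamma, S) pairs: for each gamma in R and each
    S in gamma+, output gamma -> S (nonempty sides only)."""
    out = set()
    for gamma in nonempty_subsets(schema):
        for s in nonempty_subsets(closure(gamma, fds)):
            out.add((gamma, s))
    return out

# Membership tests on the running example.
F = [("A", "B"), ("A", "C"), ("CG", "H"), ("CG", "I"), ("B", "H")]
ag_i = implies(F, "AG", "I")     # derived earlier via augmentation
cg_hi = implies(F, "CG", "HI")   # derived earlier via union-style steps
a_i = implies(F, "A", "I")       # not implied: A+ does not contain I

# Full enumeration on a three-attribute schema.
small_plus = fd_closure("ABC", [("A", "B"), ("B", "C")])
```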

Canonical Cover

A canonical cover of F is a “minimal” set of functional dependencies equivalent to F, having no redundant dependencies or redundant parts of dependencies. A canonical cover for F is a set of dependencies Fc such that

F logically implies all dependencies in Fc, and
Fc logically implies all dependencies in F, and
no functional dependency in Fc contains an extraneous attribute, and
each left side of a functional dependency in Fc is unique.

Sets of functional dependencies may have redundant dependencies that can be inferred from the others. For example, A → C is redundant in {A → B, B → C, A → C}.

Parts of a functional dependency may be redundant.
E.g., on the RHS: {A → B, B → C, A → CD} can be simplified to {A → B, B → C, A → D}.
E.g., on the LHS: {A → B, B → C, AC → D} can be simplified to {A → B, B → C, A → D}.

Testing if an Attribute is Extraneous

Consider a set F of functional dependencies and the functional dependency α → β in F.

To test if attribute A ∈ α is extraneous in α:
    compute (α – A)+ using the dependencies in F;
    check that (α – A)+ contains β; if it does, A is extraneous.

To test if attribute A ∈ β is extraneous in β:
    compute α+ using only the dependencies in F’ = (F – {α → β}) ∪ {α → (β – A)};
    check that α+ contains A; if it does, A is extraneous.

Extraneous Attributes

Example: Given F = {A → C, AB → C}, B is extraneous in AB → C because {A → C, AB → C} logically implies A → C (i.e., the result of dropping B from AB → C).

Example: Given F = {A → C, AB → CD}, C is extraneous in AB → CD since AB → C can be inferred even after deleting C.
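The two extraneous-attribute tests can be sketched with attribute closure as follows. The function names are illustrative assumptions, dependencies are encoded as pairs of strings of single-letter attributes, and for the left-side test the sketch checks that the closure of the reduced left side contains the whole right side β, which is the condition under which dropping the attribute is safe.

```python
def closure(attrs, fds):
    """Attribute closure of attrs under the dependencies in fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

def extraneous_in_lhs(fds, lhs, rhs, a):
    """a (an attribute of lhs) is extraneous if (lhs - a)+, computed
    under the full set F, already contains rhs."""
    reduced = set(lhs) - {a}
    return set(rhs) <= closure(reduced, fds)

def extraneous_in_rhs(fds, lhs, rhs, a):
    """a (an attribute of rhs) is extraneous if lhs+ still contains a
    under F' = (F - {lhs -> rhs}) U {lhs -> (rhs - a)}."""
    f_prime = [fd for fd in fds if fd != (lhs, rhs)]
    f_prime.append((lhs, "".join(sorted(set(rhs) - {a}))))
    return a in closure(lhs, f_prime)

# First example: F = {A -> C, AB -> C}; B is extraneous in AB -> C.
F1 = [("A", "C"), ("AB", "C")]
b_extraneous = extraneous_in_lhs(F1, "AB", "C", "B")
a_extraneous = extraneous_in_lhs(F1, "AB", "C", "A")

# Second example: F = {A -> C, AB -> CD}; C is extraneous in AB -> CD.
F2 = [("A", "C"), ("AB", "CD")]
c_extraneous = extraneous_in_rhs(F2, "AB", "CD", "C")
d_extraneous = extraneous_in_rhs(F2, "AB", "CD", "D")
```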

Canonical Cover

To compute a canonical cover for F:

repeat
    Use the union rule to replace any dependencies in F of the form α1 → β1 and α1 → β2 with α1 → β1β2
    Find a functional dependency α → β with an extraneous attribute either in α or in β
    If an extraneous attribute is found, delete it from α → β
until F does not change

Computing a Canonical Cover

R = (A, B, C)

F = {A → BC, B → C, A → B, AB → C}

Combine A → BC and A → B into A → BC.
Set is now {A → BC, B → C, AB → C}.

A is extraneous in AB → C.
Check if the result of deleting A from AB → C is implied by the other dependencies.
Yes: in fact, B → C is already present.
Set is now {A → BC, B → C}.

C is extraneous in A → BC.
Check if A → C is logically implied by A → B and the other dependencies.
Yes: using transitivity on A → B and B → C.
Attribute closure of A can be used in more complex cases.

The canonical cover is:

A → B
B → C
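The canonical-cover procedure above can be sketched as follows. This is one possible arrangement under stated assumptions: the function name `canonical_cover` is hypothetical, the union rule is applied implicitly by keeping one right-hand side per distinct left-hand side, and attributes are tried in alphabetical order. A set of dependencies can have more than one canonical cover, so a different order of tests may give a different but equivalent result.

```python
def closure(attrs, fds):
    """Attribute closure of attrs under the dependencies in fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

def canonical_cover(fds):
    # Union rule: merge dependencies that share a left side.
    cover = {}
    for lhs, rhs in fds:
        cover.setdefault(frozenset(lhs), set()).update(rhs)

    changed = True
    while changed:
        changed = False
        for lhs in list(cover):
            rhs = cover[lhs]
            # Left side: a is extraneous if (lhs - a)+ under F contains rhs.
            for a in sorted(lhs):
                reduced = lhs - {a}
                if rhs <= closure(reduced, list(cover.items())):
                    del cover[lhs]
                    cover.setdefault(reduced, set()).update(rhs)
                    changed = True
                    break
            if changed:
                break
            # Right side: a is extraneous if lhs+ under F' contains a,
            # where F' replaces lhs -> rhs by lhs -> (rhs - a).
            for a in sorted(rhs):
                f_prime = [(l, r) for l, r in cover.items() if l != lhs]
                f_prime.append((lhs, rhs - {a}))
                if a in closure(lhs, f_prime):
                    rhs.discard(a)
                    if not rhs:
                        del cover[lhs]
                    changed = True
                    break
            if changed:
                break
    return {(l, frozenset(r)) for l, r in cover.items()}

F = [("A", "BC"), ("B", "C"), ("A", "B"), ("AB", "C")]
fc = canonical_cover(F)   # the pairs A -> B and B -> C
```

On this input the sketch removes C from A → BC before it removes A from AB → C, the reverse of the order used in the worked example, but it reaches the same cover.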

Lossless-join Decomposition

For the case of R = (R1, R2), we require that for all possible relations r on schema R

r = ΠR1(r) ⋈ ΠR2(r)

A decomposition of R into R1 and R2 is lossless join if and only if at least one of the following dependencies is in F+:

R1 ∩ R2 → R1

R1 ∩ R2 → R2

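Example of Lossy-Join Decomposition

Figure: a relation r on schema (A, B) with three tuples whose B values are 1, 2, 1; its projections ΠA(r) and ΠB(r); and the join ΠA(r) ⋈ ΠB(r), which has four tuples with B values 1, 2, 1, 2. The join contains a tuple that is not in r, so the decomposition of (A, B) into (A) and (B) is lossy.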
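The lossless-join condition for a two-way decomposition can be tested with attribute closure, and the lossy example can be replayed on a small instance. The sketch below is illustrative only: `is_lossless` is a hypothetical helper, and the A values "x" and "y" are placeholders chosen here, since only the B values 1, 2, 1 are recoverable from the example.

```python
def closure(attrs, fds):
    """Attribute closure of attrs under the dependencies in fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

def is_lossless(r1, r2, fds):
    """True if (R1 ∩ R2) -> R1 or (R1 ∩ R2) -> R2 is in F+."""
    common_plus = closure(set(r1) & set(r2), fds)
    return set(r1) <= common_plus or set(r2) <= common_plus

F = [("A", "B"), ("B", "C")]
ab_bc = is_lossless("AB", "BC", F)   # common attribute B, and B -> BC
ab_ac = is_lossless("AB", "AC", F)   # common attribute A, and A -> AB
a_b = is_lossless("A", "B", [])      # no common attributes, no FDs

# Replaying the lossy example on an instance of schema (A, B).
# The A values "x" and "y" are placeholders (an assumption).
r = {("x", 1), ("x", 2), ("y", 1)}
proj_a = {a for a, _ in r}
proj_b = {b for _, b in r}
# With no common attributes the natural join is a Cartesian product.
joined = {(a, b) for a in proj_a for b in proj_b}
spurious = joined - r                # tuples that were not in r
```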

Dependency Preservation

Let Fi be the set of dependencies in F+ that include only attributes in Ri.

A decomposition is dependency preserving if (F1 ∪ F2 ∪ … ∪ Fn)+ = F+.

If it is not, then checking updates for violation of functional dependencies may require computing joins, which is expensive.

Testing for Dependency Preservation

To check if a dependency α → β is preserved in a decomposition of R into R1, R2, …, Rn, we apply the following test (with attribute closure done with respect to F):

result = α
while (changes to result) do
    for each Ri in the decomposition
        t = (result ∩ Ri)+ ∩ Ri
        result = result ∪ t

If result contains all attributes in β, then the functional dependency α → β is preserved.

We apply the test on all dependencies in F to check if a decomposition is dependency preserving.

This procedure takes polynomial time, instead of the exponential time required to compute F+ and (F1 ∪ F2 ∪ … ∪ Fn)+.

Example 1

R = (A, B, C)

F = {A → B, B → C}

R can be decomposed in two different ways.

R1 = (A, B), R2 = (B, C)
    Lossless-join decomposition: R1 ∩ R2 = {B} and B → BC
    Dependency preserving

R1 = (A, B), R2 = (A, C)
    Lossless-join decomposition: R1 ∩ R2 = {A} and A → AB
    Not dependency preserving (cannot check B → C without computing R1 ⋈ R2)

Example 2

R = (A, B, C)

F = {A → B, B → C}

Key = {A}

R is not in BCNF.

Decomposition R1 = (A, B), R2 = (B, C)
    R1 and R2 in BCNF
    Lossless-join decomposition
    Dependency preserving
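The polynomial-time preservation test described above can be sketched as follows, applied to the two decompositions of Example 1. The function names `preserved` and `dependency_preserving` are illustrative assumptions, as is the encoding of schemas and dependencies as strings of single-letter attributes.

```python
def closure(attrs, fds):
    """Attribute closure of attrs under the dependencies in fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

def preserved(lhs, rhs, decomposition, fds):
    """Test whether lhs -> rhs is preserved, with closures taken
    with respect to the full set F."""
    result = set(lhs)
    changed = True
    while changed:
        changed = False
        for ri in decomposition:
            ri = set(ri)
            # t = (result ∩ Ri)+ ∩ Ri
            t = closure(result & ri, fds) & ri
            if not t <= result:
                result |= t
                changed = True
    return set(rhs) <= result

def dependency_preserving(decomposition, fds):
    """A decomposition is dependency preserving if every dependency
    in F passes the test."""
    return all(preserved(l, r, decomposition, fds) for l, r in fds)

F = [("A", "B"), ("B", "C")]
first = dependency_preserving(["AB", "BC"], F)    # preserves both FDs
second = dependency_preserving(["AB", "AC"], F)   # loses B -> C
b_to_c = preserved("B", "C", ["AB", "AC"], F)
```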

What is normalisation?

Normalisation is the process of taking data from a problem and reducing it to a set of relations while ensuring data integrity and eliminating data redundancy

Data integrity - all of the data in the database are consistent, and satisfy all integrity constraints.

Data redundancy – if data in the database can be found in two different locations, then the data is said to contain redundancy. Data redundancy results in insertion, update, and deletion anomalies that could lead to a loss of data integrity.

Background to normalization: definitions

Functional dependency: Attribute B has a functional dependency on attribute A (i.e., A → B) if, for each value of attribute A, there is exactly one value of attribute B.

Trivial functional dependency: A trivial functional dependency is a functional dependency of an attribute on a superset of itself.

Full functional dependency: An attribute is fully functionally dependent on a set of attributes X if it is functionally dependent on X, and not functionally dependent on any proper subset of X. {Employee Address} has a functional dependency on {Employee ID, Skill}, but not a full functional dependency, because it is also dependent on {Employee ID}.

Transitive dependency: A transitive dependency is an indirect functional dependency, one in which X → Z only by virtue of X → Y and Y → Z.

Multivalued dependency: A multivalued dependency is a constraint according to which the presence of certain rows in a table implies the presence of certain other rows.

Join dependency: A table T is subject to a join dependency if T can always be recreated by joining multiple tables each having a subset of the attributes of T.

Superkey: A superkey is an attribute or set of attributes that uniquely identifies rows within a table; in other words, two distinct rows are always guaranteed to have distinct superkeys. {Employee ID, Employee Address, Skill} would be a superkey for the "Employees' Skills" table; {Employee ID, Skill} would also be a superkey.

Candidate key: A candidate key is a minimal superkey, that is, a superkey for which we can say that no proper subset of it is also a superkey. {Employee ID, Skill} would be a candidate key for the "Employees' Skills" table.

Non-prime attribute: A non-prime attribute is an attribute that does not occur in any candidate key. Employee Address would be a non-prime attribute in the "Employees' Skills" table.

Primary key: Most DBMSs require a table to be defined as having a single unique key, rather than a number of possible unique keys. A primary key is a key which the database designer has designated for this purpose.

Normal forms

Normal form: Brief definition

First normal form (1NF): Table faithfully represents a relation and has no repeating groups.

Second normal form (2NF): No non-prime attribute in the table is functionally dependent on a part (proper subset) of a candidate key.

Third normal form (3NF): Every non-prime attribute is non-transitively dependent on every key of the table.

Boyce-Codd normal form (BCNF): Every non-trivial functional dependency in the table is a dependency on a superkey.

Fourth normal form (4NF): Every non-trivial multivalued dependency in the table is a dependency on a superkey.

Fifth normal form (5NF): Every non-trivial join dependency in the table is implied by the superkeys of the table.

Goals of Normalization

Let R be a relation scheme with a set F of functional dependencies.

Decide whether a relation scheme R is in “good” form.

In the case that a relation scheme R is not in “good” form, decompose it into a set of relation schemes {R1, R2, ..., Rn} such that
    each relation scheme is in good form, and
    the decomposition is a lossless-join decomposition.

Preferably, the decomposition should be dependency preserving.

First Normal Form

A relation is in 1NF if, and only if, it contains no repeating attributes or groups of attributes.

A relational schema R is in first normal form if the domains of all attributes of R are atomic (indivisible units).

Student(Reg_no, name, date_of_birth, subject, grade)

Reg_no   Name       dob          subject     grade
100      Smith, J   14/11/1977   Databases   C
                                 Soft_Dev    A
                                 ISDE        D
105      White, A   10/05/1975   Soft_Dev    B
                                 ISDE        B
120      Moore, T   11/03/1970   Databases   A
                                 Soft_Dev    B
                                 Workshop    C

The Student table with the repeating group is not in 1NF. Relational databases require that each row has only a single value per attribute, so a repeating group in a row is not allowed. To remove the repeating group, one of two things can be done.

Either flatten the table and extend the key:

Reg_no   name       dob          subject     grade
960100   Smith, J   14/11/1977   Databases   C
960100   Smith, J   14/11/1977   Soft_Dev    A
960100   Smith, J   14/11/1977   ISDE        D
960105   White, A   10/05/1975   Soft_Dev    B
960105   White, A   10/05/1975   ISDE        B
960120   Moore, T   11/03/1970   Databases   A
960120   Moore, T   11/03/1970   Soft_Dev    B
960120   Moore, T   11/03/1970   Workshop    C

With the relation in its flattened form, strange anomalies appear in the system. Redundant data is the main cause of insertion, deletion, and updating anomalies.

Or decompose the relation, leading to First Normal Form:

Split the table into two parts, one for the repeating groups and one for the non-repeating groups.

The primary key for the original relation is included in both of the new relations.

We now have two relations, Student and Record.

Record has the original repeating groups and the Reg_no.

Record(Reg_no, subject, grade)

Reg_no   subject     grade
960100   Databases   C
960100   Soft_Dev    A
960100   ISDE        D
960105   Soft_Dev    B
960105   ISDE        B

Student contains the original non-repeating groups.

Student(Reg_no, name, dob)

Reg_no   name      dob
960100   Smith,J   14/11/1977
960105   White,A   10/05/1975
960120   Moore,T   11/03/1970
960145   Smith,J   09/01/1972
960150   Black,D   21/08/1973

This method has eliminated some of the anomalies. It does not always do so; it depends on the example chosen.

In this case we no longer have the insertion anomaly. It is now possible to enter new students without knowing the subjects that they will be studying. They will exist only in the Student table, and will not be entered in the Record table until they are studying at least one subject.

We have also removed the deletion anomaly. If all of the `databases' subject records are removed, student 960145 still exists in the Student table.

We have also removed the update anomaly.

Student and Record are now in First Normal Form.

Second Normal Form

A relation is in 2NF if, and only if, it is in 1NF and every non-key attribute is fully functionally dependent on the whole key.

A 1NF table is in 2NF if and only if none of its non-prime attributes are functionally dependent on a part (proper subset) of a candidate key.

Consider again the Student relation

Student(matric_no, name, date_of_birth, subject, grade)

There are no repeating groups. The relation is already in 1NF.

However, we have a compound primary key, so we must check all of the non-key attributes against each part of the key to ensure they are functionally dependent on the whole key and not just on part of it.

matric_no determines name and date_of_birth, but not grade.

subject together with matric_no determines grade, but not name or date_of_birth.

Figure : Dependency Diagram

This relation is not in 2NF. Split the relation up into its component parts.

Separate out all the attributes that are solely dependent on matric_no, and put them in a new Student_details relation, with matric_no as the primary key.

Separate out all the attributes that are solely dependent on subject. In this case no attributes are solely dependent on subject.

Separate out all the attributes that are solely dependent on matric_no + subject, and put them into a separate Student relation, keyed on matric_no + subject.

All attributes in each relation are fully functionally dependent upon its primary key.

These relations are now in 2NF.

Figure : Dependencies after splitting

Third Normal Form

A table is in 3NF if and only if both of the following conditions hold:

The relation R (table) is in second normal form (2NF).

Every non-prime attribute of R is non-transitively dependent (i.e. directly dependent) on every key of R.

A non-prime attribute of R is an attribute that does not belong to any candidate key of R. A transitive dependency is a functional dependency in which X → Z (X determines Z) indirectly, by virtue of X → Y and Y → Z (where it is not the case that Y → X).

Equivalently, a table is in 3NF if and only if, for each of its functional dependencies X → A, at least one of the following conditions holds:

X contains A (that is, X → A is a trivial functional dependency), or
X is a superkey, or
A is a prime attribute (i.e., A is contained within a candidate key).

By definition, a transitive functional dependency can only occur if there is more than one non-key field, so we can say that a relation in 2NF with zero or one non-key field must automatically be in 3NF.

project_no   manager   address
p1           Black,B   32 High Street
p2           Smith,J   11 New Street
p3           Black,B   32 High Street
p4           Black,B   32 High Street

Project has more than one non-key field, so we must check for transitive dependency.

address depends on the value in the manager column: every time B Black is listed in the manager column, the address column has the value `32 High Street'. From this, the relation and functional dependency can be written as:

Project(project_no, manager, address)

manager -> address

In this case address depends on the key project_no only transitively, through manager. Manager is the determinant: it determines the value of address. It is a transitive functional dependency only if all attributes on the left of the “->” are not in the key but are all in the relation, and all attributes to the right of the “->” are not in the key, with at least one actually being in the relation.

Data redundancy arises from this: we duplicate address if a manager is in charge of more than one project. This causes problems if we have to change the address, because several entries must be changed, and this could lead to errors.

The solution is to eliminate the transitive functional dependency by splitting the table.

Create two relations, one with the transitive dependency in it, and another for all of the remaining attributes.

Split Project into Project and Manager. The determinant attribute becomes the primary key in the new relation: manager becomes the primary key of the Manager relation. The original key is the primary key of the remaining non-transitive attributes; in this case, project_no remains the key of the new Project table.

Project

project_no   manager
p1           Black,B
p2           Smith,J
p3           Black,B
p4           Black,B

Manager

manager   address
Black,B   32 High Street
Smith,J   11 New Street

Now we need to store the address only once. If we need to know a manager's address, we can look it up in the Manager relation. The manager attribute is the link between the two tables, and in the Project table it is now a foreign key. These relations are now in third normal form.
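As a small check of this split, the sketch below projects the original Project rows onto the two new relations and joins them back on manager. The variable names and the tuple encoding are illustrative assumptions; the data are the four rows of the Project table.

```python
# Original Project relation: (project_no, manager, address).
project_rows = {
    ("p1", "Black,B", "32 High Street"),
    ("p2", "Smith,J", "11 New Street"),
    ("p3", "Black,B", "32 High Street"),
    ("p4", "Black,B", "32 High Street"),
}

# Projection onto (project_no, manager) and onto (manager, address).
projects = {(p, m) for p, m, _ in project_rows}
managers = {(m, a) for _, m, a in project_rows}

# Natural join on the shared attribute manager.
rejoined = {
    (p, m, a)
    for p, m in projects
    for m2, a in managers
    if m == m2
}

# Each manager's address is now stored once, and the join recreates
# the original relation, so no information was lost by the split.
address_stored_once = len(managers) == len({m for m, _ in managers})
join_is_lossless = rejoined == project_rows
```

The join is lossless here because the shared attribute, manager, is the key of the Manager relation.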

Example

Relation schema: cust_banker_branch = (customer_id, employee_id, branch_name, type)

The functional dependencies for this relation schema are:

1. customer_id, employee_id → branch_name, type
2. employee_id → branch_name
3. customer_id, branch_name → employee_id

The canonical cover Fc is:

1. customer_id, employee_id → type
2. employee_id → branch_name
3. customer_id, branch_name → employee_id

Decomposed into 3NF:

R1: (customer_id, employee_id, type)
R2: (customer_id, branch_name, employee_id)

Boyce-Codd Normal Form

Using functional dependencies, we can define several normal forms that represent “good” database designs. In this section we cover BCNF (defined below), and later, in Section, we cover 3NF.

Definition

One of the more desirable normal forms that we can obtain is Boyce-Codd normal form (BCNF). A relation schema R is in BCNF with respect to a set F of functional dependencies if, for all functional dependencies in F+ of the form α → β, where α ⊆ R and β ⊆ R, at least one of the following holds:

α → β is a trivial functional dependency (that is, β ⊆ α).
α is a superkey for schema R.

A database design is in BCNF if each member of the set of relation schemas that constitutes the design is in BCNF.

As an illustration, consider the following relation schemas and their respective functional dependencies:

Customer-schema = (customer-name, customer-street, customer-city)
customer-name → customer-street customer-city

Branch-schema = (branch-name, assets, branch-city)
branch-name → assets branch-city

Loan-info-schema = (branch-name, customer-name, loan-number, amount)
loan-number → amount branch-name

We claim that Customer-schema is in BCNF. We note that a candidate key for the schema is customer-name. The only nontrivial functional dependencies that hold on Customer-schema have customer-name on the left side of the arrow. Since customer-name is a candidate key, functional dependencies with customer-name on the left side do not violate the definition of BCNF. Similarly, it can be shown easily that the relation schema Branch-schema is in BCNF.

The schema Loan-info-schema, however, is not in BCNF. First, note that loan-number is not a superkey for Loan-info-schema, since we could have a pair of tuples representing a single loan made to two people, for example,

(Downtown, John Bell, L-44, 1000)

(Downtown, Jane Bell, L-44, 1000)

Because we did not list functional dependencies that rule out the preceding case, loan-number is not a candidate key. However, the functional dependency loan-number → amount is nontrivial. Therefore, Loan-info-schema does not satisfy the definition of BCNF.

We claim that Loan-info-schema is not in a desirable form, since it suffers from the problem of repetition of information that we described in Section. We observe that, if there are several customer names associated with a loan in a relation on Loan-info-schema, then we are forced to repeat the branch name and the amount once for each customer. We can eliminate this redundancy by redesigning our database such that all schemas are in BCNF. One approach to this problem is to take the existing non-BCNF design as a starting point, and to decompose those schemas that are not in BCNF. Consider the decomposition of Loan-info-schema into two schemas:

Loan-schema = (loan-number, branch-name, amount)

Borrower-schema = (customer-name, loan-number)

This decomposition is a lossless-join decomposition.

To determine whether these schemas are in BCNF, we need to determine what functional dependencies apply to them. In this example, it is easy to see that

loan-number → amount branch-name

applies to the Loan-schema, and that only trivial functional dependencies apply to Borrower-schema. Although loan-number is not a superkey for Loan-info-schema, it is a candidate key for Loan-schema. Thus, both schemas of our decomposition are in BCNF.

It is now possible to avoid redundancy in the case where there are several customers associated with a loan. There is exactly one tuple for each loan in the relation on Loan-schema, and one tuple for each customer of each loan in the relation on Borrower-schema. Thus, we do not have to repeat the branch name and the amount once for each customer associated with a loan.

Often, testing of a relation to see if it satisfies BCNF can be simplified:

To check if a nontrivial dependency α → β causes a violation of BCNF, compute α+ (the attribute closure of α), and verify that it includes all attributes of R; that is, it is a superkey of R.

To check if a relation schema R is in BCNF, it suffices to check only the dependencies in the given set F for violation of BCNF, rather than check all dependencies in F+. We can show that if none of the dependencies in F causes a violation of BCNF, then none of the dependencies in F+ will cause a violation of BCNF either.

Unfortunately, the latter procedure does not work when a relation is decomposed. That is, it does not suffice to use F when we test a relation Ri, in a decomposition of R, for violation of BCNF. For example, consider relation schema R (A, B, C, D, E), with functional dependencies F containing A → B and BC → D. Suppose this were decomposed into R1 (A, B) and R2 (A, C, D, E). Now, neither of the dependencies in F contains only attributes from (A, C, D, E), so we might be misled into thinking R2 satisfies BCNF. In fact, there is a dependency AC → D in F+ (which can be inferred using the pseudotransitivity rule from the two dependencies in F), which shows that R2 is not in BCNF. Thus, we may need a dependency that is in F+, but is not in F, to show that a decomposed relation is not in BCNF.
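One way to test a decomposed schema Ri without materializing F+ is to compute, for every subset α of Ri, the closure α+ under F: Ri violates BCNF if α+ contains some attribute of Ri outside α but does not contain all of Ri. The sketch below applies this idea to the example above; the function name `bcnf_violation` is a hypothetical choice, and the subset enumeration is exponential in the size of Ri.

```python
from itertools import combinations

def closure(attrs, fds):
    """Attribute closure of attrs under the dependencies in fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

def bcnf_violation(ri, fds):
    """Return (alpha, beta) for a nontrivial dependency alpha -> beta
    that holds on ri with alpha not a superkey of ri, or None if ri
    is in BCNF. Closures are computed with respect to the full F."""
    ri = set(ri)
    items = sorted(ri)
    for k in range(1, len(items)):
        for combo in combinations(items, k):
            alpha = set(combo)
            plus = closure(alpha, fds)
            beta = (plus & ri) - alpha   # attributes of ri determined by alpha
            if beta and not ri <= plus:
                return alpha, beta
    return None

F = [("A", "B"), ("BC", "D")]
r1_result = bcnf_violation("AB", F)     # None: A is a key of R1
r2_result = bcnf_violation("ACDE", F)   # AC -> D, found via closure under F
```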

Decomposition Algorithm in BCNF

The decomposition that the algorithm generates is not only in BCNF, but is also a lossless-join decomposition. To see why our algorithm generates only lossless-join decompositions, we note that, when we replace a schema Ri with (Ri – β) and (α, β), the dependency α → β holds, and (Ri – β) ∩ (α, β) = α.

result := {R};
done := false;
compute F+;
while (not done) do
    if (there is a schema Ri in result that is not in BCNF)
        then begin
            let α → β be a nontrivial functional dependency that holds
                on Ri such that α → Ri is not in F+, and α ∩ β = ∅;
            result := (result – Ri) ∪ (Ri – β) ∪ (α, β);
        end
    else done := true;

Figure BCNF decomposition algorithm.

We apply the BCNF decomposition algorithm to the Lending-schema that we used in Section as an example of a poor database design:

Lending-schema = (branch-name, branch-city, assets, customer-name, loan-number, amount)

The set of functional dependencies that we require to hold on Lending-schema are

branch-name → assets branch-city

loan-number → amount branch-name

A candidate key for this schema is {loan-number, customer-name}.

We can apply the algorithm of Figure to the Lending-schema example as follows:

The functional dependency

branch-name → assets branch-city

holds on Lending-schema, but branch-name is not a superkey. Thus, Lending-schema is not in BCNF. We replace Lending-schema by

Branch-schema = (branch-name, branch-city, assets)

Loan-info-schema = (branch-name, customer-name, loan-number, amount)

The only nontrivial functional dependencies that hold on Branch-schema include branch-name on the left side of the arrow. Since branch-name is a key for Branch-schema, the relation Branch-schema is in BCNF.

The functional dependency

loan-number → amount branch-name

holds on Loan-info-schema, but loan-number is not a key for Loan-info-schema. We replace Loan-info-schema by

Loan-schema = (loan-number, branch-name, amount)

Borrower-schema = (customer-name, loan-number)

Loan-schema and Borrower-schema are in BCNF.

Thus the decomposition of Lending-schema results in the three relation schemas Branch-schema, Loan-schema, and Borrower-schema, each of which is in BCNF. These relation schemas are the same as those in Section, where we demonstrated that the resulting decomposition is both a lossless-join decomposition and a dependency-preserving decomposition.
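The decomposition just traced can be reproduced with a short sketch of the BCNF decomposition algorithm. This is an illustration under stated assumptions rather than the textbook's code: the helper names are hypothetical, the violating dependency is found by trying subsets of Ri in sorted order (exponential in the schema size), and β is taken to be every attribute of Ri outside α that α determines, which guarantees α ∩ β = ∅.

```python
from itertools import combinations

def closure(attrs, fds):
    """Attribute closure of attrs under the dependencies in fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

def bcnf_violation(ri, fds):
    """Return (alpha, beta) with alpha -> beta nontrivial on ri and
    alpha not a superkey of ri, or None if ri is in BCNF."""
    ri = set(ri)
    items = sorted(ri)
    for k in range(1, len(items)):
        for combo in combinations(items, k):
            alpha = set(combo)
            plus = closure(alpha, fds)
            beta = (plus & ri) - alpha
            if beta and not ri <= plus:
                return alpha, beta
    return None

def bcnf_decompose(schema, fds):
    result = [frozenset(schema)]
    done = False
    while not done:
        done = True
        for ri in result:
            violation = bcnf_violation(ri, fds)
            if violation:
                alpha, beta = violation
                # Replace Ri by (Ri - beta) and (alpha, beta).
                result.remove(ri)
                result += [ri - beta, frozenset(alpha | beta)]
                done = False
                break
    return set(result)

lending = {"branch-name", "branch-city", "assets",
           "customer-name", "loan-number", "amount"}
F = [({"branch-name"}, {"assets", "branch-city"}),
     ({"loan-number"}, {"amount", "branch-name"})]

schemas = bcnf_decompose(lending, F)
branch = frozenset({"branch-name", "branch-city", "assets"})
loan = frozenset({"loan-number", "branch-name", "amount"})
borrower = frozenset({"customer-name", "loan-number"})
```

On this input the sketch splits on branch-name first and on loan-number second, matching the two steps in the text.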

The BCNF decomposition algorithm takes time exponential in the size of the initial schema, since the algorithm for checking if a relation in the decomposition satisfies BCNF can take exponential time. The bibliographical notes provide references to an algorithm that can compute a BCNF decomposition in polynomial time. However, the algorithm may “overnormalize”, that is, decompose a relation unnecessarily.

Comparison of BCNF and 3NF

Of the two normal forms for relational database schemas, 3NF and BCNF, there are advantages to 3NF in that we know that it is always possible to obtain a 3NF design without sacrificing a lossless join or dependency preservation. Nevertheless, there are disadvantages to 3NF. If we do not eliminate all transitive dependencies, we may have to use null values to represent some of the possible meaningful relationships among data items, and there is the problem of repetition of information.

As an illustration of the null value problem, consider again the Banker-schema and its associated functional dependencies. Since banker-name → branch-name, we may want to represent relationships between values for banker-name and values for branch-name in our database. If we are to do so, however, either there must be a corresponding value for customer-name, or we must use a null value for the attribute customer-name.

customer-name   banker-name   branch-name
Jones           Johnson       Perryridge
Smith           Johnson       Perryridge
Hayes           Johnson       Perryridge
Jackson         Johnson       Perryridge
Curry           Johnson       Perryridge
Turner          Johnson       Perryridge

Figure An instance of Banker-schema.

As an illustration of the repetition of information problem, consider the instance of Banker-schema in Figure. Notice that the information indicating that Johnson is working at the Perryridge branch is repeated.

Recall that our goals of database design with functional dependencies are:

1. BCNF

2. Lossless join

3. Dependency preservation

Since it is not always possible to satisfy all three, we may be forced to choose between BCNF and dependency preservation with 3NF.