normal form


Upload: dparthade

Post on 18-Nov-2014


Page 1: Normal Form

This is taken entirely from Wikipedia and is intended for non-profit purposes. - Partha De

Some definitions:

• Functional dependency: Attribute B has a functional dependency on attribute A (i.e., A → B) if, for each value of attribute A, there is exactly one value of attribute B. If a value of A is repeated across tuples, the corresponding value of B is repeated as well. In our example, Employee Address has a functional dependency on Employee ID, because a particular Employee ID value corresponds to one and only one Employee Address value. (Note that the reverse need not be true: several employees could live at the same address, and therefore one Employee Address value could correspond to more than one Employee ID. Employee ID is therefore not functionally dependent on Employee Address.) An attribute may be functionally dependent either on a single attribute or on a combination of attributes. It is not possible to determine the extent to which a design is normalized without understanding what functional dependencies apply to the attributes within its tables; understanding this, in turn, requires knowledge of the problem domain. For example, an employer may require certain employees to split their time between two locations, such as New York City and London, and therefore want to allow employees to have more than one Employee Address. In this case, Employee Address would no longer be functionally dependent on Employee ID.

Another way to look at the above is by reviewing basic mathematical functions. Let F(x) be a mathematical function of one independent variable. The independent variable is analogous to the attribute A. The dependent variable (or the dependent attribute, using the terminology above), from which the term functional dependency comes, is the value of F(A); A is an independent attribute. A mathematical function produces exactly one output for each input. Notationally, it is common to express this relationship in mathematics as F(A) = B, or F : A → B. There are also functions of more than one independent variable, commonly referred to as multivariable functions. This idea corresponds to an attribute being functionally dependent on a combination of attributes. Hence F(x, y, z) has three independent variables, or independent attributes, and one dependent attribute, namely F(x, y, z). A multivariable function still has only one output, that is, one dependent variable or attribute.

Trivial functional dependency

A trivial functional dependency is a functional dependency of an attribute on a superset of itself. {Employee ID, Employee Address} → {Employee Address} is trivial, as is {Employee Address} → {Employee Address}.
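To make the definition concrete, here is a minimal Python sketch that tests a candidate dependency against sample rows. The function names `fd_holds` and `is_trivial` are hypothetical, and the rows are the Employees' Skills data from the 2NF example later in this document. Sample data can only refute a dependency, never prove it: as noted above, whether A → B really holds is a fact about the problem domain.

```python
from typing import Dict, List, Sequence, Tuple

# Rows of the Employees' Skills table from the 2NF example, as dictionaries.
COLUMNS = ("Employee", "Skill", "Current Work Location")
EMPLOYEES_SKILLS: List[Dict[str, str]] = [
    dict(zip(COLUMNS, row))
    for row in [
        ("Jones", "Typing", "114 Main Street"),
        ("Jones", "Shorthand", "114 Main Street"),
        ("Jones", "Whittling", "114 Main Street"),
        ("Bravo", "Light Cleaning", "73 Industrial Way"),
        ("Ellis", "Alchemy", "73 Industrial Way"),
        ("Ellis", "Juggling", "73 Industrial Way"),
        ("Harrison", "Light Cleaning", "73 Industrial Way"),
    ]
]


def fd_holds(rows: List[Dict[str, str]], lhs: Sequence[str], rhs: Sequence[str]) -> bool:
    """True if no two rows agree on `lhs` but disagree on `rhs`."""
    seen: Dict[Tuple[str, ...], Tuple[str, ...]] = {}
    for row in rows:
        key = tuple(row[a] for a in lhs)
        value = tuple(row[a] for a in rhs)
        # setdefault stores the first rhs value seen for this lhs value and returns it
        if seen.setdefault(key, value) != value:
            return False
    return True


def is_trivial(lhs: Sequence[str], rhs: Sequence[str]) -> bool:
    """lhs -> rhs is trivial when every attribute of rhs already appears in lhs."""
    return set(rhs) <= set(lhs)


print(fd_holds(EMPLOYEES_SKILLS, ["Employee"], ["Current Work Location"]))  # True
print(fd_holds(EMPLOYEES_SKILLS, ["Current Work Location"], ["Employee"]))  # False
print(is_trivial(["Employee", "Skill"], ["Skill"]))                         # True
```

The second call fails because 73 Industrial Way is shared by Bravo, Ellis and Harrison, mirroring the point that the reverse dependency need not hold.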

Full functional dependency

An attribute is fully functionally dependent on a set of attributes X if it is:

• functionally dependent on X, and

• not functionally dependent on any proper subset of X.

{Employee Address} has a functional dependency on {Employee ID, Skill}, but not a full functional dependency, because it is also dependent on {Employee ID}.

Transitive dependency: A transitive dependency is an indirect functional dependency, one in which X→Z only by virtue of X→Y and Y→Z.
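Full and transitive dependencies are easier to reason about from a set of declared dependencies than from data. A common technique is the attribute closure: starting from a set of attributes, repeatedly add whatever any declared dependency lets you derive. The Python sketch below uses it to test both definitions; the helper names are hypothetical, and the example dependencies are Employee ID → Employee Address from above and the Tournament Winners dependencies from the 3NF example.

```python
from itertools import combinations
from typing import FrozenSet, Iterable, List, Tuple

FD = Tuple[FrozenSet[str], FrozenSet[str]]


def fd(lhs: Iterable[str], rhs: Iterable[str]) -> FD:
    return frozenset(lhs), frozenset(rhs)


def closure(attrs: Iterable[str], fds: List[FD]) -> FrozenSet[str]:
    """All attributes determined by `attrs` (apply X -> Y whenever X is already covered)."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return frozenset(result)


def implies(fds: List[FD], lhs: Iterable[str], rhs: Iterable[str]) -> bool:
    return set(rhs) <= closure(lhs, fds)


def is_full(fds: List[FD], lhs: Iterable[str], attr: str) -> bool:
    """attr depends on lhs, but on no proper subset of lhs (the empty set included)."""
    ordered = sorted(set(lhs))
    if not implies(fds, ordered, {attr}):
        return False
    return not any(
        implies(fds, part, {attr})
        for size in range(len(ordered))
        for part in combinations(ordered, size)
    )


def is_transitive_via(fds: List[FD], x, y, z) -> bool:
    """X -> Z holds by virtue of X -> Y and Y -> Z, where Y -> X does not hold."""
    return implies(fds, x, y) and implies(fds, y, z) and not implies(fds, y, x)


EMPLOYEE_FDS = [fd({"Employee ID"}, {"Employee Address"})]
TOURNAMENT_FDS = [
    fd({"Tournament", "Year"}, {"Winner"}),
    fd({"Winner"}, {"Winner Date of Birth"}),
]

print(is_full(EMPLOYEE_FDS, {"Employee ID", "Skill"}, "Employee Address"))  # False
print(is_full(EMPLOYEE_FDS, {"Employee ID"}, "Employee Address"))           # True
print(is_transitive_via(TOURNAMENT_FDS, {"Tournament", "Year"}, {"Winner"},
                        {"Winner Date of Birth"}))                          # True
```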


Multivalued dependency: A multivalued dependency is a constraint according to which the presence of certain rows in a table implies the presence of certain other rows.

Join dependency: A table T is subject to a join dependency if T can always be recreated by joining multiple tables each having a subset of the attributes of T.

Superkey: A superkey is an attribute or set of attributes that uniquely identifies rows within a table; in other words, two distinct rows are always guaranteed to have distinct superkeys. {Employee ID, Employee Address, Skill} would be a superkey for the "Employees' Skills" table; {Employee ID, Skill} would also be a superkey.

Candidate key: A candidate key is a minimal superkey, that is, a superkey for which we can say that no proper subset of it is also a superkey. {Employee ID, Skill} would be a candidate key for the "Employees' Skills" table.

Non-prime attribute: A non-prime attribute is an attribute that does not occur in any candidate key. Employee Address would be a non-prime attribute in the "Employees' Skills" table.

Primary key: Most DBMSs require a table to be defined as having a single unique key, rather than a number of possible unique keys. A primary key is a key which the database designer has designated for this purpose.
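Superkeys, candidate keys and non-prime attributes can all be computed from the functional dependencies with the same closure idea. This brute-force Python sketch uses hypothetical helper names, is exponential in the number of attributes (so only suitable for small tables), and assumes that the only dependency in the "Employees' Skills" table is Employee ID → Employee Address.

```python
from itertools import combinations
from typing import Iterable, List, Set, Tuple

FD = Tuple[Set[str], Set[str]]


def closure(attrs: Iterable[str], fds: List[FD]) -> Set[str]:
    """Attributes determined by `attrs` under `fds`."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def is_superkey(attrs: Iterable[str], schema: Set[str], fds: List[FD]) -> bool:
    # A superkey determines every attribute of the table.
    return closure(attrs, fds) >= schema


def candidate_keys(schema: Set[str], fds: List[FD]) -> List[Set[str]]:
    """Minimal superkeys, found by trying attribute sets in order of increasing size."""
    keys: List[Set[str]] = []
    for size in range(1, len(schema) + 1):
        for combo in combinations(sorted(schema), size):
            candidate = set(combo)
            if any(key <= candidate for key in keys):
                continue  # contains a smaller key, so it is not minimal
            if is_superkey(candidate, schema, fds):
                keys.append(candidate)
    return keys


def non_prime(schema: Set[str], fds: List[FD]) -> Set[str]:
    prime = set().union(*candidate_keys(schema, fds))
    return schema - prime


SCHEMA = {"Employee ID", "Employee Address", "Skill"}
FDS = [({"Employee ID"}, {"Employee Address"})]

print([sorted(k) for k in candidate_keys(SCHEMA, FDS)])  # [['Employee ID', 'Skill']]
print(sorted(non_prime(SCHEMA, FDS)))                    # ['Employee Address']
```

The full set {Employee ID, Employee Address, Skill} is also reported as a superkey by `is_superkey`, but it is skipped as a candidate key because it contains {Employee ID, Skill}.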

First Normal Form:

First normal form (1NF or Minimal Form) is a normal form used in database normalization. A relational database table that adheres to 1NF is one that meets a certain minimum set of criteria. These criteria are basically concerned with ensuring that the table is a faithful representation of a relation[1] and that it is free of repeating groups.[2] The concept of a "repeating group" is, however, understood in different ways by different theorists. As a consequence, there is no universal agreement as to which features would disqualify a table from being in 1NF. Most notably, 1NF as defined by some authors (for example, Ramez Elmasri and Shamkant B. Navathe,[3] following the precedent established by Edgar F. Codd) excludes relation-valued attributes (tables within tables); whereas 1NF as defined by other authors (for example, Chris Date) permits them.

1NF tables as representations of relations

According to Date's definition of 1NF, a table is in 1NF if and only if it is "isomorphic to some relation", which means, specifically, that it satisfies the following five conditions:

1. There's no top-to-bottom ordering to the rows.


The designer then becomes aware of a requirement to record multiple telephone numbers for some customers. He reasons that the simplest way of doing this is to allow the "Telephone Number" field in any given record to contain more than one value:

Customer

Customer ID | First Name | Surname | Telephone Number
123 | Robert | Ingram | 555-861-2025
456 | Jane | Wright | 555-403-1659 555-776-4100
789 | Maria | Fernandez | 555-808-9633

Assuming, however, that the Telephone Number column is defined on some Telephone Number-like domain (e.g. the domain of strings 12 characters in length), the representation above is not in 1NF. 1NF (and, for that matter, the RDBMS) prohibits a field from containing more than one value from its column's domain.

Repeating groups across columns

The designer might attempt to get around this restriction by defining multiple Telephone Number columns:

Customer

Customer ID | First Name | Surname | Tel. No. 1 | Tel. No. 2 | Tel. No. 3
123 | Robert | Ingram | 555-861-2025 | |
456 | Jane | Wright | 555-403-1659 | 555-776-4100 |
789 | Maria | Fernandez | 555-808-9633 | |

This representation, however, makes use of nullable columns, and therefore does not conform to Date's definition of 1NF. Even if the view is taken that nullable columns are allowed, the design is not in keeping with the spirit of 1NF. Tel. No. 1, Tel. No. 2, and Tel. No. 3 share exactly the same domain and exactly the same meaning; the splitting of Telephone Number into three headings is artificial and causes logical problems. These problems include:

• Difficulty in querying the table. Answering such questions as "Which customers have telephone number X?" and "Which pairs of customers share a telephone number?" is awkward.

• Inability to enforce uniqueness of Customer-to-Telephone Number links through the RDBMS. Customer 789 might mistakenly be given a Tel. No. 2 value that is exactly the same as her Tel. No. 1 value.

• Restriction of the number of telephone numbers per customer to three. If a customer with four telephone numbers comes along, we are constrained to record only three and leave the fourth unrecorded. This means that the database design is imposing constraints on the business process, rather than (as should ideally be the case) vice-versa.

Repeating groups within columns

The designer might, alternatively, retain the single Telephone Number column but alter its domain, making it a string of sufficient length to accommodate multiple telephone numbers:

Customer

Customer ID | First Name | Surname | Telephone Number
123 | Robert | Ingram | 555-861-2025
456 | Jane | Wright | 555-403-1659, 555-776-4100
789 | Maria | Fernandez | 555-808-9633

This design is not consistent with 1NF, and presents several design issues. The Telephone Number heading becomes semantically woolly, as it can now represent either a telephone number, a list of telephone numbers, or indeed anything at all. A query such as "Which pairs of customers share a telephone number?" is more difficult to formulate, given the necessity to cater for lists of telephone numbers as well as individual telephone numbers. Meaningful constraints on telephone numbers are also very difficult to define in the RDBMS with this design.

A design that complies with 1NF

A design that is unambiguously in 1NF makes use of two tables: a Customer Name table and a Customer Telephone Number table.


Customer Name

Customer ID | First Name | Surname
123 | Robert | Ingram
456 | Jane | Wright
789 | Maria | Fernandez

Customer Telephone Number

Customer ID | Telephone Number
123 | 555-861-2025
456 | 555-403-1659
456 | 555-776-4100
789 | 555-808-9633

Repeating groups of telephone numbers do not occur in this design. Instead, each Customer-to-Telephone Number link appears on its own record.
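The move from the comma-separated design to the two-table design can be sketched in Python. `split_to_1nf` is a hypothetical helper; it assumes numbers are separated by commas, as in the "Repeating groups within columns" table, and emits one Customer-to-Telephone Number link per row.

```python
from typing import List, Tuple

# The "repeating groups within columns" version of the Customer table:
# several telephone numbers packed into one comma-separated string.
CUSTOMERS = [
    (123, "Robert", "Ingram", "555-861-2025"),
    (456, "Jane", "Wright", "555-403-1659, 555-776-4100"),
    (789, "Maria", "Fernandez", "555-808-9633"),
]


def split_to_1nf(customers) -> Tuple[List[tuple], List[tuple]]:
    """Return (customer_name_rows, customer_telephone_rows)."""
    names = []
    phones = []
    for customer_id, first_name, surname, packed in customers:
        names.append((customer_id, first_name, surname))
        for number in packed.split(","):
            number = number.strip()
            if number:
                phones.append((customer_id, number))
    # dict.fromkeys drops duplicate (customer, number) links while keeping order,
    # standing in for a uniqueness constraint on the pair.
    return names, list(dict.fromkeys(phones))


customer_name, customer_telephone = split_to_1nf(CUSTOMERS)
for row in customer_telephone:
    print(row)

# "Which customers have telephone number X?" becomes a simple filter:
print([cid for cid, number in customer_telephone if number == "555-776-4100"])  # [456]
```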

Atomicity

Some definitions of 1NF, most notably that of Edgar F. Codd, make reference to the concept of atomicity. Codd states that the "values in the domains on which each relation is defined are required to be atomic with respect to the DBMS."[9] Codd defines an atomic value as one that "cannot be decomposed into smaller pieces by the DBMS (excluding certain special functions)."[10]

Hugh Darwen and Chris Date have suggested that Codd's concept of an "atomic value" is ambiguous, and that this ambiguity has led to widespread confusion about how 1NF should be understood.[11][12]

In particular, the notion of a "value that cannot be decomposed" is problematic, as it would seem to imply that few, if any, data types are atomic:

• A character string would seem not to be atomic, as the RDBMS typically provides operators to decompose it into substrings.

• A date would seem not to be atomic, as the RDBMS typically provides operators to decompose it into day, month, and year components.

• A fixed-point number would seem not to be atomic, as the RDBMS typically provides operators to decompose it into integer and fractional components.

Date suggests that "the notion of atomicity has no absolute meaning":[13] a value may be considered atomic for some purposes, but may be considered an assemblage of more basic elements for other purposes. If this position is accepted, 1NF cannot be defined with reference to atomicity. Columns of any conceivable data type (from string types and numeric types to array types and table types) are then acceptable in a 1NF table, although perhaps not always desirable. Date argues that relation-valued attributes, by means of which a field within a table can contain a table, are useful in rare cases.[14]

Normalization beyond 1NF

Any table that is in second normal form (2NF) or higher is, by definition, also in 1NF (each normal form has more stringent criteria than its predecessor). On the other hand, a table that is in 1NF may or may not be in 2NF; if it is in 2NF, it may or may not be in 3NF, and so on.

Normal forms higher than 1NF are intended to deal with situations in which a table suffers from design problems that may compromise the integrity of the data within it. For example, the following table is in 1NF, but is not in 2NF and therefore is vulnerable to logical inconsistencies:

Subscriber Email Addresses

Subscriber ID | Email Address | Subscriber First Name | Subscriber Surname
108 | [email protected] | Steve | Wallace
252 | [email protected] | Carol | Robertson
252 | [email protected] | Carol | Robertson
360 | [email protected] | Harriet | Clark

The table's key is {Subscriber ID, Email Address}.

If Carol Robertson changes her surname by marriage, the change must be applied to two rows. If the change is only applied to one row, a contradiction results: the question "What is Subscriber 252's name?" has two conflicting answers. 2NF addresses this problem.


Second normal form

Second normal form (2NF) is a normal form used in database normalization. 2NF was originally defined by E.F. Codd[1] in 1971. A table that is in first normal form (1NF) must meet additional criteria if it is to qualify for second normal form. Specifically: a 1NF table is in 2NF if and only if, given any candidate key and any attribute that is not a constituent of a candidate key, the non-key attribute depends upon the whole of the candidate key rather than just a part of it.

In slightly more formal terms: a 1NF table is in 2NF if and only if none of its non-prime attributes are functionally dependent on a part (proper subset) of a candidate key. (A non-prime attribute is one that does not belong to any candidate key.)

Note that when a 1NF table has no composite candidate keys (candidate keys consisting of more than one attribute), the table is automatically in 2NF.

Example

Consider a table describing employees' skills:

Employees' Skills

Employee | Skill | Current Work Location
Jones | Typing | 114 Main Street
Jones | Shorthand | 114 Main Street
Jones | Whittling | 114 Main Street
Bravo | Light Cleaning | 73 Industrial Way
Ellis | Alchemy | 73 Industrial Way
Ellis | Juggling | 73 Industrial Way
Harrison | Light Cleaning | 73 Industrial Way


Neither {Employee} nor {Skill} is a candidate key for the table. This is because a given Employee might need to appear more than once (he might have multiple Skills), and a given Skill might need to appear more than once (it might be possessed by multiple Employees). Only the composite key {Employee, Skill} qualifies as a candidate key for the table.

The remaining attribute, Current Work Location, is dependent on only part of the candidate key, namely Employee. Therefore the table is not in 2NF. Note the redundancy in the way Current Work Locations are represented: we are told three times that Jones works at 114 Main Street, and twice that Ellis works at 73 Industrial Way. This redundancy makes the table vulnerable to update anomalies: it is, for example, possible to update Jones' work location on his "Typing" and "Shorthand" records and not update his "Whittling" record. The resulting data would imply contradictory answers to the question "What is Jones' current work location?"
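The 2NF test described above (no non-prime attribute may depend on a proper part of a candidate key) can be sketched in Python. `partial_dependencies` is a hypothetical name; the candidate keys are passed in rather than computed, and the only dependency assumed is Employee → Current Work Location.

```python
from itertools import combinations
from typing import Iterable, List, Set, Tuple

FD = Tuple[Set[str], Set[str]]


def closure(attrs: Iterable[str], fds: List[FD]) -> Set[str]:
    """Attributes determined by `attrs`: apply X -> Y whenever X is already covered."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def partial_dependencies(schema: Set[str], fds: List[FD], keys: List[Set[str]]):
    """(part, dependents) pairs where non-prime attributes depend on a proper part of a key."""
    prime = set().union(*keys)
    non_prime = schema - prime
    found = []
    for key in keys:
        ordered = sorted(key)
        for size in range(len(ordered)):  # sizes 0 .. len-1: proper subsets only
            for part in combinations(ordered, size):
                dependents = closure(part, fds) & non_prime
                if dependents:
                    found.append((set(part), dependents))
    return found


SCHEMA = {"Employee", "Skill", "Current Work Location"}
FDS = [({"Employee"}, {"Current Work Location"})]
KEYS = [{"Employee", "Skill"}]  # the only candidate key, per the text

print(partial_dependencies(SCHEMA, FDS, KEYS))
# [({'Employee'}, {'Current Work Location'})]
print(partial_dependencies({"Employee", "Current Work Location"}, FDS, [{"Employee"}]))  # []
```

Because every candidate key is checked, the same function also flags the Electric Toothbrush Models case discussed under "2NF and candidate keys", where the part-key dependency is on {Manufacturer, Model} rather than on the designated primary key.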

A 2NF alternative to this design would represent the same information in two tables: an "Employees" table with candidate key {Employee}, and an "Employees' Skills" table with candidate key {Employee, Skill}:

Employees

Employee | Current Work Location
Jones | 114 Main Street
Bravo | 73 Industrial Way
Ellis | 73 Industrial Way
Harrison | 73 Industrial Way

Employees' Skills

Employee | Skill
Jones | Typing
Jones | Shorthand
Jones | Whittling
Bravo | Light Cleaning
Ellis | Alchemy
Ellis | Juggling
Harrison | Light Cleaning

Neither of these tables can suffer from update anomalies.
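One way to check that the split loses no information is to project the original rows onto the two new tables and join them back on Employee. In this Python sketch (hypothetical names, rows taken from the tables above), the rejoined set equals the original. This agrees with the standard rule that a two-table decomposition is lossless when the shared attributes (here, Employee) form a key of one of the pieces.

```python
# Employees' Skills rows from the 2NF example: (Employee, Skill, Current Work Location)
ORIGINAL = {
    ("Jones", "Typing", "114 Main Street"),
    ("Jones", "Shorthand", "114 Main Street"),
    ("Jones", "Whittling", "114 Main Street"),
    ("Bravo", "Light Cleaning", "73 Industrial Way"),
    ("Ellis", "Alchemy", "73 Industrial Way"),
    ("Ellis", "Juggling", "73 Industrial Way"),
    ("Harrison", "Light Cleaning", "73 Industrial Way"),
}

# Projections onto the two 2NF tables (sets, so repeated facts collapse to one row).
employees = {(emp, loc) for emp, _, loc in ORIGINAL}
employees_skills = {(emp, skill) for emp, skill, _ in ORIGINAL}


def natural_join(skills, locations):
    """Join (Employee, Skill) with (Employee, Location) on the shared Employee column."""
    return {
        (emp, skill, loc)
        for emp, skill in skills
        for other_emp, loc in locations
        if emp == other_emp
    }


rejoined = natural_join(employees_skills, employees)
print(len(employees), len(employees_skills))  # 4 7
print(rejoined == ORIGINAL)                   # True
```

Jones' work location is now stored once instead of three times, which is exactly the redundancy that caused the update anomaly.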

Not all 2NF tables are free from update anomalies, however. An example of a 2NF table which suffers from update anomalies is:


Tournament Winners

Tournament | Year | Winner | Winner Date of Birth
Des Moines Masters | 1998 | Chip Masterson | 14 March 1977
Indiana Invitational | 1998 | Al Fredrickson | 21 July 1975
Cleveland Open | 1999 | Bob Albertson | 28 September 1968
Des Moines Masters | 1999 | Al Fredrickson | 21 July 1975
Indiana Invitational | 1999 | Chip Masterson | 14 March 1977

Even though Winner and Winner Date of Birth are determined by the whole key {Tournament, Year} and not part of it, particular Winner / Winner Date of Birth combinations are shown redundantly on multiple records. This problem is addressed by third normal form (3NF).

2NF and candidate keys

A table for which there are no partial functional dependencies on the primary key is typically, but not always, in 2NF. In addition to the primary key, the table may contain other candidate keys; it is necessary to establish that no non-prime attributes have part-key dependencies on any of these candidate keys. Multiple candidate keys occur in the following table:

Electric Toothbrush Models

Manufacturer | Model | Model Full Name | Manufacturer Country
Forte | X-Prime | Forte X-Prime | Italy
Forte | Ultraclean | Forte Ultraclean | Italy
Dent-o-Fresh | EZbrush | Dent-o-Fresh EZbrush | USA


Even if the designer has specified the primary key as {Model Full Name}, the table is not in 2NF. {Manufacturer, Model} is also a candidate key, and Manufacturer Country is dependent on a proper subset of it: Manufacturer.

Third normal form

The third normal form (3NF) is a normal form used in database normalization. 3NF was originally defined by E.F. Codd[1] in 1971. Codd's definition states that a table is in 3NF if and only if both of the following conditions hold:

• The relation R (table) is in second normal form (2NF).

• Every non-prime attribute of R is non-transitively dependent (i.e. directly dependent) on every key of R.

A non-prime attribute of R is an attribute that does not belong to any candidate key of R.[2] A transitive dependency is a functional dependency in which X → Z (X determines Z) indirectly, by virtue of X → Y and Y → Z (where it is not the case that Y → X).[3]

A 3NF definition that is equivalent to Codd's, but expressed differently, was given by Carlo Zaniolo in 1982. This definition states that a table is in 3NF if and only if, for each of its functional dependencies X → A, at least one of the following conditions holds:

• X contains A (that is, X → A is a trivial functional dependency), or

• X is a superkey, or

• A is a prime attribute (i.e., A is contained within a candidate key).[4]

Zaniolo's definition gives a clear sense of the difference between 3NF and the more stringent Boyce-Codd normal form (BCNF). BCNF simply eliminates the third alternative ("A is a prime attribute").
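Zaniolo's formulation translates almost directly into code. The Python sketch below (hypothetical helper names) checks each declared dependency X → A against the three alternatives and drops the third for BCNF. It assumes, as standard textbook treatments state, that for a single table it is enough to test the declared dependencies rather than every dependency they imply, provided prime attributes are identified from all candidate keys. The example dependencies come from the Tournament Winners and Today's Court Bookings examples below; for the court table they are read off its listed candidate keys plus Rate Type → Court.

```python
from itertools import combinations
from typing import Iterable, List, Set, Tuple

FD = Tuple[Set[str], Set[str]]


def closure(attrs: Iterable[str], fds: List[FD]) -> Set[str]:
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def candidate_keys(schema: Set[str], fds: List[FD]) -> List[Set[str]]:
    """Minimal superkeys, by brute force over attribute sets of increasing size."""
    keys: List[Set[str]] = []
    for size in range(1, len(schema) + 1):
        for combo in combinations(sorted(schema), size):
            cand = set(combo)
            if not any(k <= cand for k in keys) and closure(cand, fds) >= schema:
                keys.append(cand)
    return keys


def violations(schema: Set[str], fds: List[FD], allow_prime: bool) -> List[Tuple[frozenset, str]]:
    """Each declared X -> A (right-hand sides split into single attributes) failing the test."""
    prime = set().union(*candidate_keys(schema, fds))
    bad = []
    for lhs, rhs in fds:
        is_superkey = closure(lhs, fds) >= schema
        for attr in sorted(rhs - lhs):  # attributes already in X are trivial (condition 1)
            if is_superkey:  # condition 2: X is a superkey
                continue
            if allow_prime and attr in prime:  # condition 3: A is prime (3NF only)
                continue
            bad.append((frozenset(lhs), attr))
    return bad


def is_3nf(schema: Set[str], fds: List[FD]) -> bool:
    return not violations(schema, fds, allow_prime=True)


def is_bcnf(schema: Set[str], fds: List[FD]) -> bool:
    # BCNF simply drops the third alternative ("A is a prime attribute").
    return not violations(schema, fds, allow_prime=False)


TOURNAMENT = {"Tournament", "Year", "Winner", "Winner Date of Birth"}
TOURNAMENT_FDS = [
    ({"Tournament", "Year"}, {"Winner"}),
    ({"Winner"}, {"Winner Date of Birth"}),
]

COURT = {"Court", "Start Time", "End Time", "Rate Type"}
COURT_FDS = [
    ({"Rate Type"}, {"Court"}),
    ({"Court", "Start Time"}, {"End Time", "Rate Type"}),
    ({"Court", "End Time"}, {"Start Time", "Rate Type"}),
    ({"Rate Type", "Start Time"}, {"End Time"}),
    ({"Rate Type", "End Time"}, {"Start Time"}),
]

print(is_3nf(TOURNAMENT, TOURNAMENT_FDS))                   # False: Winner -> Winner Date of Birth
print(is_3nf(COURT, COURT_FDS), is_bcnf(COURT, COURT_FDS))  # True False
```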

"Nothing but the key"

A memorable summary of Codd's definition of 3NF, paralleling the traditional pledge to give true evidence in a court of law, was given by Bill Kent: every non-key attribute "must provide a fact about the key, the whole key, and nothing but the key."[5] A common variation supplements this definition with the oath: "so help me Codd".[6]

Requiring that non-key attributes be dependent on "the whole key" ensures that a table is in 2NF; further requiring that non-key attributes be dependent on "nothing but the key" ensures that the table is in 3NF.

Chris Date refers to Kent's summary as "an intuitively attractive characterization" of 3NF, and notes that with slight adaptation it may serve as a definition of the slightly stronger Boyce-Codd normal form: "Each attribute must represent a fact about the key, the whole key, and nothing other than the key."[7] The 3NF version of the definition is weaker than Date's BCNF variation, as 3NF is concerned only with ensuring that non-key attributes are dependent on keys. Prime attributes (which are keys or parts of keys) must be functionally independent; they each represent a fact about the key in the sense of providing part or all of the key itself. (This rule applies only to functionally dependent attributes, as applying it to all attributes would implicitly prohibit composite candidate keys, since each part of any such key would violate the "whole key" clause.)

Example

An example of a 2NF table that fails to meet the requirements of 3NF is:

Tournament Winners

Tournament | Year | Winner | Winner Date of Birth
Indiana Invitational | 1998 | Al Fredrickson | 21 July 1975
Cleveland Open | 1999 | Bob Albertson | 28 September 1968
Des Moines Masters | 1999 | Al Fredrickson | 21 July 1975
Indiana Invitational | 1999 | Chip Masterson | 14 March 1977

Because each row in the table needs to tell us who won a particular Tournament in a particular Year, the composite key {Tournament, Year} is a minimal set of attributes guaranteed to uniquely identify a row. That is, {Tournament, Year} is a candidate key for the table.

The breach of 3NF occurs because the non-prime attribute Winner Date of Birth is transitively dependent on the candidate key {Tournament, Year} via the non-prime attribute Winner. The fact that Winner Date of Birth is functionally dependent on Winner makes the table vulnerable to logical inconsistencies, as there is nothing to stop the same person from being shown with different dates of birth on different records.

In order to express the same facts without violating 3NF, it is necessary to split the table into two:


Tournament Winners

Tournament | Year | Winner
Indiana Invitational | 1998 | Al Fredrickson
Cleveland Open | 1999 | Bob Albertson
Des Moines Masters | 1999 | Al Fredrickson
Indiana Invitational | 1999 | Chip Masterson

Player Dates of Birth

Player | Date of Birth
Chip Masterson | 14 March 1977
Al Fredrickson | 21 July 1975
Bob Albertson | 28 September 1968

Update anomalies cannot occur in these tables, which are both in 3NF.

Boyce-Codd normal form

Boyce-Codd normal form (or BCNF) is a normal form used in database normalization. It is a slightly stronger version of the third normal form (3NF). A table is in Boyce-Codd normal form if and only if, for every one of its non-trivial functional dependencies X → Y, X is a superkey, that is, X is either a candidate key or a superset thereof.

BCNF was developed in 1974 by Raymond F. Boyce and Edgar F. Codd to address certain types of anomaly not dealt with by 3NF as originally defined.[1]

Chris Date has pointed out that a definition of what we now know as BCNF appeared in a paper by Ian Heath in 1971.[2] Date writes:

"Since that definition predated Boyce and Codd's own definition by some three years, it seems to me that BCNF ought by rights to be called Heath normal form. But it isn't."[3]

3NF tables not meeting BCNF

Only in rare cases does a 3NF table not meet the requirements of BCNF. A 3NF table which does not have multiple overlapping candidate keys is guaranteed to be in BCNF.[4] Depending on what its functional dependencies are, a 3NF table with two or more overlapping candidate keys may or may not be in BCNF.

An example of a 3NF table that does not meet BCNF is:


Today's Court Bookings

Court | Start Time | End Time | Rate Type
1 | 09:30 | 10:30 | SAVER
1 | 11:00 | 12:00 | SAVER
1 | 14:00 | 15:30 | STANDARD
2 | 10:00 | 11:30 | PREMIUM-B
2 | 11:30 | 13:30 | PREMIUM-B
2 | 15:00 | 16:30 | PREMIUM-A

• Each row in the table represents a court booking at a tennis club that has one hard court (Court 1) and one grass court (Court 2).

• A booking is defined by its Court and the period for which the Court is reserved.

• Additionally, each booking has a Rate Type associated with it. There are four distinct rate types:

  • SAVER, for Court 1 bookings made by members
  • STANDARD, for Court 1 bookings made by non-members
  • PREMIUM-A, for Court 2 bookings made by members
  • PREMIUM-B, for Court 2 bookings made by non-members

The table's candidate keys are:

• {Court, Start Time}
• {Court, End Time}
• {Rate Type, Start Time}
• {Rate Type, End Time}

Recall that 2NF prohibits partial functional dependencies of non-prime attributes on candidate keys, and that 3NF prohibits transitive functional dependencies of non-prime attributes on candidate keys. In the Today's Court Bookings table, there are no non-prime attributes: that is, all attributes belong to candidate keys. Therefore the table adheres to both 2NF and 3NF.

The table does not adhere to BCNF. This is because of the dependency Rate Type → Court, in which the determining attribute (Rate Type) is neither a candidate key nor a superset of a candidate key.


Any table that falls short of BCNF will be vulnerable to logical inconsistencies. In this example, enforcing the candidate keys will not ensure that the dependency Rate Type → Court is respected. There is, for instance, nothing to stop us from assigning a PREMIUM-A Rate Type to a Court 1 booking as well as a Court 2 booking, a clear contradiction, as a Rate Type should only ever apply to a single Court.

The design can be amended so that it meets BCNF:

Rate Types

Rate Type | Court | Member Flag
SAVER | 1 | Yes
STANDARD | 1 | No
PREMIUM-A | 2 | Yes
PREMIUM-B | 2 | No

Today's Bookings

Court | Start Time | End Time | Member Flag
1 | 09:30 | 10:30 | Yes
1 | 11:00 | 12:00 | Yes
1 | 14:00 | 15:30 | No
2 | 10:00 | 11:30 | No
2 | 11:30 | 13:30 | No
2 | 15:00 | 16:30 | Yes

The candidate keys for the Rate Types table are {Rate Type} and {Court, Member Flag}; the candidate keys for the Today's Bookings table are {Court, Start Time} and {Court, End Time}. Both tables are in BCNF. Having one Rate Type associated with two different Courts is now impossible, so the anomaly affecting the original table has been eliminated.

Achievability of BCNF

In some cases, a non-BCNF table cannot be decomposed into tables that satisfy BCNF and preserve the dependencies that held in the original table. Beeri and Bernstein showed in 1979 that, for example, a set of functional dependencies {AB → C, C → B} cannot be represented by a BCNF schema that preserves both dependencies.[5] Thus, unlike the first three normal forms, BCNF cannot always be achieved by a dependency-preserving decomposition.
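The {AB → C, C → B} case can be checked mechanically. The Python sketch below (hypothetical helper names) maps A, B and C to Person, Shop Type and Nearest Shop from the example that follows, splits the table on the violating dependency C → B, projects the dependencies onto each piece, and shows that AB → C can no longer be derived. It illustrates the problem for this one decomposition only; it does not prove Beeri and Bernstein's general result.

```python
from itertools import combinations
from typing import Iterable, List, Set, Tuple

FD = Tuple[Set[str], Set[str]]


def closure(attrs: Iterable[str], fds: List[FD]) -> Set[str]:
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def project(fds: List[FD], sub: Set[str]) -> List[FD]:
    """Dependencies implied on `sub`: X -> (closure(X) restricted to sub) for each X in sub."""
    projected = []
    for size in range(1, len(sub) + 1):
        for xs in combinations(sorted(sub), size):
            rhs = (closure(xs, fds) & sub) - set(xs)
            if rhs:
                projected.append((set(xs), rhs))
    return projected


# A = Person, B = Shop Type, C = Nearest Shop
FDS = [({"A", "B"}, {"C"}), ({"C"}, {"B"})]
R = {"A", "B", "C"}

# C -> B violates BCNF because C is not a superkey of R.
print(closure({"C"}, FDS) >= R)  # False

# Splitting on C -> B gives {A, C} and {C, B}; both pieces are in BCNF ...
pieces = [{"A", "C"}, {"C", "B"}]
kept = [dep for piece in pieces for dep in project(FDS, piece)]
# ... but AB -> C cannot be derived from the dependencies that survive.
print({"C"} <= closure({"A", "B"}, kept))  # False
```

In the shop example this corresponds to the "Shop Near Person" / "Shop" design: it is in BCNF, but nothing enforces {Person, Shop Type} → {Shop}.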

Consider the following non-BCNF table whose functional dependencies follow the {AB → C, C → B} pattern:


Nearest Shops

Person | Shop Type | Nearest Shop
Davidson | Optician | Eagle Eye
Davidson | Hairdresser | Snippets
Wright | Bookshop | Merlin Books
Fuller | Bakery | Doughy's
Fuller | Hairdresser | Sweeney Todd's
Fuller | Optician | Eagle Eye

For each Person / Shop Type combination, the table tells us which shop of this type is geographically nearest to the person's home. We assume for simplicity that a single shop cannot be of more than one type.

The candidate keys of the table are:

• {Person, Shop Type}
• {Person, Nearest Shop}

Because all three attributes are prime attributes (i.e. belong to candidate keys), the table is in 3NF. The table is not in BCNF, however, as the Shop Type attribute is functionally dependent on a non-superkey: Nearest Shop.

The violation of BCNF means that the table is subject to anomalies. For example, Eagle Eye might have its Shop Type changed to "Optometrist" on its "Fuller" record while retaining the Shop Type "Optician" on its "Davidson" record. This would imply contradictory answers to the question: "What is Eagle Eye's Shop Type?" Holding each shop's Shop Type only once would seem preferable, as doing so would prevent such anomalies from occurring:


Shop Near Person

Person | Shop
Davidson | Eagle Eye
Davidson | Snippets
Wright | Merlin Books
Fuller | Doughy's
Fuller | Sweeney Todd's
Fuller | Eagle Eye

Shop

Shop | Shop Type
Eagle Eye | Optician
Snippets | Hairdresser
Merlin Books | Bookshop
Doughy's | Bakery
Sweeney Todd's | Hairdresser

In this revised design, the "Shop Near Person" table has a candidate key of {Person, Shop}, and the "Shop" table has a candidate key of {Shop}. Unfortunately, although this design adheres to BCNF, it is unacceptable on different grounds: it allows us to record multiple shops of the same type against the same person. In other words, its candidate keys do not guarantee that the functional dependency {Person, Shop Type} → {Shop} will be respected.

A design that eliminates all of these anomalies (but does not conform to BCNF) is possible.[6] This design consists of the original "Nearest Shops" table supplemented by the "Shop" table described above.

Nearest Shops

Person | Shop Type | Nearest Shop
Davidson | Optician | Eagle Eye
Davidson | Hairdresser | Snippets
Wright | Bookshop | Merlin Books
Fuller | Bakery | Doughy's
Fuller | Hairdresser | Sweeney Todd's
Fuller | Optician | Eagle Eye

Shop

Shop | Shop Type
Eagle Eye | Optician
Snippets | Hairdresser
Merlin Books | Bookshop
Doughy's | Bakery
Sweeney Todd's | Hairdresser

If a referential integrity constraint is defined to the effect that {Shop Type, Nearest Shop} from the first table must refer to a {Shop Type, Shop} from the second table, then the data anomalies described previously are prevented.
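The referential integrity constraint can be expressed as a composite foreign key. This sketch uses Python's built-in `sqlite3` module with an in-memory database; the snake_case table and column names are assumptions, not part of the original design. SQLite only enforces foreign keys after `PRAGMA foreign_keys = ON`, and the referenced (shop_type, shop) pair needs its own UNIQUE constraint. The update at the end reproduces the Eagle Eye anomaly described earlier and is rejected.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled
conn.executescript(
    """
    CREATE TABLE shop (
        shop      TEXT PRIMARY KEY,
        shop_type TEXT NOT NULL,
        UNIQUE (shop_type, shop)          -- lets the pair be a foreign-key target
    );
    CREATE TABLE nearest_shops (
        person       TEXT NOT NULL,
        shop_type    TEXT NOT NULL,
        nearest_shop TEXT NOT NULL,
        PRIMARY KEY (person, shop_type),
        UNIQUE (person, nearest_shop),
        FOREIGN KEY (shop_type, nearest_shop) REFERENCES shop (shop_type, shop)
    );
    """
)
conn.executemany(
    "INSERT INTO shop VALUES (?, ?)",
    [("Eagle Eye", "Optician"), ("Snippets", "Hairdresser"), ("Merlin Books", "Bookshop"),
     ("Doughy's", "Bakery"), ("Sweeney Todd's", "Hairdresser")],
)
conn.executemany(
    "INSERT INTO nearest_shops VALUES (?, ?, ?)",
    [("Davidson", "Optician", "Eagle Eye"), ("Davidson", "Hairdresser", "Snippets"),
     ("Wright", "Bookshop", "Merlin Books"), ("Fuller", "Bakery", "Doughy's"),
     ("Fuller", "Hairdresser", "Sweeney Todd's"), ("Fuller", "Optician", "Eagle Eye")],
)
conn.commit()

# The anomaly from the text: relabel Eagle Eye on Fuller's row only.
try:
    conn.execute(
        "UPDATE nearest_shops SET shop_type = 'Optometrist' "
        "WHERE person = 'Fuller' AND nearest_shop = 'Eagle Eye'"
    )
    anomaly_rejected = False
except sqlite3.IntegrityError:
    anomaly_rejected = True
print(anomaly_rejected)  # True
```

The primary key and the UNIQUE constraint on nearest_shops carry the two candidate keys of the original table, so the extra Shop table adds the missing check without giving up {Person, Shop Type} → {Nearest Shop}.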

Fourth normal form

Fourth normal form (4NF) is a normal form used in database normalization. Introduced by Ronald Fagin in 1977, 4NF is the next level of normalization after Boyce-Codd normal form (BCNF). Whereas the second, third, and Boyce-Codd normal forms are concerned with functional dependencies, 4NF is concerned with a more general type of dependency known as a multivalued dependency. A table is in 4NF if and only if, for every one of its non-trivial multivalued dependencies X →→ Y, X is a superkey, that is, X is either a candidate key or a superset thereof.[1]

Multivalued dependencies

If the column headings in a relational database table are divided into three disjoint groupings X, Y, and Z, then, in the context of a particular row, we can refer to the data beneath each group of headings as x, y, and z respectively. A multivalued dependency X →→ Y signifies the following. Choose any x that actually occurs in the table (call this choice x_c) and compile a list of all the x_c y z combinations that occur in the table. We will find that x_c is associated with the same y entries regardless of z.

A trivial multivalued dependency X →→ Y is one in which either Y is a subset of X, or X and Y together make up all the columns of the table (for instance, when Y consists of all columns not belonging to X). A trivial multivalued dependency holds in every table. In particular, any set of columns trivially multidetermines the set of remaining columns.

A functional dependency is a special case of multivalued dependency. In a functional dependency X → Y, every x determines exactly one y, never more than one.
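The definition can be checked directly on a table instance. Within each x-group, "the same y entries regardless of z" is equivalent to every (y, z) combination occurring. Below is a minimal Python sketch under that reading. The helper `holds_mvd`, the abstract column names X, Y, Z and the rows are hypothetical.

```python
from itertools import product


def holds_mvd(rows, header, x, y):
    """Return True if X ->> Y holds: for each x, the set of y-values is the same for every z."""
    z = [col for col in header if col not in x and col not in y]

    def pick(row, cols):
        return tuple(row[header.index(col)] for col in cols)

    groups = {}
    for row in rows:
        groups.setdefault(pick(row, x), set()).add((pick(row, y), pick(row, z)))
    for pairs in groups.values():
        ys = {y_val for y_val, _ in pairs}
        zs = {z_val for _, z_val in pairs}
        # Same y entries regardless of z <=> every (y, z) combination occurs for this x.
        if pairs != set(product(ys, zs)):
            return False
    return True


header = ("X", "Y", "Z")
full = [("x1", "y1", "z1"), ("x1", "y1", "z2"), ("x1", "y2", "z1"), ("x1", "y2", "z2")]
print(holds_mvd(full, header, ["X"], ["Y"]))      # True: y1 and y2 each occur with z1 and z2
print(holds_mvd(full[:3], header, ["X"], ["Y"]))  # False: (x1, y2, z2) is missing

# A functional dependency X -> Y is a special case: one y per x always passes.
fd_rows = [("x1", "y1", "z1"), ("x1", "y1", "z2"), ("x2", "y2", "z1")]
print(holds_mvd(fd_rows, header, ["X"], ["Y"]))   # True

# Trivial case: Y covers every column outside X, so Z is empty and the check always passes.
print(holds_mvd(full[:3], header, ["X"], ["Y", "Z"]))  # True
```

A single table instance can only show that a dependency is violated. Whether it is meant to hold is a rule of the problem domain.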


Example

Consider the following example:

Pizza Delivery Permutations

Restaurant | Pizza Variety | Delivery Area
A1 Pizza | Thick Crust | Springfield
A1 Pizza | Thick Crust | Shelbyville
A1 Pizza | Thick Crust | Capital City
A1 Pizza | Stuffed Crust | Springfield
A1 Pizza | Stuffed Crust | Shelbyville
A1 Pizza | Stuffed Crust | Capital City
Elite Pizza | Thin Crust | Capital City
Elite Pizza | Stuffed Crust | Capital City
Vincenzo's Pizza | Thick Crust | Springfield
Vincenzo's Pizza | Thick Crust | Shelbyville
Vincenzo's Pizza | Thin Crust | Springfield
Vincenzo's Pizza | Thin Crust | Shelbyville

Each row indicates that a given restaurant can deliver a given variety of pizza to a given area.


The table has no non-key attributes because its only key is {Restaurant, Pizza Variety, Delivery Area}. Therefore it meets all normal forms up to BCNF. If we assume, however, that pizza varieties offered by a restaurant are not affected by delivery area, then it does not meet 4NF. The problem is that the table features two non-trivial multivalued dependencies on the {Restaurant} attribute (which is not a superkey). The dependencies are:

• {Restaurant} →→ {Pizza Variety}
• {Restaurant} →→ {Delivery Area}

These non-trivial multivalued dependencies on a non-superkey reflect the fact that the varieties of pizza a restaurant offers are independent of the areas to which the restaurant delivers. This state of affairs leads to redundancy in the table. For example, we are told three times that A1 Pizza offers Stuffed Crust. If A1 Pizza starts producing Cheese Crust pizzas, we will need to add multiple rows, one for each of A1 Pizza's delivery areas. Moreover, nothing prevents us from doing this incorrectly: we might add Cheese Crust rows for all but one of A1 Pizza's delivery areas, thereby failing to respect the multivalued dependency {Restaurant} →→ {Pizza Variety}.
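The following Python sketch checks the two dependencies listed above against the Pizza Delivery Permutations rows. It also reproduces the incomplete Cheese Crust insertion. The helper names `holds_mvd` and `unique_on` are hypothetical. A single table instance can only rule a superkey out; whether {Restaurant} is a superkey is really a property of the schema.

```python
from itertools import product

HEADER = ("Restaurant", "Pizza Variety", "Delivery Area")
ROWS = {
    ("A1 Pizza", "Thick Crust", "Springfield"),
    ("A1 Pizza", "Thick Crust", "Shelbyville"),
    ("A1 Pizza", "Thick Crust", "Capital City"),
    ("A1 Pizza", "Stuffed Crust", "Springfield"),
    ("A1 Pizza", "Stuffed Crust", "Shelbyville"),
    ("A1 Pizza", "Stuffed Crust", "Capital City"),
    ("Elite Pizza", "Thin Crust", "Capital City"),
    ("Elite Pizza", "Stuffed Crust", "Capital City"),
    ("Vincenzo's Pizza", "Thick Crust", "Springfield"),
    ("Vincenzo's Pizza", "Thick Crust", "Shelbyville"),
    ("Vincenzo's Pizza", "Thin Crust", "Springfield"),
    ("Vincenzo's Pizza", "Thin Crust", "Shelbyville"),
}


def pick(row, cols):
    return tuple(row[HEADER.index(col)] for col in cols)


def holds_mvd(rows, x, y):
    """X ->> Y holds if, for each x, every y-value occurs with every z-value."""
    z = [col for col in HEADER if col not in x and col not in y]
    groups = {}
    for row in rows:
        groups.setdefault(pick(row, x), set()).add((pick(row, y), pick(row, z)))
    return all(
        pairs == set(product({p[0] for p in pairs}, {p[1] for p in pairs}))
        for pairs in groups.values()
    )


def unique_on(rows, x):
    """False if two rows share an x-value, which rules X out as a superkey."""
    return len({pick(row, x) for row in rows}) == len(rows)


print(holds_mvd(ROWS, ["Restaurant"], ["Pizza Variety"]))  # True
print(holds_mvd(ROWS, ["Restaurant"], ["Delivery Area"]))  # True
print(unique_on(ROWS, ["Restaurant"]))                     # False, so the table is not in 4NF

# The anomaly from the text: Cheese Crust added for only two of A1 Pizza's three areas.
partial = ROWS | {
    ("A1 Pizza", "Cheese Crust", "Springfield"),
    ("A1 Pizza", "Cheese Crust", "Shelbyville"),
}
print(holds_mvd(partial, ["Restaurant"], ["Pizza Variety"]))  # False
```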

To eliminate the possibility of these anomalies, we must place the facts about varieties offered into a different table from the facts about delivery areas, yielding two tables that are both in 4NF:

Varieties By Restaurant

Restaurant | Pizza Variety
A1 Pizza | Thick Crust
A1 Pizza | Stuffed Crust
Elite Pizza | Thin Crust
Elite Pizza | Stuffed Crust
Vincenzo's Pizza | Thick Crust
Vincenzo's Pizza | Thin Crust

Delivery Areas By Restaurant

Restaurant | Delivery Area
A1 Pizza | Springfield
A1 Pizza | Shelbyville
A1 Pizza | Capital City
Elite Pizza | Capital City
Vincenzo's Pizza | Springfield
Vincenzo's Pizza | Shelbyville
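Because {Restaurant} →→ {Pizza Variety} holds, the split loses no information: joining the two tables on Restaurant gives back exactly the original rows. Below is a minimal Python sketch of that check, with relations modelled as sets of tuples. The function name `natural_join` is our own label for the illustration.

```python
permutations = {
    ("A1 Pizza", "Thick Crust", "Springfield"),
    ("A1 Pizza", "Thick Crust", "Shelbyville"),
    ("A1 Pizza", "Thick Crust", "Capital City"),
    ("A1 Pizza", "Stuffed Crust", "Springfield"),
    ("A1 Pizza", "Stuffed Crust", "Shelbyville"),
    ("A1 Pizza", "Stuffed Crust", "Capital City"),
    ("Elite Pizza", "Thin Crust", "Capital City"),
    ("Elite Pizza", "Stuffed Crust", "Capital City"),
    ("Vincenzo's Pizza", "Thick Crust", "Springfield"),
    ("Vincenzo's Pizza", "Thick Crust", "Shelbyville"),
    ("Vincenzo's Pizza", "Thin Crust", "Springfield"),
    ("Vincenzo's Pizza", "Thin Crust", "Shelbyville"),
}

# Project the original table onto the two 4NF tables.
varieties_by_restaurant = {(r, v) for r, v, _ in permutations}
areas_by_restaurant = {(r, a) for r, _, a in permutations}


def natural_join(varieties, areas):
    """Recombine the two tables on their shared Restaurant column."""
    return {(r, v, a) for r, v in varieties for r2, a in areas if r == r2}


print(len(varieties_by_restaurant), len(areas_by_restaurant))                     # 6 6
print(natural_join(varieties_by_restaurant, areas_by_restaurant) == permutations)  # True

# A new variety is now a single row; the join supplies one row per delivery area.
varieties_by_restaurant.add(("A1 Pizza", "Cheese Crust"))
print(len(natural_join(varieties_by_restaurant, areas_by_restaurant)))             # 15
```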


Jack Schneider | Acme | Breadbox
Willy Loman | Robusto | Pruning Shears
Willy Loman | Robusto | Vacuum Cleaner
Willy Loman | Robusto | Breadbox
Willy Loman | Robusto | Umbrella Stand
Louis Ferguson | Robusto | Vacuum Cleaner
Louis Ferguson | Robusto | Telescope
Louis Ferguson | Acme | Vacuum Cleaner
Louis Ferguson | Acme | Lava Lamp
Louis Ferguson | Nimbus | Tie Rack

The table's predicate is: Products of the type designated by Product Type, made by the brand designated by Brand, are available from the travelling salesman designated by Travelling Salesman.

In the absence of any rules restricting the valid possible combinations of Travelling Salesman, Brand, and Product Type, the three-attribute table above is necessary in order to model the situation correctly.

Suppose, however, that the following rule applies: A Travelling Salesman has certain Brands and certain Product Types in his repertoire. If Brand B is in his repertoire, and Product Type P is in his repertoire, then (assuming Brand B makes Product Type P), the Travelling Salesman must offer products of Product Type P made by Brand B.

In that case, it is possible to split the table into three:


Product Types By Travelling Salesman

Travelling Salesman | Product Type
Jack Schneider | Vacuum Cleaner
Jack Schneider | Breadbox
Willy Loman | Pruning Shears
Willy Loman | Vacuum Cleaner
Willy Loman | Breadbox
Willy Loman | Umbrella Stand
Louis Ferguson | Telescope
Louis Ferguson | Vacuum Cleaner
Louis Ferguson | Lava Lamp
Louis Ferguson | Tie Rack

Brands By Travelling Salesman

Travelling Salesman | Brand
Jack Schneider | Acme
Willy Loman | Robusto
Louis Ferguson | Robusto
Louis Ferguson | Acme
Louis Ferguson | Nimbus

Product Types By Brand

Brand | Product Type
Acme | Vacuum Cleaner
Acme | Breadbox
Acme | Lava Lamp
Robusto | Pruning Shears
Robusto | Vacuum Cleaner
Robusto | Breadbox
Robusto | Umbrella Stand
Robusto | Telescope
Nimbus | Tie Rack
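Under the repertoire rule, the three-attribute table should be exactly the three-way join of these three tables. Joining only two of them is not enough. Below is a minimal Python sketch with each table modelled as a set of pairs. The variable names are our own.

```python
salesman_types = {
    ("Jack Schneider", "Vacuum Cleaner"), ("Jack Schneider", "Breadbox"),
    ("Willy Loman", "Pruning Shears"), ("Willy Loman", "Vacuum Cleaner"),
    ("Willy Loman", "Breadbox"), ("Willy Loman", "Umbrella Stand"),
    ("Louis Ferguson", "Telescope"), ("Louis Ferguson", "Vacuum Cleaner"),
    ("Louis Ferguson", "Lava Lamp"), ("Louis Ferguson", "Tie Rack"),
}
salesman_brands = {
    ("Jack Schneider", "Acme"), ("Willy Loman", "Robusto"),
    ("Louis Ferguson", "Robusto"), ("Louis Ferguson", "Acme"), ("Louis Ferguson", "Nimbus"),
}
brand_types = {
    ("Acme", "Vacuum Cleaner"), ("Acme", "Breadbox"), ("Acme", "Lava Lamp"),
    ("Robusto", "Pruning Shears"), ("Robusto", "Vacuum Cleaner"), ("Robusto", "Breadbox"),
    ("Robusto", "Umbrella Stand"), ("Robusto", "Telescope"), ("Nimbus", "Tie Rack"),
}

# Join the first two tables on Travelling Salesman.
two_way = {
    (s, b, p)
    for s, p in salesman_types
    for s2, b in salesman_brands
    if s == s2
}
# Keep only combinations the brand actually makes: the full three-way join.
three_way = {(s, b, p) for s, b, p in two_way if (b, p) in brand_types}

print(len(two_way), len(three_way))                                     # 18 11
print(("Louis Ferguson", "Nimbus", "Telescope") in two_way - three_way)  # True: spurious row
```

The two-way join pairs every brand a salesman carries with every product type he sells, even where the brand does not make that type. The third table removes those combinations.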

Note how this setup helps to remove redundancy. Suppose that Jack Schneider starts selling Robusto's products. In the previous setup we would have to add two new entries since Jack Schneider is able to sell two Product Types covered by Robusto: Breadboxes and Vacuum Cleaners. With the new setup we need only add a single entry (in Brands By Travelling Salesman).
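The single new entry can be traced through the join. In the Python sketch below, the three tables are modelled as dictionaries of sets, and `availability` is a hypothetical helper that derives the (Travelling Salesman, Brand, Product Type) rows. Adding (Jack Schneider, Robusto) to one table produces both new combinations.

```python
salesman_types = {
    "Jack Schneider": {"Vacuum Cleaner", "Breadbox"},
    "Willy Loman": {"Pruning Shears", "Vacuum Cleaner", "Breadbox", "Umbrella Stand"},
    "Louis Ferguson": {"Telescope", "Vacuum Cleaner", "Lava Lamp", "Tie Rack"},
}
salesman_brands = {
    "Jack Schneider": {"Acme"},
    "Willy Loman": {"Robusto"},
    "Louis Ferguson": {"Robusto", "Acme", "Nimbus"},
}
brand_types = {
    "Acme": {"Vacuum Cleaner", "Breadbox", "Lava Lamp"},
    "Robusto": {"Pruning Shears", "Vacuum Cleaner", "Breadbox", "Umbrella Stand", "Telescope"},
    "Nimbus": {"Tie Rack"},
}


def availability():
    """Derive (salesman, brand, product type) rows by joining the three tables."""
    return {
        (s, b, p)
        for s, brands in salesman_brands.items()
        for b in brands
        # a salesman offers a type from a brand if both are in his repertoire and the brand makes it
        for p in salesman_types[s] & brand_types[b]
    }


before = availability()
salesman_brands["Jack Schneider"].add("Robusto")  # the single new entry
added = availability() - before
print(sorted(added))
# [('Jack Schneider', 'Robusto', 'Breadbox'), ('Jack Schneider', 'Robusto', 'Vacuum Cleaner')]
```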

Usage

Only in rare situations does a 4NF table not conform to 5NF. These are situations in which a complex real-world constraint governing the valid combinations of attribute values in the 4NF table is not implicit in the structure of that table. If such a table is not normalized to 5NF, the burden of maintaining the logical consistency of the data within the table must be carried partly by the application responsible for insertions, deletions, and updates to it; and there is a heightened risk that the data within the table will become inconsistent. In contrast, the 5NF design excludes the possibility of such inconsistencies.