database normalization mihir

Upload: manoj-kumar

Post on 05-Apr-2018

251 views

Category:

Documents


  • 7/31/2019 Database Normalization Mihir

    Database models

    A data model is not just a way of structuring data: it also defines a set of operations that can be performed on the data. The relational model, for example, defines operations such as select, project, and join. Although these operations may not be explicit in a particular query language, they provide the foundation on which a query language is built.

    Contents

    1 Models
        o 1.1 Flat model
        o 1.2 Hierarchical model
        o 1.3 Network model
        o 1.4 Relational model
            1.4.1 Relational operations
            1.4.2 Normal Forms
        o 1.5 Dimensional model
        o 1.6 Object database models
    2 References
    3 See Also

    Models

    Various techniques are used to model data structure. Most database systems are built around one particular data model, although it is increasingly common for products to offer support for more than one model. For any one logical model various physical implementations may be possible, and most products will offer the user some level of control in tuning the physical implementation, since the choices that are made have a significant effect on performance. An example of this is the relational model: all serious implementations of the relational model allow the creation of indexes which provide fast access to rows in a table if the values of certain columns are known.
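
    The effect of an index on the physical implementation can be sketched with Python's built-in sqlite3 module. SQLite, the employee table, and the index name idx_employee_name are assumptions chosen for illustration; the source names no particular product. The logical query is identical before and after the index is created; only the access path reported by the planner changes.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employee (id INTEGER PRIMARY KEY, name TEXT, city TEXT)")
conn.executemany(
    "INSERT INTO employee (name, city) VALUES (?, ?)",
    [("Brown", "Leeds"), ("Smith", "York"), ("Jones", "Leeds")],
)

def plan(sql):
    """Return the human-readable detail column of SQLite's query plan."""
    return [row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql)]

query = "SELECT * FROM employee WHERE name = 'Brown'"
plan_before = plan(query)   # no index on name yet, so the table is scanned

conn.execute("CREATE INDEX idx_employee_name ON employee (name)")
plan_after = plan(query)    # the planner can now search through the index

rows = conn.execute(query).fetchall()
```

    The exact wording of the plan text differs between SQLite versions, so it is best inspected rather than parsed.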

    Flat model

    This may not strictly qualify as a data model, as defined above. The flat (or table) model consists of a single, two-dimensional array of data elements, where all members of a given column are assumed to be similar values, and all members of a row are assumed to be related to one another. For instance, a system security database might use columns for name and password. Each row would have the specific password associated

    with an individual user. Columns of the table often have a type associated with them, defining them as character data, date or time information, integers, or floating point numbers. This model is, incidentally, a basis of the spreadsheet.

    Hierarchical model

    Main article: Hierarchical model

    In a hierarchical model, data is organized into a tree-like structure, implying a single upward link in each record to describe the nesting, and a sort field to keep the records in a particular order in each same-level list. Hierarchical structures were widely used in the early mainframe database management systems, such as the Information Management System (IMS) by IBM, and now describe the structure of XML documents. This structure allows one 1:N relationship between two types of data. It is very efficient at describing many relationships in the real world: recipes, tables of contents, ordering of paragraphs/verses, any nested and sorted information. However, the hierarchical structure is inefficient for certain database operations when a full path (as opposed to upward link and sort field) is not also included for each record.

    One limitation of the hierarchical model is its inability to efficiently represent redundancy in data.
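
    A minimal sketch of the upward link and sort field, in Python. The Record class, its field names, and the table-of-contents data are hypothetical. It shows why a missing full path costs work: the path has to be rebuilt by walking one upward link at a time.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class Record:
    name: str
    parent: Optional[str]   # the single upward link (None for the root)
    sort: int               # position within the same-level list

# Hypothetical table-of-contents data, deliberately stored out of order.
records = [
    Record("Network model", "Models", 2),
    Record("Models", "Contents", 1),
    Record("Contents", None, 0),
    Record("Flat model", "Models", 1),
    Record("References", "Contents", 2),
]
by_name = {r.name: r for r in records}

def children(parent_name):
    """Same-level list under one parent, ordered by the sort field."""
    kids = [r for r in records if r.parent == parent_name]
    return [r.name for r in sorted(kids, key=lambda r: r.sort)]

def full_path(name):
    """Rebuild the full path by following upward links to the root."""
    path = []
    current = by_name[name]
    while current is not None:
        path.append(current.name)
        current = by_name[current.parent] if current.parent else None
    return "/".join(reversed(path))
```

    Calling full_path costs one lookup per level of nesting, which is the overhead a stored full path would avoid.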

    Network model

    Main article: Network model

    The network model (defined by the CODASYL specification) organizes data using two fundamental constructs, called records and sets. Records contain fields (which may be organized hierarchically, as in the programming language COBOL). Sets (not to be confused with mathematical sets) define one-to-many relationships between records: one owner, many members. A record may be an owner in any number of sets, and a member in any number of sets.
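
    Records and sets can be sketched in Python as follows. This is an illustration under assumptions, not CODASYL syntax: the Record class, the connect helper, and the set names SUPPLIES and STOCKS are all hypothetical.

```python
from dataclasses import dataclass, field

@dataclass(eq=False)
class Record:
    fields: dict
    # Set name -> owner record, for each set in which this record is a member.
    owners: dict = field(default_factory=dict)
    # Set name -> member records, for each set which this record owns.
    members: dict = field(default_factory=dict)

def connect(set_name, owner, member):
    """Add `member` to the occurrence of `set_name` owned by `owner`."""
    owner.members.setdefault(set_name, []).append(member)
    member.owners[set_name] = owner

# Hypothetical data: one part record is a member of two different sets,
# each with its own owner.
supplier = Record({"name": "Acme"})
warehouse = Record({"city": "Leeds"})
bolt = Record({"part": "bolt"})
nut = Record({"part": "nut"})

connect("SUPPLIES", supplier, bolt)
connect("SUPPLIES", supplier, nut)
connect("STOCKS", warehouse, bolt)

# Start at one record and follow the set relationships it participates in.
current = supplier.members["SUPPLIES"][0]   # first member of the set: bolt
stocked_in = current.owners["STOCKS"]       # bolt's owner in the STOCKS set
```

    Here bolt has two owners through two sets, which a record with a single upward link could not express.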

    The network model is a variation on the hierarchical model, to the extent that it is built on the concept of multiple branches (lower-level structures) emanating from one or more nodes (higher-level structures), while the model differs from the hierarchical model in that branches can be connected to multiple nodes. The network model is able to represent redundancy in data more efficiently than the hierarchical model.

    The operations of the network model are navigational in style: a program maintains a current position, and navigates from one record to another by following the relationships in which the record participates. Records can also be located by supplying key values.

    Although it is not an essential feature of the model, network databases generally implement the set relationships by means of pointers that directly address the location of a record on

    disk. This gives excellent retrieval performance, at the expense of operations such as

    database loading and reorganization.

    Most object databases use the navigational concept to provide fast navigation across networks of objects, generally using object identifiers as "smart" pointers to related objects. Objectivity/DB, for instance, implements named 1:1, 1:many, many:1 and many:many relationships that can cross databases. Many object databases also support SQL, combining the strengths of both models.

    Relational model

    Main article: Relational model

    The relational model was introduced in an academic paper by E. F. Codd in 1970 as a way to make database management systems more independent of any particular application. It is a mathematical model defined in terms of predicate logic and set theory.

    The products that are generally referred to as relational databases in fact implement a model that is only an approximation to the mathematical model defined by Codd. Three key terms are used extensively in relational database models: relations, attributes, and domains. A relation is a table with columns and rows. The named columns of the relation are called attributes, and the domain is the set of values the attributes are allowed to take.

    The basic data structure of the relational model is the table, where information about a particular entity (say, an employee) is represented in columns and rows (also called tuples). Thus, the "relation" in "relational database" refers to the various tables in the database; a relation is a set of tuples. The columns enumerate the various attributes of the entity (the employee's name, address or phone number, for example), and a row is an actual instance of the entity (a specific employee) that is represented by the relation. As a result, each tuple of the employee table represents various attributes of a single employee.

    All relations (and, thus, tables) in a relational database have to adhere to some basic rules to qualify as relations. First, the ordering of columns is immaterial in a table. Second, there can't be identical tuples or rows in a table. And third, each tuple will contain a single value for each of its attributes.
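
    These three rules can be sketched with Python's built-in set types. The representation (a tuple as a frozenset of attribute/value pairs) and the names make_tuple and is_single_valued are assumptions made for illustration.

```python
# A tuple is modelled as a frozenset of (attribute, value) pairs, so the
# order in which attributes are written down carries no meaning.
def make_tuple(**attributes):
    return frozenset(attributes.items())

# A relation is a set of such tuples, so identical tuples cannot occur twice.
employees = {
    make_tuple(name="Brown", phone="555-0100"),
    make_tuple(phone="555-0100", name="Brown"),   # same tuple, columns reordered
    make_tuple(name="Smith", phone="555-0199"),
}

def is_single_valued(relation):
    """Check the third rule: no attribute holds a collection of values."""
    return all(
        not isinstance(value, (list, set, tuple, dict, frozenset))
        for row in relation
        for _, value in row
    )
```

    The set holds two tuples rather than three, because the reordered entry is the same tuple as the first.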

    A relational database contains multiple tables, each similar to the one in the "flat" database model. One of the strengths of the relational model is that, in principle, any value occurring in two different records (belonging to the same table or to different tables) implies a relationship among those two records. Yet, in order to enforce explicit integrity constraints, relationships between records in tables can also be defined explicitly, by identifying or non-identifying parent-child relationships characterized by assigning cardinality (1:1, (0)1:M, M:M). Tables can also have a designated single attribute or a set of attributes that can act as a "key", which can be used to uniquely identify each tuple in the table.

    A key that can be used to uniquely identify a row in a table is called a primary key. Keys are commonly used to join or combine data from two or more tables. For example, an Employee table may contain a column named Location which contains a value that matches the key of a Location table. Keys are also critical in the creation of indexes, which facilitate fast retrieval of data from large tables. Any column can be a key, or multiple columns can be grouped together into a compound key. It is not necessary to define all the keys in advance; a column can be used as a key even if it was not originally intended to be one.

    A key that has an external, real-world meaning (such as a person's name, a book's ISBN, or a car's serial number) is sometimes called a "natural" key. If no natural key is suitable (think of the many people named Brown), an arbitrary or surrogate key can be assigned (such as by giving employees ID numbers). In practice, most databases have both generated and natural keys, because generated keys can be used internally to create links between rows that cannot break, while natural keys can be used, less reliably, for searches and for integration with other databases. (For example, records in two independently developed databases could be matched up by social security number, except when the social security numbers are incorrect, missing, or have changed.)
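
    The Employee and Location example can be sketched with Python's sqlite3 module. The column names and sample rows are hypothetical, and the choice of SQLite is an assumption; note that SQLite enforces foreign keys only after the PRAGMA shown.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when asked
conn.executescript("""
    CREATE TABLE location (
        id   INTEGER PRIMARY KEY,          -- surrogate key
        name TEXT NOT NULL UNIQUE          -- natural key
    );
    CREATE TABLE employee (
        id          INTEGER PRIMARY KEY,   -- surrogate key (employee ID number)
        name        TEXT NOT NULL,         -- not unique: many people are named Brown
        location_id INTEGER NOT NULL REFERENCES location (id)
    );
""")
conn.execute("INSERT INTO location (id, name) VALUES (1, 'Leeds')")
conn.executemany(
    "INSERT INTO employee (id, name, location_id) VALUES (?, ?, ?)",
    [(1, "Brown", 1), (2, "Brown", 1)],
)

# Two employees share a name, but their surrogate keys tell them apart.
browns = conn.execute(
    "SELECT e.id, e.name, l.name FROM employee AS e "
    "JOIN location AS l ON l.id = e.location_id ORDER BY e.id"
).fetchall()

# A link to a missing location is rejected, so the reference cannot dangle.
try:
    conn.execute("INSERT INTO employee (id, name, location_id) VALUES (3, 'Smith', 99)")
    dangling_rejected = False
except sqlite3.IntegrityError:
    dangling_rejected = True
```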

    Relational operations

    Users (or programs) request data from a relational database by sending it a query that is written in a special language, usually a dialect of SQL. Although SQL was originally intended for end-users, it is much more common for SQL queries to be embedded into software that provides an easier user interface. Many web sites, such as Wikipedia, perform SQL queries when generating pages.

    In response to a query, the database returns a result set, which is just a list of rows containing the answers. The simplest query is just to return all the rows from a table, but more often, the rows are filtered in some way to return just the answer wanted.

    Often, data from multiple tables are combined into one, by doing a join. Conceptually, this is done by taking all possible combinations of rows (the Cartesian product), and then filtering out everything except the answer. In practice, relational database management systems rewrite ("optimize") queries to perform faster, using a variety of techniques.
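
    The conceptual definition of a join, and one way it might be made faster, can be sketched in plain Python. The sample rows and function names are hypothetical, and the hash-based lookup is only one of many techniques an optimizer might choose; it is not taken from the source.

```python
from itertools import product

# Hypothetical rows; each table is a list of dicts keyed by column name.
employees = [
    {"emp": "Brown", "location_id": 1},
    {"emp": "Smith", "location_id": 2},
    {"emp": "Jones", "location_id": 1},
]
locations = [
    {"location_id": 1, "city": "Leeds"},
    {"location_id": 2, "city": "York"},
]

def naive_join(left, right, key):
    """Cartesian product of the two tables, then keep only matching pairs."""
    return [
        {**lrow, **rrow}
        for lrow, rrow in product(left, right)
        if lrow[key] == rrow[key]
    ]

def hash_join(left, right, key):
    """Same result, but looks up matches in an index built on `right`."""
    index = {}
    for rrow in right:
        index.setdefault(rrow[key], []).append(rrow)
    return [{**lrow, **rrow} for lrow in left for rrow in index.get(lrow[key], [])]

joined = naive_join(employees, locations, "location_id")
```

    Both functions return the same rows; naive_join examines every pair, while hash_join touches each row of each table once.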

    There are a number of relational operations in addition to join. These include project (the process of eliminating some of the columns), restrict (the process of eliminating some of the rows), union (a way of combining two tables with similar structures), difference (which lists the rows in one table that are not found in the other), intersect (which lists the rows found in both tables), and product (mentioned above, which combines each row of one table with each row of the other). Depending on which other sources you consult, there are a number of other operators, many of which can be defined in terms of those listed above. These include semi-join, outer operators such as outer join and outer union, and various forms of division. Then there are operators to rename columns, summarizing or aggregating operators, and, if you permit relation values as attributes (RVA, relation-valued attribute), operators such as group and ungroup. The SELECT statement in SQL serves to handle all of these except for the group and ungroup operators.
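
    The basic operators can be sketched over Python sets. The row representation and the sample staff data are assumptions for illustration; a real system works on stored tables rather than in-memory sets.

```python
def make_row(**attrs):
    """A row is a frozenset of (column, value) pairs, so relations can be sets."""
    return frozenset(attrs.items())

def project(relation, columns):
    """Keep only the named columns; duplicate rows collapse automatically."""
    return {frozenset((c, v) for c, v in row if c in columns) for row in relation}

def restrict(relation, predicate):
    """Keep only the rows for which predicate(row_as_dict) is true."""
    return {row for row in relation if predicate(dict(row))}

def product(left, right):
    """Combine each row of `left` with each row of `right` (disjoint columns assumed)."""
    return {lrow | rrow for lrow in left for rrow in right}

# Union, difference and intersect are Python's own set operators (|, -, &)
# when both relations have the same columns.
staff_2017 = {make_row(name="Brown", dept="Sales"), make_row(name="Smith", dept="IT")}
staff_2018 = {make_row(name="Smith", dept="IT"), make_row(name="Jones", dept="Sales")}

everyone = staff_2017 | staff_2018           # union
left_in_2018 = staff_2017 - staff_2018       # difference
stayed = staff_2017 & staff_2018             # intersect
departments = project(everyone, {"dept"})    # project
sales = restrict(everyone, lambda r: r["dept"] == "Sales")  # restrict
```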

    The flexibility of relational databases allows programmers to write queries that were not anticipated by the database designers. As a result, relational databases can be used by multiple applications in ways the original designers did not foresee, which is especially important for databases that might be used for decades. This has made the idea and implementation of relational databases very popular with businesses.

    Normal Forms

    Main article: Database normalization

    Relations are classified based upon the types of anomalies to which they're vulnerable. A database that's in the first normal form is vulnerable to all types of anomalies, while a database that's in the domain/key normal form has no modification anomalies. Normal forms are hierarchical in nature. That is, the lowest level is the first normal form, and the database cannot meet the requirements for higher level normal forms without first having met all the requirements of the lesser normal forms.[1]

    Dimensional model

    The dimensional model is a specialized adaptation of the relational model used to represent data in data warehouses in a way that data can be easily summarized using OLAP queries. In the dimensional model, a database consists of a single large table of facts that are described using dimensions and measures. A dimension provides the context of a fact (such as who participated, when and where it happened, and its type) and is used in queries to group related facts together. Dimensions tend to be discrete and are often hierarchical; for example, the location might include the building, state, and country. A measure is a quantity describing the fact, such as revenue. It's important that measures can be meaningfully aggregated; for example, the revenue from different locations can be added together.

    In an OLAP query, dimensions are chosen and the facts are grouped and added together to create a summary.

    The dimensional model is often implemented on top of the relational model using a star schema, consisting of one table containing the facts and surrounding tables containing the dimensions. Particularly complicated dimensions might be represented using multiple tables, resulting in a snowflake schema.

    A data warehouse can contain multiple star schemas that share dimension tables, allowing them to be used together. Coming up with a standard set of dimensions is an important part of dimensional modeling.
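
    A very small star schema, with one fact table and one dimension table, can be sketched with Python's sqlite3 module. The table names, columns, and figures are invented for illustration; revenue is the measure and location is the dimension.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Dimension table: where a sale happened (building within state within country).
    CREATE TABLE dim_location (
        location_id INTEGER PRIMARY KEY,
        building TEXT, state TEXT, country TEXT
    );
    -- Fact table: one row per sale, with revenue as the additive measure.
    CREATE TABLE fact_sales (
        location_id INTEGER REFERENCES dim_location (location_id),
        sale_year   INTEGER,
        revenue     INTEGER
    );
    INSERT INTO dim_location VALUES
        (1, 'HQ',    'Texas', 'USA'),
        (2, 'Annex', 'Texas', 'USA'),
        (3, 'Depot', 'Ohio',  'USA');
    INSERT INTO fact_sales VALUES
        (1, 2017, 100), (2, 2017, 50), (3, 2017, 70), (1, 2018, 120);
""")

# OLAP-style summary: pick the dimensions (state, year), then group and add.
summary = conn.execute("""
    SELECT d.state, f.sale_year, SUM(f.revenue)
    FROM fact_sales AS f
    JOIN dim_location AS d ON d.location_id = f.location_id
    GROUP BY d.state, f.sale_year
    ORDER BY d.state, f.sale_year
""").fetchall()
```

    Grouping by building or country instead of state would move the same facts down or up the location hierarchy.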

    Object database models

    Main article: Object-relational model

    Main article: Object model

    In recent years, the object-oriented paradigm has been applied to database technology, creating a new programming model known as object databases. These databases attempt to bring the database world and the application programming world closer together, in particular by ensuring that the database uses the same type system as the application program. This aims to avoid the overhead (sometimes referred to as the impedance mismatch) of converting information between its representation in the database (for example as rows in tables) and its representation in the application program (typically as objects). At the same time, object databases attempt to introduce the key ideas of object programming, such as encapsulation and polymorphism, into the world of databases.
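
    The conversion work that object databases aim to avoid can be sketched as follows. This shows the relational side of the mismatch, using Python's sqlite3 module and a hypothetical Employee class; the save and load helpers are illustrative and are not part of any object database API.

```python
import sqlite3
from dataclasses import dataclass, astuple

@dataclass
class Employee:
    id: int
    name: str
    phone: str

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employee (id INTEGER PRIMARY KEY, name TEXT, phone TEXT)")

def save(emp):
    """Object -> row: flatten the object's fields into column values."""
    conn.execute("INSERT OR REPLACE INTO employee VALUES (?, ?, ?)", astuple(emp))

def load(emp_id):
    """Row -> object: rebuild an Employee from the column values."""
    row = conn.execute(
        "SELECT id, name, phone FROM employee WHERE id = ?", (emp_id,)
    ).fetchone()
    return Employee(*row) if row else None

save(Employee(1, "Brown", "555-0100"))
restored = load(1)
```

    Every class stored this way needs its own pair of conversions, which is the overhead the text describes.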

    A variety of ways have been tried for storing objects in a database. Some products have approached the problem from the application programming end, by making the objects manipulated by the program persistent. This also typically requires the addition of some kind of query language, since conventional programming languages do not have the ability to find objects based on their information content. Others have attacked the problem from the database end, by defining an object-oriented data model for the database, and defining a database programming language that allows full programming capabilities as well as traditional query facilities.

    Object databases suffered because of a lack of standardization: although standards were defined by ODMG, they were never implemented well enough to ensure interoperability between products. Nevertheless, object databases have been used successfully in many applications: usually specialized applications such as engineering databases or molecular biology databases rather than mainstream commercial data processing. However, object database ideas were picked up by the relational vendors and influenced extensions made to these products and indeed to the SQL language.

    Database normalization

    Database normalization is a design technique for structuring relational database tables.

    Tables can be normalized to a greater or lesser degree. Database theory describes a table's degree of normalization in terms of normal forms. The most common normal forms, from least normalized to most normalized, are as follows:

    First normal form (1NF)
    Second normal form (2NF)
    Third normal form (3NF)
    Boyce-Codd normal form (BCNF)
    Fourth normal form (4NF)
    Fifth normal form (5NF)
    Domain/key normal form (DKNF)
    Sixth normal form (6NF)

    Each normal form automatically includes the properties of lower normal forms; for example, a table in 3NF is also in 1NF.

    More highly normalized tables reduce data duplication and opportunities for various kinds of logical inconsistencies that could lead to loss of integrity of the database. They greatly simplify development, maintenance, and expandability of the database. Higher degrees of normalization typically involve more tables and create the need for a larger number of joins, which can reduce performance. As a result, more highly normalized tables are typically used for databases involving many isolable transactions (such as an automatic teller system), while less normalized tables are used for read-mostly information (such as reports).
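
    The trade-off between duplication and joins can be sketched with Python's sqlite3 module. The tables and rows are hypothetical: one less normalized table repeats an employee's address on every skill row, while the more normalized pair stores it once and needs a join to produce the same report.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Less normalized: the address is repeated on every skill row.
    CREATE TABLE employee_skill_flat (emp_id INTEGER, address TEXT, skill TEXT);
    INSERT INTO employee_skill_flat VALUES
        (1, '12 High St', 'typing'),
        (1, '12 High St', 'filing'),
        (2, '3 Mill Rd',  'typing');

    -- More normalized: each address is stored once, skills in their own table.
    CREATE TABLE employee (emp_id INTEGER PRIMARY KEY, address TEXT);
    CREATE TABLE employee_skill (emp_id INTEGER REFERENCES employee (emp_id), skill TEXT);
    INSERT INTO employee VALUES (1, '12 High St'), (2, '3 Mill Rd');
    INSERT INTO employee_skill VALUES (1, 'typing'), (1, 'filing'), (2, 'typing');
""")

# Copies of employee 1's address held in each design.
copies_flat = conn.execute(
    "SELECT COUNT(*) FROM employee_skill_flat WHERE emp_id = 1").fetchone()[0]
copies_normalized = conn.execute(
    "SELECT COUNT(*) FROM employee WHERE emp_id = 1").fetchone()[0]

# The normalized design needs a join to answer what the flat table answers directly.
flat_rows = conn.execute(
    "SELECT emp_id, address, skill FROM employee_skill_flat ORDER BY emp_id, skill"
).fetchall()
joined_rows = conn.execute(
    "SELECT e.emp_id, e.address, s.skill FROM employee AS e "
    "JOIN employee_skill AS s ON s.emp_id = e.emp_id ORDER BY e.emp_id, s.skill"
).fetchall()
```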

    Although the normal forms are often defined informally in terms of the characteristics of tables, rigorous definitions of the normal forms are concerned with the characteristics of mathematical constructs known as relations. Whenever information is represented relationally (that is, roughly speaking, as values within rows beneath fixed column headings), it makes sense to ask to what extent the representation is normalized.

    Contents

    1 Problems addressed by normalization
    2 Background to normalization: definitions
    3 History
    4 Normal forms
        o 4.1 First normal form
        o 4.2 Second normal form
        o 4.3 Third normal form
        o 4.4 Boyce-Codd normal form
        o 4.5 Fourth normal form
        o 4.6 Fifth normal form
        o 4.7 Domain/key normal form
        o 4.8 Sixth normal form
    5 Example Of The Process
        o 5.1 Starting Point
        o 5.2 1NF
        o 5.3 2NF
        o 5.4 3NF and BCNF
        o 5.5 4NF
        o 5.6 5NF
    6 Denormalization
        o 6.1 Non-first normal form (NF²)
    7 Further reading
    8 References
    9 See also
    10 External links

    Problems addressed by normalization

    A table that is not sufficiently normalized can suffer from logical inconsistencies of various types, and from anomalies involving data operations. In such a table:

    The same information can be expressed on multiple records; therefore updates to the table may result in logical inconsistencies. For example, each record in an unnormalized "Employees' Skills" table might contain an Employee ID, Employee Address, and Skill; thus a change of address for a particular employee will potentially need to be applied to multiple records (one for each of his skills). If the update is not carried through successfully (if, that is, the employee's address is updated on some records but not others), then the table is left in an inconsistent state. Specifically, the table provides conflicting answers to the question of what this particular employee's address is. This phenomenon is known as an update anomaly.

    There are circumstances in which certain facts cannot be recorded at all. In the above example, if it is the case that Employee Address is held only in the "Employees' Skills" table, then we cannot record the address of an employee whose skills are not yet known. This phenomenon is known as an insertion anomaly.


    There are circumstances in which the deletion of data representing certain facts

    necessitates the deletion of data representing completely different facts. For

    example, suppose a table has the attributes Student ID, Course ID, and Lecturer ID

    (a given student is enrolled in a given course, which is taught by a given lecturer).

    If in the early stages of enrolment the number of students on the course temporarily

    drops to zero, then the last of the records referencing that course must be deleted,

    meaning, as a side-effect, that the table no longer tells us which lecturer has been

    assigned to teach the course. This phenomenon is known as a deletion anomaly.

    Ideally, a relational database table should be designed in such a way as to exclude the

    possibility of update, insertion, and deletion anomalies. The normal forms of relational

    database theory provide guidelines for deciding whether a particular design will be

    vulnerable to such anomalies. It is possible to correct an unnormalized design so as to make

    it adhere to the demands of the normal forms: this is called normalization.

    Normalization typically involves decomposing an unnormalized table into two or more

    tables that, were they to be combined (joined), would convey exactly the same information

    as the original table.
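
    As a minimal sketch of this idea, the Python below splits the Student ID / Course ID / Lecturer ID table from the deletion-anomaly example into two smaller tables and joins them back. The sample values and the helper names `project` and `natural_join` are assumptions made for illustration; they do not come from the article or from any library.

```python
def project(rows, attrs):
    """Project a list of dict rows onto attrs, dropping duplicate rows."""
    return {tuple((a, r[a]) for a in attrs) for r in rows}


def natural_join(left, right):
    """Natural join of two sets of rows, each row a tuple of (attr, value) pairs."""
    joined = set()
    for lrow in left:
        for rrow in right:
            ld, rd = dict(lrow), dict(rrow)
            shared = ld.keys() & rd.keys()
            if all(ld[a] == rd[a] for a in shared):
                joined.add(tuple(sorted({**ld, **rd}.items())))
    return joined


# Hypothetical enrolment data: each course is taught by exactly one lecturer.
enrolments = [
    {"student": "S1", "course": "C1", "lecturer": "L1"},
    {"student": "S2", "course": "C1", "lecturer": "L1"},
    {"student": "S1", "course": "C2", "lecturer": "L2"},
]

student_courses = project(enrolments, ["student", "course"])
course_lecturers = project(enrolments, ["course", "lecturer"])

original = {tuple(sorted(r.items())) for r in enrolments}
rejoined = natural_join(student_courses, course_lecturers)
lossless = rejoined == original  # the join conveys the same information
```

    Because the lecturer is recorded in `course_lecturers`, deleting the last enrolment for a course from `student_courses` would no longer discard the lecturer assignment.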

    Background to normalization: definitions

    Functional dependency: Attribute B has a functional dependency on attribute A if,

    for each value of attribute A, there is exactly one value of attribute B. For example,

    Employee Address has a functional dependency on Employee ID, because each

    Employee ID value corresponds to exactly one Employee Address value. An

    attribute may be functionally dependent either on a single attribute or on a

    combination of attributes. It is not possible to determine the extent to which a

    design is normalized without understanding what functional dependencies apply to

    the attributes within its tables; understanding this, in turn, requires knowledge of

    the problem domain.

    Trivial functional dependency: A trivial functional dependency is a functional

    dependency of an attribute on a superset of itself. {Employee ID, Employee

    Address} → {Employee Address} is trivial, as is {Employee Address} →

    {Employee Address}.

    Full functional dependency: An attribute is fully functionally dependent on a set

    of attributes X if it is a) functionally dependent on X, and b) not functionally

    dependent on any proper subset of X. {Employee Address} has a functional

    dependency on {Employee ID, Skill}, but not a full functional dependency, for it is

    also dependent on {Employee ID}.

    Transitive dependency: A transitive dependency is an indirect functional

    dependency, one in which X → Z only by virtue of X → Y and Y → Z.

    Multivalued dependency: A multivalued dependency is a constraint according to

    which the presence of certain rows in a table implies the presence of certain other

    rows: see the Multivalued Dependency article for a rigorous definition.

    Join dependency: A table T is subject to a join dependency if T can always be

    recreated by joining multiple tables each having a subset of the attributes of T.


    Superkey: A superkey is an attribute or set of attributes that uniquely identifies

    rows within a table; in other words, two distinct rows are always guaranteed to have

    distinct superkeys. {Employee ID, Employee Address, Skill} would be a superkey

    for the "Employees' Skills" table; {Employee ID, Skill} would also be a superkey.

    Candidate key: A candidate key is a minimal superkey, that is, a superkey for

    which we can say that no proper subset of it is also a superkey. {Employee ID,

    Skill} would be a candidate key for the "Employees' Skills" table.

    Non-prime attribute: A non-prime attribute is an attribute that does not occur in

    any candidate key. Employee Address would be a non-prime attribute in the

    "Employees' Skills" table.

    Primary key: Most DBMSs require a table to be defined as having a single unique

    key, rather than a number of possible unique keys. A primary key is a candidate key

    which the database designer has designated for this purpose.
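
    The definitions above can be tried out on sample data. The sketch below is illustrative only: the rows are made up, and the function names `holds`, `is_superkey`, and `candidate_keys` are assumptions. Note also that a sample can only refute a dependency or a key; whether one truly applies is a matter of the problem domain, as stated above.

```python
from itertools import combinations

# Hypothetical rows for the "Employees' Skills" table.
ATTRS = ("employee_id", "employee_address", "skill")
rows = [
    (426, "87 Sycamore Grove", "Typing"),
    (426, "87 Sycamore Grove", "Shorthand"),
    (519, "94 Chestnut Street", "Typing"),
    (604, "87 Sycamore Grove", "Typing"),
]


def holds(determinant, dependent):
    """True if, in `rows`, each determinant value maps to one dependent value."""
    seen = {}
    for row in rows:
        rec = dict(zip(ATTRS, row))
        key = tuple(rec[a] for a in determinant)
        val = tuple(rec[a] for a in dependent)
        if seen.setdefault(key, val) != val:
            return False
    return True


def is_superkey(attrs):
    """True if no two rows agree on every attribute in attrs."""
    projected = [tuple(dict(zip(ATTRS, r))[a] for a in attrs) for r in rows]
    return len(set(projected)) == len(rows)


def candidate_keys():
    """Minimal superkeys, found by trying attribute sets from smallest to largest."""
    keys = []
    for size in range(1, len(ATTRS) + 1):
        for attrs in combinations(ATTRS, size):
            if is_superkey(attrs) and not any(set(k) <= set(attrs) for k in keys):
                keys.append(attrs)
    return keys
```

    With these rows, `holds(("employee_id",), ("employee_address",))` is true, the full attribute set is a superkey, and the only minimal superkey found is `("employee_id", "skill")`, which matches the candidate key named in the definitions.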

    History

    Edgar F. Codd first proposed the process of normalization and what came to be known as

    the 1st normal form:

    There is, in fact, a very simple elimination[1] procedure which we shall call normalization.

    Through decomposition non-simple domains are replaced by "domains whose elements are

    atomic (non-decomposable) values."

    Edgar F. Codd, A Relational Model of Data for Large Shared Data Banks[2]

    In his paper, Edgar F. Codd used the term "non-simple" domains to describe a

    heterogeneous data structure, but later researchers would refer to such a structure as an

    abstract data type.

    Normal forms

    The normal forms (abbrev. NF) of relational database theory provide criteria for

    determining a table's degree of vulnerability to logical inconsistencies and anomalies. The

    higher the normal form applicable to a table, the less vulnerable it is to such inconsistencies

    and anomalies. Each table has a "highest normal form" (HNF): by definition, a table

    always meets the requirements of its HNF and of all normal forms lower than its HNF; also

    by definition, a table fails to meet the requirements of any normal form higher than its

    HNF.

    The normal forms are applicable to individual tables; to say that an entire database is in

    normal form n is to say that all of its tables are in normal form n.


    Newcomers to database design sometimes suppose that normalization proceeds in an

    iterative fashion, i.e. a 1NF design is first normalized to 2NF, then to 3NF, and so on. This

    is not an accurate description of how normalization typically works. A sensibly designed

    table is likely to be in 3NF on the first attempt; furthermore, if it is 3NF, it is

    overwhelmingly likely to have an HNF of 5NF. Achieving the "higher" normal forms

    (above 3NF) does not usually require an extra expenditure of effort on the part of the

    designer, because 3NF tables usually need no modification to meet the requirements of

    these higher normal forms.

    Edgar F. Codd originally defined the first three normal forms (1NF, 2NF, and 3NF). These

    normal forms have been summarized as requiring that all non-key attributes be dependent

    on "the key, the whole key and nothing but the key". The fourth and fifth normal forms

    (4NF and 5NF) deal specifically with the representation of many-to-many and one-to-many

    relationships among attributes. Sixth normal form (6NF) incorporates considerations

    relevant to temporal databases.

    First normal form

    Main article: First normal form

    The criteria for first normal form (1NF) are:

    A table must be guaranteed not to have any duplicate records;

    therefore it must have at least one candidate key.

    Every column must be atomic, i.e. single-valued with respect to its

    datatype. In other words, a column may represent exactly one member from

    its domain. For example, a date column carrying two dates is a 1NF

    violation. On the other hand, a datatype may be arbitrarily complex.

    Therefore, a hypothetical date-range datatype might indeed carry two dates

    (or rather, one date range) without violating 1NF.

    Sometimes this second requirement is expressed as "there may not be repeating

    groups", leading to some prevalent misconceptions. The first misconception is that

    1NF precludes a series of columns repeating the same domain. The second

    misconception is that 1NF does not allow embedded lists. These are perhaps

    examples of poor design, but not necessarily 1NF violations:

    Recipe ID  Ingredient 1  Ingredient 2  Ingredient 3
    1          Flour         Eggs          Milk
    2          Parsley       Sage          Rosemary
    3          Flour         Eggs          Milk

    Recipe ID  Ingredients
    1          flour,eggs,milk
    2          parsley,sage,rosemary
    3          flour,eggs,milk

    Realize that relational databases are incapable of such things, but here's a depiction

    of a true 1NF violation, nonetheless:


    Recipe ID  Ingredient 1  Ingredient 2  Ingredient 3
    1, 3       Flour         Eggs          Milk
    2          Parsley       Sage          Rosemary
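
    The first 1NF criterion (no duplicate records, hence at least one candidate key) is something a DBMS can enforce once a key is declared. The sketch below uses Python's standard `sqlite3` module; the table name `recipe_ingredients` and its one-row-per-ingredient layout are assumptions chosen for illustration, not a design taken from the article.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical schema: one row per (recipe, ingredient) pair, with the pair as key.
conn.execute(
    "CREATE TABLE recipe_ingredients ("
    " recipe_id INTEGER NOT NULL,"
    " ingredient TEXT NOT NULL,"
    " PRIMARY KEY (recipe_id, ingredient))"
)
conn.executemany(
    "INSERT INTO recipe_ingredients VALUES (?, ?)",
    [(1, "Flour"), (1, "Eggs"), (1, "Milk"), (2, "Parsley")],
)

# A second (1, 'Flour') row would be a duplicate record, so the key rejects it.
try:
    conn.execute("INSERT INTO recipe_ingredients VALUES (1, 'Flour')")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

row_count = conn.execute("SELECT COUNT(*) FROM recipe_ingredients").fetchone()[0]
```

    Without the `PRIMARY KEY` clause the duplicate insert would succeed, and the table would no longer be guaranteed free of duplicate records.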

    Second normal form

    Main article: Second normal form

    The criteria for second normal form (2NF) are:

    The table must be in 1NF.

    None of the non-prime attributes of the table are functionally

    dependent on a part (proper subset) of a candidate key; in other words, all

    functional dependencies of non-prime attributes on candidate keys are full

    functional dependencies. For example, in an "Employees' Skills" table

    whose attributes are Employee ID, Employee Address, and Skill, the

    combination of Employee ID and Skill uniquely identifies records within

    the table. Given that Employee Address depends on only one of those

    attributes (namely, Employee ID), the table is not in 2NF.

    Note that if none of a 1NF table's candidate keys are composite (i.e.

    every candidate key consists of just one attribute), then we can say

    immediately that the table is in 2NF.
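
    The 2NF test can be sketched as a search for part-key dependencies. In the Python below, the function name `partial_dependencies` and the encoding of dependencies as (determinant, dependents) pairs of frozensets are assumptions; the check also looks only at the dependencies it is given, not at every dependency they imply.

```python
def partial_dependencies(candidate_keys, fds):
    """Return (determinant, attribute) pairs in which a non-prime attribute
    depends on a proper subset of some candidate key."""
    prime = set().union(*candidate_keys)
    return [
        (sorted(lhs), attr)
        for lhs, rhs in fds
        for attr in sorted(rhs - prime)
        if any(lhs < key for key in candidate_keys)  # '<' means proper subset
    ]


# The "Employees' Skills" table: Employee ID alone determines Employee Address.
fds = [(frozenset({"Employee ID"}), frozenset({"Employee Address"}))]

composite_key = [frozenset({"Employee ID", "Skill"})]
violations = partial_dependencies(composite_key, fds)

# With a single-attribute key there is no proper non-empty part to depend on.
simple_key = [frozenset({"Employee ID"})]
no_violations = partial_dependencies(simple_key, fds)
```

    Here `violations` reports the dependency of Employee Address on Employee ID, while `no_violations` is empty, in line with the note that a 1NF table with no composite candidate keys is automatically in 2NF.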

    Third normal form

    Main article: Third normal form

    The criteria for third normal form (3NF) are:

    The table must be in 2NF.

    Every non-prime attribute of the table must be non-transitively

    dependent on every candidate key. A violation of 3NF would mean that at

    least one non-prime attribute is only indirectly dependent (transitively

    dependent) on a candidate key. For example, consider a "Departments"

    table whose attributes are Department ID, Department Name, Manager ID,

    and Manager Hire Date; and suppose that each manager can manage one or

    more departments. {Department ID} is a candidate key. Although Manager

    Hire Date is functionally dependent on the candidate key {Department ID},

    this is only because Manager Hire Date depends on Manager ID, which in

    turn depends on Department ID. This transitive dependency means the table

    is not in 3NF.
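
    To make the consequence of the transitive dependency concrete, the sketch below stores the "Departments" example as plain Python tuples and then splits it along Manager ID → Manager Hire Date. The row values and variable names are hypothetical.

```python
# Hypothetical rows: (department_id, department_name, manager_id, manager_hire_date).
departments = [
    (1, "Sales", "M1", "2001-03-15"),
    (2, "Marketing", "M1", "2001-03-15"),  # M1's hire date is stored a second time
    (3, "Research", "M2", "1998-07-01"),
]

# Split along the dependency Manager ID -> Manager Hire Date.
departments_3nf = [(dept, name, mgr) for dept, name, mgr, _ in departments]
managers = {mgr: hired for _, _, mgr, hired in departments}

# Correcting M1's hire date is now a single change rather than one per department.
managers["M1"] = "2001-04-15"

# Joining the two tables back together reflects the correction everywhere.
rejoined = [(dept, name, mgr, managers[mgr]) for dept, name, mgr in departments_3nf]
```

    In the original single table the same correction would have to be applied to every department that M1 manages, which is exactly the kind of update anomaly that 3NF is meant to rule out.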

    Boyce-Codd normal form

    Main article: Boyce-Codd normal form

    The criteria for Boyce-Codd normal form (BCNF) are:


    The table must be in 3NF.

    Every non-trivial functional dependency must be a dependency on a

    superkey.
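
    The second BCNF criterion can be checked mechanically by computing attribute closures. The sketch below is a hedged illustration: the enrolment table and its dependencies are a hypothetical example (not from the article), the function names are assumptions, and only the listed dependencies are examined.

```python
def closure(attrs, fds):
    """All attributes functionally determined by attrs under the given dependencies."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def bcnf_violations(all_attrs, fds):
    """Listed non-trivial dependencies whose determinant is not a superkey."""
    return [
        (sorted(lhs), sorted(rhs))
        for lhs, rhs in fds
        if not rhs <= lhs and closure(lhs, fds) != set(all_attrs)
    ]


# Hypothetical table: a student takes a course with one lecturer,
# and each lecturer teaches exactly one course.
attrs = {"Student", "Course", "Lecturer"}
fds = [
    (frozenset({"Student", "Course"}), frozenset({"Lecturer"})),
    (frozenset({"Lecturer"}), frozenset({"Course"})),
]

violations = bcnf_violations(attrs, fds)
```

    In this example every attribute belongs to some candidate key ({Student, Course} or {Student, Lecturer}), so the table satisfies 3NF, yet Lecturer → Course is a dependency on something that is not a superkey, so it is reported as a BCNF violation.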

    Fourth normal form

    Main article: Fourth normal form

    The criteria for fourth normal form (4NF) are:

    The table must be in BCNF.

    There must be no non-trivial multivalued dependencies on

    something other than a superkey. A BCNF table is said to be in 4NF if and

    only if all of its multivalued dependencies are functional dependencies.

    Fifth normal form

    Main article: Fifth normal form

    The criteria for fifth normal form (5NF and also PJ/NF) are:

    The table must be in 4NF.

    There must be no non-trivial join dependencies that do not follow

    from the key constraints. A 4NF table is said to be in the 5NF if and only if

    every join dependency in it is implied by the candidate keys.

    Domain/key normal form

    Main article: Domain/key normal form

    Domain/key normal form (or DKNF) requires that a table not be subject to any

    constraints other than domain constraints and key constraints.

    Sixth normal form

    This normal form was, as of 2005, only recently proposed: the sixth normal form (6NF)

    was only defined when extending the relational model to take into account the temporal

    dimension. Unfortunately, most current SQL technologies as of 2005 do not take into

    account this work, and most temporal extensions to SQL are not relational. See work by

    Date, Darwen and Lorentzos[3] for a relational temporal extension, see [4] for further discussion

    of temporal aggregation in SQL, or see TSQL2 for a different approach.

    Example Of The Process

    The following example illustrates how a database designer might employ his knowledge of

    the normal forms to make progressive improvements to an initially unnormalized database

    design. The example is somewhat contrived: in practice, few designs lend themselves to

    being normalized in strict stages in which the HNF increases at each stage.


    The database in the example captures information about the suppliers with which various

    companies' divisions have relationships; more specifically, it captures information about

    the types of parts which each division of each company sources from its suppliers.

    Starting Point

    Information has been presented initially in a way that does not even meet 1NF. Every

    record is for a particular Company/Division combination: for each of these combinations,

    repeating groups of part- and supplier-related information occur. 1NF does not permit

    repeating groups.

    Suppliers and Parts By Company Division

    Company                 Company Founder    Company Logo  Division           Part Type         Supplier               Supplier Country  Supplier Continent
    Allied Clock and Watch  Horace Washington  Sundial       Clocks             Spring            Tensile Globodynamics  USA               N
                                                                                Pendulum          Tensile Globodynamics  USA               N
                                                                                Spring            Pieza de Acero         Mexico            N
                                                                                Toothed Wheel     Pieza de Acero         Mexico            N
    Allied Clock and Watch  Horace Washington  Sundial       Watches            Quartz Crystal    Microflux              Belgium           E
                                                                                Tuning Fork       Microflux              Belgium           E
                                                                                Battery           Dakota Electrics       USA               N
    Global Robot            Nils Neumann       Gearbox       Industrial Robots  Flywheel          Wheels 4 Less          USA               N
                                                                                Axle              Wheels 4 Less          USA               N
                                                                                Axle              TransEuropa            Italy             E
                                                                                Mechanical Arm    TransEuropa            Italy             E
    Global Robot            Nils Neumann       Gearbox       Domestic Robots    Artificial Brain  Prometheus Labs        Luxembourg        E
                                                                                Artificial Brain  Frankenstein Labs      Germany           E
                                                                                Metal Housing     Pieza de Acero         Mexico            N
                                                                                Backplate         Pieza de Acero         Mexico            N

    1NF

    We eliminate the repeating groups by ensuring that each group appears on its own record.

    The unique identifier for a record is now {Company, Division, Part Type, Supplier}.

    Suppliers and Parts By Company Division

    Company                 Company Founder    Company Logo  Division           Part Type         Supplier               Supplier Country  Supplier Continent
    Allied Clock and Watch  Horace Washington  Sundial       Clocks             Spring            Tensile Globodynamics  USA               N. Amer.
    Allied Clock and Watch  Horace Washington  Sundial       Clocks             Pendulum          Tensile Globodynamics  USA               N. Amer.
    Allied Clock and Watch  Horace Washington  Sundial       Clocks             Spring            Pieza de Acero         Mexico            N. Amer.
    Allied Clock and Watch  Horace Washington  Sundial       Clocks             Toothed Wheel     Pieza de Acero         Mexico            N. Amer.
    Allied Clock and Watch  Horace Washington  Sundial       Watches            Quartz Crystal    Microflux              Belgium           Europe
    Allied Clock and Watch  Horace Washington  Sundial       Watches            Tuning Fork       Microflux              Belgium           Europe
    Allied Clock and Watch  Horace Washington  Sundial       Watches            Battery           Dakota Electrics       USA               N. Amer.
    Global Robot            Nils Neumann       Gearbox       Industrial Robots  Flywheel          Wheels 4 Less          USA               N. Amer.
    Global Robot            Nils Neumann       Gearbox       Industrial Robots  Axle              Wheels 4 Less          USA               N. Amer.
    Global Robot            Nils Neumann       Gearbox       Industrial Robots  Axle              TransEuropa            Italy             Europe
    Global Robot            Nils Neumann       Gearbox       Industrial Robots  Mechanical Arm    TransEuropa            Italy             Europe
    Global Robot            Nils Neumann       Gearbox       Domestic Robots    Artificial Brain  Prometheus Labs        Luxembourg        Europe
    Global Robot            Nils Neumann       Gearbox       Domestic Robots    Artificial Brain  Frankenstein Labs      Germany           Europe
    Global Robot            Nils Neumann       Gearbox       Domestic Robots    Metal Housing     Pieza de Acero         Mexico            N. Amer.
    Global Robot            Nils Neumann       Gearbox       Domestic Robots    Backplate         Pieza de Acero         Mexico            N. Amer.
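
    The step from the starting point to 1NF amounts to emitting one flat row per member of each repeating group. The sketch below shows this for the two Allied Clock and Watch records; representing a record as a dict with a `parts` list, and the function name `to_1nf`, are assumptions made for illustration, and some columns are omitted for brevity.

```python
# Two unnormalized records: scalar fields plus a repeating group of
# (part type, supplier, supplier country) entries.
unnormalized = [
    {
        "company": "Allied Clock and Watch",
        "division": "Clocks",
        "parts": [
            ("Spring", "Tensile Globodynamics", "USA"),
            ("Pendulum", "Tensile Globodynamics", "USA"),
            ("Spring", "Pieza de Acero", "Mexico"),
            ("Toothed Wheel", "Pieza de Acero", "Mexico"),
        ],
    },
    {
        "company": "Allied Clock and Watch",
        "division": "Watches",
        "parts": [
            ("Quartz Crystal", "Microflux", "Belgium"),
            ("Tuning Fork", "Microflux", "Belgium"),
            ("Battery", "Dakota Electrics", "USA"),
        ],
    },
]


def to_1nf(records):
    """Emit one flat row per member of each record's repeating group."""
    return [
        (rec["company"], rec["division"], part, supplier, country)
        for rec in records
        for part, supplier, country in rec["parts"]
    ]


flat_rows = to_1nf(unnormalized)

# {Company, Division, Part Type, Supplier} should identify each flat row uniquely.
identifiers = {(c, d, p, s) for c, d, p, s, _ in flat_rows}
```

    The seven flat rows correspond to the first seven rows of the 1NF table, and the set of identifiers has the same size as the list of rows, which is what the stated unique identifier requires.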

    2NF

    One problem with the design at this stage is that Company Founder and Company Logo

    details for a given company may appear redundantly on more than one record; so may the

    Supplier Countries and Continents for a given supplier. These phenomena arise from the

    part-key dependencies of a) the Company Founder and Company Logo attributes on

    Company, and b) the Supplier Country and Supplier Continent attributes on Supplier. 2NF

    does not permit part-key dependencies. We correct the problem by splitting out the

    Company Founder and Company Logo details into their own table, called Companies, as

    well as splitting out the Supplier Country and Supplier Continent details into their own

    table, called Suppliers.

    Suppliers and Parts By Company Division

    Company                 Division           Part Type         Supplier
    Allied Clock and Watch  Clocks             Spring            Tensile Globodynamics
    Allied Clock and Watch  Clocks             Pendulum          Tensile Globodynamics
    Allied Clock and Watch  Clocks             Spring            Pieza de Acero
    Allied Clock and Watch  Clocks             Toothed Wheel     Pieza de Acero
    Allied Clock and Watch  Watches            Quartz Crystal    Microflux
    Allied Clock and Watch  Watches            Tuning Fork       Microflux
    Allied Clock and Watch  Watches            Battery           Dakota Electrics
    Global Robot            Industrial Robots  Flywheel          Wheels 4 Less
    Global Robot            Industrial Robots  Axle              Wheels 4 Less
    Global Robot            Industrial Robots  Axle              TransEuropa
    Global Robot            Industrial Robots  Mechanical Arm    TransEuropa
    Global Robot            Domestic Robots    Artificial Brain  Prometheus Labs
    Global Robot            Domestic Robots    Artificial Brain  Frankenstein Labs
    Global Robot            Domestic Robots    Metal Housing     Pieza de Acero
    Global Robot            Domestic Robots    Backplate         Pieza de Acero

    Companies

    Company                 Company Founder    Company Logo
    Allied Clock and Watch  Horace Washington  Sundial
    Global Robot            Nils Neumann       Gearbox

    Suppliers

    Supplier               Supplier Country  Supplier Continent
    Tensile Globodynamics  USA               N. Amer.
    Pieza de Acero         Mexico            N. Amer.
    Microflux              Belgium           Europe
    Dakota Electrics       USA               N. Amer.
    Wheels 4 Less          USA               N. Amer.
    TransEuropa            Italy             Europe
    Prometheus Labs        Luxembourg        Europe
    Frankenstein Labs      Germany           Europe
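
    The 2NF split can be sketched as a schema, again using Python's standard `sqlite3` module. The table and column names (`companies`, `suppliers`, `division_parts`, and so on) are assumptions, and only a few of the example rows are loaded. The final query joins the three tables to recover the wide 1NF-style rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE companies (company TEXT PRIMARY KEY, founder TEXT, logo TEXT);
CREATE TABLE suppliers (supplier TEXT PRIMARY KEY, country TEXT, continent TEXT);
CREATE TABLE division_parts (
    company   TEXT REFERENCES companies(company),
    division  TEXT,
    part_type TEXT,
    supplier  TEXT REFERENCES suppliers(supplier),
    PRIMARY KEY (company, division, part_type, supplier)
);
INSERT INTO companies VALUES ('Allied Clock and Watch', 'Horace Washington', 'Sundial');
INSERT INTO suppliers VALUES ('Tensile Globodynamics', 'USA', 'N. Amer.');
INSERT INTO suppliers VALUES ('Pieza de Acero', 'Mexico', 'N. Amer.');
INSERT INTO division_parts VALUES
    ('Allied Clock and Watch', 'Clocks', 'Spring', 'Tensile Globodynamics'),
    ('Allied Clock and Watch', 'Clocks', 'Pendulum', 'Tensile Globodynamics'),
    ('Allied Clock and Watch', 'Clocks', 'Spring', 'Pieza de Acero');
""")

# Founder and logo are stored once per company, yet every wide row can be rebuilt.
wide_rows = conn.execute("""
    SELECT dp.company, c.founder, c.logo, dp.division, dp.part_type,
           dp.supplier, s.country, s.continent
    FROM division_parts AS dp
    JOIN companies AS c ON c.company = dp.company
    JOIN suppliers AS s ON s.supplier = dp.supplier
    ORDER BY dp.part_type, dp.supplier
""").fetchall()

company_count = conn.execute("SELECT COUNT(*) FROM companies").fetchone()[0]
```

    The join returns three wide rows even though the founder and logo are stored in a single `companies` row, which is the redundancy the 2NF step removes.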

    3NF and BCNF

    There is still, however, redundancy in the design. The Supplier Continent for a given

    Supplier Country may appear redundantly on more than one record. This phenomenon

    arises from the dependency of non-key attribute Supplier Continent on non-key attribute

    Supplier Country, and means that the design does not conform to 3NF. To achieve 3NF

    (and, while we are at it, BCNF), we create a separate Countries table which tells us which

    continent a country belongs to.


    Suppliers and Parts By Company Division

    Company                 Division           Part Type         Supplier
    Allied Clock and Watch  Clocks             Spring            Tensile Globodynamics
    Allied Clock and Watch  Clocks             Pendulum          Tensile Globodynamics
    Allied Clock and Watch  Clocks             Spring            Pieza de Acero
    Allied Clock and Watch  Clocks             Toothed Wheel     Pieza de Acero
    Allied Clock and Watch  Watches            Quartz Crystal    Microflux
    Allied Clock and Watch  Watches            Tuning Fork       Microflux
    Allied Clock and Watch  Watches            Battery           Dakota Electrics
    Global Robot            Industrial Robots  Flywheel          Wheels 4 Less
    Global Robot            Industrial Robots  Axle              Wheels 4 Less
    Global Robot            Industrial Robots  Axle              TransEuropa
    Global Robot            Industrial Robots  Mechanical Arm    TransEuropa
    Global Robot            Domestic Robots    Artificial Brain  Prometheus Labs
    Global Robot            Domestic Robots    Artificial Brain  Frankenstein Labs
    Global Robot            Domestic Robots    Metal Housing     Pieza de Acero
    Global Robot            Domestic Robots    Backplate         Pieza de Acero

    Suppliers

    Supplier               Supplier Country
    Tensile Globodynamics  USA
    Pieza de Acero         Mexico
    Microflux              Belgium
    Dakota Electrics       USA
    Wheels 4 Less          USA
    TransEuropa            Italy
    Prometheus Labs        Luxembourg
    Frankenstein Labs      Germany

    Companies

    Company                 Company Founder    Company Logo
    Allied Clock and Watch  Horace Washington  Sundial
    Global Robot            Nils Neumann       Gearbox

    Countries

    Country     Continent
    USA         N. Amer.
    Mexico      N. Amer.
    Belgium     Europe
    Italy       Europe
    Luxembourg  Europe
    Germany     Europe

    4NF


    What happens if a company has more than one founder or more than one logo? (Let us

    assume for the sake of the example that both of these things may happen.) One way of

    handling the situation would be to alter the primary key of our Companies table to

    {Company, Company Founder, Company Logo}. Representing multiple founders and

    multiple logos then becomes possible, but at the price of redundancy:

    Companies

    Company                 Company Founder    Company Logo
    Allied Clock and Watch  Horace Washington  Sundial
    Global Robot            Nils Neumann       Gearbox
    International Broom     Gareth Patterson   Whirlwind
    International Broom     Sandra Patterson   Whirlwind
    International Broom     Gareth Patterson   Sweeper
    International Broom     Sandra Patterson   Sweeper

    This type of redundancy reflects the fact that the design does not conform to 4NF. We

    correct the design by separating facts about founders from facts about logos.

    Suppliers and Parts By Company Division

    Company                 Division           Part Type         Supplier
    Allied Clock and Watch  Clocks             Spring            Tensile Globodynamics
    Allied Clock and Watch  Clocks             Pendulum          Tensile Globodynamics
    Allied Clock and Watch  Clocks             Spring            Pieza de Acero
    Allied Clock and Watch  Clocks             Toothed Wheel     Pieza de Acero
    Allied Clock and Watch  Watches            Quartz Crystal    Microflux
    Allied Clock and Watch  Watches            Tuning Fork       Microflux
    Allied Clock and Watch  Watches            Battery           Dakota Electrics
    Global Robot            Industrial Robots  Flywheel          Wheels 4 Less
    Global Robot            Industrial Robots  Axle              Wheels 4 Less
    Global Robot            Industrial Robots  Axle              TransEuropa
    Global Robot            Industrial Robots  Mechanical Arm    TransEuropa
    Global Robot            Domestic Robots    Artificial Brain  Prometheus Labs
    Global Robot            Domestic Robots    Artificial Brain  Frankenstein Labs
    Global Robot            Domestic Robots    Metal Housing     Pieza de Acero
    Global Robot            Domestic Robots    Backplate         Pieza de Acero

    Companies

    Company
    Allied Clock and Watch
    Global Robot
    International Broom

    Company Logos

    Company                 Company Logo
    Allied Clock and Watch  Sundial
    Global Robot            Gearbox
    International Broom     Whirlwind
    International Broom     Sweeper

    Company Founders

    Company                 Company Founder
    Allied Clock and Watch  Horace Washington
    Global Robot            Nils Neumann
    International Broom     Gareth Patterson
    International Broom     Sandra Patterson

    Suppliers

    Supplier               Supplier Country
    Tensile Globodynamics  USA
    Pieza de Acero         Mexico
    Microflux              Belgium
    Dakota Electrics       USA
    Wheels 4 Less          USA
    TransEuropa            Italy
    Prometheus Labs        Luxembourg
    Frankenstein Labs      Germany

    Countries

    Country     Continent
    USA         N. Amer.
    Mexico      N. Amer.
    Belgium     Europe
    Italy       Europe
    Luxembourg  Europe
    Germany     Europe
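
    Returning to the single Companies table that held both founders and logos: the redundancy arises because every founder of a company has to be paired with every one of its logos. The sketch below checks that property on the International Broom rows and shows that the separate founder and logo tables carry the same information. The function name `founders_and_logos_independent` is an assumption made for illustration.

```python
from itertools import product

# The International Broom rows from the single-table design.
companies = [
    ("International Broom", "Gareth Patterson", "Whirlwind"),
    ("International Broom", "Sandra Patterson", "Whirlwind"),
    ("International Broom", "Gareth Patterson", "Sweeper"),
    ("International Broom", "Sandra Patterson", "Sweeper"),
]


def founders_and_logos_independent(rows):
    """True if, per company, every founder is paired with every logo."""
    by_company = {}
    for company, founder, logo in rows:
        by_company.setdefault(company, set()).add((founder, logo))
    for pairs in by_company.values():
        founders = {f for f, _ in pairs}
        logos = {g for _, g in pairs}
        if pairs != set(product(founders, logos)):
            return False
    return True


# The 4NF design: founders and logos recorded separately.
company_founders = sorted({(c, f) for c, f, _ in companies})
company_logos = sorted({(c, g) for c, _, g in companies})

# Joining the two tables on company rebuilds all four combined rows.
rejoined = sorted(
    (c, f, g)
    for c, f in company_founders
    for c2, g in company_logos
    if c == c2
)
```

    Dropping any one of the four combined rows makes `founders_and_logos_independent` return false, so the single-table design is obliged to store every combination, whereas the separated tables record each founder and each logo once.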

    5NF

    We know that the Clocks division of Allied Clock and Watch relies upon its suppliers to

    provide springs, pendulums, and toothed wheels. We also know that the Clocks division

    deals with suppliers Tensile Globodynamics and Pieza de Acero. Let us suppose for the

    sake of the example that the following rule applies: if a supplier that a division deals with

    offers a part that the division needs, the division will always purchase it. If, for example,

    Tensile Globodynamics start producing Toothed Wheels, then Allied Clock and Watch will

    start purchasing them. This rule leads to redundancy in our design as it stands, causing it to

    fall short of 5NF. We correct the design by recording part-types-by-company-division

    separately from suppliers-by-company-division, and adding a further table that provides

    information as to which suppliers offer which parts.

    Part Types By Company Division

    Company Division Part Type

    Allied Clock and Watch Clocks Spring

  • 7/31/2019 Database Normalization Mihir

    20/23

    Allied Clock and Watch Clocks Pendulum

    Allied Clock and Watch Clocks Toothed Wheel

    Allied Clock and Watch Watches Quartz Crystal

    Allied Clock and Watch Watches Tuning Fork

    Allied Clock and Watch Watches Battery

    Global Robot Industrial Robots Flywheel

    Global Robot Industrial Robots Axle

    Global Robot Industrial Robots Mechanical Arm

    Global Robot Domestic Robots Artificial Brain

    Global Robot Domestic Robots Metal Housing

    Global Robot Domestic Robots Backplate

    Suppliers By Company Division

    Company Division Supplier

    Allied Clock and Watch Clocks Tensile Globodynamics

    Allied Clock and Watch Clocks Pieza de Acero

    Allied Clock and Watch Watches Microflux

    Allied Clock and Watch Watches Dakota Electrics

    Global Robot Industrial Robots Wheels 4 Less

    Global Robot Industrial Robots TransEuropa

    Global Robot Domestic Robots Prometheus Labs

    Global Robot Domestic Robots Frankenstein Labs

    Global Robot Domestic Robots Pieza de Acero

    Parts By Supplier

    Part Type         Supplier
    Spring            Tensile Globodynamics
    Pendulum          Tensile Globodynamics
    Spring            Pieza de Acero
    Toothed Wheel     Pieza de Acero
    Quartz Crystal    Microflux
    Tuning Fork       Microflux
    Battery           Dakota Electrics
    Flywheel          Wheels 4 Less
    Axle              Wheels 4 Less
    Axle              TransEuropa
    Mechanical Arm    TransEuropa
    Artificial Brain  Prometheus Labs
    Artificial Brain  Frankenstein Labs
    Metal Housing     Pieza de Acero
    Backplate         Pieza de Acero
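
    The purchasing rule described above can be sketched as a join of the three tables. The following is a minimal illustration, not part of the original design: it uses only the Clocks division rows, and the function name purchases and the variable names are assumptions introduced here. It shows that once Tensile Globodynamics is recorded as offering Toothed Wheels, the corresponding purchase follows from the join without any further update.

```python
# Illustrative subset of the three tables above (Clocks division only).
parts_by_division = {
    ("Allied Clock and Watch", "Clocks", "Spring"),
    ("Allied Clock and Watch", "Clocks", "Pendulum"),
    ("Allied Clock and Watch", "Clocks", "Toothed Wheel"),
}
suppliers_by_division = {
    ("Allied Clock and Watch", "Clocks", "Tensile Globodynamics"),
    ("Allied Clock and Watch", "Clocks", "Pieza de Acero"),
}
parts_by_supplier = {
    ("Spring", "Tensile Globodynamics"),
    ("Pendulum", "Tensile Globodynamics"),
    ("Spring", "Pieza de Acero"),
    ("Toothed Wheel", "Pieza de Acero"),
}


def purchases(part_rows, supplier_rows, offer_rows):
    """Join the three tables (hypothetical helper).

    A division buys a part from a supplier when it needs the part,
    deals with the supplier, and the supplier offers the part.
    """
    return {
        (company, division, part, supplier)
        for (company, division, part) in part_rows
        for (s_company, s_division, supplier) in supplier_rows
        if (company, division) == (s_company, s_division)
        and (part, supplier) in offer_rows
    }


before = purchases(parts_by_division, suppliers_by_division, parts_by_supplier)

# Tensile Globodynamics starts producing Toothed Wheels:
# a single new row in a single table records the change.
offers_after = parts_by_supplier | {("Toothed Wheel", "Tensile Globodynamics")}
after = purchases(parts_by_division, suppliers_by_division, offers_after)

new_rows = after - before
print(sorted(new_rows))
```

    In a single combined table the same change would have to be entered as an extra purchase row for every division that needs the part and deals with the supplier, which is the redundancy the decomposition removes.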

    Companies

    Company                 Company Logo
    Allied Clock and Watch  Sundial
    Global Robot            Gearbox

    Company Founders

    Company                 Company Founder
    Allied Clock and Watch  Horace Washington
    Global Robot            Nils Neumann
    International Broom     Gareth Patterson
    International Broom     Sandra Patterson

    Suppliers

    Supplier               Supplier Country
    Tensile Globodynamics  USA
    Pieza de Acero         Mexico
    Microflux              Belgium
    Dakota Electrics       USA
    Wheels 4 Less          USA
    TransEuropa            Italy
    Prometheus Labs        Luxembourg
    Frankenstein Labs      Germany

    Countries

    Country     Continent
    USA         N. Amer.
    Mexico      N. Amer.
    Belgium     Europe
    Italy       Europe
    Luxembourg  Europe

    Denormalization

    Main article: Denormalization

    Databases intended for Online Transaction Processing (OLTP) are typically more

    normalized than databases intended for Online Analytical Processing (OLAP). OLTP applications are characterized by a high volume of small transactions such as updating a

    sales record at a supermarket checkout counter. The expectation is that each transaction

    will leave the database in a consistent state. By contrast, databases intended for OLAP operations are primarily "read mostly" databases. OLAP applications tend to extract

    historical data that has accumulated over a long period of time. For such databases, redundant or "denormalized" data may facilitate Business Intelligence applications. Specifically, dimension tables in a star schema often contain denormalized data. The

    denormalized or redundant data must be carefully controlled during ETL processing, and

    users should not be permitted to see the data until it is in a consistent state. The normalized

    alternative to the star schema is the snowflake schema. It has never been proven whether this denormalization itself provides any increase in performance, or whether the concurrent removal

    of data constraints is what increases the performance. The need for denormalization has

    waned as computers and RDBMS software have become more powerful.

    Denormalization is also used to improve performance on smaller computers, such as computerized cash registers and mobile devices, since these may use the data for look-up

    only (e.g. price lookups). Denormalization may also be used when no RDBMS exists for a platform (such as Palm), or no changes are to be made to the data and a swift response is

    crucial.
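
    As a rough sketch of a denormalized, read-only lookup table, the example below uses Python's built-in sqlite3 module with a few rows from the Suppliers and Countries tables shown earlier. The table name supplier_lookup and the column names are assumptions made for illustration; in practice such a table would be rebuilt by whatever load process maintains the data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Normalized source tables: each continent is stored once per country.
    CREATE TABLE countries (
        country   TEXT PRIMARY KEY,
        continent TEXT NOT NULL
    );
    CREATE TABLE suppliers (
        supplier TEXT PRIMARY KEY,
        country  TEXT NOT NULL REFERENCES countries(country)
    );
    INSERT INTO countries VALUES
        ('USA', 'N. Amer.'), ('Mexico', 'N. Amer.'), ('Belgium', 'Europe');
    INSERT INTO suppliers VALUES
        ('Tensile Globodynamics', 'USA'), ('Pieza de Acero', 'Mexico'),
        ('Microflux', 'Belgium'), ('Dakota Electrics', 'USA');

    -- Denormalized copy (hypothetical name): the join is precomputed,
    -- so the continent is repeated on every supplier row.
    CREATE TABLE supplier_lookup AS
        SELECT s.supplier, s.country, c.continent
        FROM suppliers AS s
        JOIN countries AS c ON c.country = s.country;
""")

# A lookup now reads a single table and needs no join.
continent = conn.execute(
    "SELECT continent FROM supplier_lookup WHERE supplier = ?",
    ("Microflux",),
).fetchone()[0]

# The price of that convenience: the USA's continent is stored twice.
usa_copies = conn.execute(
    "SELECT COUNT(*) FROM supplier_lookup WHERE country = 'USA'"
).fetchone()[0]

print(continent, usa_copies)
```

    Because the continent is duplicated, a correction to the countries table is not reflected in supplier_lookup until the copy is rebuilt, which is why such tables suit data that is only read.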

    Non-first normal form (NF²)

    In recognition that denormalization can be deliberate and (dubiously) useful, the non-first

    normal form is a definition of database designs which do not conform to the first normal

    form, by allowing "sets and sets of sets to be attribute domains" (Schek 1982). This

    extension is a (non-optimal) way of implementing hierarchies in relations. Some theoreticians have dubbed this practitioner-developed method "First Ab-normal Form";

    Codd defined a relational database as using relations, so any table not in 1NF could not be

    considered to be relational.

    Consider the following table:

    Non-First Normal Form

    Person  Favorite Colors
    Bob     blue, red
    Jane    green, yellow, red

    Assume a person has several favorite colors. The favorite colors then form a set of colors, as modeled by the given table.

    To transform this NF² table into 1NF, an "unnest" operator is required, which extends the

    relational algebra of the higher normal forms. The reverse operator is called "nest", which is

    not always the mathematical inverse of "unnest", although "unnest" is the mathematical inverse of "nest". Another constraint required is for the operators to be bijective, which is

    covered by the Partitioned Normal Form (PNF).
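
    The two operators can be sketched on the table above as follows. This is a simplified illustration for one set-valued attribute only; the function names nest and unnest follow the text, but their signatures and the use of Python sets are assumptions. The last case shows why "nest" does not always undo "unnest": when Person is not a key of the nested table, the original grouping cannot be recovered.

```python
def unnest(nested):
    """Flatten (person, set_of_colors) tuples into 1NF (person, color) rows."""
    return {(person, color) for person, colors in nested for color in colors}


def nest(flat):
    """Group 1NF (person, color) rows into (person, set_of_colors) tuples."""
    grouped = {}
    for person, color in flat:
        grouped.setdefault(person, set()).add(color)
    return {(person, frozenset(colors)) for person, colors in grouped.items()}


# The table from the text: one tuple per person.
favorites = {
    ("Bob", frozenset({"blue", "red"})),
    ("Jane", frozenset({"green", "yellow", "red"})),
}
flat = unnest(favorites)

# unnest undoes nest: starting from any 1NF table, nothing is lost.
assert unnest(nest(flat)) == flat

# Here nest also undoes unnest, because Person is a key of the nested table.
assert nest(unnest(favorites)) == favorites

# Without that key, nest does not restore the original: Bob's colors are
# split over two tuples, and nesting merges them into one.
split = {("Bob", frozenset({"blue"})), ("Bob", frozenset({"red"}))}
merged = nest(unnest(split))
print(merged != split)
```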

    Further reading

    Litt's Tips: Normalization

    Date, C. J., Lorentzos, N., & Darwen, H. (2002). Temporal Data & the Relational Model (1st ed.). Morgan Kaufmann. ISBN 1-55860-855-9.

    Zimányi, E. (2006). Temporal Aggregates and Temporal Universal Quantification in Standard SQL. ACM SIGMOD Record, Vol. 35, Number 2, June 2006.

    Date, C. J. (1999). An Introduction to Database Systems (8th ed.). Addison-Wesley Longman. ISBN 0-321-19784-4.

    Kent, W. (1983). A Simple Guide to Five Normal Forms in Relational Database Theory. Communications of the ACM, vol. 26, pp. 120-125.

    Date, C. J., Darwen, H., & Pascal, F. Database Debunkings.

    Schek, H.-J., & Pistor, P. Data Structures for an Integrated Data Base Management and Information Retrieval System.

    References

    1. His term eliminate is misleading, as nothing is "lost" in normalization. He probably described eliminate in a mathematical sense to mean elimination of complexity.

    2. Codd, Edgar F. (June 1970). "A Relational Model of Data for Large Shared Data Banks". Communications of the ACM 13 (6): 377-387.

    3. DBDebunk

    4. Zimányi
