Today’s Agenda
• Review First Week
• Data Modeling, ER Diagrams
• Normalization techniques
– 2nd Normal Form
– 3rd Normal Form
• Physical Data Model
First Normal Form
• First Normal Form (1NF) occurs when all
attributes are single valued.
– No repeating attributes or attributes with multiple values
• Examples of 1NF violations:
– A Movie entity with attributes actor1, actor2,
actor3.
– A Sundae entity with a “toppings” attribute
that holds several values
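One way to picture the Movie fix is the small Python sketch below. The movie title, the actor names, and the helper to_1nf are made up for illustration; the idea is only that each repeating actorN attribute becomes its own single-valued row.

```python
# Hypothetical Movie record that violates 1NF: the actor attribute repeats.
movie_not_1nf = {
    "title": "Example Movie",
    "actor1": "Ann",
    "actor2": "Bob",
    "actor3": None,  # unused slot, one of the problems with repeating attributes
}


def to_1nf(movie):
    """Turn the repeating actorN attributes into one (title, actor) row each."""
    rows = []
    for key, value in movie.items():
        if key.startswith("actor") and value is not None:
            rows.append({"title": movie["title"], "actor": value})
    return rows


movie_actor_rows = to_1nf(movie_not_1nf)
print(movie_actor_rows)
```

A fourth actor now needs one more row, not a new actor4 attribute.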
In Class Exercise
• Create a Data Model in 1st Normal Form
for the following applications:
1. Recipes
2. Dating Service
3. Bookstore
4. Photo Sharing
5. Movie Collection
ER Diagram Terminology
• NULL: The database term for a value that
does not exist.
– What attributes in our data model could be
NULL?
• Unique Identifiers (IDs)
– Every entity needs a Unique Identifier
– It must be unique across all instances of an
entity
– It must not be NULL
– Its value must never change.
Unique Identifiers
• How do we pick IDs?
– From attributes
• What are the IDs in our data model?
– Auto-generated IDs
• Very common
• Security Issues
Relationships
• An association between two entities
• Indicates the degree of the relationship
– one and only one (also zero or one)
– one or many (also zero or many)
• Examples:
– A Donor gives one or many Donations
– A Donation is given by one and only one
Donor
• What are the relationships in our model?
Relationships
• The degrees on each side combine into three
types of relationships
• One to one
– rare
• One to many
– very common
• Many to many
– will need special handling
ER Diagrams
• Entities are rectangles
• Attributes are ellipses
• Unique IDs are underlined
• Relationships are lines between entities
– Straight line for one and only one
– “Crow’s foot” for one or many
– Can use bars and circles to represent one or
zero
Junction Entities
• Many to many relationships can be hard to
represent in an RDBMS
• They are replaced with junction entities
– Take the many-to-many relationship
– Replace it with an entity
– Create two new one-to-many relationships to
the new entity
• Which side should the “many” be?
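A minimal sketch of a junction entity, using Python's built-in sqlite3 module as a stand-in for a full RDBMS. The table names follow the donation example; the DonationID and DivisionID values in the junction rows are assumptions made up for illustration. Note that the junction table sits on the "many" side of both new relationships, so it holds both IDs.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Donation (DonationID INTEGER PRIMARY KEY);
CREATE TABLE Division (DivisionID INTEGER PRIMARY KEY, Name TEXT);
-- Junction entity: each row links one Donation to one Division.
CREATE TABLE DonationToDivision (
    DonationToDivisionID INTEGER PRIMARY KEY,
    Percentage INTEGER,
    DonationID INTEGER REFERENCES Donation(DonationID),
    DivisionID INTEGER REFERENCES Division(DivisionID)
);
INSERT INTO Donation VALUES (1), (2);
INSERT INTO Division VALUES (1, 'Marketing'), (2, 'Child-care');
-- Assumed split: donation 1 goes fully to Marketing, donation 2 is shared.
INSERT INTO DonationToDivision VALUES (1, 100, 1, 1), (2, 50, 2, 1), (3, 50, 2, 2);
""")


def divisions_for(donation_id):
    """Return (division name, percentage) pairs for one donation."""
    cursor = conn.execute(
        "SELECT d.Name, j.Percentage "
        "FROM DonationToDivision AS j "
        "JOIN Division AS d ON d.DivisionID = j.DivisionID "
        "WHERE j.DonationID = ? ORDER BY d.DivisionID",
        (donation_id,),
    )
    return cursor.fetchall()


print(divisions_for(2))
```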
New Example Data
Donor
ID  Name         Address        Phone     Email
1   Fred Smith   123 Bedrock    555-1212  f@fred.com
2   Beth Kirsh   104 Ballard    555-1234  b@kirsh.com
3   Erin Lovett  1580 Stone Ln  555-5098  e@erin.com
Donation
ID  Amount  Date      P Name
1   100.00  01/02/04  Martha
2   250.00  12/11/04  Jim
3   10.00   09/07/04  Jim
4   100.00  02/02/04  Jim
Division
ID  Name
1   Marketing
2   Child-care
3   Trips
DonationToDivision
ID  Percentage
1   100%
2   50%
3   50%
ER Diagram Terminology
• Non-identifying attribute: An attribute that
is not the Unique ID and is dependent on
the Unique ID.
• Repeating entries are often a sign of a
non-dependent attribute
• Examples:
– Is Donor Name a non-identifying attribute?
– Is Processor Name a non-identifying attribute?
2nd Normal Form (2NF)
• Model has to be in 1NF
• Every attribute other than the Unique ID must be a
non-identifying attribute, that is, dependent on the Unique ID.
• To make 2NF, we have two options
– Create a new entity for the attribute
– Move the attribute to the entity where it really
belongs
2nd Normal Form
• Don’t simply look for repeating entries to determine 2NF
• Example: Is percentage already in 2NF? Many entries have 100% for their value
– Yes
– The value is dependent on the DonationToDivisionID
– Percentage also doesn’t make sense as an Entity: it has no attributes other than itself, and 75% isn’t a “thing”.
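The "create a new entity for the attribute" option can be sketched in Python with the sample Donation rows. The helper extract_processors is hypothetical, and it assumes that a processor's name is enough to tell processors apart, which a real model would need to confirm.

```python
# Donation rows from the example data, with the processor name stored inline.
donations = [
    {"id": 1, "amount": "100.00", "processor": "Martha"},
    {"id": 2, "amount": "250.00", "processor": "Jim"},
    {"id": 3, "amount": "10.00", "processor": "Jim"},
    {"id": 4, "amount": "100.00", "processor": "Jim"},
]


def extract_processors(rows):
    """Move the processor name into its own entity and keep only its ID."""
    processor_ids = {}  # name -> newly assigned Processor ID
    new_rows = []
    for row in rows:
        name = row["processor"]
        if name not in processor_ids:
            processor_ids[name] = len(processor_ids) + 1
        new_rows.append(
            {"id": row["id"], "amount": row["amount"],
             "processor_id": processor_ids[name]}
        )
    processors = [{"id": pid, "name": name} for name, pid in processor_ids.items()]
    return processors, new_rows


processors, donations_2nf = extract_processors(donations)
print(processors)
print(donations_2nf)
```

Each processor name is now stored once, and every Donation points to it by ID.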
3rd Normal Form
• Must already be in 2nd normal form
• Non-identifying attributes cannot be
dependent on each other.
• Examples:
– Employee(eid, name, position, salary)
– Address(street, city, state, state abbr.)
• Move the dependent attributes into a new
Entity
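A rough way to look for dependent attributes in sample data is sketched below in Python. The helper depends_on and the address rows are invented for illustration. A True result is only a hint: sample data can suggest a dependency, but the real test is what the attributes mean.

```python
def depends_on(rows, determinant, dependent):
    """True if each value of `determinant` maps to one value of `dependent`."""
    seen = {}
    for row in rows:
        key, value = row[determinant], row[dependent]
        # setdefault stores the first value seen for this key and returns it
        if seen.setdefault(key, value) != value:
            return False
    return True


# Hypothetical Address rows shaped like Address(street, city, state, state abbr.)
addresses = [
    {"street": "123 Bedrock", "city": "Seattle", "state": "Washington", "state_abbr": "WA"},
    {"street": "104 Ballard", "city": "Spokane", "state": "Washington", "state_abbr": "WA"},
    {"street": "1580 Stone Ln", "city": "Portland", "state": "Oregon", "state_abbr": "OR"},
]

print(depends_on(addresses, "state", "state_abbr"))  # abbreviation follows state
print(depends_on(addresses, "state", "city"))        # city does not follow state
```

Here state abbr. depends on state, so it is a candidate to move into a new State entity.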
In Class Exercise
• Update your Data Models to 3rd Normal
Form for the following applications:
1. Recipes
2. Dating Service
3. Bookstore
4. Photo Sharing
5. Movie Collection
Physical Database Design
• ER Diagram completed – review design
carefully
• Time to convert our conceptual ER
diagram into a real database system.
Physical Database Design
• Step 1
– Convert all entities into tables
• A database is typically made up of many tables
• A table is made up of columns and rows
• Each row in a table represents one instance of
the entity
Physical Database Design
• Step 2
– Attributes become columns in the tables
• Important to pick the appropriate data type for the
columns
• More on data types later
Physical Database Design
• Step 3
– Unique IDs become primary keys
• Remember, they cannot be NULL, and no
duplicates are allowed
• Primary key is just the database name
for an entity’s unique ID
– Primary keys are automatically indexed by
the database (more on this later)
Physical Database Design
• Step 4
– Relationships become foreign keys in one table of the
relationship.
• A foreign key is a unique ID of another table.
– This creates a reference to a unique row in another
table
• This simply means we have a column in one
table that contains the unique ID of the other
table.
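A small sketch of a foreign key, using Python's built-in sqlite3 module in place of MySQL. Which donor gave which donation is an assumption made up for the example. SQLite only enforces foreign keys after the PRAGMA shown below; other databases differ.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite checks foreign keys only when enabled
conn.executescript("""
CREATE TABLE Donor (DonorID INTEGER PRIMARY KEY, Name TEXT NOT NULL);
CREATE TABLE Donation (
    DonationID INTEGER PRIMARY KEY,
    Amount TEXT,
    -- Foreign key: holds the unique ID of a row in Donor
    DonorID INTEGER NOT NULL REFERENCES Donor(DonorID)
);
INSERT INTO Donor VALUES (1, 'Fred Smith');
INSERT INTO Donation VALUES (1, '100.00', 1), (2, '250.00', 1);
""")


def donation_count(donor_id):
    """Count the Donation rows that reference the given Donor."""
    query = "SELECT COUNT(*) FROM Donation WHERE DonorID = ?"
    return conn.execute(query, (donor_id,)).fetchone()[0]


def try_orphan_donation():
    """Try to reference a Donor that does not exist; report whether it worked."""
    try:
        conn.execute("INSERT INTO Donation VALUES (3, '10.00', 99)")
    except sqlite3.IntegrityError:
        return False
    return True


print(donation_count(1))
print(try_orphan_donation())
```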
Physical Database Design
• Step 4 Continued
• Which table does the foreign key belong in
for a one-to-many relationship?
– Store the unique ID from the "one" side of the
relationship in the table representing the
"many" side of the relationship
Physical Database Design
Donor: DonorID, Name, Email address, Phone number
Donation: DonationID, Date, Amount, DonorID, ProcessorID
Processor: ProcessorID, name
DonationToDivision: DonationToDivisionID, Percentage, DonationID, DivisionID
Division: DivisionID, name
Primary Keys & Unique IDs
http://www.extension.washington.edu/dl/webulearningobjects/media_fit/ids.html
Data Types
• Each database has its own data types
– Most share a common core of data types,
including integers, character strings, and
dates.
• MySQL has 36 different data types
Data Types
• Numeric Types
– store numeric data such as integers and
floating point numbers
– Modifiers:
• UNSIGNED: non-negative values only, e.g. a TINYINT holds 0 to 255 instead of -128 to 127
• AUTO_INCREMENT: integers only, one per table
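The UNSIGNED ranges follow from the number of bits in the type. A short Python sketch, where integer_range is a hypothetical helper and the bit widths (8 for TINYINT, 32 for INT) are the MySQL storage sizes:

```python
def integer_range(bits, unsigned=False):
    """Return (min, max) for an integer column stored in the given number of bits."""
    if unsigned:
        return 0, 2 ** bits - 1
    return -(2 ** (bits - 1)), 2 ** (bits - 1) - 1


print(integer_range(8))                  # TINYINT
print(integer_range(8, unsigned=True))   # TINYINT UNSIGNED
print(integer_range(32, unsigned=True))  # INT UNSIGNED
```

UNSIGNED does not add storage; it shifts the same number of values into the non-negative range.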
Data Types
• Numeric Types Cont.
Numeric Data Types
INT (also INTEGER): a simple whole number, like 1 or 4,000 or –2
TINYINT: a whole number with a range of only –128 to 127. For example, use this when you want to store simple true/false Booleans
FLOAT: a floating-point number with single precision
DOUBLE (also REAL): a floating-point number with double precision
DECIMAL (also NUMERIC): a fixed-point number stored with exact precision. This type should be used for all monetary values
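Why monetary values need an exact type can be seen outside the database too. The sketch below uses Python's float and decimal.Decimal as stand-ins for FLOAT/DOUBLE and DECIMAL; the amounts are arbitrary.

```python
from decimal import Decimal

# Binary floating point cannot represent 0.10 or 0.20 exactly.
float_total = 0.10 + 0.20
float_matches = float_total == 0.30

# Decimal keeps the digits exactly as written.
decimal_total = Decimal("0.10") + Decimal("0.20")
decimal_matches = decimal_total == Decimal("0.30")

print(float_total, float_matches)
print(decimal_total, decimal_matches)
```

The float sum is slightly off from 0.30, while the decimal sum matches it exactly.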
Data Types
• String Types
– store textual data.
– Modifiers:
• BINARY: allows case-sensitive searching
Data Types
• String Types Cont.
String Data Types
CHAR (also CHARACTER): a text field of a fixed length. When a column is defined as CHAR, the length of the text string is fixed and all values stored will use that much storage space. If a string shorter than the fixed length is stored, the right side of the string is padded with white space.
VARCHAR (also CHARACTER VARYING): a text field of varying length. The storage space is one byte larger than the size of the text (two bytes larger for values over 255 bytes). In MySQL versions before 5.0.3, trailing spaces are removed and the maximum size is 255 characters; later versions keep trailing spaces and allow up to 65,535 bytes. This is a common type used for short character strings like names, phone numbers, street addresses, and so on.
TEXT: text up to 65,535 bytes (about 64 kilobytes) in length
MEDIUMTEXT: text up to 16 megabytes in length
LONGTEXT: text up to four gigabytes in length
Data Types
• Date types
– store dates and times
Date Data Types
DATE: stores a date in the format YYYY-MM-DD
DATETIME: stores a date and time in the format YYYY-MM-DD HH:MM:SS
TIME: stores a time from "00:00:00" to "23:59:59"
TIMESTAMP: stores the current date and time. This type of column is updated automatically whenever there are modifications to a record. This type of field is great for recording when a row is modified
YEAR: stores a four digit year
Data Types
• Complex data types
– Enumerations (ENUM)
• list of predefined strings, value must be one of
them.
– Sets (SET)
• list of predefined strings, value can be any
combination of them.
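The difference between the two can be sketched in plain Python. The value lists below (payment methods, contact preferences) are hypothetical and not part of the donation model; the functions only mimic the rule each type enforces.

```python
# Hypothetical ENUM list: the value must be exactly one of these strings.
PAYMENT_METHODS = ("cash", "check", "credit")

# Hypothetical SET list: the value can be any combination of these strings.
CONTACT_PREFERENCES = {"email", "phone", "mail"}


def valid_enum(value):
    """ENUM rule: a single value taken from the predefined list."""
    return value in PAYMENT_METHODS


def valid_set(values):
    """SET rule: zero or more values, all taken from the predefined list."""
    return set(values) <= CONTACT_PREFERENCES


print(valid_enum("check"), valid_enum("barter"))
print(valid_set(["email", "phone"]), valid_set(["email", "fax"]))
```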
Current Physical Database Design
Donor: DonorID, Name, Email address, Phone number
Donation: DonationID, Date, Amount, DonorID, ProcessorID
Processor: ProcessorID, name
DonationToDivision: DonationToDivisionID, Percentage, DonationID, DivisionID
Division: DivisionID, name
New Physical Database Design
• Donor table: includes types!
Column Name   Type
DonorID       INT UNSIGNED
PhoneNumber   VARCHAR(14)
Name          VARCHAR(255)
Address       VARCHAR(255)
Email         VARCHAR(255)
Physical Database Design
• Choosing Column Options
– Column options help enforce data integrity
– Can make the programmer’s job easier
– Which makes the DBA’s job easier
• Column Options
– NULLs allowed, default values,
auto-incrementing values, and keys
Column Options
• NOT NULL
– By default, columns can contain a NULL
instead of a value; this overrides that behavior
– Requires that some value always exists in
that column for any given row of data.
– Will cause a database error if the programmer
tries to add a NULL to that column.
– What should be NOT NULL in our donation
ER diagram?
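A quick sketch of NOT NULL in action, using Python's built-in sqlite3 module instead of MySQL. Making Name required and Phone optional is an assumption for the example; the error text comes from SQLite and will differ in other databases.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Donor ("
    " DonorID INTEGER PRIMARY KEY,"
    " Name TEXT NOT NULL,"  # a value is required in every row
    " Phone TEXT"           # NULL is allowed by default
    ")"
)


def insert_donor(name, phone):
    """Insert a donor and report whether the database accepted the row."""
    try:
        conn.execute("INSERT INTO Donor (Name, Phone) VALUES (?, ?)", (name, phone))
        return "ok"
    except sqlite3.IntegrityError as exc:
        return f"rejected: {exc}"


print(insert_donor("Fred Smith", None))  # missing phone is fine
print(insert_donor(None, "555-1212"))    # missing name is an error
```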
Column Options
• DEFAULT value
– If a user doesn’t supply a value for a column,
you can specify a default value
– Example: For a local organization, State and
Country might default to WA, USA
– Should there be any DEFAULT columns in
our donation database?
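The WA, USA example can be tried with Python's built-in sqlite3 module as a stand-in for MySQL. The Address table and its rows are hypothetical, made up to show the defaults being filled in.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Address ("
    " AddressID INTEGER PRIMARY KEY,"
    " Street TEXT,"
    " State TEXT DEFAULT 'WA',"
    " Country TEXT DEFAULT 'USA'"
    ")"
)

# No State or Country supplied: both defaults apply.
conn.execute("INSERT INTO Address (Street) VALUES ('104 Ballard')")
# State supplied: only Country falls back to its default.
conn.execute("INSERT INTO Address (Street, State) VALUES ('1 Main St', 'OR')")

rows = conn.execute(
    "SELECT Street, State, Country FROM Address ORDER BY AddressID"
).fetchall()
print(rows)
```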
Column Options
• AUTO_INCREMENT
– Provides a default value to an INTEGER
column
– The value will automatically be incremented
for each insert
– Only one column per table can have this
option.
– Great option for an internal (meaning not
shown to an external user) primary key
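A sketch of auto-generated IDs using Python's built-in sqlite3 module. SQLite spells the option AUTOINCREMENT and attaches it to INTEGER PRIMARY KEY, whereas MySQL uses AUTO_INCREMENT, so treat the syntax here as a stand-in. The helper add_donor is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Donor ("
    " DonorID INTEGER PRIMARY KEY AUTOINCREMENT,"
    " Name TEXT NOT NULL"
    ")"
)


def add_donor(name):
    """Insert a donor without supplying an ID; return the ID the database chose."""
    cursor = conn.execute("INSERT INTO Donor (Name) VALUES (?)", (name,))
    return cursor.lastrowid


new_ids = [add_donor(name) for name in ("Fred Smith", "Beth Kirsh", "Erin Lovett")]
print(new_ids)
```

The program never picks the ID itself, which is why this suits internal primary keys.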
Column Options
• PRIMARY KEY
– Creates an index on the column
– Forces each column entry to be unique from
all other column entries
– Automatically is NOT NULL
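• UNIQUE
– Like PRIMARY KEY, creates an index and forces
entries to be unique
– Unlike PRIMARY KEY, a table can have several
UNIQUE columns, and they can contain NULL
unless also declared NOT NULL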
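Both options can be tried out with Python's built-in sqlite3 module as a stand-in for MySQL. Making Email a UNIQUE column is an assumption for the example. The last two inserts show one way UNIQUE differs from PRIMARY KEY: a UNIQUE column that is not also NOT NULL can hold NULL in more than one row.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Donor ("
    " DonorID INTEGER PRIMARY KEY,"
    " Name TEXT NOT NULL,"
    " Email TEXT UNIQUE"
    ")"
)


def try_insert(donor_id, name, email):
    """Return True if the row was accepted, False if a key constraint rejected it."""
    try:
        conn.execute("INSERT INTO Donor VALUES (?, ?, ?)", (donor_id, name, email))
        return True
    except sqlite3.IntegrityError:
        return False


results = [
    try_insert(1, "Fred Smith", "f@fred.com"),   # first row
    try_insert(1, "Beth Kirsh", "b@kirsh.com"),  # duplicate primary key
    try_insert(2, "Beth Kirsh", "f@fred.com"),   # duplicate UNIQUE value
    try_insert(2, "Beth Kirsh", None),           # NULL in a UNIQUE column
    try_insert(3, "Erin Lovett", None),          # a second NULL
]
print(results)
```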
Physical Database Design
• Now includes column options!
Column Name   Type           Options
DonorID       INT UNSIGNED   AUTO_INCREMENT PRIMARY KEY
PhoneNumber   VARCHAR(14)
Name          VARCHAR(255)   NOT NULL
Address       VARCHAR(255)
Email         VARCHAR(255)
Relational Database Schema
• We now have our database schema
• Whereas our E/R Diagram was very abstract, we
now have a very concrete, relational design
[Diagram labels: Requirements / Ideas, E/R Diagram, Database Schema, RDBMS]