Database
COMP 205 Advanced Web Programming
Definition
Database: a collection of structured information for one or more specific purposes; often represented by a cylinder
Relational Database: information is stored in a set of related tables
Table: organizational structure used to represent an entity
• Columns are attributes of the entity, also called fields (each has a name & data type)
• Rows are instances of the entity, also called records
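Here is a minimal sketch of a table whose columns each have a name and a data type, and whose rows are records. It assumes Python's built-in `sqlite3` module and an in-memory database. The Song table and its column types are illustrative and do not come from the slides.

```python
import sqlite3

# In-memory database for illustration; nothing is written to disk.
conn = sqlite3.connect(":memory:")

# Each column (field) has a name and a data type (types are assumptions).
conn.execute("""
    CREATE TABLE Song (
        Name   TEXT,
        Artist TEXT,
        Album  TEXT,
        Year   INTEGER
    )
""")

# Each row (record) is one instance of the entity.
conn.executemany(
    "INSERT INTO Song (Name, Artist, Album, Year) VALUES (?, ?, ?, ?)",
    [
        ("Teenage Dream", "Katy Perry", "Teenage Dream", 2010),
        ("Stronger", "Kanye West", "Graduation", 2007),
    ],
)

# PRAGMA table_info returns one row per column: (cid, name, type, notnull, default, pk).
columns = [(row[1], row[2]) for row in conn.execute("PRAGMA table_info(Song)")]
rows = conn.execute("SELECT * FROM Song").fetchall()
print(columns)
print(rows)
```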
Music Database: 1st Attempt
We can store our data in a delimited text file.
• Represents a one-table solution
• Problems?
Name            Artist       Album                       Year
Teenage Dream   Katy Perry   Teenage Dream               2010
Viva la Vida    Coldplay     Death and All His Friends   2009
Stronger        Kanye West   Graduation                  2007
(columns are attributes; rows are instances)
Teenage Dream, Katy Perry, Teenage Dream, 2010
Viva la Vida, Coldplay, Death and All His Friends, 2009
Stronger, Kanye West, Graduation, 2007
…
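The following minimal sketch makes the one-table approach concrete. It parses the delimited file above with Python's `csv` module; the `songs_by` helper is hypothetical. Every lookup has to scan every line and compare free text, and the year comes back as a string rather than a number.

```python
import csv
import io

# The delimited text file from the slide, held in a string for this sketch.
FLAT_FILE = """Teenage Dream, Katy Perry, Teenage Dream, 2010
Viva la Vida, Coldplay, Death and All His Friends, 2009
Stronger, Kanye West, Graduation, 2007
"""

# skipinitialspace drops the blank that follows each comma.
records = list(csv.reader(io.StringIO(FLAT_FILE), skipinitialspace=True))

def songs_by(artist):
    # Hypothetical helper: with one flat table there are no relationships,
    # so every query walks every line and compares raw text.
    return [name for name, song_artist, album, year in records if song_artist == artist]

print(songs_by("Coldplay"))  # ['Viva la Vida']
```

A single misspelled artist name on one line would silently drop that song from the result.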
Normal Forms
Normalization: process to develop a clean DB design
Normal Forms: an incremental set of DB designs
• each form increases the number of tables & attributes
• try to use the simplest form possible
• goal is to have the greatest access to all data with the fewest operations
Reasons for Normalization
There are three main reasons to normalize a database:
1. to minimize duplicate data,
2. to minimize or avoid data modification issues, and
3. to simplify queries.
The first thing to notice is that this table serves many purposes, including:
1. Identifying the organization’s salespeople
2. Listing the sales offices and phone numbers
3. Associating a salesperson with a sales office
4. Showing each salesperson’s customers
Reasons for Normalization
1. Insert Anomaly. We cannot record a new sales office until we also know the salesperson.
a. In order to create the record, we need to provide a primary key. In our case this is the EmployeeID.
2. Update Anomaly. The same information is recorded in multiple rows.
a. If the office number changes, then there are multiple updates that need to be made across all rows.
3. Deletion Anomaly. Deletion of a row can cause more than one set of facts to be removed.
a. If John Hunt retires, then deleting that row causes us to lose information about the New York office.
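The update and deletion anomalies can be reproduced with a minimal sketch using Python's `sqlite3` module. The slide's actual table is not reproduced in this transcript. The column layout, the phone numbers, and every salesperson except John Hunt are therefore hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical unnormalized table: office details repeat on every salesperson's row.
conn.execute("""
    CREATE TABLE SalesStaffInformation (
        EmployeeID   INTEGER PRIMARY KEY,
        SalesPerson  TEXT,
        SalesOffice  TEXT,
        OfficeNumber TEXT
    )
""")
conn.executemany(
    "INSERT INTO SalesStaffInformation VALUES (?, ?, ?, ?)",
    [
        (1, "John Hunt", "New York", "212-555-0100"),
        (2, "Mary Smith", "Chicago", "312-555-0100"),  # hypothetical
        (3, "Sam Jones", "Chicago", "312-555-0100"),   # hypothetical
    ],
)

# Update anomaly: one fact (Chicago's number) is stored in several rows.
cur = conn.execute(
    "UPDATE SalesStaffInformation SET OfficeNumber = ? WHERE SalesOffice = ?",
    ("312-555-0199", "Chicago"),
)
rows_touched_by_update = cur.rowcount  # two rows change for a single fact

# Deletion anomaly: removing John Hunt also removes the only record of New York.
conn.execute("DELETE FROM SalesStaffInformation WHERE SalesPerson = ?", ("John Hunt",))
new_york_rows = conn.execute(
    "SELECT COUNT(*) FROM SalesStaffInformation WHERE SalesOffice = ?", ("New York",)
).fetchone()[0]

print(rows_touched_by_update, new_york_rows)  # 2 0
```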
First Normal Form (1NF)
1. All attributes are “single-valued”
2. All instances have a unique identifier
The repeating groups of columns now become separate rows in the Customer table linked by the EmployeeID foreign key. A foreign key is a value which matches back to another table’s primary key.
This design is superior to our original table in several ways:
• The original design limited each SalesStaffInformation entry to three customers. In the new design, the number of customers associated with each salesperson is practically unlimited.
• It was nearly impossible to sort the original data by customer. Now, it is simple to sort customers.
• The insert and deletion anomalies for Customer have been eliminated. You can delete all the customers for a salesperson without having to delete the entire SalesStaffInformation row.
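Here is a minimal sketch of the 1NF Customer table, using Python's `sqlite3` module. The CustomerID column and all customer names are hypothetical. Note that SQLite enforces foreign keys only after `PRAGMA foreign_keys = ON`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves FK checks off by default

conn.execute("""
    CREATE TABLE SalesStaffInformation (
        EmployeeID  INTEGER PRIMARY KEY,
        SalesPerson TEXT
    )
""")
# The repeating customer columns become rows here, each linked back to its
# salesperson through the EmployeeID foreign key (CustomerID is hypothetical).
conn.execute("""
    CREATE TABLE Customer (
        CustomerID   INTEGER PRIMARY KEY,
        EmployeeID   INTEGER REFERENCES SalesStaffInformation(EmployeeID),
        CustomerName TEXT
    )
""")
conn.execute("INSERT INTO SalesStaffInformation VALUES (1, 'John Hunt')")

# No limit of three: four hypothetical customers for one salesperson.
conn.executemany(
    "INSERT INTO Customer (EmployeeID, CustomerName) VALUES (?, ?)",
    [(1, "Delta Corp"), (1, "Acme Inc"), (1, "Zenith LLC"), (1, "Bolt Ltd")],
)

# Sorting by customer is an ordinary ORDER BY.
sorted_names = [r[0] for r in conn.execute(
    "SELECT CustomerName FROM Customer ORDER BY CustomerName")]

# The foreign key refuses a customer whose salesperson does not exist.
try:
    conn.execute("INSERT INTO Customer (EmployeeID, CustomerName) VALUES (99, 'Orphan Co')")
    orphan_rejected = False
except sqlite3.IntegrityError:
    orphan_rejected = True

# Deleting a customer leaves the salesperson's row alone.
conn.execute("DELETE FROM Customer WHERE CustomerName = 'Acme Inc'")
staff_count = conn.execute("SELECT COUNT(*) FROM SalesStaffInformation").fetchone()[0]
print(sorted_names, orphan_rejected, staff_count)
```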
Does this 1NF work for our Music DB? No: because of collaborations between artists, a song can have more than one artist, so Artist is not single-valued.
Song (Name, Artist, Album, Year, Genre, RecordLabel)
Multiple Tables
A multiple-valued attribute should be removed by adding multiple tables
Song (Name, Album, Year, Genre, RecordLabel)
Artist (Name, Country, CountryAbbr)
Unique Identifiers
We want a way to uniquely identify each song, since covers, remakes, and different songs can share the same name.
Solution: create an artificial ID for each instance in each table, such as an auto-incrementing integer
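In SQLite, a column declared `INTEGER PRIMARY KEY` is filled in automatically, so two songs with the same name still get distinct IDs. This minimal sketch uses Python's `sqlite3` module. Other databases spell this differently: MySQL uses AUTO_INCREMENT, and PostgreSQL uses SERIAL or identity columns. The second "Teenage Dream" row is a hypothetical cover.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# INTEGER PRIMARY KEY aliases SQLite's rowid, so omitting ID auto-assigns the next integer.
conn.execute("CREATE TABLE Song (ID INTEGER PRIMARY KEY, Name TEXT, Year INTEGER)")

# Two recordings with the same name (the second is a hypothetical cover).
cur = conn.execute("INSERT INTO Song (Name, Year) VALUES (?, ?)", ("Teenage Dream", 2010))
original_id = cur.lastrowid
cur = conn.execute("INSERT INTO Song (Name, Year) VALUES (?, ?)", ("Teenage Dream", 2011))
cover_id = cur.lastrowid

print(original_id, cover_id)  # 1 2
```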
Turnbull-CS205-Topic11
Song (ID, Name, Album, Year, Genre, RecordLabel)
Artist (ID, Name, Country, CountryAbbr)
Relationships
Three types of relationships:
• one-to-one: can usually merge the two tables
• one-to-many: most common
• many-to-many: most complex
What is the relationship between the Song and Artist tables?
Song (ID, Name, Album, Year, Genre, RecordLabel)
Artist (ID, Name, Country, CountryAbbr)
Song to Artist: M2M (many-to-many)
2nd Normal Form (2NF)
• Everything from 1NF
• Non-identifying attributes should be moved
• Idea: if the same value appears multiple times for an attribute, it should be another entity
All the non-key columns are dependent on the table’s primary key.
The primary key uniquely identifies each row in a table.
All columns must depend on the primary key:
in order to find a particular value, such as what color Kris’ hair is, you would first have to know the primary key, such as an EmployeeID, to look up the answer.
2nd Normal Form (2NF)
Once you identify a table’s purpose, then look at each of the table’s columns and ask yourself,
“Does this column serve to describe what the primary key identifies?”
If you answer “yes,” then the column is dependent on the primary key and belongs in the table.
If you answer “no,” then the column should be moved to a different table.
When a table is in second normal form, it has a single purpose, such as storing employee information.
2nd Normal Form (2NF) The first issue is the SalesStaffInformation table has two columns which aren’t dependent on the EmployeeID.
The second issue is that there are several attributes which don’t completely rely on the entire Customer table primary key.
2nd Normal Form (2NF)
Since the columns identified in red aren’t completely dependent on the table’s primary key, they belong elsewhere. In both cases, the columns are moved to new tables.
In the case of SalesOffice and OfficeNumber, a SalesOffice table was created. A foreign key was then added to SalesStaffInformation so we can still describe in which office a salesperson is based.
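Here is a minimal sketch of the SalesOffice split, using Python's `sqlite3` module. The SalesOfficeID column name, the phone numbers, and the salespeople other than John Hunt are hypothetical. Office facts now live in one row each, and a join recovers the original view.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when asked

# Each office's facts are stored exactly once (SalesOfficeID is hypothetical).
conn.execute("""
    CREATE TABLE SalesOffice (
        SalesOfficeID INTEGER PRIMARY KEY,
        SalesOffice   TEXT,
        OfficeNumber  TEXT
    )
""")
# Each salesperson refers to an office through a foreign key.
conn.execute("""
    CREATE TABLE SalesStaffInformation (
        EmployeeID    INTEGER PRIMARY KEY,
        SalesPerson   TEXT,
        SalesOfficeID INTEGER REFERENCES SalesOffice(SalesOfficeID)
    )
""")
conn.executemany("INSERT INTO SalesOffice VALUES (?, ?, ?)",
                 [(1, "New York", "212-555-0100"), (2, "Chicago", "312-555-0100")])
conn.executemany("INSERT INTO SalesStaffInformation VALUES (?, ?, ?)",
                 [(1, "John Hunt", 1), (2, "Mary Smith", 2), (3, "Sam Jones", 2)])

# Changing Chicago's number now touches a single row.
cur = conn.execute("UPDATE SalesOffice SET OfficeNumber = ? WHERE SalesOffice = ?",
                   ("312-555-0199", "Chicago"))
rows_updated = cur.rowcount

# A join still answers "which office is each person in, and what is its number?"
staff_offices = conn.execute("""
    SELECT s.SalesPerson, o.SalesOffice, o.OfficeNumber
    FROM SalesStaffInformation AS s
    JOIN SalesOffice AS o ON o.SalesOfficeID = s.SalesOfficeID
    ORDER BY s.EmployeeID
""").fetchall()

# John Hunt retiring no longer erases the New York office.
conn.execute("DELETE FROM SalesStaffInformation WHERE SalesPerson = ?", ("John Hunt",))
offices_left = conn.execute("SELECT COUNT(*) FROM SalesOffice").fetchone()[0]
print(rows_updated, staff_offices, offices_left)
```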
2nd Normal Form (2NF)
The changes to make Customer a second normal form table are trickier.
Rather than move the offending columns CustomerName, CustomerCity, and CustomerPostalCode to a new table, recognize that the issue is EmployeeID! The three columns don’t depend on this part of the key.
So remove EmployeeID from the table.
2nd Normal Form (2NF)
Now create a table named SalesStaffCustomer to describe which customers a sales person calls upon.
This table has two columns: CustomerID and EmployeeID.
Together, they form a primary key.
Separately, they are foreign keys to the Customer and SalesStaffInformation tables respectively.
2nd Normal Form (2NF)
You can now eliminate all the sales people, yet retain customer records. Also, if all the SalesOffices close, it doesn’t mean you have to delete the records containing sales people.
The SalesStaffCustomer table is all keys!
This type of table is called an intersection table. An intersection table is useful when you need to model a many-to-many relationship.
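Here is a minimal sketch of the SalesStaffCustomer intersection table, using Python's `sqlite3` module. The IDs, customer names, cities, and postal codes are hypothetical. The composite `PRIMARY KEY (CustomerID, EmployeeID)` is what makes the table "all keys", and it rejects recording the same pairing twice.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")

conn.execute("CREATE TABLE SalesStaffInformation (EmployeeID INTEGER PRIMARY KEY, SalesPerson TEXT)")
conn.execute("""
    CREATE TABLE Customer (
        CustomerID         INTEGER PRIMARY KEY,
        CustomerName       TEXT,
        CustomerCity       TEXT,
        CustomerPostalCode TEXT
    )
""")
# Intersection table: both columns are foreign keys and together form the primary key.
conn.execute("""
    CREATE TABLE SalesStaffCustomer (
        CustomerID INTEGER REFERENCES Customer(CustomerID),
        EmployeeID INTEGER REFERENCES SalesStaffInformation(EmployeeID),
        PRIMARY KEY (CustomerID, EmployeeID)
    )
""")
conn.executemany("INSERT INTO SalesStaffInformation VALUES (?, ?)",
                 [(1, "John Hunt"), (2, "Mary Smith")])
conn.executemany("INSERT INTO Customer VALUES (?, ?, ?, ?)",
                 [(10, "Acme Inc", "Boston", "02101"), (11, "Bolt Ltd", "Albany", "12207")])
# Many-to-many: Acme is served by both salespeople; John serves both customers.
conn.executemany("INSERT INTO SalesStaffCustomer VALUES (?, ?)", [(10, 1), (11, 1), (10, 2)])

# The composite key refuses a duplicate pairing.
try:
    conn.execute("INSERT INTO SalesStaffCustomer VALUES (10, 1)")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

acme_staff = [r[0] for r in conn.execute("""
    SELECT s.SalesPerson
    FROM SalesStaffCustomer AS sc
    JOIN SalesStaffInformation AS s ON s.EmployeeID = sc.EmployeeID
    WHERE sc.CustomerID = 10
    ORDER BY s.SalesPerson
""")]

# Removing every salesperson (and their pairings) keeps the customer records.
conn.execute("DELETE FROM SalesStaffCustomer")
conn.execute("DELETE FROM SalesStaffInformation")
customers_left = conn.execute("SELECT COUNT(*) FROM Customer").fetchone()[0]
print(duplicate_rejected, acme_staff, customers_left)
```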
2nd Normal Form (2NF)
Song (ID, Name, Album, Year, Genre)
Artist (ID, Name, Country, CountryAbbr)
Album (ID, Name, Year)
Genre (ID, Name)
Song to Artist: M2M; Album to Song: O2M; Genre to Song: O2M
2nd Normal Form (2NF)
Song (ID, Name)
Album (ID, Name, Year)
Genre (ID, Name)
Artist (ID, Name, Country, CountryAbbr)
Song to Artist: M2M; Album to Song: O2M; Genre to Song: O2M
3rd Normal Form (3NF)
• Everything from 2NF
• No attribute dependencies
• Idea: don’t allow bad data entry to corrupt the DB
Song (ID, Name)
Album (ID, Name, Year)
Genre (ID, Name)
Artist (ID, Name, Country, CountryAbbr)
Country (ID, Name, Abbr)
Song to Artist: M2M; Album to Song: O2M; Genre to Song: O2M
3rd Normal Form (3NF)
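Song (ID, Name)
Album (ID, Name, Year)
Genre (ID, Name)
Artist (ID, Name)
Country (ID, Name, Abbr)
Song to Artist: M2M; Album to Song: O2M; Genre to Song: O2M; Country to Artist: O2M
Transitive dependence: a column’s value relies upon another column through a second intermediate column. For example, in the earlier Artist table, CountryAbbr depends on Country, which in turn depends on the artist’s ID.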
see https://www.essentialsql.com/get-ready-to-learn-sql-11-database-third-normal-form-explained-in-simple-english/
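Here is a minimal sketch of the 3NF fix, using Python's `sqlite3` module with hypothetical country rows and abbreviations. The abbreviation is stored once in Country, and each artist points to it through a CountryID foreign key. The foreign key also refuses an artist whose country does not exist.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # needed for SQLite to check REFERENCES

# Country name and abbreviation live in one row per country.
conn.execute("CREATE TABLE Country (ID INTEGER PRIMARY KEY, Name TEXT, Abbr TEXT)")
# Artist stores only which country it belongs to, removing the transitive dependency.
conn.execute("""
    CREATE TABLE Artist (
        ID        INTEGER PRIMARY KEY,
        Name      TEXT,
        CountryID INTEGER REFERENCES Country(ID)
    )
""")
# Hypothetical country data.
conn.executemany("INSERT INTO Country VALUES (?, ?, ?)",
                 [(1, "United States", "US"), (2, "United Kingdom", "UK")])
conn.executemany("INSERT INTO Artist VALUES (?, ?, ?)",
                 [(1, "Katy Perry", 1), (2, "Coldplay", 2), (3, "Kanye West", 1)])

# Correcting an abbreviation is one update; no artist row can disagree with it.
conn.execute("UPDATE Country SET Abbr = ? WHERE ID = ?", ("USA", 1))
artist_countries = conn.execute("""
    SELECT a.Name, c.Abbr
    FROM Artist AS a JOIN Country AS c ON c.ID = a.CountryID
    ORDER BY a.ID
""").fetchall()

# Bad data entry: a reference to a nonexistent country is refused.
try:
    conn.execute("INSERT INTO Artist VALUES (4, 'Unknown Band', 99)")
    bad_country_rejected = False
except sqlite3.IntegrityError:
    bad_country_rejected = True

print(artist_countries, bad_country_rejected)
```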
Six Important Concepts
1. Entities are tables
2. Attributes or fields are columns of tables
3. Each attribute has a data type (int, string, date)
4. Instances or records are rows of a table
5. The unique ID for an instance is called the “primary key”
6. Relationships are encoded as “foreign keys”
Foreign Keys
For a one-to-many relationship, we add a “foreign key” to the “many” table.
With the foreign key: Song (ID, Name, AlbumID) and Album (ID, Name, Year); Album to Song: O2M
Without it: Song (ID, Name) and Album (ID, Name, Year); Album to Song: O2M
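Here is a minimal sketch of the one-to-many link, using Python's `sqlite3` module. The column types and the song rows are example data, not taken from the slides. The AlbumID foreign key sits on Song, the "many" side, and a JOIN follows it back to Album.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")

# The "one" side.
conn.execute("CREATE TABLE Album (ID INTEGER PRIMARY KEY, Name TEXT, Year INTEGER)")
# The "many" side carries the foreign key back to Album.
conn.execute("""
    CREATE TABLE Song (
        ID      INTEGER PRIMARY KEY,
        Name    TEXT,
        AlbumID INTEGER REFERENCES Album(ID)
    )
""")
# Example data: one album, two songs on it.
conn.execute("INSERT INTO Album VALUES (1, 'Teenage Dream', 2010)")
conn.executemany("INSERT INTO Song (Name, AlbumID) VALUES (?, ?)",
                 [("Teenage Dream", 1), ("Firework", 1)])

# Follow each song's foreign key back to its album.
songs_with_album = conn.execute("""
    SELECT s.Name, a.Name, a.Year
    FROM Song AS s JOIN Album AS a ON a.ID = s.AlbumID
    ORDER BY s.ID
""").fetchall()
print(songs_with_album)
```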
Many-To-Many
We can implement M2M by adding “join tables”
• sometimes called junction tables
• Idea: M2M ≈ M2O + O2M
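Song (ID, Name) and Artist (ID, Name); Song to Artist: M2M
becomes
Song (ID, Name); SongToArtist (ID, SongID, ArtistID); Artist (ID, Name)
Song to SongToArtist: O2M; Artist to SongToArtist: O2M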
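Here is a minimal sketch of the SongToArtist join table, using Python's `sqlite3` module. The rows are example data, with "E.T." standing in for a collaboration between two artists, and the `artists_of` and `songs_of` helpers are hypothetical. Each join-table row is one song-artist pairing, so a song can have several artists and an artist several songs.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")

conn.execute("CREATE TABLE Song (ID INTEGER PRIMARY KEY, Name TEXT)")
conn.execute("CREATE TABLE Artist (ID INTEGER PRIMARY KEY, Name TEXT)")
# Join table: the M2M becomes Song-to-SongToArtist (O2M) plus Artist-to-SongToArtist (O2M).
conn.execute("""
    CREATE TABLE SongToArtist (
        ID       INTEGER PRIMARY KEY,
        SongID   INTEGER REFERENCES Song(ID),
        ArtistID INTEGER REFERENCES Artist(ID)
    )
""")
# Example data.
conn.executemany("INSERT INTO Song VALUES (?, ?)",
                 [(1, "Teenage Dream"), (2, "E.T."), (3, "Stronger")])
conn.executemany("INSERT INTO Artist VALUES (?, ?)", [(1, "Katy Perry"), (2, "Kanye West")])
conn.executemany("INSERT INTO SongToArtist (SongID, ArtistID) VALUES (?, ?)",
                 [(1, 1), (2, 1), (2, 2), (3, 2)])

def artists_of(song_name):
    # Hypothetical helper: Song -> SongToArtist -> Artist
    return [r[0] for r in conn.execute("""
        SELECT a.Name
        FROM Song AS s
        JOIN SongToArtist AS sa ON sa.SongID = s.ID
        JOIN Artist AS a ON a.ID = sa.ArtistID
        WHERE s.Name = ?
        ORDER BY a.Name
    """, (song_name,))]

def songs_of(artist_name):
    # Hypothetical helper: Artist -> SongToArtist -> Song
    return [r[0] for r in conn.execute("""
        SELECT s.Name
        FROM Artist AS a
        JOIN SongToArtist AS sa ON sa.ArtistID = a.ID
        JOIN Song AS s ON s.ID = sa.SongID
        WHERE a.Name = ?
        ORDER BY s.ID
    """, (artist_name,))]

print(artists_of("E.T."))      # ['Kanye West', 'Katy Perry']
print(songs_of("Kanye West"))  # ['E.T.', 'Stronger']
```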
Putting it all together
Album (ID, Name, Year)
Song (ID, Name, AlbumID, GenreID)
Genre (ID, Name)
Artist (ID, Name, CountryID)
Country (ID, Name, Abbr)
SongToArtist (ID, SongID, ArtistID)
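Here is a minimal sketch of the combined schema, run through Python's `sqlite3` module. The column types are assumptions, since the diagram lists only names. Every foreign key points from the "many" side to the "one" side, and SongToArtist is the M2M join table.

```python
import sqlite3

# Column types are assumptions; the diagram gives only column names.
SCHEMA = """
PRAGMA foreign_keys = ON;
CREATE TABLE Country (ID INTEGER PRIMARY KEY, Name TEXT, Abbr TEXT);
CREATE TABLE Artist (
    ID        INTEGER PRIMARY KEY,
    Name      TEXT,
    CountryID INTEGER REFERENCES Country(ID)
);
CREATE TABLE Album (ID INTEGER PRIMARY KEY, Name TEXT, Year INTEGER);
CREATE TABLE Genre (ID INTEGER PRIMARY KEY, Name TEXT);
CREATE TABLE Song (
    ID      INTEGER PRIMARY KEY,
    Name    TEXT,
    AlbumID INTEGER REFERENCES Album(ID),
    GenreID INTEGER REFERENCES Genre(ID)
);
CREATE TABLE SongToArtist (
    ID       INTEGER PRIMARY KEY,
    SongID   INTEGER REFERENCES Song(ID),
    ArtistID INTEGER REFERENCES Artist(ID)
);
"""

conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)

# List the tables that were created.
tables = sorted(r[0] for r in conn.execute(
    "SELECT name FROM sqlite_master WHERE type = 'table'"))

# With foreign keys on, a song cannot reference an album that does not exist.
try:
    conn.execute("INSERT INTO Song (Name, AlbumID) VALUES ('Orphan', 42)")
    orphan_rejected = False
except sqlite3.IntegrityError:
    orphan_rejected = True

print(tables, orphan_rejected)
```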
Summary: DB Schema Creation Algorithm
1. Identify major entities
• draw a box for each table
2. Figure out attributes for each entity
• add an integer id
• name & data type
3. Figure out the relationship between each pair of entities
• O2O: combine entities
• O2M: add a foreign key to the “many” table
• M2M: create a new join table
Exercise
Design a database schema for keeping track of class rosters (e.g., Homer).
Hints: Consider students, courses and professors. Assume each course has at most one professor.
Next time
We will introduce you to SQL (Structured Query Language)
• Designed to directly encode the semantics of the DB
• “Select all songs by Kanye West from 2007”
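Here is a minimal preview of that query, written in Python's `sqlite3` module against a trimmed-down version of the music schema. The rows are example data. Year lives on Album, and the artist is reached through SongToArtist, so the query joins four tables.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Trimmed schema plus example rows.
conn.executescript("""
    CREATE TABLE Album (ID INTEGER PRIMARY KEY, Name TEXT, Year INTEGER);
    CREATE TABLE Song (ID INTEGER PRIMARY KEY, Name TEXT, AlbumID INTEGER REFERENCES Album(ID));
    CREATE TABLE Artist (ID INTEGER PRIMARY KEY, Name TEXT);
    CREATE TABLE SongToArtist (
        ID       INTEGER PRIMARY KEY,
        SongID   INTEGER REFERENCES Song(ID),
        ArtistID INTEGER REFERENCES Artist(ID)
    );
    INSERT INTO Album VALUES (1, 'Graduation', 2007), (2, 'Teenage Dream', 2010);
    INSERT INTO Song VALUES (1, 'Stronger', 1), (2, 'Teenage Dream', 2);
    INSERT INTO Artist VALUES (1, 'Kanye West'), (2, 'Katy Perry');
    INSERT INTO SongToArtist (SongID, ArtistID) VALUES (1, 1), (2, 2);
""")

# "Select all songs by Kanye West from 2007"
query = """
    SELECT s.Name
    FROM Song AS s
    JOIN Album AS al ON al.ID = s.AlbumID
    JOIN SongToArtist AS sa ON sa.SongID = s.ID
    JOIN Artist AS ar ON ar.ID = sa.ArtistID
    WHERE ar.Name = ? AND al.Year = ?
    ORDER BY s.Name
"""
songs = [row[0] for row in conn.execute(query, ("Kanye West", 2007))]
print(songs)  # ['Stronger']
```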
SQL
Pop Quiz
Design a database schema for keeping track of class rosters (e.g., Homer).
Hints: Consider students, courses and professors. Assume each course has at most one professor.
IC Database Schema
Student (id, firstname, lastname, gpa, major)
Course (id, courseNumber, days, time, room, instructorID)
StudentToCourse (id, studentID, courseID)
Instructor (id, firstname, lastname, email)
Rules:
1) Tables are capitalized CamelBack
2) Properties are (lowercase) camelBack
3) First attribute is always the “id”
4) Join tables are called “Table1ToTable2”
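Here is one way to turn this schema into SQL, as a minimal sketch using Python's `sqlite3` module. The column types and all sample rows, including the instructor, are hypothetical. Because each course has at most one instructor, instructorID sits on Course, while the student-course M2M relationship gets the StudentToCourse join table.

```python
import sqlite3

# Column types are assumptions; names follow the slide's naming rules.
SCHEMA = """
PRAGMA foreign_keys = ON;
CREATE TABLE Instructor (
    id INTEGER PRIMARY KEY, firstname TEXT, lastname TEXT, email TEXT
);
CREATE TABLE Student (
    id INTEGER PRIMARY KEY, firstname TEXT, lastname TEXT, gpa REAL, major TEXT
);
-- At most one instructor per course (O2M), so the foreign key goes on Course.
CREATE TABLE Course (
    id INTEGER PRIMARY KEY, courseNumber TEXT, days TEXT, time TEXT, room TEXT,
    instructorID INTEGER REFERENCES Instructor(id)
);
-- Students take many courses and courses have many students (M2M): join table.
CREATE TABLE StudentToCourse (
    id INTEGER PRIMARY KEY,
    studentID INTEGER REFERENCES Student(id),
    courseID INTEGER REFERENCES Course(id)
);
"""

conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
# Hypothetical sample data.
conn.execute("INSERT INTO Instructor VALUES (1, 'Ada', 'Lovelace', 'ada@example.edu')")
conn.execute("INSERT INTO Course VALUES (1, 'COMP 205', 'MWF', '10:00', 'Room 101', 1)")
conn.executemany("INSERT INTO Student VALUES (?, ?, ?, ?, ?)",
                 [(1, "Ann", "Lee", 3.5, "CS"), (2, "Bo", "Kim", 3.2, "Math")])
conn.executemany("INSERT INTO StudentToCourse (studentID, courseID) VALUES (?, ?)",
                 [(1, 1), (2, 1)])

# Class roster: every student enrolled in a given course.
roster = [r[0] for r in conn.execute("""
    SELECT s.lastname
    FROM Student AS s
    JOIN StudentToCourse AS sc ON sc.studentID = s.id
    JOIN Course AS c ON c.id = sc.courseID
    WHERE c.courseNumber = ?
    ORDER BY s.lastname
""", ("COMP 205",))]
print(roster)  # ['Kim', 'Lee']
```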
Why Databases?
Make it easy to relate, store, and retrieve data
[Figure: a client request goes to the web server, whose server-side program works with the database; the response returns to the client]
SQL: Structured Query Language
Standard for most DBs: mysql, sqlite3, postgres
Uses:
• create the database “schema”
• insert, update, delete data
• “query” the database for information
Learn by doing…