1.1 cas cs 460/660 introduction to database systems relational model and more…

30
1.1 CAS CS 460/660 CAS CS 460/660 Introduction to Database Introduction to Database Systems Systems Relational Model Relational Model and more… and more…

Upload: jewel-sharp

Post on 26-Dec-2015

227 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: 1.1 CAS CS 460/660 Introduction to Database Systems Relational Model and more…

1.1

CAS CS 460/660CAS CS 460/660Introduction to Database SystemsIntroduction to Database Systems

Relational ModelRelational Modeland more…and more…

Page 2: 1.1 CAS CS 460/660 Introduction to Database Systems Relational Model and more…

1.2

The Structure SpectrumThe Structure Spectrum

Structured (schema-

first)

Relational DatabaseFormatted Messages

Semi-Structured

(schema-later)

DocumentsXML

Tagged Text/Media

Unstructured (schema-

never)

Plain Text

Media

Page 3: 1.1 CAS CS 460/660 Introduction to Database Systems Relational Model and more…

1.3

The Relational ModelThe Relational Model

The Relational Model is UbiquitousMySQL, PostgreSQL, Oracle, DB2, SQLServer, …l Foundational work done at

IBM Santa Teresa Labs (now IBM Almaden in SJ) – “System R” UC Berkeley CS – the “Ingres” System

Note: some Legacy systems use older models e.g., IBM’s IMS

Object-oriented concepts have been merged in Early work: POSTGRES research project at Berkeley Informix, IBM DB2, Oracle 8i

As has support for XML (semi-structured data)

Page 4: 1.1 CAS CS 460/660 Introduction to Database Systems Relational Model and more…

1.4

Relational ModelRelational Model

The relational model for database management is a database model based on first-order predicate logic, first formulated and proposed in 1969 by Edgar F. Codd.

Codd, E.F. (1970). "A Relational Model of Data for Large Shared Data Banks". Communications of the ACM 13 (6): 377–387.

Page 5: 1.1 CAS CS 460/660 Introduction to Database Systems Relational Model and more…

1.5

Relational Database: DefinitionsRelational Database: Definitions

Relational database: a set of relations

Relation: made up of 2 parts:

Schema : specifies name of relation, plus name and type of each column

Students(sid: string, name: string, login: string, age: integer, gpa: real)

Instance : the actual data at a given time

#rows = cardinality #fields = degree / arity

Page 6: 1.1 CAS CS 460/660 Introduction to Database Systems Relational Model and more…

1.7

Ex: Instance of Students RelationEx: Instance of Students Relation

sid name login age gpa

53666 Jones jones@cs 18 3.4

53688 Smith smith@eecs 18 3.2

53650 Smith smith@math 19 3.8

• Cardinality = 3, arity = 5 , all rows distinct

• Do all values in each column of a relation instance have to be unique?

Page 7: 1.1 CAS CS 460/660 Introduction to Database Systems Relational Model and more…

1.8

SQL - A language for Relational DBsSQL - A language for Relational DBs

Say: “ess-cue-ell” or “sequel” But spelled “SQL”

Data Definition Language (DDL) create, modify, delete relations specify constraints administer users, security, etc.

Data Manipulation Language (DML) Specify queries to find tuples that satisfy criteria add, modify, remove tuples

The DBMS is responsible for efficient evaluation.

Page 8: 1.1 CAS CS 460/660 Introduction to Database Systems Relational Model and more…

1.9

The SQL Query LanguageThe SQL Query Language

The most widely used relational query language.

Originally IBM, then ANSI in 1986

Current standard is SQL-2011 2008 added x-query stuff, new triggers,…

2003 was last major update: XML, window functions, sequences, auto-generated IDs. Not fully supported yet

SQL-1999 Introduced “Object-Relational” concepts. Also not fully supported yet.

SQL92 is a basic subset Most systems support at least this

PostgreSQL has some “unique” aspects (as do most systems).

SQL is not synonymous with Microsoft’s “SQL Server”

Page 9: 1.1 CAS CS 460/660 Introduction to Database Systems Relational Model and more…

1.10

Creating Relations in SQLCreating Relations in SQL

Creates the Students relation. Note: the type (domain) of each field is specified, and enforced by

the DBMS whenever tuples are added or modified.

CREATE TABLE Students(sid CHAR(20), name CHAR(20), login CHAR(10), age INTEGER, gpa FLOAT)

Page 10: 1.1 CAS CS 460/660 Introduction to Database Systems Relational Model and more…

1.11

Table Creation (continued)Table Creation (continued) Another example: the Enrolled table holds information about

courses students take.

CREATE TABLE Enrolled(sid CHAR(20), cid CHAR(20), grade CHAR(2))

Page 11: 1.1 CAS CS 460/660 Introduction to Database Systems Relational Model and more…

1.12

Adding and Deleting TuplesAdding and Deleting Tuples Can insert a single tuple using:

INSERT INTO Students (sid, name, login, age, gpa) VALUES ('53688', 'Smith', 'smith@ee', 18, 3.2)

• Can delete all tuples satisfying some condition (e.g., name = Smith):

DELETE FROM Students S WHERE S.name = 'Smith'

Powerful variants of these commands are available; more later!

Page 12: 1.1 CAS CS 460/660 Introduction to Database Systems Relational Model and more…

1.13

KeysKeys Keys are a way to associate tuples in different relations Keys are one form of integrity constraint (IC)

sid name login age gpa

53666 Jones jones@cs 18 3.4

53688 Smith smith@eecs 18 3.253650 Smith smith@math 19 3.8

sid cid grade53666 Carnatic101 C53666 Reggae203 B53650 Topology112 A53666 History105 B

Enrolled Students

PRIMARY KeyFOREIGN Key

Page 13: 1.1 CAS CS 460/660 Introduction to Database Systems Relational Model and more…

1.14

Primary KeysPrimary Keys

A set of fields is a superkey if: No two distinct tuples can have same values in all key fields

A set of fields is a key for a relation if : It is a superkey

No subset of the fields is a superkey

what if >1 key for a relation? One of the keys is chosen (by DBA) to be the primary key.

Other keys are called candidate keys.

E.g. sid is a key for Students.

What about name?

The set {sid, gpa} is a superkey.

Page 14: 1.1 CAS CS 460/660 Introduction to Database Systems Relational Model and more…

1.15

Primary and Candidate Keys in SQLPrimary and Candidate Keys in SQL Possibly many candidate keys (specified using UNIQUE),

one of which is chosen as the primary key.

• Keys must be used carefully!• “For a given student and course, there is a single

grade.”

“Students can take only one course, and no two students in a course receive the same grade.”

CREATE TABLE Enrolled (sid CHAR(20) cid CHAR(20), grade CHAR(2), PRIMARY KEY (sid,cid))

CREATE TABLE Enrolled (sid CHAR(20) cid CHAR(20), grade CHAR(2), PRIMARY KEY (sid), UNIQUE (cid, grade))

vs.

Page 15: 1.1 CAS CS 460/660 Introduction to Database Systems Relational Model and more…

1.16

Foreign Keys, Referential IntegrityForeign Keys, Referential Integrity

Foreign key: a “logical pointer” Set of fields in a tuple in one relation

that `refer’ to a tuple in another relation. Reference to primary key of the other relation.

All foreign key constraints enforced? referential integrity! i.e., no dangling references.

Page 16: 1.1 CAS CS 460/660 Introduction to Database Systems Relational Model and more…

1.17

Foreign Keys in SQLForeign Keys in SQL E.g. Only students listed in the Students relation should be allowed

to enroll for courses. sid is a foreign key referring to Students:

CREATE TABLE Enrolled (sid CHAR(20),cid CHAR(20),grade CHAR(2), PRIMARY KEY (sid,cid), FOREIGN KEY (sid) REFERENCES Students);

sid cid grade53666 Carnatic101 C53666 Reggae203 B53650 Topology112 A53666 History105 B

Enrolled

sid name login age gpa

53666 Jones jones@cs 18 3.4

53688 Smith smith@eecs 18 3.253650 Smith smith@math 19 3.8

Students

11111 English102 A

Page 17: 1.1 CAS CS 460/660 Introduction to Database Systems Relational Model and more…

1.18

Next UpNext Up

We’ll talk a bit about the SQL DML

Then we’ll start describing the DBMS from storage on up

Page 18: 1.1 CAS CS 460/660 Introduction to Database Systems Relational Model and more…

1.19

The SQL DMLThe SQL DML

Single-table queries are straightforward.

To find records for all 18 year old students with gpa’s above 2.0, we can write:

SELECT * FROM Students S WHERE S.age=18

AND S.gpa > 2.0

To get just names and logins, replace the first line:SELECT S.name, S.login

Page 19: 1.1 CAS CS 460/660 Introduction to Database Systems Relational Model and more…

1.20

Basic SQL QueriesBasic SQL Queries

SELECT [DISTINCT] target-listFROM relation-listWHERE qualification• relation-list : A list of relation names

– possibly with a range-variable after each name

• target-list : A list of attributes of tables in relation-list

• qualification : Comparisons combined using AND, OR and NOT.– Comparisons are Attr op const or Attr1 op

Attr2, where op is one of =≠<>≤≥• DISTINCT: (optional) indicates that the answer should

have no duplicates.

– In SQL SELECT, the default is that duplicates are not eliminated! (Result is called a “multiset”)

Page 20: 1.1 CAS CS 460/660 Introduction to Database Systems Relational Model and more…

1.21

Querying Multiple RelationsQuerying Multiple Relations

Can specify a join over two tables as follows:SELECT S.name, E.cid FROM Students S, Enrolled E WHERE S.sid=E.sid AND E.grade=‘B'

result = S.name E.cid Jones History105

sid name login age gpa

53666 Jones jones@cs 18 3.4

53688 Smith smith@ee 18 3.2

sid cid grade53831 Carnatic101 C53831 Reggae203 B53650 Topology112 A53666 History105 B

Note: obviously no referential integrity constraints have been used here.

Page 21: 1.1 CAS CS 460/660 Introduction to Database Systems Relational Model and more…

1.22

Basic Query SemanticsBasic Query Semantics

The Semantics of a SQL query are defined in terms of the following conceptual evaluation strategy:

1. do FROM clause: compute cross-product of tables (e.g., Students and Enrolled).

2. do WHERE clause: Check conditions, discard tuples that fail.

3. do SELECT clause: Delete unwanted fields.

4. If DISTINCT specified, eliminate duplicate rows.

Probably the least efficient way to compute a query! • A query optimizer will find more efficient strategies to get the same answer.

Page 22: 1.1 CAS CS 460/660 Introduction to Database Systems Relational Model and more…

1.23

Step 1 – Cross ProductStep 1 – Cross Product

SELECT S.name, E.cid FROM Students S, Enrolled E WHERE S.sid=E.sid AND E.grade=‘B'

sid name login age gpa

53666 Jones jones@cs 18 3.4

53688 Smith smith@ee 18 3.2

sid cid grade53831 Carnatic101 C53831 Reggae203 B53650 Topology112 A53666 History105 B

S.sid S.name S.login S.age S.gpa E.sid E.cid E.grade 53666 Jones jones@cs 18 3.4 53831 Carnatic101 C 53666 Jones jones@cs 18 3.4 53832 Reggae203 B 53666 Jones jones@cs 18 3.4 53650 Topology112 A 53666 Jones jones@cs 18 3.4 53666 History105 B 53688 Smith smith@ee 18 3.2 53831 Carnatic101 C 53688 Smith smith@ee 18 3.2 53831 Reggae203 B 53688 Smith smith@ee 18 3.2 53650 Topology112 A 53688 Smith smith@ee 18 3.2 53666 History105 B sid name login age gpa

53666 Jones jones@cs 18 3.4

53688 Smith smith@ee 18 3.2

sid cid grade53831 Carnatic101 C53831 Reggae203 B53650 Topology112 A53666 History105 B

Page 23: 1.1 CAS CS 460/660 Introduction to Database Systems Relational Model and more…

1.24

Step 2) Discard tuples that fail predicateStep 2) Discard tuples that fail predicate

S.sid S.name S.login S.age S.gpa E.sid E.cid E.grade 53666 Jones jones@cs 18 3.4 53831 Carnatic101 C 53666 Jones jones@cs 18 3.4 53832 Reggae203 B 53666 Jones jones@cs 18 3.4 53650 Topology112 A 53666 Jones jones@cs 18 3.4 53666 History105 B 53688 Smith smith@ee 18 3.2 53831 Carnatic101 C 53688 Smith smith@ee 18 3.2 53831 Reggae203 B 53688 Smith smith@ee 18 3.2 53650 Topology112 A 53688 Smith smith@ee 18 3.2 53666 History105 B

SELECT S.name, E.cid FROM Students S, Enrolled E WHERE S.sid=E.sid AND E.grade=‘B'

Page 24: 1.1 CAS CS 460/660 Introduction to Database Systems Relational Model and more…

1.25

Step 3) Discard Unwanted ColumnsStep 3) Discard Unwanted Columns

S.sid S.name S.login S.age S.gpa E.sid E.cid E.grade 53666 Jones jones@cs 18 3.4 53831 Carnatic101 C 53666 Jones jones@cs 18 3.4 53832 Reggae203 B 53666 Jones jones@cs 18 3.4 53650 Topology112 A 53666 Jones jones@cs 18 3.4 53666 History105 B 53688 Smith smith@ee 18 3.2 53831 Carnatic101 C 53688 Smith smith@ee 18 3.2 53831 Reggae203 B 53688 Smith smith@ee 18 3.2 53650 Topology112 A 53688 Smith smith@ee 18 3.2 53666 History105 B

SELECT S.name, E.cid FROM Students S, Enrolled E WHERE S.sid=E.sid AND E.grade=‘B'

Page 25: 1.1 CAS CS 460/660 Introduction to Database Systems Relational Model and more…

1.26

Aggregate OperatorsAggregate OperatorsFor calculation and analytics COUNT (*)

COUNT ([DISTINCT] A)SUM ([DISTINCT] A)AVG (A)MAX (A)MIN (A)

SELECT AVG (S.gpa)FROM Students SWHERE S.age=18;

SELECT COUNT (*)FROM Students;

SELECT COUNT (DISTINCT S.age)FROM Students SWHERE S.name=‘Bob’;

Page 26: 1.1 CAS CS 460/660 Introduction to Database Systems Relational Model and more…

1.27

GROUP BY and HAVINGGROUP BY and HAVING

Sometimes, we want to apply aggs to each of several groups of tuples.

This query computes the average gpa per major (assume students have a “major” attribute)

If you want to exclude “small” majors, use Having:

SELECT S.major, AVG (S.gpa) as AvgGPAFROM Students SGROUP BY S.major ;

SELECT S.major, AVG (S.gpa) as AvgGPAFROM Students SGROUP BY S.majorHAVING COUNT (*) > 10 ;

Page 27: 1.1 CAS CS 460/660 Introduction to Database Systems Relational Model and more…

1.28

(Slightly) Less Basic SQL Queries(Slightly) Less Basic SQL Queries

SELECT [DISTINCT] target-listFROM relation-listWHERE qualificationGROUP BY grouping-listHAVING group-qualification

Page 28: 1.1 CAS CS 460/660 Introduction to Database Systems Relational Model and more…

1.29

Conceptual EvaluationConceptual Evaluation

The cross-product of relation-list is computed, tuples that fail qualification are discarded, `unnecessary’ fields are deleted, and the remaining tuples are partitioned into groups by the value of attributes in grouping-list.

One answer tuple is generated per qualifying group.

Page 29: 1.1 CAS CS 460/660 Introduction to Database Systems Relational Model and more…

1.30

Conceptual Evaluation (cont.)Conceptual Evaluation (cont.)

The group-qualification is then applied to eliminate some groups.

Expressions in group-qualification must have a single value per group!

That is, attributes in group-qualification must be arguments of an aggregate op or must also appear in the grouping-list.

One answer tuple is generated per qualifying group.

Page 30: 1.1 CAS CS 460/660 Introduction to Database Systems Relational Model and more…

1.31

Okay: Let’s start from the Okay: Let’s start from the bottom up…bottom up…

Query Optimizationand Execution

Relational Operators

Access Methods

Buffer Management

Disk Space Management

Student Records stored on disk

Database app

These layersmust considerconcurrencycontrol andrecovery