
COMP 430 Intro. to Database Systems

Single-table SQL

Slides use ideas from Chris Ré and Chris Jermaine.

Get clickers today!

SELECT name FROM sqlite_master WHERE type='table'


Clicker test – Have you used clickers before?

A. True

B. False



What is Structured Query Language (SQL)?

“Standard” language for databases
• Many related standards: ANSI SQL, SQL92 (= SQL2), SQL99 (= SQL3)
• Vendors support various subsets and extensions

Very high-level programming language – Highly optimized, parallelized

Huge language – We’ll cover only a core subset.

Typically called from main application language (Python, Java, …)
• SQLexecute("…SQL CODE…")
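SQLexecute(…) on the slide is a placeholder, not a real function. As a minimal sketch of the same idea, assuming Python's standard sqlite3 module and a throwaway in-memory database (both are choices made here, not something the slides prescribe), SQL text is handed to the database as a string from the host language:

```python
import sqlite3

# Assumption: SQLite through Python's standard library; ":memory:" is a temporary database.
conn = sqlite3.connect(":memory:")

# Each SQL statement is passed as an ordinary string.
conn.execute(
    "CREATE TABLE Product (p_name VARCHAR(50), price NUMERIC(6,2), manufacturer VARCHAR(50))"
)
conn.execute("INSERT INTO Product VALUES ('Gizmo', 19.99, 'GizmoWorks')")

# The query from the title slide: list the tables SQLite knows about.
tables = [row[0] for row in conn.execute("SELECT name FROM sqlite_master WHERE type='table'")]
print(tables)
conn.close()
```

Other database systems expose a similar connect/execute pattern through their own drivers, but the details differ.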


Two languages in one

• Data Definition Language (DDL)
  • Define relational schemas (declarative)
  • Create/modify/delete tables and their attributes (imperative)

• Data Manipulation Language (DML)
  • Query DB for desired data (declarative)
  • Add/modify/delete data in tables (imperative)
  • Define/use functions, procedures, triggers (imperative)


Basic concepts


Table = Relation

Product
p_name      | price   | manufacturer
Gizmo       | $19.99  | GizmoWorks
Powergizmo  | $39.99  | GizmoWorks
Widget      | $19.99  | WidgetsRUs
HyperWidget | $203.99 | Hyper

• Heading: Names of each column.
• Column = attribute = field: A typed data entry in each row. #columns = arity = degree
• Row = tuple = record: A single entry in the table. #rows = cardinality


Table = Relation

Table is a multiset of rows. Not ordered. Duplicate rows allowed.

Product
p_name      | price   | manufacturer
Gizmo       | $19.99  | GizmoWorks
Powergizmo  | $39.99  | GizmoWorks
Widget      | $19.99  | WidgetsRUs
HyperWidget | $203.99 | Hyper


Schema defines table attributes

Product (p_name: string, price: float, manufacturer: string)

Product
p_name      | price   | manufacturer
Gizmo       | $19.99  | GizmoWorks
Powergizmo  | $39.99  | GizmoWorks
Widget      | $19.99  | WidgetsRUs
HyperWidget | $203.99 | Hyper


Creating a table

Product (p_name: string, price: float, manufacturer: string)

CREATE TABLE Product (
    p_name VARCHAR(50),
    price NUMERIC(6,2),
    manufacturer VARCHAR(50)
);

Some SQL variants have a CURRENCY or MONEY type.


SQL attribute types

All types are atomic! (Traditionally. However, sets and arrays are allowed in some SQL versions.)

Examples:
• CHAR(n), VARCHAR(n), NCHAR(n), NVARCHAR(n)
• INT, BIGINT, SMALLINT, FLOAT, DECIMAL(m,n)
• BOOLEAN, MONEY, DATETIME


NULL data

Any value of any type can be NULL.
• Signals missing value.
• Value doesn’t exist.
• Value exists but unknown.
• Value not applicable to this record.


Keys

Key = minimal set of attributes acting as a unique tuple identifier.
• Consider the universe of all potential relation data, not just what the table currently contains.

Product
p_name      | price  | manufacturer
Gizmo       | 19.99  | GizmoWorks
Powergizmo  | 39.99  | GizmoWorks
Widget      | 19.99  | WidgetsRUs
HyperWidget | 203.99 | Hyper
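The warning about "all potential relation data" can be illustrated with a small sketch. The helper names is_unique and candidate_keys are hypothetical, introduced only for this illustration; they test uniqueness against the rows currently in the table, which is exactly what the definition says is not enough.

```python
from itertools import combinations

def is_unique(rows, attrs):
    """True if no two rows agree on every attribute in attrs."""
    seen = set()
    for row in rows:
        ident = tuple(row[a] for a in attrs)
        if ident in seen:
            return False
        seen.add(ident)
    return True

def candidate_keys(rows, all_attrs):
    """Minimal attribute sets that happen to be unique in the given rows."""
    found = []
    for size in range(1, len(all_attrs) + 1):
        for attrs in combinations(all_attrs, size):
            # A superset of an already-found set is not minimal, so skip it.
            if any(set(k) <= set(attrs) for k in found):
                continue
            if is_unique(rows, attrs):
                found.append(attrs)
    return found

product = [
    {"p_name": "Gizmo", "price": 19.99, "manufacturer": "GizmoWorks"},
    {"p_name": "Powergizmo", "price": 39.99, "manufacturer": "GizmoWorks"},
    {"p_name": "Widget", "price": 19.99, "manufacturer": "WidgetsRUs"},
    {"p_name": "HyperWidget", "price": 203.99, "manufacturer": "Hyper"},
]

keys = candidate_keys(product, ["p_name", "price", "manufacturer"])
print(keys)
```

On these four rows the pair (price, manufacturer) also comes out unique, but only by accident: nothing stops GizmoWorks from selling a second product at 19.99. Deciding what the key is requires knowing what the data means, not just inspecting a sample.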


What is key?

A. dept

B. dept, number

C. dept, number, section

D. dept, number, section, instructor


Course
dept | number | section | instructor
MATH | 101    | 1       | Jones
MATH | 101    | 2       | Smith
MATH | 102    | 1       | Williams
PHYS | 101    | 1       | Baker
PHYS | 102    | 2       | Baker


What is key?

A. s_name

B. abbrev

C. The pair s_name, abbrev

D. Each of s_name and abbrev


State
s_name     | abbrev | year_admitted
Alabama    | AL     | 1819
Alaska     | AK     | 1959
Arizona    | AZ     | 1912
Arkansas   | AR     | 1836
California | CA     | 1850


Primary keys

Every table should have one primary key.
• Each tuple must have distinct non-NULL values of primary key attributes.
• Guarantees table is a mathematical relation.

Product (p_name: string, price: float, manufacturer: string)

Product
p_name      | price  | manufacturer
Gizmo       | 19.99  | GizmoWorks
Powergizmo  | 39.99  | GizmoWorks
Widget      | 19.99  | WidgetsRUs
HyperWidget | 203.99 | Hyper


Primary keys are a DB-checked constraint

Adding a new record with duplicate or NULL primary key will fail.
• Failure reported as exception or erroneous return value.

Product (p_name: string, price: float, manufacturer: string)

Product
p_name      | price  | manufacturer
Gizmo       | 19.99  | GizmoWorks
Powergizmo  | 39.99  | GizmoWorks
Widget      | 19.99  | WidgetsRUs
HyperWidget | 203.99 | Hyper

Attempted new record (fails, because p_name 'Widget' is already present):
Widget      | 15.99  | GizmoWorks

We will see other kinds of DB-checked constraints.
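A minimal sketch of the duplicate-key failure, assuming SQLite through Python's sqlite3 module (an assumption of this example). In that driver the failure surfaces as an sqlite3.IntegrityError exception; other systems and drivers report it differently.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # assumption: throwaway in-memory SQLite database
conn.execute("""
    CREATE TABLE Product (
        p_name VARCHAR(50),
        price DECIMAL(6,2),
        manufacturer VARCHAR(50),
        PRIMARY KEY (p_name)
    )
""")
conn.execute("INSERT INTO Product VALUES ('Widget', 19.99, 'WidgetsRUs')")

try:
    # Same p_name as an existing row, so the primary key constraint is violated.
    conn.execute("INSERT INTO Product VALUES ('Widget', 15.99, 'GizmoWorks')")
    rejected = False
except sqlite3.IntegrityError as err:
    rejected = True
    print("rejected:", err)

# The table still holds only the original row.
row_count = conn.execute("SELECT COUNT(*) FROM Product").fetchone()[0]
conn.close()
```

One caveat when trying the NULL case: SQLite, for historical compatibility reasons, accepts NULL in a non-INTEGER primary key column unless the column is also declared NOT NULL, so this sketch demonstrates only the duplicate half of the rule.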


Creating a table with primary key

Product (p_name: string, price: float, manufacturer: string)

CREATE TABLE Product (
    p_name VARCHAR(50),
    price DECIMAL(6,2),
    manufacturer VARCHAR(50),
    PRIMARY KEY (p_name)
);


NOT NULL constraint

Course (c_id: integer, c_name: string, instructor: string)

CREATE TABLE Course (
    c_id INTEGER,
    c_name VARCHAR(50) NOT NULL,
    instructor VARCHAR(50),
    PRIMARY KEY (c_id)
);
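As a hedged illustration of what NOT NULL changes, the sketch below creates the Course table above in an in-memory SQLite database (an assumption of this example) and tries one insert with a NULL instructor and one with a NULL c_name. The course name 'Databases' is made-up sample data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # assumption: throwaway in-memory SQLite database
conn.execute("""
    CREATE TABLE Course (
        c_id INTEGER,
        c_name VARCHAR(50) NOT NULL,
        instructor VARCHAR(50),
        PRIMARY KEY (c_id)
    )
""")

# instructor has no NOT NULL constraint, so NULL is accepted there.
conn.execute("INSERT INTO Course VALUES (1, 'Databases', NULL)")

try:
    # c_name is declared NOT NULL, so this insert is refused.
    conn.execute("INSERT INTO Course VALUES (2, NULL, 'Baker')")
    rejected = False
except sqlite3.IntegrityError as err:
    rejected = True
    print("rejected:", err)

rows = conn.execute("SELECT * FROM Course").fetchall()
conn.close()
```

Python's sqlite3 module returns SQL NULL as None, which is why the stored instructor shows up as None.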


Adding a single record to a table

Product (before)
p_name      | price  | manufacturer
Gizmo       | 19.99  | GizmoWorks
Powergizmo  | 39.99  | GizmoWorks
Widget      | 19.99  | WidgetsRUs
HyperWidget | 203.99 | Hyper

INSERT INTO Product
VALUES ('MiniWidget', 21.99, 'WidgetsRUs');

INSERT INTO Product (p_name, manufacturer)
VALUES ('NanoWidget', 'WidgetsRUs');

Product (after)
p_name      | price  | manufacturer
Gizmo       | 19.99  | GizmoWorks
Powergizmo  | 39.99  | GizmoWorks
Widget      | 19.99  | WidgetsRUs
HyperWidget | 203.99 | Hyper
MiniWidget  | 21.99  | WidgetsRUs
NanoWidget  | NULL   | WidgetsRUs
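The two forms of INSERT can be tried directly. This is a small sketch assuming an in-memory SQLite database driven from Python; the second statement names only two columns, so the omitted price is stored as NULL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # assumption: throwaway in-memory SQLite database
conn.execute("""
    CREATE TABLE Product (
        p_name VARCHAR(50),
        price DECIMAL(6,2),
        manufacturer VARCHAR(50),
        PRIMARY KEY (p_name)
    )
""")

# Values for every column, given in schema order.
conn.execute("INSERT INTO Product VALUES ('MiniWidget', 21.99, 'WidgetsRUs')")
# Only the listed columns are supplied; price has no default, so it becomes NULL.
conn.execute("INSERT INTO Product (p_name, manufacturer) VALUES ('NanoWidget', 'WidgetsRUs')")

rows = conn.execute(
    "SELECT p_name, price, manufacturer FROM Product ORDER BY p_name"
).fetchall()
print(rows)
conn.close()
```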


Activity: Creating a table

02a-table.ipynb

Course (dept, number, section, instructor)

Course
dept | number | section | instructor
MATH | 101    | 1       | Jones
MATH | 101    | 2       | Smith
MATH | 102    | 1       | Williams
PHYS | 101    | 1       | Baker
PHYS | 102    | 2       | Baker

CREATE TABLE Course (…);
INSERT INTO Course …;


Activity partial solution

Course (dept, number, section, instructor)

CREATE TABLE Course (
    dept CHAR(4),
    number CHAR(3),
    section INT DEFAULT 1,
    instructor VARCHAR(50),
    PRIMARY KEY (dept, number, section)
);

Convenient for this example.
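To see the composite primary key and the DEFAULT clause at work, here is a sketch that runs the partial solution in an in-memory SQLite database (an assumption of this example). The ('COMP', '430') row and the instructor name 'Lee' are made-up test data, not part of the activity.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # assumption: throwaway in-memory SQLite database
conn.execute("""
    CREATE TABLE Course (
        dept CHAR(4),
        number CHAR(3),
        section INT DEFAULT 1,
        instructor VARCHAR(50),
        PRIMARY KEY (dept, number, section)
    )
""")
conn.executemany(
    "INSERT INTO Course VALUES (?, ?, ?, ?)",
    [
        ("MATH", "101", 1, "Jones"),
        ("MATH", "101", 2, "Smith"),  # same dept and number, different section: allowed
        ("PHYS", "101", 1, "Baker"),
    ],
)

# section is omitted, so its DEFAULT value of 1 is used.
conn.execute("INSERT INTO Course (dept, number) VALUES ('COMP', '430')")
default_section = conn.execute(
    "SELECT section FROM Course WHERE dept = 'COMP' AND number = '430'"
).fetchone()[0]

try:
    # All three key attributes match an existing row, so this insert is refused.
    conn.execute("INSERT INTO Course VALUES ('MATH', '101', 1, 'Lee')")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True
conn.close()
```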


Simple queries


SELECT-FROM-WHERE

SELECT attributes
FROM tables
WHERE conditions

Results in a new table, which can be returned or stored.


Selecting some records

Product
p_name      | price  | manufacturer
Gizmo       | 19.99  | GizmoWorks
Powergizmo  | 39.99  | GizmoWorks
Widget      | 19.99  | WidgetsRUs
HyperWidget | 203.99 | Hyper

SELECT *
FROM Product
WHERE price > 20;

Answer
p_name      | price  | manufacturer
Powergizmo  | 39.99  | GizmoWorks
HyperWidget | 203.99 | Hyper


More selection examples

SELECT *
FROM Product
WHERE price IS NOT NULL;

SELECT *
FROM Product
WHERE price > 20 AND manufacturer = 'GizmoWorks';

SELECT *
FROM Product
WHERE manufacturer IN ('GizmoWorks', 'WidgetsRUs');

SELECT *
FROM Product
WHERE price BETWEEN 20 AND 40;
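The four conditions above can be checked against the six-row Product table (the original four rows plus MiniWidget and NanoWidget from the INSERT slide). This is a sketch assuming SQLite via Python; names is a hypothetical helper that sorts the matching p_name values, since row order is otherwise unspecified.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # assumption: throwaway in-memory SQLite database
conn.execute(
    "CREATE TABLE Product (p_name VARCHAR(50), price DECIMAL(6,2), "
    "manufacturer VARCHAR(50), PRIMARY KEY (p_name))"
)
conn.executemany("INSERT INTO Product VALUES (?, ?, ?)", [
    ("Gizmo", 19.99, "GizmoWorks"),
    ("Powergizmo", 39.99, "GizmoWorks"),
    ("Widget", 19.99, "WidgetsRUs"),
    ("HyperWidget", 203.99, "Hyper"),
    ("MiniWidget", 21.99, "WidgetsRUs"),
    ("NanoWidget", None, "WidgetsRUs"),  # None is stored as NULL
])

def names(where):
    """Sorted p_name values of the rows satisfying the given WHERE condition."""
    query = "SELECT * FROM Product WHERE " + where
    return sorted(row[0] for row in conn.execute(query))

not_null = names("price IS NOT NULL")
pricey_gizmoworks = names("price > 20 AND manufacturer = 'GizmoWorks'")
two_makers = names("manufacturer IN ('GizmoWorks', 'WidgetsRUs')")
mid_range = names("price BETWEEN 20 AND 40")
```

BETWEEN is inclusive at both ends, and NanoWidget's NULL price satisfies none of the price comparisons.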


Selection with pattern matching

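SELECT *
FROM Product
WHERE p_name LIKE '%Gizmo%';

% = Match any sequence of 0-or-more characters
_ = Match any single character
[abc] = Match any one character listed
[a-c] = Match any one character in range

The bracket forms are a vendor extension (for example, SQL Server); standard SQL LIKE defines only % and _.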
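A short sketch of the wildcards, assuming SQLite via Python. Two SQLite-specific behaviours matter here and may differ in other systems: LIKE ignores case for ASCII letters by default, and bracket character classes are not part of LIKE but are available through the separate GLOB operator, which uses * and ? as its wildcards and is case sensitive. names is a hypothetical helper.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # assumption: throwaway in-memory SQLite database
conn.execute("CREATE TABLE Product (p_name VARCHAR(50), price DECIMAL(6,2), manufacturer VARCHAR(50))")
conn.executemany("INSERT INTO Product VALUES (?, ?, ?)", [
    ("Gizmo", 19.99, "GizmoWorks"),
    ("Powergizmo", 39.99, "GizmoWorks"),
    ("Widget", 19.99, "WidgetsRUs"),
    ("HyperWidget", 203.99, "Hyper"),
])

def names(where):
    """Sorted p_name values of the rows satisfying the given WHERE condition."""
    return sorted(row[0] for row in conn.execute("SELECT p_name FROM Product WHERE " + where))

# % matches any run of characters; SQLite's LIKE also ignores ASCII case,
# which is why 'Powergizmo' (lowercase g) matches here.
contains_gizmo = names("p_name LIKE '%Gizmo%'")
# _ matches exactly one character, so only the six-letter name fits.
one_then_idget = names("p_name LIKE '_idget'")
# SQLite-specific: GLOB supports [...] character classes.
starts_g_or_w = names("p_name GLOB '[GW]*'")
```

In a system where LIKE is case sensitive, the first pattern would match only 'Gizmo'.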


Projecting some fields

Product
p_name      | price  | manufacturer
Gizmo       | 19.99  | GizmoWorks
Powergizmo  | 39.99  | GizmoWorks
Widget      | 19.99  | WidgetsRUs
HyperWidget | 203.99 | Hyper

SELECT p_name, manufacturer
FROM Product;

Answer (p_name, manufacturer)
p_name      | manufacturer
Gizmo       | GizmoWorks
Powergizmo  | GizmoWorks
Widget      | WidgetsRUs
HyperWidget | Hyper


Combining selection & projection

Product
p_name      | price  | manufacturer
Gizmo       | 19.99  | GizmoWorks
Powergizmo  | 39.99  | GizmoWorks
Widget      | 19.99  | WidgetsRUs
HyperWidget | 203.99 | Hyper

SELECT p_name, manufacturer
FROM Product
WHERE price > 20;

Answer
p_name      | manufacturer
Powergizmo  | GizmoWorks
HyperWidget | Hyper


Results not necessarily distinct

Product
p_name      | price  | manufacturer
Gizmo       | 19.99  | GizmoWorks
Powergizmo  | 39.99  | GizmoWorks
Widget      | 19.99  | WidgetsRUs
HyperWidget | 203.99 | Hyper

SELECT manufacturer
FROM Product;

Answer (manufacturer)
manufacturer
GizmoWorks
GizmoWorks
WidgetsRUs
Hyper

Tables are multisets!


Making results distinct

Product
p_name      | price  | manufacturer
Gizmo       | 19.99  | GizmoWorks
Powergizmo  | 39.99  | GizmoWorks
Widget      | 19.99  | WidgetsRUs
HyperWidget | 203.99 | Hyper

SELECT DISTINCT manufacturer
FROM Product;

Answer
manufacturer
GizmoWorks
WidgetsRUs
Hyper

Ensures results are a set.
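The difference between the multiset and set results can be reproduced with a small sketch, assuming SQLite via Python. The results are sorted in Python only to make the comparison independent of row order.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # assumption: throwaway in-memory SQLite database
conn.execute("CREATE TABLE Product (p_name VARCHAR(50), price DECIMAL(6,2), manufacturer VARCHAR(50))")
conn.executemany("INSERT INTO Product VALUES (?, ?, ?)", [
    ("Gizmo", 19.99, "GizmoWorks"),
    ("Powergizmo", 39.99, "GizmoWorks"),
    ("Widget", 19.99, "WidgetsRUs"),
    ("HyperWidget", 203.99, "Hyper"),
])

# Without DISTINCT, one output row per input row: duplicates survive.
all_makers = sorted(r[0] for r in conn.execute("SELECT manufacturer FROM Product"))
# With DISTINCT, duplicate output rows are collapsed.
distinct_makers = sorted(r[0] for r in conn.execute("SELECT DISTINCT manufacturer FROM Product"))
conn.close()

print(all_makers)
print(distinct_makers)
```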


Query semantics – set notation

SELECT [DISTINCT] a1, a2, …, am
FROM T
WHERE Conditions(a'1, a'2, …, a'p);

{(a1, a2, …, am) | Conditions(a'1, a'2, …, a'p)}
• Multisets by default.
• Sets with DISTINCT.


Query semantics – sequence of ops

SELECT [DISTINCT] a1, a2, …, am
FROM T
WHERE Conditions(a'1, a'2, …, a'p);

Answer = {}
for row in T do
    if Conditions(row.a'1, row.a'2, …, row.a'p)
    then Answer = Answer ∪ {(row.a1, row.a2, …, row.am)}
return Answer

Multiset union by default. Set union with DISTINCT.
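The loop above translates almost line for line into Python. In this sketch, select is a hypothetical function, rows are dictionaries, the condition is an ordinary Python predicate, and a list stands in for the multiset. It models the meaning of the query only; a real database is free to evaluate it quite differently.

```python
def select(table, attrs, condition, distinct=False):
    """Model of SELECT [DISTINCT] attrs FROM table WHERE condition."""
    answer = []  # a list stands in for a multiset
    for row in table:
        if condition(row):
            projected = tuple(row[a] for a in attrs)
            if distinct and projected in answer:
                continue  # set union: an element already present is not added again
            answer.append(projected)  # multiset union: always add
    return answer

product = [
    {"p_name": "Gizmo", "price": 19.99, "manufacturer": "GizmoWorks"},
    {"p_name": "Powergizmo", "price": 39.99, "manufacturer": "GizmoWorks"},
    {"p_name": "Widget", "price": 19.99, "manufacturer": "WidgetsRUs"},
    {"p_name": "HyperWidget", "price": 203.99, "manufacturer": "Hyper"},
]

# SELECT p_name, manufacturer FROM Product WHERE price > 20;
expensive = select(product, ["p_name", "manufacturer"], lambda row: row["price"] > 20)

# SELECT manufacturer FROM Product;  and the DISTINCT variant
makers = select(product, ["manufacturer"], lambda row: True)
distinct_makers = select(product, ["manufacturer"], lambda row: True, distinct=True)
```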


Subset of results

Product
p_name      | price  | manufacturer
Gizmo       | 19.99  | GizmoWorks
Powergizmo  | 39.99  | GizmoWorks
Widget      | 19.99  | WidgetsRUs
HyperWidget | 203.99 | Hyper

SELECT p_name, manufacturer
FROM Product
LIMIT 2;

Answer
p_name      | manufacturer
Gizmo       | GizmoWorks
Powergizmo  | GizmoWorks

Which 2 is implementation-dependent.


Computation in SELECT clause

Product
p_name      | price  | manufacturer
Gizmo       | 19.99  | GizmoWorks
Powergizmo  | 39.99  | GizmoWorks
Widget      | 19.99  | WidgetsRUs
HyperWidget | 203.99 | Hyper
MiniWidget  | 21.99  | WidgetsRUs
NanoWidget  | NULL   | WidgetsRUs

SELECT p_name, IsNull(price, 0) AS price
FROM Product;

Answer
p_name      | price
Gizmo       | 19.99
Powergizmo  | 39.99
Widget      | 19.99
HyperWidget | 203.99
MiniWidget  | 21.99
NanoWidget  | 0
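The two-argument IsNull(price, 0) is SQL Server's spelling. As a sketch of the same query on a different system, the version below assumes SQLite via Python and swaps in COALESCE, which is standard SQL and returns its first non-NULL argument.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # assumption: throwaway in-memory SQLite database
conn.execute("CREATE TABLE Product (p_name VARCHAR(50), price DECIMAL(6,2), manufacturer VARCHAR(50))")
conn.executemany("INSERT INTO Product VALUES (?, ?, ?)", [
    ("Gizmo", 19.99, "GizmoWorks"),
    ("MiniWidget", 21.99, "WidgetsRUs"),
    ("NanoWidget", None, "WidgetsRUs"),  # None is stored as NULL
])

# COALESCE replaces the IsNull call from the slide; AS names the computed column.
cursor = conn.execute("SELECT p_name, COALESCE(price, 0) AS price FROM Product")
column_names = [d[0] for d in cursor.description]
prices = dict(cursor.fetchall())
conn.close()

print(column_names)
print(prices)
```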


More computation in SELECT clauses

SELECT location, time, celsius * 1.8 + 32 AS fahrenheit
FROM SensorReading;

SELECT player_id, Floor(height) AS feet, (height - Floor(height)) * 12 AS inches
FROM Player;

Use AS. Without it, SQL will create a default field name.


Sorting

Product
p_name      | price  | manufacturer
Gizmo       | 19.99  | GizmoWorks
Powergizmo  | 39.99  | GizmoWorks
Widget      | 19.99  | WidgetsRUs
HyperWidget | 203.99 | Hyper

SELECT p_name, manufacturer
FROM Product
ORDER BY price DESC, manufacturer;

Answer
p_name      | manufacturer
HyperWidget | Hyper
Powergizmo  | GizmoWorks
Gizmo       | GizmoWorks
Widget      | WidgetsRUs
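The sort can be reproduced with a sketch assuming SQLite via Python. Gizmo and Widget tie on price, so the second sort key, manufacturer in ascending order, decides between them. The sketch also combines ORDER BY with LIMIT, which is one way to make the choice of "which 2" rows well defined.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # assumption: throwaway in-memory SQLite database
conn.execute("CREATE TABLE Product (p_name VARCHAR(50), price DECIMAL(6,2), manufacturer VARCHAR(50))")
conn.executemany("INSERT INTO Product VALUES (?, ?, ?)", [
    ("Gizmo", 19.99, "GizmoWorks"),
    ("Powergizmo", 39.99, "GizmoWorks"),
    ("Widget", 19.99, "WidgetsRUs"),
    ("HyperWidget", 203.99, "Hyper"),
])

# Highest price first; ties on price are broken by manufacturer, ascending.
ordered = conn.execute(
    "SELECT p_name, manufacturer FROM Product ORDER BY price DESC, manufacturer"
).fetchall()

# With an ORDER BY that leaves no ties among the leading rows, LIMIT is deterministic.
top_two = conn.execute(
    "SELECT p_name, manufacturer FROM Product ORDER BY price DESC, manufacturer LIMIT 2"
).fetchall()
conn.close()
```

Note that the sort key price does not have to appear in the SELECT list.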


Sorting vs. Multiset semantics

Table rows are unordered, except when they’re ordered.

More accurately, unless you use ORDER BY:
• Can’t assume anything about ordering.
• Ordering depends on implementation, which can vary.
• Queries don’t necessarily maintain order of original table.


Activity: Writing queries

02b-queries.ipynb


Some details


NULL semantics

NULL is not a value. It is the lack of a value.

In numeric operations:
• f(NULL) → NULL

In Boolean operations, we use 3-value logic (FALSE, UNKNOWN, TRUE):
• NULL = 'Houston' → UNKNOWN
• NULL = NULL → UNKNOWN

A | NOT A
F | T
U | U
T | F

A AND B | B=F | B=U | B=T
A=F     | F   | F   | F
A=U     | F   | U   | U
A=T     | F   | U   | T

A OR B  | B=F | B=U | B=T
A=F     | F   | U   | T
A=U     | U   | U   | T
A=T     | T   | T   | T
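Several entries of these tables can be checked directly. This sketch assumes SQLite via Python, where TRUE and FALSE come back as 1 and 0 and both NULL and UNKNOWN come back as Python's None.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # assumption: throwaway in-memory SQLite database
result = conn.execute("""
    SELECT 1 + NULL,          -- numeric operation on NULL
           NULL = 'Houston',  -- comparison with NULL
           NULL = NULL,       -- even NULL = NULL is UNKNOWN
           NULL IS NULL,      -- IS NULL is the proper test
           NULL AND 0,        -- U AND F
           NULL AND 1,        -- U AND T
           NULL OR 1,         -- U OR T
           NULL OR 0,         -- U OR F
           NOT (NULL)         -- NOT U
""").fetchone()
conn.close()

print(result)
```

Because NULL = NULL is UNKNOWN rather than TRUE, tests for missing values are written with IS NULL or IS NOT NULL, as in the earlier selection examples.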


Syntax details

• Strings use single quotes, not double: 'Houston'

• Equality test uses single =, not double: x = 5

• Amount of whitespace doesn’t matter


Syntax details – case sensitivity

• Case insensitive
  • Keywords: SELECT, NULL
  • Table names: Product
  • Function names: IsNull()

• Depends on SQL version
  • Attribute names: product_id

• Case sensitive
  • String literals: 'HOUSTON', 'Houston', 'houston'