cmpt354: databasesystemi · sql provides pattern matching support with the like operator and two...
TRANSCRIPT
CMPT 354:Database System I
Lecture 3. SQL Basics
1
Announcements!
• About Piazza• 97 enrolled (as of today)• Postsareanonymous to classmates
• You should have starteddoingA1• Please come to office hours if you need any help
2
SQLMotivation• Darktimesin2000s• Arerelationaldatabasesdead?
• Now,asbefore:everyonesellsSQL• Pig,Hive,Impala• SparkSQL
• NoSQL• “NonSQL”• “Not-Only-SQL”• “Not-Yet-SQL”
3
SQL:Introduction
• “S.Q.L.”or“sequel”• Supportedbyallmajorcommercialdatabasesystems• Standardized– manynewfeaturesovertime• Declarativelanguage
4
5
SQLisa…
• DataDefinitionLanguage(DDL)• Definerelationalschema• Create/alter/deletetablesandtheirattributes
• DataManipulationLanguage(DML)• Insert/delete/modifytuplesintables• Queryoneormoretables– discussednext!
Outline
• Single-table Queries• The SFW query• Usefuloperators:DISTINCT,ORDER BY,LIKE• Handlemissingvalues:NULLs
• Multiple-table Queries• Foreign key constraints• Joins: basics• Joins: SQL semantics
6
• To write the query, ask yourself three questions:• Which table are you interested in?• Which rows are you interested in?• Which columns are you interested in?
The SFWQuery
7
SELECT <columns>FROM <table name>WHERE <conditions>
Conditions
• Which rows are you interested in?• WHERE gpa > 3.5• WHERE school = ‘SFU’ AND gpa > 3.5• WHERE (school = ‘SFU’ OR school = ‘UBC’) AND gpa > 3.5• WHERE age *365 > 7500
8
SELECT <columns>FROM <table name>WHERE <conditions>
Columns
• Which columns are you interested in?• SELECT *• SELECT name, age• SELECT name as studentName, age• SELECT name, age *365 as ageDay
9
SELECT <columns>FROM <table name>WHERE <conditions>
AFewDetails
• SQLcommands arecaseinsensitive:• Same:SELECT,Select,select• Same:Student,student• Same:gpa, GPA
• Values arenot:• Different: 'SFU','sfu'
• SQLstringsareenclosedinsinglequotes• e.g.name='Mike’• Singlequotesinastringcanbespecifiedusinganinitialsinglequotecharacterasanescape• author='ShaqO''Neal'
• Stringscanbecomparedalphabetically withthecomparisonoperators• e.g.'fodder'<'foo'isTRUE
10
11
DISTINCT:EliminatingDuplicates
SELECT DISTINCT SchoolFROM Students
Versus
SELECT SchoolFROM Students
School
SFU
UBC
UT
School
SFU
SFU
UBC
UT
UT
12
ORDERBY:SortingtheResults
SELECT name, gpa, ageFROM StudentsWHERE school = 'SFU'ORDER BY gpa DESC, age ASC
• TheoutputofanSQLquerycanbeordered• Byanynumberofattributes,and• Ineitherascendingordescendingorder
• Thedefaultistouseascendingorder,thekeywordsASCandDESC,followingthecolumnname,setstheorder
13
LIKE:SimpleStringPatternMatching
SELECT *FROM StudentsWHERE name LIKE ' Sm_t%'
SQLprovidespatternmatchingsupportwiththeLIKEoperatorandtwosymbols• The% symbolstandsforzeroormorearbitrarycharacters• The_ symbolstandsforexactlyonearbitrarycharacter• The% and _ characterscanbeescapedwith\
• E.g., nameLIKE ’Michael\_Jordan'
Exercise - 1
• Which names will be returned?
14
SELECT *FROM StudentsWHERE name LIKE 'Sm_t%'
1. Smit2. SMIT3. Smart4. Smith5. Smythe6. Smut7. Smeath8. Smt
Exercise - 1
• Which names will be returned?
15
SELECT *FROM StudentsWHERE name LIKE 'Sm_t%'
1. Smit2. SMIT3. Smart4. Smith5. Smythe6. Smut7. Smeath8. Smt
1, 4, 5, 6
16
NULLSinSQL
• Wheneverwedon’thaveavalue,wecanputaNULL• Canmeanmanythings:
• Valuedoesnotexists• Valueexistsbutisunknown• Valuenotapplicable• Etc.
• NULLconstraints
CREATE TABLE Students (name CHAR(20) NOT NULL,age CHAR(20) NOT NULL,gpa FLOAT
)
What will happen?
name age gpa
Mike 20 4.0
Joe 18 NULL
Alice 21 3.8
17
1. SELECT gpa*100 FROM students2. SELECT name FROM students WHERE gpa > 3.53. SELECT name FROM students WHERE age > 15 OR gpa > 3.5
Two Important Rules
• Arithmeticoperations (+, -, *, /) onnullsreturnNULL• NULL *100
• NULL• NULL*0
• NULL
• ComparisonswithnullsevaluatetoUNKNOWN• NULL > 3.5
• UNKNOWN• NULL=NULL
• UNKNOWN
18
1. SELECT gpa*100 FROM students
2. SELECT gpa*0 FROM students
3. SELECT name FROM students WHERE gpa > 3.5
4. SELECT name FROM students WHERE gpa = NULL
Combinations of true, false, unknown
• Truthvaluesforunknown results• true OR unknown =true,• falseOR unknown= unknown,• unknown OR unknown =unknown ,• true AND unknown =unknown,• false AND unknown =false,• unknown AND unknown =unknown
• TheresultofaWHERE clauseistreatedasfalse ifitevaluatestounknown• WHERE unknownà false
19
SELECT *FROM students WHEREage > 15 OR gpa > 3.5
SELECT *FROM students WHEREage > 15 AND gpa > 3.5
What will happen?
name age gpa
Mike 20 4.0
Joe 18 NULL
Alice 21 3.8
20
1. SELECT gpa*100 FROM students2. SELECT name FROM students WHERE gpa > 3.53. SELECT name FROM students WHERE age > 15 OR gpa > 3.5
gpa
400
NULL
380
name
Mike
Alice
name
Mike
Joe
Alice
21
Exercise - 2
• Will it return all students?
SELECT *FROM StudentsWHERE age < 25 OR age >= 25
22
Exercise - 2
• Will it return all students?
SELECT *FROM StudentsWHERE age < 25 OR age >= 25
OR age is NULL
Therearespecialoperatorstotestfornullvalues• ISNULLtestsforthepresenceofnullsand• ISNOTNULL testsfortheabsenceofnulls
Outline
• Single-table Queries• The SFW query• Other useful operators: DISTINCT, LIKE, ORDER BY• NULLs
• Multiple-table Queries• Foreign key constraints• Joins: basics• Joins: SQL semantics
23
• Foreign-keyconstraint:• student_id referencessid
ForeignKeyconstraints
sid name gpa101 Bob 3.2123 Mary 3.8
student_id cid grade123 354 A123 454 A+156 354 A
Students Enrolled
24
• Foreign-keyconstraint:• student_id referencessid
ForeignKeyconstraints
sid name gpa101 Bob 3.2123 Mary 3.8156 Mike 3.7
Students Enrolledstudent_id cid grade
123 354 A123 454 A+156 354 A
25
DeclaringForeignKeys
CREATE TABLE Enrolled(student_id CHAR(20),cid CHAR(20),grade CHAR(10),PRIMARY KEY (student_id, cid),FOREIGN KEY (student_id) REFERENCES Students(sid)
)
student_id cid grade123 354 A123 454 A+156 354 A
26
Insertoperations• WhatifweinsertatupleintoEnrolled,butnocorrespondingstudent?• INSERTisrejected
27
sid name gpa123 Mary 3.8156 Mike 3.7
Students Enrolledstudent_id cid grade
123 354 A123 454 A+156 354 A190 354 A
Delete operations• Whatifwedeleteastudent,whohasenrolledcourses?• Disallowthedelete(ONDELETERESTRICT)
28
sid name gpa123 Mary 3.8
Students Enrolledstudent_id cid grade
123 354 A123 454 A+156 354 A
156 Mike 3.7
ONDELETERESTRICT
CREATE TABLE Enrolled(student_id CHAR(20),cid CHAR(20),grade CHAR(10),PRIMARY KEY (student_id, cid),FOREIGN KEY (student_id) REFERENCES Students(sid)ON DELETE RESTRICT
)
student_id cid grade123 354 A123 454 A+156 354 A
29
Delete operations• Whatifwedeleteastudent,whohasenrolledcourses?• Removeallofthecoursesforthatstudent(ONDELETE
CASCADE)
30
sid name gpa123 Mary 3.8
Students Enrolledstudent_id cid grade
123 354 A123 454 A+156 Mike 3.7156 354 A
ONDELETECASCADE
CREATE TABLE Enrolled(student_id CHAR(20),cid CHAR(20),grade CHAR(10),PRIMARY KEY (student_id, cid),FOREIGN KEY (student_id) REFERENCES Students(sid)ON DELETE CASCADE
)
student_id cid grade123 354 A123 454 A+156 354 A
31
Delete operations• Whatifwedeleteastudent,whohasenrolledcourses?• Set Foreign Key to NULL (ONDELETESETNULL)
32
sid name gpa123 Mary 3.8
Students Enrolledstudent_id cid grade
123 354 A123 454 A+156 354 A
156 Mike 3.7NULL 354 A
Interestingly, although it satisfies the foreign-key constraint,it violates the primary-key constraint, thus the deletionoperation is disallowed.
ONDELETESETNULL
CREATE TABLE Enrolled(student_id CHAR(20),cid CHAR(20),grade CHAR(10),PRIMARY KEY (student_id, cid),FOREIGN KEY (student_id) REFERENCES Students(sid)ON DELETE SET NULL
)
student_id cid grade123 354 A123 454 A+156 354 A
33
Outline
• Single-table Queries• The SFW query• Other useful operators: DISTINCT, LIKE, ORDER BY• NULLs
• Multiple-table Queries• Foreign key constraints• Joins: basics• Joins: SQL semantics• SetOperators
34
Why do we havemultiple tables?
35
sid name gpa123 Mary 3.8
Students Enrolledstudent_id cid grade
123 354 A123 454 A+156 Mike 3.7156 354 A
EnrolledStudents
student_id name gpa cid grade123 Mary 3.8 354 A123 Mary 3.8 454 A+156 Mike 3.7 354 A
VS.
• Multiple tables• Data updating is easier (e.g., update Mary’s gpa to 3.9)• Queryingeachindividualtableis faster (e.g., retrieveMary’s gpa)
• Asingle table• Data exchange is easier (e.g., share your data withothers)• Avoid the cost of joining multiple tables (e.g., retrievalall the courses that Mary has taken)
36
Storedataintomultiple tablesvs.singletable
Joins
37
SELECT <columns>FROM <table name>WHERE <conditions>
SELECT <columns>FROM <table names>WHERE <conditions>
TheSFWqueryoverasingletable
TheSFWqueryovermultipletables
Whichrowsareyouinterestedin?
Whichrowsareyouinterestedin?
Howtojointhemultipletables?
Joins:Example
38
SELECT name
FROM Students, Enrolled
WHERE sid = student_id AND
cid = 354 AND grad = ‘A+’Whichrowsareyouinterestedin?
Howtojointhetwotables?
sid name gpa123 Mary 3.8
Students Enrolledstudent_id cid grade
123 354 A+123 454 A+156 Mike 3.7156 354 A
FindallstudentwhohavegotanA+in354;returntheirnamesandgpas
Otherwaystowritejoins
39
SELECT nameFROM Students, EnrolledWHERE sid = student_id AND
cid = 354 AND grad = ‘A+’
SELECT nameFROM StudentsJOIN Enrolled ON sid = student_id
AND cid = 354 AND grad = ‘A+’
SELECT nameFROM StudentsJOIN Enrolled ON sid = student_idWHERE cid = 354 AND grad = ‘A+’
The Need fo TupleVariable
40
sid name gpa123 Mary 3.8
Students Enrolledstudent_id cid name grade123 354 DBI A+123 454 DBII A+156 354 DBI A
156 Mike 3.7
SELECT name
FROM Students, Enrolled
WHERE sid = student_id
Whichname?
TupleVariable
41
sid name gpa123 Mary 3.8
Students Enrolledstudent_id cid name grade123 354 DBI A+123 454 DBII A+156 354 DBI A
156 Mike 3.7
SELECT Students.nameFROM Students, EnrolledWHERE sid = student_id
SELECT S.nameFROM Students S, EnrolledWHERE sid = student_id
Outline
• Single-table Queries• The SFW query• Other useful operators: DISTINCT, LIKE, ORDER BY• NULLs
• Multiple-table Queries• Foreign key constraints• Joins: basics• Joins: SQL semantics• SetOperators
42
Meaning(Semantics)ofJoinQueries
43
SELECT x1.a1, x1.a2, …, xn.akFROM R1 AS x1, R2 AS x2, …, Rn AS xnWHERE Conditions(x1,…, xn)
Answer={}for x1 in R1 do
for x2 in R2 do…..
for xn in Rn doif Conditions(x1,…,xn)
then Answer=AnswerÈ {(x1.a1,x1.a2,…,xn.ak)}return Answer
Thisiscallednestedloopsemantics sinceweareinterpretingwhatajoinmeansusinganestedloop
Note:this isamultisetunion
Threesteps
1. Takecrossproduct• R1 × R2 × … × Rn
2. Applyconditions• Conditions(x1,…, xn)
3. Applyprojections• x1.a1, x1.a2, …, xn.ak
44
SELECT x1.a1, x1.a2, …, xn.akFROM R1 AS x1, R2 AS x2, …, Rn AS xnWHERE Conditions(x1,…, xn)
Note:ThisisNOThowtheDBMSexecutesthequery.
Exercise
45
SELECT nameFROM Students, EnrolledWHERE sid = student_id AND grade >= ‘A’
sid name gpa123 Mary 3.8
Students Enrolledstudent_id cid grade
123 354 A+123 454 A+156 Mike 3.7156 354 A
nameMary
nameMaryMike
nameMaryMikeMary
nameMaryMaryMike
Whichone(s)arecorrect?
(A) (B) (C) (D)
Outline
• Single-table Queries• The SFW query• Other useful operators: DISTINCT, LIKE, ORDER BY• NULLs
• Multiple-table Queries• Foreign key constraints• Joins: basics• Joins: SQL semantics• SetOperators
46
SetOperations
• SQLsupportsunion,intersectionandsetdifferenceoperations• CalledUNION,INTERSECT,andEXCEPT• Theseoperationsmustbeperformedonunioncompatibletables
• AlthoughtheseoperationsaresupportedintheSQLstandard,implementationsmayvary• EXCEPTmaynotbeimplemented
• Whenitis,itissometimescalledMINUS
47
OneofTwoCourses
• Findallstudentswhohavetakeneither354or454
48
sid name gpa123 Mary 3.8
Students Enrolledstudent_id cid grade
123 354 A+123 454 A+156 Mike 3.7156 354 A
SELECT nameFROM Students, EnrolledWHERE sid = student_id AND (cid = 354 OR cid = 454)
OneofTwoCourses- UNION
• Findallstudentswhohavetakeneither354or454
49
sid name gpa123 Mary 3.8
Students Enrolledstudent_id cid grade
123 354 A+123 454 A+156 Mike 3.7156 354 A
SELECT nameFROM Students, EnrolledWHERE sid = student_id AND cid = 354UNIONSELECT nameFROM Students, EnrolledWHERE sid = student_id AND cid = 454
BothCourses
• Findallstudentswhohavetakenboth354and454
50
sid name gpa123 Mary 3.8
Students Enrolledstudent_id cid grade
123 354 A+123 454 A+156 Mike 3.7156 354 A
SELECT nameFROM Students S, Enrolled EWHERE sid = student_id AND
(E.cid = 354 AND E.cid = 454)
BothCoursesAgain
• Findallstudentswhohavetakenboth354and454
51
sid name gpa123 Mary 3.8
Students Enrolledstudent_id cid grade
123 354 A+123 454 A+156 Mike 3.7156 354 A
SELECT nameFROM Students S, Enrolled E1, Enrolled E2WHERE S.sid = E1.student_id AND S.sid = E2.student_id
AND (E1.cid = 354 AND E2.cid = 454)
BothCourses- INTERSECT
• Findallstudentswhohavetakenboth354and454
52
sid name gpa123 Mary 3.8
Students Enrolledstudent_id cid grade
123 354 A+123 454 A+156 Mike 3.7156 354 A
SELECT nameFROM Students, Enrolled WHERE sid = student_id AND cid = 354INTERSECTSELECT nameFROM Students, EnrolledWHERE sid = student_id AND cid = 454
OneCourseButNotTheOther
• Findallstudentswhohavetaken354butnot454
53
sid name gpa123 Mary 3.8
Students Enrolledstudent_id cid grade
123 354 A+123 454 A+156 Mike 3.7156 354 A
SELECT nameFROM Students, Enrolled WHERE sid = student_id AND cid = 354EXCEPTSELECT nameFROM Students, EnrolledWHERE sid = student_id AND cid = 454
SetOperationsandDuplicates
• UnlikeotherSQLoperations,UNION,INTERSECT,andEXCEPT querieseliminateduplicatesbydefault• SQLallowsduplicatestoberetainedinthesethreeoperationsusingtheALL keyword(i.e.,multi-setoperations)
54
SELECT nameFROM Students, Enrolled WHERE sid = student_id AND cid = 354INTERSECT ALLSELECT nameFROM Students, EnrolledWHERE sid = student_id AND cid = 454
Acknowledge
• Some lecture slides were copied from or inspired by thefollowing course materials• “W4111: Introduction to databases” by Eugene Wu atColumbia University• “CSE344: IntroductiontoDataManagement” by Dan Suciu atUniversityof Washington• “CMPT354: Database System I” by JohnEdgar at Simon FraserUniversity• “CS186: Introduction to Database Systems” by Joe Hellersteinat UC Berkeley• “CS145: IntroductiontoDatabases” by Peter Bailis at Stanford• “CS348:IntroductiontoDatabaseManagement” by GrantWeddell at University of Waterloo
55