introduc)on to sql - indian statistical institutedebapriyo/teaching/dbms/sql1.pdf · introduc)on to...

39
Introduc)on to SQL Debapriyo Majumdar DBMS – Fall 2016 Indian Statistical Institute Kolkata Slides re-used, with minor modification, from Silberschatz, Korth and Sudarshan www.db-book.com

Upload: duongthuy

Post on 31-Mar-2018

218 views

Category:

Documents


1 download

TRANSCRIPT

Introduc)ontoSQL

Debapriyo Majumdar DBMS – Fall 2016

Indian Statistical Institute Kolkata

Slides re-used, with minor modification, from Silberschatz, Korth and Sudarshan

www.db-book.com

Outline§  Overview of The SQL Query Language §  Data Definition §  Basic Query Structure §  Additional Basic Operations §  Set Operations §  Null Values §  Aggregate Functions

2

History§  IBM Sequel language developed as part of System R project at the

IBM San Jose Research Laboratory §  Renamed as Structured Query Language (SQL) §  ANSI and ISO standard SQL:

–  SQL-86 –  SQL-89 –  SQL-92 –  SQL:1999 (language name became Y2K compliant!) –  SQL:2003

§  Commercial systems offer most, if not all, SQL-92 features, plus varying feature sets from later standards and special proprietary features –  Not all examples here may work on any particular system.

3

Data Definition Language

§  The schema for each relation §  The domain of values associated with each attribute §  Integrity constraints §  And as we will see later, also other information such as –  The set of indices to be maintained for each relations –  Security and authorization information for each

relation –  The physical storage structure of each relation on disk

The SQL data-definition language (DDL) allows the specification of information about relations, including:

4

Domain Types in SQL§  char(n): Fixed length character string, with user-specified length n §  varchar(n): Variable length character strings, with user-specified

maximum length n §  int: Integer (a finite subset of the integers that is machine-dependent) §  smallint: Small integer (a machine-dependent subset of the integer

domain type) §  numeric(p,d): Fixed point number, with user-specified precision of p

digits, with d digits to the right of decimal point. §  Example: numeric(3,1) allows 44.5 to be stores exactly, but not 444.5

or 0.32 §  real, double precision: Floating point and double-precision floating

point numbers, with machine-dependent precision §  float(n): Floating point number, with user-specified precision of at least n

digits §  More (will be covered later) …

5

charvsvarcharinMySQL§  CHAR values are right padded with spaces to meet the specified length n

(any value from 0 to 255) §  Length of VARCHAR can be specified as a value from 0 to 65,535 §  The effective maximum length of a VARCHAR is subject to the maximum

row size (65,535 bytes, which is shared among all columns) and the character set used

§  VARCHAR values are stored as a 1-byte or 2-byte length prefix plus data –  The length prefix indicates the number of bytes in the value –  A column uses one length byte if values require no more than 255

bytes, two length bytes if values may require more than 255 bytes –  Takes more space if the data is actually fixed sized

§  Note: CHAR increases speed only if all the columns of the table are of fixed length

6http://dev.mysql.com/doc/refman/5.7/en/char.html

Create Table Construct§  create table command: defining an SQL relation

create table r (A1 D1, A2 D2, ..., An Dn, (integrity-constraint1), ..., (integrity-constraintk))

–  r is the name of the relation –  each Ai is an attribute name in the schema of relation r –  Di is the data type of values in the domain of attribute Ai

§  Example: create table instructor ( ID char(5), name varchar(20), dept_name varchar(20), salary numeric(8,2))

7

Integrity Constraints in Create Table§  not null §  primary key (A1, ..., An ) §  foreign key (Am, ..., An ) references r

Example: create table instructor ( ID char(5), name varchar(20) not null, dept_name varchar(20), salary numeric(8,2), primary key (ID), foreign key (dept_name) references department);

primary key declaration on an attribute automatically ensures not null

8

More examples§  create table student (

ID varchar(5), name varchar(20) not null, dept_name varchar(20), tot_cred numeric(3,0), primary key (ID),

foreign key (dept_name) references department); §  create table takes (

ID varchar(5), course_id varchar(8), sec_id varchar(8), semester varchar(6), year numeric(4,0), grade varchar(2),

primary key (ID, course_id, sec_id, semester, year) , foreign key (ID) references student,

foreign key (course_id, sec_id, semester, year) references section);

§  Note: sec_id can be dropped from primary key above, to ensure a student cannot be registered for two sections of the same course in the same semester

9

Updates to tables§  Insert

–  insert into instructor values (‘10211’, ’Smith’, ’Biology’, 66000); –  There are other commands too

§  Delete –  Remove all tuples from the student relation

•  delete from student §  Drop Table

–  drop table r §  Alter

–  alter table r add A D •  where A is the name of the attribute to be added to relation r and

D is the domain of A. •  All exiting tuples in the relation are assigned null as the value for

the new attribute. –  alter table r drop A

•  where A is the name of an attribute of relation r •  Dropping of attributes not supported by many databases. 10

Basic Query Structure §  A typical SQL query has the form:

select A1, A2, ..., An from r1, r2, ..., rm where P

–  Ai represents an attribute –  Ri represents a relation –  P is a predicate.

§  The result of an SQL query is a relation.

11

The select Clause§  The select clause lists the attributes desired in the result –  corresponds to the projection operation of the relational

algebra §  Example: find the names of all instructors:

select name from instructor

§  NOTE: SQL names are case insensitive (i.e., upper or lowercase letters mean the same) –  E.g., Name ≡ NAME ≡ name –  Some people use upper case wherever we use bold font

12

The select Clause (Cont.)§  SQL allows duplicates in relations as well as in query results §  To force the elimination of duplicates, insert the keyword

distinct after select §  Find the department names of all instructors, and remove

duplicates select distinct dept_name

from instructor §  The keyword all specifies that duplicates should not be

removed select all dept_name

from instructor

13

The select Clause (Continued)§  An asterisk in the select clause denotes “all attributes”

select * from instructor §  An attribute can be a literal with no from clause

select ‘437’ –  Results is a table with one column and a single row with

value “437” –  Can give the column a name using select ‘437’ as FOO

§  An attribute can be a literal with from clause select ‘A’ from instructor –  Result is a table with one column and N rows (number of

tuples in the instructors table), each row with value “A” 14

The select Clause (Continued)§  The select clause can contain arithmetic expressions

involving the operation, +, –, *, and /, and operating on constants or attributes of tuples – The query:

select ID, name, salary/12 from instructor would return a relation that is the same as the instructor relation, except that the value of the attribute salary is divided by 12

– Can rename “salary/12” using the as clause: select ID, name, salary/12 as monthly_salary

15

The where Clause§  The where clause specifies conditions that the result must satisfy

–  Corresponds to the selection predicate of the relational algebra §  To find all instructors in CompSc dept

select name from instructor where dept_name = ‘CompSc'

§  Comparison results can be combined using the logical connectives and, or, and not –  To find all instructors in CompSc dept with salary > 80000

select name from instructor where dept_name = ‘CompSc' and salary > 80000

§  Comparisons can be applied to results of arithmetic expressions

16

The from Clause§  The from clause lists the relations involved in the query

–  Corresponds to the Cartesian product operation of the relational algebra.

§  Find the Cartesian product instructor × teaches select * from instructor, teaches –  generates every possible instructor – teaches pair, with all

attributes from both relations –  For common attributes (e.g., ID), the attributes in the

resulting table are renamed using the relation name (e.g., instructor.ID)

§  Cartesian product not very useful directly, but useful combined with where-clause condition (selection operation in relational algebra)

17

Cartesian Productinstructor teaches

18

Examples§  Find the names of all instructors who have taught some course

and the course_id –  select name, course_id

from instructor , teaches where instructor.ID = teaches.ID

§  Find the names of all instructors in the Art department who

have taught some course and the course_id –  select name, course_id

from instructor , teaches where instructor.ID = teaches.ID

and instructor. dept_name = ‘Art’

19

The Rename Operation§  The SQL allows renaming relations and attributes using the

as clause: old-name as new-name

§  Find the names of all instructors who have a higher salary than some instructor in ‘Comp. Sci’. –  select distinct T.name

from instructor as T, instructor as S where T.salary > S.salary and S.dept_name = ‘Comp. Sci.’

§  Keyword as is optional and may be omitted instructor as T ≡ instructor T

20

Cartesian Product Example■  Relation emp-super

■  Find the supervisor of “Bob” ■  Find the supervisor of the supervisor of “Bob” ■  Find ALL the supervisors (direct and indirect) of “Bob”

person supervisorBob AliceMary SusanAlice DavidDavid Mary

21

String Operations§  SQL includes a string-matching operator for comparisons on

character strings. The operator like uses patterns that are described using two special characters: –  percent ( % ). The % character matches any substring. –  underscore ( _ ). The _ character matches any character.

§  Find the names of all instructors whose name includes the substring “dar”. select name

from instructor where name like '%dar%'

§  Match the string “100%” like ‘100 \%' escape '\'

in that above we use backslash (\) as the escape character. 22

String Operations (Continued)§  Patterns are case sensitive. §  Pattern matching examples: –  ‘Intro%’ matches any string beginning with “Intro”. –  ‘%Comp%’ matches any string containing “Comp” as a

substring. –  ‘_ _ _’ matches any string of exactly three characters. –  ‘_ _ _ %’ matches any string of at least three characters.

§  SQL supports a variety of string operations such as –  concatenation (using “||”) –  converting from upper to lower case (and vice versa) –  finding string length, extracting substrings, etc.

23

Ordering the Display of Tuples§  List in alphabetic order the names of all instructors select distinct name

from instructor order by name

§  We may specify desc for descending order or asc for ascending order, for each attribute; ascending order is the default –  Example: order by name desc

§  Can sort on multiple attributes –  Example: order by dept_name, name

24

Where Clause Predicates§  SQL includes a between comparison operator §  Example: Find the names of all instructors with salary

between $90,000 and $100,000 (that is, ≥ $90,000 and ≤ $100,000) –  select name

from instructor where salary between 90000 and 100000

§  Tuple comparison –  select name, course_id

from instructor, teaches where (instructor.ID, dept_name) = (teaches.ID, ’Biology’);

25

Duplicates§  In relations with duplicates, SQL can define how many

copies of tuples appear in the result. §  Multiset versions of some of the relational algebra operators

– given multiset relations r1 and r2: 1. σθ (r1): If there are c1 copies of tuple t1 in r1, and t1

satisfies selections σθ,, then there are c1 copies of t1 in σθ (r1)

2. ΠA (r ): For each copy of tuple t1 in r1, there is a copy of tuple ΠA (t1) in ΠA (r1) where ΠA (t1) denotes the projection of the single tuple t1

3. r1 × r2: If there are c1 copies of tuple t1 in r1 and c2 copies of tuple t2 in r2, there are c1c2 copies of the tuple t1. t2 in r1 x r2

26

Duplicates (Continued)§  Example: Suppose multiset relations r1 (A, B) and r2 (C)

are as follows: r1 = {(1, a) (2,a)} r2 = {(2), (3), (3)}

§  Then ΠB(r1) would be {(a), (a)}, while ΠB(r1) x r2 would be {(a,2), (a,2), (a,3), (a,3), (a,3), (a,3)}

§  SQL duplicate semantics: select A1,, A2, ..., An

from r1, r2, ..., rm where P

is equivalent to the multiset version of the expression: ∏A1,A2 ,…,An

(σ P (r1 × r2 ×…× rm ))

27

Set Operations§  Find courses that ran in Fall 2009 or in Spring 2010

■  Find courses that ran in Fall 2009 but not in Spring 2010

(select course_id from section where sem = ‘Fall’ and year = 2009) union (select course_id from section where sem = ‘Spring’ and year = 2010)

■  Find courses that ran in Fall 2009 and in Spring 2010

(select course_id from section where sem = ‘Fall’ and year = 2009) intersect (select course_id from section where sem = ‘Spring’ and year = 2010)

(select course_id from section where sem = ‘Fall’ and year = 2009) except (select course_id from section where sem = ‘Spring’ and year = 2010)

28

Set Operations (Continued)§  Find the salaries of all instructors that are less than the largest salary.

–  select distinct T.salary from instructor as T, instructor as S where T.salary < S.salary

§  Find all the salaries of all instructors

–  select distinct salary from instructor

§  Find the largest salary of all instructors.

–  (select “second query” ) except (select “first query”)

29

Set Operations (Continued)§  Set operations union, intersect, and except

§  Each of the above operations automatically eliminates duplicates

§  To retain all duplicates use the corresponding multiset versions union all, intersect all and except all

§  Suppose a tuple occurs m times in r and n times in s, then, it occurs: §  m + n times in r union all s §  min(m,n) times in r intersect all s §  max(0, m – n) times in r except all s

30

Null Values§  It is possible for tuples to have a null value, denoted by null,

for some of their attributes §  null signifies an unknown value or that a value does not exist. §  The result of any arithmetic expression involving null is null –  Example: 5 + null returns null

§  The predicate is null can be used to check for null values –  Example: Find all instructors whose salary is null

select name from instructor where salary is null

31

Null Values and Three Valued Logic§  Three values – true, false, unknown §  Any comparison with null returns unknown

–  Example: 5 < null or null <> null or null = null §  Three-valued logic using the value unknown:

–  OR: (unknown or true) = true, (unknown or false) = unknown (unknown or unknown) = unknown

–  AND: (true and unknown) = unknown, (false and unknown) = false, (unknown and unknown) = unknown

–  NOT: (not unknown) = unknown –  “P is unknown” evaluates to true if predicate P evaluates to

unknown §  Result of where clause predicate is treated as false if it evaluates to

unknown 32

Aggregate Functions§  These functions operate on the multiset of values of

a column of a relation, and return a value avg: average value

min: minimum value max: maximum value sum: sum of values count: number of values

33

Aggregate Functions (Continued)§  Find the average salary of instructors in the Computer Science

department –  select avg (salary)

from instructor where dept_name= ’Comp. Sci.’;

§  Find the total number of instructors who teach a course in the Spring 2010 semester –  select count (distinct ID)

from teaches where semester = ’Spring’ and year = 2010;

§  Find the number of tuples in the course relation –  select count (*)

from course;

34

Aggregate Functions – Group By§  Find the average salary of instructors in each department

–  select dept_name, avg (salary) as avg_salary from instructor group by dept_name;

avg_salary

35

Aggregation (Continued)§  Attributes in select clause outside of aggregate functions

must appear in group by list –  /* erroneous query */

select dept_name, ID, avg (salary) from instructor group by dept_name;

36

Aggregate Functions – Having Clause

§  Find the names and average salaries of all departments whose average salary is greater than 42000

Note: predicates in the having clause are applied after the formation of groups whereas predicates in the where clause are applied before forming groups

select dept_name, avg (salary) from instructor group by dept_name having avg (salary) > 42000;

37

Null Values and Aggregates§  Total all salaries

select sum (salary ) from instructor

–  Above statement ignores null amounts –  Result is null if there is no non-null amount

§  All aggregate operations except count(*) ignore tuples with null values on the aggregated attributes

§  What if collection has only null values? –  count returns 0 –  all other aggregates return null

38

ENDOFCLASS1ONSQL

39