SQL Unit 7 Set Operations
Kirk Scott

Upload: nantai

Post on 22-Feb-2016

• 7.1 Introduction
• 7.2 UNION Queries
• 7.3 Queries with IN (Intersection)
• 7.4 Queries with NOT IN (Set Subtraction)
• 7.5 Unions, Joins, and Outer Joins

7.1 Introduction

• 1. The technical term for a table is a relation. A relation is like a set.

• The technical term for a row in a table is a tuple.

• A tuple is like an element in a set.

• The fundamental difference between an element of a set and a tuple in a relation is that the tuple may be a composite.

• It may contain values for more than one attribute.

• The similarity between sets and relations explains some of the aspects of relations.

• The order of elements in a set is immaterial;
• likewise, the order of tuples in a relation is immaterial.

• A set can't contain duplicate elements;
• likewise, a relation can't contain duplicate tuples.
• (Although a query result can.)

• 2. Recall the logical operator OR.
• This allowed you to make conditions on the values of attributes.
• There is also a set level operation, union, which is related in meaning.
• Union applies not to attribute values, but to collections of tuples in relations.

• Given two sets, A and B, you may recall this definition of union from math class:

• $A \cup B$ = the union of A and B
• = the set of elements in A or in B or in both A and B

• In this Venn diagram, both A and B and the area where they overlap are shaded, indicating that they all are included in the union.

[Venn diagram: overlapping circles A and B, both shaded]

• Microsoft Access SQL has the keyword UNION, which implements the idea behind a logical union.

• 3. Here is a simple example illustrating the use of the union operator.

• Suppose that table A and table B have each been defined with the same number of fields, of the same type, in the same order.

• The names of the fields don't have to be the same.

• Then consider this query:

    SELECT *
    FROM A
    UNION
    SELECT *
    FROM B

• The results will contain the set of rows that were in A or B or both.

• Because the query uses a set operator, and duplicates are not allowed in sets, any row that occurs in both A and B will appear only once in the results.

• This is a minor point, but worth emphasizing:

• In general, query results may contain duplicate rows, but the use of set operators has an effect similar to that of the keyword DISTINCT.
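
The behavior just described can be tried out with a minimal sketch. It runs the query through Python's built-in sqlite3 module rather than Microsoft Access (an assumption made so the example is self-contained), and the contents of tables A and B are invented for illustration.

```python
import sqlite3

# In-memory SQLite database standing in for Access (assumption).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical tables: same number of fields, same types, same order.
    CREATE TABLE A (id INTEGER, label TEXT);
    CREATE TABLE B (num INTEGER, txt TEXT);   -- field names differ on purpose
    INSERT INTO A VALUES (1, 'x'), (2, 'y');
    INSERT INTO B VALUES (2, 'y'), (3, 'z');
""")

union_rows = conn.execute("SELECT * FROM A UNION SELECT * FROM B").fetchall()

# The row (2, 'y') occurs in both tables but only once in the result.
print(sorted(union_rows))
```

The differing field names in A and B illustrate that only the number, type, and order of the fields matter.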

• 4. The previous example specified that the tables in the two parts of the query had to have the same number of fields of the same type and in the same order.

• It wouldn't do to have records in the same result table which varied in the number of fields they contained.

• It also wouldn't do for numeric fields to hold non-numeric values, and vice-versa.

• Having a correspondence between fields in the two parts of the query is known as union compatibility.

• The specific requirements for this are:
• A. The corresponding fields in the two parts of the query should mean the same thing.
• This may be referred to as semantic equivalence.

• B. If the corresponding fields are of exactly the same type and size, there is no problem at all.

• The formal requirements are less stringent though:
• i. All numeric fields are union compatible with each other.
• ii. All text fields are union compatible with each other.
• iii. All date fields are union compatible with each other.

• In cases where the types of the fields are not the same, but they are union compatible, the "larger" of the two types will be used in the results.

• Given two union compatible types, the "larger" of the two can always hold the values of the other.

• A text field with a large width can hold the values of a text field with a smaller width.

• A numeric type that can have decimal points can hold integer values.

• Since the one that can hold the other is used in the results, no data will be lost when a union is done.

7.2 UNION Queries

• 1. Here is a concrete example of a union query using tables and fields from the cardealership database:

    SELECT *
    FROM Car
    WHERE make = 'Chevrolet'
    UNION
    SELECT *
    FROM Car
    WHERE make = 'Toyota'

• This query illustrates the relationship between the UNION operator and the OR operator.

• Because the two parts of the query are on the same table, the following query would accomplish the same thing:

    SELECT *
    FROM Car
    WHERE make = 'Chevrolet'
    OR make = 'Toyota'

• In this query the Car table is the "universe", and the query finds the union of two disjoint subsets of the Car table, because no car could have two different makes at the same time.

• This is the Venn diagram for the query:

[Venn diagram: within the rectangle Car, two separate circles labeled make = 'Chevrolet' and make = 'Toyota']
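
The equivalence of the UNION form and the OR form can be checked with a small sketch. It uses Python's sqlite3 module in place of Access (an assumption), and the Car table here is a cut-down, invented version with only three fields.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Simplified, assumed version of the Car table.
    CREATE TABLE Car (vin TEXT PRIMARY KEY, make TEXT, stickerprice INTEGER);
    INSERT INTO Car VALUES
        ('v1', 'Chevrolet', 18000),
        ('v2', 'Toyota',    20000),
        ('v3', 'Ford',      18000);
""")

union_rows = conn.execute("""
    SELECT * FROM Car WHERE make = 'Chevrolet'
    UNION
    SELECT * FROM Car WHERE make = 'Toyota'
""").fetchall()

or_rows = conn.execute("""
    SELECT * FROM Car WHERE make = 'Chevrolet' OR make = 'Toyota'
""").fetchall()

# Compare the two results without depending on row order.
same_rows = sorted(union_rows) == sorted(or_rows)
print(same_rows)
```

Because vin is a primary key, every selected row is already distinct, so the duplicate elimination done by UNION does not change the comparison.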

• 2. Here is another example of a union query.
• The two parts of the query are based on two different tables:

    SELECT name, addr, city, state
    FROM Customer
    UNION
    SELECT name, addr, city, state
    FROM Salesperson

• Because two tables are involved, it would not be possible to accomplish this with the OR operator.

• Notice that there is no problem with union compatibility because the corresponding fields in the two tables were defined in exactly the same way.

• The Venn diagram for this query is more typical than the previous diagram.

• The same person could be both a salesperson and a customer.

• The results of the query would include the names, addresses, cities, and states of all customers, all salespeople, and anybody who fell into both categories.

[Venn diagram: two overlapping circles labeled Customers and Salespeople]

• A union can be thought of as a vertical combination of two tables:

[Diagram: two tables stacked one above the other, combined by UNION]

• 3. As noted previously, the union operator eliminates duplicates from the results of a query.

• If by chance you would like to do a union and not have the duplicates eliminated, you would use the keywords UNION ALL:

    SELECT city
    FROM Customer
    UNION ALL
    SELECT city
    FROM Salesperson

• There is a side effect related to eliminating or keeping duplicates in the results.

• When plain UNION is used, the duplicates will be eliminated and the results will typically be sorted in some order.

• The explanation is that the system uses the following approach to eliminate duplicates:

• First it sorts the records.
• After sorting, duplicates should be next to each other.
• Then the system finds and eliminates them.
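
The difference between UNION and UNION ALL can be seen in a short sketch. It is run with Python's sqlite3 module instead of Access (an assumption), and the names and cities are invented. Only the row counts are compared, since the order of the results is a side effect that should not be relied on.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Cut-down, assumed versions of the two tables.
    CREATE TABLE Customer (name TEXT, city TEXT);
    CREATE TABLE Salesperson (name TEXT, city TEXT);
    INSERT INTO Customer VALUES
        ('Ann', 'Anchorage'), ('Bob', 'Juneau'), ('Cat', 'Anchorage');
    INSERT INTO Salesperson VALUES
        ('Dee', 'Anchorage'), ('Eli', 'Sitka');
""")

union_rows = conn.execute(
    "SELECT city FROM Customer UNION SELECT city FROM Salesperson"
).fetchall()

union_all_rows = conn.execute(
    "SELECT city FROM Customer UNION ALL SELECT city FROM Salesperson"
).fetchall()

# UNION keeps one row per distinct city; UNION ALL keeps every row.
print(len(union_rows), len(union_all_rows))
```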

• 4. It is possible to do unions where one part of the query doesn't have fields corresponding to the fields in the other part.

• Those fields that correspond have to be union compatible.

• For those fields without corresponding fields, nulls have to be used.

• Recall that the schemas for Customer and Salesperson look like this:

    Customer(custno pk, name, addr, city, state, phone)
    Salesperson(spno pk, name, addr, city, state, phone, bossno, commrate)

• Here is an example of a union query where all of the fields of the Customer table are matched with the explicitly listed corresponding fields of the Salesperson table:

    SELECT *
    FROM Customer
    UNION
    SELECT spno, name, addr, city, state, phone
    FROM Salesperson

• If you would like to keep all of the fields from the Salesperson table while also including all of the records from the Customer table in the results, you could do this:

    SELECT *, NULL, NULL
    FROM Customer
    UNION
    SELECT *
    FROM Salesperson
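
Padding the shorter table with nulls can be sketched as follows. The example uses Python's sqlite3 module rather than Access (an assumption). The two schemas follow the ones given above, and the single row in each table is invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Customer (
        custno INTEGER PRIMARY KEY, name TEXT, addr TEXT,
        city TEXT, state TEXT, phone TEXT);
    CREATE TABLE Salesperson (
        spno INTEGER PRIMARY KEY, name TEXT, addr TEXT,
        city TEXT, state TEXT, phone TEXT, bossno INTEGER, commrate REAL);
    INSERT INTO Customer VALUES
        (1, 'Ann', '1 A St', 'Anchorage', 'AK', '555-0001');
    INSERT INTO Salesperson VALUES
        (10, 'Dee', '2 B St', 'Juneau', 'AK', '555-0002', NULL, 0.05);
""")

rows = conn.execute("""
    SELECT *, NULL, NULL FROM Customer
    UNION
    SELECT * FROM Salesperson
""").fetchall()

# Every result row has eight fields; the customer row ends in two nulls.
customer_row = next(r for r in rows if r[0] == 1)
print(customer_row)
```

The two NULL entries line up with bossno and commrate, the Salesperson fields that have no counterpart in Customer.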

7.3 Queries with IN (Intersection)

• 1. Among the concepts of set theory, along with union, there is intersection and there is the idea of set containment.

• Given two sets, A and B, here is the definition of union again, along with the definitions of intersection and containment:

• $A \cup B$ = the union of A and B
• = the set of elements in A or in B or in both A and B

• $A \cap B$ = the intersection of A and B
• = the set of elements that A and B have in common

• $A \subseteq B$ = A is contained in B;
• as a proposition this is either true or false, either the elements of A are also in B, or they're not

• In this Venn diagram, the area where A and B overlap is crosshatched, indicating that this is the area in the intersection.

[Venn diagram: overlapping circles A and B, with the overlap crosshatched]

• This Venn diagram signifies that A is contained in B:

[Venn diagram: circle A drawn inside circle B]

• Microsoft Access SQL does not have keywords for intersection or containment, but it does have this operator:

    IN

• Using IN it is possible to write expressions that check whether or not a given set of tuples is included in another set.

• This makes it possible to find the intersection of two sets.

• 2. It is possible to specify a set of values in SQL by enclosing the values in parentheses (not curly braces) and separating them with commas.

• This first example of the use of the keyword IN involves such a set:

    SELECT *
    FROM Car
    WHERE make IN ('Chevrolet', 'Toyota')

• This query is equivalent in results to the following query already seen above:

    SELECT *
    FROM Car
    WHERE make = 'Chevrolet'
    OR make = 'Toyota'

• The results of the query are the union of two sets.

• 3. The more general use of the keyword IN occurs when a set of values in a query is defined by a subquery rather than a set listed in parentheses.

• An example is shown below.
• Notice that its structure is similar to the foregoing examples.

• The outer query selects from a table where some field value is in or is not in the set specified by the subquery:

    SELECT name
    FROM Salesperson
    WHERE spno IN
    (SELECT spno
    FROM Carsale)

• This query illustrates the ideas of intersection and containment.

• You're selecting the names of salespeople whose spno's appear in the Carsale table.

• Because of referential integrity, every spno in the Carsale table has to appear in the Salesperson table.

• That means that the set of spno's from the Carsale table is a subset of the spno's in the Salesperson table.

• Not every salesperson has to have sold a car, so not necessarily every spno in Salesperson appears in the Carsale table.

• When you find the intersection between the two, it is simply the set of spno's from Carsale.

• This is a Venn diagram of the situation:

[Venn diagram: the circle of Carsale spno's lies inside the circle of Salesperson spno's; the names come from the Salesperson table]

• The query finds the names of salespeople who sold cars.

• Notice that because of the way this query is structured as a set query, a salesperson's name will appear only once in the results, even if that salesperson sold more than one car.

• In other words, a given spno may occur more than once in the Carsale table, but it will only appear once in the query results.

• This happens because this is how a set query with IN logically works:

• In the outer query, when checking whether a given spno is in the set defined by the subquery, the answer is either yes or no.

• If the answer is yes, then the spno is valid in the outer query, but it only appears there once.

• The "IN" can be read as "squeezing out" duplicate occurrences of spno.

• Then in the outer query, for each distinct spno, the one corresponding name is shown.
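
The "squeezing out" effect can be observed in a small sketch, run with Python's sqlite3 module rather than Access (an assumption). The tables are cut down to the fields needed and the rows are invented. A plain join is included only for contrast; it is not part of the original discussion.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Simplified, assumed versions of the two tables.
    CREATE TABLE Salesperson (spno INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE Carsale (vin TEXT PRIMARY KEY, spno INTEGER);
    INSERT INTO Salesperson VALUES (1, 'Dee'), (2, 'Eli'), (3, 'Fay');
    -- Dee sold two cars, Eli sold one, Fay sold none.
    INSERT INTO Carsale VALUES ('v1', 1), ('v2', 1), ('v3', 2);
""")

in_names = conn.execute("""
    SELECT name FROM Salesperson
    WHERE spno IN (SELECT spno FROM Carsale)
""").fetchall()

# For contrast: a plain join repeats Dee once per sale.
join_names = conn.execute("""
    SELECT name FROM Salesperson, Carsale
    WHERE Salesperson.spno = Carsale.spno
""").fetchall()

print(sorted(in_names), sorted(join_names))
```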

• 4. Here is another example of an IN query with a subquery.

• It shows the stickerprices of cars that sold.

    SELECT stickerprice
    FROM Car
    WHERE vin IN
    (SELECT vin
    FROM Carsale)

• This is the Venn diagram for this query.
• Since not all cars have sold, the Carsale vin's will be a subset of the Car vin's:

[Venn diagram: the circle of Carsale vin's lies inside the circle of Car vin's; the stickerprices come from the Car table]

• The previous example illustrated how the use of IN can remove duplicates.

• This example illustrates another point.
• Remember that cars can only be sold once, so there would be no duplicate vin values to squeeze out of the subquery results.

• IN would still squeeze them out if they existed, but that won’t actually happen in this case.

• However, duplicates can still arise in the overall results.

• In this example, among the cars that sold, there are two of them with a stickerprice of 18,000.

• 18,000 will show up twice in the results of the query.

• In the previous example, if two salespeople had the same name, duplicates would appear in the results.

• It was just assumed that there would be no duplicate names.

• The explanation for this is that in the Carsale table as given, cars can only be sold once, so their vin's show up there only once.

• The use of IN would check to see whether a car had been sold more than once and eliminate any duplicates if it had, but that would have no effect in this example because there are no duplicate car sales.

• However, once execution reaches the outer query, for whatever set of vin's the inner query found, you select the stickerprice.

• If two different vin's have the same stickerprice, that stickerprice will be shown twice in the overall results of the query.
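
A minimal sketch of this situation, using Python's sqlite3 module in place of Access (an assumption) and invented rows in which two sold cars share a stickerprice of 18,000:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Simplified, assumed versions of the two tables.
    CREATE TABLE Car (vin TEXT PRIMARY KEY, make TEXT, stickerprice INTEGER);
    CREATE TABLE Carsale (vin TEXT PRIMARY KEY, salesprice INTEGER);
    INSERT INTO Car VALUES
        ('v1', 'Chevrolet', 18000),
        ('v2', 'Toyota',    18000),
        ('v3', 'Ford',      25000);
    INSERT INTO Carsale VALUES ('v1', 17500), ('v2', 17000);
""")

sold_prices = conn.execute("""
    SELECT stickerprice FROM Car
    WHERE vin IN (SELECT vin FROM Carsale)
""").fetchall()

# Two different vin's, one shared stickerprice: the value is listed twice.
print(sold_prices)
```

Adding DISTINCT to the outer SELECT would collapse the repeated value if a list of distinct prices were wanted.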

• 5. In the previous two example queries, one table is opened in the inner query and another is opened in the outer query.

• In order to do an IN query, the inner query has to select exactly one field, and there has to be a field which corresponds to it in the table of the outer query.

• In the examples given, the names of the corresponding fields, vin and spno, were the same in the inner and outer queries.

• There was no need to fully qualify the field names because the parentheses serve as a barrier between the inner and outer queries.

• Inside the parentheses the field name belongs to the table of the inner query.

• Outside of the parentheses the field name belongs to the table of the outer query.

7.4 Queries with NOT IN (Set Subtraction)

• 1. Among the concepts of set theory, along with union, intersection, and containment, there are two more concepts to consider:

• complement or negation;
• and set subtraction.
• Given two sets, A and B, here are the definitions of complement and subtraction:

• A' = the complement of A
• = the set of elements not in A

• A – B = the difference between A and B
• = the set of elements which are in A but not in B

• This Venn diagram shows the complement of A:

[Venn diagram: circle A, with the region outside it labeled A']

• This Venn diagram shows A – B:

[Venn diagram: overlapping circles A and B, with the part of A outside B labeled A - B]

• Microsoft Access SQL does not have separate operators for complement or set subtraction.

• However, it does have this operator:

    NOT    This is negation or complement

• Using NOT, you can negate expressions and effectively find a complement.
• Using NOT IN, you can accomplish set subtraction.

• 2. It is possible to negate the initial query of the previous section.

• This gives a straightforward example of the use of NOT IN:

    SELECT *
    FROM Car
    WHERE make NOT IN ('Chevrolet', 'Toyota')

• The results of this query are the complement of the results of the plain IN query of the previous section.

• This is the Venn diagram for the negated query, where once again, it is the shaded area which is included in the results:

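[Venn diagram: within the rectangle Car, circles labeled make = 'Toyota' and make = 'Chevrolet'; the area of Car outside both circles is shaded]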

• 3. The non-negated version of the query, which simply used IN, could be interpreted as an OR query.

• The negated version of the query under discussion here, which uses NOT, can be regarded as the negation of an OR query.

• Once you start negating logical expressions, you need to be careful in interpreting what the results might be.

• The following query is approximately logically equivalent to the NOT IN query:

    SELECT *
    FROM Car
    WHERE make <> 'Chevrolet'
    AND make <> 'Toyota'

• If you negate something that can be regarded as an OR, you get the AND of the two parts negated separately.

• Likewise, if you negate something that can be regarded as an AND, you get the OR of the two parts negated separately.

• The following two rules are DeMorgan's Laws for sets.

• They give the general result described here:

$(A \cup B)' = A' \cap B'$

$(A \cap B)' = A' \cup B'$

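• The word "approximately" is a reminder to think about null values.

• Under SQL's three-valued logic, comparing a null with any value gives unknown rather than true, so make <> 'Chevrolet' is not true when make is null.

• As a result, neither the AND query nor the NOT IN query returns records where make is null; a condition such as OR make IS NULL has to be added to include them.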

• To understand why, look at the NOT IN query again:

    SELECT *
    FROM Car
    WHERE make NOT IN ('Chevrolet', 'Toyota')

• The logic of this is that a null is not a value at all.

• Null could never be an element of a set.
• When you use the set operator NOT IN, the system will only return actual values that occur in the set of values for make.

• It will not return null as a value.

• You can add this to the list of "peculiarities" of set queries.

• Together, the list consists of these two points:
• UNION and IN squeeze out duplicate values.
• IN and NOT IN never match a null.
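
How a null make is treated can be checked directly. The sketch below uses Python's sqlite3 module, which follows standard SQL three-valued logic; Access itself was not tested here, so treat the outcome as an assumption about Access. The Car table is cut down to two fields and the rows are invented, with car v4 given a null make.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Simplified, assumed Car table; v4 has a null make.
    CREATE TABLE Car (vin TEXT PRIMARY KEY, make TEXT);
    INSERT INTO Car VALUES
        ('v1', 'Chevrolet'), ('v2', 'Toyota'), ('v3', 'Ford'), ('v4', NULL);
""")

not_in_rows = conn.execute(
    "SELECT vin FROM Car WHERE make NOT IN ('Chevrolet', 'Toyota')"
).fetchall()

and_rows = conn.execute(
    "SELECT vin FROM Car WHERE make <> 'Chevrolet' AND make <> 'Toyota'"
).fetchall()

# Asking for the nulls explicitly brings v4 back.
with_nulls = conn.execute(
    "SELECT vin FROM Car "
    "WHERE make NOT IN ('Chevrolet', 'Toyota') OR make IS NULL"
).fetchall()

print(not_in_rows, and_rows, sorted(with_nulls))
```

In SQLite the NOT IN form and the AND form both leave out v4, and only the explicit IS NULL test returns it.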

• 4. Here is a more general NOT IN query, where the set is defined by a subquery:

    SELECT stickerprice
    FROM Car
    WHERE vin NOT IN
    (SELECT vin
    FROM Carsale)

• Here is the Venn diagram for this query:

[Venn diagram: the circle of Carsale vin's lies inside the circle of Car vin's; the stickerprices come from the Car vin's outside the Carsale circle]

• The meaning of this is straightforward.
• It finds the stickerprices of cars that haven't sold.
• There are no surprises because the field in question, vin, is the primary key of both the Car and Carsale tables, so it will never be null.
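
The kind of surprise that a null could cause can be sketched as follows, using Python's sqlite3 module instead of Access (an assumption) and invented rows. The second query appends a null to the subquery results with UNION ALL purely to simulate a nullable field; nothing like it occurs in the cardealership database.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Simplified, assumed versions of the two tables.
    CREATE TABLE Car (vin TEXT PRIMARY KEY, stickerprice INTEGER);
    CREATE TABLE Carsale (vin TEXT PRIMARY KEY, salesprice INTEGER);
    INSERT INTO Car VALUES ('v1', 18000), ('v2', 20000), ('v3', 25000);
    INSERT INTO Carsale VALUES ('v1', 17500);
""")

# Set subtraction: Car vin's minus Carsale vin's.
unsold_prices = conn.execute("""
    SELECT stickerprice FROM Car
    WHERE vin NOT IN (SELECT vin FROM Carsale)
""").fetchall()

# Hypothetical case: the subquery result contains a null.
prices_with_null = conn.execute("""
    SELECT stickerprice FROM Car
    WHERE vin NOT IN (SELECT vin FROM Carsale UNION ALL SELECT NULL)
""").fetchall()

print(sorted(unsold_prices), prices_with_null)
```

With a null in the set, the NOT IN test is never true for any car, so the second query returns no rows at all.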

7.5 Unions, Joins, and Outer Joins

• 1. Part of the relationship between unions and outer joins was brought up in a previous unit:

• A join can be thought of as a horizontal combination of two tables:

[Diagram: two tables placed side by side, combined by JOIN]

• A union can be thought of as a vertical combination of two tables:

[Diagram: two tables stacked one above the other, combined by UNION]

• A full outer join can be found with the union of a left join and a right join on the same two tables:

    …LEFT JOIN…
    UNION
    …RIGHT JOIN…
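
One way to see the pattern at work is the sketch below, run with Python's sqlite3 module rather than Access (an assumption). The tables Lft and Rgt are hypothetical. Because older SQLite versions lack RIGHT JOIN, the right join is written here as a LEFT JOIN with the tables swapped and the fields listed in a fixed order, which gives the same rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical tables that match only on k = 2.
    CREATE TABLE Lft (k INTEGER, a TEXT);
    CREATE TABLE Rgt (k INTEGER, b TEXT);
    INSERT INTO Lft VALUES (1, 'a1'), (2, 'a2');
    INSERT INTO Rgt VALUES (2, 'b2'), (3, 'b3');
""")

full_outer_rows = conn.execute("""
    SELECT Lft.k, Lft.a, Rgt.k, Rgt.b
    FROM Lft LEFT JOIN Rgt ON Lft.k = Rgt.k
    UNION
    -- Stand-in for Lft RIGHT JOIN Rgt: swap the tables, keep the field order.
    SELECT Lft.k, Lft.a, Rgt.k, Rgt.b
    FROM Rgt LEFT JOIN Lft ON Lft.k = Rgt.k
""").fetchall()

# The matched row appears once; each unmatched row is padded with nulls.
for row in full_outer_rows:
    print(row)
```

UNION, rather than UNION ALL, is what keeps the matched row from appearing twice, since both halves produce it.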

• 2. It is also possible to do an outer join with the help of set operators.

• To get started on this topic, here is a review of what an outer join is.

• A left or right outer join includes all of the matching records that a plain join on the joining field would produce, plus all of the records from one of the tables, either left or right, that don't have a match on the joining field.

• For those that don't have a match, it supplies NULL as the value for the fields that come from the other table.

• This is a left join on the Car and Carsale tables:

    SELECT *
    FROM Car LEFT JOIN Carsale
    ON Car.vin = Carsale.vin

• This will give a result table containing records for all cars, both those sold and unsold.

• For those that sold, there will be values for the fields vin (from the Carsale table), spno, custno, salesdate, and salesprice.

• For cars that didn't sell, those fields will be null.

• 3. It's not important to be able to do an outer join using UNION, NOT, and IN.

• The outer join syntax is easier.
• However, writing an outer join query with the set operators gives an additional chance to see them used to accomplish a desired result.

• The first part of the problem is simple.

• This plain join will give all of the records which have matches:

    SELECT *
    FROM Car, Carsale
    WHERE Car.vin = Carsale.vin

• This nested query with NOT IN will find the records of all of the cars that didn't sell, and it will supply NULL for the five positions in the result table that correspond to the five fields of the Carsale table:

    SELECT *, NULL, NULL, NULL, NULL, NULL
    FROM Car
    WHERE vin NOT IN
    (SELECT vin
    FROM Carsale)

• The left join is completed by finding the UNION of the two previous results:

    SELECT *
    FROM Car, Carsale
    WHERE Car.vin = Carsale.vin
    UNION
    SELECT *, NULL, NULL, NULL, NULL, NULL
    FROM Car
    WHERE vin NOT IN
    (SELECT vin
    FROM Carsale)
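
As a check, the combined query can be compared with the ordinary left join. This sketch uses Python's sqlite3 module in place of Access (an assumption). Carsale has the five fields named earlier, Car is cut down to three assumed fields, and all rows are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Car is simplified (assumed fields); Carsale has its five fields.
    CREATE TABLE Car (vin TEXT PRIMARY KEY, make TEXT, stickerprice INTEGER);
    CREATE TABLE Carsale (
        vin TEXT PRIMARY KEY, spno INTEGER, custno INTEGER,
        salesdate TEXT, salesprice INTEGER);
    INSERT INTO Car VALUES
        ('v1', 'Chevrolet', 18000),
        ('v2', 'Toyota',    20000),
        ('v3', 'Ford',      25000);
    INSERT INTO Carsale VALUES
        ('v1', 1, 100, '2016-01-15', 17500),
        ('v2', 2, 101, '2016-02-01', 19000);
""")

union_rows = conn.execute("""
    SELECT * FROM Car, Carsale
    WHERE Car.vin = Carsale.vin
    UNION
    SELECT *, NULL, NULL, NULL, NULL, NULL FROM Car
    WHERE vin NOT IN (SELECT vin FROM Carsale)
""").fetchall()

left_join_rows = conn.execute("""
    SELECT * FROM Car LEFT JOIN Carsale ON Car.vin = Carsale.vin
""").fetchall()

# Compare as sets so that row order does not matter.
same_result = set(union_rows) == set(left_join_rows)
print(same_result)
```

The unsold car v3 shows up in both results with nulls in the five Carsale positions.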

The End