murach’s sql server 2008, c6© 2008, mike murach & associates, inc.slide 1
TRANSCRIPT
Murach’s SQL Server 2008, C6 © 2008, Mike Murach & Associates, Inc. Slide 1
Chapter 6
How to code subqueries
Murach’s SQL Server 2008, C6 © 2008, Mike Murach & Associates, Inc. Slide 2
Objectives
Applied
Code SELECT statements that require subqueries.
Code SELECT statements that use common table expressions (CTEs) to define the subqueries.
Knowledge
Describe the way subqueries can be used in the WHERE, HAVING, FROM and SELECT clauses of a SELECT statement.
Describe the difference between a correlated subquery and a noncorrelated subquery.
Describe the use of common table expressions (CTEs).
Murach’s SQL Server 2008, C6 © 2008, Mike Murach & Associates, Inc. Slide 3
Objectives (continued) Explain the difference between a correlated subquery and a
noncorrelated subquery.
Given a SELECT statement that uses a subquery, explain how the result set of the subquery will affect the final result set.
Murach’s SQL Server 2008, C6 © 2008, Mike Murach & Associates, Inc. Slide 4
How to use subqueries A subquery is a SELECT statement that’s coded within another
SQL statement. A subquery can return a single value, a result set that contains a
single column, or a result set that contains one or more columns. A subquery that returns a single value can be coded, or introduced,
anywhere an expression is allowed. A subquery that returns a single column can be introduced in place
of a list of values, such as the values for an IN phrase. A subquery that returns one or more columns can be introduced in
place of a table in the FROM clause. A subquery that’s used in a WHERE or HAVING clause is called
a subquery search condition or a subquery predicate. This is themost common use for a subquery.
Murach’s SQL Server 2008, C6 © 2008, Mike Murach & Associates, Inc. Slide 5
How to use subqueries (continued) The syntax for a subquery is the same as for a standard SELECT
statement. However, a subquery doesn’t typically include the GROUP BY or HAVING clause, and it can’t include an ORDER BY clause unless the TOP phrase is used.
Subqueries can be nested within other subqueries. However, subqueries that are nested more than two or three levels deep can be difficult to read and can result in poor performance.
Murach’s SQL Server 2008, C6 © 2008, Mike Murach & Associates, Inc. Slide 6
Four ways to introduce a subquery in a SELECT statement 1. In a WHERE clause as a search condition
2. In a HAVING clause as a search condition
3. In the FROM clause as a table specification
4. In the SELECT clause as a column specification
Murach’s SQL Server 2008, C6 © 2008, Mike Murach & Associates, Inc. Slide 7
A SELECT statement that uses a subquery in theWHERE clause
SELECT InvoiceNumber, InvoiceDate, InvoiceTotalFROM InvoicesWHERE InvoiceTotal > (SELECT AVG(InvoiceTotal) FROM Invoices)ORDER BY InvoiceTotal
The value returned by the subquery1879.7413
The result set
(21 rows)
Murach’s SQL Server 2008, C6 © 2008, Mike Murach & Associates, Inc. Slide 8
How subqueries compare to joins Like a join, a subquery can be used to code queries that work with
two or more tables. Most subqueries can be restated as joins and most joins can be
restated as subqueries.
Advantages of joins The result set of a join can include columns from both tables. The
result set of a query with a subquery can only include columnsfrom the table named in the outer query, not in the subquery.
A join tends to be more intuitive when it uses an existingrelationship between the two tables, such as a primary key toforeign key relationship.
A query with a join typically performs faster than the same querywith a subquery, especially if the query uses only inner joins.
Murach’s SQL Server 2008, C6 © 2008, Mike Murach & Associates, Inc. Slide 9
Advantages of subqueries You can use a subquery to pass an aggregate value to the outer
query. A subquery tends to be more intuitive when it uses an ad hoc
relationship between the two tables. Long, complex queries can sometimes be easier to code using
subqueries.
Murach’s SQL Server 2008, C6 © 2008, Mike Murach & Associates, Inc. Slide 10
A query that uses an inner joinSELECT InvoiceNumber, InvoiceDate, InvoiceTotalFROM Invoices JOIN Vendors ON Invoices.VendorID = Vendors.VendorIDWHERE VendorState = 'CA'ORDER BY InvoiceDate
The same query restated with a subquerySELECT InvoiceNumber, InvoiceDate, InvoiceTotalFROM InvoicesWHERE VendorID IN (SELECT VendorID FROM Vendors WHERE VendorState = 'CA')ORDER BY InvoiceDate
The result set returned by both queries
(40 rows)
Murach’s SQL Server 2008, C6 © 2008, Mike Murach & Associates, Inc. Slide 11
The syntax of a WHERE clause that uses an INphrase with a subquery
WHERE test_expression [NOT] IN (subquery)
How to use subqueries with the IN operator You can introduce a subquery with the IN operator to provide
the list of values that are tested against the test expression. When you use the IN operator, the subquery must return a
single column of values. A query that uses the NOT IN operator with a subquery can
typically be restated using an outer join.
Murach’s SQL Server 2008, C6 © 2008, Mike Murach & Associates, Inc. Slide 12
A query that returns vendors without invoicesSELECT VendorID, VendorName, VendorStateFROM VendorsWHERE VendorID NOT IN (SELECT DISTINCT VendorID FROM Invoices)
The result of the subquery
(34 rows)
Murach’s SQL Server 2008, C6 © 2008, Mike Murach & Associates, Inc. Slide 13
The result set of vendors without invoices
(88 rows)
The query restated without a subquerySELECT Vendors.VendorID, VendorName, VendorStateFROM Vendors LEFT JOIN Invoices ON Vendors.VendorID = Invoices.VendorIDWHERE Invoices.VendorID IS NULL
Murach’s SQL Server 2008, C6 © 2008, Mike Murach & Associates, Inc. Slide 14
The syntax of a WHERE clause that compares anexpression with the value returned by a subquery
WHERE expression comparison_operator [SOME|ANY|ALL] (subquery)
How to compare the result of a subquery with anexpression You can use a comparison operator in a search condition to
compare an expression with the results of a subquery. If you code a search condition without the SOME, ANY, and ALL
keywords, the subquery must return a single value.
Murach’s SQL Server 2008, C6 © 2008, Mike Murach & Associates, Inc. Slide 15
A query that returns invoices with a balance dueless than the average
SELECT InvoiceNumber, InvoiceDate, InvoiceTotal, InvoiceTotal - PaymentTotal - CreditTotal AS BalanceDueFROM InvoicesWHERE InvoiceTotal - PaymentTotal - CreditTotal > 0 AND InvoiceTotal - PaymentTotal - CreditTotal < (SELECT AVG(InvoiceTotal - PaymentTotal – CreditTotal) FROM Invoices WHERE InvoiceTotal - PaymentTotal - CreditTotal > 0)ORDER BY InvoiceTotal DESC
Murach’s SQL Server 2008, C6 © 2008, Mike Murach & Associates, Inc. Slide 16
The result of the subquery (the average balance due)
2910.9472
The result set
(9 rows)
Murach’s SQL Server 2008, C6 © 2008, Mike Murach & Associates, Inc. Slide 17
How the ALL keyword works
ConditionEquivalentexpression Description
x > ALL (1, 2) x > 2 x must be greater than themaximum value returned bythe subquery.
x < ALL (1, 2) x < 1 x must be less than theminimum value returned bythe subquery.
x = ALL (1, 2) (x = 1) AND(x = 2)
This condition evaluates toTrue only if the subqueryreturns a single value or if allthe values returned by thesubquery are the same.
x <> ALL (1, 2) (x <> 1) AND(x <> 2)
This condition is equivalent to:x NOT IN (1, 2)
Murach’s SQL Server 2008, C6 © 2008, Mike Murach & Associates, Inc. Slide 18
How the ALL keyword works (continued) The ALL keyword tests that a comparison condition is true for all
of the values returned by a subquery.
If no rows are returned by the subquery, a comparison that uses ALL is always true.
If all of the rows returned by the subquery contain a null value, a comparison that uses ALL is always false.
Murach’s SQL Server 2008, C6 © 2008, Mike Murach & Associates, Inc. Slide 19
A query that returns invoices that are larger than the largest invoice for vendor 34
SELECT VendorName, InvoiceNumber, InvoiceTotal FROM Invoices JOIN Vendors ON Invoices.VendorID = Vendors.VendorID WHERE InvoiceTotal > ALL (SELECT InvoiceTotal FROM Invoices WHERE VendorID = 34) ORDER BY VendorName
Murach’s SQL Server 2008, C6 © 2008, Mike Murach & Associates, Inc. Slide 20
The result of the subquery (the invoice totals forvendor 34)
The result set
(25 rows)
Murach’s SQL Server 2008, C6 © 2008, Mike Murach & Associates, Inc. Slide 21
How the ANY and SOME keywords work
ConditionEquivalentexpression Description
x > ANY (1, 2) x > 1 x must be greater than theminimum value returned by thesubquery.
x < ANY (1, 2) x < 2 x must be less than the maximumvalue returned by the subquery.
x = ANY (1, 2) (x = 1) OR(x = 2)
This condition is equivalent to:x IN (1, 2)
x <> ANY (1, 2) (x <> 1) OR(x <> 2)
This condition will evaluate toTrue for any non-empty result setcontaining at least one non-nullvalue that isn’t equal to x.
Murach’s SQL Server 2008, C6 © 2008, Mike Murach & Associates, Inc. Slide 22
How the ANY and SOME keywords work (continued) The ANY or SOME keyword tests that a condition is true for at
least one of the values returned by a subquery. SOME is the ANSI-standard keyword, but ANY is more commonly used.
If no rows are returned by the subquery or all of the rows returned by the subquery contain a null value, a comparison that uses ANY or SOME is always false.
Murach’s SQL Server 2008, C6 © 2008, Mike Murach & Associates, Inc. Slide 23
A query that returns invoices smaller than the largest invoice for vendor 115
SELECT VendorName, InvoiceNumber, InvoiceTotal FROM Vendors JOIN Invoices ON Vendors.VendorID = Invoices.InvoiceID WHERE InvoiceTotal < ANY (SELECT InvoiceTotal FROM Invoices WHERE VendorID = 115)
Murach’s SQL Server 2008, C6 © 2008, Mike Murach & Associates, Inc. Slide 24
The result of the subquery (the invoice totals forvendor 115)
The result set
(17 rows)
Murach’s SQL Server 2008, C6 © 2008, Mike Murach & Associates, Inc. Slide 25
How to code correlated subqueries A correlated subquery is a subquery that is executed once for each
row processed by the outer query. A noncorrelated subquery isexecuted only once.
A correlated subquery refers to the value of a column in the outerquery. Because that value varies depending on the row that’s beingprocessed, each execution of the subquery returns a different result.
To refer to a value in the outer query, a correlated subquery uses aqualified column name that includes the table name from the outerquery.
If the subquery uses the same table as the outer query, an alias, orcorrelation name, must be assigned to one of the tables to removeambiguity.
Murach’s SQL Server 2008, C6 © 2008, Mike Murach & Associates, Inc. Slide 26
A query that uses a correlated subquery tocalculate the average invoice total for each vendor
SELECT VendorID, InvoiceNumber, InvoiceTotalFROM Invoices AS Inv_MainWHERE InvoiceTotal > (SELECT AVG(InvoiceTotal) FROM Invoices AS Inv_Sub WHERE Inv_Sub.VendorID = Inv_Main.VendorID)ORDER BY VendorID, InvoiceTotal
The value returned by the subquery for vendor 9528.5016
Murach’s SQL Server 2008, C6 © 2008, Mike Murach & Associates, Inc. Slide 27
The result set for the query with the correlatedsubquery
(36 rows)
Murach’s SQL Server 2008, C6 © 2008, Mike Murach & Associates, Inc. Slide 28
The syntax of a subquery that uses the EXISTSoperator
WHERE [NOT] EXISTS (subquery)
How to use the EXISTS operator You can use the EXISTS operator to test that one or more rows
are returned by the subquery. You can use NOT along with the EXISTS operator to test that no
rows are returned by the subquery. A subquery with the EXISTS operator doesn’t return any rows.
It returns an indication of whether any rows meet the condition. Because no rows are returned by the subquery, it doesn’t matter
what columns you specify in the SELECT clause. The EXISTS operator is used most often with correlated
subqueries.
Murach’s SQL Server 2008, C6 © 2008, Mike Murach & Associates, Inc. Slide 29
A query that returns vendors without invoicesSELECT VendorID, VendorName, VendorStateFROM VendorsWHERE NOT EXISTS (SELECT * FROM Invoices WHERE Invoices.VendorID = Vendors.VendorID)
The result set
(88 rows)
Murach’s SQL Server 2008, C6 © 2008, Mike Murach & Associates, Inc. Slide 30
How to code subqueries in the FROM clause A subquery that’s coded in the FROM clause returns a result
set called a derived table. When you create a derived table, you must assign an alias to
it. Then, you can use the derived table within the outer queryjust as you would any other table.
When you code a subquery in the FROM clause, you mustassign names to any calculated values in the result set.
Derived tables are most useful when you need to furthersummarize the results of a summary query.
Murach’s SQL Server 2008, C6 © 2008, Mike Murach & Associates, Inc. Slide 31
A query that uses a derived table to retrieve thetop 5 vendors by average invoice total
SELECT Invoices.VendorID, MAX(InvoiceDate) AS LatestInv, AVG(InvoiceTotal) AS AvgInvoiceFROM Invoices JOIN (SELECT TOP 5 VendorID, AVG(InvoiceTotal) AS AvgInvoice FROM Invoices GROUP BY VendorID ORDER BY AvgInvoice DESC) AS TopVendor ON Invoices.VendorID = TopVendor.VendorIDGROUP BY Invoices.VendorIDORDER BY LatestInv DESC
Murach’s SQL Server 2008, C6 © 2008, Mike Murach & Associates, Inc. Slide 32
The derived table generated by the subquery thatretrieves the top 5 vendors
The result set
Murach’s SQL Server 2008, C6 © 2008, Mike Murach & Associates, Inc. Slide 33
How to code subqueries in the SELECT clause When you code a subquery for a column specification in the
SELECT clause, the subquery must return a single value. A subquery that’s coded within a SELECT clause is usually a
correlated subquery. Subqueries are seldom coded in the SELECT clause. Joins are
used instead because they’re generally faster and morereadable.
Murach’s SQL Server 2008, C6 © 2008, Mike Murach & Associates, Inc. Slide 34
A query that uses a correlated subquery in its SELECT to retrieve the most recent invoice for each vendor
SELECT DISTINCT VendorName, (SELECT MAX(InvoiceDate) FROM Invoices WHERE Invoices.VendorID = Vendors.VendorID) AS LatestInv FROM Vendors ORDER BY LatestInv DESC
The result set
(122 rows)
Murach’s SQL Server 2008, C6 © 2008, Mike Murach & Associates, Inc. Slide 35
The same query using a join instead of asubquery in the SELECT clause
SELECT VendorName, MAX(InvoiceDate) AS LatestInvFROM Vendors JOIN Invoices ON Vendors.VendorID = Invoices.VendorIDGROUP BY VendorNameORDER BY LatestInv DESC
Murach’s SQL Server 2008, C6 © 2008, Mike Murach & Associates, Inc. Slide 36
A procedure for building complex queries1. State the problem to be solved by the query in English.
2. Use pseudocode to outline the query. The pseudocode shouldidentify the subqueries used by the query and the data they return.It should also include aliases used for any derived tables.
3. If necessary, use pseudocode to outline each subquery.
4. Code the subqueries and test them to be sure that they return thecorrect data.
5. Code and test the final query.
The problem to be solved Which vendor in each state has the largest invoice total?
Murach’s SQL Server 2008, C6 © 2008, Mike Murach & Associates, Inc. Slide 37
Pseudocode for the querySELECT Summary1.VendorState, Summary1.VendorName, TopInState.SumOfInvoicesFROM (Derived table returning VendorState, VendorName, SumOfInvoices) AS Summary1 JOIN (Derived table returning VendorState, MAX(SumOfInvoices)) AS TopInState ON Summary1.VendorState = TopInState.VendorState AND Summary1.SumOfInvoices = TopInState.SumOfInvoicesORDER BY Summary1.VendorState
Murach’s SQL Server 2008, C6 © 2008, Mike Murach & Associates, Inc. Slide 38
Pseudocode for the TopInState subquerySELECT Summary2.VendorState, MAX(Summary2.SumOfInvoices)FROM (Derived table returning VendorState, VendorName, SumOfInvoices) AS Summary2GROUP BY Summary2.VendorState
The code for the Summary1 and Summary2subqueries
SELECT V_Sub.VendorState, V_Sub.VendorName, SUM(I_Sub.InvoiceTotal) AS SumOfInvoicesFROM Invoices AS I_Sub JOIN Vendors AS V_Sub ON I_Sub.VendorID = V_Sub.VendorIDGROUP BY V_Sub.VendorState, V_Sub.VendorName
Murach’s SQL Server 2008, C6 © 2008, Mike Murach & Associates, Inc. Slide 39
The result of the Summary1 and Summary2subqueries
(34 rows)
The result of the TopInState subquery
(10 rows)
Murach’s SQL Server 2008, C6 © 2008, Mike Murach & Associates, Inc. Slide 40
All of the code for the query with three subqueries SELECT Summary1.VendorState, Summary1.VendorName, TopInState.SumOfInvoices FROM (SELECT V_Sub.VendorState, V_Sub.VendorName, SUM(I_Sub.InvoiceTotal) AS SumOfInvoices FROM Invoices AS I_Sub JOIN Vendors AS V_Sub ON I_Sub.VendorID = V_Sub.VendorID GROUP BY V_Sub.VendorState, V_Sub.VendorName) AS Summary1 JOIN (SELECT Summary2.VendorState, MAX(Summary2.SumOfInvoices) AS SumOfInvoices FROM (SELECT V_Sub.VendorState, V_Sub.VendorName, SUM(I_Sub.InvoiceTotal) AS SumOfInvoices FROM Invoices AS I_Sub JOIN Vendors AS V_Sub ON I_Sub.VendorID = V_Sub.VendorID GROUP BY V_Sub.VendorState, V_Sub.VendorName) AS Summary2 GROUP BY Summary2.VendorState) AS TopInState ON Summary1.VendorState = TopInState.VendorState AND Summary1.SumOfInvoices = TopInState.SumOfInvoices ORDER BY Summary1.VendorState
Murach’s SQL Server 2008, C6 © 2008, Mike Murach & Associates, Inc. Slide 41
The result set of the query with three subqueries
(10 rows)
Murach’s SQL Server 2008, C6 © 2008, Mike Murach & Associates, Inc. Slide 42
The syntax of a CTE WITH cte_name1 AS (query_definition1) [, cte_name2 AS (query_definition2)] [...] sql_statement
How to code a CTE A common table expression (CTE) is an expression that creates
one or more temporary tables that can be used by the following query.
To use a CTE with a query, you code the WITH keyword followed by the definition of the CTE. Then, immediately after the CTE, you code the statement that uses it.
Separate multiple CTEs with commas. Each CTE can refer to itself and any previously defined CTEs in the same WITH clause.
Murach’s SQL Server 2008, C6 © 2008, Mike Murach & Associates, Inc. Slide 43
Two CTEs and a query that uses them WITH Summary AS ( SELECT VendorState, VendorName, SUM(InvoiceTotal) AS SumOfInvoices FROM Invoices JOIN Vendors ON Invoices.VendorID = Vendors.VendorID GROUP BY VendorState, VendorName ), TopInState AS ( SELECT VendorState, MAX(SumOfInvoices) AS SumOfInvoices FROM Summary GROUP BY VendorState ) SELECT Summary.VendorState, Summary.VendorName, TopInState.SumOfInvoices FROM Summary JOIN TopInState ON Summary.VendorState = TopInState.VendorState AND Summary.SumOfInvoices = TopInState.SumOfInvoices ORDER BY Summary.VendorState
Murach’s SQL Server 2008, C6 © 2008, Mike Murach & Associates, Inc. Slide 44
The result set
(10 rows)
Murach’s SQL Server 2008, C6 © 2008, Mike Murach & Associates, Inc. Slide 45
How to code a recursive CTE A recursive query is a query that is able to loop through a result
set and perform processing to return a final result set. A recursive CTE can be used to create a recursive query.
A recursive CTE must contain at least two query definitions, an anchor member and a recursive member, and these members must be connected by the UNION ALL operator.
Murach’s SQL Server 2008, C6 © 2008, Mike Murach & Associates, Inc. Slide 47
A recursive CTE that returns hierarchical data WITH EmployeesCTE AS ( -- Anchor member SELECT EmployeeID, FirstName + ' ' + LastName As EmployeeName, 1 As Rank FROM Employees WHERE ManagerID IS NULL UNION ALL -- Recursive member SELECT Employees.EmployeeID, FirstName + ' ' + LastName, Rank + 1 FROM Employees JOIN EmployeesCTE ON Employees.ManagerID = EmployeesCTE.EmployeeID ) SELECT * FROM EmployeesCTE ORDER BY Rank, EmployeeID