Merging Data and Passing Tables 10-1
Module 10
Merging Data and Passing Tables
Contents:
Lesson 1: Using the MERGE Statement
Lesson 2: Implementing Table Types
Lesson 3: Using TABLE Types As Parameters
Lab 10: Passing Tables and Merging Data
10-2 Implementing a Microsoft® SQL Server® 2008 R2 Database
Module Overview
Each time a client application makes a call to a SQL Server system, a considerable delay is incurred at
the network layer. This basic delay is largely unrelated to the amount of data being passed; it is caused by
the latency of the network. For this reason, it is important to minimize the number of calls that a client
needs to make to the server for a given amount of data that must be passed between them. Each such call is
termed a "roundtrip".
In this module you will review the techniques that provide the ability to process sets of data rather than
individual rows. You will then see how these techniques can be used in combination with TABLE
parameter types to minimize the number of required stored procedure calls in typical applications.
Objectives
After completing this module, you will be able to:
Use the MERGE statement
Implement table types
Use TABLE types as parameters
Lesson 1
Using the MERGE Statement
A very common requirement when coding in T-SQL is the need to update a row if it exists but to insert
the row if it does not already exist. SQL Server 2008 introduced the MERGE statement, which provides this
ability plus the ability to process entire sets of data, rather than processing them row by row or in several
separate set-based statements. This can lead to more efficient execution and simplifies the required
coding. In this lesson, you will investigate the use of the MERGE statement and the most common options
associated with it.
Objectives
After completing this lesson, you will be able to:
Explain the role of the MERGE statement
Describe how to use the WHEN MATCHED clause
Describe how to use the WHEN NOT MATCHED BY TARGET clause
Describe how to use the WHEN NOT MATCHED BY SOURCE clause
Explain the role of the OUTPUT clause and $action
Describe MERGE determinism and performance
MERGE Statement
Key Points
The MERGE statement is most commonly used to insert data that does not already exist but to update the
data if it does exist. It can operate on entire sets of data rather than just on single rows and can perform
alternate actions such as deletes.
MERGE
A common requirement is to update data if it already exists but to insert it if it does not. Some other
database engines (not SQL Server) provide an UPSERT statement for this purpose. The MERGE statement
provided by SQL Server is a more capable replacement for such statements and is based on the ANSI SQL
standard, together with some Microsoft extensions to the standard.
A typical situation where the need for the MERGE statement arises is in the population of data warehouses
from data in source transactional systems. For example, consider a data warehouse holding details of a
customer. When a customer row is received from the transactional system, it needs to be inserted into the
data warehouse. When later updates to the customer are made, the data warehouse would then need to
be updated.
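The data warehouse scenario above can be sketched as a single MERGE. This is a minimal example under stated assumptions: the #DimCustomer and #CustomerStaging temporary tables, their columns and their data types are hypothetical, chosen only so that the script runs on its own in a scratch session.

```sql
-- Hypothetical warehouse table and incoming rows from the transactional system.
IF OBJECT_ID('tempdb..#DimCustomer') IS NOT NULL DROP TABLE #DimCustomer;
IF OBJECT_ID('tempdb..#CustomerStaging') IS NOT NULL DROP TABLE #CustomerStaging;

CREATE TABLE #DimCustomer
( CustomerID   int          NOT NULL PRIMARY KEY,
  CustomerName nvarchar(50) NOT NULL,
  City         nvarchar(50) NOT NULL );

CREATE TABLE #CustomerStaging
( CustomerID   int          NOT NULL PRIMARY KEY,
  CustomerName nvarchar(50) NOT NULL,
  City         nvarchar(50) NOT NULL );

-- Customer 1 is already in the warehouse; customer 2 is not.
INSERT INTO #DimCustomer (CustomerID, CustomerName, City)
VALUES (1, N'Contoso', N'Seattle');

INSERT INTO #CustomerStaging (CustomerID, CustomerName, City)
VALUES (1, N'Contoso', N'Portland'),   -- changed city: should be updated
       (2, N'Fabrikam', N'Denver');    -- new customer: should be inserted

-- Update existing customers and insert new ones in one atomic statement.
MERGE INTO #DimCustomer AS tgt
USING #CustomerStaging AS src
   ON tgt.CustomerID = src.CustomerID
WHEN MATCHED THEN
    UPDATE SET tgt.CustomerName = src.CustomerName,
               tgt.City = src.City
WHEN NOT MATCHED BY TARGET THEN
    INSERT (CustomerID, CustomerName, City)
    VALUES (src.CustomerID, src.CustomerName, src.City);  -- MERGE must end with a semicolon

SELECT CustomerID, CustomerName, City FROM #DimCustomer ORDER BY CustomerID;
```

After the MERGE, customer 1 shows the city Portland and customer 2 has been added, without any separate existence check in the code.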
Atomicity
Whereas upsert logic is often written to process a single row at a time, the MERGE statement in SQL
Server can operate on an entire set of data in a single statement execution. It is important to realize that
the MERGE statement functions as an atomic operation: either all of its inserts, updates and deletes occur,
or none of them do.
Source and Target
The MERGE statement uses two table data sources. The target table is the table that is being modified and
is specified first in the MERGE statement. Any inserts, updates or deletes are applied only to the target
table.
The source table provides the rows that need to be matched to the rows in the target table. You can think
of the source table as the incoming data. It is specified in a USING clause. The source table does not have
to be an actual table but can be other types of expressions that return a table such as:
A view
A sub-select (or derived table) with an alias
A common table expression (CTE)
A VALUES clause with a table alias and column names
The source and target are matched together as the result of an ON clause. This can involve one or more
columns from both tables.
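As a sketch of a source that is not an actual table, the following example supplies the incoming rows through a VALUES clause with an alias and matches source to target on two columns. The #ProductPrice table, its columns and the sample prices are hypothetical.

```sql
IF OBJECT_ID('tempdb..#ProductPrice') IS NOT NULL DROP TABLE #ProductPrice;

-- Hypothetical target: one price per product per currency.
CREATE TABLE #ProductPrice
( ProductID    int           NOT NULL,
  CurrencyCode char(3)       NOT NULL,
  Price        decimal(10,2) NOT NULL,
  PRIMARY KEY (ProductID, CurrencyCode) );

INSERT INTO #ProductPrice (ProductID, CurrencyCode, Price)
VALUES (10, 'USD', 5.00);

-- The source is a VALUES row constructor; it needs an alias and column names.
MERGE INTO #ProductPrice AS tgt
USING (VALUES (10, 'USD', 5.50),
              (10, 'EUR', 5.10)) AS src (ProductID, CurrencyCode, Price)
   ON tgt.ProductID = src.ProductID          -- the ON clause can compare
  AND tgt.CurrencyCode = src.CurrencyCode    -- more than one column
WHEN MATCHED THEN
    UPDATE SET tgt.Price = src.Price
WHEN NOT MATCHED THEN
    INSERT (ProductID, CurrencyCode, Price)
    VALUES (src.ProductID, src.CurrencyCode, src.Price);
```

The USD row matches on both columns and is updated; the EUR row has no match and is inserted.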
WHEN MATCHED
Key Points
The WHEN MATCHED clause defines the action to be taken when a row in the source is matched to a row
in the target.
WHEN MATCHED
The ON clause is used to match source rows to target rows. The WHEN MATCHED clause specifies the
action that needs to occur when a source row matches a target row. In most cases, this will involve an
UPDATE statement, but it could alternatively involve a DELETE statement.
In the example shown in the slide, rows in the EmployeeUpdate table are being matched to rows in the
Employee table based upon the EmployeeID. When a source row matches a target row, the FullName and
EmploymentStatus columns in the target table are updated with the values of those columns in the
source.
Note that only the target table can be updated. If an attempt is made to modify any other table, a syntax
error is returned.
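The slide example is not reproduced in this text, so the following is a hypothetical reconstruction of it using temporary tables. The EmployeeID, FullName and EmploymentStatus column names come from the description above; the data types and the status codes are assumptions.

```sql
IF OBJECT_ID('tempdb..#Employee') IS NOT NULL DROP TABLE #Employee;
IF OBJECT_ID('tempdb..#EmployeeUpdate') IS NOT NULL DROP TABLE #EmployeeUpdate;

CREATE TABLE #Employee
( EmployeeID       int          NOT NULL PRIMARY KEY,
  FullName         nvarchar(60) NOT NULL,
  EmploymentStatus char(1)      NOT NULL );   -- assumed codes: 'A' active, 'T' terminated

CREATE TABLE #EmployeeUpdate
( EmployeeID       int          NOT NULL PRIMARY KEY,
  FullName         nvarchar(60) NOT NULL,
  EmploymentStatus char(1)      NOT NULL );

INSERT INTO #Employee (EmployeeID, FullName, EmploymentStatus)
VALUES (1, N'Ann Beebe', 'A'), (2, N'Ben Miller', 'A');

INSERT INTO #EmployeeUpdate (EmployeeID, FullName, EmploymentStatus)
VALUES (2, N'Ben Miller', 'T');

MERGE INTO #Employee AS e
USING #EmployeeUpdate AS eu
   ON e.EmployeeID = eu.EmployeeID
WHEN MATCHED THEN
    -- No table name follows UPDATE: the action always applies to the target.
    UPDATE SET e.FullName = eu.FullName,
               e.EmploymentStatus = eu.EmploymentStatus;
```

Only employee 2 has a matching source row, so only that row changes; employee 1 is left untouched because this MERGE has no clause for unmatched target rows.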
Multiple Clauses
It is also possible to include two WHEN MATCHED clauses, as shown in the following code block:
WHEN MATCHED AND s.Quantity > 0 THEN
...
WHEN MATCHED THEN
...
No more than two WHEN MATCHED clauses can be present. When two clauses are used, the first clause
must have an AND condition. If the source row matches the target and also satisfies the AND condition,
then the action specified in the first WHEN MATCHED clause is performed. Otherwise, if the source row
matches the target but does not satisfy the AND condition, the second WHEN MATCHED clause is evaluated
instead, and its action is performed if its own condition (when it has one) is also satisfied.
When two WHEN MATCHED clauses are present, one action must specify an UPDATE and the other action
must specify a DELETE.
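A minimal sketch of the two-clause pattern, assuming a hypothetical #Stock target and #StockCount source: when the counted quantity is positive the stock row is updated, and otherwise the matched row is deleted.

```sql
IF OBJECT_ID('tempdb..#Stock') IS NOT NULL DROP TABLE #Stock;
IF OBJECT_ID('tempdb..#StockCount') IS NOT NULL DROP TABLE #StockCount;

CREATE TABLE #Stock      (ProductID int NOT NULL PRIMARY KEY, Quantity int NOT NULL);
CREATE TABLE #StockCount (ProductID int NOT NULL PRIMARY KEY, Quantity int NOT NULL);

INSERT INTO #Stock (ProductID, Quantity) VALUES (1, 10), (2, 4);
INSERT INTO #StockCount (ProductID, Quantity) VALUES (1, 7), (2, 0);

MERGE INTO #Stock AS t
USING #StockCount AS s
   ON t.ProductID = s.ProductID
WHEN MATCHED AND s.Quantity > 0 THEN   -- first clause must carry an AND condition
    UPDATE SET t.Quantity = s.Quantity
WHEN MATCHED THEN                      -- every other match (here: Quantity <= 0)
    DELETE;
```

Product 1 is updated to a quantity of 7 by the first clause; product 2 fails the AND condition, falls through to the second clause and is deleted.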
Question: What is different about the UPDATE statement in the example shown, compared to a normal
UPDATE statement?
WHEN NOT MATCHED BY TARGET
Key Points
The WHEN NOT MATCHED BY TARGET clause specifies the action that needs to be taken when a row in
the source cannot be matched to a row in the target.
WHEN NOT MATCHED
The next clause in the MERGE statement that you will consider is the WHEN NOT MATCHED BY TARGET
clause. As mentioned in the previous topic, the most common action performed by a WHEN MATCHED clause
is to update the existing row in the target table. The most common action performed by a WHEN NOT
MATCHED BY TARGET clause is to insert a new row into the target table.
In the example shown in the slide, when a row from the EmployeeUpdate table cannot be found in the
Employee table, a new employee row would be added into the Employee table.
With a standard INSERT statement in T-SQL, the inclusion of a column list is considered a best practice
and avoids issues related to changes to the underlying table such as the reordering of columns or the
addition of new columns. The same recommendation applies to an INSERT action within a MERGE
statement. While a column list is optional, best practice suggests including one.
Syntax
The words BY TARGET are optional and are often omitted. The clause is then just written as WHEN NOT
MATCHED. Note again that no table name is included in the action statement (INSERT statement) as
modifications may only be made to the target table.
The WHEN NOT MATCHED clause is part of the ANSI SQL standard; the optional BY TARGET keywords and the
WHEN NOT MATCHED BY SOURCE clause are Microsoft extensions.
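A minimal sketch of an INSERT action with an explicit column list, assuming a hypothetical #Department target and a VALUES source. The BY TARGET keywords are omitted here, which has the same meaning, and the column list lets the CreatedAt column fall back to its default.

```sql
IF OBJECT_ID('tempdb..#Department') IS NOT NULL DROP TABLE #Department;

CREATE TABLE #Department
( DepartmentID int          NOT NULL PRIMARY KEY,
  Name         nvarchar(50) NOT NULL,
  CreatedAt    datetime2    NOT NULL DEFAULT SYSDATETIME() );  -- filled by the default

INSERT INTO #Department (DepartmentID, Name) VALUES (1, N'Sales');

MERGE INTO #Department AS d
USING (VALUES (1, N'Sales'), (2, N'Marketing')) AS src (DepartmentID, Name)
   ON d.DepartmentID = src.DepartmentID
WHEN NOT MATCHED THEN              -- same as WHEN NOT MATCHED BY TARGET
    -- No table name after INSERT; the column list names only the supplied columns.
    INSERT (DepartmentID, Name)
    VALUES (src.DepartmentID, src.Name);
```

Because this MERGE has no WHEN MATCHED clause, the existing Sales row is left alone and only Marketing is added.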
WHEN NOT MATCHED BY SOURCE
Key Points
The WHEN NOT MATCHED BY SOURCE clause is used to specify an action to be taken for rows in the
target that were not matched by any rows from the source.
WHEN NOT MATCHED BY SOURCE
While much less commonly used than the clauses discussed in the previous topics, you can also take an
action for rows in the target that did not match any incoming rows from the source.
Generally, this will involve deleting the unmatched rows in the target table, but UPDATE actions are also
permitted.
Note the format of the DELETE statement in the example on the slide. At first glance, it might seem quite
odd as it has no table or predicate specified. In this example, all rows in the Employee table that were not
matched by an incoming source row from the EmployeeUpdate table would be deleted.
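As a sketch of how the three clauses combine to make a target an exact copy of a source, the following example updates matched rows, inserts new ones and deletes target rows that no longer appear in the feed. The #EmpCurrent and #EmpFeed tables and their data are hypothetical.

```sql
IF OBJECT_ID('tempdb..#EmpCurrent') IS NOT NULL DROP TABLE #EmpCurrent;
IF OBJECT_ID('tempdb..#EmpFeed') IS NOT NULL DROP TABLE #EmpFeed;

CREATE TABLE #EmpCurrent (EmployeeID int NOT NULL PRIMARY KEY, FullName nvarchar(60) NOT NULL);
CREATE TABLE #EmpFeed    (EmployeeID int NOT NULL PRIMARY KEY, FullName nvarchar(60) NOT NULL);

INSERT INTO #EmpCurrent (EmployeeID, FullName)
VALUES (1, N'Ann Beebe'), (2, N'Ben Miller'), (3, N'Cai Chen');

INSERT INTO #EmpFeed (EmployeeID, FullName)
VALUES (1, N'Ann Beebe'), (3, N'Cai Chen-Li'), (4, N'Dan Park');

MERGE INTO #EmpCurrent AS e
USING #EmpFeed AS f
   ON e.EmployeeID = f.EmployeeID
WHEN MATCHED THEN
    UPDATE SET e.FullName = f.FullName
WHEN NOT MATCHED BY TARGET THEN
    INSERT (EmployeeID, FullName) VALUES (f.EmployeeID, f.FullName)
WHEN NOT MATCHED BY SOURCE THEN
    DELETE;   -- no table and no predicate: removes target rows absent from the feed
```

Employee 2 is not in the feed and is deleted, employee 3 is renamed and employee 4 is added.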
Question: What would the DELETE statement look like if it only deleted rows where the date in a column
called LastModified was older than a year?
OUTPUT Clause and $action
Key Points
The OUTPUT clause was added in SQL Server 2005 and allows the return of a set of rows when performing
data modifications. In 2005, this applied to INSERT, DELETE and UPDATE. In SQL Server 2008 and later,
this clause can also be used with the MERGE statement.
OUTPUT Clause
The OUTPUT clause was a useful addition to the INSERT, UPDATE and DELETE statements in SQL Server
2005. For example, consider the following code:
DELETE FROM HumanResources.Employee
OUTPUT deleted.BusinessEntityID, deleted.NationalIDNumber
WHERE ModifiedDate < DATEADD(YEAR,-10,SYSDATETIME());
In this example, employees are deleted when their rows have not been modified within the last ten years.
As part of this modification, a set of rows is returned that provides details of the BusinessEntityID and
NationalIDNumber for each row deleted.
As well as returning rows to the client application, the OUTPUT clause can include an INTO sub-clause
that causes the rows to be inserted into another existing table instead. Consider the following example:
DELETE FROM HumanResources.Employee
OUTPUT deleted.BusinessEntityID, deleted.NationalIDNumber
INTO Audit.EmployeeDelete
WHERE ModifiedDate < DATEADD(YEAR,-10,SYSDATETIME());
In this example, details of the employees being deleted are inserted into the Audit.EmployeeDelete table
instead of being returned to the client.
OUTPUT and MERGE
The OUTPUT clause can also be used with the MERGE statement. When an INSERT is performed, rows can
be returned from the inserted virtual table. When a DELETE is performed, rows can be returned from the
deleted virtual table. When an UPDATE is performed, values will be available in both the inserted and
deleted virtual tables.
Because a single MERGE statement can perform INSERT, UPDATE and DELETE actions, it can be useful to
know which action was performed for each row returned by the OUTPUT clause. To make this possible,
the OUTPUT clause also supports a $action virtual column that returns details of the action performed on
each row. It returns the words "INSERT", "UPDATE" or "DELETE".
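A minimal sketch of $action in use, assuming a hypothetical #Account target and a #MergeLog table that captures one row per action through OUTPUT ... INTO. For an inserted row the deleted values are NULL, and for a deleted row the inserted values are NULL.

```sql
IF OBJECT_ID('tempdb..#Account') IS NOT NULL DROP TABLE #Account;
IF OBJECT_ID('tempdb..#MergeLog') IS NOT NULL DROP TABLE #MergeLog;

CREATE TABLE #Account  (AccountID int NOT NULL PRIMARY KEY, Balance decimal(10,2) NOT NULL);
CREATE TABLE #MergeLog (MergeAction nvarchar(10) NOT NULL,
                        OldAccountID int NULL,    -- from the deleted virtual table
                        NewAccountID int NULL);   -- from the inserted virtual table

INSERT INTO #Account (AccountID, Balance) VALUES (1, 100.00), (2, 50.00);

MERGE INTO #Account AS a
USING (VALUES (1, 120.00), (3, 10.00)) AS src (AccountID, Balance)
   ON a.AccountID = src.AccountID
WHEN MATCHED THEN
    UPDATE SET a.Balance = src.Balance
WHEN NOT MATCHED BY TARGET THEN
    INSERT (AccountID, Balance) VALUES (src.AccountID, src.Balance)
WHEN NOT MATCHED BY SOURCE THEN
    DELETE
OUTPUT $action, deleted.AccountID, inserted.AccountID
    INTO #MergeLog (MergeAction, OldAccountID, NewAccountID);

SELECT MergeAction, OldAccountID, NewAccountID FROM #MergeLog ORDER BY MergeAction;
```

The log ends up with one UPDATE row for account 1 (both IDs populated), one INSERT row for account 3 and one DELETE row for account 2.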
Composable SQL
In SQL Server 2008 and later, it is now possible to consume the rowset returned by the OUTPUT clause
more directly. The rowset cannot be used as a general purpose table source but can be used as a table
source for an INSERT SELECT statement. Consider the following example:
INSERT INTO Audit.EmployeeDelete
SELECT Mods.EmployeeID
FROM (MERGE INTO dbo.Employee AS e
      USING dbo.EmployeeUpdate AS eu
      ON e.EmployeeID = eu.EmployeeID
      WHEN MATCHED THEN
        UPDATE SET e.FullName = eu.FullName,
                   e.EmploymentStatus = eu.EmploymentStatus
      WHEN NOT MATCHED THEN
        INSERT (EmployeeID,FullName,EmploymentStatus)
        VALUES (eu.EmployeeID,eu.FullName,eu.EmploymentStatus)
      WHEN NOT MATCHED BY SOURCE THEN
        DELETE
      OUTPUT $action AS Action,deleted.EmployeeID) AS Mods
WHERE Mods.Action = 'DELETE';
In this example, the OUTPUT clause is being used with the MERGE statement. A row would be returned for
each row inserted, updated or deleted. However, you want to audit only the deletions. You can treat the
MERGE statement with an OUTPUT clause as a table source for an INSERT SELECT statement. The enclosed
statement must be given an alias. In this case, the alias "Mods" has been assigned.
The power of being able to SELECT from a MERGE statement is that you can then apply a WHERE clause.
In this example, only the DELETE actions have been selected.
Note that from SQL Server 2008 onwards, this level of query composability also applies to the OUTPUT
clause when used in standard T-SQL INSERT, UPDATE and DELETE statements.
Question: How could the OUTPUT clause be useful in a DELETE statement?
MERGE Determinism and Performance
Key Points
The actions performed by a MERGE statement are not identical to those that would be performed by
separate INSERT, UPDATE or DELETE statements.
Determinism
When an UPDATE statement is executed with a join and more than one source row matches the same target
row, no error is thrown; the target row is updated using the values from only one of the matching source
rows, and which one is not defined. This is not permitted for an UPDATE or DELETE action performed within
a MERGE statement. Each target row can be matched by at most one source row. If more than one source
row matches a target row, an error occurs and all actions performed by the MERGE statement are rolled back.
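The following sketch shows the error in practice and one common remedy. It assumes hypothetical #Price and #PriceFeed tables in which the feed contains two rows for the same product. The first MERGE fails (error 8672) and is rolled back; the second reduces the source to one row per key with ROW_NUMBER before merging. Treating the most recent LoadedAt value as the winner is an assumed business rule, not something MERGE decides for you.

```sql
IF OBJECT_ID('tempdb..#Price') IS NOT NULL DROP TABLE #Price;
IF OBJECT_ID('tempdb..#PriceFeed') IS NOT NULL DROP TABLE #PriceFeed;
IF OBJECT_ID('tempdb..#MergeError') IS NOT NULL DROP TABLE #MergeError;

CREATE TABLE #Price      (ProductID int NOT NULL PRIMARY KEY, Price decimal(10,2) NOT NULL);
CREATE TABLE #PriceFeed  (ProductID int NOT NULL, Price decimal(10,2) NOT NULL,
                          LoadedAt datetime2 NOT NULL);   -- no key, so duplicates are possible
CREATE TABLE #MergeError (ErrorNumber int NOT NULL);

INSERT INTO #Price (ProductID, Price) VALUES (1, 5.00);
INSERT INTO #PriceFeed (ProductID, Price, LoadedAt)
VALUES (1, 6.00, '2010-01-01'),
       (1, 7.00, '2010-02-01');   -- two source rows match the same target row

-- Attempt 1: the duplicate match makes the MERGE fail and roll back.
BEGIN TRY
    MERGE INTO #Price AS t
    USING #PriceFeed AS s
       ON t.ProductID = s.ProductID
    WHEN MATCHED THEN
        UPDATE SET t.Price = s.Price;
END TRY
BEGIN CATCH
    INSERT INTO #MergeError (ErrorNumber) VALUES (ERROR_NUMBER());
END CATCH;

-- Attempt 2: keep only the latest feed row per product, then merge.
WITH LatestFeed AS
( SELECT ProductID, Price,
         ROW_NUMBER() OVER (PARTITION BY ProductID ORDER BY LoadedAt DESC) AS rn
  FROM #PriceFeed )
MERGE INTO #Price AS t
USING (SELECT ProductID, Price FROM LatestFeed WHERE rn = 1) AS s
   ON t.ProductID = s.ProductID
WHEN MATCHED THEN
    UPDATE SET t.Price = s.Price;

SELECT ErrorNumber FROM #MergeError;      -- error captured from attempt 1
SELECT ProductID, Price FROM #Price;      -- price after attempt 2
```

Deduplicating the source makes the outcome deterministic, which is the property the separate UPDATE-with-join form does not guarantee.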
Performance of MERGE
The MERGE statement will often outperform code constructed from separate INSERT, UPDATE and
DELETE statements and conditional logic. In particular, the MERGE statement can typically perform all of
its actions in a single pass through the data, rather than one pass per separate statement.