Merging Data and Passing Tables 10-1

Module 10

Merging Data and Passing Tables

Contents:

Lesson 1: Using the MERGE Statement

Lesson 2: Implementing Table Types

Lesson 3: Using TABLE Types As Parameters

Lab 10: Passing Tables and Merging Data

10-2 Implementing a Microsoft® SQL Server® 2008 R2 Database

Module Overview

Each time a client application makes a call to a SQL Server system, considerable delay is encountered at

the network layer. The basic delay is unrelated to the amount of data being passed. It relates to the

latency of the network. For this reason, it is important to minimize the number of times that a client needs

to call a server for a given amount of data that must be passed between them. Each call is termed a

"roundtrip".

In this module you will review the techniques that provide the ability to process sets of data rather than

individual rows. You will then see how these techniques can be used in combination with TABLE

parameter types to minimize the number of required stored procedure calls in typical applications.

Objectives

After completing this module, you will be able to:

- Use the MERGE statement
- Implement table types
- Use TABLE types as parameters

Lesson 1

Using the MERGE Statement

A very common requirement when coding in T-SQL is the need to update a row if it exists but to insert

the row if it does not already exist. SQL Server 2008 introduced the MERGE statement that provides this

ability plus the ability to process entire sets of data rather than processing row by row or in several

separate set-based statements. This leads to much more efficient execution and simplifies the required

coding. In this lesson, you will investigate the use of the MERGE statement and the use of the most

common options associated with the statement.

Objectives

After completing this lesson, you will be able to:

- Explain the role of the MERGE statement
- Describe how to use the WHEN MATCHED clause
- Describe how to use the WHEN NOT MATCHED BY TARGET clause
- Describe how to use the WHEN NOT MATCHED BY SOURCE clause
- Explain the role of the OUTPUT clause and $action
- Describe MERGE determinism and performance

MERGE Statement

Key Points

The MERGE statement is most commonly used to insert data that does not already exist but to update the

data if it does exist. It can operate on entire sets of data rather than just on single rows and can perform

alternate actions such as deletes.

MERGE

It is a common requirement to need to update data if it already exists but to insert it if it does not already

exist. Some other database engines (not SQL Server) provide an UPSERT statement for this purpose. The

MERGE statement provided by SQL Server is a more capable replacement for such statements in other

database engines and is based on the ANSI SQL standard together with some Microsoft extensions to the

standard.

A typical situation where the need for the MERGE statement arises is in the population of data warehouses

from data in source transactional systems. For example, consider a data warehouse holding details of a

customer. When a customer row is received from the transactional system, it needs to be inserted into the

data warehouse. When later updates to the customer are made, the data warehouse would then need to

be updated.

Atomicity

Where statements in other languages typically operate on single rows, the MERGE statement in SQL

Server can operate on entire sets of data in a single statement execution. It is important to realize that the

MERGE statement functions as an atomic operation in that all inserts, updates or deletes occur or none

occur.

Source and Target

The MERGE statement uses two table data sources. The target table is the table that is being modified and

is specified first in the MERGE statement. Any inserts, updates or deletes are applied only to the target

table.

The source table provides the rows that need to be matched to the rows in the target table. You can think

of the source table as the incoming data. It is specified in a USING clause. The source table does not have

to be an actual table but can be other types of expressions that return a table such as:

- A view
- A sub-select (or derived table) with an alias
- A common table expression (CTE)
- A VALUES clause with an alias

The source and target are matched together as the result of an ON clause. This can involve one or more

columns from both tables.
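
As a minimal sketch of how the target, the USING source, and the ON clause fit together, the following statement uses a VALUES clause with an alias as the source. The dbo.Employee table and its column names are borrowed from the examples later in this lesson. The literal values and the column data types are assumptions made for illustration only.

```sql
-- Sketch only: assumes dbo.Employee(EmployeeID, FullName, EmploymentStatus) exists
-- and that the literal values below are compatible with its column types.
MERGE INTO dbo.Employee AS e                        -- target: the only table modified
USING (VALUES (1, N'Test Employee', N'Active'))     -- source: one hypothetical incoming row
      AS src (EmployeeID, FullName, EmploymentStatus)
   ON e.EmployeeID = src.EmployeeID                 -- how source rows are matched to target rows
WHEN MATCHED THEN
    UPDATE SET FullName = src.FullName,
               EmploymentStatus = src.EmploymentStatus
WHEN NOT MATCHED BY TARGET THEN
    INSERT (EmployeeID, FullName, EmploymentStatus)
    VALUES (src.EmployeeID, src.FullName, src.EmploymentStatus);
```

The WHEN clauses used here are covered individually in the topics that follow.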

WHEN MATCHED

Key Points

The WHEN MATCHED clause defines the action to be taken when a row in the source is matched to a row

in the target.

WHEN MATCHED

The ON clause is used to match source rows to target rows. The WHEN MATCHED clause specifies the

action that needs to occur when a source row matches a target row. In most cases, this will involve an

UPDATE action but it could alternatively involve a DELETE action.

In the example shown in the slide, rows in the EmployeeUpdate table are being matched to rows in the

Employee table based upon the EmployeeID. When a source row matches a target row, the FullName and

EmploymentStatus columns in the target table are updated with the values of those columns in the

source.

Note that only the target table can be updated. If an attempt is made to modify any other table, a syntax

error is returned.
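
The slide is not reproduced in this transcript. Based on the description above, and on the table and column names used elsewhere in this lesson, the example probably resembles the following sketch. Treat it as a reconstruction rather than the exact slide content.

```sql
-- Reconstruction from the text: dbo.EmployeeUpdate is the source, dbo.Employee is the target.
MERGE INTO dbo.Employee AS e
USING dbo.EmployeeUpdate AS eu
   ON e.EmployeeID = eu.EmployeeID
WHEN MATCHED THEN
    -- Copy the incoming values over the existing target row.
    UPDATE SET e.FullName = eu.FullName,
               e.EmploymentStatus = eu.EmploymentStatus;
```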

Multiple Clauses

It is also possible to include two WHEN MATCHED clauses such as shown in the following code block:

WHEN MATCHED AND s.Quantity > 0 THEN
    ...
WHEN MATCHED THEN
    ...

No more than two WHEN MATCHED clauses can be present. When two clauses are used, the first clause

must have an AND condition. If the source row matches the target and also satisfies the AND condition,

then the action specified in the first WHEN MATCHED clause is performed. Otherwise, if the source row

matches the target but does not satisfy the AND condition, the second WHEN MATCHED clause is

applied instead.

When two WHEN MATCHED clauses are present, one action must specify an UPDATE and the other action

must specify a DELETE.
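
To make the two-clause rule concrete, here is a short sketch. The dbo.Stock and dbo.StockUpdate tables are hypothetical and are assumed to each have ProductID and Quantity columns. Only the s.Quantity > 0 condition comes from the fragment above.

```sql
-- Sketch only: dbo.Stock and dbo.StockUpdate are hypothetical tables.
MERGE INTO dbo.Stock AS t
USING dbo.StockUpdate AS s
   ON t.ProductID = s.ProductID
WHEN MATCHED AND s.Quantity > 0 THEN
    UPDATE SET t.Quantity = s.Quantity   -- first clause: must carry the AND condition
WHEN MATCHED THEN
    DELETE;                              -- second clause: matched rows that failed the condition
```

With this arrangement, a matched product with a positive incoming quantity is updated, and any other matched product is removed from the target.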

Question: What is different about the UPDATE statement in the example shown, compared to a normal

UPDATE statement?

WHEN NOT MATCHED BY TARGET

Key Points

The WHEN NOT MATCHED BY TARGET clause specifies the action that needs to be taken when a row in

the source cannot be matched to a row in the target.

WHEN NOT MATCHED

The next clause in the MERGE statement that you will consider is the WHEN NOT MATCHED BY TARGET

clause. It was mentioned in the last topic that the most common action performed by a WHEN

MATCHED clause is to update the existing row in the target table. The most common action performed by

a WHEN NOT MATCHED BY TARGET clause is to insert a new row into the target table.

In the example shown in the slide, when a row from the EmployeeUpdate table cannot be found in the

Employee table, a new employee row would be added into the Employee table.

With a standard INSERT statement in T-SQL, the inclusion of a column list is considered a best practice

and avoids issues related to changes to the underlying table such as the reordering of columns or the

addition of new columns. The same recommendation applies to an INSERT action within a MERGE

statement. While a column list is optional, best practice suggests including one.

Syntax

The words BY TARGET are optional and are often omitted. The clause is then just written as WHEN NOT

MATCHED. Note again that no table name is included in the action statement (INSERT statement) as

modifications may only be made to the target table.

The WHEN NOT MATCHED BY TARGET clause is part of the ANSI SQL standard.
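
The slide is not reproduced in this transcript. A sketch consistent with the description above, using the shortened WHEN NOT MATCHED form and an explicit column list, might look like this:

```sql
-- Reconstruction from the text: insert employees that exist only in the source.
MERGE INTO dbo.Employee AS e
USING dbo.EmployeeUpdate AS eu
   ON e.EmployeeID = eu.EmployeeID
WHEN NOT MATCHED THEN                                   -- BY TARGET is implied
    INSERT (EmployeeID, FullName, EmploymentStatus)     -- explicit column list
    VALUES (eu.EmployeeID, eu.FullName, eu.EmploymentStatus);
```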

WHEN NOT MATCHED BY SOURCE

Key Points

The WHEN NOT MATCHED BY SOURCE clause is used to specify an action to be taken for rows in the

target that were not matched by rows from the source.

WHEN NOT MATCHED BY SOURCE

While much less commonly used than the clauses discussed in the previous topics, you can also take an

action for rows in the target that did not match any incoming rows from the source.

Generally, this will involve deleting the unmatched rows in the target table but UPDATE actions are also

permitted.

Note the format of the DELETE statement in the example on the slide. At first glance, it might seem quite

odd as it has no table or predicate specified. In this example, all rows in the Employee table that were not

matched by an incoming source row from the EmployeeUpdate table would be deleted.
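
Since the slide is not reproduced here, the following sketch shows what the described statement would look like, using the same tables as the earlier examples:

```sql
-- Reconstruction from the text: remove target rows that have no counterpart in the source.
MERGE INTO dbo.Employee AS e
USING dbo.EmployeeUpdate AS eu
   ON e.EmployeeID = eu.EmployeeID
WHEN NOT MATCHED BY SOURCE THEN
    DELETE;   -- no table name or WHERE clause: applies to each unmatched target row
```

Because this form deletes every target row that the source does not mention, it is only appropriate when the source represents the complete set of rows that should remain.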

Question: What would the DELETE statement look like if it only deleted rows where the date in a column

called LastModified were older than a year?

OUTPUT Clause and $action

Key Points

The OUTPUT clause was added in SQL Server 2005 and allows the return of a set of rows when performing

data modifications. In 2005, this applied to INSERT, DELETE and UPDATE. In SQL Server 2008 and later,

this clause can also be used with the MERGE statement.

OUTPUT Clause

The OUTPUT clause was a useful addition to the INSERT, UPDATE and DELETE statements in SQL Server

2005. For example, consider the following code:

DELETE FROM HumanResources.Employee
OUTPUT deleted.BusinessEntityID, deleted.NationalIDNumber
WHERE ModifiedDate < DATEADD(YEAR,-10,SYSDATETIME());

In this example, employees are deleted when their rows have not been modified within the last ten years.

As part of this modification, a set of rows is returned that provides details of the BusinessEntityID and

NationalIDNumber for each row deleted.

As well as returning rows to the client application, the OUTPUT clause can include an INTO sub-clause

that causes the rows to be inserted into another existing table instead. Consider the following example:

DELETE FROM HumanResources.Employee
OUTPUT deleted.BusinessEntityID, deleted.NationalIDNumber
    INTO Audit.EmployeeDelete
WHERE ModifiedDate < DATEADD(YEAR,-10,SYSDATETIME());

In this example, details of the employees being deleted are inserted into the Audit.EmployeeDelete table

instead of being returned to the client.

OUTPUT and MERGE

The OUTPUT clause can also be used with the MERGE statement. When an INSERT is performed, rows can

be returned from the inserted virtual table. When a DELETE is performed, rows can be returned from the

deleted virtual table. When an UPDATE is performed, values will be available in both the inserted and

deleted virtual tables.

Because a single MERGE statement can perform INSERT, UPDATE and DELETE actions, it can be useful to

know which action was performed for each row returned by the OUTPUT clause. To make this possible,

the OUTPUT clause also supports a $action virtual column that returns details of the action performed on

each row. It returns the words "INSERT", "UPDATE" or "DELETE".
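
As an illustration, the following sketch returns one row per modified target row, together with the action that was taken. The column aliases (MergeAction, NewEmployeeID, OldEmployeeID) are names chosen for this sketch.

```sql
-- Sketch only: report the action taken for every row touched by the MERGE.
MERGE INTO dbo.Employee AS e
USING dbo.EmployeeUpdate AS eu
   ON e.EmployeeID = eu.EmployeeID
WHEN MATCHED THEN
    UPDATE SET e.FullName = eu.FullName,
               e.EmploymentStatus = eu.EmploymentStatus
WHEN NOT MATCHED BY TARGET THEN
    INSERT (EmployeeID, FullName, EmploymentStatus)
    VALUES (eu.EmployeeID, eu.FullName, eu.EmploymentStatus)
OUTPUT $action AS MergeAction,
       inserted.EmployeeID AS NewEmployeeID,   -- populated for INSERT and UPDATE actions
       deleted.EmployeeID  AS OldEmployeeID;   -- populated for UPDATE actions, NULL for INSERT
```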

Composable SQL

In SQL Server 2008 and later, it is now possible to consume the rowset returned by the OUTPUT clause

more directly. The rowset cannot be used as a general purpose table source but can be used as a table

source for an INSERT SELECT statement. Consider the following example:

INSERT INTO Audit.EmployeeDelete
SELECT Mods.EmployeeID
FROM (MERGE INTO dbo.Employee AS e
      USING dbo.EmployeeUpdate AS eu
         ON e.EmployeeID = eu.EmployeeID
      WHEN MATCHED THEN
          UPDATE SET e.FullName = eu.FullName,
                     e.EmploymentStatus = eu.EmploymentStatus
      WHEN NOT MATCHED THEN
          INSERT (EmployeeID,FullName,EmploymentStatus)
          VALUES (eu.EmployeeID,eu.FullName,eu.EmploymentStatus)
      WHEN NOT MATCHED BY SOURCE THEN
          DELETE  -- target rows with no matching source row are removed
      OUTPUT $action AS Action,deleted.EmployeeID) AS Mods
WHERE Mods.Action = 'DELETE';

In this example, the OUTPUT clause is being used with the MERGE statement. A row would be returned for

each row inserted, updated or deleted. However, you wish to only audit the deletions. You can treat the

MERGE statement with an OUTPUT clause as a table source for an INSERT SELECT statement. The enclosed

statement must be given an alias. In this case, the alias "Mods" has been assigned.

The power of being able to SELECT from a MERGE statement is that you can then apply a WHERE clause.

In this example, only the DELETE actions have been selected.

Note that from SQL Server 2008 onwards, this level of query composability also applies to the OUTPUT

clause when used in standard T-SQL INSERT, UPDATE and DELETE statements.

Question: How could the OUTPUT clause be useful in a DELETE statement?

MERGE Determinism and Performance

Key Points

The actions performed by a MERGE statement are not identical to those that would be performed by

separate INSERT, UPDATE or DELETE statements.

Determinism

When an UPDATE statement is executed with a join, if more than one source row matches a target row,

no error is thrown. This is not permitted for an UPDATE or DELETE action performed within a MERGE statement. Each

target row must be matched by at most one source row. If more than a single source row matches a

target row, an error occurs and all actions performed by the MERGE statement are rolled back.
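
One way to avoid this error is to reduce the source to a single row per key before merging. The sketch below does so with a common table expression and ROW_NUMBER. It assumes that dbo.EmployeeUpdate has a ModifiedDate column identifying the most recent incoming row, which is a hypothetical column not mentioned in this lesson.

```sql
-- Sketch only: ModifiedDate on dbo.EmployeeUpdate is a hypothetical column.
WITH LatestUpdate AS (
    SELECT EmployeeID, FullName, EmploymentStatus,
           ROW_NUMBER() OVER (PARTITION BY EmployeeID
                              ORDER BY ModifiedDate DESC) AS RowNum
    FROM dbo.EmployeeUpdate
)
MERGE INTO dbo.Employee AS e
USING (SELECT EmployeeID, FullName, EmploymentStatus
       FROM LatestUpdate
       WHERE RowNum = 1) AS eu            -- at most one source row per EmployeeID
   ON e.EmployeeID = eu.EmployeeID
WHEN MATCHED THEN
    UPDATE SET e.FullName = eu.FullName,
               e.EmploymentStatus = eu.EmploymentStatus;
```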

Performance of MERGE

The MERGE statement will often outperform code constructed from separate INSERT, UPDATE and

DELETE statements and conditional logic. In particular, a MERGE statement processes the source and

target data once, whereas separate statements must each process that data again.
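
For comparison, the following sketch shows the kind of separate-statement code that an update-or-insert MERGE replaces. It uses the same dbo.Employee and dbo.EmployeeUpdate tables as the earlier examples and wraps both statements in a transaction to approximate the all-or-nothing behavior of MERGE.

```sql
-- Sketch only: separate-statement equivalent of an update-or-insert MERGE.
BEGIN TRANSACTION;

-- First statement: update target rows that have a matching source row.
UPDATE e
SET    e.FullName = eu.FullName,
       e.EmploymentStatus = eu.EmploymentStatus
FROM   dbo.Employee AS e
JOIN   dbo.EmployeeUpdate AS eu
  ON   e.EmployeeID = eu.EmployeeID;

-- Second statement: insert source rows that have no matching target row.
INSERT INTO dbo.Employee (EmployeeID, FullName, EmploymentStatus)
SELECT eu.EmployeeID, eu.FullName, eu.EmploymentStatus
FROM   dbo.EmployeeUpdate AS eu
WHERE  NOT EXISTS (SELECT 1
                   FROM dbo.Employee AS e
                   WHERE e.EmployeeID = eu.EmployeeID);

COMMIT TRANSACTION;
```

Each statement here joins or probes the two tables independently, which is the repeated work that a single MERGE avoids.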

