chapter 17 data warehousing, archival, and repositories

35
Chapter 17 Data Warehousing, Archival, and Repositories 데데데데데데데데데 데데 3 데데 데데데

Upload: griffith-ward

Post on 01-Jan-2016

39 views

Category:

Documents


9 download

DESCRIPTION

Chapter 17 Data Warehousing, Archival, and Repositories. 데이터베이스연구실 석사 3 학기 박보영. Introduction. Common applications of XML in the enterprise Large-scale transaction system Content management system Persistence of information using XML - PowerPoint PPT Presentation

TRANSCRIPT

Chapter 17Data Warehousing, Archival, and Repositories

데이터베이스연구실석사 3 학기

박보영

2

Introduction

Common applications of XML in the enterprise

Large-scale transaction system

Content management system

Persistence of information using XML

Traditional strategies for these processes

How adding XML to the mix

XML improve

3

In this chapter Data warehousing

How data warehousing work How XML may be used to streamline the transaction and warehousing process

Data archival For most system

Data loses its relevance over time Moved off the system when some threshold is reached

Typically want to set this data aside in some form so that, should we need to refer to it later

Can retrieve the data with the minimum of effort How XML may be used to make this process easier

Data repositories How XML may be used to persist detail-only information in a way that makes the

relational database perform better, while taking advantage of XML technologies(such as XSLT) to present the detail information when necessary

“All the examples in this chapter are designed to work with SQL Server 6.x+, ADO 2.5+, VBScript(IE 4.0+)”

4

Data Warehousing

One of the problems often faced by data architects is to design a system

Detail information or transactionsquick retrieval

Query and summarize that dataeasily and efficiently

In this section Define data warehousing

Discuss the concepts that drive data warehouse design

Take a look XML can facilitate the data warehousing process

Look at examples of XML in use in a data warehouse

5

The Two Roles of Data

In an enterprise-level data solution, the data in the database plays two roles:

• Information gathering • Querying and summarization of data

Detailed information gathering- The first use of the database is to gather data from external sources(such as other databases, X

ML, or simple delimited text files)

- There are some things that remain consistent across all implementations that gather data:

Detail oriented

Write-heavy

Transacted

Space-conscious

Heavily normalized

6

Infromation Querying and Summarization

– The other use of database is to provide the ability to query the data and

summarize it to extrapolate trends, volumes, and other useful information from

the details.

– The specific mechanisms will vary from implementation to implementation, but

there are some constants:

Summary-oriented

Read-only

Results-conscious

Less normalized

7

The Traditional Solution- The traditional approach to designing a relational database to support a platform is to design one

database to perform both data acquisition function and querying/summarization functions

CREATE TABLE Customer ( CustomerKey integer PRIMARY KEY, Name varchar(50), Address varchar(50), City varchar(30), State char(2), PostalCode varchar(10))

CREATE TABLE shipMethod ( shipMethodKey integer PRIMARY KEY, shipMethod varchar(5))

INSERT shipMethod (shipMethodKey, shipMethod) VALUES (1, 'FedEx')INSERT shipMethod (shipMethodKey, shipMethod) VALUES (2, 'USPS')INSERT shipMethod (shipMethodKey, shipMethod) VALUES (3, 'UPS')

CREATE TABLE Invoice ( InvoiceKey integer PRIMARY KEY, invoiceDate datetime, shipDate datetime, shipMethodKey integer CONSTRAINT FK_Invoice_shipMethodKey FOREIGN KEY (shipMethodKey) REFERENCES shipMethod (shipMethodKey), CustomerKey integer CONSTRAINT FK_Invoice_Customer FOREIGN KEY (CustomerKey) REFERENCES Customer (CustomerKey))

CREATE INDEX ix_Invoice_invoiceDate ON Invoice (invoiceDate)CREATE INDEX ix_Invoice_shipDate ON Invoice (shipDate)CREATE INDEX ix_Invoice_CustomerKey ON Invoice (CustomerKey)CREATE TABLE Part ( PartKey integer PRIMARY KEY, name varchar(20), size varchar(10) NULL, color varchar(10) NULL)

CREATE TABLE LineItem ( LineItemKey integer PRIMARY KEY, InvoiceKey integer CONSTRAINT FK_LineItem_Invoice FOREIGN KEY (InvoiceKey) REFERENCES Invoice (InvoiceKey), PartKey integer CONSTRAINT FK_LineItem_Part FOREIGN KEY (PartKey) REFERENCES Part (PartKey), Quantity integer, Price float)

CREATE INDEX ix_LineItem_PartKey ON LineItem (PartKey)CREATE INDEX ix_LineItem_InvoiceKey ON LineItem (InvoiceKey)

(ch17_ex01.sql)

8

- When run, this script creates the following table structure:

Customer

Part

LineItem

Invoice

ShipMethod

This set of tables supports the addition of invoice

Space is conserved for these tables where possible by normalizing out Part and Customer

This normalization also facilitates querying and reporting on Part and Customer

however, there are a couple of issues with using this type of dual-purpose structure

the system is well tuned to either data entry or reporting

serious danger of wait states for reporting and querying due to the locking being performed by the data entry

9

Many databases try to get around the problem by adding structures that will be used to drive reporting

In this table

- separate from the transaction tables

- have all the appropriate indices to allow data to be queried and summarized from those tables in a

ny way the business rules might call for

for example, add a MonthlyPartTotal table (ch17_ex02.sql):

this is a step on the right direction

another problems

CREATE TABLE MonthlyPartTotal (

summaryMonth tinyint,

summaryYear smallint,

PartKey integer

CONSTRAINT FK_MPT_PartKey FOREIGN KEY (PartKey)

REFERENCES Part (PartKey),

Quantity integer)

10

CREATE TRIGGER UpdateMPT

ON LineItem

FOR INSERT, UPDATE

AS

BEGIN

IF (SELECT COUNT(*)

FROM MonthlyPartTotal, inserted, Invoice

WHERE summaryMonth = DATEPART(mm, Invoice.invoiceDate)

AND summaryYear = DATEPART(yyyy, Invoice.invoiceDate)

AND inserted.InvoiceKey = Invoice.InvoiceKey

AND MonthlyPartTotal.PartKey = inserted.PartKey) > 0

UPDATE MonthlyPartTotal

SET Quantity = MonthlyPartTotal.Quantity + inserted.Quantity

FROM inserted, Invoice

WHERE summaryMonth = DATEPART(mm, Invoice.invoiceDate)

AND summaryYear = DATEPART(yyyy, Invoice.invoiceDate)

AND inserted.InvoiceKey = Invoice.InvoiceKey

AND MonthlyPartTotal.PartKey = inserted.PartKey

ELSE

INSERT MonthlyPartTotal (summaryMonth, summaryYear, PartKey, Quantity)

SELECT DATEPART(mm, Invoice.invoiceDate),

DATEPART(yyyy, Invoice.invoiceDate),

inserted.PartKey,

inserted.Quantity

FROM inserted, Invoice

WHERE inserted.InvoiceKey = Invoice.InvoiceKey

IF (SELECT COUNT(*) FROM deleted) > 0

UPDATE MonthlyPartTotal

SET Quantity = MonthlyPartTotal.Quantity - deleted.Quantity

FROM deleted, Invoice

WHERE summaryMonth = DATEPART(mm, Invoice.invoiceDate)

AND summaryYear = DATEPART(yyyy, Invoice.invoiceDate)

AND deleted.InvoiceKey = Invoice.InvoiceKey

AND MonthlyPartTotal.PartKey = deleted.PartKey

END

(Ch17_03.sql)

11

The Data Warehousing Solution

On-Line Transaction Processing(OLTP) databases – the

information gatherers

On Line Analytical Processing(OLAP) databases – the query and

summarization handlers

Parts that make up an OLAP database

The role of XML in improving the function of OLAP databases

12

On-Line Transaction Processing(OLTP)

The gathering of detail information is often referred to as OLTP

Handle all data gathering processes

Design tables to support the acquisition of transactional data

The database is an normalized ad possible, with the spedific table or ta

bles being kept as small as possible to reduce insert time and disk cons

umption

Data archival strategies - In place to make OLTP database does not grow t

oo large

13

Customer

Part

LineItem

Invoice

ShipMethod

CREATE TABLE Customer ( CustomerKey integer PRIMARY KEY, Name varchar(50), Address varchar(50), City varchar(30), State char(2), PostalCode varchar(10))

CREATE TABLE shipMethod ( shipMethodKey integer PRIMARY KEY, shipMethod varchar(5))

INSERT shipMethod (shipMethodKey, shipMethod) VALUES (1, 'FedEx')INSERT shipMethod (shipMethodKey, shipMethod) VALUES (2, 'USPS')INSERT shipMethod (shipMethodKey, shipMethod) VALUES (3, 'UPS')

CREATE TABLE Invoice ( InvoiceKey integer PRIMARY KEY, invoiceDate datetime, shipDate datetime, shipMethodKey integer

CONSTRAINT FK_Invoice_shipMethodKey FOREIGN KEY (shipMethodKey) REFERENCES shipMethod (shipMethodKey), CustomerKey integer CONSTRAINT FK_Invoice_Customer FOREIGN KEY (CustomerKey) REFERENCES Customer (CustomerKey))

CREATE TABLE Part ( PartKey integer PRIMARY KEY, name varchar(20), size varchar(10) NULL, color varchar(10) NULL)

CREATE TABLE LineItem ( LineItemKey integer PRIMARY KEY, InvoiceKey integer CONSTRAINT FK_LineItem_Invoice FOREIGN KEY (InvoiceKey) REFERENCES Invoice (InvoiceKey), PartKey integer CONSTRAINT FK_LineItem_Part FOREIGN KEY (PartKey) REFERENCES Part (PartKey), Quantity integer, Price float)

OLTP database structure (ch17_ex04.sql):

14

On-Line Analytical processing(OLAP)

Database intended to support querying and summarization

Designed with one goal in mind – the querying and summarization of

detail data by any number of specifics defined in the business rules for

our system

Create our tables to support the querying and retrieval of this

information, with the ability to rapidly insert information(lower priority)

Indexing technologies designed specifically for OLAP querying

optimize(query and report)

15

Parts of an OLAP Database Fact Tables-where the information we wish to report on is stored

Measure Tables-where you store the measures used to do the reporting

Schema-where the two types of table above interact to give you your reports

Fact Tables

Contain the data that we are planning to report on, at the lowest level of granul

arity we will need to access

Denormalizationall facts together into oe table

CREATE TABLE factInvoicePart (

InvoiceKey integer PRIMARY KEY IDENTITY,

CustomerKey integer,

ShipDateKey integer,

ShipMethodKey integer,

PartKey integer,

Quantity integer,

Price float)

CREATE TABLE factDailyTotal (

DailyTotalKey integer PRIMARY KEY IDENTITY,

InvoiceDate integer,

partKey integer,

partCount integer,

partUnitPrice float)

Fact table creation script for the inventory control team(ch17-ex05.sql)

Fact table creation script for the executive team(ch17-ex06.sql)

16

Measure/Dimension Tables

Parameters(use in the WHERE clause in query)

CREATE TABLE measureCustomer (

CustomerKey integer PRIMARY KEY,

Name varchar(50),

Address varchar(50),

City varchar(30),

State char(2),

PostalCode varchar(10))

CREATE TABLE measureShipDate (

ShipDateKey integer PRIMARY KEY,

Month tinyint,

Day tinyint,

Year smallint)

CREATE TABLE measureShipMethod (

shipMethodKey integer PRIMARY KEY,

shipMethod varchar(5))

CREATE TABLE measurePart (

PartKey integer PRIMARY KEY,

name varchar(20),

size varchar(10) NULL,

color varchar(10) NULL)

(Ch17_ex07.sql)

17

Schema

Composed of the tables in our database

Joined together by the foreign keys that relate the individual tables

together

Two types of schema that are normally used when designing an OLAP

database:

Star schema

Snowflake schema

18

CONSTRAINT fk_fact_ShipMethod FOREIGN KEY (ShipMethodKey) REFERENCES measureShipMethod (shipMethodKey), PartKey integer, CONSTRAINT fk_fact_Part FOREIGN KEY (PartKey) REFERENCES measurePart (PartKey), Quantity integer, Price float)

CREATE INDEX ix_fact_Customer ON factInvoicePart (CustomerKey)CREATE INDEX ix_fact_ShipDate ON factInvoicePart (ShipDateKey)CREATE INDEX ix_fact_ShipMethod ON factInvoicePart (shipMethodKey)CREATE INDEX ix_fact_Part ON factInvoicePart (PartKey)

Star schema(ch17_ex08.sql)CREATE TABLE measureCustomer ( CustomerKey integer PRIMARY KEY, Name varchar(50), Address varchar(50), City varchar(30), State char(2), PostalCode varchar(10))

CREATE TABLE measureShipDate ( ShipDateKey integer PRIMARY KEY, ShipMonth tinyint, ShipDay tinyint, ShipYear smallint) CREATE TABLE measureShipMethod ( shipMethodKey integer PRIMARY KEY, shipMethod varchar(5))

CREATE TABLE measurePart ( PartKey integer PRIMARY KEY, name varchar(20), size varchar(10) NULL, color varchar(10) NULL)

CREATE TABLE factInvoicePart ( InvoiceKey integer PRIMARY KEY IDENTITY, CustomerKey integer CONSTRAINT fk_fact_Customer FOREIGN KEY (CustomerKey) REFERENCES measureCustomer (CustomerKey), ShipDateKey integer CONSTRAINT fk_fact_ShipDate FOREIGN KEY (ShipDateKey) REFERENCES measureShipDate (ShipDateKey), ShipMethodKey integer,

measureCustomer

factInvoicePart

measureShipDate

measureShipMethod

measurePart

This script creates the structure seen below:

19

Snowflake schema(ch17_ex09.sql)CREATE TABLE measureCustomer ( CustomerKey integer PRIMARY KEY, Name varchar(50), Address varchar(50), City varchar(30), State char(2), PostalCode varchar(10))

CREATE TABLE measureShipYear ( ShipYearKey integer PRIMARY KEY, ShipYear smallint)

CREATE TABLE measureShipMonth ( ShipMonthKey integer PRIMARY KEY, ShipMonth tinyint, ShipYearKey integer CONSTRAINT fk_measure_ShipYear FOREIGN KEY (ShipYearKey) REFERENCES measureShipYear (ShipYearKey))

CREATE TABLE measureShipDate ( ShipDateKey integer PRIMARY KEY, ShipDay tinyint, ShipMonthKey integer CONSTRAINT fk_measure_ShipMonth FOREIGN KEY (ShipMonthKey) REFERENCES measureShipMonth (ShipMonthKey))

CREATE TABLE measureShipMethod ( shipMethodKey integer PRIMARY KEY, shipMethod varchar(5))

CREATE TABLE factInvoicePart ( InvoiceKey integer PRIMARY KEY IDENTITY, CustomerKey integer CONSTRAINT fk_fact_Customer FOREIGN KEY (CustomerKey) REFERENCES measureCustomer (CustomerKey), ShipDateKey integer CONSTRAINT fk_fact_ShipDate FOREIGN KEY (ShipDateKey) REFERENCES measureShipDate (ShipDateKey), ShipMethodKey integer, CONSTRAINT fk_fact_ShipMethod FOREIGN KEY (ShipMethodKey) REFERENCES measureShipMethod (shipMethodKey), PartKey integer, CONSTRAINT fk_fact_Part FOREIGN KEY (PartKey) REFERENCES measurePart (PartKey), Quantity integer, Price float)

CREATE INDEX ix_fact_Customer ON factInvoicePart (CustomerKey)CREATE INDEX ix_fact_ShipDate ON factInvoicePart (ShipDateKey)CREATE INDEX ix_fact_ShipMethod ON factInvoicePart (shipMethodKey)CREATE INDEX ix_fact_Part ON factInvoicePart (PartKey)CREATE INDEX ix_measure_Month ON measureShipDate (shipMonthKey)CREATE INDEX ix_measure_Year ON measureShipMonth (shipYearKey)

CREATE TABLE measurePart ( PartKey integer PRIMARY KEY, name varchar(20), size varchar(10) NULL, color varchar(10) NULL)

20

measureCustomer

factInvoicePart

measureShipDate

measureShipMethod

measurePart

measureShipMonth

measureShipYear

This SQL script creates the following set of tables:

21

Cubes• To better facilitate the query and retrieval of OLAP data – each star schema or snowflake

schema creates a cube(the detail information is aggregated along each of the measures)

• Example-star schema

measure Customer

factInvoicePart

measureShipDate

measureShipMethod

measurePart

FedEx UPS USPS

ALL invoices 72,102 15,209 12,689

FedEx UPS USPS

Customer 1

22,615 5,192 3,972

Customer 2

12,541 2,188 918

Customer 3

7,342 3,182 1,212

Customer 4

15,320 716 3,132

Customer 5

14,284 3,931 3,455

factInvoicePart table-100,000 invoice(shipping method)

Add another dimension-customer

22

Building your Fact Tables

Two step:

Pulling the data out of the OLTP database

updating the OLAP database with that data

In order to perform rapidly update

Native export forms are often used (such as BCP files for SQL

Server)

Loaded into the OLAP database

Some problems

The most notable of which is the relative fragility of the export

and import code

If the OLTP or OLAP database change, this code will need to be

revisited

23

The Role of XML

Make it easier to update the OLAP database from the OLTP database

Effective way to handle OLTP data

Using XML for OLAP Update Data

Using XML for OLTP Data

Using XML for OLAP Update Data- rather than building specialized import routines for all the different

formats, if you build one XML importer and than building leverage

the data provider’s ability to export in XML

- Can save a lot of aggravation and coding time

24

<?xml version="1.0"?><!DOCTYPE listing SYSTEM "ch27_ex09.dtd" >

<InvoiceBulk> <Invoice CustomerKey="17" ShipDate="10/17/2000" ShipMethodKey="1"> <Part PartKey="33" Quantity="17" Price="0.20" /> <Part PartKey="29" Quantity="13" Price="0.15" /> </Invoice> <Invoice CustomerKey="12" ShipDate="10/11/2000" ShipMethodKey="2"> <Part PartKey="31" Quantity="19" Price="0.10" /> <Part PartKey="29" Quantity="17" Price="0.15" /> </Invoice></InvoiceBulk>

<!ELEMENT InvoiceBulk (Invoice+)>

<!ELEMENT Invoice (LineItem+)>

<!ATTLIST Invoice

CustomerKey CDATA #REQUIRED

ShipDate CDATA #REQUIRED

ShipMethodKey CDATA #REQUIRED>

<!ELEMENT Part EMPTY>

<!ATTLIST Part

PartKey CDATA #REQUIRED

Quantity CDATA #REQUIRED

Price CDATA #REQUIRED>

(Ch17_ex09.dtd) (Ch17_ex09.xml)

25

Function using XSLT(ch17_ex10.xsl):

<?xml version="1.0"?>

<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">

<xsl:output method="text" />

<xsl:template match="/">

<xsl:for-each select="InvoiceBulk/Invoice/Part">

<xsl:text>INSERT factInvoicePart (

CustomerKey,

ShipDateKey,

ShipMethodKey,

PartKey,

Quantity,

Price)

SELECT </xsl:text>

<xsl:value-of select="../@CustomerKey" /><xsl:text>,

</xsl:text>

<xsl:text>measureShipDate.ShipDateKey,

</xsl:text>

<xsl:value-of select="../@ShipMethodKey" /><xsl:text>,

</xsl:text>

<xsl:value-of select="@PartKey" /><xsl:text>,

</xsl:text>

<xsl:value-of select="@Quantity" /><xsl:text>,

</xsl:text>

<xsl:value-of select="@Price" />

<xsl:text>

FROM measureShipDate</xsl:text>

<xsl:text>

WHERE DATEPART(mm, '</xsl:text>

<xsl:value-of select="../@ShipDate" />

<xsl:text>')=measureShipDate.shipMonth

AND DATEPART(dd, '</xsl:text>

<xsl:value-of select="../@ShipDate" />

<xsl:text>')=measureShipDate.shipDay

AND DATEPART(yyyy, '</xsl:text>

<xsl:value-of select="../@ShipDate" />

<xsl:text>')=measureShipDate.shipYear</xsl:text>

<xsl:text>&#x0d;&#x0a;GO&#x0d;&#x0a;&#x0d;&#x0a;</xsl:text>

</xsl:for-each>

</xsl:template>

</xsl:stylesheet>

26

ch17_ex10.xsl -> Ch17_ex09.xml => ch17_10a.sql * 참고 < http://www.jclark.com/xml/xt.html >

INSERT factInvoicePart ( CustomerKey, ShipDateKey, ShipMethodKey, PartKey, Quantity, Price)

SELECT 17, measureShipDate.ShipDateKey, 1, 33, 17, 0.20 FROM measureShipDate WHERE DATEPART(mm, '10/17/2000')=measureShipDate.shipMonth AND DATEPART(dd, '10/17/2000')=measureShipDate.shipDay AND DATEPART(yyyy, '10/17/2000')=measureShipDate.shipYearGO

INSERT factInvoicePart ( CustomerKey, ShipDateKey, ShipMethodKey, PartKey, Quantity, Price)

SELECT 17, measureShipDate.ShipDateKey, 1, 29, 13, 0.15 FROM measureShipDate WHERE DATEPART(mm, '10/17/2000')=measureShipDate.shipMonth AND DATEPART(dd, '10/17/2000')=measureShipDate.shipDay AND DATEPART(yyyy, '10/17/2000')=measureShipDate.shipYearGO

INSERT factInvoicePart ( CustomerKey, ShipDateKey, ShipMethodKey, PartKey, Quantity, Price)

SELECT 12, measureShipDate.ShipDateKey, 2, 31, 19, 0.10 FROM measureShipDate WHERE DATEPART(mm, '10/11/2000')=measureShipDate.shipMonth AND DATEPART(dd, '10/11/2000')=measureShipDate.shipDay AND DATEPART(yyyy, '10/11/2000')=measureShipDate.shipYearGO

INSERT factInvoicePart ( CustomerKey, ShipDateKey, ShipMethodKey, PartKey, Quantity, Price)

SELECT 12, measureShipDate.ShipDateKey, 2, 29, 17, 0.15 FROM measureShipDate WHERE DATEPART(mm, '10/11/2000')=measureShipDate.shipMonth AND DATEPART(dd, '10/11/2000')=measureShipDate.shipDay AND DATEPART(yyyy, '10/11/2000')=measureShipDate.shipYearGO

27

(Ch17_ex10b.sql):

INSERT measureCustomer (

CustomerKey,

[Name],

Address,

City,

State,

PostalCode)

VALUES (

12,

'Homer J. Simpson',

'742 Evergreen Terrace',

'Springfield',

'KY',

'12345')

GO

INSERT measureCustomer (

CustomerKey,

[Name],

Address,

City,

State,

PostalCode)

VALUES (

17,

'Kevin B. Williams',

'744 Evergreen Terrace',

'Springfield',

'KY',

'12345')

GO

INSERT measureShipMethod (

shipMethodKey,

shipMethod)

VALUES (

1,

'Fedex')

GO

INSERT measureShipMethod (

shipMethodKey,

shipMethod)

VALUES (

2,

'USPS')

GO

INSERT measureShipMethod (

shipMethodKey,

shipMethod)

VALUES (

3,

'UPS')

GO

INSERT measurePart (

PartKey,

[name],

[size],

color)

VALUES (

31,

'grommets',

'3 in.',

'blue')

GO

INSERT measurePart (

PartKey,

[name],

[size],

color)

VALUES (

29,

'sprockets',

'2 in.',

'silver')

GO

INSERT measurePart (

PartKey,

[name],

[size],

color)

VALUES (

33,

'brackets',

'1 in.',

'red')

GO

INSERT measureShipDate (

ShipDateKey,

ShipMonth,

ShipDay,

ShipYear)

VALUES (

1,

10,

11,

2000

)

GO

INSERT measureShipDate (

ShipDateKey,

ShipMonth,

ShipDay,

ShipYear)

VALUES (

2,

10,

17,

2000

)

GO

28

Result table structure Run ch17_ex08.sql against a sample database to create the tables in the firdt place Transformed ch17_ex10.xml with ch17_ex10.xsl to produce the output script ch17_ex1

0a.sql Run ch17_ex10b.sql to populate tables with initial data Run ch17_ex10a.sql to add data to populate table factInvoicePart with data

measure Customer

factInvoicePart

measureShipDate

measureShipMethod

measurePart

29

Using XML for OLAP Data

OLTP systems typically access one discrete transaction at a time

Locking can be avoid

XML technology can be easily leveraged

30

RDBMS structure (ch17_ex11.sql):

CREATE TABLE Customer (

CustomerKey integer PRIMARY KEY,

Name varchar(50),

Address varchar(50),

City varchar(30),

State char(2),

PostalCode varchar(10))

CREATE TABLE shipMethod (

shipMethodKey integer PRIMARY KEY,

shipMethod varchar(5))

INSERT shipMethod (shipMethodKey, shipMethod) VALUES (1, 'FedEx')

INSERT shipMethod (shipMethodKey, shipMethod) VALUES (2, 'USPS')

INSERT shipMethod (shipMethodKey, shipMethod) VALUES (3, 'UPS')

CREATE TABLE Part (

PartKey integer PRIMARY KEY,

name varchar(20),

size varchar(10) NULL,

color varchar(10) NULL)

<!ELEMENT Invoice (LineItem+)><!ATTLIST Invoice CustomerKey CDATA #REQUIRED ShipDate CDATA #REQUIRED ShipMethodKey CDATA #REQUIRED>

<!ELEMENT Part EMPTY><!ATTLIST Part PartKey CDATA #REQUIRED Quantity CDATA #REQUIRED Price CDATA #REQUIRED>

<?xml version="1.0"?><!DOCTYPE listing SYSTEM "ch27_ex12.dtd" >

<InvoiceBulk> <Invoice CustomerKey="17" ShipDate="10/17/2000" ShipMethodKey="1"> <Part PartKey="33" Quantity="17" Price="0.20" /> <Part PartKey="29" Quantity="13" Price="0.15" /> </Invoice></InvoiceBulk>

(ch17_ex12.dtd):

(ch17_ex12.xml):

31

Data Archival

CREATE TABLE Customer ( CustomerKey integer PRIMARY KEY, Name varchar(50), Address varchar(50), City varchar(30), State char(2), PostalCode varchar(10))

CREATE TABLE shipMethod ( shipMethodKey integer PRIMARY KEY, shipMethod varchar(5))

INSERT shipMethod (shipMethodKey, shipMethod) VALUES (1, 'FedEx')INSERT shipMethod (shipMethodKey, shipMethod) VALUES (2, 'USPS')INSERT shipMethod (shipMethodKey, shipMethod) VALUES (3, 'UPS')

CREATE TABLE Invoice ( InvoiceKey integer PRIMARY KEY, invoiceDate datetime, shipDate datetime, shipMethodKey integer

CONSTRAINT FK_Invoice_shipMethodKey FOREIGN KEY (shipMethodKey) REFERENCES shipMethod (shipMethodKey), CustomerKey integer CONSTRAINT FK_Invoice_Customer FOREIGN KEY (CustomerKey) REFERENCES Customer (CustomerKey))

CREATE TABLE Part ( PartKey integer PRIMARY KEY, name varchar(20), size varchar(10) NULL, color varchar(10) NULL)

CREATE TABLE LineItem ( LineItemKey integer PRIMARY KEY, InvoiceKey integer CONSTRAINT FK_LineItem_Invoice FOREIGN KEY (InvoiceKey) REFERENCES Invoice (InvoiceKey), PartKey integer CONSTRAINT FK_LineItem_Part FOREIGN KEY (PartKey) REFERENCES Part (PartKey), Quantity integer, Price float)

OLTP database structure (ch17_ex04.sql):

Classical Approches What about the LineItem data? What about the other tables in the database? What about human readability?

32

Using XML for Data Archival

<!ELEMENT Invoice (Customer, LineItem+)><!ATTLIST Invoice invoiceDate CDATA #REQUIRED shipDate CDATA #REQUIRED shipMethod (USPS | UPS | FedEx) #REQUIRED>

<!ELEMENT Customer EMPTY><!ATTLIST Customer Name CDATA #REQUIRED Address CDATA #REQUIRED City CDATA #REQUIRED State CDATA #REQUIRED PostalCode CDATA #REQUIRED>

<!ELEMENT LineItem (Part)><!ATTLIST LineItem Quantity CDATA #REQUIRED Price CDATA #REQUIRED>

<!ELEMENT Part EMPTY><!ATTLIST Part Name CDATA #REQUIRED Size CDATA #REQUIRED Color CDATA #REQUIRED>

<?xml version="1.0"?>

<!DOCTYPE listing SYSTEM "ch27_ex13.dtd" >

<Invoice

invoiceDate="10/17/2000"

shipDate="10/20/2000"

shipMethod="USPS">

<Customer

Name="Homer J. Simpson"

Address="742 Evergreen Terrace"

City="Springfield"

State="KY"

postalCode="12345" />

<LineItem

Quantity="12"

Price="0.10">

<Part Color="Blue"

Size="3-inch"

Name="Grommets" />

</LineItem>

<LineItem

Quantity="12"

Price="0.10">

<Part Color="Blue"

Size="3-inch"

Name="Grommets" />

</LineItem>

</Invoice>

(Ch17_ex13.dtd) (Ch17_ex13.xml)

33

Data Repositories

Classical Approaches

CREATE TABLE Property (

PropertyKey integer PRIMARY KEY IDENTITY,

NumberOfBedrooms tinyint,

HasSwimmingPool bit,

Address varchar(50),

City varchar(30),

State char(2),

PostalCode varchar(10),

SellerName varchar(50),

SellerAgent varchar(50))

CREATE TABLE Property (

PropertyKey integer PRIMARY KEY IDENTITY,

NumberOfBedrooms tinyint,

HasSwimmingPool bit)

CREATE TABLE PropertyDetail (

PropertyKey integer PRIMARY KEY,

Address varchar(50),

City varchar(30),

State char(2),

PostalCode varchar(10),

SellerName varchar(50),

SellerAgent varchar(50))

(Ch17_ex14a.sql) (Ch17_ex14b.sql)

34

Using XML for Data Repositories

There are a number of advantages to using XML for data repositories: Greater flexibility in providers

Faster querying and summarization

More presentation options

Fewer locking concerns

<!ELEMENT Property EMPTY>

<!ATTLIST Property

NumberOfBedrooms CDATA #REQUIRED

HasSwimmingPool CDATA #REQUIRED

Address CDATA #REQUIRED

City CDATA #REQUIRED

State CDATA #REQUIRED

PostalCode CDATA #REQUIRED

SellerName CDATA #REQUIRED

SellerAgent CDATA #REQUIRED>

CREATE TABLE Property (

PropertyKey integer PRIMARY KEY IDENTITY,

NumberOfBedrooms tinyint,

HasSwimmingPool tinyint,

DocumentFile varchar(50))

(Ch17_ex15.dtd) (Ch17_ex15.sql)

35

Summary

How XML may be used to help create a data warehouse

The benefits you can realize by using XML as your archiv

al strategy

How XML can improve the functionality of your data rep

ository