Chapter 17 Data Warehousing, Archival, and Repositories
Database Laboratory, master's program, third semester, 박보영 (Park Bo-young)
2
Introduction
Common applications of XML in the enterprise
Large-scale transaction system
Content management system
Persistence of information using XML
Traditional strategies for these processes
How adding XML to the mix can improve them
3
In this chapter
Data warehousing
- How data warehousing works
- How XML may be used to streamline the transaction and warehousing process
Data archival
- For most systems, data loses its relevance over time and is moved off the system when some threshold is reached
- We typically want to set this data aside in some form so that, should we need to refer to it later, we can retrieve it with the minimum of effort
- How XML may be used to make this process easier
Data repositories
- How XML may be used to persist detail-only information in a way that makes the relational database perform better, while taking advantage of XML technologies (such as XSLT) to present the detail information when necessary
“All the examples in this chapter are designed to work with SQL Server 6.x+, ADO 2.5+, VBScript(IE 4.0+)”
4
Data Warehousing
One of the problems often faced by data architects is to design a system that allows:
- Quick retrieval of detail information or transactions
- Easy and efficient querying and summarization of that data
In this section
- Define data warehousing
- Discuss the concepts that drive data warehouse design
- Take a look at how XML can facilitate the data warehousing process
- Look at examples of XML in use in a data warehouse
5
The Two Roles of Data
In an enterprise-level data solution, the data in the database plays two roles:
- Information gathering
- Querying and summarization of data
Detailed Information Gathering
- The first use of the database is to gather data from external sources (such as other databases, XML, or simple delimited text files)
- There are some things that remain consistent across all implementations that gather data:
Detail oriented
Write-heavy
Transacted
Space-conscious
Heavily normalized
6
Information Querying and Summarization
- The other use of the database is to provide the ability to query the data and summarize it to extrapolate trends, volumes, and other useful information from the details.
- The specific mechanisms will vary from implementation to implementation, but there are some constants:
Summary-oriented
Read-only
Results-conscious
Less normalized
7
The Traditional Solution
- The traditional approach to designing a relational database to support a platform is to design one database that performs both the data acquisition and the querying/summarization functions (ch17_ex01.sql):
CREATE TABLE Customer (
   CustomerKey integer PRIMARY KEY,
   Name varchar(50),
   Address varchar(50),
   City varchar(30),
   State char(2),
   PostalCode varchar(10))
CREATE TABLE shipMethod (
   shipMethodKey integer PRIMARY KEY,
   shipMethod varchar(5))
INSERT shipMethod (shipMethodKey, shipMethod) VALUES (1, 'FedEx')
INSERT shipMethod (shipMethodKey, shipMethod) VALUES (2, 'USPS')
INSERT shipMethod (shipMethodKey, shipMethod) VALUES (3, 'UPS')
CREATE TABLE Invoice (
   InvoiceKey integer PRIMARY KEY,
   invoiceDate datetime,
   shipDate datetime,
   shipMethodKey integer
      CONSTRAINT FK_Invoice_shipMethodKey FOREIGN KEY (shipMethodKey)
         REFERENCES shipMethod (shipMethodKey),
   CustomerKey integer
      CONSTRAINT FK_Invoice_Customer FOREIGN KEY (CustomerKey)
         REFERENCES Customer (CustomerKey))
CREATE INDEX ix_Invoice_invoiceDate ON Invoice (invoiceDate)
CREATE INDEX ix_Invoice_shipDate ON Invoice (shipDate)
CREATE INDEX ix_Invoice_CustomerKey ON Invoice (CustomerKey)
CREATE TABLE Part (
   PartKey integer PRIMARY KEY,
   name varchar(20),
   size varchar(10) NULL,
   color varchar(10) NULL)
CREATE TABLE LineItem (
   LineItemKey integer PRIMARY KEY,
   InvoiceKey integer
      CONSTRAINT FK_LineItem_Invoice FOREIGN KEY (InvoiceKey)
         REFERENCES Invoice (InvoiceKey),
   PartKey integer
      CONSTRAINT FK_LineItem_Part FOREIGN KEY (PartKey)
         REFERENCES Part (PartKey),
   Quantity integer,
   Price float)
CREATE INDEX ix_LineItem_PartKey ON LineItem (PartKey)
CREATE INDEX ix_LineItem_InvoiceKey ON LineItem (InvoiceKey)
8
- When run, this script creates the following table structure:
[Diagram: the Customer, Part, LineItem, Invoice, and ShipMethod tables]
This set of tables supports the addition of invoices
Space is conserved for these tables where possible by normalizing out Part and Customer
This normalization also facilitates querying and reporting on Part and Customer
However, there are a couple of issues with using this type of dual-purpose structure:
- The system is not well tuned to either data entry or reporting
- There is a serious danger of wait states for reporting and querying due to the locking being performed by data entry
9
Many databases try to get around the problem by adding structures that will be used to drive reporting
These tables:
- are separate from the transaction tables
- have all the appropriate indices to allow data to be queried and summarized from those tables in any way the business rules might call for
This is a step in the right direction, but it brings other problems with it
For example, add a MonthlyPartTotal table (ch17_ex02.sql):
CREATE TABLE MonthlyPartTotal (
summaryMonth tinyint,
summaryYear smallint,
PartKey integer
CONSTRAINT FK_MPT_PartKey FOREIGN KEY (PartKey)
REFERENCES Part (PartKey),
Quantity integer)
10
CREATE TRIGGER UpdateMPT
ON LineItem
FOR INSERT, UPDATE
AS
BEGIN
IF (SELECT COUNT(*)
FROM MonthlyPartTotal, inserted, Invoice
WHERE summaryMonth = DATEPART(mm, Invoice.invoiceDate)
AND summaryYear = DATEPART(yyyy, Invoice.invoiceDate)
AND inserted.InvoiceKey = Invoice.InvoiceKey
AND MonthlyPartTotal.PartKey = inserted.PartKey) > 0
UPDATE MonthlyPartTotal
SET Quantity = MonthlyPartTotal.Quantity + inserted.Quantity
FROM inserted, Invoice
WHERE summaryMonth = DATEPART(mm, Invoice.invoiceDate)
AND summaryYear = DATEPART(yyyy, Invoice.invoiceDate)
AND inserted.InvoiceKey = Invoice.InvoiceKey
AND MonthlyPartTotal.PartKey = inserted.PartKey
ELSE
INSERT MonthlyPartTotal (summaryMonth, summaryYear, PartKey, Quantity)
SELECT DATEPART(mm, Invoice.invoiceDate),
DATEPART(yyyy, Invoice.invoiceDate),
inserted.PartKey,
inserted.Quantity
FROM inserted, Invoice
WHERE inserted.InvoiceKey = Invoice.InvoiceKey
IF (SELECT COUNT(*) FROM deleted) > 0
UPDATE MonthlyPartTotal
SET Quantity = MonthlyPartTotal.Quantity - deleted.Quantity
FROM deleted, Invoice
WHERE summaryMonth = DATEPART(mm, Invoice.invoiceDate)
AND summaryYear = DATEPART(yyyy, Invoice.invoiceDate)
AND deleted.InvoiceKey = Invoice.InvoiceKey
AND MonthlyPartTotal.PartKey = deleted.PartKey
END
(Ch17_03.sql)
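The summary-table idea can be tried out without a SQL Server instance. The following is a minimal sketch in Python using the built-in sqlite3 module, not the chapter's script: SQLite trigger syntax differs from T-SQL, only INSERT is handled (no UPDATE or DELETE), the composite primary key on MonthlyPartTotal is an assumption added so that INSERT OR IGNORE works, and the sample rows are hypothetical.

```python
import sqlite3

SCHEMA = """
CREATE TABLE Invoice (InvoiceKey INTEGER PRIMARY KEY, invoiceDate TEXT);
CREATE TABLE LineItem (
    LineItemKey INTEGER PRIMARY KEY,
    InvoiceKey INTEGER REFERENCES Invoice (InvoiceKey),
    PartKey INTEGER,
    Quantity INTEGER);
CREATE TABLE MonthlyPartTotal (
    summaryMonth INTEGER,
    summaryYear INTEGER,
    PartKey INTEGER,
    Quantity INTEGER,
    PRIMARY KEY (summaryMonth, summaryYear, PartKey));

CREATE TRIGGER UpdateMPT AFTER INSERT ON LineItem
BEGIN
    -- Create an empty summary row the first time a month/part pair is seen.
    INSERT OR IGNORE INTO MonthlyPartTotal
    SELECT CAST(strftime('%m', invoiceDate) AS INTEGER),
           CAST(strftime('%Y', invoiceDate) AS INTEGER),
           NEW.PartKey, 0
    FROM Invoice WHERE InvoiceKey = NEW.InvoiceKey;
    -- Then add the new line item's quantity to that row.
    UPDATE MonthlyPartTotal
    SET Quantity = Quantity + NEW.Quantity
    WHERE PartKey = NEW.PartKey
      AND summaryMonth = (SELECT CAST(strftime('%m', invoiceDate) AS INTEGER)
                          FROM Invoice WHERE InvoiceKey = NEW.InvoiceKey)
      AND summaryYear = (SELECT CAST(strftime('%Y', invoiceDate) AS INTEGER)
                         FROM Invoice WHERE InvoiceKey = NEW.InvoiceKey);
END;
"""

# Hypothetical sample data: (InvoiceKey, invoiceDate) and
# (InvoiceKey, PartKey, Quantity).
INVOICES = [(1, "2000-10-17"), (2, "2000-10-20"), (3, "2000-11-02")]
LINE_ITEMS = [(1, 33, 17), (1, 29, 13), (2, 29, 17), (3, 29, 5)]


def monthly_totals(invoices, line_items):
    """Load the rows and return the summary table the trigger maintained."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany("INSERT INTO Invoice VALUES (?, ?)", invoices)
    # Every insert below fires the trigger once per row.
    conn.executemany(
        "INSERT INTO LineItem (InvoiceKey, PartKey, Quantity) VALUES (?, ?, ?)",
        line_items)
    return conn.execute(
        "SELECT summaryYear, summaryMonth, PartKey, Quantity "
        "FROM MonthlyPartTotal ORDER BY summaryYear, summaryMonth, PartKey"
    ).fetchall()
```

Calling monthly_totals(INVOICES, LINE_ITEMS) returns one row per year, month, and part. The point to notice is that every LineItem insert now does extra work inside the transaction, which is the cost this approach adds to data entry.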
11
The Data Warehousing Solution
On-Line Transaction Processing (OLTP) databases: the information gatherers
On-Line Analytical Processing (OLAP) databases: the query and summarization handlers
Parts that make up an OLAP database
The role of XML in improving the function of OLAP databases
12
On-Line Transaction Processing (OLTP)
The gathering of detail information is often referred to as OLTP
OLTP databases handle all data gathering processes
Tables are designed to support the acquisition of transactional data
The database is as normalized as possible, with the specific table or tables being kept as small as possible to reduce insert time and disk consumption
Data archival strategies are in place to make sure the OLTP database does not grow too large
13
OLTP database structure (ch17_ex04.sql):
CREATE TABLE Customer (
   CustomerKey integer PRIMARY KEY,
   Name varchar(50),
   Address varchar(50),
   City varchar(30),
   State char(2),
   PostalCode varchar(10))
CREATE TABLE shipMethod (
   shipMethodKey integer PRIMARY KEY,
   shipMethod varchar(5))
INSERT shipMethod (shipMethodKey, shipMethod) VALUES (1, 'FedEx')
INSERT shipMethod (shipMethodKey, shipMethod) VALUES (2, 'USPS')
INSERT shipMethod (shipMethodKey, shipMethod) VALUES (3, 'UPS')
CREATE TABLE Invoice (
   InvoiceKey integer PRIMARY KEY,
   invoiceDate datetime,
   shipDate datetime,
   shipMethodKey integer
      CONSTRAINT FK_Invoice_shipMethodKey FOREIGN KEY (shipMethodKey)
         REFERENCES shipMethod (shipMethodKey),
   CustomerKey integer
      CONSTRAINT FK_Invoice_Customer FOREIGN KEY (CustomerKey)
         REFERENCES Customer (CustomerKey))
CREATE TABLE Part (
   PartKey integer PRIMARY KEY,
   name varchar(20),
   size varchar(10) NULL,
   color varchar(10) NULL)
CREATE TABLE LineItem (
   LineItemKey integer PRIMARY KEY,
   InvoiceKey integer
      CONSTRAINT FK_LineItem_Invoice FOREIGN KEY (InvoiceKey)
         REFERENCES Invoice (InvoiceKey),
   PartKey integer
      CONSTRAINT FK_LineItem_Part FOREIGN KEY (PartKey)
         REFERENCES Part (PartKey),
   Quantity integer,
   Price float)
[Diagram: the Customer, Part, LineItem, Invoice, and ShipMethod tables]
14
On-Line Analytical Processing (OLAP)
Databases intended to support querying and summarization
Designed with one goal in mind: the querying and summarization of detail data by any number of specifics defined in the business rules for our system
Tables are created to support the querying and retrieval of this information; the ability to rapidly insert information is a lower priority
Indexing technologies designed specifically for OLAP optimize querying and reporting
15
Parts of an OLAP Database
- Fact tables: where the information we wish to report on is stored
- Measure tables: where you store the measures used to do the reporting
- Schema: where the two types of table above interact to give you your reports
Fact Tables
Contain the data that we are planning to report on, at the lowest level of granularity we will need to access
Denormalization: all facts are brought together into one table
CREATE TABLE factInvoicePart (
InvoiceKey integer PRIMARY KEY IDENTITY,
CustomerKey integer,
ShipDateKey integer,
ShipMethodKey integer,
PartKey integer,
Quantity integer,
Price float)
CREATE TABLE factDailyTotal (
DailyTotalKey integer PRIMARY KEY IDENTITY,
InvoiceDate integer,
partKey integer,
partCount integer,
partUnitPrice float)
Fact table creation script for the inventory control team(ch17-ex05.sql)
Fact table creation script for the executive team(ch17-ex06.sql)
16
Measure/Dimension Tables
Hold the parameters used in the WHERE clause of a query
CREATE TABLE measureCustomer (
CustomerKey integer PRIMARY KEY,
Name varchar(50),
Address varchar(50),
City varchar(30),
State char(2),
PostalCode varchar(10))
CREATE TABLE measureShipDate (
ShipDateKey integer PRIMARY KEY,
Month tinyint,
Day tinyint,
Year smallint)
CREATE TABLE measureShipMethod (
shipMethodKey integer PRIMARY KEY,
shipMethod varchar(5))
CREATE TABLE measurePart (
PartKey integer PRIMARY KEY,
name varchar(20),
size varchar(10) NULL,
color varchar(10) NULL)
(Ch17_ex07.sql)
17
Schema
Composed of the tables in our database
Joined together by the foreign keys that relate the individual tables together
Two types of schema that are normally used when designing an OLAP database:
Star schema
Snowflake schema
18
Star schema (ch17_ex08.sql)
CREATE TABLE measureCustomer (
   CustomerKey integer PRIMARY KEY,
   Name varchar(50),
   Address varchar(50),
   City varchar(30),
   State char(2),
   PostalCode varchar(10))
CREATE TABLE measureShipDate (
   ShipDateKey integer PRIMARY KEY,
   ShipMonth tinyint,
   ShipDay tinyint,
   ShipYear smallint)
CREATE TABLE measureShipMethod (
   shipMethodKey integer PRIMARY KEY,
   shipMethod varchar(5))
CREATE TABLE measurePart (
   PartKey integer PRIMARY KEY,
   name varchar(20),
   size varchar(10) NULL,
   color varchar(10) NULL)
CREATE TABLE factInvoicePart (
   InvoiceKey integer PRIMARY KEY IDENTITY,
   CustomerKey integer
      CONSTRAINT fk_fact_Customer FOREIGN KEY (CustomerKey)
         REFERENCES measureCustomer (CustomerKey),
   ShipDateKey integer
      CONSTRAINT fk_fact_ShipDate FOREIGN KEY (ShipDateKey)
         REFERENCES measureShipDate (ShipDateKey),
   ShipMethodKey integer,
      CONSTRAINT fk_fact_ShipMethod FOREIGN KEY (ShipMethodKey)
         REFERENCES measureShipMethod (shipMethodKey),
   PartKey integer,
      CONSTRAINT fk_fact_Part FOREIGN KEY (PartKey)
         REFERENCES measurePart (PartKey),
   Quantity integer,
   Price float)
CREATE INDEX ix_fact_Customer ON factInvoicePart (CustomerKey)
CREATE INDEX ix_fact_ShipDate ON factInvoicePart (ShipDateKey)
CREATE INDEX ix_fact_ShipMethod ON factInvoicePart (shipMethodKey)
CREATE INDEX ix_fact_Part ON factInvoicePart (PartKey)
This script creates the structure seen below:
[Diagram: factInvoicePart linked to measureCustomer, measureShipDate, measureShipMethod, and measurePart]
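What a star schema buys at query time is that any summary needs only one join from the fact table to the measure table of interest. The sketch below illustrates this with Python's built-in sqlite3 module rather than SQL Server; the tables are trimmed to the columns the queries use, the fact rows and measure values are the ones that appear in this chapter's sample invoice data, and the helper name quantity_by is hypothetical.

```python
import sqlite3


def build_star():
    """Create a trimmed-down star schema and load the sample rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE measureShipMethod (
            shipMethodKey INTEGER PRIMARY KEY, shipMethod TEXT);
        CREATE TABLE measurePart (
            PartKey INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE factInvoicePart (
            InvoiceKey INTEGER PRIMARY KEY,
            CustomerKey INTEGER,
            ShipMethodKey INTEGER REFERENCES measureShipMethod (shipMethodKey),
            PartKey INTEGER REFERENCES measurePart (PartKey),
            Quantity INTEGER,
            Price REAL);
        INSERT INTO measureShipMethod VALUES (1, 'FedEx'), (2, 'USPS'), (3, 'UPS');
        INSERT INTO measurePart VALUES
            (29, 'sprockets'), (31, 'grommets'), (33, 'brackets');
        INSERT INTO factInvoicePart
            (CustomerKey, ShipMethodKey, PartKey, Quantity, Price)
        VALUES (17, 1, 33, 17, 0.20), (17, 1, 29, 13, 0.15),
               (12, 2, 31, 19, 0.10), (12, 2, 29, 17, 0.15);
    """)
    return conn


def quantity_by(conn, measure_table, key, label):
    """Sum Quantity along one measure with a single fact-to-measure join.

    Table and column names are interpolated into the SQL text, so only
    trusted, hard-coded names should be passed in.
    """
    sql = (f"SELECT m.{label}, SUM(f.Quantity) "
           f"FROM factInvoicePart f "
           f"JOIN {measure_table} m ON m.{key} = f.{key} "
           f"GROUP BY m.{label} ORDER BY m.{label}")
    return conn.execute(sql).fetchall()
```

For example, quantity_by(build_star(), "measureShipMethod", "shipMethodKey", "shipMethod") totals the quantities per shipping method, and swapping in measurePart, PartKey, and name totals them per part, with no change to the shape of the query.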
19
Snowflake schema (ch17_ex09.sql)
CREATE TABLE measureCustomer (
   CustomerKey integer PRIMARY KEY,
   Name varchar(50),
   Address varchar(50),
   City varchar(30),
   State char(2),
   PostalCode varchar(10))
CREATE TABLE measureShipYear (
   ShipYearKey integer PRIMARY KEY,
   ShipYear smallint)
CREATE TABLE measureShipMonth (
   ShipMonthKey integer PRIMARY KEY,
   ShipMonth tinyint,
   ShipYearKey integer
      CONSTRAINT fk_measure_ShipYear FOREIGN KEY (ShipYearKey)
         REFERENCES measureShipYear (ShipYearKey))
CREATE TABLE measureShipDate (
   ShipDateKey integer PRIMARY KEY,
   ShipDay tinyint,
   ShipMonthKey integer
      CONSTRAINT fk_measure_ShipMonth FOREIGN KEY (ShipMonthKey)
         REFERENCES measureShipMonth (ShipMonthKey))
CREATE TABLE measureShipMethod (
   shipMethodKey integer PRIMARY KEY,
   shipMethod varchar(5))
CREATE TABLE measurePart (
   PartKey integer PRIMARY KEY,
   name varchar(20),
   size varchar(10) NULL,
   color varchar(10) NULL)
CREATE TABLE factInvoicePart (
   InvoiceKey integer PRIMARY KEY IDENTITY,
   CustomerKey integer
      CONSTRAINT fk_fact_Customer FOREIGN KEY (CustomerKey)
         REFERENCES measureCustomer (CustomerKey),
   ShipDateKey integer
      CONSTRAINT fk_fact_ShipDate FOREIGN KEY (ShipDateKey)
         REFERENCES measureShipDate (ShipDateKey),
   ShipMethodKey integer,
      CONSTRAINT fk_fact_ShipMethod FOREIGN KEY (ShipMethodKey)
         REFERENCES measureShipMethod (shipMethodKey),
   PartKey integer,
      CONSTRAINT fk_fact_Part FOREIGN KEY (PartKey)
         REFERENCES measurePart (PartKey),
   Quantity integer,
   Price float)
CREATE INDEX ix_fact_Customer ON factInvoicePart (CustomerKey)
CREATE INDEX ix_fact_ShipDate ON factInvoicePart (ShipDateKey)
CREATE INDEX ix_fact_ShipMethod ON factInvoicePart (shipMethodKey)
CREATE INDEX ix_fact_Part ON factInvoicePart (PartKey)
CREATE INDEX ix_measure_Month ON measureShipDate (shipMonthKey)
CREATE INDEX ix_measure_Year ON measureShipMonth (shipYearKey)
20
This SQL script creates the following set of tables:
[Diagram: factInvoicePart linked to measureCustomer, measureShipDate, measureShipMethod, and measurePart, with measureShipDate linked on to measureShipMonth and measureShipYear]
21
Cubes
• To better facilitate the query and retrieval of OLAP data, each star schema or snowflake schema creates a cube (the detail information is aggregated along each of the measures)
• Example: star schema
[Diagram: factInvoicePart linked to measureCustomer, measureShipDate, measureShipMethod, and measurePart]
factInvoicePart table: 100,000 invoices, aggregated by shipping method

               FedEx     UPS       USPS
ALL invoices   72,102    15,209    12,689

Add another dimension: customer

               FedEx     UPS       USPS
Customer 1     22,615    5,192     3,972
Customer 2     12,541    2,188     918
Customer 3     7,342     3,182     1,212
Customer 4     15,320    716       3,132
Customer 5     14,284    3,931     3,455
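The relationship between the two tables above is a roll-up: summing the customer-by-method cells over customers gives back the per-method totals. A minimal Python sketch of that aggregation follows; the cell counts are the ones on this slide, while the names CELLS and roll_up are illustrative only and not part of any OLAP product.

```python
from collections import defaultdict

# Cube cells keyed by (customer, ship method), copied from the slide.
CELLS = {
    ("Customer 1", "FedEx"): 22615, ("Customer 1", "UPS"): 5192, ("Customer 1", "USPS"): 3972,
    ("Customer 2", "FedEx"): 12541, ("Customer 2", "UPS"): 2188, ("Customer 2", "USPS"): 918,
    ("Customer 3", "FedEx"): 7342, ("Customer 3", "UPS"): 3182, ("Customer 3", "USPS"): 1212,
    ("Customer 4", "FedEx"): 15320, ("Customer 4", "UPS"): 716, ("Customer 4", "USPS"): 3132,
    ("Customer 5", "FedEx"): 14284, ("Customer 5", "UPS"): 3931, ("Customer 5", "USPS"): 3455,
}


def roll_up(cells, keep):
    """Aggregate away every dimension except the one at index `keep`."""
    totals = defaultdict(int)
    for dims, count in cells.items():
        totals[dims[keep]] += count
    return dict(totals)


by_method = roll_up(CELLS, 1)    # one total per shipping method
by_customer = roll_up(CELLS, 0)  # one total per customer
```

Here by_method reproduces the "ALL invoices" row (72,102 / 15,209 / 12,689), and the customer totals add up to the 100,000 invoices in the fact table. A cube stores such aggregates ahead of time so that queries do not have to recompute them.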
22
Building your Fact Tables
Two steps:
- Pulling the data out of the OLTP database
- Updating the OLAP database with that data
To perform the update rapidly, native export formats are often used (such as BCP files for SQL Server) and then loaded into the OLAP database
Some problems
- The most notable is the relative fragility of the export and import code
- If the OLTP or OLAP database changes, this code will need to be revisited
23
The Role of XML
Makes it easier to update the OLAP database from the OLTP database
Offers an effective way to handle OLTP data
- Using XML for OLAP Update Data
- Using XML for OLTP Data
Using XML for OLAP Update Data
- Rather than building specialized import routines for all the different formats, you can build one XML importer and leverage the data provider's ability to export in XML
- This can save a lot of aggravation and coding time
24
(Ch17_ex09.xml)
<?xml version="1.0"?>
<!DOCTYPE InvoiceBulk SYSTEM "ch17_ex09.dtd" >
<InvoiceBulk>
   <Invoice CustomerKey="17" ShipDate="10/17/2000" ShipMethodKey="1">
      <Part PartKey="33" Quantity="17" Price="0.20" />
      <Part PartKey="29" Quantity="13" Price="0.15" />
   </Invoice>
   <Invoice CustomerKey="12" ShipDate="10/11/2000" ShipMethodKey="2">
      <Part PartKey="31" Quantity="19" Price="0.10" />
      <Part PartKey="29" Quantity="17" Price="0.15" />
   </Invoice>
</InvoiceBulk>
(Ch17_ex09.dtd)
<!ELEMENT InvoiceBulk (Invoice+)>
<!ELEMENT Invoice (Part+)>
<!ATTLIST Invoice
   CustomerKey CDATA #REQUIRED
   ShipDate CDATA #REQUIRED
   ShipMethodKey CDATA #REQUIRED>
<!ELEMENT Part EMPTY>
<!ATTLIST Part
   PartKey CDATA #REQUIRED
   Quantity CDATA #REQUIRED
   Price CDATA #REQUIRED>
25
Function using XSLT(ch17_ex10.xsl):
<?xml version="1.0"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
<xsl:output method="text" />
<xsl:template match="/">
<xsl:for-each select="InvoiceBulk/Invoice/Part">
<xsl:text>INSERT factInvoicePart (
CustomerKey,
ShipDateKey,
ShipMethodKey,
PartKey,
Quantity,
Price)
SELECT </xsl:text>
<xsl:value-of select="../@CustomerKey" /><xsl:text>,
</xsl:text>
<xsl:text>measureShipDate.ShipDateKey,
</xsl:text>
<xsl:value-of select="../@ShipMethodKey" /><xsl:text>,
</xsl:text>
<xsl:value-of select="@PartKey" /><xsl:text>,
</xsl:text>
<xsl:value-of select="@Quantity" /><xsl:text>,
</xsl:text>
<xsl:value-of select="@Price" />
<xsl:text>
FROM measureShipDate</xsl:text>
<xsl:text>
WHERE DATEPART(mm, '</xsl:text>
<xsl:value-of select="../@ShipDate" />
<xsl:text>')=measureShipDate.shipMonth
AND DATEPART(dd, '</xsl:text>
<xsl:value-of select="../@ShipDate" />
<xsl:text>')=measureShipDate.shipDay
AND DATEPART(yyyy, '</xsl:text>
<xsl:value-of select="../@ShipDate" />
<xsl:text>')=measureShipDate.shipYear</xsl:text>
<xsl:text>
GO

</xsl:text>
</xsl:for-each>
</xsl:template>
</xsl:stylesheet>
26
Applying ch17_ex10.xsl to Ch17_ex09.xml produces ch17_ex10a.sql (for an XSLT processor, see http://www.jclark.com/xml/xt.html):
INSERT factInvoicePart (CustomerKey, ShipDateKey, ShipMethodKey, PartKey, Quantity, Price)
SELECT 17, measureShipDate.ShipDateKey, 1, 33, 17, 0.20
FROM measureShipDate
WHERE DATEPART(mm, '10/17/2000')=measureShipDate.shipMonth
AND DATEPART(dd, '10/17/2000')=measureShipDate.shipDay
AND DATEPART(yyyy, '10/17/2000')=measureShipDate.shipYear
GO
INSERT factInvoicePart (CustomerKey, ShipDateKey, ShipMethodKey, PartKey, Quantity, Price)
SELECT 17, measureShipDate.ShipDateKey, 1, 29, 13, 0.15
FROM measureShipDate
WHERE DATEPART(mm, '10/17/2000')=measureShipDate.shipMonth
AND DATEPART(dd, '10/17/2000')=measureShipDate.shipDay
AND DATEPART(yyyy, '10/17/2000')=measureShipDate.shipYear
GO
INSERT factInvoicePart (CustomerKey, ShipDateKey, ShipMethodKey, PartKey, Quantity, Price)
SELECT 12, measureShipDate.ShipDateKey, 2, 31, 19, 0.10
FROM measureShipDate
WHERE DATEPART(mm, '10/11/2000')=measureShipDate.shipMonth
AND DATEPART(dd, '10/11/2000')=measureShipDate.shipDay
AND DATEPART(yyyy, '10/11/2000')=measureShipDate.shipYear
GO
INSERT factInvoicePart (CustomerKey, ShipDateKey, ShipMethodKey, PartKey, Quantity, Price)
SELECT 12, measureShipDate.ShipDateKey, 2, 29, 17, 0.15
FROM measureShipDate
WHERE DATEPART(mm, '10/11/2000')=measureShipDate.shipMonth
AND DATEPART(dd, '10/11/2000')=measureShipDate.shipDay
AND DATEPART(yyyy, '10/11/2000')=measureShipDate.shipYear
GO
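The same "one XML importer" idea can also be sketched without XSLT. The example below is an alternative under stated assumptions, not the chapter's method: it parses the invoice document with Python's xml.etree.ElementTree and loads the fact table through parameterized INSERT ... SELECT statements in sqlite3 instead of generating SQL text. The function names fact_rows and load_demo are hypothetical, the tables are trimmed to the columns used, and the two measureShipDate rows are the ones shown in ch17_ex10b.sql.

```python
import sqlite3
import xml.etree.ElementTree as ET
from datetime import datetime

INVOICE_BULK = """<InvoiceBulk>
  <Invoice CustomerKey="17" ShipDate="10/17/2000" ShipMethodKey="1">
    <Part PartKey="33" Quantity="17" Price="0.20" />
    <Part PartKey="29" Quantity="13" Price="0.15" />
  </Invoice>
  <Invoice CustomerKey="12" ShipDate="10/11/2000" ShipMethodKey="2">
    <Part PartKey="31" Quantity="19" Price="0.10" />
    <Part PartKey="29" Quantity="17" Price="0.15" />
  </Invoice>
</InvoiceBulk>"""

# Same shape as the generated SQL: look up ShipDateKey from the date parts.
INSERT_SQL = """
INSERT INTO factInvoicePart
    (CustomerKey, ShipDateKey, ShipMethodKey, PartKey, Quantity, Price)
SELECT :customer, d.ShipDateKey, :method, :part, :quantity, :price
FROM measureShipDate d
WHERE d.ShipMonth = :month AND d.ShipDay = :day AND d.ShipYear = :year
"""


def fact_rows(xml_text):
    """Flatten each Invoice/Part pair into one dict of insert parameters."""
    rows = []
    for invoice in ET.fromstring(xml_text).iter("Invoice"):
        ship = datetime.strptime(invoice.get("ShipDate"), "%m/%d/%Y")
        for part in invoice.iter("Part"):
            rows.append({
                "customer": int(invoice.get("CustomerKey")),
                "method": int(invoice.get("ShipMethodKey")),
                "month": ship.month, "day": ship.day, "year": ship.year,
                "part": int(part.get("PartKey")),
                "quantity": int(part.get("Quantity")),
                "price": float(part.get("Price")),
            })
    return rows


def load_demo():
    """Load INVOICE_BULK into an in-memory fact table and return its rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE measureShipDate (
            ShipDateKey INTEGER PRIMARY KEY,
            ShipMonth INTEGER, ShipDay INTEGER, ShipYear INTEGER);
        CREATE TABLE factInvoicePart (
            InvoiceKey INTEGER PRIMARY KEY,
            CustomerKey INTEGER, ShipDateKey INTEGER, ShipMethodKey INTEGER,
            PartKey INTEGER, Quantity INTEGER, Price REAL);
        INSERT INTO measureShipDate VALUES (1, 10, 11, 2000), (2, 10, 17, 2000);
    """)
    conn.executemany(INSERT_SQL, fact_rows(INVOICE_BULK))
    return conn.execute(
        "SELECT CustomerKey, ShipDateKey, PartKey, Quantity "
        "FROM factInvoicePart ORDER BY InvoiceKey").fetchall()
```

As with the generated script, an invoice whose ship date has no matching measureShipDate row inserts nothing, so the date measure has to be populated before the facts are loaded.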
27
(Ch17_ex10b.sql):
INSERT measureCustomer (
CustomerKey,
[Name],
Address,
City,
State,
PostalCode)
VALUES (
12,
'Homer J. Simpson',
'742 Evergreen Terrace',
'Springfield',
'KY',
'12345')
GO
INSERT measureCustomer (
CustomerKey,
[Name],
Address,
City,
State,
PostalCode)
VALUES (
17,
'Kevin B. Williams',
'744 Evergreen Terrace',
'Springfield',
'KY',
'12345')
GO
INSERT measureShipMethod (
shipMethodKey,
shipMethod)
VALUES (
1,
'Fedex')
GO
INSERT measureShipMethod (
shipMethodKey,
shipMethod)
VALUES (
2,
'USPS')
GO
INSERT measureShipMethod (
shipMethodKey,
shipMethod)
VALUES (
3,
'UPS')
GO
INSERT measurePart (
PartKey,
[name],
[size],
color)
VALUES (
31,
'grommets',
'3 in.',
'blue')
GO
INSERT measurePart (
PartKey,
[name],
[size],
color)
VALUES (
29,
'sprockets',
'2 in.',
'silver')
GO
INSERT measurePart (
PartKey,
[name],
[size],
color)
VALUES (
33,
'brackets',
'1 in.',
'red')
GO
INSERT measureShipDate (
ShipDateKey,
ShipMonth,
ShipDay,
ShipYear)
VALUES (
1,
10,
11,
2000
)
GO
INSERT measureShipDate (
ShipDateKey,
ShipMonth,
ShipDay,
ShipYear)
VALUES (
2,
10,
17,
2000
)
GO
28
Result table structure
- Run ch17_ex08.sql against a sample database to create the tables in the first place
- Transform ch17_ex09.xml with ch17_ex10.xsl to produce the output script ch17_ex10a.sql
- Run ch17_ex10b.sql to populate the measure tables with initial data
- Run ch17_ex10a.sql to populate the factInvoicePart table
[Diagram: factInvoicePart linked to measureCustomer, measureShipDate, measureShipMethod, and measurePart]
29
Using XML for OLTP Data
OLTP systems typically access one discrete transaction at a time
Locking can be avoided
XML technology can be easily leveraged
30
RDBMS structure (ch17_ex11.sql):
CREATE TABLE Customer (
CustomerKey integer PRIMARY KEY,
Name varchar(50),
Address varchar(50),
City varchar(30),
State char(2),
PostalCode varchar(10))
CREATE TABLE shipMethod (
shipMethodKey integer PRIMARY KEY,
shipMethod varchar(5))
INSERT shipMethod (shipMethodKey, shipMethod) VALUES (1, 'FedEx')
INSERT shipMethod (shipMethodKey, shipMethod) VALUES (2, 'USPS')
INSERT shipMethod (shipMethodKey, shipMethod) VALUES (3, 'UPS')
CREATE TABLE Part (
PartKey integer PRIMARY KEY,
name varchar(20),
size varchar(10) NULL,
color varchar(10) NULL)
(ch17_ex12.dtd):
<!ELEMENT InvoiceBulk (Invoice+)>
<!ELEMENT Invoice (Part+)>
<!ATTLIST Invoice
   CustomerKey CDATA #REQUIRED
   ShipDate CDATA #REQUIRED
   ShipMethodKey CDATA #REQUIRED>
<!ELEMENT Part EMPTY>
<!ATTLIST Part
   PartKey CDATA #REQUIRED
   Quantity CDATA #REQUIRED
   Price CDATA #REQUIRED>
(ch17_ex12.xml):
<?xml version="1.0"?>
<!DOCTYPE InvoiceBulk SYSTEM "ch17_ex12.dtd" >
<InvoiceBulk>
   <Invoice CustomerKey="17" ShipDate="10/17/2000" ShipMethodKey="1">
      <Part PartKey="33" Quantity="17" Price="0.20" />
      <Part PartKey="29" Quantity="13" Price="0.15" />
   </Invoice>
</InvoiceBulk>
31
Data Archival
OLTP database structure (ch17_ex04.sql):
CREATE TABLE Customer (
   CustomerKey integer PRIMARY KEY,
   Name varchar(50),
   Address varchar(50),
   City varchar(30),
   State char(2),
   PostalCode varchar(10))
CREATE TABLE shipMethod (
   shipMethodKey integer PRIMARY KEY,
   shipMethod varchar(5))
INSERT shipMethod (shipMethodKey, shipMethod) VALUES (1, 'FedEx')
INSERT shipMethod (shipMethodKey, shipMethod) VALUES (2, 'USPS')
INSERT shipMethod (shipMethodKey, shipMethod) VALUES (3, 'UPS')
CREATE TABLE Invoice (
   InvoiceKey integer PRIMARY KEY,
   invoiceDate datetime,
   shipDate datetime,
   shipMethodKey integer
      CONSTRAINT FK_Invoice_shipMethodKey FOREIGN KEY (shipMethodKey)
         REFERENCES shipMethod (shipMethodKey),
   CustomerKey integer
      CONSTRAINT FK_Invoice_Customer FOREIGN KEY (CustomerKey)
         REFERENCES Customer (CustomerKey))
CREATE TABLE Part (
   PartKey integer PRIMARY KEY,
   name varchar(20),
   size varchar(10) NULL,
   color varchar(10) NULL)
CREATE TABLE LineItem (
   LineItemKey integer PRIMARY KEY,
   InvoiceKey integer
      CONSTRAINT FK_LineItem_Invoice FOREIGN KEY (InvoiceKey)
         REFERENCES Invoice (InvoiceKey),
   PartKey integer
      CONSTRAINT FK_LineItem_Part FOREIGN KEY (PartKey)
         REFERENCES Part (PartKey),
   Quantity integer,
   Price float)
Classical Approaches
- What about the LineItem data?
- What about the other tables in the database?
- What about human readability?
32
Using XML for Data Archival
<!ELEMENT Invoice (Customer, LineItem+)>
<!ATTLIST Invoice
   invoiceDate CDATA #REQUIRED
   shipDate CDATA #REQUIRED
   shipMethod (USPS | UPS | FedEx) #REQUIRED>
<!ELEMENT Customer EMPTY>
<!ATTLIST Customer
   Name CDATA #REQUIRED
   Address CDATA #REQUIRED
   City CDATA #REQUIRED
   State CDATA #REQUIRED
   PostalCode CDATA #REQUIRED>
<!ELEMENT LineItem (Part)>
<!ATTLIST LineItem
   Quantity CDATA #REQUIRED
   Price CDATA #REQUIRED>
<!ELEMENT Part EMPTY>
<!ATTLIST Part
   Name CDATA #REQUIRED
   Size CDATA #REQUIRED
   Color CDATA #REQUIRED>
<?xml version="1.0"?>
<!DOCTYPE Invoice SYSTEM "ch17_ex13.dtd" >
<Invoice
invoiceDate="10/17/2000"
shipDate="10/20/2000"
shipMethod="USPS">
<Customer
Name="Homer J. Simpson"
Address="742 Evergreen Terrace"
City="Springfield"
State="KY"
PostalCode="12345" />
<LineItem
Quantity="12"
Price="0.10">
<Part Color="Blue"
Size="3-inch"
Name="Grommets" />
</LineItem>
<LineItem
Quantity="12"
Price="0.10">
<Part Color="Blue"
Size="3-inch"
Name="Grommets" />
</LineItem>
</Invoice>
(Ch17_ex13.dtd) (Ch17_ex13.xml)
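The archive document is self-contained: customer and part details are written inline rather than as keys, so it stays readable after the relational rows are gone. A minimal sketch of producing such a document with Python's xml.etree.ElementTree follows. It assumes the invoice has already been read out of the OLTP tables into a plain dictionary; the dictionary layout and the function name archive_invoice are hypothetical, and the sample values are the ones in the listing above.

```python
import xml.etree.ElementTree as ET

# Sample invoice, using the values from the archive listing.
SAMPLE = {
    "invoiceDate": "10/17/2000",
    "shipDate": "10/20/2000",
    "shipMethod": "USPS",
    "customer": {
        "Name": "Homer J. Simpson",
        "Address": "742 Evergreen Terrace",
        "City": "Springfield",
        "State": "KY",
        "PostalCode": "12345",
    },
    "lineItems": [
        {"Quantity": 12, "Price": "0.10",
         "part": {"Name": "Grommets", "Size": "3-inch", "Color": "Blue"}},
        {"Quantity": 12, "Price": "0.10",
         "part": {"Name": "Grommets", "Size": "3-inch", "Color": "Blue"}},
    ],
}


def archive_invoice(invoice):
    """Serialize one invoice, with its details denormalized, as XML text."""
    root = ET.Element("Invoice", {
        "invoiceDate": invoice["invoiceDate"],
        "shipDate": invoice["shipDate"],
        "shipMethod": invoice["shipMethod"],
    })
    # Customer details are copied in, not referenced by key.
    ET.SubElement(root, "Customer", invoice["customer"])
    for item in invoice["lineItems"]:
        line = ET.SubElement(root, "LineItem", {
            "Quantity": str(item["Quantity"]),  # attribute values must be text
            "Price": item["Price"],
        })
        ET.SubElement(line, "Part", item["part"])
    return ET.tostring(root, encoding="unicode")
```

The returned string can be written to one file per invoice. The sketch does not emit a DOCTYPE declaration or validate against the DTD; that would need a validating parser, which the standard library module does not provide.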
33
Data Repositories
Classical Approaches
CREATE TABLE Property (
PropertyKey integer PRIMARY KEY IDENTITY,
NumberOfBedrooms tinyint,
HasSwimmingPool bit,
Address varchar(50),
City varchar(30),
State char(2),
PostalCode varchar(10),
SellerName varchar(50),
SellerAgent varchar(50))
CREATE TABLE Property (
PropertyKey integer PRIMARY KEY IDENTITY,
NumberOfBedrooms tinyint,
HasSwimmingPool bit)
CREATE TABLE PropertyDetail (
PropertyKey integer PRIMARY KEY,
Address varchar(50),
City varchar(30),
State char(2),
PostalCode varchar(10),
SellerName varchar(50),
SellerAgent varchar(50))
(Ch17_ex14a.sql) (Ch17_ex14b.sql)
34
Using XML for Data Repositories
There are a number of advantages to using XML for data repositories:
Greater flexibility in providers
Faster querying and summarization
More presentation options
Fewer locking concerns
<!ELEMENT Property EMPTY>
<!ATTLIST Property
NumberOfBedrooms CDATA #REQUIRED
HasSwimmingPool CDATA #REQUIRED
Address CDATA #REQUIRED
City CDATA #REQUIRED
State CDATA #REQUIRED
PostalCode CDATA #REQUIRED
SellerName CDATA #REQUIRED
SellerAgent CDATA #REQUIRED>
CREATE TABLE Property (
PropertyKey integer PRIMARY KEY IDENTITY,
NumberOfBedrooms tinyint,
HasSwimmingPool tinyint,
DocumentFile varchar(50))
(Ch17_ex15.dtd) (Ch17_ex15.sql)
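The division of labor here is that the relational table holds only the searchable columns plus a pointer to the XML document, and the document holds the full detail. The sketch below illustrates the round trip with Python's sqlite3, tempfile, and xml.etree.ElementTree modules. It is an assumption-laden illustration rather than the chapter's implementation: the file naming scheme, the function names, and the sample property values are all made up for the example.

```python
import os
import sqlite3
import tempfile
import xml.etree.ElementTree as ET


def open_index():
    """Create the slim index table: searchable columns plus a file name."""
    conn = sqlite3.connect(":memory:")
    conn.execute("""CREATE TABLE Property (
        PropertyKey INTEGER PRIMARY KEY,
        NumberOfBedrooms INTEGER,
        HasSwimmingPool INTEGER,
        DocumentFile TEXT)""")
    return conn


def add_property(conn, folder, attrs):
    """Write the full record as XML and index only the searchable columns."""
    cur = conn.execute(
        "INSERT INTO Property (NumberOfBedrooms, HasSwimmingPool) VALUES (?, ?)",
        (int(attrs["NumberOfBedrooms"]), int(attrs["HasSwimmingPool"])))
    name = "property%d.xml" % cur.lastrowid  # hypothetical naming scheme
    ET.ElementTree(ET.Element("Property", attrs)).write(os.path.join(folder, name))
    conn.execute("UPDATE Property SET DocumentFile = ? WHERE PropertyKey = ?",
                 (name, cur.lastrowid))
    return name


def find_addresses(conn, folder, bedrooms, pool):
    """Query the index first, then open only the matching XML documents."""
    rows = conn.execute(
        "SELECT DocumentFile FROM Property "
        "WHERE NumberOfBedrooms >= ? AND HasSwimmingPool = ? "
        "ORDER BY PropertyKey", (bedrooms, int(pool))).fetchall()
    return [ET.parse(os.path.join(folder, name)).getroot().get("Address")
            for (name,) in rows]


def demo():
    """Index two made-up properties and look up the one with a pool."""
    with tempfile.TemporaryDirectory() as folder:
        conn = open_index()
        add_property(conn, folder, {
            "NumberOfBedrooms": "3", "HasSwimmingPool": "1",
            "Address": "742 Evergreen Terrace", "City": "Springfield",
            "State": "KY", "PostalCode": "12345",
            "SellerName": "Seller A", "SellerAgent": "Agent A"})
        add_property(conn, folder, {
            "NumberOfBedrooms": "2", "HasSwimmingPool": "0",
            "Address": "744 Evergreen Terrace", "City": "Springfield",
            "State": "KY", "PostalCode": "12345",
            "SellerName": "Seller B", "SellerAgent": "Agent B"})
        return find_addresses(conn, folder, 3, True)
```

The search touches only the narrow Property table, and the detail files are read afterwards for the rows that matched. In a real system the XML would more likely be handed to an XSLT stylesheet for presentation than picked apart attribute by attribute.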