sept 1, 2005
DESCRIPTION
Native XML Support in DB2 Universal Database Matthias Nicola, Bert van der Linden IBM Silicon Valley Lab [email protected]. Sept 1, 2005. Agenda. Why “native XML” and what does it mean? Native XML in the forthcoming version of DB2 Native XML Storage XML Schema Support XML Indexes - PowerPoint PPT PresentationTRANSCRIPT
IBM Software Group
®
Native XML Support in DB2 Universal Database
Matthias Nicola, Bert van der LindenIBM Silicon Valley [email protected]
Sept 1, 2005
IBM Software Group | DB2 Information Management Software
2
Agenda
Why “native XML” and what does it mean?
Native XML in the forthcoming version of DB2Native XML StorageXML Schema SupportXML IndexesXQuery, and the Integration with SQL
Summary
IBM Software Group | DB2 Information Management Software
3
XML Databases
XML-enabled DatabasesThe core data model is not XML (but e.g. relational)Mapping between XML data model and DB’s data
model is required, or XML is stored as textE.g.: DB2 XML Extender (V7, V8)
Native XML DatabasesUse the hierarchical XML data model to store and
process XML internallyNo mapping, no storage as textStorage format = processing formatE.g.: Forthcoming version of DB2
IBM Software Group | DB2 Information Management Software
4
Problems of XML-enabled Databases
CLOB storage:Query evaluation & sub-document level access requires
costly XML Parsing – too slow !
Shredding:Mapping from XML to relational often too complexOften requires dozens or hundreds of tablesComplex multi-way joins to reconstruct documentsXML schema changes break the mapping
• no schema flexibility !
• For example: Change element from single- to multi-occurrence requires normalization of relational schema & data
IBM Software Group | DB2 Information Management Software
5
Integration of XML & Relational Capabilities in DB2
SERVER
CLIENT SQL/X
XQueryXMLInterface
RelationalInterface Relational
XML
DB2 Storage:
DB2 Client /Customer Client Application
Native XML data type • (not Varchar, not CLOB, not object-relational)
XML Capabilities in all DB2 componentsApplications combine XML & relational data
HybridDB2 Engine
IBM Software Group | DB2 Information Management Software
6
Native XML Storage DB2 stores XML in parsed hierarchical format
(similar to the DOM representation of the XML infoset)
create table dept (deptID char(8),…, deptdoc xml);
Relational columnsare stored in relationalformat (tables)
XML is stored nativelyas type-annotated trees(representing theXQuery Data Model).
deptID … deptdoc
“PR27” … <dept> … <emp>…</emp></dept>
… … …
DB2 Storage
IBM Software Group | DB2 Information Management Software
7
dept
name
employee
phoneid=901
John Doe
office
408-555-1212 344
name
employee
phoneid=902
Peter Pan
office
408-555-9918 216
0
1
4
25=901
John Doe
3
408-555-1212 344
1
4
24=902
Peter Pan
3
408-555-9918 216
<dept> <employee id=901> <name>John Doe</name>
<phone>408 555 1212</phone><office>344</office>
</employee><employee id=902>
<name>Peter Pan</name><phone>408 555 9918</phone><office>216</office>
</employee></dept>
Efficient Document Tree Storage
0 dept4 employee1 name5 id2 phone3 office
String tableSYSIBM.SYSXMLSTRINGS
1 String table per database Database wide dictionary for all
tags in all XML columns
Reduces storage Fast comparisons &
navigation
IBM Software Group | DB2 Information Management Software
8
Information for Every Node
Tag name, encoded as unique StringID
A nodeID
Node kind (e.g. element, attribute, etc.)
Namespace / Namespace prefix
Type annotation
Pointer to parent
Array of child pointers Hints to the kind & name of child nodes
(for early-out navigation)
For text/attribute nodes: the data itself
IBM Software Group | DB2 Information Management Software
9
XML Node Storage Layout
Node hierarchy of an XML document stored on DB2 pages Documents that don’t fit on 1 page: split into regions/pages Docs < 1 page: 1 region, multiple docs/regions per page
Example:
Document split into 3 regions, stored on 3 pages
Split can be at anylevel of the document
IBM Software Group | DB2 Information Management Software
10
XML Storage: “Regions Index”
maps nodeIDs to regions & pages
allows to fetch required regions instead of full documents
allows intelligent prefetching
page page page
Regions index
not user defined, default component of XML storage layer
IBM Software Group | DB2 Information Management Software
11
XML Schema Support & Validation in DB2
Use of XML Schemas is optional, on a per-document basis
No need for a fixed schema per XML column
Validation per document (i.e. per row)
Zero, one, or many schemas per XML column For example: different versions of the same schema,
or schemas with conflicting definitions
Mix validated & non-validated documents in 1 XML column
Schemas are registered & stored in the DB2 Schema Repository (XSR) for fast and stable access.
IBM Software Group | DB2 Information Management Software
12
Validation using Schemas
Validate XML from a parameter marker using xsi:schemaLocation:
insert into dept(deptdoc) values (xmlvalidate(?))
Identify schema for a given document: select deptid, xmlxsrobjectid(deptdoc) from dept where deptid = “PR27”
Override schema location by referencing a schema ID or URI:
insert into dept(deptdoc) values ( xmlvalidate(? according to xmlschema id departments.deptschema)
insert into dept(deptdoc) values ( xmlvalidate(? according to xmlschema uri 'http://my.dept.com’)
IBM Software Group | DB2 Information Management Software
13
XML Indexes for High Query Performance
Define 0, 1 or multiple XML Value Indexes per XML column
XML index maps: (pathID, value) (nodeID, rowID)
Index any elements or attributes, incl. mixed content
Index definition uses an XML pattern to specify which elements/attributes to index (and which not to)
Can index all elements/attributes, but not forced to do so
Can index repeating elements 0 , 1 or multiple index entries per document
New XML-specific join and query evaluation methods, evaluate multiple predicates concurrently with minimal index I/O
xmlpattern = XPath without predicates, only child axis (/) and descendent-or-self axis (//)
IBM Software Group | DB2 Information Management Software
14
XML Indexing: Examplescreate table dept(deptID char(8) primary key, deptdoc xml);
create index idx1 on dept(deptdoc) generate keyusing xmlpattern '/dept/@bldg' as sql double;
create unique index idx2 on dept(deptdoc) generate keyusing xmlpattern '/dept/employee/@id' as sql double;
create index idx3 on dept(deptdoc) generate keyusing xmlpattern '/dept/employee/name' as sql varchar(35);
<dept bldg=101> <employee id=901> <name>John Doe</name>
<phone>408 555 1212</phone><office>344</office>
</employee><employee id=902>
<name>Peter Pan</name><phone>408 555 9918</phone><office>216</office>
</employee></dept>
…xmlpattern '//name' as sql varchar(35); (Index on ALL “name” elements)…xmlpattern '//@*' as sql double; (Index on ALL numeric attributes)…xmlpattern '//text()‘ as sql varchar(hashed); (Index on ALL text nodes, hash code)…xmlpattern ‘/dept//name' as sql varchar(35);
…xmlpattern '/dept/employee//text()' as sql varchar(128); (All text nodes under employee)
…xmlpattern 'declare namespace m="http://www.myself.com/"; /m:dept/m:employee/m:name‘ as
sql varchar(45);
IBM Software Group | DB2 Information Management Software
15
Querying XML Data in DB2
The following options are supported:
XQuery/XPath as a stand-alone language
SQL embedded in XQuery
XQuery/XPath embedded in SQL/XML
Plain SQL for full-document retrieval
Compiler/Optimizer Details: Beyer et al. “System RX”, SIGMOD 2005 [2]
IBM Software Group | DB2 Information Management Software
16
Example: XQuery as a stand-alone Language in DB2
create table dept(deptID char(8) primary key, deptdoc xml);
for $d in db2-fn:xmlcolumn(‘dept.deptdoc’)/deptlet $emp := $d//employee/namewhere $d/@bldg = > 95 order by $d/@bldgreturn <EmpList> {$d/@bldg, $emp} </EmpList>
db2-fn:xmlcolumn returns the sequence ofall documents in the specified XML column
<dept bldg=101> <employee id=901> <name>John Doe</name>
<phone>408 555 1212</phone><office>344</office>
</employee><employee id=902>
<name>Peter Pan</name><phone>408 555 9918</phone><office>216</office>
</employee></dept>
IBM Software Group | DB2 Information Management Software
17
Examples: SQL embedded in XQuery create table dept(deptID char(8) primary key, deptdoc xml);
Identify XML data by a SELECT statementLeverage predicates/indexes on relational columns
for $d in db2-fn:sqlquery('select deptdoc from dept (single document) where deptID = “PR27” ')…
for $d in db2-fn:sqlquery('select deptdoc from dept (some documents) where deptID LIKE “PR%” ')…
for $d in db2-fn:sqlquery('select dept.deptdoc from dept, unit (some documents) where dept.deptID=unit.ID and unit.headcount > 200’)…..
for $d in db2-fn:xmlcolumn(‘dept.deptdoc’)/dept , $e in db2-fn:sqlquery('select xmlforest(name, desc)
from unit u’)…
(constructed documents)(join & combine XML and relational data)
IBM Software Group | DB2 Information Management Software
18
Example: XQuery embedded in SQL/XML
SQL/XML Standard Functions: xmlexists, xmlquery, xmltable
create table dept(deptID char(8) primary key, deptdoc xml);
select deptID, xmlquery(‘for $i in $d/dept
let $j := $i//name return $j’ passing deptdoc as “d”) from deptwhere deptID LIKE “PR%”
and xmlexists(‘$d/dept[@bdlg = 101]’ passing deptdoc as “d“)
IBM Software Group | DB2 Information Management Software
19
Other Features in DB2 native XML
XML Text Search SupportXML Import/ExportXML Type in Stored ProceduresAPI Extensions (JDBC, CLI, .NET, etc.)XML Schema RepositoryFull SQL/XML supportVisual XQuery BuilderAnnotated schema shredding…and more
IBM Software Group | DB2 Information Management Software
20
Summary
CLOB and shredded XML storage restrict performance and flexibility
New native XML support in DB2:
Better Performance through Hierarchical & parsed XML representation at all layers Path-specific XML Indexing New XML join and query methods
Higher Flexibility through: Integration of SQL and XQuery Schemas are optional, per document, not per column Zero, one, or many XML schemas per XML column
IBM Software Group | DB2 Information Management Software
22
BACKUP CHARTS
IBM Software Group | DB2 Information Management Software
23
Shredding: A simple case
<DEPARTMENT deptid="15" deptname="Sales"> <EMPLOYEE> <EMPNO>10</EMPNO> <FIRSTNAME>CHRISTINE</FIRSTNAME> <LASTNAME>SMITH</LASTNAME> <PHONE>408-463-4963</PHONE> <SALARY>52750.00</SALARY> </EMPLOYEE> <EMPLOYEE> <EMPNO>27</EMPNO> <FIRSTNAME>MICHAEL</FIRSTNAME> <LASTNAME>THOMPSON</LASTNAME> <PHONE>406-463-1234</PHONE> <SALARY>41250.00</SALARY> </EMPLOYEE></DEPARTMENT> Department
DEPTID DEPTNAME15 Sales
EmployeeDEPTID EMPNO FIRSTNAME LASTNAME PHONE SALARY
15 27 MICHAEL THOMPSON 406-463-1234 4125015 10 CHRISTINE SMITH 408-463-4963 52750
IBM Software Group | DB2 Information Management Software
24
Shredding: A schema change…“Employees are now allowed to have multiple phone numbers…”
<DEPARTMENT deptid="15" deptname="Sales"> <EMPLOYEE> <EMPNO>10</EMPNO> <FIRSTNAME>CHRISTINE</FIRSTNAME> <LASTNAME>SMITH</LASTNAME> <PHONE>408-463-4963</PHONE> <PHONE>415-010-1234</PHONE> <SALARY>52750.00</SALARY> </EMPLOYEE> <EMPLOYEE> <EMPNO>27</EMPNO> <FIRSTNAME>MICHAEL</FIRSTNAME> <LASTNAME>THOMPSON</LASTNAME> <PHONE>406-463-1234</PHONE> <SALARY>41250.00</SALARY> </EMPLOYEE></DEPARTMENT>
PhoneEMPNO PHONE
27 406-463-123410 415-010-1234
10 408-463-4963
Requires:• Normalization of existing data !• Modification of the mapping• Change of applications
Costly!
DepartmentDEPTID DEPTNAME
15 Sales
EmployeeDEPTID EMPNO FIRSTNAME LASTNAME PHONE SALARY
15 27 MICHAEL THOMPSON 406-463-1234 4125015 10 CHRISTINE SMITH 408-463-4963 52750
IBM Software Group | DB2 Information Management Software
25
Native XML in DB2: Key Themes
Standards compliant + driving the standards XML, XQuery, SQL/XML, XML Schema…
Flexibility, because that is what XML is all about.. Zero, one, or many XML schemas per XML column
Native (hierarchical) Storage & Sophisticated XML Indexes New “pivot join” to evaluate many predicates concurrently
Integrated in DB2 Leveraging scalability, reliability, performance, availability…
Integrated with application APIs: JDBC, ODBC, .NET, embedded SQL, CLI,…
Integrated with SQL Access relational data and XML data in same statement
IBM Software Group | DB2 Information Management Software
26
XML Index DDL
create index idx1 on T(xmlcol) generate key using xmlpattern '/a/b/@c' as sql date;
Declaration & use of namespace prefix supported (not shown above)
AS SQL VARCHAR (integer)
CREATE index-name
ON table-name (xml-column-name) GENERATE KEY USING xmlpattern
UNIQUE
text()
@attribute-tag
@*
///
element-tag
*///
INDEX
DOUBLE
DATETIMESTAMP
VARCHAR (HASHED)
xmlpattern:
xmlpattern = XPath without predicates, only child axis (/) and descendent-or-self axis (//)