Query Processing with XML
CSE 350 – Advanced Database Topics
Jeffrey R. Ellis
Query Processing Topics
Why?
Java and Other Programming Languages
XPath/XSLT
XQuery (W3C-sponsored Query Language)
Current Research
– Other Query Languages
– XISS (XML Indexing and Storage System)
FIRST – Distinction between XML and HTML/Web Technologies
XML spotlight is analogous to Java
– Immediate benefits applied to World Wide Web
– Long-range, more exciting benefits in applications
XML IS NOT AN HTML REPLACEMENT
– HTML marks pages up for presentation on the web
– XML marks text for semantic information purposes
XML can encode HTML pages, but HTML works well on the Web
XML Data Storage
XML Documents
– Data is delineated semantically
– Schemas/DTDs control contents of elements
– Semi-structured attitude allows flexibility
– Text is human-readable and machine-parsable
– Open standards work with common tools
– File data storage allows for easy sharing
– Can queries control access to data?
Traditional Database Storage
Databases
– Data is delineated semantically
– Schemas control contents of rows
– No flexibility from semi-structured storage
– Data is not human-readable, but only machine-parsable
– Proprietary standards prevent interoperability
– Proprietary storage prevents data sharing
– Queries control access to data
XML for Query Processing
If we can get efficient query processing, XML document storage provides many benefits over traditional database storage.
Sample application
– Employee database document
– XML Schema assumed to exist
– Employee information queried as per standard HR processing
<?xml version="1.0"?>
<!DOCTYPE employees SYSTEM "employee.xsd">
<employees>
  <emp gender='m'>
    <name>
      <last>Bissell</last>
      <first>Brian</first>
    </name>
    <position>IT Specialist</position>
    <salary>35,000</salary>
    <location>CT</location>
  </emp>
  <emp gender='m'>
    <name>
      <last>Pham</last>
      <first>Hung</first>
      <mi>Q</mi>
    </name>
    <position>Senior IT Specialist</position>
    <salary>45,000</salary>
    <location>CT</location>
  </emp>
  …
</employees>
Tree Structure of XML Document
Remember that XML documents are trees
emp
  gender
  name
    last
    first
    mi
  position
  salary
  location
Query Processing – Programming Languages
XML Documents are flat files
Any language with file I/O can read XML document
Any language with string parsing capabilities can use XML data
Query processing done through language syntax
“Obvious” result different from traditional databases
Query Processing – Programming Languages
Strategy
– Basic File I/O through language
– Basic String matching to identify elements
– Processing possible, but not necessarily efficient
Languages have gathered XML processing tools in libraries
– xerces – Apache library for Java and C++
Two methods for parsing XML data
– DOM
– SAX
DOM
Document Object Model
Defined by W3C for XML, HTML, and stylesheets
Provides a hierarchical object view of the document
DOMParser parses through file, then provides access to nodes
Key: Every item in XML document is a node
DOM Example
Node (Element): name="emp", attribute1, child1
  Node (Attr): name="gender", value="m", parent
  Node (Element): name="name", parent, child1
    Node (Element): name="last", parent, child1
      Node (Text): value="Bissell", parent
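The node view above can be sketched in Python with the standard library's xml.dom.minidom. This is only an illustration, not the DOMParser from the slides: the describe helper and its output format are made up for this example, and the sample data is the first employee record from the slides.

```python
from xml.dom import minidom

# Sample data: the first employee record from the slides.
XML = """<employees>
  <emp gender='m'>
    <name><last>Bissell</last><first>Brian</first></name>
    <position>IT Specialist</position>
    <salary>35,000</salary>
    <location>CT</location>
  </emp>
</employees>"""


def describe(node, depth=0, out=None):
    """Walk the DOM tree depth-first, recording one line per node (hypothetical helper)."""
    if out is None:
        out = []
    indent = "  " * depth
    if node.nodeType == node.ELEMENT_NODE:
        out.append(f"{indent}Element name={node.tagName}")
        # Attributes are nodes too, reached through the element's attribute map.
        for i in range(node.attributes.length):
            attr = node.attributes.item(i)
            out.append(f"{indent}  Attr name={attr.name} value={attr.value}")
    elif node.nodeType == node.TEXT_NODE and node.data.strip():
        # Whitespace-only text nodes exist in the tree but are not reported.
        out.append(f"{indent}Text value={node.data.strip()}")
    for child in node.childNodes:
        describe(child, depth + 1, out)
    return out


doc = minidom.parseString(XML)          # the whole document is held in memory
emp = doc.getElementsByTagName("emp")[0]
lines = describe(emp)
print("\n".join(lines))
```

Because the parsed tree stays in memory, the same nodes can be revisited as often as needed.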
SAX
Simple API for XML
Defined by XML-DEV mailing list
Provides event-driven processing of the document
XMLReader parses through file and activates different methods and functions based on the elements retrieved
Key: Methods are defined in interface, implemented in user code
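A minimal sketch of the event-driven style, using Python's standard xml.sax module rather than the Java XMLReader named in the slides. The LastNameHandler class is a hypothetical example handler; the sample data is the two employee records shown earlier.

```python
import xml.sax

# Sample data: the two employee records shown in the slides.
XML = b"""<employees>
  <emp gender='m'>
    <name><last>Bissell</last><first>Brian</first></name>
    <position>IT Specialist</position>
    <salary>35,000</salary>
    <location>CT</location>
  </emp>
  <emp gender='m'>
    <name><last>Pham</last><first>Hung</first><mi>Q</mi></name>
    <position>Senior IT Specialist</position>
    <salary>45,000</salary>
    <location>CT</location>
  </emp>
</employees>"""


class LastNameHandler(xml.sax.ContentHandler):
    """Collects the text of every <last> element as the parser reports events."""

    def __init__(self):
        super().__init__()
        self.in_last = False
        self.buffer = []
        self.last_names = []

    def startElement(self, name, attrs):
        if name == "last":
            self.in_last = True
            self.buffer = []

    def characters(self, content):
        # Text may arrive in several chunks, so it is buffered until the end tag.
        if self.in_last:
            self.buffer.append(content)

    def endElement(self, name):
        if name == "last":
            self.last_names.append("".join(self.buffer))
            self.in_last = False


handler = LastNameHandler()
xml.sax.parseString(XML, handler)   # the parser calls the handler methods as it reads
print(handler.last_names)
```

The parser owns the control flow and no document tree is built; the user code only keeps whatever state it needs between callbacks.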
DOM versus SAX
SAX is primarily Java-based; DOM defined for most languages
DOM requires storage of entire document in memory; SAX processes as it reads
DOM mirrors a document that can be revisited; suited for document processing
SAX mirrors object lifecycles; suited for data processing
Query Processing - XPath/XSLT
Standard XML technologies XPath and XSLT provide a ready-made querying infrastructure
XPath identifies the location of various document elements
XSL Stylesheets provide methods for transforming data from one format to another
Combining XPath and XSLT provides easy generation of result sets based on queries
XPath
Provides element, value, and attribute identification
employees/emp/name/first = “Brian”, “Hung”, “Sara”, “Brian”
//salary = “35,000”, “40,000”, “35,000”, “60,000”
count(/employees/emp) = 4
//mi = “Q”
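As a rough illustration, similar lookups can be tried with Python's xml.etree.ElementTree, which supports only a limited subset of XPath. Assumptions in this sketch: the data is just the two employee records visible in the slides (so the results are shorter than the four-employee results listed above), paths are written relative to the employees root element, and len() stands in for count(), which ElementTree does not provide.

```python
import xml.etree.ElementTree as ET

# Sample data: only the two employee records visible in the slides.
XML = """<employees>
  <emp gender='m'>
    <name><last>Bissell</last><first>Brian</first></name>
    <position>IT Specialist</position>
    <salary>35,000</salary>
    <location>CT</location>
  </emp>
  <emp gender='m'>
    <name><last>Pham</last><first>Hung</first><mi>Q</mi></name>
    <position>Senior IT Specialist</position>
    <salary>45,000</salary>
    <location>CT</location>
  </emp>
</employees>"""

root = ET.fromstring(XML)  # root is the <employees> element

# employees/emp/name/first, written relative to the root element
first_names = [e.text for e in root.findall("emp/name/first")]

# //salary: every salary element anywhere below the root
salaries = [e.text for e in root.findall(".//salary")]

# count(/employees/emp): len() over the matched elements
emp_count = len(root.findall("emp"))

# //mi: only employees that have a middle initial contribute a value
middle_initials = [e.text for e in root.findall(".//mi")]

print(first_names, salaries, emp_count, middle_initials)
```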
XSLT
Stylesheet transforms data from one form into another
<xsl:template match="name">
<xsl:value-of select="first"/>
<xsl:text> </xsl:text>
<xsl:value-of select="last"/>
</xsl:template>
= Brian Bissell, Hung Pham, Sara Menillo, Brian Chicos
Combine XPath and XSLT for Queries
Query: Find the last name and position of each employee named Brian
<xsl:template match='employees'>
  <xsl:for-each select='emp'>
    <xsl:if test='name/first="Brian"'>
      <xsl:value-of select='name/last'/>
      <xsl:text>:</xsl:text>
      <xsl:value-of select='position'/>
      <xsl:text>; </xsl:text>
    </xsl:if>
  </xsl:for-each>
</xsl:template>
Combine XPath and XSLT for Queries
Query: Find the average salary of all non-managers
<xsl:template match='employees'>
<xsl:variable name='running_sum'>
<xsl:value-of select='sum(emp/salary[../position!="Manager"])'/>
</xsl:variable>
<xsl:variable name='running_count'>
<xsl:value-of select='count(emp[position!="Manager"])'/>
</xsl:variable>
<xsl:value-of select='$running_sum div $running_count'/>
</xsl:template>
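The same sum-divided-by-count calculation can be checked outside XSLT with a short Python sketch. One caveat: in XPath 1.0 a string such as "35,000" converts to NaN, so the stylesheet needs salary values without thousands separators; the sketch strips the commas itself. The function name is hypothetical and the data is the two employee records visible in the slides.

```python
import xml.etree.ElementTree as ET

# Sample data: only the two employee records visible in the slides.
XML = """<employees>
  <emp gender='m'>
    <name><last>Bissell</last><first>Brian</first></name>
    <position>IT Specialist</position>
    <salary>35,000</salary>
    <location>CT</location>
  </emp>
  <emp gender='m'>
    <name><last>Pham</last><first>Hung</first><mi>Q</mi></name>
    <position>Senior IT Specialist</position>
    <salary>45,000</salary>
    <location>CT</location>
  </emp>
</employees>"""


def average_non_manager_salary(root):
    """Mirror sum(...) div count(...) for employees whose position is not Manager."""
    salaries = [
        float(emp.findtext("salary").replace(",", ""))  # "35,000" -> 35000.0
        for emp in root.findall("emp")
        # Like emp[position!="Manager"], require a position that differs from Manager.
        if emp.findtext("position") not in (None, "Manager")
    ]
    if not salaries:
        return None  # avoid dividing by zero when nobody qualifies
    return sum(salaries) / len(salaries)


root = ET.fromstring(XML)
average = average_non_manager_salary(root)
print(average)
```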
Results XSLT/XPath
Many SQL queries can be accomplished
– XPath provides element (data) access
– XPath provides basic functions (e.g., sum() )
– XPath provides WHERE functionality
– XSLT provides SELECT functionality
– XSLT provides ORDER BY functionality (sort)
– XSLT provides result set formatting
– UNION functionality provided ..?
Querying with XPath and XSLT
Important questions
– Is it sufficient?
– Is it efficient?
– Is there a better way?
XML community has need to design a full query language
XQuery – Working draft published 7 June 2001
Query Processing - XQuery
XML provides flexibility in representing many kinds of information
Good query language must be likewise flexible
– Pre-XQuery languages are good for specific types of data
Goal: “[S]mall, easily implementable language in which queries are concise and easily understood.”
XQuery Forms
1. Path expressions
2. Element constructors
3. FLWR expressions
4. Operator/Function expressions
5. Conditional expressions
6. Quantified expressions
7. Data Type expressions
XQuery – Path Expressions
Contribution of XPath
XQuery 1.0 and XPath 2.0 Data Model
document("sample1.xml")//emp/salary
/employees/emp/name[../@gender='f']
//emp[1 TO 3]/name/first
XQuery – Element Constructors
Queries can generate new elements
Similar to XSLT abilities
<worker>
{$name/last}
{$position}
</worker>
XQuery – FLWR Expressions
For clause/Let clause/Where clause/Return
Similar to SQL
FOR $e IN document("sample1.xml")//emp
WHERE $e/salary > 38000
AND $e/@gender = 'f'
RETURN $e/name
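To make the FOR/WHERE/RETURN shape concrete, here is a loose Python analogue written as a list comprehension. It is not XQuery and makes several assumptions: the function and helper names are hypothetical, salaries are stored with commas as in the sample document and are converted explicitly, and the data holds only the two employees visible in the slides (both gender 'm'), so the usage filters on 'm' to get a non-empty result.

```python
import xml.etree.ElementTree as ET

# Sample data: only the two employee records visible in the slides.
XML = """<employees>
  <emp gender='m'>
    <name><last>Bissell</last><first>Brian</first></name>
    <position>IT Specialist</position>
    <salary>35,000</salary>
    <location>CT</location>
  </emp>
  <emp gender='m'>
    <name><last>Pham</last><first>Hung</first><mi>Q</mi></name>
    <position>Senior IT Specialist</position>
    <salary>45,000</salary>
    <location>CT</location>
  </emp>
</employees>"""


def to_number(text):
    """Convert a salary string such as '45,000' to a float (hypothetical helper)."""
    return float(text.replace(",", ""))


def names_where(root, min_salary, gender):
    """Rough analogue of FOR $e IN //emp WHERE ... AND ... RETURN $e/name."""
    return [
        emp.find("name")                                   # RETURN $e/name
        for emp in root.iter("emp")                        # FOR $e IN ...//emp
        if to_number(emp.findtext("salary")) > min_salary  # WHERE $e/salary > ...
        and emp.get("gender") == gender                    # AND $e/@gender = ...
    ]


root = ET.fromstring(XML)
matches = names_where(root, 38000, "m")
print([n.findtext("last") for n in matches])
```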
XQuery – Operator/Function Expressions
Pre-defined and user-defined operators and functions
Still under development: Union, Intersect, Except
FOR $e IN //employees/emp
WHERE not(empty($e//mi))
RETURN $e/name
XQuery – Conditional Expressions
If-then-else expressions are not yet limited to boolean (ongoing discussion)
FOR $e IN /employees/emp
RETURN
<worker>
  {$e/name}
  IF ($e/position="Manager") THEN <manager />
</worker>
Quantified Expressions
Some/Every conditions
Some/Every evaluates to True or False
FOR $e IN //employees
WHERE SOME $p IN $e//emp/position SATISFIES $p = "Manager"
RETURN $e
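Some/Every conditions behave much like Python's any() and all(), which also evaluate to True or False. The sketch below is an analogy only, with hypothetical function names, run against the two employee records visible in the slides.

```python
import xml.etree.ElementTree as ET

# Sample data: only the two employee records visible in the slides.
XML = """<employees>
  <emp gender='m'>
    <name><last>Bissell</last><first>Brian</first></name>
    <position>IT Specialist</position>
    <salary>35,000</salary>
    <location>CT</location>
  </emp>
  <emp gender='m'>
    <name><last>Pham</last><first>Hung</first><mi>Q</mi></name>
    <position>Senior IT Specialist</position>
    <salary>45,000</salary>
    <location>CT</location>
  </emp>
</employees>"""


def some_position_is(employees, title):
    """SOME: true if at least one emp/position below this element equals title."""
    return any(p.text == title for p in employees.findall(".//emp/position"))


def every_location_is(employees, location):
    """EVERY: true only if all emp/location values below this element equal location."""
    return all(l.text == location for l in employees.findall(".//emp/location"))


root = ET.fromstring(XML)
has_manager = some_position_is(root, "Manager")
has_specialist = some_position_is(root, "IT Specialist")
all_in_ct = every_location_is(root, "CT")
print(has_manager, has_specialist, all_in_ct)
```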
Data Types
Data Types based on those available from XML Schema
Data types can be literal (“Brian”), from constructor functions (date(“2001-10-11”) ), or from casting ( CAST AS xsd:integer(24) )
User-defined data types are also allowable and parsable
XQuery
More choices than XSLT/XPath combination
Work in progress
Current W3C efforts into query language
Influencing the future design of the core XML technologies (XPath)
Hopes to be fully flexible for all future XML applications
Query Processing – Research
XQuery specification continues to undergo review and change
– 6 of 7 specification documents released since June
– All specifications released in 2001
Other avenues of research
– Other Query languages
– Indexing strategies
– Implementation
Query Processing – Other Query Languages
Many query languages exist
– Quilt (basis for XQuery)
– W3C early languages (XML-QL, XQL)
– Adopted traditional languages (OQL, XSQL)
– Research papers (XML-GL, YATL, Lorel)
Other query languages often optimized for a particular subset of XML documents
Query language field *MAY* be standardizing to XQuery
Query Processing – Indexing Strategy
Query language less important; better indexing techniques lead to efficiency
XISS (XML Indexing and Storage System)
– September 19, 2001 publishing
– Builds sets of indexes on XML data elements and attributes on initial parse of XML document
– Lookup becomes constant-time through the various built indexes
– Demonstrated successes in test runs
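The general idea of indexing on the initial parse can be illustrated with plain dictionaries. This is a deliberately simple sketch and not XISS's actual index structures, which the slides do not describe; the build_indexes function and both index layouts are assumptions made for the example.

```python
from collections import defaultdict
import xml.etree.ElementTree as ET

# Sample data: only the two employee records visible in the slides.
XML = """<employees>
  <emp gender='m'>
    <name><last>Bissell</last><first>Brian</first></name>
    <position>IT Specialist</position>
    <salary>35,000</salary>
    <location>CT</location>
  </emp>
  <emp gender='m'>
    <name><last>Pham</last><first>Hung</first><mi>Q</mi></name>
    <position>Senior IT Specialist</position>
    <salary>45,000</salary>
    <location>CT</location>
  </emp>
</employees>"""


def build_indexes(root):
    """One pass over the tree, filling an element-name index and an attribute index."""
    by_tag = defaultdict(list)    # element name -> elements with that name
    by_attr = defaultdict(list)   # (attribute name, value) -> elements carrying it
    for elem in root.iter():
        by_tag[elem.tag].append(elem)
        for name, value in elem.attrib.items():
            by_attr[(name, value)].append(elem)
    return by_tag, by_attr


root = ET.fromstring(XML)
by_tag, by_attr = build_indexes(root)

# After the one-time build, each lookup is an average constant-time dictionary
# access instead of a fresh scan of the document.
print(len(by_tag["emp"]), [e.text for e in by_tag["mi"]], len(by_attr[("gender", "m")]))
```

The cost moves to the initial parse, which touches every element once; the indexes also have to be rebuilt or maintained when the document changes.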
Query Processing - Implementation
XML is currently in a state of flux
– Standards are still being revised
– Industry cautious before embracing a new technology
– Economic slowdown may prevent new research and development efforts
XML is still waiting for its “Killer App”, the application that forces immediate acceptance
XML Query Processing
XML is a functional database storage language
Efficient query language needed to turn XML into a viable database
Query language solutions are being developed
– Java/C++ hooks first developed – OK
– XSLT/XPath implemented – GOOD
– XQuery being designed – GREAT?
– Future additions – ????