Query Processing with XML

CSE 350 – Advanced Database Topics

Jeffrey R. Ellis

Query Processing Topics

Why?
Java and Other Programming Languages
XPath/XSLT
XQuery (W3C-sponsored Query Language)
Current Research
  – Other Query Languages
  – XISS (XML Indexing and Storage System)

FIRST – Distinction between XML and HTML/Web Technologies

XML spotlight is analogous to Java
  – Immediate benefits applied to World Wide Web
  – Long-range, more exciting benefits in applications

XML IS NOT AN HTML REPLACEMENT
  – HTML marks pages up for presentation on the web
  – XML marks text for semantic information purposes

XML can encode HTML pages, but HTML works well on the Web

XML Data Storage

XML Documents
  – Data is delineated semantically
  – Schemas/DTDs control contents of elements
  – Semi-structured attitude allows flexibility
  – Text is human-readable and machine-parsable
  – Open standards work with common tools
  – File data storage allows for easy sharing
  – Can queries control access to data?

Traditional Database Storage

Databases
  – Data is delineated semantically
  – Schemas control contents of rows
  – No flexibility from semi-structured storage
  – Data is not human-readable, but only machine-parsable
  – Proprietary standards prevent interoperability
  – Proprietary storage prevents data sharing
  – Queries control access to data

XML for Query Processing

If we can get efficient query processing, XML document storage provides many benefits over traditional database storage.

Sample application
  – Employee database document
  – XML Schema assumed to exist
  – Employee information queried as per standard HR processing

<?xml version="1.0"?>
<!DOCTYPE employees SYSTEM "employee.xsd">
<employees>
  <emp gender='m'>
    <name>
      <last>Bissell</last>
      <first>Brian</first>
    </name>
    <position>IT Specialist</position>
    <salary>35,000</salary>
    <location>CT</location>
  </emp>
  <emp gender='m'>
    <name>
      <last>Pham</last>
      <first>Hung</first>
      <mi>Q</mi>
    </name>
    <position>Senior IT Specialist</position>
    <salary>45,000</salary>
    <location>CT</location>
  </emp>
  …
</employees>

Tree Structure of XML Document

Remember that XML documents are trees

emp
  gender
  name
    last
    first
    mi
  position
  salary
  location

Query Processing – Programming Languages

XML Documents are flat files
Any language with file I/O can read XML document
Any language with string parsing capabilities can use XML data
Query processing done through language syntax
“Obvious” result different from traditional databases

Query Processing – Programming Languages

Strategy
  – Basic File I/O through language
  – Basic String matching to identify elements
  – Processing possible, but not necessarily efficient

Languages have gathered XML processing tools in libraries
  – xerces – Apache library for Java and C++

Two methods for parsing XML data
  – DOM
  – SAX
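
The "file I/O plus string matching" strategy can be sketched in a few lines. This is only an illustration in Python (the slides name Java and C++), and `SAMPLE` and `last_names` are hypothetical names; the sample holds just the two employees shown in the document, with the DOCTYPE line left out.

```python
import re

# Assumption: only the two employees shown on the sample slide; no DOCTYPE.
SAMPLE = """<employees>
  <emp gender='m'>
    <name><last>Bissell</last><first>Brian</first></name>
    <position>IT Specialist</position>
    <salary>35,000</salary>
    <location>CT</location>
  </emp>
  <emp gender='m'>
    <name><last>Pham</last><first>Hung</first><mi>Q</mi></name>
    <position>Senior IT Specialist</position>
    <salary>45,000</salary>
    <location>CT</location>
  </emp>
</employees>"""


def last_names(xml_text):
    """Return the text of every <last> element using plain string matching."""
    # Non-greedy match between the literal opening and closing tags.
    return re.findall(r"<last>(.*?)</last>", xml_text)
```

Matching like this works on tidy input but breaks as soon as a tag carries attributes, spans lines, or appears inside a comment or CDATA section, which is one reason parsing libraries are preferred.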

DOM

Document Object Model
Defined by W3C for XML, HTML, and stylesheets
Provides a hierarchical, object view of the document
DOMParser parses through file, then provides access to nodes
Key: Every item in XML document is a node

DOM Example

Node (Element): name=“emp”, attribute1, child1
Node (Attr): name=“gender”, value=“m”, parent
Node (Element): name=“name”, parent, child1
Node (Element): name=“last”, parent, child1
Node (Text): value=“Bissell”, parent
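
The node chain above can be walked with any DOM implementation. The sketch below uses Python's standard `xml.dom.minidom` rather than the Xerces DOMParser named in the slides; `SAMPLE` and `walk_first_employee` are hypothetical names, and the sample contains only the first employee.

```python
from xml.dom import minidom

# Assumption: a one-employee excerpt of the sample document, no DOCTYPE.
SAMPLE = """<employees>
  <emp gender='m'>
    <name><last>Bissell</last><first>Brian</first></name>
    <position>IT Specialist</position>
  </emp>
</employees>"""


def walk_first_employee(xml_text):
    """Follow emp -> @gender, emp -> name -> last -> text, as in the diagram."""
    doc = minidom.parseString(xml_text)           # whole document held in memory
    emp = doc.getElementsByTagName("emp")[0]      # Node (Element), name="emp"
    gender = emp.getAttributeNode("gender")       # Node (Attr), name="gender"
    name = emp.getElementsByTagName("name")[0]    # Node (Element), name="name"
    last = name.getElementsByTagName("last")[0]   # Node (Element), name="last"
    text = last.firstChild                        # Node (Text), value="Bissell"
    return {
        "element": emp.nodeName,
        "attribute": (gender.name, gender.value),
        "child": name.nodeName,
        "grandchild": last.nodeName,
        "text": text.data,
        "text_parent": text.parentNode.nodeName,
    }
```

Every step returns another node object, so the program can move up, down, or back through the tree as often as it likes once parsing has finished.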

SAX

Simple API for XML
Defined by XML-DEV mailing list
Provides event-driven processing of the document
XMLReader parses through file and activates different methods and functions based on the elements retrieved
Key: Methods are defined in interface, implemented in user code
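
A minimal sketch of the event-driven style, using Python's standard `xml.sax` module as a stand-in for the Java XMLReader: the library defines the callback interface (`ContentHandler`) and user code implements it. `LastNameHandler`, `sax_last_names`, and `SAMPLE` are hypothetical names.

```python
import xml.sax

# Assumption: a trimmed excerpt of the sample document, no DOCTYPE.
SAMPLE = """<employees>
  <emp gender='m'>
    <name><last>Bissell</last><first>Brian</first></name>
  </emp>
  <emp gender='m'>
    <name><last>Pham</last><first>Hung</first><mi>Q</mi></name>
  </emp>
</employees>"""


class LastNameHandler(xml.sax.ContentHandler):
    """Collects the text of each <last> element as the parser streams past it."""

    def __init__(self):
        super().__init__()
        self.in_last = False
        self.buffer = []
        self.last_names = []

    def startElement(self, name, attrs):
        if name == "last":
            self.in_last = True
            self.buffer = []

    def characters(self, content):
        # Text may arrive in several chunks, so accumulate it.
        if self.in_last:
            self.buffer.append(content)

    def endElement(self, name):
        if name == "last":
            self.last_names.append("".join(self.buffer))
            self.in_last = False


def sax_last_names(xml_text):
    handler = LastNameHandler()
    xml.sax.parseString(xml_text.encode("utf-8"), handler)
    return handler.last_names
```

Nothing is kept except what the handler chooses to store, which is why this style suits large documents that are read once.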

DOM versus SAX

SAX is primarily Java-based; DOM defined for most languages

DOM requires storage of entire document in memory; SAX processes as it reads

DOM mirrors a document that can be revisited; suited for document processing

SAX mirrors object lifecycles; suited for data processing

Query Processing - XPath/XSLT

Standard XML technologies XPath and XSLT provide a ready-made querying infrastructure

XPath identifies the location of various document elements

XSL Stylesheets provide methods for transforming data from one format to another

Combining XPath and XSLT provides easy generation of result sets based on queries

XPath

Provides element, value, and attribute identification

employees/emp/name/first = “Brian”, “Hung”, “Sara”, “Brian”

//salary = “35,000”, “40,000”, “35,000”, “60,000”

count(/employees/emp) = 4

//mi = “Q”
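
For a runnable approximation of the path lookups above, Python's `xml.etree.ElementTree` supports a limited XPath subset. The sketch assumes a sample holding only the two employees shown in the document, so it yields two names and a count of 2 rather than the four entries listed on the slide; `count()` is not part of that subset, so `len()` stands in for it.

```python
import xml.etree.ElementTree as ET

# Assumption: only the two employees shown on the sample slide; no DOCTYPE.
SAMPLE = """<employees>
  <emp gender='m'>
    <name><last>Bissell</last><first>Brian</first></name>
    <position>IT Specialist</position>
    <salary>35,000</salary>
    <location>CT</location>
  </emp>
  <emp gender='m'>
    <name><last>Pham</last><first>Hung</first><mi>Q</mi></name>
    <position>Senior IT Specialist</position>
    <salary>45,000</salary>
    <location>CT</location>
  </emp>
</employees>"""

root = ET.fromstring(SAMPLE)  # root is the <employees> element

# employees/emp/name/first (paths are relative to the root element here)
first_names = [e.text for e in root.findall("emp/name/first")]

# //salary and //mi (descendant search from the root)
salaries = [e.text for e in root.findall(".//salary")]
middle_initials = [e.text for e in root.findall(".//mi")]

# count(/employees/emp)
emp_count = len(root.findall("emp"))
```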

XSLT

Stylesheet transforms data from one form into another

<xsl:template match="name">
  <xsl:value-of select="first"/>
  <xsl:text> </xsl:text>
  <xsl:value-of select="last"/>
</xsl:template>

= Brian Bissell, Hung Pham, Sara Menillo, Brian Chicos

Combine XPath and XSLT for Queries

Query: Find the last name and position of each employee named Brian

<xsl:template match='employees'>
  <xsl:for-each select='emp'>
    <xsl:if test='name/first="Brian"'>
      <xsl:value-of select='name/last'/>
      <xsl:text>:</xsl:text>
      <xsl:value-of select='position'/>
      <xsl:text>; </xsl:text>
    </xsl:if>
  </xsl:for-each>
</xsl:template>
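
For comparison, the same query can be written as an ordinary loop. This is a Python sketch, not an XSLT processor: `for-each` becomes the loop, the `xsl:if` test becomes the condition, and the two `value-of` selections become the returned pair. `brians` and `SAMPLE` are hypothetical names, and the sample holds only the two employees shown in the document.

```python
import xml.etree.ElementTree as ET

# Assumption: only the two employees shown on the sample slide; no DOCTYPE.
SAMPLE = """<employees>
  <emp gender='m'>
    <name><last>Bissell</last><first>Brian</first></name>
    <position>IT Specialist</position>
  </emp>
  <emp gender='m'>
    <name><last>Pham</last><first>Hung</first><mi>Q</mi></name>
    <position>Senior IT Specialist</position>
  </emp>
</employees>"""


def brians(xml_text, first_name="Brian"):
    """Return (last name, position) for each employee with the given first name."""
    root = ET.fromstring(xml_text)
    results = []
    for emp in root.findall("emp"):                        # xsl:for-each select='emp'
        if emp.findtext("name/first") == first_name:       # xsl:if test=...
            results.append((emp.findtext("name/last"),     # value-of name/last
                            emp.findtext("position")))     # value-of position
    return results
```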

Combine XPath and XSLT for Queries

Query: Find the average salary of all non-managers

<xsl:template match='employees'>
  <xsl:variable name='running_sum'>
    <xsl:value-of select='sum(emp/salary[../position!="Manager"])'/>
  </xsl:variable>
  <xsl:variable name='running_count'>
    <xsl:value-of select='count(emp[position!="Manager"])'/>
  </xsl:variable>
  <xsl:value-of select='$running_sum div $running_count'/>
</xsl:template>
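
One practical caveat: XPath 1.0 converts each salary string to a number before summing, and a value written with a thousands separator such as "35,000" converts to NaN, so the sum only works if salaries are stored as plain digits. The Python sketch below computes the same average and strips the commas explicitly; `average_salary_excluding` and `SAMPLE` are hypothetical names, and the sample holds only the two employees shown in the document.

```python
import xml.etree.ElementTree as ET

# Assumption: only the two employees shown on the sample slide; no DOCTYPE.
SAMPLE = """<employees>
  <emp gender='m'>
    <name><last>Bissell</last><first>Brian</first></name>
    <position>IT Specialist</position>
    <salary>35,000</salary>
  </emp>
  <emp gender='m'>
    <name><last>Pham</last><first>Hung</first><mi>Q</mi></name>
    <position>Senior IT Specialist</position>
    <salary>45,000</salary>
  </emp>
</employees>"""


def average_salary_excluding(xml_text, excluded_position="Manager"):
    """Average salary of employees whose position differs from excluded_position."""
    root = ET.fromstring(xml_text)
    salaries = [
        int(emp.findtext("salary").replace(",", ""))   # "35,000" -> 35000
        for emp in root.findall("emp")
        if emp.findtext("position") != excluded_position
    ]
    if not salaries:
        return None  # nothing to average
    return sum(salaries) / len(salaries)
```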

Results XSLT/XPath

Many SQL queries can be accomplished
  – XPath provides element (data) access
  – XPath provides basic functions (e.g., sum() )
  – XPath provides WHERE functionality
  – XSLT provides SELECT functionality
  – XSLT provides ORDER BY functionality (sort)
  – XSLT provides result set formatting
  – UNION functionality provided ..?

Querying with XPath and XSLT

Important questions
  – Is it sufficient?
  – Is it efficient?
  – Is there a better way?

XML community has need to design a full query language

XQuery – Working draft published 7 June 2001

Query Processing - XQuery

XML provides flexibility in representing many kinds of information

Good query language must be likewise flexible
  – Pre-XQuery languages are good for specific types of data

Goal: “[S]mall, easily implementable language in which queries are concise and easily understood.”

XQuery Forms

1. Path expressions

2. Element constructors

3. FLWR expressions

4. Operator/Function expressions

5. Conditional expressions

6. Quantified expressions

7. Data Type expressions

XQuery – Path Expressions

Contribution of XPath
XQuery 1.0 and XPath 2.0 Data Model

document("sample1.xml")//emp/salary
/employees/emp/name[../@gender='f']
//emp[1 TO 3]/name/first

XQuery – Element Constructors

Queries can generate new elements
Similar to XSLT abilities

<worker>
  {$name/last}
  {$position}
</worker>
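
To make the idea of constructing new elements from query results concrete, here is a rough Python analogue built with `xml.etree.ElementTree`. It is not XQuery; `make_workers` and `SAMPLE` are hypothetical names, and the sample holds only the two employees shown in the document.

```python
import xml.etree.ElementTree as ET

# Assumption: only the two employees shown on the sample slide; no DOCTYPE.
SAMPLE = """<employees>
  <emp gender='m'>
    <name><last>Bissell</last><first>Brian</first></name>
    <position>IT Specialist</position>
  </emp>
  <emp gender='m'>
    <name><last>Pham</last><first>Hung</first><mi>Q</mi></name>
    <position>Senior IT Specialist</position>
  </emp>
</employees>"""


def make_workers(xml_text):
    """Build one new <worker> element per <emp>, holding last name and position."""
    root = ET.fromstring(xml_text)
    workers = []
    for emp in root.findall("emp"):
        worker = ET.Element("worker")                                       # <worker>
        ET.SubElement(worker, "last").text = emp.findtext("name/last")      # {$name/last}
        ET.SubElement(worker, "position").text = emp.findtext("position")   # {$position}
        workers.append(worker)
    return workers
```

The new elements are independent of the source tree and can be serialized with `ET.tostring(worker, encoding="unicode")`.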

XQuery – FLWR Expressions

For clause/Let clause/Where clause/Return
Similar to SQL

FOR $e IN document("sample1.xml")//emp
WHERE $e/salary > 38000
  AND $e/@gender = 'f'
RETURN $e/name
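
The FOR/WHERE/RETURN shape maps closely onto a list comprehension. The sketch below is a Python analogue, not an XQuery engine; `well_paid` and `SAMPLE` are hypothetical names. Because the two employees shown in the sample document are both marked `gender='m'`, the gender is a parameter here, and salaries are parsed by stripping the thousands separator.

```python
import xml.etree.ElementTree as ET

# Assumption: only the two employees shown on the sample slide; no DOCTYPE.
SAMPLE = """<employees>
  <emp gender='m'>
    <name><last>Bissell</last><first>Brian</first></name>
    <salary>35,000</salary>
  </emp>
  <emp gender='m'>
    <name><last>Pham</last><first>Hung</first><mi>Q</mi></name>
    <salary>45,000</salary>
  </emp>
</employees>"""


def well_paid(xml_text, min_salary, gender):
    """Return the <name> element of each employee matching both conditions."""
    root = ET.fromstring(xml_text)
    return [
        emp.find("name")                                   # RETURN $e/name
        for emp in root.iter("emp")                        # FOR $e IN //emp
        if int(emp.findtext("salary").replace(",", "")) > min_salary  # WHERE ...
        and emp.get("gender") == gender                    # AND $e/@gender = ...
    ]
```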

XQuery – Operator/Function Expressions

Pre-defined and user-defined operators and functions

Still under development: Union, Intersect, Except

FOR $e IN //employees/emp
WHERE not(empty($e//mi))
RETURN $e/name

XQuery – Conditional Expressions

If-then-else expressions are not yet limited to boolean (ongoing discussion)

FOR $e IN /employees/emp
RETURN
  <worker>
    {$e/name}
    IF ($e/position="Manager") THEN <manager />
  </worker>

Quantified Expressions

Some/Every conditions
Some/Every evaluates to True or False

FOR $e IN //employees
WHERE SOME $p IN $e//emp/position SATISFIES $p = "Manager"
RETURN $e
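
SOME and EVERY behave like existential and universal tests over a sequence. In Python the closest built-ins are `any()` and `all()`; the sketch below illustrates the semantics only and is not XQuery. `some_position`, `every_location`, and `SAMPLE` are hypothetical names, and the sample holds only the two employees shown in the document.

```python
import xml.etree.ElementTree as ET

# Assumption: only the two employees shown on the sample slide; no DOCTYPE.
SAMPLE = """<employees>
  <emp gender='m'>
    <position>IT Specialist</position>
    <location>CT</location>
  </emp>
  <emp gender='m'>
    <position>Senior IT Specialist</position>
    <location>CT</location>
  </emp>
</employees>"""


def some_position(xml_text, title):
    """SOME: true if at least one <position> equals title."""
    root = ET.fromstring(xml_text)
    return any(p.text == title for p in root.iter("position"))


def every_location(xml_text, location):
    """EVERY: true only if all <location> elements equal location."""
    root = ET.fromstring(xml_text)
    return all(loc.text == location for loc in root.iter("location"))
```

Note that `all()` over an empty sequence is true, which matches the usual convention for universal quantifiers over an empty set.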

Data Types

Data Types based on those available from XML Schema

Data types can be literal ("Brian"), from constructor functions (date("2001-10-11")), or from casting (CAST AS xsd:integer(24))

User-defined data types are also allowable and parsable

XQuery

More choices than XSLT/XPath combination
Work in progress
Current W3C efforts into query language
Influencing the future design of the core XML technologies (XPath)
Hopes to be fully flexible for all future XML applications

Query Processing – Research

XQuery specification continues to undergo review and change
  – 6 of 7 specification documents released since June
  – All specifications released in 2001

Other avenues of research
  – Other Query languages
  – Indexing strategies
  – Implementation

Query Processing – Other Query Languages

Many query languages exist
  – Quilt (basis for XQuery)
  – W3C early languages (XML-QL, XQL)
  – Adopted traditional languages (OQL, XSQL)
  – Research papers (XML-GL, YATL, Lorel)

Other query languages often optimized for a particular subset of XML documents

Query language field *MAY* be standardizing to XQuery

Query Processing – Indexing Strategy

Query language less important; better indexing techniques lead to efficiency

XISS (XML Indexing and Storage System)
  – Published September 19, 2001
  – Builds sets of indexes on XML data elements and attributes on initial parse of XML document
  – Lookup becomes constant-time through the various built indexes
  – Demonstrated successes in test runs
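
The general "index once at parse time, then look up cheaply" idea can be illustrated with ordinary hash maps. This sketch is not the XISS design, whose actual index structures are not described in these slides; it only shows why a pre-built index makes repeated lookups cheap. `build_index` and `SAMPLE` are hypothetical names.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Assumption: only the two employees shown on the sample slide; no DOCTYPE.
SAMPLE = """<employees>
  <emp gender='m'>
    <name><last>Bissell</last><first>Brian</first></name>
  </emp>
  <emp gender='m'>
    <name><last>Pham</last><first>Hung</first><mi>Q</mi></name>
  </emp>
</employees>"""


def build_index(xml_text):
    """Parse once and build an element index and an attribute index."""
    root = ET.fromstring(xml_text)
    element_index = defaultdict(list)     # tag name -> list of elements
    attribute_index = defaultdict(list)   # (attribute name, value) -> list of elements
    for elem in root.iter():              # single pass over the whole tree
        element_index[elem.tag].append(elem)
        for attr_name, attr_value in elem.attrib.items():
            attribute_index[(attr_name, attr_value)].append(elem)
    return element_index, attribute_index


element_index, attribute_index = build_index(SAMPLE)
```

After the one-time pass, `element_index["mi"]` or `attribute_index[("gender", "m")]` is a dictionary lookup (constant time on average) instead of a fresh scan of the document.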

Query Processing - Implementation

XML is currently in a state of flux
  – Standards are still being revised
  – Industry cautious before embracing a new technology
  – Economic slowdown may prevent new research and development efforts

XML still waiting for its “Killer App”, the application that forces immediate acceptance

XML Query Processing

XML is a functional database storage language
Efficient query language needed to turn XML into a viable database
Query language solutions are being developed
  – Java/C++ hooks first developed – OK
  – XSLT/XPath implemented – GOOD
  – XQuery being designed – GREAT?
  – Future additions – ????