Digital Library – XML System Applications
Jian-hua Yeh (葉建華), Assistant Professor, Department of Information Science, Aletheia University (真理大學)
2
Outline
• XML language introduction
• XML server architecture
• XML query language design issues
3
XML Introduction
• What is XML?
• Why XML?
• The XML power
• XML and the enterprise
4
What is XML?
• Proposed by W3C at the end of 1996
• SGML-derived
• A meta-language for defining new markup languages
• XML 1.0 Recommendation released in February 1998
• Supported by
– Sun, Microsoft, Netscape, Adobe, ArborText, etc.
5
What is XML? (2)
• eXtensible Markup Language
• Tag-based
• Open and cross-platform
• Structural data representation
• As data and as document
• Suitable for data exchange
6
<?xml version="1.0"?>
<invoicecollection>
  <invoice>
    <customer> Wile E. Coyote, Death Valley, CA </customer>
    <annotation> Customer asked that we guarantee return rights if these items should fail in desert conditions. This was approved by Marty Melliore, general manager. </annotation>
    <entries n="2">
      <entry quantity="2" total_price="134.00">
        <product maker="ACME" prod_name="screwdriver" price="80.00"/>
      </entry>
      <entry quantity="1" total_price="20.00">
        <product maker="ACME" prod_name="power wrench" price="20.00"/>
      </entry>
    </entries>
  </invoice>
  <invoice>
    <customer> Camp Mertz </customer>
    <entries n="2">
      <entry quantity="2" total_price="32.00">
        <product maker="BSA" prod_name="left-handed smoke shifter" price="16.00"/>
      </entry>
      <entry quantity="1" total_price="13.00">
        <product maker="BSA" prod_name="snipe call" price="13.00"/>
      </entry>
    </entries>
  </invoice>
</invoicecollection>
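As a minimal sketch of treating such a document "as data", the snippet below parses a shortened copy of the invoice collection and sums the total_price attributes per customer. It uses Python's standard xml.etree.ElementTree module, which is a choice made here and not something the slides prescribe. The helper name invoice_totals is illustrative, and the sample assumes attribute values are quoted, as XML 1.0 requires.

```python
import xml.etree.ElementTree as ET

# Shortened copy of the invoice collection (annotation omitted, attributes quoted).
INVOICES = """<?xml version="1.0"?>
<invoicecollection>
  <invoice>
    <customer> Wile E. Coyote, Death Valley, CA </customer>
    <entries n="2">
      <entry quantity="2" total_price="134.00">
        <product maker="ACME" prod_name="screwdriver" price="80.00"/>
      </entry>
      <entry quantity="1" total_price="20.00">
        <product maker="ACME" prod_name="power wrench" price="20.00"/>
      </entry>
    </entries>
  </invoice>
  <invoice>
    <customer> Camp Mertz </customer>
    <entries n="2">
      <entry quantity="2" total_price="32.00">
        <product maker="BSA" prod_name="left-handed smoke shifter" price="16.00"/>
      </entry>
      <entry quantity="1" total_price="13.00">
        <product maker="BSA" prod_name="snipe call" price="13.00"/>
      </entry>
    </entries>
  </invoice>
</invoicecollection>"""


def invoice_totals(xml_text):
    """Return a dict mapping each customer to the sum of its entries' total_price."""
    root = ET.fromstring(xml_text)
    totals = {}
    for invoice in root.findall("invoice"):
        customer = invoice.findtext("customer").strip()
        totals[customer] = sum(
            float(entry.get("total_price")) for entry in invoice.iter("entry")
        )
    return totals


if __name__ == "__main__":
    for customer, total in invoice_totals(INVOICES).items():
        print(customer, total)
```

The same file could equally be rendered for reading with a stylesheet, which is the "as document" half of the slide's point.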
7
Why XML?
• HTML is not enough: it has no capability for handling structured data
• Recommended by W3C, an open standard
• The push for enterprise integration
• Breaks up stovepipe systems, moving from vertical to horizontal integration
• The need for B2B and B2C integration
• Platform independent
8
Traditional Data Exchange Handling
• Private protocols for stovepipe systems
• Open standards for data exchange
– RPC
– RMI
– CORBA
– COM
9
New Strategy of Data Exchange
• Text-based
• Tag-oriented
• Self-descriptive
• Document Type Definition (DTD)
10
XML Details
• Components
– DTD
– XML content
• Processing models
– Event driven model: SAX
• A document is treated as a sequence of events
– Structural model: DOM
• A document is represented as a tree structure
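The two processing models can be contrasted with a short sketch. It uses Python's standard xml.sax and xml.dom.minidom modules as stand-ins for a SAX and a DOM parser; the document string and the names TitleHandler, titles_via_sax, and titles_via_dom are made up for this illustration.

```python
import xml.sax
from xml.dom import minidom

# Tiny illustrative document (not from the slides).
DOC = (
    "<catalog>"
    "<cd><title>Empire Burlesque</title></cd>"
    "<cd><title>Hide your heart</title></cd>"
    "</catalog>"
)


class TitleHandler(xml.sax.ContentHandler):
    """Event-driven model: the parser calls back as it streams through the text."""

    def __init__(self):
        super().__init__()
        self.titles = []
        self._in_title = False
        self._buffer = []

    def startElement(self, name, attrs):
        if name == "title":
            self._in_title = True
            self._buffer = []

    def characters(self, content):
        # Character data may arrive in several chunks, so collect it.
        if self._in_title:
            self._buffer.append(content)

    def endElement(self, name):
        if name == "title":
            self.titles.append("".join(self._buffer))
            self._in_title = False


def titles_via_sax(text):
    """Collect titles from parser events; no tree is ever built."""
    handler = TitleHandler()
    xml.sax.parseString(text.encode("utf-8"), handler)
    return handler.titles


def titles_via_dom(text):
    """Structural model: build the whole tree first, then navigate it."""
    dom = minidom.parseString(text)
    return [node.firstChild.data for node in dom.getElementsByTagName("title")]


if __name__ == "__main__":
    print(titles_via_sax(DOC))
    print(titles_via_dom(DOC))
```

The SAX version keeps only the state it needs, which suits large documents; the DOM version holds the full tree in memory but allows random access.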
11
XML Server Introduction
• Why XML server?
– Complies with the enterprise service model: client / middle tier / EIS structure
• Common components can come from third-party software vendors
– XML parser, XSL processor, etc.
12
XML Server Architecture
13
XML Server Architecture (2)
• Key aspects
– Client
• PDA, browser, Web server, other XML server, etc.
– Communication protocol
• Email, HTTP, FTP, EJB, RMI, IIOP, COM, etc.
– Key services
– Data object
• Relational database, object data source, etc.
14
XML Server Components
• Client
• Communication service
• Document handler
• Data object access module
• XML core service
15
An Operation Example
16
XML Support in Java Technology
• XML processing
• Data binding
• Remote communication
• Service registry
• Messaging
17
Java for XML Processing
• JAXP (Java API for XML Processing)
– SAX (Simple API for XML) parser
• Event-based XML parsing
– DOM (Document Object Model) parser
• Model-based XML parsing
– XSLT (Extensible Stylesheet Language Transformations) processor
• Supports SAX, DOM, and stream-based processing
18
Java for XML Data Binding
• JAXB (Java Architecture for XML Binding)
– Schema-based
– Validation
– Representing XML content
19
Java for XML Communication
• JAX-RPC (Java API for XML-based RPC)
– RPC-based Web service
– SOAP-based (Simple Object Access Protocol)
– Discoverable through JAXR (covered later)
20
Java for XML Registries
• JAXR (Java API for XML Registries)
– Service registration
– Service lookup
21
Java for XML Messaging
• JAXM (Java API for XML Messaging)
– Message provider
• SAAJ (SOAP with Attachments API for Java)
– Message population with attachment
22
XML Processing, How?
• Locating: XPath
• Querying: XQL, XQuery
• Storage: XMLDB
23
What is XPath?
• W3C standard
• A syntax for defining parts of an XML document
• Uses paths to define XML elements
• Defines a library of standard functions
• A major element in XSLT
24
Sample XML
<?xml version="1.0" encoding="ISO-8859-1"?>
<catalog>
  <cd country="USA">
    <title>Empire Burlesque</title>
    <artist>Bob Dylan</artist>
    <price>10.90</price>
  </cd>
  <cd country="UK">
    <title>Hide your heart</title>
    <artist>Bonnie Tyler</artist>
    <price>9.90</price>
  </cd>
  <cd country="USA">
    <title>Greatest Hits</title>
    <artist>Dolly Parton</artist>
    <price>9.90</price>
  </cd>
</catalog>
• Path
– /catalog/cd/price
• Predicate
– /catalog/cd[price>10.80]
25
XPath: The Syntax
26
Path Syntax: Locating Nodes
• /catalog/cd/price
• //cd
• /catalog/cd/*
• /catalog/*/price
• /*/*/price
• //*
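The location paths above can be tried with a small sketch. It uses Python's standard xml.etree.ElementTree, which implements only a subset of XPath and evaluates paths relative to an element. Because the starting element here is catalog itself, /catalog/cd/price is written ./cd/price; that translation is an artifact of the library, not of XPath.

```python
import xml.etree.ElementTree as ET

# The sample catalog from the slides (XML declaration omitted).
CATALOG = """<catalog>
  <cd country="USA"><title>Empire Burlesque</title><artist>Bob Dylan</artist><price>10.90</price></cd>
  <cd country="UK"><title>Hide your heart</title><artist>Bonnie Tyler</artist><price>9.90</price></cd>
  <cd country="USA"><title>Greatest Hits</title><artist>Dolly Parton</artist><price>9.90</price></cd>
</catalog>"""

root = ET.fromstring(CATALOG)  # root is the <catalog> element

# /catalog/cd/price  ->  ./cd/price : price children of every cd
prices = [p.text for p in root.findall("./cd/price")]

# //cd  ->  .//cd : every cd element at any depth
all_cds = root.findall(".//cd")

# /catalog/cd/*  ->  ./cd/* : all child elements of every cd
cd_children = root.findall("./cd/*")

# /catalog/*/price and /*/*/price  ->  ./*/price : price grandchildren of the root
grandchild_prices = root.findall("./*/price")

# //*  ->  .//* : all elements below the root
# (ElementTree leaves out the starting element; XPath's //* would also include catalog)
descendants = root.findall(".//*")

if __name__ == "__main__":
    print(prices)
    print(len(all_cds), len(cd_children), len(grandchild_prices), len(descendants))
```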
27
Path Syntax: Selecting Branches
• /catalog/cd[1]
• /catalog/cd[last()]
• /catalog/cd[price]
• /catalog/cd[price=10.90]
• /catalog/cd[price=10.90]/price
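A minimal sketch of these predicates follows, again using Python's xml.etree.ElementTree as an assumed tool. Two library limitations shape the code: paths are relative to the catalog element, and a child-value predicate is a string comparison, so price=10.90 is written price='10.90'. The helper name titles is illustrative.

```python
import xml.etree.ElementTree as ET

# The sample catalog from the slides (XML declaration omitted).
CATALOG = """<catalog>
  <cd country="USA"><title>Empire Burlesque</title><artist>Bob Dylan</artist><price>10.90</price></cd>
  <cd country="UK"><title>Hide your heart</title><artist>Bonnie Tyler</artist><price>9.90</price></cd>
  <cd country="USA"><title>Greatest Hits</title><artist>Dolly Parton</artist><price>9.90</price></cd>
</catalog>"""

root = ET.fromstring(CATALOG)


def titles(path):
    """Return the title text of every cd element selected by path."""
    return [cd.findtext("title") for cd in root.findall(path)]


# /catalog/cd[1] : the first cd child (positions start at 1)
first = titles("./cd[1]")

# /catalog/cd[last()] : the last cd child
last = titles("./cd[last()]")

# /catalog/cd[price] : every cd that has a price child
with_price = titles("./cd[price]")

# /catalog/cd[price=10.90] : every cd whose price text is exactly "10.90"
exact = titles("./cd[price='10.90']")

# /catalog/cd[price=10.90]/price : the price elements of those cds
exact_prices = [p.text for p in root.findall("./cd[price='10.90']/price")]

if __name__ == "__main__":
    print(first, last, with_price, exact, exact_prices)
```

Numeric comparisons such as price>10.80 are outside ElementTree's subset and would need a full XPath engine or a filter written in the host language.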
28
Path Syntax: Selecting Several Paths
• /catalog/cd/title | /catalog/cd/artist
• //title | //artist
• //title | //artist | //price
29
Path Syntax: Selecting Attributes
• //@country
• //cd[@country]
• //cd[@*]
• //cd[@country='UK']
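The attribute tests can be sketched the same way with Python's xml.etree.ElementTree (an assumed tool with a partial XPath implementation). It cannot return attribute nodes or evaluate @*, so //@country and //cd[@*] are approximated below in plain Python; only the two middle expressions run as path queries.

```python
import xml.etree.ElementTree as ET

# The sample catalog from the slides (XML declaration omitted).
CATALOG = """<catalog>
  <cd country="USA"><title>Empire Burlesque</title><artist>Bob Dylan</artist><price>10.90</price></cd>
  <cd country="UK"><title>Hide your heart</title><artist>Bonnie Tyler</artist><price>9.90</price></cd>
  <cd country="USA"><title>Greatest Hits</title><artist>Dolly Parton</artist><price>9.90</price></cd>
</catalog>"""

root = ET.fromstring(CATALOG)

# //@country : approximated by reading the attribute from every element that carries it
countries = [el.get("country") for el in root.findall(".//*[@country]")]

# //cd[@country] : cd elements that have a country attribute
with_country = root.findall(".//cd[@country]")

# //cd[@*] : approximated in Python as "cd elements with at least one attribute"
with_any_attribute = [cd for cd in root.iter("cd") if cd.attrib]

# //cd[@country='UK'] : cd elements whose country attribute equals UK
uk_titles = [cd.findtext("title") for cd in root.findall(".//cd[@country='UK']")]

if __name__ == "__main__":
    print(countries)
    print(len(with_country), len(with_any_attribute))
    print(uk_titles)
```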
30
XPath: Location paths
31
Formal Syntax
• axisname::nodetest[predicate]
– child::cd[price=9.90]
32
Axes and Node Tests
33
Abbreviated Syntax
34
Location Paths Examples
35
XPath: The expressions
36
Expression Types
• Numerical expressions
• Equality expressions
• Relational expressions
• Boolean expressions
37
Numerical Expressions
38
Equality Expressions
39
Relational Expressions
40
Boolean Expressions
41
XPath: The functions
42
XPath Function Library
• Node Set Functions
• String Functions
• Number Functions
• Boolean Functions
43
Node Set Functions
44
String Functions
45
Number Functions
46
Boolean Functions
47
XQL: XML Query Language
• XQL problem domains
• Queries, search contexts, and result sets
• Result sets vs. result documents
48
XQL Introduction
• Developers
– Texcel, webMethods, Microsoft
• Traditional query processing
• Features of XML documents
49
Traditional Query Processing
• Structured query
– For relational database
– For object-oriented database
• Unstructured full-text query
– For text documents
50
Features of XML Documents
• As documents
• As data sources
• With structural features
51
XQL Problem Domains
• Queries within a single document (in a browser or editor)
• Queries in collections of documents (document assembly in an XML repository)
• Addressing within or across documents
• XSL Patterns
52
The Role of a Query Language
• Different problem domains have different inputs, outputs, and processing models
• Common ground
– assertions (name, content, value, relationship)
• Tradeoff
– a separate query language designed for each problem domain
– a general-purpose query language for all problem domains
53
SQL vs. XQL
• Database
– SQL: The database is a set of tables.
– XQL: The database is a set of one or more XML documents.
• Query language
– SQL: Queries are done in SQL, a query language that uses tables as a basic model.
– XQL: Queries are done in XQL, a query language that uses the structure of XML as a basic model.
• Query input
– SQL: The FROM clause determines the tables which are examined by the query.
– XQL: A query is given a set of input nodes from one or more documents, and examines those nodes and their descendants.
• Query result
– SQL: The result of a query is a table containing a set of rows.
– XQL: The result of a query is a set of XML document nodes, which can be wrapped in a root node to create a well-formed XML document.
54
Simple Query Example
Search Context
<novel>
  <front>
    <title>The Heart of Darkness</title>
    <author>Joseph Conrad</author>
  </front>
</novel>
Query
novel
Result Set
<novel>
  <front>
    <title>The Heart of Darkness</title>
    <author>Joseph Conrad</author>
  </front>
</novel>
55
Why result documents?
• An XML document is easily parsed with a standard XML parser, so it can be transmitted as a single ASCII stream and parsed by the receiving application.
• An XML document can be displayed in a standard XML browser.
• An XML document can be stored in an XML repository.
• An XML document can be passed on to an XSL processor to perform transformations or do formatting.
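Turning a result set into a result document can be sketched as follows. The code does not implement XQL; it uses the path subset of Python's xml.etree.ElementTree as a stand-in query mechanism. The function name result_document and the wrapper tag result are hypothetical choices for illustration.

```python
import xml.etree.ElementTree as ET


def result_document(source_xml, path, wrapper_tag="result"):
    """Run a path query and wrap the matching nodes in a single root element.

    A bare result set may hold several sibling nodes, which is not a
    well-formed document on its own; one wrapping root element fixes that.
    """
    root = ET.fromstring(source_xml)
    wrapper = ET.Element(wrapper_tag)
    wrapper.extend(root.findall(path))
    return ET.tostring(wrapper, encoding="unicode")


if __name__ == "__main__":
    source = (
        "<catalog>"
        "<cd><title>Empire Burlesque</title></cd>"
        "<cd><title>Hide your heart</title></cd>"
        "</catalog>"
    )
    document = result_document(source, "./cd/title")
    print(document)
    # The wrapped result parses again with any standard XML parser.
    print(ET.fromstring(document).tag)
```

Because the output is itself an XML document, it can be handed to a parser, a browser, a repository, or an XSL processor without special handling.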
56
XML Database
57
What is Native XML Database?
• Defines a (logical) model for an XML document
– The database is specialized for storing XML data
• Has an XML document as its fundamental unit of (logical) storage
– Documents in, documents out
• Is not required to have any particular underlying physical storage model
– May not actually be a standalone database at all
58
Native XML Database Features
• XML storage
• Collections
– Allow you to query and manipulate those documents as a set
• Queries
– XPath, XQL
• Updates
– Update portions of documents (XUpdate)
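The feature list above can be illustrated with an in-memory stand-in. It is not the API of any product and does not implement XUpdate; the class Collection and its methods are hypothetical, and queries use the path subset of Python's xml.etree.ElementTree.

```python
import xml.etree.ElementTree as ET


class Collection:
    """Hypothetical in-memory collection: documents in, documents out."""

    def __init__(self):
        self._docs = {}  # document name -> parsed root element

    def store(self, name, xml_text):
        """XML storage: the document is the unit of storage."""
        self._docs[name] = ET.fromstring(xml_text)

    def retrieve(self, name):
        """Return the stored document serialized back to XML text."""
        return ET.tostring(self._docs[name], encoding="unicode")

    def query(self, path):
        """Query all documents as a set; return (document name, element) pairs."""
        return [
            (name, element)
            for name, root in self._docs.items()
            for element in root.findall(path)
        ]

    def update_text(self, name, path, new_text):
        """Update a portion of one document in place; return the number of changes."""
        matches = self._docs[name].findall(path)
        for element in matches:
            element.text = new_text
        return len(matches)


if __name__ == "__main__":
    cds = Collection()
    cds.store("dylan", "<cd><title>Empire Burlesque</title><price>10.90</price></cd>")
    cds.store("tyler", "<cd><title>Hide your heart</title><price>9.90</price></cd>")
    print([(name, el.text) for name, el in cds.query("./title")])
    cds.update_text("tyler", "./price", "8.50")
    print(cds.retrieve("tyler"))
```

A real native XML database would add persistence, indexing, and transactions behind a comparable document-oriented interface.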
59
Native XML Database Products
• eXist: proprietary/relational
• ozone: object-oriented
• Tamino: proprietary/relational
• Xindice: proprietary
• X-Hive: object-oriented/relational