The Semistructured-Data Model

Programming Languages for XML

Spring 2011

Instructor: Hassan Khosravi

11.2

Semistructured Data

Another data model, based on trees.

Self-describing:

The data implicitly carries information about what its schema is.

May only carry the names of attributes (so possibly untyped), and has a lower degree of organization than the data in a relational database.

May have no associated schema (i.e. may be schema-less)

Motivation:

Flexible representation of data.

Sharing of documents among systems and databases.

Information integration

– E.g. want to “merge” or query two databases.

Data exchange

– E.g. two enterprises (such as buyers and sellers) may want to exchange data
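To make "self-describing" concrete, here is a minimal sketch in Python. The bookstore records and the helper `labels` are hypothetical, invented for this illustration. Each node carries its own field names, two records need not share the same fields, and the only "schema" is whatever can be read off the data itself.

```python
# Hypothetical semistructured data: a tree whose nodes carry their own labels.
# The two Book records deliberately have different fields.
bookstore = {
    "Book": [
        {"ISBN": "b1", "Title": "First Title", "Author": ["Ann", "Bob"]},
        {"Title": "Second Title", "Remark": "No ISBN or author recorded"},
    ]
}


def labels(node):
    """Walk the tree and collect every label that appears in it."""
    found = set()
    if isinstance(node, dict):
        for name, child in node.items():
            found.add(name)
            found |= labels(child)
    elif isinstance(node, list):
        for child in node:
            found |= labels(child)
    return found


# The schema is implicit: it is discovered from the data, not declared up front.
discovered = sorted(labels(bookstore))
print(discovered)
```

Adding a new field to one record requires no change anywhere else, which is the flexibility the slide refers to.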

11.3

Semistructured Data representation

11.4

                 Relational               Semistructured

Structure        Tables                   Hierarchical tree, graph

Schema           Fixed in advance         Flexible, self-describing

Queries          Simple, nice language    Less so

Ordering         None (has ORDER BY)      Implied

Implementation   Mature and native        Add-on

11.5

Comparison with Relational Data

Inefficient: tags, which in effect represent schema information, are repeated

Access: data is structured hierarchically.

Better than relational tuples as a data-exchange format

Unlike relational tuples, semistructured data is self-documenting due to the presence of tags

Flexible, non-rigid format: tags can be added

Allows nested structures

Wide acceptance, not only in database systems, but also in browsers, tools, and applications
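The trade-off between repeated tags and self-documentation can be sketched with Python's standard `xml.etree.ElementTree` module. The rows and column names below are hypothetical. In the relational form each column name is stated once; in the XML form the same name is repeated as a tag in every record, which costs space but lets each record explain itself.

```python
import xml.etree.ElementTree as ET

# Hypothetical relational data: column names stated once, then bare tuples.
columns = ("ISBN", "Title", "Price")
rows = [("b1", "First Title", 85), ("b2", "Second Title", 40)]

# The same data as XML: every value is wrapped in a tag naming its column.
root = ET.Element("Bookstore")
for row in rows:
    book = ET.SubElement(root, "Book")
    for name, value in zip(columns, row):
        ET.SubElement(book, name).text = str(value)

xml_text = ET.tostring(root, encoding="unicode")
print(xml_text)

# Each column name now appears once per record instead of once per table.
title_tags = xml_text.count("<Title>")
print(title_tags)
```

A receiver with no knowledge of the sender's table definition can still tell which value is the title and which is the price, which is why the format suits data exchange.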

11.6

Flexibility in Schema

11.7

XML

XML: Extensible Markup Language

A standard adopted in 1998

While HTML uses tags for formatting (e.g., “italic”), XML uses tags for semantics (e.g., indicating “this is an address” or “this is a title”).

Key idea: create tag sets for a domain (e.g., genomics), and translate all data into properly tagged XML documents.

There are two different modes of use of XML:

Well-Formed XML allows you to invent your own tags.

No predefined schema

Valid XML conforms to a certain DTD.

The DTD describes allowable tags and their nesting.

But still reasonably flexible, e.g. may allow optional or missing fields

11.8

Well-Formed XML

Begins with a declaration that it is XML

It has a root element that is the entire body of the text
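As a rough illustration of these two rules, the sketch below uses Python's standard `xml.etree.ElementTree` parser, which rejects text that is not well-formed. The sample documents and the helper name `is_well_formed` are hypothetical.

```python
import xml.etree.ElementTree as ET


def is_well_formed(text):
    """Return True if the parser accepts the text as well-formed XML."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


# Starts with an XML declaration; a single root element encloses the whole body.
GOOD = """<?xml version="1.0" standalone="yes"?>
<Bookstore>
  <Book><Title>Sample Title</Title></Book>
</Bookstore>"""

# Two top-level elements: there is no single root.
TWO_ROOTS = "<Book/><Book/>"

# Tags closed in the wrong order: elements must nest properly.
BAD_NESTING = "<Book><Title>Sample</Book></Title>"

for name, text in [("GOOD", GOOD), ("TWO_ROOTS", TWO_ROOTS), ("BAD_NESTING", BAD_NESTING)]:
    print(name, is_well_formed(text))
```

The tag names here were invented freely, which is allowed: well-formedness constrains only the syntax, not the vocabulary.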

11.9

Well-Formed XML

Valid XML

11.10

Valid XML

Document Type Definition (DTD)

Grammar-like language for specifying elements, attributes, nesting, ordering, and number of occurrences

Special attribute types ID and IDREF

Example
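The following is a hypothetical illustration, not the slide's original example. The DTD is embedded in the document as an internal subset and declares nesting, ordering, occurrence counts (`*`, `?`), and ID / IDREFS attributes (IDREFS is the list-valued form of IDREF). Python's standard `xml.etree.ElementTree` parser reads the DTD but does not validate against it, so the reference check at the end is done by hand. All element names, attribute names, and the helper `dangling_refs` are assumptions made for this sketch.

```python
import xml.etree.ElementTree as ET

# Hypothetical document with an internal DTD subset.
DOC = """<?xml version="1.0"?>
<!DOCTYPE Bookstore [
  <!ELEMENT Bookstore (Book*, Author*)>
  <!ELEMENT Book (Title, Remark?)>
  <!ATTLIST Book ISBN ID #REQUIRED
                 Authors IDREFS #REQUIRED>
  <!ELEMENT Title (#PCDATA)>
  <!ELEMENT Remark (#PCDATA)>
  <!ELEMENT Author (#PCDATA)>
  <!ATTLIST Author Ident ID #REQUIRED>
]>
<Bookstore>
  <Book ISBN="b1" Authors="a1 a2"><Title>First Title</Title></Book>
  <Book ISBN="b2" Authors="a1"><Title>Second Title</Title><Remark>Optional</Remark></Book>
  <Author Ident="a1">Ann</Author>
  <Author Ident="a2">Bob</Author>
</Bookstore>
"""

# The stdlib parser checks well-formedness only; it does NOT enforce the DTD.
root = ET.fromstring(DOC)

# Values of ID-typed attributes must be unique and act as targets for IDREFs.
ids = {a.get("Ident") for a in root.findall("Author")}
ids |= {b.get("ISBN") for b in root.findall("Book")}


def dangling_refs(bookstore, known_ids):
    """Hand-written check: list IDREFS values that point at no declared ID."""
    return [
        ref
        for book in bookstore.findall("Book")
        for ref in book.get("Authors").split()
        if ref not in known_ids
    ]


print(dangling_refs(root, ids))
```

A validating parser would perform this check, and the element-nesting checks, automatically from the DTD.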

11.11

QUERYING SEMISTRUCTURED DATA

11.12

Querying XML

Not nearly as mature as querying relational data

Newer

No underlying theory as in the relational model

Sequence of development

XPath: path expressions + conditions

XQuery: XPath + full-featured query language

11.13

XPath

Think of XML as a tree

path expressions + conditions

11.14

XPath Syntax

/ root element

Element name, e.g. “book”

* used in place of a name matches any element

@ISBN attribute named ISBN

// matches all descendants

Conditions in square brackets, e.g. [@price < 50]

[N] Nth child, e.g. author[2]

Axes (13 in XPath 1.0, used to navigate around the tree), for example:

parent::

following-sibling::

descendant::

self::
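Several of these constructs can be tried with Python's standard `xml.etree.ElementTree`, which implements only a subset of XPath. In this sketch the bookstore document is hypothetical. Paths are written relative to the root element, the numeric comparison is done in Python because the module supports only presence and equality tests on attributes, and `..` stands in for the parent axis because named axes are not supported.

```python
import xml.etree.ElementTree as ET

# Hypothetical document, invented for this illustration.
root = ET.fromstring("""
<Bookstore>
  <Book ISBN="b1" Price="85">
    <Title>First Title</Title>
    <Author>Ann</Author>
    <Author>Bob</Author>
  </Book>
  <Book ISBN="b2" Price="40">
    <Title>Second Title</Title>
    <Author>Cy</Author>
  </Book>
</Bookstore>
""")

# Element name: Book children of the root (XPath /Bookstore/Book).
books = root.findall("Book")

# *: every child element, whatever its name.
children = root.findall("*")

# //: Title elements at any depth (XPath //Title).
titles = [t.text for t in root.findall(".//Title")]

# @ISBN: attribute values, read with .get() in this API.
isbns = [b.get("ISBN") for b in books]

# [N]: the second Author child of each Book (XPath /Bookstore/Book/Author[2]).
second_authors = [a.text for a in root.findall("Book/Author[2]")]

# Condition [@Price < 50]: select books that have a Price, then compare in Python.
cheap = [b.get("ISBN") for b in root.findall("Book[@Price]")
         if float(b.get("Price")) < 50]

# Parent navigation: ".." plays the role of the parent:: axis.
parents = [p.tag for p in root.findall(".//Title/..")]

print(titles, isbns, second_authors, cheap, parents)
```

A full XPath engine would accept `//Book[@Price < 50]` and the named axes directly; the workarounds above are specific to this module.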

11.16

XQuery