Naming in XML Documents


Page 1

Naming in XML Documents - ODBASE’02 Ramon Lawrence, IDEA Lab

Naming in XML Documents

Dr. Ramon Lawrence
IDEA Lab
University of Iowa
[email protected]

Page 2

Outline

Motivation
Overall Goals
Background
Naming and Ontologies
Semantic Naming of XML Elements
Semantic Querying of Named XML Documents
Support for Document Evolution and Linking
Future Work and Conclusions

Page 3

Motivation

Motivation #1 - Naming is important despite limited research focus.

Names are a gateway to structure, but can also be used to avoid structure.

Users understand names better than structure, but naming is not considered in many models.

Motivation #2 - XML querying can be improved by minimizing use of path expressions.

XML query languages are complex and heavily structure-based (even more so than SQL).

Path expressions are similar to navigation in hierarchical models, which proved undesirable.

Queries cannot adapt to document changes.

Page 4

Are Names Really That Important?

<!ELEMENT LM (M+)>
<!ELEMENT M (MN, MO+)>
<!ELEMENT MN (#PCDATA)>
<!ELEMENT MO (N, Y, F, S, R, V+)>
<!ELEMENT N (#PCDATA)>
<!ELEMENT Y (#PCDATA)>
<!ELEMENT F (#PCDATA)>
<!ELEMENT S (#PCDATA)>
<!ELEMENT R (#PCDATA)>
<!ELEMENT V (C, P, VN, O+)>
<!ELEMENT C (#PCDATA)>
<!ELEMENT P (#PCDATA)>
<!ELEMENT VN (#PCDATA)>
<!ELEMENT O (#PCDATA)>

Poorly Named DTD

<!ELEMENT list-manufacturer (manufacturer+)>
<!ELEMENT manufacturer (mn-name, model+)>
<!ELEMENT mn-name (#PCDATA)>
<!ELEMENT model (mo-name, year, front-rating, side-rating, rank, vehicle+)>
<!ELEMENT mo-name (#PCDATA)>
<!ELEMENT year (#PCDATA)>
<!ELEMENT front-rating (#PCDATA)>
<!ELEMENT side-rating (#PCDATA)>
<!ELEMENT rank (#PCDATA)>
<!ELEMENT vehicle (color, price, vendorName, option+)>
<!ELEMENT color (#PCDATA)>
<!ELEMENT price (#PCDATA)>
<!ELEMENT vendorName (#PCDATA)>
<!ELEMENT option (#PCDATA)>

DTD with Decent Naming

Page 5

Overall Goal

The overall goal is to develop a naming methodology for XML tags that has two desirable properties:

1) Provides more semantics and context information to users.

2) Allows semantic querying of XML documents to simplify query formulation and handle document evolution.

The naming methodology must NOT enforce a strict standard on naming, but encourage better naming by providing a useful technique.

Page 6

Background

XML Tag Names and Standards

The development of standard tag sets for given problem domains has been the focus of many organizations.

ebXML, RosettaNet, CML, XFRML, MathML

Our goal is not to define THE tag set for all XML, but rather to suggest a methodology for constructing tag sets.

Applicable to the Semantic Web effort.

Page 7

Background

XML Querying

Many XML query languages have been proposed:

LOREL, XML-QL, XML-GL, XSL, XQL

Even the graphical XML query language, XML-GL, only supports querying with path expressions.

Why would we go back in time and make querying harder for the user?

The relational model replaced the hierarchical model because of its declarative query syntax.

Page 8

Running Example

Page 9

Converting the ER Model to XML

Modeling in XML requires a decision on how to hierarchically organize the information in the XML document.

Once selected, the hierarchical organization becomes the only view of the data and requires the user to formulate queries based on the hierarchy chosen.

Nesting of elements in XML has ambiguous semantics as the nesting may represent:

specialization/generalization (IS-A), Part-Of/HAS-A, ordering, grouping, general relationship (join)

Without tag names, it is impossible to determine the relationship between nested elements.

Page 10

Two XML DTDs for ER Diagram (1)

<!ELEMENT list-manufacturer (manufacturer+)>
<!ELEMENT manufacturer (mn-name, model+)>
<!ELEMENT mn-name (#PCDATA)>
<!ELEMENT model (mo-name, year, front-rating, side-rating, rank, vehicle+)>
<!ELEMENT mo-name (#PCDATA)>
<!ELEMENT year (#PCDATA)>
<!ELEMENT front-rating (#PCDATA)>
<!ELEMENT side-rating (#PCDATA)>
<!ELEMENT rank (#PCDATA)>
<!ELEMENT vehicle (color, price, vendorName, option+)>
<!ELEMENT color (#PCDATA)>
<!ELEMENT price (#PCDATA)>
<!ELEMENT vendorName (#PCDATA)>
<!ELEMENT option (#PCDATA)>

DTD1

Page 11

Two XML DTDs for ER Diagram (2)

<!ELEMENT list-vendor (vendor+)>
<!ELEMENT vendor (vendorName, vehicle+)>
<!ELEMENT vendorName (#PCDATA)>
<!ELEMENT vehicle (color, price, op-name+, mn-name, model)>
<!ELEMENT color (#PCDATA)>
<!ELEMENT price (#PCDATA)>
<!ELEMENT op-name (#PCDATA)>
<!ELEMENT mn-name (#PCDATA)>
<!ELEMENT model (mo-name, year, front-rating, side-rating, rank)>
<!ELEMENT mo-name (#PCDATA)>
<!ELEMENT year (#PCDATA)>
<!ELEMENT front-rating (#PCDATA)>
<!ELEMENT side-rating (#PCDATA)>
<!ELEMENT rank (#PCDATA)>

DTD2

Differences:

1) Different hierarchical organization

2) Different modeling of manufacturer (mn-name)

3) Naming differences (op-name)

Page 12

A Simple Query on Both DTDs

Query:

Return the manufacturer name and vehicle price for all vehicles with price < $30,000 and the vehicle model is in the top 10 for safety tests.

DTD1:

select M.mn-name, M.model.vehicle.price
from list-manufacturer.manufacturer M
where M.model.rank <= 10 and M.model.vehicle.price < 30000

DTD2:

select V.mn-name, V.price
from list-vendor.vendor.vehicle V
where V.model.rank <= 10 and V.price < 30000
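To make the structural dependence concrete, the sketch below answers the same question against two tiny documents, one shaped like DTD1 and one like DTD2, using Python's standard `xml.etree.ElementTree`. It is an illustration only: the sample data (manufacturer `Acme`, the prices, the ranks) and the function names `query_dtd1` and `query_dtd2` are invented, and the traversal code stands in for the LOREL queries rather than reproducing them.

```python
import xml.etree.ElementTree as ET

# Invented sample data shaped like DTD1 (manufacturer > model > vehicle).
DTD1_DOC = """<list-manufacturer><manufacturer>
  <mn-name>Acme</mn-name>
  <model>
    <mo-name>Roadster</mo-name><year>2002</year>
    <front-rating>5</front-rating><side-rating>4</side-rating><rank>3</rank>
    <vehicle><color>red</color><price>25000</price>
      <vendorName>Dealer A</vendorName><option>sunroof</option></vehicle>
    <vehicle><color>blue</color><price>35000</price>
      <vendorName>Dealer A</vendorName><option>leather</option></vehicle>
  </model>
</manufacturer></list-manufacturer>"""

# The same invented facts shaped like DTD2 (vendor > vehicle > model).
DTD2_DOC = """<list-vendor><vendor>
  <vendorName>Dealer A</vendorName>
  <vehicle><color>red</color><price>25000</price><op-name>sunroof</op-name>
    <mn-name>Acme</mn-name>
    <model><mo-name>Roadster</mo-name><year>2002</year>
      <front-rating>5</front-rating><side-rating>4</side-rating><rank>3</rank>
    </model>
  </vehicle>
  <vehicle><color>blue</color><price>35000</price><op-name>leather</op-name>
    <mn-name>Acme</mn-name>
    <model><mo-name>Roadster</mo-name><year>2002</year>
      <front-rating>5</front-rating><side-rating>4</side-rating><rank>3</rank>
    </model>
  </vehicle>
</vendor></list-vendor>"""


def query_dtd1(xml_text):
    """Walk manufacturer -> model -> vehicle, as the DTD1 nesting requires."""
    root = ET.fromstring(xml_text)
    results = []
    for manufacturer in root.findall("manufacturer"):
        name = manufacturer.findtext("mn-name")
        for model in manufacturer.findall("model"):
            if int(model.findtext("rank")) > 10:
                continue  # model is not in the top 10
            for vehicle in model.findall("vehicle"):
                price = int(vehicle.findtext("price"))
                if price < 30000:
                    results.append((name, price))
    return results


def query_dtd2(xml_text):
    """Walk vendor -> vehicle, then look down into model, as DTD2 requires."""
    root = ET.fromstring(xml_text)
    results = []
    for vehicle in root.findall("vendor/vehicle"):
        price = int(vehicle.findtext("price"))
        rank = int(vehicle.findtext("model/rank"))
        if price < 30000 and rank <= 10:
            results.append((vehicle.findtext("mn-name"), price))
    return results


print(query_dtd1(DTD1_DOC))
print(query_dtd2(DTD2_DOC))
```

Both functions return the same answer, but neither works on the other document: every path in the code has to change when the hierarchy changes.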

Page 13

Ontologies and Naming

Assume the existence of some ontology from which to extract terms with definitions.

May use WordNet or a problem-specific ontology.

Assumption: Human users have a “built-in” ontology, or view of the world, based on their experience and knowledge of the language.

By selecting common terms from a shared dictionary (language), both the producer (XML document source) and the consumer (XML document user) will understand the semantics of a data element from the terms used to define its name.

Caveat: Understanding is to some degree of accuracy. (Hopefully >= 90%).

Page 14

Ontologies and Naming (2)

Assumption #2: As more context information is provided by the producer (in the form of additional terms), the consumer is more confident that their world view is consistent with that of the producer.

The consumer understands the producer's view even if they do not originally share the same view.

Important: At no time is there intelligence demonstrated by software. The intelligence is embedded into the names assigned by the producer, and extracted by the consumer.

The system never needs to build its own world view to aid the users in reconciling theirs.

Page 15

Structure of a Semantic Name

A semantic name is a tag name for an XML element of the following form:

semantic_name ::= [CT_Term] | [CT_Term].PN
CT_Term ::= CT | CT ; CT_Term | CT , CT_Term
CT ::= <dictionary term>
PN ::= <dictionary term>

A semantic name is intended to capture structure-independent semantics by combining multiple dictionary terms.
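The grammar above can be exercised with a small parser. This is a minimal sketch in Python, not the author's implementation: the names `SemanticName` and `parse_semantic_name` are hypothetical, reading CT as a context term and PN as a property term is an assumption, and the sketch treats `;` and `,` as interchangeable separators even though the grammar lists them as distinct alternatives.

```python
import re
from typing import NamedTuple, Optional, Tuple


class SemanticName(NamedTuple):
    context: Tuple[str, ...]  # CT terms, in the order written
    prop: Optional[str]       # PN term, or None when the name is only [CT_Term]


# "[" CT_Term "]" optionally followed by "." PN
_PATTERN = re.compile(r"^\[([^\[\]]+)\](?:\.(\w+))?$")


def parse_semantic_name(text: str) -> SemanticName:
    """Split a bracketed semantic name into its context terms and optional PN."""
    match = _PATTERN.match(text.strip())
    if match is None:
        raise ValueError(f"not a semantic name: {text!r}")
    # Assumption: ';' and ',' both simply separate context terms.
    terms = tuple(term.strip() for term in re.split(r"[;,]", match.group(1)))
    if not all(terms):
        raise ValueError(f"empty context term in: {text!r}")
    return SemanticName(terms, match.group(2))


print(parse_semantic_name("[Manufacturer;Model;NHSCTest].Rank"))
print(parse_semantic_name("[Vehicle]"))
```

A plain tag such as `price` is rejected, which is the point of the form: the context is carried by the name itself rather than by the element's position in the document.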

Page 16

DTD1 with Semantic Naming

<!ELEMENT V (Manufacturer+)>
<!ELEMENT Manufacturer (Manufacturer--Name, Manufacturer-Model+)>
<!ELEMENT Manufacturer--Name (#PCDATA)>
<!ELEMENT Manufacturer-Model (Manufacturer-Model--Name, Manufacturer-Model--Year, Manufacturer-Model-NHSCTest--FrontRating, Manufacturer-Model-NHSCTest--SideRating, Manufacturer-Model-NHSCTest--Rank, Vehicle+)>
<!ELEMENT Manufacturer-Model--Name (#PCDATA)>
<!ELEMENT Manufacturer-Model--Year (#PCDATA)>
<!ELEMENT Manufacturer-Model-NHSCTest--FrontRating (#PCDATA)>
<!ELEMENT Manufacturer-Model-NHSCTest--SideRating (#PCDATA)>
<!ELEMENT Manufacturer-Model-NHSCTest--Rank (#PCDATA)>
<!ELEMENT Vehicle (Vehicle--Color, Vehicle--Price, Vendor--Name, Vehicle-Option--Name+)>
<!ELEMENT Vehicle--Color (#PCDATA)>
<!ELEMENT Vehicle--Price (#PCDATA)>
<!ELEMENT Vehicle-Option--Name (#PCDATA)>
<!ELEMENT Vendor--Name (#PCDATA)>

Highlighted on the slide:

<!ELEMENT Manufacturer-Model-NHSCTest--Rank (#PCDATA)>
<!ELEMENT Vehicle--Price (#PCDATA)>

Name is context-independent.

Page 17

Semantic Querying

Using semantic tag names introduces a tradeoff between increased semantic description and longer tag names.

Path expressions are difficult to formulate and complicate XML querying.

Since semantic names are structure independent, queries can be posed without using path expressions.

Page 18

A Context View

A context view is a structure-independent hierarchy of concepts in the XML document.

The hierarchy is constructed automatically from the tag names in the XML document/DTD.

Users query the context view, and their queries are mapped to LOREL queries on the XML documents.
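One way to picture the automatic construction is to split each semantic name into its context terms and nest them. The following is a minimal sketch in Python, assuming names written in the bracketed form `[A;B].P`; the function name `build_context_view` and the nested-dictionary representation are illustrative choices, not something the slides specify.

```python
def build_context_view(names):
    """Nest the context terms of each semantic name into one shared hierarchy."""
    root = {}
    for name in names:
        # "[A;B].P" -> body "[A;B", prop "P"; "[A]" -> body "[A]", prop ""
        body, _, prop = name.partition("].")
        node = root
        for term in body.strip("[]").split(";"):
            node = node.setdefault(term, {})  # reuse the concept if already present
        if prop:
            node.setdefault(prop, {})  # the property hangs under its last context term
    return root


# Semantic names taken from the renamed DTD1.
view = build_context_view([
    "[Manufacturer]",
    "[Manufacturer].Name",
    "[Manufacturer;Model].Name",
    "[Manufacturer;Model;NHSCTest].Rank",
    "[Vehicle].Price",
    "[Vendor].Name",
])
print(view)
```

Because the hierarchy depends only on the names, a document that nests vehicles under vendors and one that nests them under manufacturers yield the same view. The sketch stores properties and sub-concepts in the same dictionary, so it cannot tell `[Manufacturer].Model` apart from `[Manufacturer;Model]`; a fuller version would keep them separate.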

Page 19

Building the Context View

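[Figure: the context view drawn as a concept hierarchy with nodes Manufacturer, Vehicle, Vendor, Name, Model, Year, NHSC Test, Front Rating, Side Rating, Rank, Color, Price, and Option. Callouts show the semantic names [Manufacturer], [Manufacturer].Name, [Manufacturer;Model], and [Vehicle] being added to the hierarchy.]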

Page 20

Querying the Context View

[Figure: the context view hierarchy with nodes Vendor, Vehicle, Manufacturer, Name, Model, Year, NHSC Test, Front Rating, Side Rating, Rank, Color, Price, and Option.]

Return the manufacturer name and vehicle price for vehicles with price < $30,000 and the vehicle model is in the top 10 for safety tests.

Selections on the context view:

Manufacturer, Name: (return)
Vehicle, Price: < 30000, (return)
Model, NHSC Test, Rank: <= 10

Page 21

Mapping to DTD1

[Figure: DTD1 drawn as a tree rooted at list-manufacturer, with nodes manufacturer, mn-name, model, mo-name, year, front-rating, side-rating, rank, vehicle, color, price, vendorName, and option.]

Page 22

Semantic Naming in DTD1

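[Figure: the DTD1 tree relabeled with semantic names: V (root), [Manufacturer], [Manufacturer].Name, [Manufacturer;Model], [Manufacturer;Model].Name, [Manufacturer;Model].Year, *(FR) (an abbreviated label on the slide, presumably the front-rating element), [Manufacturer;Model;NHSCTest].SideRating, [Manufacturer;Model;NHSCTest].Rank, [Vehicle], [Vehicle].Color, [Vehicle].Price, [Vehicle;Option].Name, and [Vendor].Name.]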

Page 23

Query Mapping to DTD1

[Figure: the DTD1 tree labeled with semantic names (V, [Manufacturer], [Manufacturer].Name, [Manufacturer;Model], [Vehicle], [Vendor].Name, and so on), with the nodes used by the query highlighted.]

Query nodes:

[Manufacturer].Name: (return)
[Manufacturer;Model;NHSCTest].Rank
[Vehicle].Price: < 30000, return

Page 24

Mapping to DTD2

[Figure: DTD2 drawn as a tree rooted at list-vendor, with nodes vendor, vendorName, vehicle, color, price, op-name, mn-name, model, mo-name, year, front-rating, side-rating, and rank.]

Page 25

Mapping to DTD2

[Figure: the DTD2 tree relabeled with semantic names: V (root), [Vendor], [Vendor].Name, [Vehicle], [Vehicle].Color, [Vehicle].Price, [Vehicle;Option].Name, [Manufacturer].Name, [Manufacturer;Model], [Manufacturer;Model].Name, [Manufacturer;Model].Year, [Manufacturer;Model;NHSCTest].FrontRating, [Manufacturer;Model;NHSCTest].SideRating, and [Manufacturer;Model;NHSCTest].Rank.]

Query nodes:

[Manufacturer].Name: (return)
[Vehicle].Price: < 30000, return
[Manufacturer;Model;NHSCTest].Rank: <= 10

Page 26

Mapping Algorithm

Perform a breadth-first traversal of DTD x to build a mapping table T.

Each entry in T contains a tag name tn, and a set of path expressions P. Each p in P provides a path in DTD x to element named tn.

If DTD x is a tree, each tn has a unique path. If DTD x is a graph, there may be multiple possible paths.

Can return path union or get user input.

After all path mappings have been determined, build a spanning tree connecting paths.

Unique spanning tree for tree DTDs, may have multiple spanning trees for graph DTDs.
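The table-building step can be sketched in a few lines of Python. This is a hedged illustration of the breadth-first traversal only, not the author's implementation: the DTD is assumed to be already reduced to a parent-to-children dictionary, the function name `build_mapping_table` is hypothetical, paths that would revisit a tag are skipped so that graph DTDs with cycles terminate, and the spanning-tree step is left out.

```python
from collections import deque
from typing import Dict, List


def build_mapping_table(dtd: Dict[str, List[str]], root: str) -> Dict[str, List[str]]:
    """Map each tag name tn to the path expressions P that reach it from root."""
    table: Dict[str, List[str]] = {}
    queue = deque([(root,)])  # each queue entry is a path, stored as a tuple of tags
    while queue:
        path = queue.popleft()
        tag = path[-1]
        table.setdefault(tag, []).append(".".join(path))
        for child in dtd.get(tag, []):
            if child not in path:  # assumption: ignore paths that loop back on themselves
                queue.append(path + (child,))
    return table


# Content models of DTD1, reduced to parent -> children.
DTD1 = {
    "list-manufacturer": ["manufacturer"],
    "manufacturer": ["mn-name", "model"],
    "model": ["mo-name", "year", "front-rating", "side-rating", "rank", "vehicle"],
    "vehicle": ["color", "price", "vendorName", "option"],
}

table = build_mapping_table(DTD1, "list-manufacturer")
print(table["rank"])
print(table["price"])
```

Since DTD1 is a tree, every tag maps to exactly one path. A DTD in which the same element is reachable through two parents would produce two entries for that tag, which is where the path union or user input comes in.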

Page 27

Conclusions

Naming is important because names for structures are a user’s first contact with structural data representations.

Naming can be exploited to hide the structure by embedding more information into names.

The names assigned to XML elements have been standardized within organizations, but no work has been done on examining what constitutes good names.

By using names that are structure-independent, semantic querying is possible.

Semantic querying does not use path expressions. Semantic queries support document evolution.

Page 28

Future Work

Test performance and cost with renaming on real-world XML document sets.

Does the increased XML document size affect query performance?

Develop formal query algebra for semantic queries.

Page 29

Further Information:

http://www.cs.uiowa.edu/~rlawrenc/
http://idealab.cs.uiowa.edu
