Modeling Languages for Semi-Structured Documents: Comparison and Translation between DML and Its Competitors
Yudan Zhai, Dep. of Informatique, 06/08/2009


Page 1

Modeling Languages for Semi-Structured Documents

COMPARISON AND TRANSLATION BETWEEN DML AND ITS COMPETITORS

Yudan Zhai
Dep. of Informatique

06/08/2009

Page 2

Outline

Project Introduction
XML Modeling in General
The Comparison of Schema Languages
The Development of a Schema Translator
Conclusion

Page 3

Project Introduction

Goal of the project
• XML modeling in general
• Compare DML with other XML schema languages
• Make a translation tool between DML and Relax NG

Page 4

Project Introduction

Milestones

MS1 - Related study
• Understand XML modeling in general
• DTD, XML Schema and Relax NG
• Make a comparison among these three schema languages

MS2 - In-depth study of DML
• YML, DML, DGL
• Comparison: DML vs. Relax NG / XML Schema / DTD

MS3 - Implementation

Page 5

XML Modeling in General

XML
• XML stands for eXtensible Markup Language
• Motivation – exchange information

Valid document
• The document should be readable and understandable with XML-aware software.
• Sets of rules and constraints are defined; they are specified by XML schema languages.

Page 6

Four Schema Languages

DTD
• Document Type Definitions
• Can be defined inline

XML Schema
• Published by the W3C
• More expressive power
• Overly complex syntax

Page 7

Four Schema Languages

Relax NG
• Being standardized in OASIS
• Clean, simple and powerful
• Treats attributes as elements in content models

DML
• Document Modeling Language
• Is a regular tree grammar-based schema language
• Supports inheritance

Page 8

Comparison of Schema Languages

The easiest syntax – DTD
The richest built-in data types – XML Schema
Simple yet powerful enough – Relax NG
Part of an integrated system – DML

Page 9

The Development of a Schema Translator

Project Introduction
• Language: Java
• Developing environment: JDK 5.0
• Function:
    Converting from Relax NG to DML
    Converting from DML to Relax NG

Page 10

The Development of a Schema Translator

Abstract syntax
• ASN.1 (Abstract Syntax Notation One)
• Standard and notation
• Describes data structures

Page 11

Implementation

Abstract syntax for Relax NG

Grammar := srt : Start ; def : Define
Start := top : Top
Define := name : Identifier ; elt : Element
Element := nc : NameClass ; top : Top
Top := na : NotAllowed | pattern : Pattern
Pattern := empty : Empty | nep : NonEmptyPattern
NonEmptyPattern := txt : Text | data : Data | value : Value | list : NGList
                 | att : NGAttribute | ref : Ref | oom : OneOrMore
                 | choice : Choice | group : Group | itl : Interleave
Text := <text/>
Data := type : Identifier ; dtl : URI
Value := dtl : URI ; type : Identifier ; ns : String ; content : String
List := pattern : Pattern
NGAttribute := name : String ; pattern : Pattern
Ref := name : Identifier
OneOrMore := nep : NonEmptyPattern
Choice := nep : NonEmptyPattern
Group := nep : NonEmptyPattern
Interleave := nep : NonEmptyPattern
NameClass := anyName : AnyName | nsName : NsName | name : Name
Identifier := S
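The abstract syntax above maps naturally onto one node class per production. The following is a minimal Python sketch of a few of those productions, not the author's code (the original tool was written in Java and is not shown in the slides). The class and field names are illustrative assumptions, the name class is reduced to a plain string, and only a subset of the patterns is covered.

```python
from dataclasses import dataclass, field
from typing import List, Union


@dataclass
class Text:
    """Text := <text/>"""


@dataclass
class Ref:
    """Ref := name : Identifier"""
    name: str


@dataclass
class NGAttribute:
    """NGAttribute := name : String ; pattern : Pattern"""
    name: str
    pattern: "Pattern"


@dataclass
class Group:
    """Group := nep : NonEmptyPattern (held here as a list of children)."""
    children: List["Pattern"] = field(default_factory=list)


@dataclass
class OneOrMore:
    """OneOrMore := nep : NonEmptyPattern"""
    child: "Pattern"


@dataclass
class Element:
    """Element := nc : NameClass ; top : Top (name class reduced to a name)."""
    name: str
    content: List["Pattern"] = field(default_factory=list)


@dataclass
class Define:
    """Define := name : Identifier ; elt : Element"""
    name: str
    element: Element


@dataclass
class Grammar:
    """Grammar := srt : Start ; def : Define (several defines allowed here)."""
    start: "Pattern"
    defines: List[Define] = field(default_factory=list)


Pattern = Union[Text, Ref, NGAttribute, Group, OneOrMore, Element]

# A grammar whose start refers to an element "a" carrying an attribute "id".
grammar = Grammar(
    start=Ref("simple-elt"),
    defines=[Define("simple-elt", Element("a", [NGAttribute("id", Text())]))],
)
```

Keeping one class per production means a converter can dispatch on the node type, which is one way to get the extensibility the slides claim for the abstract-syntax approach.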

Page 12

Implementation

Abstract syntax for DML

SCHEMA ::= ns:NS* ; str:STRUCT* ; type:TYPE*
NS ::= id:ID ; uri:URI | ns:NS*
STRUCT ::= sim:SIMPLE | named:NAMED | der:DERIVED | str:STRUCT*
TYPE ::= id:ID ; pattr:PATTERN | type:TYPE*
SIMPLE ::= att:ATTRIBUTE ; cnt:CONTENT
CONTENT ::= item:ITEM | ref:REF
ITEM ::= seq:SEQ | choice:CHOICE | elt:ELT | txt:TXT | any:ANY | item:ITEM*
REF ::= qn:QNAME
SEQ ::= occ:OCC ; item:ITEM
CHOICE ::= occ:OCC ; item:ITEM
ELT ::= val:VAL ; occ:OCC ; sim:SIMPLE
ATTRIBUTE ::= anyatt:ANYATT | use:USE ; val:VAL* | att:ATTRIBUTE*
TXT ::= val:VAL ; occ:OCC ; B
ANY ::= occ:OCC ; sim:SIMPLE
ANYATT ::= use:USE ; val:VAL
OCC ::= 1 | ? | + | *
USE ::= 1 | ?
VAL ::= tref:TYPEREF | pattr:PATTERN | id:ID | APP | CPY
NAMED ::= id:ID ; sim:SIMPLE
ID ::= id:string
B ::= Boolean
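The occurrence indicator OCC is the part of the DML syntax that most affects translation, so it is worth modeling explicitly. The sketch below is a Python illustration under stated assumptions: the pairing of the names once, optional, many and free with the symbols 1, ?, + and * is inferred from the occurrence rules later in the deck, the Boolean in TXT is assumed to be the `eol` flag seen in the DML examples, and values are reduced to a type name.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List, Union


class Occ(Enum):
    """OCC ::= 1 | ? | + | * (names are an assumed reading of the symbols)."""
    ONCE = "1"
    OPTIONAL = "?"
    MANY = "+"
    FREE = "*"


@dataclass
class Txt:
    """TXT ::= val:VAL ; occ:OCC ; B (B assumed to be the eol flag)."""
    value_type: str = "string"
    occ: Occ = Occ.ONCE
    eol: bool = False


@dataclass
class Elt:
    """ELT ::= val:VAL ; occ:OCC ; sim:SIMPLE (content kept as an item list)."""
    name: str
    occ: Occ = Occ.ONCE
    items: List["Item"] = field(default_factory=list)


@dataclass
class Seq:
    """SEQ ::= occ:OCC ; item:ITEM"""
    occ: Occ = Occ.ONCE
    items: List["Item"] = field(default_factory=list)


Item = Union[Txt, Elt, Seq]

# A repeated sequence of two text-only elements, "a" followed by "b".
schema = Seq(Occ.MANY, [Elt("a", items=[Txt()]), Elt("b", items=[Txt()])])
```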

Page 13

Relax NG to DML

Architecture

Page 14

Relax NG Tree Builder

Page 15

Relax NG Tree Builder

Example

<?xml version="1.0" encoding="ISO-8859-1"?>
<grammar>
  <start>
    <ref name="simple-elt"/>
  </start>
  <define name="simple-elt">
    <element>
      <name ns="">a</name>
      <attribute>
        <name ns="">id</name>
        <text/>
      </attribute>
    </element>
  </define>
</grammar>
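To make the tree-building step concrete, here is a small Python sketch that parses the example grammar and turns it into a nested tree. It uses the standard `xml.etree.ElementTree` parser; the tuple layout and the `build_tree` and `dump` helpers are assumptions made for illustration, since the slides do not show how the Java tree builder is implemented.

```python
import xml.etree.ElementTree as ET

# The example grammar from the slide (XML declaration omitted).
RNG = """<grammar>
  <start>
    <ref name="simple-elt"/>
  </start>
  <define name="simple-elt">
    <element>
      <name ns="">a</name>
      <attribute>
        <name ns="">id</name>
        <text/>
      </attribute>
    </element>
  </define>
</grammar>"""


def build_tree(node):
    """Turn an XML node into a (tag, attributes, text, children) tuple."""
    text = (node.text or "").strip()
    return (node.tag, dict(node.attrib), text, [build_tree(c) for c in node])


def dump(tree, depth=0):
    """Return one indented line per node, outermost node first."""
    tag, attrs, text, children = tree
    label = tag
    if "name" in attrs:
        label += f" name={attrs['name']}"
    if text:
        label += f" '{text}'"
    lines = ["  " * depth + label]
    for child in children:
        lines.extend(dump(child, depth + 1))
    return lines


tree = build_tree(ET.fromstring(RNG))
print("\n".join(dump(tree)))
```

The printed outline lists each node under its parent, which is the kind of structure a converter would then walk.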

Page 16

Relax NG Tree Builder

Corresponding abstract tree

Page 17

Converter (Relax NG to DML)

Basic rules:

Grammar -------> Schema
  TOP -------> SimpleStructure
    Attribute -------> Attribute
    Reference -------> Reference
    Other Pattern -------> Item
      • Empty -------> NULL
      • NonEmptyPattern -------> Item
          Text -------> Text
          Data, Value -------> Value
          List, OneOrMore, Group -------> SEQ
          Choice -------> Choice
          Interleave -------> Choice and SEQ
  Define -------> NamedStructure
    Element -------> SimpleStructure
      • NameClass -------> Value
      • Top -------> SimpleStructure
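The basic rules above amount to a lookup table from Relax NG constructs to DML constructs. The Python sketch below records that table and adds one possible reading of the "Interleave -------> Choice and SEQ" rule: a choice over every ordering of the interleaved items, each ordering written as a sequence. That expansion is an assumption, not something the slides confirm, and it only matches interleave semantics when each item is a single element. The key spellings and the tuple representation are likewise illustrative.

```python
from itertools import permutations

# Relax NG construct -> DML construct, as listed in the basic rules.
RNG_TO_DML = {
    "grammar": "Schema",
    "top": "SimpleStructure",
    "attribute": "Attribute",
    "ref": "Reference",
    "empty": None,  # Empty maps to NULL
    "text": "Text",
    "data": "Value",
    "value": "Value",
    "list": "SEQ",
    "oneOrMore": "SEQ",
    "group": "SEQ",
    "choice": "Choice",
    "interleave": "Choice and SEQ",
    "define": "NamedStructure",
    "element": "SimpleStructure",
    "nameClass": "Value",
}


def expand_interleave(items):
    """Assumed reading: a choice over all orderings, each one a sequence."""
    return ("choice", [("seq", list(order)) for order in permutations(items)])


print(expand_interleave(["a", "b"]))
```

The number of orderings grows factorially with the number of items, so a real translator would likely need a bound or a different encoding for large interleaves.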

Page 18

Converter (Relax NG to DML)

Rules

But Reference -------> Reference?

Relax NG:

<start>
  <oneOrMore>
    <group>
      <ref name="elt-a"/>
      <ref name="elt-b"/>
    </group>
  </oneOrMore>
</start>
<define name="elt-a">
  <element>
    <name ns="">a</name>
    <text/>
  </element>
</define>
<define name="elt-b">
  <element>
    <name ns="">b</name>
    <text/>
  </element>
</define>

DML with the references kept:

<Seq>
  <ref name="elt-a"/>
  <ref name="elt-b"/>
</Seq>

DML with the references replaced by the referenced elements:

<seq occ="many">
  <elt occ="once">
    <name content="a"/>
    <text occ="once" eol="false">
      <value type="string"/>
    </text>
  </elt>
  <elt occ="once">
    <name content="b"/>
    <text occ="once" eol="false">
      <value type="string"/>
    </text>
  </elt>
</seq>

Page 19

Result

Relax NG:

<?xml version="1.0" encoding="ISO-8859-1"?>
<!-- TWO ELEMENTS -->
<grammar>
  <start>
    <oneOrMore>
      <group>
        <ref name="elt-a"/>
        <ref name="elt-b"/>
      </group>
    </oneOrMore>
  </start>
  <define name="elt-a">
    <element>
      <name ns="">a</name>
      <text/>
    </element>
  </define>
  <define name="elt-b">
    <element>
      <name ns="">b</name>
      <text/>
    </element>
  </define>
</grammar>

DML:

<?xml version="1.0" encoding="UTF-8"?>
<yml>
  <seq occ="many">
    <elt occ="once">
      <name content="a"/>
      <text occ="once" eol="false">
        <value type="string"/>
      </text>
    </elt>
    <elt occ="once">
      <name content="b"/>
      <text occ="once" eol="false">
        <value type="string"/>
      </text>
    </elt>
  </seq>
</yml>
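The result above can be reproduced with a short recursive walk that replaces each reference by the element it names. The Python sketch below is an illustration only, not the original Java converter: the `convert` function is hypothetical, it handles just the constructs in this example (ref, element, text, oneOrMore, group), and it would loop forever on recursive references.

```python
import xml.etree.ElementTree as ET

RNG = """<grammar>
  <start>
    <oneOrMore>
      <group>
        <ref name="elt-a"/>
        <ref name="elt-b"/>
      </group>
    </oneOrMore>
  </start>
  <define name="elt-a">
    <element><name ns="">a</name><text/></element>
  </define>
  <define name="elt-b">
    <element><name ns="">b</name><text/></element>
  </define>
</grammar>"""


def convert(rng_xml):
    """Translate a small Relax NG grammar into a DML <yml> tree."""
    grammar = ET.fromstring(rng_xml)
    defines = {d.get("name"): d.find("element") for d in grammar.findall("define")}

    def pattern(node, parent):
        if node.tag == "ref":
            # Inline the referenced element instead of emitting a reference.
            pattern(defines[node.get("name")], parent)
        elif node.tag == "element":
            elt = ET.SubElement(parent, "elt", occ="once")
            ET.SubElement(elt, "name", content=node.findtext("name"))
            for child in node:
                if child.tag != "name":
                    pattern(child, elt)
        elif node.tag == "text":
            txt = ET.SubElement(parent, "text", occ="once", eol="false")
            ET.SubElement(txt, "value", type="string")
        elif node.tag == "oneOrMore":
            seq = ET.SubElement(parent, "seq", occ="many")
            for child in node:
                pattern(child, seq)
        elif node.tag == "group":
            # A group directly inside a sequence shares that <seq>.
            if parent.tag == "seq":
                target = parent
            else:
                target = ET.SubElement(parent, "seq", occ="once")
            for child in node:
                pattern(child, target)
        else:
            raise ValueError(f"unsupported pattern: {node.tag}")

    yml = ET.Element("yml")
    for child in grammar.find("start"):
        pattern(child, yml)
    return yml


dml = convert(RNG)
print(ET.tostring(dml, encoding="unicode"))
```

Merging the group into the enclosing `seq` mirrors the slide, where `oneOrMore` plus `group` becomes a single `<seq occ="many">`.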

Page 20

Reverse Converting

Architecture

Page 21

DML Tree Builder

Page 22

Converter (DML to Relax NG)

Basic rules:

Schema -------> Grammar
  SimpleStructure -------> Top
    ATT -------> ATT
    CNT -------> Pattern
      • Item -------> Pattern
      • Ref -------> Ref
  NamedStructure -------> Define
    SimpleStructure -------> TOP

Page 23

Converter (DML to Relax NG)

Exception rules:

Simple structure contains Element
  Element -------> Reference
  Add the new Define structure

Element contains Element
  Element -------> Reference
  Add the new Define structure

Page 24

Converter (DML to Relax NG)

Exception rules

DML:

<yml version="1.0" type="dml">
  <elt>
    <name content="addressbook"/>
    <elt occ="many">
      <name content="contact"/>
      <ref name="contact-content"/>
    </elt>
  </elt>
  <structure>
    ……..
  </structure>
</yml>

Relax NG:

<grammar xmlns="http://relaxng.org/ns/structure/1.0">
  <start>
    <ref name="addressbook-NC"/>
  </start>
  <define name="addressbook-NC">
    <element>
      <name ns="">addressbook</name>
      <oneOrMore>
        <ref name="contact-NC"/>
      </oneOrMore>
    </element>
  </define>
  <define>
    ……
  </define>
</grammar>
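The exception rules can be sketched as a "hoisting" step: every DML element becomes a reference, and a new define holding the element is added to the grammar. The Python code below illustrates that idea under stated assumptions. The `convert` and `hoist` names are hypothetical, the `-NC` suffix is copied from the slide, the `contact-content` reference and the elided structure are left out, the `xmlns` attribute is omitted for brevity, and only the occurrence values `once` and `many` are handled.

```python
import xml.etree.ElementTree as ET

# Reduced version of the slide's DML input (no contact-content reference).
DML = """<yml version="1.0" type="dml">
  <elt>
    <name content="addressbook"/>
    <elt occ="many">
      <name content="contact"/>
    </elt>
  </elt>
</yml>"""


def convert(dml_xml):
    """Translate nested DML elements into Relax NG refs plus defines."""
    yml = ET.fromstring(dml_xml)
    grammar = ET.Element("grammar")
    start = ET.SubElement(grammar, "start")

    def hoist(elt, parent):
        # Element -> Reference: emit a <ref> where the element stood.
        name = elt.find("name").get("content")
        define_name = name + "-NC"
        holder = parent
        if elt.get("occ") == "many":
            holder = ET.SubElement(parent, "oneOrMore")
        ET.SubElement(holder, "ref", name=define_name)

        # Add the new Define structure that carries the element itself.
        define = ET.SubElement(grammar, "define", name=define_name)
        element = ET.SubElement(define, "element")
        ET.SubElement(element, "name", ns="").text = name
        children = elt.findall("elt")
        for child in children:
            hoist(child, element)
        if not children:
            ET.SubElement(element, "empty")

    for elt in yml.findall("elt"):
        hoist(elt, start)
    return grammar


rng = convert(DML)
print(ET.tostring(rng, encoding="unicode"))
```

Hoisting is needed because, in the simple syntax targeted here, each element lives in its own define and patterns refer to it by name.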

Page 25

Converter (DML to Relax NG)

How to deal with occurrence?

Many
  <oneOrMore> P </oneOrMore>

Free
  <choice>
    <oneOrMore> P </oneOrMore>
    <empty/>
  </choice>

Optional
  <choice> P <empty/> </choice>

Once
  P
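These four cases can be written as a single wrapping function. The Python sketch below is illustrative: the `wrap_occurrence` helper is hypothetical, and while `once` and `many` appear as `occ` values in the DML examples, the lowercase spellings `optional` and `free` are assumed. The encodings use only `oneOrMore`, `choice` and `empty` because the Relax NG simple syntax has no `optional` or `zeroOrMore` element.

```python
import xml.etree.ElementTree as ET


def wrap_occurrence(pattern, occ):
    """Wrap a Relax NG pattern P according to a DML occurrence value."""
    if occ == "once":
        return pattern
    if occ == "many":
        wrapper = ET.Element("oneOrMore")
        wrapper.append(pattern)
        return wrapper
    if occ == "optional":
        choice = ET.Element("choice")
        choice.append(pattern)
        ET.SubElement(choice, "empty")
        return choice
    if occ == "free":
        # Zero or more: either one-or-more P, or nothing at all.
        choice = ET.Element("choice")
        ET.SubElement(choice, "oneOrMore").append(pattern)
        ET.SubElement(choice, "empty")
        return choice
    raise ValueError(f"unknown occurrence: {occ}")


wrapped = wrap_occurrence(ET.Element("ref", name="contact-NC"), "free")
print(ET.tostring(wrapped, encoding="unicode"))
```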

Page 26

Result (DML -> Relax NG)

Addressbook.dml

Addressbook.rng

Page 27

Conclusion

DML
• Is part of an integrated system for the management of semi-structured documents
• Has stronger expressive power than DTD
• Reduces the complexity compared with XML Schema
• Is very comparable to Relax NG but provides an inheritance mechanism

Implementation
• Based on Abstract Syntax Notation
• Good extensibility of the program
• Limitations:
    Weak data type conversion
    Relax NG support is limited to the simple syntax only
    Inheritance in DML is not considered
