“Universal Data-Speak”: The eXtensible Markup Language
Zack Ives
CSE 590DB, Winter 2000
University of Washington
3 January 2000
What Is XML?
eXtensible Markup Language for data
Standard for publishing and interchange
“Cleaner” SGML for the Internet
Applications:
  Data exchange over intranets, between companies
  E-business
  Native file formats (Word, SVG)
  Publishing of data
  Storage format for irregular data
  …
What’s Special about XML?
Supported by almost everyone
Easy to parse (even with no info about the doc)
Can encode data with little or much structure
Supports data references inside & outside document
Presentation layer for publishing (XSL)
Document Object Model (DOM) for manipulating
Many, many tools
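As a rough illustration of the "easy to parse" point, the sketch below walks a well-formed document with Python's standard xml.etree.ElementTree and no DTD or schema. The function name `outline` and the sample document are assumptions made up for this example.

```python
import xml.etree.ElementTree as ET

def outline(element, depth=0):
    """Return one indented line per element, discovered purely from the markup."""
    lines = ["  " * depth + element.tag]
    for child in element:  # children are visited in document order
        lines.extend(outline(child, depth + 1))
    return lines

# Hypothetical sample; the parser is told nothing about its structure.
sample = "<order><item><sku>A1</sku></item><note/></order>"
print("\n".join(outline(ET.fromstring(sample))))
```

The structure is recovered from the tags alone, which is what makes generic XML tools possible.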
Basic XML Structures
Elements:
  Open & close tags or “empty tag”
  Ordered, nestable
Attributes:
  Single-valued, unordered
  Special types: ID, IDREF, IDREFS
PCDATA/CDATA
<?xml version="1.0" encoding="UTF-8"?>
<paper keywords="XML XML-QL">
  <title>Publishing Object Data</title>
  <author name="Michael Carey">
    <affiliation>IBM</affiliation>
    <email>[email protected]</email>
    <pcmember/>
  </author>
  <abstract><p>Since its…</p></abstract>
  <body>
    <section ID="I">
      <heading>Intro</heading>
      <p>XML, … <ref name="I">…</ref>…</p>
    </section>
  </body>
</paper>
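A minimal sketch of how these structures might look to a program, using Python's standard xml.etree.ElementTree. The document is a trimmed copy of the example above (the e-mail element and the body are left out), and the variable names are assumptions for illustration.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the slide's example document.
doc = """<paper keywords="XML XML-QL">
  <title>Publishing Object Data</title>
  <author name="Michael Carey">
    <affiliation>IBM</affiliation>
    <pcmember/>
  </author>
</paper>"""

paper = ET.fromstring(doc)

# Elements are ordered: children come back in document order.
child_tags = [child.tag for child in paper]

# Attributes are single-valued and unordered: exposed as a plain mapping.
# Splitting the value into keywords is up to the application.
keywords = paper.attrib["keywords"].split()

# An empty tag is an element with no text and no children.
pcmember = paper.find("author/pcmember")
is_empty = pcmember.text is None and len(pcmember) == 0

print(child_tags, keywords, is_empty)
```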
Other XML Structures
Processing instructions: instructions for applications
  <?xml version="1.0"?>
CDATA sections: treat content as char data
  <![CDATA[<tag>Whatever!!!</tag><whatever>]]>
Comments: just like HTML
  <!-- Comments -->
Entities: external resources and macros
  &my-entity; (non-parameter entity)
  %param-entity; (parameter entity for DTD declarations)
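The sketch below shows how one parser, Python's standard xml.etree.ElementTree, happens to treat three of these constructs. The entity value "expanded text" and the element names `note`, `raw`, and `macro` are made up for the example. Other parsers may expose comments or entities differently.

```python
import xml.etree.ElementTree as ET

doc = """<!DOCTYPE note [
  <!ENTITY my-entity "expanded text">
]>
<note>
  <!-- Comments -->
  <raw><![CDATA[<tag>Whatever!!!</tag><whatever>]]></raw>
  <macro>&my-entity;</macro>
</note>"""

note = ET.fromstring(doc)

# CDATA content arrives as plain character data; the markup inside is not parsed.
raw_text = note.find("raw").text

# The internal entity is expanded like a macro before the application sees it.
macro_text = note.find("macro").text

# The default ElementTree parser drops comments, so only elements remain.
child_tags = [child.tag for child in note]

print(raw_text, macro_text, child_tags)
```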
Document Type Definition (DTD)
Inherited from SGML DTD standard
BNF grammar establishing constraints on element structure and content
Specification of attributes and their types
Definitions of entities
Example DTD
<!ELEMENT paper (author*, date, abstract?, body)>
  <!ATTLIST paper keywords CDATA #IMPLIED>
<!ELEMENT author (affiliation?, email, pcmember?)>
  <!ATTLIST author name CDATA #REQUIRED>
  <!ELEMENT affiliation (#PCDATA)>
  <!ELEMENT email (#PCDATA)>
  <!ELEMENT pcmember EMPTY>
<!ELEMENT abstract (#PCDATA|p)*>
<!ELEMENT body (section*)>
  <!ELEMENT section (heading, (p|fig|section)*)>
  <!ELEMENT p (#PCDATA|b|ref)*>
  <!ELEMENT b (#PCDATA)>
  <!ELEMENT ref (#PCDATA)>
    <!ATTLIST ref name IDREF #REQUIRED>
  <!ELEMENT fig (#PCDATA)>
    <!ATTLIST fig caption CDATA #IMPLIED>
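An element-content model is essentially a regular expression over child element names. The sketch below translates a model such as the one for paper into a Python regex and matches an element's children against it. It is a simplified illustration, not a DTD validator: it ignores #PCDATA, EMPTY, ANY, and attributes, and the helper names `model_to_regex` and `conforms` are assumptions.

```python
import re
import xml.etree.ElementTree as ET

def model_to_regex(model):
    """Translate a DTD element-content model into a regex over 'name;' tokens."""
    # Each element name becomes a group matching that name plus a ';' terminator.
    pattern = re.sub(
        r"[A-Za-z_][\w.-]*",
        lambda m: "(?:" + re.escape(m.group()) + ";)",
        model,
    )
    # ',' means sequence, i.e. plain concatenation in a regex.
    # '|', '*', '?', '+' and parentheses already mean the same thing.
    pattern = re.sub(r"[,\s]", "", pattern)
    return re.compile(pattern)

def conforms(element, model):
    """Check only the names and order of an element's direct children."""
    children = "".join(child.tag + ";" for child in element)
    return model_to_regex(model).fullmatch(children) is not None

paper_model = "(author*, date, abstract?, body)"
ok = ET.fromstring("<paper><author/><author/><date/><body/></paper>")
bad = ET.fromstring("<paper><date/><author/><body/></paper>")
print(conforms(ok, paper_model), conforms(bad, paper_model))
```

The second document fails because author appears after date, which the sequence in the model does not allow.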
Shortcomings of DTDs
Useful for documents, but not so good for data:
  No support for structural re-use
    Object-oriented-like structures aren’t supported
  No support for data types
    Can’t do data validation
  Can have a single key item (ID), but:
    No support for multi-attribute keys
    No support for foreign keys (references to other keys)
    No constraints on IDREFs (reference only a Section)
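To make the IDREF limitation concrete, here is a hedged sketch in Python. The first check mirrors what DTD validation offers for ID/IDREF (every reference must match some ID). The second, restricting targets to section elements, has to live in application code. The sample document, including an ID attribute on fig, is hypothetical.

```python
import xml.etree.ElementTree as ET

doc = ET.fromstring(
    "<body>"
    '<section ID="I"><heading>Intro</heading></section>'
    '<fig ID="F1">a figure</fig>'
    '<p><ref name="I"/><ref name="F1"/><ref name="X"/></p>'
    "</body>"
)

# Map every ID value to the element that carries it (IDs are document-wide).
targets = {el.get("ID"): el for el in doc.iter() if el.get("ID") is not None}

# What a DTD can check: each IDREF must match *some* ID in the document.
dangling = [r.get("name") for r in doc.iter("ref") if r.get("name") not in targets]

# What a DTD cannot express: the referenced element must be a <section>.
not_section = [
    r.get("name")
    for r in doc.iter("ref")
    if r.get("name") in targets and targets[r.get("name")].tag != "section"
]

print(dangling, not_section)
```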
XML Schema
In XML format
Includes primitive data types (integers, strings, dates, etc.)
Supports value-based constraints (integers > 100)
User-definable structured types
Inheritance (extension or restriction)
Foreign keys
Element-type reference constraints
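A DTD treats all content as text, so checks like these would otherwise be hand-written in every application. The sketch below shows, in plain Python, the kind of type and value checks a schema language can declare once. The function names are hypothetical and this is not how any schema processor is implemented.

```python
from datetime import date

def check_integer_over_100(text):
    """Type check (integer) followed by a value-based constraint (> 100)."""
    try:
        value = int(text)
    except ValueError:
        return False
    return value > 100

def check_date(text):
    """Type check for an ISO 8601 calendar date such as 2000-01-03."""
    try:
        date.fromisoformat(text)
    except ValueError:
        return False
    return True

print(check_integer_over_100("150"), check_integer_over_100("42"))
print(check_date("2000-01-03"), check_date("3 January 2000"))
```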
Sample XML Schema
<schema version="1.0" xmlns="http://www.w3.org/1999/XMLSchema">
  <element name="author" type="string" />
  <element name="date" type="date" />
  <element name="abstract">
    <type> … </type>
  </element>
  <element name="paper">
    <type>
      <attribute name="keywords" type="string"/>
      <element ref="author" minOccurs="0" maxOccurs="*" />
      <element ref="date" />
      <element ref="abstract" minOccurs="0" maxOccurs="1" />
      <element ref="body" />
    </type>
  </element>
</schema>
Subtyping in XML Schema
<schema version="1.0" xmlns="http://www.w3.org/1999/XMLSchema">
  <type name="person">
    <attribute name="ssn" />
    <element name="title" minOccurs="0" maxOccurs="1" />
    <element name="surname" />
    <element name="forename" minOccurs="0" maxOccurs="*" />
  </type>
  <type name="extended" source="person" derivedBy="extension">
    <element name="generation" minOccurs="0" />
  </type>
  <type name="notitle" source="person" derivedBy="restriction">
    <element name="title" maxOccurs="0" />
  </type>
  <key name="personKey">
    <selector>.//person[@ssn]</selector>
    <field>@ssn</field>
  </key>
</schema>
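The key declaration pairs a selector (which elements) with a field (which value must be unique). A rough Python sketch of that check follows, reusing the selector path from the schema with xml.etree.ElementTree. The sample people document and its values are invented, and real schema processors enforce more than this.

```python
import xml.etree.ElementTree as ET
from collections import Counter

doc = ET.fromstring(
    "<people>"
    '<person ssn="111"><surname>A</surname></person>'
    '<person ssn="222"><surname>B</surname></person>'
    '<person ssn="111"><surname>C</surname></person>'
    "</people>"
)

# Selector .//person[@ssn]: every person element that carries an ssn attribute.
selected = doc.findall(".//person[@ssn]")

# Field @ssn: the value that must be unique across the selected elements.
counts = Counter(person.get("ssn") for person in selected)
duplicates = sorted(value for value, n in counts.items() if n > 1)

print(len(selected), duplicates)
```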
Important XML Standards
XSL/XSLT*: presentation and transformation standards
RDF: Resource Description Framework (meta-info such as ratings, categorizations, etc.)
XPath/XPointer/XLink*: standards for linking to documents and elements within them
Namespaces: for resolving name clashes
DOM: Document Object Model for manipulating XML documents
SAX: Simple API for XML parsing
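The difference between the two APIs can be sketched with Python's standard library bindings, xml.dom.minidom and xml.sax. DOM builds the whole tree and lets the program navigate it. SAX streams events to a handler. The sample document and the `TagCounter` class are assumptions for illustration.

```python
import xml.dom.minidom
import xml.sax

doc = "<paper><author/><author/><body><section/></body></paper>"

# DOM: build the whole tree in memory, then navigate (or modify) it.
dom = xml.dom.minidom.parseString(doc)
dom_authors = len(dom.getElementsByTagName("author"))

# SAX: receive a stream of events; nothing is kept unless the handler keeps it.
class TagCounter(xml.sax.ContentHandler):
    def __init__(self):
        super().__init__()
        self.counts = {}

    def startElement(self, name, attrs):
        self.counts[name] = self.counts.get(name, 0) + 1

handler = TagCounter()
xml.sax.parseString(doc.encode("utf-8"), handler)
sax_authors = handler.counts["author"]

print(dom_authors, sax_authors)
```

DOM is convenient when a document must be revisited or edited. SAX suits large documents that only need one pass.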
Some Key XML Resources
www.w3.org/XML: W3C XML standards
www.oasis-open.org: SGML, XML standards
www.xml.org: XML portal
xml.apache.org: Apache XML tools (Cocoon, Xerces, Xalan, etc.)
java.sun.com/xml: Sun Java tools
alphaworks.ibm.com: IBM tools
www.ibm.com/developer/xml: tools, xCentral search
www.xmltree.com: XML directory
Conclusions
XML is emerging as the standard for data publishing and exchange
Based on nested elements, references
DTDs and XML Schema provide constraints on structure
Later in this quarter:
  Querying, presenting XML
  Storing XML
  Integrating XML