XML Tutorial

Upload: byron-lambert

Post on 19-Jan-2016

Page 1: XML Tutorial. Today’s web: Created by hand for-eyes-only Can HTML become smarter? SGML -> XML The next generation web: XML and component-based commerce

XML Tutorial

Outline

•Today’s web: Created by hand for-eyes-only
•Can HTML become smarter?
•SGML -> XML
•The next generation web: XML and component-based commerce
•Prologue: XML and EDI

A Web Created by Hand for Eyes

•Much of the web is “hand-crafted”
•HTML often exploited and extended to achieve specific layout and formatting

•HTML has too low an “Information IQ” to enable many desirable applications

The Limits of “Hand-crafting”

Time to Convert Word Processing Document and Apply HTML Markup, by Number of Pages and minutes/page

Number of Pages    1 min/page     10 min/page    60 min/page
10                 10 minutes     100 minutes    10 hours
100                100 minutes    16.67 hours    12.5 days
1000               16.67 hours    20.83 days     4.17 months
10000              20.83 days     6.94 months    3.47 years
100000             6.94 months    5.79 years     34.72 years
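
The figures on this slide can be reproduced with simple arithmetic. The sketch below assumes an 8-hour working day, 30 working days per month, and 12 months per year; these factors are not stated on the slide but are inferred because they match its numbers (for example, 100 pages at 60 minutes each is 100 hours, or 12.5 days). The function name `conversion_effort` is illustrative only.

```python
def conversion_effort(pages, minutes_per_page):
    """Return the total hand-markup effort expressed in several units."""
    minutes = pages * minutes_per_page
    hours = minutes / 60
    days = hours / 8      # assumption: 8-hour working day
    months = days / 30    # assumption: 30 working days per month
    years = months / 12
    return {
        "minutes": minutes,
        "hours": hours,
        "days": days,
        "months": months,
        "years": years,
    }


# Print the effort in hours for every cell of the slide's grid.
for pages in (10, 100, 1000, 10000, 100000):
    for rate in (1, 10, 60):
        effort = conversion_effort(pages, rate)
        print(pages, "pages at", rate, "min/page:", round(effort["hours"], 2), "hours")
```

Under these assumptions the growth is linear in both factors, so the largest case (100000 pages at 60 minutes each) comes to decades of work for one person.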

Low vs. High “IQ” Encoding

•What information can be encoded?
  - How adaptable or flexible is the format for encoding style, structure, or markup?
  - Can the format tell you what it encodes?
•ASCII is very low IQ: only character info
•SGML is highest IQ: encodes anything and completely specifies the encoding rules

•PDF? HTML?

HTML is too low in IQ

•HTML was designed as a simple markup language

  - simple structures: headings, lists, links
  - strong emphasis on formatting
  - weak for encoding content

•HTML wasn’t designed to encode the structure and semantics needed for complex applications

Web Applications That Need “Smarter” Data

•Data interchange between Web clients
•Moving processing from server to client
•Multiple client-side views without new data
•“Information push” from personalized applications

Can HTML be made smarter?

•Create new tags used by your application, or use <META>, DIV, and CLASS (and hope they don’t interfere elsewhere)

•Use a “standard” metadata model (but which one? Dublin Core, PICS, P3, OPS,…)

•Hide applet code in comments (platform dependent?)

•Hack, hack, hack...

Inherent Limitations of HTML

•Not extensible
•Limited capability to encode structure
•No validation
•Lossy interchange

XML

•Extensible Markup Language - a standard way of creating markup languages for the Web

  - a file format for data representation
  - a schema for describing data or message structures
  - a mechanism for extending and annotating HTML with semantic information
•XML is a simplification of SGML, the Standard Generalized Markup Language
  - easier to understand and implement

HTML Apartment Listing

<HTML>
<HEAD>
<TITLE>An Apartment For Rent</TITLE>
</HEAD>
<BODY>
<H1>Apartment</H1>
<P>1800 square feet, 3 bedrooms, 7 baths.
<H2>No pets, smoking forbidden!</H2>
<H3>Amenities:</H3>
<P>
Sunny location, good view, has air-conditioner.
<H3>Location</H3>
<P>2008 South E. Avenue, Eureka, CA
<H3>Cost, Etc.</H3>
<P>Price: $3600 a month
<P>Contact: (415) 123-4567
<P>Available immediately
<P>This offer posted 1 August 1997 in the Eureka Daily Times
</BODY>
</HTML>

An XML Apartment Listing

<?xml version="1.0"?>
<!DOCTYPE LISTING SYSTEM "APTLISTING.DTD">
<LISTING>
  <ADINFO>
    <POSTED>March 26, 1997</POSTED>
    <WHERE_POSTED>Belmont Courier</WHERE_POSTED>
    <CONTACT>(650) 111-2222</CONTACT>
  </ADINFO>
  <DESCRIPTION>
    <AREA>1400 SQUARE FEET</AREA>
    <AMENITIES>1 bedroom, 1 bathroom</AMENITIES>
    <COMMENT>Small cottage in a big forest</COMMENT>
  </DESCRIPTION>
  <POLICIES>
    <PETS>Not allowed</PETS>
    <BOZOS>Not allowed</BOZOS>
  </POLICIES>
  <COST>$875</COST>
</LISTING>
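
To illustrate why such a listing counts as “smarter” data, here is a minimal sketch that reads it with Python's standard `xml.etree.ElementTree` module. The variable names are illustrative, and the XML declaration and DOCTYPE line are omitted so the snippet does not depend on an external DTD file.

```python
import xml.etree.ElementTree as ET

# The listing from this slide, without the declaration and DOCTYPE line.
LISTING_XML = """\
<LISTING>
  <ADINFO>
    <POSTED>March 26, 1997</POSTED>
    <WHERE_POSTED>Belmont Courier</WHERE_POSTED>
    <CONTACT>(650) 111-2222</CONTACT>
  </ADINFO>
  <DESCRIPTION>
    <AREA>1400 SQUARE FEET</AREA>
    <AMENITIES>1 bedroom, 1 bathroom</AMENITIES>
    <COMMENT>Small cottage in a big forest</COMMENT>
  </DESCRIPTION>
  <POLICIES>
    <PETS>Not allowed</PETS>
    <BOZOS>Not allowed</BOZOS>
  </POLICIES>
  <COST>$875</COST>
</LISTING>
"""

listing = ET.fromstring(LISTING_XML)

# Each fact is addressed by element name, not by its position on a page.
monthly_cost = int(listing.findtext("COST").lstrip("$"))
pets_policy = listing.findtext("POLICIES/PETS")
contact = listing.findtext("ADINFO/CONTACT")

print(monthly_cost, pets_policy, contact)
```

Getting the same facts out of the HTML version would mean guessing at headings and sentence patterns, which is the limitation the earlier slides describe.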

But First: One Minute SGML

•Standard Generalized Markup Language, ISO 8879

•SGML defines the “markup language” that specifies the logical rules for a given type of document

•Markup transforms a flat stream of text into a set of objects or elements that can be manipulated by other applications

•Since there is no “universal tag set” that can describe all documents, SGML provides the means for defining the tag set that meets your needs

SGML’s Big Idea: Document Types

•Idea of document type easy to understand
•The Document Type Definition or DTD defines:
  - the class of documents that shares a common information model
  - permissible elements and attributes, their contents, the order in which they occur

•The DTD is the “document schema” that makes an instance “self-describing”

•From a DTD a parser can be generated to test any document for conformance
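
The idea of testing an instance against a declared document type can be sketched in a few lines. This is a toy stand-in, not a DTD parser: the content model is a hypothetical Python dictionary that only records the exact sequence of child elements, whereas a real DTD can also express optional and repeated content and attributes.

```python
import xml.etree.ElementTree as ET

# Hypothetical content model: element name -> exact sequence of child elements.
# An empty list means the element holds text only.
CONTENT_MODEL = {
    "POLICIES": ["PETS", "BOZOS"],
    "PETS": [],
    "BOZOS": [],
}


def conformance_errors(element, model=CONTENT_MODEL):
    """Return a list of messages; an empty list means the instance conforms."""
    if element.tag not in model:
        return [f"<{element.tag}> is not declared"]
    errors = []
    children = [child.tag for child in element]
    if children != model[element.tag]:
        errors.append(
            f"<{element.tag}> contains {children}, expected {model[element.tag]}"
        )
    for child in element:
        errors.extend(conformance_errors(child, model))
    return errors


good = ET.fromstring(
    "<POLICIES><PETS>Not allowed</PETS><BOZOS>Not allowed</BOZOS></POLICIES>"
)
bad = ET.fromstring(
    "<POLICIES><BOZOS>Not allowed</BOZOS><PETS>Not allowed</PETS></POLICIES>"
)

print(conformance_errors(good))  # conforming instance: no messages
print(conformance_errors(bad))   # children appear in the wrong order
```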

Examples of Document Types

•User manuals
•Reference manuals
•Directories
•Newsletters
•Brochures
•Catalogs
•Datasheets
•Proposals
•Dictionaries
•Technical reports
•Contracts
•Regulations
•Policies and procedures
•Journal Articles
•Textbooks
•Purchase Orders
•Invoices
•Recipes

HTML as a Document Type

•HTML can be described as an application of SGML - the HTML document type

  - Simple structures: headings, lists, links
  - Strong emphasis on formatting, weak for encoding content
  - Not designed to encode the content distinctions for any particular industry or application

•But most HTML doesn’t conform to the HTML DTD

Designing a DTD

•Determine information requirements, purposes, uses (and their priorities)

  - deliver in one or more print and online formats
  - create new information products
  - interchange with other authors or publishers
  - integrate information into equipment
  - meet company, industry, customer standards

Designing a DTD

•Determine process, tool, external constraints or standards

•Identify and name information components and component containers

•Create categories to organize the components

•Determine when, where, how often components appear

Designing a DTD

•Identify “meta-information” to augment the information components

  - bibliographic information
  - process and workflow-related information

•Describe the component hierarchy in a graphic notation to visualize it

•Transcribe the graphic notation into formal syntax

•Test the analysis on sample documents
•Document the process and the results
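
The step of transcribing a component hierarchy into formal syntax can be sketched as follows. The hierarchy is a hypothetical dictionary standing in for the graphic notation, and the helper `to_dtd` is illustrative; it emits only simple sequence content models and treats components without children as text.

```python
def to_dtd(hierarchy):
    """Turn {component: [child components]} into DTD element declarations."""
    lines = []
    for name, children in hierarchy.items():
        if children:
            content = "(" + ", ".join(children) + ")"  # children in fixed order
        else:
            content = "(#PCDATA)"                      # leaf component: text only
        lines.append(f"<!ELEMENT {name} {content}>")
    return "\n".join(lines)


# Hypothetical hierarchy for the policies part of an apartment listing.
policies = {
    "POLICIES": ["PETS", "BOZOS"],
    "PETS": [],
    "BOZOS": [],
}

print(to_dtd(policies))
```

Decisions such as when, where, and how often a component appears would show up in a fuller version as occurrence indicators (`?`, `*`, `+`) on the content models.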

SGML: Close, but no Cigar

•SGML has been successful in niches, but hasn’t been adopted by rank-and-file Web publishers

  - “the quiet revolution”
  - “the million dollar secret”

•Perceived as too complex (because of features dating from keystroke-minimizing origins)

•Small vendors didn’t have the clout to legitimize SGML in the mass market (but some of them cleverly “dumbed-down” their tools for HTML)

XML: Right Place, Right Time

•Looks like HTML++, but acts like SGML--
•Backed by:
  - World Wide Web Consortium (W3C)
  - Sun - “give Java something to do”
  - Microsoft - with great enthusiasm
  - Netscape - with less enthusiasm
  - SGML tool vendors and consultants
  - Innovators in EDI community

Specific XML Proposals to Simplify SGML

•All elements have start and end tags
•All attributes are: name="value"
•Changed syntax for EMPTY elements
  - <toc> => <toc/>
  - <graphic file="x.gif"> => <graphic file="x.gif"/>
•No & connector in content models
•No inclusions and exclusions
•DTD not necessary because it can be inferred if instance is “well-formed”
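
These rules are what let a parser check an instance without any DTD. A minimal sketch using Python's standard `xml.etree.ElementTree` parser, which tests well-formedness only; the helper name `is_well_formed` is illustrative.

```python
import xml.etree.ElementTree as ET


def is_well_formed(text):
    """True if the text parses as XML; no DTD is consulted."""
    try:
        ET.fromstring(text)
    except ET.ParseError:
        return False
    return True


# Empty elements use the new <toc/> syntax and attributes are quoted.
print(is_well_formed('<doc><toc/><graphic file="x.gif"/></doc>'))

# SGML-style empty element with no end tag: rejected.
print(is_well_formed('<doc><toc></doc>'))

# Unquoted attribute value: rejected.
print(is_well_formed('<doc><graphic file=x.gif/></doc>'))
```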

XML Adoption Scenarios

•The transition from the “Web for eyes” to the “automated Web”

  - 1st generation: XML leaves HTML alone
  - 2nd generation: HTML as output format created from XML instance
  - 3rd generation: XML repositories

1st Generation XML

•No disruption of existing HTML production processes

•XML production process may have nothing to do with HTML production process

•XML for processes, HTML for eyes, but XML and HTML can be linked together

1st Generation XML Leaves HTML as is

[Diagram: a data source feeds two separate conversions during CREATION, one to HTML and one to XML; DELIVERY offers HTML “for eyes” alongside XML.]

2nd Generation XML

•Creation of XML is primary process
•Replace “hand-crafted” HTML with automated down-translation

•Alternatively, use XML style sheet to create HTML-like presentation(s)

•“instance at a time” retargeting
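
A minimal sketch of automated down-translation, assuming an apartment listing in XML as the source. The function name `listing_to_html` and the choice of fields and labels are hypothetical; a production setup would more likely use a style sheet, as the slide suggests.

```python
import xml.etree.ElementTree as ET
from html import escape


def listing_to_html(xml_text):
    """Down-translate an apartment listing in XML to a small HTML fragment."""
    listing = ET.fromstring(xml_text)
    # Hypothetical choice of fields and labels for the "for eyes" view.
    fields = [
        ("Area", "DESCRIPTION/AREA"),
        ("Pets", "POLICIES/PETS"),
        ("Cost", "COST"),
    ]
    items = [
        f"<li>{label}: {escape(listing.findtext(path, default=''))}</li>"
        for label, path in fields
    ]
    return "<h1>Apartment</h1>\n<ul>\n" + "\n".join(items) + "\n</ul>"


SAMPLE = (
    "<LISTING>"
    "<DESCRIPTION><AREA>1400 SQUARE FEET</AREA></DESCRIPTION>"
    "<POLICIES><PETS>Not allowed</PETS></POLICIES>"
    "<COST>$875</COST>"
    "</LISTING>"
)

html_fragment = listing_to_html(SAMPLE)
print(html_fragment)
```

Because the HTML is generated, changing the presentation means changing one function rather than re-editing every page by hand.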

Up & Down Translation

[Diagram: four levels of text, from most to least structured]
  - Content/structure-based text objects: SGML, XML, databases
  - Formatted electronic text: HTML, word processing files
  - Unstructured electronic text: ASCII
  - Printed text
[Axis labels: “More structure (energy)” and “Easier to translate to”]

2nd Generation XML Restores Order

[Diagram: a data source is converted to an XML source, which is down-translated to HTML, to HDML, and, with XML stylesheet(s), to “HTML-like” XML.]

HTML as an Output Format

•Treating HTML as an output format generated from an SGML source repository insulates you from ongoing changes to HTML and the latest proprietary extensions

•HTML created by “down translation” can be richer in structure and more consistent than HTML created by hand at many times the cost

3rd Generation XML

•reuse, not just retargeting
•XML a first-class citizen from the start
•content-oriented DTD
•native authoring, or enhanced markup by editorial or production staff

•no longer file at a time, create db and work on it

•support for custom applications

3rd Generation XML Repository

[Diagram: Inputs 1 to 4 enter a central XML repository by “up-translation” or decomposition; Outputs 1 to 4 leave it by “down-translation” or assembly.]

Retargeting and Reuse Requirements

•different delivery channels
  - Web
  - CD-ROM, CD-ROM + Web hybrids
  - Braille, large print, voice synthesis (ICADD)
•different “dialects” of HTML for different browsers or bandwidths or as HTML changes
•different applications (“slice and dice”)
  - reference manual vs help vs tutorial

XML for the Web’s “Little Languages”

•CDF -- “channel definition format”, eliminates need for proprietary “push” plug-in

•OSD -- “open software description”, for describing configurations for automated distribution of software

•PICS -- for content ratings
•RDF -- “resource description framework”, merging Netscape and Microsoft metadata initiatives

•CBL -- common business language in eCo framework

The Next-Generation Web

PROBLEMS
•The Web is eyeballs-only
•No content encoding
•Things can’t be found
•No automation of tasks

SOLUTIONS
•Metadata and Object APIs -- “self-describing smart Web”
•Distributed registries and structure-based retrieval
•Web catalogs and documents in their “native schema”
•Agent-based run-time environment

The Internet Today

[Diagram: separate Internet resources: an FTP server, three Web servers with documents, two databases, and two applications.]

A Commerce Type Definition (CTD)

<!DOCTYPE Taxonomy PUBLIC "-//CommerceNet//DTD Taxonomy V1.0//EN">
<Taxonomy>
  <Head>
    <Label>United Airlines</Label>
    <Version>1.0</Version>
    <Base>World Airline Registry:1.1:2.3.7</Base>
    <Registry>toe.commerce.net:2111</Registry>
  </Head>
  <Body>
    <Services>
      <Passenger_Flight_Information>
        <Flight_Number>UA #200</Flight_Number>
        <Flight_Price_US>$168.50</Flight_Price_US>
        <Flight_Dest>Honolulu, Hawaii</Flight_Dest>
      </Passenger_Flight_Information>
      <Cargo_Flight_Information>
      </Cargo_Flight_Information>
    </Services>
  </Body>
</Taxonomy>
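
As a rough illustration of how software could use such a description, the sketch below searches a CTD for passenger flights to a given destination. The function `flights_to` is hypothetical, and the price element is spelled `Flight_Price_US` here as an assumption, because an XML element name cannot contain a space.

```python
import xml.etree.ElementTree as ET

# Part of the taxonomy from this slide, trimmed to what the query needs.
CTD_XML = """\
<Taxonomy>
  <Head>
    <Label>United Airlines</Label>
  </Head>
  <Body>
    <Services>
      <Passenger_Flight_Information>
        <Flight_Number>UA #200</Flight_Number>
        <Flight_Price_US>$168.50</Flight_Price_US>
        <Flight_Dest>Honolulu, Hawaii</Flight_Dest>
      </Passenger_Flight_Information>
    </Services>
  </Body>
</Taxonomy>
"""


def flights_to(ctd_text, destination):
    """Return (carrier, flight number, price) for flights to a destination."""
    taxonomy = ET.fromstring(ctd_text)
    carrier = taxonomy.findtext("Head/Label")
    matches = []
    for flight in taxonomy.iter("Passenger_Flight_Information"):
        if destination in (flight.findtext("Flight_Dest") or ""):
            price = float(flight.findtext("Flight_Price_US").lstrip("$"))
            matches.append((carrier, flight.findtext("Flight_Number"), price))
    return matches


print(flights_to(CTD_XML, "Honolulu"))
```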

Step 1: XML Metadata

[Diagram: an FTP server, three Web servers with documents, two databases, and two applications, each now described by a CTD.]

Step 2: Registries

[Diagram: an FTP server, three Web servers with documents, two databases, and two applications, each described by a CTD, plus four registries.]

Common Business Language (CBL)

•Who am I?
  - Company name, contact, public key certificates
•What am I?
  - Agent/object (API), document (DTD), database (schema)
•Available data
  - Product list, price list, terms and conditions, catalog, order form
•Available services
  - Buy, sell, RFQ, search catalog

Step 3: CBL Components

[Diagram: an FTP server, three Web servers with documents, two databases, and two applications, each described by a CTD, plus four registries.]

Step 4: Agents

[Diagram: an FTP server, three Web servers with documents, two databases, and two applications, each described by a CTD, plus four registries and three agents.]

Step 5: Business Services

[Diagram: an FTP server, three Web servers with documents, two databases, and two applications, each described by a CTD, plus four registries, three agents, Trust Intermediaries, and Matchmaking Services.]

Wrapping Up

•HTML will continue to exist, but most serious publishers will produce HTML and XML versions of their content from the same “smarter” source

•XML unifies document and database perspectives and tools for Web publishing and lets them be automated in the same way

Prologue: XML and EDI

•XML appeals to the EDI community because:

  - it reinforces the move to Internet EDI
  - it suggests a way to make transaction sets easier to define and “self-describing”
•But which kind of XML/EDI?
  - incremental strategy of wrapping existing EDI transactions in XML syntax
  - radical re-thinking of EDI to create XML “fragments” for transaction components that are dynamically combined as needed
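
The incremental strategy can be sketched as a mechanical wrapper. The message below is made up for illustration and uses `*` between fields and `~` after each segment; real EDI standards define their own delimiters and segment dictionaries, and the element names `transaction`, `segment`, and `element` are hypothetical.

```python
import xml.etree.ElementTree as ET


def wrap_edi(message, segment_end="~", field_sep="*"):
    """Wrap a delimited EDI-style message in XML without reinterpreting it."""
    root = ET.Element("transaction")
    for raw in message.split(segment_end):
        raw = raw.strip()
        if not raw:
            continue  # skip the empty piece after the final terminator
        segment_id, *fields = raw.split(field_sep)
        segment = ET.SubElement(root, "segment", id=segment_id)
        for position, value in enumerate(fields, start=1):
            field = ET.SubElement(segment, "element", pos=str(position))
            field.text = value
    return ET.tostring(root, encoding="unicode")


# Made-up two-segment message: an order header and one order line.
wrapped = wrap_edi("HDR*PO123*19970801~LIN*1*10*EA~")
print(wrapped)
```

The wrapper keeps positional fields as they are, which is why it is the incremental option: the result is XML syntax, but it is no more “self-describing” than the original until the positions are replaced by named components.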

Learning More

•The “mother of all information” about XML is the “SGML Home Page” - www.sil.org/sgml/xml.html

•Best overall book for managers to get started with SGML and XML is ABCD…SGML by Liora Alschuler

•Best overall book for HTML-savvy types is SGML on the Web by Yuri Rubinsky & Murray Maloney