XML for Information Management – Day 2

University of Erlangen-Nuremberg
Computational Linguistics

Instructor: Professor Airi Salminen
http://users.jyu.fi/~airi/

12.1.-16.1.2009



Day 2: Background of XML

Outline

1. Markup languages
2. Structured documents
3. World Wide Web Consortium

1. Markup languages

Markup

• intended for human readers
• intended for computers

Markup for human readers

to clarify the written expression

• punctuational
• presentational

Without punctuational markup:

Texthasalwaysincludedsomekindofmarkupalsobeforethetimeofcomputers

With punctuational markup:

Text has always included some kind of markup, also before the time of computers.

Markup for computers

to provide information for a software module

• presentational
• procedural
• descriptive

In markup languages there is a clear separation of markup and primary content. Markup is metadata, adding some information to the primary data.

Presentational markup

information about the way the software module should present the primary content to the human perceiver

The markup in an HTML file:

In <i>markup languages</i> there is clear separation of <i>markup</i> and <i>primary content</i>. Markup is <i>metadata</i>, adding some information to the primary data.

The tags <i> and </i> represent presentational markup in HTML.
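The separation of markup and primary content can be made visible with a few lines of Python. This is only a sketch, assuming the standard-library `html.parser` module; the class name `MarkupSplitter` is hypothetical and the input is a shortened version of the sentence on the slide.

```python
from html.parser import HTMLParser


class MarkupSplitter(HTMLParser):
    """Collects tags (markup) and text (primary content) separately."""

    def __init__(self):
        super().__init__()
        self.tags = []
        self.text_parts = []

    def handle_starttag(self, tag, attrs):
        self.tags.append(f"<{tag}>")

    def handle_endtag(self, tag):
        self.tags.append(f"</{tag}>")

    def handle_data(self, data):
        self.text_parts.append(data)


source = (
    "In <i>markup languages</i> there is clear separation of "
    "<i>markup</i> and <i>primary content</i>."
)

splitter = MarkupSplitter()
splitter.feed(source)
splitter.close()

# The primary content is what remains when the markup is set aside.
primary_content = "".join(splitter.text_parts)
print(splitter.tags)
print(primary_content)
```

The tag list holds the presentational markup; the joined text is the primary content that a browser would show in italics where the tags apply.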

Procedural markup

a processing instruction for the software module

The markup in an XML file:

<![CDATA[<element>Example of an XML element</element>]]>

The strings <![CDATA[ and ]]> represent procedural markup in XML.

• <![CDATA[ instructs the XML processor to regard all text before ]]> as character data
• ]]> instructs the XML processor to continue normal identification of markup
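The effect of a CDATA section can be checked with an XML parser. The sketch below assumes Python's standard `xml.etree.ElementTree` module; the wrapper element `<example>` is hypothetical and is only there because a CDATA section has to appear inside an element.

```python
import xml.etree.ElementTree as ET

# <example> is a hypothetical wrapper element, not part of the slide.
with_cdata = (
    "<example><![CDATA[<element>Example of an XML element</element>]]></example>"
)
without_cdata = "<example><element>Example of an XML element</element></example>"

root_cdata = ET.fromstring(with_cdata)
root_plain = ET.fromstring(without_cdata)

# Inside CDATA the tags are character data, so there is no child element.
print(len(root_cdata), repr(root_cdata.text))

# Without CDATA the same characters are recognized as markup.
print(len(root_plain), root_plain[0].tag)
```

With the CDATA section the parser reports no child elements and returns the tags as text; without it, the same characters produce a child element named `element`.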

Declarative markup

describes the content of a piece of primary content, what it is, or declares that the piece is a member of a particular class

The markup in an XML file:

<student>
  <first_name>Steve</first_name>
  <last_name>Chung</last_name>
  <email>[email protected]</email>
</student>

XML is primarily for declarative markup.
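Because declarative markup names what each piece of content is, software can pick out content by name. A minimal sketch, assuming Python's standard `xml.etree.ElementTree` module; the `<email>` element is left out here because its content is not legible in the transcript.

```python
import xml.etree.ElementTree as ET

student_xml = (
    "<student>"
    "<first_name>Steve</first_name>"
    "<last_name>Chung</last_name>"
    "</student>"
)

student = ET.fromstring(student_xml)

# Element names act as field names; element text is the primary content.
record = {child.tag: child.text for child in student}
print(student.tag, record)
```

The element names say nothing about presentation or processing; an application decides what to do with a `first_name` or a `last_name`.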

Markup in XML

‣ All markup delivers information to the XML processor. The DTD represents metamarkup, facilitating the definition of the markup vocabulary.
‣ Markup in an XML document is usually classified with respect to the application.
‣ Processing instructions represent procedural markup.
‣ Element tags represent declarative markup.
‣ In the specification of an XML application, different kinds of meanings can be given to element names: they can be processing instructions to the application, or instructions about the way the content should be presented by the application.
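The difference between processing instructions and element tags is also visible in the parsed document. The sketch below assumes Python's standard `xml.dom.minidom` module and uses an `xml-stylesheet` processing instruction as the example; the small document itself is made up for illustration.

```python
from xml.dom import Node, minidom

document = minidom.parseString(
    '<?xml-stylesheet type="text/css" href="style.css"?>'
    "<student><first_name>Steve</first_name></student>"
)

# Classify the top-level nodes of the document by their kind of markup.
kinds = []
for node in document.childNodes:
    if node.nodeType == Node.PROCESSING_INSTRUCTION_NODE:
        kinds.append(("processing instruction", node.target))
    elif node.nodeType == Node.ELEMENT_NODE:
        kinds.append(("element", node.tagName))

print(kinds)
```

The processing instruction is addressed to an application (here, one that applies style sheets), while the element only declares what its content is.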

Example of HTML markup

<html>
<head><title>University of Jyv&auml;skyl&auml; </title></head>
<body>
<h2>Faculties</h2>
<ul>
<li>Humanities
<li>Information Technology
<li>Social Sciences
</ul>
<br>
<address>[email protected]</address>
</body>
</html>

The element markup describes the structure for WWW publishing.

<university>
  <name>University of Jyväskylä</name>
  <faculties>Faculties
    <faculty>Humanities</faculty>
    <faculty>Information Technology</faculty>
    <faculty>Social Sciences</faculty>
  </faculties>
  <contact_email>[email protected]</contact_email>
</university>

The same primary content with markup describing the content of elements by means of XML markup.
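One practical difference between the two examples is that the HTML list relies on omitted end tags, which an XML processor does not accept. The sketch below assumes Python's standard `xml.etree.ElementTree` module and uses only the list part of each example; the helper name `is_well_formed` is hypothetical.

```python
import xml.etree.ElementTree as ET

html_fragment = "<ul><li>Humanities<li>Information Technology<li>Social Sciences</ul>"
xml_fragment = (
    "<faculties>Faculties"
    "<faculty>Humanities</faculty>"
    "<faculty>Information Technology</faculty>"
    "<faculty>Social Sciences</faculty>"
    "</faculties>"
)


def is_well_formed(text):
    """Return True if an XML parser accepts the text."""
    try:
        ET.fromstring(text)
    except ET.ParseError:
        return False
    return True


print(is_well_formed(html_fragment), is_well_formed(xml_fragment))

# In the XML version the element name tells what each item is.
faculties = [f.text for f in ET.fromstring(xml_fragment).iter("faculty")]
print(faculties)
```

The HTML fragment is acceptable to a browser but is not well-formed XML; the XML fragment parses and its items can be found by the name `faculty`.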

Logical structure of the HTML document

html
  head
    title
      University of Jyväskylä
  body
    h2
      Faculties
    ul
      li
        Humanities
      li
        Information Technology
      li
        Social Sciences
    br
    address
      [email protected]

Logical structure of the XML document

university
  name
    University of Jyväskylä
  faculties
    Faculties
    faculty
      Humanities
    faculty
      Information Technology
    faculty
      Social Sciences
  contact_email
    [email protected]
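A logical structure like the one above can be derived from the markup by walking the parsed element tree. This is a minimal sketch, assuming Python's standard `xml.etree.ElementTree` module; the function name `outline` is hypothetical, the `contact_email` element is left out, and text that follows a child element (tail text) is ignored for brevity.

```python
import xml.etree.ElementTree as ET

university_xml = (
    "<university>"
    "<name>University of Jyväskylä</name>"
    "<faculties>Faculties"
    "<faculty>Humanities</faculty>"
    "<faculty>Information Technology</faculty>"
    "<faculty>Social Sciences</faculty>"
    "</faculties>"
    "</university>"
)


def outline(element, depth=0):
    """Return indented lines: element names, with quoted text below them."""
    lines = ["  " * depth + element.tag]
    text = (element.text or "").strip()
    if text:
        lines.append("  " * (depth + 1) + f'"{text}"')
    for child in element:
        lines.extend(outline(child, depth + 1))
    return lines


tree_lines = outline(ET.fromstring(university_xml))
print("\n".join(tree_lines))
```

Each level of indentation corresponds to one level of element nesting in the document.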


2. Structured documents

Structured document

‣ structure, content, and external presentation can be separated from each other and processed separately

‣ structural components have names

‣ structural components can be recognized by software modules

‣ possible to define the structure

Structured document

‣ Structure: different languages for defining the structure, e.g., DTD, XML Schema, RELAX NG for XML
‣ Content: an open language standard, e.g. SGML, XML
‣ Layout: different languages for defining the layout, e.g., CSS and XSL for XML

Structured document: example

‣ Structure: DTD.txt
‣ Content: rhymes.txt, rhymes.xml
‣ Layout: style.txt, style.css
‣ Content with the style sheet attached: rhymes with style attachment.xml, rhymes with style attachment.txt


Management of structured documents

‣ document management

‣ management of the data contained in documents


Characteristics in the management of structured documents

‣ Design. Adopting the approach of structured document management in an environment often requires careful planning before the creation of documents. This includes schema design and layout design.
‣ Content production. Content can be produced by different types of software, e.g. by a syntax-directed editor. Validity is checked against the schema.
‣ Evolution. Schema versioning, layout versioning.
‣ Operations. The most typical operation is some kind of transformation.
‣ Software. Many kinds of software systems are used.
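The idea of checking a document against a defined structure can be illustrated with a toy check. This is not DTD or XML Schema validation: the `xml.etree.ElementTree` parser used here does not validate, so the sketch uses a hypothetical content model (`ALLOWED_CHILDREN`) that only lists which child elements each element may contain.

```python
import xml.etree.ElementTree as ET

# Hypothetical content model: element name -> names of permitted children.
ALLOWED_CHILDREN = {
    "university": {"name", "faculties", "contact_email"},
    "faculties": {"faculty"},
    "name": set(),
    "faculty": set(),
    "contact_email": set(),
}


def structure_errors(element):
    """Return a list of messages for elements that break the content model."""
    if element.tag not in ALLOWED_CHILDREN:
        return [f"unknown element <{element.tag}>"]
    errors = []
    for child in element:
        if child.tag not in ALLOWED_CHILDREN[element.tag]:
            errors.append(f"<{child.tag}> not allowed inside <{element.tag}>")
        errors.extend(structure_errors(child))
    return errors


valid = ET.fromstring(
    "<university><name>University of Jyväskylä</name>"
    "<faculties><faculty>Humanities</faculty></faculties></university>"
)
invalid = ET.fromstring("<university><faculty>Humanities</faculty></university>")

print(structure_errors(valid))
print(structure_errors(invalid))
```

A real schema language expresses much more (order, repetition, attributes, data types); the sketch only shows that a defined structure lets software accept or reject a document automatically.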

Traditional document management

- No schema design.
- Processing applied to a document.
- Content, structure, and layout together.

Structured document management

- Schema design important. Layout is also designed.
- Schemas can be utilized in various ways. Semantic information attached to the schemas.
- Processing of document parts.
- Content, structure, and layout can be processed separately.
- Management required for content, schema, and stylesheet items and their different versions.

Database management

- A database is often the information repository of one software system, called a Database Management System (DBMS); data is processed by the operations of the DBMS.
- Design divided into schema design and view design.
- Content produced gradually, by the operations of the DBMS.
- Queries are the most important operations.

Structured document management

- Different software systems used to manipulate data.
- Schema design often related to extensive sectoral standard development. Layout requires design as well.
- Content produced by different kinds of programs, e.g. interactively by structure editors or automatically.
- Transformations are the most important operations.
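A transformation takes a document in one structure and produces a document in another. For XML this is commonly done with a dedicated transformation language; the sketch below uses plain Python with the standard `xml.etree.ElementTree` module instead, and the function name `to_html_list` is hypothetical. It turns the faculty elements of the earlier example into an HTML list.

```python
import xml.etree.ElementTree as ET

source = ET.fromstring(
    "<faculties>Faculties"
    "<faculty>Humanities</faculty>"
    "<faculty>Information Technology</faculty>"
    "<faculty>Social Sciences</faculty>"
    "</faculties>"
)


def to_html_list(faculties):
    """Build a <ul> element with one <li> per <faculty> element."""
    ul = ET.Element("ul")
    for faculty in faculties.iter("faculty"):
        li = ET.SubElement(ul, "li")
        li.text = faculty.text
    return ul


html_list = ET.tostring(to_html_list(source), encoding="unicode")
print(html_list)
```

The content stays the same; only the structure around it changes from content-describing names to names intended for web publishing.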

Database languages

‣ definition languages
‣ query languages

Structured document languages

‣ definition languages
‣ style languages
‣ various manipulation, transformation, and query languages
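As a small illustration of querying a structured document, the sketch below uses the limited path expressions supported by Python's standard `xml.etree.ElementTree` module. It is not a full query language, and the document is the university example with the `contact_email` element left out.

```python
import xml.etree.ElementTree as ET

university = ET.fromstring(
    "<university>"
    "<name>University of Jyväskylä</name>"
    "<faculties>Faculties"
    "<faculty>Humanities</faculty>"
    "<faculty>Information Technology</faculty>"
    "<faculty>Social Sciences</faculty>"
    "</faculties>"
    "</university>"
)

# Path expressions address parts of the document by element name and position.
name = university.findtext("name")
all_faculties = [f.text for f in university.findall("faculties/faculty")]
second_faculty = university.find("faculties/faculty[2]").text
last_faculty = university.find("faculties/faculty[last()]").text

print(name, all_faculties, second_faculty, last_faculty)
```

Unlike a database query, the query here follows the hierarchical structure of the document rather than tables and columns.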


3. World Wide Web Consortium

‣ W3C develops specifications to support the use of the web, publicly available at http://www.w3.org/TR/
‣ Development is systematic
‣ Development process is specified and published

Phases of the development process

‣ Working Draft: represents work in progress.
‣ Candidate Recommendation: has received significant review from its immediate technical community, explicit call for implementation and technical feedback.
‣ Proposed Recommendation: represents consensus in the development group, proposed to the Advisory Committee for review.
‣ Recommendation: represents consensus within W3C, widespread implementation encouraged.

What happens to a W3C Recommendation?

‣ Remains as a Recommendation indefinitely.
‣ W3C rescinds the Recommendation. A report called Rescinded Recommendation is published.
‣ A new version of the Recommendation is developed.
‣ Minor modifications are made. A report called Proposed Edited Recommendation is published.