Schematron and Other Useful Tools

Upload: stuart-myles

Post on 11-Jul-2015

Page 2: Schematron and Other Useful Tools

Schematron (and other useful tools)

Stuart Myles

Page 4: Schematron and Other Useful Tools

An Aside: AP’s Ingestion Pipeline

This is greatly simplified, obviously.

ATOM + XHTML -> XSLT Transform -> APPL + NITF

One way we ingest content: we transform ATOM and XHTML into our internal XML (APPL) and NITF

Page 5: Schematron and Other Useful Tools

Converting from HTML to XML

<p>The budget was just &pound;100.</p>

<p>How could it be done for so little money?

<p>Luckily open source tools were available.</p>

These are not new problems.</p>

The solutions were even standardized.<p/>
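
A strict XML parser rejects the sample above as written. The sketch below assumes Python's standard-library `xml.etree.ElementTree` and a wrapping `<div>`, added only so the fragment has a single root. It shows one reason for the rejection: `&pound;` is an HTML entity that plain XML does not declare. The helper name `is_well_formed` is illustrative.

```python
import xml.etree.ElementTree as ET

# The slide's sample, wrapped in a <div> (an assumption) to give it one root.
SLOPPY_HTML = """<div>
<p>The budget was just &pound;100.</p>
<p>How could it be done for so little money?
<p>Luckily open source tools were available.</p>
These are not new problems.</p>
The solutions were even standardized.<p/>
</div>"""


def is_well_formed(markup: str) -> bool:
    """Return True if markup parses as XML, False otherwise."""
    try:
        ET.fromstring(markup)
    except ET.ParseError:
        return False
    return True


# Swapping the HTML entity for a numeric character reference is enough
# to make the sample parse, although the paragraphs are still nested oddly.
patched = SLOPPY_HTML.replace("&pound;", "&#163;")

if __name__ == "__main__":
    print(is_well_formed(SLOPPY_HTML))
    print(is_well_formed(patched))
```

The patched text parses only because the stray tags happen to balance: the second paragraph swallows the third, and two sentences end up outside the paragraph they belong to. Well-formed is not the same as correct.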

Page 6: Schematron and Other Useful Tools

Hard to enforce rules in the spec

“HeadLine - this element must contain the same value as the entry’s <title> element”

“summary is required for non-text content items, such as news photos and video. This element is optional for text story content items.”

XML structure complies with XSD…

…but can fail in downstream systems
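
Rules like the two quoted above relate one part of a document to another, which a grammar-based schema struggles to express. As a rough illustration in Python, the sketch below checks both rules on a single Atom entry. The `urn:example:ap` namespace, the `HeadLine` and `MediaType` child elements, the choice to treat a missing `MediaType` as text, and the function name `check_entry` are all hypothetical stand-ins, not AP's actual markup.

```python
import xml.etree.ElementTree as ET

ATOM = "{http://www.w3.org/2005/Atom}"
AP = "{urn:example:ap}"  # hypothetical namespace, for illustration only


def check_entry(entry: ET.Element) -> list:
    """Return the business-rule violations found in one Atom entry."""
    errors = []

    # Rule 1: HeadLine must carry the same value as the entry's title.
    title = (entry.findtext(ATOM + "title") or "").strip()
    headline = (entry.findtext(AP + "HeadLine") or "").strip()
    if headline != title:
        errors.append("HeadLine must match the entry's title")

    # Rule 2: summary is required unless the item is a text story.
    # Assumption: an entry without MediaType is treated as text.
    media_type = (entry.findtext(AP + "MediaType") or "Text").strip()
    if media_type != "Text" and entry.find(ATOM + "summary") is None:
        errors.append("summary is required for non-text content items")

    return errors


VIDEO_ENTRY = """<entry xmlns="http://www.w3.org/2005/Atom"
       xmlns:ap="urn:example:ap">
  <title>Budget vote</title>
  <ap:HeadLine>Budget vote passes</ap:HeadLine>
  <ap:MediaType>Video</ap:MediaType>
</entry>"""

if __name__ == "__main__":
    for message in check_entry(ET.fromstring(VIDEO_ENTRY)):
        print(message)
```

Both checks pass structural validation trivially, since every element involved is individually legal; only the comparison between elements exposes the problem.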

Page 8: Schematron and Other Useful Tools

Validate and Fix Prior to Ingestion

Original ATOM + XHTML

Tidy fixes sloppy HTML

Custom XSLT tidies up XML

W3C schema validates structure & syntax

Schematron schema validates business rules

Valid ATOM + XHTML, ready for ingestion
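
The ordering matters: each stage assumes the one before it succeeded. A minimal Python sketch of that fail-fast chaining follows. The stage functions are toy stand-ins (the real stages would call Tidy, an XSLT processor, an XSD validator, and a Schematron validator), and every name here is hypothetical.

```python
import xml.etree.ElementTree as ET
from typing import Callable, List, Tuple

Stage = Tuple[str, Callable[[str], str]]


def fix_entities(doc: str) -> str:
    """Toy stand-in for a cleanup stage: replace one HTML-only entity."""
    return doc.replace("&pound;", "&#163;")


def check_well_formed(doc: str) -> str:
    """Toy stand-in for structural validation: the document must parse."""
    ET.fromstring(doc)  # raises ET.ParseError on failure
    return doc


def check_has_title(doc: str) -> str:
    """Toy stand-in for a business rule: a non-empty <title> must exist."""
    title = ET.fromstring(doc).findtext("title")
    if not (title or "").strip():
        raise ValueError("title is required")
    return doc


def run_pipeline(doc: str, stages: List[Stage]) -> Tuple[bool, str]:
    """Run stages in order; stop at the first failure and report it."""
    for name, stage in stages:
        try:
            doc = stage(doc)
        except (ET.ParseError, ValueError) as exc:
            return False, f"{name}: {exc}"
    return True, doc


STAGES: List[Stage] = [
    ("cleanup", fix_entities),
    ("structure", check_well_formed),
    ("business rules", check_has_title),
]

if __name__ == "__main__":
    print(run_pipeline("<entry><title>Cost: &pound;100</title></entry>", STAGES))
    print(run_pipeline("<entry><title></title></entry>", STAGES))
```

Reporting the name of the failing stage tells a feed provider whether the problem is sloppy markup, a structural error, or a broken business rule.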

Page 9: Schematron and Other Useful Tools

HTML Tidy

Fix sloppy HTML

HTML -> XHTML

Page 10: Schematron and Other Useful Tools

Schematron

Fact checker for XML documents

Business rules that can’t be expressed in W3C XSD schema

• MediaType="Video"

• Format="ANPA1312"

Previously, we had to inspect new feeds to catch errors

The risk is that feeds are approved but errors appear later

(Not to mention manual checking of XML is tedious)

Page 11: Schematron and Other Useful Tools

Schematron

Small, powerful, lightweight fact-checker for XML documents

Schematron Schema

Specify constraints using XPath rules

You write the error messages

Presence or absence of specific content

Relationships between elements and attributes

Validate

One-time compile into an XSLT

Validation as an XSLT transform

Reports

Validation reports

Page 12: Schematron and Other Useful Tools

Anatomy of a Schematron Rule

<sch:rule context="atom:feed/atom:link">

<sch:assert test="starts-with(@href, 'http://')">

The feed/link/@href must contain an http url

</sch:assert>

</sch:rule>

Establish the context of the rule with an XPath expression

XSLT-style test establishes the constraint for each assert

You write the error message to be used if the assert fails
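
To make the three parts concrete, the Python sketch below evaluates the same rule by hand with the standard library: it selects the context nodes, applies the test to each, and collects the author-written message for every failure. It only illustrates the semantics and is not how a Schematron processor is implemented. The names `check_link_rule` and `FEED` are hypothetical.

```python
import xml.etree.ElementTree as ET

NS = {"atom": "http://www.w3.org/2005/Atom"}
MESSAGE = "The feed/link/@href must contain an http url"


def check_link_rule(feed: ET.Element) -> list:
    """Apply the slide's rule to a parsed atom:feed root element."""
    failures = []
    # Context: atom:feed/atom:link (the link children of the feed root).
    for link in feed.findall("atom:link", NS):
        # Test: starts-with(@href, 'http://'); a missing href counts as ''.
        if not link.get("href", "").startswith("http://"):
            failures.append(MESSAGE)
    return failures


FEED = """<feed xmlns="http://www.w3.org/2005/Atom">
  <link href="http://example.com/feed"/>
  <link href="ftp://example.com/feed"/>
  <entry><link href="mailto:someone@example.com"/></entry>
</feed>"""

if __name__ == "__main__":
    print(check_link_rule(ET.fromstring(FEED)))
```

Only the `ftp://` link fails here: the link inside `entry` is outside the rule's context, so the test never runs on it. Note that the test as written would also reject `https://` links.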

Page 13: Schematron and Other Useful Tools

DSDL – Pipeline Validation

Grammar: XSD, RELAX NG

Rules: Schematron

Namespace dispatch: NVDL

Datatype: DTLL

Character repertoire: CREPDL

Document Semantics Renaming: DSRL

Still under development

Page 14: Schematron and Other Useful Tools

XProc: declaratively specify a pipeline (using XML, naturally)

Similar in concept to

Yahoo! Pipes

BizTalk

But XML specific and a W3C standard
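
The idea of describing a pipeline as data rather than code can be sketched in a few lines of Python. The XML vocabulary below (`pipeline`, `step`, `name`) and the step registry are invented for illustration; they are not the syntax of the W3C pipeline language.

```python
import xml.etree.ElementTree as ET

# Hypothetical pipeline description: the order of <step> elements is the
# order of execution.
PIPELINE_XML = """<pipeline>
  <step name="strip"/>
  <step name="upper"/>
  <step name="exclaim"/>
</pipeline>"""

# Registry mapping step names to plain functions (all illustrative).
STEPS = {
    "strip": str.strip,
    "upper": str.upper,
    "exclaim": lambda text: text + "!",
}


def run(pipeline_xml: str, text: str) -> str:
    """Feed text through each step named in the pipeline description."""
    for step in ET.fromstring(pipeline_xml).findall("step"):
        text = STEPS[step.get("name")](text)
    return text


if __name__ == "__main__":
    print(run(PIPELINE_XML, "  valid feed  "))
```

Reordering or removing a `<step>` changes the behaviour without touching the code, which is the appeal of the declarative approach.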

Page 15: Schematron and Other Useful Tools

Thanks!