On-the-fly Validation of XML Markup Languages using off-the-shelf Tools
Mikko Saesmaa
Pekka Kilpeläinen
Dept of Computer Science
University of Kuopio, Finland
EML' 07 Montreal On-the-fly Validation of XML 2
The Talk in a Nutshell
How to support continuous validation in an XML editor...
– easily
– efficiently
– against different schema languages?
» DTD, XML Schema, Relax NG; ..., NVDL for compound docs
Lesson: straightforward application of Java-XML APIs is effective, and quite efficient, too
Intended Audience
Suitable for Java/XML developers
– emphasis on practical use of technology
EXTREME enough?
– Nothing extraordinary; some ideas non-obvious
Why Java?
– built-in XML and GUI facilities (JAXP + Swing)
– general and portable
Look & Feel of ”Xeditor”
Editing modes labelled on the slide:
- off
- WF check, as XML or DTD
- validate using DTD, or against schema
An Experimental Editor
Not production quality (yet)
– UI unfinished
– useful editor functionality missing
– slightly unstable
Purpose to experiment with XML technology
– examine, learn, and explain
Background and Related Work
Travis, <TAG> 1999: “Real-time XML Editor”
– > Architag XRay2
Oxygen, XMLMind, XMLSpy, ...
– internal solutions of commercial editors?
Clark’s nXML (on GNU Emacs)
– ~17,000 lines of Emacs Lisp (vs. ~1000 of ours)
DB research on incremental XML validation (Barbosa et al. 2004 & 2006, Balmin et al. 2004)
Architectural solutions of Xeditor
Separation of Concerns: Editor (Swing) ignorant of XML, XMLHandler (JAXP) ignorant of editing
After each edit, the doc is re-parsed completely
Pro: modularity and simplicity
Cons:
– limited functionality (e.g. markup completion and syntax highlighting)
– some inefficiency, but not too bad
Basic Technicalities
Event-driven document checking/validation
– modifications caught by a Swing DocumentListener
– errors caught as SAX parse exceptions
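The event-driven checking described above can be sketched roughly as follows. This is a minimal illustration, not the Xeditor source: the class name RecheckOnEdit and the helper messageAfterTyping are hypothetical, and a PlainDocument stands in for the editor's text component so that the sketch runs without a GUI.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.swing.event.DocumentEvent;
import javax.swing.event.DocumentListener;
import javax.swing.text.BadLocationException;
import javax.swing.text.Document;
import javax.swing.text.PlainDocument;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;
import org.xml.sax.helpers.DefaultHandler;

/** Hypothetical listener: re-parses the whole text after every edit. */
public class RecheckOnEdit implements DocumentListener {
    private final SAXParserFactory factory = SAXParserFactory.newInstance();
    private String lastMessage = "";   // empty string means "no error found"

    @Override public void insertUpdate(DocumentEvent e) { recheck(e.getDocument()); }
    @Override public void removeUpdate(DocumentEvent e) { recheck(e.getDocument()); }
    @Override public void changedUpdate(DocumentEvent e) { /* attribute changes only */ }

    private void recheck(Document doc) {
        try {
            String text = doc.getText(0, doc.getLength());
            // DefaultHandler ignores content events and rethrows fatal errors.
            factory.newSAXParser().parse(new InputSource(new StringReader(text)),
                                         new DefaultHandler());
            lastMessage = "";
        } catch (SAXParseException ex) {
            lastMessage = "line " + ex.getLineNumber() + ", column "
                    + ex.getColumnNumber() + ": " + ex.getMessage();
        } catch (SAXException | IOException | ParserConfigurationException
                 | BadLocationException ex) {
            lastMessage = ex.toString();
        }
    }

    public String getLastMessage() { return lastMessage; }

    /** Simulates typing the given text into an empty document. */
    public static String messageAfterTyping(String text) {
        PlainDocument doc = new PlainDocument();
        RecheckOnEdit checker = new RecheckOnEdit();
        doc.addDocumentListener(checker);
        try {
            doc.insertString(0, text, null);
        } catch (BadLocationException ex) {
            throw new IllegalStateException(ex);
        }
        return checker.getLastMessage();
    }

    public static void main(String[] args) {
        System.out.println("[" + messageAfterTyping("<a><b/></a>") + "]");
        System.out.println("[" + messageAfterTyping("<a><b></a>") + "]");
    }
}
```

In a real editor the listener would be attached to the text component's Document and the message shown in a status area; the exact wording of the message depends on the parser implementation.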
Error reporting in Xeditor
Basic Technicalities (2)
JAXP is based on a Factory Design Pattern
– Initialization of a Factory selects appropriate implementation of a parser, schema language, XSLT transformer, ... (Implementation-independence, pluggability)
– > extensibility by new schema languages
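As a small illustration of the factory lookup, the sketch below asks JAXP which implementations it selects at run time. The class name FactoryLookup and the method supports are made up for this example. Whether a Relax NG implementation is found depends on what is on the classpath; a plain JDK typically reports none.

```java
import javax.xml.XMLConstants;
import javax.xml.parsers.SAXParserFactory;
import javax.xml.transform.TransformerFactory;
import javax.xml.validation.SchemaFactory;

public class FactoryLookup {
    /** True if some SchemaFactory implementation for the language can be located. */
    public static boolean supports(String schemaLanguageUri) {
        try {
            SchemaFactory.newInstance(schemaLanguageUri);
            return true;
        } catch (IllegalArgumentException ex) {
            return false;   // no implementation found for this schema language
        }
    }

    public static void main(String[] args) {
        // Each newInstance() call looks up a concrete implementation at run time.
        System.out.println(SAXParserFactory.newInstance().getClass().getName());
        System.out.println(TransformerFactory.newInstance().getClass().getName());
        System.out.println("XML Schema: " + supports(XMLConstants.W3C_XML_SCHEMA_NS_URI));
        // Relax NG usually needs an additional library on the classpath.
        System.out.println("Relax NG:   " + supports(XMLConstants.RELAXNG_NS_URI));
    }
}
```

Adding a new schema language then amounts to supplying a SchemaFactory implementation for its namespace URI, with no change to the calling code.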
Basic Technicalities (3)
Well-formedness checking and DTD validation based on JAXP SAX interfaces: SAXParserFactory -> SAXParser
– Validating/non-validating parser selected based on editing mode
– SAX instead of DOM relevant for efficiency
– Only errors are observed; other events ignored
Schema-based validation using JAXP Validators (later)
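The mode-dependent parser selection might look like the sketch below. The names ModeCheck, ErrorsOnly and firstError are assumptions for illustration. Note that validity errors are recoverable in SAX, so the handler has to rethrow them explicitly, whereas fatal (well-formedness) errors are already rethrown by DefaultHandler.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;
import org.xml.sax.helpers.DefaultHandler;

public class ModeCheck {
    /** Ignores all content events and turns every error into an exception. */
    static final class ErrorsOnly extends DefaultHandler {
        @Override
        public void error(SAXParseException e) throws SAXException {
            throw e;   // validity errors are recoverable by default; stop at the first
        }
        // fatalError() already rethrows in DefaultHandler; warnings stay ignored.
    }

    /** Returns null if no error was found, otherwise the first error message. */
    public static String firstError(String xml, boolean validateWithDtd) {
        try {
            SAXParserFactory factory = SAXParserFactory.newInstance();
            factory.setValidating(validateWithDtd);   // chosen from the editing mode
            factory.newSAXParser().parse(new InputSource(new StringReader(xml)),
                                         new ErrorsOnly());
            return null;
        } catch (SAXException | IOException | ParserConfigurationException ex) {
            return String.valueOf(ex.getMessage());
        }
    }

    public static void main(String[] args) {
        String doc = "<!DOCTYPE a [<!ELEMENT a EMPTY>]><a><b/></a>";
        System.out.println("WF check: " + firstError(doc, false));
        System.out.println("DTD validation: " + firstError(doc, true));
    }
}
```

The sample document is well-formed but violates its own DTD, so only the validating mode reports an error.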
Efficiency Concerns
Rule of thumb: access via memory much faster than via disk, which again much faster than over network
Document passed to parser/validator as an in-memory stream
– > normally no observable delays
– Could cache external entities, like DTDs, too
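One way the caching of external entities could be realized is with a SAX EntityResolver that keeps fetched DTDs in memory. The sketch below is an assumption about how such a cache might look, not a description of Xeditor; the class CachingResolver, its preload method, and the example.invalid URL are all hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.StringReader;
import java.net.URI;
import java.nio.charset.StandardCharsets;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.EntityResolver;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

/** Hypothetical resolver that fetches each external entity at most once. */
public class CachingResolver implements EntityResolver {
    private final Map<String, byte[]> cache = new ConcurrentHashMap<>();

    /** Stores content for a system identifier without any disk or network access. */
    public void preload(String systemId, String content) {
        cache.put(systemId, content.getBytes(StandardCharsets.UTF_8));
    }

    @Override
    public InputSource resolveEntity(String publicId, String systemId) throws IOException {
        byte[] bytes = cache.get(systemId);
        if (bytes == null) {   // first request: fetch and remember
            try (InputStream in = URI.create(systemId).toURL().openStream()) {
                bytes = in.readAllBytes();
            }
            cache.put(systemId, bytes);
        }
        InputSource source = new InputSource(new ByteArrayInputStream(bytes));
        source.setSystemId(systemId);
        return source;
    }

    /** Validates in-memory text against its DTD, resolving entities via the cache. */
    public boolean isValid(String xml) {
        try {
            SAXParserFactory factory = SAXParserFactory.newInstance();
            factory.setValidating(true);
            XMLReader reader = factory.newSAXParser().getXMLReader();
            reader.setEntityResolver(this);
            reader.setErrorHandler(new DefaultHandler() {
                @Override
                public void error(SAXParseException e) throws SAXException { throw e; }
            });
            reader.parse(new InputSource(new StringReader(xml)));
            return true;
        } catch (SAXException | IOException | ParserConfigurationException ex) {
            return false;
        }
    }

    public static void main(String[] args) {
        CachingResolver resolver = new CachingResolver();
        resolver.preload("http://example.invalid/a.dtd", "<!ELEMENT a EMPTY>");
        String head = "<!DOCTYPE a SYSTEM \"http://example.invalid/a.dtd\">";
        System.out.println(resolver.isValid(head + "<a/>"));
        System.out.println(resolver.isValid(head + "<a>text</a>"));
    }
}
```

The document text itself is handed to the parser through a StringReader, so after the first fetch of a DTD each re-validation touches neither disk nor network.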
Different Schemas and Schema Languages
A Validator created when the user selects Schema
Validation against the Schema
As already shown:
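The code on the original slides is not preserved in this transcript. As a stand-in, here is a minimal sketch of creating a Validator when a schema is selected and applying it to the in-memory text after each edit. The class SchemaCheck and the rule of picking the schema language from the file extension are assumptions made for illustration.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.SchemaFactory;
import javax.xml.validation.Validator;
import org.xml.sax.SAXException;

public class SchemaCheck {
    private final Validator validator;

    /** Hypothetical rule: choose the schema language from the file name extension. */
    static String languageFor(String fileName) {
        if (fileName.endsWith(".xsd")) return XMLConstants.W3C_XML_SCHEMA_NS_URI;
        if (fileName.endsWith(".rng")) return XMLConstants.RELAXNG_NS_URI;
        throw new IllegalArgumentException("Unknown schema type: " + fileName);
    }

    /** Called once, when the user selects a schema: compiles it into a Validator. */
    public SchemaCheck(String fileName, String schemaText) throws SAXException {
        SchemaFactory factory = SchemaFactory.newInstance(languageFor(fileName));
        validator = factory.newSchema(new StreamSource(new StringReader(schemaText)))
                           .newValidator();
    }

    /** Called after each edit; returns null if valid, else the first error message. */
    public String firstError(String documentText) {
        try {
            // With no ErrorHandler set, the first error is thrown as an exception.
            validator.validate(new StreamSource(new StringReader(documentText)));
            return null;
        } catch (SAXException | IOException ex) {
            return String.valueOf(ex.getMessage());
        }
    }

    public static void main(String[] args) throws SAXException {
        String xsd = "<xs:schema xmlns:xs=\"http://www.w3.org/2001/XMLSchema\">"
                + "<xs:element name=\"note\" type=\"xs:string\"/></xs:schema>";
        SchemaCheck check = new SchemaCheck("note.xsd", xsd);
        System.out.println(check.firstError("<note>hi</note>"));
        System.out.println(check.firstError("<memo/>"));
    }
}
```

Selecting a .rng file only works if a SchemaFactory for Relax NG is available at run time; otherwise SchemaFactory.newInstance throws an IllegalArgumentException.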
Editing Schema Documents
Public schema files for schema languages used for validation (as external resource files)
– xsd.rng or XMLSchema.xsd for XML Schema
– relaxng.rng for Relax NG
Pro: uniformity; efficiency
Con: imprecision (vs validating by appropriate schema compiler)
Editing DTDs
How to check the non-XML syntax of DTDs easily?
Editing DTDs (2)
Soln 1: Wrap the DTD in the prolog of a dummy doc
– how to check stand-alone DTDs (“external subsets”)?
Soln 2: Parse a fixed dummy doc that refers to the DTD:
<!DOCTYPE foo PUBLIC "Xeditor DTD" "IN-MEMORY" [<!ELEMENT foo EMPTY> ]> <foo/>
– randomize ”foo”, to avoid conflict with actual DTD
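Solution 2 can be sketched with an EntityResolver that serves the editor's DTD text whenever the parser requests the dummy document's external subset. This is an illustrative reconstruction under assumptions, not the original code: DtdCheck and firstError are hypothetical names, and the sketch relies on the parser loading the external subset even in non-validating mode, which is the default behaviour of the JDK's built-in parser.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.UUID;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

public class DtdCheck {
    private static final String PUBLIC_ID = "Xeditor DTD";

    /** Returns null if the DTD text parses cleanly, otherwise the first error message. */
    public static String firstError(String dtdText) {
        // Random root name, so the dummy declaration cannot clash with the edited DTD.
        String root = "x" + UUID.randomUUID().toString().replace("-", "");
        String dummy = "<!DOCTYPE " + root + " PUBLIC \"" + PUBLIC_ID + "\" \"IN-MEMORY\" "
                + "[<!ELEMENT " + root + " EMPTY>]><" + root + "/>";
        try {
            XMLReader reader = SAXParserFactory.newInstance().newSAXParser().getXMLReader();
            // Serve the editor's text when the parser asks for the external subset;
            // returning null leaves any other entity to default resolution.
            reader.setEntityResolver((publicId, systemId) ->
                    PUBLIC_ID.equals(publicId)
                            ? new InputSource(new StringReader(dtdText))
                            : null);
            reader.setErrorHandler(new DefaultHandler());   // rethrows fatal errors
            reader.parse(new InputSource(new StringReader(dummy)));
            return null;
        } catch (SAXException | IOException | ParserConfigurationException ex) {
            return String.valueOf(ex.getMessage());
        }
    }

    public static void main(String[] args) {
        System.out.println(firstError("<!ELEMENT a (b, c)>\n<!ELEMENT b EMPTY>\n<!ELEMENT c EMPTY>"));
        System.out.println(firstError("<!ELEMENT a (b, c>"));
    }
}
```

Because the parser here is non-validating, only syntax errors in the DTD are reported; constraints that count as validity errors would need a validating parser and an error handler that rethrows them.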
Editing DTDs (3)
Efficiency and Scalability
Is brute-force re-validation too inefficient?
No: Delays normally unnoticeable
Figure: times for validating XMLSchema.xsd
Validating a Larger Instance
An XML Schema of ~ 18,000 lines/800 KB
appr. speed / sec:
– 28,000 lines, 1.2 MB
– 52,000 lines, 2.3 MB
– 180,000 lines, 8.1 MB
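Throughput figures of this kind can be obtained with a simple timing loop. The sketch below is only an assumption about how one might measure; it uses a synthetic document and a plain well-formedness parse rather than the schema and instance used in the talk, so its numbers are not comparable with the ones above. ParseTiming and its methods are hypothetical names.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

public class ParseTiming {
    /** Builds a synthetic document with the given number of element lines. */
    static String syntheticDocument(int lines) {
        StringBuilder sb = new StringBuilder("<doc>\n");
        for (int i = 0; i < lines; i++) {
            sb.append("  <line n=\"").append(i).append("\">some text</line>\n");
        }
        return sb.append("</doc>\n").toString();
    }

    /** Parses the text once from memory and returns the elapsed time in nanoseconds. */
    static long parseNanos(String xml) throws Exception {
        long start = System.nanoTime();
        SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), new DefaultHandler());
        return System.nanoTime() - start;
    }

    public static void main(String[] args) throws Exception {
        parseNanos(syntheticDocument(10_000));   // warm-up run, result discarded
        for (int lines : new int[] {1_000, 10_000, 100_000}) {
            long nanos = parseNanos(syntheticDocument(lines));
            System.out.printf("%d lines: %.1f ms, about %.0f lines/sec%n",
                    lines, nanos / 1e6, lines * 1e9 / nanos);
        }
    }
}
```

Single runs like this are noisy; repeating each measurement and taking the median gives steadier figures.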
Conclusions
Immediate validation against schemas in different languages (DTD, XML Schema, Relax NG) easy to support
Efficiency of brute-force application sufficient for moderate-sized documents
Further Work
Potential application: teaching of XML
Practical enhancements
– markup completion, highlighting etc.
Incremental validation
– to remove dependency on document length
– How to apply/modify standard interfaces?
Thank you! Questions? Comments?