Validating XML Data with an XML Schema
Date: May 2007...
2
Contents
1. XML Validation Concepts
   a. Concepts
   b. Errors
   c. Resources
2. Example: Validation with XMLSpy
   a. Downloading Spy
   b. Creating a new XMLSpy Project
   c. Associate the homestead XML Schema with a folder
   d. Open the file in XMLSpy
   e. Add the active file to the folder
   f. Click the "Validate" button
3. Example: Manipulating Large XML Data Sets with Ant & Eclipse
   a. Tools for Records and Metadata vs. Tools for Data
   b. Apache Ant – DOS command line
   c. Eclipse – GUI interface
   d. V – The File Viewer – Viewing large files
   e. XML databases
3
Disclaimer
• The information and examples in this document are for demonstration purposes only.
• The information and examples are presented to help counties work with and validate XML datasets against Minnesota Revenue XML schemas.
• The Minnesota Department of Revenue does not endorse or support any products mentioned in this presentation. It is beyond the scope of the mission of the Property Tax Division to support tools within each county.
• Your staff is responsible for ensuring that your tools match your business requirements.
4
XML Validation Concepts

[Diagram: an XML validator checks an XML file against an XML Schema and reports validation errors]

If you have 1) an XML file and 2) a well-defined XML Schema, you can 3) use any standard XML validation program to check that the file is well-formed XML and has all the required tags defined by the schema.
This is called validation.
5
XML Validation Concepts
• XML is a text file where well-defined tags surround each data value.
• An XML Schema describes what tags are needed and where they need to be for a particular file.

Tag example: <Zip_Code>55101</Zip_Code>

<xs:element name="Zip_Code">
  <xs:simpleType>
    <xs:restriction base="xs:string">
      <xs:pattern value="[0-9]{5}"/>
    </xs:restriction>
  </xs:simpleType>
</xs:element>

This fragment from an XML Schema defines a tag for Zip_Code.
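To make the pattern restriction concrete, here is a minimal Python sketch of the same check. Python is not used in this presentation, and the function name is made up for illustration. It only mirrors the `[0-9]{5}` pattern facet; it is not a schema validator.

```python
import re

# XSD patterns must match the whole value, so fullmatch() mirrors
# <xs:pattern value="[0-9]{5}"/> more closely than search() would.
ZIP_CODE_PATTERN = re.compile(r"[0-9]{5}")


def is_valid_zip_code(value: str) -> bool:
    """Return True if value is exactly five ASCII digits."""
    return ZIP_CODE_PATTERN.fullmatch(value) is not None


# Try a few candidate values against the pattern.
for candidate in ("55101", "5510", "55101-1234", "5510A"):
    print(candidate, is_valid_zip_code(candidate))
```

A value such as 55101-1234 fails because the schema fragment allows exactly five digits and nothing else.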
6
XML Validation Errors

If you have:
1) An XML file that is not well-formed: you get an invalid XML, malformed XML, or content error. Examples are missing tag brackets or other syntax errors.
2) A well-formed XML file with tag errors: you get a list of the XML tags that are inconsistent with the specific XML Schema being validated against.
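The two kinds of errors can be illustrated with a short Python sketch. It rests on assumptions: the Python standard library does not validate against an XML Schema, so the required-tag list below is a hypothetical stand-in for a real schema, and `check_record` is an illustrative name.

```python
import xml.etree.ElementTree as ET

# Hypothetical list of required child tags; a real XML Schema expresses far more.
REQUIRED_TAGS = ("Zip_Code",)


def check_record(xml_text: str) -> list:
    """Return a list of error messages; an empty list means no errors were found."""
    try:
        root = ET.fromstring(xml_text)
    except ET.ParseError as exc:
        # Case 1: the text is not well-formed, so no tag-level checks are possible.
        return [f"malformed XML: {exc}"]
    # Case 2: the text is well-formed, so report every missing required tag.
    return [f"missing required tag: {tag}"
            for tag in REQUIRED_TAGS if root.find(tag) is None]


print(check_record("<Record><Zip_Code>55101</Zip_Code></Record>"))  # no errors
print(check_record("<Record><Zip_Code>55101</Record>"))             # malformed
print(check_record("<Record/>"))                                    # missing tag
```

A malformed file stops the parser at the first syntax error, while a well-formed file can be checked completely and produce a full list of tag errors.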
7
XML Validation Errors for XML Escape Characters

Five characters are used in XML syntax and should not be used directly in a data value. They must be "escaped" by replacing the character with its ampersand representation:

Name                         Character   Escape
double quote                 "           &quot;
single quote or apostrophe   '           &apos;
ampersand                    &           &amp;
greater than                 >           &gt;
less than                    <           &lt;
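As a hedged illustration of escaping, the following Python sketch uses the standard library's `xml.sax.saxutils.escape`. By default it handles the ampersand, less-than, and greater-than characters; the extra mapping adds the two quote characters. The helper name `escape_value` is made up for this example.

```python
from xml.sax.saxutils import escape

# escape() always replaces &, < and >; this mapping adds the two quote characters.
QUOTE_ENTITIES = {'"': "&quot;", "'": "&apos;"}


def escape_value(text: str) -> str:
    """Escape all five special characters in a data value."""
    return escape(text, QUOTE_ENTITIES)


print(escape_value("Smith & Sons"))
print(escape_value('<"O\'Brien">'))
```

The ampersand is replaced first, so the ampersands introduced by the other replacements are not escaped a second time.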
8
10 Common XML Transmission Errors
1. Malformed XML
2. Missing namespace declarations
3. Invalid document structure
4. Missing required element
5. Missing data in element
6. Invalid document type code values
7. Invalid property type code value
8. Invalid character values
9. Incorrect number of repeating fields
10. Incorrect tax year

For more information about XML errors, please also refer to the document: XML and XML Errors
9
XML & Validation Resources
• W3C XML Standards Page – http://www.w3.org/XML/
• OASIS XML Cover Pages – http://xml.coverpages.org/xml.html#xmlValResources (lots of references)
• XML.com – http://www.xml.com (up-to-date XML information)
• XML.com Schema Tools – http://www.xml.com/pub/a/2000/12/13/schematools.html (older list of schema tools)
• XMLSpy – http://www.Altova.com (free 30-day evaluation of XML tools and validation)
• XMLStar – http://xmlstar.sourceforge.net (free tools and validation)
11
Validating with XMLSpy Steps
1. Download XMLSpy (30-day free evaluation) and the homestead zip file
2. Create a new XMLSpy Project
3. Associate the homestead XML Schema with a folder
4. Open the file in XMLSpy
5. Add the active file to the folder
6. Click the "Validate" button
12
Download XMLSpy
• http://www.altova.com/products/xmlspy/xml_editor.html
• Altova will e-mail you a 30-day license key
13
Download Homestead Files
15
New Project Window
• Note: if the window is not visible, use the Window/Project menu to show the project window
16
Set the Properties of the XML Folder
• Right-click the XML files folder in the project view
• NOTE: RIGHT-CLICK, not left-click
18
Browse… to homestead schema
• Click OK, then double-click your XML data file to be validated
21
View Results in Validation View
• If your file is valid, a green check will appear in the validation view
• Error messages will appear in this same window
22
File Size Limitations
• XMLSpy tends to have problems validating files over about 25MB on a system with 1GB of RAM
• Use Apache Ant and/or Eclipse if you want to validate larger files
24
Agenda
• Tools for Records and Metadata vs. Tools for Data
• Apache Ant
  – DOS command line
• Eclipse
  – GUI interface
• V – The File Viewer
  – Viewing large files
• XML databases
25
Records vs. Databases
• XML file viewers (like XMLSpy) are ideal for viewing single records and metadata (XML Schemas)
• Visual editing tools tend to stop working when file sizes exceed about 25MB (given 2GB of RAM). For comparison, we don't use MS Word to edit 100,000 records in a database.
• Other tools are more appropriate for debugging large data sets
26
In Memory vs. Streaming
• There are several different approaches to checking large files
  – Load the entire file into memory (DOM)
  – Stream the file through memory (SAX)
  – Page only relevant sections into memory (Chunking – used in V-The-File-Viewer)
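The difference between the first two approaches can be sketched in Python. This is an illustration under assumptions, not part of the presentation's toolchain. `xml.etree.ElementTree.iterparse` reads the file incrementally in the spirit of the streaming (SAX) approach, whereas `ElementTree.parse` would build the whole tree in memory like DOM. The function name and the sample tags are hypothetical.

```python
import io
import xml.etree.ElementTree as ET


def count_elements_streaming(source, tag: str) -> int:
    """Count elements named tag without keeping the whole document in memory.

    source is a file name or an open binary file object.
    """
    count = 0
    for _event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag == tag:
            count += 1
        # Release this element's text and children once it has been seen.
        # The root still holds a reference to each emptied child element.
        elem.clear()
    return count


sample = (b"<Records>"
          b"<Record><Zip_Code>55101</Zip_Code></Record>"
          b"<Record><Zip_Code>55102</Zip_Code></Record>"
          b"</Records>")
print(count_elements_streaming(io.BytesIO(sample), "Record"))
```

Because each element is cleared after it is processed, memory use grows far more slowly than with an in-memory tree, which is why streaming tools cope with files that visual editors cannot open.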
27
Apache Ant
• Open source build manager
• The user gives Ant a high-level description of a task
• Ant executes the task using dependency analysis (only validate after extract)
• Called from a shell (DOS or UNIX)
• Called from an Integrated Development Environment (IDE)

See Wikipedia "Apache Ant"
Download link: http://www.uniontransit.com/apache/ant/binaries/apache-ant-1.7.0-bin.zip
30
Adding tools.jar
• Apache Ant needs one additional jar file, called "tools.jar", that is free with Sun's Software Development Tools
• It is freely available from the Java download as part of the Java SDK (JDK) 1.4+ (but not the JRE)
• A temporary copy is on the Java Open Source User Group (JOSUG) web site (www.josug.org/tools.jar)
• File is about 6MB!
• This must be in your build "Classpath"
31
Apache Ant 1.7
• Many new features
• Simple <schemavalidate> task
• Faster execution

<schemavalidate
    noNamespaceFile="homestead-data_v0.28.xsd"
    file="my-homestead-data.xml">
</schemavalidate>

noNamespaceFile: path to your XML schema
file: path to your XML data
32
Ant From DOS Command Line

1. Download Apache Ant version 1.7.0
2. Copy the build.xml into a directory
3. Change the file locations in the properties of the build file to match your local files
4. Run ant.bat (using the full path name) in the folder that contains the build file

build.xml

<?xml version="1.0" encoding="UTF-8"?>
<project default="validate-homestead">
  <!-- Change these to match your local system -->
  <property name="SrcDir" value="C:/homestead/stress-test"/>
  <property name="SchemaDir" value="C:/homestead/schemas"/>
  <target name="validate-homestead">
    <schemavalidate
        noNamespaceFile="${SchemaDir}/homestead-data_v0.28.xsd"
        file="${SrcDir}/100MB-test.xml">
    </schemavalidate>
  </target>
</project>
33
Apache Ant Tasks
• schemavalidate
  – New Ant 1.7 optional task just for XML Schema
• xmlvalidate
  – very general Ant 1.6 task for validation of XML files
  – check for well-formed files
  – check for validation against an XML Schema
• xslt
  – transforms XML files
• replace
  – replace specific text in large files
34
schemavalidate options
http://ant.apache.org/manual/OptionalTasks/xmlvalidate.html
http://ant.apache.org/manual/OptionalTasks/schemavalidate.html
36
Sample Ant 1.6 Validate Script
• This will validate only the 100MB-test.xml file
• Replace this with *.xml and all XML files in the source directory will be validated
37
Eclipse
• Open source integrated development environment (IDE), originally sponsored by IBM
• "GUI" front end to Apache Ant
See http://www.eclipse.org/
39
Complete Ant 1.7 Build File

<?xml version="1.0" encoding="UTF-8"?>
<project default="validate-homestead">
  <property name="DataDir" value="C:/homestead/data-files"/>
  <property name="SchemaDir" value="C:/homestead/schemas"/>
  <target name="validate-homestead">
    <schemavalidate
        noNamespaceFile="${SchemaDir}/homestead-data_v0.28.xsd"
        file="${DataDir}/my-data-file.xml">
    </schemavalidate>
  </target>
</project>

Properties can be set once in the file and referenced many times. This makes your build files easier to maintain.
41
XML Transform
• View a homestead record of a specific parcel ID

[Diagram: a big file (gigabytes) passes through an XML transform with matching rules; records that match are written to a very small file, and records that do not match are dropped]
Sample XML Transform

<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:mn="http://data.state.mn.us"
    xmlns:c="http://niem.gov/niem/common/1.0"
    xmlns:u="http://niem.gov/niem/universal/1.0"
    xmlns:mnr="http://revenue.state.mn.us"
    xmlns:mnr-ptx="http://propertytax.state.mn.us"
    exclude-result-prefixes="mn mnr c u mnr-ptx">
  <xsl:output indent="yes"/>

  <!-- only display the homestead record for this parcel ID -->
  <xsl:template
      match="/HomesteadRecordsDocument/CountyHomesteadRecord/HomesteadParcels/HomesteadParcel/CountyPropertyTaxStatement[mn:ParcelID='1234567']">
    <!-- copy the CountyHomesteadRecord that matched this parcel ID to the output -->
    <xsl:copy-of select="../../.."/>
  </xsl:template>

  <!-- do not output anything else -->
  <xsl:template match="@*|node()">
    <xsl:apply-templates select="@*|node()"/>
  </xsl:template>
</xsl:stylesheet>
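For readers without an XSLT 2.0 processor, the same filtering idea can be sketched in Python. This is a different technique from the stylesheet above (a streaming parse rather than a transform), and it is offered only as an illustration. The element names and the mn namespace are taken from the stylesheet; the function name is made up. Unlike the stylesheet, it returns only the first matching record and accepts a ParcelID anywhere inside the record.

```python
import io
import xml.etree.ElementTree as ET

MN_NS = "http://data.state.mn.us"  # namespace bound to the mn: prefix in the stylesheet


def find_homestead_record(source, parcel_id: str):
    """Return the first CountyHomesteadRecord containing parcel_id as an XML string.

    source is an open binary file object. Returns None if no record matches.
    """
    parcel_tag = f"{{{MN_NS}}}ParcelID"
    for _event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag != "CountyHomesteadRecord":
            continue
        if any(p.text == parcel_id for p in elem.iter(parcel_tag)):
            return ET.tostring(elem, encoding="unicode")
        elem.clear()  # no match: release this record before reading the next
    return None


def _record(parcel_id: str) -> bytes:
    """Build one tiny sample record (hypothetical data for the demo)."""
    return (b"<CountyHomesteadRecord><HomesteadParcels><HomesteadParcel>"
            b"<CountyPropertyTaxStatement><mn:ParcelID>" + parcel_id.encode() +
            b"</mn:ParcelID></CountyPropertyTaxStatement>"
            b"</HomesteadParcel></HomesteadParcels></CountyHomesteadRecord>")


sample = (b'<HomesteadRecordsDocument xmlns:mn="http://data.state.mn.us">'
          + _record("1111111") + _record("1234567") +
          b"</HomesteadRecordsDocument>")
print(find_homestead_record(io.BytesIO(sample), "1234567"))
```

Records that do not match are cleared as soon as they have been read, so only one record at a time needs to be held in memory.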
43
V-The File Viewer
• $20 application (less in quantity)
• Easily allows viewing of files greater than 1GB (uses file "chunking" technology)
• Opens multi-gigabyte files in a few seconds
• Note: read-only tool

See http://www.fileviewer.com/
45
XML Databases
• XML databases store XML in its native format
• You can associate a column in your database or a "collection" with the homestead XML Schema
• This allows you to have the database itself validate data before transmission to the state
46
Examples of XML Databases
• IBM DB2 version 9 "PureXML"
  – free and low-cost "express" versions for development and testing
• eXist (open source)
  – native XML database with XML Schema validation
• Over 50 other free and low-cost solutions with 30, 60 or 90 day evaluation periods

http://www.rpbourret.com/xml/XMLDatabaseProds.htm
47
DB2
• IBM DB2 version 9 supports fast searches on complex XML data sets
• Load records into XML datatype
• Records are quickly validated using an XML Schema
• Searching is very fast
48
eXist
• Open source
• Built-in web administration
• Easy to set up and configure
• Allows data to be validated on insert
• Fast searches
• Every XQuery IS a REST web service
49
Microsoft SQL Server 2005
• Supports native XML datatype
• Supports fast indexing
• Add SOAP services to XML documents
• Support for XQuery and XQuery updates