Satisfy Your Technical Curiosity
Open XML Deep Dive
Doug Mahugh, Technical Evangelist, Microsoft
http://blogs.msdn.com/dmahugh
Session Objectives
Satisfy your curiosity about Open XML:
• Architecture
• The three main Open XML schemas
• Development options
• Custom XML support
Diverse Environments
All you need is ZIP and XML support
Environment | ZIP library | XML library
Linux | Minizip, zLib | Apache Xerces
Java | J2SE java.util.zip | JAXP
Microsoft .NET | .NET Framework 3.0 System.IO.Packaging *, Xceed .NET controls | .NET Framework 3.0 System.Xml
COM | Xceed ActiveX controls | MSXML
* Also includes abstractions for OPC concepts (Open Packaging Convention)
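Because a package is just a ZIP archive of XML parts, Python's standard library is enough to look inside one. The sketch below is an illustration, not part of the original deck: `list_xml_parts` and `make_demo_package` are hypothetical helpers. The first opens a package with `zipfile` and parses every `.xml`/`.rels` entry with `xml.etree.ElementTree`. The second builds a tiny in-memory stand-in that is far from a complete .docx.

```python
import io
import zipfile
import xml.etree.ElementTree as ET


def make_demo_package() -> bytes:
    """Build a tiny, simplified package in memory (far from a complete .docx)."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as z:
        z.writestr("[Content_Types].xml",
                   '<Types xmlns="http://schemas.openxmlformats.org/package/2006/content-types"/>')
        z.writestr("_rels/.rels",
                   '<Relationships xmlns="http://schemas.openxmlformats.org/package/2006/relationships"/>')
        z.writestr("word/document.xml", "<document><body/></document>")
    return buf.getvalue()


def list_xml_parts(package: bytes) -> dict:
    """Map each XML part in the ZIP to its root element tag.

    Parsing every part doubles as a well-formedness check: ET.fromstring
    raises ET.ParseError on malformed XML.
    """
    parts = {}
    with zipfile.ZipFile(io.BytesIO(package)) as z:
        for name in z.namelist():
            if name.endswith((".xml", ".rels")):
                parts[name] = ET.fromstring(z.read(name)).tag
    return parts


if __name__ == "__main__":
    for part, root_tag in list_xml_parts(make_demo_package()).items():
        print(part, "->", root_tag)
```

With a real file, the same call would be `list_xml_parts(open("report.docx", "rb").read())`, where the file name is only an example.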
Development Scenarios
Scenario | Description | Example
Document Assembly | Server-based or user-assisted construction of documents from archived content or database content. | Create sales reports from financial and forecast data stored in a CRM system.
Integration & Content Reuse | Much easier to move content between documents, including different document types. | Quickly and efficiently apply content stored in Word documents to Web pages.
Document Sanitization | Remove unwanted content like comments, embedded code, or potentially sensitive items from your document when appropriate. | Remove all tracked changes and comments from a Word document before it is published.
Document Interrogation | Query document repositories based on custom data, content types, or document metadata. | Search for all documents containing a specific company name or sales contact.
Content Tagging | Adding a tagging schema to content can dramatically improve content searches and the value of the data stored in documents. | Organizations can create their own smart tags, then use them as the basis for searches.
Document Archival | Ensuring document formats can be consumed long into the future without vendor-specific clients or applications. | XML-based document archives include the data and presentation information.
Application type: Document Assembly
Server environment: Linux, Java, Apache, MySQL
Desktop environment: Office 2007
Architectural Overview
XML in Office: the last 10 years
Office 97: existing binary file formats (designed in 1994, launched in Office 97)
Office 2000: early innovation; XML document properties
Office XP: first XML formats; Spreadsheet XML
Office 2003: breakthrough XML support; WordprocessingML, SpreadsheetML, custom-defined schemas
2007 Office system: new XML-based formats; XML file format as the default; XML PowerPoint format
Open XML Architecture
(Diagram: layered architecture, top to bottom)
Markup Languages: WordprocessingML, SpreadsheetML, PresentationML
Shared Vocabularies: DrawingML, Custom XML, Bibliography, Metadata, VML (legacy), Equations
Open Packaging Convention: Content Types, Relationships, Digital Signatures
Core Technologies: ZIP, XML + Unicode
Open Packaging Convention
Low-level conventions that define the structure of an Office Open XML document
• Also used by XPS, and some third-party implementations are under development
• Key concepts: package, parts, relationships, and content types
The Package
Logical entity that holds a collection of parts
• OPC does not require a specific physical representation, but Open XML documents use a ZIP package
• Package-level entities: core properties, thumbnail parts, digital signatures
Core Properties
Common to all OPC documents:
Category, ContentType, Creator, Description, Identifier, Keywords, Modified, Revision, LastModifiedBy, Subject, Title, Version
Based on the Dublin Core properties
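In a .docx, the core properties usually live in the `docProps/core.xml` part. The element names are the slide's property names with a lower-case first letter (`dc:title`, `cp:lastModifiedBy`, `dcterms:modified`, and so on), drawn from the Dublin Core (`dc`, `dcterms`) and OPC core-properties (`cp`) namespaces. The sketch below is a minimal reader. The helper name `read_core_properties` and the sample values are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample of docProps/core.xml; values are made up for illustration.
SAMPLE_CORE_XML = """<cp:coreProperties
    xmlns:cp="http://schemas.openxmlformats.org/package/2006/metadata/core-properties"
    xmlns:dc="http://purl.org/dc/elements/1.1/"
    xmlns:dcterms="http://purl.org/dc/terms/"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
  <dc:title>Quarterly Sales Report</dc:title>
  <dc:creator>Jane Doe</dc:creator>
  <cp:lastModifiedBy>John Roe</cp:lastModifiedBy>
  <cp:revision>3</cp:revision>
  <dcterms:modified xsi:type="dcterms:W3CDTF">2007-05-01T10:00:00Z</dcterms:modified>
</cp:coreProperties>"""


def read_core_properties(xml_text: str) -> dict:
    """Return {localName: text} for every property element in core.xml."""
    root = ET.fromstring(xml_text)
    props = {}
    for child in root:
        local_name = child.tag.split("}", 1)[-1]  # drop the "{namespace}" part
        props[local_name] = (child.text or "").strip()
    return props


if __name__ == "__main__":
    print(read_core_properties(SAMPLE_CORE_XML))
```

Keying by local name keeps the sketch short. A stricter reader would keep the namespace, so that identically named elements from different vocabularies cannot collide.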
Parts
Building blocks of an OPC package
• Stored inside the package in a specific location
• Reachable via a URI
• Associated with a specific content type
• Often XML, but can be of any defined content type (including custom types)
Content Types
Tell the consumer what to do with each part
• Every part must have a content type
• Most Open XML parts have XML content types
• Consumers support a specific set of content types
• You can define custom content types, and consumers will ignore them; this is a key area of opportunity for developer innovation
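A package declares content types in its `[Content_Types].xml` part. An `Override` element names one specific part, and a `Default` element covers every part with a given file extension. The sketch below resolves a part's content type by checking overrides first and then falling back to the extension default, comparing case-insensitively. The function name and the trimmed sample are hypothetical.

```python
import xml.etree.ElementTree as ET

CT = "{http://schemas.openxmlformats.org/package/2006/content-types}"

# A trimmed [Content_Types].xml, similar in shape to the one in a .docx.
SAMPLE_TYPES = """<Types xmlns="http://schemas.openxmlformats.org/package/2006/content-types">
  <Default Extension="rels" ContentType="application/vnd.openxmlformats-package.relationships+xml"/>
  <Default Extension="xml" ContentType="application/xml"/>
  <Default Extension="jpeg" ContentType="image/jpeg"/>
  <Override PartName="/word/document.xml"
    ContentType="application/vnd.openxmlformats-officedocument.wordprocessingml.document.main+xml"/>
</Types>"""


def content_type_for(part_name: str, types_xml: str):
    """Return the content type for a part name such as "/word/document.xml", or None."""
    root = ET.fromstring(types_xml)
    # 1. An Override for this exact part name wins (part names compare case-insensitively).
    for override in root.iter(CT + "Override"):
        if override.get("PartName", "").lower() == part_name.lower():
            return override.get("ContentType")
    # 2. Otherwise fall back to the Default registered for the file extension.
    segment = part_name.rsplit("/", 1)[-1]
    extension = segment.rsplit(".", 1)[1].lower() if "." in segment else ""
    for default in root.iter(CT + "Default"):
        if default.get("Extension", "").lower() == extension:
            return default.get("ContentType")
    return None  # no content type: the part is not valid in an OPC package
```

With this sample, a custom XML part such as `/customXml/item1.xml` would fall through to the generic `application/xml` default.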
Relationships
Define how parts are interrelated
• Tie elements inside the package to each other
• Allow you to step through the document without parsing parts
• Are required: a part without a relationship is not part of the package, and may be discarded
OPC Is a Logical Structure
Physical implementation details may vary
• Files and folders? No! These details may vary.
• Parts should be referenced by their relationship type.
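To reference parts by relationship type, a consumer can locate the main document through the package-level relationships part (`_rels/.rels`) instead of hard-coding `word/document.xml`. The sketch below reads a relationships part and returns the absolute part names of every internal relationship of the requested type. Relative targets are resolved against the source part's folder: "/" for `_rels/.rels`, "/word" for `word/_rels/document.xml.rels`. The helper name and sample are hypothetical.

```python
import posixpath
import xml.etree.ElementTree as ET

REL_NS = "{http://schemas.openxmlformats.org/package/2006/relationships}"
OFFICE_DOCUMENT = ("http://schemas.openxmlformats.org/officeDocument/"
                   "2006/relationships/officeDocument")

# Hypothetical _rels/.rels content.
SAMPLE_PACKAGE_RELS = """<Relationships xmlns="http://schemas.openxmlformats.org/package/2006/relationships">
  <Relationship Id="rId3" Type="http://schemas.openxmlformats.org/package/2006/relationships/metadata/core-properties" Target="docProps/core.xml"/>
  <Relationship Id="rId1" Type="http://schemas.openxmlformats.org/officeDocument/2006/relationships/officeDocument" Target="word/document.xml"/>
</Relationships>"""


def targets_by_type(rels_xml: str, rel_type: str, source_dir: str = "/") -> list:
    """Return absolute part names for every internal relationship of rel_type."""
    root = ET.fromstring(rels_xml)
    found = []
    for rel in root.iter(REL_NS + "Relationship"):
        if rel.get("Type") != rel_type or rel.get("TargetMode") == "External":
            continue  # wrong type, or points outside the package (e.g. a URL)
        target = rel.get("Target")
        # Relative targets are resolved against the source part's folder.
        if not target.startswith("/"):
            target = posixpath.normpath(posixpath.join(source_dir, target))
        found.append(target)
    return found


if __name__ == "__main__":
    print(targets_by_type(SAMPLE_PACKAGE_RELS, OFFICE_DOCUMENT))
```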
Open XML Document
User view: a single document
Developer view: a modular file with many parts
(Diagram: a file container holding document properties, comments, WordprocessingML / SpreadsheetML, custom XML, embedded code, and images / video / sound)
Document parts:
• Most parts are XML
• Each XML part is a discrete component
• You can add, extract, and modify individual parts with any ZIP library
• Corruption of any part should not prevent the file from opening
The Role of XML
Reference schemas: display-oriented; enable technical interoperability
Custom-defined schemas: data-oriented; enable semantic interoperability
(Brian Jones, ODC2006)
Interoperability Strategies
• Interoperability between desktop applications requires technical interoperability
  Applications: Office, OpenOffice, WordPerfect
  Solutions: Open XML, ODF, PDF, HTML, RTF
• Interoperability with other types of systems requires semantic interoperability
  Applications: line-of-business (LOB) systems, web services, databases
  Solutions: custom schemas, metadata standards
Custom-defined XML is stored in its own discrete part
• Any XML can be stored, with or without a schema
• Only one requirement: it must be well-formed XML
• External applications (client/server) can process the store or populate the store
(Diagram: a document template made up of visual document parts and XML data, exchanging data with an external system)
• Link content controls to nodes in the XML data store
• Mappings are created using standard XPath expressions
How custom XML parts enable round-trip interoperability
XML Deep Dive
Exploring the details of Open XML markup
WordprocessingML
Document architecture
(Diagram: the document part, with its body and properties, plus related parts: fontTable, headers/footers, images, numberingDefinitions, styles, customXML, footnotes/endnotes, comments)
Paragraphs, Runs and Text
How text is stored in WordprocessingML
• The document element contains a body element
• The body contains paragraphs
• Paragraphs contain runs
• Runs contain text elements
<document>
  <body>
    <p>
      <r>
        <t>HELLO!</t>
      </r>
    </p>
  </body>
</document>
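The slide leaves out namespace prefixes for readability. In a real main document part, these elements belong to the WordprocessingML namespace, which is conventionally bound to `w:`. The sketch below builds the same document/body/p/r/t nesting with `xml.etree.ElementTree`. The helper names `w` and `build_document` are hypothetical, and a working .docx would still need the surrounding package parts.

```python
import xml.etree.ElementTree as ET

W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
ET.register_namespace("w", W_NS)  # serialize with the conventional "w:" prefix


def w(tag: str) -> str:
    """Qualify a WordprocessingML tag name with its namespace."""
    return f"{{{W_NS}}}{tag}"


def build_document(text: str) -> str:
    """Build document > body > p > r > t around a single piece of text."""
    document = ET.Element(w("document"))
    body = ET.SubElement(document, w("body"))
    paragraph = ET.SubElement(body, w("p"))
    run = ET.SubElement(paragraph, w("r"))
    ET.SubElement(run, w("t")).text = text
    return ET.tostring(document, encoding="unicode")


if __name__ == "__main__":
    print(build_document("HELLO!"))
```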
Paragraph Example
Simple formatting at paragraph/run levels:
<w:p>
  <w:pPr>
    <w:b/>
  </w:pPr>
  <w:r>
    <w:t>The quick</w:t>
  </w:r>
  <w:r>
    <w:rPr>
      <w:i/>
    </w:rPr>
    <w:t>brown</w:t>
  </w:r>
  <w:r>
    <w:t>fox.</w:t>
  </w:r>
</w:p>
• Paragraph properties specify bold (default for the entire paragraph)
• Run properties specify italics (override for this run)
Paragraph Properties
• Can be set directly or in a paragraph style
• 24 total property settings
<w:p>
  <w:pPr>
    <w:widowControl w:val="on"/>
    <w:keepNext/>
    <w:keepLines/>
    <w:pageBreakBefore/>
    <w:suppressLineNumbers/>
    <w:suppressAutoHyphens/>
    <w:textboxTightWrap/>
  </w:pPr>
  … runs, paragraph content …
</w:p>
Runs <w:r>
• A run is a region of text with a common set of properties
• All text must be contained within runs
• All runs must be contained within paragraphs
• A run contains three types of information:
  • Run properties
  • Run content (text, fields, line breaks, images)
  • Optional revision IDs for document comparison
Run/Text Structure
Producers may break runs/text arbitrarily.
<w:p>
  <w:r>
    <w:t xml:space="preserve">These examples are functionally identical.</w:t>
  </w:r>
</w:p>
<w:p>
  <w:r>
    <w:t xml:space="preserve">These </w:t>
    <w:t xml:space="preserve">examples </w:t>
  </w:r>
  <w:r>
    <w:t xml:space="preserve">are </w:t>
    <w:t xml:space="preserve">functionally </w:t>
  </w:r>
  <w:r>
    <w:t>identical.</w:t>
  </w:r>
</w:p>
These examples are functionally and logically equivalent.
Run Properties
Define formatting for individual characters
• Font attributes, size/position, other settings
• 24 total properties
<w:r>
  <w:rPr>
    <w:rFonts w:ascii="Arial" w:hAnsi="Arial" w:cs="Arial"/>
    <w:b/>
    <w:i/>
    <w:sz w:val="11"/> <!-- w:sz is measured in half-points, so 11 means 5.5 pt -->
    <w:dstrike w:val="true"/>
  </w:rPr>
  … run content …
</w:r>
Text <w:t>
• The only element in the main story that can contain text; all other text is in attributes
• Three other types of text are allowed in runs:
  • Deleted text <w:delText>
  • Field code <w:instrText>
  • Deleted field codes <w:delInstrText>
• By looking at <w:t> nodes, you can be sure you're seeing only displayed text
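Because producers may split text across runs and `<w:t>` elements, and deleted text uses `<w:delText>` instead, joining the `<w:t>` nodes of each paragraph recovers its displayed text. The sketch below does exactly that. The helper name `paragraph_texts` is hypothetical. The sample combines the slide's two equivalent paragraphs with an invented tracked deletion.

```python
import xml.etree.ElementTree as ET

W = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"

SAMPLE = """<w:document xmlns:w="http://schemas.openxmlformats.org/wordprocessingml/2006/main"><w:body>
<w:p><w:r><w:t xml:space="preserve">These examples are functionally identical.</w:t></w:r></w:p>
<w:p><w:r><w:t xml:space="preserve">These </w:t><w:t xml:space="preserve">examples </w:t></w:r><w:r><w:t xml:space="preserve">are </w:t><w:t xml:space="preserve">functionally </w:t></w:r><w:r><w:t>identical.</w:t></w:r></w:p>
<w:p><w:r><w:t xml:space="preserve">Kept </w:t></w:r><w:del w:id="1" w:author="A" w:date="2007-01-01T00:00:00Z"><w:r><w:delText>removed </w:delText></w:r></w:del><w:r><w:t>text</w:t></w:r></w:p>
</w:body></w:document>"""


def paragraph_texts(document_xml: str) -> list:
    """Return the displayed text of each paragraph by joining its <w:t> nodes.

    <w:delText>, <w:instrText>, and <w:delInstrText> have different tag names,
    so iterating over <w:t> alone skips them.
    """
    root = ET.fromstring(document_xml)
    return ["".join(t.text or "" for t in p.iter(W + "t"))
            for p in root.iter(W + "p")]


if __name__ == "__main__":
    for text in paragraph_texts(SAMPLE):
        print(repr(text))
```

The first two paragraphs come out identical, which is the equivalence the Run/Text Structure slide describes.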
Revision IDs (RSIDs)
RSID values are used to identify a set of changes that were made during the same editing session
• Found in many elements:
  • Paragraphs, runs, sections, styles
  • Table rows, table properties, charts, diagrams
• Allow revisions to be merged without the privacy and security issues involved in tracking who changed what
• Optional, but recommended for applications that modify existing documents
Images
An image is a w:pict element inside a run <w:r>. The v:imagedata element is defined in VML:
xmlns:v="urn:schemas-microsoft-com:vml"
The actual image is referenced via a relationship:
<w:pict>
  <v:shape id="_x0000_i1025" type="#_x0000_t75" style="width:250; height:200">
    <v:imagedata r:id="rId4"/>
  </v:shape>
</w:pict>
The relationship points to an image part in the package:
<Relationship Id="rId4" Type="http://schemas.openxmlformats.org/officeDocument/2006/relationships/image" Target="image1.jpg"/>
Hyperlinks
A hyperlink is nested inside a paragraph, outside a run:
<w:p>
  <w:hyperlink r:id="linkRel1">
    <w:r>
      <w:rPr>
        <w:color w:val="0000FF" w:themeColor="hyperlink"/>
        <w:u w:val="single"/>
      </w:rPr>
      <w:t>Click here for OpenXmlDeveloper.org.</w:t>
    </w:r>
  </w:hyperlink>
</w:p>
The destination is stored in a relationship:
<Relationship Id="linkRel1" Type="http://schemas.openxmlformats.org/officeDocument/2006/relationships/hyperlink" Target="http://www.openxmldeveloper.org" TargetMode="External"/>
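To follow a hyperlink, match its `r:id` attribute against the `Id` values in the document part's relationships part, which is usually `word/_rels/document.xml.rels`. The sketch below pairs each hyperlink's display text with its target. The helper name is hypothetical, and the sample reuses the slide's markup with its namespaces declared. Hyperlinks that use `w:anchor` instead of `r:id` would come back with a target of `None`.

```python
import xml.etree.ElementTree as ET

W = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"
R_ID = "{http://schemas.openxmlformats.org/officeDocument/2006/relationships}id"
REL = "{http://schemas.openxmlformats.org/package/2006/relationships}Relationship"

DOC = """<w:document xmlns:w="http://schemas.openxmlformats.org/wordprocessingml/2006/main"
  xmlns:r="http://schemas.openxmlformats.org/officeDocument/2006/relationships"><w:body>
<w:p><w:hyperlink r:id="linkRel1"><w:r><w:t>Click here for OpenXmlDeveloper.org.</w:t></w:r></w:hyperlink></w:p>
</w:body></w:document>"""

RELS = """<Relationships xmlns="http://schemas.openxmlformats.org/package/2006/relationships">
<Relationship Id="linkRel1" Type="http://schemas.openxmlformats.org/officeDocument/2006/relationships/hyperlink"
  Target="http://www.openxmldeveloper.org" TargetMode="External"/>
</Relationships>"""


def hyperlinks(document_xml: str, rels_xml: str) -> list:
    """Pair each <w:hyperlink>'s display text with the Target of its relationship."""
    targets = {rel.get("Id"): rel.get("Target")
               for rel in ET.fromstring(rels_xml).iter(REL)}
    links = []
    for link in ET.fromstring(document_xml).iter(W + "hyperlink"):
        text = "".join(t.text or "" for t in link.iter(W + "t"))
        links.append((text, targets.get(link.get(R_ID))))
    return links


if __name__ == "__main__":
    print(hyperlinks(DOC, RELS))
```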
Tables
Tables are a set of paragraphs arranged into rows and columns
• In WordprocessingML, tables are block-level content and are specified using the tbl element
• Analogous to the HTML <table> element
What's in a table?
Properties, grid, rows, and cells:
<w:tbl>
  <!-- Properties -->
  <w:tblPr>
    <w:tblStyle w:val="TableGrid"/>
    <w:tblW w:w="0" w:type="auto"/>
    <w:tblLook w:val="01E0"/>
  </w:tblPr>
  <!-- Grid -->
  <w:tblGrid>
    <w:gridCol w:w="2952"/>
    <w:gridCol w:w="2952"/>
    <w:gridCol w:w="2952"/>
  </w:tblGrid>
  <!-- Rows -->
  <w:tr>
    <!-- Cells -->
    <w:tc>
      <w:tcPr>
        <w:tcW w:w="2952" w:type="dxa"/>
      </w:tcPr>
      <w:p>
        <w:r>
          <w:t>1,1</w:t>
        </w:r>
      </w:p>
    </w:tc>
    <w:tc>
      <w:tcPr>
        <w:tcW w:w="2952" w:type="dxa"/>
      </w:tcPr>
      <w:p>
        <w:r>
          <w:t>1,2</w:t>
        </w:r>
      </w:p>
    </w:tc>
  </w:tr>
</w:tbl>
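Because every cell holds ordinary paragraphs, reading a table comes down to walking `w:tr` and then `w:tc`, and joining each cell's `w:t` text. The sketch below returns every table as a list of rows of cell strings. The helper names are hypothetical, and the sample is the slide's table trimmed down. Nested tables are also returned as separate tables, but their text is not folded into the outer cell.

```python
import xml.etree.ElementTree as ET

W = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"

# The slide's table, trimmed to the parts this sketch reads.
SAMPLE_DOC = """<w:document xmlns:w="http://schemas.openxmlformats.org/wordprocessingml/2006/main">
  <w:body>
    <w:tbl>
      <w:tblPr><w:tblStyle w:val="TableGrid"/></w:tblPr>
      <w:tblGrid><w:gridCol w:w="2952"/><w:gridCol w:w="2952"/></w:tblGrid>
      <w:tr>
        <w:tc><w:p><w:r><w:t>1,1</w:t></w:r></w:p></w:tc>
        <w:tc><w:p><w:r><w:t>1,2</w:t></w:r></w:p></w:tc>
      </w:tr>
    </w:tbl>
  </w:body>
</w:document>"""


def cell_text(tc: ET.Element) -> str:
    """Join the cell's direct paragraphs with newlines; each joins its <w:t> text."""
    return "\n".join("".join(t.text or "" for t in p.iter(W + "t"))
                     for p in tc.findall(W + "p"))


def read_tables(document_xml: str) -> list:
    """Return every table as a list of rows, each row a list of cell strings."""
    root = ET.fromstring(document_xml)
    return [[[cell_text(tc) for tc in tr.findall(W + "tc")]
             for tr in tbl.findall(W + "tr")]
            for tbl in root.iter(W + "tbl")]


if __name__ == "__main__":
    print(read_tables(SAMPLE_DOC))
```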
Styles
A style defines a specific set of values for formatting properties that may be applied as a single logical unit.
For example, the Normal style in Word 2007 defines these formatting properties:
• Font = Calibri (body)
• Font size = 11 point
• Font language = Word default (as configured by user)
• Justification = Left
• Line spacing = Single
• Widow/orphan control
Styles Storage
Within a WordprocessingML package, styles are stored in a unique part.
• The styles part is the target of an implicit relationship from the document part, with this relationship type:
  http://schemas.openxmlformats.org/officeDocument/2006/relationships/styles
• The styles part has this content type:
  application/vnd.openxmlformats-officedocument.wordprocessingml.styles+xml
Style Definitions
A style is defined using the style element. The style definition contains three pieces of information:
1. Common style properties. These properties apply to all styles, regardless of their type.
2. The style type. Six types of styles may be defined: paragraph, character, linked (paragraph+character), table, list (numbering), and default.
3. Type-specific properties. Example: cell spacing in a table style.
Style Types
WordprocessingML supports six style types:
• Paragraph styles
• Character styles
• Linked styles
• Table styles
• List styles
• Default style (linked type, but applies when no style is specified)
Paragraph Styles Example
Step 1: define a paragraph style
Styles are defined in the styles part:
<w:style w:type="paragraph" w:styleId="TestParagraphStyle">
  <!-- Common properties -->
  <w:name w:val="Test Paragraph Style"/>
  <w:qFormat/>
  <w:rsid w:val="009E253E"/>
  <!-- Paragraph properties -->
  <w:pPr>
    <w:pStyle w:val="TestParagraphStyle"/>
    <w:spacing w:line="480" w:lineRule="auto"/>
    <w:ind w:firstLine="1440"/>
  </w:pPr>
  <!-- Character (run) properties -->
  <w:rPr>
    <w:rFonts w:ascii="Algerian" w:hAnsi="Algerian"/>
    <w:b/>
    <w:color w:val="ED1C24"/>
    <w:sz w:val="40"/>
  </w:rPr>
</w:style>
Paragraph Styles Example
Step 2: apply the style to a paragraph
The pStyle element associates a style with a paragraph:
<w:p>
  <w:pPr>
    <w:pStyle w:val="TestParagraphStyle"/>
  </w:pPr>
  <w:r>
    <w:t>Text</w:t>
  </w:r>
</w:p>
The paragraph is then displayed with the style applied.
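A consumer can resolve which style each paragraph uses. It reads the `w:pStyle` value, which is a style ID, and looks up the matching `w:style` in the styles part. If the paragraph has no `w:pStyle`, the consumer falls back to the paragraph style marked `w:default="1"`. The sketch below returns the display name (`w:name`) of each paragraph's style. The helper name is hypothetical, and the samples reuse the slide's style ID.

```python
import xml.etree.ElementTree as ET

W = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"

SAMPLE_STYLES = """<w:styles xmlns:w="http://schemas.openxmlformats.org/wordprocessingml/2006/main">
  <w:style w:type="paragraph" w:default="1" w:styleId="Normal"><w:name w:val="Normal"/></w:style>
  <w:style w:type="paragraph" w:styleId="TestParagraphStyle"><w:name w:val="Test Paragraph Style"/></w:style>
</w:styles>"""

SAMPLE_DOC = """<w:document xmlns:w="http://schemas.openxmlformats.org/wordprocessingml/2006/main"><w:body>
  <w:p><w:pPr><w:pStyle w:val="TestParagraphStyle"/></w:pPr><w:r><w:t>Text</w:t></w:r></w:p>
  <w:p><w:r><w:t>No pStyle here</w:t></w:r></w:p>
</w:body></w:document>"""


def paragraph_style_names(document_xml: str, styles_xml: str) -> list:
    """Return each paragraph's style name, falling back to the default paragraph style."""
    names, default_id = {}, None
    for style in ET.fromstring(styles_xml).iter(W + "style"):
        if style.get(W + "type") != "paragraph":
            continue
        style_id = style.get(W + "styleId")
        name = style.find(W + "name")
        names[style_id] = name.get(W + "val") if name is not None else style_id
        if style.get(W + "default") in ("1", "true", "on"):
            default_id = style_id
    result = []
    for p in ET.fromstring(document_xml).iter(W + "p"):
        p_style = p.find(f"{W}pPr/{W}pStyle")
        style_id = p_style.get(W + "val") if p_style is not None else default_id
        result.append(names.get(style_id))
    return result


if __name__ == "__main__":
    print(paragraph_style_names(SAMPLE_DOC, SAMPLE_STYLES))
```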
Character Styles Example
• Applied to runs instead of paragraphs
• Can be applied to any run in the document
• Can only specify run properties, not paragraph properties such as indentation
<w:style w:type="character" w:styleId="TestCharacterStyle">
  <!-- Common properties -->
  <w:name w:val="Test Character Style"/>
  <w:uiPriority w:val="99"/>
  <w:qFormat/>
  <w:rsid w:val="00E07041"/>
  <!-- Character (run) properties -->
  <w:rPr>
    <w:rFonts w:ascii="Courier New" w:hAnsi="Courier New"/>
    <w:color w:val="FFF200"/>
    <w:u w:val="single"/>
  </w:rPr>
</w:style>
Numbering Styles
Flexible hierarchical definition
• Numbering styles are styles that define the structure of a multi-level numbering format
• Numbering definition instances are based on an abstract numbering definition
• Abstract numbering definitions define paragraph properties for up to 9 hierarchical levels
• NOTE: items in a list are simply paragraphs. There is no list "container" as in HTML.
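Because list items are just paragraphs, a consumer identifies them by the `w:numPr` element in their paragraph properties. That element holds `w:numId`, which points to a numbering instance in the numbering part, and `w:ilvl`, the 0-based level. The sketch below lists directly numbered paragraphs. The helper name and sample are hypothetical. Numbering inherited from a paragraph style is deliberately not resolved here.

```python
import xml.etree.ElementTree as ET

W = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"

# Hypothetical sample: two numbered paragraphs (levels 0 and 1) and one plain paragraph.
SAMPLE_DOC = """<w:document xmlns:w="http://schemas.openxmlformats.org/wordprocessingml/2006/main"><w:body>
  <w:p><w:pPr><w:numPr><w:ilvl w:val="0"/><w:numId w:val="1"/></w:numPr></w:pPr><w:r><w:t>Fruit</w:t></w:r></w:p>
  <w:p><w:pPr><w:numPr><w:ilvl w:val="1"/><w:numId w:val="1"/></w:numPr></w:pPr><w:r><w:t>Apple</w:t></w:r></w:p>
  <w:p><w:r><w:t>Not a list item</w:t></w:r></w:p>
</w:body></w:document>"""


def list_items(document_xml: str) -> list:
    """Return (numId, level, text) for each paragraph that carries direct numbering."""
    items = []
    for p in ET.fromstring(document_xml).iter(W + "p"):
        num_pr = p.find(f"{W}pPr/{W}numPr")
        if num_pr is None:
            continue  # an ordinary paragraph (or numbering inherited from a style)
        num_id = num_pr.find(W + "numId")
        ilvl = num_pr.find(W + "ilvl")
        items.append((
            num_id.get(W + "val") if num_id is not None else None,
            int(ilvl.get(W + "val")) if ilvl is not None else 0,
            "".join(t.text or "" for t in p.iter(W + "t")),
        ))
    return items


if __name__ == "__main__":
    print(list_items(SAMPLE_DOC))
```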
Table Styles
A table style is associated with a table via the tblStyle element in the table properties:
<w:tbl>
  <w:tblPr>
    <w:tblStyle w:val="Style20"/> <!-- table style Style20 is applied to the table -->
    <w:tblW w:w="5000" w:type="pct"/>
    <w:tblLook w:val="0220"/>
  </w:tblPr>
  … tblGrid, table rows and cells …
</w:tbl>
Style Application Hierarchy
Direct formatting overrides style settings. Formatting is applied in layers, and each layer overrides the one before it:
1. Document defaults
2. Table styles
3. Numbering styles
4. Paragraph styles
5. Character styles
6. Direct formatting
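The layering above can be sketched as successive dictionary merges in which later layers win. This is a deliberate simplification, not Word's full algorithm. ECMA-376 treats toggle properties such as bold specially when they appear in more than one style, and this sketch ignores that. The layer names and property keys (`font`, `size_pt`, `bold`) are hypothetical.

```python
# Layers from lowest to highest precedence; each later layer overrides earlier ones.
LAYER_ORDER = [
    "document defaults",
    "table style",
    "numbering style",
    "paragraph style",
    "character style",
    "direct formatting",
]


def effective_properties(layers: dict) -> dict:
    """Merge per-layer property dicts in LAYER_ORDER so higher layers win."""
    result = {}
    for layer in LAYER_ORDER:
        result.update(layers.get(layer, {}))
    return result


if __name__ == "__main__":
    print(effective_properties({
        "document defaults": {"font": "Calibri", "size_pt": 11},
        "paragraph style": {"size_pt": 20, "bold": True},
        "direct formatting": {"size_pt": 14},
    }))
```

In this example, direct formatting sets the final size to 14, the paragraph style contributes bold, and the font comes from the document defaults.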
Subdocuments
Mechanism for "rolling up" documents
• Subdocuments are well-formed Open XML documents and can be edited independently
• Subdocuments don't know they're part of something bigger; they're just stand-alone documents
Subdocuments
Implementation details
• The main document part contains subDoc elements that indicate where to insert subdocuments
• The subdocument's location is stored in a relationship
Main document part:
<w:body>
  <w:subDoc r:id="rId1"/>
  <w:subDoc r:id="rId2"/>
  <w:subDoc r:id="rId3"/>
  …
</w:body>
Relationships:
<Relationship Id="rId1" Type="…/subDocument" Target="Part1.docx" TargetMode="External"/>
<Relationship Id="rId2" Type="…/subDocument" Target="Part2.docx" TargetMode="External"/>
<Relationship Id="rId3" Type="…/subDocument" Target="Part3.docx" TargetMode="External"/>
Document Sections
A document may be divided into sections, which allow formatting at a higher level than paragraphs:
• Landscape/portrait orientation
• Page margins, etc.
Section properties are defined in sectPr:
<w:sectPr>
  <w:pgSz w:w="12240" w:h="15840"/>
  <w:pgMar w:top="1440" w:right="1800" w:bottom="1440" w:left="1800" w:header="720" w:footer="720" w:gutter="0"/>
  <w:cols w:space="720"/>
  <w:docGrid w:linePitch="360"/>
</w:sectPr>
Section Properties
Example
In Word, section properties are specified in the Page Setup dialog.
<w:sectPr>
  <w:pgSz w:w="12240" w:h="15840"/>
  <w:pgMar w:top="1440" w:right="1440" w:bottom="1440" w:left="1440" w:header="720" w:footer="720" w:gutter="0"/>
  <w:cols w:space="720"/>
  <w:docGrid w:linePitch="360"/>
</w:sectPr>
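The pgSz and pgMar values are in twentieths of a point (twips), so 1440 twips make 1 inch. The page above is therefore 12240 / 1440 = 8.5 in wide and 15840 / 1440 = 11 in tall (US Letter), with 1-inch margins and 0.5-inch header and footer distances. The sketch below performs that conversion. The helper name is hypothetical, and `w:` is declared on the sample so that it parses on its own.

```python
import xml.etree.ElementTree as ET

W = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"
TWIPS_PER_INCH = 1440  # 72 points per inch x 20 twips per point

# The sectPr from the slide, with the w: namespace declared so it parses on its own.
SAMPLE_SECT_PR = """<w:sectPr xmlns:w="http://schemas.openxmlformats.org/wordprocessingml/2006/main">
  <w:pgSz w:w="12240" w:h="15840"/>
  <w:pgMar w:top="1440" w:right="1440" w:bottom="1440" w:left="1440"
           w:header="720" w:footer="720" w:gutter="0"/>
  <w:cols w:space="720"/>
  <w:docGrid w:linePitch="360"/>
</w:sectPr>"""


def page_setup_inches(sect_pr_xml: str) -> dict:
    """Convert page size and margins (stored in twips) to inches."""
    sect_pr = ET.fromstring(sect_pr_xml)
    size = sect_pr.find(W + "pgSz")
    margins = sect_pr.find(W + "pgMar")
    inches = {
        "width": int(size.get(W + "w")) / TWIPS_PER_INCH,
        "height": int(size.get(W + "h")) / TWIPS_PER_INCH,
    }
    for side in ("top", "right", "bottom", "left", "header", "footer", "gutter"):
        inches[side] = int(margins.get(W + side)) / TWIPS_PER_INCH
    return inches


if __name__ == "__main__":
    print(page_setup_inches(SAMPLE_SECT_PR))
```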
Custom XML Support
Merging the worlds of documents and data
Why Custom XML?
Enables semantic interoperability
• Documents can provide a rich view of back-end data
• Documents can update back-end data sources
• Exposes business data within documents to heterogeneous systems
• Business-specific semantics can be applied to document data
• Separates presentation and data
Custom XML schema support was a key design objective for Open XML: any schema can be used in Open XML documents.
Custom XML
Developer options for custom XML support
Custom XML Parts
Pure custom data: only your data appears in the part
• Format-agnostic:
  • WordprocessingML: relationship from the main document
  • SpreadsheetML: relationship from the workbook
  • PresentationML: relationship from the presentation or a slide, or an explicit relationship from a shape on a slide
• No content restrictions: any schema (or none), any number of parts
• Syntactical restriction: must contain well-formed XML
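Because well-formedness is the only requirement, a simple check suffices. Find the custom XML parts through the main document's relationships, using the `customXml` relationship type, and try to parse each one. The sketch below does that for a WordprocessingML package. The helper names and the in-memory demo package are hypothetical, and the demo contains only the two parts the check needs.

```python
import io
import posixpath
import zipfile
import xml.etree.ElementTree as ET

REL = "{http://schemas.openxmlformats.org/package/2006/relationships}Relationship"
CUSTOM_XML = "http://schemas.openxmlformats.org/officeDocument/2006/relationships/customXml"


def check_custom_xml_parts(package: bytes) -> dict:
    """Map each custom XML part related from word/document.xml to True if well-formed."""
    results = {}
    with zipfile.ZipFile(io.BytesIO(package)) as z:
        rels = ET.fromstring(z.read("word/_rels/document.xml.rels"))
        for rel in rels.iter(REL):
            if rel.get("Type") != CUSTOM_XML:
                continue
            # Targets are relative to the source part's folder (word/).
            name = posixpath.normpath(posixpath.join("word", rel.get("Target")))
            try:
                ET.fromstring(z.read(name))
                results[name] = True
            except ET.ParseError:
                results[name] = False
    return results


def make_demo_package() -> bytes:
    """Hypothetical minimal package: one good and one malformed custom XML part."""
    rels = ('<Relationships xmlns="http://schemas.openxmlformats.org/package/2006/relationships">'
            f'<Relationship Id="rId1" Type="{CUSTOM_XML}" Target="../customXml/item1.xml"/>'
            f'<Relationship Id="rId2" Type="{CUSTOM_XML}" Target="../customXml/item2.xml"/>'
            '</Relationships>')
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as z:
        z.writestr("word/_rels/document.xml.rels", rels)
        z.writestr("customXml/item1.xml", "<customers><customer>Contoso</customer></customers>")
        z.writestr("customXml/item2.xml", "<customers><customer>Contoso</customers>")  # mismatched tag
    return buf.getvalue()


if __name__ == "__main__":
    print(check_custom_xml_parts(make_demo_package()))
```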
Custom XML Parts: Properties/Metadata
Information about a custom XML part is stored in a custom XML properties part
• Stored via an implicit customXmlProps relationship from the custom XML part
• Contains two types of information:
  • Part ID: uniquely identifies a part within a document; maintained through editing sessions
  • XML schema references
Structured Document Tags
Known as "content controls" in Microsoft Office
• Smart tags and custom XML markup add semantics, but have no effect on presentation
• Sometimes you want to affect presentation: data-entry restrictions, multi-select, etc.
• Solution: the structured document tag <sdt>
Structured Document Tags
Seven types supported in Open XML:
• Plain text
• Combo box
• Drop-down list
• Document building block
• Date picker
• Rich text
• Picture
Data Binding
Two-way synchronization between:
• Content controls (structured document tags)
• Custom XML nodes (data in your schema)
Data Binding Basics
How to bind XML nodes to structured document tags
• Add a <dataBinding> element to the structured document tag properties <sdtPr>
• <dataBinding> specifies a custom XML part (by Custom XML Data Identifier) and an XPath to a specific node within that part
Custom XML Data Identifier? What's that?
• The custom XML part has a properties part
• Implicit relationship in customXmlPart.xml.rels
• The properties part specifies a Custom XML Data Identifier
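In WordprocessingML markup, `w:dataBinding` carries three attributes. `w:storeItemID` must match the `ds:itemID` in the custom XML part's properties part. `w:xpath` locates the node, and `w:prefixMappings` declares the namespace prefixes that the XPath uses. The sketch below reads the identifier and evaluates simple absolute XPaths such as `/ns0:customers[1]/ns0:customer[2]/ns0:name[1]`. ElementTree supports only a subset of XPath, so this handles child steps with `[n]` positions and is not a full XPath 1.0 engine. The helper names, GUID, and sample data are hypothetical.

```python
import re
import xml.etree.ElementTree as ET

DS_ITEM_ID = "{http://schemas.openxmlformats.org/officeDocument/2006/customXml}itemID"

# Hypothetical sample data: a custom XML part and its properties part.
CUSTOM_XML = ('<customers xmlns="urn:example:customers">'
              '<customer><name>Contoso</name></customer>'
              '<customer><name>Fabrikam</name></customer>'
              '</customers>')
PROPS_XML = ('<ds:datastoreItem ds:itemID="{6E2B1A3C-0000-4000-8000-000000000001}" '
             'xmlns:ds="http://schemas.openxmlformats.org/officeDocument/2006/customXml"/>')


def item_id(props_xml: str) -> str:
    """Read the Custom XML Data Identifier (ds:itemID) from a properties part."""
    return ET.fromstring(props_xml).get(DS_ITEM_ID)


def parse_prefix_mappings(mappings: str) -> dict:
    """Turn "xmlns:ns0='urn:x' xmlns:ns1='urn:y'" into {"ns0": "urn:x", "ns1": "urn:y"}."""
    return dict(re.findall(r"xmlns:(\w+)=['\"]([^'\"]*)['\"]", mappings))


def resolve_binding(xpath: str, prefix_mappings: str, custom_xml: str):
    """Evaluate a simple absolute XPath (child steps with optional [n]) against custom_xml."""
    namespaces = parse_prefix_mappings(prefix_mappings)
    root = ET.fromstring(custom_xml)
    steps = xpath.strip("/").split("/")
    # The first step names the root element itself, so compare it directly.
    prefix, _, local = re.sub(r"\[\d+\]$", "", steps[0]).rpartition(":")
    expected = "{%s}%s" % (namespaces[prefix], local) if prefix else local
    if root.tag != expected:
        return None
    if len(steps) == 1:
        return root.text
    # ElementTree's find() handles prefix:name and [n] for the remaining steps.
    node = root.find("/".join(steps[1:]), namespaces)
    return None if node is None else node.text


if __name__ == "__main__":
    print(item_id(PROPS_XML))
    print(resolve_binding("/ns0:customers[1]/ns0:customer[2]/ns0:name[1]",
                          "xmlns:ns0='urn:example:customers'", CUSTOM_XML))
```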
Content Control Toolkit
Open-source developer tool: http://www.codeplex.com/Wiki/View.aspx?ProjectName=dbe
Automatically generates parts, relationships, and markup to bind custom XML parts to content controls
Satisfy Your Technical CuriositySatisfy Your Technical Curiosity
Custom XML Markup
Tagging document content with custom semantics
Allows embedding the structure from any XML schema into a WordprocessingML document
Schema not required
XML doesn't have to validate against your schema
Custom XML elements may have custom attributes
Consumers/producers preserve your attributes
Custom XML Markup
Example
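The slide's original example is not preserved in this text. As a stand-in, this sketch tags a run with a custom element, assuming the WordprocessingML <w:customXml> element (w:uri, w:element) with custom attributes carried in <w:customXmlPr>/<w:attr>. The urn:example:sales namespace, the contact element, the region attribute, and tagged_paragraph itself are hypothetical.

```python
import xml.etree.ElementTree as ET

W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
ET.register_namespace("w", W_NS)


def w(name: str) -> str:
    """Qualify a WordprocessingML element or attribute name."""
    return f"{{{W_NS}}}{name}"


def tagged_paragraph(uri: str, element: str, text: str, attrs=None) -> ET.Element:
    """Return a <w:p> whose single run is wrapped in a run-level <w:customXml>."""
    p = ET.Element(w("p"))
    cx = ET.SubElement(p, w("customXml"), {w("uri"): uri, w("element"): element})
    if attrs:
        pr = ET.SubElement(cx, w("customXmlPr"))
        for name, val in attrs.items():
            # Custom attributes travel with the element; no schema is required.
            ET.SubElement(pr, w("attr"), {w("name"): name, w("val"): val})
    run = ET.SubElement(cx, w("r"))
    ET.SubElement(run, w("t")).text = text
    return p


para = tagged_paragraph("urn:example:sales", "contact", "Jane Doe", {"region": "EMEA"})
print(ET.tostring(para, encoding="unicode"))
```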
XML Mapping Overview
SpreadsheetML support for linking to external XML
XML elements and attributes may be mapped to cells and tables
Store a copy of the schema in the workbook
Top-level map object
Custom properties on each cell or column
SpreadsheetML
Document architecture
[Diagram: SpreadsheetML package parts: workbook, properties, styles, sharedStrings, calcChain, sheet1..N, table, chart, drawing]
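Each part in the diagram is a ZIP entry whose content type is declared in the package's [Content_Types].xml. A sketch using only Python's standard library, assuming the OPC content-types namespace; part_content_types is a hypothetical helper, and it does not percent-decode part names.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

CT_NS = "http://schemas.openxmlformats.org/package/2006/content-types"


def part_content_types(package_bytes: bytes) -> dict:
    """Map each part in an OPC package (such as an .xlsx) to its content type."""
    with zipfile.ZipFile(io.BytesIO(package_bytes)) as package:
        root = ET.fromstring(package.read("[Content_Types].xml"))
        names = package.namelist()
    # <Default> maps a file extension to a content type; <Override> names one part.
    defaults = {d.get("Extension").lower(): d.get("ContentType")
                for d in root.findall(f"{{{CT_NS}}}Default")}
    overrides = {o.get("PartName").lower(): o.get("ContentType")
                 for o in root.findall(f"{{{CT_NS}}}Override")}
    result = {}
    for name in names:
        if name == "[Content_Types].xml" or name.endswith("/"):
            continue
        part = "/" + name  # part names are absolute, e.g. /xl/workbook.xml
        ext = name.rsplit(".", 1)[-1].lower() if "." in name else ""
        # Part names compare case-insensitively, so overrides are looked up in lower case.
        result[part] = overrides.get(part.lower(), defaults.get(ext))
    return result
```

For a file on disk, pass its bytes, for example part_content_types(open("Book1.xlsx", "rb").read()).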
SpreadsheetML
Performance optimizations
SpreadsheetML has been optimized based on analysis of typical spreadsheet usage patterns:
Small tag size (often a single character)
Shared strings
Shared formulas
Sparse table markup allowed
Optional r="A1" attribute for faster loading
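Because markup can be sparse and the r attribute is optional, a consumer may need to convert between column numbers and A1-style references itself. These helpers are an illustrative sketch, not part of any Open XML library, and they only handle simple references such as "B4" (no $ signs or sheet names).

```python
import re


def column_letters(index: int) -> str:
    """Convert a 1-based column number to letters (1 -> A, 27 -> AA)."""
    if index < 1:
        raise ValueError("column index must be >= 1")
    letters = ""
    while index:
        index, rem = divmod(index - 1, 26)  # bijective base 26: there is no zero digit
        letters = chr(ord("A") + rem) + letters
    return letters


def parse_ref(ref: str) -> tuple:
    """Split an A1-style reference into (1-based column, 1-based row)."""
    m = re.fullmatch(r"([A-Z]+)([0-9]+)", ref)
    if not m:
        raise ValueError(f"not an A1 reference: {ref!r}")
    col = 0
    for ch in m.group(1):
        col = col * 26 + (ord(ch) - ord("A") + 1)
    return col, int(m.group(2))
```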
SpreadsheetML Strings
Two alternatives for storing text strings
1. Inline strings
• Provided for ease of translation/conversion
• Useful in XSLT scenarios
• Excel and other consumers may convert to shared strings on document save
2. An entry in the shared-strings table
• May be either a simple string or formatted text
These approaches may be mixed/combined
Shared Strings
Repetitive strings are common in typical spreadsheets
Strings are stored in a shared-strings part:
Each unique string is stored once
Cells store the index (0-based) of the string
Benefits:
Users: reduced file size, improved performance
Developers: all strings are in one part, simplifying search, localization, and other common string-handling tasks
Shared Strings
Sample shared-strings table
<sst xmlns="..." count="6" uniqueCount="4">
  <si><t>Paris</t></si>
  <si><t>Seattle</t></si>
  <si><t>London</t></si>
  <si><t>Copenhagen</t></si>
</sst>
6 string references, 4 unique strings
Paris = string 0
<row r="1" spans="1:1">
  <c r="A1" t="s">
    <v>0</v>
  </c>
</row>
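The bookkeeping behind count and uniqueCount fits in a few lines. This hypothetical build_shared_strings helper assigns 0-based indices in first-seen order; given Paris, Seattle, London, Copenhagen, Paris, London it produces count="6", uniqueCount="4", and Paris at index 0, matching the table above. The xml:space="preserve" handling is an added detail so that leading or trailing spaces survive.

```python
from xml.sax.saxutils import escape

SML_NS = "http://schemas.openxmlformats.org/spreadsheetml/2006/main"


def build_shared_strings(values: list) -> tuple:
    """Deduplicate cell strings into a shared-strings table.

    Returns (sst_xml, indices), where indices[i] is the 0-based entry the
    i-th cell stores in its <v> element (with t="s").
    """
    table = {}  # string -> index; dicts keep insertion (first-seen) order
    indices = []
    for value in values:
        if value not in table:
            table[value] = len(table)
        indices.append(table[value])
    items = []
    for s in table:
        space = ' xml:space="preserve"' if s != s.strip() else ""
        items.append(f"<si><t{space}>{escape(s)}</t></si>")
    sst = (f'<sst xmlns="{SML_NS}" count="{len(values)}" '
           f'uniqueCount="{len(table)}">{"".join(items)}</sst>')
    return sst, indices


sst, idx = build_shared_strings(["Paris", "Seattle", "London", "Copenhagen", "Paris", "London"])
```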
Inline Strings
Simplest option for generating spreadsheets
No shared-strings part required
Especially useful in XSLT scenarios
If you're consuming Open XML documents, you must handle both cases: inline strings and/or shared strings
Excel 2007 converts to shared strings on save
<sheetData>
  <row><c t="inlineStr"><is><t>Paris</t></is></c></row>
  <row><c t="inlineStr"><is><t>Seattle</t></is></c></row>
  <row><c t="inlineStr"><is><t>London</t></is></c></row>
  <row><c t="inlineStr"><is><t>Copenhagen</t></is></c></row>
  <row><c t="inlineStr"><is><t>Paris</t></is></c></row>
  <row><c t="inlineStr"><is><t>London</t></is></c></row>
</sheetData>
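A consumer-side sketch that handles both cases: it loads the shared-strings table (joining rich-text runs) and resolves each <c> according to its t attribute. The function names are hypothetical; phonetic runs (<rPh>) are deliberately ignored, and non-string cells simply return the raw <v> text.

```python
import xml.etree.ElementTree as ET

NS = {"s": "http://schemas.openxmlformats.org/spreadsheetml/2006/main"}


def _rst_text(container) -> str:
    """Text of a string item (<si> or <is>): a plain <t>, or rich-text runs <r><t>."""
    if container is None:
        return ""
    parts = container.findall("s:t", NS) + container.findall("s:r/s:t", NS)
    return "".join(t.text or "" for t in parts)


def load_shared_strings(sst_xml: str) -> list:
    """Return the shared-strings table as a list indexed by the cells' 0-based values."""
    root = ET.fromstring(sst_xml)
    return [_rst_text(si) for si in root.findall("s:si", NS)]


def cell_text(cell: ET.Element, shared: list) -> str:
    """Text of a <c> element, whichever way its string is stored."""
    kind = cell.get("t")
    if kind == "s":  # shared string: <v> holds an index into the table
        return shared[int(cell.find("s:v", NS).text)]
    if kind == "inlineStr":  # inline string: text lives in <is>
        return _rst_text(cell.find("s:is", NS))
    v = cell.find("s:v", NS)  # numbers, booleans, formula results, ...
    return v.text if v is not None and v.text is not None else ""
```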
SpreadsheetML Tables
Design goals for SpreadsheetML tables:
1. Separate presentation and data
• Data stays in the worksheet
• Table definition is in a separate part (referenced via a relationship)
2. Cell definition lightweight but extensible
• Complex type with future storage capabilities
• Named ranges written in their own collection instead of on each cell
Open XML has different types of tables for each document type, optimized for different scenarios:
WordprocessingML has its tbl element
SpreadsheetML has its table element
PresentationML uses DrawingML tables (tbl inside graphicData)
SpreadsheetML Tables
Basic concepts
Worksheet part (headings = shared strings):
<sheetData>
  <row r="1" spans="1:2">
    <c r="A1" t="s"><v>0</v></c>
    <c r="B1" t="s"><v>1</v></c>
  </row>
  <row r="2" spans="1:2">
    <c r="A2"><v>1</v></c>
    <c r="B2"><v>4</v></c>
  </row>
  <row r="3" spans="1:2">
    <c r="A3"><v>2</v></c>
    <c r="B3"><v>5</v></c>
  </row>
  <row r="4" spans="1:2">
    <c r="A4"><v>3</v></c>
    <c r="B4"><v>6</v></c>
  </row>
</sheetData>
...
<tableParts count="1">
  <tablePart r:id="rId2"/>
</tableParts>
Table-definition part:
<table … ref="A1:B4" …>
  <autoFilter ref="A1:B4"/>
  <tableColumns count="2">
    <tableColumn id="1" name="Column1" />
    <tableColumn id="2" name="Column2" />
  </tableColumns>
  <tableStyleInfo …/>
</table>
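The table-definition part can be generated from a header list and a row count. A sketch with hypothetical names, assuming the header row starts at A1; TableStyleMedium2 is one of Excel's built-in table styles, chosen arbitrarily. A complete file also needs the <tableParts> entry in the worksheet and a relationship pointing at this part, which the sketch leaves out.

```python
import xml.etree.ElementTree as ET

MAIN_NS = "http://schemas.openxmlformats.org/spreadsheetml/2006/main"


def q(name: str) -> str:
    """Qualify a SpreadsheetML element name with its namespace."""
    return f"{{{MAIN_NS}}}{name}"


def col_letters(n: int) -> str:
    """1-based column number to letters (1 -> A, 27 -> AA)."""
    letters = ""
    while n:
        n, rem = divmod(n - 1, 26)
        letters = chr(ord("A") + rem) + letters
    return letters


def table_part(table_id: int, name: str, headers: list, data_rows: int) -> ET.Element:
    """Build a table-definition part for a table whose header row starts at A1."""
    ref = f"A1:{col_letters(len(headers))}{data_rows + 1}"  # header row + data rows
    table = ET.Element(q("table"), {"id": str(table_id), "name": name,
                                    "displayName": name, "ref": ref})
    ET.SubElement(table, q("autoFilter"), {"ref": ref})
    cols = ET.SubElement(table, q("tableColumns"), {"count": str(len(headers))})
    for i, header in enumerate(headers, start=1):
        # Column names are expected to match the header cells' text in the worksheet.
        ET.SubElement(cols, q("tableColumn"), {"id": str(i), "name": header})
    ET.SubElement(table, q("tableStyleInfo"), {"name": "TableStyleMedium2",
                                                "showRowStripes": "1"})
    return table
```

With headers ["Column1", "Column2"] and three data rows this reproduces the ref="A1:B4" range shown above.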
AutoFilter Example
Defined explicitly in the worksheet
DrawingML
DrawingML vs. VML
Per the Ecma spec: "VML should be considered a deprecated format included in Office Open XML for legacy reasons only."
VML was not entirely replaced by DrawingML before submission to Ecma
Main remaining uses of VML:
WordprocessingML: OfficeArt shapes, text boxes
SpreadsheetML/PresentationML: comments, embedded OLE objects
3-D Effects
[Figure: 3-D scene definition, showing a shape before applying a 3-D scene, after applying 3-D bevels, and after adjusting material types]
DrawingML
Implementation varies for each document type
The core DrawingML definition is the same in each case
Location varies (main body, drawing part, slide)
Packaging ("shim") varies
[Figure: the same shape's markup in WordprocessingML (in Word), SpreadsheetML (in Excel), and PresentationML (in PowerPoint)]
WordprocessingML
DrawingML is stored in the document body
Shim defines graphic frame and locked canvas
Shape definition uses DrawingML namespace for all elements
SpreadsheetML
Drawing is defined in a separate drawing part
Shim defines anchor position and type
Shape definition uses spreadsheetDrawing namespace for non-visual properties
PresentationML
DrawingML is stored in the slide part
No shim – the shape is in the shape tree
Shape definition is DrawingML
Charts and Graphs
Worksheet Drawings
<xdr:wsDr/>
Anchoring properties (3 types)
Absolute <xdr:absoluteAnchor/>
One cell <xdr:oneCellAnchor/>
Two cell <xdr:twoCellAnchor/>
Drawing elements
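A reader-side sketch for the three anchor types, assuming the standard spreadsheetDrawing namespace: twoCellAnchor has <xdr:from> and <xdr:to> markers, oneCellAnchor only <xdr:from>, and absoluteAnchor neither (it uses <xdr:pos> and <xdr:ext>). Marker columns and rows are zero-based indexes. The anchors function is a hypothetical name.

```python
import xml.etree.ElementTree as ET

XDR_NS = "http://schemas.openxmlformats.org/drawingml/2006/spreadsheetDrawing"
NS = {"xdr": XDR_NS}


def _marker_cell(marker):
    """(column, row) from an <xdr:from>/<xdr:to> marker; offsets are ignored."""
    if marker is None:
        return None
    return int(marker.find("xdr:col", NS).text), int(marker.find("xdr:row", NS).text)


def anchors(drawing_xml: str) -> list:
    """List (anchor type, from cell, to cell) for each anchor in a drawing part."""
    root = ET.fromstring(drawing_xml)  # <xdr:wsDr>
    result = []
    for anchor in root:
        kind = anchor.tag.split("}", 1)[1]  # absoluteAnchor, oneCellAnchor, twoCellAnchor
        result.append((kind,
                       _marker_cell(anchor.find("xdr:from", NS)),
                       _marker_cell(anchor.find("xdr:to", NS))))
    return result
```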
Example: 3-D Chart Part
<c:chartSpace>
  <c:pivotSource/>
  <c:chart>
    <c:title/>
    <c:view3D/>
    <c:plotArea>
      <c:pie3DChart>
        <c:ser>
          <c:cat/>
          <c:val/>
        </c:ser>
        <c:dLbls/>
      </c:pie3DChart>
    </c:plotArea>
    <c:legend/>
  </c:chart>
</c:chartSpace>
Element       Description
chartSpace    Root node; includes the chart and print definitions.
pivotSource   For a pivot chart, identifies the source pivot table.
chart         Root element for the chart.
view3D        Present because the chart is 3-D; specifies the 3-D view.
plotArea      Defines the layout and contains the element that defines the type of chart.
pie3DChart    This is a 3-D pie chart.
ser           Specifies a series with categories and values.
cat           Category axis data (string cache).
val           Values, the numbers shown (number cache).
dLbls         Settings for the data labels.
legend        Specifies the legend.
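The element table suggests a simple way to inspect a chart part programmatically. A sketch, assuming the standard DrawingML chart namespace and that chart-type elements inside plotArea are the ones whose local names end in "Chart" (axis elements end in "Ax"); chart_summary is a hypothetical helper.

```python
import xml.etree.ElementTree as ET

C_NS = "http://schemas.openxmlformats.org/drawingml/2006/chart"
NS = {"c": C_NS}


def chart_summary(chart_xml: str) -> dict:
    """Summarize chart type(s), 3-D view, series count, and legend for a chart part."""
    root = ET.fromstring(chart_xml)  # <c:chartSpace>
    plot_area = root.find("c:chart/c:plotArea", NS)
    types = [el.tag.split("}", 1)[1] for el in plot_area if el.tag.endswith("Chart")]
    return {
        "types": types,
        "is3D": root.find("c:chart/c:view3D", NS) is not None,
        "series": len(plot_area.findall("*/c:ser", NS)),  # <c:ser> under each chart type
        "hasLegend": root.find("c:chart/c:legend", NS) is not None,
    }
```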
PresentationML
Document architecture
[Diagram: PresentationML package parts: Presentation, Presentation Properties, View Properties, Slide Masters, Slide Layouts, Slides, Notes Masters, Notes Slides, Handout Masters, Themes, Fonts, Code]
Sample Slide
Typical PresentationML content
[Figure: sample slide containing a shape, a chart, and a text box]
Slide Part
Shape tree contains slide content definitions
<p:sld xmlns:p="…/presentationml/2006/main" xmlns:a="…/drawingml/2006/main" …>
  <p:cSld>
    <p:spTree>
      <!-- Shape -->
      <p:sp>
        <p:nvSpPr>
          <p:cNvPr id="2" name="7-Point Star 1" />
          …
      <!-- Textbox -->
      <p:sp>
        <p:nvSpPr>
          <p:cNvPr id="3" name="TextBox 2" />
          …
      <!-- Chart -->
      <p:graphicFrame>
        <p:nvGraphicFramePr>
          <p:cNvPr id="4" name="Chart 3" />
          …
    </p:spTree>
  </p:cSld>
  <p:clrMapOvr>
    <a:masterClrMapping />
  </p:clrMapOvr>
</p:sld>
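Walking the shape tree is a common first step when processing slides. This sketch lists each shape-tree item with the id and name from its <p:cNvPr>, assuming the PresentationML main namespace; the tree's own group properties (nvGrpSpPr, grpSpPr) are skipped. list_shapes is a hypothetical name.

```python
import xml.etree.ElementTree as ET

P_NS = "http://schemas.openxmlformats.org/presentationml/2006/main"
NS = {"p": P_NS}
GROUP_PROPS = {f"{{{P_NS}}}nvGrpSpPr", f"{{{P_NS}}}grpSpPr"}


def list_shapes(slide_xml: str) -> list:
    """Return (element kind, id, name) for each item in a slide's shape tree."""
    root = ET.fromstring(slide_xml)  # <p:sld>
    tree = root.find("p:cSld/p:spTree", NS)
    shapes = []
    for child in tree:
        if child.tag in GROUP_PROPS:  # properties of the tree itself, not shapes
            continue
        kind = child.tag.split("}", 1)[1]  # sp, graphicFrame, pic, grpSp, cxnSp, ...
        c_nv_pr = child.find("*/p:cNvPr", NS)  # e.g. p:nvSpPr/p:cNvPr
        if c_nv_pr is not None:
            shapes.append((kind, int(c_nv_pr.get("id")), c_nv_pr.get("name")))
    return shapes
```

For a slide like the one above, the result pairs the two p:sp shapes and the p:graphicFrame with ids 2, 3, and 4.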
[Figure: the chart on the sample slide (alongside the shape and text box), its chart part (chart1.xml), and its data source]
PresentationML Tables
Slide part contains table definition
In a graphicFrame element
All DrawingML is in the slide – no separate "table part"
[Figure: table markup in a slide part, annotated: table position; table definition; header-row formatting; banded-row formatting; TableStyleID = GUID]
OpenXmlDeveloper.org
Formed by 40 companies to share developer information about the Office Open XML file formats
Articles with source code for C#, VB, Java, PHP, XSLT
Forums for posting technical questions
The Ecma Spec
1. Fundamentals
2. Open Packaging Convention
3. Primer (start here)
4. Markup Language Reference (huge!)
5. Markup Compatibility and Extensibility
Reference Schemas (XSD, RelaxNG)
Tips:
• Start with part 3, Primer
• Use the PDF version of part 4 to look up elements/attributes
Open XML Blogs
Brian Jones: http://blogs.msdn.com/brian_jones
Doug Mahugh: http://blogs.msdn.com/dmahugh
Kevin Boske: http://blogs.msdn.com/kboske
Wouter Van Vugt: http://blogs.infosupport.com/wouterv
Erika Ehrli: http://blogs.msdn.com/erikaehrli
See complete list on www.OpenXmlDeveloper.org