open xml formats fabio santini.net developer evangelist microsoft italy
TRANSCRIPT
Open XML FormatsOpen XML Formats
Fabio SantiniFabio Santini.NET Developer Evangelist.NET Developer EvangelistMicrosoft ItalyMicrosoft Italy
New community formed to help bring New community formed to help bring developers together developers together
Currently sponsored by almost 40 institutions Currently sponsored by almost 40 institutions from around the worldfrom around the world
Community and website for Community and website for information exchangeinformation exchangeFree of Charge: Available to everyone that Free of Charge: Available to everyone that
wants to participate; encourage wants to participate; encourage development on all platformsdevelopment on all platformsBe one of the first to join the community!Be one of the first to join the community!http://openxmldeveloper.orghttp://openxmldeveloper.org
FF301 AgendaFF301 Agenda
Overview of the new formatsOverview of the new formats
Demo: New Office documentDemo: New Office document
Additional benefits of the new formatsAdditional benefits of the new formats
Role of XML in Office documentsRole of XML in Office documents
Structural details of the new formatsStructural details of the new formats
Custom defined schema supportCustom defined schema support
File format benefits for developersFile format benefits for developers
Solution capabilitiesSolution capabilities
Open XML FormatsOpen XML Formats
New XML file formats for Word, Excel and New XML file formats for Word, Excel and PowerPointPowerPoint
New formats will be the New formats will be the defaultdefault file formats, with new file formats, with new file type extensions (.docx; .pptx; .xlsx)file type extensions (.docx; .pptx; .xlsx)Fully 100% compatible with existing formatsFully 100% compatible with existing formats
Open, transparent format improves Open, transparent format improves interoperabilityinteroperability
XMLXML - Transparent, XML format enables new integration - Transparent, XML format enables new integration scenarios for documents and LOB systemsscenarios for documents and LOB systemsZIP containerZIP container - allows for standard compression on all - allows for standard compression on all files without user effortfiles without user effortLicensingLicensing - Removed need for license by providing a - Removed need for license by providing a Covenant Covenant that says we won’t enforce IP against folks implementing that says we won’t enforce IP against folks implementing the format (100% royalty free)the format (100% royalty free)
StandardizationStandardizationEcma InternationalEcma International - created TC45 to fully document the - created TC45 to fully document the Open XML formatsOpen XML formats
Members include: Apple, Barclays Capital, BP, the British Members include: Apple, Barclays Capital, BP, the British Library, Library, Essilor, Intel Corporation, NextPage Inc., Statoil ASA and Essilor, Intel Corporation, NextPage Inc., Statoil ASA and Toshiba Toshiba Current spec is already over 2000 pagesCurrent spec is already over 2000 pages
Office Open XML Office Open XML FormatsFormats
Microsoft Office Open Microsoft Office Open XML FormatsXML Formats
Added Benefits: Added Benefits: compact and robustcompact and robustZIP container allows for standard compression on all ZIP container allows for standard compression on all files without user effort (Dramatic file size files without user effort (Dramatic file size improvements)improvements)Significantly more robust files to help minimize data Significantly more robust files to help minimize data lossloss
Backward Compatible: Backward Compatible: Office 2000, Office XP, Office 2000, Office XP, Office 2003 will all support the new formatsOffice 2003 will all support the new formats
Patches for compatibility available by launchPatches for compatibility available by launchOpen, edit and save new formatsOpen, edit and save new formats
Legacy support: Legacy support: Current Office 97-2003 binary file Current Office 97-2003 binary file formats supportedformats supported
Support for XML formats from Office 2003, Office XP Support for XML formats from Office 2003, Office XP continuedcontinued
Developers: Developers: Endless potential for developers Endless potential for developers Build solutions to read, write, and modify Office files Build solutions to read, write, and modify Office files (without the need to run Office APIs)(without the need to run Office APIs)
Benefits of the Benefits of the Office Open XML Office Open XML FormatsFormats
Evolution Of File FormatsEvolution Of File Formats
Office 2000Office 2000Early InnovationEarly Innovation
XML document XML document propertiesproperties
Office 97Office 97Existing binary file formats Existing binary file formats designed in 1994, launched designed in 1994, launched in Office 97in Office 97
Office XPOffice XPFirst XML First XML FormatFormat
Spreadsheet Spreadsheet XMLXML
Office 2003Office 2003Breakthrough XML Breakthrough XML SupportSupport
WordML, SpreadsheetMLWordML, SpreadsheetML
Custom-defined schemaCustom-defined schema
Office 2007Office 2007New XML FormatsNew XML Formats
XML file format defaultXML file format default
XML PowerPoint formatXML PowerPoint format
““Wave 12”Wave 12”
The Role Of XML With The Role Of XML With DocumentsDocumentsScenarioScenario ExampleExampleDocument AssemblyDocument AssemblyServer-based or user-assisted Server-based or user-assisted construction of documents from archived construction of documents from archived content or database contentcontent or database content
Create sales reports from financial and Create sales reports from financial and forecast data stored in a CRM systemforecast data stored in a CRM system
Content ReuseContent ReuseMuch easier to move content between Much easier to move content between documents, including different document documents, including different document typestypes
Apply content stored in Word documents Apply content stored in Word documents to Web pages quickly and efficientlyto Web pages quickly and efficiently
Content TaggingContent TaggingAdd domain-specific metadata to Add domain-specific metadata to document content to enable custom document content to enable custom solutions solutions
Tag presentations using a specific Tag presentations using a specific taxonomy to improve knowledge taxonomy to improve knowledge management efficiency management efficiency
Document InterrogationDocument InterrogationQuery document repositories based on Query document repositories based on custom data, content types or document custom data, content types or document metadatametadata
Search for all documents containing a Search for all documents containing a specific company name or sales contactspecific company name or sales contact
Document SanitizationDocument SanitizationRemove unwanted content like Remove unwanted content like comments or embedded code from your comments or embedded code from your document when appropriatedocument when appropriate
Remove all tracked changes and Remove all tracked changes and comments from a Word document before comments from a Word document before it is publishedit is published
Open XML Formats Open XML Formats ArchitectureArchitecture
User view: single Office “file”
Questionnaire.docx
File ContainerFile Container
Document PropertiesDocument Properties
CommentsComments
ChartsCharts
Embedded code / macrosEmbedded code / macros
Images, video, soundImages, video, sound
Custom-defined XMLCustom-defined XML
WordML / SpreadsheetML, etc.WordML / SpreadsheetML, etc.Document PartsDocument Parts
Most parts are XMLMost parts are XML
Each XML part is a discreet, Each XML part is a discreet, compressed componentcompressed component
Can add, extract and modify Can add, extract and modify individual parts without using individual parts without using Office programsOffice programs
Corruption or absence of any part Corruption or absence of any part would not prohibit the file from would not prohibit the file from being openedbeing opened
Developer view: modular file
Modifying An Modifying An Excel SpreadsheetExcel Spreadsheet
Components Of Components Of The New FormatsThe New Formats
Package – ZIP ContainerPackage – ZIP ContainerPart – The “files” inside the ZIPPart – The “files” inside the ZIPContent Types – Each part has a content type that is Content Types – Each part has a content type that is enforced enforced on openon openRelationships – Any part that references another part Relationships – Any part that references another part must do so via a relationshipmust do so via a relationship
Document Properties
Application Properties
Custom Doc. Props.
Workbook
Sheet 2
Sheet 3
Sheet 1 Styles
Chart
Strings
Relationship
...
...
Create A Document Create A Document From ScratchFrom Scratch
The Role Of XMLThe Role Of XMLReference and custom-defined schemasReference and custom-defined schemas
Custom-defined Custom-defined SchemasSchemas
Data-oriented (e.g., Price, Data-oriented (e.g., Price, Invoice)Invoice)
Represents the business Represents the business information stored in the information stored in the documentdocument
Enable System IntegrationEnable System Integration
XML Reference SchemasXML Reference SchemasDisplay-oriented (e.g. Bold, Display-oriented (e.g. Bold, Italics, Tables, Paragraphs, Italics, Tables, Paragraphs, Styles)Styles)
Open Document FormatOpen Document Format
Enable Archival & File Enable Archival & File Formats InteroperabilityFormats Interoperability
The Role Of XMLThe Role Of XMLReference and custom-defined schemasReference and custom-defined schemas
XML Reference SchemasXML Reference SchemasDisplay-oriented (e.g. Bold, Display-oriented (e.g. Bold, Italics, Tables, Paragraphs, Italics, Tables, Paragraphs, Styles)Styles)
Open Document FormatOpen Document Format
Enable Archival & File Enable Archival & File Formats InteroperabilityFormats Interoperability
<w:p><w:p> <w:r><w:r> <w:rPr><w:rPr><w:b /><w:b /></w:rPr></w:rPr> <w:t>John Doe</w:t><w:t>John Doe</w:t> </w:r></w:r> <w:r><w:r> <w:rPr><w:rPr><w:i /><w:i /></w:rPr></w:rPr> <w:t>Health Agency</w:t><w:t>Health Agency</w:t> </w:r></w:r></w:p></w:p>
<ConferenceReport> <Date>3/24/2004</Date> <Attendees> <Attendee Name=“John Doe”> <Department>
Health Agency </Department> <Potential> <Sales>100</Sales> <Growth>25%</Growth> … </Attendee>
Custom-defined Custom-defined SchemasSchemas
Data-oriented (e.g., Price, Data-oriented (e.g., Price, Invoice)Invoice)
Represents the business Represents the business information stored in the information stored in the documentdocument
Enable System IntegrationEnable System Integration
The Role Of XMLThe Role Of XMLReference and custom-defined schemasReference and custom-defined schemas
Custom Defined Custom Defined SchemaSchema
Developing With The Developing With The FormatsFormats
More Reliable SolutionsMore Reliable SolutionsThird-party tools were main cause of document corruptionsThird-party tools were main cause of document corruptions
Fully Documented FormatsFully Documented FormatsFreely available for download with a royalty free licenseFreely available for download with a royalty free licenseOffice file format schemas - Office file format schemas - Used to validate content for a given part Used to validate content for a given part
Samples, samples, samples Samples, samples, samples In the form of code “snippets” for easier use and integration into your VSTO In the form of code “snippets” for easier use and integration into your VSTO solutions solutions
WinFx Package APIs WinFx Package APIs Access/maintain parts and relationships within a fileAccess/maintain parts and relationships within a fileTakes care of all ZIP level functionalityTakes care of all ZIP level functionality
XPath XPath Navigation within content Navigation within content
XML DOM XML DOM Manipulating content Manipulating content
Office Open XML Resource KitOffice Open XML Resource KitTools for constructing and deconstructing the new file formatsTools for constructing and deconstructing the new file formatsDesign time Validation toolDesign time Validation tool
Parses a file and reports on schema, relationship errors and warnings Parses a file and reports on schema, relationship errors and warnings
Runtime serialization tool Runtime serialization tool Flattens package into a single file for ease of development in simple construction scenarios Flattens package into a single file for ease of development in simple construction scenarios
Real World ExampleReal World ExampleSchema documentationSchema documentation
Initial submission to Ecma International Initial submission to Ecma International TC45 was about 2500 pagesTC45 was about 2500 pages
~25 namespaces~25 namespaces~94 XSD files~94 XSD files622 simple types622 simple types1444 complex types1444 complex types3299 attributes3299 attributes3371 enumerations3371 enumerations3464 elements3464 elements
Obviously, there is a huge amount of Obviously, there is a huge amount of documentation to be donedocumentation to be done
Sample Solution ScenariosSample Solution Scenarios
Data interoperabilityData interoperability
Content manipulationContent manipulation
Content sharing and reuseContent sharing and reuse
Document assemblyDocument assembly
Document securityDocument security
Managing sensitive informationManaging sensitive information
Document stylingDocument styling
Document profiling Document profiling
ResourcesResources
Office Preview Site http://www.microsoft.com/office/preview/
Brian Jones’s Blog http://blogs.msdn.com/Brian_Jones/
Kevin Boske’s Blog http://blogs.msdn.com/KevinBoske/
Office 2003 Reference Schema Information http://www.microsoft.com/office/xml/
© 2006 Microsoft Corporation. All rights reserved.This presentation is for informational purposes only. Microsoft makes no warranties, express or implied, in this summary.
Demo TitleDemo Title
NameNameTitleTitleGroupGroup