Demystifying PDFs
Betsy Fanning
AIIM Nashville 2010
Introduction to PDF
Overview of PDF Standards
Adoption of PDF Standards
Agenda
951,000,000 PDF pages on Google
How Many PDF Files Are There?
Introduction of PDF
◦ Portable Document Format
Digital format for representing documents
PDF files
◦ Created natively
◦ Converted from other electronic formats
◦ Digitized from paper, microform, or other format
A specification for electronic files representing documents
Digital Documents
Portable Document Format
◦ Widely used worldwide: business, government, libraries and archives
◦ Information must be kept for long periods of time
◦ Must remain usable and accessible across multiple generations of technology
Reliable, consistent viewing and printing
Mix of text, raster images, line art, and color
Basic unit is the page
Easy navigation, fast access to any page
Small file size
Dynamic
◦ Digital signatures
◦ Forms
What is PDF?
ISO 32000-1:2008, Document management – Portable Document Format – Part 1: PDF 1.7
In 2007, Adobe contacted AIIM to request assistance in taking the PDF specification to ISO
Exact replication of PDF Specification 1.7, including changes and amendments
ISO 32000-1:2008 (PDF)
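To make the version naming concrete: a PDF file announces its version in a header comment at the start of the file, for example %PDF-1.7 for a file written to ISO 32000-1. The following is a minimal sketch, assuming the file bytes are already in memory; the helper name pdf_header_version is hypothetical. It is not a parser: a /Version entry in the document catalog can override the header, and that case is not handled here.

```python
import re

# Hypothetical helper (not from the presentation): reads the version that a
# PDF file declares in its "%PDF-x.y" header comment.
def pdf_header_version(data: bytes):
    """Return (major, minor) from the %PDF-x.y header, or None if absent."""
    # Assumption: the header appears within the first 1024 bytes of the file.
    match = re.search(rb"%PDF-(\d)\.(\d)", data[:1024])
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


# A tiny hand-written fragment standing in for the start of a real file.
sample = b"%PDF-1.7\n1 0 obj\n<< /Type /Catalog >>\nendobj\n"
print(pdf_header_version(sample))
```

A file written to PDF 1.4, the basis of PDF/A-1, would carry %PDF-1.4 in the same position.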
Adds support for geospatial data
Supports Flash
Adds collections (portfolios)
Allows bar codes to be used with form fields
Adds structure elements for MathML
Enhanced accessibility
Incorporates ETSI TS 102 778 for digital signatures
Future: reader improvements and possible merging of PDF streams
ISO/CD 32000-2 (PDF 2.0)
PDF is powerful and flexible
May be too flexible for some applications
Restrict to a subset of PDF
Need a higher degree of reliability
May want the standard in the hands of a neutral, non-commercial body: an internationally recognized standards body such as ISO
Focus on archive needs of government, corporations, libraries
Resolve issues with font embedding and replacement
Why Standardize a Version of PDF
Joint sponsors of the US PDF/A and PDF/E committees
◦ AIIM, Association for Information and Image Management
  Secretariat to ISO/TC 171 and ISO/TC 171/SC 2
  Secretariat to US Technical Advisory Group (TAG) for ISO/TC 171
◦ NPES, The Association for Suppliers of Printing, Publishing, and Converting Technologies
  Secretariat to ANSI Committee for Graphic Arts Technologies Standards (CGATS)
  Secretariat to US TAG for ISO/TC 130
Joint sponsors of PDF Healthcare committee
◦ ASTM International
Role of AIIM and Partners
ISO Joint Working Groups (JWG) for PDF Standards
◦ ISO/TC 171/SC 2, Document management applications – Application issues
◦ ISO/TC 130, Graphic technology
◦ ISO/TC 46/SC 11, Information and documentation – Archives/records management
◦ ISO/TC 42, Photography
◦ ISO/TC 184/SC 4, Automation systems and integration – Industrial data
◦ ETSI, European Telecommunications Standards Institute
◦ PDF/A Competence Center
Role of ISO
Multi-part ISO International Standard
◦ ISO 19005-1:2005, Document management – Electronic document file format for long-term preservation – Part 1: Use of PDF 1.4 (PDF/A-1)
◦ Part 2 (19005-2) intended to bring PDF/A into conformance with ISO 32000
◦ Part 3 (19005-3) Embedded documents
◦ And additional future parts, as necessary
The PDF/A standard
PDF/X, ISO 15930 *
◦ Pre-press data exchange
PDF/A, ISO 19005 (Parts 1, 2 and 3)
◦ Archiving electronic documents
PDF/E (Engineering), ISO 24517-1
◦ For engineering, architectural, and GIS documents
PDF/E (Engineering), ISO/NWP 24517-2
◦ Archive engineering, architectural, and GIS documents
PDF/UA (Universal Access), ISO/CD 14289-1
◦ Intended to address Section 508 concerns
PDF Healthcare
◦ Exchange of electronic health records (CDA and CCR)
PDF, ISO 32000-1 (ISO/CD 32000-2)
PDF/VT, ISO 16612 (2 parts) *
◦ Variable data exchange
PRC, Product Representation Compact (ISO/CD 14739-1)
* Not AIIM responsibility
PDF Standards
Graphic technology – Prepress digital data exchange – Use of PDF (PDF/X)
Specifies the use of PDF for the dissemination of complete digital data, in a single exchange, that contains all elements for final print reproduction.
ISO 15930 (PDF/X)
Specifies how to use PDF to define and exchange all content elements and supporting metadata to produce predictable output for variable or transactional document content
ISO 16612 (PDF/VT)
“This International Standard specifies how to use the Portable Document Format (PDF) 1.4 for long-term preservation of electronic documents”
◦ Applicable to documents containing character, raster, and vector data
◦ The standard does not address:
  Processes for generating PDF/A files
  Specific implementation details of rendering PDF/A files
  Methods for storing PDF/A files
  Hardware and software dependencies
ISO 19005-1:2005
Court documents protect citizens’ rights
Access is assured in trial courts for 20 to 40 years for the Judiciary
Access is often time sensitive
On-site courthouse storage is not cost effective
Court decisions are permanent records held “until the end of the republic” by the National Archives
Document format conveys critical information, which must be rendered accurately
Cases: New York Southern, Enron, etc.
20 years of filings are in PDF
Background for PDF/A: Judiciary Use Case
[Bar chart, 0% to 80% axis; response options: 6 years; 12 years; 20 years; 50 years; Person lifetimes; Life of legal business entity; Forever/historical]
Records Archive
Do you have electronic records that need to be retained for: (check all that apply)
Most organizations will be keeping some records for a very long time.
N=144, all respondents.
Content type | Native format | PDF | PDF/A | TIFF | XML | XPS | JPEG | Digital video/audio | Print and archive in hard copy | Not archived at all
Scanned documents | 3% | 48% | 8% | 29% | 0% | 0% | 5% | 1% | 1% | 4%
Electronic documents | 48% | 27% | 7% | 6% | 1% | 0% | 2% | 0% | 4% | 5%
Photo images | 20% | 4% | 0% | 6% | 0% | 1% | 59% | 2% | 1% | 8%
Email | 64% | 7% | 4% | 2% | 4% | 0% | 0% | 1% | 4% | 14%
Video/CCTV recordings | 23% | 0% | 0% | 0% | 0% | 0% | 1% | 35% | 1% | 39%
Audio recordings | 23% | 0% | 0% | 0% | 1% | 0% | 0% | 35% | 1% | 40%
Web pages | 33% | 7% | 0% | 1% | 13% | 0% | 0% | 1% | 1% | 42%
Telephone recordings | 15% | 0% | 0% | 0% | 0% | 1% | 0% | 17% | 0% | 67%
Instant messages | 15% | 0% | 1% | 1% | 1% | 0% | 0% | 1% | 1% | 79%
Archive File Types
How are the following content types mostly archived in your organization?
N=139, all
NARA defines: “…the ability to access an electronic record throughout its lifecycle, regardless of the technology used when it was originally created”
Characteristics of Sustainable Formats
◦ Published documentation and open disclosure
◦ Widespread adoption and use
◦ Self-describing formats
◦ External dependency
◦ Impact of patents
◦ Technical protection mechanism
Sustainable Formats
TIFF
◦ Well known
◦ Difficult to create digitally born documents
◦ Indexing documents may be difficult
XML
◦ Many schemas exist
◦ Preserves content, not the structure
Native File Formats
◦ Several file formats
◦ May render differently depending on the device or platform used
PDF
◦ Widely adopted
◦ Feature rich
◦ Reliable and secure
File Formats
PDF/A is intended to address three primary issues:
◦ Define a file format that preserves the static visual appearance of electronic documents over time
◦ Provide a framework for recording metadata about electronic documents
◦ Provide a framework for defining the logical structure and semantic properties of electronic documents
PDF/A
Guarantees the secure reproduction of documents
◦ No technology requirements
Ensures a homogeneous archive
◦ Digital-born and scanned documents in the same archive
Valid throughout the world
◦ ISO-maintained standard
Sustainable file format
◦ Standards exist, files are self-documenting, adoption
Why PDF/A?
37% still have separate image and electronic archives
[Bar chart, 0% to 90% axis; response options: Native (e.g., DOC, XLS); HTML (e.g., emails, web); TIFF; JPEG; PDF/A]
Records Archive
Do you store a significant proportion of your records in any of the following formats?
PDF/A making some ground at 30%.
Native formats still very prevalent.
N=144, all respondents.
[Bar chart, 0% to 80% axis; response options: Still using PDF; Mostly using native formats; File size is too large; Files contain multimedia; Files contain digital signatures; Files contain XML; We don't have any documents worth archiving]
PDF/A
What are the main reasons you are not using PDF/A?
PDF/A benefits still not understood
N=102, non-PDF/A users.
“This International Standard specifies how to use the Portable Document Format (PDF) ISO 32000-1 for long-term preservation of electronic documents”
◦ Applicable to documents containing character, raster, and vector data
◦ The standard does not address:
  Processes for generating PDF/A files
  Specific implementation details of rendering PDF/A files
  Methods for storing PDF/A files
  Hardware and software dependencies
ISO/DIS 19005-2
Additional features in ISO 32000-1 (PDF 1.7)
◦ PDF/A-1 based on PDF 1.4
JPEG 2000 Image Conversion
◦ Added compression process (PDF 1.5)
◦ Higher compression rates, better quality
Embedding PDF/A within Collection
◦ Compile PDF/A collections
Transparency
◦ Permitted in PDF/A-2
Digital Signatures
◦ Follow ETSI/PAdES Standard
PDF Layers (“Optional Content”)
◦ Helpful for technical drawings
◦ Multilingual content
What is in PDF/A-2?
Conformance Levels
◦ PDF/A-1a and PDF/A-2a
  Compliance with all requirements of 19005-1 (19005-2 for PDF/A-2a), including those regarding structural and semantic tagging
◦ PDF/A-1b and PDF/A-2b
  Compliance with the requirements of 19005-1 (19005-2 for PDF/A-2b) minimally necessary to preserve the visual appearance of a PDF/A file
◦ PDF/A-2u
  Compliance with all requirements of 19005-2 except those for the logical structure of the document
  Preserves the visual appearance of the file and ensures any text in the document can be reliably extracted as a series of Unicode code points
PDF/A Conformance
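A PDF/A file states its claimed part and conformance level in its XMP metadata, through the pdfaid:part and pdfaid:conformance properties. The following is a minimal sketch, assuming the XMP packet is stored uncompressed and UTF-8 encoded; the helper name claimed_pdfa_level is hypothetical. It reports only what the file claims about itself. Checking actual conformance requires a validator.

```python
import re

# Assumption: XMP is uncompressed UTF-8. Each pattern accepts both the
# element form (<pdfaid:part>2</pdfaid:part>) and the attribute form
# (pdfaid:part="2").
_PART = re.compile(rb"pdfaid:part\s*(?:=\s*[\"']|>)\s*(\d)")
_CONF = re.compile(rb"pdfaid:conformance\s*(?:=\s*[\"']|>)\s*([ABUabu])")


# Hypothetical helper (not from the presentation).
def claimed_pdfa_level(data: bytes):
    """Return a label such as 'PDF/A-2u' from XMP bytes, or None if no claim is found."""
    part = _PART.search(data)
    conf = _CONF.search(data)
    if part is None or conf is None:
        return None
    return f"PDF/A-{part.group(1).decode()}{conf.group(1).decode().lower()}"


# Hand-written XMP fragment used only for illustration.
xmp = (
    b'<rdf:Description xmlns:pdfaid="http://www.aiim.org/pdfa/ns/id/">'
    b"<pdfaid:part>2</pdfaid:part>"
    b"<pdfaid:conformance>U</pdfaid:conformance>"
    b"</rdf:Description>"
)
print(claimed_pdfa_level(xmp))
```

A byte-level search keeps the sketch free of third-party dependencies; a production tool would parse the PDF structure and the XMP packet properly.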
Will not replace or supersede PDF/A-1
Few tools will be available initially
Look at new features
Understand your requirements, then decide
PDF/A-1 is and will remain a valid file type
Considerations PDF/A-2
[Bar chart, 0% to 60% axis; response options: Centralized resource; Outsource service provider (BPO) onshore; Offshore service provider; Distributed to point of use/line of business; No plans to back-file convert]
Backfile Conversion to PDF/A
How would you characterize your strategy to convert your existing documents to PDF/A?
32% driving back-conversion centrally
N=40, PDF/A users.
[Bar chart, 0% to 40% axis; response options: Within 1 year; Within 2 years; Within 3 years; Within 5 years; Unlikely; I’ve not heard of PDF/A-2]
How soon do you plan to converge to PDF/A-2, when it is published?
Backfile Conversion to PDF/A-2
One third of PDF/A users have not heard of PDF/A-2.
Another third will converge to PDF/A-2 in 3 years or less.
N=40, PDF/A users.
[Stacked bar chart, 0% to 100% axis; rows: Software used to view documents; Software used to create documents; Electronic document files; legend: Always, Probably, Possibly, No]
Would you reject a tool or application that was not tested to a conformance standard?
PDF/A-2 Tools
80% expect to use conformance certified creation tools.
N=40, PDF/A users.
Document management – Electronic document file format for long-term preservation including embedded files – Part 3: Use of ISO 32000-1 (PDF/A-3)
Specifies the use of PDF for preserving the static visual representation of page based electronic documents over time in addition to allowing any type of other content to be included as an embedded file or attachment
ISO/NWI/CD 19005-3 (PDF/A-3)
Accessibility
Does your content need to be accessible (able to be accessed and read by assistive technologies)?
There is a recognition of accessibility regulations.
N=144, all respondents.
[Bar chart, 0% to 45% axis; response options: Always; Some of it; Should be but isn't; No]
Document management applications – Electronic document file format enhancement for accessibility (PDF/UA) – Use of ISO 32000-1 (PDF/UA-1)
Specifies how to use PDF to produce electronic documents which are accessible
Does not specify:
◦ Processes for converting paper or electronic documents
◦ Storage of PDF/UA documents
◦ Specific design, user interface, implementation or other details for rendering
ISO/CD 14289-1 (PDF/UA)
Document management – Engineering document format using PDF – Part 1: Use of PDF 1.6 (PDF/E-1)
Specifies the use of PDF for the creation of documents used in engineering workflows. It does not define:
◦ Method of electronic distribution
◦ Method of creation or conversion from paper or electronic documents to the PDF/E format
◦ Specific technical design, user interface, or implementation
◦ Required hardware or methods for validation
ISO 24517-1:2008 (PDF/E)
Addresses need for reliable exchange of engineering documentation
◦ Secure distribution of intellectual property
◦ Reliable exchange and change management (multiple application types and platforms)
◦ Reduces costs associated with paper (distribution as well as storage/archive)
Covers 3 primary areas:
◦ Compact, accurate printing of engineering drawings
◦ Support for exchanging/managing annotation and comment data
◦ Incorporation of complex data into PDF (3D, object level data, etc.)
Part 2: Update to ISO 32000-1 and archive capabilities
ISO 24517-1:2008 (PDF/E)
Document management – 3D use of Product Representation Compact (PRC) format – Part 1: PRC 10001
Describes a file format for 3D content data for the purposes of 3D visualization and exchange.
Used for creating, viewing and distributing 3D data in a document exchange workflow
ISO/CD 14739-1 (PRC)
What is PDF Healthcare?
A “Best Practices Guide” describing attributes of the Portable Document Format (PDF) to facilitate the capture, exchange, preservation and protection of healthcare information
◦ Share data easily between healthcare institutions
◦ Ease the transition into digital health records for information exchange and sharing
◦ Bridge the gap between healthcare providers and consumers
PDF Healthcare Background
eHealthcare is a reality in today’s environment
PDF advantages in healthcare
◦ Long-standing success and adoption of PDF
◦ PDF provides a secure and universal container for multiple data types regardless of data source or destination
◦ PDF is platform- and system-neutral
◦ PDF allows for interoperability and bi-directional information exchange
◦ Selected records can be easily and quickly printed from PDF when necessary
Initial PDF Healthcare Offering
Best Practices Guide
◦ Describes the attributes of the Portable Document Format (PDF) that are relevant to facilitate the capture, exchange, preservation and protection of healthcare information
Implementation Guide / Use Cases
◦ Supplemental information that will provide examples of interoperability with existing healthcare standards such as ASTM’s Continuity of Care Record (CCR)
Additional PDF Healthcare Offering
PDF Healthcare Supporting the Clinical Document Architecture: White Paper
◦ Discusses the implementation of PDF Forms in support of the HL7 Clinical Document Architecture (CDA) to simplify, secure, and speed transactions between entities with varying levels of automation
Creating PDF Forms for the CDA: Implementation Guide
◦ Supplemental information that will provide examples of various forms, e.g., the Emergency Information Form for Children with Special Needs, that support a subset of the CDA schema
Proposed Legislation – PDF/A
Alabama
Alaska
California (Repealed 10/19/2010)
Connecticut
Florida
Idaho
Kentucky
Missouri
Nevada
New York
Ohio
Wisconsin
PDF/A Adoption
Europe
◦ Standard eBilling (Organisation for Promotion of Automated Accounting)
◦ Germany, France, Austria, Switzerland, Poland, Norway
Brazil
China
MoReq2
U.S. Nuclear Regulatory Commission
U.S. District Courts
NARA
Library of Congress
PDF/A-1 compliance is not enough
◦ Comply with NARA’s transfer instructions for records in PDF
◦ Provide transfer documentation
◦ Must comply with image quality specifications for transfer of permanent records
◦ Must use OCR processes that do not alter the original bit-mapped image
NARA Guidelines
Opportunities
Conversion
◦ Paper based
◦ Electronic files to PDF subsets
Validation
◦ Isartor Test Suite
◦ Bavaria Report (PDFlib)
◦ Adobe Acrobat Preflight
Data cleanup
◦ Metadata
◦ Embedding fonts and images
◦ Tagging
Consulting and recommending use of PDF/A
Conversion of healthcare records
Betsy Fanning
Ph: +1.301.755.2682
Skype: betsy.fanning
Email: [email protected]
Twitter: bfanning
LinkedIn: www.linkedin.com/in/betsyfanning
PDF Standards – www.aiim.org/standards
Get involved – Service Companies still needed for AIIM’s National Standards Council (NSC)
Questions/Contact
http://www.mach2solutions.net/pdf/pdf.html
PDF Demonstration URL