Demystifying PDFs Betsy Fanning AIIM Nashville 2010

Upload: betsy-fanning

Post on 16-Jan-2015


TRANSCRIPT

Page 1: Demystifying pd fs

Demystifying PDFs

Betsy Fanning

AIIM Nashville 2010

Page 2: Demystifying pd fs

Introduction to PDF
Overview of PDF Standards
Adoption of PDF Standards

Agenda

Page 3: Demystifying pd fs

951,000,000 PDF pages on Google

How Many PDF Files Are There?

Page 4: Demystifying pd fs

Introduction of PDF
◦ Portable Document Format
◦ Digital format for representing documents
PDF files created
◦ Natively
◦ Converted from other electronic formats
◦ Digitized from paper, microform, or other format
A specification for electronic files representing documents

Digital Documents

Page 5: Demystifying pd fs

Portable Document Format
◦ Widely used worldwide
   Business
   Government
   Libraries and archives
◦ Information must be kept for long periods of time
◦ Must remain usable and accessible across multiple generations of technology

PDF

Page 6: Demystifying pd fs

Reliable, consistent viewing and printing
Mix text, raster images, line art, color
Basic unit is the page
Easy navigation, fast access to any page
Small file size
Dynamic
◦ Digital signatures
◦ Forms

What is PDF?

Page 7: Demystifying pd fs

ISO 32000-1:2008, Document management – Portable Document Format – Part 1: PDF 1.7

In 2007, Adobe contacted AIIM to request assistance in taking the PDF Specification to ISO

Exact replication of PDF Specification 1.7 including changes and amendments

ISO 32000-1: 2008 (PDF)
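
To make the version number on this slide concrete: a PDF file begins with a header comment of the form %PDF-1.7. The Python sketch below reads that header. The function name pdf_header_version and the sample bytes are hypothetical, made up for illustration. The header is only a first hint, because a Version entry in the document catalog may override it, and this sketch does not look there.

```python
import re

# The header is a comment line at the start of the file, e.g. b"%PDF-1.7".
_HEADER = re.compile(rb"%PDF-(\d)\.(\d)")


def pdf_header_version(data: bytes):
    """Return (major, minor) from the %PDF-x.y header, or None if absent."""
    # Search only the first kilobyte; some files carry a few bytes before the header.
    match = _HEADER.search(data[:1024])
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


# Hypothetical sample: the first bytes of a file written as PDF 1.7.
sample = b"%PDF-1.7\n%\xe2\xe3\xcf\xd3\n1 0 obj\n<< /Type /Catalog >>\nendobj\n"
print(pdf_header_version(sample))
```

For a real file, the same function could be called on the first bytes read with open(path, "rb").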

Page 8: Demystifying pd fs

Adds support for geospatial data
Supports Flash
Added collections (portfolios)
Allows for bar codes to be used with form fields
Added structure elements for MathML
Enhanced accessibility
Incorporated ETSI TS 102 778 for digital signatures
Future – Reader improvements and possible merging of PDF streams

ISO/CD 32000-2 (PDF 2.0)

Page 9: Demystifying pd fs

PDF is powerful and flexible
May be too flexible for some applications
Restrict subset of PDF
Need higher degree of reliability
May want standard in hands of neutral non-commercial body – Internationally recognized standards body such as ISO
Focus on archive needs of government, corporations, libraries
Resolve issues with font embedding replacement

Why Standardize a Version of PDF

Page 10: Demystifying pd fs

Joint sponsors of the US PDF/A and PDF/E committees
◦ AIIM, Association for Information and Image Management
   Secretariat to ISO/TC 171 and ISO/TC 171/SC 2
   Secretariat to US Technical Advisory Group (TAG) for ISO/TC 171
◦ NPES, The Association for Suppliers of Printing, Publishing, and Converting Technologies
   Secretariat to ANSI Committee for Graphic Arts Technologies Standards (CGATS)
   Secretariat to US TAG for ISO/TC 130
Joint sponsors of PDF Healthcare committee
◦ ASTM International

Role of AIIM and Partners

Page 11: Demystifying pd fs

ISO Joint Working Groups (JWG) for PDF Standards

◦ ISO/TC 171/SC 2, Document management applications – Application issues

◦ ISO/TC 130, Graphic technology

◦ ISO/TC 46/SC 11, Information and documentation – Archives/records management

◦ ISO/TC 42, Photography

◦ ISO/TC 184/SC 4, Automation systems and integration – Industrial data

◦ ETSI, European Telecommunications Standards Institute

◦ PDF/A Competence Center

Role of ISO

Page 12: Demystifying pd fs

Multi-part ISO International Standard
◦ ISO 19005-1:2005, Document management – Electronic document file format for long-term preservation – Part 1: Use of PDF 1.4 (PDF/A-1)

◦ Part 2 (19005-2) intended to bring PDF/A into conformance with ISO 32000

◦ Part 3 (19005-3) Embedded documents

◦ And additional future parts, as necessary

The PDF standard

Page 13: Demystifying pd fs

PDF/X, ISO 15930 *
◦ Pre-press data exchange
PDF/A, ISO 19005 (Parts 1, 2 and 3)
◦ Archiving electronic documents
PDF/E (Engineering), ISO 24517-1
◦ For engineering, architectural, and GIS documents
PDF/E (Engineering), ISO/NWP 24517-2
◦ Archive engineering, architectural, and GIS documents
PDF/UA (Universal Access), ISO/CD 14289-1
◦ Intended to address Section 508 concerns
PDF Healthcare
◦ Exchange of electronic health records (CDA and CCR)
PDF, ISO 32000-1 (ISO/CD 32000-2)
PDF/VT, ISO 16612 (2 parts) *
◦ Variable data exchange
PRC, Product Representation Compact (ISO/CD 14739-1)

* Not AIIM Responsibility

PDF Standards

Page 14: Demystifying pd fs

Graphic technology – Prepress digital data exchange – Use of PDF (PDF/X)

Specifies the use of PDF for the dissemination of complete digital data, in a single exchange, that contains all elements for final print reproduction.

ISO 15930 (PDF/X)

Page 15: Demystifying pd fs

Specifies how to use PDF to define and exchange all content elements and supporting metadata to produce predictable output for variable or transactional document content

ISO 16612 (PDF/VT)

Page 16: Demystifying pd fs

“This International Standard specifies how to use the Portable Document Format (PDF) 1.4 for long-term preservation of electronic documents”

◦ Applicable to documents containing character, raster, and vector data

◦ The standard does not address:
   Processes for generating PDF/A files
   Specific implementation details of rendering PDF/A files
   Methods for storing PDF/A files
   Hardware and software dependencies

ISO 19005-1:2005

Page 17: Demystifying pd fs

Court documents protect citizens’ rights
Access is assured in trial courts for 20 to 40 years for the Judiciary
Access is often time sensitive
On-site courthouse storage not cost effective
Court decisions are permanent records held “until the end of the republic” by the National Archives
Document format conveys critical information, which must be rendered accurately
Cases – New York Southern, Enron, etc.
20 years of filings are in PDF

Background for PDF/A
Judiciary Use Case

Page 18: Demystifying pd fs

Records Archive
Do you have electronic records that need to be retained for: (check all that apply)

[Chart: response categories 6 years, 12 years, 20 years, 50 years, Person lifetimes, Life of legal business entity, Forever/historical; axis 0% to 80%]

Most organizations will be keeping some records for a very long time.

N=144, all respondents.

Page 19: Demystifying pd fs

Archive File Types
How are the following content types mostly archived in your organization?

Content type           Native format  PDF  PDF/A  TIFF  XML  XPS  JPEG  Digital video/audio  Print and archive in hard copy  Not archived at all
Scanned documents      3%             48%  8%     29%   0%   0%   5%    1%                   1%                              4%
Electronic documents   48%            27%  7%     6%    1%   0%   2%    0%                   4%                              5%
Photo images           20%            4%   0%     6%    0%   1%   59%   2%                   1%                              8%
Email                  64%            7%   4%     2%    4%   0%   0%    1%                   4%                              14%
Video/CCTV recordings  23%            0%   0%     0%    0%   0%   1%    35%                  1%                              39%
Audio recordings       23%            0%   0%     0%    1%   0%   0%    35%                  1%                              40%
Web pages              33%            7%   0%     1%    13%  0%   0%    1%                   1%                              42%
Telephone recordings   15%            0%   0%     0%    0%   1%   0%    17%                  0%                              67%
Instant messages       15%            0%   1%     1%    1%   0%   0%    1%                   1%                              79%

N=139, all

Page 20: Demystifying pd fs

NARA defines: “…the ability to access an electronic record throughout its lifecycle, regardless of the technology used when it was originally created”

Characteristics of Sustainable Formats
◦ Published documentation and open disclosure
◦ Widespread adoption and use
◦ Self-describing formats
◦ External Dependency
◦ Impact of Patents
◦ Technical Protection Mechanism

Sustainable Formats

Page 21: Demystifying pd fs

TIFF
◦ Well known
◦ Difficult to create digitally born documents
◦ Indexing documents may be difficult
XML
◦ Many schema exist
◦ Preserves content not the structure
Native File Formats
◦ Several file formats
◦ May render differently depending on the device or platform used
PDF
◦ Widely adopted
◦ Feature rich
◦ Reliable and secure

File Formats

Page 22: Demystifying pd fs

PDF/A is intended to address three primary issues:
◦ Define a file format that preserves the static visual appearance of electronic documents over time

◦ Provide a framework for recording metadata about electronic documents

◦ Provide a framework for defining the logical structure and semantic properties of electronic documents

PDF/A

Page 23: Demystifying pd fs

Guarantees the secure reproduction of documents
◦ No technology requirements
Ensures a homogeneous archive
◦ Digital born and scanned documents in same archive
Valid throughout the world
◦ ISO maintained standard
Sustainable file format
◦ Standards exist, files are self-documenting, adoption

Why PDF/A?

37% still have separate image and electronic archives

Page 24: Demystifying pd fs

Records Archive
Do you store a significant proportion of your records in any of the following formats?

[Chart: response categories Native (e.g., DOC, XLS), PDF, HTML (e.g., emails, web), TIFF, JPEG, PDF/A; axis 0% to 90%]

PDF/A making some ground at 30%.

Native formats still very prevalent.

N=144, all respondents.

Page 25: Demystifying pd fs

PDF/A
What are the main reasons you are not using PDF/A?

[Chart: response categories Still using PDF, Mostly using native formats, File size is too large, Files contain multimedia, Files contain digital signatures, Files contain XML, We don't have any documents worth archiving; axis 0% to 80%]

PDF/A benefits still not understood

N=102, Non-PDF/A Users.

Page 26: Demystifying pd fs

“This International Standard specifies how to use the Portable Document Format (PDF) ISO 32000-1 for long-term preservation of electronic documents”

◦ Applicable to documents containing character, raster, and vector data

◦ The standard does not address:
   Processes for generating PDF/A files
   Specific implementation details of rendering PDF/A files
   Methods for storing PDF/A files
   Hardware and software dependencies

ISO/DIS 19005-2

Page 27: Demystifying pd fs

Additional features in ISO 32000-1 (PDF 1.7)
◦ PDF/A-1 based on PDF 1.4
JPEG 2000 Image Conversion
◦ Added compression process (PDF 1.5)
◦ Higher compression rates, better quality
Embedding PDF/A within Collection
◦ Compile PDF/A collections
Transparency
◦ Permitted in PDF/A-2
Digital Signatures
◦ Follow ETSI/PAdES Standard
PDF Layers (“Optional Content”)
◦ Helpful for technical drawings
◦ Multilingual content

What is in PDF/A-2?

Page 28: Demystifying pd fs

Two Conformance Levels
◦ PDF/A-1a and PDF/A-2a
   Compliance with all requirements of the applicable part (19005-1 or 19005-2)
   Including those regarding structural and semantic tagging
◦ PDF/A-1b and PDF/A-2b
   Compliance with the requirements of the applicable part (19005-1 or 19005-2) minimally necessary to preserve the visual appearance of a PDF/A file
◦ PDF/A-2u
   Compliance with all requirements of 19005-2 except those requirements for logical structure of the document
   Preserves the visual appearance of the file and ensures any text in the document can be reliably extracted as a series of Unicode code points.

PDF/A Conformance
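
A PDF/A file declares the part and conformance level listed above in its XMP metadata, using the pdfaid:part and pdfaid:conformance properties. The Python sketch below reads that declaration from an XMP packet that is assumed to have been extracted from the file already. The function name pdfa_claim and the sample packet are hypothetical. The result is only what the file claims about itself, so a validation tool is still needed to confirm that the file actually conforms.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
PDFAID_NS = "http://www.aiim.org/pdfa/ns/id/"  # PDF/A identification schema


def pdfa_claim(xmp_packet: str):
    """Return (part, conformance) claimed in the XMP packet, or None."""
    root = ET.fromstring(xmp_packet)
    found = {}
    for desc in root.iter(f"{{{RDF_NS}}}Description"):
        for name in ("part", "conformance"):
            key = f"{{{PDFAID_NS}}}{name}"
            # XMP allows a simple property as an attribute or as a child element.
            value = desc.get(key)
            if value is None:
                child = desc.find(key)
                if child is not None and child.text:
                    value = child.text.strip()
            if value is not None:
                found[name] = value
    if "part" not in found:
        return None
    return found["part"], found.get("conformance")


# Hypothetical packet for a file that claims PDF/A-2u.
SAMPLE = """<x:xmpmeta xmlns:x="adobe:ns:meta/">
 <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
  <rdf:Description rdf:about="" xmlns:pdfaid="http://www.aiim.org/pdfa/ns/id/">
   <pdfaid:part>2</pdfaid:part>
   <pdfaid:conformance>U</pdfaid:conformance>
  </rdf:Description>
 </rdf:RDF>
</x:xmpmeta>"""

print(pdfa_claim(SAMPLE))
```

A packet without the pdfaid properties yields None, which is the expected result for an ordinary PDF that makes no PDF/A claim.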

Page 29: Demystifying pd fs

Will not replace or supersede PDF/A-1
Few tools will be available initially
Look at new features
Understand your requirements – then decide
PDF/A-1 is and will remain a valid file type

Considerations PDF/A-2

Page 30: Demystifying pd fs

Backfile Conversion to PDF/A
How would you characterize your strategy to convert your existing documents to PDF/A?

[Chart: response categories Centralized resource, Outsource service provider (BPO) onshore, Offshore service provider, Distributed to point of use/line of business, No plans to back-file convert; axis 0% to 60%]

32% driving back-conversion centrally

N=40, PDF/A users.

Page 31: Demystifying pd fs

Backfile Conversion to PDF/A-2
How soon do you plan to converge to PDF/A-2, when it is published?

[Chart: response categories Within 1 year, Within 2 years, Within 3 years, Within 5 years, Unlikely, I’ve not heard of PDF/A-2; axis 0% to 40%]

One third of PDF/A users have not heard of PDF/A-2

Another third will converge to PDF/A-2 in 3 years or less.

N=40, PDF/A users.

Page 32: Demystifying pd fs

PDF/A-2 Tools
Would you reject a tool or application that was not tested to a conformance standard?

[Chart: categories Software used to view documents, Software used to create documents, Electronic document files; responses Always, Probably, Possibly, No; axis 0% to 100%]

80% expect to use conformance certified creation tools.

N=40, PDF/A users.

Page 33: Demystifying pd fs

Document management – Electronic document file format for long-term preservation including embedded files – Part 3: Use of ISO 32000-1 (PDF/A-3)

Specifies the use of PDF for preserving the static visual representation of page based electronic documents over time in addition to allowing any type of other content to be included as an embedded file or attachment

ISO/NWI/CD 19005-3 (PDF/A-3)

Page 34: Demystifying pd fs

Accessibility
Does your content need to be accessible (able to be accessed and read by assistive technologies)?

[Chart: response categories Always, Some of it, Should be but isn't, No; axis 0% to 45%]

There is a recognition of accessibility regulations.

N=144, all respondents.

Page 35: Demystifying pd fs

Document management applications – Electronic document file format enhancement for accessibility (PDF/UA) – Use of ISO 32000-1 (PDF/UA-1)

Specifies how to use PDF to produce electronic documents which are accessible

Does not specify:
◦ Processes for converting paper or electronic documents
◦ Storage of PDF/UA documents
◦ Specific design, user interface, implementation or other details for rendering

ISO/CD 14289-1 (PDF/UA)

Page 36: Demystifying pd fs

Document management – Engineering document format using PDF – Part 1: Use of PDF 1.6 (PDF/E-1)

Specifies the use of PDF for the creation of documents used in engineering workflows. It does not define:
◦ Method of electronic distribution
◦ Method of creation or conversion from paper or electronic documents to the PDF/E format
◦ Specific technical design, user interface, or implementation
◦ Required hardware or methods for validation

ISO 24517-1:2008 (PDF/E)

Page 37: Demystifying pd fs

Addresses need for reliable exchange of engineering documentation
◦ Secure distribution of intellectual property
◦ Reliable exchange and change management (multiple application types and platforms)
◦ Reduces costs associated with paper (distribution as well as storage/archive)
Covers 3 primary areas:
◦ Compact, accurate printing of engineering drawings
◦ Support for exchanging/managing annotation and comment data
◦ Incorporation of complex data into PDF (3D, object level data, etc.)
Part 2 – Update to ISO 32000-1 and archive capabilities

ISO 24517-1: 2008 (PDF/E)

Page 38: Demystifying pd fs

Document management – 3D use of Product Representation Compact (PRC) format – Part 1: PRC 10001

Describes a file format for 3D content data for the purposes of 3D visualization and exchange.

Used for creating, viewing and distributing 3D data in a document exchange workflow

ISO/CD 14739-1 (PRC)

Page 39: Demystifying pd fs

What is PDF Healthcare?
A “Best Practices Guide” describing attributes of the Portable Document Format (PDF) to facilitate the capture, exchange, preservation and protection of healthcare information
◦ Share data easily between healthcare institutions
◦ Ease the transition into digital health records for information exchange and sharing
◦ Bridge the gap between healthcare providers and consumers

Page 40: Demystifying pd fs

PDF Healthcare Background
eHealthcare is a reality in today’s environment
PDF advantages in healthcare
◦ Long-standing success and adoption of PDF
◦ PDF provides a secure and universal container for multiple data types regardless of data source or destination
◦ PDF is platform- and system-neutral
◦ PDF allows for interoperability and bi-directional information exchange
◦ Selected records can be easily and quickly printed from PDF when necessary

Page 41: Demystifying pd fs

Initial PDF Healthcare Offering
Best Practices Guide
◦ Describes the attributes of the Portable Document Format (PDF) that are relevant to facilitate the capture, exchange, preservation and protection of healthcare information
Implementation Guide / Use Cases
◦ Supplemental information that will provide examples of interoperability with existing healthcare standards such as ASTM’s Continuity of Care Record (CCR)

Page 42: Demystifying pd fs

Additional PDF Healthcare Offering
PDF Healthcare Supporting the Clinical Document Architecture: White Paper
◦ Discusses the implementation of PDF Forms in support of the HL7 Clinical Document Architecture (CDA) to simplify, secure, and speed transactions between entities with varying levels of automation
Creating PDF Forms for the CDA: Implementation guide
◦ Supplemental information that will provide examples of various forms, e.g., Emergency Information Form for Children with Special Needs, that support a subset of the CDA schema

Page 43: Demystifying pd fs

Proposed Legislation – PDF/A

Alabama
Alaska
California (Repealed 10/19/2010)
Connecticut
Florida
Idaho
Kentucky
Missouri
Nevada
New York
Ohio
Wisconsin

Page 44: Demystifying pd fs

PDF/A Adoption
Europe
◦ Standard eBilling (Organisation for Promotion of Automated Accounting)
◦ Germany, France, Austria, Switzerland, Poland, Norway
Brazil
China
MoReq2
U.S. Nuclear Regulatory Commission
U.S. District Courts
NARA
Library of Congress

Page 45: Demystifying pd fs

PDF/A-1 compliance is not enough
◦ Comply with NARA’s transfer instructions for records in PDF
◦ Provide transfer documentation
◦ Must comply with image quality specifications for transfer of permanent records
◦ Must use OCR processes that do not alter the original bit-mapped image

NARA Guidelines

Page 46: Demystifying pd fs

Opportunities
Conversion
◦ Paper based
◦ Electronic files to PDF subsets
Validation
◦ Isartor Test Suite
◦ Bavaria Report (PDFlib)
◦ Adobe Acrobat Preflight
Data cleanup
◦ Metadata
◦ Embedding fonts and images
◦ Tagging
Consulting and recommending use of PDF/A
Conversion of Healthcare records

Page 47: Demystifying pd fs

Betsy Fanning
Ph: +1.301.755.2682
Skype: betsy.fanning
Email: [email protected]
Twitter: bfanning
LinkedIn: www.linkedin.com/in/betsyfanning
PDF Standards – www.aiim.org/standards
Get involved – Service Companies still needed for AIIM’s National Standards Council (NSC)

Questions/Contact

Page 48: Demystifying pd fs

http://www.mach2solutions.net/pdf/pdf.html

PDF Demonstration URL