CS 502: Computing Methods for Digital Libraries
Lecture 9
Conversion to Digital Formats
Anne Kenney, Cornell University Library
What are Digital Images?
• Electronic snapshots taken of a scene or scanned from documents
• sampled and mapped as a grid of dots or picture elements (pixels)
• each pixel assigned a tonal value (black, white, grays, colors), represented in binary code
• code stored or reduced (compressed)
• read and interpreted to create analog version
Four Scanning Methods
• Bitonal
• Grayscale
• Color
• Special treatment
Digital Image Quality is Governed By:
• resolution and threshold
• bit depth
• image enhancement
• color management
• compression
• system performance
• operator judgment and care
Resolution
• determined by number of pixels used to represent the image
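• expressed in dots per inch (dpi), a linear measure; the number of dots per square inch is the dpi value squared
• increasing resolution increases level of detail captured and geometrically increases file size (doubling the dpi quadruples the pixel count)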
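The geometric growth can be checked with a few lines of Python. This is a minimal sketch: the function name `pixel_count` and the 8 x 10 inch page are assumptions chosen for illustration, and the scanner is assumed to use the same dpi in both directions.

```python
def pixel_count(width_in: float, height_in: float, dpi: int) -> int:
    """Total pixels for a page scanned at the same dpi horizontally and vertically."""
    return round(width_in * dpi) * round(height_in * dpi)


if __name__ == "__main__":
    # Hypothetical 8 x 10 inch page at the three resolutions shown on the next slide.
    for dpi in (200, 300, 600):
        print(f"{dpi} dpi: {pixel_count(8, 10, dpi):,} pixels")
```

Tripling the resolution from 200 to 600 dpi multiplies the pixel count by nine, which is why file size grows much faster than the dpi figure suggests.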
Effects of Resolution
[Figure: the same sample scanned at 600 dpi, 300 dpi, and 200 dpi]
Threshold Setting in Bitonal Scanning
defines the point on a scale from 0 to 255 at which gray values will be interpreted either as black or white
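A threshold of this kind can be sketched in Python as below. The function name `to_bitonal` and the sample gray values are hypothetical, and the choice that values below the threshold become black (rather than values at or below it) is an assumption; scanner software may draw the line differently.

```python
def to_bitonal(gray_row, threshold):
    """Map 8-bit gray values (0 = black, 255 = white) to 1-bit values.

    Assumption: values below the threshold become black (0) and values at
    or above it become white (1).
    """
    return [0 if value < threshold else 1 for value in gray_row]


row = [20, 59, 60, 99, 100, 200]   # hypothetical gray values from one scan line
print(to_bitonal(row, 100))        # [0, 0, 0, 0, 1, 1]
print(to_bitonal(row, 60))         # [0, 0, 1, 1, 1, 1]
```

Under this convention, a higher threshold turns more mid-gray pixels black, so the page looks darker and heavier; a lower threshold lightens it.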
Effects of Threshold
[Figure: the same page scanned at threshold = 100 and at threshold = 60]
Bit Depth
• number of bits used to represent each pixel, typically 8 bits or more per channel
• representing 256 (2^8) levels for grayscale and 16.7 million (2^24) levels for color
• example: 8-bit grayscale pixel
00000000 = black
11111111 = white
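The level counts and the binary codes above follow directly from the number of bits. A small Python sketch (the helper names `levels` and `as_binary` are made up for this example):

```python
def levels(bits: int) -> int:
    """Number of distinct tonal values that a given bit depth can encode."""
    return 2 ** bits


def as_binary(value: int, bits: int = 8) -> str:
    """Binary code for a pixel value, padded to the full bit depth."""
    return format(value, f"0{bits}b")


print(levels(8))        # 256
print(levels(24))       # 16777216
print(as_binary(0))     # 00000000 (black)
print(as_binary(255))   # 11111111 (white)
```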
Bit Depth
• increasing bit depth increases the level of gray or color information that can be represented and arithmetically increases file size
• affects resolution requirements
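The arithmetic (linear) growth with bit depth can be seen in a rough size estimate. This sketch ignores file headers, row padding, and compression, and the 8 x 10 inch page at 300 dpi is an assumed example, not a figure from the lecture.

```python
def uncompressed_bytes(width_in, height_in, dpi, bits_per_pixel):
    """Approximate raw image size: pixel count times bits per pixel, in bytes."""
    pixels = round(width_in * dpi) * round(height_in * dpi)
    return (pixels * bits_per_pixel + 7) // 8   # round up to whole bytes


# Hypothetical 8 x 10 inch page at 300 dpi.
for bits in (1, 8, 24):
    print(f"{bits}-bit: {uncompressed_bytes(8, 10, 300, bits):,} bytes")
```

Going from 1-bit to 8-bit multiplies the size by 8, and from 8-bit to 24-bit by 3, whereas a change in dpi multiplies it by the square of the change.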
Effects of Grayscale on Image Quality
3-bit gray 8-bit gray
Image Enhancement
• can be used to improve image capture
• use raises concerns about fidelity and authenticity
Effects of Filters
[Figure: the same image with no filters used and with maximum enhancement]
Image Editing
Compression
• reduces file size for processing, storage, transmission, and display
• image quality may be affected by the compression techniques used and the level of compression applied
Compression Variables
• lossless versus lossy compression
• proprietary vs. open schemes
• level of industry support
• bitonal vs. gray/color
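The first variable, lossless versus lossy, can be illustrated with Python's standard `zlib` module. This is only a sketch: `zlib` stands in for a lossless scheme, and discarding the low-order bits of each gray value stands in for a lossy step. Neither is one of the image compression schemes named in this lecture, and the helper names are hypothetical.

```python
import zlib


def lossless_round_trip(data: bytes) -> bytes:
    """Compress and decompress with zlib; the output equals the input exactly."""
    return zlib.decompress(zlib.compress(data))


def drop_low_bits(data: bytes, keep_bits: int) -> bytes:
    """Lossy stand-in: keep only the top keep_bits of each 8-bit gray value."""
    mask = (0xFF << (8 - keep_bits)) & 0xFF
    return bytes(value & mask for value in data)


pixels = bytes(range(256))             # one pixel of every 8-bit gray level
restored = lossless_round_trip(pixels)
reduced = drop_low_bits(pixels, 3)     # 3-bit gray: the discarded detail is gone

print(restored == pixels)              # True
print(len(set(reduced)))               # 8 distinct gray levels remain
```

After the lossy step, no decoder can recover the original 256 levels, which is the trade-off to weigh against the smaller file.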
Common Compression Schemes
• bitonal
– ITU Group 4: lossless
– JBIG (ISO 11544): lossless
– CPC: lossy
– DigiPaper
• grayscale/color
– LZW: lossless
– JPEG: lossy
– Kodak Image Pac: “visually lossless”
– Fractal and wavelet compression
Effects of JPEG Compression
[Figure: 300 dpi, 8-bit grayscale uncompressed TIFF compared with JPEG at 18.5:1 compression]
Compression Observations
• the richer the file, the more efficient and sustainable the compression
• the more complex the image, the poorer the compression
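The second observation is easy to reproduce. The sketch below compares a uniform white "page" with random noise of the same size, using `zlib` as a generic stand-in compressor; the buffers and the function name `compression_ratio` are illustrative assumptions, not data from the lecture.

```python
import random
import zlib


def compression_ratio(data: bytes) -> float:
    """Original size divided by zlib-compressed size (higher means better compression)."""
    return len(data) / len(zlib.compress(data))


flat = bytes([255]) * 10_000                                # uniform white area
rng = random.Random(0)                                      # fixed seed for repeatability
noisy = bytes(rng.getrandbits(8) for _ in range(10_000))    # maximally complex content

print(f"flat:  {compression_ratio(flat):.1f}:1")
print(f"noisy: {compression_ratio(noisy):.2f}:1")
```

The flat buffer shrinks dramatically, while the noisy one barely compresses at all; real scans fall somewhere between these extremes.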
Equipment used and its performance over time
• scanners offer wide range of capabilities to capture detail, dynamic range, and color
• scanners with same stated functionality can produce different results
• calibration, age of equipment, and environment affect quality
Equipment used and its performance over time
• attributes and capabilities of monitor and/or printer are also factors
• assess quality visually and computationally
– use targets
– control QC environment
– increasing availability of software to assess resolution, tone, color, artifacts
Image Capture:
Create digital objects rich enough to be useful over time in the most cost-effective manner.
How to determine what’s good enough?
• Connoisseurship of document attributes
• Objective characterizations
• Translation between analog and digital
– measurement to scanning requirement to corresponding image metrics
– e.g., detail size → resolution → MTF
– tonal range → bit depth → signal-to-noise ratio
Case Study
• Brittle Books--printed text, use of metal type, commercial publishers, objective measurement, use of Quality Index from micrographics
• 600 dpi 1-bit capture adequately preserves informational content of text-based materials
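The slide does not give the Quality Index formula itself. As a sketch only, the form usually quoted for bitonal scanning in Cornell's benchmarking publications is QI = (dpi × 0.039 × h) / 3, where h is the height in millimetres of the smallest significant character and 0.039 approximates the millimetre-to-inch conversion. The formula, the constant, the 1 mm character height, and the target QI of 8 used below are all assumptions brought in from outside this lecture and should be checked against those publications.

```python
MM_TO_INCHES = 0.039  # rounded conversion factor, as assumed in the lead-in


def bitonal_quality_index(dpi: float, char_height_mm: float) -> float:
    """Assumed bitonal Quality Index: QI = (dpi * 0.039 * h) / 3."""
    return dpi * MM_TO_INCHES * char_height_mm / 3


def bitonal_dpi_for(qi: float, char_height_mm: float) -> float:
    """Resolution needed to reach a target QI under the same assumed formula."""
    return 3 * qi / (MM_TO_INCHES * char_height_mm)


# Hypothetical smallest character of 1 mm.
print(round(bitonal_quality_index(600, 1.0), 2))   # 7.8
print(round(bitonal_dpi_for(8, 1.0)))              # 615
```

Under these assumptions, a 600 dpi 1-bit scan of 1 mm type lands close to a QI of 8, which is consistent with the slide's conclusion.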
Ensuring Full Informational Capture: “No More, No Less”
[Figure: graph with axes "cost" and "image quality and utility"; the desired point of capture is marked]
Create One Scan To Serve Multiple Uses
• Derive alternative formats/approaches to meet current and future information needs
• Base “derivative” requirements on document attributes, technical infrastructure, user requirements, and cost
• Understand technical links affecting presentation and utility of derivatives
User Requirements
• completeness
• legibility
• speed of delivery
• “cooked” files
Derivatives from a Digital Master
• the richer the image, the better the derivative
– a derivative from a rich file is superior in quality to one from a poorer scan
– the richer the image, the better the image processing
monitor: 800 x 600 pixels
document: 8” x 10”, 200 dpi (1,600 x 2,000 pixels)
document at 60 dpi: 480 x 600 pixels
document at 100 dpi: 800 x 1,000 pixels
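The figures on this slide follow from simple scaling, which can be sketched in Python. The function names are hypothetical; the page size, monitor size, and resolutions are the ones given above.

```python
def pixels_at(width_in, height_in, dpi):
    """Pixel dimensions of a document displayed or scanned at a given dpi."""
    return round(width_in * dpi), round(height_in * dpi)


def dpi_to_fit_screen(width_in, height_in, screen_w, screen_h):
    """Highest dpi at which the whole document fits on the screen at once."""
    return min(screen_w / width_in, screen_h / height_in)


def dpi_to_fit_width(width_in, screen_w):
    """Highest dpi at which the full width fits; the user scrolls vertically."""
    return screen_w / width_in


print(pixels_at(8, 10, 200))                 # (1600, 2000)
print(dpi_to_fit_screen(8, 10, 800, 600))    # 60.0
print(pixels_at(8, 10, 60))                  # (480, 600)
print(dpi_to_fit_width(8, 800))              # 100.0
print(pixels_at(8, 10, 100))                 # (800, 1000)
```

The 200 dpi master is four times the monitor's pixel area or more, so a derivative at 60 dpi shows the whole page and one at 100 dpi fills the screen width.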
Compression/File Format Comparison for Derivative Files
• TIFF, uncompressed
• GIF, compressed 6:1 (NARA)
• JPEG, compressed 20:1 (LC)
Alternatives for Displaying Oversize Images
• File formats and compression schemes that support multi-resolution image delivery, e.g., wavelet compression, GridPix, Flashpix
• User tools for representing scale (Blake Project ImageSizer, a Java applet) and improving image quality
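One common idea behind multi-resolution delivery is an image pyramid: the master is stored alongside successively halved copies, and the viewer fetches the level that suits the screen. The sketch below only computes the level dimensions. It is a generic illustration and does not describe the actual file layout of GridPix, Flashpix, or any wavelet format; the function name and the 256-pixel stopping size are assumptions.

```python
def pyramid_levels(width, height, min_side=256):
    """List (width, height) for each level, halving until the longest side <= min_side."""
    levels = [(width, height)]
    while max(levels[-1]) > min_side:
        w, h = levels[-1]
        # Round up when halving so odd dimensions never shrink to zero.
        levels.append((max(1, (w + 1) // 2), max(1, (h + 1) // 2)))
    return levels


# The 8 x 10 inch, 200 dpi master from the earlier slide.
for level, size in enumerate(pyramid_levels(1600, 2000)):
    print(level, size)
```

For the 1,600 x 2,000 pixel master this yields four levels, ending at 200 x 250 pixels, so a viewer can open with a small overview and request finer levels only for the region being examined.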
Recommendations Coalescing
• Intent of conversion drives decisions
– issues of access considered at conversion
– notion of long-term utility and cross-institutional resources gaining ground
• Access images will change with:
– changing user needs and capabilities
– changes in technologies: file formats, technical infrastructure, compression, web browsers, processing programs, scaling routines