Seminar by Maurizio Agelli, 20-09-2012
TRANSCRIPT
Archiving and Cataloging Digital Photographs
Maurizio Agelli, CRS4
September 20th 2012, 5.30pm
Aula Magna Facoltà di Architettura - Via Corte d'Appello - Cagliari
Point de vue du Gras, Nicéphore Niépce, 1826 (from Wikimedia Commons)
Boulevard du Temple, Louis Daguerre, 1838 (from Wikimedia Commons)
The first photograph was taken less than 200 years ago ...
How many photos have ever been taken?
[ source: Jonathan Good, 2011 - 1000memories.com ]
500 to 800 billion taken in 2011 [ source: Observatoire des Professions de l'Image ]
Number of photos ever shot (up to 2011): ~3.5 x 10^12
Presentation Outline
1) Archiving as part of the photographic workflow
2) Describing photographs: metadata
3) Organizing images in catalogs
4) Ensuring long-term storage: backup and migration
5) An overview of image archiving tools
6) A Digital Asset Management platform developed at CRS4
- 1 -
Archiving as part of the photographic workflow
Photo Archive
A collection of images kept in secure, long-term storage.
[ dpBestflow.org ]
Photo by Seeweb - CC BY-SA 2.0
Photo by M.Agelli - CC BY-SA 2.0
Building a digital photo archive involves many decisions ...
File formats
Metadata
File naming
Folder structure
Catalog organization
Backup policies
Migration policies
Archiving platform
What to archive?
... which strongly depend on the photographic workflow
A general workflow
Capture Ingestion Working Publishing
Archive
No single workflow suits all photographers and all clients [UPDIG]
Workflow decisions are determined by production volume, turnaround, image quality requirements, regulations, costs, etc.
A general workflow, more in detail
Capture (camera)
- All camera-related stuff
Ingestion (computer)
- Image transfer
- File renaming
- Add bulk metadata
- Batch editing
- Format conversion
Focus on volume and speed
Working (computer)
- Image editing
- Metadata editing
- Create derivative work
Focus on quality
Publishing (computer)
- Export images
- Print images
- Publish to web
Archive
- Store, search, organize, ...
Digital Asset Management Platform
File formats / 1
Camera sensor -> In-camera processing -> RAW, JPEG, (DNG), (TIFF)
Film -> Scanner -> TIFF, JPEG, (DNG)
RAW
Many RAW formats (>200).
Proprietary, undocumented.
Encodes values from camera sensor, before demosaicing (12-16 bit/pixel, 1 color/pixel).
Lossless. May be compressed.
TIFF
Open standard.
8, 16, 32 bit RGB.
Lossless, big file size!
Possible PSD replacement (supports layers).
DNG (DIGITAL NEGATIVE)
Open standard, created by Adobe.
Targeted to replace RAW, but still limited adoption by the industry.
File formats / 2
JPEG
Open standard.
Compressed, lossy.
8 bit RGB: suitable for displaying, not good for editing.
JPEG 2000
Better compression than JPEG (wavelet transform vs. cosine transform).
8, 16 bit RGB.
Lossless / lossy.
Many extra features: regions of interest, progressive decoding, multi-resolution decoding.
Example: 6 Mpixel image (Nikon D40)
TIFF, 48 bit / pixel, uncompressed: ~35 MB
NEF, 12 bit / pixel, compressed: ~5.3 MB
DNG, 12 bit / pixel, compressed: ~5 MB
JPEG, 90% quality: ~0.6 MB
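The uncompressed TIFF figure in the example can be checked with simple arithmetic: pixel count times bits per pixel. The sketch below assumes image dimensions of 3008 x 2000 pixels for the Nikon D40 (an assumption, the slides only say "6 Mpixel") and ignores file headers and embedded metadata.

```python
def uncompressed_size_bytes(width, height, bits_per_pixel):
    """Size of the bare pixel payload, ignoring headers and metadata."""
    return width * height * bits_per_pixel // 8


# Assumed image dimensions for a 6 Mpixel Nikon D40 frame (not stated in the slides).
WIDTH, HEIGHT = 3008, 2000

# 48 bit / pixel: three 16-bit RGB channels, as in the uncompressed TIFF.
tiff_48 = uncompressed_size_bytes(WIDTH, HEIGHT, 48)

# 12 bit / pixel: one sensor value per pixel, as in RAW data before compression.
raw_12 = uncompressed_size_bytes(WIDTH, HEIGHT, 12)

print(f"48-bit RGB payload:  {tiff_48 / 2**20:.1f} MiB")
print(f"12-bit sensor data:  {raw_12 / 2**20:.1f} MiB")
```

The 48-bit result is close to the ~35 MB quoted for the TIFF. The NEF and DNG files are smaller than the bare 12-bit payload because they are compressed.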
File formats and image editing
[Diagram: workflow from CAMERA (RAW) through PARAMETRIC EDITING (RAW or DNG) and RASTER EDITING (TIFF or DNG), each followed by EXPORT to JPG]
Parametric Image Editing
Image data are not modified.
Source file is preserved.
Editing is saved as a list of rules which are applied at rendering time.
(e.g. Lightroom, Aperture)
Raster Image Editing
Image pixels are modified.
A new file containing the edited image shall be saved in order to preserve the original.
(e.g. Photoshop, Picture Window Pro)
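The idea behind parametric editing can be illustrated with a toy model: the source pixels are never touched, and the edits are kept as an ordered list of rules that is replayed at rendering time. This is only a sketch; the class, the rule names and the grayscale pixel values are hypothetical and do not reflect how Lightroom or Aperture store their settings.

```python
from dataclasses import dataclass, field


def _clamp(value):
    """Keep a pixel value inside the 8-bit range."""
    return max(0, min(255, value))


# Hypothetical editing rules: each maps (pixel, amount) to a new pixel value.
RULES = {
    "brightness": lambda p, amount: _clamp(p + amount),
    "contrast": lambda p, amount: _clamp(round((p - 128) * amount + 128)),
}


@dataclass
class ParametricImage:
    source_pixels: tuple                        # original data, never modified
    edits: list = field(default_factory=list)   # ordered list of (rule, amount)

    def add_edit(self, name, amount):
        """Record an edit; nothing is rendered yet."""
        self.edits.append((name, amount))

    def render(self):
        """Apply the recorded rules, in order, to a copy of the source."""
        pixels = list(self.source_pixels)
        for name, amount in self.edits:
            pixels = [RULES[name](p, amount) for p in pixels]
        return pixels


photo = ParametricImage(source_pixels=(10, 128, 250))
photo.add_edit("brightness", 20)
photo.add_edit("contrast", 1.5)
rendered = photo.render()
```

A raster editor would instead overwrite the pixel values themselves, which is why the edited result has to be saved as a new file to keep the original.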
File formats decision tree
[Diagram: decision tree of the file formats (JPG, RAW, DNG, TIFF) chosen at each stage, CAPTURE, INGESTION, WORKING and PUBLISHING, for CAMERA and SCANNER sources; every path ends with JPG at PUBLISHING]
Note: unusual decision paths have been omitted
Which files to archive?
Capture Ingestion Working Publishing
Archive
ORIGINAL FILES
MASTER FILES
DERIVATIVE FILES
- 2 -
Metadata
The importance of metadata
"An image is worth 1000 words", but ...
... there are questions which only words can answer:
When was it shot?
... and where?
Who are those people?
Who took this photograph ?
Can I use it freely ?
Photo by Maurizio Agelli - CC BY-SA 2.0
Metadata
Information about content.
Photo by M. Agelli - CC BY-SA 2.0
A more precise definition
METADATA
"Structured encoded data that describe characteristics of information-bearing entities to aid in the identification, discovery, assessment, and management of the described entities"
[ source: American Library Association ]
Image metadata is nothing new ...
Photo by anyjazz65 [ CC BY-NC 2.0 ] http://www.flickr.com/photos/49024304@N00/
Where can digital image metadata be written?
○ inside the image file (metadata + image data in one file)
○ in a sidecar file (metadata stored next to the image data)
○ in a database
○ in an online registry
○ in the file name, e.g.
d40-20120920-DSC_0153-edited.jpg
(camera - date - id - derived)
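Metadata kept in the file name can be recovered with a small parser. The sketch below assumes the camera-date-id-derived layout of the example file name, with the "derived" part being optional; the pattern and the function name are illustrative and not part of any standard.

```python
import re
from datetime import date, datetime

# Assumed layout: <camera>-<YYYYMMDD>-<id>[-<derived>].<extension>
FILENAME_PATTERN = re.compile(
    r"^(?P<camera>[^-]+)-(?P<date>\d{8})-(?P<id>[^-.]+)"
    r"(?:-(?P<derived>[^.]+))?\.(?P<ext>\w+)$"
)


def parse_file_name(name):
    """Split a file name into the metadata fields it encodes."""
    match = FILENAME_PATTERN.match(name)
    if match is None:
        raise ValueError(f"file name does not follow the convention: {name!r}")
    return {
        "camera": match["camera"],
        "date": datetime.strptime(match["date"], "%Y%m%d").date(),
        "id": match["id"],
        "derived": match["derived"],   # None when the file is not a derivative
        "extension": match["ext"],
    }


info = parse_file_name("d40-20120920-DSC_0153-edited.jpg")
print(info)
```

File names are fragile carriers of metadata (they are easily changed by a rename), so this is best treated as a fallback next to embedded or sidecar metadata.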
Image metadata standards
EXIF
IPTC
XMP
MPEG-7
DICOM
PLUS
Creative Commons
Dublin Core
IPTC IIM
Information Interchange Model
Created in 1991 by the International Press Telecommunications Council
Adobe defined the mechanism for embedding IPTC IIM metadata in image files (1994)
Driven by NEWS INDUSTRY
Focused on high-level properties (description, geo location, ...)
Cannot be extended
EXIF
Exchangeable Image File Format
Created in 1995 by the Japan Electronic Industries Development Association
Driven by CAMERA MANUFACTURERS
Focused on low-level properties (camera settings, geo coordinates, date/time, ...)
Cannot be extended
Image Data
EXIF
IPTC IIM
XMP
Extensible Metadata Platform
Open standard, created by Adobe
○ defines a data model and a serialization model (RDF/XML)
○ also covers video, audio, text
○ structured as a set of schemas
○ can be extended with new metadata schemas
○ multi-lingual qualifiers
○ can be serialized and stored in most file formats (not in RAW!)
○ it is widely supported by the industry
Image Data
EXIF
IPTC IIM
XMP
Legacy Metadata
Dublin Core
XMP Basic
Rights
Media Mng
Photoshop
Camera RAW
EXIF
IPTC Core
IPTC Extens.
...
A timeline of image standards
[Timeline, 1986-2012, showing: First DSLR (Kodak DCS-100), professional DSLRs, EXIF (first release), JPEG (first release), Kodak Photo CD, TIFF (first release), IPTC IIM, IPTC Headers (Adobe), XMP (first release), consumer DSLRs]
A quick look inside XMP
>200 properties + all EXIF and IPTC properties
TITLE (dc:title)
DESCRIPTION (dc:description)
DESCRIPTION WRITER (photoshop:CaptionWriter)
RATING (xmp:Rating)
KEYWORDS (dc:subject)
GEO COORDINATES (exif:GPSLatitude, exif:GPSLongitude)
LOCATION (photoshop:Country, photoshop:State, photoshop:City, ...)
AUTHOR (dc:creator, exif:Artist)
RIGHTS (dc:rights)
.....
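Because XMP is serialized as RDF/XML, a few of the properties listed here can be read with a generic XML parser. The sketch below uses Python's standard xml.etree.ElementTree on a hand-written sample packet; the sample values and the function name are made up for illustration, and only dc:title, dc:subject and xmp:Rating are handled.

```python
import xml.etree.ElementTree as ET

# Namespace URIs of the RDF, Dublin Core and XMP Basic schemas.
NS = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "dc": "http://purl.org/dc/elements/1.1/",
    "xmp": "http://ns.adobe.com/xap/1.0/",
}

# Hand-written sample packet (illustrative values only).
SAMPLE_XMP = """<x:xmpmeta xmlns:x="adobe:ns:meta/">
 <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
  <rdf:Description rdf:about=""
    xmlns:dc="http://purl.org/dc/elements/1.1/"
    xmlns:xmp="http://ns.adobe.com/xap/1.0/">
   <xmp:Rating>4</xmp:Rating>
   <dc:title>
    <rdf:Alt><rdf:li xml:lang="x-default">Cagliari harbour</rdf:li></rdf:Alt>
   </dc:title>
   <dc:subject>
    <rdf:Bag><rdf:li>harbour</rdf:li><rdf:li>Sardinia</rdf:li></rdf:Bag>
   </dc:subject>
  </rdf:Description>
 </rdf:RDF>
</x:xmpmeta>"""


def read_xmp_properties(packet):
    """Extract title, keywords and rating from an XMP packet string."""
    root = ET.fromstring(packet)
    title = root.find(".//dc:title/rdf:Alt/rdf:li", NS)
    rating = root.find(".//xmp:Rating", NS)
    keywords = root.findall(".//dc:subject/rdf:Bag/rdf:li", NS)
    return {
        "title": title.text if title is not None else None,
        "keywords": [li.text for li in keywords],
        "rating": int(rating.text) if rating is not None else None,
    }


print(read_xmp_properties(SAMPLE_XMP))
```

Real packets may also serialize simple properties as attributes of rdf:Description, which this sketch does not handle; a dedicated XMP library is a safer choice for production use.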
A quick look inside XMP: Date/Time Metadata
The original painting (~1507): Iptc4xmpExt:AODateCreated
An ancient postcard (1925): photoshop:DateCreated
The digital representation of the postcard (2008): xmp:CreateDate
The archived image (metadata last edited in 2012): xmp:MetadataDate
Extending XMP: Creative Commons
CC provides a legal and technical infrastructure to help people share knowledge and creativity.
Photo by Creative Commons - CC BY 3.0
CC defines a set of properties that allow authors to specify under which conditions their content can be distributed and used.
CC recommends XMP for embedding CC properties inside resources.
Extending XMP: PLUS
Picture Licensing Universal System
Non-profit organization whose mission is to simplify and facilitate the communication and management of image rights.
PLUS Registry
○ unique ids for creators, rights holders, images, ...
○ access to rights information and other metadata
PLUS License Data Format (LDF)
○ metadata schema for embedding image license
○ 88 properties
○ dedicated XMP PLUS namespace
Extending XMP: PRISM
Publishing Requirements for Industry Standard Metadata
Defined by IDEAlliance, a global community of content and media creators.
PRISM Metadata for Images provides information about:
○ objects pictured (manufacturer, model, description, ...)
○ slideshows (sequences of images)
○ shooting info (viewpoint, season, visual technique, ...)
PRISM Advertising Metadata provides information about the usage of the image in an advertising campaign
PRISM defines dedicated XMP namespaces: pmi and pam
Extending XMP: Area Tagging
Metadata Working Group
○ XMP-MP Schema for face tags
○ adopted by Picasa
Microsoft has created a new XMP schema for tagging people
Handling Social Tagging
A research issue
[ source: Jonathan Good, 2011 - 1000memories.com ]
140 billion photos in Facebook (up to 2011)
- 3 -
Organizing images in catalogs
Picture by Henry Trotter, 2005 - Source: Wikimedia Commons
catalog
noun
a list of the contents of a library or a group of libraries, arranged according to any of various systems
[ Dictionary.com ]
catalog
v.tr.
1. to make an itemized list of
2. to classify (a book or publication, for example) according to a categorical system
[ Dictionary.com ]
Photo Cataloging Software
Prime goals of Photo Cataloging Software:
○ provide a secure, long-term storage
○ find the images when you need them
○ interoperate with other tools of the same ecosystem (in the present, as well as future)
Photo Cataloging Software falls into the broad domain of Digital Asset Management. Let's try grabbing some definitions ...
An ecosystem is made up of many parts that must not only coexist but also work with each other to survive. When all the elements work in concert, the system can thrive. (Peter Krogh, The DAM Book)
Digital Asset Management
a way of keeping an overview of your digital files and make sure they don't get lost or altered unintentionally [J.Jacobsen, T.Schlenker, L.Edwards, Implementing a DAM System, Elsevier]
the protocol for downloading, renaming, backing up, rating, grouping, archiving, optimizing, maintaining, thinning, and exporting files [P.Krogh, The DAM Book, O'Reilly]
a complete toolbox to the author, publisher, and the end users of the media to efficiently utilize the assets [D.Austerberry, Digital Asset Management 2nd edition, Focal Press]
a term open to many definitions ...
... and whose scope goes beyond the domain of photography
Digital Libraries
Creative Industries Publishing
Enterprise Content Management
Core functionalities of a photo catalog / DAM software
(will use these two terms interchangeably)
○ Import images
○ Harvest metadata
○ Manage metadata in a database (+ index for search)
○ Synchronize metadata
○ Export images
○ Organize photos with hierarchical keywords
○ Manage original, master and derivative files as different renditions of the same item
Extra functionalities such as file rename, raw converter, editor, publishing tools may be provided too.
Harvesting and synchronizing metadata
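[Diagram: images are imported into and exported from the Image Storage; the metadata embedded in the image files (EXIF, IPTC IIM, XMP) is harvested into the Database and kept synchronized with it; the User Interface works on the Database and the Image Storage]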
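The harvesting step can be sketched as follows: metadata already extracted from the image files is loaded into a database, and an index on the keywords makes searching fast. The example uses an in-memory SQLite database from Python's standard library; the table layout, the record format and the function names are assumptions made for illustration, not the design of any specific catalog application.

```python
import sqlite3


def build_catalog(records):
    """Load harvested metadata records into an in-memory catalog database."""
    db = sqlite3.connect(":memory:")
    db.execute(
        "CREATE TABLE image ("
        "id INTEGER PRIMARY KEY, path TEXT UNIQUE, title TEXT, rating INTEGER)"
    )
    db.execute(
        "CREATE TABLE keyword (image_id INTEGER REFERENCES image(id), word TEXT)"
    )
    # Index used by keyword searches.
    db.execute("CREATE INDEX keyword_word ON keyword(word)")
    for rec in records:
        cur = db.execute(
            "INSERT INTO image (path, title, rating) VALUES (?, ?, ?)",
            (rec["path"], rec["title"], rec["rating"]),
        )
        db.executemany(
            "INSERT INTO keyword (image_id, word) VALUES (?, ?)",
            [(cur.lastrowid, word) for word in rec["keywords"]],
        )
    db.commit()
    return db


def find_by_keyword(db, word):
    """Return the paths of all images tagged with the given keyword."""
    rows = db.execute(
        "SELECT image.path FROM image "
        "JOIN keyword ON keyword.image_id = image.id "
        "WHERE keyword.word = ? ORDER BY image.path",
        (word,),
    )
    return [row[0] for row in rows]


# Hypothetical records, as they might come out of a metadata harvester.
catalog = build_catalog([
    {"path": "2012/DSC_0153.nef", "title": "Harbour", "rating": 4,
     "keywords": ["harbour", "Sardinia"]},
    {"path": "2012/DSC_0154.nef", "title": "Old town", "rating": 3,
     "keywords": ["Sardinia"]},
])
print(find_by_keyword(catalog, "Sardinia"))
```

Synchronization is the reverse path: edits made in the database are written back to the files (or their sidecars), so that the metadata travels with the images.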
Hierarchical keywords
Photo by Isabelle Palatin - CC BY-SA 2.0
○ typically mapped to dc:subject
○ no semantic rules for describing the hierarchy, special characters are used, e.g.: Organizations|Industry|ACME
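Since the hierarchy is only encoded by a separator character, an application has to split the keyword string itself. The sketch below assumes "|" as the separator, as in the example; other tools may use a different character, which is one reason why hierarchies do not always survive an export/import round trip. The function names are illustrative.

```python
def split_keyword(keyword, separator="|"):
    """Split a flattened hierarchical keyword into its levels."""
    return [part.strip() for part in keyword.split(separator) if part.strip()]


def ancestors(keyword, separator="|"):
    """Return the keyword together with all its ancestors, top level first."""
    parts = split_keyword(keyword, separator)
    return [separator.join(parts[:i]) for i in range(1, len(parts) + 1)]


def build_tree(keywords, separator="|"):
    """Merge several flattened keywords into one nested dictionary."""
    tree = {}
    for keyword in keywords:
        node = tree
        for part in split_keyword(keyword, separator):
            node = node.setdefault(part, {})
    return tree


levels = ancestors("Organizations|Industry|ACME")
tree = build_tree(["Organizations|Industry|ACME", "Organizations|Research|CRS4"])
print(levels)
print(tree)
```

Indexing an image under every entry returned by `ancestors` lets a search for "Organizations|Industry" also find images tagged with the more specific "ACME".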
Renditions / Version sets
Image Storage (import / export)
ORIGINAL
MASTER (edited)
DERIVATIVES
...
Different files related to the same image under certain circumstances shall be managed as a single item.
Covered by XMP-MM (Media Management)
Cataloging applications provide different solutions (e.g. stacking, version sets)
1 item, N renditions
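The "1 item, N renditions" idea can be modelled with a very small data structure: one catalog item that groups the original, the master and any number of derivatives. This is a sketch with hypothetical class, role and file names; it does not use the XMP-MM properties and is not how any particular application implements stacking or version sets.

```python
from dataclasses import dataclass, field

ROLES = ("original", "master", "derivative")


@dataclass
class CatalogItem:
    item_id: str
    renditions: dict = field(default_factory=dict)   # role -> list of file paths

    def add_rendition(self, role, path):
        """Attach a file to this item under one of the known roles."""
        if role not in ROLES:
            raise ValueError(f"unknown rendition role: {role!r}")
        self.renditions.setdefault(role, []).append(path)

    def files(self):
        """All files of the item, originals first, derivatives last."""
        return [p for role in ROLES for p in self.renditions.get(role, [])]


item = CatalogItem("DSC_0153")
item.add_rendition("original", "originals/DSC_0153.nef")
item.add_rendition("master", "masters/DSC_0153.tif")
item.add_rendition("derivative", "web/DSC_0153-small.jpg")
item.add_rendition("derivative", "print/DSC_0153-a4.jpg")
print(item.files())
```

Searching and browsing then operate on items, while export picks the rendition that suits the target (for instance a derivative for the web).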
- 4 -
Ensuring long-term storage: backup and migration
There are many causes of data loss
disk / hardware failure
viruses
lightning
transfer errors
theft
loss
fire
human errors
floods
Photo by Lucina M - CC BY-NC 2.0
Which files to back up
Original Files
Working Files
Derivative Files
Master Files
Catalog (DB)
A possible backup strategy for single user workflow
PRIMARY STORAGE
1 ON-LINE BACKUP (e.g. NAS)
2 OFF-LINE BACKUP
3 OFF-SITE BACKUP
4 CLOUD BACKUP
rsync (*)
storage media are swapped at every backup
additional copy on a remote NAS
5 additional copy on a cloud service (Amazon S3, Elephant Drive, Symform, ...)
Copy to optical storage (ORIGINALS, MASTERS, DERIVATIVES)
(*) deleting files on the receiving side shall be disabled for ORIGINALS, MASTERS and DERIVATIVES
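The rsync note can be made concrete with a small helper that builds one rsync command per folder and leaves out the `--delete` option for the folders that must never lose files on the receiving side. This is a sketch under assumptions: the folder names and paths are hypothetical, and only the standard `-a` (archive) and `--delete` options of rsync are used. The commands are built but not executed here.

```python
# Folders for which deletions on the receiving side must stay disabled.
PROTECTED = {"originals", "masters", "derivatives"}


def rsync_command(source, destination, protect_deletions):
    """Build the rsync argument list for mirroring one folder."""
    cmd = ["rsync", "-a"]
    if not protect_deletions:
        # Mirror deletions only for folders that may safely shrink.
        cmd.append("--delete")
    # A trailing slash on the source copies its contents into the destination.
    cmd += [source.rstrip("/") + "/", destination]
    return cmd


def backup_plan(primary_root, backup_root, folders):
    """One rsync command per folder of the primary storage."""
    return [
        rsync_command(
            f"{primary_root}/{name}", f"{backup_root}/{name}", name in PROTECTED
        )
        for name in folders
    ]


# Hypothetical layout of the primary storage and of the NAS mount point.
plan = backup_plan("/photos", "/mnt/nas", ["originals", "working", "catalog"])
for command in plan:
    print(" ".join(command))
```

Each command could then be run with `subprocess.run(command, check=True)`. Keeping deletions disabled for the protected folders means that a file removed by mistake from primary storage is still available on the backup.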
Migration
○ file formats can become obsolete (just think what is happening to Kodak Photo CD ...)
○ storage evolves (higher capacity, higher speed, ...)
○ solution:
  ○ monitoring the storage process
  ○ conversion to newer and safer formats (e.g. DNG)
  ○ periodical replacement of storage devices
Currently there are no permanent solutions for storing digital content. No media lasts forever, and file formats become obsolete. Migration must be considered as a necessary part of every storage strategy.
[ dpBestflow.org ]
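One simple way of monitoring the storage is to record a checksum for every archived file and to verify the archive against that record from time to time, for example before and after moving it to a new device. The sketch below uses SHA-256 from Python's standard library; the manifest format and the function names are assumptions, not something prescribed by the slides.

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Checksum of a file, read in chunks to keep memory use low."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root):
    """Map every file under root (by relative path) to its checksum."""
    root = Path(root)
    return {
        str(path.relative_to(root)): sha256_of(path)
        for path in sorted(root.rglob("*"))
        if path.is_file()
    }


def verify(root, manifest):
    """Compare the archive with a stored manifest.

    Returns two sorted lists: files that are missing and files whose
    content has changed since the manifest was built.
    """
    current = build_manifest(root)
    missing = sorted(set(manifest) - set(current))
    changed = sorted(
        name for name in manifest
        if name in current and current[name] != manifest[name]
    )
    return missing, changed
```

A typical use is to call `build_manifest` right after archiving, store the result next to the backup, and run `verify` on every copy after each migration.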
- 5 -
An overview of image archiving tools and services
Image management applications: Application types
INGESTION TOOL
CULLING APPLICATION
RASTER IMAGE EDITOR
PARAMETRIC IMAGE EDITOR
RAW PROCESSOR
SPECIAL PURPOSE EDITOR
PUBLISHING TOOLS
DEDICATED PRINTING SOFTWARE
Image Browser
DAM (Photo Catalog)
SCANNER SOFTWARE
Image management applications: Examples
Fast Picture Viewer
Photomatix
Picture Window Pro
Photoshop
Lightroom
Vuescan
Aperture
IDImager
Bridge
Adobe Camera Raw
ImageIngester Pro
Silverfast
Qimage
Quad Tone RIP
Bibble Pro
A few photo cataloging applications
Product | Notes | Platforms | Cost (EUR)
Adobe Lightroom 4 | includes Adobe Camera RAW, many export features | WIN / MAC | 130
Photo Supreme (formerly known as IDImager) | very powerful catalog explorer, multiuser DB | WIN / MAC | 80
Phase One Media Pro (formerly known as Expression Media, formerly as iView) | - | WIN / MAC | ~85
Apple Aperture 3 | - | MAC | 63
Corel AfterShot Pro (formerly known as Bibble Pro) | - | WIN / MAC | ~50
Digikam Software Collection 3 | RAW processing based on dcraw, rendition support from version 2 | Linux | free
Picasa 3.9 | - | WIN / MAC | free
PicaJet | basic editing, multiuser DB | WIN | ~50
Common features:
○ parametric editor, with possibility to use an external editor
○ XMP support (with some issues when exporting/importing keyword hierarchies)
○ some kind of rendition support
○ trial period (typically 30 days)
Multi-user photo management
○ commercial
  ○ Daminion http://daminion.net/
  ○ Canto Cumulus http://www.canto.com/
  ○ Celum http://www.celum.com/
○ open-source
  ○ ZenPhoto (GPL)
  ○ Montala Resource Space (BSD)
  ○ Gallery (GPL)
  ○ Razuna (AGPL)
  ○ NotreDAM (GPL3)
- 6 -
NotreDAM: an open-source DAM platform developed at CRS4
Bibliography
References
1. Jonathan Good - How many photos have ever been taken? - September 15, 2011 - http://blog.1000memories.com/94-number-of-photos-ever-taken-digital-and-analog-in-shoebox
2. Observatoire des Professions de l'Image - Les chiffres officiels 2010 du marché de la photo et de l'image en France et dans le Monde - http://www.sipec.org/pdf/OPI2011.pdf
3. UPDIG Photographers Guidelines v4.0 - Universal Photographic Imaging Guidelines - http://www.updig.org/pdfs/updig_photographers_guidelines_v40.pdf
4. dpBestflow.org Best Practices - http://dpbestflow.org/links/32
5. Maurizio Agelli, Maria Laura Clemente, Mauro Del Rio, Daniela Ghironi, Orlando Murru and Fabrizio Solinas, CRS4 - NotreDAM, a multi-user, web based Digital Asset Management platform - TPDL 2011 Conference on Theory and Practice of Digital Libraries, Berlin - http://notredam.org/wp-content/uploads/2012/02/TPDL2011-notredam-demo.pdf
6. MS Windows Dev center - People tagging Overview - http://msdn.microsoft.com/en-us/library/windows/desktop/ee719905(v=vs.85).aspx#_people_tagging
Metadata Standards
○ Exchangeable image file format for digital still cameras: Exif Version 2.3 http://www.cipa.jp/english/hyoujunka/kikaku/pdf/DC-008-2010_E.pdf
○ IPTC Information Interchange Model (IIM), IIM Schema for XMP, Specification Version 1.0, Document Revision 1, 2008 http://www.iptc.org/std/IIM/4.1/specification/IPTC-IIM-Schema4XMP-1.0-spec_1.pdf
○ XMP Specification http://www.adobe.com/devnet/xmp.html
  ○ Part 1: Data Model, Serialization and Core Properties
  ○ Part 2: Additional Properties
  ○ Part 3: Storage in Files
○ PLUS Technical Specification http://ns.useplus.org/go.ashx
○ PRISM 2.0 Specifications http://www.prismstandard.org/specifications/
Further reading
○ Peter Krogh - The DAM Book, Digital Asset Management for Photographers, 2nd edition - O'Reilly
○ Patti Russotti, Richard Anderson - Digital Photography Best Practices and Workflow - Focal Press
○ Metadata Working Group - Guidelines for Handling Image Metadata - http://www.metadataworkinggroup.org/specs/