Page 1: Author(s): Jeremy York, 2010 - Open.Michigan · 2016. 10. 5. · HATHI TRUST A Shared Digital Repository Building A Future By Preserving Our Past The Preservation Infrastructure of

Author(s): Jeremy York, 2010 License: Unless otherwise noted, this material is made available under the terms of the Creative Commons Attribution–Noncommercial–Share Alike 3.0 License: http://creativecommons.org/licenses/by-nc-sa/3.0/

We have reviewed this material in accordance with U.S. Copyright Law and have tried to maximize your ability to use, share, and adapt it. The citation key on the following slide provides information about how you may share and adapt this material. Copyright holders of content included in this material should contact [email protected] with any questions, corrections, or clarification regarding the use of content. For more information about how to cite these materials visit http://open.umich.edu/education/about/terms-of-use. Any medical information in this material is intended to inform and educate and is not a tool for self-diagnosis or a replacement for medical evaluation, advice, diagnosis or treatment by a healthcare professional. Please speak to your physician if you have questions about your medical condition. Viewer discretion is advised: Some medical content is graphic and may not be suitable for all viewers.

Page 2:

Citation Key

For more information see: http://open.umich.edu/wiki/CitationPolicy

Use + Share + Adapt

Make Your Own Assessment

Creative Commons – Attribution License

Creative Commons – Attribution Share Alike License

Creative Commons – Attribution Noncommercial License

Creative Commons – Attribution Noncommercial Share Alike License

GNU – Free Documentation License

Creative Commons – Zero Waiver

Public Domain – Ineligible: Works that are ineligible for copyright protection in the U.S. (17 USC § 102(b)) *laws in your jurisdiction may differ

Public Domain – Expired: Works that are no longer protected due to an expired copyright term.

Public Domain – Government: Works that are produced by the U.S. Government. (17 USC § 105)

Public Domain – Self Dedicated: Works that a copyright holder has dedicated to the public domain.

Fair Use: Use of works that is determined to be Fair consistent with the U.S. Copyright Act. (17 USC § 107) *laws in your jurisdiction may differ

Our determination DOES NOT mean that all uses of this 3rd-party content are Fair Uses and we DO NOT guarantee that your use of the content is Fair.

To use this content you should do your own independent analysis to determine whether or not your use will be Fair.

{ Content the copyright holder, author, or law permits you to use, share and adapt. }

{ Content Open.Michigan believes can be used, shared, and adapted because it is ineligible for copyright. }

{ Content Open.Michigan has used under a Fair Use determination. }

Page 3:

HATHI TRUST A Shared Digital Repository

Building A Future By Preserving Our Past

The Preservation Infrastructure of HathiTrust Digital Library

Jeremy York

IFLA 2010

August 15, 2010

Page 4:

Current Partners

– Columbia University
– New York Public Library
– University of California system
– CIC (Committee on Institutional Cooperation)
  – University of Chicago
  – University of Illinois
  – Indiana University
  – University of Iowa
  – University of Michigan
  – Michigan State University
  – University of Minnesota
  – Northwestern University
  – Ohio State University
  – Pennsylvania State University
  – Purdue University
  – University of Wisconsin-Madison
– University of Virginia
– Yale University

Page 5:

Mission

• To contribute to the common good by collecting, organizing, preserving, communicating, and sharing the record of human knowledge

Page 6:

Universal Library 

Common Goal 

Single Entity, Many Partners

HathiTrust 

Page 7:

Goals

• Comprehensive collection
• Preservation…with Access
• Shared strategies
  – Collection management, development
  – Preservation
  – Copyright
  – Efficient user services
• Openness

Page 8:

Content Distribution

6,549,680 – Total volumes
1,300,896 – Public Domain
3,798,116 – Book titles
153,311 – Serial titles

* As of August 13, 2010

Page 9:

Language Distribution (1)

* As of August 13, 2010 

Page 10:

Language Distribution (2)

The next 40 languages make up ~13% of total

* As of August 13, 2010 

Page 11:

Dates 

* As of August 13, 2010 

Page 12:

Content Growth 

Page 13:

Repository Philosophy/Design

• OAIS/TRAC
• Consistency
• Standardization
• Simplicity (in design, not function)
• Practicality
• Sustainability

Page 14:

Page 15:

Content

• Largely uniform in technical characteristics
• 4 formats
  – ITU G4 TIFF
  – JPEG2000
  – JPEG
  – Unicode (with and without coordinates)

Page 16:

Object Package 

[Diagram labels: images, text, bib data, Source METS, HT METS, Zip]

malachus, Flickr.com

Page 17:

Metadata

• Details and specifications at repository level
  – Object specifications / Validation criteria
  – Page-tagging
• Variations at object level
  – Files missing
  – Non-valid files
  – Incorrect file checksums

http://www.hathitrust.org/digital_object_specifications
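Two of the object-level variations listed above (missing files and incorrect checksums) can be detected by comparing a package directory against a manifest. The following is a minimal sketch under stated assumptions: the manifest shape (`{filename: hex digest}`), the function names, and the SHA-256 default are illustrative and are not taken from the HathiTrust specifications. Format validity ("non-valid files") would need a separate format-aware tool and is not covered here.

```python
import hashlib
from pathlib import Path


def file_digest(path: Path, algorithm: str = "sha256") -> str:
    """Return the hex digest of a file, reading it in chunks."""
    digest = hashlib.new(algorithm)
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def check_package(package_dir: Path, manifest: dict,
                  algorithm: str = "sha256") -> dict:
    """Compare files on disk with a manifest of {filename: expected digest}.

    The manifest layout is an assumption made for this sketch.
    """
    problems = {"missing": [], "bad_checksum": []}
    for name, expected in sorted(manifest.items()):
        path = package_dir / name
        if not path.is_file():
            problems["missing"].append(name)
        elif file_digest(path, algorithm) != expected.lower():
            problems["bad_checksum"].append(name)
    return problems
```

A caller would treat a result with two empty lists as a package that passed these two checks.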

Page 18:

Ingest

• Bibliographic Data
  – Must be present prior to content ingest
  – MARCXML, as complete as possible
• Content
  – Pre-ingest
  – Ingest
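The requirement that a MARCXML record be present before content ingest could be checked along these lines. This is a sketch only: it uses the standard MARC 21 "slim" XML namespace and treats the presence of a 245$a title as a stand-in for "record is usable", which is an assumption of this example and not a documented HathiTrust rule. The function name is hypothetical.

```python
import xml.etree.ElementTree as ET
from typing import Optional

# Namespace of the MARC 21 XML ("slim") schema.
MARC_NS = {"marc": "http://www.loc.gov/MARC21/slim"}


def title_from_marcxml(marcxml: str) -> Optional[str]:
    """Return the 245$a title of the first record, or None if there is none."""
    root = ET.fromstring(marcxml)
    # Accept either a bare <record> or a <collection> wrapping records.
    if root.tag.endswith("}record"):
        record = root
    else:
        record = root.find("marc:record", MARC_NS)
    if record is None:
        return None
    subfield = record.find(
        "marc:datafield[@tag='245']/marc:subfield[@code='a']", MARC_NS
    )
    return subfield.text if subfield is not None else None
```

A pre-ingest step might refuse to queue a volume when this returns `None`.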

Page 19:

Ingest (2)

[Workflow diagram labels: Pre-ingest, SIP, Backend servers, GROOVE, Validation, METS creation, Package creation, Handle creation, Bibliographic data]

- Evaluation
- Determination of standards
- Modification / Transformation

- Ensure conformance
- Barcode
- Fixity
- Consistency
- Well-formedness
- Prepare archival package
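The validation step named on this slide can be pictured as a set of named checks run against each submitted package. The sketch below is illustrative only: the package is modelled as a plain dictionary, the keys `barcode` and `mets` are hypothetical, and the two checks shown are simplified stand-ins for the barcode and well-formedness items above. It does not describe how GROOVE is implemented.

```python
import xml.etree.ElementTree as ET
from typing import Callable, Dict, List


def is_well_formed(xml_text: str) -> bool:
    """True if the text parses as XML (well-formedness only, not schema validity)."""
    try:
        ET.fromstring(xml_text)
    except ET.ParseError:
        return False
    return True


def failed_checks(package: dict,
                  checks: Dict[str, Callable[[dict], bool]]) -> List[str]:
    """Run every named check and return the names of those that fail, in order."""
    return [name for name, check in checks.items() if not check(package)]


# Hypothetical checks; the package keys "barcode" and "mets" are illustrative.
CHECKS = {
    "barcode": lambda pkg: bool(pkg.get("barcode")),
    "well-formedness": lambda pkg: is_well_formed(pkg.get("mets", "")),
}
```

Collecting all failures, instead of stopping at the first one, is a design choice of this sketch: it lets a depositor see every problem with a package in one pass.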

Page 20:

Archival Storage

• Reliability – ensure integrity
• Redundancy – in single and multiple sites
• Scalability – including ease of management
• Accessibility – for repository processes and services
• Platform-independence – for data/object management

Page 21:

Media & Architecture 

[Diagram labels: Michigan, Indiana, Tape Backup]

Archival Storage

• Isilon Systems
• Load balancing and failover
• Ingest at Michigan, replicated to Indiana
• Replacement on 3-4 year cycle

Page 22:

Architecture & Management

[Diagram labels: images, text, bib data, Source METS, HT METS]

../uc1/pairtree_root/b3/54/34/86/b34543486

b34543486.zip

b34543486.mets.xml

Example ids:

wu.89094366434
mdp.39015037375253
uc2.ark:/1390/t26973133
miua.aaj0523.1950.001

malachus, Flickr.com
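The directory layout on this slide follows the Pairtree convention: the part of an identifier after the namespace prefix is "cleaned" into filesystem-safe characters and then split into two-character directory segments. The sketch below implements the cleaning and splitting rules as published in the Pairtree specification. The function names are hypothetical, and treating everything before the first `.` as the namespace is an assumption read off the example ids. The segment split it produces follows the specification and may not match the abbreviated path drawn on the slide.

```python
PAIRTREE_ROOT = "pairtree_root"

# Visible ASCII characters that Pairtree hex-encodes as ^hh.
_HEX_ENCODE = set('"*+,<=>?\\^|')
# Single-character substitutions applied after hex-encoding.
_SWAP = str.maketrans("/:.", "=+,")


def clean_id(local_id: str) -> str:
    """Apply Pairtree identifier cleaning to the local part of an id."""
    out = []
    for byte in local_id.encode("utf-8"):
        ch = chr(byte)
        if byte < 0x21 or byte > 0x7E or ch in _HEX_ENCODE:
            out.append(f"^{byte:02x}")
        else:
            out.append(ch)
    return "".join(out).translate(_SWAP)


def object_dir(htid: str) -> str:
    """Map an id such as 'mdp.39015037375253' to a relative object directory."""
    # Assumption: the namespace is everything before the first '.'.
    namespace, local_id = htid.split(".", 1)
    cleaned = clean_id(local_id)
    pairs = [cleaned[i:i + 2] for i in range(0, len(cleaned), 2)]
    return "/".join([namespace, PAIRTREE_ROOT, *pairs, cleaned])
```

Under this layout the zip and METS files shown on the slide would sit inside the returned directory, named after the cleaned local id.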

Page 23:

Data Management

[Diagram labels: Rights Determination, Rights Database, Bibliographic Management System, Copyright Review Management System]

- Inventory
- Loading and updating records
- Duplicate detection and collation
- Solr indexes behind VuFind catalog
- Source of information for Access services
- Rights determination (automated and support for manual review)

Page 24:

Rights Database

• System of precedence
• 9 attributes
• 11 reason codes

Bibliographic (automatic)

Manual
1. Conformance with formalities
2. Contractual agreements
3. Access control overrides
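A system of precedence means that when several determinations exist for one volume, the one from the highest-ranked source wins. The sketch below shows that idea only. The ranking used (automatic bibliographic lowest, then the three manual categories in the order listed) is an assumption read off the slide's ordering, and the source names and the sample attribute values `pd` and `ic` are illustrative, not the actual attribute or reason-code tables.

```python
from dataclasses import dataclass
from typing import List

# Assumed precedence, lowest to highest, taken from the order on the slide.
PRECEDENCE = ["bibliographic", "formalities", "contractual", "access_override"]


@dataclass(frozen=True)
class Determination:
    attribute: str  # illustrative values, e.g. "pd" or "ic"
    source: str     # one of PRECEDENCE


def effective_rights(determinations: List[Determination]) -> Determination:
    """Return the determination whose source ranks highest in PRECEDENCE."""
    if not determinations:
        raise ValueError("no rights determination recorded")
    return max(determinations, key=lambda d: PRECEDENCE.index(d.source))
```

For example, a manual "formalities" finding of `pd` would override an automatic bibliographic `ic` for the same volume under this assumed ranking.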

Page 25:

Access

[Architecture diagram labels: Rights Database; Michigan; Indiana; Data Management; Archival Storage; Tab-delimited Metadata files; Collection Builder Index; Rights Determination; Bibliographic Management; Full text Index; VuFind Index; Bibliographic Catalog; Bibliographic API; OAI sets; Full text Search application; PageTurner; Data API; Collection Builder]

Page 26:

Content Access 

[Same architecture diagram and labels as on Page 25]

Page 27:

Search and Aggregation Access

[Same architecture diagram and labels as on Page 25]

Page 28:

Metadata Access 

[Same architecture diagram and labels as on Page 25]

Page 29:

Source Undetermined

Page 30:

Thank you! hathitrust‐[email protected] 

Page 31:

Additional Source Information

For more information see: http://open.umich.edu/wiki/CitationPolicy

Slide 16, Image 11: malachus, Flickr.com

Slide 22, Image 11: malachus, Flickr.com, http://www.flickr.com/photos/malachus/5152200478/

Slide 29, Image 0: Source Undetermined