HTRC Workshop 101
THATCamp Gainesville, April 24, 2014
Outline
• HathiTrust and HathiTrust Research Center overview
• How to Use the HTRC Portal
  – Workset Builder
  – Algorithm Analysis
• Opportunities to connect you with the HathiTrust Research Center
HathiTrust “Wow” Numbers
• 11,135,776 total volumes
• 5,801,121 book titles
• 290,893 serial titles
• 3,897,521,600 pages
• 499 terabytes
• 132 miles
• 9,048 tons
• Public Domain: 3,743,574 volumes (~34% of total)
http://www.hathitrust.org
Content Distribution
Dates
Language Distribution
The top 10 languages make up ~86% of all content
[Diagram: HathiTrust organization. Board of Governors, Executive Committee, Executive Director; HathiTrust Digital Library (90+ partners); HathiTrust Research Center (University of Illinois and Indiana University); Data Copy #1 (University of Michigan); Data Copy #2 (Indiana University).]
HathiTrust Collection Builder
HTRC Portal
www.hathitrust.org/htrc
Log in to HTRC Portal
Create a Log In
How To Start a Workset
Log In Again to Workset Builder
Workset Builder
Why Worksets?
• The result of a first-level, rough filter
• Better scale for intensive analytics
• Provides essential scope for certain analytics
  – e.g., word frequencies scoped to Bacon’s essays
• Some tools work best (or are trained to work best) on a narrow, homogeneous workset
• Eliminates noise that would otherwise arise from asking questions across the whole of HathiTrust
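To illustrate the "essential scope" point with word frequencies, here is a minimal Python sketch. It assumes a toy in-memory corpus with hypothetical volume IDs (`vol_a`, `vol_b`, `vol_c`) and treats a workset as nothing more than a chosen subset of those IDs. It is not the HTRC Portal's word-frequency algorithm or API; it only shows how restricting the input to a workset changes what the counts describe.

```python
import re
from collections import Counter

# Hypothetical toy corpus: volume ID -> full text (not real HathiTrust data).
corpus = {
    "vol_a": "Of Truth. What is truth? said jesting Pilate.",
    "vol_b": "Of Death. Men fear death, as children fear to go in the dark.",
    "vol_c": "A treatise on steam engines and boilers.",
}

def word_frequencies(texts):
    """Count lowercase alphabetic word tokens across an iterable of texts."""
    counts = Counter()
    for text in texts:
        counts.update(re.findall(r"[a-z]+", text.lower()))
    return counts

# Hypothetical workset: only the two essay-like volumes.
workset = ["vol_a", "vol_b"]

# Frequencies scoped to the workset vs. computed over the whole corpus.
essay_counts = word_frequencies(corpus[v] for v in workset)
all_counts = word_frequencies(corpus.values())

print(essay_counts.most_common(3))
print("steam" in essay_counts, all_counts["steam"])
```

Vocabulary from volumes outside the workset (here, "steam") never enters the scoped counts, which is the noise reduction the slide describes.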
Workset Search
Select Items
Create Worksets
Analysis in the HTRC Portal
Choose Algorithm
Choose Collection(s) for Analysis
Run the Analysis…
Results!
View Results
Looking into the future
• Non-consumptive research on copyrighted texts
• Bookworm tool development: http://sandbox.htrc.illinois.edu/bookworm/
• Improvement of metadata through Workset Creation for Scholarly Analysis (WCSA) study
• Documentation and user guides forthcoming
Acknowledgements: HTRC Team
• HTRC @ Illinois (GSLIS and the University Library):
Stephen Downie, Tim Cole, Loretta Auvil, Sayan Bhattacharyya, Boris Capitanu, Colleen Fallaw, Katrina Fenlon, Harriett Green, Peter Organisciak, Megan Senseney, Craig Willis
• Indiana University: led by Beth Plale
Get Involved!
HTRC Announcements: htrc-announce-l@list.indiana.edu
HTRC User Group: htrc-usergroup-l@list.indiana.edu
Questions?
Harriett Green, English and Digital Humanities Librarian
University of Illinois at Urbana-Champaign