txdhc openrefine training
TRANSCRIPT
intro/overview (15 min)
walkthrough (45 min)
intro to advanced (10 min)
q&a (20 min)
http://www.txdhc.org/txdhc-training-webcast-materials/
Cleaning up data that:
- is in a simple tabular format
- is inconsistently formatted
- has inconsistent terminology
- get an overview of a data set
- resolve inconsistencies
- split data up into more granular parts
- match local data up to other data sets
- enhance a data set with data from other sources
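As a rough illustration of what "resolve inconsistencies" can mean in practice, here is a minimal Python sketch of key-based clustering: each value is reduced to a normalized key, and values that share a key are grouped as likely variants of one another. This only approximates the general idea. The function names `fingerprint` and `cluster` and the sample place names are assumptions made for this example, not part of the webinar materials.

```python
import string
from collections import defaultdict


def fingerprint(value: str) -> str:
    """Reduce a value to a normalized key so trivial variants collide."""
    cleaned = value.strip().lower()
    # Remove ASCII punctuation, then split on whitespace.
    cleaned = cleaned.translate(str.maketrans("", "", string.punctuation))
    # Sort and de-duplicate the tokens so word order does not matter.
    tokens = sorted(set(cleaned.split()))
    return " ".join(tokens)


def cluster(values):
    """Group values by fingerprint; keep only groups with more than one variant."""
    groups = defaultdict(set)
    for value in values:
        groups[fingerprint(value)].add(value)
    return {key: sorted(members) for key, members in groups.items() if len(members) > 1}


# Hypothetical sample column with inconsistent formatting.
names = ["Austin, Texas", "texas austin", " Austin Texas ", "Houston, Texas"]
clusters = cluster(names)
print(clusters)
```

Each group that comes back is a candidate for merging into a single preferred spelling. A person should still review the groups, since a key this aggressive can occasionally lump together values that are genuinely different.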
…ask some questions about your data set:
- What type of data is it & what format is it in?
- What’s the size of your data set?
- What question do you want to ask your data?
- What do you need to do to find the answer?
Excel: familiarity, better for data entry, cut-and-paste operations, no paging to navigate
Google Spreadsheets: similar to Excel, can get external data relatively easily, easy to collaborate and share
Google Fusion Tables: if you just want to filter, easy to share
Text editor: a powerful text editor can do many things
Unix tools: more challenging to use, but quick, and some things (finding things, sorting) are easy
Writing code: most sophisticated, and the most to learn!
Regular expressions: “wildcards on steroids” that allow for more granular data manipulation (http://www.regular-expressions.info)
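To make the "wildcards on steroids" idea concrete, here is a small Python sketch that uses one regular expression to pull four-digit years out of inconsistently formatted date strings. The pattern, the `extract_years` helper, and the sample values are all assumptions chosen for illustration. A real data set would likely need a pattern tuned to its own quirks.

```python
import re

# A four-digit year from 1000 to 2099 that is not part of a longer number.
YEAR = re.compile(r"(?<!\d)(1\d{3}|20\d{2})(?!\d)")


def extract_years(text):
    """Return every year-like number found in the text, in order of appearance."""
    return [int(match) for match in YEAR.findall(text)]


# Hypothetical messy date values of the kind found in catalog records.
messy_dates = ["b. 1843, d. 1910", "ca. 1920s", "1865-03-04", "undated"]
for value in messy_dates:
    print(value, "->", extract_years(value))
```

The lookbehind `(?<!\d)` and lookahead `(?!\d)` keep the pattern from matching inside longer runs of digits, which a simple wildcard search cannot do.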
Retrieve data from online sources. Example: use names to retrieve birth/death dates from the Virtual International Authority File (VIAF).
Match data to external data sources using extensions for RDF, DBpedia, Named-Entity Recognition (NER), etc., and ‘reconciliation’ services.
Use the ‘cross’ function to compare the contents of two Refine projects, or share data between the two projects.
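The idea behind a cross-project lookup can be sketched in Python as a keyed join between two tables. This is not OpenRefine's own expression syntax. The `build_index` and `cross` helpers below, and the sample "projects", are hypothetical stand-ins that only illustrate the concept: take a value from one table, find the rows in another table whose key column holds the same value, and copy data across.

```python
from collections import defaultdict


def build_index(rows, key_column):
    """Index a table's rows by the value in one column."""
    index = defaultdict(list)
    for row in rows:
        index[row[key_column]].append(row)
    return index


def cross(value, index):
    """Return all rows in the indexed table whose key matches the value."""
    return index.get(value, [])


# Two hypothetical "projects": one of authors, one of books.
authors = [
    {"name": "Austen, Jane", "born": "1775"},
    {"name": "Dickens, Charles", "born": "1812"},
]
books = [
    {"title": "Emma", "author": "Austen, Jane"},
    {"title": "Bleak House", "author": "Dickens, Charles"},
    {"title": "Beowulf", "author": "Unknown"},
]

author_index = build_index(authors, "name")
for book in books:
    matches = cross(book["author"], author_index)
    # Copy the birth year across when a match exists; otherwise leave it empty.
    book["author_born"] = matches[0]["born"] if matches else None
    print(book)
```

Because the lookup depends on exact matches in the key column, it works best after the inconsistencies in that column have been cleaned up in both projects.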
TxDHC blog post on this webinar http://www.txdhc.org/txdhc-training-webcast-materials/
The OpenRefine Wiki https://github.com/OpenRefine/OpenRefine/wiki
OpenRefine User Documentation https://github.com/OpenRefine/OpenRefine/wiki/Documentation-For-Users
The ‘Free your metadata’ site http://freeyourmetadata.org...
…and book http://book.freeyourmetadata.org
The OpenRefine mailing list and forum http://groups.google.com/d/forum/openrefine
http://bit.ly/1uGPd0f
Please email us if you have any questions: Jennifer = [email protected]
Liz = [email protected]
credits * acknowledgements * citations
These slides were developed by Jennifer Hecker ([email protected]) and Liz Grumbach ([email protected]) on behalf of University of Texas Libraries, Texas A&M’s Initiative for Digital Humanities, Media and Culture, and the Texas Digital Humanities Consortium using many resources including the wonderful course material developed by Owen Stephens on behalf of the British Library (http://www.meanboyfriend.com/overdue_ideas/2014/11/working-with-data-using-openrefine/).
Unless otherwise stated, all images, audio, or video content are separate works with their own licenses, and should not be assumed to be CC-BY in their own right. This work is licensed under a Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/). When crediting this work, it is suggested that you include the phrase “Developed by Liz Grumbach and Jennifer Hecker on behalf of the University of Texas, Texas A&M, and the TXDHC.”
Thanks to University of Texas Libraries, Texas A&M’s Initiative for Digital Humanities, and the Texas Digital Humanities Consortium for facilitating this presentation.