1 lingdy february 14, 2012 tufs, tokyo david nathan endangered languages archive hans rausing...
1
LingDy
February 14, 2012
TUFS, Tokyo
David Nathan
Endangered Languages Archive
Hans Rausing Endangered Languages Project
SOAS, University of London
Data management (part 2)
2
Also (for Part 2)
creating a catalogue/inventory/index
metadocumentation
data/file versions
transferring data
sharing data
backup
character encoding
3
Different types of metadata
there are many types of metadata
different types of materials may have different metadata
  e.g. metadata for photos and videos may have technical parameters, lists of people appearing
  e.g. metadata for transcriptions may have date, version, who transcribed, notes on progress
4
Your collection catalogue
first, define your collection/corpus/project as some coherent (logical) set of materials
your collection catalogue/inventory/index is a type of metadata
  this should list and describe all files in your collection
  it usually contains the categories of information that are relevant for many files
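
A starting catalogue can be generated rather than typed by hand. The following is a minimal sketch in Python, assuming the whole collection sits under one folder; the function name `build_catalogue` and the column names are illustrative choices, not part of the original slides. It deliberately records no file-system dates, and leaves the description column empty to be filled in by hand.

```python
import csv
from pathlib import Path


def build_catalogue(collection_dir, catalogue_path):
    """Write one CSV row per file found under collection_dir (hypothetical helper)."""
    root = Path(collection_dir)
    out = Path(catalogue_path).resolve()
    rows = []
    for path in sorted(root.rglob("*")):
        # Skip folders, and skip the catalogue itself if it lives in the collection.
        if not path.is_file() or path.resolve() == out:
            continue
        rows.append({
            "file": path.relative_to(root).as_posix(),
            "type": path.suffix.lower().lstrip("."),
            "size_bytes": path.stat().st_size,
            "description": "",  # to be written by the researcher
        })
    fields = ["file", "type", "size_bytes", "description"]
    with open(catalogue_path, "w", newline="", encoding="utf-8") as handle:
        writer = csv.DictWriter(handle, fieldnames=fields)
        writer.writeheader()
        writer.writerows(rows)
    return rows
```

The resulting CSV file opens in any spreadsheet program, where further columns (speakers, language, access conditions and so on) can be added as the project requires.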
5
Your collection catalogue
you could have one large catalogue that covers every file, or
you could have a catalogue that is subdivided according to types of files, and/or groups of resources
there is no “one size fits all” solution!
6
Examples
7
Making an “active” catalogue
this is not necessary, but may be useful
if you use a spreadsheet, you can embed links to actual files to make using your collection easier
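
One possible way to embed such links is sketched below in Python. It assumes the spreadsheet program provides a `HYPERLINK` function and evaluates formulas when a CSV file is opened (Excel and LibreOffice Calc generally do, but settings vary); the helper names `hyperlink_cell` and `write_active_catalogue` are hypothetical. Whether a relative path resolves correctly depends on the program and on where the catalogue file is saved.

```python
import csv


def hyperlink_cell(relative_path, label=None):
    """Return a spreadsheet HYPERLINK formula pointing at one file."""
    text = label if label is not None else relative_path
    # Inside a spreadsheet string literal, a double quote is written twice.
    target = relative_path.replace('"', '""')
    shown = text.replace('"', '""')
    return f'=HYPERLINK("{target}","{shown}")'


def write_active_catalogue(entries, catalogue_path):
    """Write (relative_path, description) pairs as a CSV with clickable file cells."""
    with open(catalogue_path, "w", newline="", encoding="utf-8") as handle:
        writer = csv.writer(handle)
        writer.writerow(["file", "description"])
        for relative_path, description in entries:
            writer.writerow([hyperlink_cell(relative_path), description])
```

For example, `write_active_catalogue([("audio/fugu.wav", "elicitation session")], "catalogue.csv")` would produce a two-column catalogue whose first column is meant to be clickable.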
8
Metadocumentation
you should keep an updated description of the methods, conventions and abbreviations you use
... so that somebody could fully understand (and use) your data and methods in your absence
example
9
Data/file versions
whether you need to distinguish or keep versions depends on your purposes
by suffixing filename, e.g.
  fugu1.txt, fugu2.txt, or
  fugu_1.txt, fugu_2.txt
which of the above methods is better?
10
Data/file versions
fugu_14022013.txt
fugu_20130214.txt
14022013_fugu.txt
20130214_fugu.txt
which of the above methods would be best?
note: do not rely on system dates!
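
One way to explore the question is to look at how each pattern sorts in an ordinary file listing. The sketch below is in Python; the extra filenames (1 March 2013 and 20 January 2014) are invented for illustration, and `dated_name` is a hypothetical helper that takes the date explicitly instead of reading it from the file system.

```python
from datetime import date


def dated_name(stem, day, extension=".txt"):
    """Build a filename with a year-month-day stamp, e.g. fugu_20130214.txt."""
    return f"{stem}_{day.strftime('%Y%m%d')}{extension}"


# Three versions written on 14 Feb 2013, 1 Mar 2013 and 20 Jan 2014 (invented dates).
days = [date(2013, 2, 14), date(2013, 3, 1), date(2014, 1, 20)]

day_first = [f"fugu_{d.strftime('%d%m%Y')}.txt" for d in days]
year_first = [dated_name("fugu", d) for d in days]

# Alphabetical order, as a file browser would show it.
print(sorted(day_first))   # day-first stamps are not in chronological order
print(sorted(year_first))  # year-first stamps are in chronological order
```

With year-first stamps, alphabetical order and chronological order coincide, which is not the case for day-first stamps.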
11
Data/file versions
do you need to keep every version?
  often, fine to keep “original” plus current
if information is regularly updated, corrected
  you can keep 1 filename and put dates in the document itself, or
  record dates in a catalogue/metadata file
a series of files may have inherent value, e.g. your transcriptions/annotations, as your understanding and analysis changes
  so date and keep files
  use different tiers in ELAN?
12
Transferring data
ensure your computer is not a “walled garden”
you can use
  drives/devices (but avoid DVDs!!)
  email
  upload (where available)
  send links
  “cloud” e.g. Dropbox
issues include cost, potential viruses, assuring integrity of copies, but generally little problem
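
The integrity of a copy can be checked by comparing checksums of the original and the copy. The following is a minimal sketch in Python using SHA-256 from the standard library; the function names `sha256_of` and `copies_match` are illustrative, and the choice of SHA-256 is an assumption, not something the slides prescribe.

```python
import hashlib


def sha256_of(path, chunk_size=1024 * 1024):
    """Return the SHA-256 checksum of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def copies_match(original, copy):
    """True if both files have identical content, judged by checksum."""
    return sha256_of(original) == sha256_of(copy)
```

Checksums can also be recorded in the catalogue, so that a file received later by email, upload or cloud transfer can be compared against the value noted when it was created.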
13
Sharing
can we work in a shared, collaborative space?
  Dropbox
  Google Docs
  blogs, Tumblr etc can have shared “authors”, and contributors with controlled roles
14
Character encoding
if your document contains any characters other than those on a US keyboard, use a Unicode character encoding such as UTF-8
how can I tell if characters in my MS Word document are encoded as UTF-8?
  save as plain text and check options
  copy into plain text editor such as Notepad++
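
Once a document has been saved as plain text, a quick programmatic check is also possible. The sketch below is in Python, with `is_valid_utf8` as a hypothetical helper name; it tests only whether the bytes of a file can be decoded as UTF-8. A file that passes is not guaranteed to have been intended as UTF-8 (a plain ASCII file passes too), but a file that fails is certainly in some other encoding.

```python
def is_valid_utf8(path):
    """Return True if the file's bytes can be decoded as UTF-8."""
    with open(path, "rb") as handle:
        data = handle.read()
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True
```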
15
Character encoding
useful tools
  Notepad++ http://notepad-plus-plus.org/
  SIL ViewGlyph http://scripts.sil.org/cms/scripts/page.php?item_id=ViewGlyph_home
  BabelMap http://www.babelstone.co.uk/software/babelmap.html
  ExSite9 http://www.intersect.org.au/exsite9
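
For a quick look at which special characters a text actually contains, something similar to a small part of what these tools show can be sketched in Python with the standard `unicodedata` module. The function name `describe_non_ascii` is hypothetical; it lists each distinct character outside the basic ASCII range with its Unicode code point and official name.

```python
import unicodedata


def describe_non_ascii(text):
    """List each distinct non-ASCII character with its code point and Unicode name."""
    seen = []
    for ch in text:
        if ord(ch) > 127 and ch not in seen:
            seen.append(ch)
    return [
        (ch, f"U+{ord(ch):04X}", unicodedata.name(ch, "<unnamed>"))
        for ch in seen
    ]


for entry in describe_non_ascii("\u014ba \u014b\u00e9"):
    print(entry)
```

A listing like this makes it easier to spot characters that look right on screen but come from an unexpected block or a private-use area.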
16
Your projects
discuss in groups
what are the problems or weaknesses in our “data management plan” or data management methods?