1
Workshop Goals: DELAMAN and DAM-LR
Peter Wittenburg, MPI for Psycholinguistics
Access Management, Nijmegen, November 2004
2
When did we start?
• it is just 5 years since we started speaking in our discipline about
– large digital online collections
– standardizing the formats
  • XML was new and users were very skeptical
  • MPEG was, and still is, not well understood
– open metadata to arrive at browsable and searchable domains
– using metadata to create well-organized archives
– interoperability
• LREC Athens 2000
– first workshop on these issues
– start of the ISLE project (linguistic concepts, lexicon, metadata, …)
– start of the IMDI work
• in 2000 also the first LDC workshop with OLAC as focus
• a little later DOBES was granted and E-Meld started
• this is a very short time when you want to convince a community
3
What did we achieve?
• we have “large” online digital archives/collections/digital libraries
– MPI: ~40,000 session bundles / ~10 TB
– DOBES: ~1,500 session bundles / 1,500 h
– AILLA
– PARADISEC
– Lund corpora
– also in the HLT domain
  • LDC
  • ELRA
  • BAS
– also “traditional” archives (Phonogramm Archiv, NAA, …)
– etc
• some of us became “archivists” by practice
• the idea of web visibility and online accessibility is spreading
• despite archiving attempts: according to D. Schüller, ~80% of the digitized material is endangered
4
What did we achieve?
• much evangelization and agreement about standards
– DOBES workshops and documents
– LDC workshops and documents
– E-Meld workshops and excellent web-site
– ISLE workshops with IMDI result
– PARADISEC workshop with DELAMAN result
– HRELP workshops
– LREC workshops and contributions
– ACL workshops and contributions
– IASA/IAML conference
– etc
• “everyone” agrees with XML, UNICODE and linear PCM
• “everyone” understands the relevance of schemas to make linguistic structure and encoding explicit
• with respect to JPEG and MPEG we are shooting at a moving target, but don’t yet have real alternatives
5
What did we achieve?
• created awareness about the need for metadata for visibility
• created operational metadata infrastructures within 4 years
– structured IMDI for discovery and management
– OLAC for overall discovery
– gateways between the two domains
• however, the situation is still not satisfying
– > 50 institutions are using IMDI (as far as we know)
– ?? institutions are providing OLAC records
– still only a small fraction of the language resources are visible
– metadata (MD) creation is hard
  • it is seen as work for others, although this is increasingly often wrong
  • it means cleaning up your own holdings and figuring out what is available
  • it means writing “correct” scripts and learning new software
  • it means being disciplined
• we have done our development job and have to continue dissemination
• despite limitations we hope that people stick to what is out there
6
What did we achieve?
• interoperability is still a dream, however …
– have metadata gateways in our discipline (OLAC-IMDI)
– increasingly often tools are producing correct XML, UNICODE, …
– have filters for character encodings and formats, although we miss well-designed and comprehensive services
– have started with ontological work to tackle the linguistic aspects
  • GOLD ontology from E-Meld
  • ISO TC37/SC4 Data Category Registry
  • TDS (Dutch Typology Project) meta-language
  • EAGLES/ISLE/TEI specifications
• we are at the beginning
• cannot yet speak about fully operational infrastructures, but there are islands like FIELD, LEXUS, ONTO-ELAN, …
7
Changing role of Language Archives
[Diagram: different groups of people contribute to The Archive; different groups of people use the content; specialists maintain, unify, check quality, etc.]
• at the MPI it is understood that the archive is the capital to build on
• in the DOBES programme the point is to make results explicit and accessible
• this only works if we don’t have an “inert, dusty” archive
– not an attractive perspective
– hear more about this from D. Schüller
8
Vision for a single archive
[Diagram: architecture of a single archive (“The Archive”), with components marked as done, in progress, or to start. Labels: User; Metadata Tools; Archive Utility Layer; Domain of Registered Primary and Secondary Resources; Domain of Descriptive Metadata; Primary Resources (Texts, Images, Sound, Movies); Data Ingestion & Management; User Authentication; Access Rights; Web-based Archive Exploration; Annotation Exploration; Lexicon Exploration; Text Exploration; Ontological Knowledge; Media Annotation; (Web-based) Archive Enrichment; Lexical Encoding; Web Commentary]
9
Everything ok – so let’s go home …
• what about the following scenario?
[Diagram: archive A and archive B, each holding raw data and metadata, with data exchange between them for data survival reasons]
10
Everything ok – so let’s go home …
• what about the following scenario?
[Diagram: the DOBES archive and the AILLA archive each hold raw data and metadata; “my personal Trumai archive” draws on both the AILLA Trumai and the DOBES Trumai material; these are not just copies but the result of one’s own creative process]
11
DELAMAN
Digital Endangered Languages and Music Archive Network
• loose network of “archives” sharing a set of visions such as
– want to exchange data automatically (list driven)
– want to allow people to create integrated virtual working spaces
– want to have an integrated access management domain
• first talks in Nijmegen and at HRELP workshops 2003
• foundation at PARADISEC meeting in Sydney 2003
• no deep discussions yet about wishes in detail and implementation
• therefore this workshop in Nijmegen
• it’s about future usage scenarios with distributed archives
12
DELAMAN / DAM-LR Map
• DELAMAN is an international network
• DAM-LR
– Distributed Access Management for Language Resources
– 3-year EU project starting on 1.1.05; yes, we have money to start
– centered around the DELAMAN intentions
[Map of archives: MPI, AILLA, EMELD, ANLC, LACITO, ELAR, PARADISEC, AMPM, Lund, INL, AIATSIS]
13
Workshop
• want to get a deeper understanding of what “we” want
• need good requirements specifications
• want to get a deeper understanding of what others are doing
– our ideas are not new – we share them with others
– Digital Library initiatives (FEDORA, …)
– GRID initiative(s) (SRB, GTK, …)
– compute/function/data GRID
• therefore we invited
– linguists knowing about potential and real user wishes
– “archivists” knowing about maintaining large repositories
– technologists knowing about current and future developments
– some of us looked into the legal and ethical aspects
• at the end we should be ready to start
14
Programme, Day 1
29.11. Setting the Framework
9.00   W. Klein                         Welcome
9.10   P. Wittenburg                    DELAMAN and Workshop Goals
9.40   D. Schüller                      Audiovisual archiving: Visions, Challenges, Strategies
10.15  Discussion
10.30  Coffee Break
Researcher Requirements
Kamp
11.00  T. Aristar/H. Dry                Linguist Wishes
11.30  P. Austin/D. Nathan              Linguist Wishes
12.00  G. Holton/H. Johnson             Legal & Ethical Aspects
12.30  Lunch Break
Archivist Requirements
Strömquist
13.30  H. Johnson                       AILLA Setup and Implications
14.00  L. Barwick                       Paradisec Setup and Implications
14.30  Wittenburg/Skiba/Trilsbeek       DOBES Setup and Implications
15.00  Coffee Break
Summary and Discussion
Strömquist
15.30  Uneson/Broeder/Strömquist        Summary of Requirements
16.00  Questions and Discussion
17.00  W. Krull                         DOBES Program and the VW Foundation
17.15  Soddemann/Neumair/Verharen/Wbg   Technology - Broad View
17.30
18.00  End
20.00  Joint Dinner at Kwok Paw
15
Programme, Day 2
30.11. Technology Components
Nathan
9.00   T. Soddemann (got the Billing Award)   Web Services
9.40   D. Barry                               GRID Components
10.10  B. Kerver                              Authentication and Authorization Systems
11.00  Coffee Break
Nathan
11.30  L. Lannom                              Handle System
12.00  R. Moore                               Storage Resource Broker
12.45  Discussion
13.00  Lunch Break
Mapping Requirements and Technology
Aristar/Broeder
14.00  Aristar/Dry/Johnson/Barwick/…          Understanding Technology Linguists/Archivists
14.15  Broeder/Nathan/Jacobson/Neumair/...    Choice and Integration Aspects
14.30  Discussion
15.00  Coffee Break
15.30  Grand Summary and Open Discussion
16.00  Wittenburg                             Summary
16.30  Discussion
17.00  End
times not too strict – it’s a workshop
16
Let’s go …
The MPI team wishes us two interesting and highly interactive days in Nijmegen
Daan, Andreas: Technology
Paul, Roman: Archive
Peter: ??