EMELD Resource Conversion WG 20030713

Toward Best Practice for Language Resource Conversion

EMELD 2003 Working Group on Resource Conversion

Working Group

Baden Hughes, Chilin Shih (co-chairs)

Helen Aristar-Dry, Steven Bird, Reinhard Hiss, Will Lewis, Barbara Need, Steven Weinberger

Objectives

Consider the methodology for, and make recommendations about, the conversion of legacy (possibly non-digital) language resources into enduring best practice (BP) formats

Examine ongoing conversion processes and identify issues in the conversion of digital language resources in working contexts

Methodology

Focus on high-level principles which pervade general language resource conversion problems rather than format-specific resource conversion issues

Acceptance that appropriate technical expertise probably already exists “somewhere” but needs to be adapted to the EMELD context

Subject Matter

Content and Structure
• Metadata
• Text
• Audio
• Video
• Still Images

Physical Media

Hardware / Software

Core Values

Bird & Simons (2003) “Seven Dimensions …”: content, format, discovery and preservation

Motivation to ensure persistence and longevity of archive-quality digital objects

Principles …1

Ignorance is not bliss!

Not every user needs to be a technical expert, but each should be assisted to understand their context and functional requirements and to access sufficient information to make an informed choice

Conversion issues will affect institutions and individuals at many levels, particularly in terms of the resources available to address them

Principles …2

Conversion and Archiving
• The best available copy should be archived according to BP
• Format neutrality in respect to use involves effort but is essential to ensure long-term viability
• Archiving practice will imply resource conversion for preservation purposes
• Consistency in conversion methodology is inherently better than random variation

Principles …3

Conversion and Re-Use
• Requirements for re-use vary between agents and purposes
• Inherent in most (all?) conversion processes is some degree of information loss, thus the absolute minimum number of format conversions should be undertaken
• Where possible, converted materials should include information about their digital lineage
• Additional information pertaining to the language resource may be located separately from the resource itself and needs to be preserved
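
As an illustration of the lineage and related-information points above, the following is a minimal sketch of a lineage record that could travel with a converted resource. It is not an EMELD format: the class names, field names, and sample values (`LineageRecord`, `ConversionStep`, `wordlist-001`, the dates and tools) are all hypothetical and chosen only for this example.

```python
import json
from dataclasses import dataclass, field, asdict
from typing import List


@dataclass
class ConversionStep:
    """One format conversion applied to a resource (hypothetical fields)."""
    source_format: str
    target_format: str
    tool: str
    date: str             # ISO 8601 date of the conversion
    known_loss: str = ""  # free-text note on information lost, if any


@dataclass
class LineageRecord:
    """Digital lineage of a resource, stored alongside the resource itself."""
    resource_id: str
    original_medium: str
    steps: List[ConversionStep] = field(default_factory=list)
    # Material held separately from the resource that must also be preserved.
    related_files: List[str] = field(default_factory=list)

    def add_step(self, step: ConversionStep) -> None:
        # Each step must start from the format the previous step produced.
        if self.steps and self.steps[-1].target_format != step.source_format:
            raise ValueError("conversion chain is not contiguous")
        self.steps.append(step)

    def conversion_count(self) -> int:
        # Fewer conversions means fewer chances for information loss.
        return len(self.steps)

    def to_json(self) -> str:
        return json.dumps(asdict(self), indent=2, sort_keys=True)


# Hypothetical usage: a paper wordlist scanned, then transcribed.
record = LineageRecord(
    resource_id="wordlist-001",
    original_medium="paper field notebook",
    related_files=["wordlist-001-fieldnotes.txt"],
)
record.add_step(ConversionStep("paper", "TIFF", "flatbed scanner", "2003-07-13"))
record.add_step(ConversionStep(
    "TIFF", "XML", "manual transcription", "2003-07-14",
    known_loss="marginal sketches not transcribed",
))
print(record.conversion_count())
print(record.to_json())
```

Keeping the record as plain JSON next to the resource is one possible design choice: it stays readable without special software and makes the number of conversions, and any known loss at each step, visible to later users.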

A Pragmatic Approach to BP …1

The lineage of digital language resources may have included processes which fall short of optimal practice

BP may not realistically be achievable in all contexts (constraints such as time, money, equipment, expertise, inclination …)

Some practices have inherently higher potential to cause conversion and archiving issues

Significant incentives need to be offered to induce change in language data management practices towards BP. Would you prefer to choose BP, or be forced to adopt BP when you lose data?

A Pragmatic Approach to BP …2

Software choice will affect the longevity of language resource data.

Ideological debates about software development methodologies are often misleading when considering longevity and preservation

Absolute ranking of practice on a scale of worst to best is not transparent (context sensitive, moving target …)

Ongoing Work Items …1

Identify and review core documents on BP formats, including accessible recommendations for different audiences

Identify and review software tools which enable conversion according to BP principles (this is not necessarily a democratic system!)

Develop accessible case studies of typical language resource conversion problems, critique them and provide advice on how to achieve BP in these contexts

Ongoing Work Items …2

Examine how physical media choices can affect the retention or loss of information, and the implications for the language resource conversion process

Promulgate resource conversion as a pervasive issue to be considered in many other BP contexts

Observations Relevant to Other Working Groups

Resource Archiving
• Good archiving practice will consider resource conversion as a fundamental issue
• Infrastructural constraints may significantly increase the risk of information loss

Resource Creation
• BP at the data collection point reduces the risk of information loss in any conversion process
• Conversion implications need to be considered when selecting an appropriate tool for the data and functionality types required

Observations Relevant to EMELD

EMELD needs to consider the longevity and persistence implications of its ongoing archiving functions, particularly with reference to the “long term”; this may include adequate financial resourcing

Logistical Recommendations

Creation of Communities of Expertise within the EMELD framework to advise on working group topics (cf. Ask-A-Linguist), including experts from outside linguistics

Creation of Working Group email lists for ongoing work in these areas

User reviews and solutions section for tools and processes within the EMELD School site