sharing and browsing linguistic data emeld arizona: terry langendoen scott farrar
TRANSCRIPT
Sharing and Browsing Linguistic Data
EMELD Arizona:
Terry Langendoen
Scott Farrar
Since Santa Barbara
Focus on morpho-syntax
Decided to build ontology (to be discussed later in this talk)
Decided to build supporting tools
– smart search engine (Hedwig)
– editor
Some work on XML markup
The Problem
Currently there is no general way for researchers in the endangered languages community to electronically share information.
The Web is the most likely tool that could provide a solution.
The current WWW is not adequate.
An Example from the WWW:
Further Complications
What about other data formats?
– lexicons
– grammatical descriptions
– (comparative) word lists
– paradigms
– etc.
Warumungu Description
'Grammatical case suffixes' are those which express grammatical relations (subject, object, indirect object), like /karriny-ji/ in (4). A noun without a case suffix is interpreted as having Absolutive case - /nanttu/ in (4) and /wangarri/ in (5) - or as being the main predicator, or as agreeing with some argument with Absolutive case - /kumppu/ and /pulyurrulyurru/ in (5).
(from J. Simpson 1998)
(4) Karriny-ji +ajjul nyirri-njina nanttu, ngapa-kajji.
    people-ERG +3pl.S put-PAST.CONT humpy, water-LEST
    'The people were erecting humpies for fear of the rain.' [JS:PND:RS]

(5) Nyirri-nyi +ama wangarri kumppu pulyurrulyurru.
    place-PAST.PUN +he rock ABS big.ABS red.ABS
    'He placed a big red hill.' [JS:PND:RS]
Chichewa Description
Other elements that appear as verbal prefixes include modals – for instance, -ngo- 'just, merely' – as well as directional elements -ka- 'go' and -dza- 'come'. These are placed in the immediate pre-OM position, after the tense. This is shown by the following:
(from Mchombo 1998)
(8a) Mkângo s-ú-ná-ngo-wá-phwány-a maûngu . . .
     3-lion NEG-3SM-past-just-6OM-smash-fv 6-pumpkins . . .
     'The lion did not just smash them, the pumpkins . . .'

(8b) Mkângo u-ku-ká-phwány-á máûngu.
     3SM-pres.-go-smash-fv 6-pumpkins
     'The lion is going to smash some pumpkins.'
A Solution
Take advantage of new Web technology
Build a community of practice on the Semantic Web
What is the Semantic Web?
The Semantic Web
New markup: <xml>, <rdf>, <owl>
New tools: smart search engines, ontologies, new editors
Meaning is encoded explicitly.
Pages are interpreted by a reasoner.
An Example from the Semantic Web
New markup adds functionality to existing <html> documents.
Example:
<rdf:Description rdf:about="#A110604">
  <rdf:type rdf:resource="#State" />
  <NS0:name>Tennessee</NS0:name>
</rdf:Description>

<rdf:Description rdf:about="#876555">
  <rdf:type rdf:resource="#Language" />
  <EMELD:name>Navajo</EMELD:name>
</rdf:Description>
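A minimal sketch of how a program could use this markup to tell a state from a language. It wraps the two descriptions above in an rdf:RDF element so that a standard XML parser accepts them. The rdf namespace URI is the standard one; the NS0 and EMELD namespace URIs and the helper name names_of_type are assumptions made for illustration, since the slide does not declare them.

```python
import xml.etree.ElementTree as ET

# The rdf URI is the standard W3C namespace. The NS0 and EMELD URIs are
# hypothetical placeholders: the slide does not say what they are.
RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
DOC = f"""<rdf:RDF xmlns:rdf="{RDF_NS}"
         xmlns:NS0="http://example.org/ns0#"
         xmlns:EMELD="http://example.org/emeld#">
  <rdf:Description rdf:about="#A110604">
    <rdf:type rdf:resource="#State" />
    <NS0:name>Tennessee</NS0:name>
  </rdf:Description>
  <rdf:Description rdf:about="#876555">
    <rdf:type rdf:resource="#Language" />
    <EMELD:name>Navajo</EMELD:name>
  </rdf:Description>
</rdf:RDF>"""


def names_of_type(rdf_xml, type_ref):
    """Return the names of all described resources whose rdf:type is type_ref."""
    root = ET.fromstring(rdf_xml)
    found = []
    for desc in root.findall(f"{{{RDF_NS}}}Description"):
        type_el = desc.find(f"{{{RDF_NS}}}type")
        if type_el is None or type_el.get(f"{{{RDF_NS}}}resource") != type_ref:
            continue
        for child in desc:
            # Accept any child whose local name is "name", in any namespace.
            if child.tag.rsplit("}", 1)[-1] == "name":
                found.append(child.text)
    return found


if __name__ == "__main__":
    print(names_of_type(DOC, "#Language"))
    print(names_of_type(DOC, "#State"))
```

A plain keyword search cannot tell whether "Navajo" on a page names a language, a people, or a place; here the rdf:type element states it explicitly, so the query for "#Language" returns only Navajo.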
Aardvark
nocturnal burrowing mammal of the grasslands of Africa that feeds on termites; sole extant representative of the order Tubulidentata
WordNet for 'aardvark'
Nouns:
1. nocturnal burrowing mammal of the grasslands of Africa that feeds on termites; sole extant representative of the order Tubulidentata
Synonyms: aardvark,ant_bear,anteater,Orycteropus_afer
Verbs:
Adjectives:
Adverbs:
<html>
<head>
<rdf:RDF…
<Word rdf:about="aardvark">
  <hasSense rdf:resource="9385"/>
</Word>
<SynSet rdf:about="9385">
  <type rdf:resource="noun"/>
  <rdfs:comment>nocturnal burrowing mammal of the grasslands of Africa that feeds on termites; sole extant representative of the order Tubulidentata </rdfs:comment>
  <hasElement rdf:resource="aardvark"/>
  <hasElement rdf:resource="ant_bear"/>
  <hasElement rdf:resource="anteater"/>
  <hasElement rdf:resource="Orycteropus_afer"/>
</SynSet>
</rdf:RDF>
</head>
<body>
WordNet for 'aardvark'<br><br>
Nouns:<br><br>
 1. nocturnal burrowing mammal of the grasslands of Africa that feeds on termites; sole extant representative of the order Tubulidentata<br>
 Synonyms: aardvark,ant_bear,anteater,Orycteropus_afer<br><br>
Verbs:<br><br>
Adjectives:<br><br>
Adverbs:<br><br>
</body>
</html>
The Ontology
Crucial component of the Semantic Web
A resource that explicitly defines what entities can exist in a domain, i.e., the endangered languages community
A resource that defines what relations hold between entities
demo
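The two roles of an ontology listed on this slide (defining entities and defining relations between them) can be illustrated with a toy sketch. This is not the EMELD ontology: every class name and relation name below (ErgativeCase, hasCase, and so on) is an assumption chosen for illustration. The two annotated word forms come from the Warumungu example earlier in the talk.

```python
# Toy ontology. All class and relation names are illustrative assumptions,
# not taken from the EMELD ontology.
SUBCLASS_OF = {
    "ErgativeCase": "Case",
    "AbsolutiveCase": "Case",
    "Case": "MorphosyntacticFeature",
    "PastTense": "Tense",
    "Tense": "MorphosyntacticFeature",
}

# Relations between entities, stored as (subject, relation, object) triples.
# The word forms are from the Warumungu example (4).
FACTS = [
    ("Karriny-ji", "hasCase", "ErgativeCase"),
    ("nanttu", "hasCase", "AbsolutiveCase"),
]


def ancestors(concept):
    """Return every superclass of concept, nearest first."""
    chain = []
    while concept in SUBCLASS_OF:
        concept = SUBCLASS_OF[concept]
        chain.append(concept)
    return chain


def is_a(concept, candidate):
    """True if concept is candidate or one of its subclasses."""
    return concept == candidate or candidate in ancestors(concept)


def subjects_with(relation, concept):
    """Find subjects related by `relation` to `concept` or any subclass of it."""
    return [s for (s, r, o) in FACTS if r == relation and is_a(o, concept)]


if __name__ == "__main__":
    print(subjects_with("hasCase", "Case"))
```

Because the class hierarchy is explicit, a query for the general concept Case also finds forms annotated with the more specific Ergative and Absolutive cases. This is a very small instance of the kind of inference a reasoner performs.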
OWL Web Ontology Language
Plays a role analogous to that of <html> on the WWW
The most current “standard” Semantic Web language
Under development at the W3C:
www.w3c.org
Facilitating Tools
Search tools for the Semantic Web
Editors for composing Semantic Web pages
Reasoning engines
An extensible data model
A Search Engine
EMELD Arizona’s prototype (Hedwig)
http://emeld.douglass.arizona.edu:8080/searchindex.html (temporarily out of service)
demo on Sunday
An Editor
EMELD Arizona’s prototype (name?)
demo on Sunday
A Good Data Model for Creating a Community of Practice
Language data should be searchable and comparable, with broad access (centralized).
Authors or communities want control over their data (local/distributed).
Local control should be balanced with data interoperability (Semantic Web).
Centralized Model
Warumungu
Wari
Mocovi
Biao Min
Archi
Hopi
Community
Local Control with Broad Access
Semantic Web
ontology
Wari<xml>
Hopi<xml>
Archi<xml>
Community
tools
tools
Community Requirements
No need to standardize your terminology or abandon tradition.
No need to learn <xml> (it doesn’t hurt!)
Use EMELD tools to put your data on the Semantic Web
Maintain your data
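One way to picture how data can stay in local terminology and still be searched together is a per-community table that links each local gloss label to a shared concept. The sketch below is an assumption about how such a link could work, not a description of the EMELD tools: the concept names (PastTense, ErgativeCase, PresentTense), the table layout, and the function search are all hypothetical. The word forms and gloss labels are taken from the Warumungu and Chichewa examples earlier in the talk.

```python
# Each community keeps its own gloss labels. The shared concept names on the
# right-hand side are hypothetical, chosen only for this illustration.
LOCAL_TO_SHARED = {
    "Warumungu": {
        "PAST.CONT": "PastTense",
        "PAST.PUN": "PastTense",
        "ERG": "ErgativeCase",
    },
    "Chichewa": {
        "past": "PastTense",
        "pres.": "PresentTense",
    },
}

# (word form, hyphen-separated gloss) pairs from the examples in the talk.
GLOSSED_WORDS = {
    "Warumungu": [
        ("Karriny-ji", "people-ERG"),
        ("nyirri-njina", "put-PAST.CONT"),
        ("Nyirri-nyi", "place-PAST.PUN"),
    ],
    "Chichewa": [
        ("s-ú-ná-ngo-wá-phwány-a", "NEG-3SM-past-just-6OM-smash-fv"),
        ("u-ku-ká-phwány-á", "3SM-pres.-go-smash-fv"),
    ],
}


def search(concept):
    """Return (language, word form) pairs whose gloss maps to the shared concept."""
    hits = []
    for language, words in GLOSSED_WORDS.items():
        mapping = LOCAL_TO_SHARED[language]
        for form, gloss in words:
            labels = gloss.split("-")
            if any(mapping.get(label) == concept for label in labels):
                hits.append((language, form))
    return hits


if __name__ == "__main__":
    for language, form in search("PastTense"):
        print(language, form)
```

Neither data set is rewritten: Warumungu keeps PAST.CONT and PAST.PUN, Chichewa keeps past, and only the mapping tables refer to the shared concept. Treating both Warumungu labels as plain PastTense loses the continuous/punctual distinction; a fuller ontology would presumably model it.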
Contact Info
Terry Langendoen
Scott Farrar
[email protected]@[email protected]@u.arizona.edu
See our website:
http://emeld.douglass.arizona.edu:8080