Poio API and GrAF-XML @ Balisage 2013


Poio API and GrAF-XML

A radical stand-off approach in language documentation and language typology

Jonathan Blumtritt, Cologne Center for eHumanities, University of Cologne

Peter Bouda, Centro Interdisciplinar de Documentação Linguística e Social

Felix Rau, Department of Linguistics, University of Cologne

Overview

Existing infrastructure and workflows

CLARIN

Annotation graphs

GrAF and Poio API

Example: Elan EAF to GrAF-XML

CLASS

Fieldwork

Photos

Existing Infrastructure

LD tools and standards

Elan: EAF, MPEG, WAV

Toolbox: TXT, XML, WAV

Arbil: IMDI/CMDI (Component MetaData Infrastructure)

Praat: XML, WAV

...

No standards for tier hierarchies, tier names or annotation schemes

Efforts in ISOcat

CLARIN

European initiative within the European Research Infrastructure Consortium: Common Language Resources and Technology Infrastructure (CLARIN)

aims at providing easy and sustainable access for scholars in the humanities and social sciences to digital language data

Started in 2006, part of a roadmap process, timeline currently ending 2020

CLARIN-D: working groups in Germany

Curation projects for different research areas in linguistics

Annotation Graphs

the underlying data model for linguistic annotations

pivot structure for linguistic data

time vs. byte offsets

not hierarchical (but trees are also graphs)

stand-off annotation

"It is important to recognize that translation into AGs does not magically create compatibility among systems whose semantics are different." [Bird & Liberman 2001]

AGs visualized
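The annotation graph model described above can be sketched in a few lines of Python. This is only an illustrative reading of the model, not code from Poio API or from Bird & Liberman: the class names (`Anchor`, `Arc`, `AnnotationGraph`), the `spanning` helper and the toy offsets are all assumptions made for the example.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass(frozen=True)
class Anchor:
    """A graph node; its offset into the primary data may be unknown."""
    name: str
    offset: Optional[float] = None  # seconds for audio, characters/bytes for text


@dataclass(frozen=True)
class Arc:
    """A labeled arc between two anchors; the label carries the annotation."""
    start: Anchor
    end: Anchor
    tier: str
    label: str


@dataclass
class AnnotationGraph:
    arcs: list = field(default_factory=list)

    def add(self, start, end, tier, label):
        self.arcs.append(Arc(start, end, tier, label))

    def spanning(self, offset):
        """Return all arcs whose anchored span covers the given offset."""
        return [
            a for a in self.arcs
            if a.start.offset is not None
            and a.end.offset is not None
            and a.start.offset <= offset < a.end.offset
        ]


# Toy data (invented): one utterance split into two words.
n0, n1, n2 = Anchor("n0", 0.0), Anchor("n1", 0.4), Anchor("n2", 1.1)
ag = AnnotationGraph()
ag.add(n0, n2, "utterance", "so what")
ag.add(n0, n1, "word", "so")
ag.add(n1, n2, "word", "what")

labels = [(a.tier, a.label) for a in ag.spanning(0.5)]
print(labels)
```

The annotations never touch the primary data; they only point into it by offset, which is what makes the approach stand-off.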

GrAF

GrAF: Graph Annotation Framework

ISO 24612: Language resource management - Linguistic annotation framework (LAF)

Started as stand-off version of XCES

API and representation as data structures, not a file format

GrAF/XML as XML representation

Used for the MASC of the ANC

Nodes, edges, regions, annotations, feature structures
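A rough sketch of how the five GrAF entities relate to each other, written as plain Python data classes. The class and field names are chosen for this illustration and are not the API of the existing Java or Python GrAF libraries; feature structures are reduced to a flat dictionary.

```python
from dataclasses import dataclass, field


@dataclass
class Region:
    """A span of the primary data, identified by its anchors."""
    id: str
    anchors: tuple  # e.g. (start, end) offsets


@dataclass
class Annotation:
    """A label plus a (here: flat) feature structure."""
    id: str
    label: str
    features: dict = field(default_factory=dict)


@dataclass
class Node:
    """A node may link to regions and carry annotations."""
    id: str
    regions: list = field(default_factory=list)
    annotations: list = field(default_factory=list)


@dataclass
class Edge:
    id: str
    source: str  # id of the source node
    target: str  # id of the target node


@dataclass
class Graph:
    nodes: dict = field(default_factory=dict)
    edges: list = field(default_factory=list)

    def add_node(self, node):
        self.nodes[node.id] = node
        return node

    def add_edge(self, edge):
        self.edges.append(edge)
        return edge

    def children(self, node_id):
        """Nodes reachable from node_id over one outgoing edge."""
        return [self.nodes[e.target] for e in self.edges if e.source == node_id]


# Toy data (invented): an utterance node with one dependent word node.
g = Graph()
g.add_node(Node("n1", [Region("r1", (0, 7))],
                [Annotation("a1", "utterance", {"annotation_value": "so what"})]))
g.add_node(Node("n2", [Region("r2", (0, 2))],
                [Annotation("a2", "word", {"annotation_value": "so"})]))
g.add_edge(Edge("e1", "n1", "n2"))

child_labels = [a.label for n in g.children("n1") for a in n.annotations]
print(child_labels)
```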

TEI and GrAF

Schemata for GrAF created with TEI Roma

Customized version of TEI P5 schema

ODD: One Document Does it all

GrAF is not TEI compliant

Share data types and feature structures of annotations

TEI has stand-off variant, uses XPointer/XLink

Primary data has to be XML

Why we use GrAF

Because it's new! :-)

No inline markup

Radical stand-off approach

Easier to share and manage data

Preferred solution to archive cultural heritage

Ideal for sparse annotations

Existing code: Java and Python

The beauty of annotation graphs

Poio API

Think of GrAF as an assembly language for linguistic annotation; then Poio API is a library to map from and to higher-level languages

Subset of GrAF to represent tier based annotation

Filters and filter chains for search

Plugin mechanism for file formats

Mapping semantics: tiers and annotations to nodes and edges

Meta-data for additional information (tier types etc.)
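The mapping idea (annotations become nodes, parent-child links between annotations become edges) and a filter chain on top of it can be sketched as follows. This is a simplified illustration under assumed data shapes, not Poio API's actual implementation: `tiers_to_graph`, `apply_filters` and the tuple layout of the input are hypothetical.

```python
def tiers_to_graph(tiers):
    """Map tier-based annotations to nodes and edges.

    tiers: dict of tier name -> list of (annotation_id, value, parent_id);
    parent_id is None for annotations on a root tier.
    """
    nodes, edges = {}, []
    for tier_name, annotations in tiers.items():
        for ann_id, value, parent_id in annotations:
            nodes[ann_id] = {"tier": tier_name, "value": value}
            if parent_id is not None:
                edges.append((parent_id, ann_id))
    return nodes, edges


def apply_filters(nodes, filters):
    """Apply predicates one after another; a node must pass all of them."""
    result = dict(nodes)
    for keep in filters:
        result = {k: v for k, v in result.items() if keep(v)}
    return result


# Toy data (invented): one utterance with two dependent words.
tiers = {
    "utterance": [("u1", "so what", None)],
    "words": [("w1", "so", "u1"), ("w2", "what", "u1")],
}
nodes, edges = tiers_to_graph(tiers)

hits = apply_filters(nodes, [
    lambda n: n["tier"] == "words",
    lambda n: n["value"].startswith("s"),
])
print(edges)
print(hits)
```

Chaining small predicates keeps each search criterion independent of the file format the data came from.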

Example: Mapping of EAF to GrAF-XML

Elan EAF

so [...]

GrAF entities

GrAF structure

GrAF-XML

so
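For orientation, a heavily simplified GrAF-XML fragment can be built with Python's standard library. The element and attribute names (`region`, `node`, `link`, `a`, `fs`, `f`) follow the GrAF schema as far as we can tell, but the header is omitted, and the ids, the anchors and the `annotation_value` feature name are invented for this sketch.

```python
import xml.etree.ElementTree as ET

GRAF_NS = "http://www.xces.org/ns/GrAF/1.0/"
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"  # serialized as xml:id

# Root element; the namespace is declared by hand to keep the sketch short.
graph = ET.Element("graph", {"xmlns": GRAF_NS})

# A region of the primary data (anchors are invented offsets).
ET.SubElement(graph, "region", {XML_ID: "r1", "anchors": "780 1340"})

# A node that points at the region.
node = ET.SubElement(graph, "node", {XML_ID: "n1"})
ET.SubElement(node, "link", {"targets": "r1"})

# An annotation attached to the node, with a one-feature feature structure.
ann = ET.SubElement(graph, "a", {XML_ID: "a1", "label": "utterance", "ref": "n1"})
fs = ET.SubElement(ann, "fs")
ET.SubElement(fs, "f", {"name": "annotation_value"}).text = "so"

xml_text = ET.tostring(graph, encoding="unicode")
print(xml_text)
```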

Tier hierarchies

[
    ['utterance..K-Spch'],
    ['utterance..W-Spch',
        ['words..W-Words',
            ['part_of_speech..W-POS']],
        ['phonetic_transcription..W-IPA']],
    ['gestures..W-RGU',
        ['gesture_phases..W-RGph',
            ['gesture_meaning..W-RGMe']]],
    ['gestures..K-RGU',
        ['gesture_phases..K-RGph',
            ['gesture_meaning..K-RGMe']]]
]
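A nested list like the one above can be walked recursively to recover the parent-child relations between tiers. The helper names are hypothetical, and reading each entry as "tier type, two dots, tier name" is our assumption based on the strings themselves.

```python
def parent_child_pairs(tree, parent=None):
    """Flatten one tier tree [name, child_tree, ...] into (parent, child) pairs."""
    name, *children = tree
    pairs = [(parent, name)]
    for child in children:
        pairs.extend(parent_child_pairs(child, name))
    return pairs


def split_tier_id(tier_id):
    """Split 'type..name' into (type, name); assumes '..' is the separator."""
    tier_type, tier_name = tier_id.split("..", 1)
    return tier_type, tier_name


# One of the trees from the hierarchy above.
tree = ['utterance..W-Spch',
        ['words..W-Words', ['part_of_speech..W-POS']],
        ['phonetic_transcription..W-IPA']]

pairs = parent_child_pairs(tree)
for parent, child in pairs:
    print(parent, "->", child)
```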

The code

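import poioapi.annotationgraph
import poioapi.io.graf

# Annotation graph object (created on the slide, not used in the conversion below)
ag = poioapi.annotationgraph.AnnotationGraph()

# Set up the conversion: Elan EAF parser in, GrAF-XML writer out
parser = poioapi.io.ElanParser("example.eaf")
writer = poioapi.io.graf.Writer()
converter = poioapi.io.graf.GrAFConverter(parser, writer)

# Parse the EAF file, then write the GrAF header file
converter.parse()
converter.write("example.hdr")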

Analysis workflows

Graph-based methods

Pipe to scientific Python libraries

GrAF connectors for major linguistic workflow tools (GATE and Apache UIMA)

Example: Polysemy in dictionaries

Example: Counting word orders

CLASS

Thank you for your attention!


Links

CLARIN curation project: http://de.clarin.eu/en/discipline-specific-working-groups/wg-3-linguistic-fieldwork-anthropology-language-typology/curation-project-1.html

Poio API: http://media.cidles.eu/poio/poio-api/

GrAF: http://www.xces.org/ns/GrAF/1.0/

CLASS: http://class.uni-koeln.de