Post on 13-Apr-2017

FrameNet development for Latvian

Normunds Grūzītis, Guntis Bārzdiņš

University of Latvia, Institute of Mathematics and Computer Science
National information agency LETA

2nd International FrameNet Workshop, Juiz de Fora, Brazil, 8-9 October 2016

Latvian
• Member of the Baltic language group
• Official language of the European Union
• Around 2M speakers
• Typically classified as an under-resourced language
  The situation is rapidly improving in several directions of NLP:
  o Automatic speech recognition
  o Machine translation
  o Natural language understanding
  o Natural language generation

Latvian FrameNet: a pilot
• Application/domain-specific (LETA)
  Facilitates the semi-automatic information extraction process for media monitoring needs
  o For populating and updating profiles of public persons and organizations

• Covers 26 Berkeley FrameNet frames: Being_born, Being_employed, Change_of_leadership, Earnings_and_losses, Education_teaching, Hiring, Personal_relationship, Residence, Win_prize, etc.

• Nearly 5000 annotated sentences

FrameNet ontology: LETA frames

FrameNet annotations on top of dependency heads
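The slide title says that frame element (FE) annotations are attached to dependency heads rather than to token spans. One way to read this, sketched here under our own assumptions: each FE points to a single head token, and its surface span is recovered as that token's dependency subtree. The `Token`/`FrameAnnotation` classes, the `subtree`/`fe_span` helpers and the example parse are hypothetical illustrations, not the actual LETA data format.

```python
from dataclasses import dataclass, field


@dataclass
class Token:
    idx: int   # 1-based position in the sentence
    form: str
    head: int  # index of the syntactic head; 0 = root


@dataclass
class FrameAnnotation:
    frame: str
    target: int  # head token of the frame-evoking word
    # FE name -> index of the FE's dependency head token
    elements: dict[str, int] = field(default_factory=dict)


def subtree(tokens: list[Token], head: int) -> list[int]:
    """Indices of all tokens dominated by `head` (inclusive), in sentence order."""
    children: dict[int, list[int]] = {}
    for t in tokens:
        children.setdefault(t.head, []).append(t.idx)
    result, stack = [], [head]
    while stack:
        i = stack.pop()
        result.append(i)
        stack.extend(children.get(i, []))
    return sorted(result)


def fe_span(tokens: list[Token], ann: FrameAnnotation, fe: str) -> str:
    """Expand an FE annotated on a single head token to its full text span."""
    by_idx = {t.idx: t.form for t in tokens}
    return " ".join(by_idx[i] for i in subtree(tokens, ann.elements[fe]))


# Hypothetical example: "Jānis strādā lielā uzņēmumā." ("Jānis works at a large company.")
tokens = [Token(1, "Jānis", 2), Token(2, "strādā", 0),
          Token(3, "lielā", 4), Token(4, "uzņēmumā", 2)]
ann = FrameAnnotation("Being_employed", target=2,
                      elements={"Employee": 1, "Employer": 4})
print(fe_span(tokens, ann, "Employer"))  # lielā uzņēmumā
```

Storing only heads keeps the semantic layer small and lets span boundaries follow whatever the (manually verified) syntactic layer says.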

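Accuracy of automatic SRL

| Parser  | Year | Dataset | Frame ID P | Frame ID R | Frame ID F1 | FE ID P | FE ID R | FE ID F1 |
|---------|------|---------|-----------:|-----------:|------------:|--------:|--------:|---------:|
| C6.0    | 2014 | LETA    | 63.5       | 62.7       | 63.1        | 65.9    | 76.8    | 70.9     |
| C6.0    | 2014 | BFN 1.3 | 77.1       | 53.7       | 63.3        | 47.3    | 47.0    | 47.1     |
| SEMAFOR | 2014 | BFN 1.3 | 69.7       | 54.9       | 61.4        | 58.1    | 38.8    | 46.5     |
| LTH     | 2007 | BFN 1.3 | 68.9       | 53.6       | 60.3        | 51.6    | 35.4    | 42.0     |

(Frame ID = frame identification, FE ID = frame element identification; P = precision, R = recall)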

http://c60.ailab.lv
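F1 in the accuracy table is presumably the usual balanced F-measure, F1 = 2PR / (P + R). The short check below recomputes it from the listed precision and recall; on these numbers it agrees with every reported F1 to one decimal place. The `f1` helper and the `REPORTED` dictionary are our own illustration.

```python
def f1(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall, both given in percent."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (parser, dataset, task) -> (P, R, reported F1), copied from the accuracy table
REPORTED = {
    ("C6.0", "LETA", "frame"): (63.5, 62.7, 63.1),
    ("C6.0", "LETA", "FE"): (65.9, 76.8, 70.9),
    ("C6.0", "BFN 1.3", "frame"): (77.1, 53.7, 63.3),
    ("C6.0", "BFN 1.3", "FE"): (47.3, 47.0, 47.1),
    ("SEMAFOR", "BFN 1.3", "frame"): (69.7, 54.9, 61.4),
    ("SEMAFOR", "BFN 1.3", "FE"): (58.1, 38.8, 46.5),
    ("LTH", "BFN 1.3", "frame"): (68.9, 53.6, 60.3),
    ("LTH", "BFN 1.3", "FE"): (51.6, 35.4, 42.0),
}

for key, (p, r, reported) in REPORTED.items():
    # Computed F1 (rounded to one decimal) next to the reported value
    print(key, round(f1(p, r), 1), reported)
```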

Exhaustive search binary classifier

Used to parse the entire LETA news archive (12M articles)
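The slide only states that C6.0 uses an exhaustive-search binary classifier. A minimal sketch of that general idea, under our own assumptions: for a given frame target, every (FE, candidate head token) pair is enumerated and a binary classifier decides yes or no for each pair independently. The function name, the candidate space and the toy classifier are hypothetical; the actual C6.0 features and learning algorithm are not described here.

```python
from itertools import product
from typing import Callable, Iterable


def exhaustive_fe_search(
    fe_names: Iterable[str],
    token_indices: Iterable[int],
    target: int,
    classify: Callable[[str, int, int], bool],
) -> list[tuple[str, int]]:
    """Ask a binary classifier about every (FE, candidate head token) pair.

    Every token except the frame target is tried as the head of every FE;
    the pairs the classifier accepts are returned in enumeration order.
    """
    candidates = [i for i in token_indices if i != target]
    return [(fe, i) for fe, i in product(fe_names, candidates)
            if classify(fe, i, target)]


# Hypothetical stand-in for a trained model: accepts a fixed set of pairs.
GOLD = {("Employee", 1), ("Employer", 4)}


def toy_classifier(fe: str, head: int, target: int) -> bool:
    return (fe, head) in GOLD


print(exhaustive_fe_search(["Employee", "Employer", "Position"],
                           range(1, 5), target=2, classify=toy_classifier))
# [('Employee', 1), ('Employer', 4)]
```

The cost is one classifier call per FE per token for each target, which is cheap at sentence level and parallelizes trivially across a large archive. A real system would also need to resolve conflicts (for example, two FEs accepted on the same head); this sketch does not.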

LETA IE and KB population system
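The pilot's stated goal is populating and updating profiles of public persons and organizations from extracted frames; the slides do not show how that mapping is done. A minimal sketch under our own assumptions: for each frame, one FE names the profile owner and some other FEs become profile attributes. `PROFILE_RULES`, `update_profiles` and the extracted instances are hypothetical.

```python
from collections import defaultdict

# Hypothetical rules: for each frame, the FE that names the profile owner
# and the FEs whose values are stored as profile attributes.
PROFILE_RULES = {
    "Being_employed": ("Employee", ["Employer", "Position"]),
    "Residence": ("Resident", ["Location"]),
}


def update_profiles(instances, profiles=None):
    """Fold extracted frame instances into per-entity profiles.

    Each instance is a dict {"frame": str, "elements": {FE name: text}}.
    Profiles map entity -> attribute ("Frame.FE") -> set of observed values.
    Pass the result of a previous call as `profiles` to keep accumulating.
    """
    if profiles is None:
        profiles = defaultdict(lambda: defaultdict(set))
    for inst in instances:
        rule = PROFILE_RULES.get(inst["frame"])
        if rule is None:
            continue  # frame not used for profiles
        owner_fe, attribute_fes = rule
        owner = inst["elements"].get(owner_fe)
        if owner is None:
            continue  # no profile owner found in this instance
        for fe in attribute_fes:
            value = inst["elements"].get(fe)
            if value is not None:
                profiles[owner][f"{inst['frame']}.{fe}"].add(value)
    return profiles


# Fictitious extracted instances
extracted = [
    {"frame": "Being_employed",
     "elements": {"Employee": "Anna Kalniņa", "Employer": "Acme",
                  "Position": "CEO"}},
    {"frame": "Residence",
     "elements": {"Resident": "Anna Kalniņa", "Location": "Rīga"}},
    {"frame": "Win_prize", "elements": {"Competitor": "Anna Kalniņa"}},
]
profiles = update_profiles(extracted)
print({attr: sorted(vals) for attr, vals in profiles["Anna Kalniņa"].items()})
```

Keeping sets of values (rather than overwriting) leaves room for a semi-automatic step in which a human confirms or discards conflicting facts.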

Scalable Understanding of Multilingual MediA

Discover trends, emerging events, crucial new stories

H2020 grant No. 688139

Event-based summarization

Storyline highlights across a set of related articles

Multilingual / Cross-lingual apps

Full stack of language resources for NLU and NLG [in Latvian]

GF (Grammatical Framework) for implementing multilingual frames and constructions
• FrameNet – semantic abstraction
  BFN frames are reused across languages
  Representation of valence patterns varies a lot
  FNs as such are semi-formal/computational
• GF – syntactic abstraction
  Grammar formalism and resource grammar library
  Towards a computational implementation of FNs
  o In some aspects; for multilingual NLG
  o Unified method to compare valence patterns across FNs
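Comparing valence patterns across FrameNets becomes straightforward once both sides use one shared inventory of realization labels; the slide attributes that unification to GF. The sketch below simply assumes such a shared label set already exists and compares two patterns of the same frame FE by FE. The `compare_valence` function, the label names and both patterns are hypothetical illustrations, not entries from Berkeley FrameNet or Latvian FrameNet.

```python
def compare_valence(a: dict[str, str], b: dict[str, str]) -> dict[str, list]:
    """Compare two valence patterns of the same frame, FE by FE.

    Both patterns map FE name -> realization label and are assumed to use
    one shared label inventory.
    """
    shared = sorted(a.keys() & b.keys())
    return {
        "same": [fe for fe in shared if a[fe] == b[fe]],
        "different": [(fe, a[fe], b[fe]) for fe in shared if a[fe] != b[fe]],
        "only_a": sorted(a.keys() - b.keys()),
        "only_b": sorted(b.keys() - a.keys()),
    }


# Hypothetical Being_employed patterns:
# "X works for Y as Z" vs. "X strādā Y(locative) par Z"
english = {"Employee": "Subj", "Employer": "PP[for]", "Position": "PP[as]"}
latvian = {"Employee": "Subj", "Employer": "NP.Loc", "Position": "PP[par]"}
print(compare_valence(english, latvian))
```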

Latvian FrameNet++
• Integrated: a part of a multi-layered corpus
• Balanced
  We anticipate that the corpus will represent at least 2000 common verbs, with at least 10 examples for each of the 1000 most common verbs
• Manually verified at all layers
  Instead of adding, e.g., the syntactic layer afterwards with an error-prone probabilistic parser
• Computationally oriented
• Accessible (open data)
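The balance targets above can be checked mechanically. A minimal sketch, assuming per-lemma counts of annotated examples and a frequency ranking of verb lemmas (hypothetically taken from a larger reference corpus); "represent" is read here simply as "has at least one annotated example". The function name and toy data are our own.

```python
def balance_report(verb_counts: dict[str, int], frequency_rank: list[str],
                   min_verbs: int = 2000, top_n: int = 1000,
                   min_examples: int = 10) -> dict:
    """Check the corpus-balance targets.

    verb_counts: annotated examples per verb lemma.
    frequency_rank: verb lemmas ordered from most to least frequent.
    """
    distinct = sum(1 for c in verb_counts.values() if c > 0)
    # Top-N verbs that still have fewer than `min_examples` annotated examples
    below = [v for v in frequency_rank[:top_n]
             if verb_counts.get(v, 0) < min_examples]
    return {
        "distinct_verbs": distinct,
        "enough_verbs": distinct >= min_verbs,
        "top_verbs_below_target": below,
        "ok": distinct >= min_verbs and not below,
    }


# Toy run with scaled-down thresholds (3 verbs, top 2, 10 examples each)
report = balance_report({"būt": 12, "strādāt": 9, "dzimt": 4},
                        ["būt", "strādāt", "dzimt"],
                        min_verbs=3, top_n=2)
print(report)
```

Listing the under-covered top verbs, rather than returning a bare yes/no, tells annotators exactly which lemmas still need examples.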
