Polyglot: Multilingual Semantic Role Labeling with Unified Labels


Upload: yunyao-li

Post on 08-Apr-2017



English Proposition Bank Frames

Generating Training Data for Multilingual SRL

Semantic Parsing of 9 Languages

English Chinese Spanish

buy.01
Roles:
A0: buyer (agent)
A1: thing bought (theme)
A2: seller (source)
A3: price paid (asset)
A4: benefactive (beneficiary)

German Japanese Russian

Training data generation pipeline
• Optional: Manual aliasing of TL verbs to English frames
• Filtered annotation projection (Akbik et al., 2015)

like.01
Roles:
A0: liker (experiencer)
A1: object of affection (theme)

give.01
Roles:
A0: giver (agent)
A1: thing given (theme)
A2: entity given to (recipient)

sell.01
Roles:
A0: seller (agent)
A1: thing sold (theme)
A2: buyer (recipient)
A3: price paid
A4: benefactive
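The frame listings above map naturally onto a small lookup structure. The following is a minimal Python sketch, assuming a hypothetical `Frame` class and `FRAMES` table (neither name comes from the poster); the role descriptions are copied from the frames shown.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class Frame:
    """One Proposition Bank style frame: a predicate sense and its roles."""
    frame_id: str            # e.g. "buy.01"
    roles: Dict[str, str]    # role label -> description as shown on the poster


# Hypothetical in-memory table holding two of the frames listed on the poster.
FRAMES = {
    frame.frame_id: frame
    for frame in [
        Frame("buy.01", {
            "A0": "buyer (agent)",
            "A1": "thing bought (theme)",
            "A2": "seller (source)",
            "A3": "price paid (asset)",
            "A4": "benefactive (beneficiary)",
        }),
        Frame("sell.01", {
            "A0": "seller (agent)",
            "A1": "thing sold (theme)",
            "A2": "buyer (recipient)",
            "A3": "price paid",
            "A4": "benefactive",
        }),
    ]
}


def describe(frame_id: str, role: str) -> str:
    """Return the description of one role of one frame."""
    return FRAMES[frame_id].roles[role]
```

The same role label means different things in different frames: `describe("buy.01", "A2")` is the seller, while `describe("sell.01", "A2")` is the buyer, which is why a role is only interpretable together with its frame.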

Challenges and open questions
• Source-language SRL errors
• Coverage: Do appropriate English frames exist for all TL verbs?
• pouvoir (to be able to), sollen (to be supposed to)
• Crowdsourced data curation (Akbik et al., 2016)
• Design of crowdsourcing task

Alan Akbik and Yunyao Li

IBM Research - Almaden

POLYGLOT

Multilingual Semantic Role Labeling with Unified Labels

Idea: Use English Proposition Bank Frames and Roles as universal semantic labels

annehmen.01 (accept)
Roles:
A0: acceptor (agent)
A1: thing accepted (theme)
A2: accepted-from (source)
A3: attribute (attribute)

annehmen.02 (assume)
Roles:
A0: thinker (agent)
A1: thought (theme)
A2: attributive (source)

Predicate: annehmen
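The aliasing of target-language (TL) verb senses to English verbs, and the coverage question raised under the open challenges, can be sketched as follows. This is an illustrative Python sketch only: the `ALIASES` table, the function names, and the frame id `sollen.01` are hypothetical, and the poster gives only the English glosses "accept" and "assume", not full English frame ids.

```python
from typing import Dict, List, Optional

# Hypothetical alias table: German frame -> English verb it is aliased to.
# Only the two senses of "annehmen" shown on the poster are included.
ALIASES: Dict[str, str] = {
    "annehmen.01": "accept",
    "annehmen.02": "assume",
}


def alias_for(tl_frame: str) -> Optional[str]:
    """Return the English verb a TL frame is aliased to, or None if unaliased."""
    return ALIASES.get(tl_frame)


def coverage(tl_frames: List[str], aliases: Dict[str, str]) -> float:
    """Fraction of TL frames that have some English alias."""
    if not tl_frames:
        return 0.0
    covered = sum(1 for frame in tl_frames if frame in aliases)
    return covered / len(tl_frames)


# "sollen.01" is an invented id, treated as unaliased purely for illustration;
# the poster only asks whether a suitable English frame exists for such verbs.
share = coverage(["annehmen.01", "annehmen.02", "sollen.01"], ALIASES)
```

A coverage value below 1.0 flags TL verbs that would need manual aliasing or a decision on whether any English frame fits.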

Example Target Language Frame

Annotation Projection

English, German, Chinese, French, Japanese, Spanish, Russian, Hindi and Arabic

[Figure: Annotation projection over a parallel corpus. Diagram labels: EN with semantic labels (predicates + roles); TL unlabeled corpus; TL with semantic labels (projected).]
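The projection step can be illustrated with a small Python sketch. It assumes word alignments are already available as an EN-index to TL-index mapping, and it applies one simple filter (the aligned TL predicate must be tagged as a verb) chosen here for illustration; it is not a restatement of the filters used in Akbik et al. (2015). The function name, the data layout, and the German example sentence are all assumptions.

```python
from typing import Dict, List, Optional


def project_frame(
    pred_idx: int,
    frame_id: str,
    role_heads: Dict[int, str],
    alignment: Dict[int, int],
    tl_pos: List[str],
) -> Optional[dict]:
    """Project one labeled English frame onto an aligned TL sentence.

    pred_idx    index of the English predicate token
    frame_id    English frame label, e.g. "buy.01"
    role_heads  English token index -> role label (e.g. {0: "A0"})
    alignment   English token index -> TL token index
    tl_pos      part-of-speech tag for each TL token
    Returns the projected frame, or None if the filter rejects it.
    """
    tl_pred = alignment.get(pred_idx)
    # Illustrative filter: discard the frame unless the predicate is aligned
    # to a TL token tagged as a verb.
    if tl_pred is None or tl_pos[tl_pred] != "VERB":
        return None
    # Carry each role label over to the aligned TL token; unaligned roles drop.
    tl_roles = {
        alignment[en_idx]: role
        for en_idx, role in role_heads.items()
        if en_idx in alignment
    }
    return {"predicate": (tl_pred, frame_id), "roles": tl_roles}


# EN: "Mary bought a book"   TL (German): "Maria kaufte ein Buch"
projected = project_frame(
    pred_idx=1,
    frame_id="buy.01",
    role_heads={0: "A0", 3: "A1"},
    alignment={0: 0, 1: 1, 2: 2, 3: 3},
    tl_pos=["PROPN", "VERB", "DET", "NOUN"],
)
```

Rejecting a whole frame when the predicate fails the filter trades recall for precision, which matters because projected labels become training data.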

Future work: Crowdsourced and expert data curation

[Figure: Curation flowchart. Diagram labels: Input; Crowdsourced data curation; Crowd agrees? (yes / no); TL semantic labels (curated, final); TL semantic labels (crowd cannot curate); Expert data curation.]
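The routing in the curation flowchart (crowd first, experts only when the crowd does not agree) can be sketched in Python as below. The agreement rule is an assumption: the poster does not say how agreement is measured, so this sketch uses a hypothetical `min_agreement` threshold on the share of workers choosing the most common answer.

```python
from collections import Counter
from typing import List, Optional, Tuple


def route(
    crowd_answers: List[str], min_agreement: float = 0.8
) -> Tuple[str, Optional[str]]:
    """Decide where one projected label goes after crowdsourced curation.

    crowd_answers  label chosen by each crowd worker for the same item
    min_agreement  assumed threshold; not taken from the poster
    Returns ("final", label) if the crowd agrees, else ("expert", None).
    """
    if not crowd_answers:
        return ("expert", None)
    label, count = Counter(crowd_answers).most_common(1)[0]
    if count / len(crowd_answers) >= min_agreement:
        return ("final", label)       # curated, final
    return ("expert", None)           # crowd cannot curate


unanimous = route(["A1", "A1", "A1", "A1", "A1"])
split = route(["A1", "A2", "A1", "A0", "A2"])
```

Here `unanimous` is routed to the final set with label `A1`, while `split` is passed on to expert curation.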

Multilingual aliases

Generating High Quality Proposition Banks for Multilingual Semantic Role Labeling. Alan Akbik, Laura Chiticariu, Marina Danilevsky, Yunyao Li, Shivakumar Vaithyanathan and Huaiyu Zhu. ACL 2015.

Towards Semi-Automatic Generation of Proposition Banks for Low-Resource Languages. Alan Akbik and Yunyao Li. EMNLP 2016.

Evaluation