POLYGLOT: Multilingual Semantic Role Labeling with Unified Labels
TRANSCRIPT
English Proposition Bank Frames
Generating Training Data for Multilingual SRL
Semantic Parsing of 9 Languages
English, Chinese, Spanish
buy.01
Roles:
A0: buyer (agent)
A1: thing bought (theme)
A2: seller (source)
A3: price paid (asset)
A4: benefactive (beneficiary)
German, Japanese, Russian
Training data generation pipeline
• Optional: Manual aliasing of target-language (TL) verbs to English frames
• Filtered annotation projection (Akbik et al., 2015)
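The projection step in the pipeline above can be illustrated with a minimal Python sketch. It assumes a word alignment between an English sentence and its target-language translation is already available. The function name `project_frame`, the data layout, and the single filter shown (keep a frame only when the aligned target token is tagged as a verb) are illustrative assumptions, not necessarily the filters used in Akbik et al., 2015. The example sentence pair is also made up for illustration.

```python
def project_frame(predicate, roles, alignment, tl_pos, verb_tag="VERB"):
    """Project one English frame onto target-language (TL) tokens.

    predicate: (EN token index, frame id), e.g. (1, "buy.01")
    roles:     dict mapping EN token index -> role label, e.g. {0: "A0"}
    alignment: dict mapping EN token index -> TL token index
    tl_pos:    list of part-of-speech tags for the TL tokens
    Returns a dict with the projected predicate and roles, or None
    when the frame is filtered out.
    """
    en_pred_idx, frame_id = predicate
    tl_pred_idx = alignment.get(en_pred_idx)

    # Hypothetical filter: drop the frame if the predicate is unaligned
    # or if the aligned TL token is not a verb.
    if tl_pred_idx is None or tl_pos[tl_pred_idx] != verb_tag:
        return None

    # Roles attached to unaligned EN tokens are simply skipped.
    tl_roles = {alignment[i]: label for i, label in roles.items() if i in alignment}
    return {"predicate": (tl_pred_idx, frame_id), "roles": tl_roles}


# Illustrative pair: "Mary bought a book" / "Maria kaufte ein Buch"
example = project_frame(
    predicate=(1, "buy.01"),
    roles={0: "A0", 3: "A1"},
    alignment={0: 0, 1: 1, 2: 2, 3: 3},
    tl_pos=["PROPN", "VERB", "DET", "NOUN"],
)
```

Filtering at the predicate level trades coverage for precision: a frame that fails the check contributes no target-language labels at all, rather than contributing partial or noisy ones.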
like.01
Roles:
A0: liker (experiencer)
A1: object of affection (theme)

give.01
Roles:
A0: giver (agent)
A1: thing given (theme)
A2: entity given to (recipient)

sell.01
Roles:
A0: seller (agent)
A1: thing sold (theme)
A2: buyer (recipient)
A3: price paid
A4: benefactive
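The frame listings on this poster can be held in a small lookup structure. The following Python sketch is one possible representation. The `Frame` class and the `role_of` helper are hypothetical names introduced for illustration, and thematic roles that the poster does not state are left as `None` rather than guessed.

```python
from dataclasses import dataclass, field


@dataclass
class Frame:
    """One frame: an identifier such as "buy.01" plus its numbered roles."""
    frame_id: str
    # role label -> (description, thematic role or None if not stated)
    roles: dict = field(default_factory=dict)


FRAMES = {
    frame.frame_id: frame
    for frame in [
        Frame("buy.01", {
            "A0": ("buyer", "agent"),
            "A1": ("thing bought", "theme"),
            "A2": ("seller", "source"),
            "A3": ("price paid", "asset"),
            "A4": ("benefactive", "beneficiary"),
        }),
        Frame("like.01", {
            "A0": ("liker", "experiencer"),
            "A1": ("object of affection", "theme"),
        }),
        Frame("give.01", {
            "A0": ("giver", "agent"),
            "A1": ("thing given", "theme"),
            "A2": ("entity given to", "recipient"),
        }),
        Frame("sell.01", {
            "A0": ("seller", "agent"),
            "A1": ("thing sold", "theme"),
            "A2": ("buyer", "recipient"),
            "A3": ("price paid", None),
            "A4": ("benefactive", None),
        }),
    ]
}


def role_of(frame_id, label):
    """Return (description, thematic role) for a role label, or None if absent."""
    frame = FRAMES.get(frame_id)
    return frame.roles.get(label) if frame else None
```

Note that the role numbering is frame-specific: the buyer is A0 in buy.01 but A2 in sell.01, so a role label is only meaningful together with its frame.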
Challenges and open questions
• Source-language SRL errors
• Coverage: Do appropriate English frames exist for all TL verbs?
• pouvoir (to be able to), sollen (to be supposed to)
• Crowdsourced data curation (Akbik et al., 2016)
• Design of crowdsourcing task
Alan Akbik and Yunyao Li
IBM Research - Almaden
POLYGLOT
Multilingual Semantic Role Labeling with Unified Labels
Idea: Use English Proposition Bank Frames and Roles as universal semantic labels
annehmen.01 (accept)
Roles:
A0: acceptor (agent)
A1: thing accepted (theme)
A2: accepted-from (source)
A3: attribute (attribute)

annehmen.02 (assume)
Roles:
A0: thinker (agent)
A1: thought (theme)
A2: attributive (source)
Predicate: annehmen
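Because one target-language verb can correspond to more than one English verb, a lookup from predicate to senses is a natural way to store such entries. The Python sketch below encodes the annehmen example from the poster. The table layout and the helper names `senses` and `english_verbs` are assumptions made for illustration, and the English side is recorded only as the verb gloss shown on the poster, not as a specific English frame id.

```python
# Target-language predicate -> list of senses, each tied to an English verb.
TL_FRAMES = {
    "annehmen": [
        {
            "id": "annehmen.01",
            "english": "accept",
            "roles": {
                "A0": "acceptor",
                "A1": "thing accepted",
                "A2": "accepted-from",
                "A3": "attribute",
            },
        },
        {
            "id": "annehmen.02",
            "english": "assume",
            "roles": {
                "A0": "thinker",
                "A1": "thought",
                "A2": "attributive",
            },
        },
    ],
}


def senses(verb):
    """Return the sense ids recorded for a target-language verb."""
    return [entry["id"] for entry in TL_FRAMES.get(verb, [])]


def english_verbs(verb):
    """Return the English verbs that the senses of a TL verb correspond to."""
    return [entry["english"] for entry in TL_FRAMES.get(verb, [])]
```

A verb with no entry returns an empty list, which corresponds to the coverage question raised elsewhere on the poster: some target-language verbs may have no appropriate English frame.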
Example Target Language Frame
Annotation Projection
English, German, Chinese, French, Japanese, Spanish, Russian, Hindi and Arabic
[Figure: Annotation projection. Diagram labels: unlabeled corpus; parallel corpus (EN, TL); EN semantic labels (predicates + roles); annotation projection; TL semantic labels (projected).]
[Figure: Future work: Crowdsourced and expert data curation. Diagram labels: input; crowdsourced data curation; "Crowd agrees?" (yes: TL semantic labels, curated, final; no: TL semantic labels, crowd cannot curate); expert data curation.]
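The "Crowd agrees?" decision in the curation diagram can be sketched as a simple routing function in Python. The poster does not say how agreement is measured, so the majority-share rule, the `min_agreement` threshold of 0.8, and the function name `route_label` are all assumptions chosen for illustration.

```python
from collections import Counter


def route_label(votes, min_agreement=0.8):
    """Decide what happens to one projected label after crowd review.

    votes: list of labels chosen by individual crowd workers.
    Returns ("final", label) when the crowd agrees, otherwise
    ("expert", None) to hand the item over to expert curation.
    The agreement rule and threshold are hypothetical.
    """
    if not votes:
        return ("expert", None)  # nothing to agree on

    label, count = Counter(votes).most_common(1)[0]
    if count / len(votes) >= min_agreement:
        return ("final", label)  # crowd agrees: label is curated and final
    return ("expert", None)      # crowd cannot curate: escalate
```

For example, five identical votes yield a final label, while an evenly split vote is escalated to an expert.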
Multilingual aliases
Generating High Quality Proposition Banks for Multilingual Semantic Role Labeling. Alan Akbik, Laura Chiticariu, Marina Danilevsky, Yunyao Li, Shivakumar Vaithyanathan and Huaiyu Zhu. ACL 2015.
Towards Semi-Automatic Generation of Proposition Banks for Low-Resource Languages. Alan Akbik and Yunyao Li. EMNLP 2016.
Evaluation