The Hindi-Urdu Treebank
Lecture 7: 7/29/2011
Multi-representational, Multi-layered treebank
• Traditional approach:
  – Syntactic treebank: PS or DS, but not both
  – Layers are added one-by-one
• Our approach:
  – Syntactic treebank: both DS and PS
  – DS, PS, and PB are developed at the same time
  – Automatic conversion from DS+PB to PS
• Why?
  – DS and PS are both useful
  – Annotating them together allows us to maintain “consistency” and reduce annotation time
The team
• DS team: IIIT
• PB team: Univ. of Colorado at Boulder
• PS team: UMass, Columbia Univ.
• Conversion: Univ. of Washington
• Biweekly conference calls
• Group meetings every six months
Outline
• Overview of the treebank
• Three Representations
  – Dependency
  – Proposition Bank
  – Phrase Structure
• Conversion
Dependency structure (DS) and Phrase Structure (PS)
• DS: all nodes are labeled with words or empty strings
• PS: leaf nodes are labeled with words or empty strings, internal nodes are labeled with non-terminal symbols (special alphabet)
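The contrast between the two node-labeling schemes can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the class names `DSNode` and `PSNode`, the arc labels "subj"/"obj", and the non-terminals "S", "NP", "VP" are invented for the sketch and are not the treebank's own inventory.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class DSNode:
    """Dependency node: every node carries a word (or an empty string)."""
    word: str
    relation: Optional[str] = None  # label of the arc to the head; None for the root
    children: List["DSNode"] = field(default_factory=list)


@dataclass
class PSNode:
    """Phrase-structure node: leaves carry words, internal nodes carry non-terminals."""
    label: str  # a word on a leaf, a non-terminal symbol otherwise
    children: List["PSNode"] = field(default_factory=list)

    def is_leaf(self) -> bool:
        return not self.children


def ds_words(node: DSNode) -> List[str]:
    """Collect the words on all DS nodes (head first, then its dependents)."""
    words = [node.word]
    for child in node.children:
        words.extend(ds_words(child))
    return words


def ps_leaves(node: PSNode) -> List[str]:
    """Collect the words on the PS leaves, left to right."""
    if node.is_leaf():
        return [node.label]
    words: List[str] = []
    for child in node.children:
        words.extend(ps_leaves(child))
    return words


# Toy English example; all labels here are illustrative assumptions.
ds = DSNode("read", children=[DSNode("Atif", "subj"), DSNode("book", "obj")])
ps = PSNode("S", [PSNode("NP", [PSNode("Atif")]),
                  PSNode("VP", [PSNode("read"), PSNode("book")])])

print(ds_words(ds))
print(ps_leaves(ps))
```

In the DS object every node is a word; in the PS object the words appear only at the leaves, and the extra nodes ("S", "NP", "VP") come from the non-terminal alphabet.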
Information in PS and DS

| | PS (e.g., PTB) | DS (some target DS) |
|---|---|---|
| POS tag | yes | yes |
| Function tag (e.g., -SBJ) | yes | yes |
| Syntactic tag | yes | no |
| Empty category and co-indexation | Often yes | Often no |
| Allowing crossing | Often no | Often yes |
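The "allowing crossing" row can be made concrete with a small check for crossing arcs in a dependency tree. This is a minimal sketch, assuming a head-index encoding (one 1-based head position per word, 0 for the root) chosen for this example; the function name `has_crossing_arcs` is hypothetical, and the arc from an artificial root is ignored.

```python
from typing import List, Tuple


def has_crossing_arcs(heads: List[int]) -> bool:
    """Return True if any two dependency arcs cross.

    heads[i] is the 1-based position of the head of word i+1; 0 marks the root.
    """
    arcs: List[Tuple[int, int]] = []
    for dep, head in enumerate(heads, start=1):
        if head != 0:
            arcs.append((min(dep, head), max(dep, head)))
    for i, (a, b) in enumerate(arcs):
        for c, d in arcs[i + 1:]:
            # Two arcs cross when exactly one endpoint of one arc
            # lies strictly inside the span of the other.
            if a < c < b < d or c < a < d < b:
                return True
    return False


# Three words, the second is the root and heads the other two: no crossing.
print(has_crossing_arcs([2, 0, 2]))
# Four words: word 1 attaches to word 3 and word 2 attaches to word 4, so the arcs cross.
print(has_crossing_arcs([3, 4, 0, 3]))
```

A PS tree over the same words cannot draw the second configuration directly; representations of this kind typically fall back on empty categories and co-indexation, which matches the row above it in the table.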
Motivation 1: Two Representations
• Both phrase-structure treebanks and dependency treebanks are used in NLP
  – Collins/Charniak/Bikel parser for PS
  – CoNLL task on dependency parsing
• Problem: currently few (no?) treebanks have both PS and DS that are independently motivated
Our project: build a treebank for Hindi/Urdu in which PS and DS are linguistically motivated from the outset
  – Dependency: Paninian grammar (Panini 400 BC)
  – Phrase structure: variant of Minimalism (Chomsky 1995)
Motivation 2: Two Content Levels
• Everyone (?) wants syntax
• Recent popularity of PropBank (Palmer et al 2002): lexical predicate-argument structure; “semantics as surfacy as it gets”
• Recent experience: PropBank may inform some treebanking decisions
Our project: build treebank with all levels from the outset
Goals
• Hindi/Urdu Treebank:
  – DS, PB, and PS for
    • 400K-word Hindi
    • 150K-word Urdu
  – Unified annotation guidelines
  – Frame files for PropBank
• Better understanding of DS=>PS conversion
Outline
• Overview of the project
• Three Representations
  – Dependency
  – Proposition Bank
  – Phrase Structure
• Conversion
Hindi Paninian Framework (Dipti Sharma, Hyderabad)
There are 6 main karakas (karaka relations):
• karta (k1): Activity of the verb resides in karta.
• karma (k2): Result of the verb resides in karma.
• karana (k3): Instrument helping in achieving the activity of the verb is karana.
• sampradaan (k4): Receiver of the action is sampradaan.
• apaadan (k5): Point of separation from which an entity has moved away in an action is apaadan.
• adhikaran (k7): Place (k7p) or time (k7t) where the action is located.
Full Set of Relations
Sample Paninian Analysis
Basic Clause Structure
अतिफ़ ने किताब को पढ़ा
Atif ne kitaab ko paRhaa
Atif Erg book Acc read.Pfv
Atif read the book
Basic Clause Structure: DS
पढ़ा (read)
  k1 → अतिफ़-ने (Atif)
  k2 → किताब-को (book)
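The dependency analysis of this clause can be written down as labeled arcs. The sketch below is a hedged illustration: the (dependent, relation, head) triple layout, the romanized token forms, and the helper `describe` are assumptions made for this example, not the treebank's actual storage format. The karaka names are the six listed on the Paninian framework slide.

```python
# The six main karaka labels from the Paninian framework slide.
KARAKA_LABELS = {
    "k1": "karta",
    "k2": "karma",
    "k3": "karana",
    "k4": "sampradaan",
    "k5": "apaadan",
    "k7": "adhikaran",
}

# "Atif ne kitaab ko paRhaa" as (dependent, relation, head) triples.
# The triple layout is an assumption made for this sketch.
ds_arcs = [
    ("Atif-ne", "k1", "paRhaa"),
    ("kitaab-ko", "k2", "paRhaa"),
]


def describe(arcs):
    """Render each arc with its karaka name spelled out."""
    return [f"{dep} --{rel} ({KARAKA_LABELS[rel]})--> {head}"
            for dep, rel, head in arcs]


for line in describe(ds_arcs):
    print(line)
```

Both arguments attach directly to the verb, so the tree has depth one: the verb is the root and the case-marked nominals are its only dependents.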
Outline
• Overview of the project
• Three Representations
  – Dependency
  – Proposition Bank
  – Phrase Structure
• Conversion
PropBank: Lexical Semantic Annotation
• Dependency annotation on top of DS
  – PropBank is a dependency representation, but the arc labels are different from DS
• Captures diathesis alternations:
  – John loaded the cart with hay.
  – John loaded hay on the cart.
  hay has the same relation to the predicate load in both sentences
• PropBank annotates verb-meaning specific verbal roles
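The diathesis alternation point can be illustrated with a toy annotation of the two "load" sentences. This is a sketch under assumptions: the argument numbers and role descriptions in `LOAD_ROLES` are illustrative and are not quoted from an actual PropBank frame file, and the dictionary layout and `role_of` helper are hypothetical.

```python
# Verb-specific roles for "load"; numbering and wording are illustrative assumptions.
LOAD_ROLES = {"Arg0": "loader", "Arg1": "thing loaded onto", "Arg2": "cargo"}

with_variant = {
    "text": "John loaded the cart with hay.",
    "args": {"Arg0": "John", "Arg1": "the cart", "Arg2": "hay"},
}
on_variant = {
    "text": "John loaded hay on the cart.",
    "args": {"Arg0": "John", "Arg1": "the cart", "Arg2": "hay"},
}


def role_of(annotation, phrase):
    """Return the argument label that the given phrase fills, or None."""
    for label, filler in annotation["args"].items():
        if filler == phrase:
            return label
    return None


for sentence in (with_variant, on_variant):
    label = role_of(sentence, "hay")
    print(f"{sentence['text']}  hay = {label} ({LOAD_ROLES[label]})")
```

The syntactic position of "hay" differs between the two sentences (prepositional object versus direct object), but the verb-specific label stays the same, which is the property the slide highlights.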
Basic Clause Structure: PropBank
पढ़ा (read), Roleset: पढ़ना.01
  Arg0 → अतिफ़-ने (Atif)
  Arg1 → किताब-को (book)
पढ़ना.01
  Arg0: reader
  Arg1: what is read
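Since PropBank is annotated on top of DS, the two label sets for this clause can be laid side by side. The sketch below is only an illustration: the record layout and the romanized token forms are assumptions, and the pairing of k1 with Arg0 holds for this sentence, not as a general rule (in the unaccusative example later in the lecture, the k1 dependent is labeled Arg1).

```python
# Roleset as given on the slide.
ROLESET = {
    "id": "पढ़ना.01",
    "roles": {"Arg0": "reader", "Arg1": "what is read"},
}

# One record per dependent of the verb; the field names are assumptions.
dependents = [
    {"token": "Atif-ne", "ds": "k1", "pb": "Arg0"},
    {"token": "kitaab-ko", "ds": "k2", "pb": "Arg1"},
]


def side_by_side(records, roleset):
    """Pair each DS label with its PropBank label and role description."""
    return [
        (r["token"], r["ds"], r["pb"], roleset["roles"][r["pb"]])
        for r in records
    ]


for token, ds_label, pb_label, description in side_by_side(dependents, ROLESET):
    print(f"{token}: DS={ds_label}  PB={pb_label} ({description})")
```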
Phrase Structure
• Inspired by Chomskyan Principles-and-Parameters approach
• (Mostly) binary branching
• Small number of non-terminals
• Key structural assumptions:
  – Only two marked argument positions for verbs; all other NPs are adjuncts and can appear anywhere
  – Use of traces for displacement from normal position
  – Case assigned under c-command
Basic Clause Structure: Phrase Structure
[Tree figure; leaves: (Atif) (book) (read)]
Unaccusatives
दरवाज़ा खुल गया
darwaaza khul gayaa
door open go.Pfv.MSg
The door opened.
Unaccusative: Dependency Structure
खुल गया (open go)
  k1 → दरवाज़ा (door)
Unaccusative: PropBank
खुल गया (open go)
  Arg1 → दरवाज़ा (door)
Unaccusative: Phrase Structure
[Tree figure; leaves: (door) (open) (go)]
Support Verb Constructions
गहनें चोरी हो गये
geheneN chorii ho gaye
jewels (m) theft do go.Pfv.MPl
The jewels got stolen.
Support Verb Constructions: Dependency Structure
हो गये (do go)
  k2 → गहनें (jewels)
  pof → चोरी (theft)
Support Verb Constructions: PropBank
हो.sv (do)
  Arg0: agent of true predicate
  Arg1: true predicate
  Arg2: patient of true predicate
[Figure; nodes: (jewels) (theft)]
Support Verb Constructions: Phrase Structure
[Tree figure; leaves: (jewels) (theft) (do go)]
Where we are now
• Guidelines:
  – DS and PS guidelines are complete and checked
  – PropBank guidelines under development
• Annotation:
  – Finished 353K-word Hindi and 60K-word Urdu
• Automatic conversion from DS + PropBank in progress.
• Close co-operation in development of the three components essential