Combining GATE and UIMA
Ian Roberts
University of Sheffield NLP
Overview
• Introduction to UIMA
• Comparison with GATE
• Mapping annotations between GATE and UIMA
• Examples and demo
What is UIMA?
• Language processing framework developed by IBM
• Similar document processing pipeline architecture to GATE
• Concentrates on performance and scalability
• Supports components written in different programming languages (currently Java and C++)
• Native support for distributed processing via web services
UIMA Terminology
• Processing tasks in UIMA are encapsulated in Analysis Engines (AEs)
• Text-specific processing by Text Analysis Engines (TAEs)
• In UIMA, AEs can be primitive (~ a single PR in GATE terms), or aggregate (~ a GATE controller)
  - Aggregate AE can include other primitive or aggregate AEs
• GATE includes interoperability layer to run
  - GATE controller as a primitive TAE in UIMA
  - UIMA TAE (primitive or aggregate) as a GATE PR
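The primitive/aggregate distinction is essentially a composite structure. The following is a minimal sketch in plain Java of that idea only; `ToyEngine`, `ToyPrimitiveEngine`, `ToyAggregateEngine` and the engine names are hypothetical and are not the UIMA or GATE interfaces.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Toy model only: these are NOT the UIMA or GATE interfaces.
interface ToyEngine {
    // Each engine records its name in the trace when it "processes" the text.
    void process(String text, List<String> trace);
}

// A primitive engine does one job, roughly like a single GATE PR.
class ToyPrimitiveEngine implements ToyEngine {
    private final String name;

    ToyPrimitiveEngine(String name) {
        this.name = name;
    }

    @Override
    public void process(String text, List<String> trace) {
        trace.add(name);
    }
}

// An aggregate engine runs its children in order, roughly like a GATE controller.
// Children may themselves be primitive or aggregate.
class ToyAggregateEngine implements ToyEngine {
    private final List<ToyEngine> children;

    ToyAggregateEngine(ToyEngine... children) {
        this.children = Arrays.asList(children);
    }

    @Override
    public void process(String text, List<String> trace) {
        for (ToyEngine child : children) {
            child.process(text, trace);
        }
    }
}

class ToyPipelineDemo {
    // Builds an aggregate that contains another aggregate plus a primitive.
    static List<String> run() {
        ToyEngine inner = new ToyAggregateEngine(
                new ToyPrimitiveEngine("tokeniser"),
                new ToyPrimitiveEngine("splitter"));
        ToyEngine outer = new ToyAggregateEngine(inner, new ToyPrimitiveEngine("tagger"));
        List<String> trace = new ArrayList<>();
        outer.process("some text", trace);
        return trace;
    }

    public static void main(String[] args) {
        System.out.println(run()); // prints [tokeniser, splitter, tagger]
    }
}
```

Because an aggregate is itself an engine, the caller does not need to know whether it is running one component or a whole nested pipeline.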
UIMA and GATE
• In GATE, unit of processing is the Document
  - Text, plus features, plus annotations
  - Annotations can have arbitrary features, with any Java object as value
• In UIMA, unit of processing is CAS (Common Analysis Structure)
  - Text, plus Feature Structures
  - Annotations are just a special kind of FS, which includes start and end offset features
Key Differences
• In GATE, annotations can have any features, with any values
• In UIMA, feature structures are strongly typed
  - Must declare what types of annotations are supported by each analysis engine
  - Must specify what features each annotation type supports
  - Must specify what type feature values may take
    - Primitive types: string, integer, float
    - Reference types: reference to another FS in the CAS
    - Arrays of the above
  - All defined in XML descriptor for the AE
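To make "strongly typed" concrete, here is a small plain-Java sketch of a declared type system that rejects undeclared types, undeclared features and wrongly typed values. It is a toy model under stated assumptions: `ToyTypeSystem`, the type `example.Person` and the feature `age` are hypothetical, and real UIMA declares its types in the AE's XML descriptor rather than in code like this.

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;

// Toy model only: not the UIMA type system API.
class ToyTypeSystem {
    // type name -> (feature name -> required Java class of the feature value)
    private final Map<String, Map<String, Class<?>>> types = new HashMap<>();

    // Declares that the given type supports the given feature with the given value class.
    void declare(String typeName, String featureName, Class<?> valueClass) {
        types.computeIfAbsent(typeName, k -> new LinkedHashMap<>())
             .put(featureName, valueClass);
    }

    // Returns null if the assignment is legal, otherwise a description of the problem.
    String check(String typeName, String featureName, Object value) {
        Map<String, Class<?>> features = types.get(typeName);
        if (features == null) {
            return "undeclared type " + typeName;
        }
        Class<?> expected = features.get(featureName);
        if (expected == null) {
            return "undeclared feature " + featureName;
        }
        if (!expected.isInstance(value)) {
            return "wrong value type for " + featureName;
        }
        return null;
    }
}

class ToyTypeSystemDemo {
    // Hypothetical declaration: example.Person has one integer feature, "age".
    static ToyTypeSystem example() {
        ToyTypeSystem ts = new ToyTypeSystem();
        ts.declare("example.Person", "age", Integer.class);
        return ts;
    }

    public static void main(String[] args) {
        ToyTypeSystem ts = example();
        System.out.println(ts.check("example.Person", "age", 42));          // legal: null
        System.out.println(ts.check("example.Person", "age", "forty-two")); // wrong value type
        System.out.println(ts.check("example.Person", "height", 1.8f));     // undeclared feature
    }
}
```

In the GATE world none of these checks exist: any feature name can carry any Java object.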
Integrating GATE and UIMA
• So the problem is to map between the loosely-typed GATE world and the strongly-typed UIMA world
• Best explained by example…
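As a rough illustration of the mapping problem, the sketch below copies a loosely-typed feature map into a typed one, keeping only features that are declared and whose values have the declared class. This is one possible policy chosen for illustration, not a description of how the GATE interoperability layer behaves; `ToyFeatureMapper` and the feature names are hypothetical.

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;

// Toy model only: not the GATE or UIMA API.
class ToyFeatureMapper {
    // Loose -> typed: keep a feature only if it is declared and its value fits the declared class.
    static Map<String, Object> toTyped(Map<String, Object> looseFeatures,
                                       Map<String, Class<?>> declared) {
        Map<String, Object> typed = new LinkedHashMap<>();
        for (Map.Entry<String, Class<?>> d : declared.entrySet()) {
            Object value = looseFeatures.get(d.getKey());
            // isInstance(null) is false, so missing features are skipped as well.
            if (d.getValue().isInstance(value)) {
                typed.put(d.getKey(), value);
            }
        }
        return typed;
    }

    // Typed -> loose: always safe, because the loose side accepts any value.
    static Map<String, Object> toLoose(Map<String, Object> typedFeatures) {
        return new LinkedHashMap<>(typedFeatures);
    }

    public static void main(String[] args) {
        Map<String, Object> loose = new HashMap<>();
        loose.put("kind", "word");
        loose.put("length", 4);
        loose.put("cache", new Object()); // arbitrary Java object, no typed equivalent

        Map<String, Class<?>> declared = new LinkedHashMap<>();
        declared.put("kind", String.class);
        declared.put("length", Integer.class);
        declared.put("lemma", String.class); // declared but absent on this annotation

        System.out.println(toTyped(loose, declared)); // prints {kind=word, length=4}
    }
}
```

The asymmetry is the point: going from the typed side to the loose side loses nothing, while going the other way needs an explicit decision about what to copy.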
Example 1
• Simple UIMA annotator that annotates each instance of the word “Goldfish” in a document.
• Does not need any input annotations
• Produces output annotations of type gate.example.Goldfish
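The core logic of such an annotator can be sketched in plain Java without either framework. The sketch assumes a case-sensitive match on the exact string "Goldfish"; `ToySpan` and `ToyGoldfishAnnotator` are hypothetical names, and a real annotator would add the spans to the CAS instead of returning a list.

```java
import java.util.ArrayList;
import java.util.List;

// Toy stand-in for an annotation: a type name plus begin and end offsets.
class ToySpan {
    final String type;
    final int begin;
    final int end;

    ToySpan(String type, int begin, int end) {
        this.type = type;
        this.begin = begin;
        this.end = end;
    }

    @Override
    public String toString() {
        return type + "[" + begin + "," + end + ")";
    }
}

// Toy model only: not a UIMA annotator class.
class ToyGoldfishAnnotator {
    static final String TYPE = "gate.example.Goldfish";
    static final String WORD = "Goldfish";

    // Needs only the text as input; returns one span per occurrence of WORD.
    static List<ToySpan> annotate(String text) {
        List<ToySpan> spans = new ArrayList<>();
        int from = text.indexOf(WORD);
        while (from >= 0) {
            int end = from + WORD.length();
            spans.add(new ToySpan(TYPE, from, end));
            from = text.indexOf(WORD, end);
        }
        return spans;
    }

    public static void main(String[] args) {
        // prints [gate.example.Goldfish[0,8), gate.example.Goldfish[26,34)]
        System.out.println(annotate("Goldfish are fish. I like Goldfish."));
    }
}
```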
Example 2
• We may want to copy annotations, as well as text, from the original GATE document.
• Consider a UIMA annotator that
  - takes gate.example.Sentence annotations as input
  - annotates “Goldfish” as before
  - also adds a feature GoldfishCount to each Sentence giving the number of goldfish annotations in that sentence
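The counting step can be sketched as follows, again in plain Java rather than against the UIMA API. The sketch assumes sentence spans arrive as input, matches the exact string "Goldfish", and counts an occurrence when it lies wholly inside a sentence; `ToySentence` and `ToyGoldfishCounter` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy stand-in for a gate.example.Sentence annotation with a feature map.
class ToySentence {
    final int begin;
    final int end;
    final Map<String, Object> features = new HashMap<>();

    ToySentence(int begin, int end) {
        this.begin = begin;
        this.end = end;
    }
}

// Toy model only: not a UIMA annotator class.
class ToyGoldfishCounter {
    static final String WORD = "Goldfish";

    // Returns {begin, end} offsets for each occurrence of WORD in the text.
    static List<int[]> findGoldfish(String text) {
        List<int[]> found = new ArrayList<>();
        int from = text.indexOf(WORD);
        while (from >= 0) {
            int end = from + WORD.length();
            found.add(new int[] {from, end});
            from = text.indexOf(WORD, end);
        }
        return found;
    }

    // Sets GoldfishCount on every input sentence, including a count of 0.
    static void addCounts(String text, List<ToySentence> sentences) {
        List<int[]> goldfish = findGoldfish(text);
        for (ToySentence s : sentences) {
            int count = 0;
            for (int[] g : goldfish) {
                if (g[0] >= s.begin && g[1] <= s.end) {
                    count++;
                }
            }
            s.features.put("GoldfishCount", count);
        }
    }

    public static void main(String[] args) {
        String text = "Goldfish eat. Goldfish and Goldfish swim. Cats nap.";
        List<ToySentence> sentences = new ArrayList<>();
        sentences.add(new ToySentence(0, 13));
        sentences.add(new ToySentence(14, 41));
        sentences.add(new ToySentence(42, 51));
        addCounts(text, sentences);
        for (ToySentence s : sentences) {
            System.out.println(s.features.get("GoldfishCount")); // prints 1, then 2, then 0
        }
    }
}
```

Unlike Example 1, this annotator both reads annotations it did not create and modifies them, so annotations have to travel in both directions between the two frameworks.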