Co-funded by the 7th Framework Programme of the European Commission through the contract T4ME, grant agreement no.: 249119.

Co-funded by the ICT PSP Programme of the European Commission through the contract CESAR, grant agreement no.: 271022.

Furthering Natural Language Processing in Bulgaria

Svetla Koeva, Institute for Bulgarian, Bulgaria

META-FORUM, Budapest, Hungary, 27-28 June 2011

Tuesday, June 28, 2011

General facts

Republic of Bulgaria

Area: 110,993.6 km²

Population: 7,351,633

Bulgarian: 9 million native speakers

The official alphabet is Cyrillic.

In Bulgarian: Официалната азбука е кирилица.

Cyrillic became the third official alphabet of the European Union, following the Latin and Greek alphabets.

Research

BLARK (Basic Language Resource Kit)

Over the past decade, a number of important language resources and tools have been developed.

Bulgarian National Corpus: approx. 500 million words

Bulgarian POS-annotated Corpus

Bulgarian Sense-annotated Corpus

Dependency part of BulTreeBank

SEE-ERA.net Administrative and Literary Corpus

Bilingual collection of cultural texts in Greek and Bulgarian

Bulgarian-Polish-Lithuanian Corpus

Bulgarian-English-X language parallel corpus: approx. 100 million words for Bulgarian ...

Several large inflectional dictionaries

Bulgarian WordNet

Bulgarian FrameNet

Companies

Development of tools and solutions based on semantic technologies

Ontology design

Data integration, management and publishing

Web applications (dynamic web content)

Content Management Systems (CMS)

Tools for web site content management

Multilingual tools and services for natural language processing

WebTrance - a translation software package from English, French, German, Spanish, Italian and Turkish to Bulgarian and vice versa.

SkyCode is one of the partners in the iTranslate4 project.

http://www.meta-net.eu

Approximate status

Technology                    Median
Tokenization, Morphology      4
Parsing                       3
Information Retrieval         2
Speech Synthesis              2
Text semantics                2
Information extraction        2
Summarization, QA             2
Machine translation           2
Language generation           1

Resources                     Median
Reference Corpora             4
Thesauri, WordNets            4
Lexicons, Terminologies       3
Semantic corpora              3
Parallel Corpora, TM          2
Syntax-Corpora                2
Discourse-Corpora             1
Multimedia/multimodal data    1

              Quantity  Availability  Quality  Coverage  Maturity  Sustainability  Adaptability
Technology    2         2             2.5      2.5       2         2               2.5
Resources     2         2.5           3        3.5       2.5       2.5             2.5
Total         2         2             2.5      3         2         2               2.5

State contribution to R&D

[Chart: state contribution to R&D per year, in %, for 2000 and 2008: Japan, Korea, USA, Singapore, China, EC 27, Bulgaria]

Strategic research agenda:

Cultural-historical heritage, with language as a central part of it

ICT as a horizontal instrument

European dimensions: META, the Multilingual Europe Technology Alliance

Institute for Bulgarian, BAS, a member of …

Institute for Literature, BAS

Institute of Information and Communication Technologies, BAS

Sofia University St. Kliment Ohridski

University of Plovdiv

Ontotext, Bulgaria

Musala Soft, Bulgaria

Tetracom Interactive Solutions, Bulgaria

TransGlobe International Ltd., Bulgaria

Conclusions

Success depends on several interrelated factors:

clear formulation of target goals and of strategies for achieving them

stable financing

effective management of resources

beneficial relations between education, research, business and end users

networking

META-NET, as a concerted, substantial, continent-wide effort in language technology research and engineering, is relevant to all of these factors.

Thank you very much for your attention.
