www.meta-net.eu Tel: +49 30 3949 1833 Fax: +49 30 3949 1810
META-NET White Paper Series: Languages in the European Information Society, Volume X – Serbian


Preface

The development of this white paper has been funded by the Seventh Framework Programme of the European Commission under contracts T4ME (Grant Agreement 249119), CESAR (Grant Agreement 271022), METANET4U (Grant Agreement 270893) and META-NORD (Grant Agreement 270899).

This series of white papers on languages is intended for journalists, politicians, language communities, language teachers and everyone who wants a truly multilingual Europe. The series promotes knowledge about language technologies (LT) and their potential. The coverage and use of language technology in Europe differ from language to language. As a result, the activities needed to support research and development also differ. The necessary steps depend on many factors, such as the complexity of a language or the size of the community that uses it. META-NET meets this challenge by starting an analysis of the current state of language resources and technologies. The analysis covers the 23 official European languages and several significant regional languages. Its results point to many important gaps for each language. A detailed expert analysis and assessment of the situation for each language will help to maximize the impact of language technologies and minimize the associated risks. META-NET is a Network of Excellence of the European Commission made up of 44 research centres from 31 countries. It works with stakeholders from many areas of society, business and research to create strategic visions and produce strategic research agendas. These agendas show how language technology applications can close all of these gaps by 2020.

Impresum
Authors and editors: Dr. Aljoscha Burchardt, DFKI; Kathrin Eichler, DFKI; Dr. Georg Rehm, DFKI; Prof. Dr. Hans Uszkoreit, Universität des Saarlandes and DFKI
Serbian part: Prof. Duško Vitas, University of Belgrade; Prof. Ljubomir Popović, University of Belgrade; Prof. Cvetana Krstev, University of Belgrade; Dr. Mladen Stanojević, Mihajlo Pupin Institute; Prof. Ivan Obradović, University of Belgrade

Contents

3 INTERNAL DRAFT

Preface
Impresum
Contents
Summary
A Risk for Our Languages and a Challenge for Language Technology
  Language Borders Hold Back the European Information Society
  Our Languages at Risk
  Language Technology Is a Key Technology
  Opportunities for Language Technology
  Challenges Facing Language Technology
  Language Acquisition
Serbian in the European Information Society
  General Facts
  Particularities of the Serbian Language
  Phonetics, Phonology, Morphophonology
  Morphology (Parts of Speech, Inflection, Word Formation)
  Lexicon, Phraseology, Terminology, Onomastics
  Syntax, Text Linguistics
  Orthography (Alphabet, Type of Orthography, Punctuation, Spelling of Foreign Words)
  Serbian and Other Standard Languages of Shtokavian Origin
  Recent Developments
  Language Cultivation in Serbia
  Codifying and Cultivating the Language in Line with Its New Official Identity
  Modernizing the Norm
  Responding to the Growing Influence of English
  Improving the State of Lexicography
  Language and Education
  International Aspects
  Serbian on the Internet
  Selected Literature
Language Technology Support for Serbian
  Language Technologies
  Language Technology Application Architectures
  Core Application Areas
  Language Checking
  Web Search
  Speech Interaction
  Machine Translation
  Language Technology 'Behind the Scenes'
  Language Technology in Education
  Language Technology Programmes
  Availability of Tools and Resources for Serbian
  Table of Tools and Resources
  Conclusions
META-NET
  META-NET's Three Lines of Action
  Members of the META-NET Network of Excellence
How to Participate?


A Risk for Our Languages and a Challenge for Language Technology


Summary

Many European languages are in danger of becoming victims of the digital age. They are underrepresented online and lack adequate resources. Huge regional market opportunities remain untapped today because of language barriers. If we do not act now, many European citizens will find that speaking their mother tongue leaves them at a social and economic disadvantage. Innovative language technology (LT) is an intermediary that will enable Europe's citizens to take part in an economically successful knowledge and information society. Multilingual language technology will provide instant, inexpensive and effortless communication and interaction across language borders. Today, language services are offered mainly by commercial providers from the USA. Google Translate, a free Internet service, is just one example. The recent success of IBM's Watson computer system illustrates the immense potential of language technology: it won an episode of the Jeopardy quiz show against human contestants. As Europeans, we need to ask ourselves several urgent questions:

• Should our communication and knowledge infrastructure depend on monopolistic companies?

• Can we truly rely on language-related services that others can withdraw from us at any moment?

• Are we actively competing in the global market for research and development in language technology?

• Are third parties from other continents willing to solve our translation problems and other issues related to European multilingualism?

• Can our European cultural background help shape the knowledge society by offering better, more secure, more precise, more innovative and more robust high-quality technology?

This white paper for Serbian shows that some basic language technologies have been developed, mainly components that depend on morphology and mostly in research settings. Some of them have been applied in industry and business, for example speech-based technologies. Nevertheless, market interest in language technology is still weak. This report concludes that immediate action is needed to achieve any technological breakthrough for the Serbian language. META-NET contributes to building a strong, multilingual European digital information space. By realizing this goal, a multicultural union of nations can prosper and become a model for peaceful international cooperation based on equality. If this goal cannot be achieved, Europe will have to choose between sacrificing its cultural identity and suffering economic defeat.


A Risk for Our Languages and a Challenge for Language Technology

As recent events in North Africa illustrate, we are witnessing a digital revolution that is dramatically affecting communication and society. Recent developments in networked technology are being compared to Gutenberg's invention of the printing press. What can this analogy tell us about the future of the European information society, and of our languages in particular? After Gutenberg's invention, real breakthroughs in communication and knowledge exchange came from efforts such as Luther's translation of the Bible. In the following centuries, cultural techniques were developed for better language processing and knowledge exchange:

• the orthographic and grammatical standardization of major languages enabled the rapid spread of new scientific and intellectual ideas;

• the development of official languages enabled citizens to communicate with one another within (often political) borders;

• language teaching and translation enabled exchange across languages;

• the creation of journalistic and bibliographic guidelines ensured the quality and availability of printed material;

• the creation of different media, such as newspapers, radio, television and books, met different communication needs.

Over the past twenty years, information technology has helped to automate and simplify many processes:

• desktop publishing software is replacing typewriting and typesetting;

• Microsoft PowerPoint is replacing overhead projector transparencies;

• documents are sent and received by e-mail, often faster than by fax;

• Skype is used for Internet telephony and for organizing virtual meetings;

• audio and video recording formats make it easier to exchange multimedia content;

• search engines provide keyword-based access to web pages;

• online services such as Google Translate produce quick, approximate translations;

• social media make collaboration and information sharing easier.

Although all of these tools and applications are very helpful, are they enough to sustain a multilingual European information society: a modern society with a free flow of information and goods?

We are witnessing a digital revolution comparable to Gutenberg's invention of the printing press.

Language Borders Hold Back the European Information Society

We cannot predict exactly what the future information society will look like. When a common European energy strategy or international policy is under discussion, we may want to hear European foreign ministers speak in their mother tongues. We may want platforms where people who speak many different languages, with different levels of proficiency, can discuss a specific topic while technology automatically gathers their opinions and generates short summaries. We may also want to talk to a health insurance agent located in another country. Clearly, our communication needs have changed in kind compared with only a few years ago. In the global economy and information space we face an ever larger number of languages, speakers and content, and we must interact quickly with new types of media. The current popularity of social media such as Wikipedia, Facebook, Twitter and YouTube is only the tip of the iceberg. Today we can send gigabytes of text around the world in a few seconds, before realizing that it is written in a language we do not understand. A recent report commissioned by the European Commission found that 57% of Internet users order goods and services in languages other than their mother tongue. (English is the most common foreign language, followed by French, German and Spanish.) 55% of users read content in a foreign language.1 Until a few years ago, English was the lingua franca of the web, and the vast majority of web content was in English. The situation has since changed dramatically. The amount of online content in other languages, particularly Asian languages and Arabic, has exploded. The ubiquitous digital divide caused by language borders has attracted surprisingly little public attention. Yet it raises an urgent question: "Which European languages will thrive and persist in the networked information and knowledge society?"

Our Languages at Risk

The printing press helped bring about an invaluable exchange of information in Europe, but it also led to the extinction of many European languages. Regional and minority languages were rarely printed. As a result, many languages, such as Cornish or Dalmatian, were often limited to oral transmission, which further restricted their adaptation, spread and use. Europe's roughly 60 languages are one of its richest and most important cultural assets, and this multitude of languages is a decisive factor in Europe's social success.2 Widely spoken languages such as English or Chinese will certainly hold their ground in the growing digital marketplace. Many European languages, however, could be wiped out by digital communication and become irrelevant to the Internet society. Such a development would clearly be undesirable. On the one hand, Europe would lose a strategic advantage, weakening its global position. On the other hand, such a development would conflict with the goal of equal participation for every European citizen, regardless of language. According to a UNESCO report on multilingualism, languages are an essential medium for exercising fundamental human rights, such as political expression, education and participation in society.3

1 European Commission Directorate-General Information Society and Media, User language preferences online, Flash Eurobarometer #313, 2011 (http://ec.europa.eu/public_opinion/flash/fl_313_en.pdf).

Which European languages will thrive and persist in the networked information and knowledge society?

The global economy and information space confront us with more languages, speakers and content.

Language Technology Is a Key Technology

In the past, investment focused on language teaching and translation. According to some estimates, for example, the European market for translation, interpreting, software localization and website globalization was worth €8.4 billion in 2008, with expected growth of 10% per year.4 Yet current funding is not sufficient to meet present and future needs. Language technology is a key technology that can protect and foster European languages. It helps people collaborate, do business, share knowledge and take part in political and social debates, regardless of language barriers or computer skills. Language technology already assists with everyday tasks such as writing e-mails, searching for information online or booking a flight. We benefit from language technology whenever we engage in any of the following activities:

• searching for a web page or its translation;

• using spelling and grammar checkers;

• viewing product recommendations in an online shop;

• listening to synthesized spoken directions from a navigation system;

• translating web pages with online services.

2 European Commission, Multilingualism: an asset for Europe and a shared commitment, Brussels, 2008 (http://ec.europa.eu/education/languages/pdf/com/2008_0566_en.pdf).

3 UNESCO Director-General, Intersectoral mid-term strategy on languages and multilingualism, Paris, 2007 (http://unesdoc.unesco.org/images/0015/001503/150335e.pdf).

4 European Commission Directorate-General for Translation, Size of the language industry in the EU, Kingston Upon Thames, 2009 (http://ec.europa.eu/dgs/translation/publications/studies).

The great diversity of languages in Europe is one of its most important cultural assets and an essential factor in Europe's success.

Language technology helps people collaborate, do business, share knowledge and take part in social and political debates across different languages.


The language technologies discussed in detail in this document are an essential part of innovative future applications. Language technology is typically an enabling technology: it enables the development of other technologies within larger application frameworks, such as navigation systems or search engines. These white papers focus on the readiness of core language technologies for each language. In the near future, all European languages will need language technology that is available, accessible and tightly integrated into larger software environments. Without language technology, an interactive, multimedia and multilingual user experience is not possible.

Opportunities for Language Technology

Language technology can enable automatic translation, content production, information processing and knowledge management in all European languages. It can also advance the development of intuitive language interfaces for household appliances, machinery, vehicles, computers and robots. Although many prototypes already exist, commercial and industrial applications are still at an early stage of development. With research progressing steadily in recent years, the current rate of progress creates a genuine window of opportunity. For many European languages, for example, machine translation (MT) already achieves reasonable accuracy within specific domains. Experimental applications provide multilingual information, knowledge management and content production. Traditionally, language applications, voice user interfaces and dialogue systems are found in highly specialized domains, and their performance is usually limited. One active research area is the use of language technology in rescue operations in disaster-stricken areas. In such high-risk environments, the accuracy of a translation can mean life or death. The same reasoning applies to the use of language technology in healthcare. Intelligent robots with multilingual capabilities have the potential to save lives. Huge market opportunities lie in education and entertainment, through the integration of language technology into computer games, edutainment products, simulations and training programs. Language technology can also play an important role in mobile information services, computer-assisted language learning software, e-learning and distance-learning environments, and self-assessment and plagiarism-detection tools.

The popularity of social media applications such as Twitter and Facebook suggests a need for more sophisticated language technology. Such technology could monitor posts, summarize discussions, suggest opinion trends, detect emotional responses, identify copyright infringements and track misuse. Language technology represents a huge opportunity for the European Union, both economically and culturally. Multilingualism has become the rule in Europe, and European businesses, organizations and schools are multilingual and diverse. Citizens want to communicate across the language borders that still exist in the single European market. Language technology can help overcome the remaining barriers by supporting the free and open use of languages. Innovative multilingual language technology developed for European needs can also help us communicate with our global partners and their multilingual communities. Language technologies support a wealth of international economic opportunities.

Language technology can be thought of as an operating system for content and user interaction.

Challenges Facing Language Technology

Language technology has made considerable progress in the last few years, but the current pace of technological progress and product innovation is too slow. We cannot wait ten or twenty years for significant improvements that will enhance communication and productivity in our multilingual environment. Widely used language technologies, such as grammar and spelling checkers, are typically monolingual and exist for only a small number of languages. Applications for multilingual communication require a certain level of sophistication. Machine translation and online services such as Google Translate or Bing Translator are excellent at producing a good approximation of a document's content. However, such online services and professional MT applications run into many difficulties when highly accurate and complete translations are required. There are many well-known examples of amusing mistranslations, such as the literal translation of the names Bush (shrub) or Kohl (kohlrabi). They illustrate the challenges that language technology has yet to meet.
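The Bush/Kohl example can be made concrete with a deliberately naive word-by-word translator. This is a sketch under stated assumptions, not how Google Translate, Bing Translator or any real MT system works. The tiny English-to-Serbian lexicon and both function names are hypothetical. Looking up each token in isolation turns the names into common nouns. A second variant that is told which tokens are names simply copies them.

```python
# Hypothetical toy English-to-Serbian (Latin script) lexicon; the entries are
# illustrative only and do not come from any real MT system.
LEXICON = {
    "president": "predsednik",
    "bush": "žbun",          # common-noun reading: "shrub"
    "and": "i",
    "chancellor": "kancelar",
    "kohl": "keleraba",      # rendered as in the example in the text
}


def translate_word_by_word(sentence: str) -> str:
    """Replace every token by its dictionary entry, ignoring all context."""
    return " ".join(LEXICON.get(tok.lower(), tok) for tok in sentence.split())


def translate_keeping_names(sentence: str, names: set[str]) -> str:
    """Same lookup, except that tokens listed in `names` are copied unchanged."""
    return " ".join(
        tok if tok in names else LEXICON.get(tok.lower(), tok)
        for tok in sentence.split()
    )


if __name__ == "__main__":
    text = "President Bush and Chancellor Kohl"
    print(translate_word_by_word(text))                     # predsednik žbun i kancelar keleraba
    print(translate_keeping_names(text, {"Bush", "Kohl"}))  # predsednik Bush i kancelar Kohl
```

A real system is never handed the list of names. Deciding on its own which tokens are names is part of the difficulty described above.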

Language Acquisition

To show how computers handle language and why language acquisition is such a difficult task, we briefly look at how people acquire their first and second languages, and then sketch how a machine translation system works. There is a reason why the field of language technology is closely linked to the field of artificial intelligence. People acquire language skills in two different ways. First, a baby learns its mother tongue from examples. Exposure to the concrete language patterns of speakers such as parents, siblings and other family members helps babies produce their first words or short phrases at around two years of age. This is possible only because humans have a special genetic disposition for learning their first language. Learning a second language usually takes much more effort. At school, foreign languages are usually acquired by learning grammatical structures, vocabulary and spelling from books and teaching materials that describe linguistic knowledge as abstract rules, tables and texts. Learning a foreign language takes a great deal of time and effort, and it becomes harder with age.

The two main types of language technology systems acquire language capabilities in a similar way to humans. Statistical approaches derive linguistic knowledge from vast collections of concrete example texts, either in a single language or in so-called parallel texts in two or more languages. Machine learning algorithms model some form of language capability. From this model they can derive patterns of correct usage for words, short phrases and complete sentences in one language, or translate from one language into another. Statistical approaches require a huge number of sentences, and the quality of the results increases with the number of texts analyzed. Such systems are commonly trained on texts containing millions of sentences. This is one reason why search engine providers are eager to collect as much written material as possible. Spelling correction in word processors, online information and translation services such as Google Search and Google Translate all rely on the statistical (data-driven) approach.

Rule-based systems are the second main type of language technology. Experts in linguistics, computational linguistics and computer science encode grammatical analysis (translation rules) and compile word lists (lexicons). Building a rule-based system is very time-consuming and labor-intensive, and it requires highly specialized experts. Some of the leading rule-based machine translation systems have been under continuous development for more than twenty years.

The current pace of technological progress is too slow to arrive at substantial software products within the next ten or twenty years.

Multilingualism is the rule, not the exception.

People acquire language skills in two different ways: by learning from examples and by learning language rules.
The advantage of rule-based systems is that experts have more detailed control over language processing. This makes it possible to correct errors in the software systematically and to give users detailed feedback, especially when a rule-based system is used for language learning. Because of financial constraints, rule-based language technology is feasible only for major languages.
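The contrast between the two system types can be sketched in a few lines of Python. This illustrates the idea under simplifying assumptions and does not describe any system named above; all function names are hypothetical. `train_bigrams` and `most_likely_next` derive a preference purely from counted examples, as statistical approaches do. `indefinite_article`, by contrast, encodes one hand-written English rule, as a rule-based system would.

```python
from collections import Counter
from typing import Optional


def train_bigrams(corpus: list[str]) -> Counter:
    """Statistical approach: count adjacent word pairs in example sentences."""
    counts: Counter = Counter()
    for sentence in corpus:
        words = sentence.lower().split()
        counts.update(zip(words, words[1:]))  # each (word, next_word) pair
    return counts


def most_likely_next(counts: Counter, word: str) -> Optional[str]:
    """Return the follower of `word` seen most often, or None if never seen."""
    followers = {nxt: n for (prev, nxt), n in counts.items() if prev == word}
    if not followers:
        return None  # no examples, so the model has no evidence at all
    return max(followers, key=followers.get)


def indefinite_article(noun: str) -> str:
    """Rule-based approach: one hand-written rule for English "a" vs "an".

    The rule only inspects the first letter, so words such as "hour" or
    "university" would need further, manually written exceptions.
    """
    return "an" if noun[:1].lower() in {"a", "e", "i", "o", "u"} else "a"


if __name__ == "__main__":
    counts = train_bigrams(["the cat sat", "the cat ran", "the dog sat"])
    print(most_likely_next(counts, "the"))                         # cat
    print(indefinite_article("apple"), indefinite_article("dog"))  # an a
```

Adding sentences to the corpus can change the statistical answer without touching the code. Improving the rule-based function means writing new rules by hand. This mirrors the trade-off described in the text.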

Two main types of language technology systems acquire language in a similar way to humans.


Serbian in the European Information Society

General Facts

Standard Serbian is the national standard language of the Serbs and the main language of the Republic of Serbia. It was formed on the basis of the younger Ekavian and Ijekavian Shtokavian South Slavic dialects. Its form was set by Vuk Karadžić (1787-1864), the reformer of the written language of the Serbs, who also reformed the Cyrillic alphabet and the orthography. In the 20th century, in the common state of Yugoslavia, the language was covered by the name Serbo-Croatian. That name implied a linguistic community with the Croats, and later with other peoples whose standard language is based on Shtokavian dialects. In the last decade of the 20th century, the name Serbian language replaced Serbo-Croatian in general use in Serbia.5 According to the 2002 census, Serbia has 7,498,001 inhabitants,6 and Serbian is the mother tongue of 88.3% of the population.7 To this should be added the population of Serbian nationality in other parts of the former Federal Republic of Yugoslavia, whose number is not easy to determine. The Serbian diaspora formed largely through emigration for work and for economic reasons. It lives mainly in certain countries of Central and Western Europe and in the USA, Canada and Australia.8 How well its members know Serbian depends mostly on which generation of emigrants is concerned.

Serbia is a multilingual community. According to the 2002 census, the national minorities9 are Hungarians (3.91%), Bosniaks (2.1%), Roma (1.44%), Croats (0.94%), Montenegrins (0.92%), Albanians (0.82%), Slovaks (0.79%) and Yugoslavs (1.08%). Other minorities together account for 2.45%: Ashkali, Bulgarians, Bunjevci, Aromanians, Czechs, Gorani, Jews, Macedonians, Germans, Muslims, Romanians, Rusyns, Slovenes, Turks, Ukrainians and Vlachs. By language, the minority population breaks down as follows: Hungarian 3.8%, Bosnian 1.8%, Romani 1.1%, Albanian 0.8%, Slovak 0.8%, Vlach 0.7%, Romanian 0.5%, Croatian 0.4%, Bulgarian 0.2% and Macedonian 0.2%. Other languages are spoken by 0.5% of the population, and for 0.8% these data are not known.
Some minority languages in Serbia are taught in primary and secondary education, namely Albanian (55 primary/4 secondary schools), Hungarian (108/38), Bulgarian (26/-), Romanian (27/2), Rusyn (3/2), Slovak (15/2) and Croatian (7/1).10 Teaching is supported by the publication of textbooks and reading materials (e.g., in 2005 a total of 526 textbooks for primary school and 283 for secondary school were published11). The official use of minority languages12 is regulated by the Law on the Official Use of Languages and Scripts, which ensures that laws and regulations are also published in the languages of national minorities, in accordance with a special law. This includes the right to address the republic authorities in one's own language and to receive an answer in that language (depending on the size of the minority community).

Translation into and out of Serbian is a significant activity. In 2010, 2,549 titles were translated (1,438 from English, 215 from French, 170 from German, 191 from Italian, 74 from Spanish and 149 from Hungarian). Some of the translations are from Slavic languages (225 from Russian, 4 from Czech, 13 from Polish, 21 from Slovak, 19 from Slovenian, 18 from Macedonian and 12 from Bulgarian). In the same year, 591 titles translated from Serbian into other languages were published in Serbia.

5 The Constitution of the Republic of Serbia of 2006 prescribes: “Serbian language and Cyrillic script shall be in official use in the Republic of Serbia.” http://www.srbija.gov.rs/cinjenice_o_srbiji/ustav.php?change_lang=en
6 According to the 2002 census.
7 According to the Human Development Report Serbia 2005, UNDP, ISBN: 86-7728-012-x.
8 According to the 2002 census, most emigrants live in Germany (102,799), followed by Austria (87,844) and Switzerland (65,751).
9 http://www.ombudsman.rs/pravamanjina/index.php/sr_YU/podaci
10 http://webrzs.stat.gov.rs/WebSite/repository/documents/00/00/18/48/god2010pog22.pdf
11 According to the 2005 UNDP Human Development Report for Serbia, ISBN: 86-7728-012-x, p. 69.
12 Službeni glasnik RS [Official Gazette of the Republic of Serbia], Nos. 45/91, 53/93, 67/93, 48/94, 101/2005 (other law) and 30/2010.

Specific features of Serbian

Serbian has specific features that make its computational processing a complex task. These features are presented below by linguistic domain.

Phonetics, phonology, morphophonology

The vowel system is simple (5 vowels), while the consonant system is relatively complex (25 consonants). In certain positions the trill r is pronounced as a vowel and functions as the nucleus of a syllable, e.g., in the words prst 'finger' or vrsta 'kind'. Inflection and word formation involve a large number of phonemic alternations (consonantal, vocalic and combined), which in some cases combine in such a way that two forms of the same word can be very far apart: the nominative singular of the noun misao 'thought' is misao, while its instrumental singular is mišlju (alternations a/Ø, o/l, l+j/lj, s/š). The system of four accents is based on two intersecting parameters: an opposition in length (short vs. long) and in tone (falling vs. rising). The distribution of rising and falling accents is governed by special rules, and accentual alternations are frequent in inflection and word formation. Since accent marks are not written, homographs occur in written text. For example, the meaning of the word luk depends on whether its accent is short falling ('onion') or long falling ('bow, arch'). In many words and grammatical forms the codified norm prescribes post-accentual length, but it is pronounced less and less in actual usage. Almost all words are stressed, but there are also clitics: proclitics (most conjunctions and prepositions, and the negation before a verb) and enclitics (unstressed forms of pronouns and verbs, and the interrogative particle li). The pronunciation of loanwords is phonetically adapted to Serbian. Phoneme combinations in loanwords (above all consonant clusters) often deviate from the clusters typical of native Shtokavian words, as in softver, hardver, interfejs. There are also deviations from the normative distribution of accents, especially in everyday usage.
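The chain of alternations in misao/mišlju can be made concrete as a few ordered string rewrites. The sketch below is a toy model that covers only this one word: it assumes a hypothetical underlying stem misl and an instrumental ending -ju, and each rewrite mirrors one of the alternations listed above (a/Ø, o/l, l+j/lj, s/š). A real morphological generator would need a lexicon and many more rules.

```python
import re

VOWELS = "aeiou"


def nominative_sg(stem: str) -> str:
    """Derive the nominative singular from a consonant-final stem (toy rule set)."""
    # a/Ø: a fleeting 'a' is inserted into the stem-final consonant cluster (misl -> misal)
    form = re.sub(rf"([^{VOWELS}])([^{VOWELS}])$", r"\1a\2", stem)
    # o/l: a word-final l is vocalized to o (misal -> misao)
    return re.sub(r"l$", "o", form)


def instrumental_sg(stem: str) -> str:
    """Derive the instrumental singular with the ending -ju (toy rule set)."""
    # l+j/lj: the stem-final l fuses with the j of the ending into palatal lj (misl + ju -> mislju)
    form = stem + "ju"
    # s/š: s assimilates to š before the palatal lj (mislju -> mišlju)
    return re.sub(r"slj", "šlj", form)


stem = "misl"  # hypothetical underlying stem of misao 'thought'
print(nominative_sg(stem), instrumental_sg(stem))  # misao mišlju
```

The two forms share no surface substring beyond mi-, which is why a search engine or lemmatizer cannot relate them by simple prefix or suffix stripping.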


For a number of lexemes and forms there are two pronunciation variants, Ekavian and Ijekavian, which are etymologically linked to the former vowel called jat, as shown in the table below for cvet 'flower'.

             Ekavian              Ijekavian
singular     cvet (long e)        cvijet
plural       cvetovi (short e)    cvjetovi
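Because the Ekavian/Ijekavian difference depends on the etymological jat, it cannot be undone by a simple string rule; a lexicon is needed. The following is a minimal sketch under that assumption, using a hypothetical two-entry lexicon built from the table above. The naive rule is included only to show how it corrupts words such as pije 'drinks', which is identical in both variants.

```python
# Hypothetical mini-lexicon (Ijekavian -> Ekavian) built from the table above.
IJEKAVIAN_TO_EKAVIAN = {
    "cvijet": "cvet",
    "cvjetovi": "cvetovi",
}


def to_ekavian(tokens: list[str]) -> list[str]:
    """Lower-case tokens and replace known Ijekavian forms by their Ekavian forms."""
    return [IJEKAVIAN_TO_EKAVIAN.get(t.lower(), t.lower()) for t in tokens]


def naive_to_ekavian(token: str) -> str:
    """Naive string rule (ije -> e, je -> e) that ignores etymology; shown only as a counterexample."""
    return token.lower().replace("ije", "e").replace("je", "e")


print(to_ekavian(["Cvijet", "pije"]))  # ['cvet', 'pije']
print(naive_to_ekavian("pije"))        # pe  (wrong: pije is the same in both variants)
```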

Morphology (parts of speech, inflection, word formation)

There are ten parts of speech, with a large number of subclasses. The pronoun and numeral systems are particularly complex. There is no article. Nouns have gender as a classificatory category (masculine, feminine or neuter). Classification by semantic gender (male or female) is also relevant: for example, the noun gazda 'boss, landlord' inflects like a feminine noun but always denotes a male person. Verbs have aspect as a classificatory category (perfective or imperfective), and a number of verbs have both aspects. There are several types of so-called reflexive verbs.

Four types of noun inflection:

                singular   paucal (2-4)   plural
prozor (m.)     prozor     prozora        prozori
jaje (n.)       jaje       jajeta         jaja
žena (f.)       žena       žene           žene
vest (f.)       vest       vesti          vesti

There are three kinds of inflection, mostly of the fusional type: (a) declension (by number and case for nouns; by gender, number, case and definiteness for adjectives), (b) a highly developed conjugation, and (c) comparison of gradable adjectives and adverbs. Each kind has a smaller or larger number of subtypes and a certain number of exceptions. Numerous phonemic (and accentual) alternations occur throughout. Particularly noteworthy is the large number of coinciding forms, i.e., formal syncretism (morphological homonymy). As a consequence of inflection, a dictionary of 120,000 lemmas corresponds to about 4.5 million inflected grammatical forms (although there are fewer distinct word forms, since some forms within individual paradigms are identical). Personal pronouns (including the reflexive pronoun), the auxiliary, copulative and existential verb jesam 'to be', and the auxiliary verbs biti 'to be' and hteti 'to want, will' also have enclitic forms, which are used more often than the corresponding stressed forms. For example, mu is the enclitic form of njemu, the dative of the pronoun on 'he'. Nouns, verbs and adjectives have highly developed suffixal word formation. Verbs also have highly developed prefixation (largely connected with aspectual meanings). Compounding is, on the whole, less developed. There is a purist attitude towards calques and coinages, as well as towards so-called exocentric noun compounds, which are regarded as not belonging to authentic Shtokavian word formation. This attitude hinders lexical and terminological elaboration through word formation and is one of the reasons for the very large number of loanwords. Loanwords mostly fit into existing morphological and word-formation types, but there are exceptions: for example, some foreign words do not inflect, such as the nouns Meri and skvo or the adjectives fer 'fair' and braon 'brown'.
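The paucal column of the noun inflection table is selected by the numeral. A minimal sketch of that selection, assuming the usual rule that numbers ending in 1 (except 11) take the singular, those ending in 2-4 (except 12-14) take the paucal, and all others take the genitive plural. The genitive plural is not shown in the table (its plural column gives the nominative) and is supplied here by hand; agreement of the numeral itself (jedno, dva, ...) is ignored.

```python
def noun_form(n: int, singular: str, paucal: str, gen_plural: str) -> str:
    """Return the noun form required after the cardinal number n (simplified rule)."""
    last_two, last = n % 100, n % 10
    if 11 <= last_two <= 14:
        return gen_plural   # 11-14 (also 111-114, ...) take the genitive plural
    if last == 1:
        return singular     # 1, 21, 31, ... take the singular
    if 2 <= last <= 4:
        return paucal       # 2-4, 22-24, ... take the paucal
    return gen_plural       # endings 0 and 5-9 take the genitive plural


# jaje 'egg': singular, paucal (from the table), genitive plural (supplied by hand)
JAJE = ("jaje", "jajeta", "jaja")

for n in (1, 2, 5, 11, 21, 22):
    print(n, noun_form(n, *JAJE))
```

Checking 11-14 before the last digit is what keeps 12 jaja from being misread as a paucal.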

Lexicon, phraseology, terminology, onomastics

Highly developed word formation (suffixation, prefixation, to a lesser extent compounding, and various combined processes) means that most lexemes can be grouped into word-formation families, i.e., lexicographic nests. It is particularly important that some derivational relations modify the meaning of the base word systematically (categorially), which considerably simplifies the lexicographic treatment of such cases. For example, from prst 'finger' one derives the diminutive prstić, the augmentative prstetina, the adjectives prstni and prstast, the adverb prstasto, etc. Loanwords are in principle phonologically and morphologically adapted, i.e., adjusted to the pronunciation and morphology of Serbian, and they too give rise to word-formation families. On the one hand, the composition of the lexicon reflects the Shtokavian basis, not only in the original inventory but also in new words formed according to Neo-Shtokavian word-formation models. On the other hand, the lexical stock also reflects the linguistic and cultural history of the Serbian people: from the Orientalisms, Slavonicisms, Russianisms, Germanisms and Hungarianisms already present in the Neo-Shtokavian dialects, to the Germanisms, Russianisms, Gallicisms, Italianisms and, especially today, Anglicisms connected with cultural, economic, scientific and technological contacts and influences. To this should be added, especially in specialized terminologies, internationalisms based on the classical languages (Greek and Latin) and adopted through the major European languages. In phraseology, idioms, figurative similes, sayings and the like deserve special mention, as they reflect native imagination and linguistic creativity. At the same time, a large number of lexicalized expressions arose, and continue to arise, by calquing foreign expressions, today above all English ones.
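A lexicographic nest maps naturally onto a small data structure: each derivative is stored together with its head word and the systematic relation linking them, so that a dictionary can describe prstić as "diminutive of prst" instead of defining it from scratch. The sketch below encodes only the prst nest given in the text; the relation labels and the reverse-index layout are illustrative assumptions, not a description of any existing dictionary.

```python
from typing import Optional

# The nest of prst 'finger' as listed in the text; relation labels are illustrative.
NESTS = {
    "prst": [
        ("prstić", "diminutive"),
        ("prstetina", "augmentative"),
        ("prstni", "adjective"),
        ("prstast", "adjective"),
        ("prstasto", "adverb"),
    ],
}

# Reverse index: every member of a nest -> (head word, relation to the head).
INDEX: dict[str, tuple[str, str]] = {head: (head, "base") for head in NESTS}
for head, members in NESTS.items():
    for word, relation in members:
        INDEX[word] = (head, relation)


def lookup(word: str) -> Optional[tuple[str, str]]:
    """Return (nest head, relation) for a word, or None if it belongs to no nest."""
    return INDEX.get(word)


print(lookup("prstić"))  # ('prst', 'diminutive')
```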
The terminology (and nomenclature) of various professions and of various products of material civilization, whether single words or terminological phrases, has relied, and still largely relies, on particular foreign terminologies through translation or borrowing (especially in the case of terminological internationalisms), sometimes departing from the requirements of normative grammar. Efforts to find native Serbian solutions or to Serbianize existing terms have had some results, but they cannot keep pace with the growing needs in terminology and nomenclature. Hence translation and adapted borrowing are frequent, and at the structural level they can go beyond native word-formation models. Onomastics (anthroponymy, hydronymy, oronymy, oikonymy, etc.) is an important segment of the Serbian vocabulary, all the more so because word-formation families arise here as well.

Syntax, text linguistics

As regards the order of sentence constituents (subject, predicate, object, etc.), Serbian belongs to the so-called SVO languages with free word order (more precisely, with free ordering of sentence constituents). In principle, all permutations of sentence constituents are allowed, while the preferred order is subject-predicate-object. However, free does not mean anarchic; on the contrary, the choice of a particular order is governed by combinations of syntactic, semantic, pragmatic and stylistic factors, so that, however varied, the orders form a very complex functional system. Consider the following English sentence:

Mary gave John an apple. [Marija dade Jovanu jabuku.]

In Serbian, this idea can be expressed in 4! = 1 × 2 × 3 × 4 = 24 different ways (the number of permutations of four words), for example:


Marija dade Jovanu jabuku.
Marija dade jabuku Jovanu.
Marija Jovanu dade jabuku.
Marija jabuku dade Jovanu.
Jovanu dade Marija jabuku.
Jovanu Marija dade jabuku.
Jabuku Marija dade Jovanu.
Jabuku Jovanu dade Marija.
Dade Marija jabuku Jovanu.
Dade Jovanu jabuku Marija, etc.
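The count of 24 orders is simply the number of permutations of four constituents, which a few lines of Python can enumerate. This is only an illustration: it lists every order mechanically and says nothing about which orders are pragmatically natural, which, as explained above, depends on syntactic, semantic, pragmatic and stylistic factors. Reordering preserves the meaning because the case endings (Jovanu, dative; jabuku, accusative), not the positions, mark the grammatical roles.

```python
from itertools import permutations

# The four constituents of the example sentence.
CONSTITUENTS = ["Marija", "dade", "Jovanu", "jabuku"]


def sentence(words: tuple[str, ...]) -> str:
    """Join constituents into a sentence with an initial capital and a final period."""
    text = " ".join(words)
    return text[0].upper() + text[1:] + "."


orders = [sentence(p) for p in permutations(CONSTITUENTS)]
print(len(orders))  # 24
print(orders[:3])
```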

Some constituents are expressed by enclitics, which are placed in a very specific way. A pronominal subject need not be expressed and can simply be understood (the so-called null subject): for example, Ja se zovem Marko 'My name is Marko' alongside Zovem se Marko. A considerable number of sentence patterns are formed with various types of semantic subjects. Besides the active and the passive, there is also a special construction for sentences with an unspecified human subject. Negation applies both to the verb and to the pronominal constituent (so-called double negation), e.g., Ovde ne poznajem nikog 'I don't know anybody here'. Serbian has seven cases: nominative, genitive, dative, accusative, vocative, instrumental and locative.
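Enclitic placement is one area where Serbian word order is not free. Standard grammatical descriptions of Serbian (not this paper) place clitics in a cluster in second position with a fixed internal order: li, then auxiliaries other than je, then dative pronouns, then accusative/genitive pronouns, then se, and finally je. The sketch below assumes that template and a simplified notion of "second position" (right after the first word); it ignores the homonymy of je (auxiliary vs. accusative 'her') and placement after a multi-word first constituent.

```python
# Simplified slot order of the Serbian clitic cluster (assumed from standard grammatical
# descriptions, not from this paper): li < auxiliaries other than je < dative pronouns
# < accusative/genitive pronouns < se < je.
SLOT = {
    "li": 0,
    "sam": 1, "si": 1, "smo": 1, "ste": 1, "su": 1,
    "mi": 2, "ti": 2, "mu": 2, "joj": 2, "nam": 2, "vam": 2, "im": 2,
    "me": 3, "te": 3, "ga": 3, "nas": 3, "vas": 3, "ih": 3,
    "se": 4,
    "je": 5,  # treated only as the 3sg auxiliary; accusative 'je' (her) is not modelled
}


def place_clitics(words: list[str], clitics: list[str]) -> str:
    """Put the clitics, sorted by slot, right after the first word (second position)."""
    cluster = sorted(clitics, key=SLOT.__getitem__)
    return " ".join([words[0], *cluster, *words[1:]])


print(place_clitics(["Marko", "dao"], ["je", "ga", "mu"]))  # Marko mu ga je dao
print(place_clitics(["Dao"], ["ga", "je", "mu"]))           # Dao mu ga je
```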

Example of noun declension: prozor 'window'

                singular   paucal     plural
nominative      prozor     prozora    prozori
genitive        prozora    prozora    prozora
dative          prozoru    prozorima  prozorima
accusative      prozor     prozora    prozore
vocative        prozore    prozora    prozori
instrumental    prozorom   prozorima  prozorima
locative        prozoru    prozorima  prozorima
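The declension table illustrates the syncretism described under Morphology: many cells share one form, so a morphological analyser must return several readings for a single word form. A minimal sketch of such a reverse index for prozor, assuming (as in the table) that the paucal dative, instrumental and locative coincide with the plural prozorima:

```python
from collections import defaultdict

CASES = ["nominative", "genitive", "dative", "accusative",
         "vocative", "instrumental", "locative"]

# Paradigm of prozor 'window' from the table; paucal dative, instrumental and
# locative are assumed to coincide with the plural form prozorima.
PARADIGM = {
    "singular": ["prozor", "prozora", "prozoru", "prozor", "prozore", "prozorom", "prozoru"],
    "paucal": ["prozora", "prozora", "prozorima", "prozora", "prozora", "prozorima", "prozorima"],
    "plural": ["prozori", "prozora", "prozorima", "prozore", "prozori", "prozorima", "prozorima"],
}

# Reverse index used for analysis: word form -> all (number, case) cells it realizes.
analyses = defaultdict(list)
for number, forms in PARADIGM.items():
    for case, form in zip(CASES, forms):
        analyses[form].append((number, case))

cells = sum(len(forms) for forms in PARADIGM.values())
print(cells, len(analyses))      # 21 7
print(len(analyses["prozora"]))  # 6
```

Twenty-one grammatical cells are realized by only seven distinct strings, and prozora alone is six ways ambiguous; context (numerals, prepositions, agreement) must resolve the reading.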

Serbian has five dependent (oblique) cases, all of which also combine with prepositions. All these cases and preposition-case combinations are polysemous. Conversely, the same meaning can often be expressed by different cases or preposition-case constructions (case synonymy). To this should be added a large number of prepositional phrases that combine with dependent cases. Serbian has a developed system of finite verb forms for expressing temporal and modal meanings (aspect is a classificatory category); all these forms are polysemous, and at the same time there are certain alternative uses of finite verb forms. One specific feature of the verbal system is that the construction da + present tense is increasingly displacing the infinitive. Agreement in gender, number, case and person is one of the essential aspects of Serbian syntax, and it is also important for establishing links within a text. The categorization of agreement controllers (especially certain types of nouns, constructions with numerals and coordinated phrases), as well as the ways in which this control manifests itself in various agreement positions, is an exceptionally complex area. Most types of subordinate clauses (especially relative, temporal, conditional and causal) have several formal and semantic subtypes. In coordinate clauses, the inventory of conjunctions for copulative and adversative relations is particularly complex. Links between utterances in a text are established by textual coordinators and textual connectors of various kinds. The choice of constituent order is important for informational coherence and progression on the one hand, and for emphasis and highlighting on the other. The so-called null subject and enclitic pronoun forms are important means of contextualizing sentences.

Orthography (alphabet, type of orthography, punctuation, spelling of foreign words)

The traditional Serbian alphabet is Cyrillic, which consists of 30 graphemes. Today the Latin alphabet is also used, and increasingly so. It likewise has 30 graphemes (three of them digraphs), which stand in a one-to-one (bijective) correspondence with the Cyrillic graphemes. However, only Cyrillic is the official script.

Serbian letters (Cyrillic / Latin)

А а / A a     Б б / B b     В в / V v     Г г / G g     Д д / D d
Ђ ђ / Đ đ     Е е / E e     Ж ж / Ž ž     З з / Z z     И и / I i
Ј ј / J j     К к / K k     Л л / L l     Љ љ / Lj lj   М м / M m
Н н / N n     Њ њ / Nj nj   О о / O o     П п / P p     Р р / R r
С с / S s     Т т / T t     Ћ ћ / Ć ć     У у / U u     Ф ф / F f
Х х / H h     Ц ц / C c     Ч ч / Č č     Џ џ / Dž dž   Ш ш / Š š

As regards the writing system (the relation between the graphemic and the phonemic system), graphemes and phonemes stand in a one-to-one relation. At the level of encoding schemes, the Latin digraphs lj, nj and dž can be encoded either as ligatures or as digraphs. In the first case, Unicode provides, for example, separate code points for the ligatures LJ, Lj and lj, whereas as digraphs they are represented as a combination of two ASCII codes, e.g., L and J. This complicates transliteration, although in general it can be performed automatically in most cases; for example, every article on the Serbian Wikipedia can be displayed in both Cyrillic and Latin script. The Serbian Latin alphabet does not include the Latin characters q, x, y and w, nor does it provide for the Latin characters used to write Roman numerals, which can lead to loss of information in transliteration from Latin to Cyrillic. For example, www may be turned into a string of Cyrillic letters, and Latin Petar II may become Петар ИИ instead of Петар II. Both alphabets are used in contemporary publishing. According to data from the National Library of Serbia, 12,574 books were published in 2010: 6,459 in Cyrillic, 6,050 in Latin and 65 in other alphabets. Among the widely read daily newspapers, Politika and Večernje novosti are published in Cyrillic, while most other dailies (Blic, Kurir, Danas, ...) use the Latin script. The orthography is of a quasi-phonemic type: with a few exceptions, a word is written the way it is pronounced (the rule: “Write as you speak!”), or, more precisely, according to its phonemic composition.
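Because Cyrillic and Latin graphemes correspond one to one, Cyrillic-to-Latin conversion is a per-character lookup, while Latin-to-Cyrillic conversion must recognize the digraphs lj, nj and dž. The following is a minimal sketch, not the converter used by Wikipedia or any other system. It first applies Unicode NFKC normalization, which turns the ligature code points (U+01C4 to U+01CC, e.g. U+01C9 ǉ) into plain two-letter digraphs, then matches digraphs greedily and passes unknown characters such as q, w, x, y through unchanged (one possible policy among several). It also reproduces the failure modes described above: Roman numerals are converted letter by letter, and greedy matching misreads d + ž across a prefix boundary, as in nadživeti (correct Cyrillic: надживети).

```python
import unicodedata

# Serbian Cyrillic and Latin alphabets in Cyrillic (azbuka) order; the mapping is one to one.
CYR = "АБВГДЂЕЖЗИЈКЛЉМНЊОПРСТЋУФХЦЧЏШ"
LAT = ["A", "B", "V", "G", "D", "Đ", "E", "Ž", "Z", "I", "J", "K", "L", "Lj", "M",
       "N", "Nj", "O", "P", "R", "S", "T", "Ć", "U", "F", "H", "C", "Č", "Dž", "Š"]

CYR_TO_LAT = {}
for cyr, lat in zip(CYR, LAT):
    CYR_TO_LAT[cyr] = lat
    CYR_TO_LAT[cyr.lower()] = lat.lower()

# Inverse mapping, plus all-caps digraphs (LJ, NJ, DŽ) for the capital letters.
LAT_TO_CYR = {lat: cyr for cyr, lat in CYR_TO_LAT.items()}
for cyr, lat in zip(CYR, LAT):
    if len(lat) == 2:
        LAT_TO_CYR[lat.upper()] = cyr


def to_latin(text: str) -> str:
    """Cyrillic -> Latin: a per-character lookup; other characters pass through."""
    return "".join(CYR_TO_LAT.get(ch, ch) for ch in text)


def to_cyrillic(text: str) -> str:
    """Latin -> Cyrillic with greedy digraph matching; unknown characters pass through."""
    # NFKC turns ligature code points such as U+01C9 into plain digraphs ("lj").
    text = unicodedata.normalize("NFKC", text)
    out, i = [], 0
    while i < len(text):
        pair = text[i:i + 2]
        if len(pair) == 2 and pair in LAT_TO_CYR:  # lj, nj, dž in any capitalization
            out.append(LAT_TO_CYR[pair])
            i += 2
        else:
            out.append(LAT_TO_CYR.get(text[i], text[i]))
            i += 1
    return "".join(out)


print(to_latin("Љубав"))          # Ljubav
print(to_cyrillic("\u01c9ubav"))  # љубав
print(to_cyrillic("Petar II"))    # Петар ИИ  (Roman numeral lost)
print(to_cyrillic("nadživeti"))   # наџивети  (should be надживети)
```

A production converter would additionally need an exception list for words like nadživeti and an explicit policy for foreign strings, URLs and numerals.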


Punctuation is of the logical rather than the grammatical type (similar to French and English). According to the orthography, foreign words are written in both Cyrillic and Latin as they are pronounced, i.e., in transcription. Foreign names are transcribed as well (e.g., instead of Shakespeare one writes, and pronounces, Šekspir).

Serbian and other standard languages of Shtokavian origin

Their common Shtokavian basis, mutual influences and coexistence within the same state and, conceptually, within a common Serbo-Croatian language mean that the computational processing of the other languages of Shtokavian origin (Croatian, Bosnian, Montenegrin) faces similar problems. This opens up great opportunities for synergy, or at least for productive cooperation, as a rational and economical way of solving common problems. It is also helped by the existence of considerable language resources for the former common Serbo-Croatian language (grammars and dictionaries), which, admittedly, did not pay due attention to differentiation within the Shtokavian standard-language area. All in all, the task here is not translating texts from one foreign language into another, but adapting texts written in languages with the same dialect basis and closely connected development. The main problems in fact concern phenomena related to the elaboration of the Shtokavian core and, in particular, terminology.

Recent developments

The changes at the end of the twentieth and the beginning of the twenty-first century include the following. Instead of the common Serbo-Croatian standard language, there are now officially four national standard languages; in particular, the official language in Serbia is now Serbian, no longer Serbo-Croatian. Recent war-induced migrations have partly changed the dialect picture in Croatia and in Bosnia and Herzegovina (in the areas affected by the war). Increasing changes can be observed in vocabulary, phraseology and terminology, related to the political, social and economic changes in Serbia and its opening to the world, but also to the harmonization of legislation, standards and terminology with those in force in the European Union. The influence of English is particularly noticeable, not only because of the cultural and economic factors that also apply to other European countries, but also because English texts or versions are taken as the source texts for harmonization with the European Union. The Latin script is used more and more (except in official texts).


Texts in Serbian are increasingly produced in digital form (computer use, electronic publishing, the Internet, SMS messages).

Language cultivation in Serbia

Work on standardizing and cultivating the language in line with its new official identity

In 1997, an inter-academy and inter-university body called the Committee for the Standardization of the Serbian Language (Odbor za standardizaciju srpskog jezika) was set up, in which the relevant institutions from Serbia, Montenegro and Republika Srpska (in Bosnia and Herzegovina) are represented.

Instead of the former general Serbo-Croatian norm, the norm of the Serbian language is now being specified. There is no purism towards Croatisms (words taken over from Croatian usage). An orthography of the Serbian language has been produced.

The use of Cyrillic is encouraged, since it is considered endangered by the increasing use of the Latin script, especially among the younger generations. Curricula and textbooks in primary and secondary schools have been aligned with the new standard-language situation.

Updating the norm

The Committee for the Standardization of the Serbian Language has organized a series of descriptive-normative monographs intended to present the current state of the language and to offer normative solutions (so far, word formation and phonology have been covered). A considerable number of normative recommendations have been issued. The official orthography has been updated twice.

Cultivating language usage

The Committee for the Standardization of the Serbian Language (through its recommendations), the Society for Serbian Language and Literature (Društvo za srpski jezik i književnost; through its publications and by organizing Serbian language competitions for primary and secondary school pupils), Matica srpska (by organizing work on the orthography, through its publications and by organizing conferences on language), the Vuk Foundation (Vukova zadužbina; through its publications and by organizing panels and conferences on language) and various other institutions, some publishing houses, editorial boards of daily newspapers and of radio and TV programmes, as well as language experts and lovers of their mother tongue, strive to contribute to preserving the correctness and purity of Serbian in written and spoken use.

Response to the growing influence of English

The need is stressed to replace English words and expressions with Serbian ones, and calqued translations from English with (authentic) Serbian words and expressions. (More broadly, this also includes resistance to the increasing use of the Latin script.)

Improving lexicography

Increasing attention is being paid to lexicography, both monolingual and bilingual. A large one-volume dictionary of contemporary Serbian, which was badly needed, has been produced. Work on the large academy dictionary of the Serbian language is being modernized.

To harmonize terminology and product nomenclature with the solutions adopted in the European Union, the Institute for Standardization is rapidly translating European Union standards. The laws and regulations in force in the European Union are also being translated.

Language and education

Serbian Language and Literature is one of the key subjects in primary and secondary school. However, teaching concentrates on correct writing and speech, on knowledge about the language (grammar and vocabulary), and on the history of the literary (written) languages of the Serbs and the origin of the Serbian standard language. The large-scale mother-tongue competitions (from the upper grades of primary school onwards) are geared to this kind of teaching. Insufficient attention is paid to practical language use and functional literacy. The wish to bring the goals and standards of teaching closer to those in the European Community, together with pupils' unsatisfactory results in the PISA tests, are incentives to modernize language teaching and to insist on functional literacy and communication skills. This is reflected in the ongoing school reform (goals of language teaching, achievement standards, syllabuses) and in improvements in textbook quality. Universities mostly lack Serbian language courses that would systematically prepare future professionals for successful professional communication and the corresponding functional literacy. Language technology can certainly contribute to modernizing teaching, e.g., through computer-assisted language learning (CALL) systems.

International aspects

The use and teaching of Serbian for the parts of the Serbian people living in neighbouring countries are regulated by the legislation of those states. The disappearance of the common Serbo-Croatian language and the official existence of separate languages of Shtokavian origin have affected the organization of teaching of the former Serbo-Croatian language, as well as the names of the departments where it was taught: there are now separate programmes and degrees for these languages, including Serbian (language and literature), with a greater or lesser combination of courses, while the departments have collective names. Serbia continues the practice of organizing summer schools for foreigners, but now for Serbian rather than Serbo-Croatian. Domestic experts are also sent abroad to work as language lecturers at university departments.


In some countries, supplementary mother-tongue classes are organized for children of Serbian origin. The need to harmonize legislation and terminology with those of the European Union, the influence of Anglo-American culture in entertainment and the media, and the general climate of globalization increasingly bring Serbian into contact with other languages, especially English, and give translation ever greater impetus and importance.

Serbian on the Internet

A survey13 from 2010 showed that 50.8% of the population uses a computer and the Internet regularly, whereas 43.7% have never used a computer. According to another source,14 as much as 55.9% of the population uses the Internet, an increase of 926.8% over the period 2000-2010. According to the same source, there were 2,237,680 Facebook users in Serbia on 31 August 2010, representing 30.5% of the total population. Public e-government services are used by only 13.2% of the population, whereas 38.5% stated that they would never use such services. Only 13% of the population has used online shopping. According to the Statistical Office of the Republic of Serbia,15 household ownership of ICT equipment grew as follows:

Households in Serbia having     2006     2010
computer                        26.5%    50.4%
laptop                           1.5%    11.2%
Internet access                 18.5%    39%
cable TV                        30.2%    42.6%
mobile phone                    71.2%    82%
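The table mixes two notions of growth that are easy to conflate: the change in percentage points and the relative growth factor. The small calculation below only does arithmetic on the published percentages above; it adds no new data.

```python
# Share of households in Serbia owning each item, in percent (2006, 2010), from the table above.
HOUSEHOLDS = {
    "computer": (26.5, 50.4),
    "laptop": (1.5, 11.2),
    "Internet access": (18.5, 39.0),
    "cable TV": (30.2, 42.6),
    "mobile phone": (71.2, 82.0),
}


def growth(share_2006: float, share_2010: float) -> tuple[float, float]:
    """Return (change in percentage points, relative growth factor), rounded for display."""
    points = round(share_2010 - share_2006, 1)
    factor = round(share_2010 / share_2006, 2)
    return points, factor


for item, (a, b) in HOUSEHOLDS.items():
    points, factor = growth(a, b)
    print(f"{item:16} {points:+5.1f} pp   x{factor:.2f}")
```

Computers show the largest absolute gain (+23.9 percentage points), while laptops show the largest relative growth (about 7.5 times), starting from a very low base.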

According to the same source, 96.8% of companies used the Internet in 2010 (compared with 90.2% in 2006), and 67.5% had their own website (compared with 52.9% in 2006). In 2010, 70.6% of companies used e-government services.

13 http://webrzs.stat.gov.rs/WebSite/Public/PageView.aspx?pKey=204
14 http://www.internetworldstats.com/europa2.htm#rs
15 http://webrzs.stat.gov.rs/WebSite/
16 http://webrzs.stat.gov.rs/WebSite/Public/PageView.aspx?pKey=204


Data from last year's survey by the Statistical Office of the Republic of Serbia (RZS), based on a sample of 2,400 households and the same number of individuals aged 16 to 74, show that 39% of respondents have an Internet connection, with the highest share, 51%, in Belgrade.17 Access to the Internet depends on income: 83% of households with a monthly income over 600 euros have Internet access, while among households with a monthly income below 300 euros the share drops to only 29%. Most people (91%) access the web from desktop computers, one fifth from cell phones, and slightly fewer from laptops. As for connection type, almost half of the households in Serbia have an ADSL connection and one quarter have cable Internet, whereas 29% of the respondents use mobile devices to connect. In most cases access is from home (84%), followed by work, another person's home, and school or university, with as little as 3.8% from Internet cafés. Students are the best-represented group online, with as many as 95% of them using the Internet. Apart from business purposes, the Internet is most commonly used for e-mail (78%), then for entertainment (games, movies, music: 55%), for reading the electronic press (41%) and for learning (23%).

The most popular websites on the Serbian part of the Internet are Serbian news portals (Blic,18 B92,19 Naslovi,20 RTS21). The most visited domestic portal is Krstarica,22 which includes a search engine, up-to-date daily news from Serbia, a directory of local sites grouped by topic, and a variety of other content. An experiment begun in 2005 with the local search engine Pogodak, whose search was adapted to the morphology of Serbian, ended in 2010 as unprofitable.
17 http://webrzs.stat.gov.rs/WebSite/repository/documents/00/00/10/40/PressICT2010.pdf
18 http://www.blic.rs/
19 http://www.b92.net/
20 http://www.naslovi.net/
21 http://www.rts.rs/
22 http://www.krstarica.com/
23 http://webrzs.stat.gov.rs/WebSite/repository/documents/00/00/10/40/PressICT2010.pdf

The Serbian Wikipedia is a source of varied language data. It contains a little more than 142,000 articles and ranks 28th24 in the world by number of articles. The alternative Wikipedia in Serbo-Croatian25 is smaller, with about 40,000 articles. Free-content language data can also be found on the portals Rastko,26 Antologija srpske književnosti27 (Anthology of Serbian Literature) and Transpoetika,28 which mainly host literary texts.

The visibility of a number of pages with Serbian content fell dramatically, though temporarily, during 2010 because of the change of the top-level domain from .yu to .rs.

The most commonly used web application is web search, which involves automatic language processing on multiple levels, as will be described in more detail in the second part of this paper. It relies on sophisticated Language Technology that differs from language to language.
For Serbian, as already mentioned, the difficulties stem from the coexistence of the Cyrillic and Latin alphabets, the Ekavian and Ijekavian variants, graphemic variation in lemma forms, and rich morphology.

24 Wikipedia metadata: http://meta.wikimedia.org/wiki/List_of_Wikipedias
25 http://sh.wikipedia.org/
26 http://www.rastko.rs/
27 http://www.ask.rs/
28 http://transpoetika.org/
29 http://sh.wikipedia.org/
30 http://www.rastko.rs/
31 http://www.ask.rs/


Internet users and providers of web content can also profit from Language Technology in less obvious ways, e.g., when it is used to translate web content automatically from one language into another. Considering the high cost of translating such content manually, comparatively little usable Language Technology has been developed and applied relative to the anticipated need. This may be due to the complexity of Serbian and the number of technologies involved in typical Language Technology applications. In the next chapter, we present an introduction to Language Technology and its core application areas, as well as an evaluation of the current state of Language Technology support for Serbian.



Language Technology Support for Serbian


Language Technologies

Language technologies are information technologies that are specialized for dealing with human language. For this reason they are often subsumed under the term Human Language Technology. Human language occurs in spoken and written form. Speech is the oldest and most natural mode of language communication, but complex information and most of human knowledge are recorded and transmitted in written form. Speech and text technologies process or produce language in these two forms. Language also has aspects that are shared by both forms, such as dictionaries, most of the grammar and the meaning of sentences. Consequently, large parts of language technology cannot be subsumed under either speech or text technologies alone. Among them are knowledge technologies, which link language to knowledge. Figure 1 illustrates the Language Technology landscape. When we communicate, we mix language with other modes of communication and other information media. We combine speech with gestures and facial expressions, texts may contain pictures and sound, and films may contain language in both spoken and written form. Speech and text technologies therefore overlap and interact with many other technologies that facilitate the processing of multimodal communication and multimedia documents.

Language Technology Application Architectures

Typical software applications for language processing consist of several components that mirror different aspects of language and of the task they implement. Figure 2 shows a highly simplified architecture of the kind found in a text processing system. The first three modules deal with the structure and meaning of the input text:

• Pre-processing: cleaning the data, removing formatting, detecting the input language and its character encoding, and so on.

• Grammatical analysis: finding verbs and their objects, modifiers, etc.; detecting the sentence structure.

• Semantic analysis: disambiguation (which meaning of apple is the right one in the given context?), resolving anaphora and referring expressions such as she, the car, etc.; representing meaning in a machine-readable form.

Task-specific modules then perform many different operations, such as automatic summarization of an input text, database look-ups and many others. Below, we illustrate the core application areas and highlight their core modules. Again, the application architectures are highly simplified and idealized, in order to illustrate the complexity of language technology applications in an understandable way.
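The module chain described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions and does not describe any existing Serbian system. Every stage is a deliberately simplified stand-in: tokenization replaces real parsing, a hypothetical cue-word list disambiguates apple, and a keyword list plays the "task-specific" module. The function names are our own. Script detection is included because Serbian text may arrive in either Cyrillic or Latin.

```python
import re
import unicodedata
from collections import Counter

# Minimal sketch of the four-stage text-processing architecture.
# Every stage is a simplified stand-in (assumption), not a real Serbian analyser.

def preprocess(raw: str) -> dict:
    """Pre-processing: strip simple markup, normalise Unicode and whitespace, detect script."""
    text = re.sub(r"<[^>]+>", " ", raw)                      # drop HTML-like tags
    text = " ".join(unicodedata.normalize("NFC", text).split())
    names = [unicodedata.name(ch, "") for ch in text if ch.isalpha()]
    cyrillic = sum(n.startswith("CYRILLIC") for n in names)
    latin = sum(n.startswith("LATIN") for n in names)
    return {"text": text, "script": "Cyrillic" if cyrillic > latin else "Latin"}

def grammatical_analysis(doc: dict) -> dict:
    """Stand-in for tagging and parsing: only tokenisation is done here."""
    doc["tokens"] = re.findall(r"\w+", doc["text"])
    return doc

def semantic_analysis(doc: dict) -> dict:
    """Toy word-sense disambiguation of 'apple' from hypothetical context cue words."""
    company_cues = {"iphone", "shares", "computer"}
    words = {t.lower() for t in doc["tokens"]}
    if "apple" in words:
        doc["apple_sense"] = "company" if words & company_cues else "fruit"
    return doc

def task_module(doc: dict) -> dict:
    """Example task-specific module: report the three most frequent words."""
    counts = Counter(t.lower() for t in doc["tokens"])
    doc["keywords"] = [w for w, _ in counts.most_common(3)]
    return doc

def run_pipeline(raw: str) -> dict:
    """Run the stages in the order of the architecture figure."""
    doc = preprocess(raw)
    for stage in (grammatical_analysis, semantic_analysis, task_module):
        doc = stage(doc)
    return doc

if __name__ == "__main__":
    print(run_pipeline("<p>Apple shares rose.</p>"))
```

Each stage reads and extends a shared dictionary. A real system could therefore swap in a proper tagger or parser for one stage without touching the others.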

Figure 1: Language technology landscape

Figure 2: Typical text processing application architecture (input text → pre-processing → grammatical analysis → semantic analysis → task-specific modules → output)


After introducing the core application areas, we give a brief overview of the state of research and education in language technology and conclude with an overview of past and ongoing research programmes. At the end of this section, we present an expert estimate of the situation of core language tools and resources along several dimensions, such as availability, maturity and quality. This table gives a good overview of the state of language technology for Serbian. The most important tools and resources discussed are underlined in the text and can also be found in the table at the end of the chapter.

Core Application Areas: Language Checking

Anyone who has used a word processing tool such as Microsoft Word has come across a checking component that flags spelling mistakes and proposes corrections. Forty years after Ralph Gorin's first spelling correction program, language checkers no longer simply compare the words extracted from a text against a dictionary of correctly spelled words; they have become considerably more sophisticated. In addition to language-dependent algorithms for handling morphology (e.g. plural forms), some can now recognize syntax-related errors. Examples are a missing verb, or a verb that does not agree with its subject in person and number, as in 'She *write a letter.' Nevertheless, most spell checkers (including Microsoft Word) will find no errors in the first verse of a 1992 poem by Jerrold H. Zar:

Eye have a spelling chequer,
It came with my Pea Sea.
It plane lee marks four my revue
Miss Steaks I can knot sea.

D# )' $% 2/.*% 9/:'5' /"#,"% .&%:,% 9 23/.'2 $*96#(%"'2# (% 4/5&%)3# ' #3#*'0# ,/35%,$5#, 3# 4&'2%& 7# )' $% 95"&7'*/ 7# *' &%6 2/&# 7# )97% 3#4'$#3# "%*','2 $*/"/2, ,#/ 9 $*%7%>%2 4&'2%&9:

Divio se Ruži. [He admired Rose.]
Divio se ruži. [He admired the rose.]

This requires either formulating language-specific grammar rules, which takes a great deal of expert work, or using so-called statistical language models. Such models estimate how probable it is that a particular word occurs in a specific environment (e.g. before or after certain other words). For example, the word sequence Crvena zvezda, the name of a football club, is much more probable than crvena zvezda in its literal meaning ('red star'). Statistical language models can be derived automatically from large amounts of (correct) language data, i.e. corpora. So far, these approaches have been developed and evaluated on English data.
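A statistical language model of the kind described above can be sketched as a bigram model with add-one smoothing. The three-sentence corpus below is invented for illustration; a real model is trained on a large corpus. The function names are hypothetical. The point is only that the model assigns a higher probability to the capitalized club name in this context.

```python
from collections import Counter

# Invented toy corpus (not real data); a real model needs a large corpus.
CORPUS = [
    "navijači Crvena zvezda slave pobedu",
    "Crvena zvezda igra večeras",
    "na nebu je crvena zvezda",
]

def train_bigrams(sentences):
    """Count unigrams and bigrams, with sentence-boundary markers."""
    unigrams, bigrams = Counter(), Counter()
    for sentence in sentences:
        tokens = ["<s>"] + sentence.split() + ["</s>"]
        unigrams.update(tokens)
        bigrams.update(zip(tokens, tokens[1:]))
    return unigrams, bigrams

def bigram_prob(w1, w2, unigrams, bigrams):
    """Add-one (Laplace) smoothed estimate of P(w2 | w1)."""
    vocab_size = len(unigrams)
    return (bigrams[(w1, w2)] + 1) / (unigrams[w1] + vocab_size)

def sequence_prob(sentence, unigrams, bigrams):
    """Probability of a whole word sequence under the bigram model."""
    tokens = ["<s>"] + sentence.split() + ["</s>"]
    p = 1.0
    for w1, w2 in zip(tokens, tokens[1:]):
        p *= bigram_prob(w1, w2, unigrams, bigrams)
    return p

def prefer(variants, unigrams, bigrams):
    """Pick the spelling variant the model finds most probable."""
    return max(variants, key=lambda v: sequence_prob(v, unigrams, bigrams))

if __name__ == "__main__":
    uni, bi = train_bigrams(CORPUS)
    print(prefer(["crvena zvezda igra večeras", "Crvena zvezda igra večeras"], uni, bi))
```

Counts are case-sensitive on purpose, so that the capitalized and lower-case variants are treated as different words.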

Figure 3: Language checking (left: rule-based; right: statistical). Components: input text, spelling check, grammar check, statistical language model, correction proposals.


However, such models cannot always be applied directly to Serbian, given its free word order and rich inflection. The first attempts to develop spell-checking software for Serbian were made as early as the 1970s32 and were motivated by problems encountered by large publishing houses. Today, a free spell-checking module for Serbian is available for OpenOffice33 on various operating systems. There is also a manually built product, the RAS34 package developed by the company Srbosof, which has to be installed separately for each user.

Language checking is used not only in word processing tools but also in authoring support systems. The volume of technical documentation has grown dramatically over the past decades as the number of technical products has kept rising. Companies fear customer complaints about incorrect use and claims for damages caused by poorly written or misunderstood instructions. They have therefore begun to devote increasing attention to technical documentation, while at the same time targeting the international market. Advances in natural language processing have led to authoring support software that helps writers of technical documentation use vocabulary and sentence structures that comply with certain rules and respect the terminological restrictions imposed by their company. No such systems have been developed for Serbian. One unfinished experiment in this direction is worth mentioning, however. It aimed to introduce control over the language of primary and secondary school textbooks in order to limit the excessive use of specialist terminology.

Language checking is needed not only in spell checkers and authoring support systems. It is also important for computer-assisted language learning35 and is applied to the automatic correction of queries sent to web search engines, such as Google's 'Did you mean...' suggestions.
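The dictionary-lookup step of a spell checker, and the candidate list behind 'Did you mean...' suggestions, can be approximated with edit distance. The sketch below uses Levenshtein distance over a tiny hypothetical lexicon. Production checkers for Serbian would rely on full morphological dictionaries and, as noted above, on context, because real-word errors such as those in Zar's poem pass any pure lookup.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance (insertions, deletions, substitutions) by dynamic programming."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,                # deletion
                            curr[j - 1] + 1,            # insertion
                            prev[j - 1] + (ca != cb)))  # substitution (free if equal)
        prev = curr
    return prev[-1]

# Hypothetical mini-lexicon; a real checker would use a full morphological dictionary.
LEXICON = ["jezik", "jezici", "jezika", "tehnologija", "srpski", "rečnik"]

def suggest(word: str, lexicon=LEXICON, max_distance: int = 2) -> list[str]:
    """Return 'Did you mean...' candidates, closest first; [] if the word is known."""
    if word in lexicon:
        return []
    scored = sorted((levenshtein(word, w), w) for w in lexicon)
    return [w for d, w in scored if d <= max_distance]
```

For example, `suggest("jezk")` returns `["jezik", "jezika"]`. Both are real forms, and only context can decide between them, which is where a language model comes in.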

Web Search

Searching the web, intranets and digital libraries is today probably the most widespread use of language technology, and at the same time one that remains underdeveloped. The Google search engine, which started in 1998, now handles about 80% of all web search queries worldwide.36 The verbs guglati/izguglati ('to google') are in everyday use in Serbian. Neither the search interface nor the presentation of retrieved results has changed significantly since the first version. In its current version, Google offers spelling correction for misspelled words.

32 Zoran Urošević: Statistička metoda otkrivanja i korekcije slovnih grešaka supstitucionog tipa u tekstu na srpskohrvatskom jeziku [A statistical method for detecting and correcting substitution-type spelling errors in Serbo-Croatian text], BIGZ, Beograd, 1975.
33 http://extensions.services.openoffice.org/en/node/1572/releases
34 http://www.rasprog.com/html/3_0_korektor.html
35 One of the few such courses was offered at http://www.azbukum.org.rs/index.php
36 http://www.spiegel.de/netzwelt/web/0,1518,619398,00.html

Figure 4: Web search architecture. Components: web pages, user query, pre-processing, semantic processing, indexing, query analysis, matching & relevance, search results.


In 2009, Google also incorporated basic semantic search capabilities into its mix of algorithms.37 These can improve search accuracy by analysing the meaning of the query terms in context. Google's success shows that, with a large amount of data at hand and efficient techniques for indexing it, a mainly statistics-based approach can produce satisfactory results. More demanding information needs, however, require the integration of deeper linguistic knowledge. In research laboratories, experiments with machine-readable thesauri and ontological language resources such as WordNet (and its Serbian counterpart, SrpNet) have improved retrieval. One example is finding pages that contain synonymous terms, such as atomska energija (atomic energy) and nuklearna energija (nuclear energy). Another is searching via even more loosely related terms.

The next generation of search engines will have to include much more advanced language technology. If a query consists of a question or another kind of sentence rather than a list of keywords, retrieving relevant answers requires syntactic and semantic analysis of that sentence. It also requires an index that allows fast retrieval of the relevant documents. For example, imagine a user entering the query 'Give me a list of companies that were taken over by other companies in the last five years'. To produce a satisfactory answer, syntactic parsing must first analyse the grammatical structure of the sentence. This establishes that the user is looking for companies that were taken over, not companies that took over others. The expression in the last five years also needs to be processed to determine which years it refers to. Finally, the processed query has to be matched against a huge amount of unstructured data to find the pieces of information the user is looking for.
This is commonly referred to as information retrieval, which involves searching for and ranking relevant documents. In addition, generating the list of companies requires extracting from the documents the information that a particular string of words refers to a company name. This kind of information is provided by so-called named entity recognition systems. Even more demanding is matching a query against documents written in different languages. Cross-lingual information retrieval requires automatically translating the query into all possible source languages and then transferring the retrieved information back into the target language. A growing share of data is available in non-textual formats. This increases the demand for services that enable multimedia information retrieval, e.g. retrieving information from images, audio and video data.
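The indexing and matching steps can be illustrated with a minimal inverted index that ranks documents by a simple tf-idf score. This is a generic textbook technique chosen for illustration. It does not describe how Google or any other engine actually ranks results. The class name and example documents are hypothetical.

```python
import math
import re
from collections import Counter, defaultdict

def tokenize(text: str) -> list[str]:
    """Lower-case word tokens (Unicode-aware, so č, ć, š, ž, đ are kept)."""
    return re.findall(r"\w+", text.lower())

class InvertedIndex:
    """Minimal inverted index with tf-idf ranking (illustrative sketch)."""

    def __init__(self):
        self.postings = defaultdict(dict)   # term -> {doc_id: term frequency}
        self.docs = {}

    def add(self, doc_id: str, text: str) -> None:
        self.docs[doc_id] = text
        for term, tf in Counter(tokenize(text)).items():
            self.postings[term][doc_id] = tf

    def search(self, query: str, k: int = 3) -> list[str]:
        """Score each document by the sum of tf * idf over the query terms."""
        n_docs = len(self.docs)
        scores = Counter()
        for term in tokenize(query):
            postings = self.postings.get(term, {})
            if not postings:
                continue
            idf = math.log(n_docs / len(postings)) + 1.0   # +1 keeps very common terms non-zero
            for doc_id, tf in postings.items():
                scores[doc_id] += tf * idf
        return [doc_id for doc_id, _ in scores.most_common(k)]
```

The index matches surface forms only, so in a highly inflected language like Serbian a query word in one case will miss documents that contain the same word in another case unless morphology is handled.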

37 http://www.pcworld.com/businesscenter/article/161869/google_rolls_out_semantic_search_capabilities.html


For audio and video files, this involves a speech recognition module that converts spoken content into text or into a phonetic representation against which user queries can be matched. Popular Serbian sites that offer search, such as B92 and Krstarica, rely mainly on Google's services.38 An attempt to introduce a search engine that would search exclusively under the .rs domain and would be partly adapted to the specific properties of Serbian was abandoned in 2010 as unprofitable. A number of small and medium-sized enterprises work on extending search services, but mostly for foreign partners and for English. In research settings, experiments have been carried out with query expansion, in which search engines receive queries expanded with morphological dictionaries and multilingual semantic networks. These experiments have produced interesting and useful results in a variety of domains.39
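Query expansion of the kind mentioned above can be sketched as follows. Each query word is replaced by the set of its inflected forms and synonyms, and a document matches if it contains at least one form of every query word. The two small dictionaries are hypothetical stand-ins for a morphological dictionary and a semantic network such as SrpNet. A real system would also lemmatize inflected query words first; this sketch assumes the query words are already lemmas.

```python
import re

# Hypothetical mini-resources standing in for a morphological dictionary and a
# semantic network such as SrpNet; the entries are illustrative only.
INFLECTIONS = {
    "kompanija": ["kompanija", "kompanije", "kompaniji", "kompaniju", "kompanijom", "kompanijama"],
    "firma": ["firma", "firme", "firmi", "firmu", "firmom"],
}
SYNONYMS = {"kompanija": ["firma", "preduzeće"]}

def expand_query(query: str) -> list[set[str]]:
    """Map each query word (assumed to be a lemma) to all forms that should match."""
    expanded = []
    for word in re.findall(r"\w+", query.lower()):
        forms = set()
        for lemma in [word] + SYNONYMS.get(word, []):
            forms.update(INFLECTIONS.get(lemma, [lemma]))   # unknown words match only themselves
        expanded.append(forms)
    return expanded

def matches(document: str, expanded: list[set[str]]) -> bool:
    """Boolean AND over query words, OR over the expanded forms of each word."""
    tokens = set(re.findall(r"\w+", document.lower()))
    return all(tokens & forms for forms in expanded)
```

With this expansion, the query kompanija also matches a document that only says "Preuzeli su firmu", which a plain keyword index would miss.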

Speech Interaction

Speech interaction technology is the basis for building interfaces that allow users to interact with machines through spoken language instead of a graphical display, keyboard or mouse. Today, such voice user interfaces (VUIs) are usually employed for fully or partially automated services that companies offer their customers, employees or partners over the telephone. Business domains that rely heavily on VUIs include banking, logistics, public transport and telecommunications. Speech interaction technology is also used for interfaces to particular devices, e.g. in-car navigation systems. It can also replace the input/output modalities of graphical user interfaces, e.g. in smartphones. At its core, speech interaction comprises the following four technologies:

• Automatic speech recognition (ASR) determines which words were actually spoken, given a sequence of sounds produced by the user.

• Syntactic analysis and semantic interpretation analyse the syntactic structure of the user's utterance and interpret it according to the purpose of the particular system.

• Dialogue management determines which action the system should take, given the user's input and the system's functionality.

• Speech synthesis (text-to-speech, TTS) transforms the wording of an utterance into sounds that the user receives as output.

38 http://www.alexa.com/topsites/countries/CS
39 http://www.ncd.matf.bg.ac.rs/casopis/12/NCD12065.pdf
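To make the dialogue-management component listed above concrete, the sketch below implements a minimal frame-based dialogue manager for a hypothetical money-transfer service. Speech recognition and synthesis are stubbed out: input arrives as text, and the returned string is what a TTS component would speak. The regular-expression "understanding" step is an assumption that stands in for real syntactic and semantic analysis.

```python
import re

# Minimal frame-based dialogue manager for a hypothetical money-transfer service.
# ASR and TTS are stubbed out: input arrives as text, and the returned string is
# what a speech synthesiser would speak.

SLOTS = ("amount", "account")
PROMPTS = {
    "amount": "How much would you like to transfer?",
    "account": "To which account?",
}

def understand(utterance: str) -> dict:
    """Toy stand-in for syntactic/semantic analysis: extract slot values with regexes."""
    found = {}
    if m := re.search(r"(\d+)\s*(?:dinars|RSD)", utterance):
        found["amount"] = int(m.group(1))
    if m := re.search(r"account\s+(\d{3}-\d+)", utterance):
        found["account"] = m.group(1)
    return found

class DialogueManager:
    def __init__(self):
        self.frame = {}

    def step(self, utterance: str) -> str:
        """Update the frame with new information, then choose the next system action."""
        self.frame.update(understand(utterance))
        for slot in SLOTS:
            if slot not in self.frame:
                return PROMPTS[slot]          # ask for the first missing slot
        return (f"Transferring {self.frame['amount']} dinars "
                f"to account {self.frame['account']}.")
```

The manager fills whichever slots it finds. A user who answers an open prompt such as 'How may I help you?' with 'Transfer 500 dinars to account 160-12345' therefore gets an immediate confirmation, which shows why open prompts can be more flexible than strictly directed dialogues.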

The main challenge is to have an ASR system recognize the words uttered by the user as precisely as possible. This requires one of two approaches. The first restricts the range of possible user utterances to a limited set of keywords. The second manually builds a language model that covers a wide range of natural-language utterances. The first approach yields a rather rigid and inflexible VUI that users are reluctant to accept, whereas creating, tuning and maintaining language models can increase costs considerably. Nevertheless, some VUIs use language models and initially let users express their intent freely, prompted for example by a question such as 'How may I help you?'. These achieve both a higher degree of automation and higher user acceptance, and can therefore be considered preferable to less flexible directed-dialogue systems. For the output side of a VUI, companies tend to use pre-recorded utterances, ideally by professional speakers. Some utterances are static: their wording does not depend on the particular context of use or on the user's personal data. For these, the result can be entirely satisfactory for the user. However, the more dynamic the content of an utterance, the more users may be dissatisfied with the poor prosody that results from concatenating individual audio files. Today's TTS systems, by contrast, are superior in the prosodic naturalness of dynamic utterances, although they still need optimization. Regarding the market for speech interaction technology, the past decade has seen strong standardization of the interfaces between components based on different technologies. Standards have also been established for building particular software products for a given application.
The last ten years have also seen strong market consolidation, especially in ASR and TTS. Here, the national markets of the G20 countries (i.e. economically strong countries with large populations) are dominated by fewer than five players worldwide, with Nuance and Loquendo being the most prominent in Europe. In Serbia, as in the wider region of the former Yugoslavia, speech recognition and synthesis methods have been developed mainly in electrical engineering departments, in cooperation with phoneticians. Early efforts focused on recognizing isolated phonemes. A significant step forward was made by a group from the Faculty of Technical Sciences of the University of Novi Sad. In addition to spoken language databases, this group built a lexical database of more than 4 million accented Serbian word forms and more than 3 million Croatian word forms.

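Figure 5: A simple speech-based dialogue architecture. Components: speech input, signal processing, recognition, natural language understanding & dialogue, phonetic lookup & intonation planning, speech synthesis, speech output.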


Using these resources, various ASR and TTS applications have been developed. Speech recognition and synthesis for Serbian entered commercial use through AlfaNum, a company that originated at the University of Novi Sad. The company also operates successfully in other states that emerged from the former Yugoslavia: Croatia, Macedonia, Bosnia and Herzegovina, and Montenegro. AlfaNum has a considerable number of customers among Serbian companies. When translating into Serbian, Google's translator also offers basic text-to-speech output of the translation, but without built-in accentuation.

Looking beyond today's state of technology, significant changes will come from the spread of smartphones as a new platform for managing customer relationships, alongside the existing telephone, Internet and e-mail channels. This trend will also affect the use of speech interaction technology. On the one hand, demand for telephony-based VUIs will decline in the long run. On the other hand, spoken language as a user-friendly input modality for smartphones will gain significantly in importance. This trend is supported by an improvement that is already observable: better speaker-independent recognition accuracy in dictation services offered to smartphone users as centralized services. With recognition outsourced to the application infrastructure, the application-specific use of core linguistic technologies should gain in importance compared with the current situation.

Machine Translation

The idea of using digital computers to translate natural languages was put forward in 1946 by A. D. Booth. It was followed by substantial research funding in the 1950s and again in the 1980s. Nevertheless, machine translation (MT) still fails to meet the high expectations it raised in those early days. At its most basic level, MT simply substitutes words in one natural language with words in another. This can be useful in subject domains with a very restricted, formalized language, such as weather forecasts. However, a good translation of less standardized texts requires matching larger text units (phrases, sentences or even whole passages) to their closest counterparts in the target language. The main difficulty is that human language is ambiguous, which creates challenges on several levels. Word senses must be disambiguated at the lexical level (jaguar can be an animal or a car), and prepositional phrase attachment must be resolved at the syntactic level, as in:

Policajac je uspeo da primeti čoveka bez teleskopa. [The policeman managed to notice the man without the telescope.]
Policajac je uspeo da primeti čoveka bez revolvera. [The policeman managed to notice the man without the revolver.]

Figure 6: Machine translation (left: statistical; right: rule-based). Components: source text, text analysis (formatting, morphology, syntax, etc.), translation rules, statistical machine translation, post-editing (formatting, context, etc.), target text.

One way to approach this task is based on linguistic rules. For translation between closely related languages, direct translation may be possible in cases like the examples above. In general, however, rule-based (or knowledge-driven) systems analyse the input text and create an intermediate symbolic representation, from which the target-language text is then generated. The success of these methods depends heavily on the availability of extensive lexicons with morphological, syntactic and semantic information. It also depends on large sets of grammar rules carefully crafted by skilled linguists. In the late 1980s computing power increased and became cheaper, and interest in statistical methods for MT grew. The parameters of such statistical models are derived from the analysis of bilingual text corpora. One example is the Europarl parallel corpus, which contains the proceedings of the European Parliament in 11 languages. Given enough data, statistical MT can produce a reasonably good approximation of the meaning of a foreign-language text. Unlike knowledge-driven systems, however, statistical (or data-driven) MT often produces ungrammatical output. On the other hand, data-driven MT requires less human effort for grammar writing, and it can cover language specifics that knowledge-driven systems miss, such as idiomatic expressions. The strengths and weaknesses of knowledge-driven and data-driven MT are complementary. For this reason, researchers today unanimously pursue hybrid approaches that combine both methodologies. This can be done in several ways.
One way is to use both knowledge-driven and data-driven systems and let a separate selection module decide on the best output for each sentence. For long sentences, however, none of the results will be perfect. A better solution is to combine the best parts of each sentence obtained from the different sources. This can be quite complex, because it is not always obvious which parts correspond to each other across the alternatives, and they need to be aligned. As for the relationship between Serbian and other languages, the problems depend on the nature of the particular language. Relevant questions include whether it has rich morphology, whether the distribution of sentence constituents is free or fixed, whether it uses articles, and whether it is written in Cyrillic or Latin script. The point, however, is not only what the problems are but also whether it is possible to cooperate in solving similar problems. In that respect, cooperation with projects on the computational processing of other Slavic languages would be particularly useful. Lexical and terminological relations also matter here, namely the extent to which a given foreign language has influenced the development of Serbian.


Cooperation should be sought here with projects on the computational processing of the languages that have served, and still serve, as the backbone of the development of Serbian. These are, above all, English, French, German and Russian. Contrastive research on the relationship between Serbian and several foreign languages is also under way. Unfortunately, there is not enough cooperation between linguists who work on Serbian as a native language and linguists who take part in contrastive research as foreign-language specialists. Another problem is the insufficient number of large bilingual dictionaries.

The greatest need for language technology in Serbia lies in translation. Several providers offer professional translation services or free, phrase-based machine translation (e.g. Google Translate, WorldLingo). They include specialized associations (the Association of Literary Translators of Serbia and the Association of Scientific and Technical Translators of Serbia), some local SMEs (e.g. Elitence and Proverbum) and some foreign companies (e.g. WorldLingo). Some of them use proprietary electronic dictionaries in their work, while WorldLingo also offers broader MT services (websites, text, documents, e-mail, API, etc.). Google's well-known, freely available statistical translation system includes Serbian. Apart from it, no other MT system for Serbian has been produced, except for some initial work (e.g. within the SEE-ERA project) and small experimental systems. Nevertheless, generic statistical MT systems such as Google Translate support Serbian to a considerable degree, especially for translation into and out of English. For other language pairs, however, performance is poor: the resulting translation is often incomprehensible or even comical. This is caused by the insufficient size of the parallel corpora used to train the statistical MT system for those language pairs.
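The way a statistical MT system learns word correspondences from a parallel corpus can be illustrated with IBM Model 1, a classic expectation-maximization (EM) alignment model. This is a simplified sketch with no NULL word, no phrase extraction and no decoding, run on an invented three-sentence English-Serbian corpus. It does not show how Google Translate or any other particular system is built. It does show why more parallel data yields better estimates.

```python
from collections import defaultdict

def train_ibm_model1(pairs, iterations=10):
    """Estimate word-translation probabilities t(f | e) with EM (IBM Model 1, no NULL word).

    pairs: list of (source_sentence, target_sentence) strings, whitespace-tokenised.
    """
    target_vocab = {f for _, tgt in pairs for f in tgt.split()}
    t = defaultdict(lambda: 1.0 / len(target_vocab))    # uniform initialisation
    for _ in range(iterations):
        count = defaultdict(float)   # expected counts c(f, e)
        total = defaultdict(float)   # expected counts c(e)
        for src, tgt in pairs:       # E-step: distribute each target word over source words
            e_words, f_words = src.split(), tgt.split()
            for f in f_words:
                norm = sum(t[(f, e)] for e in e_words)
                for e in e_words:
                    share = t[(f, e)] / norm
                    count[(f, e)] += share
                    total[e] += share
        # M-step: renormalise expected counts into probabilities
        t = defaultdict(float, {(f, e): c / total[e] for (f, e), c in count.items()})
    return t

def best_translation(t, e):
    """Most probable target word for source word e."""
    candidates = {f: p for (f, src), p in t.items() if src == e}
    return max(candidates, key=candidates.get)

# Invented toy parallel corpus (English -> Serbian), for illustration only.
PAIRS = [("big house", "velika kuća"), ("small house", "mala kuća"), ("big book", "velika knjiga")]
```

At the start every word pairing is equally likely. The model resolves them only because words like house and kuća co-occur across several sentence pairs, so with too little parallel data many pairings stay ambiguous.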
Much can still be done to improve the quality of MT systems. The challenges include adapting language resources to a given subject or user domain. Another is integrating them into existing workflows with their terminology databases and translation memories. In addition, most current systems are English-centred and support translation into and out of Serbian for only a few other languages. This causes friction in the overall translation workflow, for example by forcing MT users to learn different lexicon-coding tools for different systems. Evaluation campaigns make it possible to compare the quality of MT systems, the various approaches and the status of MT systems for different languages. Table 1 was prepared within the European Commission's Euromatrix+ project. It shows the pairwise performance obtained for 22 official EU languages (Irish Gaelic is missing), measured by the BLEU score.40

The best results (shown in green and blue) were achieved for languages that benefit from considerable research effort within coordinated programmes and for which many parallel corpora exist (e.g. English, French, Dutch, Spanish and German). The worst results (shown in red) were obtained for languages that could not build on similar earlier efforts or that differ considerably from other languages (e.g. Hungarian, Maltese, Finnish).

Table 1: Pairwise machine translation performance obtained for 22 official EU languages (source: Euromatrix+)
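The BLEU score used in Table 1 combines clipped n-gram precisions (n = 1 to 4) with a brevity penalty. Below is a simplified sentence-level sketch without smoothing. The original metric of Papineni et al. is computed over a whole test corpus, so values from this function are not directly comparable to the table. Multiply by 100 to obtain the 0 to 100 scale on which a human translator scores around 80.

```python
import math
from collections import Counter

def ngrams(tokens, n):
    """Multiset of the n-grams in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def sentence_bleu(candidate: str, references: list[str], max_n: int = 4) -> float:
    """Simplified sentence-level BLEU (no smoothing), in the range 0..1."""
    cand = candidate.split()
    refs = [r.split() for r in references]
    log_precisions = []
    for n in range(1, max_n + 1):
        cand_counts = ngrams(cand, n)
        max_ref = Counter()
        for ref in refs:
            for gram, count in ngrams(ref, n).items():
                max_ref[gram] = max(max_ref[gram], count)
        # clip each candidate n-gram count by its maximum count in any single reference
        clipped = sum(min(count, max_ref[gram]) for gram, count in cand_counts.items())
        if clipped == 0:
            return 0.0          # without smoothing, one zero precision makes BLEU zero
        log_precisions.append(math.log(clipped / sum(cand_counts.values())))
    c = len(cand)
    r = min((abs(len(ref) - c), len(ref)) for ref in refs)[1]   # closest reference length
    brevity_penalty = 1.0 if c > r else math.exp(1 - r / c)
    return brevity_penalty * math.exp(sum(log_precisions) / max_n)
```

Clipping stops a candidate from scoring well by repeating a correct word. The brevity penalty stops it from scoring well by being very short.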

Language Technology 'behind the scenes'

Building language technology applications involves a range of subtasks that do not always surface at the level of interaction with the user but provide significant functionality 'under the hood' of the system. They therefore constitute important research issues that have become separate subdisciplines of computational linguistics. Question answering has become an active research area, for which annotated corpora have been built and scientific competitions have been started. The idea is to move beyond keyword-based search, to which machines respond with a whole collection of potentially relevant documents. Instead, the user asks a concrete question and the system provides a single answer: to 'At what age did Neil Armstrong step on the Moon?' it answers '38'. This is obviously related to the core web search discussed above. Today, however, question answering is primarily an umbrella term for several research questions. They include which types of questions should be distinguished and how they should be handled. They also include how documents that potentially contain the answer should be analysed and compared (do they provide conflicting answers?). Finally, they include how a specific piece of information, the answer itself, can be reliably extracted from a document without ignoring the context in which it occurs.
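The first step in a question answering system is usually deciding which type of question is being asked, and therefore what type of answer to look for. The rule-based sketch below maps question openings to expected answer types. The patterns and labels are hypothetical and cover only a few English and Serbian cases, whereas real systems learn such classifiers from annotated corpora.

```python
import re

# Hypothetical cue patterns (English and Serbian) mapped to expected answer types.
ANSWER_TYPES = [
    (r"^(at what age|how old|sa koliko godina)\b", "NUMBER:AGE"),
    (r"^(how many|how much|koliko)\b", "NUMBER:QUANTITY"),
    (r"^(when|in what year|kada)\b", "DATE"),
    (r"^(who|ko)\b", "PERSON"),
    (r"^(where|gde)\b", "LOCATION"),
]

def answer_type(question: str) -> str:
    """Return the expected answer type for a question, or OTHER if no cue matches."""
    q = question.strip().lower()
    for pattern, label in ANSWER_TYPES:   # first matching pattern wins, so order matters
        if re.search(pattern, q):
            return label
    return "OTHER"
```

The age pattern is listed before the general quantity pattern. Otherwise "sa koliko godina" would never be reached, and a question starting with "koliko" would always be typed as a plain quantity.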

40 The higher the score, the better the translation; a human translator would score around 80. K. Papineni, S. Roukos, T. Ward, W.-J. Zhu. BLEU: A Method for Automatic Evaluation of Machine Translation. In Proceedings of the 40th Annual Meeting of ACL, Philadelphia, PA.


Question answering is closely related to information extraction. This area was extremely popular and influential at the time of the 'statistical turn' in computational linguistics in the early 1990s. Information extraction aims to identify specific pieces of information in specific classes of documents. One example is detecting the key players in company takeovers as reported in newspaper stories. Another scenario that has been worked on is reports on terrorist incidents. Here the problem is to map the text to a template specifying the perpetrator, the target, the time and place of the incident, and what it achieved. Domain-specific template filling is the central characteristic of information extraction. This makes it another example of a 'behind the scenes' technology: a well-demarcated research area that, for practical purposes, has to be embedded in a suitable application environment. Two 'borderline' areas sometimes serve as stand-alone applications and sometimes as supporting components ('under the hood'): text summarization and text generation. Summarization obviously refers to the task of shortening a long text, and it is offered as a function in MS Word, for instance. It works mostly on a statistical basis. It first identifies 'important' words in the text, for example words that occur frequently in this particular text but much less frequently in texts in general. It then determines which sentences contain many important words. These sentences are then marked in the text, or extracted from it, and together form the summary. In this scenario, which is by far the most popular, summarization is equated with sentence extraction: the text is reduced to a subset of its sentences. All commercial summarization systems make use of this idea.
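The sentence-extraction approach just described can be sketched as follows. A word's importance is its relative frequency in the text divided by its (smoothed) relative frequency in a background corpus, and sentences are ranked by the average importance of their words. The background counts in the usage example are invented. This particular scoring formula is one simple choice among many and is not the method of MS Word or of any commercial system.

```python
import re
from collections import Counter

def summarize(text: str, background: Counter, n_sentences: int = 1) -> list[str]:
    """Frequency-based extractive summary: return the top-scoring sentences in original order.

    background: word counts from a large general corpus (assumed to be available).
    """
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    words = re.findall(r"\w+", text.lower())
    local = Counter(words)
    total_bg = sum(background.values()) or 1
    # importance = relative frequency in this text / add-one-smoothed relative frequency in general
    importance = {
        w: (c / len(words)) / ((background[w] + 1) / total_bg) for w, c in local.items()
    }

    def score(sentence: str) -> float:
        tokens = re.findall(r"\w+", sentence.lower())
        return sum(importance[t] for t in tokens) / max(len(tokens), 1)

    chosen = sorted(sentences, key=score, reverse=True)[:n_sentences]
    return [s for s in sentences if s in chosen]    # keep the original sentence order

if __name__ == "__main__":
    TEXT = "Prevodilac je preveo roman. Roman je objavljen u Beogradu. Vreme je bilo lepo."
    BACKGROUND = Counter({"je": 1000, "u": 800, "bilo": 300, "vreme": 200, "lepo": 100})  # invented
    print(summarize(TEXT, BACKGROUND, 2))
```

Averaging over sentence length keeps long sentences from winning just because they contain more words.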
B*5%&3#5'"3' 4&'$594, ,/2% $% 5#,/F% 4/$"%>9(9 3%,' '$5&#8'"#6,' 3#4/&', $#$5/(' $% 9 $5"#&3/( $'35%0' "!06> &%6%3'1#, 5/ (%$5, 9 '0.&#7-' $#8%5,# ,/(' 6'3% &%6%3'1% ,/(% $% 9 5/2 /)*',9 3% (#"E#(9 9 5%,$59. !"/ 0#+5%"# 9 /7&%F%3/( 2%&' 79)E% &#092%"#-% 5%,$5# 4# (% $5/.# ' 2#-% &/)9$3/. N"% 9 $"%29, .%3%&'$#-% 5%,$5# 9 "%>'3' $*96#(%"# 3'(% $#2/$5#*3# #4*',#1'(# "%> (% 9.&#F%3/ 9 :'&% $/A5"%&$,/ /,&98%-%, ,#/ :5/ (% ,*'3'6,' '3A/&2#1'/3' $'$5%2 9 ,/2% $% 4/7#1' / 4#1'(%35'2# $,94E#(9, $,*#7':5% ' /)&#F9(9, # .%3%&'$#-% '0"%:5#(# (% $#2/ (%73# /7 23/.'+ A93,1'(# $'$5%2#. H395#& /"'+ 4/2%395'+ 4/7&96(#, "&*/ 9$4%:3' %,$4%&'2%35' $% $4&/"/7% 0# $&4$,' "%0#3' 0# 4&%4/03#"#-% '2%3/"#3'+ %35'5%5#, ,#/ 7%*# 4&/)*%2# %,5&#,1'(% '3A/&2#1'(#. !6%,9(% $% 9)&0#3' &#0"/( $'$5%2# 0# %,5&#,1'(9 '3A/&2#1'(# ' /7./"#&#-% 3# 4'5#-#, '2#(9>' 9 "'79 /4$%. '0.&#F%3'+ 2/&A/*/:,'+ &%63',# ' */,#*3'+ .&#2#5',#. ?/$5/(% ' 7&9.# 4/7&96(# 3# ,/('2# $% 4&'2%-9(9 (%0'6,% 5%+3/*/.'(%. <%73/ /7 -'+ (% /5,&'"#-% 4*#.'(#&'0#2#, ,/(%

Language Technology Support for Serbian

38 INTERNAL DRAFT

uses language-independent technologies but can be extended by searching for simple paraphrases of the text. Research in this direction, targeting scientific papers in Serbia, has been carried out by the company CEON41.
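A common language-independent technique for this kind of plagiarism detection measures the overlap of word n-grams ("shingles") between two texts. The sketch below is a generic illustration and is not CEON's system. It compares texts with the Jaccard coefficient over word trigrams, and it models the "simple paraphrase" extension as a hypothetical synonym table that maps words to a canonical form before shingling.

```python
import re


def shingles(text, n=3, synonyms=None):
    """Set of word n-grams built only from regex word tokens (no language-specific rules)."""
    words = re.findall(r"\w+", text.lower())
    if synonyms:
        # Paraphrase extension (assumed): replace each word by its canonical synonym
        words = [synonyms.get(w, w) for w in words]
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}


def jaccard(a, b):
    # |A ∩ B| / |A ∪ B|, defined as 0.0 when both sets are empty
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def overlap(source, suspect, n=3, synonyms=None):
    return jaccard(shingles(source, n, synonyms), shingles(suspect, n, synonyms))


if __name__ == "__main__":
    a = "the big dog runs fast"
    b = "the large dog runs fast"
    print(overlap(a, b, n=2))                             # 2 shared bigrams out of 6
    print(overlap(a, b, n=2, synonyms={"large": "big"}))  # identical after mapping
```

With plain bigrams, the two sentences share 2 of 6 distinct shingles, which gives a score of 1/3. After the synonym mapping, the texts become identical and the score rises to 1.0. This illustrates why even a small paraphrase resource can make surface-based detection harder to evade.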

Language Technology in Education

Language technology is an interdisciplinary field that requires the expertise of many specialists: linguists, computer scientists, mathematicians, philosophers, psycholinguists and neuroscientists, to name just a few. As such, it has not yet gained a stable position in higher education in Serbia. It is mostly confined to individual courses within more general postgraduate study programmes. Paradoxically, despite this situation, the Petnica research station42 organizes small research seminars on computational linguistics topics for secondary school students every year. At university level, computational linguistics topics are present in computer science, electronics, library science, linguistics and psychology programmes at the universities of Belgrade and Novi Sad. The courses offered introduce students to the basic concepts of natural language processing, but they are designed to prepare students for other professional profiles. At the Faculty of Mathematics in Belgrade, the undergraduate programme includes courses on lexical analysis and text mining. These sit alongside courses covering the fundamental mathematics needed for natural language processing (especially statistics, algebra and logic). The doctoral programme offers a wider choice of courses in natural language technologies. The most thorough education in this field is received by students of the Library and Information Science group at the Faculty of Philology in Belgrade. Students in the other groups of that faculty take at most one introductory course. Serbian language studies do not include any training in natural language processing. At the Faculties of Philosophy in Belgrade and Novi Sad, the psychology groups offer courses in psycholinguistics in which students are introduced to statistical methods of language processing. Technical faculties teach methods relevant to speech processing.
No faculty offers a curriculum leading to a specialization in computational linguistics or language technology.

Language Technology Programmes

Compared with the leading European economies, the language technology industry in Serbia is relatively underdeveloped, for several reasons. The main driving force behind language technology development in Serbia is mostly domestic small and medium-sized enterprises. Some foreign companies also contribute, sometimes providing support for Serbian in applications that require language technology. Since there is no national

41 http://ceon.rs/index.php?option=com_content&task=view&id=224&Itemid=106
42 http://www.petnica.rs/


programme supporting the development of language technology, its development and application often proceed in an uncoordinated way. Language technology enters Serbia by at least three routes: (a) through state-funded scientific and development projects, (b) through companies, mainly foreign ones, that supply some form of language support together with their computer equipment, and (c) through in-house development within domestic organizations such as publishing houses or translation agencies. With rare exceptions, activities along these three routes proceed independently of one another. On the other hand, the computer-literate population in Serbia is used to working with graphical user interfaces in English, even though some of them may not know English. Localized versions sometimes look odd and imprecise to them, and they are reluctant to use them. The only applications that use a Serbian graphical user interface on a large scale are various business, financial and accounting applications, including the SAP ERP system. Nevertheless, there are examples of localized graphical user interfaces from well-known software vendors such as Microsoft (e.g. MS Windows, MS Office), Google and Oracle (Open Office43). Research projects funded by the Ministry of Education and Science have recognized interdisciplinarity only in the most recent project cycle (2011-2014). Until 2010, scientific projects were strictly divided into the fields of mathematics (to which computer science was subordinated), language, and technological disciplines, and so were the criteria for evaluating them. In such an environment it was difficult to achieve the natural combination of disciplines on which language technology is based. It was therefore necessary to establish links between research on the Serbian language and computer science. The first such project, entitled "Interactions of Text and Dictionary", was set up in 2002. It was a joint project of the Serbian language departments of the Faculty of Philology in Belgrade and the Faculty of Philosophy in Novi Sad, together with the Faculty of Mathematics in Belgrade. This project created the first corpus of contemporary Serbian44. The corpus is available via the web and today has more than 300 users from universities and institutes in Serbia and abroad. The project also started building an electronic morphological dictionary of Serbian in the so-called LADL format. The work continued as a joint project of the Serbian Language Department of the Faculty of Philology in Belgrade and the Faculty of Mathematics. From 2006 to 2010 it ran under the title «Theoretical and Methodological Framework for Modernizing the Description of Serbian», and from 2011 to 2014 as «The Serbian Language and Its Resources: Theory, Description and Applications». Through these projects, the electronic dictionary of simple words was completed and work began on a dictionary of compound words. These projects also developed

43 The localization of OpenOffice was funded from 2008 to 2011 by the Ministry of Telecommunications and Information Society through a project of the Faculty of Mathematics: http://ooo.matf.bg.ac.rs/
44 http://www.korpus.matf.bg.ac.rs/


parallel French-Serbian and English-Serbian corpora of literary texts, described local grammars for particular segments of Serbian (especially named entities), and built various software tools. The most important of these tools is the LeXimir workstation, which enables the integration and transformation of heterogeneous lexical resources. In parallel with this language research, the social sciences funded the project «Fundamental Cognitive Processes and Functions». It was carried out at the Department of Psychology of the Faculty of Philosophy in Belgrade. Among other things, it aimed to examine the possibility of automatically annotating text, starting from an annotated corpus45 that was developed in the 1950s and converted to electronic form in the 1990s. Since 2005, speech synthesis and recognition at the Faculty of Technical Sciences, University of Novi Sad, have been developed through a series of technological development projects: «Development of Speech Technologies for Serbian and Their Application in ‘Telekom Srbija’» (2005-2007), «Human-Machine Speech Communication» (2008-2010) and «Development of Dialogue Systems for Serbian and Other South Slavic Languages» (2011-2014). These projects support various text-to-speech and automatic speech recognition applications and services, including interactive voice response (IVR) systems, business telephone systems, call centres, voice log-in, advertisement monitoring and keyword spotting. Individual resources relevant to language technology have also been developed in other scientific fields, but without direct interaction with the projects listed above. Examples include a Serbian-English geological thesaurus46 and the folklore database DABI of the Institute for Balkan Studies of SASA.47 Alongside national projects, Serbian scientific institutions have also taken part in various international language technology projects. During the period of United Nations sanctions, participation in the TELRI I and II projects made it possible to maintain a certain level of activity.48 Serbian research groups could not take part in the project49 at that time. Nevertheless, they produced useful resources in the format it defined. These included a morphosyntactic specification of Serbian, an aligned version of the Serbian translation of George Orwell's novel 1984, and a lemmatized and morphosyntactically tagged version of that translation. They also produced an exhaustive lexicon covering the complete vocabulary of the novel 1984.

45 http://www.serbian-corpus.edu.rs/ns/eindex.htm
46 http://www.rgf.bg.ac.rs/geolissterm/Index.aspx
47 http://www.balkaninstitut.com/srp/projekti/sikimic/stratifikacija_balkana.html
48 http://telri.nytud.hu/
49 http://nl.ijs.si/ME/


After the sanctions were lifted, the BalkaNet50 project was particularly important, as it made it possible to develop a WordNet-type semantic network for Serbian. Bilateral cooperation with France produced the Serbian part of Prolex51, a multilingual lexical database of proper names. The Intera project produced a one-million-word aligned English-Serbian parallel corpus that is lemmatized and morphologically annotated. This corpus was used to train taggers and for experiments in word-level alignment and machine translation. Serbian participants were involved in two regional projects. One of them, SEE-ERA.NET - Building Language Resources and Translation Models for Machine Translation, focused on South Slavic and Balkan languages (ICT 10503 RP, 2007-2008). Its main contribution was the development of unidirectional translation models based on large multilingual resources, namely the Acquis Communautaire corpus. However, the documents that make up this resource had not yet been translated into Serbian at that time, so no translation model was produced for Serbian.52 For its part, the Serbian team contributed to another multilingual resource based on Jules Verne's novel Around the World in Eighty Days, which at that point covered 16 languages. The other project was WISE - An Electronic Marketplace to Support Pairs of Less Widely Studied European Languages. It aimed to produce multilingual lexical resources enriched with linguistic metadata, and also to build and promote an electronic marketplace for less widely studied Balkan languages, including Serbian (BSEC 009 / 05.2007, 2007-2008). Further work involves, above all, developing procedures for the syntactic analysis of Serbian, which is an exceptionally complex task given its free word order and rich morphology. This in turn requires new resources, primarily new types of dictionaries and corpora, together with the accompanying tools.

Availability of Tools and Resources for Serbian

The following table gives an overview of the current state of language technology support for Serbian. The ratings of existing tools and resources are based on educated estimates by several leading experts, using the following criteria (each rated on a scale from 0 to 6).

1. Quantity: Does a tool/resource exist for the language in question? The more tools/resources exist, the higher the rating.

• 0: no tools/resources whatsoever

50 http://cordis.europa.eu/ictresults/index.cfm?section=news&tpl=article&ID=73737
51 http://www.cnrtl.fr/lexiques/prolex/
52 Translation of European legislation is in progress; part of the translated material can be viewed at http://prevodjenje.seio.gov.rs/evroteka/index.php?jezik=srpc


• 6: many different tools/resources exist

2. Availability: Are the tools/resources accessible? Are they open source and freely usable on any platform, or are they available only at a high price or under very restrictive conditions?

• 0: practically all tools/resources are available only at a high price

• 6: a large number of tools/resources are freely available under sensible open-source or Creative Commons licences that allow re-use and re-purposing

3. Quality: How well does the best available tool, application or resource meet the performance criteria for tools or the quality indicators for resources? Are these tools/resources up to date, and are they actively maintained?

• 0: tool/resource with rudimentary capabilities
• 6: high-quality tool, resource with high-quality annotations (added by humans or equally good)

4. Coverage: To what extent does the best tool meet the relevant coverage criteria (styles, genres, text types, linguistic phenomena, input/output types, number of languages supported by a machine translation system, etc.)? To what extent are the resources representative of the target language or its sublanguages?

• 0: special-purpose resource or tool, a specific case, very small coverage, usable only for very specific cases

• 6: resource with very broad coverage, very robust tool, widely applicable, supports many languages

5. Maturity: Can the tool/resource be considered mature, stable and ready for the market? Can the best available tool/resource be used as it is, or does it have to be adapted? Is the performance of such a technology adequate and ready for production use, or is it only a prototype that cannot be used in production? One indicator may be whether the resource/tool is widely accepted by the community and successfully used in systems that employ language technology.

• 0: preliminary prototype, toy resource or resource built as an exercise, system with rudimentary capabilities, proof-of-concept system

• 6: component that can be integrated or deployed immediately

6. Sustainability: How well can the tool/resource be maintained or integrated into an existing IT system? Does the tool/resource meet a certain level of sustainability with respect to documentation and manuals, explanation of use cases, a graphical user interface, etc.? Does it use standards and programming environments that have proven themselves in practice (such as Java EE)? Do industry and research standards exist and, if so, does the tool/resource comply with them (e.g. data formats)?

• 0: completely proprietary, ad hoc data formats and APIs

• 6: fully standard-compliant, with complete documentation

7. Adaptability: To what extent can the best tool or resource be adapted or extended to new tasks, domains, genres, text types, use cases, and so on?

• 0: the tool/resource practically cannot be adapted to another task, not even when significant resources and months of human effort are available

• 6: very high level of adaptability; adaptation is possible, easy and efficient


Table of Tools and Resources

Category | Quantity | Availability | Quality | Coverage | Maturity | Sustainability | Adaptability

Language Technology (Tools, Technologies, Applications)
Tokenization, Morphology (tokenization, POS tagging, morphological analysis/generation) | 4 | 3 | 5 | 5 | 5 | 4 | 4
Parsing (shallow or deep syntactic analysis) | 1 | 2 | 5 | 3 | 2 | 2 | 2
Sentence Semantics (word sense disambiguation, argument structure, semantic roles) | 0 | 0 | 0 | 0 | 0 | 0 | 0
Text Semantics (coreference resolution, context, pragmatics, inference) | 0 | 0 | 0 | 0 | 0 | 0 | 0
Advanced Discourse Processing (text structure, coherence, rhetorical structure/RST, argumentative zoning, argumentation, text patterns, text types etc.) | 0 | 0 | 0 | 0 | 0 | 0 | 0
Information Retrieval (text indexing, multimedia IR, cross-lingual IR) | 3 | 1 | 3 | 3 | 2 | 2 | 3
Information Extraction (named entity recognition, event/relation extraction, opinion/sentiment recognition, text mining/analytics) | 1 | 2 | 2 | 2 | 3 | 2 | 3
Language Generation (sentence generation, report generation, text generation) | 0 | 0 | 0 | 0 | 0 | 0 | 0
Summarization, Question Answering, Advanced Information Access Technologies | 1 | 1 | 0 | 1 | 0 | 1 | 1
Machine Translation | 1 | 1 | 0 | 1 | 0 | 1 | 1
Speech Recognition | 2 | 2 | 1 | 1 | 1 | 1 | 0
Speech Synthesis | 2 | 2 | 4 | 4 | 5 | 5 | 1
Dialogue Management (dialogue capabilities and user modelling) | 0 | 0 | 0 | 0 | 0 | 0 | 0

Language Resources (Resources, Data, Knowledge Bases)
Reference Corpora | 2 | 4 | 2 | 4 | 4 | 4 | 4
Syntax Corpora (treebanks, dependency banks) | 0 | 0 | 0 | 0 | 0 | 0 | 0
Semantics Corpora | 0 | 0 | 0 | 0 | 0 | 0 | 0
Discourse Corpora | 0 | 0 | 0 | 0 | 0 | 0 | 0
Parallel Corpora, Translation Memories | 3 | 3 | 3 | 2 | 2 | 2 | 3
Speech Corpora (raw speech data, labelled/annotated speech data, speech dialogue data) | 1 | 2 | 4 | 4 | 3 | 3 | 3
Multimedia and Multimodal Data (text data combined with audio/video data) | 1 | 2 | 2 | 1 | 2 | 1 | 2
Language Models | 1 | 3 | 2 | 3 | 2 | 2 | 3
Lexicons, Terminologies | 2 | 3 | 4 | 4 | 3 | 3 | 3
Grammars | 1 | 1 | 0 | 1 | 0 | 1 | 1
Thesauri, WordNets | 2 | 4 | 3 | 2 | 4 | 2 | 4
Ontological Resources for World Knowledge (e.g. upper models, Linked Data) | 1 | 1 | 0 | 1 | 0 | 1 | 1
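For readers who want to work with the ratings, the scores from the table above can be stored in a simple data structure. The sketch below uses shortened row labels of our own. It lists the categories rated 0 on every criterion, which are the clearest gaps in support for Serbian. It also computes an unweighted mean per row and sorts rows by a single criterion. The mean is only an illustrative aggregate; the rating scheme itself does not combine the seven criteria into a single score.

```python
# Scores transcribed from the table (scale 0 to 6); row labels are shortened.
CRITERIA = ("Quantity", "Availability", "Quality", "Coverage",
            "Maturity", "Sustainability", "Adaptability")

SCORES = {
    # Language technology (tools, technologies, applications)
    "Tokenization, morphology": (4, 3, 5, 5, 5, 4, 4),
    "Parsing": (1, 2, 5, 3, 2, 2, 2),
    "Sentence semantics": (0, 0, 0, 0, 0, 0, 0),
    "Text semantics": (0, 0, 0, 0, 0, 0, 0),
    "Advanced discourse processing": (0, 0, 0, 0, 0, 0, 0),
    "Information retrieval": (3, 1, 3, 3, 2, 2, 3),
    "Information extraction": (1, 2, 2, 2, 3, 2, 3),
    "Language generation": (0, 0, 0, 0, 0, 0, 0),
    "Summarization, QA, advanced information access": (1, 1, 0, 1, 0, 1, 1),
    "Machine translation": (1, 1, 0, 1, 0, 1, 1),
    "Speech recognition": (2, 2, 1, 1, 1, 1, 0),
    "Speech synthesis": (2, 2, 4, 4, 5, 5, 1),
    "Dialogue management": (0, 0, 0, 0, 0, 0, 0),
    # Language resources (resources, data, knowledge bases)
    "Reference corpora": (2, 4, 2, 4, 4, 4, 4),
    "Syntax corpora": (0, 0, 0, 0, 0, 0, 0),
    "Semantics corpora": (0, 0, 0, 0, 0, 0, 0),
    "Discourse corpora": (0, 0, 0, 0, 0, 0, 0),
    "Parallel corpora, translation memories": (3, 3, 3, 2, 2, 2, 3),
    "Speech corpora": (1, 2, 4, 4, 3, 3, 3),
    "Multimedia and multimodal data": (1, 2, 2, 1, 2, 1, 2),
    "Language models": (1, 3, 2, 3, 2, 2, 3),
    "Lexicons, terminologies": (2, 3, 4, 4, 3, 3, 3),
    "Grammars": (1, 1, 0, 1, 0, 1, 1),
    "Thesauri, WordNets": (2, 4, 3, 2, 4, 2, 4),
    "Ontological resources for world knowledge": (1, 1, 0, 1, 0, 1, 1),
}


def unsupported(scores):
    """Rows rated 0 on every criterion, in table order."""
    return [name for name, row in scores.items() if not any(row)]


def mean(row):
    # Unweighted mean over the seven criteria (illustrative aggregate only)
    return sum(row) / len(row)


def by_criterion(scores, criterion):
    """Row labels sorted by one criterion, highest first (ties keep table order)."""
    k = CRITERIA.index(criterion)
    return sorted(scores, key=lambda name: scores[name][k], reverse=True)


if __name__ == "__main__":
    print(unsupported(SCORES))
    best = max(SCORES, key=lambda name: mean(SCORES[name]))
    print(best, round(mean(SCORES[best]), 2))
```

Run on the transcribed scores, the all-zero rows are sentence and text semantics, advanced discourse processing, language generation, dialogue management, and the syntax, semantics and discourse corpora. Tokenization and morphology have the highest mean.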


Conclusions

This series of white papers makes a first effort to assess the overall situation of many European languages with respect to the language technology support they enjoy. The assessment allows a high-level comparison between languages and the identification of gaps and needs. For Serbian, the state of resources and technologies can be described as follows:

• For morphology and related issues, the level of development of technologies and resources is satisfactory, mainly thanks to a large electronic dictionary and local grammars. As a direct consequence, the tools needed for information retrieval and information extraction are available. Some of the dictionaries are ready for wide use, while others, for example SrpNet, still need to be extended.

• A reference corpus of contemporary Serbian in the Ekavian dialect is available, as well as several aligned corpora, and all of them are available to researchers of the Serbian language. Current research focuses on extending the reference corpus and enlarging it with texts in the Ijekavian dialect.

• Speech technologies are well developed and have found wide commercial use, but research must expand in order to broaden their fields of application.

• Software designed to increase the productivity of lexicographers has been developed. However, the traditionally oriented lexicographic environment has been slow to accept new technologies, which is an obstacle to faster progress in lexicography.

• In some areas, such as shallow parsing, summarization, machine translation and ontological resources, successful experiments have been carried out in a strictly research setting. However, the results are still far from the level of development achieved for the well-developed European languages. Multimedia and multimodal documents are also attracting researchers' attention, especially in the context of digitizing cultural heritage.

• Given the complexity of Serbian syntax, areas that depend on deep parsing simply do not exist: sentence semantics, text semantics and language generation. For the same reason, there is no formalized syntax of Serbian, which limits the development of syntactically and semantically annotated corpora. Formalizing the syntax of Serbian is therefore the most urgent task for the further development of language technology.

Appendix


META-NET

META-NET is a Network of Excellence funded by the European Union. It currently consists of 44 members representing 31 European countries, which are listed below. META-NET fosters the Multilingual Europe Technology Alliance (META), a growing community of language technology professionals and organizations in Europe.

Figure 7: Countries represented in META-NET

META-NET cooperates with about ten other major initiatives, such as CLARIN, which helps the social sciences establish a new field of activity in Europe: the digital humanities. META-NET is dedicated to building the technological foundations for establishing and maintaining a truly multilingual European information society that:

• makes communication and cooperation possible across languages;

• grants all users equal access to information and knowledge in their native language;

• offers advanced functionalities of networked information technology to all citizens at affordable prices.

META-NET stimulates and promotes multilingual technologies for all European languages. These technologies enable machine translation, content production, information processing and knowledge management for a wide range of applications and subject domains. The network aims to improve current approaches so that communication and cooperation across languages become easier. Europeans have an equal right to information and knowledge regardless of the language they use.

META-NET's Three Lines of Action

META-NET was launched on 1 February 2010 with the aim of advancing research in language technology. The initiative supports a Europe that is uniting into a single digital market and information space. META-NET has launched several activities that work towards this

META – The Multilingual Europe Technology Alliance

About META-NET


goal. META-VISION, META-SHARE and META-RESEARCH are the network's three lines of action.

Figure 8: META-NET's three lines of action

META-VISION fosters a dynamic and influential community of stakeholders that unites around a shared vision and a common Strategic Research Agenda (SRA). The main task of this activity is to build a coherent and cohesive language technology community in Europe by bringing together representatives of highly fragmented and diverse groups of stakeholders. In META-NET's first year, presentations at several events were aimed at public outreach: the FLaReNet Forum in Spain, the Language Technology Days in Luxembourg, JIAMCATT 2010 in Luxembourg, LREC 2010 in Malta, EAMT 2010 in France and ICT 2010 in Belgium. According to initial estimates, META-NET has already contacted more than 2,500 language technology professionals to share its goals and visions with them. At META-FORUM 2010 in Brussels, META-NET shared the initial results of its vision-building process with more than 250 participants. In a series of interactive sessions, participants gave feedback on the visions presented by the network. META-SHARE creates an open, distributed facility for sharing and exchanging resources. A peer-to-peer (P2P) network of repositories will contain language data, tools and web services, documented with high-quality metadata and organized in standardized categories. The resources can be accessed directly, and all of them can be searched. The available resources include free, open material as well as restricted material that is available only commercially, for an appropriate fee. META-SHARE targets existing language data, tools and systems, as well as new and emerging products needed to build and evaluate new technologies, products and services. The re-use, combination, re-purposing and re-engineering of language data and tools is of crucial importance.
META-SHARE will eventually become a critical part of the language technology marketplace for developers, localization experts, researchers, translators and language professionals from small, medium-sized and large enterprises. META-SHARE covers the whole development cycle of language technology, from research to the building of new products and services. A key aspect of this activity is establishing META-SHARE as an important and valuable part of a European and global infrastructure for the language technology community. META-RESEARCH builds bridges to relevant neighbouring technology fields. This activity aims to take advantage of advances in other fields and to capitalize on innovative research that can benefit language technology. In particular, it focuses on four goals: bringing more semantics into machine translation, optimizing the division of labour in hybrid machine translation, exploiting context when computing automatic translations, and preparing an empirical base for machine translation. META-RESEARCH works with other fields and disciplines, such as machine learning and the Semantic Web. It focuses on collecting data, preparing data sets and organizing language resources for evaluation purposes. It also compiles inventories of tools and methods and organizes workshops and training events for members of the community. This activity has already clearly identified aspects of machine translation where introducing semantic knowledge can have an impact on current best practice. In addition, it has drawn up recommendations on how to approach the problem of integrating semantic information into machine translation. META-RESEARCH is also finalizing a new language resource for machine translation. This resource is an annotated corpus of hybrid machine translation samples, providing data for the English-German, English-Spanish and English-Czech language pairs. META-RESEARCH has also developed software that collects multilingual data hidden on the web.

Composition of the META-NET Network of Excellence

Country | Member (Affiliation) | Contacts
Austria | Universität Wien | Gerhard Budin
Belgium | University of Antwerp | Walter Daelemans
Belgium | University of Leuven | Dirk van Compernolle
Bulgaria | Bulgarian Academy of Sciences | Svetla Koeva
Croatia | Zagreb University | Marko Tadic
Cyprus | University of Cyprus | Jack Burston
Czech Rep. | Charles University in Prague* | Jan Hajic
Denmark | University of Copenhagen | Bente Maegaard, Bolette Sandford Pedersen
Estonia | University of Tartu | Tiit Roosmaa
Finland | Aalto University* | Timo Honkela
Finland | University of Helsinki | Kimmo Koskenniemi, Krister Linden
France | CNRS, LIMSI* | Joseph Mariani
France | ELDA* | Khalid Choukri
Germany | DFKI* | Hans Uszkoreit, Georg Rehm
Germany | RWTH Aachen* | Hermann Ney
Greece | ILSP, R.C. “Athena”* | Stelios Piperidis
Hungary | Hungarian Academy of Sciences | Tamás Váradi
Hungary | Budapest Technical University | Géza Németh, Gábor Olaszy
Iceland | University of Iceland | Eiríkur Rögnvaldsson


Ireland | Dublin City University* | Josef van Genabith
Italy | Consiglio Nazionale Ricerche* | Nicoletta Calzolari
Italy | Fondazione Bruno Kessler* | Bernardo Magnini
Latvia | Tilde | Andrejs Vasiljevs
Latvia | University of Latvia | Inguna Skadina
Lithuania | Institute of the Lithuanian Language | Jolanta Zabarskaitė
Luxembourg | Arax Ltd. | Vartkes Goetcherian
Malta | University of Malta | Mike Rosner
Netherlands | Universiteit Utrecht* | Jan Odijk
Norway | University of Bergen | Koenraad De Smedt
Poland | Polish Academy of Sciences | Adam Przepiórkowski
Poland | University of Łódź | Piotr Pezik
Portugal | University of Lisbon | Antonio Branco
Portugal | Inst. for Systems Engineering and Computers | Isabel Trancoso
Romania | Romanian Academy of Sciences | Dan Tufis
Romania | University Alexandru Ioan Cuza | Dan Cristea
Serbia | Belgrade University | Duško Vitas, Cvetana Krstev, Ivan Obradović
Serbia | Pupin Institute | Sanja Vraneš
Slovakia | Slovak Academy of Sciences | Radovan Garabik
Slovenia | Jozef Stefan Institute* | Marko Grobelnik
Spain | Barcelona Media* | Toni Badia
Spain | Technical University of Catalonia | Asunción Moreno
Spain | University Pompeu Fabra | Núria Bel
Sweden | University of Gothenburg | Lars Borin
UK | University of Manchester | Sophia Ananiadou

An asterisk (*) marks the founding members.

How to Participate?

META-NET and META offer many opportunities for participation. Please check out www.meta-net.eu for information on upcoming events and activities.
