Tool Systems for Information Extraction

Post on 22-May-2015

Category:

Technology

TRANSCRIPT

1. [...] alexey.noskov@gmail.com, 2 [...] 2011

2. Information Extraction [...]

3. [...]

4. [...] H. Cunningham, K. Humphreys, and R. Gaizauskas, "GATE - a TIPSTER-based General Architecture for Text Engineering," in Proceedings of the TIPSTER Text Program (Phase III) 6 Month Workshop, DARPA, 1997.

5. LT-NSL, Wraetlic, GATE, GATE 2, Catalyst, SProUT, Whiteboard, Heart of Gold, Learning Based Java, CAFE

6. SGML, XML [...] (w, s, np)

7. XML: As far as I was enabled...

    <s id=6293>
      <w c=w pos=IN id=624>As</w>
      <w c=w pos=RB id=625>far</w>
      <pp id=21236>
        <w c=w pos=IN id=626 head=yes>as</w>
        <np number=singular person=1 id=627>
          <w c=w pos=PRP head=yes id=628>I</w>
        </np>
      </pp>
      <vbar voice=passive time=past id=629 args=+6302>
        <w c=w pos=VBD stem=be head=yes id=630>was</w>
        <w c=w pos=VBN stem=enable id=631>enabled</w>
      </vbar>

8. LT-NSL, 1996 [...] UNIX [...] D. McKelvie, C. Brew, and H. Thompson, "Using SGML as a Basis for Data-Intensive NLP," in Proceedings of the Fifth Conference on Applied Natural Language Processing (ANLP-97), 1997.

9. LT-NSL: SGML [...]

10. Wraetlic, 2006 [...] Java, XML [...] XSLT [...] E. Alfonseca, A. Moreno-Sandoval, J. M. Guirao, and M. Ruiz-Casado, "The wraetlic NLP suite," 2006.

11. [...]

12. Outline (same list as slide 5)

13. [...]

14. As far as I was enabled... [Diagram: the tokens As, far, as, I, was, enabled with the annotation layers w, np, pp, vbar, s.]

15. TIPSTER. R. Grishman, "TIPSTER text phase II architecture design," in Proceedings of a workshop held at Vienna, Virginia, May 6-8, 1996, pp. 249-305, 1996.

16. GATE, 1996 [...] C++ [...] H. Cunningham, K. Humphreys, and R. Gaizauskas, 1997 (full citation on slide 4).

17. GATE: GDM (GATE Document Manager), CREOLE (Collection of REusable Objects for Language Engineering), GGI (GATE Graphical Interface) [...]

18. GATE: [...] D. Maynard et al., "A Survey of Uses of GATE."

19. GATE 2+, 2001 [...] Java, Unicode [...] JAPE. K. Bontcheva et al., "GATE: A Unicode-based infrastructure supporting multilingual information extraction," in Proceedings of Workshop on Information Extraction for Slavonic and other Central and Eastern European Languages (IESL'03), Borovets, 2003.

20. GATE 2: [...]

21. GATE 2: JAPE [...] Java [...] H. Cunningham, D. Maynard, and V. Tablan, "JAPE: a Java Annotation Patterns Engine," 1999.

22. GATE 2: JAPE:

    // A job-title lookup followed by a person annotation
    Rule: PersonJobTitle
    (
      {Lookup.majorType == jobtitle}
    ):jobtitle
    (
      {TempPerson}
    ):person
    -->
    :jobtitle.JobTitle = {rule = "PersonJobTitle"},
    :person.Person = {kind = "personName", rule = "PersonJobTitle"}

    // A number that follows "in" or "by" is annotated as a date
    Rule: YearContext1
    (
      {Token.string == "in"} |
      {Token.string == "by"}
    )
    (
      {Token.kind == number}
    ):year
    -->
    :year.Timex = {kind = "date", rule = "YearContext1"}

23. Catalyst, 2002 [...] TIPSTER [...] P. Anand et al., "Qanda and the Catalyst Architecture," 2002.

24. Catalyst: [...]

25. Catalyst: [...]

26. Catalyst: [...]

27. Catalyst: [...]

28. [...]

29. Outline (same list as slide 5)

30. [...] U. Schäfer, "Integrating Deep and Shallow Natural Language Processing Components - Representations and Hybrid Architectures," Faculty of Mathematics and Computer Science, Saarland University, 2007.

31. SProUT, 2002-2004 [...] W. Drozdzynski, H. Krieger, J. Piskorski, U. Schäfer, and F. Xu, "Shallow Processing with Unification and Typed Feature Structures - Foundations and Applications," Künstliche Intelligenz, vol. 1, pp. 17-23, 2004.

32. SProUT: [...]

33. SProUT: [Feature structure diagrams; only the labels survive: features f, g, h and entries such as f: [q: a], g: c, h: d, and f: e.]

34. SProUT: [Feature structure diagrams with the shared tags 1 and 2; only the labels survive: f, q, g, h, i with values e and u.]

35. SProUT:

    morph [POS: Determiner, INFL: [CASE: c, NUMBER: n, GENDER: g]] ?
    morph [POS: Adjective, INFL: [CASE: c, NUMBER: n, GENDER: g]] *
    morph [POS: Noun cat, INFL: [CASE: c, NUMBER: n, GENDER: g]]
    phrase [CAT: cat, AGR: agr [CASE: c, NUMBER: n, GENDER: g]]

36. SProUT: [...]

37. Whiteboard, 2000-2002 [...] XML [...] B. Crysmann et al., "An Integrated Architecture for Shallow and Deep Processing," University of Pennsylvania, pp. 441-448, 2002.

38. Whiteboard: [...]

39. Heart of Gold, 2004-2005 [...] Whiteboard [...] XML, XSLT [...] U. Callmeier, A. Eisele, U. Schäfer, and M. Siegel, "The DeepThought Core Architecture Framework," in Proceedings of LREC, vol. 4, pp. 1205-1208, 2004.

40. Heart of Gold: [...]

41. [...]

42. Outline (same list as slide 5)

43. Learning Based Java, 2007 [...] N. Rizzolo and D. Roth, "Learning Based Java for Rapid Development of NLP Systems," in Proceedings of the International Conference on Language Resources and Evaluation (LREC), 2010.

44. Learning Based Java:

    /** This feature generating classifier "senses" all the
     * words in the document that begin with an alphabet
     * letter. The result is a bag-of-words representation
     * of the document. */
    discrete% BagOfWords(Post post) <- {
      for (int i = 0; i < post.bodySize(); ++i)
        for (int j = 0; j < post.lineSize(i); ++j) {
          String word = post.getBodyWord(i, j);
          if (word.length() > 0 && word.substring(0, 1).matches("[A-Za-z]"))
            sense word;
        }
    }

    /** The label of the document. */
    discrete NewsgroupLabel(Post post) <- { return post.getNewsgroup(); }

    http://cogcomp.cs.illinois.edu/page/software_view/11

45. Learning Based Java:

    /** Here, we train averaged Perceptron for many
     * rounds of the training data. **/
    discrete NewsgroupClassifierAP(Post post) <-
    learn NewsgroupLabel
      using BagOfWords
      from new NewsgroupParser("data/20news.train.shuffled") 40 rounds
      with SparseNetworkLearner {
        SparseAveragedPerceptron.Parameters p =
          new SparseAveragedPerceptron.Parameters();
        p.learningRate = .1;
        p.thickness = 3;
        baseLTU = new SparseAveragedPerceptron(p);
      }
      progressOutput 20000
      testFrom new NewsgroupParser("data/20news.test")
    end

46. CAFE, 2001 [...] [author name not recoverable], "Continuous Understanding: A First Look at CAFE," 2001.

47. CAFE: [...]

48. [...] GATE [...]

49. [...]

50. [...]?
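
Slides 7 and 14 show the same sentence, "As far as I was enabled...", first with inline XML markup and then as layers over the text. One way to read the second form is as stand-off annotation, where the text stays untouched and every annotation points at a character span. The following is a minimal Python sketch under that assumption. The `Annotation` class, the helper names, and the character offsets are illustrative and are not taken from TIPSTER, GATE, or any other system in the transcript; only the annotation types and attributes come from slide 7.

```python
import re
from dataclasses import dataclass, field

TEXT = "As far as I was enabled..."


@dataclass
class Annotation:
    """A stand-off annotation: a typed character span plus free-form features."""
    kind: str                  # annotation type, e.g. "w", "np", "s"
    start: int                 # offset of the first character (inclusive)
    end: int                   # offset just past the last character (exclusive)
    features: dict = field(default_factory=dict)


def covered_text(text, ann):
    """Return the piece of text that an annotation points at."""
    return text[ann.start:ann.end]


def tokenize(text):
    """Create one 'w' annotation per run of letters (a hypothetical tokenizer)."""
    return [Annotation("w", m.start(), m.end())
            for m in re.finditer(r"[A-Za-z]+", text)]


def span(kind, first, last, **features):
    """Build an annotation stretching from the start of `first` to the end of `last`."""
    return Annotation(kind, first.start, last.end, features)


def contained(annotations, outer, kind):
    """Select annotations of one type that lie entirely inside `outer`."""
    return [a for a in annotations
            if a.kind == kind and outer.start <= a.start and a.end <= outer.end]


# Word layer, then the phrase layers from slide 7 built on top of it.
words = tokenize(TEXT)
as1, far, as2, pron, was, enabled = words
np = span("np", pron, pron, number="singular", person="1")
pp = span("pp", as2, np)
vbar = span("vbar", was, enabled, voice="passive", time="past")
s = span("s", as1, enabled)
ANNOTATIONS = words + [np, pp, vbar, s]
```

With these definitions, `covered_text(TEXT, pp)` gives the text under the `pp` span, and `contained(ANNOTATIONS, vbar, "w")` lists the word annotations inside the verb group. Because layers only share offsets, a new layer can be added without rewriting the text or the existing layers, which is harder with the nested inline markup of slide 7.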
