
FP7-ICT-2011.4.2 Contract no.: 288342 www.xlike.org

Deliverable D3.3.2

Final machine translation based semantic annotation prototype

Editor: Marko Tadić, UZG

Author(s): Marko Tadić, UZG; Matea Srebačić, UZG; Daša Berović, UZG; Danijela Merkler, UZG; Tin Pavelić, UZG

Deliverable Nature: Prototype (P)

Dissemination Level (Confidentiality): Public (PU)

Contractual Delivery Date: M30

Actual Delivery Date: M36

Suggested Readers: All partners using the XLike Toolkit

Version: 0.5

Keywords: machine translation, semantic annotation, Cyc ontology


Disclaimer

This document contains material, which is the copyright of certain XLike consortium parties, and may not be reproduced or copied without permission.

All XLike consortium parties have agreed to full publication of this document.

The commercial use of any information contained in this document may require a license from the proprietor of that information.

Neither the XLike consortium as a whole, nor a certain party of the XLike consortium warrant that the information contained in this document is capable of use, or that use of the information is free from risk, and accept no liability for loss or damage suffered by any person using this information.

Full Project Title: XLike – Cross-lingual Knowledge Extraction

Short Project Title: XLike

Number and Title of Work package: WP3 – Cross-lingual Semantic Annotation

Document Title: D3.3.2 Final machine translation based semantic annotation prototype

Editor (Name, Affiliation): Marko Tadić, UZG

Work package Leader (Name, Affiliation): Achim Rettinger, KIT

Estimation of PM spent on the deliverable: 8 PM

 

Copyright notice

© 2012-2014 Participants in project XLike

 


Executive Summary

The main goal of the XLike project is to extract knowledge from multi-lingual text documents by annotating statements in the sentences of a document with a cross-lingual knowledge base. The purpose of the final machine translation based semantic annotation prototype described here is to demonstrate how SMT systems can be used to translate from a natural language into a formal language. This translation would then be used as the semantic annotation of a natural language sentence. We describe further experiments using the Moses SMT system with extended translation models and present the evaluation of the results.

 

 


Table of Contents

Executive Summary
Table of Contents
List of Figures
List of Tables
Abbreviations
Definitions
1 Introduction
  1.1 Motivation
2 Related research: Semantic Parsing
3 Statistical Machine Translation techniques
  3.1 General Framework
  3.2 Early prototype: proof of concept
  3.3 Final prototype: additional experiments
4 Preparing the training data
  4.1 Generation of larger parallel corpus
  4.2 Generation of additional linguistic data
5 Using Moses
  5.1 Training Moses with a larger training set
  5.2 Training Moses with factor-based models
6 Evaluation of Translation
  6.1 Translation from English into CycL with En-EnSemRep-Model04
  6.2 Translation from English into CycL with Factor-based models
  6.3 Evaluation of the translation quality of En-EnSemRep-Model04
    6.3.1 Automatic evaluation of En-EnSemRep-Model04
    6.3.2 Human evaluation of En-EnSemRep-Model04
  6.4 Extrinsic evaluation
7 Conclusion
References

 

 


List of Figures

Figure 1. General diagram of an SMT system (from [4])
Figure 2. Example of training data in TMX format
Figure 3. Example of the English part of the factor-based training data in CoNLL format
Figure 4. Example of the CycL part of the factor-based training data
Figure 5. The training data uploaded as a parallel corpus to Let'sMT! platform
Figure 6. The En-EnSemRep-Model04 available for translation at Let'sMT! platform
Figure 7. Example of the English part of the factor-based training data in MOSES format
Figure 8. Example of the translation from English to CycL
Figure 9. Automatic evaluation of translation quality for En-EnSemRep-Model02 (above) and En-EnSemRep-Model04 (below) SMT systems
Figure 10. Sisyphos II screen with Comparative evaluation example of the better first translation
Figure 11. Sisyphos II screen with Comparative evaluation example of both translations equally good
Figure 12. Sisyphos II screen with Comparative evaluation example of both translations equally bad
Figure 13. Sisyphos II screen with Absolute evaluation scenario of 1000 Bloomberg sentences: Rubble example
Figure 14. Sisyphos II screen with Absolute evaluation scenario of 1000 Bloomberg sentences: Mainly nonfluent example


List of Tables

Table 1. Results of the human evaluation of translation quality of 1000 English sentences translated into CycL by En-EnSemRep-Model04 SMT system
Table 2. Results of the human evaluation of translation quality of 1000 Bloomberg sentences translated into CycL by En-EnSemRep-Model04 SMT system

 

 


Abbreviations

SL      Source Language
TL      Target Language
IL      Interlingua
NL      Natural Language
FL      Formal Language
NL2FL   Natural Language to Formal Language
MT      Machine Translation
SMT     Statistical Machine Translation
RBMT    Rule Based Machine Translation
L       Language
TM      Translation Model
LM      Language Model
NLP     Natural Language Processing
SRL     Semantic Role Labelling
WSD     Word Sense Disambiguation
SP      Semantic Parsing


Definitions

Parallel Corpus      A parallel corpus consists of documents that are translated directly into different languages.

Comparable Corpus    A comparable corpus, unlike a parallel corpus, contains no direct translations. Overall, the documents may address the same topic and domain, but can differ significantly in length, detail and style.

Source Language      The language of the text that is being translated.

Target Language      The language of the text into which the translation is being done.

Formal Language      An artificial language that uses a formally defined syntax. Its norm precedes its usage.

Natural Language     A language where the usage precedes the norm.

Language Pair        A unidirectional translation from the SL to the TL. Translation from La to Lb is one language pair and from Lb to La is another language pair.

 

 


1 Introduction  

In this deliverable we present the results of the research leading to the final machine translation based semantic annotation prototype. This part of the project was envisaged and covered by the research plan of WP3, namely task T3.3.

 

1.1 Motivation  

The main goal of the XLike project is to extract knowledge from multi-lingual text documents by different means, treating the documents at all possible levels: from the document collection, through documents as unique entities, down to the individual paragraphs and sentences that occur in these documents. The knowledge can be formally represented as statements in a formal language resembling a formal logic calculus or any other semantically rich format (e.g. RDF triples), or as mappings from any of the mentioned levels of processing to a desired conceptual space (e.g. the Cyc ontology, Wikipedia, DBpedia, Linked Open Data, etc.).

Different work packages and their respective tasks within the XLike project examine different approaches to this problem, while task T3.3, covered in this and the previous deliverable (D3.3.1), investigates how machine translation techniques can be exploited for cross-lingual semantic annotation.

The main idea behind this task is to investigate how the use of statistical machine translation (SMT) techniques could facilitate obtaining mappings between a text and its semantic representation(s). The development of this advanced prototype followed the idea presented in D3.3.1: would it be possible to train an SMT system to translate from a natural language as the source language into a formal language as the target language? The work presented here has been conducted as an addition to the early prototype that confirmed the proof of concept, i.e. that this idea, applicable in theory, really produces results usable by humans and/or machines for further processing once turned into a real SMT system. In this final prototype we used additional capabilities of SMT systems (such as larger training sets and factor-based SMT models) to train a translation model and a target language model.


2 Related research: Semantic Parsing

The task of natural language understanding within the area of NLP has recently been focused mainly on shallow semantic analysis (e.g. Semantic Role Labelling, SRL) and Word Sense Disambiguation (WSD). More ambitious, however, is the task of semantic parsing (SP), i.e. the construction of a complete semantic representation of a sentence expressed in the form of FL statement(s). In this project task (T3.3) we tried to treat this process as a translation from NL to FL.

While in computing there is a long tradition of translating from one FL into another FL (e.g. translating programming languages with compilers and/or interpreters), translation from a NL into a FL has usually been limited to translating commands formulated in NL into programming language commands/expressions or query language statements. The programming language commands/expressions were always strictly defined by the formal syntax of the programming language. The syntax, and particularly the semantics, of the query languages were mostly confined to limited domains with the use of predefined scripts or scenarios. There has been only a limited number of attempts to cover general-purpose NL sentences and come up with their FL, i.e. semantic, representations. Two main research groups worked on this task previously, mainly using existing state-of-the-art NLP techniques and machine learning approaches. One group was led at MIT by Michael Collins and his associate Luke Zettlemoyer, while at the University of Texas at Austin, Raymond Mooney led a group that included Yuk Wah Wong, Rohit J. Kate and Ruifang Ge.

The first attempts were mostly oriented towards a combination of rule-based approaches for processing the NL part of the task (e.g. the use of constituency-oriented context-free or context-sensitive grammars for parsing NL sentences) and rule-based or machine learning techniques for generating FL representations of the analysed sentences.

Luke Zettlemoyer's paper [9] introduced a framework that makes use of structured classification to perform semantic parsing. The follow-up paper [10] introduced a "relaxed grammar" to add flexibility to the grammar, while it also slightly improved the learning algorithm to boost efficiency and make the algorithm incrementally applicable (it can learn from instance to instance, instead of batch learning). Later, context-dependent parsing was proposed in [11], which was able to handle even discourse structures. Given a training corpus of NL sentences aligned with their FL representations, these approaches learn a model capable of translating sentences into the desired FL representation. However, a set of hand-crafted rules is used to learn syntactic categories and semantic representations of words based on combinatory categorial grammar (CCG). All these works use lambda calculus as the FL representation of NL sentences, while the training set sizes for the machine learning parts were between 800 and 4500 sentences. Although the reported results were quite high, it should be noted that the experiments were conducted in a limited domain: air travel-planning dialogues (the ATIS domain).

Research on semantic parsing in Austin was more oriented towards semantic lexica, i.e. their automatic acquisition from NL texts [12]. Later, Wong and Mooney proposed a system called WASP [13,14], which makes use of some SMT techniques (e.g. using a synchronous grammar for word alignment to learn a semantic lexicon and to learn rules for composing meaning representations). Kate and Mooney also invented a kernel-based semantic parsing system called KRISP [15,16,17], largely making use of machine learning techniques (such as SVMs). Both groups produced several PhD theses that contributed to the research in this area [18,19,20].

In [21] a syntactic combinatory categorial parser (CCG parser) is also used to parse natural language sentences and to construct their semantic meaning as directed by the parse. The same parser is used for both tasks, and lambda calculus is again used as the FL representation of NL sentences.

However, none of these approaches proposed the use of the well-established general-purpose SMT systems that were developed in the previous decade and made freely accessible, namely the MOSES-based systems. One of the reasons can be seen in the sizes of the training data the mentioned SP systems had at their disposal: the MOSES-based systems require huge training sets of parallel texts where one side of a pair is a NL sentence and its aligned counterpart is a FL representation of this sentence.


Such training sets have to encompass millions of aligned NL sentences and their FL counterparts in order to achieve decent SMT results. The mentioned SP systems used only up to several thousand NL sentences, usually manually annotated with their FL semantic representations, thus forming a parallel corpus of NL sentences and their FL representations. The manual annotation was certainly labour intensive and required highly skilled human annotators, thus preventing less expensive large-scale processing. Such small corpora cannot be used for generating useful translation models in SMT systems.

We wanted to tackle this problem by trying to use a much larger parallel corpus for training an SMT system, which should be regarded here as just a special type of machine learning system. Such an early prototype system was developed and described in D3.3.1, while here we present the follow-up experiments in enlarging this system and evaluating its performance.

 

 


3 Statistical Machine Translation techniques

3.1 General Framework

The general SMT scenario involves collecting parallel data, aligning it at the sentence level, and using it to train the SMT system, i.e. to build a Translation Model (TM) for the transfer of words and phrases from the SL into the TL. In order to select between different probable translations and to produce the most appropriate (often also the most natural) TL text, very large Language Models (LM) are used to adjust the final SMT system output. Figure 1 presents the general SMT process as a diagram.

Figure 1. General diagram of an SMT system (from [4])

Generally, the described scenario involves natural language (NL) as both the SL (Spanish) and the TL (English). Training is performed using a large Spanish-English parallel corpus, and a TM is built. A large English monolingual corpus is used to build (train) the LM. The decoder applies the decoding algorithm to all TM outputs and uses the LM to select, as the final output, the most probable translation of sentence s in the TL. This is the SMT process in a nutshell, and all SMT systems so far (including Moses) were designed with a NL as the TL.
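For orientation, the textbook noisy-channel formulation behind this scenario (a simplification of the log-linear models Moses actually optimises, and not something specific to this deliverable) can be written as:

    \hat{t} = \arg\max_{t} P(t \mid s) = \arg\max_{t} P(s \mid t)\,P(t)

where P(s|t) is estimated by the TM trained on the parallel corpus, P(t) by the LM trained on monolingual TL data, and the decoder searches for the TL sentence that maximises the product. In the experiments described below, s is an English sentence and t a CycL statement, so the LM is trained on the CycL side of the generated corpus rather than on ordinary NL text.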

 

3.2 Early prototype: proof of concept

The idea behind this task is simple. Since the main goal of the XLike project is to build technology for extracting and representing knowledge from text cross-lingually in a language-independent (common) format, preferably a formally defined one, we suggested that this representation could be written in a formal language. In the Semantic Web community, the representation of basic relations in the form of RDF triples has become a common way of representing knowledge involving concepts from a conceptual space or an ontology. However, populating conceptual spaces and ontologies with relations extracted from texts has been complicated and demanding, and it involves a lot of human effort if it is done entirely rule-based. Wouldn't it be possible to apply an analogous shift in methodology here, like the one that happened with the change from rule-based MT to statistical MT? This would involve the use of SMT techniques for automatic translation from a natural language into a formal language.



In D3.3.1 we showed that this initial idea can be applied, based on the results of the evaluation of the early prototype SMT output by automatic scores (BLEU, NIST, TER) and by human evaluation (Adequacy, Fluency). The results of the automatic evaluation were surprisingly good, so we also checked them by human evaluation, which demonstrated that for more than 50% of the sentences the NL content was either "full content conveyed" or "major content conveyed". This led us to the additional experiments that we present here.

 

3.3 Final prototype: additional experiments

We planned the following experimental steps to lead to the final prototype:

1. Generating a larger parallel corpus of English-CycL aligned "sentences" (1277K) than in the early prototype (650K).

2. Training a TM with the larger parallel corpus (1277K).

3. Translation of the same test set of 1000 English sentences with the 1277K TM.

4. Comparative evaluation of the two SMT outputs (early and final prototype TMs) in order to determine the baseline TM.

5. Training TMs using additional linguistic information obtained from the English pipeline developed within WP2:

a. Training a TM (TMsynt) with the syntactic information that results from dependency parsing.

b. Training a TM (TMsrl) with the semantic role labels.

c. Training a TM (TMsynt+srl) with both the syntactic information and the semantic role labels.

6. Comparative evaluation of TMsynt, TMsrl and TMsynt+srl against the baseline TM in order to check whether the additional linguistic information contributed to the quality of the SMT output.

7. For extrinsic evaluation purposes, 1000 English sentences extracted from Bloomberg articles appearing on-line were also evaluated absolutely for adequacy and fluency, in order to determine how the system behaves when dealing with English sentences from real-life texts.

However, it will be shown in the rest of this deliverable that some of the planned steps yielded poorer results than expected, so their subsequent steps had to be abandoned.


4 Preparing the training data

The general process of generating FL "sentences" and their English counterparts from the Cyc ontology was described in detail in D3.3.1, so here we concentrate only on the additional features that were used in building the final prototype.

 

4.1 Generation of larger parallel corpus

The generation of English sentences aligned with FL "sentences" was done by the partners from IJS, since they operate the Cyc ontology as a whole. In the second generation run we obtained a filtered corpus that we call 650K, since it consisted of ca. 650,000 aligned English-CycL "sentence" pairs. In this third generation run we obtained ca. 1.87 million pairs of English and CycL "sentences". As in the early prototype development, we noticed that a lot of English sentences referred to relations between two concepts denoted by their IDs instead of by terms in plain English, so we filtered this output using the same procedure as in the data preparation for the early prototype. This filtering, applied to the third generation run, yielded 1,277,680 clean English-CycL aligned "sentence" pairs. We will call this larger parallel corpus 1277K. This amount of data represents an acceptable quantity of parallel data for a thorough SMT experiment, particularly having in mind the monotonous nature of CycL as the TL. The training data were prepared in TMX format, an open XML industry standard for exchanging parallel textual data.

<tu>
  <tuv xml:lang="en">
    <seg>Zagreb, Croatia's longitude is 16 degrees</seg>
  </tuv>
  <tuv xml:lang="se">
    <seg>(#$longitude #$CityOfZagrebCroatia (#$Degree-UnitOfAngularMeasure 16.0))</seg>
  </tuv>
</tu>
<tu>
  <tuv xml:lang="en">
    <seg>Minnie Driver appeared in "Circle Of Friends"</seg>
  </tuv>
  <tuv xml:lang="se">
    <seg>(#$movieActors #$CircleOfFriends-TheMovie #$MinnieDriver)</seg>
  </tuv>
</tu>
<tu>
  <tuv xml:lang="en">
    <seg>lacrimal fluid is a type of bodily secretion</seg>
  </tuv>
  <tuv xml:lang="se">
    <seg>(#$genls #$LacrimalFluid #$Secretion-Bodily)</seg>
  </tuv>
</tu>

Figure 2. Example of training data in TMX format

 

Out of the prepared 1277K sentence pairs, a test set of 10,000 sentence pairs was set aside for evaluation purposes.
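The deliverable does not include the filtering and splitting code itself; the following Python sketch only illustrates how such a TMX corpus could be read, filtered and split. The file name, the "#$" heuristic for detecting concept IDs leaking into the English side, and the use of "se" as the placeholder language code for the CycL side (as in Figure 2) are illustrative assumptions, not the exact procedure used in the project.

    import random
    import xml.etree.ElementTree as ET

    XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"  # key ElementTree uses for xml:lang

    def load_pairs(tmx_path):
        """Yield (English, CycL) segment pairs from a TMX file shaped like Figure 2."""
        tree = ET.parse(tmx_path)
        for tu in tree.iter("tu"):
            segs = {}
            for tuv in tu.iter("tuv"):
                seg = tuv.find("seg")
                if seg is not None and seg.text:
                    segs[tuv.get(XML_LANG)] = seg.text.strip()
            if "en" in segs and "se" in segs:  # "se" is the placeholder code for the CycL side
                yield segs["en"], segs["se"]

    def looks_clean(english_side):
        """Hypothetical filter: drop pairs whose English side still contains raw Cyc concept IDs."""
        return "#$" not in english_side

    pairs = [p for p in load_pairs("en2cyc_1277k.tmx") if looks_clean(p[0])]
    random.seed(42)
    random.shuffle(pairs)
    test_set, train_set = pairs[:10_000], pairs[10_000:]  # hold out 10,000 pairs for evaluation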

 

4.2 Generation of additional linguistic data

For the generation of additional linguistic data in the English part of the English-CycL aligned sentence pairs we used the English pipeline developed within WP2 (see D2.2.2). We produced three different translation models (TMsynt, TMsrl, TMsynt+srl) that added this linguistic information to the English sentences.


Figures 3 and 4 show examples of the factor-based training data.

No  Token     Lemma     PoS  MSD                                NE  Dep.  Synt.  PWN sense   WSD           A1  A2
1   exactly   exactly   RB   pos=adverb|type=general            O   3     NMOD   00158309-r  _             _   _
2   one       one       DT   pos=determiner                     O   3     NMOD   _           _             _   _
3   tail      tail      NN   pos=noun|num=s                     O   4     SBJ    02157557-n  _             A0  _
4   is        be        VBZ  pos=verb|vform=personal|person=3   O   0     ROOT   02620587-v  be.00         _   _
5   a         1         CD   pos=number                         O   4     VC     _           _             _   _
6   physical  physical  JJ   pos=adjective                      O   7     NMOD   01778212-a  _             _   _
7   part      part      NN   pos=noun|num=s                     O   4     PRD    00720565-n  officiate.01  A1  _
8   of        of        IN   pos=preposition                    O   7     NMOD   _           _             _   A1
9   every     every     DT   pos=determiner                     O   10    NMOD   _           _             _   _
10  snake     snake     NN   pos=noun|num=s                     O   8     PMOD   01726692-n  _             _   _
11  .         .         .    pos=punctuation|type=period        O   10    DEP    _           _             _   _

Figure 3. Example of the English part of the factor-based training data in CoNLL format

(
  #$relationAllExistsUnique
  #$physicalParts
  #$Snake
  #$Tail-BodyPart
)

Figure 4. Example of the CycL part of the factor-based training data

 

The idea behind factor-based SMT is to use additional linguistic information in order to reduce the number of possible translation equivalents, relying e.g. on lemmas instead of tokens, on the syntactic roles of words or phrases, and/or on their semantic roles. We noticed in the previous experiment (D3.3.1) that the majority of "non-fluent" CycL statements had invalid CycL syntax, and we expected that additional information about syntactic and/or semantic roles would help the TM to produce adequate CycL relations, which would then generate CycL statements that follow the CycL syntax more closely.


5 Using Moses

The SMT system we used for the first part of this advanced experiment was the open source SMT suite Moses¹. The Let'sMT! platform², an existing platform for generating SMT systems out of one's own parallel data, was used for the first part of building the final prototype, i.e. for training with the larger training set 1277K. Since the Let'sMT! platform in its present form doesn't support factor-based SMT models, we had to use our own Moses installation for that part of the training and translation.

 

5.1 Training Moses with a larger training set

The prepared training data were fed into the Let'sMT! platform as a parallel corpus, following the procedure of language name selection and data adaptation described in D3.3.1.

 

   

Figure 5. The training data uploaded as a parallel corpus to Let'sMT! platform

 

The en2cyc_1277k_train parallel corpus was used to train several SMT systems with different features in order to produce the final version of the system, En-EnSemRep-Model04, which was then used for this second experiment in translation from English to CycL.

¹ http://www.statmt.org/moses/
² http://www.letsmt.eu; see also the Let'sMT! project at http://www.letsmt.org.


   

Figure 6. The En-EnSemRep-Model04 available for translation at Let'sMT! platform

 

 

5.2 Training Moses with factor-based models

Since the Let'sMT! platform doesn't support factor-based translation models, we used our own MOSES installation to produce these TMs. The training on a moderate server (2 Xeons @ 2.4 GHz with 64 GB RAM) took from several days up to a week for each TM. This is not comparable to the Let'sMT! platform, which uses Amazon cloud services on a flexible, on-demand basis and where the non-factor-based TMs were trained in several hours.

In order to train MOSES for factor-based SMT, the output of the English WP2 pipeline (see the examples in figures 3 and 4 in section 4.2) had to be converted into the MOSES training format for factor-based models. Figure 7 shows an example of the English part in the MOSES format.

exactly|3|NMOD one|3|NMOD tail|4|SBJ is|0|ROOT a|4|VC physical|7|NMOD part|4|PRD of|7|NMOD every|10|NMOD snake|8|PMOD .|10|DEP

Figure 7. Example of the English part of the factor-based training data in MOSES format
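The conversion script itself is not part of this deliverable, so the following minimal Python sketch only illustrates the idea; the tab-separated input, the column names (taken from the header in Figure 3) and the choice of factors are assumptions made for illustration.

    def conll_to_factored(conll_rows, factors=("token", "dep", "synt")):
        """Turn CoNLL-style rows (one token per row, as in Figure 3) into the
        pipe-separated factored line expected for factored training (as in Figure 7)."""
        columns = ["no", "token", "lemma", "pos", "msd", "ne",
                   "dep", "synt", "sense", "wsd", "a1", "a2"]
        factored_tokens = []
        for row in conll_rows:
            fields = dict(zip(columns, row.split("\t")))
            factored_tokens.append("|".join(fields[f] for f in factors))
        return " ".join(factored_tokens)

    # e.g. conll_to_factored(open("sentence.conll").read().splitlines())
    # -> "exactly|3|NMOD one|3|NMOD tail|4|SBJ is|0|ROOT ..."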

 

For the three different factor-based models we selectively used different types of data:

1. TMsynt: using the syntactic function tags (ROOT, SBJ, NMOD, etc.) in building the TM;

2. TMsrl: using the semantic role tags (A0, A1, etc.) in building the TM;

3. TMsynt+srl: using both the syntactic and the semantic tags in building the TM.

 

 


6 Evaluation of Translation

This section describes the translation and the evaluation of the translations produced by the En-EnSemRep-Model04 SMT system and by the factor-based SMT systems.

 

6.1 Translation from English into CycL with En-EnSemRep-Model04

The trained SMT systems can be used through the Let'sMT! platform service Translate only if they are in the running state. Similarly to Google Translate, the Let'sMT! platform opens an SL box (where the user pastes the SL text) and a TL box (where the selected running SMT system provides the translation). Entire SL files can also be translated without this web interface.

 

Figure 8. Example of the translation from English to CycL

 

6.2 Translation from English into CycL with Factor-based models

Our installation of MOSES was set up so that if a factor-based TM yields a translation output below a certain threshold, it does not produce the translation but returns the SL sentence instead. In our case this was the only outcome for all three factor-based TMs we trained: all output was below the threshold of ### and no TL output (i.e. no CycL statement) was produced. This literally means that the output would be humanly evaluated as rubble. Consequently, the evaluation of this output was not possible.

The explanation for this undesired outcome should be sought in the amount of training data. When additional linguistic data are used, the number of training examples needs to grow at an even higher rate for factor-based TMs than for non-factor-based ones. Only in this way can MOSES find, during its training phase, enough evidence for the different combinations of tokens and additional linguistic data. Our somewhat naïve assumption was that having 1.2 million sentences with the highly simplified and repetitive syntactic structure of CycL statements would supply the factor-based TMs with enough evidence to generalise and produce acceptable CycL statements out of English sentences. However, this expectation was not confirmed by the experiment.


Whether it would be possible without any additional rule-based pre-editing or post-editing procedures remains to be checked in further investigations.

 

6.3 Evaluation of the translation quality of En-EnSemRep-Model04

As in D3.3.1, for the SMT output produced by En-EnSemRep-Model04 we applied both kinds of MT quality evaluation used in MT research: automatic evaluation and human evaluation.

 

6.3.1 Automatic evaluation of En-EnSemRep-Model04

At the end of the training process the Let'sMT! platform produces an automatic evaluation of the trained SMT system using the standard automatic evaluation measures: BLEU, NIST, TER and METEOR scores.

 

 

Figure 9. Automatic evaluation of translation quality for En-EnSemRep-Model02 (above) and En-EnSemRep-Model04 (below) SMT systems

 

As can be seen in Figure 9, the automatic evaluation scores for En-EnSemRep-Model04 are higher in every cell, apart from the case-sensitive METEOR score, which is slightly lower. This was expected, since the 1277K training set, due to the monotonous nature of CycL as the TL, provided more evidence for the SMT system; this is exactly why we took the direction of enlarging the training set and the TM in this final prototype. The automatic evaluation measures, which were developed primarily for evaluating a NL as the TL, show very high values here. Such values are usually obtained for SMT systems trained for translation between very closely related natural languages (e.g. Swedish and English, or Croatian and Serbian) with a large number of regular lexical similarities and a similar word order. However, the main reason for these values in the case of En-EnSemRep-Model04 is most likely the very simple formal syntax of CycL, which probably artificially boosts the automatic evaluation measures.
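The scores in Figure 9 are produced by the Let'sMT! platform itself. If one wanted to re-check such scores offline on the held-out test set, a sketch along the following lines could be used; the sacrebleu package and the file names are assumptions, and treating CycL as plain whitespace-separated tokens is exactly the simplification that contributes to the inflated scores discussed above.

    import sacrebleu  # pip install sacrebleu

    # Hypothetical files: one CycL "sentence" per line, system output vs. reference.
    hypotheses = [line.strip() for line in open("model04.cycl.hyp", encoding="utf-8")]
    references = [line.strip() for line in open("testset.cycl.ref", encoding="utf-8")]

    bleu = sacrebleu.corpus_bleu(hypotheses, [references])  # corpus-level BLEU
    ter = sacrebleu.corpus_ter(hypotheses, [references])    # edit-distance-based TER
    print(f"BLEU = {bleu.score:.2f}")
    print(f"TER  = {ter.score:.2f}")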

The omission of the TER score calculation by the Let'sMT! platform for En-EnSemRep-Model04 is a bit surprising, since it was calculated previously for the smaller model. Since TER (Translation Error Rate) is usually defined as an error metric for MT that measures the number of edits required to change a system output into one of the reference outputs [8], it might be that the Let'sMT! platform internally limits the number of allowed edits, and that the score was not calculated due to the larger number of editing operations. If we were primarily interested in SMT output evaluation we would have given this a closer look and tried to explain it, but here we are interested in producing an SMT system that can translate from NL to FL, and missing one of the four automatic evaluation scores did not affect our line of research at the moment.



As with the early prototype evaluation (described in D3.3.1), we did not rely on the automatic evaluation alone, but carried out a human evaluation as well.

 

6.3.2 Human evaluation of En-EnSemRep-Model04

In order to keep the results comparable with the early prototype, the same procedures and tools as in D3.3.1 were used for the human evaluation of this final machine translation based semantic annotation prototype. We used 1,000 sentences from the test set of 10,000 sentence pairs that was set aside previously (see Section 4.1). This human evaluation set of 1,000 sentences was translated using the En-EnSemRep-Model04 SMT system and the result was submitted to the evaluation process. The human evaluation was performed by three evaluators, each covering one third of the human evaluation set (i.e. 2x333 and 1x334 sentences).

The software used for the human evaluation was Sisyphos II, an open source MT human evaluation package produced by the Munich-based LT company Linguatec within the ACCURAT project³, as a part of the ACCURAT Toolkit [7]. This suite of Java programs enables three different human evaluation scenarios: Absolute evaluation, Comparative evaluation and Post-editing evaluation.

In contrast to the early prototype evaluation, where only Absolute evaluation was possible since only one translation was produced, here we used the Comparative evaluation approach, which allowed us to compare the quality of the SMT output of two systems: En-EnSemRep-Model02 and En-EnSemRep-Model04.

The Comparative evaluation scenario is as follows. For each translated sentence Sisyphos II displays the SL sentence and the two TL sentences, without any trace of which of the two systems produced which translation. In this way the possible bias of the human evaluator towards the first or the second translation is avoided. The human evaluator has four categories for his/her judgement:

1. First translation better;

2. Both equally good;

3. Both equally bad;

4. Second translation better.

 

³ http://www.accurat-project.eu


Figure 10. Sisyphos II screen with Comparative evaluation example of the better first translation

Figure 11. Sisyphos II screen with Comparative evaluation example of both translations equally good


Figure 12. Sisyphos II screen with Comparative evaluation example of both translations equally bad

 

Cumulative results of the human Comparative evaluation are given in Table 1.

 

Category                    Occurrences   Percentage
First translation better    253           25.3%
Both equally good           291           29.1%
Both equally bad            211           21.1%
Second translation better   245           24.5%

Table 1. Results of the human evaluation of translation quality of 1000 English sentences translated into CycL by En-EnSemRep-Model04 SMT system

 

The results in Table 1 show that the human evaluation scored the translation quality of the En-EnSemRep-Model04 SMT system much lower than the automatic evaluation. The average comparative evaluation score falls into the value Both equally good (but close to Both equally bad), and the distribution is almost equal across all four categories. This means that, as in the first experiment with the En-EnSemRep-Model02 SMT system, a good part of the content of the English sentences is conveyed into CycL, but not following the strict formal syntax of this FL. It also means that the translation from English into CycL, as performed by any version of this SMT system, is not immediately applicable where statements with clean and regular CycL syntax are expected. Since the comparative human evaluation of the output of the smaller and the larger TM (En-EnSemRep-Model02 vs. En-EnSemRep-Model04) did not yield a significant difference in favour of the larger model, we can tentatively say that we have almost reached the point of oversaturation in training, and it is questionable whether more training data would not start introducing noise. Also, regarding system efficiency, the training of the larger system took longer and required more computational resources in both the training and the translation phases.
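The deliverable does not report a statistical test behind the claim that the difference between the two models is not significant; a simple sign test over the non-tied judgements from Table 1 is one way this could be checked (a sketch using scipy, not part of the original evaluation):

    from scipy.stats import binomtest  # scipy >= 1.7

    # Counts from Table 1; ties ("both equally good/bad") are ignored in a sign test.
    first_better, second_better = 253, 245
    result = binomtest(first_better, first_better + second_better, p=0.5)
    print(f"two-sided p-value = {result.pvalue:.2f}")  # well above 0.05: no significant preference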

 


6.4 Extrinsic evaluation

So far we have performed an intrinsic evaluation, where the quality of the SMT system output was evaluated on a set of sentences taken from the same source as the training set. However, we also wanted to check how this SMT system behaves when confronted with real-life sentences, i.e. sentences produced by humans in a real communicative scenario. In order to check this, we randomly selected 1000 English sentences appearing in the on-line Bloomberg news from the same day and translated them using En-EnSemRep-Model04 from English (as SL) into CycL (as TL). The TL sentences were then evaluated by humans using the Absolute evaluation scenario described in more detail in D3.3.1.

 

Figure 13. Sisyphos II screen with Absolute evaluation scenario of 1000 Bloomberg sentences: Rubble example

 

Figure 14. Sisyphos II screen with Absolute evaluation scenario of 1000 Bloomberg sentences: Mainly nonfluent example


 

 

Cumulative results of the human Absolute evaluation of the translation of 1000 Bloomberg sentences are given in Table 2.

 

Category   Value                    Occurrences   Percentage
Adequacy   Full content conveyed    62            6.2%
           Major content conveyed   336           33.6%
           Some parts conveyed      349           34.9%
           Incomprehensible         253           25.3%
Fluency    Grammatical              39            3.9%
           Mainly fluent            124           12.4%
           Mainly non fluent        358           35.8%
           Rubble                   479           47.9%

Table 2. Results of the human evaluation of translation quality of 1000 Bloomberg sentences translated into CycL by En-EnSemRep-Model04 SMT system

 

The results in Table 2 show that the human evaluation scored the translation quality of the En-EnSemRep-Model04 SMT system over 1000 real-life sentences with an average Adequacy of Some parts conveyed (but close to Major content conveyed), while the average Fluency falls into the value Rubble: almost 48% of all translations are CycL-nonfluent and thus breach its syntactic rules (mostly due to mismatched parentheses). This means that a good part of the content of the English sentences is conveyed into CycL, but not following the strict formal syntax of this FL. It also means that the translation from English into CycL, as performed by this SMT system, is not immediately applicable where statements with clean and regular CycL syntax are expected. This can be seen when compared to the Absolute evaluation in the first evaluation scenario (D3.3.1). The number of Full content conveyed sentences dropped from 20.9% to 6.2%, which is almost 15 percentage points less. The number of Rubble CycL sentences grew from 40.7% to 47.9%, i.e. by more than 7 percentage points, while the number of Mainly non fluent CycL sentences also grew, from 24.4% to 35.8%, i.e. by more than 11 percentage points. The bottom line is that more than 83% of all English sentences are translated either as Mainly non fluent or as complete Rubble in CycL. This indicates that the application of SMT techniques in this scenario will not yield directly useful results for the semantic annotation of NL sentences.
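The combined figure quoted above follows directly from the Fluency counts in Table 2; a trivial check with the counts copied from the table:

    # Fluency counts from Table 2 (En-EnSemRep-Model04 on 1000 Bloomberg sentences).
    fluency = {"Grammatical": 39, "Mainly fluent": 124, "Mainly non fluent": 358, "Rubble": 479}
    unusable = fluency["Mainly non fluent"] + fluency["Rubble"]  # 837 sentences
    print(f"{unusable / sum(fluency.values()):.1%}")             # 83.7%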

 

 


7 Conclusion  

With this deliverable we have reported on experiments that attempted to use an SMT system for translation from an NL as SL into an FL as TL. CycL was the FL of our choice because the training material could be produced inexpensively by generating from the Cyc ontology a set of aligned pairs of English sentences with their respective CycL "sentences" as counterparts.

This parallel corpus served as the training material for the Moses-based SMT system that was used as the advanced, i.e. final, prototype.
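For illustration, the sketch below shows the line-aligned plain-text format in which such a parallel corpus is typically prepared for Moses training: one English sentence per line in one file, and the corresponding CycL expression on the same line of the other file. The file names and the example pairs are placeholders, not the material actually generated from the Cyc ontology.

```python
# Placeholder aligned pairs (English sentence, CycL counterpart); in the real
# setting these would be generated from the Cyc ontology.
pairs = [
    ("A dog is a kind of animal.", "(genls Dog Animal)"),
    ("Fido is a dog.", "(isa Fido Dog)"),
]

# Moses expects two line-aligned files sharing a common stem (e.g. corpus.en and
# corpus.cycl); line i of one file must correspond to line i of the other.
with open("corpus.en", "w", encoding="utf-8") as en_f, \
        open("corpus.cycl", "w", encoding="utf-8") as cycl_f:
    for english, cycl in pairs:
        en_f.write(english + "\n")
        cycl_f.write(cycl + "\n")
```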

Judging by the automatic evaluation procedure, the scores of three standard automatic MT evaluation metrics (BLEU, NIST and METEOR) would suggest high-quality translation, since these scores were higher for the enlarged TM used to build the En-EnSemRep-Model04 system than for the previous early prototype, the En-EnSemRep-Model02 system. However, human evaluation applied intrinsically in the comparative evaluation scenario yielded results that showed only slightly better performance of the enlarged TM, i.e. of En-EnSemRep-Model04 over En-EnSemRep-Model02.
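As a pointer to how such automatic scores can be reproduced, the sketch below computes a corpus-level BLEU score with NLTK over line-aligned CycL references and system outputs. The file names and the whitespace tokenisation are assumptions; this is not the exact tooling behind the reported scores, and NIST and METEOR would be computed with their own implementations.

```python
from nltk.translate.bleu_score import corpus_bleu

def read_tokenised(path):
    """Read one CycL 'sentence' per line and split it on whitespace."""
    with open(path, encoding="utf-8") as f:
        return [line.split() for line in f]

# Assumed files, line-aligned with the held-out evaluation set.
references = read_tokenised("eval.reference.cycl")  # gold CycL from the corpus
hypotheses = read_tokenised("eval.model04.cycl")    # SMT system output

# corpus_bleu expects, for each hypothesis, a list of reference token lists.
bleu = corpus_bleu([[ref] for ref in references], hypotheses)
print(f"Corpus BLEU: {bleu:.4f}")
```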

On top of that, human evaluation was applied extrinsically in the Absolute evaluation scenario on 1000 sentences randomly selected from Bloomberg texts. This evaluation showed that the application of En-EnSemRep-Model04 brought a drop in the number of sentences evaluated as conveying the full content, while the share of CycL statements evaluated as Mainly non fluent or Rubble grew to more than 83%.

Given these figures from the extrinsic evaluation, it can be said that this approach is not useful in the XLike processing platform at the present stage, at least for processing at the sentence level. Whether it could yield more acceptable results at the level of paragraphs or whole documents remains to be checked, but the current experimental setting did not allow the engagement of such large computing resources.

 

