NoTube: Models & Semantics

Monday, March 26, 2012

Upload: notubeproject

Post on 01-Nov-2014


TRANSCRIPT

Page 1: NoTube: Models & Semantics


Page 2: NoTube: Models & Semantics

WP1 Overview

• “Backend” shared datasets and services
• Mappings, integration and common vocabulary
• Extra datasets to support use-case scenarios


Page 3: NoTube: Models & Semantics

WP1: Year 3 Direction & Achievements

• Moving from single ‘warehouse’ to distributed set of databases, datasets and services
• Planning for sustainable life-after-project
• Integrating feedback from end-to-end demos


Page 4: NoTube: Models & Semantics


Page 5: NoTube: Models & Semantics

Why WP1? Two roles

• NoTube internal: a hub for data sharing
• NoTube external: show how shared datasets and vocabularies help with user-facing “Web and TV” problems
• “show”, critically, includes “thinking out loud” as we explore, via blog, email, Twitter etc.
  – scholarly articles rarely reach our target audiences


Page 6: NoTube: Models & Semantics

Outreach message

• Let metadata flow widely, advertising content rather than remaining a hidden asset
• Identify and link content with useful URLs(*)
• Open APIs to control TV and link devices [WP7c]


...from W3C TV & Web position paper (with Project Baird), Berlin 9 Feb 2011

WP1 is concerned primarily with the first two: getting metadata into the Web from source, rather than scraping, guessing, approximating.


Page 7: NoTube: Models & Semantics

Aside: RDFa went mainstream

• Try ‘View source’ on IMDB, Rotten Tomatoes, BBC, tv.com sites to find RDF descriptions of TV content.
• NoTube’s approach was to lead by example, to engage with industry and to plan from the beginning for the ‘afterlife’.
• This strategy worked.


Page 8: NoTube: Models & Semantics


Facebook OGP

tv.com 'The Wire' page

...simple, extensible standards are being adopted

OGP since 2010; schema.org since 2011...
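
As a rough illustration of what ‘View source’ turns up on pages like these, the sketch below collects Open Graph properties from an HTML fragment using Python's standard html.parser module. The sample markup and its example.org URL are invented for this example; they are not copied from tv.com or any other site.

```python
from html.parser import HTMLParser


class OGPParser(HTMLParser):
    """Collect <meta property="og:..." content="..."> pairs from a page."""

    def __init__(self):
        super().__init__()
        self.properties = {}

    def handle_starttag(self, tag, attrs):
        # Self-closing <meta ... /> tags are also routed here by HTMLParser.
        if tag != "meta":
            return
        attr = dict(attrs)
        prop = attr.get("property") or ""
        if prop.startswith("og:") and "content" in attr:
            self.properties[prop] = attr["content"]


def extract_ogp(html):
    """Return a dict of Open Graph property names to their content values."""
    parser = OGPParser()
    parser.feed(html)
    return parser.properties


# Hypothetical markup, written for this example only.
SAMPLE = """
<html prefix="og: http://ogp.me/ns#">
<head>
<title>The Wire</title>
<meta property="og:title" content="The Wire" />
<meta property="og:type" content="video.tv_show" />
<meta property="og:url" content="http://example.org/shows/the-wire" />
<meta name="description" content="Not an OGP property" />
</head>
</html>
"""

if __name__ == "__main__":
    for name, value in extract_ogp(SAMPLE).items():
        print(name, "=", value)
```

The plain description tag is skipped because it has no property attribute; only the three og: entries are returned.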


Page 9: NoTube: Models & Semantics

TV Data Warehouse

• We still host several crawls of TV EPG data
• Trend is for data to be more cleanly available from source, without scraping
• Crawling, aggregation and integration still useful, but less scraping required
• Crawled 'data warehouse' also used as a research testbed collection


Page 10: NoTube: Models & Semantics

WP1: Example Datasets

• WP7c/WP3 use DBpedia/Wikipedia URLs for topics; covers all mainstream areas.
• BBC also using Lonclass/UDC topic codes (we’re helping prepare this for sharing)
• For Music, we adopt MusicBrainz IDs
• Mapping diverse representations of ‘genre’
• “Organic” item/topic similarity measures derived from user data from WP3
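
The slides do not say which similarity measure was derived from the WP3 user data. As one possible reading, the sketch below scores two programmes by the Jaccard overlap of the users who watched them. The programme names, user IDs and the choice of Jaccard are all assumptions made for illustration.

```python
def jaccard(a, b):
    """Jaccard overlap of two collections: |A ∩ B| / |A ∪ B| (0.0 if both are empty)."""
    a, b = set(a), set(b)
    union = a | b
    if not union:
        return 0.0
    return len(a & b) / len(union)


# Hypothetical viewing data: programme -> set of users who watched it.
VIEWERS = {
    "programme_a": {"u1", "u2", "u3"},
    "programme_b": {"u2", "u3", "u4"},
    "programme_c": {"u5"},
}


def most_similar(item, viewers):
    """Rank every other item by audience overlap with `item`, best first."""
    scores = [
        (other, jaccard(viewers[item], users))
        for other, users in viewers.items()
        if other != item
    ]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    print(most_similar("programme_a", VIEWERS))
```

A measure of this kind needs no curated topic vocabulary, which is one sense in which such similarities could be called “organic”.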


Page 11: NoTube: Models & Semantics

WP1: Data Services

• Data Services exposed as static files:
  – Show how to embed RDFa in HTML
  – Publish as RDF/XML Linked Data
• Interactive Data Services:
  – Using W3C SPARQL, SQL or SOLR/Lucene, over HTTP and/or XMPP.
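
To make "SPARQL over HTTP" concrete, the sketch below builds a SPARQL Protocol GET request in which the query travels in the `query` URL parameter. It targets the public DBpedia endpoint rather than any NoTube service, which is an assumption of this example, as is the choice of resource. The request is only constructed, never sent.

```python
from urllib.parse import urlencode, urlsplit, parse_qs
from urllib.request import Request

# Public DBpedia endpoint, used here as a stand-in (assumption).
ENDPOINT = "https://dbpedia.org/sparql"


def label_query(resource_uri, lang="en"):
    """SPARQL SELECT asking for the rdfs:label of one resource in one language."""
    return (
        "PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>\n"
        "SELECT ?label WHERE {\n"
        f"  <{resource_uri}> rdfs:label ?label .\n"
        f'  FILTER (lang(?label) = "{lang}")\n'
        "}"
    )


def build_request(endpoint, query):
    """Build (but do not send) a GET request carrying the query string."""
    url = endpoint + "?" + urlencode({"query": query})
    # Ask for the standard JSON results format.
    return Request(url, headers={"Accept": "application/sparql-results+json"})


if __name__ == "__main__":
    query = label_query("http://dbpedia.org/resource/The_Wire")
    request = build_request(ENDPOINT, query)
    print(request.full_url)
    # Decoding the URL recovers the original query text.
    sent = parse_qs(urlsplit(request.full_url).query)["query"][0]
    print(sent == query)
```

Passing the request to urllib.request.urlopen would execute the query, subject to the endpoint being reachable.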


Page 12: NoTube: Models & Semantics

WP1: Exploitation and Sustainability

• WP1’s approach designed to outlive NoTube
• Use, augment and contribute to external data
  – e.g. DBpedia, Archive.org, W3C & wider Web of data trend (e.g. RDFa adoption)
  – we also demonstrate, e.g. on the blog, how we did it, so others can replicate it
  – WP4 enrichments can be fed back to externals, e.g. similarity metrics & clusters


Page 13: NoTube: Models & Semantics

WP1: Sustainability 2

• NoTube’s 2010 W3C “Web & TV” position paper lobbied for unique IDs & public metadata for video content; this is now going mainstream.
• VUA will continue hosting some data, using PURL.org so that hosting can pass e.g. to W3C later.
• Collab with Facebook OGP (helped with their RDFa adoption) and now the search engines' Schema.org (RDFa and extending TV vocab).


Page 14: NoTube: Models & Semantics


schema.org


Page 15: NoTube: Models & Semantics

Workpackage Links

• Background data for all Workpackages
• Collaborated with WP2 on BMF RDF models
• Closer ties throughout WP3/7 developments
• WP4 entity and topic URIs point to WP1
• Outreach work around RDFa, Position Paper


Page 16: NoTube: Models & Semantics

2nd review comments

• Not clear though how this work has built upon the results of year 1, and how the current progress is in line with the case studies.
  – Worked more closely and pragmatically with case studies in WP7, especially 7c and related WP3 work. Moved towards more decentralised model, instead of 'warehouse'.
  – 7c collaboration with KMI's 'Watch and Buy' scenario, and with WP4 timed ad insertion work, used EU p2pnext 'limo' work; also egtaMETA from EBU from 7c
  – WP1 work became more "hands-on"; we helped WP7 extract datasets such as TED.com and Archive.org which we expect will shortly be replaceable by cleaner information from 'official' sources.


Page 17: NoTube: Models & Semantics

2nd review comments

• No relevant state of the art is documented and no details or citations on automated algorithms are given. Evaluation is restricted to examples and no quantitative data are given.
  – We accept weakness in report (lack of scholarly/scientific detail); chose to focus on more informal communication with outside world in final phase. A 2nd version of the doc was produced, but main changes were around 'life after project' themes rather than adding more scientific and scholarly detail.


Page 18: NoTube: Models & Semantics

2nd review comments

• A close collaboration with WP7 is recommended in order to ensure that work meets the requirements of the use cases.
  – this very well describes our emphasis in final phase


Page 19: NoTube: Models & Semantics

Lessons Learned

• It's hard to simulate an evolving global data ecosystem; but we've played a small part in some huge changes.
• Publishers will adopt simple Semantic Web standards when they are given an incentive.
• It's hard for a 4-year-old plan to stay relevant in such an environment; ability to be agile was critically important.


Page 20: NoTube: Models & Semantics

WP1 Summary

• Used open standards (RDF) and largely open data (e.g. Wikipedia/DBpedia)
• Integrated, mapped and data-mined
• Contributing our additions back to the community / commons (highlight: BBC sims)
• Documenting what we learned for external developers and subsequent projects


Questions?


Page 21: NoTube: Models & Semantics


Page 22: NoTube: Models & Semantics


Page 23: NoTube: Models & Semantics

WP1: End-to-End issues

• In final year, our End-to-End scenarios have more mature implementations
• Feedback from WP3/7c: key issue is sparsity of large vocabularies when used for record matching. No single solution here.
• Integrating techniques from WP4 (e.g. clustering, data-mining) critical for applying large and chaotic vocabularies for practical recommendations.
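
The sparsity problem can be shown on a toy case: two records about related things rarely carry the exact same topic identifier when the vocabulary is large, so exact-match overlap is zero, whereas mapping topics to coarser clusters first recovers the link. The topic names and cluster assignments below are invented, and the sketch is not the WP4 method, only one simple way clustering could help.

```python
def jaccard(a, b):
    """Jaccard overlap of two sets: |A ∩ B| / |A ∪ B| (0.0 if both are empty)."""
    union = a | b
    if not union:
        return 0.0
    return len(a & b) / len(union)


# Hypothetical fine-grained topic -> coarse cluster assignments.
TOPIC_TO_CLUSTER = {
    "topic:police_procedural": "cluster:crime",
    "topic:crime_fiction": "cluster:crime",
    "topic:baltimore": "cluster:us_east_coast",
    "topic:maryland": "cluster:us_east_coast",
    "topic:gardening": "cluster:lifestyle",
}


def to_clusters(topics, mapping):
    """Replace each topic by its cluster; unmapped topics are kept as they are."""
    return {mapping.get(topic, topic) for topic in topics}


# Two hypothetical programme records tagged by different annotators.
RECORD_A = {"topic:police_procedural", "topic:baltimore"}
RECORD_B = {"topic:crime_fiction", "topic:maryland"}

if __name__ == "__main__":
    exact = jaccard(RECORD_A, RECORD_B)
    clustered = jaccard(
        to_clusters(RECORD_A, TOPIC_TO_CLUSTER),
        to_clusters(RECORD_B, TOPIC_TO_CLUSTER),
    )
    print("exact-match overlap:", exact)
    print("cluster-level overlap:", clustered)
```

Coarser clusters trade precision for recall, so in practice the granularity would need tuning per use case, consistent with the slide's "no single solution".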
