IPTC and the Semantic Web: Two Paths and Seven Lessons
Stuart Myles, Associated Press
29th June 2010

DESCRIPTION

IPTC is exploring the use of Semantic Web for the news industry. This is the report I gave to the IPTC's 2010 AGM in San Francisco. We decided that there are two paths into the Semantic Web - creating Linked Data (using SKOS) and creating a news ontology based on NewsML-G2 (using OWL). Here are the seven lessons we've learned so far in our exploration.

TRANSCRIPT

Page 1: IPTC and the Semantic Web: Two Paths and Seven Lessons

IPTC and the Semantic Web: Two Paths and Seven Lessons

Stuart Myles, Associated Press

29th June 2010

Semantic Web News Vocabularies

© 2010 IPTC (www.iptc.org) All rights reserved 2

Best known RDF vocabularies are
• FOAF = Friend of a Friend: http://xmlns.com/foaf/spec/
• DCMI Terms = Dublin Core Metadata Initiative Terms: http://dublincore.org/
• Other examples at http://vocab.org/

New York Times, Dow Jones and others have identified a need for a news vocabulary

IPTC decided to experiment with semantic web and linked data

Held a series of teleconferences to make rapid progress

Two Paths to the Semantic Web

• We identified two paths into the Semantic Web world:

• Create a news ontology, based on NewsML-G2
– Formal semantics for news, specified using OWL
– “RDFization” of IPTC’s family of news standards

• Turn IPTC subject codes into Linked Data
– Connect related data across the web using URIs, HTTP & RDF
– A set of principles from Tim Berners-Lee: http://www.w3.org/DesignIssues/LinkedData.html

• We decided to pursue the Linked Data path first

Following the Linked Data Path

• The Linked Data principles, as specified by TBL
– Use URIs as names for things
– Use HTTP URIs so that people can look up those names
– When someone looks up a URI, provide useful information, using the standards (RDF, SPARQL)
– Include links to other URIs, so that they can discover more things

• Apply the principles to IPTC’s subject codes
– Already published as XML (G2 Knowledge Items)
– And as HTML
– The plan: convert XML into RDF
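
The plan to convert XML into RDF can be sketched in a few lines of Python. This is only an illustration, not the conversion IPTC actually ran: the Knowledge Item fragment is a hand-written, simplified sample, the base URI used to expand the "subj" QCode prefix is an assumption, and the function names are hypothetical.

```python
import xml.etree.ElementTree as ET

# NewsML-G2 namespace; the fragment below is a simplified, hand-written sample.
NAR = "http://iptc.org/std/nar/2006-10-01/"
SAMPLE_KI = f"""
<knowledgeItem xmlns="{NAR}">
  <conceptSet>
    <concept>
      <conceptId qcode="subj:15000000"/>
      <name xml:lang="en-GB">sport</name>
    </concept>
    <concept>
      <conceptId qcode="subj:15001000"/>
      <name xml:lang="en-GB">aero and aviation sport</name>
      <broader qcode="subj:15000000"/>
    </concept>
  </conceptSet>
</knowledgeItem>
"""

# Assumption: base URI that the "subj" QCode prefix expands to.
BASE = "http://cv.iptc.org/newscodes/subjectcode/"
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def qcode_to_uri(qcode):
    """Expand 'subj:15000000' into an HTTP URI that can be looked up."""
    prefix, code = qcode.split(":", 1)
    if prefix != "subj":
        raise ValueError(f"unknown QCode prefix: {prefix}")
    return BASE + code


def ki_to_turtle(xml_text):
    """Emit one skos:Concept per G2 concept, as Turtle text."""
    root = ET.fromstring(xml_text)
    lines = ["@prefix skos: <http://www.w3.org/2004/02/skos/core#> .", ""]
    for concept in root.iter(f"{{{NAR}}}concept"):
        uri = qcode_to_uri(concept.find(f"{{{NAR}}}conceptId").get("qcode"))
        statements = ["a skos:Concept"]
        for name in concept.findall(f"{{{NAR}}}name"):
            lang = name.get(XML_LANG, "en")
            # No escaping of quotes in labels: acceptable for a sketch only.
            statements.append(f'skos:prefLabel "{name.text}"@{lang}')
        for broader in concept.findall(f"{{{NAR}}}broader"):
            target = qcode_to_uri(broader.get("qcode"))
            statements.append(f"skos:broader <{target}>")
        lines.append(f"<{uri}> " + " ;\n    ".join(statements) + " .")
    return "\n".join(lines)


print(ki_to_turtle(SAMPLE_KI))
```

Each concept gets an HTTP URI as its name, and the broader link points at another URI, which covers the first, second and fourth principles; serving the result when the URI is looked up is a hosting question.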

Lesson #1: One Model, Multiple Vocabularies

• RDF is a single model: Subject Predicate Object
• With multiple syntaxes
– We selected RDF/XML and RDF/Turtle
• And multiple “vocabularies”
– Such as SKOS, Dublin Core
• SKOS = Simple Knowledge Organization System
– http://www.w3.org/2004/02/skos/
– Designed for representing thesauri and classification schemes
• The Semantic Web “way” is
– Use existing vocabularies as much as possible
– When you invent a new term, link it to existing terms

• We decided to use SKOS and DC as the main vocabs
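
To make "one model, multiple syntaxes" concrete, here is a minimal sketch that holds statements as plain (subject, predicate, object) tuples and writes the same data in two syntaxes. It uses N-Triples and a very small subset of Turtle rather than the RDF/XML we also selected, simply because they are easy to generate by hand. The concept URI and all function names are assumptions made for the example.

```python
SKOS = "http://www.w3.org/2004/02/skos/core#"
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
# Assumed concept URI, used only for illustration.
SPORT = "http://cv.iptc.org/newscodes/subjectcode/15000000"

# One model: every statement is (subject, predicate, object).
# An object is either a URI string or a (text, language) literal.
TRIPLES = [
    (SPORT, RDF + "type", SKOS + "Concept"),
    (SPORT, SKOS + "prefLabel", ("sport", "en")),
]

PREFIXES = {"rdf": RDF, "skos": SKOS}


def literal(value):
    """Format a (text, language) pair as a language-tagged literal."""
    text, lang = value
    escaped = text.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"@{lang}'


def to_ntriples(triples):
    """Syntax 1: N-Triples, every URI written out in full."""
    lines = []
    for s, p, o in triples:
        obj = literal(o) if isinstance(o, tuple) else f"<{o}>"
        lines.append(f"<{s}> <{p}> {obj} .")
    return "\n".join(lines)


def shorten(uri):
    """Abbreviate a URI with a known vocabulary prefix, if one applies."""
    for prefix, namespace in PREFIXES.items():
        if uri.startswith(namespace):
            return f"{prefix}:{uri[len(namespace):]}"
    return f"<{uri}>"


def to_turtle(triples):
    """Syntax 2: Turtle, using prefixes for the reused vocabularies."""
    header = [f"@prefix {p}: <{ns}> ." for p, ns in PREFIXES.items()]
    body = []
    for s, p, o in triples:
        obj = literal(o) if isinstance(o, tuple) else shorten(o)
        body.append(f"{shorten(s)} {shorten(p)} {obj} .")
    return "\n".join(header + [""] + body)


print(to_ntriples(TRIPLES))
print(to_turtle(TRIPLES))
```

The list of tuples never changes; only the serializer does. The prefixes are where vocabulary reuse shows up: every predicate comes from RDF or SKOS rather than from a term invented for the purpose.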

Lesson #2: Tool Support

• The approach:
– Use RDF in general
– Reuse existing vocabularies in particular
• The benefit:
– Tools “just work”
• We learnt that this is mostly true…
– We played with Protégé, TopBraid, Sesame
• Most things worked well in all tools
– But “transitive” versions of SKOS broader, narrower aren’t supported well
– Late additions to SKOS standard
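
Where a tool does not infer the transitive relations, they can be computed outside it. In SKOS, skos:broader itself is not transitive; skos:broaderTransitive is its transitive super-property. The sketch below derives the transitive set from direct broader links. The three-level hierarchy and the function name are hypothetical, chosen only to show the idea.

```python
def broader_transitive(broader):
    """Map each concept to all of its ancestors.

    `broader` maps a concept to the set of its direct skos:broader targets.
    The result corresponds to skos:broaderTransitive.
    """
    closure = {}
    for concept in broader:
        seen = set()
        stack = list(broader.get(concept, ()))
        while stack:
            parent = stack.pop()
            if parent in seen:
                continue  # already visited; also guards against cycles
            seen.add(parent)
            stack.extend(broader.get(parent, ()))
        closure[concept] = seen
    return closure


# Hypothetical three-level hierarchy of subject codes.
direct = {
    "15001001": {"15001000"},
    "15001000": {"15000000"},
    "15000000": set(),
}

for concept, ancestors in broader_transitive(direct).items():
    print(concept, sorted(ancestors))
```

The narrower direction works the same way with the links reversed.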

Lesson #3: Basics Well Documented

• In general, IPTC KnowledgeItems map well to RDF
– SKOS concepts
– Dublin Core properties
• Certain KI properties don’t have a direct mapping
– Created and updated timestamps of KnowledgeItem properties
• Difficult to determine more advanced mappings
– SKOS wiki had some documentation: http://esw.w3.org/SkosCoreGuideToc/SectionVersioning
– SKOS email list seems dormant
– SemanticOverflow a great way to get questions answered: http://www.semanticoverflow.com/questions/902/adding-created-modified-properties-to-skos-do-i-need-to-reify
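
The timestamp problem is that the date belongs to a single property value, not to the concept as a whole, and a plain triple has nowhere to carry it. One standard RDF option is reification: describe the statement itself as a resource and attach dcterms:modified to that. The sketch below shows the shape of the result; it is not necessarily the approach we will adopt, and the concept URI, date and function name are made up for the example.

```python
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
DCT = "http://purl.org/dc/terms/"
SKOS = "http://www.w3.org/2004/02/skos/core#"


def reify(statement_id, subject, predicate, obj, modified):
    """Return triples describing one statement plus its modified date.

    The statement is named by a blank node so that the timestamp can be
    attached to the statement rather than to the concept.
    """
    node = f"_:{statement_id}"
    return [
        (node, RDF + "type", RDF + "Statement"),
        (node, RDF + "subject", subject),
        (node, RDF + "predicate", predicate),
        (node, RDF + "object", obj),
        (node, DCT + "modified", modified),
    ]


# Hypothetical example: the English label of a concept, changed on a given day.
triples = reify(
    "label1",
    "http://cv.iptc.org/newscodes/subjectcode/15000000",  # assumed URI
    SKOS + "prefLabel",
    '"sport"@en',
    '"2010-06-01"',  # made-up date
)

for s, p, o in triples:
    print(s, p, o)
```

The cost is five triples per timestamped value, which is the kind of trade-off that makes the simpler mapping (one dcterms:modified per concept) attractive.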

Lesson #4: Pull is Better than Push

• One possibility is to “push” our model into RDF
– Try to preserve all the original semantics
– But you don’t gain as much in out-of-the-box tool support
• The other possibility is to “pull” the model into RDF
– May lose some nuances
– But you gain in reuse – of modeling patterns, vocabularies and tool support

(In fact, there was some dispute over the intended model of the IPTC KnowledgeItem properties)

Lesson #5: Linking and Mapping

“Include links to other URIs, so that they can discover more things”

• Linking is the heart of linked data
• But linking is more like mapping
– owl:sameAs seems to have unintended consequences
– SKOS’s mapping properties offer a range of options
  • closeMatch, exactMatch, broadMatch, narrowMatch, relatedMatch
  • http://www.w3.org/TR/skos-reference/#mapping
• We decided to map the 17 top level IPTC subject codes to DBpedia
– Some top level terms are really “umbrella” terms – difficult to map to a single equivalent
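
The range of mapping properties is what makes umbrella terms manageable: instead of forcing one equivalent, an umbrella term can point at several narrower targets. The sketch below generates mapping triples from a small table. The table is purely illustrative and is not the mapping we approved; the IPTC base URI and the function name are also assumptions.

```python
SKOS = "http://www.w3.org/2004/02/skos/core#"
IPTC = "http://cv.iptc.org/newscodes/subjectcode/"  # assumed base URI
DBPEDIA = "http://dbpedia.org/resource/"

# The five SKOS mapping properties named on the slide.
MAPPING_PROPERTIES = {
    "closeMatch", "exactMatch", "broadMatch", "narrowMatch", "relatedMatch",
}

# Illustrative table only: (subject code, mapping property, DBpedia resource).
# "A narrowMatch B" says that B is the narrower of the two concepts.
EXAMPLE_MAPPINGS = [
    ("15000000", "closeMatch", "Sport"),
    ("01000000", "narrowMatch", "Art"),
    ("01000000", "narrowMatch", "Culture"),
    ("01000000", "narrowMatch", "Entertainment"),
]


def mapping_triples(mappings):
    """Return one N-Triples line per mapping, rejecting unknown properties."""
    lines = []
    for code, prop, resource in mappings:
        if prop not in MAPPING_PROPERTIES:
            raise ValueError(f"not a SKOS mapping property: {prop}")
        lines.append(f"<{IPTC}{code}> <{SKOS}{prop}> <{DBPEDIA}{resource}> .")
    return lines


for line in mapping_triples(EXAMPLE_MAPPINGS):
    print(line)
```

Rejecting anything outside the five SKOS properties keeps owl:sameAs from slipping into the table by accident.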

Lesson #6: There’s More to be Done

• Although we rapidly produced a Linked Data prototype, it is incomplete
– Content negotiation requires work from the APA hosting
– We need to think through and approve the details of the mapping
• The other path remains unexplored
– Building a news ontology, based on NewsML-G2
– Can we leverage the work that EBU have already done?
• What about other formats?
– Particularly RDFa
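
Content negotiation, mentioned above as unfinished, means that one URI serves HTML to a browser and RDF to a Semantic Web client, depending on the HTTP Accept header. A rough sketch of the selection logic follows. It is deliberately simplified (it ignores the specificity rules of full HTTP negotiation), and the list of supported types and the function names are assumptions rather than a description of the actual hosting setup.

```python
# Assumed set of representations a server could offer, in order of preference.
SUPPORTED = ["text/html", "application/rdf+xml", "text/turtle"]


def parse_accept(header):
    """Split an Accept header into (media range, q-value) pairs."""
    ranges = []
    for part in header.split(","):
        pieces = [p.strip() for p in part.split(";")]
        media = pieces[0].lower()
        if not media:
            continue
        q = 1.0
        for param in pieces[1:]:
            if param.startswith("q="):
                try:
                    q = float(param[2:])
                except ValueError:
                    q = 0.0  # treat a malformed q-value as "not acceptable"
        ranges.append((media, q))
    return ranges


def choose_representation(header, supported=SUPPORTED):
    """Pick the supported media type with the highest q-value, or None."""
    best, best_q = None, 0.0
    for media, q in parse_accept(header or "*/*"):
        for candidate in supported:
            matches = media in ("*/*", candidate) or (
                media.endswith("/*") and candidate.startswith(media[:-1])
            )
            if matches and q > best_q:
                best, best_q = candidate, q
    return best


print(choose_representation("text/turtle"))
print(choose_representation("application/rdf+xml;q=0.9, text/html;q=0.5"))
```

A request with no Accept header falls back to the first supported type, here HTML, so ordinary browsers keep seeing the existing pages.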

Lesson #7: There’s a Lot of Interest

• High attendance at the Semantic Web IPTC calls
– Even though the topic is a bit complex and unfamiliar to most
• Participation was brisk
– We rapidly developed RDF/XML and RDF/Turtle representations

• Occasional mentions on Twitter generated a lot more retweets and replies than other IPTC-related tweets

• There’s a lot of interest inside and outside the IPTC

IPTC and Semantic Web: Next Steps

• Complete Linked Data mapping of IPTC Subject Codes and Media Codes

• Explore creating a News Ontology
– Find out more about EBU’s work

• Start RDFa representation of news metadata

• Reach out to the broader Semantic Web and news communities for feedback and collaboration

• REQUEST to Standards Chair:
– Can we formalize this effort into an official IPTC Working Group?
