AP Metadata Services, SemTechBiz 2012

AP Metadata Services, Amy Sweigert, SemTechBiz, June 6, 2012

Upload: amysweigert

Post on 06-Jul-2015


DESCRIPTION

Overview of AP Metadata Services presented at Semantic Technology and Business, June 2012, San Francisco.

TRANSCRIPT

Page 1: AP Metadata Services, SemTechBiz 2012

AP Metadata Services

Amy Sweigert, SemTechBiz, June 6, 2012

Page 2: AP Metadata Services, SemTechBiz 2012

About the Associated Press

– AP is a not-for-profit news cooperative, owned by US newspaper and broadcast members, founded in 1846

– AP news content is seen by half the world’s population on any given day

– We process and deliver 100k+ content items daily

– AP, member and third-party content

– Text, photos, audio, multimedia interactives, and broadcast and online quality video

– Primarily B2B

Page 3: AP Metadata Services, SemTechBiz 2012

Evolution of AP Metadata Services

2006

• Initial taxonomy and rule development starts

2007

• Automated tagging of Subjects, People, Companies starts

2008

• Automated tagging of Companies, Organizations, Geography, Events starts

2009-2010

• Scope and depth of coverage increases

• Platform stabilized

2011

• RDF modeling

• API development

• Pilot offering

2012

• AP Metadata Services Launch

Page 4: AP Metadata Services, SemTechBiz 2012

Introducing AP Metadata Services

– Semantic Web services to drive the next generation of news delivery and consumption:

– AP News Taxonomy

– AP Tagging Service

– B2B service with continuing investment and human curation

– Ongoing and frequent updates to tagging rules, entities, concepts and their semantic relationships

– Designed to meet AP’s exacting needs for its own content

Page 5: AP Metadata Services, SemTechBiz 2012

What Does Rich Metadata Do for Publishers?

– Connect customers with more relevant content through:

– Improved search and discovery

– Automated aggregation, syndication and distribution of related content

– Richer and more relevant content products and services

– Reduced time to market for new products and services

– Reduced editorial workload and greater efficiency

– Content interoperability

Page 6: AP Metadata Services, SemTechBiz 2012

• Site delivered ~5,000 articles and ~20,000 photos over 2 months

• Routing and display of content by team and conference is automated

• Editorial resources are focused on curating only the most important parts of the site

• Enables user experience that would not be possible without automated, standard metadata

Page 7: AP Metadata Services, SemTechBiz 2012

The AP News Taxonomy

– Breadth and depth to support news and current events

– Defines rich semantic metadata specific to news

– Generic subjects and hierarchy

– Named entities

– Relationships, synonyms, additional entity data

– Delivers automated notifications of taxonomy changes

– New terms, deprecated terms, name changes, etc.

Page 8: AP Metadata Services, SemTechBiz 2012

The AP Tagging Service

– Software as a Service

– Leverages AP investment and expertise

– Tags concepts; more than entity extraction

– Automated tagging tied to AP News Taxonomy ensures more consistent, comprehensive results

Page 9: AP Metadata Services, SemTechBiz 2012

Coverage

– 4200 Subjects

– 2100 Geographic locations

– 1200 Organizations

– 91,000 People

– 41,000 Publicly traded Companies

– Supports English language content

Top Level Subject Areas:

• Arts and Entertainment

• Business

• Demographic groups

• Environment and Nature

• Events

• General News

• Government and Politics

• Health

• Lifestyle

• Living Things

• Media

• Science

• Social Affairs

• Sports

• Technology

Page 10: AP Metadata Services, SemTechBiz 2012

A Foundation of Semantic Web Standards

– URIs for all entities and topics

– Taxonomy modeled in RDF

– SKOS Ontology

– Supplemented with other ontologies (Dublin Core, DBPedia, FOAF, GeoNames, etc.)

– Some AP-specific properties

– Taxonomy and Tagging Service accessible via RESTful APIs

– Using a SPARQL end-point internally to provide views of the taxonomy
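One practical consequence of modeling the taxonomy in SKOS is that the hierarchy can be walked through skos:broader links. The following is a minimal sketch of that idea in Python. The URIs and the hierarchy are made up for illustration and are not taken from the AP taxonomy; a real client would load them from the taxonomy output instead.

```python
# Hypothetical mini-taxonomy: concept URI -> list of skos:broader URIs.
# These URIs and the hierarchy are invented for illustration only.
BROADER = {
    "http://example.org/id/nba-finals": ["http://example.org/id/nba-basketball"],
    "http://example.org/id/nba-basketball": ["http://example.org/id/basketball"],
    "http://example.org/id/basketball": ["http://example.org/id/sports"],
    "http://example.org/id/sports": [],
}


def ancestors(uri, broader=BROADER):
    """Return every concept reachable from `uri` via skos:broader, nearest first."""
    seen = []
    queue = list(broader.get(uri, []))
    while queue:
        current = queue.pop(0)
        if current in seen:
            continue  # guard against cycles and shared parents
        seen.append(current)
        queue.extend(broader.get(current, []))
    return seen


if __name__ == "__main__":
    for parent in ancestors("http://example.org/id/nba-finals"):
        print(parent)
```

A consumer could use such a walk to file an article tagged with a narrow subject under every broader subject as well.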

Page 11: AP Metadata Services, SemTechBiz 2012

Supported Formats

AP Tagging Service

– Input formats

– Plain Text

– Simple XML: XML-encoded content, e.g. XHTML, NITF, NewsML, NewsML-G2

– Output formats

– RDF/XML

– RDF/JSON

– RDF/Turtle

– Simple XML

– NewsML-G2

AP Taxonomy

– Taxonomy Output Format

– RDF/XML

– RDF/Turtle

– RDF/JSON

– NewsML-G2

– Taxonomy Change Log Output formats

– XML

– CSV
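To illustrate how a subscriber might consume a CSV change log, here is a rough Python sketch that applies new terms, name changes and deprecations to a local copy of term labels. The slides do not show the change log layout, so the column names (action, id, old_label, new_label), the action values and the sample rows are all assumptions.

```python
import csv
import io

# Hypothetical change log. Column names, action values and rows are
# assumptions; the actual AP change log layout is not shown in the slides.
CHANGE_LOG_CSV = """action,id,old_label,new_label
added,http://example.org/id/1,,New Term
renamed,http://example.org/id/2,Old Name,New Name
deprecated,http://example.org/id/3,Retired Term,
"""


def apply_changes(labels, change_log_text):
    """Return a copy of `labels` (id -> label) with the change log applied."""
    updated = dict(labels)
    for row in csv.DictReader(io.StringIO(change_log_text)):
        if row["action"] in ("added", "renamed"):
            updated[row["id"]] = row["new_label"]
        elif row["action"] == "deprecated":
            updated.pop(row["id"], None)  # ignore terms we never stored
    return updated


local_labels = {
    "http://example.org/id/2": "Old Name",
    "http://example.org/id/3": "Retired Term",
}
current_labels = apply_changes(local_labels, CHANGE_LOG_CSV)
```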

Page 12: AP Metadata Services, SemTechBiz 2012

Metadata Services in AP’s Content Lifecycle

Metadata Services

• Taxonomy fed to editorial tools

• Automated tagging applies subject and entity metadata from taxonomy

• Rich relationships between subjects, entities

• Metadata used to deliver targeted feeds, auto-publish and improve search and browse experience

Diagram labels:

• Inputs: AP editorial content and 3rd-party content

• AP Tagging Service applies standard values and related data

• Standard AP News Taxonomy values

• Content Repository

• Products defined based on rich metadata

• Distribution methods: Internet syndication, Web portals, APIs
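The idea of delivering targeted feeds from standard metadata can be sketched as a simple lookup from tag IDs to feeds. This is an illustration only, not AP's implementation: the feed names, the item structure and the route function are hypothetical, and the two tag IDs are borrowed from the Simple XML tagging sample later in this deck.

```python
from collections import defaultdict

# Tag IDs borrowed from the tagging sample later in the deck.
MIAMI_HEAT = "http://cv.ap.org/id/8a85be975bf94cd18836b6eb5a1f6391"
NBA_FINALS = "http://cv.ap.org/id/fd862c8beea14e189c9a5617cf5c379c"

# Hypothetical feed definitions: feed name -> tag IDs that qualify an item.
FEEDS = {
    "miami-heat": {MIAMI_HEAT},
    "nba-finals": {NBA_FINALS},
}


def route(items, feeds=FEEDS):
    """Map each feed name to the IDs of items carrying at least one of its tags."""
    routed = defaultdict(list)
    for item in items:
        for feed, wanted in feeds.items():
            if wanted & set(item["tags"]):
                routed[feed].append(item["id"])
    return dict(routed)


# Hypothetical tagged items as they might sit in a content repository.
items = [
    {"id": "story-1", "tags": [MIAMI_HEAT, NBA_FINALS]},
    {"id": "story-2", "tags": [NBA_FINALS]},
    {"id": "story-3", "tags": []},
]
feeds_out = route(items)
```

Because the tags are stable URIs rather than free text, a feed definition stays valid when a term's display name changes.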

Page 13: AP Metadata Services, SemTechBiz 2012

<skos:Concept rdf:about="http://cv.ap.org/id/11AD96CF0A5149C5B3909F5BE9A5494A">
  <skos:prefLabel xml:lang="en">Scott Walker</skos:prefLabel>
  <ap:associatedState rdf:resource="http://cv.ap.org/id/1BC1BC3082C81004896CDF092526B43E" />
  <ap:entryTerm xml:lang="en">Scott K. Walker</ap:entryTerm>
  <ap:entryTerm xml:lang="en">Scott Kevin Walker</ap:entryTerm>
  <ap:isPlaceholder rdf:datatype="http://www.w3.org/2001/XMLSchema#boolean">false</ap:isPlaceholder>
  <dbpedia-owl:party rdf:resource="http://cv.ap.org/id/BF6E2E80760D10048F8AE6E7A0F4673E" />
  <dbprop:birthdate rdf:datatype="http://www.w3.org/2001/XMLSchema#date">1967-11-02</dbprop:birthdate>
  <dcterms:created rdf:datatype="http://www.w3.org/2001/XMLSchema#dateTime">2009-11-01T10:23:29-05:00</dcterms:created>
  <dcterms:modified rdf:datatype="http://www.w3.org/2001/XMLSchema#dateTime">2012-02-26T10:14:13-05:00</dcterms:modified>
  <rdf:type rdf:resource="http://cv.ap.org/c/Politician" />
  <skos:altLabel xml:lang="en">Scott K. Walker</skos:altLabel>
  <skos:altLabel xml:lang="en">Scott Kevin Walker</skos:altLabel>
  <skos:broader rdf:resource="http://cv.ap.org/id/C9D7FA107E4E1004847ADF092526B43E" />
  <skos:definition xml:lang="en">45th Governor of Wisconsin. Milwaukee, Wisconsin County Executive. US Republican member of the Wisconsin State Assembly.</skos:definition>
  <skos:inScheme rdf:resource="http://cv.ap.org/a#person" />
</skos:Concept>

RDF/XML representation of Scott Walker, Governor of Wisconsin
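As a rough illustration of consuming a record like the one above, the Python sketch below reads the SKOS labels and the broader link with the standard library XML parser. It uses a subset of the record, wrapped in an rdf:RDF element with the standard rdf and skos namespace declarations, which the slide omits. This is XML-level reading rather than full RDF parsing, and the read_concept helper is hypothetical.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
SKOS = "http://www.w3.org/2004/02/skos/core#"

# Subset of the slide's record. The rdf:RDF wrapper and xmlns declarations
# are added here so the prefixes resolve; the slide shows only the fragment.
DOC = f"""<rdf:RDF xmlns:rdf="{RDF}" xmlns:skos="{SKOS}">
  <skos:Concept rdf:about="http://cv.ap.org/id/11AD96CF0A5149C5B3909F5BE9A5494A">
    <skos:prefLabel xml:lang="en">Scott Walker</skos:prefLabel>
    <skos:altLabel xml:lang="en">Scott K. Walker</skos:altLabel>
    <skos:altLabel xml:lang="en">Scott Kevin Walker</skos:altLabel>
    <skos:broader rdf:resource="http://cv.ap.org/id/C9D7FA107E4E1004847ADF092526B43E" />
    <skos:inScheme rdf:resource="http://cv.ap.org/a#person" />
  </skos:Concept>
</rdf:RDF>"""


def read_concept(xml_text):
    """Extract the URI, labels and broader links of the first skos:Concept."""
    root = ET.fromstring(xml_text)
    concept = root.find(f"{{{SKOS}}}Concept")
    return {
        "uri": concept.get(f"{{{RDF}}}about"),
        "prefLabel": concept.findtext(f"{{{SKOS}}}prefLabel"),
        "altLabels": [e.text for e in concept.findall(f"{{{SKOS}}}altLabel")],
        "broader": [
            e.get(f"{{{RDF}}}resource")
            for e in concept.findall(f"{{{SKOS}}}broader")
        ],
    }


concept = read_concept(DOC)
```

The alternate labels are what a search index would typically add as synonyms for the preferred label.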

Page 14: AP Metadata Services, SemTechBiz 2012

<ClassificationResults>
  <DocumentId>C495D353258440B487279767F9A16D02</DocumentId>
  <DocumentDate>2012-06-06T15:59:46-05:00</DocumentDate>
  <Entities>
    <Entity>
      <Authority>AP Person</Authority>
      <AuthorityVersion>3420</AuthorityVersion>
      <Name>LeBron James</Name>
      <Id>http://cv.ap.org/id/7c05129d1a1741af8bcc326c9459545c</Id>
      <Properties>
        <PersonType>Professional Athlete</PersonType>
        <PersonType>Sports Figure</PersonType>
        <Team>Miami Heat</Team>
      </Properties>
    </Entity>

Subset of tags returned for article about NBA Finals game, in Simple XML format

Page 15: AP Metadata Services, SemTechBiz 2012

    <Entity>
      <Authority>AP Organization</Authority>
      <AuthorityVersion>3412</AuthorityVersion>
      <Name>Miami Heat</Name>
      <Id>http://cv.ap.org/id/8a85be975bf94cd18836b6eb5a1f6391</Id>
    </Entity>
    <Entity>
      <Authority>AP Organization</Authority>
      <AuthorityVersion>3412</AuthorityVersion>
      <Name>NBA Eastern Conference</Name>
      <Id>http://cv.ap.org/id/4a653a1806bc49518c5e667120a283e3</Id>
    </Entity>
  </Entities>

Subset of tags returned for article about NBA Finals game, in Simple XML format, cont.

Page 16: AP Metadata Services, SemTechBiz 2012

  <Subjects>
    <Subject>
      <Authority>AP Subject</Authority>
      <AuthorityVersion>3415</AuthorityVersion>
      <Name>NBA basketball</Name>
      <Id>http://cv.ap.org/id/6c01c3e08c8010048288a13d9888b73e</Id>
    </Subject>
    <Subject>
      <Authority>AP Subject</Authority>
      <AuthorityVersion>3415</AuthorityVersion>
      <Name>NBA Finals</Name>
      <Id>http://cv.ap.org/id/fd862c8beea14e189c9a5617cf5c379c</Id>
    </Subject>

Subset of tags returned for article about NBA Finals game, in Simple XML format, cont.
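A client receiving the Simple XML output might flatten it into a list of tags before storing or indexing it. The sketch below shows one way to do that in Python with the standard library. The response is abridged from the slides, the closing Subjects and ClassificationResults tags are assumed because the slides show only a subset, and the extract_tags helper is hypothetical.

```python
import xml.etree.ElementTree as ET

# Abridged from the slides. The closing </Subjects> and
# </ClassificationResults> tags are assumed; the slides show only a subset.
RESPONSE = """<ClassificationResults>
  <DocumentId>C495D353258440B487279767F9A16D02</DocumentId>
  <Entities>
    <Entity>
      <Authority>AP Person</Authority>
      <Name>LeBron James</Name>
      <Id>http://cv.ap.org/id/7c05129d1a1741af8bcc326c9459545c</Id>
    </Entity>
    <Entity>
      <Authority>AP Organization</Authority>
      <Name>Miami Heat</Name>
      <Id>http://cv.ap.org/id/8a85be975bf94cd18836b6eb5a1f6391</Id>
    </Entity>
  </Entities>
  <Subjects>
    <Subject>
      <Authority>AP Subject</Authority>
      <Name>NBA Finals</Name>
      <Id>http://cv.ap.org/id/fd862c8beea14e189c9a5617cf5c379c</Id>
    </Subject>
  </Subjects>
</ClassificationResults>"""


def extract_tags(xml_text):
    """Return (authority, name, id) for each Entity and Subject, in document order."""
    root = ET.fromstring(xml_text)
    tags = []
    for node in root.iter():
        if node.tag in ("Entity", "Subject"):
            tags.append(
                (node.findtext("Authority"), node.findtext("Name"), node.findtext("Id"))
            )
    return tags


tags = extract_tags(RESPONSE)
```

Keeping the Id alongside the Name lets downstream systems join on the stable URI while still displaying a readable label.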