semantic markup using schema.org

Wednesday Nights in the Tetherless World (TWed) April 4th, 2012 Joshua Shinavier

Upload: joshua-shinavier

Post on 11-May-2015


DESCRIPTION

A basic intro to microdata and schema.org, along with a new schema.org extension for datasets and data catalogs. "TWed" talk April 4, 2012.

TRANSCRIPT

Page 1: semantic markup using schema.org

Wednesday Nights in the Tetherless World (TWed) April 4th, 2012

Joshua Shinavier

Page 2: semantic markup using schema.org

• rich snippets

• microformats

• RDFa

• microdata

• microdata syntax

• schema.org

• deployment

• mappings, tools, extensions

• the Dataset extension

Outline

2

Page 4: semantic markup using schema.org

• several solutions for embedding semantic data in Web pages

• three syntaxes known (by Google) as “rich snippets”

- microformats

- RDFa

- HTML microdata

• all three are supported by Google, but microdata is the “recommended” syntax

the three syntaxes

4

Page 5: semantic markup using schema.org

• microformats emerged around 2005

• some key principles

- start by solving simple, specific problems

- design for humans first, machines second

• wide deployment

- used on billions of Web pages

- usage share was at 94% vis-a-vis competing formats (before microdata, anyway)

• formats exist for marking up Atom feeds, calendars, addresses and contact info, geo-location, multimedia, news, products, recipes, reviews, resumes, social relationships, etc.

First came microformats

5

Page 6: semantic markup using schema.org

microformats example

6

<div class="vcard">
  <a class="fn org url" href="http://www.commerce.net/">CommerceNet</a>
  <div class="adr">
    <span class="type">Work</span>:
    <div class="street-address">169 University Avenue</div>
    <span class="locality">Palo Alto</span>,
    <abbr class="region" title="California">CA</abbr>&nbsp;&nbsp;
    <span class="postal-code">94301</span>
    <div class="country-name">USA</div>
  </div>
  <div class="tel">
    <span class="type">Work</span> +1-650-289-4040
  </div>
  <div>Email: <span class="email">[email protected]</span></div>
</div>

Page 7: semantic markup using schema.org

• RDFa aims to bridge the gap between human-oriented HTML and machine-oriented RDF documents

• provides XHTML attributes to indicate machine-understandable information

• uses the RDF data model and Semantic Web vocabularies directly

then came RDFa

7

Page 8: semantic markup using schema.org

RDFa example

8

<div typeof="foaf:Person" xmlns:foaf="http://xmlns.com/foaf/0.1/">
  <p property="foaf:name">
    Alice Birpemswick
  </p>
  <p>
    Email: <a rel="foaf:mbox" href="mailto:[email protected]">[email protected]</a>
  </p>
  <p>
    Phone: <a rel="foaf:phone" href="tel:+1-617-555-7332">+1 617.555.7332</a>
  </p>
</div>

Page 9: semantic markup using schema.org

• microdata syntax is based on nested groups of name-value pairs

• HTML microdata specification includes

- an unambiguous parsing model

- an algorithm to convert microdata to RDF

• compatible with the Semantic Web via mappings

last but not least, microdata

9

Page 11: semantic markup using schema.org

• annotate an item with text-valued properties using the “itemprop” attribute

microdata properties

11

<div itemscope>
  <p>My name is <span itemprop="name">Daniel</span>.</p>
</div>
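To make the name-value idea concrete, here is a minimal Python sketch that pulls text-valued "itemprop" properties out of a single, non-nested item. It is not the parsing model from the microdata specification: the class and function names are hypothetical, it assumes well-formed markup with explicit end tags, and it ignores nested items, "itemref", and URL-valued elements such as <a> and <img>.

```python
from html.parser import HTMLParser

# Elements that never have an end tag; this sketch simply skips them.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img",
             "input", "link", "meta", "source", "track", "wbr"}


class FlatMicrodataParser(HTMLParser):
    """Hypothetical helper: collects text-valued itemprop values of one flat item."""

    def __init__(self):
        super().__init__()
        self.properties = {}   # property name -> list of text values
        self._stack = []       # open elements: [tag, itemprop name or None, text chunks]

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        name = dict(attrs).get("itemprop")
        self._stack.append([tag, name, []])

    def handle_data(self, data):
        # Text belongs to every open element that carries an itemprop.
        for entry in self._stack:
            if entry[1] is not None:
                entry[2].append(data)

    def handle_endtag(self, tag):
        # Ignore void or stray end tags so they cannot empty the stack.
        if tag in VOID_TAGS or all(entry[0] != tag for entry in self._stack):
            return
        while self._stack:
            open_tag, name, chunks = self._stack.pop()
            if name is not None:
                value = " ".join("".join(chunks).split())  # normalize whitespace
                self.properties.setdefault(name, []).append(value)
            if open_tag == tag:
                break


def extract_properties(html):
    """Return {property name: [values]} for the markup in `html`."""
    parser = FlatMicrodataParser()
    parser.feed(html)
    parser.close()
    return parser.properties
```

Calling extract_properties on the snippet above gives a dictionary with a single "name" entry. Values are kept in lists so that a property can occur more than once on the same item.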

Page 12: semantic markup using schema.org

• as in RDF, an item (subject) can have the same property more than once, each time with a different value (object)

multiple values are OK

12

<div itemscope>
  <p>Flavors in my favorite ice cream:</p>
  <ul>
    <li itemprop="flavor">Lemon sorbet</li>
    <li itemprop="flavor">Apricot sorbet</li>
  </ul>
</div>

Page 13: semantic markup using schema.org

• item types are given with the “itemtype” attribute; they correspond to classes in RDF

item types

13

<section itemscope itemtype="http://example.org/animals#cat">
  <h1 itemprop="name">Hedral</h1>
  <p itemprop="desc">Hedral is a male american domestic shorthair, with a fluffy black fur with white paws and belly.</p>
  <img itemprop="img" src="hedral.jpeg" alt="" title="Hedral, age 18 months">
</section>

Page 14: semantic markup using schema.org

• items may be given global identifiers, which are URLs

• they may be, but do not need to be, Semantic Web URIs

global IDs

14

<dl itemscope itemtype="http://vocab.example.net/book" itemid="urn:isbn:0-330-34032-8">
  <dt>Title
  <dd itemprop="title">The Reality Dysfunction
  <dt>Author
  <dd itemprop="author">Peter F. Hamilton
  <dt>Publication date
  <dd><time itemprop="pubdate" datetime="1996-01-26">26 January 1996</time>
</dl>
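Once an item has a type and a global ID, it is a short step to RDF-style triples. The Python sketch below illustrates the idea only; it is not the conversion algorithm from the specification. It assumes the item has already been extracted into a plain dictionary, and it builds predicate URIs by cutting the item type at its last "/" or "#", which is a simplifying assumption. The names item_to_triples and vocabulary_prefix are hypothetical.

```python
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


def vocabulary_prefix(item_type):
    """Assumed rule: keep everything up to the last '/' or '#' of the type URL."""
    cut = max(item_type.rfind("#"), item_type.rfind("/"))
    return item_type[: cut + 1]


def item_to_triples(item):
    """Turn one extracted item into (subject, predicate, object) tuples."""
    subject = item.get("id") or "_:item"   # fall back to a blank-node label
    item_type = item.get("type")
    prefix = vocabulary_prefix(item_type) if item_type else ""
    triples = []
    if item_type:
        triples.append((subject, RDF_TYPE, item_type))
    for name, values in item["properties"].items():
        for value in values:
            triples.append((subject, prefix + name, value))
    return triples


# The book item from the slide above, written out by hand.
book = {
    "id": "urn:isbn:0-330-34032-8",
    "type": "http://vocab.example.net/book",
    "properties": {
        "title": ["The Reality Dysfunction"],
        "author": ["Peter F. Hamilton"],
        "pubdate": ["1996-01-26"],
    },
}

triples = item_to_triples(book)
```

The ISBN URN becomes the subject of every triple, so other documents that use the same "itemid" would be describing the same resource.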

Page 16: semantic markup using schema.org

• schema.org is one of a number of microdata vocabularies

• it is a shared collection of microdata schemas for use by webmasters

• includes a type hierarchy, like an RDFS schema

- starts with top-level Thing and DataType types

- properties are inherited by descendant types

the schema.org vocabulary
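Property inheritance in the type hierarchy can be sketched in a few lines of Python. The tables below are a tiny hand-picked subset of schema.org (Thing, CreativeWork, Book and a few of their properties), not the full vocabulary, and the function name all_properties is hypothetical.

```python
# Child type -> parent type (None marks the root).
PARENT = {
    "Thing": None,
    "CreativeWork": "Thing",
    "Book": "CreativeWork",
}

# Properties declared directly on each type (small illustrative subset).
OWN_PROPERTIES = {
    "Thing": ["name", "description", "url", "image"],
    "CreativeWork": ["author", "publisher"],
    "Book": ["isbn", "numberOfPages"],
}


def all_properties(type_name):
    """Return the type's own properties plus those inherited from its ancestors."""
    chain = []
    current = type_name
    while current is not None:
        chain.append(current)
        current = PARENT[current]
    properties = []
    for ancestor in reversed(chain):   # root first, most specific type last
        properties.extend(OWN_PROPERTIES[ancestor])
    return properties
```

For example, all_properties("Book") lists the Thing properties first, then those of CreativeWork, then the two declared on Book itself.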

16

Page 17: semantic markup using schema.org

Why should you use schema.org?

17

There are several reasons.

Page 18: semantic markup using schema.org

current schema.org types

18

(there are around 300 of them)

Page 19: semantic markup using schema.org

In terms of deployment...

19

...a few key types stand out.

Page 20: semantic markup using schema.org

Top types

20

type             occurrences  relative
Product              5001966  0.27689260175
PostalAddress        1437388  0.07956913403
WebPage              1402426  0.07763375119
Offer                1267545  0.07016717684
Book                 1111463  0.06152698395
Person                968737  0.05362613587
AggregateRating       780967  0.04323179816
GeoCoordinates        546586  0.03025722678
LocalBusiness         544662  0.03015072039
Article               525487  0.02908925463
Place                 490433  0.02714877897
Residence             451652  0.02500198869
ItemPage              421911  0.02335562347
Organization          405876  0.02246797792
Blog                  268582  0.01486782772
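The slide does not define the "relative" column. Assuming it is each type's share of all schema.org item occurrences in the sample, dividing occurrences by the share should give the same total for every row. A small Python check of that assumption, using the first three rows (the function name implied_total is hypothetical):

```python
# (type, occurrences, relative) copied from the first rows of the table.
ROWS = [
    ("Product", 5001966, 0.27689260175),
    ("PostalAddress", 1437388, 0.07956913403),
    ("WebPage", 1402426, 0.07763375119),
]


def implied_total(occurrences, relative):
    """Total number of items implied by one row, if relative = occurrences / total."""
    return occurrences / relative


totals = [implied_total(occurrences, relative) for _, occurrences, relative in ROWS]
```

The rows agree on a total of roughly 18 million occurrences, which supports reading "relative" as a simple fraction of the whole sample.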

Page 21: semantic markup using schema.org

Who’s using it?

21

Over 1,000 domains found (through Sindice)

Page 22: semantic markup using schema.org

Some early adopters

22

domain                     occurrences  relative
www.couponcabin.com               3662  0.04400596
www.digifotopro.nl                2852  0.034272255
www.weg.de                        2336  0.028071525
futpedia.globo.com                2003  0.02406989
www.the-plug.com                  2001  0.024045857
www.virtualtourist.com            1953  0.023469044
gdgt.com                          1857  0.02231542
www.notasdeprensa.es              1564  0.018794463
www.libreriadelsanto.it           1294  0.015549894
liriklaguindonesia.net            1274  0.015309556
www.direct2florist.com            1080  0.012978273
www.bluefountainmedia.com         1065  0.01279802
www.alphabetsigns.com             1059  0.012725918
www.tasit.com                     1004  0.012064988
www.teachstreet.com               1001  0.012028937

Page 23: semantic markup using schema.org

• maintains schema.org ↔ RDF mappings

- there are mappings for BIBO, DBpedia, Dublin Core, FOAF, GoodRelations, SIOC, and WordNet

• also provides examples, tutorials, and data dumps

schema.rdfs.org

23

See: http://schema.rdfs.org/mappings.html

Page 24: semantic markup using schema.org

• Google’s Rich Snippets Testing Tool

• schema.org libraries are available in Java, JavaScript, Perl, PHP, Python, and Ruby

• there are schema.org modules for Drupal, Joomla!, WordPress, and Virtuoso

• online tools include microdata extractors, generators and validators

• sindice.com supports microdata

schema.org tools

24

See: http://schema.rdfs.org/tools.html

Page 25: semantic markup using schema.org

• there are dozens of schema.org community proposals

- they extend existing schema.org vocabulary

• several have already been accepted into schema.org, incl.

- Job Postings

- IPTC/rNews integration

- User Comments

• others: Comics, Learning Resources, TV and Radio, Software Application, etc.

schema.org extensions

25

Page 27: semantic markup using schema.org

motivation: open government data

27

Page 28: semantic markup using schema.org

• DataCatalog

- a collection of datasets

- e.g. the International Open Government Data catalog

• Dataset

- an individual, abstract data set

- e.g. a data set about seismic hazard zones near San Francisco

• DataDownload

- a dataset in downloadable form

- e.g. an RDF/XML dump of the seismic hazard zones data set

the Dataset vocabulary: types

28

Page 29: semantic markup using schema.org

• catalog

- the catalog containing a dataset

• dataset

- a dataset contained in a catalog

• distribution

- a data download for a dataset

• keyword

- the topic of a dataset

• spatial

- the spatial extent of a data set (e.g. United States)

the Dataset vocabulary: properties

29

Page 30: semantic markup using schema.org

• the Dataset extension maps to a subset of the Data Catalog Vocabulary (DCAT)

• many other types and properties are inherited from schema.org

• collectively, they cover

- around 2/3 of DCAT, and

- around half of the Asset Description Metadata Schema (ADMS)

Dataset extension ↔ RDF

30

Page 31: semantic markup using schema.org

Dataset example (microdata)

31

<div itemscope="itemscope" itemid="http://logd.tw.rpi.edu/source/datasf-org/dataset/catalog/datasf.org/version/2011-Jun-07/thing_89" itemtype="http://schema.org/Dataset">
  <a href="http://www.datasf.org/story.php?title=seismic-hazard-zones-"><span itemprop="name"> <b>Seismic Hazard Zones</b> </span></a>
  <div>
    <meta itemprop="url" content="http://www.datasf.org/story.php?title=seismic-hazard-zones-"/>
    <span itemprop="description">The dataset represents the Liquefaction and Landslide Zones [...]</span>
  </div>
  <div><i>Country:</i>
    <a href="http://dbpedia.org/resource/United_States"><span itemprop="spatial" itemscope="itemscope" itemtype="http://schema.org/Country">
      <span itemprop="name">United States</span>
    </span> </a>
  </div>
  <div><i>Publisher:</i>
    <span itemprop="publisher" itemscope="itemscope" itemtype="http://schema.org/Organization">
      <span itemprop="name">Department of Technology</span>
    </span>
  </div>
</div>

Page 32: semantic markup using schema.org

Dataset example (RDFa)

32

<div about="http://logd.tw.rpi.edu/source/datasf-org/dataset/catalog/datasf.org/version/2011-Jun-07/thing_89" typeof="dcat:Dataset">
  <div><b><a href="http://www.datasf.org/story.php?title=seismic-hazard-zones-">
    <span property="dcterms:title">Seismic Hazard Zones</span>
  </a></b></div>
  <div property="dcterms:description">The dataset represents the Liquefaction and Landslide Zones [...]</div>
  <div rel="dcterms:spatial" resource="http://dbpedia.org/resource/United_States"><i>Country:</i>
    <a href="http://dbpedia.org/resource/United_States">
      <span about="http://dbpedia.org/resource/United_States" typeof="adms:Country">
        <span property="dcterms:title">United States</span>
      </span>
    </a>
  </div>
  <div rel="dcterms:publisher"><i>Publisher:</i>
    <span typeof="foaf:Organization">
      <span property="dcterms:title">Department of Technology</span>
    </span>
  </div>
</div>
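The microdata and RDFa versions of this example describe the same dataset with different terms. The Python sketch below records only the correspondences visible in these two examples (it is not the full Dataset-to-DCAT mapping) and applies them to an item that has already been extracted into nested dictionaries. The table names and the translate function are hypothetical.

```python
# Type and property correspondences read off the two examples above.
TYPE_MAP = {
    "http://schema.org/Dataset": "dcat:Dataset",
    "http://schema.org/Country": "adms:Country",
    "http://schema.org/Organization": "foaf:Organization",
}
PROPERTY_MAP = {
    "name": "dcterms:title",
    "description": "dcterms:description",
    "spatial": "dcterms:spatial",
    "publisher": "dcterms:publisher",
}


def translate(item):
    """Rewrite an item's type and property names; unmapped terms pass through unchanged."""
    translated = {"type": TYPE_MAP.get(item["type"], item["type"]), "properties": {}}
    for name, values in item["properties"].items():
        target = PROPERTY_MAP.get(name, name)
        translated["properties"][target] = [
            translate(value) if isinstance(value, dict) else value  # recurse into nested items
            for value in values
        ]
    return translated


# The microdata item from the previous slide, written out by hand.
dataset = {
    "type": "http://schema.org/Dataset",
    "properties": {
        "name": ["Seismic Hazard Zones"],
        "spatial": [{"type": "http://schema.org/Country",
                     "properties": {"name": ["United States"]}}],
        "publisher": [{"type": "http://schema.org/Organization",
                       "properties": {"name": ["Department of Technology"]}}],
    },
}

dcat_view = translate(dataset)
```

Keeping the correspondences in plain lookup tables makes it easy to see which terms are covered and which are passed through untouched.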

Page 33: semantic markup using schema.org

Google extracts this data

33

Item
  Type: http://schema.org/dataset
  name = Seismic Hazard Zones
  url = http://www.datasf.org/story.php?title=seismic-hazard-zones-
  description = The dataset represents the Liquefaction and Landslide Zones [...]
  spatial = Item( 1 )
  publisher = Item( 2 )

Item 1
  Type: http://schema.org/country
  name = United States

Item 2
  Type: http://schema.org/organization
  name = Department of Technology

Page 34: semantic markup using schema.org

• HTML microdata

- http://www.w3.org/TR/microdata

• Schema.RDFS.org

- http://schema.rdfs.org

• W3C Web Schemas group ([email protected])

- http://lists.w3.org/Archives/Public/public-vocabs

• The Dataset proposal

- http://www.w3.org/wiki/WebSchemas/Datasets

• Rich Snippets Testing Tool

- http://google.com/webmasters/tools/richsnippets

Resources

34

Page 35: semantic markup using schema.org

• word clouds by

- http://wordle.net

• deployment statistics discovered using Sindice and Sindice4j

- http://sindice.com

- http://sindice4j.googlecode.com

Credits

35

Page 36: semantic markup using schema.org

• Tetherless World Constellation

• http://tw.rpi.edu

• Contact:

[email protected], @joshsh

Thanks!

36
