A Web-scale Study of the Adoption and Evolution of the schema.org Vocabulary over Time
Robert Meusel, Christian Bizer and Heiko Paulheim
A Web-scale Study of the Adoption and Evolution of the schema.org Vocabulary over Time - WIMS 2015
Motivation - LOD cloud with 1,000 data providers
Motivation - schema.org Microdata with 700k data providers
Microdata in a Nutshell
- Adding structured information to web pages
• By marking up content and entities
- Arbitrary vocabularies are possible
• In practice, only schema.org is deployed on a large scale
• Plus its historical predecessor: data-vocabulary.org
- Similar to RDFa
<div itemscope itemtype="http://schema.org/PostalAddress"> <span itemprop="name">Data and Web Science Group</span> <span itemprop="addressLocality">Mannheim</span>, <span itemprop="postalCode">68131</span> <span itemprop="addressCountry">Germany</span></div>
Schema.org in a Nutshell
- Vocabulary for marking up entities on web pages
• 675 classes and 965 properties (as of May 2015, release 2.0)
- Promoted and consumed by major search engine companies
• Google, Bing, Yahoo!, and Yandex
• Google Rich Snippets
- Community-driven evolution and development
Schema.org in a Nutshell – Coverage
- Schema.org has incorporated several popular vocabularies, such as:
• GoodRelations (2012)
• W3C BibExtend (2014)
• MusicBrainz vocabulary (2015)
• Automotive Ontology (2015)
Microdata with Schema.org in HTML Pages
<html>…<body>…<div id="main-section" class="performance left" data-sku="M17242_580">
<h1> Predator Instinct FG Fußballschuh </h1><div>
<meta content="EUR"><span data-sale-price="219.95">219,95</span>…</body></html>
HTML pages can directly embed markup that annotates items using different vocabularies
<html>…<body>…<div id="main-section" class="performance left" data-sku="M17242_580" itemscope itemtype="http://schema.org/Product"><h1 itemprop="name"> Predator Instinct FG Fußballschuh </h1><div itemscope itemtype="http://schema.org/Offer" itemprop="offers"><meta itemprop="priceCurrency" content="EUR"><span itemprop="price" data-sale-price="219.95">219,95</span>…</body></html>
_:node1 <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://schema.org/Product> .
_:node1 <http://schema.org/Product/name> "Predator Instinct FG Fußballschuh"@de .
_:node1 <http://schema.org/Product/offers> _:node2 .
_:node2 <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://schema.org/Offer> .
_:node2 <http://schema.org/Offer/price> "219,95"@de .
_:node2 <http://schema.org/Offer/priceCurrency> "EUR" .
…
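The following minimal Python sketch makes the mapping from annotated HTML to triples concrete. It uses only the standard library's `html.parser`. It is not the extractor used for the WDC corpus. `MicrodataToTriples` and `extract_triples` are hypothetical names, and the simplifications are assumptions:
- Property URIs are formed as itemtype + "/" + itemprop, mirroring the triples listed above.
- Values come from a `content` attribute, or otherwise from the element text.
- `itemref`, `itemid`, URL-valued properties, and malformed HTML are ignored.

```python
from html.parser import HTMLParser

RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
# Elements without an end tag; they cannot hold text content.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}


class MicrodataToTriples(HTMLParser):
    """Hypothetical, simplified Microdata extractor (assumes well-formed HTML)."""

    def __init__(self):
        super().__init__()
        self.triples = []   # (subject, predicate, object) tuples
        self.items = []     # stack of (blank node, itemtype) for open itemscopes
        self.frames = []    # one frame per open non-void element
        self.counter = 0

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        prop = a.get("itemprop")
        parent = self.items[-1] if self.items else None
        frame = {"item": False, "prop": None, "text": []}
        if "itemscope" in a:
            # Every itemscope becomes its own blank node.
            self.counter += 1
            node = f"_:node{self.counter}"
            itemtype = a.get("itemtype", "")
            self.triples.append((node, RDF_TYPE, itemtype))
            if prop and parent:
                # Nested item: link it to the enclosing item.
                self.triples.append((parent[0], f"{parent[1]}/{prop}", node))
            self.items.append((node, itemtype))
            frame["item"] = True
        elif prop and parent:
            if "content" in a or tag in VOID_TAGS:
                # e.g. <meta itemprop="priceCurrency" content="EUR">
                self.triples.append((parent[0], f"{parent[1]}/{prop}", a.get("content") or ""))
            else:
                # The value is the element text, emitted when the element closes.
                frame["prop"] = (parent, prop)
        if tag in VOID_TAGS:
            if frame["item"]:
                self.items.pop()  # a void itemscope element cannot contain properties
        else:
            self.frames.append(frame)

    def handle_data(self, data):
        # Text belongs to every enclosing element that is waiting for a value.
        for frame in self.frames:
            if frame["prop"]:
                frame["text"].append(data)

    def handle_endtag(self, tag):
        if tag in VOID_TAGS or not self.frames:
            return
        frame = self.frames.pop()
        if frame["prop"]:
            (node, itemtype), prop = frame["prop"]
            value = " ".join("".join(frame["text"]).split())  # collapse whitespace
            self.triples.append((node, f"{itemtype}/{prop}", value))
        if frame["item"]:
            self.items.pop()


def extract_triples(html):
    parser = MicrodataToTriples()
    parser.feed(html)
    parser.close()
    return parser.triples


example = ('<div itemscope itemtype="http://schema.org/Product">'
           '<h1 itemprop="name"> Predator Instinct FG Fußballschuh </h1>'
           '<div itemscope itemtype="http://schema.org/Offer" itemprop="offers">'
           '<meta itemprop="priceCurrency" content="EUR">'
           '<span itemprop="price">219,95</span></div></div>')

for triple in extract_triples(example):
    print(*triple)
```

The nested Offer receives its own blank node and is linked to the Product through the `offers` property. This follows from the Microdata nesting of `itemscope` elements.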
Wrap-Up
- Semantic annotations are used by more and more websites
- Entities on websites become machine-readable and machine-understandable
- schema.org together with Microdata is a success story
• Promoted by search engine companies
• Deployed by over 17% of all websites [1] (over 700k data providers)
- Usage is more compliant with the schema than, e.g., LOD [2]
[1] http://webdatacommons.org/structureddata/2014-12/stats/stats.html
[2] Meusel and Paulheim, ESWC 2015
Digging for Reasons
- So Microdata is deployed more often and is often more schema-compliant, even though it has millions of uncontrolled providers with different skill sets
- But why? Some hypotheses:
• Availability of documentation
• Tool support
• Business incentive
• Schema flexibility
- Can we confirm/reject those from looking at the data?
A Diachronic Perspective
- Versions of schema.org are archived over time
• Plus: there are several crawl releases per year
• i.e., we can look at change over time
- If we look at both the schema and the deployed data, we may observe
• Adoption rates of schema changes
• Data-first changes to the schema
• Convergence or divergence of deployed data
A Diachronic Perspective
- Three releases of the WDC Microdata corpus [1]
• 2012, 2013, and 2014
- Versions of schema.org that were valid
• At the beginning of the crawl
• At the end of the crawl
[1] http://webdatacommons.org/structureddata
Top-down Adoption
- How fast are changes in the schema adopted?
• New classes/properties
• Deprecations
• Domain/range changes
- Measuring adoption: challenges
• Different crawls
• Overall growth of deployed schema.org
- Measure: normalized usage increase (nui) from crawl i to crawl j
• nui(s) > 1.05: usage of schema element s has increased significantly
• nui(s) < 0.95: usage of schema element s has decreased significantly
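The slide does not give the formula behind nui. One plausible reading is the ratio of element s's share of schema.org-deploying PLDs in crawl j to its share in crawl i, which factors out the overall growth of deployed schema.org. This is assumed here and not taken from the paper. The function names and all counts are hypothetical; only the 1.05/0.95 thresholds come from the slide.

```python
def nui(plds_with_s_i, plds_total_i, plds_with_s_j, plds_total_j):
    """Hypothetical normalized usage increase of element s from crawl i to crawl j.

    Assumed definition: the share of PLDs using s in crawl j divided by its
    share in crawl i, so that growth of the whole corpus cancels out.
    """
    if min(plds_with_s_i, plds_total_i, plds_total_j) <= 0:
        # Elements unused in crawl i (e.g. brand-new terms) have no defined ratio.
        raise ValueError("s must be used in crawl i and both crawls must be non-empty")
    return (plds_with_s_j / plds_total_j) / (plds_with_s_i / plds_total_i)


def usage_trend(value, upper=1.05, lower=0.95):
    """Apply the significance thresholds from the slide."""
    if value > upper:
        return "increased"
    if value < lower:
        return "decreased"
    return "stable"


# Illustrative, made-up counts: the corpus doubles between the two crawls.
grows_with_corpus = nui(1_000, 100_000, 3_000, 200_000)   # share 1% -> 1.5%
lags_behind = nui(10_000, 100_000, 15_000, 200_000)       # share 10% -> 7.5%
print(usage_trend(grows_with_corpus), usage_trend(lags_behind))  # increased decreased
```

In the second case, absolute usage still grows by 50%, yet the value falls below 0.95 because the corpus grows faster. Under this reading, that is how classes can keep gaining providers while still showing a low nui.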
Top-down Adoption
- Adoption of new classes and properties
• Almost half of all introduced classes are never used!
• Similar for new properties
- Reasons
• Bulk addition of vocabularies: not every term is equally needed, e.g., the medical vocabulary
• Blind spot of our approach: some terms are mainly used for e-mail markup, e.g., Actions
SURPRISE!
Top-down Adoption
- Main domains of positive adoption
• Metadata for web content (schema.org/WebSite has the highest nui)
• Broadcasting (e.g., TV episodes)
• Questions & Answers
• Postal addresses
- Classes featured in Google Rich Snippets
• Still growing at a high level (tens of thousands of data providers)
• But nui(s) < 0.95
Yellow Pages
Search Engine Listings
Collaboration with BBC and EBU
Influence of CMS adoption
Q&A pages, such as Stack Overflow
Top-down Adoption
- Adoption of domain/range changes
• Again: rather low overall adoption
- Adopted well for
• Products (height, width, itemCondition, …)
• Broadcasting domain (episode, actor, ...)
Search Engine Listings
Collaboration with BBC and EBU
Top-down Adoption
- Adoption of deprecations
• Works well (29 out of 32 deprecated elements have a significantly low nui)
- Exceptions
• s:map (← s:hasMap)
• s:maps (← s:hasMap)
For Google Maps (lots of outdated tutorials)
Bottom-up Evolution
- Martin Luther
• Started the Protestant church
• A success story, too (like schema.org)
• (800 million adopters worldwide)
- Famous quote:
• “Man muss […] dem gemeinen Mann aufs Maul schauen”
• (roughly: “You have to listen to the way the common man really speaks.”)
Martin Luther, 1483-1546
Disclaimer: I do not speak for the Protestant church.
Bottom-up Evolution
- Are new features in the schema first used “unofficially”?
• New classes/properties
• Domain/range changes
- Instrument for measurement: ROC curves
• True positives plotted against false positives
• tp: elements used before
• fp: elements not used before
• Ranking by number of PLDs (pay-level domains)
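The ROC analysis can be sketched as follows. The slide does not specify the exact labeling. As an assumption, each candidate element carries the number of PLDs using it and a flag saying whether it counts as a true positive. The names `roc_points` and `auc` and the example data are hypothetical. An area under the curve near 0.5 would mean PLD counts carry no signal. Values above 0.5 would mean widely used elements are more likely to be positives.

```python
def roc_points(candidates):
    """candidates: list of (element, n_plds, is_positive).

    Ranks elements by PLD count (descending) and returns the ROC curve as
    (false positive rate, true positive rate) points, one step per element.
    Ties keep their input order, because sorted() is stable.
    """
    ranked = sorted(candidates, key=lambda c: c[1], reverse=True)
    positives = sum(1 for _, _, pos in ranked if pos)
    negatives = len(ranked) - positives
    if positives == 0 or negatives == 0:
        raise ValueError("need at least one positive and one negative element")
    tp = fp = 0
    points = [(0.0, 0.0)]
    for _, _, pos in ranked:
        if pos:
            tp += 1
        else:
            fp += 1
        points.append((fp / negatives, tp / positives))
    return points


def auc(points):
    """Area under the ROC curve via the trapezoidal rule."""
    return sum((x2 - x1) * (y1 + y2) / 2
               for (x1, y1), (x2, y2) in zip(points, points[1:]))


# Hypothetical elements with made-up PLD counts.
example = [("termA", 500, True), ("termB", 300, False),
           ("termC", 200, True), ("termD", 10, False)]
curve = roc_points(example)
print(curve)
print(auc(curve))  # 0.75
```

Tied PLD counts are stepped through one element at a time rather than as a diagonal segment. This is a simplification that only matters when many elements share the same count.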
Bottom-up Evolution
- Some mild influences are observable
• Stronger for domain/range changes
• especially range changes
• Weaker for new classes/properties
Figure: ROC curves for classes, properties, domains, and ranges, each for 2012→2013, 2013→2014, and 2012→2014
Bottom-up Evolution
- Extension mechanism
• Allows for user-defined classes/properties
• These implicitly become subclasses (or subproperties) of the extended term
- Analysis over time
• No measurable impact on the evolution of the standard
• “Unofficial” use is more likely than use of the extension mechanism
s:Product/ElectronicProduct
s:price/reducedPrice
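Extension URIs like s:Product/ElectronicProduct and s:price/reducedPrice can be recognized mechanically. The sketch below is an assumption about one way to do it, not the study's procedure; `split_extension` and the tiny `KNOWN_TERMS` set are hypothetical. It treats the last path segment as an extension of the segment before it, unless that last segment is already a schema.org term. This rule filters out itemtype/itemprop-style property URIs such as http://schema.org/Product/name.

```python
SCHEMA_NS = "http://schema.org/"

# Hypothetical excerpt of the official vocabulary; a real check would load a full release.
KNOWN_TERMS = {"Product", "Offer", "name", "price", "priceCurrency", "offers"}


def split_extension(uri, known_terms=KNOWN_TERMS):
    """Return (extended_term, new_term) for an extension URI, else None.

    The new term implicitly becomes a subclass (or subproperty) of extended_term.
    """
    if not uri.startswith(SCHEMA_NS):
        return None
    segments = [seg for seg in uri[len(SCHEMA_NS):].split("/") if seg]
    if len(segments) < 2:
        return None  # a plain term such as s:Product
    extended, new = segments[-2], segments[-1]
    if new in known_terms:
        return None  # e.g. Product/name: itemtype + itemprop, not an extension
    return extended, new


for uri in ["http://schema.org/Product/ElectronicProduct",
            "http://schema.org/price/reducedPrice",
            "http://schema.org/Product/name",
            "http://schema.org/Product"]:
    print(uri, "->", split_extension(uri))
```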
Overall Convergence
- Measuring convergence
• i.e., homogeneity of the descriptions of a class's instances
• Example: two instances of s:LocalBusiness
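Instance 1:
_:1 rdf:type s:LocalBusiness ; s:address _:2 .
_:2 rdf:type s:PostalAddress ; s:addressLocality "Birmingham" ; s:streetAddress "Main Street 24" .
Instance 2:
_:1 rdf:type s:LocalBusiness ; s:address _:2 .
_:2 rdf:type s:PostalAddress ; s:addressLocality "Liverpool" ; s:streetAddress "Church Street 1" .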
Overall Convergence
- Recap
• RDF extracted from Microdata is a set of trees
• i.e., we can enumerate all paths to leaf nodes (omitting literals)
- Example:
_:1 rdf:type s:LocalBusiness ; s:address _:2 .
_:2 rdf:type s:PostalAddress ; s:addressLocality "Liverpool" ; s:streetAddress "Church Street 1" .
Paths: rdf:type-s:LocalBusiness, s:address-rdf:type-s:PostalAddress, s:address-s:addressLocality, s:address-s:streetAddress
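Enumerating the leaf paths can be sketched as a depth-first walk from the root item. The function name `leaf_paths` is hypothetical, and the sketch makes these assumptions:
- The data is given as (subject, predicate, object) triples with prefixed names.
- Any object that also occurs as a subject is a nested node.
- rdf:type paths keep the class name.
- All other leaves are literals, whose values are dropped.

```python
from collections import defaultdict

RDF_TYPE = "rdf:type"


def leaf_paths(triples, root):
    """Return the sorted list of leaf paths below root (literal values omitted)."""
    edges = defaultdict(list)
    for s, p, o in triples:
        edges[s].append((p, o))

    def walk(node, prefix, visited):
        for p, o in edges.get(node, []):
            if p == RDF_TYPE:
                yield "-".join(prefix + [p, o])                 # keep the class
            elif o in edges and o not in visited:
                yield from walk(o, prefix + [p], visited | {o})  # nested item
            else:
                yield "-".join(prefix + [p])                    # literal leaf: drop the value

    return sorted(walk(root, [], {root}))


liverpool = [
    ("_:1", "rdf:type", "s:LocalBusiness"),
    ("_:1", "s:address", "_:2"),
    ("_:2", "rdf:type", "s:PostalAddress"),
    ("_:2", "s:addressLocality", "Liverpool"),
    ("_:2", "s:streetAddress", "Church Street 1"),
]
print(leaf_paths(liverpool, "_:1"))
```

For the Liverpool instance, this yields the four paths listed on the slide. The Birmingham instance yields the same four, since only literal values differ.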
Overall Convergence
- Using all paths, we can compute the entropy for each class as
- A low entropy indicates high homogeneity
- We normalize by both the maximum entropy and the total number of paths
• i.e., we use the normalized entropy rate as a measure of homogeneity
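The slide does not reproduce the entropy formula. The following is one possible reading, labeled as an assumption:
- Each instance's description is the set of its leaf paths.
- Entropy is computed over how often each distinct description occurs among a class's instances.
- The result is divided by its maximum, log2 of the number of instances.

The additional normalization by the total number of paths is left out because its exact form is not given. `normalized_description_entropy` is a hypothetical name.

```python
import math
from collections import Counter


def normalized_description_entropy(descriptions):
    """descriptions: one iterable of leaf paths per instance of a class.

    0.0 means every instance uses the same set of paths (fully homogeneous);
    1.0 means every instance uses a different set of paths.
    """
    keys = [frozenset(paths) for paths in descriptions]  # repeated paths count once
    n = len(keys)
    if n <= 1:
        return 0.0
    counts = Counter(keys)
    entropy = sum((c / n) * math.log2(n / c) for c in counts.values())
    return entropy / math.log2(n)  # divide by the maximum possible entropy


address_paths = ["rdf:type-s:LocalBusiness", "s:address-rdf:type-s:PostalAddress",
                 "s:address-s:addressLocality", "s:address-s:streetAddress"]
# Birmingham and Liverpool differ only in literal values, which paths omit.
print(normalized_description_entropy([address_paths, address_paths]))      # 0.0
print(normalized_description_entropy([address_paths, address_paths[:1]]))  # 1.0
```

Because paths omit literals, the two s:LocalBusiness instances from the example count as identical descriptions. This is consistent with low entropy indicating high homogeneity.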
Overall Convergence
- Observations
• Overall entropy decreases over time
- Classes with high convergence rates
• WebSite, Blog, …
• Hotel, Restaurant, …
• Product, Offer, …
• Rating, Review
Influence of CMS adoption
Yellow pages
Google Rich Snippets
...all of the above
Key Adoption Drivers
- Search Engine Optimization
• Website providers want to rank high in Google
• Direct business incentive!
- Tool adoption
• Major CMSs use schema.org
- Standard agility
• schema.org: 25 revisions in the last three years
• cf. FOAF: six revisions in the last eight years
Summary
- Both top-down adoption and bottom-up evolution can be observed
- The homogeneity of deployed schema.org data increases over time
- The empirical, data-driven study yields valuable insights into how and why schema.org became a success story
- The observed key drivers and obstacles can also help to understand and analyze the adoption of other standards, e.g., LOD
- More fine-grained insights might be gained by extending the analyzed corpus to the mailing list archive and the issue tracker
Thank you! Questions? Feedback?
Raw data can be found on the website of WebDataCommons:
http://webdatacommons.org/structureddata/
More interesting datasets and analysis:
http://webdatacommons.org/index.html
Acknowledgement
The extraction and analysis of the datasets were supported by an AWS in Education Grant.