embedding knowledge in html - inspiring innovation

Embedding Knowledge in HTML. Some content from presentations by Ivan Herman of the W3C


Post on 24-Mar-2022


Page 1: Embedding Knowledge in HTML - Inspiring Innovation

Embedding Knowledge in HTML

Some content from presentations by Ivan Herman of the W3C

Page 2: Embedding Knowledge in HTML - Inspiring Innovation

Overview

• Why we want to embed structured data in HTML
• RDFa
• Microdata and schema.org
• RDFa Lite as an encoding for Microdata
• JSON-LD as an encoding for RDF and Microdata
• Use cases and examples

Page 3: Embedding Knowledge in HTML - Inspiring Innovation

HTML is Everywhere

• We usually think of HTML as the language of Web pages
• But it's also widely used on/for mobile devices and tablets
  – It readily adapts to different screen sizes/orientations
• And it is the basis of many ebook formats
  – E.g., Kindle's formats, mobi, epub
• How can we add knowledge to HTML pages?

Page 4: Embedding Knowledge in HTML - Inspiring Innovation

Adding RDF-like data to HTML

• We'd like to add semi-structured knowledge to a conventional HTML document
  – Humans see and understand regular HTML content (text, images, videos, audio)
  – Machines see and understand data markup in XML, RDF or some other format
• Possibilities include
  – Add a link to a separate document with knowledge
  – Embed knowledge as comments, JavaScript, etc.
  – Distribute knowledge markup throughout HTML as attributes of existing HTML tags

Page 5: Embedding Knowledge in HTML - Inspiring Innovation

One page, not two

• Content providers prefer not to generate multiple pages, one for humans (HTML) and another for machines (RDF)
  – RDF serializations are complex
  – Requires separate storage, generation, etc. mechanisms
  – Introduces redundancy, which can lead to errors if we change one page but not the other
• Simplifies the job of search engines as well

Page 6: Embedding Knowledge in HTML - Inspiring Innovation

General approach

• Provide or reuse tag attributes to encode the metadata
  – Browsers & apps ignore attributes they don't understand
• Three approaches have been developed
  – Microformats (~2005)
  – RDFa (~2007)
  – Microdata (aka schema.org) (~2012)
• Status 2014/5 (IMHO)
  – Microformats used but future is limited
  – RDFa becoming the encoding of choice
  – Schema.org vocabularies getting large uptake

Page 7: Embedding Knowledge in HTML - Inspiring Innovation

• Earliest idea, supplanted by RDFa and Microdata
• Reuses HTML attributes like @class, @title
• Separate vocabularies developed for common use cases, e.g., address, CV, recipes…
• Difficult to mix microformats (no concept of namespaces)
• Doesn't define an RDF representation; it is possible to transform via, e.g., XSLT + GRDDL, but transformations are vocabulary dependent

Page 8: Embedding Knowledge in HTML - Inspiring Innovation

• vCard: popular format for "business card" data
• Example use case for email
  – Sender attaches vCard to email message
  – Recipient detaches it to a contact app
• hCard is a Microformat based on vCard
  – Provides a way to embed vCard data in a web page

Page 9: Embedding Knowledge in HTML - Inspiring Innovation


BEGIN:VCARD
VERSION:4.0
N:Gump;Forrest;;Mr.;
FN:Forrest Gump
ORG:Bubba Gump Shrimp Co.
TITLE:Shrimp Man
PHOTO;MEDIATYPE=image/gif:http://www.example.com/dir_photos/my_photo.gif
TEL;TYPE=work,voice;VALUE=uri:tel:+1-111-555-1212
TEL;TYPE=home,voice;VALUE=uri:tel:+1-404-555-1212
ADR;TYPE=WORK,PREF:;;100 Waters Edge;Baytown;LA;30314;United States of America
LABEL;TYPE=WORK,PREF:100 Waters Edge\nBaytown\, LA 30314\nUnited States of America
ADR;TYPE=HOME:;;42 Plantation St.;Baytown;LA;30314;United States of America
LABEL;TYPE=HOME:42 Plantation St.\nBaytown\, LA 30314\nUnited States of America
EMAIL:[email protected]
REV:20080424T195243Z
END:VCARD
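A minimal sketch, assuming Python and already-unfolded content lines, of how a contact app might split vCard lines like those in the card above into a property name, its parameters and its value. The function name `parse_vcard` is hypothetical, the sample keeps only a few properties from the card, and quoted parameter values containing ":" or ";" are not handled.

```python
# A few content lines copied from the card above (assumption: lines are unfolded).
VCARD = """\
BEGIN:VCARD
VERSION:4.0
FN:Forrest Gump
ORG:Bubba Gump Shrimp Co.
TEL;TYPE=work,voice;VALUE=uri:tel:+1-111-555-1212
END:VCARD
"""


def parse_vcard(text):
    """Split each content line into (NAME, [parameters], value)."""
    entries = []
    for line in text.splitlines():
        if not line.strip():
            continue
        # The first ":" separates "NAME;param;param" from the value.
        head, _, value = line.partition(":")
        name, *params = head.split(";")
        entries.append((name.upper(), params, value))
    return entries


entries = parse_vcard(VCARD)
for name, params, value in entries:
    print(name, params, value)
```

Splitting on the first colon keeps URI values such as `tel:+1-111-555-1212` intact, because the colon inside the value comes after the separator.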

Page 10: Embedding Knowledge in HTML - Inspiring Innovation


<ul class="vcard">
  <li class="fn">Forrest Gump</li>
  <li class="org">Bubba Gump Shrimp Co.</li>
  <li class="tel">1-111-555-1212</li>
  <li><a class="url" href="http://bubbagump.com/">http://bubbagump.com/</a></li>
</ul>
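To illustrate how a machine can read the hCard markup above, here is a small sketch using Python's standard `html.parser`. The class name `HCardParser` and the set `TEXT_PROPS` are hypothetical, only the four properties from the example are recognised, and the sketch assumes a single card per document. It is not a full microformats parser.

```python
from html.parser import HTMLParser

HTML = """
<ul class="vcard">
  <li class="fn">Forrest Gump</li>
  <li class="org">Bubba Gump Shrimp Co.</li>
  <li class="tel">1-111-555-1212</li>
  <li><a class="url" href="http://bubbagump.com/">http://bubbagump.com/</a></li>
</ul>
"""

# Properties whose value is the element's text (assumption: only these three).
TEXT_PROPS = {"fn", "org", "tel"}


class HCardParser(HTMLParser):
    """Collect hCard properties that follow an element with class="vcard"."""

    def __init__(self):
        super().__init__()
        self.in_card = False   # becomes True once the vcard root is seen
        self.current = None    # property whose text is being collected
        self.card = {}

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        classes = (attrs.get("class") or "").split()
        if "vcard" in classes:
            self.in_card = True
            return
        if not self.in_card:
            return
        for name in classes:
            if name == "url" and attrs.get("href"):
                # For links, the value is the href attribute, not the text.
                self.card["url"] = attrs["href"]
            elif name in TEXT_PROPS:
                self.current = name
                self.card[name] = ""

    def handle_endtag(self, tag):
        self.current = None

    def handle_data(self, data):
        if self.current:
            self.card[self.current] += data


parser = HCardParser()
parser.feed(HTML)
parser.close()
card = parser.card
print(card)
```

The extraction relies only on `class` values, which is why mixing microformats is hard: a class name such as `url` carries no namespace telling a parser which vocabulary it belongs to.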

Page 11: Embedding Knowledge in HTML - Inspiring Innovation

Microdata approach

• Defined and supported by Google, Bing, Yahoo and Yandex
• Adds new attributes to HTML5 to express metadata
• Works well for simpler "single-vocabulary" cases, but not well suited for mixing vocabularies or for complex vocabularies
• No notion of datatypes or namespaces
• Defines a generic mapping to RDF
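As a rough illustration of the Microdata attributes (`itemscope`, `itemtype`, `itemprop`), the sketch below reads one flat item with Python's standard `html.parser`. The HTML snippet is a hypothetical reconstruction: its property names and values are borrowed from the review example elsewhere in these slides, but the markup itself is an assumption. The class name `MicrodataParser` is also hypothetical, and nested items are not handled.

```python
from html.parser import HTMLParser

# Hypothetical markup; only the attribute names come from Microdata itself.
HTML = """
<div itemscope itemtype="http://schema.org/Review">
  <h1 itemprop="name">Oscars 2012: The Artist, review</h1>
  <p itemprop="description">The Artist, an utterly beguiling…</p>
  <meta itemprop="ratingValue" content="5">
</div>
"""


class MicrodataParser(HTMLParser):
    """Read the first itemscope element and its itemprop values (flat only)."""

    def __init__(self):
        super().__init__()
        self.item = None      # {"type": ..., "properties": {...}}
        self.current = None   # property whose text is being collected

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if self.item is None:
            if "itemscope" in attrs:
                self.item = {"type": attrs.get("itemtype"), "properties": {}}
            return
        prop = attrs.get("itemprop")
        if not prop:
            return
        if tag == "meta":
            # <meta> carries its value in the content attribute.
            self.item["properties"][prop] = attrs.get("content", "")
        else:
            self.current = prop
            self.item["properties"][prop] = ""

    def handle_endtag(self, tag):
        self.current = None

    def handle_data(self, data):
        if self.current:
            self.item["properties"][self.current] += data


parser = MicrodataParser()
parser.feed(HTML)
parser.close()
item = parser.item
print(item["type"], item["properties"])
```

Property names such as `name` are interpreted relative to the single `itemtype`, which matches the "single-vocabulary" design described above.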

Page 12: Embedding Knowledge in HTML - Inspiring Innovation

RDFa approach

• Adds new (X)HTML/XML attributes
• Has namespaces and URIs at its core
  – So mixing vocabularies is easy, as in RDF
• Complete flexibility for using literals or URI resources
• Is a complete serialization of RDF
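The following sketch hints at how RDFa attributes turn into triples. It handles only a tiny subset: `vocab`, a subject given by `resource`, and `property` elements whose object is a URI in `href` or `resource`. The HTML is a hypothetical reconstruction (the URIs come from the RDF shown later in these slides, the markup does not), the class name `RDFaSketchParser` is made up, and prefixes, literals, `typeof` and nesting are all left out, so this is far from the full RDFa processing rules.

```python
from html.parser import HTMLParser

# Hypothetical markup using RDFa attributes with the schema.org vocabulary.
HTML = """
<div vocab="http://schema.org/" resource="http://www.ivan-herman.net/foaf#me">
  <a property="alumniOf" href="http://www.elte.hu">Eötvös Loránd University of Budapest</a>
  <a property="worksFor" href="http://www.w3.org/W3C#data">World Wide Web Consortium (W3C)</a>
</div>
"""


class RDFaSketchParser(HTMLParser):
    """Emit (subject, predicate, object) triples for URI-valued properties."""

    def __init__(self):
        super().__init__()
        self.vocab = ""       # default vocabulary URI
        self.subject = None   # current subject URI
        self.triples = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if "vocab" in attrs:
            self.vocab = attrs["vocab"] or ""
        prop = attrs.get("property")
        if prop is None:
            # Without a property, a resource attribute sets a new subject.
            if "resource" in attrs:
                self.subject = attrs["resource"]
            return
        target = attrs.get("resource") or attrs.get("href")
        if self.subject and target:
            # The term is resolved against the vocabulary to form a full URI.
            self.triples.append((self.subject, self.vocab + prop, target))


parser = RDFaSketchParser()
parser.feed(HTML)
parser.close()
triples = parser.triples
for triple in triples:
    print(triple)
```

Because every predicate ends up as a full URI, terms from different vocabularies can sit side by side without clashing, which is the point of the namespace bullet above.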

Page 13: Embedding Knowledge in HTML - Inspiring Innovation
Page 14: Embedding Knowledge in HTML - Inspiring Innovation
Page 15: Embedding Knowledge in HTML - Inspiring Innovation
Page 16: Embedding Knowledge in HTML - Inspiring Innovation

Yielding this RDF

<http://www.ivan-herman.net/foaf#me>
    schema:alumniOf <http://www.elte.hu> ;
    foaf:schoolHomePage <http://www.elte.hu> ;
    schema:worksFor <http://www.w3.org/W3C#data> ;
    …
<http://www.elte.hu>
    dc:title "Eötvös Loránd University of Budapest" .
…
<http://www.w3.org/W3C#data>
    dc:title "World Wide Web Consortium (W3C)" …

Page 17: Embedding Knowledge in HTML - Inspiring Innovation
Page 18: Embedding Knowledge in HTML - Inspiring Innovation
Page 19: Embedding Knowledge in HTML - Inspiring Innovation
Page 20: Embedding Knowledge in HTML - Inspiring Innovation

Yielding this RDF

[ rdf:type schema:Review ;
  schema:name "Oscars 2012: The Artist, review" ;
  schema:description "The Artist, an utterly beguiling…" ;
  schema:ratingValue "5" ;
  …
]

Page 21: Embedding Knowledge in HTML - Inspiring Innovation

Rich Snippets

• Search engines add text under results to preview what's on the page and why it's relevant
• Text is often extracted from structured data embedded on the page
• See http://bit.ly/RichSN for more information

Page 23: Embedding Knowledge in HTML - Inspiring Innovation

RDFa and Microdata: similarities

• RDFa and Microdata are modern options
• Both have similar approaches
  – Structured data encoded in HTML attributes only – no new elements
  – Define some special attributes, e.g., itemscope for Microdata, resource for RDFa
  – Reuse some HTML core attributes (e.g., href)
  – Use textual content of HTML source, if needed
• RDF data can be extracted from both

Page 24: Embedding Knowledge in HTML - Inspiring Innovation

RDFa and Microdata: differences

• Microdata optimized for simpler use cases:
  – One vocabulary at a time
  – Tree shaped data
  – No datatypes
• RDFa provides full serialization of RDF in XML or HTML
  – Price is extra complexity over Microdata
• RDFa 1.1 Lite is a simplified authoring profile of RDFa, very similar to Microdata

Page 25: Embedding Knowledge in HTML - Inspiring Innovation

Amount of structured data on Web?

• Web Data Commons project uses Common Crawl data to estimate amount of structured data on Web
• Looked for Microdata, RDFa, other formats (e.g., hCalendar, hCard) in URLs parsable as HTML
• November 2015 crawl found
  – 541M pages out of 1.77B (30%) with structured data in 2.7M domains of 14.4M (19%)
  – 24.2B triples about 6.1B entities
• Data can be downloaded
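The shares quoted for the November 2015 crawl can be rechecked with a few lines of Python. The variable names are illustrative, and the inputs are the rounded figures from the slide, so the recomputed page share lands slightly above the quoted 30%, presumably because of that rounding.

```python
# Rounded figures as quoted on the slide (November 2015 crawl).
pages_with_data, pages_total = 541_000_000, 1_770_000_000
domains_with_data, domains_total = 2_700_000, 14_400_000
triples, entities = 24_200_000_000, 6_100_000_000

page_share = pages_with_data / pages_total          # fraction of pages with structured data
domain_share = domains_with_data / domains_total    # fraction of domains with structured data
triples_per_entity = triples / entities             # average triples describing one entity

print(f"pages:   {page_share:.1%}")
print(f"domains: {domain_share:.1%}")
print(f"triples per entity: {triples_per_entity:.2f}")
```

On these figures each entity is described by roughly four triples on average, which suggests that most embedded descriptions are small.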

Page 26: Embedding Knowledge in HTML - Inspiring Innovation

Amount of structured data on Web?

Page 27: Embedding Knowledge in HTML - Inspiring Innovation

Conclusions

• The amount of structured data on the web is growing steadily
• Microdata shows the strongest growth
• RDFa also common
• Microformat data is probably not growing as much