the semantic web for librarians and publishers, by michael keller, stanford university


Upload: charleston-conference

Post on 01-Nov-2014


TRANSCRIPT

Page 1: The Semantic Web for Librarians and Publishers, by Michael Keller, Stanford University

Semantic Web for Libraries & Publishers

Charleston Conference

111103

Monday, November 21, 11

so, what’s the problem?

The Problem Set

Silos

More silos

Lots of different silos

Blue silos

Old Silos

We in the library and publishing trades force readers, some of whom are authors as well, to search iteratively for information they want or need or think might exist, in many different silos, using many different search engines, forms, and vocabularies. We do not make it easy for them to discover what is locally available, what is more or less easy to get, or everything that might be available. No wonder the young and foolish depend upon and believe in Google’s searches. Google is quick... and, in terms of relevance, very, very dirty.


We give them better interfaces, ones that permit refinement of results, to our holdings at the title level, BUT...

Simultaneously, we show them many other tools, each excellent in some ways, to continue their exploration of the literature. No single tool is comprehensive. We do not refer our clients to the Web, at least not on our own web sites!

Our OPACs refer to our holdings. Indices and abstracts refer our readers to articles in journals that we may have licensed. SFX and similar tools provide readers with links to the titles revealed, if we have subscribed to them. Neither our OPACs nor the secondary databases refer directly to more than a tiny percentage of the vast collection of pages that is the World Wide Web. The Web, of course, refers in fragmentary fashion to information resources we might, I emphasize, MIGHT have on hand for our readers.

And the results of using other, often very good, discovery tools differ in relevance ranking, format, and options from those we provide for our OPACs, thus adding confusion.

Some of us provide our readers with lots of databases to search. Too many, really, since all but a few of our readers are not forensic-level scholars.

Selecting a licensed database is an art in itself! Once again notice that we rarely offer a web search engine as an option, and for good reasons. Nevertheless, the discoverable relevant information resources on the web apparently are not part of our repertory.

!!!

We have not conspired to make the search for relevant information objects difficult. We just have not yet had the tools, the methods, the vision, and yes, the gumption to try something new.

[Image labels: Ntl Cntr for Biotech Info; NSF CyberInfrastructure; quake engineering simulation; ATLAS at LHC -- 150*10^6 sensors]

Here’s a teensy slice of the information and communication environment in which our faculty and students find themselves. And it gets more complex every day. Alas, the larger the number of websites indexed by Bing or Google or whatever search engine du jour, the more likely it is that the returns will be less pointed and less precisely matched to what the searcher hoped to find.

Too many silos. Here’s the biggest of the lot...

One size fits all???

Does one size fit all?

Not quite. Even Google has silos and uses, as do others, clever interfaces to hide the fact of the silos.

Given all these silos and search engines, our users, our authors, and readers, and teachers, and students, people on the street, our nations...need us to find a better way. Facts about the information objects we have acquired or leased, and facts about the books, articles, films, and so forth that we have published, need to be found in the wild, on the web. Ideally, we, librarians and publishers, will make the facts about what we have and what we are making public, for fun or profit, discoverable on the Web.

Discovery & Access

... the problems

Let’s dwell on the problems briefly...

1. Too many stovepipe systems
2. Too little precision with inadequate recall
3. Too far removed from W3 World Wide Web

1. Too many stovepipe systems

The landscape of discovery & access services is a shambles

It can’t be mapped in any logical way
• not by us (the supposed information pros)
• not by the faculty & students who must navigate the chaos

This state of affairs shouldn’t be a surprise

2. Too little precision with inadequate recall

Some of the problem ... too many stovepipe systems
• dumbing-down effects of federation often hinder explicit searches
• each interface has its own search-refinement tricks
• numerous, overlapping discovery paths hamper full recall

Most of the problem ... limitations in the design & execution of infrastructure that supports discovery & access

the 1st limiting factor ... ambiguity

Most of our metadata uses a string of bytes to label a semantic entity [person, place, thing, event, ...]
• discovery based on matching text labels
• not on the gist of semantic entities

For libraries, the fix is authorities
• authoritative forms of strings (names, organizations, titles, places, events, topics, etc.) work to improve precision and recall

hold on ... what about cases where no one-to-one relationship exists between a string-of-text label & the underlying semantic entity?

Take for example the text string: Jaguar

byte string: 4a 61 67 75 61 72

... a rose is a rose is a rose

company
• Ltd.

cars
• E-Type (UK) or XK-E (US), mftg 1961 to 1974
• XK series, in production since 1996

hardware & software
• Macintosh OS X 10.2
• Atari videogame console

music
• Fender electric guitar, introduced in 1962
• heavy metal band formed in Bristol, England, Dec 1979
• Philadelphia-based singer/songwriter Jaguar Wright

military
• type 140 Jaguar class fast attack craft [torpedo], Germany WWII
• XF10F prototype swing-wing fighter, early 1950s, Grumman
• Anglo-French ground attack aircraft

heroes
• The Jaguar is a superhero published by Archie Comics
• DC Comics' Impact series, ... loosely based on Archie Comics' character

pro football
• Jacksonville

Prrrrr

etc.

Imagine this keyword search and realize the ambiguity of the term “jaguar”

inspired by John Giannandrea, CTO, Metaweb ... from his presentation at PARC in April, 2008
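
The ambiguity described above can be sketched in a few lines of Python. This is only an illustration and not part of the talk: the `example.org` identifiers, the `ENTITIES` table, and both function names are hypothetical, invented here to stand in for real entity identifiers.

```python
# The byte string from the slide decodes to the text label "Jaguar".
label = bytes.fromhex("4a 61 67 75 61 72").decode("ascii")

# Hypothetical identifiers (example.org is a placeholder, not a real service).
# Each identifier names one entity; all of them carry the same text label.
ENTITIES = {
    "http://example.org/entity/jaguar-car-maker": ("Jaguar", "company"),
    "http://example.org/entity/jaguar-e-type": ("Jaguar", "cars"),
    "http://example.org/entity/mac-os-x-10-2": ("Jaguar", "hardware & software"),
    "http://example.org/entity/fender-jaguar": ("Jaguar", "music"),
    "http://example.org/entity/jacksonville-jaguars": ("Jaguar", "pro football"),
}


def search_by_label(text):
    """String matching: return every entity whose label matches the text."""
    return sorted(
        uri for uri, (name, _category) in ENTITIES.items()
        if name.lower() == text.lower()
    )


def lookup_by_identifier(uri):
    """Identifier matching: one identifier names exactly one entity."""
    return ENTITIES[uri]


hits = search_by_label(label)
print(len(hits), "entities share the label", repr(label))
print(lookup_by_identifier("http://example.org/entity/fender-jaguar"))
```

Matching on the label returns every entity that happens to share it, while matching on an identifier returns exactly one; that difference is the point of the slide.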

the 2nd limiting factor ... instance-based metadata

Most of our metadata focuses on publication artifacts
• identify responsibility for its creation
• list topical headings

For simple cases ... few worries
• as with ambiguity, one-to-one relationships pose few problems
• things work for authors with a few books in several editions

But, as complexity increases, precision & recall suffer

Prolific authors ...

search: Shakespeare’s Hamlet (811 entries)

A Socrates (Stanford Libraries OPAC) keyword search for the terms shakespeare and hamlet

Wading thru search results for authors like Shakespeare shows clearly the effects that instance-based metadata has on precision & recall

Unflagging patience marks the task of flipping back & forth between hundreds of brief and full records to sort thru the varied instances of a single entity, e.g.
• critical editions based on primary sources
• 18th & 19th century collections of the plays
• social, historical and literary essays
• histories & critiques of such writings
• video and audio recordings of performances
• reviews and indices of the same
• treatments of stagecraft, costumes, music
• life & works of notables associated with the plays (e.g., performers, directors)
• other art forms inspired by the plays
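
One way to picture an alternative to instance-based metadata is to group instance records under the entity they embody. The sketch below is hypothetical throughout: the records, the `work` and `kind` fields, and the `group_by_work` helper are invented for illustration and do not reflect the actual data or structure of the Socrates catalog.

```python
from collections import defaultdict

# Hypothetical instance-level records, one per catalogued item.
records = [
    {"title": "Hamlet (critical edition)", "work": "hamlet", "kind": "critical edition"},
    {"title": "Hamlet (audio recording)", "work": "hamlet", "kind": "recording"},
    {"title": "Essays on Hamlet", "work": "hamlet", "kind": "essays"},
    {"title": "Hamlet (film)", "work": "hamlet", "kind": "recording"},
    {"title": "Macbeth (critical edition)", "work": "macbeth", "kind": "critical edition"},
]


def group_by_work(items):
    """Collect instance records under their work, then by kind of instance."""
    grouped = defaultdict(lambda: defaultdict(list))
    for item in items:
        grouped[item["work"]][item["kind"]].append(item["title"])
    return grouped


grouped = group_by_work(records)
for kind, titles in sorted(grouped["hamlet"].items()):
    print(kind, "->", titles)
```

With the records tied to a single entity, a reader sees a handful of categories for the work rather than a flat list of hundreds of entries.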

3. Too far removed from W3 World Wide Web

Together, our metadata & collections make up a big chunk of the “dark web”

[ info resources that search-engine spiders can’t see ]

It’s clear that visibility on the web promotes dramatic increases in discovery and access
• Library of Congress & Smithsonian images (Flickr)
• SULAIR’s HighWire Press ( > 2x increase via Google)

The state of affairs is well known ...

Our Working Environment

[Diagram labels: library; academy; publisher; produce; provide; scholars & students]

Here is a schematic to suggest how our ecosystem works. It is more complex, of course, but the basics are embodied here.

Once upon a time…the Internet

[Diagram label: internet]

And here is the way the e-discovery and e-communication environment is developing. First there was the Internet. Prophets such as Vannevar Bush, Ted Nelson, and Doug Engelbart showed us the way.

Then…the World Wide Web

[Diagram labels: internet; web of pages]

Thanks to another prophet, Tim Berners-Lee, the Internet, a network of communicating computers, became a web of pages of information. Scholarly journal publishers and some librarians realized early on that there were functional advantages to scholarship and to publishing in the web of pages. Yahoo, Google, and others realized that mining the web of pages by the words on those pages could make the rapidly growing web of pages reveal more, through indexing and cataloging the web. Indexing, as we now know, won out over cataloging.

The next thing is the subject of this talk. It is the web of data. It is the web of relationships constructed and expressed so that both computers and humans can identify and understand relationships in that web. The web of data lives with the web of pages and is carried on the Internet, the global carrier.

[Diagram labels: internet; web of pages; web of data (under construction)]

This web of data is the next big thing in discovering relevant information objects and the next big thing in empowering individuals, communities, and industries in making better use of information that they or others create. What distinguishes this web of data, this linked data environment, is the principle of identifying entities, virtual & real, by statements of relationships and descriptions in machine-readable form. More about this as we go along.

[Diagram labels: internet; web of pages; web of data, aka Linked Data (under construction)]

We are calling this next phase the Linked Data phase, because it is entirely dependent upon statements of relationships and descriptions in machine-readable form, but this phase may be only a precursor to another, more complex and more difficult web world to engineer. The next phase is the Semantic Web, which in theory allows the machine-readable relationships and descriptions to interoperate to satisfy a person’s requirements, albeit without constant interaction. In short, in the Semantic Web, the machines will understand meaning and presumably act on it. Scary, eh?

Construction Tools

How do we work to alleviate our problems as information professionals, librarians and publishers?

Recipe for creating the web of data

• identify people, places, things, events, and other entities embedded in the knowledge resources that a research university consumes and produces
• tie those facts together with named connections
• publish the relationships as crawl-able links on the web

Build/use apps supporting discovery via the web of data
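
The three steps of the recipe can be sketched as follows. This is a minimal illustration under stated assumptions, not the speaker's implementation: every URI uses the placeholder domain `example.org`, the relationship names (`createdBy`, `holds`) are hypothetical, and the output simply follows the N-Triples line layout (each term in angle brackets, each statement ending with a period).

```python
# Step 1: identify entities and give each one an identifier.
# All URIs below are hypothetical placeholders.
BASE = "http://example.org/"
entities = {
    "shakespeare": BASE + "person/shakespeare",
    "hamlet": BASE + "work/hamlet",
    "stanford_libraries": BASE + "org/stanford-libraries",
}

# Step 2: tie the entities together with named connections.
connections = [
    ("hamlet", BASE + "rel/createdBy", "shakespeare"),
    ("stanford_libraries", BASE + "rel/holds", "hamlet"),
]


def to_ntriples(links):
    """Step 3: write each connection as one line that a crawler could read."""
    lines = []
    for subject, predicate, obj in links:
        lines.append(f"<{entities[subject]}> <{predicate}> <{entities[obj]}> .")
    return lines


published = to_ntriples(connections)
print("\n".join(published))
```

Serving lines like these at stable web addresses is what would make the relationships visible to crawlers and to discovery applications built on the web of data.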

Here is a pile of words representing all the words on the web that most search engines index constantly. Good search engines today can do a lot with this pile. BUT, the search engines create the perception of relationships, not based on meaning, but on other factors, such as number of links to a site containing the words of interest OR the traffic to a site.

From this pile of words, structure!

The Linked Data approach attempts to structure the pile in anticipation of the need for discovery. That structure is based on meaning, on relationships. I will make this clearer in the next slides.

Here’s a graph of a very few relationships to Yo-Yo Ma, the great ’cellist.

Linked Data Web

Here’s a graph of relationships to Haggis, just a fun one I could not resist throwing in. Meaning is provided by understanding relationships.

RDF triples & URIs

• RDF triples = subject – predicate – object
  – A way to describe objects or even ideas on the web
  – An object or idea might have many RDF triples describing it
  – Objects or ideas need not exist on the web!
• URIs = Uniform Resource Identifiers
  – Allow machine interaction among Web objects
  – Various syntactical schemes & protocols used to construct URIs
  – At least 3 needed to support an RDF triple (subject – predicate – object)

Geek ingredients for the construction of the Linked Data Web. RDF means Resource Description Framework; an RDF statement is always expressed as a simple sentence, though multiple such statements might attach to a single entity. In fact, we need multiple RDFs in this scheme.
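
A triple is small enough to model directly. The sketch below is an assumption-laden illustration rather than a real RDF library: the `Triple` type, the `describe` helper, and the `example.org` namespace with its relationship names are all hypothetical. It shows several statements attaching to one entity, and one statement whose object is a plain literal value rather than a URI.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    subject: str    # URI of the thing being described
    predicate: str  # URI naming the relationship
    obj: str        # URI of a related thing, or a literal value


EX = "http://example.org/"  # placeholder namespace, not a real vocabulary

graph = [
    Triple(EX + "person/yo-yo-ma", EX + "rel/plays", EX + "instrument/cello"),
    Triple(EX + "person/yo-yo-ma", EX + "rel/name", '"Yo-Yo Ma"'),
    Triple(EX + "instrument/cello", EX + "rel/memberOf", EX + "family/strings"),
]


def describe(uri, triples):
    """Return every statement whose subject is the given URI."""
    return [t for t in triples if t.subject == uri]


for t in describe(EX + "person/yo-yo-ma", graph):
    print(t.subject, t.predicate, t.obj)
```

Because the object of one statement can be the subject of another, a list of such simple sentences already forms a graph.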


A  graph  of  RDF  statements  and  URIs

The Linked Data Principles
1. Use URIs as names of things (people, places, times, objects, ideas...anything really)
2. Use HTTP URIs so that people can look up those names
3. When someone looks up a URI, provide useful RDF information
4. Include RDF statements that link to other URIs so that they can discover related things

The really great aspect of RDFs is that they can refer to ideas, not just to physical or virtual entities. Any kind of idea could be treated.
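
The effect of the lookup and linking principles can be sketched without touching the network. In the illustration below, the `WEB` dictionary stands in for the web itself: looking up a URI returns the statements published about it. Every URI and relationship is a hypothetical placeholder, and `look_up` and `discover` are names invented for this sketch.

```python
from collections import deque

# A stand-in for the web: looking up a URI returns statements about it.
# All URIs and relationships here are hypothetical.
WEB = {
    "http://example.org/haggis": [
        ("http://example.org/rel/originatesIn", "http://example.org/scotland"),
        ("http://example.org/rel/contains", "http://example.org/oats"),
    ],
    "http://example.org/scotland": [
        ("http://example.org/rel/partOf", "http://example.org/united-kingdom"),
    ],
    "http://example.org/oats": [],
    "http://example.org/united-kingdom": [],
}


def look_up(uri):
    """Simulate looking up a URI and receiving statements about it."""
    return WEB.get(uri, [])


def discover(start, max_hops=2):
    """Follow links outward from a starting URI to find related things."""
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        uri, hops = queue.popleft()
        if hops == max_hops:
            continue  # do not follow links beyond the hop limit
        for _predicate, target in look_up(uri):
            if target not in seen:
                seen.add(target)
                queue.append((target, hops + 1))
    return seen


found = discover("http://example.org/haggis")
print(sorted(found))
```

In a real deployment the lookup would be an HTTP request for the URI; the traversal logic would otherwise look much the same.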

Library Metadata
• Library metadata standards closed
• “Passive” metadata, searchable, but…
• In silos
• Readable, but not actionable
• Search results refinable, but final

These are some of the edges of the problem of library metadata.

Semantic Web Metadata
• Open
• Dynamic, contextualized
• In the wild
• Interactive, responsive
• Leading to other queries & views

And here is the comparison between the library metadata scene now and the one we advocate for the Linked Data/Semantic Web. Library metadata in the Linked Data Web should be freely available, constantly updated, often reconciled with RDF triple statements from non-library sources. Library Linked Data should be entirely open on the web.

Make Library bibliographic facts into RDFs & URIs;
Release them into the wild.
Make Library Linked Data OPEN.

I should add that accounting for physical objects in our collections, locating them, making our collections auditable, and managing our collections seem to be possible using Linked Data too, at least in principle.

What about Publishers?

Publishers & Societies making use of Linked Data

• Aggregate content in their own realms & beyond
• Aggregate information about
  – Conferences
  – Career building & employment opportunities
  – Communities in collaboration
  – Commercial & other services supporting research with specimens, source material, processing, trials
  – Productive relationships with others
• Provide actionable, constantly updated links in support of scholars, teachers, and learners
• Provide compelling services tying users to them

Libraries too can use Linked Data to reveal and advertise compelling services offered to their clients.

Page 77: The Semantic Web for Librarians and Publishers, by Michael Keller, Stanford University

Semantic Web adopters

Here are some of the big players in the Linked Data / Semantic Web world. The British Library has released RDFs/URIs for the entire British National Bibliography. The Library of Congress has released the same for LCSH & Name Authority Files. LCSH includes links to AGROVOC, RAMEAU, DNB, GLIN Subject Thesaurus, and the National Agricultural Library's Subject Index. Every Personal and Corporate entry in LC/NAF links to VIAF, the Virtual International Authority File based at OCLC. The New York Times 18 months ago made all 500,000 (and growing) of its index terms available in the wild as RDFs and URIs.

Page 78: The Semantic Web for Librarians and Publishers, by Michael Keller, Stanford University

For publishers and libraries...though we should not neglect services.

Page 79: The Semantic Web for Librarians and Publishers, by Michael Keller, Stanford University

...if users can find it in their own context

Page 80: The Semantic Web for Librarians and Publishers, by Michael Keller, Stanford University

Context
Content
Users

Users = readers, authors, teachers, students

Page 81: The Semantic Web for Librarians and Publishers, by Michael Keller, Stanford University

Context
Content
Users

Publishers must make content VISIBLE

I am using the imperative here, because invisible published content means invisible benefit to the author and/or the publisher.

Page 82: The Semantic Web for Librarians and Publishers, by Michael Keller, Stanford University

Here is a recent PLoS article from PLoS Neglected Tropical Diseases.

Page 83: The Semantic Web for Librarians and Publishers, by Michael Keller, Stanford University

And here is the semantically enhanced version of this article, enhancements provided by David Shotton et al. in the form of links to further information, interactive figures, re-orderable reference list, citations in context and tag trees. These enhancements took 10 man weeks in 2009! However, with the growing ecology of linked data, much of this could be accomplished by auto-tagging and algorithmic construction of the basic RDFs & URIs for the unique article. Microdata submitted by some publishers and their supporting services to schema.org lead to these exciting possibilities.

Page 84: The Semantic Web for Librarians and Publishers, by Michael Keller, Stanford University

aggregation

Aggregation counts, but think how much more we would get if we could aggregate from libraries, publishers, and the wild and weird variety of sources on the web.
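To make the aggregation point concrete, here is a small Python sketch, assuming that a library, a publisher, and an open web source each publish statements about the same article URI. Every URI and every `example.org/vocab` predicate below is a hypothetical placeholder; only the Dublin Core `title` and `publisher` terms are real vocabulary. The functions `aggregate` and `describe` are illustrative names, not part of any existing toolkit.

```python
from collections import defaultdict

# All example.org URIs below are hypothetical placeholders.
ARTICLE = "http://example.org/article/a1"

library_triples = {
    (ARTICLE, "http://purl.org/dc/terms/title", "An Example Article"),
    (ARTICLE, "http://example.org/vocab/heldBy", "http://example.org/library/main"),
}
publisher_triples = {
    (ARTICLE, "http://purl.org/dc/terms/title", "An Example Article"),
    (ARTICLE, "http://purl.org/dc/terms/publisher", "http://example.org/publisher/p1"),
}
web_triples = {
    (ARTICLE, "http://example.org/vocab/discussedIn", "http://example.org/blog/post-42"),
}


def aggregate(*sources):
    """Union the triples from several sources; exact duplicates collapse."""
    merged = set()
    for triples in sources:
        merged.update(triples)
    return merged


def describe(triples, subject):
    """Collect every predicate and value known about one subject URI."""
    facts = defaultdict(set)
    for s, p, o in triples:
        if s == subject:
            facts[p].add(o)
    return dict(facts)


merged = aggregate(library_triples, publisher_triples, web_triples)
facts = describe(merged, ARTICLE)

if __name__ == "__main__":
    for predicate, values in sorted(facts.items()):
        print(predicate, sorted(values))
```

The merge needs no record matching because all three sources name the article with the same URI; that shared identifier is what lets statements from separate silos combine.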

Page 85: The Semantic Web for Librarians and Publishers, by Michael Keller, Stanford University


Page 86: The Semantic Web for Librarians and Publishers, by Michael Keller, Stanford University

Disambiguation

RDFs and URIs can operate in many languages and relationships can be expressed across languages, a potential big benefit to research and collaboration in research.
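One way to picture disambiguation across languages is sketched below in Python, on the assumption that each concept has a single URI carrying labels tagged by language. The two `example.org` concept URIs and the function names `candidates` and `label_for` are hypothetical, chosen only to show the idea.

```python
# Hypothetical concept URIs; each carries labels in several languages.
PLANET = "http://example.org/concept/mercury-planet"
ELEMENT = "http://example.org/concept/mercury-element"

LABELS = [
    (PLANET, "en", "Mercury"),
    (PLANET, "fr", "Mercure"),
    (PLANET, "de", "Merkur"),
    (ELEMENT, "en", "Mercury"),
    (ELEMENT, "fr", "Mercure"),
    (ELEMENT, "de", "Quecksilber"),
]


def candidates(label, lang):
    """Return every concept URI whose label in this language matches."""
    return sorted(uri for uri, l, text in LABELS if l == lang and text == label)


def label_for(uri, lang):
    """Return the label of a concept in the requested language, if any."""
    for u, l, text in LABELS:
        if u == uri and l == lang:
            return text
    return None


if __name__ == "__main__":
    # The English word is ambiguous: it points at two distinct URIs.
    print(candidates("Mercury", "en"))
    # The German label resolves to one concept, which can then be
    # shown to a French-speaking reader under its French label.
    element = candidates("Quecksilber", "de")[0]
    print(label_for(element, "fr"))
```

The string "Mercury" is ambiguous, but the URIs are not, so a relationship recorded against one URI holds no matter which language the reader searches in.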

Page 87: The Semantic Web for Librarians and Publishers, by Michael Keller, Stanford University

Web of Data Progress

Page 88: The Semantic Web for Librarians and Publishers, by Michael Keller, Stanford University

2007

FOAF = Friend of a Friend. Hundreds of millions of RDFs/URIs. Fortunately they do not take much space in memory!

Page 89: The Semantic Web for Librarians and Publishers, by Michael Keller, Stanford University

This is the 2011 graph of entities supplying RDFs and URIs. Now the population is in the hundreds of billions, heading to trillions.

Page 90: The Semantic Web for Librarians and Publishers, by Michael Keller, Stanford University

http://inkdroid.org/lod-graph/

2011

Page 91: The Semantic Web for Librarians and Publishers, by Michael Keller, Stanford University

Encouragement
Examples

Page 92: The Semantic Web for Librarians and Publishers, by Michael Keller, Stanford University

Linked Open Data Value Proposition

• Linked open data (LOD) puts information where people are looking for it: on the Web;
• LOD can expand discoverability of our content;
• LOD opens opportunities for creative innovation in digital scholarship and participation;
• LOD allows for open continuous improvement of data;
• LOD creates a store of machine-actionable data on which improved services can be built;
• Library linked open data might facilitate the breakdown of the tyranny of domain silos;
• LOD can provide direct access to data in ways that are not currently possible;
• LOD provides unanticipated benefits that will emerge later as the stores of LOD expand exponentially.

A product of the Stanford/CLIR Linked Data Workshop June 2011.

25 Participants from the British Library, the Bibliothèque nationale de France, the Deutsche Nationalbibliothek, the Royal Library of Denmark, Aalto University in Finland, the Library of Congress, the Bibliotheca Alexandrina, the National Institute of Informatics of Japan, Google, Seme4, Emory, University of Virginia, University of Michigan, California Digital Library, Knowledge Motifs, CLIR, and Stanford.

Page 93: The Semantic Web for Librarians and Publishers, by Michael Keller, Stanford University

Google using Stanford bib facts + web resources

This is a movie of a live interaction with Freebase using bibliographic facts from Stanford, and linked information resources from the web. It shows in a limited way the potential for discovery and retrieval in the Linked Data Web.

Page 94: The Semantic Web for Librarians and Publishers, by Michael Keller, Stanford University

BnF using data only from its catalogs & Gallica

This is another movie of the Linked Data prototype based entirely on bibliographic facts from the BnF catalogs and digital texts in Gallica. There are no other web resources drawn into this prototype...yet.

Page 95: The Semantic Web for Librarians and Publishers, by Michael Keller, Stanford University


Page 96: The Semantic Web for Librarians and Publishers, by Michael Keller, Stanford University

A Bibliographic Framework for the Digital Age (October 31, 2011)

• “The new bibliographic framework project will be focused on the Web environment, Linked Data principles and mechanisms, and the Resource Description Framework (RDF) as a basic data model. The protocols and ideas behind Linked Data are natural exchange mechanisms for the Web that have found substantial resonance even beyond the cultural heritage sector. Likewise, it is expected that the use of RDF and other W3C (World Wide Web Consortium) developments will enable the integration of library data and other cultural heritage data on the Web for more expansive user access to information.”

Deanna Marcum, Associate Librarian of Congress, introducing a transition from MARC.

Page 97: The Semantic Web for Librarians and Publishers, by Michael Keller, Stanford University

Value Proposition for LAMs

from the Stanford/CLIR Workshop on Linked Data, June 2011

We in the cultural heritage and knowledge management institutions are discovering better ways of publishing, sharing, and using information by linking data and helping others do the same. Through this work, we have come to value and to promote the following practices:

1. Publishing data on the web for discovery and use, rather than preserving it in dark, more or less unreachable archives that are often proprietary and profit driven;

2. Continuously improving data and Linked Data, rather than waiting to publish “perfect” data;

3. Structuring data semantically, rather than preparing flat, unstructured data;

4. Collaborating, rather than working alone;

5. Adopting Web standards, rather than domain specific ones;

6. Using open, commonly understood licenses, rather than closed and/or local licenses.

In each couplet, we emphasize the first half, before “rather than”, admitting that sometimes the second half of the couplet has to be operative.

Page 98: The Semantic Web for Librarians and Publishers, by Michael Keller, Stanford University

DARPA Internet

This is where we started 2.5 decades ago.

Page 99: The Semantic Web for Librarians and Publishers, by Michael Keller, Stanford University

World Wide Web

Thanks to Tim Berners-Lee and many others, we advanced in this environment from the early 1990s until today.

Page 100: The Semantic Web for Librarians and Publishers, by Michael Keller, Stanford University

SOCIAL WEB

We cannot ignore the social web that exists in the current WWW, but think how much more, some of it scary, could be done in the Linked Data Web with the behaviors of the Social Web.

Page 101: The Semantic Web for Librarians and Publishers, by Michael Keller, Stanford University

Linked Data Web

Just that funny reminder of the fundamental nature of the Linked Data Web: expressing machine-actionable relationships.

Page 102: The Semantic Web for Librarians and Publishers, by Michael Keller, Stanford University

Semantic Web

And in the next web, the Semantic Web, who knows what may be possible.

Page 103: The Semantic Web for Librarians and Publishers, by Michael Keller, Stanford University

Ubiquitous computing

To the progression of network types, we need to add a couple of enormously important environmental factors. Ubiquitous computing is a very important one. Having lots of computers on the net makes the possibility of an open global linked data web very strong.

Page 104: The Semantic Web for Librarians and Publishers, by Michael Keller, Stanford University

Mobility

And our ability to communicate by voice (how about that Siri?) and by bits/bytes from everywhere is, perhaps, just another aspect of ubiquitous computing.

Page 105: The Semantic Web for Librarians and Publishers, by Michael Keller, Stanford University

Ubiquitous Computing
Mobile
Internet
Web
Social Web
Linked Web

The black box in the upper right corner is the Semantic Web, a level of sophistication yet to be achieved. The linked data web is at hand, though. Will Librarians and Publishers join the development of the Linked Open Data web? I certainly think we should.

Page 106: The Semantic Web for Librarians and Publishers, by Michael Keller, Stanford University

NO MORE SILOS ARE NEEDED or wanted.

Page 107: The Semantic Web for Librarians and Publishers, by Michael Keller, Stanford University

W3C Library Linked Data Incubator Group
http://www.w3.org/2005/Incubator/lld/

A Bibliographic Framework Initiative General Plan for the Digital Age (October 31, 2011)
http://www.loc.gov/marc/transition/news/framework-103111.html

Linked Data Survey & Workshop June 2011
http://www.clir.org/pubs/archives/linked-data-survey/
