The Web of Data as a Massively Scalable NoSQL Database
DESCRIPTION
Linked Data is about using the Web to connect related data that wasn't previously linked, or using the Web to lower the barriers to linking data currently linked using other methods. It leverages fundamental characteristics of Web architecture (loose coupling, decentralization, simple and well-defined access patterns) and builds on RDF (a W3C standard data model). We'll give a brief overview of RDF and show how Linked Data principles decouple its use for interoperability and data modelling from the "heavyweight" Semantic Web baggage that has long been considered a barrier to entry.

The characteristics that allowed the Web to scale so quickly and widely include decentralization, a massively distributed architecture, an absence of integrity constraints, and weak guarantees about consistency. The Web of Data aims to achieve the same end for data, promoting it to a first-class Web citizen and making linking data as easy and ubiquitous as linking HTML documents. Many of the same characteristics that make the Web so successful and scalable also apply to the Web of Data.

The rise of NoSQL databases is a response to the changing requirements of Web-scale data. Typically these databases deliver performance at scale by relaxing consistency guarantees, eschewing transactions, using flexible data models and distributed architectures, and placing constraints on access patterns. Linked Data and RDF turn the Web itself into a decentralized and massively scalable sparse column store with globally identifiable column names; an enormous, globally distributed repository of linked, structured data.

In this talk we will highlight the common characteristics of various flavors of NoSQL database and the Web of Data. We will also discuss important differences, and outline the trade-offs involved when deciding on a storage solution for your application data, such as the importance of query performance, availability or ACID transactions.
We will be delving into concerns around:
● Scalability
● Data portability
● Common query languages
● Tool chain interoperability
TRANSCRIPT
The Web of Data as a NoSQL Database
Sam Tunnicliffe
@beobal
Talis Systems Ltd
http://talis.com
http://github.com/talis
NoSQL Now! 2011
version 1.0
entity retrieval using xDBC & ORM or custom SQL
schema-last
entity retrieval using store specific protocols and clients
sharded, polyglot storage
sharding strategy may be encapsulated by clients/servers or may require the application to handle routing/addressing as well as managing store specific protocols and clients
schema knowledge resides in application or access layer
What if you could use the Web as a database?
loose coupling
http://www.flickr.com/photos/11950mike/4707805552
outsource data acquisition costs
http://www.flickr.com/photos/juniorvelo/2861770108
proven, extreme scalability
http://www.flickr.com/photos/krayker/2268587409
leverage existing infrastructure
http://www.flickr.com/photos/ranjithsiji/4897513366
more and more diverse data
http://www.flickr.com/photos/mandy_pantz/2512569926
serendipity
http://www.flickr.com/photos/sylvar/3291628571
high latency
http://www.flickr.com/photos/zivkovic/5850008238
giving away control
http://www.flickr.com/photos/kecko/4052526123
variable availability
http://www.flickr.com/photos/numberstumper/3057162582
global names
1969-059A
spacecraft/1969-059A
nasa.dataincubator.org/spacecraft/1969-059A
URIs for entity names
http://nasa.dataincubator.org/spacecraft/1969-059A
things have attributes
mass 28801.1
name “Apollo 11 CSM”
launch launch/1969-059
URIs for attribute names
http://purl.org/net/schemas/space/mass 28801.1
http://xmlns.com/foaf/0.1/name “Apollo 11 CSM”
http://purl.org/net/schemas/space/launch launch/1969-059
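One way to picture the slides above is as a single sparse row in a column store, where both the row key and the column names are URIs. The sketch below is only an illustration of that idea: the helper name as_row is made up, and resolving the relative value launch/1969-059 against http://nasa.dataincubator.org/ is an assumption, since the slide shows only the relative form.

```python
# Namespaces copied from the slides.
SPACE = "http://purl.org/net/schemas/space/"
FOAF = "http://xmlns.com/foaf/0.1/"
# Assumption: relative names on the slides resolve against this base.
BASE = "http://nasa.dataincubator.org/"

apollo_csm = BASE + "spacecraft/1969-059A"

# Each statement is (entity name, attribute name, value); names are global URIs.
statements = [
    (apollo_csm, SPACE + "mass", 28801.1),
    (apollo_csm, FOAF + "name", "Apollo 11 CSM"),
    (apollo_csm, SPACE + "launch", BASE + "launch/1969-059"),
]


def as_row(entity, statements):
    """Collect the statements about one entity into a sparse row keyed by attribute URI."""
    row = {}
    for subject, attribute, value in statements:
        if subject == entity:
            # An attribute may repeat, so each column holds a list of values.
            row.setdefault(attribute, []).append(value)
    return row


row = as_row(apollo_csm, statements)
print(row[FOAF + "name"])  # ['Apollo 11 CSM']
```

Because the column names are URIs rather than local strings, two independent sources that both use http://xmlns.com/foaf/0.1/name are describing the same attribute without any prior coordination.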
links
http://www.flickr.com/photos/juniorvelo/457197656
dereference to get data
DNS is your routing component
http://www.flickr.com/photos/cjschmit/4623783487
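Dereferencing can be sketched as preparing an ordinary HTTP GET for the entity's URI. The host part of the URI is what DNS resolves, so no separate routing table is needed. The example below only builds the request and does not touch the network. The function name build_lookup and the choice of text/turtle as the requested format are assumptions for illustration, and the slides do not say which formats the server offers.

```python
from urllib.parse import urlsplit
from urllib.request import Request


def build_lookup(uri, accept="text/turtle"):
    """Prepare (but do not send) an HTTP GET asking for an RDF representation of uri."""
    # The Accept header asks the server for data rather than an HTML page.
    request = Request(uri, headers={"Accept": accept})
    # The host name is the only routing information the client needs.
    host = urlsplit(uri).hostname
    return host, request


host, request = build_lookup("http://nasa.dataincubator.org/spacecraft/1969-059A")
print(host)  # nasa.dataincubator.org
print(request.get_header("Accept"))  # text/turtle
```

Passing the prepared request to urllib.request.urlopen would perform the actual lookup; whether a given host is reachable at any moment is exactly the "variable availability" trade-off mentioned earlier.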
RDF and linked data
subject
predicate
object
RDF and linked data
1969-59A
launch
launch/1969-59
launch date: 16 July 1969
launch vehicle: Saturn V
weather: clear, dry
mass: 28801.1
name: Apollo 11 CSM
launch/1969-59
1969-059A
Mexico
Apollo 11
Canada
United States
Cape Canaveral
RDF and linked data
launch date: 16 July 1969
launch vehicle: Saturn V
weather: clear, dry
nasa.gov
geonames.org
Washington D.C.
alternate name: Stati Uniti
alternate name: Estados Unidos
alternate name: アメリカ合衆国
population: 311,874,000
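The graph on these slides can be read as a walk: start at one entity, look it up, and follow any value that is itself a name into the next source. The sketch below imitates that walk against an in-memory stand-in for the Web. Everything in it is illustrative: the nasa: and geo: prefixes are shorthand invented here, and the "launch site" and "country" link names are hypothetical, since the slides show the nodes but not the labels on the links between them.

```python
# Hypothetical stand-in for the Web: each name maps to (attribute, value) pairs.
WEB = {
    "nasa:spacecraft/1969-059A": [
        ("name", "Apollo 11 CSM"),
        ("launch", "nasa:launch/1969-059"),
    ],
    "nasa:launch/1969-059": [
        ("launch vehicle", "Saturn V"),
        ("launch site", "geo:cape-canaveral"),  # hypothetical link name
    ],
    "geo:cape-canaveral": [
        ("name", "Cape Canaveral"),
        ("country", "geo:united-states"),  # hypothetical link name
    ],
    "geo:united-states": [
        ("name", "United States"),
        ("alternate name", "Estados Unidos"),
    ],
}


def is_link(value):
    """A value is a link if it is a name in one of the (made-up) namespaces."""
    return isinstance(value, str) and value.startswith(("nasa:", "geo:"))


def dereference(name):
    """Simulated lookup; unknown names simply return no data."""
    return WEB.get(name, [])


def follow_links(start, max_hops=3):
    """Breadth-first walk from start, returning names in the order discovered."""
    seen = [start]
    frontier = [start]
    for _ in range(max_hops):
        next_frontier = []
        for name in frontier:
            for _attribute, value in dereference(name):
                if is_link(value) and value not in seen:
                    seen.append(value)
                    next_frontier.append(value)
        frontier = next_frontier
    return seen


reachable = follow_links("nasa:spacecraft/1969-059A")
print(reachable)
```

The walk crosses from the nasa: data into the geo: data without the client knowing in advance that the second source exists, which is the point the diagram makes about links between sources.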
web enabled data
entity lookups come from authoritative sources
routes between linked entities are explicit in data
DNS does the hard work
web enabled data
realtime discovery of additional data sources
web enabled data
expanded data universe
simplified access protocol
but some things are now outside of your control
local caches
http://www.flickr.com/photos/vhanes/3722327096
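A local cache is one way to soften the high latency and variable availability noted earlier. The sketch below is a minimal time-based cache in front of an arbitrary lookup function. The class name LocalCache, the default lifetime of 300 seconds, and the fake_fetch stand-in are all assumptions for illustration; the slides do not describe a particular caching design.

```python
import time


class LocalCache:
    """Keep looked-up data for ttl_seconds before asking the source again."""

    def __init__(self, fetch, ttl_seconds=300.0, clock=time.monotonic):
        self._fetch = fetch          # function that performs the real lookup
        self._ttl = ttl_seconds      # how long a cached entry stays fresh
        self._clock = clock          # injectable so expiry can be tested
        self._entries = {}           # uri -> (time stored, data)

    def get(self, uri):
        now = self._clock()
        entry = self._entries.get(uri)
        if entry is not None and now - entry[0] < self._ttl:
            return entry[1]          # still fresh: no remote lookup
        data = self._fetch(uri)
        self._entries[uri] = (now, data)
        return data


# Stand-in for a remote lookup that records how often it is called.
calls = []


def fake_fetch(uri):
    calls.append(uri)
    return {"uri": uri}


cache = LocalCache(fake_fetch)
uri = "http://nasa.dataincubator.org/spacecraft/1969-059A"
first = cache.get(uri)
second = cache.get(uri)
print(len(calls))  # 1
```

The trade-off is the usual one: a longer lifetime means fewer remote lookups but staler data, which fits the weak consistency guarantees the talk attributes to the Web.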
outcomes
http://www.flickr.com/photos/carbonnyc/293733099
shared effort
http://www.flickr.com/photos/toffehoff/244870160/
simpler data integration
http://www.flickr.com/photos/thedailyenglishshow/3947409618/
more linked data
http://www.flickr.com/photos/ninjanoodles/114033269
network effects
http://www.flickr.com/photos/asurroca/66225176
use the web as a database by...
● using global names
  ● for entities
  ● for attributes
● using standard formats
● making data dereferenceable
● linking to other data
http://www.flickr.com/photos/ryanwick/3461847552
thank you
http://talis.com