Standardizing for Open Data
Description: W3C’s plans for standardization activities around data on the Web
(1)
Standardizing for Open Data
Ivan Herman, W3C
Open Data Week, Marseille, France, June 26, 2013
Slides at: http://www.w3.org/2013/Talks/0626-Marseille-IH/
(2)
Data is everywhere on the Web!
- Public, private, behind enterprise firewalls
- Ranges from informal to highly curated
- Ranges from machine readable to human readable
  - HTML tables, Twitter feeds, local vocabularies, spreadsheets, …
- Expressed in diverse models
  - tree, graph, table, …
- Serialized in many ways
  - XML, CSV, RDF, PDF, HTML tables, microdata, …
(8)
W3C’s standardization focus was, traditionally, on Web-scale integration of data
- Some basic principles:
  - use of URIs everywhere (to uniquely identify things)
  - relate resources to one another (to connect things on the Web)
  - discover new relationships through inferences
- This is what the Semantic Web technologies are all about
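The three principles above can be sketched in a few lines of plain Python. This is only a toy illustration, not how an RDF library works: statements are (subject, predicate, object) tuples of URIs, the `example.org` URIs are made up for the example, and the only inference applied is the RDFS subclass rule (if x has type A and A is a subclass of B, then x has type B).

```python
# Real vocabulary URIs from RDF and RDFS.
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
RDFS_SUBCLASS_OF = "http://www.w3.org/2000/01/rdf-schema#subClassOf"

# Hypothetical namespace and resources, used only for this example.
EX = "http://example.org/"
MARSEILLE = EX + "marseille"

triples = {
    (MARSEILLE, RDF_TYPE, EX + "City"),
    (EX + "City", RDFS_SUBCLASS_OF, EX + "Place"),
    (EX + "Place", RDFS_SUBCLASS_OF, EX + "Thing"),
}


def infer_types(statements):
    """Return the statements plus every type that follows from subclass links.

    The loop repeats until a full pass adds nothing new, so chains of
    subclass relations of any length are followed.
    """
    inferred = set(statements)
    changed = True
    while changed:
        changed = False
        for s, p, o in list(inferred):
            if p != RDF_TYPE:
                continue
            for a, q, b in list(inferred):
                if q == RDFS_SUBCLASS_OF and a == o:
                    candidate = (s, RDF_TYPE, b)
                    if candidate not in inferred:
                        inferred.add(candidate)
                        changed = True
    return inferred


# Relationships that were discovered rather than stated explicitly.
new_facts = sorted(infer_types(triples) - triples)
```

Because every resource is named by a URI, statements coming from different publishers about `MARSEILLE` would end up attached to the same node without any extra matching step.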
(9)
We have a number of standards
[Diagram of the standards: RDF 1.1, SPARQL 1.1, URI, JSON-LD, Turtle, RDFa, RDF/XML]
RDF: data model, links, basic assertions; different serializations
SPARQL: querying data
A fairly stable set of technologies by now!
(10)
We have a number of standards
[Diagram of the standards: RDB2RDF, RDF 1.1, RDFS 1.1, SPARQL 1.1, OWL 2, URI, JSON-LD, Turtle, RDFa, RDF/XML]
RDF: data model, links, basic assertions; different serializations
SPARQL: querying data
RDFS: simple vocabularies
OWL: complex vocabularies, ontologies
RDB2RDF: databases to RDF
A fairly stable set of technologies by now!
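To make "data model plus different serializations" concrete, here is a minimal sketch that writes the same tuples out as N-Triples, the simple line-based syntax that is also valid Turtle. It is not taken from the talk: the creator URI under `example.org` is hypothetical, and the rule used to tell URIs from string literals is an assumption of this example, not part of any specification.

```python
DCTERMS_TITLE = "http://purl.org/dc/terms/title"
DCTERMS_CREATOR = "http://purl.org/dc/terms/creator"
TALK = "http://www.w3.org/2013/Talks/0626-Marseille-IH/"

talk = {
    (TALK, DCTERMS_TITLE, "Standardizing for Open Data"),
    # Hypothetical URI for the speaker, for illustration only.
    (TALK, DCTERMS_CREATOR, "http://example.org/people/ivan-herman"),
}


def to_ntriples(triples):
    """Serialize (subject, predicate, object) tuples as N-Triples lines.

    Assumption of this sketch: subjects and predicates are always URIs; an
    object is treated as a URI if it starts with http:// or https://, and
    as a plain string literal otherwise.
    """
    def term(value):
        if value.startswith(("http://", "https://")):
            return "<" + value + ">"
        # Escape backslashes first, then quotes and newlines.
        escaped = (
            value.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")
        )
        return '"' + escaped + '"'

    lines = []
    for s, p, o in sorted(triples):
        lines.append("<" + s + "> <" + p + "> " + term(o) + " .")
    return "\n".join(lines)


print(to_ntriples(talk))
```

The data model stays the same whichever syntax is chosen; JSON-LD, RDFa, or RDF/XML would carry the same two statements in a different surface form.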
(11)
We have Linked Data principles
(12)
Integration is done in different ways
- Very roughly:
  - data is accessed directly as RDF and turned into something useful
    - relies on data being “preprocessed” and published as RDF
  - data is collected from different sources, integrated internally
    - using, say, a triple store
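The second approach, collecting data from several sources into a triple store, might look roughly like the sketch below. The `TripleStore` class is a hypothetical in-memory stand-in, not a real product's API, and all URIs and data are invented for the example.

```python
EX = "http://example.org/"  # hypothetical namespace


class TripleStore:
    """A toy in-memory store: a set of (subject, predicate, object) tuples."""

    def __init__(self):
        self._triples = set()

    def load(self, triples):
        """Merge triples from one source; duplicates collapse automatically."""
        self._triples.update(triples)

    def match(self, s=None, p=None, o=None):
        """Return triples matching the pattern; None acts as a wildcard."""
        return sorted(
            t for t in self._triples
            if (s is None or t[0] == s)
            and (p is None or t[1] == p)
            and (o is None or t[2] == o)
        )


# Two independently published sources that happen to share a city URI.
transport_source = [
    (EX + "station/1", EX + "locatedIn", EX + "city/marseille"),
    (EX + "station/2", EX + "locatedIn", EX + "city/marseille"),
]
geo_source = [
    (EX + "city/marseille", EX + "country", EX + "country/france"),
]

store = TripleStore()
store.load(transport_source)
store.load(geo_source)


def country_of(triple_store, station):
    """Follow station -> city -> country across the merged data."""
    countries = []
    for _, _, city in triple_store.match(s=station, p=EX + "locatedIn"):
        for _, _, country in triple_store.match(s=city, p=EX + "country"):
            countries.append(country)
    return countries
```

The integration step is just `load`: because both sources use the same URI for the city, the two-hop question in `country_of` can be answered even though neither source holds the answer on its own.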
(15)
However…
- There is a price to pay: a relatively heavy ecosystem
  - many developers shy away from using RDF and related tools
- Not all applications need this!
  - data may be used directly, no need for integration concerns
  - the emphasis may be on easy production and manipulation of data with simple tools
(16)
Typical situation on the Web
- Data published in CSV, JSON, XML
- An application uses only 1-2 datasets; integration done by direct programming is straightforward
  - e.g., in a Web application
- Data is often very large, direct manipulation is more efficient
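"Integration by direct programming" for one or two datasets can be as small as the following sketch, which joins a CSV file with a JSON lookup table using only the Python standard library. The column names, region codes, and numbers are all made up for illustration.

```python
import csv
import io
import json

# Hypothetical CSV dataset: one count per row, keyed by a region code.
CSV_TEXT = (
    "region_code,count\n"
    "R1,120\n"
    "R2,75\n"
    "R1,30\n"
)

# Hypothetical JSON dataset mapping region codes to readable names.
JSON_TEXT = '{"R1": {"name": "North"}, "R2": {"name": "South"}}'


def totals_by_region(csv_text, json_text):
    """Sum the counts per region, labelled with names from the JSON data."""
    names = {code: info["name"] for code, info in json.loads(json_text).items()}
    totals = {}
    # DictReader streams row by row, so a large file is never held in memory.
    for row in csv.DictReader(io.StringIO(csv_text)):
        code = row["region_code"]
        label = names.get(code, code)  # fall back to the raw code
        totals[label] = totals.get(label, 0) + int(row["count"])
    return totals


totals = totals_by_region(CSV_TEXT, JSON_TEXT)
```

The join key is hard-coded here, which is exactly the trade-off the slide describes: no shared identifiers or vocabularies are needed, but the code only works for these two datasets.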
(17)
Non-RDF Data
- In some settings that data can be converted into RDF
- But, in many cases, it is not done
  - e.g., CSV data is way too big
  - RDF tooling may not be adequate for the task at hand
  - integration is not a major issue
(19)
What that application does…
- Gets the data published by NHS
- Processes the data (e.g., through Hadoop)
- Integrates the result of the analysis with geographical data
I.e., the raw data is used without integration
(20)
The reality of data on the Web…
- It is still a fairly messy space out there :-(
  - many different formats are used
  - data is difficult to find
  - published data are messy, erroneous
  - tools are complex, unfinished…
(21)
How do developers perceive this?
‘When transportation agencies consider data integration, one pervasive notion is that the analysis of existing information needs and infrastructure, much less the organization of data into viable channels for integration, requires a monumental initial commitment of resources and staff. Resource-scarce agencies identify this perceived major upfront overhaul as "unachievable" and "disruptive."’
-- Data Integration Primer: Challenges to Data Integration, US Dept. of Transportation
(22)
One may look at the problem through different goggles
- Two alternatives come to the fore:
  1. provide tools, environments, etc., to help outsiders to publish Linked Data (in RDF) easily
     - a typical example is the Datalift project
  2. forget about RDF, Linked Data, etc., and concentrate on the raw data instead
(24)
But religions and cultures can coexist… :-)
(25)
Open Data on the Web Workshop
- Had a successful workshop in London, in April:
  - around 100 participants
  - coming from different horizons: publishers and users of Linked Data, CSV, PDF, …
(26)
We also talked to our “stakeholders”
- Member organizations and companies
- Open Data Institute, Open Knowledge Foundation, Schema.org
- …
(27)
Some takeaways
- The Semantic Web community needs stability of the technology
  - do not add yet another technology block :-)
  - existing technologies should be maintained
(28)
Some takeaways
- Look at the more general space, too
  - importance of metadata
  - deal with non-RDF data formats
  - best practices are necessary to raise the quality of published data
(29)
We need to meet app developers where they are!
(30)
Metadata is of major importance
- Metadata describes the characteristics of the dataset
  - structure, datatypes used
  - access rights, licenses
  - provenance, authorship
  - etc.
- Vocabularies are also key for Linked Data
(31)
Vocabulary Management Action
- Standard vocabularies are necessary to describe data
  - there are already some initiatives: W3C’s Data Cube, Data Catalog, PROV, schema.org, DCMI, …
- At the moment, it is a fairly chaotic world…
  - many, possibly overlapping vocabularies
  - difficult to locate the one that is needed
  - vocabularies may not be properly managed, maintained, versioned, or made persistent…
(32)
W3C’s plan:
- Provide a space whereby
  - communities can develop
  - host vocabularies at W3C if requested
  - annotate vocabularies with a proper set of metadata terms
  - establish a vocabulary directory
- The exact structure is still being discussed: http://www.w3.org/2013/04/vocabs/
(34)
CSV on the Web
- Planned work areas:
  - metadata vocabulary to describe CSV data
    - structure, reference to access rights, annotations, etc.
  - methods to find the metadata
    - part of an HTTP header, special rows and columns, packaging formats…
  - mapping content to RDF, JSON, XML
- Possibly at a later phase:
  - API standards to access CSV data
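As a rough idea of what metadata describing CSV data could enable, the sketch below uses a small metadata record to validate a CSV header, apply datatypes, and map the rows to JSON. The metadata layout (`columns`, `name`, `datatype`, `license`) is invented for this example and is not a W3C vocabulary; the data and the license URL are placeholders.

```python
import csv
import io
import json

# Hypothetical metadata record describing the structure of one CSV file.
metadata = {
    "license": "http://example.org/license",  # placeholder URL
    "columns": [
        {"name": "station", "datatype": "string"},
        {"name": "year", "datatype": "integer"},
        {"name": "rainfall_mm", "datatype": "number"},
    ],
}

# Converters for the datatype names used by this sketch.
CASTS = {"string": str, "integer": int, "number": float}

CSV_TEXT = "station,year,rainfall_mm\nA,2012,512.5\nB,2013,430\n"


def csv_to_records(csv_text, meta):
    """Check the header against the metadata, then return typed records."""
    columns = meta["columns"]
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader)
    expected = [column["name"] for column in columns]
    if header != expected:
        raise ValueError("header %r does not match metadata %r" % (header, expected))
    records = []
    for row in reader:
        records.append({
            column["name"]: CASTS[column["datatype"]](cell)
            for column, cell in zip(columns, row)
        })
    return records


records = csv_to_records(CSV_TEXT, metadata)
as_json = json.dumps(records)  # the "mapping content to JSON" step
```

Without the metadata every cell would remain a string; with it, a generic tool can produce typed JSON without knowing anything else about the file. How such metadata is located (HTTP header, special rows, packaging) is left open here, as it is on the slide.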
(36)
Open Data Best Practices
- Document best practices for data publishers
  - management of persistence, versioning, URI design
  - use of core vocabularies (provenance, access control, ownership, annotations, …)
  - business models
- Specialized metadata vocabularies
  - quality description (quality of the data, update frequencies, correction policies, etc.)
  - description of data access APIs
  - …
(37)
Summary
- Data on the Web has many different facets
- We have concentrated on the integration aspects in the past years
- We have to take a more general view, look at other types of data published on the Web
(38)
In future…
- We should look at other formats, not only CSV
  - MARC, GIS, ABIF, …
- Better outreach to data publishing communities and organizations
  - WF, RDA, ODI, OKFN, …
Enjoy the event!