LSD Dimensions: Use and Reuse of Linked Statistical Data as RDF Data Cube
DESCRIPTION
Governments, public agencies, institutions, and companies produce a large amount of statistical data every year. Much of this data is released as Open Data and published on the Web, although usually as documents rather than as Linked Data. In this talk I'll introduce RDF Data Cube (QB), a W3C standard for publishing multidimensional data, such as statistics, on the Web in such a way that it can be linked to other datasets and concepts. However, QB leaves open how users should model dimensions and codes (QB jargon for variables and values), which hampers the reuse of existing ones. To address this, I'll show you LSD Dimensions, a web-based application that monitors the usage of dimensions and codes over five hundred public SPARQL endpoints.
TRANSCRIPT
LSD Dimensions: Use and Reuse of Linked Statistical Data as RDF Data Cube
Albert Meroño-Peñuela (@albertmeronyo)
WAI meeting 06-10-2014
Statistics!
Data integration – 220 years ago
Data integration - nowadays
Towards 5-star Linked Statistical Data
DFT
Eurostat TSV
RDF Data Cube
• 4-star LSD: use URIs to denote (statistical) things
• 5-star LSD: link own (statistical) things to other (statistical) things
“There are many situations where it would be useful to be able to publish multi-dimensional data, such as statistics, on the web in such a way that they can be linked to related data sets and concepts.”
RDF Data Cube vocabulary (QB)
• SDMX compatible
• Defines cubes as a set of observations that consist of dimensions, measures and attributes
• Dimensions: time period, region, sex (qb:DimensionProperty)
• Measure: population life expectancy (qb:MeasureProperty)
• Attribute: unit of measure = years, metadata status = measured (qb:AttributeProperty)
Observation: “the measured life expectancy of males in Newport in the period 2004-2006 is 76.7 years”
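To make the roles of dimensions, measures and attributes concrete, the observation above can be sketched as a small Turtle snippet generated from Python. This is a minimal illustration, not the modelling used by any particular publisher: the `ex:` namespace, the property names under it, and the dataset name `ex:lifeExpectancyDataset` are all invented for this sketch. Only the `qb:` namespace and the `qb:Observation` and `qb:dataSet` terms come from the RDF Data Cube vocabulary.

```python
# qb: is the RDF Data Cube namespace; ex: is a hypothetical namespace
# invented for this sketch.
PREFIXES = {
    "qb": "http://purl.org/linked-data/cube#",
    "ex": "http://example.org/ns#",
}

# The observation from the slide, split by QB component role.
# All ex: property and value names are assumptions.
observation = {
    "dimensions": {
        "ex:refPeriod": '"2004-2006"',
        "ex:refArea": "ex:Newport",
        "ex:sex": "ex:male",
    },
    "measures": {"ex:lifeExpectancy": "76.7"},
    "attributes": {"ex:unitMeasure": "ex:years", "ex:status": "ex:measured"},
}


def to_turtle(subject, dataset, obs):
    """Serialize one observation dict as a Turtle snippet."""
    lines = [f"@prefix {prefix}: <{uri}> ." for prefix, uri in PREFIXES.items()]
    lines.append("")
    # Every observation is typed and attached to its dataset first.
    pairs = [("a", "qb:Observation"), ("qb:dataSet", dataset)]
    for role in ("dimensions", "measures", "attributes"):
        pairs.extend(obs[role].items())
    body = " ;\n".join(f"    {pred} {obj}" for pred, obj in pairs)
    lines.append(f"{subject}\n{body} .")
    return "\n".join(lines)


turtle = to_turtle("ex:obs1", "ex:lifeExpectancyDataset", observation)
print(turtle)
```

The dimensions identify the observation (when, where, who), the measure carries the observed value, and the attributes qualify how to read that value.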
5-star LSD: 270a.info
Sarven Capadisli, Sören Auer, Reinhard Riedl. “Linked Statistical Data Analysis”. 1st Int. Workshop on Semantic Statistics (SemStats) ISWC 2013.
Are we done?
• P1: Comparability? Can we arbitrarily combine any pair of these datasets/dimensions?
• P2: Reusability? How often are dimensions reused? Can we reuse dimensions created by others?
• P3: Discoverability? How to discover dimensions created by others?
• P4: Relevance? What’s the size of LSD?
P1: Comparability of LSD: SSCLSDA
Sarven Capadisli, Albert Meroño-Peñuela, Sören Auer, Reinhard Riedl. “Semantic Similarity and Correlation of Linked Statistical Data Analysis”. 2nd Int. Workshop on Semantic Statistics (SemStats) ISWC 2014.
P2+P3+P4: LSD Dimensions
Need for an intelligent system that helps us with (1) discovering, (2) reusing, and (3) analyzing dimensions in LSD
http://lsd-dimensions.org/
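As an illustration of what discovering dimensions on a SPARQL endpoint can look like, here is a minimal sketch. It is not the query or code that LSD Dimensions itself runs; the query, the function names, and the `http://example.org/sparql` endpoint are assumptions made for this example. It relies only on the standard SPARQL protocol (a `query` parameter over HTTP GET) and the SPARQL JSON results format.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen

# Lists every resource typed as qb:DimensionProperty, with its label if any.
DIMENSIONS_QUERY = """
PREFIX qb: <http://purl.org/linked-data/cube#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>

SELECT DISTINCT ?dimension ?label
WHERE {
  ?dimension a qb:DimensionProperty .
  OPTIONAL { ?dimension rdfs:label ?label }
}
"""


def build_request_url(endpoint, query=DIMENSIONS_QUERY):
    """Build a SPARQL protocol GET URL for the given endpoint."""
    return endpoint + "?" + urlencode({"query": query})


def parse_bindings(results):
    """Turn SPARQL JSON results into (dimension URI, label or None) pairs."""
    rows = []
    for binding in results["results"]["bindings"]:
        label = binding.get("label", {}).get("value")
        rows.append((binding["dimension"]["value"], label))
    return rows


def fetch_dimensions(endpoint, timeout=30):
    """Query one endpoint over HTTP; not called here, since it needs network access."""
    request = Request(
        build_request_url(endpoint),
        headers={"Accept": "application/sparql-results+json"},
    )
    with urlopen(request, timeout=timeout) as response:
        return parse_bindings(json.load(response))


# Hypothetical endpoint URL, used only to show the request shape.
url = build_request_url("http://example.org/sparql")
print(url[:60])
```

Running `fetch_dimensions` over a list of endpoints and merging the results would give a rough index of which dimensions are defined where.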
Are we done?
• P1: Comparability? Can we arbitrarily combine any pair of these datasets/dimensions? Answer: Unclear
• P2: Reusability? How often are dimensions reused? Can we reuse dimensions created by others? Answer: Logarithmic law / Probably yes
• P3: Discoverability? How to discover dimensions created by others? Answer: LSD Dimensions
• P4: Relevance? What’s the size of LSD? Answer: ~8.5% of the LOD cloud
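One simple way to quantify reuse (P2) is to count, for each dimension, how many distinct endpoints use it, and then rank dimensions by that count. The sketch below shows the tallying step only. The (endpoint, dimension) pairs are invented for illustration and are not results from LSD Dimensions; the function name `reuse_ranking` is likewise an assumption.

```python
from collections import defaultdict

# Invented (endpoint, dimension) pairs; real input would come from crawling.
usage = [
    ("http://endpoint-a.example/sparql", "http://example.org/dim/refPeriod"),
    ("http://endpoint-b.example/sparql", "http://example.org/dim/refPeriod"),
    ("http://endpoint-c.example/sparql", "http://example.org/dim/refPeriod"),
    ("http://endpoint-a.example/sparql", "http://example.org/dim/refArea"),
    ("http://endpoint-b.example/sparql", "http://example.org/dim/refArea"),
    ("http://endpoint-a.example/sparql", "http://example.org/dim/localCode"),
    # A repeated pair must not be counted twice.
    ("http://endpoint-a.example/sparql", "http://example.org/dim/refPeriod"),
]


def reuse_ranking(pairs):
    """Rank dimensions by the number of distinct endpoints that use them."""
    endpoints_by_dimension = defaultdict(set)
    for endpoint, dimension in pairs:
        endpoints_by_dimension[dimension].add(endpoint)
    counts = {dim: len(eps) for dim, eps in endpoints_by_dimension.items()}
    # Most reused first; ties broken alphabetically for a stable order.
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))


ranking = reuse_ranking(usage)
for dimension, endpoint_count in ranking:
    print(endpoint_count, dimension)
```

Plotting the counts against their rank is one way to check whether reuse is concentrated in a few widely shared dimensions.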
Future Work
• Monitor additional metadata (rdfs:subPropertyOf, rdfs:range)
• Generate PROV during crawling
• Modeling of formulas in RDF Data Cube
• Plug to LOD Laundromat
• Crawl dimensions and codes from qb:Observation
• SPARQL endpoint and API
  – Suggest dimensions and codes to users
Thank you
Questions, suggestions, comments most welcome
@albertmeronyo
http://lsd-dimensions.org/
https://github.com/albertmeronyo/LSD-Dimensions
https://github.com/csarven/sense-of-lsd-analysis
http://www.cedar-project.nl