![Page 1: Linked Data Quality Assessment – daQ and Luzzu](https://reader030.vdocuments.mx/reader030/viewer/2022032505/55c588e1bb61ebdf168b4716/html5/thumbnails/1.jpg)
Linked Data Quality Assessment – daQ and Luzzu
Jeremy Debattista
University of Bonn
Presentation at the Ontology Engineering Group (UPM)
![Page 2: Linked Data Quality Assessment – daQ and Luzzu](https://reader030.vdocuments.mx/reader030/viewer/2022032505/55c588e1bb61ebdf168b4716/html5/thumbnails/2.jpg)
…who am I?
• B.Sc (Hons) in Computer Science – University of Malta
– Thesis: Collaborative Editing and Expert Finding
• M.App Sc in Computer Science – DERI, National University of Ireland, Galway
– Thesis: Ontology-based rules for User-Controlled Support in Ubiquitous Environments
• PhD Candidate – University of Bonn
![Page 3: Linked Data Quality Assessment – daQ and Luzzu](https://reader030.vdocuments.mx/reader030/viewer/2022032505/55c588e1bb61ebdf168b4716/html5/thumbnails/3.jpg)
… my PhD – the big picture
• Work related to Data Quality (in LD)
– representing quality metadata (daQ)
– assessing data quality (Luzzu)
– identifying new metrics from standard vocabularies (like PROV-O)
![Page 4: Linked Data Quality Assessment – daQ and Luzzu](https://reader030.vdocuments.mx/reader030/viewer/2022032505/55c588e1bb61ebdf168b4716/html5/thumbnails/4.jpg)
… the need for Quality Metadata
• Convincing data consumers to use our published data
• Filtering datasets
• Poor Quality Perspective – Big Data Veracity
![Page 5: Linked Data Quality Assessment – daQ and Luzzu](https://reader030.vdocuments.mx/reader030/viewer/2022032505/55c588e1bb61ebdf168b4716/html5/thumbnails/5.jpg)
… the daQ vocabulary
![Page 6: Linked Data Quality Assessment – daQ and Luzzu](https://reader030.vdocuments.mx/reader030/viewer/2022032505/55c588e1bb61ebdf168b4716/html5/thumbnails/6.jpg)
… the daQ vocabulary
![Page 7: Linked Data Quality Assessment – daQ and Luzzu](https://reader030.vdocuments.mx/reader030/viewer/2022032505/55c588e1bb61ebdf168b4716/html5/thumbnails/7.jpg)
… the daQ vocabulary
• Metadata as Named Graphs
• Usage of abstract class concept
• Metric assessment as Observations
• Preserving Provenance information
![Page 9: Linked Data Quality Assessment – daQ and Luzzu](https://reader030.vdocuments.mx/reader030/viewer/2022032505/55c588e1bb61ebdf168b4716/html5/thumbnails/9.jpg)
… daQ Applications
• daQ validator – Validates quality metric schemas extending the daQ (will be online soon)
– e.g. checking that each dimension is in exactly one category…
• Luzzu – next slides
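The validator example above (each dimension in exactly one category) can be illustrated with a minimal sketch. This is not the daQ validator itself: the function name, the input shape (plain category/dimension pairs rather than RDF), and the schema names are all assumptions made for illustration.

```python
from collections import defaultdict


def find_category_violations(dimensions, memberships):
    """Return {dimension: sorted categories} for every dimension that is
    not in exactly one category.

    dimensions  -- iterable of declared dimension names
    memberships -- iterable of (category, dimension) pairs
    """
    categories_of = defaultdict(set)
    for category, dimension in memberships:
        categories_of[dimension].add(category)

    violations = {}
    for dimension in dimensions:
        found = categories_of.get(dimension, set())
        if len(found) != 1:
            violations[dimension] = sorted(found)
    return violations


# Hypothetical schema: these names are illustrative, not taken from daQ.
dimensions = ["Availability", "Conciseness", "Licensing"]
memberships = [
    ("Accessibility", "Availability"),
    ("Intrinsic", "Conciseness"),
    ("Representational", "Conciseness"),  # second category: violation
]
violations = find_category_violations(dimensions, memberships)
```

Here `Conciseness` is reported because it sits in two categories and `Licensing` because it sits in none. A real validator would read these memberships from the schema's RDF graph.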
![Page 10: Linked Data Quality Assessment – daQ and Luzzu](https://reader030.vdocuments.mx/reader030/viewer/2022032505/55c588e1bb61ebdf168b4716/html5/thumbnails/10.jpg)
… Luzzu – QA Framework
• A comprehensive QA framework
– assesses LD quality using user-provided metrics (we have a number of LOD metrics already) in a scalable manner
– provides queryable metadata (daQ)
– provides quality reports which can be used for cleaning
• Java-based with Maven integration
• http://eis-bonn.github.io/Luzzu
![Page 11: Linked Data Quality Assessment – daQ and Luzzu](https://reader030.vdocuments.mx/reader030/viewer/2022032505/55c588e1bb61ebdf168b4716/html5/thumbnails/11.jpg)
… Luzzu – QA Framework
![Page 12: Linked Data Quality Assessment – daQ and Luzzu](https://reader030.vdocuments.mx/reader030/viewer/2022032505/55c588e1bb61ebdf168b4716/html5/thumbnails/12.jpg)
… Luzzu – QA Framework
![Page 13: Linked Data Quality Assessment – daQ and Luzzu](https://reader030.vdocuments.mx/reader030/viewer/2022032505/55c588e1bb61ebdf168b4716/html5/thumbnails/13.jpg)
…what’s missing in Luzzu
• Make Luzzu work better on Big Data Platforms
– We already have a Spark processor
– How can metrics be scaled across different cores? Something like map-reduce maybe?
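One way to picture the map-reduce idea raised above is a metric that can be computed as partial counts per partition and then merged. The sketch below is only an illustration under assumptions: the function names, the toy "object is a URI" check, and the triple layout are hypothetical, and nothing here reflects how Luzzu or its Spark processor is implemented.

```python
from functools import reduce


def count_partition(triples, predicate):
    """Map step: return (matching, total) for one partition of triples."""
    matching = sum(1 for triple in triples if predicate(triple))
    return matching, len(triples)


def merge_counts(left, right):
    """Reduce step: combine two partial (matching, total) results."""
    return left[0] + right[0], left[1] + right[1]


def ratio_metric(partitions, predicate):
    """Compute matching/total over all partitions; 0.0 for empty input."""
    # Each partition is independent, so this step could run on separate cores.
    partials = [count_partition(p, predicate) for p in partitions]
    matching, total = reduce(merge_counts, partials, (0, 0))
    return matching / total if total else 0.0


def has_uri_object(triple):
    # Toy check (assumption): any object starting with "http" counts as a URI.
    return triple[2].startswith("http")


partitions = [
    [("s1", "p", "http://example.org/o1"), ("s2", "p", "literal")],
    [("s3", "p", "http://example.org/o2"), ("s4", "p", "http://example.org/o3")],
]
score = ratio_metric(partitions, has_uri_object)
```

Metrics that reduce to sums or counts fit this shape directly. Metrics that need global state, such as duplicate detection, would need a mergeable summary instead.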
![Page 14: Linked Data Quality Assessment – daQ and Luzzu](https://reader030.vdocuments.mx/reader030/viewer/2022032505/55c588e1bb61ebdf168b4716/html5/thumbnails/14.jpg)
… data quality lifecycle
![Page 15: Linked Data Quality Assessment – daQ and Luzzu](https://reader030.vdocuments.mx/reader030/viewer/2022032505/55c588e1bb61ebdf168b4716/html5/thumbnails/15.jpg)
… quality metrics
• Traditional naïve way
• Probabilistic Techniques (A paper was presented at ESWC this year)
![Page 16: Linked Data Quality Assessment – daQ and Luzzu](https://reader030.vdocuments.mx/reader030/viewer/2022032505/55c588e1bb61ebdf168b4716/html5/thumbnails/16.jpg)
… probabilistic technique hypothesis
Probabilistic approximation techniques would:
(H1) drastically improve computational time
(H2) give close to accurate results
![Page 17: Linked Data Quality Assessment – daQ and Luzzu](https://reader030.vdocuments.mx/reader030/viewer/2022032505/55c588e1bb61ebdf168b4716/html5/thumbnails/17.jpg)
… probabilistic techniques used
Techniques:
• Reservoir Sampling
• Bloom Filters
• Clustering Coefficient Estimation
Metrics:
• Dereferenceability
• Links to External Data Providers
• Extensional Conciseness
• Clustering Coefficient of a Network
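Reservoir sampling, the first technique listed, keeps a fixed-size uniform sample of a stream whose length is not known in advance. The sketch below is the textbook version (Algorithm R), not the code used in Luzzu; the function name, the `rng` parameter, and the example URIs are assumptions.

```python
import random


def reservoir_sample(stream, k, rng=None):
    """Return up to k items drawn uniformly at random from an iterable."""
    rng = rng or random.Random()
    reservoir = []
    for index, item in enumerate(stream):
        if index < k:
            # Fill the reservoir with the first k items.
            reservoir.append(item)
        else:
            # Item number index+1 is kept with probability k/(index+1).
            slot = rng.randint(0, index)
            if slot < k:
                reservoir[slot] = item
    return reservoir


# Hypothetical resource URIs, streamed one at a time.
uris = (f"http://example.org/resource/{i}" for i in range(10_000))
sample = reservoir_sample(uris, 100, random.Random(42))
```

An expensive per-resource check (an HTTP lookup, for instance) could then run on the 100 sampled URIs only, and the observed ratio would serve as an estimate for the whole dataset.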
![Page 18: Linked Data Quality Assessment – daQ and Luzzu](https://reader030.vdocuments.mx/reader030/viewer/2022032505/55c588e1bb61ebdf168b4716/html5/thumbnails/18.jpg)
… some results
[Diagram: techniques (Reservoir Sampling, Bloom Filters, Clustering Coefficient Estimation) and metrics (Dereferenceability, Links to External Data Providers, Extensional Conciseness, Clustering Coefficient of a Network)]
• Precision: approx. 75%; Time Saved: > 2 Orders of Magnitude
• Precision: 100%; Time Saved: > 2 Orders of Magnitude
![Page 19: Linked Data Quality Assessment – daQ and Luzzu](https://reader030.vdocuments.mx/reader030/viewer/2022032505/55c588e1bb61ebdf168b4716/html5/thumbnails/19.jpg)
… some results
[Diagram: techniques (Reservoir Sampling, Bloom Filters, Clustering Coefficient Estimation) and metrics (Dereferenceability, Links to External Data Providers, Extensional Conciseness, Clustering Coefficient of a Network)]
• Precision: approx. 97%; Time Saved: > 3 Orders of Magnitude
![Page 20: Linked Data Quality Assessment – daQ and Luzzu](https://reader030.vdocuments.mx/reader030/viewer/2022032505/55c588e1bb61ebdf168b4716/html5/thumbnails/20.jpg)
… some results
[Diagram: techniques (Reservoir Sampling, Bloom Filters, Clustering Coefficient Estimation) and metrics (Dereferenceability, Links to External Data Providers, Extensional Conciseness, Clustering Coefficient of a Network)]
• Precision: approx. 95%; Time Saved: > 1 Order of Magnitude
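Of the techniques named in these results, the Bloom filter is easy to show in a few lines: it answers "have I seen this item before?" in fixed memory, never missing a repeat but occasionally reporting a false one. The sketch below is a generic Bloom filter under assumptions (class and function names, the SHA-256 double-hashing scheme, and the default sizes are choices made here), not the implementation evaluated in the talk.

```python
import hashlib


class BloomFilter:
    """Fixed-size Bloom filter: no false negatives, some false positives."""

    def __init__(self, num_bits, num_hashes):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, item):
        # Derive all positions from two halves of one SHA-256 digest
        # (double hashing); a simplification chosen for this sketch.
        digest = hashlib.sha256(item.encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1
        return [(h1 + i * h2) % self.num_bits for i in range(self.num_hashes)]

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, item):
        return all(self.bits[pos // 8] & (1 << (pos % 8))
                   for pos in self._positions(item))


def count_probable_duplicates(items, num_bits=1 << 16, num_hashes=4):
    """Count items the filter reports as already seen."""
    seen = BloomFilter(num_bits, num_hashes)
    duplicates = 0
    for item in items:
        if item in seen:
            duplicates += 1
        else:
            seen.add(item)
    return duplicates


# Hypothetical stream of serialized resources with two repeats.
stream = ["a", "b", "a", "c", "b"]
duplicates = count_probable_duplicates(stream)
```

Because false positives are possible, the count is an upper-biased estimate of the true number of repeats. More bits per item lowers that bias at the cost of memory.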
![Page 21: Linked Data Quality Assessment – daQ and Luzzu](https://reader030.vdocuments.mx/reader030/viewer/2022032505/55c588e1bb61ebdf168b4716/html5/thumbnails/21.jpg)
… what am I working on
• Large Scale/Data Web Scale evaluation (Journal Paper)
– assessing the quality of LOD Cloud datasets
• daQ (Journal Paper)
![Page 22: Linked Data Quality Assessment – daQ and Luzzu](https://reader030.vdocuments.mx/reader030/viewer/2022032505/55c588e1bb61ebdf168b4716/html5/thumbnails/22.jpg)
… what do we do at Bonn
• Open Government Data – Publishing and Consumption
– Data Value Chains, Value Creation, Budgeting
• Portal for publication and consumption of open data
– Lowering of semantic data to shallower domain specific formats (RDB, CSV, etc.)
• RDF Visualisations and Recommendations
![Page 23: Linked Data Quality Assessment – daQ and Luzzu](https://reader030.vdocuments.mx/reader030/viewer/2022032505/55c588e1bb61ebdf168b4716/html5/thumbnails/23.jpg)
… what do we do at Bonn
• Dataset Change Detection
• Collaborative Authoring and Open Educational Content
• Low-threshold agile methodology for collaborative vocabulary development
• Mapping of AutomationML to RDF
![Page 24: Linked Data Quality Assessment – daQ and Luzzu](https://reader030.vdocuments.mx/reader030/viewer/2022032505/55c588e1bb61ebdf168b4716/html5/thumbnails/24.jpg)
… some tools
http://purl.org/net/exconquer/
![Page 25: Linked Data Quality Assessment – daQ and Luzzu](https://reader030.vdocuments.mx/reader030/viewer/2022032505/55c588e1bb61ebdf168b4716/html5/thumbnails/25.jpg)
… some tools
http://purl.org/net/dsaas
![Page 26: Linked Data Quality Assessment – daQ and Luzzu](https://reader030.vdocuments.mx/reader030/viewer/2022032505/55c588e1bb61ebdf168b4716/html5/thumbnails/26.jpg)
… some tools
http://slidewiki.org
![Page 27: Linked Data Quality Assessment – daQ and Luzzu](https://reader030.vdocuments.mx/reader030/viewer/2022032505/55c588e1bb61ebdf168b4716/html5/thumbnails/27.jpg)
… some tools
http://eis.iai.uni-bonn.de/Projects/LinkDaViz.html