LODStats: The Data Web Census Dataset. Kobe, Japan, 2016
LODStats
Agenda
● Introduction
● Description and System Architecture
● Dataset Model
● Use Cases
● Data Web Statistics (Summary)
● Conclusions
Introduction
How to comprehend this data?
● Data portals
● Big nucleus datasets
● SPARQL endpoints
9960+ RDF Datasets on the Data Portals
LODStats: Web Application
● Aggregates datasets from the largest data portals
● Calculates statistical metrics
● Provides a user interface
● Provides a SPARQL interface
“LODStats – An Extensible Framework for High-performance Dataset Analytics” (EKAW’2012) [1]
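To make "calculates statistical metrics" concrete, here is a minimal sketch of computing a few dataset statistics in a single pass over a stream of triples. It is not LODStats' actual code: triples are assumed to be plain `(subject, predicate, object)` string tuples, and `compute_stats` and the chosen metrics are illustrative only.

```python
from collections import Counter
from typing import Iterable, Tuple

RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"

# Assumption: a triple is a (subject, predicate, object) tuple of strings.
Triple = Tuple[str, str, str]


def compute_stats(triples: Iterable[Triple]) -> dict:
    """Compute a few illustrative statistics in one pass over a triple stream."""
    triple_count = 0
    subjects = set()           # grows with the data; costly for very large dumps
    property_usage = Counter()  # how often each predicate is used
    class_usage = Counter()     # how often each class appears as an rdf:type object
    for s, p, o in triples:
        triple_count += 1
        subjects.add(s)
        property_usage[p] += 1
        if p == RDF_TYPE:
            class_usage[o] += 1
    return {
        "triples": triple_count,
        "distinct_subjects": len(subjects),
        "property_usage": dict(property_usage),
        "class_usage": dict(class_usage),
    }
```

Because the function only iterates once, it can consume a generator that parses a dump line by line. The exact set of distinct subjects is the one part whose memory grows with the dataset; the counters grow only with the number of distinct predicates and classes.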
LODStats: System Architecture
CKAN Aggregator
● Scans the largest CKAN repositories
● Selects the RDF datasets
“Linked Open Data Statistics: Collection and Exploitation” (KESW’2013) [2]
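The aggregator step can be sketched against CKAN's Action API (v3), whose `package_search` action pages through a catalog with `start` and `rows` and returns packages with a `resources` list. This is a sketch under assumptions, not the aggregator's real code: the helper names are hypothetical, and `RDF_FORMATS` is a guessed set of format labels, since portals label formats inconsistently.

```python
from typing import Dict, List
from urllib.parse import urlencode

# Hypothetical set of resource format labels treated as RDF; tune per portal.
RDF_FORMATS = {
    "rdf", "rdf/xml", "application/rdf+xml", "ttl", "turtle", "text/turtle",
    "nt", "n-triples", "application/n-triples", "sparql", "api/sparql",
}


def build_search_url(portal: str, start: int = 0, rows: int = 100) -> str:
    """URL for one page of results from the CKAN v3 package_search action."""
    query = urlencode({"start": start, "rows": rows})
    return f"{portal.rstrip('/')}/api/3/action/package_search?{query}"


def rdf_resources(search_response: Dict) -> List[Dict]:
    """Keep only resources whose declared format looks like RDF."""
    selected = []
    for package in search_response.get("result", {}).get("results", []):
        for resource in package.get("resources", []):
            fmt = (resource.get("format") or "").strip().lower()
            if fmt in RDF_FORMATS:
                selected.append({
                    "package": package.get("name"),
                    "url": resource.get("url"),
                    "format": fmt,
                })
    return selected
```

Filtering on the declared format is cheap but trusts catalog metadata; a stricter aggregator would also check the resource itself (for example its content type) before queueing it.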
LODStats: System Architecture (cont.)
LODStats core application
● Queues RDF datasets
● Calculates statistics
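The queue-then-calculate pattern of the core application can be sketched with an in-process work queue. Here Python's `queue.Queue` is a stand-in for the RabbitMQ broker that appears in the deployment configuration; `run_workers` and the `analyse` callback are hypothetical names, not part of LODStats.

```python
import queue
import threading
from typing import Callable, Dict, List, Optional


def run_workers(dataset_urls: List[str],
                analyse: Callable[[str], Dict],
                n_workers: int = 2) -> Dict[str, Dict]:
    """Queue dataset URLs and let a pool of worker threads analyse them."""
    jobs: "queue.Queue[Optional[str]]" = queue.Queue()
    results: Dict[str, Dict] = {}
    lock = threading.Lock()

    def worker() -> None:
        while True:
            url = jobs.get()
            if url is None:  # sentinel: no more work for this worker
                jobs.task_done()
                return
            try:
                outcome = analyse(url)
            except Exception as exc:  # record the failure, keep the worker alive
                outcome = {"error": str(exc)}
            with lock:
                results[url] = outcome
            jobs.task_done()

    threads = [threading.Thread(target=worker) for _ in range(n_workers)]
    for thread in threads:
        thread.start()
    for url in dataset_urls:
        jobs.put(url)
    for _ in threads:  # one sentinel per worker
        jobs.put(None)
    for thread in threads:
        thread.join()
    return results
```

Recording a failed dataset as an error entry, rather than letting the exception escape, matters when many catalog entries point at broken or unreachable dumps. A separate broker such as RabbitMQ additionally lets producers and workers live in different containers, which an in-process queue cannot do.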
LODStats: Provisioning
● Docker image per component
● docker-compose.yml for the whole project
● Sustainable and platform-independent deployment
LODStats: Provisioning (cont.)
```yaml
# Compose file format version 1: services are top-level keys
# (no `version` or `services:` key), as read by the docker-compose CLI.
web:
  restart: always
  build: ./web
  links:
    - db
    - rabbitmq
  environment:
    - LODSTATS_DB=db
    - RABBITMQ=rabbitmq
rabbitmq:
  restart: always
  image: rabbitmq:3.6.1
db:
  restart: always
  build: ./db
virtuoso:
  restart: always
  build: ./virtuoso
  environment:
    - DBA_PASSWORD=dba  # default credential; change it for a public deployment
    - SPARQL_UPDATE=false
    - DEFAULT_GRAPH=http://lodstats.aksw.org/
nginx:
  build: ./nginx
  restart: always
  links:
    - web
    - virtuoso
  environment:
    - VIRTUAL_HOST=lodstats.aksw.org,stats.lod2.eu
```
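In Compose, `links` also express a dependency, so a linked service is started before the service that links to it (started, not necessarily ready to accept connections). A minimal sketch of the resulting start order, using the standard library's `graphlib` (Python 3.9+); the `LINKS` mapping is transcribed by hand from the compose file above and `start_order` is a hypothetical helper.

```python
from graphlib import TopologicalSorter
from typing import Dict, List

# Each service mapped to the services it lists under `links`,
# transcribed by hand from the compose file above.
LINKS: Dict[str, List[str]] = {
    "web": ["db", "rabbitmq"],
    "rabbitmq": [],
    "db": [],
    "virtuoso": [],
    "nginx": ["web", "virtuoso"],
}


def start_order(links: Dict[str, List[str]]) -> List[str]:
    """Order services so every linked service precedes the services using it.

    TopologicalSorter treats each mapped list as the node's predecessors.
    Raises graphlib.CycleError if the links form a cycle.
    """
    return list(TopologicalSorter(links).static_order())
```

One valid order puts `db`, `rabbitmq` and `virtuoso` first, then `web`, then `nginx`; several orders satisfy the constraints, so code should rely only on the before/after relations, not on one exact sequence.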
LODStats: Provisioning (cont.)
```shell
$ git clone https://github.com/AKSW/lodstats.docker
$ cd lodstats.docker
$ docker-compose build
$ docker-compose up -d
```
Data Model
Data Web Statistics Summary
More statistics are available from the SPARQL endpoint.

| Metric | 2011 | 2016 |
|---|---|---|
| Datasets | 422 | 9,644 |
| Links | 3% | 40% |
| Data portals | datahub.io | publicdata.eu, data.gov, datahub.io |
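Pulling further statistics from the SPARQL endpoint follows the SPARQL 1.1 Protocol (the query goes in the `query` parameter of a GET request) and the SPARQL 1.1 JSON results format. A minimal sketch: the endpoint URL is the one listed under Availability, while the query assumes, without confirmation from the slides, that statistics are published with VoID terms such as `void:triples`; all helper names are hypothetical.

```python
import json
import urllib.request
from typing import Dict, List
from urllib.parse import urlencode

ENDPOINT = "http://lodstats.aksw.org/sparql"

# Assumption: dataset statistics are exposed with VoID terms such as void:triples.
TOP_DATASETS_QUERY = """
PREFIX void: <http://rdfs.org/ns/void#>
SELECT ?dataset ?triples WHERE {
  ?dataset void:triples ?triples .
}
ORDER BY DESC(?triples)
LIMIT 10
"""


def query_url(endpoint: str, query: str) -> str:
    """SPARQL 1.1 Protocol: send the query as the 'query' parameter of a GET."""
    return endpoint + "?" + urlencode({"query": query})


def parse_bindings(results_json: str) -> List[Dict[str, str]]:
    """Flatten SPARQL 1.1 JSON results into one dict of plain values per row."""
    data = json.loads(results_json)
    variables = data["head"]["vars"]
    return [
        {var: binding[var]["value"] for var in variables if var in binding}
        for binding in data["results"]["bindings"]
    ]


def run_query(endpoint: str, query: str) -> List[Dict[str, str]]:
    """Perform the HTTP request (requires network access to the endpoint)."""
    request = urllib.request.Request(
        query_url(endpoint, query),
        headers={"Accept": "application/sparql-results+json"},
    )
    with urllib.request.urlopen(request, timeout=30) as response:
        return parse_bindings(response.read().decode("utf-8"))
```

Usage would be `run_query(ENDPOINT, TOP_DATASETS_QUERY)`; if the store uses different predicates, only the query string needs to change, since the request and result parsing are generic.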
Use Cases
How can you use LODStats data?
● Privacy Analysis: Does the dataset contain sensitive information?
● Coverage Analysis: Does the dataset contain the necessary information?
● Quality Analysis: Define quality metrics using statistical data.
● Vocabulary Reuse: Find a suitable vocabulary for your dataset.
● Link Target Identification: Which datasets are good candidates for interlinking?
“Detecting Similar Linked Datasets Using Topic Modelling” (ESWC’2016) [3]
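For link target identification, [3] uses topic modelling. The sketch below swaps in a much simpler technique, Jaccard overlap of per-dataset vocabulary sets, purely to show how usage statistics can rank interlinking candidates. It assumes each dataset's vocabularies can be read off its statistics as a set of names; the dataset names and `link_candidates` are hypothetical.

```python
from typing import Dict, List, Set, Tuple


def jaccard(a: Set[str], b: Set[str]) -> float:
    """|A ∩ B| / |A ∪ B|; defined as 0.0 when both sets are empty."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def link_candidates(target: str,
                    vocabularies: Dict[str, Set[str]],
                    top_k: int = 3) -> List[Tuple[str, float]]:
    """Rank other datasets by vocabulary overlap with the target dataset."""
    base = vocabularies[target]
    scored = [
        (name, jaccard(base, vocab))
        for name, vocab in vocabularies.items()
        if name != target
    ]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:top_k]
```

For example, with `ds-a` using `{foaf, dcterms, geo}` and `ds-b` using `{foaf, dcterms}`, the overlap is 2/3, so `ds-b` ranks above a dataset that only uses `skos`. Shared vocabularies are only a weak signal of shared entities, which is one reason richer models such as the topic modelling in [3] are used.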
Availability
● Application
  ○ Online at: http://lodstats.aksw.org
  ○ LODStats processing module: https://github.com/aksw/lodstats
  ○ LODStats frontend including SPARQLify mappings: https://github.com/aksw/lodstats_www
  ○ Deployment setup (Docker): https://github.com/AKSW/lodstats.docker
● Dataset
  ○ Online at: http://lodstats.aksw.org/sparql
  ○ Datahub.io: https://datahub.io/dataset/lodstats
  ○ Can be deployed in Virtuoso using docker-compose from the deployment repo
Conclusions & Future Work
● LODStats is easily replicable using Docker technology
● Processing of very large datasets (Spark/Hadoop)
● Improving usability of the frontend
● Extending data collection to crawling
Contact Information
Address: Augustusplatz 10, Room P905, 04109 Leipzig, Germany
Phone: +49-341-97-32260
twitter.com/akswgroup
http://aksw.com/IvanErmilov
Thank You
Ivan Ermilov <[email protected]>
References
[1] Jan Demter, Sören Auer, Michael Martin, and Jens Lehmann. LODStats – An Extensible Framework for High-performance Dataset Analytics. In Proceedings of EKAW 2012.
[2] Ivan Ermilov, Michael Martin, Jens Lehmann, and Sören Auer. Linked Open Data Statistics: Collection and Exploitation. In Proceedings of the 4th Conference on Knowledge Engineering and Semantic Web (KESW 2013).
[3] Michael Röder, Axel-Cyrille Ngonga Ngomo, Ivan Ermilov, and Andreas Both. Detecting Similar Linked Datasets Using Topic Modelling. In The Semantic Web. Latest Advances and New Domains: 13th International Conference, ESWC 2016, Heraklion, Crete, Greece, May 29 – June 2, 2016, Proceedings.