Addressing exploitability of Smart City data
Enrico Daga, Mathieu d’Aquin, Alessandro Adamou, Enrico Motta
Data Science Group, Knowledge Media Institute, The Open University, Milton Keynes (UK)
Feedback: @enridaga @datasciencegr #kmiou
September 13th, 2016 - Trento (Italy)
IEEE International Smart Cities Conference (ISC2)
http://events.unitn.it/en/isc2-2016
Smart Bins to make garbage collection more efficient
Monitor parking spaces to support citizens’ mobility
Observe busyness of places to better tune services
Forecast car accidents to improve drivers’ awareness
MK:Smart is an integrated innovation and support programme leveraging large-scale city data to drive growth in Milton Keynes (UK) [1].
Smart City data
https://datahub.mksmart.org
Data Hub: Onboarding → Acquisition → Processing → Delivery
It is a loop!
Top MK!
Top MK is a virtual card playing game where each card represents a ward in Milton Keynes, with characteristics such as area, population, level of qualifications, etc. Two players, one human and the other automatic, try to win the other’s cards by choosing the characteristic that has the best chance to win against the other card.
https://data.beta.mksmart.org/apps/topmk/
The problem of exploitability
• Data come from different owners and have different licenses.
• Data are processed into new data before being reused.
• What are the policies that apply to the output data?
• Can we make use of it in a commercial setting?
Could Top Trumps sell this game?
"Data exploitability" is the assessment of the policies associated with the data resulting from the computation of diverse datasets in complex data flows.
Under the hood - 1/5
The Entity-Centric API (ECApi) offers an entity-based access point to the information offered by the Data Hub [2].
https://data.mksmart.org/entity/ward/newport_pagnell_north
{
  "global:religion": [{
    "global:sikh": ["16"],
    "global:no_religion": ["2323"],
    ...
  }],
  "global:maritalStatus": [{
    "global:in_a_registered_same-sex_civil_partnership": ["11"],
    "global:married": ["3290"],
    ...
  }],
  "global:economicActivity": [{
    "global:unemployed:_never_worked": ["15"],
    "global:unemployed:_age_50_to_74": ["33"],
    "global:in_employment": ["3785"],
    "global:unemployed:_age_16_to_24": ["48"],
    "global:long-term_unemployed": ["49"],
    ...
  }],
  "global:percentInBasicSkills": [{
    "global:literacy_level_1": ["47.41344196"],
    "global:literacy_level_2": ["46.23217923"],
    "global:numeracy_level_1_2.5percentci": ["18.13034623"],
    "global:numeracy_level_1": ["32.38289206"],
    ...
  }],
  "global:peopleInAgeGroups": [{
    "global:age_85_to_89": ["152"],
    "global:age_20_to_24": ["393"],
    ...
  }],
  "global:qualifications": [{
    "global:full-time_students:_age_18_to_74:_economically_inactive": ["61"],
    "global:highest_level_of_qualification:_level_4_qualifications_and_above": ["1413"],
    "global:highest_level_of_qualification:_level_1_qualifications": ["1042"],
    "global:highest_level_of_qualification:_level_3_qualifications": ["794"],
    "global:highest_level_of_qualification:_level_2_qualifications": ["1050"],
    "global:full-time_students:_age_18_to_74:_economically_active:_unemployed": ["17"],
    "global:highest_level_of_qualification:_apprenticeship": ["327"],
    "global:highest_level_of_qualification:_other_qualifications": ["271"],
    "global:full-time_students:_age_18_to_74:_economically_active:_in_employment": ["84"],
    "global:no_qualifications": ["1167"],
    "global:schoolchildren_and_full-time_students:_age_18_and_over": ["163"],
    "global:schoolchildren_and_full-time_students:_age_16_to_17": ["165"],
    "global:all_usual_residents_aged_16_and_over": ["6064"]
  }]
(Some logic here)
Entity-Centric API (ECApi)
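As a rough illustration of how an application such as Top MK might read one characteristic from this response, here is a minimal sketch. The sample dictionary copies only a few values from the excerpt above; the helper name `get_value` is hypothetical and is not part of the ECApi.

```python
import json

# A fragment shaped like the ECApi response for the ward shown above.
# Only a handful of values are copied from the slide; everything else is omitted.
SAMPLE = json.loads("""
{
  "global:qualifications": [{
    "global:no_qualifications": ["1167"],
    "global:all_usual_residents_aged_16_and_over": ["6064"]
  }],
  "global:peopleInAgeGroups": [{
    "global:age_85_to_89": ["152"],
    "global:age_20_to_24": ["393"]
  }]
}
""")


def get_value(entity, group, attribute):
    """Return the first value of `attribute` inside `group` as a float, or None."""
    for block in entity.get(group, []):
        values = block.get(attribute)
        if values:
            return float(values[0])
    return None


no_qual = get_value(SAMPLE, "global:qualifications", "global:no_qualifications")
residents = get_value(SAMPLE, "global:qualifications",
                      "global:all_usual_residents_aged_16_and_over")

# A derived characteristic a card could display: share of residents without qualifications.
share_without_qualifications = no_qual / residents
```

Values arrive as lists of strings, so the sketch converts the first element to a number before computing with it.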
The Data Hub offers a provenance access point that exposes the metadata of the datasets, including ownership and licenses.
{
  "dataset": "urn:census/ks501-qualification",
  "description": {
    "global:owner": ["Milton Keynes Council"],
    "global:title": ["Census 2011 - Qualifications in Milton Keynes' wards"],
    "global:uuid": ["3f6c6107-835c-45ee-b8b4-83c2099b4084"],
    "global:issued": ["2015-10-12 19:18:36"],
    "global:distribution": ["http://data.mksmart.org/entity/thing/www:uri/datahub.mksmart.org/ns/distribution/3527333636"],
    "global:modified": ["2016-09-06 12:03:14"],
    "global:type": ["http://data.mksmart.org/entity/thing/www:uri/www.w3.org/ns/dcat#Dataset"],
    "global:format": ["CSV"],
    "global:landingPage": ["http://data.mksmart.org/entity/thing/www:uri/https://datahub.mksmart.org/dataset/census-2011-qualifications-in-milton-keynes-wards/"],
    "global:homepage": ["https://datahub.mksmart.org/dataset/census-2011-qualifications-in-milton-keynes-wards/"],
    "global:name": ["census-2011-qualifications-in-milton-keynes-wards"],
    "global:attribution": [""],
    "global:policy": ["http://data.mksmart.org/entity/policy/open-government-license"],
    "@id": "urn:census/ks501-qualification",
    "global:api": ["https://datahub.mksmart.org/data-catalogue-api/?action=dataset&name=census-2011-qualifications-in-milton-keynes-wards"]
  },
  "attributes": [
    "global:qualifications/global:all_usual_residents_aged_16_and_over",
    "global:qualifications/global:full-time_students:_age_18_to_74:_economically_active:_in_employment",
    "global:qualifications/global:full-time_students:_age_18_to_74:_economically_active:_unemployed",
    "global:qualifications/global:full-time_students:_age_18_to_74:_economically_inactive",
    …
  ]
},
https://data.mksmart.org/entity/ward/newport_pagnell_north.prov
"global:qualifications" attributes come from the "Census 2011 - Qualifications in Milton Keynes' wards" dataset, distributed under the Open Government License.
Under the hood - 2/5
Provenance
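A minimal sketch of how the provenance response could be used to find the policies behind a group of attributes. It assumes the `.prov` endpoint returns a list of records shaped like the one above (the trailing comma in the excerpt suggests this, but it is an assumption); `policies_for` is a hypothetical helper.

```python
# One provenance record, trimmed from the excerpt above.
PROV = [
    {
        "dataset": "urn:census/ks501-qualification",
        "description": {
            "global:owner": ["Milton Keynes Council"],
            "global:policy": [
                "http://data.mksmart.org/entity/policy/open-government-license"
            ],
        },
        "attributes": [
            "global:qualifications/global:all_usual_residents_aged_16_and_over",
        ],
    },
]


def policies_for(prov, attribute_prefix):
    """Collect the policy URIs of every dataset that contributes matching attributes."""
    found = set()
    for record in prov:
        if any(a.startswith(attribute_prefix) for a in record["attributes"]):
            found.update(record["description"].get("global:policy", []))
    return found


qualification_policies = policies_for(PROV, "global:qualifications")
```

An attribute group with no contributing dataset yields an empty set, which a caller should treat as "policy unknown" rather than "no restrictions".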
{
  "global:type": ["http://data.mksmart.org/entity/thing/www:uri/datahub.mksmart.org/ns/schema/RedistributionPolicy"],
  "global:landingPage": [
    "http://data.mksmart.org/entity/thing/www:uri/https://datahub.mksmart.org/policy/open-government-license/",
    "http://data.mksmart.org/entity/thing/www:uri/https://datahub.beta.mksmart.org/policy/open-government-license/"
  ],
  "global:description": [""],
  "global:title": ["Open Government License"],
  "global:homepage": [
    "https://datahub.beta.mksmart.org/policy/open-government-license/",
    "https://datahub.mksmart.org/policy/open-government-license/"
  ],
  "global:name": ["open-government-license"],
  "global:api": [
    "https://datahub.mksmart.org/data-catalogue-api/?action=policy&id=open-government-license",
    "https://datahub.beta.mksmart.org/data-catalogue-api/?action=policy&id=open-government-license"
  ],
  "global:permission": [
    "http://data.mksmart.org/entity/thing/www:uri/permission:publish-1441",
    "http://data.mksmart.org/entity/thing/www:uri/permission:redistribute-1441",
    "http://data.mksmart.org/entity/thing/www:uri/permission:use-1441",
    "http://data.mksmart.org/entity/thing/www:uri/permission:copy-1441",
    "http://data.mksmart.org/entity/thing/www:uri/permission:reproduce-1441",
    "http://data.mksmart.org/entity/thing/www:uri/permission:combine-1441",
    "http://data.mksmart.org/entity/thing/www:uri/permission:commercialize-1441",
    "http://data.mksmart.org/entity/thing/www:uri/permission:adapt-1441",
    "http://data.mksmart.org/entity/thing/www:uri/permission:transmit-1441",
    "http://data.mksmart.org/entity/thing/www:uri/permission:extract-1441",
    "http://data.mksmart.org/entity/thing/www:uri/permission:derive-1441"
  ]
}
http://data.mksmart.org/entity/policy/open-government-license
Licenses are described as machine-readable policies: permissions, prohibitions or duties [3].
Good news: this is OGL, so the data can be used in commercial applications.
Under the hood - 3/5
License
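A minimal sketch of checking a machine-readable policy for a given permission. It relies only on the `global:permission` key visible in the excerpt above and assumes the trailing `-1441` is an identifier suffix that can be dropped; `actions` and `allows` are hypothetical helpers.

```python
# Trimmed copy of the Open Government License policy shown above.
OGL = {
    "global:name": ["open-government-license"],
    "global:permission": [
        "http://data.mksmart.org/entity/thing/www:uri/permission:use-1441",
        "http://data.mksmart.org/entity/thing/www:uri/permission:commercialize-1441",
        "http://data.mksmart.org/entity/thing/www:uri/permission:derive-1441",
    ],
}


def actions(policy, kind="global:permission"):
    """Extract action names such as 'commercialize' from the policy's URI list."""
    names = set()
    for uri in policy.get(kind, []):
        tail = uri.rsplit("/", 1)[-1]        # "permission:commercialize-1441"
        action = tail.split(":", 1)[-1]      # "commercialize-1441"
        names.add(action.rsplit("-", 1)[0])  # "commercialize"
    return names


def allows(policy, action):
    """True when the policy lists `action` among its permissions."""
    return action in actions(policy)


can_sell = allows(OGL, "commercialize")
```

A permission on the input dataset is not yet an answer for the game: the output data only inherits it if the permission propagates through the data flow.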
Under the hood - 4/5
Data flow
Data flows can be represented with the Datanode ontology [4] as graphs of data “nodes”.
(The logic here)
http://purl.org/datanode/ns/
http://purl.org/datanode/docs/
This is the semantics behind the code!
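A minimal sketch of a data flow held as a graph of labelled edges between data nodes. It uses plain tuples rather than RDF; the node names `dataset1`, `node23`, `node16` and `output` and the "processed into" label echo the slides, while the exact chain connecting them is an assumption made for illustration.

```python
from collections import defaultdict

# Each edge is (source node, relation, target node).
# The chain below is illustrative; only the labels come from the slides.
FLOW = [
    ("dataset1", "processed into", "node23"),
    ("node23", "processed into", "node16"),
    ("node16", "processed into", "output"),
]


def downstream(flow, start):
    """Return every node reachable from `start` by following edges forward."""
    successors = defaultdict(list)
    for source, _relation, target in flow:
        successors[source].append(target)
    seen, stack = set(), [start]
    while stack:
        node = stack.pop()
        for nxt in successors[node]:
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return seen


affected = downstream(FLOW, "dataset1")
```

Keeping the relation label on each edge matters: whether a policy travels along an edge depends on the kind of relation, not just on connectivity.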
Under the hood - 5/5
Reasoning on Policy Propagation
Machine-readable policies and data flows allow us to reason on policy propagation exploiting Policy Propagation Rules (PPR) [5].
https://github.com/enridaga/pprreasoner/
Provenance and License:
  has(dataset1, permission:commercialise)
  has(dataset1, duty:attribution)
Data flow:
  relation(processed into, node23, node16)
Policy Propagation Rule:
  propagates(permission:commercialise, processed into)
Rule engine:
  has(X,P) ⋀ propagates(P,R) ⋀ relation(R,X,Y) → has(Y,P)
Propagated policies:
  has(output, duty:attribution)
  has(output, permission:commercialise)
These are the policies of the output data!
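A minimal forward-chaining sketch of the rule has(X,P) ⋀ propagates(P,R) ⋀ relation(R,X,Y) → has(Y,P). This is not the PPR reasoner linked above, only an illustration. The single edge from `dataset1` to `output` collapses the real data flow, and the entry letting `duty:attribution` propagate is an assumption (the slide lists only the commercialise rule, but the output carries the attribution duty).

```python
def propagate(has, propagates, relations):
    """Apply the propagation rule repeatedly until no new fact is derived.

    has:        set of (node, policy)
    propagates: set of (policy, relation)
    relations:  iterable of (relation, source, target), ordered as in the rule
    """
    derived = set(has)
    changed = True
    while changed:
        changed = False
        for relation, x, y in relations:
            for node, policy in list(derived):
                if (node == x
                        and (policy, relation) in propagates
                        and (y, policy) not in derived):
                    derived.add((y, policy))
                    changed = True
    return derived


HAS = {
    ("dataset1", "permission:commercialise"),
    ("dataset1", "duty:attribution"),
}
PROPAGATES = {
    ("permission:commercialise", "processed into"),
    ("duty:attribution", "processed into"),  # assumed, see lead-in
}
RELATIONS = [("processed into", "dataset1", "output")]  # simplified flow

result = propagate(HAS, PROPAGATES, RELATIONS)
output_policies = {policy for node, policy in result if node == "output"}
```

Looping to a fixpoint lets policies travel across chains of several relations, at the cost of rescanning all facts on each pass; a production reasoner would index the facts instead.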
The problem of exploitability (reprise)
Could Top Trumps sell this game?
Yes (but they must include attribution statements).
How can we make it work at scale?
• Represent diversity of datasets, licenses and data flows
• Support developers in the assessment of policies associated with the data and how they affect their data flows
Metadata Supply Chain - 1/2
Approach
Data cataloguing as the backbone of data governance.
Follow the journey of the data and trace the semantics, respecting the diversity of datasets, licenses and data flows.
(Meta)data Catalogue: Record, Content, Data flow, Provenance
Onboarding: Set up a catalogue record of the data source
Acquisition: Extract content metadata (timeliness, validity, …)
Processing: Describe the data flow; reason on policy propagation
Delivery: Provide provenance information
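One way to picture the catalogue entry growing along the supply chain is a record with one part per phase. This is a sketch under assumptions: the class name, field names and sample values are hypothetical and do not reflect the Data Hub's actual schema.

```python
from dataclasses import dataclass, field


@dataclass
class CatalogueEntry:
    """Metadata gathered for one data source as it moves through the phases."""
    record: dict = field(default_factory=dict)      # filled at Onboarding
    content: dict = field(default_factory=dict)     # filled at Acquisition
    data_flow: list = field(default_factory=list)   # filled at Processing
    provenance: dict = field(default_factory=dict)  # filled at Delivery

    def missing(self):
        """Names of the metadata parts that are still empty."""
        parts = ("record", "content", "data_flow", "provenance")
        return [name for name in parts if not getattr(self, name)]


entry = CatalogueEntry()
# Onboarding: register the source and its license (illustrative values).
entry.record = {"title": "Census 2011 - Qualifications",
                "policy": "open-government-license"}
still_missing = entry.missing()
```

Tracking which parts are missing makes it explicit that exploitability can only be assessed once the record, the data flow and the license policies are all in the catalogue.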
Metadata Supply Chain - 2/2
Evaluation (can we really do that?)
Considering a given set of assumptions (details in the paper…):
• Data provider specifies a single License
• Same License for any user
• License is described in the catalogue
• License policies are referenced by Policy Propagation Rules

• Data source is accessible
• Acquisition processes respect the data source License

• Data flows can be described with Datanode
• ETL pipelines do not violate the policies
• Process executions do not influence policy propagation

• Data flow descriptions and License policies enable reasoning on policy propagation
• End-user access methods provide provenance information

An end-to-end solution for exploitability assessment can be implemented.
Lessons learnt
• Assessing exploitability of smart city data is possible following a holistic approach to data cataloguing:
  • understanding the semantics of data flows;
  • understanding the role of policies (licences).
• New open challenges:
  • Handle the diversity of policies and consequently the size of Policy Propagation Rules [3].
  • Support Data providers in the selection of the right license [6].
  • Support developers in the definition of data flows [7].
  • Integrate validation of propagated policies [8].
  • Integrate validation of data flows with respect to policies.
  • Reasoning with process execution traces (not only at design time).
• We need an end-user evaluation “in the wild”.
Thank you
https://dsg.kmi.open.ac.uk/data-exploitability-how-to-achieve-it/
References
[1] M. d’Aquin, J. Davies, and E. Motta. Smart cities’ data: Challenges and opportunities for semantic technologies. Internet Computing, IEEE, 19(6):66–70, 2015.
[2] A. Adamou and M. d’Aquin. On requirements for federated data integration as a compilation process. In Proceedings of 2nd International Workshop on Dataset PROFIling and fEderated Search for Linked Data (PROFILES), pages 75–80, 2015.
[3] Open Digital Rights Language (ODRL) Version 2.1. https://www.w3.org/ns/odrl/2/ODRL21 (accessed 09/09/2016).
[4] E. Daga, M. d’Aquin, A. Gangemi, and E. Motta. Describing semantic web applications through relations between data nodes. Technical Report kmi-14-05, Knowledge Media Institute, The Open University, Walton Hall, Milton Keynes, 2014.
[5] E. Daga, M. d’Aquin, A. Gangemi, and E. Motta. Propagation of policies in rich data flows. In Proceedings of the 8th International Conference on Knowledge Capture, page 5. ACM, 2015.
[6] E. Daga, M. d’Aquin, E. Motta, and A. Gangemi. A bottom-up approach for licences classification and selection. In 2015 Workshop on Legal Domain And Semantic Web Applications (LeDA-SWAn 2015), Portoroz, Slovenia, 1 June 2015.
[7] E. Daga, M. d’Aquin, A. Gangemi, and E. Motta. An incremental learning method to support the annotation of workflows with data-to-data relations. In 20th International Conference on Knowledge Engineering and Knowledge Management, Bologna, Italy, 19-23 November 2016 (accepted).
[8] H.-P. Lam and G. Governatori. The making of SPINdle. In A. Paschke, G. Governatori, and J. Hall, editors, Proc. RuleML’09, pages 315–322. Springer-Verlag, 2009.