Using Ontologies to make Smart Cities Smarter
Rosario Uceda-Sosa, Biplav Srivastava and Bob Schloss, IBM Research
{rosariou@us, rschloss@us, sbiplav@in}.ibm.com, June 2012
A Semantic Data Model for Smart Cities
An ontology can make a city interconnected and smart, but it must take into account that:
1. Cities have their own data sources, not necessarily connected, and may not want to consolidate them.
2. Cities have non-standard organizations, departments and competencies.
A semantic data model (an ontology) of a city, if it is complete and authoritative, (1) simplifies the development of applications that require integrated access to city data sources and (2) enables solution reuse as we move from one city to the next.
Independently of whether ETL is used for data consolidation, a semantic data model (3) can extend the metadata with new categories (SanitationServices, CrimesAgainstProperty) without modifying the application or the data sources.
[Figure: data sources and data model, optionally consolidated via ETL, beneath a semantic data model; labels: 2. Reuse, 3. Metadata extensions]
… but what is an ontology, anyway?
What do you think?
In a Smart City domain, we’re concerned with modeling the city data (city activity data, city departments, assets, KPIs), not the city itself (the full set of spatial and temporal relations between people and objects in the city). Ontologies help us to structure and reason about city events, entities and services.
Ontology = Class + Relations + Constraints
Knowledge Base = Ontology + instances + (Standard) Inference and rules
In Computer Science, “An ontology is a formal explicit description of concepts in a domain of discourse (classes (sometimes called concepts)), properties of each concept describing various features and attributes of the concept (slots (sometimes called roles or properties)), and restrictions on slots (facets (sometimes called role restrictions)). An ontology together with a set of individual instances of classes constitutes a knowledge base. In reality, there is a fine line where the ontology ends and the knowledge base begins.” [Noy, 2000]
Not to be confused with ontologies (and/or taxonomies) in Philosophy or Life Sciences
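To make "Knowledge Base = Ontology + instances + (standard) inference" concrete, here is a minimal sketch using plain tuples as triples. The class and instance names are illustrative assumptions, and only two standard RDFS-style rules are implemented (subclass transitivity and type propagation); a real deployment would rely on an RDF store and reasoner such as Jena instead.

```python
# Minimal sketch: knowledge base = ontology triples + instance triples + inference.
# Class and instance names are illustrative assumptions, not SCRIBE's actual terms.
SUBCLASS = "subClassOf"
TYPE = "type"

ontology = {
    ("ServiceRequest", SUBCLASS, "Event"),
    ("SanitationRequest", SUBCLASS, "ServiceRequest"),
}
instances = {
    ("req42", TYPE, "SanitationRequest"),
}


def infer(triples):
    """Apply two RDFS-style rules until nothing new can be derived:
    subclass transitivity, and propagation of an instance's type to superclasses."""
    kb = set(triples)
    while True:
        new = set()
        for a, p, b in kb:
            for c, q, d in kb:
                # (a p b) and (b subClassOf d) => (a p d), for p in {subClassOf, type}
                if q == SUBCLASS and b == c and p in (SUBCLASS, TYPE):
                    new.add((a, p, d))
        if new <= kb:
            return kb
        kb |= new


kb = infer(ontology | instances)
print(sorted(t for t in kb if t[0] == "req42"))
```

Only req42's most specific type was asserted; the other two types are derived, which is the "inference" term of the formula.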
Not all ontologies are created equal
SCRIBE belongs to the fourth category, ontologies as data integrators: it has no constraints and was designed to support the programming of tools that let domain experts work with entities natural to them (even if the recorded data is actually distributed).
In practice, ontologies are used, together with inferencing engines and rules, for a variety of purposes. If we think of them as schemas, they can play different roles:
| Purpose | Instances | Inferencing | Examples |
|---|---|---|---|
| As a deductive system (axioms + deductive rules) | Part of the knowledge base | Defined by rules | Expert systems, planning, optimization |
| As a data blueprint: constrain a domain | Must conform to the normative schema determined by the ontology | Subsumption, class inferencing | Biomedical and life sciences (FMA, RadLex) |
| As a data classifier: classify open data | Unknown formats | Subsumption, class inferencing | Tag ontologies (MOAT, Echarte, SCOT, NAO, etc.) |
| As a data integrator: integrate a predefined model with existing data sources | Instances are mapped; no constraint enforcement | Subsumption, class and entity inferencing | SCRIBE |
| As a data mapping vocabulary: map to/from existing data sources | Mined instances determine the ontology/schema | Subsumption, class inferencing | D2RQ (a tool) |

Schema roles marked on the slide: normative schema; integrative schema, which depends on instances.
What makes a good ontology for data integration?
Human Usability
Communicable. Naming, natural language support, etc.
Concise. A simple way to describe the key entities of the model and yet able to infer many facts
Consistent. Naming conventions and modeling patterns
Authoritative to domain experts
Documented, not just descriptions, but also provenance
Managed and maintained by people throughout the model lifecycle.
Reusable in similar domains, for similar instances.
System Usability
Scalable so large amounts of data can be parsed, stored and retrieved.
Efficient query and inferencing
Programmable solutions, both in open and closed data paradigms.
Open infrastructure and tools
A good ontology is a useful ontology, an ontology that both humans and systems can process.
The SCRIBE Model of Cities
SCRIBE design decisions
Human Usability
Communicable. Naming, natural language support, etc. SCRIBE: natural-language naming, user-readable labels.
Concise. A simple way to describe the key entities of the model and yet able to infer many facts. SCRIBE: anchor classes (events, services, assets, KPIs); a simple and expressive OWL sublanguage; relation taxonomies.
Consistent. Naming conventions and modeling patterns. SCRIBE: clear boundaries between classes and instances.
Authoritative to domain experts. SCRIBE: alignment with standards.
Documented, not just descriptions, but also provenance. SCRIBE: a wealth of annotations.
Managed and maintained by people throughout the model lifecycle. SCRIBE: class stewards; involvement of domain experts and end users.
Reusable in similar domains, for similar instances. SCRIBE: mechanisms for modularization of extensions and customizations.
System Usability
Scalable, so large amounts of data can be parsed, stored and retrieved. SCRIBE: caching mechanisms for DB data (?)
Efficient query and inferencing. SCRIBE: ontology-based inferencing (?)
Programmable solutions, both in open and closed data paradigms. SCRIBE: data adapters and schema exploring (?)
Open infrastructure and tools. SCRIBE: Jena, DB2DRQ, Ruby on Rails, etc.
Authoritative
• Aligned with standards (CAP, NIEM, MISA/MRM, UCore)
• Validated with customer scenarios
• Validated with open city data
SCRIBE data model
SCRIBE is a non-normative, authoritative, modular, extensible semantic model for Smarter Cities.
It consists of a Core Model of common classes (events and messages, stakeholders, departments, services, city landmarks and resources, KPIs, etc.), plus extensions by domain and customizations by city.
Simple language
• Classes + Inheritance + Relations + Inferencing
• Based on standards (OWL-QL, SPARQL)
• Mappable to UML
• Metadata annotations and tagging
[Figure: SCRIBE Core Model (common building blocks; organization/operation profile), domain extensions (Weather, Water, Transportation, BuildingAndParcel, AssetManagement) and city customization]
The key concepts of the SCRIBE Ontology
1. Describes messages, events and services as they flow through the system
[Figure: Message (e.g., Advisory), Event (e.g., Storm, RoadWork), WorkItem (e.g., RoadWorkWI) and Protocol (e.g., InfrastructureWorkP), connected by before/after triggers]
2. Represents types of city services (not the city organization itself) so the administrative structure of a city can be assembled from SCRIBE building blocks
[Figure: Agency (e.g., WhitePlainsTraffic), CityServiceArea and Asset (e.g., pipe, valve), linked by an Owns relation]
City and Government Standards and SCRIBE
While most of the standards relevant to Smarter Cities are message exchange models (CAP, UCore, NIEM) or business planning models (MISA/MRM), SCRIBE integrates (1) message-based models with (2) asset management and (3) services and their KPIs in an extensible model.
| | CAP | UCore | NIEM | MISA/MRM |
|---|---|---|---|---|
| Core entities | Alert, message (certainty, severity, urgency) | Incident | People, places, events and things | Program, service, outcome, target group |
| Advantages | Simple to implement and read. Established standard. | Extension mechanisms defined. Supported by DoD, DHS, DoJ. | Tools for search and subset extraction (SSTG). Established standard. Well-defined extension process (IEPD). | International, municipality based. |
| Issues | Subject and related resources are underdefined. | Not mature enough, incomplete. | Large (4000 concepts) and cumbersome (even with support tools). Not deep in any domain. | Represents the administration and business planning of a city, not its operation. Cumbersome to extend. |
| Representational language | XML | XML | XML with schema substitution for inheritance | XML (RDFS?) |
Smarter City Standards and SCRIBE
(1) A message is an event (with publishers/subscribers or requestors/responders), and its subject is an (external or processing) event. In principle, a message could refer to another message.
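The message/event relationship in (1) can be sketched with Python dataclasses. This is only an illustration of the modeling pattern, not SCRIBE's OWL definition; the attribute names and the "WeatherService" publisher are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Event:
    id: str


@dataclass
class ExternalEvent(Event):
    kind: str  # e.g. "Storm"


@dataclass
class Message(Event):
    """A message is itself an event; its subject is another event,
    which may in turn be a message."""
    subject: Event
    publishers: List[str] = field(default_factory=list)
    subscribers: List[str] = field(default_factory=list)


def root_subject(ev: Event) -> Event:
    """Follow message subjects until reaching an event that is not a message."""
    while isinstance(ev, Message):
        ev = ev.subject
    return ev


storm = ExternalEvent(id="e1", kind="Storm")
advisory = Message(id="m1", subject=storm, publishers=["WeatherService"])  # hypothetical publisher
relay = Message(id="m2", subject=advisory)  # a message about another message
print(root_subject(relay).kind)
```

Because Message is-a Event, any code that handles events (display, filtering, KPIs) also handles messages without special cases.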
[Figure: alignment of SCRIBE with standards. A Message is-a Event and has an ExternalEvent as its subject (1). Other elements: Entity (Person, Organization, item); Role (Person, Organization); Stakeholder; relations hasRole, isStakeholder and causes; ServiceArea (Public Safety, etc.); Organization (CityOrganization); Asset; WorkItem; RoadRepair WorkOrder. Sample instances: Stakeholder1, TransportationDept, Planner Tom Travis, Intersection: Main And Hamilton. Regions marked NIEM-based, CAP-based and Maximo-based (overlap, superset, etc.).]
The SCRIBE Metadata
Inferencing and object properties
There are three types of ‘horizontal’ relations:
• HasAttribute (inv. attributeOf) for properties and attributes (name, identifier, etc.)
• HasAggregateMember (inv. aggregateMemberOf) for parts or members (hasChild; a process has process steps as members)
• AssociatedTo (its own inverse) for everything else
We can do inferencing on extensions to SCRIBE
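A minimal sketch of why this relation taxonomy supports inferencing on extensions: an extension only declares which horizontal relation its new relation specializes, and a query written against the core relation still finds the new facts. The dictionaries stand in for OWL subPropertyOf/inverseOf declarations, and hasProcessStep, RepairProcess and InspectPipe are hypothetical names.

```python
# The three top-level 'horizontal' relations and their inverses.
INVERSE = {
    "hasAttribute": "attributeOf",
    "attributeOf": "hasAttribute",
    "hasAggregateMember": "aggregateMemberOf",
    "aggregateMemberOf": "hasAggregateMember",
    "associatedTo": "associatedTo",  # its own inverse
}

# An extension only declares where a new relation sits in the taxonomy.
# hasProcessStep is a hypothetical extension relation.
SUPER = {
    "hasChild": "hasAggregateMember",
    "hasProcessStep": "hasAggregateMember",
}


def lineage(rel):
    """Return rel followed by all of its super-relations."""
    chain = [rel]
    while chain[-1] in SUPER:
        chain.append(SUPER[chain[-1]])
    return chain


def infer(triples):
    """Add super-relation triples and inverse triples for every asserted fact."""
    kb = set()
    for s, p, o in triples:
        for rel in lineage(p):
            kb.add((s, rel, o))
            if rel in INVERSE:
                kb.add((o, INVERSE[rel], s))
    return kb


facts = {("RepairProcess", "hasProcessStep", "InspectPipe")}
kb = infer(facts)
# A query written only against the core relation still finds the extension's fact.
members = sorted(o for s, p, o in kb if s == "RepairProcess" and p == "hasAggregateMember")
print(members)
```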
SCRIBE tooling
[Figure: database schema beneath a semantic model of events, city assets, geography and resources, city organization and services, KPIs and processes; used by end users and by application developers/consultants]
SCRIBE is also (a) a modeling process and (b) a set of tools that make the model usable. The first tool we’ve worked on, MIDO (Mapping Instance Data to Ontologies), maps existing data to the SCRIBE model and is part of the process of customizing SCRIBE for a new city.
[Figure: City Data Catalog. Content: simple subset of OWL, directly mappable to UML. Implementation and model tooling: edit and extend the model; query and navigate model and data; customize the model; integrate with data. Tools: standard OWL/XML (TopBraid, Protégé, Pellet, SPARQL, etc.); MIDO, DB2RQL, R2DQ, etc.; form-based queries? record-based navigation? Layers: SCRIBE Core Model, City Customization, MIDO.]
SCRIBE is written using standard RDF/OWL editors and software (Jena).
Customizing SCRIBE in different cities
SCRIBE is NOT closed. We know that cities have different organizations, different service levels and different KPIs. The SCRIBE model is designed to provide the building blocks (service types, city departments, KPI taxonomies, CAP messages) that can be customized to define the overall operations of a city.
[Figure: SCRIBE Core, built from standards (CAP, NIEM, MISA/MRM, etc.) and scenarios/data (cities' open data), customized for Washington D.C., Chicago and Dublin, each with its own services, departments, assets and KPIs. MIDO maps city data to SCRIBE and populates the model with instance data.]
Scenario
311 events in Washington D.C.
Suppose a Smarter City application that manages city operations wants to display citizen complaints (311 calls) on a map, filtered by a few user-defined constraints (times, locations, type of call, etc.)
A fraction of the 311 incident table (from DC Open Data) is below. Among the data we have:
• Identifier
• Type of service (code + description)
• Time (ServiceOrderDate, ServiceResolution date, etc.)
• Place (Lon/Lat, Ward, PSA, District, etc.)
• The agency that should handle the request
• Various qualifiers (enumerated types: priority, resolution, etc.)
[Table: 311 Requests (2010)]
How to map 311 events to an existing model
The application may query the 311 table directly, selecting incidents according to given criteria: “SELECT SERVICEREQUESTID, SERVICETYPECODE, LATITUDE, LONGITUDE, WARD, DISTRICT, PSA, DATEREPORTED FROM DC911 WHERE SomeConstraintHere”
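The direct-query approach can be sketched with Python's built-in sqlite3 module. The table name service_requests and every row value are made up for illustration; only the column names come from the query above. The point is that the application hard-codes one table's column names and constraint logic.

```python
import sqlite3

# Hypothetical miniature of the 311 table; the column names follow the query
# above, while the table name and all row values are made up for illustration.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE service_requests (SERVICEREQUESTID TEXT, SERVICETYPECODE TEXT, "
    "LATITUDE REAL, LONGITUDE REAL, WARD INTEGER, DISTRICT TEXT, PSA INTEGER, "
    "DATEREPORTED TEXT)"
)
conn.executemany(
    "INSERT INTO service_requests VALUES (?, ?, ?, ?, ?, ?, ?, ?)",
    [
        ("SR-1", "SAN-01", 38.90, -77.03, 2, "D1", 101, "2010-03-01"),
        ("SR-2", "POT-01", 38.91, -77.02, 2, "D1", 102, "2010-04-10"),
        ("SR-3", "SAN-01", 38.88, -76.99, 6, "D5", 501, "2010-03-15"),
    ],
)

# The user-defined constraints (here: ward and date range) become a WHERE clause.
rows = conn.execute(
    "SELECT SERVICEREQUESTID, SERVICETYPECODE, LATITUDE, LONGITUDE "
    "FROM service_requests WHERE WARD = ? AND DATEREPORTED BETWEEN ? AND ?",
    (2, "2010-03-01", "2010-03-31"),
).fetchall()
print(rows)
conn.close()
```

Adding a second, differently shaped table (as with crime incidents below) means writing and maintaining a second query of this kind.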
Or the application may define an intermediate (data model) layer that either:
• defines a ServiceRequest object that knows how to retrieve all the data from one or more tables, or
• defines two objects: ServiceRequest, which holds the data common to all service requests, and DC311SvceReq, which captures the information specific to DC.
[Figure: options A, B and C. ServiceRequest and DC311SvceReq with attributes ID, Type, DateOrdered, Lon/Lat, …, Ward; an IS-A link to Event.]
Notice that in (C), inheritance can be applied to locations (wards, districts, addresses and Lon/Lat points are all ways to describe a location). We could also push the model further and introduce more abstractions, say, an event class that captures ID, Time, Location and Type.
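A minimal sketch of option (C) in Python, assuming an Event base class with ID, time, type and locations, and a Location abstraction with two hypothetical subclasses (Ward, LonLat). Field choices such as psa and the agency code "DPW" are illustrative, not taken from the DC data model.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import List


class Location:
    """Abstract location: wards, districts, addresses and Lon/Lat points all describe one."""


@dataclass(frozen=True)
class Ward(Location):
    number: int


@dataclass(frozen=True)
class LonLat(Location):
    lon: float
    lat: float


@dataclass
class Event:
    id: str
    time: date
    type: str
    locations: List[Location] = field(default_factory=list)


@dataclass
class ServiceRequest(Event):
    agency: str = ""  # data shared by all service requests


@dataclass
class DC311SvceReq(ServiceRequest):
    psa: int = 0  # DC-specific detail (hypothetical choice of field)


def events_in_ward(events: List[Event], n: int) -> List[str]:
    """Works on any Event subclass, because the location abstraction is shared."""
    return [e.id for e in events if Ward(n) in e.locations]


req = DC311SvceReq(id="SR-1", time=date(2010, 3, 1), type="Sanitation",
                   locations=[Ward(2), LonLat(-77.03, 38.90)], agency="DPW", psa=101)
print(events_in_ward([req], 2))
```

The catch, discussed next, is that every new source pushes more of this hierarchy (and the reasoning over it) into application code.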
Now suppose that the application wants to add the visualization of crime incidents. The corresponding open data table is shown below. Notice that it looks similar to DC311… but not quite:
• IDs have a different format
• Time is ReportDateTime, and includes a time of day, not just a date
• Offenses do not have codes
• There is no referring agency
From the point of view of the application:
We can create another query for the DC911 table and consolidate the information at the application level (this requires recompiling).
We can add types and data to the object model, but this bloats the objects.
We can use the inheritance hierarchy to refactor the information in the model. If the model is well thought out, the changes are minimal. But we will need inferencing and infrastructure to maintain the graphs; in effect, we would be replicating RDF/OWL.
[Table: Crime Incidents (2010)]
Mapping 911 (crime) incidents
… And there are net benefits to a model-driven, semantic approach:
1. Applications can be coded ‘in the abstract’. E.g., Display all current events independently of whether they are 311 or 911.
2. Applications can refine the metadata without having to touch the code or the underlying data. E.g. Display all sanitation requests
3. Applications can be shielded from the details of the databases, like in the case of implicit joins. E.g. Display the names of the dispatchers associated with active requests.
The SCRIBE model captures enough information about events to allow a small customization to work.
The right data integration point: a semantic model approach
Step 1. Customizing SCRIBE for Washington D.C.
We may want to customize SCRIBE for a variety of reasons:
• SuperCans is a DC-specific program and it will likely remain in the DC specific classes.
• CollectingIllegalDumping or SeasonalCollection were not contemplated in the core, and they may be marked for promotion at a later date (using the modelPromotion annotation)
• Adding a new data property to a core class, like a DC-specific identifier
Note that constraints and rules in the DC model do not need to be reflected in the mapping to SCRIBE.
SCRIBE captures the basics of events, service types, dates, etc. but we don’t expect the model to be comprehensive. For example, we didn’t model all the types of services that the 311 table had.
To customize SCRIBE, we created a new file for DC, importing the core model.
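The modular customization can be sketched with ontology modules represented as plain Python dicts (a hypothetical stand-in for OWL files and owl:imports). The DC module imports the core, adds its own classes and marks candidates with the modelPromotion annotation, without editing the core. The superclasses SanitationService and CityService are assumptions made for the illustration.

```python
# Hypothetical representation of ontology modules as plain dicts.
CORE = {
    "imports": [],
    "subclass": {"ServiceRequest": "Event", "SanitationService": "CityService"},
    "annotations": {},
}

DC = {
    "imports": [CORE],  # the DC file imports the core model
    "subclass": {
        "SuperCans": "SanitationService",  # stays DC-specific
        "CollectingIllegalDumping": "SanitationService",
        "SeasonalCollection": "SanitationService",
    },
    "annotations": {
        "CollectingIllegalDumping": {"modelPromotion": True},
        "SeasonalCollection": {"modelPromotion": True},
    },
}


def load(module):
    """Merge a module with everything it imports; imported definitions come first."""
    subclass, annotations = {}, {}
    for imported in module["imports"]:
        sub, ann = load(imported)
        subclass.update(sub)
        annotations.update(ann)
    subclass.update(module["subclass"])
    annotations.update(module["annotations"])
    return subclass, annotations


subclass, annotations = load(DC)
candidates = sorted(c for c, a in annotations.items() if a.get("modelPromotion"))
print(candidates)
```

Keeping city additions in their own module is what makes later promotion to the core a deliberate, reviewable step.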
Step 2. Mapping instance data to the model
Next, we map the data in the columns to either a data property (transferring the data into that data property, like in the case of SIMPLEREQUESTID) OR a class (to match an enumerated type, which in the case of SCRIBE is represented as a taxonomy of classes.)
This mapping is done through a mapping model and tool called MIDO, whose details are not covered here. However, we can assume that the columns in the two tables have been mapped to the SCRIBE model AND the instance data can be accessed through the SCRIBE model.
[Figure: SCRIBE classes and properties involved in the mapping: ServiceRequest, ServiceRequestID, ServiceType, ServiceTypeDescriptor and codeData, with relations associatedTo and hasDescriptor]
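MIDO's mapping format is not described here, so the following is only a hypothetical sketch of the two kinds of mapping the text names: a column whose value is copied into a data property, and an enumerated column whose codes select a class in a taxonomy. The codes SAN-01/POT-01 and the property names are invented.

```python
# Hypothetical mapping table (not MIDO's actual format): each mapped column
# targets either a data property or, for enumerated codes, a class.
MAPPING = {
    "SERVICEREQUESTID": ("property", "serviceRequestID"),
    "WARD": ("property", "ward"),
    "SERVICETYPECODE": ("class", {"SAN-01": "SanitationRequest", "POT-01": "PotholeRequest"}),
}


def map_row(row, base_class="ServiceRequest"):
    """Turn one table row into (set of classes, dict of data-property values)."""
    classes, props = {base_class}, {}
    for column, value in row.items():
        if column not in MAPPING:
            continue  # unmapped columns are simply not exposed through the model
        kind, target = MAPPING[column]
        if kind == "property":
            props[target] = value
        elif value in target:
            classes.add(target[value])
    return classes, props


classes, props = map_row({"SERVICEREQUESTID": "SR-1", "SERVICETYPECODE": "SAN-01",
                          "WARD": 2, "PSA": 101})
print(sorted(classes), props)
```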
Step 3. Query through the model. Query abstract classes
The data from DC Service Requests and Crime Incidents can now be queried together as events, not just as service requests or criminal incidents.
…
Notice that some of the data is missing in the original table. That is still OK.
Query: All Events in DC, with type, District and Ward
As shown previously, the inferencing in the ontology can be leveraged in a query.
Step 3. Query through the model. Annotation Metadata
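A minimal sketch of querying the abstract class: rows from both tables, assumed to be already mapped to model classes, are returned by one query on Event. In SPARQL this corresponds roughly to the property path `?e rdf:type/rdfs:subClassOf* :Event`; here the subclass walk is done by hand, and the taxonomy and instances are hypothetical.

```python
# Hypothetical class taxonomy and instances, already mapped from two tables.
SUPERCLASS = {
    "ServiceRequest": "Event",
    "CrimeIncident": "Event",
    "SanitationRequest": "ServiceRequest",
    "Theft": "CrimeIncident",
}

instances = [
    {"id": "SR-1", "class": "SanitationRequest", "district": "D1", "ward": 2},
    {"id": "CR-7", "class": "Theft", "district": "D3"},  # no ward in the source row
]


def is_a(cls, target):
    """Walk up the single-inheritance chain (the subClassOf* path)."""
    while cls is not None:
        if cls == target:
            return True
        cls = SUPERCLASS.get(cls)
    return False


def select(target, fields):
    """Return the requested fields for all instances of target or its subclasses."""
    return [{f: inst.get(f) for f in ["id", *fields]}
            for inst in instances if is_a(inst["class"], target)]


for row in select("Event", ["class", "district", "ward"]):
    print(row)
```

The missing ward simply comes back as None, which mirrors the point that gaps in the source table do not break the query.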
Query: Public Sanitation Service Requests
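A hypothetical sketch of how annotation metadata can drive such a query: a category annotation on service-type classes selects the requests, so re-categorizing a service type changes only the metadata, not the data or the query code. The class names and the "PublicSanitation" category value are assumptions.

```python
# Hypothetical annotations attached to service-type classes.
ANNOTATIONS = {
    "SanitationRequest": {"category": "PublicSanitation"},
    "IllegalDumpingRequest": {"category": "PublicSanitation"},
    "PotholeRequest": {"category": "Streets"},
}

# (request id, most specific class) pairs, as produced by the mapping step.
requests = [
    ("SR-1", "SanitationRequest"),
    ("SR-2", "PotholeRequest"),
    ("SR-3", "IllegalDumpingRequest"),
]


def requests_in(category):
    """Select requests whose class carries the given category annotation."""
    return [rid for rid, cls in requests
            if ANNOTATIONS.get(cls, {}).get("category") == category]


print(requests_in("PublicSanitation"))
```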
Step 3. Query through the model. Implicit join
Everything in a semantic model is connected. The service request can be linked to the name of the dispatcher of the department.
Query: Select events associated with the Department of Public Works, and its dispatcher
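The implicit join amounts to following links in the instance graph. In this hypothetical sketch, edges that a relational source would spread across several tables as foreign keys are stored as (node, relation) pairs; associatedTo comes from SCRIBE's relation set, while hasDispatcher and all instance names are invented.

```python
# Hypothetical instance graph: edges that a relational source would express as
# foreign keys spread across several tables.
EDGES = {
    ("SR-1", "associatedTo"): "DeptOfPublicWorks",
    ("SR-2", "associatedTo"): "TransportationDept",
    ("DeptOfPublicWorks", "hasDispatcher"): "dispatcher-1",
    ("dispatcher-1", "name"): "Dispatcher One",
}


def follow(node, *path):
    """Follow a chain of relations from node; return None if any link is missing."""
    for rel in path:
        node = EDGES.get((node, rel))
        if node is None:
            return None
    return node


active = ["SR-1", "SR-2"]
result = [(r, follow(r, "associatedTo", "hasDispatcher", "name"))
          for r in active if follow(r, "associatedTo") == "DeptOfPublicWorks"]
print(result)
```

The application states the path of relations; it never names the tables or join columns involved.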
Lessons Learned
SCRIBE design decisions
Human Usability
Communicable. Naming, natural language support, etc. Lesson: key to management and model validation.
Concise. A simple way to describe the key entities of the model and yet able to infer many facts. Lesson: the balance between a simple language (RDF), conciseness and inferencing power is key to usability. Map to UML.
Consistent. Naming conventions and modeling patterns. Lesson: use a relation taxonomy to infer relations despite extensions.
Authoritative to domain experts. Lesson: merging standards is not enough; alignment with standards allows a consistent model.
Documented, not just descriptions, but also provenance. Lesson: limited benefit to end users unless coupled with sample instances or data entry forms.
Managed and maintained by people throughout the model lifecycle. Lesson: people are not always available for the full lifecycle.
Reusable in similar domains, for similar instances. Lesson: mechanisms for promotion of changes to the core.
System Usability
Scalable, so large amounts of data can be parsed, stored and retrieved. Lesson: it is not clear whether data should remain in the RDB.
Efficient query and inferencing. Lesson: impact analysis queries may require a few seconds. This is OK.
Programmable solutions, both in open and closed data paradigms. Lesson: a standard library of data adapters and mappings to SCRIBE is needed.
Open infrastructure and tools. Lesson: we used Jena, DB2DRQ, Ruby on Rails, etc.
For more information: http://researcher.ibm.com/view_project.php?id=2505, or email the authors.
References
• A Direct Mapping of Relational Data to RDF, W3C Working Draft 24 March 2011, http://www.w3.org/TR/2011/WD-rdb-direct-mapping-20110324/
• R2RML: RDB to RDF Mapping Language, W3C Working Draft 24 March 2011, http://www.w3.org/TR/r2rml/
• The D2RQ Platform v0.7: Treating Non-RDF Relational Databases as Virtual RDF Graphs, 2009-08-10, http://www4.wiwiss.fu-berlin.de/bizer/d2rq/spec/
• H. Bohring and S. Auer, Mapping XML to Ontologies, citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.59.8897
• T. Rodrigues, P. Rosa and J. Cardoso, Mapping XML to Existing OWL Ontologies, citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.59.292
• DB2OWL: A Tool for Automatic Database-to-Ontology Mapping, http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.97.5970
• Municipal Information Systems Association/Municipal Reference Model (MISA/MRM), http://www.misa.on.ca/en/
• National Information Exchange Model (NIEM), http://www.niem.gov/
• D. Gonzales, C. Ohlandt, E. Landree, C. Wong, R. Bitar and J. Hollywood, The Universal Core Information Exchange Framework: Assessing its Implications for Acquisition Programs, RAND report, 2011, http://www.rand.org/content/dam/rand/pubs/technical_reports/2011/RAND_TR885.sum.pdf
• D. Allemang and J. Hendler, Semantic Web for the Working Ontologist: Effective Modeling in RDFS and OWL, Morgan Kaufmann, 2008.
• N. Noy and D. McGuinness, Ontology Development 101: A Guide to Creating Your First Ontology, http://www.ksl.stanford.edu/people/dlm/papers/ontology-tutorial-noy-mcguinness-abstract.html