objective fiction, i-semantics keynote
DESCRIPTION
"Objective fiction: the semantic construction of web reality" talks about current challenges for semantic technologies, and the Semantic Web in particular, focusing on cognitive and social dimensions of human semantics.TRANSCRIPT
Objective FictionThe semantic construction of (web) reality
Aldo [email protected], @lipn.univ-paris13.fr
RCLN, LIPN Université Paris 13, UMR CNRS, Sorbonne CitéSemantic Technology Lab, Institute for Cognitive Sciences, CNR, Rome, Italy
Work described jointly with STLab people in the last years:Valentina Presutti, Francesco Draicchio, Andrea Nuzzolese, Diego Reforgiato
Alessandro Adamou, Eva Blomqvist, Enrico Daga, Alfio Gliozzo, Alberto Musetti
My arguments
Semantic technologies only sparsely address real semantic phenomena
Semantic Framing is not explicit
Semantic technologies only sparsely address real semantic phenomena
My arguments
We need a semantic data science
Semantic Framing is not explicit
Semantic technologies only sparsely address real semantic phenomena
My arguments
Objectivity or fiction?
• Objective fiction!
• Data scientists contribute to build the reality we live in
• The Web gives them more empowerment
• Semantics should give them even more
• Is that happening?
Objective fictionBare fact
Berlusconi has been sentenced for tax fraud
Opinion
I like the decision of Berlusconi’s trial court
Framing
The arrest of Berlusconi’s would be an offense to millions of voters
Like Al Capone, he’s been sentenced for the least severe crime
Story-based repositioning
Berlusconi’s sentencing is like the story of Enzo Tortora’s (a talk show host on Italian television, who was falsely accused of drug trafficking)
Disinformation
The law that allows banning Berlusconi from public offices is unconstitutional
A snakes and ladders game about Berlusconi’s conviction
Semantic technologies have progressed significantly, but they still miss most relevant semantic phenomena in social life
How to synthetically tell what a text is about, in the open domain, i.e. without specific training?
What is its core meaning, emotional content, position in the evolution of its topic, etc.?
How to analytically mashup data in relevant ways, in the open domain, i.e. without someone putting the necessary intelligence within?
A sample correlation fallacy
• In 2010, a data scientist presented an analysis of Facebook status updates
• Focus was on terms break up, broken up
• Results show a curve that peaks in early summer and before Christmas
Reported by: Has the Internet changed Science?, by E. Pisani, Prospect, November 2010
Interpretation of the speaker
“relationships melt down because of the stress of spending time together”
Reported by: Has the Internet changed Science?, by E. Pisani, Prospect, November 2010
Question by an attendee
“maybe not about relationships, but the end of terms, e.g. broke up for Christmas last Tuesday”
Reported by: Has the Internet changed Science?, by E. Pisani, Prospect, November 2010
Partly open problems
• Data integration interpretation without designers (risk of correlation fallacy)
• Opportunistic reasoning: travel planning, financial opportunities, team building, etc.
• Smart text summarization
• Opinion mining on the right spots
• Domain dynamics: science evolution, scholar changes, market dynamics, ...
Human, social knowledge management is not exempt from framing: it is modulated by frames, metaphors, and stories that make something relevant through neural activation patterns
People can be aware of framing ...
Pareidolia
... but often they are not
Think about how to frame an issue using your value
For example, if the issue is poverty and your value is protection
From a Foot Print Strategies Inc. spin doctor’s presentation
... but often they are not
Being_at_riskframe (FrameNet)
“Every family deserves a safe and healthy place to live”
Restructured from a Foot Print Strategies Inc. spin doctor’s presentation
Think about how to frame an issue using your value
For example, if the issue is poverty and your value is protection
Political reframing
• Conservatives say
• We need tax relief
• We need a strong president to protect us
• Same sex marriage will undermine family life
• Trial lawyer. Frivolous lawsuits
• Progressives say
• Taxes are investments
• We have a weak president who didn’t protect us
• Marriage is the realization of love in a lifetime commitment
• Public protection attorney
Restructured from a Foot Print Strategies Inc. spin doctor’s presentation
I’m in good company
• Myself (you never know!) (ESWC2009 keynote): knowledge patterns as objects of empirical investigation
• Frank Van Harmelen (ISWC2011 keynote): route to empirical research: data science, data patterns
• Martin Hepp (EKAW2012 keynote): web semantics not necessarily coincident with DL and traditional OE
• David Karger (ESWC2013 keynote): what can the SW do for average users? Not much until now
• Enrico Motta (ESWC2013 keynote): what semantics in the current SW? Different forces, still faith in good old logic
• John Sowa (SemTech2013 lecture): patterns exist at different levels of data, ontologies, and reality
... on the shoulders of
• Köhler, Bartlett, Piaget, Fillmore, Minsky, and many cognitive and neuro- scientists ...
Hypotheses from cognitive semantics
Cf. 2012 Dagstuhl Seminar on Cognitive Science for the Semantic Web
Reality is “framed” by our prior knowledge, expectations, opportunities, goals, which are organized as conceptual frames and stories, and linked to our emotions
Quantity IS Vertical position
Facebook Timeline format
they play a role in narratives (idealized stories) used to drive our decisions and
motivate our plans
Cf. George Lakoff ’s The Political Mind
Risky situation
Criminal investigation
Part-whole
Desirability
Becoming visible
Public services
Arithmetic commutative
Price per unit
Precariousness
Institution IS Family
Affectivity IS Temperature
Family in Italian
Law
Being ‘mbari in Catania
Address microformat
Frames are diversemore or less abstract, complex,
metaphoric, or specific
many of them stay unconscious most of the time
most are evident in human artifacts
Frames create our individual counterpart to reality
Neural binding allows to “connect the pieces” that come from perception, recall, abstraction, and imagination
Neural binding works according to emotional paths (dopaminergic, noradrenergic), linked to narratives and frames (hypothesis)
Mirror neurons activate the same circuitry for actual perception, recall of perception, abstraction of perception, and imagination of new perceptions
They are activated in neural binding, emotional paths, somatic markers and mirror neurons
Frames are part of our biology
• Semantics of social reality is implemented in media applications but it is hard-coded
• It remains in the mind of designers
• Semantic Web is overlooking to make it explicit
Semantic expressivity?
• Is our semantics enough to support extraction, representation, and harnessing of social semantics?
• triples are simple structures
• classes represent arbitrary concepts
where is cognitive adequacy?
when does a class represent arbitrary data, and when is it a counterpart of a human knowledge pattern?
is that difference important in general?
Administrative frames
Geographic frames
Communication frames
DBpedia
When triplifying Wikipedia infoboxes, its designers lost the framing of boxes and internal sub-boxes
The case of Infoboxes
• Infobox framing is missing in the DBpedia ontology too
• If we mine the ontology to check what properties can be applied to what classes, the result is partial and often non-correspondent to the original frame
• Scraping heuristics may be more cognitively-sound ...
Interaction semantics
• Interfaces and interaction patterns convey frame semantics
• Schema induction and triplification of databases can be improved by exploiting interfaces exposing data, cf. data.cnr.it ontology design
• HTML pages and stylesheets contain a lot of framing knowledge, cf. Craig Knoblock’s work
• Infographics can change the way we interpret the same data
Cognitive principles of KR on the Web
Empirical Conservativeness
Relevance in Modeling
Structured Provenance
To be added to LOD principles
Cognitive principles of KR on the Web
Empirical Conservativeness
Relevance in Modeling
Structured Provenance
To be added to LOD principles
Empirical conservativeness
What is present with a function in (evident, extracted, emerging) empirical data should be preserved in its semantic representation
• It is a measure against “oversimplification”
• The case of Infobox framing loss is a sample violation of this principle
Empirical conservativeness
Special case
• Keeping interaction boundaries is a special case of empirical conservativeness
• Like neural binding (at the neural level) and linguistic framing (at the cognitive level), relevant boundaries of logical representations need to be represented
Cf. original Marvin Minsky’s frames:
“representations that mirror cognitive mechanisms”
Is there anything like that in OWL or RDF?
Maybe ontology modules, classes, named graphs, hasKey axioms
Not specific to the boundary problem, nor to framing or neural binding
*Very recent: new spec for named graphs accepts typing
• With infoboxes, we are still discussing a case of basic semantics, since it is fully presented in data
• More cases from social reality include slippery relations:
counting as, intentionality, action schemes, normative constraints, emerging patterns, frames, metaphors, irony,
stories, socio-technical task semantics
• Those relations are partly implicit, and the modeling practices we use on SW/LOD are not designed for that, contra Minsky
More semantics or more distinctions?
• I am not advocating for “more semantics” in terms of complexity, rather for more distinctions
• Human knowledge is relational in nature
• We need n-ary and multigrade relations, but arbitrary relations would be too much in current KR scenarios, then we can use them with smart reification patterns
• Classes are powerful primitives in logical languages, specially in description logics and triple-based languages
More semantics or more distinctions?
• Fixation on classes goes with a trade-off
• Classes need to be distinguished in terms of design
• Class-oriented representation needs a “push-up” to partly recover the lost structure
Types of classes have been distinguished in the past
• AI: sorts and types
• Formal Ontology: OntoClean metaclasses, based on formal criteria
• OWL2 punning: arbitrary typing of classes
Solutions span between the two extremes: heavy principles (OntoClean) - no principle at all (punning)
Class types?
• How about distinguishing classes that implement interesting social relations?
• This classification should produce “reference frames” when querying, reasoning, or reusing data and ontologies, as well as when aligning, extracting, and discovering data and ontologies
Pragmatic meta-classes
Relevance in modeling
When modeling a class, its design motivation (relevance) must be explicit: it should be typed with the reason for that relevance
Relevance in modelling
• “What’s special in that class?”
• E.g., it’s central in the data, it’s a frame, it’s an n-ary reification mechanism, it’s the result of a discovery algorithm, etc.
• A new vocabulary for metaclasses?
• Top-down principles do not work unless people adopt them or there are procedures to discover them in data
• Disseminating a good practice
• A research program of discovering and reusing class types for the common good
Structured provenance
When merging RDF triples, they should come with their provenance data
The RDF mix case (sig.ma)
Some results from STLab
We (STLab and RCLN) are researching on
knowledge patterns as keys for accessing meaning on the Web
Discovering Collecting
Reengineering Using
http://wit.istc.cnr.it/stlab-tools/
A broader vision: knowledge patterns and their façades
• The Knowledge Pattern (KP) Model is a discovery and analogy structure <C,L,I,D,W,T,V,X,F,S,R,U> such that a KP emerges out of invariances across multiple façades:
• C: concept graphs
• L: logical forms (some axiomatization of multigrade predicates)
• I: local inference rules (either classical or approximate)
• D: (relational) data
• W: linguistic data
• T: social tagging data
• V: interaction/visualization/formatting structures
• X: provenance data
• F: framing information
• S: sentic information
• R: relations to other KP
• U: use cases
All façades can provide features for stochastic processes
All façades should be encoded in RDF for interoperability and joint reasoning
Informal graphs
DL & Rule patterns
Data patterns
Cognitive patterns
Web and interaction patterns
Lexical and NLP data patterns
Ontology design
patterns
Task patterns
Socio-patterns
Aldo Gangemi, Valentina Presutti: Towards a pattern science for the Semantic Web. Semantic Web 1(1-2): 61-68 (2010)
Situation:Case in point
Description:Norm
Description:Compatibility
scenario
Description:Social norm
Situation:Entrenchment Case
SETTING
Assessment layer
Normative layer
Social layer
Description:Meta-Norm
Situation:Jurisprudential Conflict Case
satisfies
satisfies
satisfies
satisfies
hasSetting
hasSetting
hasSetting
hasSetting
A pattern framework:Descriptions & Situations
Sample application to modeling legal cases with
norms, conflicts, and entrenchment
Aldo Gangemi, Peter Mika: Understanding the Semantic Web through Descriptions and Situations. ODBASE 2003: 689-706
Layered pattern morphisms
An ontology design pattern describes a formal expression that can be exemplified, morphed, instantiated, and expressed in order to solve a domain modelling problem
• owl:Class:_:x rdfs:subClassOf owl:Restriction:_:y
• Inflammation rdfs:subClassOf (localizedIn some BodyPart)
• Colitis rdfs:subClassOf (localizedIn some Colon)
• John’s_colitis isLocalizedIn John’s_colon
• “John’s colon is inflammated”, “John has got colitis”, “Colitis is the inflammation of colon”
LogicalPattern(MBox)
GenericContentPattern (TBox)
SpecificContentPattern (TBox)
DataPattern (ABox)
exemplifiedAs morphedAs instantiatedAs LinguisticPattern
expressedAs
Logic Meaning Reference Expression
expressedAs
expressedAs
Abstraction
Aldo Gangemi, Valentina Presutti: Ontology Design Patterns. Handbook on Ontologies 2nd ed. (2009)
N-ary patterns in KR• Temporal indexing pattern
– (R(a,b))+t sentence indexing • quads, external time stamps
– R(a,b)+t relation indexing• reified n-ary relations (3D frames)
– R(a+t,b+t) individual indexing• fluents, 4D, tropes, “context slices” (4D frames)
– tR name nesting• ad hoc naming of binary relations
• More indexes for additional arguments
Aldo Gangemi, Valentina Presutti: A Multi-dimensional Comparison of Ontology Design Patterns for Representing n-ary Relations. SOFSEM 2013: 86-105
Andreas Scheuermann, Enrico Motta, Paul Mulholland, Aldo Gangemi and Valentina Presutti. An Empirical Perspective on Representing Time. K-CAP 2013
Centrality discovery in datasetsmo:Track
mo:MusicAr.st
mo:Playlist
mo:Torrent
mo:ED2K
tags:Tag
mo:Record
foaf:maker
rdfs:Literal
dc:+tle
dc:datemo:image
dc:descrip+on
mo:track
tags:taggedWithTag
mo:available_as
mo:available_as
mo:available_as
Valen&na Presu,, Lora Aroyo, Alessandro Adamou, Balthasar Schopman, Aldo Gangemi, Guus Schreiber: Extrac&ng Core Knowledge from Linked Data. COLD2011, CEUR-‐WS.org Vol-‐782.
Encyclopedic Knowledge Patterns: example
• An Encyclopedic Knowledge Pattern (EKP) is discovered from the paths emerging from Wikipedia page link invariances
• They are represented as OWL2 ontologies
Andrea Giovanni Nuzzolese, Aldo Gangemi, Valen&na Presu,, Paolo Ciancarini: Encyclopedic Knowledge PaUerns from Wikipedia Links. Interna'onal Seman'c Web Conference (1) 2011: 520-‐536
Encyclopedic KP: input data• Wikipedia page links generate 107.9M triples• Infobox-based triples are 13.6M, including data value triples (9.4M) • “Unmapped” object value triples are only 7% of page links
Encyclopedic KP: input data• Wikipedia page links generate 107.9M triples• Infobox-based triples are 13.6M, including data value triples (9.4M) • “Unmapped” object value triples are only 7% of page links
dbpo:MusicalArtist
dbpo:MusicalArtist dbpo:Organisation dbpo:Place
Encyclopedic KP: input data• Wikipedia page links generate 107.9M triples• Infobox-based triples are 13.6M, including data value triples (9.4M) • “Unmapped” object value triples are only 7% of page links
dbpo:MusicalArtistdbpo:MusicalArtist
dbpo:Organisation
dbpo:PlacelinksToMusicalArtist linksToPlace
linksToOrganisation
• Paths are used to discover Encyclopedic Knowledge Patterns– Such patterns should make it emerge the most typical types of things that the Wikipedia
crowd uses to describe a resource of a given type
Encyclopedic KP: input data• Wikipedia page links generate 107.9M triples• Infobox-based triples are 13.6M, including data value triples (9.4M) • “Unmapped” object value triples are only 7% of page links
dbpo:MusicalArtistdbpo:MusicalArtist
dbpo:Organisation
dbpo:PlacelinksToMusicalArtist linksToPlace
linksToOrganisation
• Paths are used to discover Encyclopedic Knowledge Patterns– Such patterns should make it emerge the most typical types of things that the Wikipedia
crowd uses to describe a resource of a given type
Path à Pi,j= [Si, p, Oj]
k-means clustering on Path Popularity
Sample distribution of pathPopularity for DBpedia paths. The y-axis indicates how many paths (on average) are above a certain value t for pathPopularity
Encyclopedic Knowledge Patterns
from Wikipedia Wikilinks
(@ISWC2011)
Andrea Giovanni Nuzzolese, Aldo Gangemi, Valen&na Presu,, Paolo Ciancarini: Encyclopedic Knowledge PaUerns from Wikipedia Links. Interna&onal Seman&c Web Conference (1) 2011: 520-‐536
k-means clustering on Path Popularity
1 big cluster (4-cluster) with ranks below 18.18%
Sample distribution of pathPopularity for DBpedia paths. The y-axis indicates how many paths (on average) are above a certain value t for pathPopularity
Encyclopedic Knowledge Patterns
from Wikipedia Wikilinks
(@ISWC2011)
Andrea Giovanni Nuzzolese, Aldo Gangemi, Valen&na Presu,, Paolo Ciancarini: Encyclopedic Knowledge PaUerns from Wikipedia Links. Interna&onal Seman&c Web Conference (1) 2011: 520-‐536
k-means clustering on Path Popularity
1 big cluster (4-cluster) with ranks below 18.18%
3 small clusters with ranks above 22.67%
Sample distribution of pathPopularity for DBpedia paths. The y-axis indicates how many paths (on average) are above a certain value t for pathPopularity
Encyclopedic Knowledge Patterns
from Wikipedia Wikilinks
(@ISWC2011)
Andrea Giovanni Nuzzolese, Aldo Gangemi, Valen&na Presu,, Paolo Ciancarini: Encyclopedic Knowledge PaUerns from Wikipedia Links. Interna&onal Seman&c Web Conference (1) 2011: 520-‐536
k-means clustering on Path Popularity
1 big cluster (4-cluster) with ranks below 18.18%
3 small clusters with ranks above 22.67%
Sample distribution of pathPopularity for DBpedia paths. The y-axis indicates how many paths (on average) are above a certain value t for pathPopularity
Encyclopedic Knowledge Patterns
from Wikipedia Wikilinks
(@ISWC2011)
Andrea Giovanni Nuzzolese, Aldo Gangemi, Valen&na Presu,, Paolo Ciancarini: Encyclopedic Knowledge PaUerns from Wikipedia Links. Interna&onal Seman&c Web Conference (1) 2011: 520-‐536
k-means clustering on Path Popularity
1 big cluster (4-cluster) with ranks below 18.18%
3 small clusters with ranks above 22.67%
Sample distribution of pathPopularity for DBpedia paths. The y-axis indicates how many paths (on average) are above a certain value t for pathPopularity
1 alternative cluster (6-cluster) with ranks below 11.89%
Encyclopedic Knowledge Patterns
from Wikipedia Wikilinks
(@ISWC2011)
Andrea Giovanni Nuzzolese, Aldo Gangemi, Valen&na Presu,, Paolo Ciancarini: Encyclopedic Knowledge PaUerns from Wikipedia Links. Interna&onal Seman&c Web Conference (1) 2011: 520-‐536
Serendipity in exploratory browsing
Aemoo: exploratory search based on EKP -‐ Seman'c Web Challenge @ISWC 2011 – Short listed, 4th place
http://www.aemoo.org
Andrea Giovanni Nuzzolese, Valen&na Presu,, Aldo Gangemi, Alberto Muse,, Paolo Ciancarini: Aemoo: exploring knowledge on the web. WebSci 2013: 272-‐275
Machine reading with FRED http://wit.istc.cnr.it/stlab-tools/fred/
Valen&na Presu,, Francesco Draicchio, Aldo Gangemi: Knowledge Extrac&on Based on Discourse Representa&on Theory and Linguis&c Frames. EKAW 2012: 114-‐129
Event and Frames from text
The New York Times reported that John McCarthy died. He invented the programming language LISP.
Resolu&on and linking
Vocabulary alignment
Frames/events
Taxonomy
Seman&c rolesMeta-‐level
Co-‐reference
Custom namespace
Types
Valen&na Presu,, Francesco Draicchio, Aldo Gangemi: Knowledge Extrac&on Based on Discourse Representa&on Theory and Linguis&c Frames. EKAW 2012: 114-‐129
http://wit.istc.cnr.it/stlab-tools/fred/
Sentic frames from texthttp://wit.istc.cnr.it/stlab-tools/sentilo
Sentic frames from text
Paul Newman thinks that Barack Obama is a great president!
http://wit.istc.cnr.it/stlab-tools/sentilo
But beware “patternicity”
Psychopathology of big data
• Pattern recognition vs. patternicity:
• Simple correlation fallacy
• Synchronicity
• Pareidolia
• Conspiracy theories
• Schizophrenic apophany
Web helps spreading patternicity
• More information
• Faster spread of information
• More difficult provenance checking
Finally, back to the objective fiction problem
Logic, ontologies, and data design practices construct a reality
Similarly to what social institutions do since millennia by playing with the many layers at which communication means can be morphed
The reality currently built by triple-based languages is basic
It looks more like a quiz-show than full-fledged social reality
• On the contrary, the reality constructed by current institutions and media is a glorious “objective fiction”
• A powerful system feeding a giant, analogical Matrix that is partly opaque to most people
• Our use of semantics should try to approximate the level of sophistication that objective fiction has gathered to now
• We have a responsibility of creating an added value: making objective fiction explicit
• When simple objective data are provided, its semantic representation is too simple to provide added value to users
• We need to raise our semantic grasp to institutional relations, hidden knowledge patterns, action, situation, sentic and metaphoric frames, leading stories, socio-technical tasks
• Semantic technologies should make us aware of real semantics, not just bare facts
That's very difficult, but the alternative is leaving real semantics to spin doctors
• Worse, simple triple-based semantics gives them more power
• Transparency and distributed decision making being easily accessible on an open Semantic Web: the Holy Grail?
Thanks for your attention!