1
Towards Scholarly Publishing on the Semantic Web
Simon Buckingham Shum, Senior Lecturer
Open University, Knowledge Media Institute
Gary Li, Victoria Uren, John Domingue, Enrico Motta
EPSRC DIMnet Workshop, Manchester, 7-8 Oct., 2002
2
In 2010, will scholarly work still be published solely in prose, or can we
imagine a complementary infrastructure that is ‘native’ to the emerging semantic, collaborative
web, enabling more effective dissemination and analysis of ideas?
3
Project facts and figures
Scholarly Ontologies (ScholOnto) Project
3 year project / started Feb. 2001
PI: Simon Buckingham Shum
Co-Investigators: John Domingue, Enrico Motta
Research Fellows: Gary Li, Victoria Uren
PhDs: 5 related projects
Partner: Academic Press
Synergy with other EPSRC projects at KMi:
Advanced Knowledge Technologies IRC
CoAKTinG: eScience Grid collaboration tools
4
Overview
Problem: little computational support for interpreting and analysing research literatures
Approach: literatures as networks of ‘claims’: connected concepts
Theoretical basis: argumentation, coherence relations, KB-hypertext
Infrastructure: ClaiMaker, a ‘claims server’ to construct and analyse scholarly claims
Progress to date
5
Phenomena of interest to scholars
“Who’s building on the ideas in this paper, and in what way?”
“Who’s challenged this paper?”
“Has anyone proposed a similar solution but from a different theoretical perspective?”
“Are there groups building on theory T, but who contradict each other?”
“Has anyone generalised method M from domain D to E?”
“Is there any software which tackles problem P?”
“What impact did Language L have?”
“Are there distinctive theoretical perspectives on problem P?”
6
What students/researchers/information analysts want to know
Authority
Impact
Schools of thought
Intellectual lineage
Consistency
7
resources: documents, datasets, etc…
metadata: generally uncontroversial; minimise inconsistency, ambiguity, controversy
domain ontologies: richer formalisation of consensus; minimise inconsistency, ambiguity, controversy
interpretations?
8
“The Scent of a Site: A System for Analyzing and Predicting Information Scent, Usage, and Usability of a Web Site”
Web User Flow by Information Scent (WUFIS)
“Information foraging”
Information foraging theory
Information scent models
People try to maximise their rate of gaining information
?
extends
From undifferentiated, inter-document citations…
…to inter-concept, semantic connections
11
ScholOnto in a nutshell…
Literatures as networks of concepts…
…which are grounded in documents
Connections between nodes are claims
Core set of connection types, which can be expressed in discipline-specific dialects
Multiple claim structures from diverse perspectives
A server mediates and helps manage the complexity of the claims network
12
Structure of a connective Claim
Claim
Link
- Label: summarising...
- Type
- Polarity
- Weight
- Direction
- Author
- Timestamp
Concept Type: optional classification of object(s) in the context of this link
Object
- concept
- data
- set/claim
14
Structure of a connective Claim
Claim
Link
- Label: summarising...
- Type
- Polarity
- Weight
- Direction
- Author
- Timestamp
Concept Type: optional classification of object(s) in the context of this link
Object
- concept
- data
- set/claim
Set
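The claim structure above could be modelled roughly as follows. This is a minimal sketch, not ClaiMaker's actual schema: the class names, the `document` field, the value encodings for polarity and direction, and the example link type are all assumptions made for illustration. Only the attribute names on `Link` come from the slide.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional, Union


@dataclass(frozen=True)
class Concept:
    """A succinct summary of a contribution, grounded in a document."""
    label: str
    document: str                       # hypothetical identifier of the source document
    concept_type: Optional[str] = None  # optional type, e.g. "Theory", "Hypothesis"


@dataclass(frozen=True)
class Link:
    """Link attributes as listed on the slide; the value formats are assumed."""
    label: str
    link_type: str
    polarity: str        # assumed encoding: "+" or "-"
    weight: float
    direction: str       # assumed encoding: "forward" or "backward"
    author: str
    timestamp: datetime


@dataclass(frozen=True)
class Claim:
    """A concept connected by a link to an object."""
    concept: Concept
    link: Link
    obj: Union[Concept, "Claim", frozenset]  # a concept/data item, another claim, or a set


# Illustrative instance using concepts from the deck; the link itself is invented.
theory = Concept("Information foraging theory", "doc-1", "Theory")
model = Concept("Information scent models", "doc-2")
claim = Claim(
    concept=model,
    link=Link("extends", "extends", "+", 1.0, "forward",
              "example author", datetime(2002, 10, 7)),
    obj=theory,
)
```

Because the object of a claim may itself be a claim or a set, claims can be nested, which is what allows one author to comment on another author's connection rather than only on a concept.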
15
‘Concepts’
Succinct summaries of a publication’s contribution to the literature (granularity chosen by the user)
Optionally given a type
Example 1
[Theory] Salomon (1987)
[Hypothesis] Animations can supplant key cognitive processes in learning collision mechanics, impairing deep understanding
[Data] Animations explaining momentum in the tool XtremePhysics improve the performance of middle-high achieving 16 yr olds, but impair low achievers
Example 2
[Problem] How to reduce disorientation in non-linear narrative?
[Theory] Cognitive Coherence Relations (Knott and Sanders, 1999)
[Theory] Semiotics of Cinema
[Framework] Cinematic Hypermedia
23
Conceptual claim-making template for an Evaluation Report
24
Discovery Services
The payback for modelling
New forms of digital visibility for research
Graph-based services
- Dense cluster detection
- Scientometrics (e.g. co-citation at the semantic inter-concept level)
Ontology-based services
- Semantic structural search
- Show supporting documents
- Show challenging documents
- Show a concept’s lineage
Visualizations to support navigation and querying
25
Identifying potentially significant clusters
Simple linear SVM
Rules made with CHARADE outperform Naive Bayes and decision trees
Decision Forest classifier improves on C4.5 and kNN
Simple linear SVM is among the best reported text categorizers
CDM performs moderately better than Naive Bayes and decision trees
Optimised rules outperform Naive Bayes and decision trees
Decision trees and Naive Bayes perform well for text categorization
SVMs are well suited to text categorization
Support Vector Machines (SVM)
Naive Bayes underperforms other classifiers
Naive Bayes is the worst classifier
Nearest Neighbour is one of the best categorizers
SVM and kNN outperform other classifiers
Which classifier is best?
Rule learning
Instance based learning
Bayesian learning
Decision tree learning
Machine learning
A 3-core cluster extracted from a network of claims and argumentation links. From hundreds of nodes modelling literature on text categorization, only those which connect to at least 3 other nodes in the cluster are presented (with link labels switched off). A flavour of key issues in the field is given without overwhelming the viewer.
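The 3-core described in the caption can be computed by repeatedly discarding nodes that have fewer than k neighbours left. The sketch below shows the standard k-core peeling procedure in plain Python. It is an illustration of the general technique, not the project's own code; the function name and the toy edge list are assumptions.

```python
from collections import defaultdict


def k_core(edges, k):
    """Return the nodes of the k-core of an undirected graph.

    edges: iterable of (node, node) pairs; link labels and direction are ignored.
    """
    neighbours = defaultdict(set)
    for a, b in edges:
        if a != b:  # ignore self-loops
            neighbours[a].add(b)
            neighbours[b].add(a)

    alive = set(neighbours)
    # Start with every node whose degree is already below k.
    queue = [n for n in alive if len(neighbours[n]) < k]
    while queue:
        node = queue.pop()
        if node not in alive:
            continue  # already removed via an earlier queue entry
        alive.discard(node)
        for other in neighbours[node]:
            if other in alive:
                neighbours[other].discard(node)
                if len(neighbours[other]) < k:
                    queue.append(other)
    return alive


# Toy network: four mutually connected nodes plus a two-node tail.
toy_edges = [
    ("A", "B"), ("A", "C"), ("A", "D"),
    ("B", "C"), ("B", "D"), ("C", "D"),
    ("A", "E"), ("E", "F"),
]
core = k_core(toy_edges, 3)  # the tail E, F is peeled away
```

Raising k gives a smaller, denser view of the network, and lowering it admits more peripheral nodes, so k acts as a simple control over how much of the claims network is shown.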
28
Visualizing the ‘lineage’ (intellectual history) of a concept
Zooming, rotation, focusing and filtering
29
What documents challenge this one?
1. Extract concepts for this document
2. Trace concepts on which they build
3. Trace concepts challenging this set
4. Show root documents
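The four steps above can be sketched over a toy claims network. Everything in this example is an assumption made for illustration: claims are reduced to (source, link, target) triples, the link names "builds on" and "challenges" stand in for whatever link types the server actually uses, and `concept_doc` is a hypothetical mapping from each concept to the document it is grounded in.

```python
def challenging_documents(doc, concept_doc, claims):
    """Return the documents whose concepts challenge `doc` or what it builds on."""
    # 1. Extract concepts for this document.
    own = {c for c, d in concept_doc.items() if d == doc}

    # 2. Trace concepts on which they build (followed transitively).
    basis = set(own)
    frontier = list(own)
    while frontier:
        current = frontier.pop()
        for src, link, dst in claims:
            if src == current and link == "builds on" and dst not in basis:
                basis.add(dst)
                frontier.append(dst)

    # 3. Trace concepts challenging this set.
    challengers = {src for src, link, dst in claims
                   if link == "challenges" and dst in basis}

    # 4. Show root documents.
    return {concept_doc[c] for c in challengers}


concept_doc = {"c1": "paper A", "c2": "paper B", "c3": "paper C", "c4": "paper D"}
claims = [
    ("c1", "builds on", "c2"),
    ("c3", "challenges", "c2"),
    ("c4", "challenges", "c1"),
]
result = challenging_documents("paper A", concept_doc, claims)
```

In this toy data, paper C never mentions paper A directly. It is returned because it challenges a concept that paper A builds on, which is the kind of indirect connection a plain citation index would miss.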
31
Next steps
ClaiMaker released: wide interest from both researchers (academia/government/corporate) and publishers
Develop customisable software agents: monitor the claims network for patterns of interest to users
Extend the discovery services: tools to interrogate/navigate
Extend the visualization services: making sense of the claims network
Foster user communities: broad spectrum of science/arts/humanities to test generality
32
Visualizing Argumentation (2002, in press), Springer
www.VisualizingArgumentation.info
Argument mapping for scholarly publishing, scientific and public policy debates, education, teamwork, and organisational memory