Knowledge-Based e-Science Nigel Shadbolt The University of Southampton

Post on 11-Jan-2016

Page 1: Knowledge-Based e-Science Nigel Shadbolt The University of Southampton

Knowledge-Based e-Science

Nigel Shadbolt, The University of Southampton

Page 2: Knowledge-Based e-Science Nigel Shadbolt The University of Southampton

The work of many people…

Harith Alani, Steve Harris, Nick Gibbins, Yannis Kalfoglou, Kieron O’Hara, David Dupplaw, Bo Hu, Paul Lewis, Srinandan Dasmahapatra, Duncan McRae-Spencer, Hugh Glaser

Les Carr, David De Roure, Wendy Hall, Mike Brady, David Hawkes, Yorick Wilks, Enrico Motta, Carole Goble, Simon Cox, Andy Keane, …

Page 3: Knowledge-Based e-Science Nigel Shadbolt The University of Southampton

Advanced Knowledge Technologies IRC

www.aktors.org

Page 4: Knowledge-Based e-Science Nigel Shadbolt The University of Southampton

Structure

A little history

Shared Meaning and Content

Services

Collaboration

Mapping Science

Lessons Learnt

Page 5: Knowledge-Based e-Science Nigel Shadbolt The University of Southampton

A Little History

Page 6: Knowledge-Based e-Science Nigel Shadbolt The University of Southampton

Knowledge Engineering: Evolution

Page 7: Knowledge-Based e-Science Nigel Shadbolt The University of Southampton

Knowledge engineering is not about transfer but about modelling aspects of human knowledge

The knowledge level principle: first concentrate on the conceptual structure of knowledge and leave the programming details for later

Knowledge has a stable internal structure that can be analysed by distinguishing specific knowledge types and roles

Knowledge Engineering: Principles

Page 8: Knowledge-Based e-Science Nigel Shadbolt The University of Southampton

McBrien, A.M., Madden, J and Shadbolt, N.R. (1989). Artificial Intelligence Methods in Process Plant Layout. Proceedings of the 2nd International Conference on Industrial and Engineering Applications of AI and Expert Systems, pp364-373, ACM Press

Knowledge-Based System: PLS

Page 9: Knowledge-Based e-Science Nigel Shadbolt The University of Southampton

Bull, H.T, Lorrimer-Roberts, M.J., Pulford, C.I., Shadbolt, N.R., Smith, W. and Sunderland, P. (1995) Knowledge Engineering in the Brewing Industry. Ferment vol.8(1) pp.49-54.

Knowledge-Based System: Brewing

Page 10: Knowledge-Based e-Science Nigel Shadbolt The University of Southampton

Many systems deployed and in routine use

Often embedded

Understood aspects of:

Representing knowledge and dealing with uncertainty

How to reason with knowledge

How to acquire the knowledge

How to run in real time

Knowledge-Based Systems: Achievements

Page 11: Knowledge-Based e-Science Nigel Shadbolt The University of Southampton

How to disseminate capability?

How to reuse hard won knowledge?

How to maintain the knowledge-bases?

How to support communities of users and practitioners?

Knowledge-Based Systems: Problems

Page 12: Knowledge-Based e-Science Nigel Shadbolt The University of Southampton

The Semantic Web and e-Science…

Fundamentally changed the way we think about KA, knowledge engineering and knowledge management

Suggested a different way in which knowledge intensive components could be deployed

Also brought together a community unencumbered by close attention either to AI or Knowledge Engineering

New funding opportunities…

Page 13: Knowledge-Based e-Science Nigel Shadbolt The University of Southampton

Making the Web Semantic…

Page 14: Knowledge-Based e-Science Nigel Shadbolt The University of Southampton

Via meta content…

This is a type of object event and this is its title

This is the URL of the web page for the event

This is a type of object photograph and the photograph is of Tim Berners-Lee

Tim Berners-Lee is an invited speaker at the event

In a way that machines can deal with….

<owl:Class rdf:ID="Conference">
  <rdfs:subClassOf rdf:resource="#Meeting-Taking-Place"/>
  <rdfs:subClassOf rdf:resource="#Publication-Type-Event"/>
  <rdfs:subClassOf>
    <owl:Restriction>
      <owl:onProperty rdf:resource="#published-proceedings"/>
      <owl:allValuesFrom rdf:resource="#Conference-Proceedings-Reference"/>
    </owl:Restriction>
  </rdfs:subClassOf>
</owl:Class>

Page 15: Knowledge-Based e-Science Nigel Shadbolt The University of Southampton

Can Annotate Anything

Databases

Scientific structures

Workflow

Publications

People

Web data set (XHTML)

Page 16: Knowledge-Based e-Science Nigel Shadbolt The University of Southampton

The Semantic Web is about Sharing Meaning

Linkage of heterogeneous information: web content, databases, meta-data repository, multimedia

Via ontologies to share meaning

Using Semantic Web languages

Oncogene(MYC): Found_In_Organism(Human). Gene_Has_Function(Transcriptional_Regulation). Gene_Has_Function(Gene_Transcription). In_Chromosomal_Location(8q24). Gene_Associated_With_Disease(Burkitts_Lymphoma).

NCI Cancer Ontology (OWL)

<meta> <classifications> <classification type="MYC" subtype="old_arx_id">bcr-2-1-059</classification> </classifications></meta>

BioMedCentral Metadata (XML)

Web data set (XHTML)

Vocabulary (RDFS)

Page 17: Knowledge-Based e-Science Nigel Shadbolt The University of Southampton

Sharing Meaning on the Web: Ontologies

A shared conceptualisation of a domain

Provides the semantic backbone

Lightweight and is deployed using W3C recommended standards

Page 18: Knowledge-Based e-Science Nigel Shadbolt The University of Southampton

A Philosophical Digression

Page 19: Knowledge-Based e-Science Nigel Shadbolt The University of Southampton

Realist Stance

We engage with a reality directly

Reality consists of pre-existing objects with attributes

Our engagement may be via reflection, perception or language

Philosophical exponents: Aristotle, Leibniz, the early Wittgenstein, …

Language and logic picture the world

Seen as a way of accounting for common understanding

Promised a language for science

Page 20: Knowledge-Based e-Science Nigel Shadbolt The University of Southampton

There is no simple mapping into external objects and their attributes in the world

We construct objects and their attributes

This construction may be via intention and perception; it may be culturally and species specific

Philosophical exponents: Husserl, Heidegger, the later Wittgenstein, …

Language as games, complex procedures, contextualised functions that construct a view of the world

Constructivist Stance

Page 21: Knowledge-Based e-Science Nigel Shadbolt The University of Southampton

Ontologies - Current Context

The large metaphysical questions remain

What is the essence of being and being in the world?

Our science and technology are moving questions that were originally only philosophical in character into practical contexts

Akin to what happened with natural philosophy from the 17th century: chemistry, physics and biology

As our science and technology evolve, new philosophical possibilities emerge

Particularly when we look at knowledge and semantic based processing

Page 22: Knowledge-Based e-Science Nigel Shadbolt The University of Southampton

Shared Meaning and Content: The Role of Ontologies

Page 23: Knowledge-Based e-Science Nigel Shadbolt The University of Southampton

The AKT Ontology

Designed as a learning case for AKT

Adopted for our own Semantic Web experiments

Uses a number of Upper Ontology fragments

Reused in many contexts

Page 24: Knowledge-Based e-Science Nigel Shadbolt The University of Southampton

data sources

?

Mediate and Aggregate: UK Research Councils

Page 25: Knowledge-Based e-Science Nigel Shadbolt The University of Southampton

A Proposed Solution

[Diagram: data sources, gatherers, ontology, knowledge repository (triplestore), applications]

Page 26: Knowledge-Based e-Science Nigel Shadbolt The University of Southampton

Raw CSV data: heterogeneous tables

Processed RDF information: uniform format for files

Mediate and Aggregate: Ontologies
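The step from raw CSV tables to uniform RDF can be illustrated with a minimal sketch. This is not the AKT gatherer code: the namespace `http://example.org/research/`, the `has_<column>` property naming and the sample rows are all assumptions made for illustration.

```python
import csv
import io

# Hypothetical namespace; the slides do not give the real URIs.
BASE = "http://example.org/research/"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


def csv_to_ntriples(csv_text, id_column, type_name):
    """Turn each CSV row into N-Triples lines: one rdf:type triple
    plus one literal-valued triple per non-empty cell."""
    lines = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        subject = f"<{BASE}{type_name.lower()}/{row[id_column]}>"
        lines.append(f"{subject} <{RDF_TYPE}> <{BASE}{type_name}> .")
        for column, value in row.items():
            if column == id_column or not value:
                continue  # the id is already in the subject; skip empty cells
            predicate = f"<{BASE}has_{column.strip().lower().replace(' ', '_')}>"
            literal = value.replace("\\", "\\\\").replace('"', '\\"')
            lines.append(f'{subject} {predicate} "{literal}" .')
    return lines


# Made-up sample table.
sample = "id,title,funder\nP1,Example Project,EPSRC\nP2,Another Project,\n"
triples = csv_to_ntriples(sample, id_column="id", type_name="Project")
for line in triples:
    print(line)
```

Whatever the column layout of the source table, every fact ends up in the same subject-predicate-object shape, which is what makes later aggregation straightforward.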

Page 27: Knowledge-Based e-Science Nigel Shadbolt The University of Southampton

An Application Service

A relatively simple ontology could yield real information integration and interoperability benefits

Reuse was real but again lightweight

Ontology winnowing needed

Page 28: Knowledge-Based e-Science Nigel Shadbolt The University of Southampton

Content harvested and published from multiple heterogeneous sources

Higher Education directories

2001 RAE submissions

UK EPSRC project database

Info on personnel, projects and publications harvested for 5 or 5* CS departments in the UK

Automatic NL mining: Armadillo

All UK administrative areas (from ISO3166-2)

All UK settlements listed in the UN LOCODE service

Currently constructing spaces for other domains

Mediate and Aggregate: CS AKTive

Page 29: Knowledge-Based e-Science Nigel Shadbolt The University of Southampton

Classification: Ontologies in Proteomics

Work of Brass, Stephens and colleagues at Manchester

Classifying protein function

Large areas of biology where this is a knowledge-intensive human function

Mass of sequencing data demands assistive technologies

Auto classification using OWL ontologies and reasoners

Page 30: Knowledge-Based e-Science Nigel Shadbolt The University of Southampton

From Amino Acids to Protein Domains

>1A5Y:_ PROTEIN TYROSINE PHOSPHATASE 1B

MEMEKEFEQIDKSGSWAAIYQDIRHEASDFPCRVAKLPKNKNRNRYRDVSPFDHSRIKLHQEDNDYINASLIKMEEAQRSYILTQGPLPNTCGHFWEMVWEQKSRGVVMLNRVMEKGSLKCAQYWPQKEEKEMIFEDTNLKLTLISEDIKSYYTVRQLELENLTTQETREILHFHYTTWPDFGVPESPASFLNFLFKVRESGSLSPEHGPVVVHXSAGIGRSGTFCLADTCLLLMDKRKDPSSVDIKKVLLDMRKFRMGLIATAEQLRFSYLAVIEGAKFIMGDSSVQDQWKELSHEDLEPPPEHIPPPPRPPKRILEPHNGKCREFFPN

Page 31: Knowledge-Based e-Science Nigel Shadbolt The University of Southampton

Ontology Mediated Classification

Proteins carry out function in a cell

Broad functional classes are “Protein Families”

Families sub-divided to family classifications

Class membership determined by “protein features”, such as domains, etc.

Classification enables comparative genomics - what is already known in other organisms

The similarities and differences between processes and functions in related organisms often provide the greatest insight into the biology
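Classification by protein features can be sketched as a subset test: a class is defined by the features it requires, and a protein is placed in the most specific classes whose requirements it meets. This is a toy stand-in, not an OWL reasoner and not the Manchester system; the class and domain names are hypothetical.

```python
# Hypothetical class definitions: each class lists the protein features
# (domains) an instance must contain.
CLASS_DEFINITIONS = {
    "Phosphatase": {"phosphatase_catalytic"},
    "PhosphataseWithDomainA": {"phosphatase_catalytic", "domain_a"},
    "PhosphataseWithDomainAB": {"phosphatase_catalytic", "domain_a", "domain_b"},
}


def classify(domains):
    """Return the most specific classes whose required features are all present."""
    found = set(domains)
    matches = {
        name for name, required in CLASS_DEFINITIONS.items() if required <= found
    }
    # Drop any class strictly more general than another matching class.
    return sorted(
        name
        for name in matches
        if not any(
            CLASS_DEFINITIONS[name] < CLASS_DEFINITIONS[other] for other in matches
        )
    )


print(classify(["phosphatase_catalytic", "domain_a", "domain_c"]))
```

A feature that no class definition mentions (`domain_c` here) is ignored by the classifier, which is one way an unexpected domain can surface as a candidate for refining the classification.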

Page 32: Knowledge-Based e-Science Nigel Shadbolt The University of Southampton

Ontological Mediated Classification: Results

The ontology system refined classification

- DUSC contains zinc finger domain: characterised and conserved, but not in classification

- DUSA contains a disintegrin domain: previously uncharacterised, evolutionarily conserved

A new kind of phosphatase?

Extending to fungi, which are much less studied

Page 33: Knowledge-Based e-Science Nigel Shadbolt The University of Southampton

A Standards Digression

Page 34: Knowledge-Based e-Science Nigel Shadbolt The University of Southampton

Standards are fundamental

HTML XML + Name Space + XML Schema

Topic Maps

SMIL

RDF(S)

XOL

OWL

RDF

Unicode URI

Page 35: Knowledge-Based e-Science Nigel Shadbolt The University of Southampton

What is wrong with XML?

Nothing but…

Data/Information oriented applications are:

Less concerned about serial order, e.g., patient records vs poems

More concerned with aggregation, e.g., merging information about a patient from different sources vs merging musical scores
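The aggregation point can be made concrete with a small sketch. If facts are held as subject-predicate-object triples, merging two sources is a set union, with no document order to reconcile. The identifiers and property names below are made up for illustration.

```python
# Triples are (subject, predicate, object) tuples; a graph is a set of them.
# All identifiers below are made up for illustration.
clinic_records = {
    ("patient:42", "has_date_of_birth", "1923-01-01"),
    ("patient:42", "has_exam", "exam:oe1"),
}
imaging_records = {
    ("patient:42", "has_exam", "exam:mri7"),
    ("patient:42", "has_date_of_birth", "1923-01-01"),  # duplicate fact
}


def merge(*graphs):
    """Merging graphs is a set union: order is irrelevant, duplicates collapse."""
    merged = set()
    for graph in graphs:
        merged |= graph
    return merged


def describe(graph, subject):
    """Collect every (predicate, object) pair known about one subject."""
    return sorted((p, o) for s, p, o in graph if s == subject)


combined = merge(clinic_records, imaging_records)
print(describe(combined, "patient:42"))
```

Merging two XML documents, by contrast, requires deciding where each element goes in the tree, which is the serial-order concern that matters for poems and musical scores but not for patient records.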

Page 36: Knowledge-Based e-Science Nigel Shadbolt The University of Southampton

Viewpoints on RDF and XML…

The difference between the RDF proponents and the XML proponents is fairly simple. In the XML-centric world parties can utilize whatever internal formats and data sources they want but exchange XML documents that conform to an agreed upon format, in cases where the agreed upon format conflicts with internal formats then technologies like XSLT come to the rescue.

RDF vocabulary, allows free composition with other RDF language… assertions can be mixed with assertions about weather, physics, business processes, weblogs and syndication, genealogy, politics, and so on, without the need for prior coordination between the designers of languages for these domains.

The Semantic Web is not an all-or-nothing proposition; it is a rubric describing a set of distinct (though related) technologies - RDF, FOAF, OWL, RSS, XML - all of which are designed to improve machine-to-machine communication [...].

[...] the many smaller victories already taking place, impact of RDF, RSS, and Web services.

The RDF position is that it is too difficult to agree on interchange formats so instead of going down this route we should use KR schema to map between formats. [...]

Page 37: Knowledge-Based e-Science Nigel Shadbolt The University of Southampton

Knowledge-Based Services

Page 38: Knowledge-Based e-Science Nigel Shadbolt The University of Southampton

Diverse and heterogeneous content

Clinical examination: Notes

Imaging: X-ray, Ultrasound, MRI

Microscopy: Histopathology

Treatment: Protocol, Records, Re-assessment

Medical Records: Case sets, Individual patient records

Published background: Epidemiology, Medical Abstracts

Collaborative Medical Decision Making

Page 39: Knowledge-Based e-Science Nigel Shadbolt The University of Southampton

The Services and Framework

Image Analysis Services

Classification Services

Patient Data Retrieval Services

Image Registration

Natural Language Report Generation

Patient Records through web-service

Page 40: Knowledge-Based e-Science Nigel Shadbolt The University of Southampton

Underlying Representation of Patients

<rdf:Description rdf:about='#g1p78_patient'>
  <rdf:type rdf:resource='#Patient'/>
  <NS2:has_date_of_birth>01.01.1923</NS2:has_date_of_birth>
  <NS2:involved_in_ta rdf:resource='#ta_soton_000130051992'/>
</rdf:Description>

<rdf:Description rdf:about='#ta_soton_000130051992'>
  <rdf:type rdf:resource='#Multi_Disciplinary_Meeting_TA'/>
  <NS2:involve_patient rdf:resource='#g1p78_patient'/>
  <NS2:consist_of_subproc rdf:resource='#oe_00103051992'/>
  <NS2:consist_of_subproc rdf:resource='#hp_00117051992'/>
  <NS2:consist_of_subproc rdf:resource='#ma_00127051992'/>
  <NS2:has_overall_impression rdf:resource='#assessment_b5_malignant'/>
  <NS2:has_overall_diagnosis>invasive carcinoma</NS2:has_overall_diagnosis>
</rdf:Description>

<rdf:Description rdf:about='#oe_00103051992'>
  <rdf:type rdf:resource='#Physical_Exam'/>
  <NS2:has_date>03.05.1992</NS2:has_date>
  <NS2:produce_result rdf:resource='#oereport_glp78_1'/>
  <NS2:carried_out_on rdf:resource='#g1p78_patient'/>
</rdf:Description>

<rdf:Description rdf:about='#oereport_glp78_1'>
  <NS2:type rdf:resource='#Lateral_OE_Report'/>
  <NS2:contains_roi rdf:resource='#oe_roi_00103051992'/>
  <NS2:has_lateral rdf:resource='#lateral_left'/>
</rdf:Description>
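As a rough sketch of how a service might consume such records, the following flattens `rdf:Description` blocks into triples and follows the links from a patient to the sub-procedures of their assessment. The enclosing `rdf:RDF` element and the `NS2` namespace URI are assumptions (the slide shows neither), and the records are trimmed copies of the ones above.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
NS2 = "http://example.org/medical#"  # hypothetical; the real URI is not shown

DOCUMENT = f"""
<rdf:RDF xmlns:rdf="{RDF}" xmlns:NS2="{NS2}">
  <rdf:Description rdf:about='#g1p78_patient'>
    <rdf:type rdf:resource='#Patient'/>
    <NS2:involved_in_ta rdf:resource='#ta_soton_000130051992'/>
  </rdf:Description>
  <rdf:Description rdf:about='#ta_soton_000130051992'>
    <rdf:type rdf:resource='#Multi_Disciplinary_Meeting_TA'/>
    <NS2:consist_of_subproc rdf:resource='#oe_00103051992'/>
    <NS2:consist_of_subproc rdf:resource='#hp_00117051992'/>
    <NS2:has_overall_diagnosis>invasive carcinoma</NS2:has_overall_diagnosis>
  </rdf:Description>
</rdf:RDF>
"""


def load_triples(xml_text):
    """Flatten rdf:Description blocks into (subject, property, value) triples."""
    triples = []
    root = ET.fromstring(xml_text)
    for description in root.findall(f"{{{RDF}}}Description"):
        subject = description.get(f"{{{RDF}}}about")
        for prop in description:
            local_name = prop.tag.split("}", 1)[1]  # strip the namespace part
            value = prop.get(f"{{{RDF}}}resource") or (prop.text or "").strip()
            triples.append((subject, local_name, value))
    return triples


def objects(triples, subject, predicate):
    """All values of one property for one subject, in document order."""
    return [o for s, p, o in triples if s == subject and p == predicate]


triples = load_triples(DOCUMENT)
for assessment in objects(triples, "#g1p78_patient", "involved_in_ta"):
    print(assessment, objects(triples, assessment, "consist_of_subproc"))
```

A production service would use an RDF library rather than a generic XML parser, since the same graph can be serialised as RDF/XML in several different ways.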

Page 41: Knowledge-Based e-Science Nigel Shadbolt The University of Southampton

Shared meaning: People Institutions Norms

Concepts and classes are end-products of epistemological and/or decision-making procedures

One needs to “recognise” instances of a particular class as such

Information indexed against an ontology can be treated declaratively (Tarskian view, OWL), but …

… concepts and classes come into being procedurally against social and institutional norms

Page 42: Knowledge-Based e-Science Nigel Shadbolt The University of Southampton

Institutional Norms

A common false positive in FNAC is the misdiagnosis of apocrine cells as a malignant condition (pleomorphic appearance signals malignancy; morphological characteristics are traditionally the distinguishing classification criteria for pathologists)

For KR support, need to record not just the label relevant for diagnosis (“apocrine cells”) but also the means by which such a labelling was achieved

• NHS guidelines suggest, for identification of apocrine cells (a common false positive):

“Recognition of the dusty blue cytoplasm, with or without cytoplasmic granules with Giemsa stains or pink cytoplasm on Papanicolaou or haematoxylin and eosin stains coupled with a prominent central nucleolus is the key to identifying cells as apocrine.”

Page 43: Knowledge-Based e-Science Nigel Shadbolt The University of Southampton

Formalised Procedures

For a laboratory practice L(x, t) to which specimen x is subjected in context t (time, state variables for experimental/clinical conditions), a predicative attribute P(x) is identified with the behavioural response B(x, t), leading to an implicit definition of P(x)
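One possible reading of this implicit definition, assuming it follows the pattern of a bilateral reduction sentence (the formula is an interpretation and does not appear on the slide), uses the same symbols L, P and B:

```latex
\[
\forall x\,\forall t\;\Bigl[\,L(x,t)\;\rightarrow\;\bigl(P(x)\leftrightarrow B(x,t)\bigr)\Bigr]
\]
```

Read this way, P(x) is only fixed for specimens that have actually been subjected to the practice L; nothing is asserted about untested specimens.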

Page 44: Knowledge-Based e-Science Nigel Shadbolt The University of Southampton

Procedures for Reproducibility

Specific criteria for identifying histopathological slides as instances of particular lesions – rule following props – make concept labelling reproducible

For Ductal Carcinoma in situ and Atypical ductal hyperplasia, procedural criteria reduce inter-expert variability

Criteria of Page et al (Cancer 1982; 49:751-758; Cancer 1985; 55:2698-2708), reported by Fechner in MJ Silverstein (1997). Ductal Carcinoma In Situ of the Breast

Page 45: Knowledge-Based e-Science Nigel Shadbolt The University of Southampton

Norms and Rule-following

Concept use in medical practice requires the recognition of instances as instances of appropriate classes

Classes are assigned as proxies of groups of instances to respond in coherent ways to patterns of questioning

Class ascription needs to be reproducible

Reproducibility is enhanced by rule-following

Page 46: Knowledge-Based e-Science Nigel Shadbolt The University of Southampton

http://www.geodise.org

Page 47: Knowledge-Based e-Science Nigel Shadbolt The University of Southampton

Nacelle Design

Computational fluid dynamics

(1) specify the geometry
(2) generate a mesh
(3) decide which code to use for the analysis
(4) decide the optimisation schedule
(5) execute the optimisation run

Design Optimisation of a Nacelle
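The five steps can be sketched as a pipeline of functions that each add one decision to a shared state. This is not Geodise code: every function is a hypothetical stand-in, and a toy quadratic replaces the CFD analysis.

```python
def specify_geometry(state):
    # (1) One made-up design variable with its allowed range.
    state["parameter_range"] = (0.0, 4.0)


def generate_mesh(state):
    # (2) Nine evenly spaced sample points across the range.
    low, high = state["parameter_range"]
    state["mesh"] = [low + i * (high - low) / 8 for i in range(9)]


def choose_analysis_code(state):
    # (3) Stand-in for a CFD solver: a quadratic cost with its minimum at 1.5.
    state["analysis"] = lambda x: (x - 1.5) ** 2


def choose_schedule(state):
    # (4) Simplest possible schedule: evaluate every mesh point once.
    state["schedule"] = list(state["mesh"])


def run_optimisation(state):
    # (5) Pick the scheduled point with the lowest cost.
    state["best"] = min(state["schedule"], key=state["analysis"])


WORKFLOW = [
    specify_geometry,
    generate_mesh,
    choose_analysis_code,
    choose_schedule,
    run_optimisation,
]


def execute(workflow):
    """Run the steps in order, passing the accumulated state along."""
    state = {}
    for step in workflow:
        step(state)
    return state


result = execute(WORKFLOW)
print(result["best"])
```

Steps (3) and (4) are decisions rather than computations, which is where expert knowledge enters the workflow.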

Page 48: Knowledge-Based e-Science Nigel Shadbolt The University of Southampton

Knowledge Engineering in Geodise

Techniques: CommonKADS knowledge engineering methodologies, PCPACK

Knowledge models: Organization, agent & task templates, domain schema & inferences.

Representation: HTML (knowledge web), XML, CML, UML, flowchart.

Page 49: Knowledge-Based e-Science Nigel Shadbolt The University of Southampton

Knowledge models: ontologies, knowledge bases

Tools: Protégé & OilEd

Representation: RDF, Rule Sets

Knowledge Engineering in Geodise

Page 50: Knowledge-Based e-Science Nigel Shadbolt The University of Southampton

WF Advisor

Semantic Service Description

A KB Approach to Semantic Service Composition

Page 51: Knowledge-Based e-Science Nigel Shadbolt The University of Southampton

Lessons Learnt

Ontology mediated invocation of services

Services as Problem Solving Methods or Task Achieving Components

Semantic service description is indispensable for resource sharing and reuse

Composition and workflow support will vary from domain to domain

Knowledge acquisition & conceptualisation may still be the bottleneck
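To illustrate why semantic service descriptions help composition, here is a minimal sketch in which each service is described by the concept it consumes and the concept it produces, and a chain is found by search. The service and concept names are hypothetical, and real descriptions would refer to ontology classes rather than plain strings.

```python
from collections import deque

# Hypothetical service descriptions: name -> (input concept, output concept).
SERVICES = {
    "mesh_generator": ("Geometry", "Mesh"),
    "flow_solver": ("Mesh", "FlowField"),
    "drag_estimator": ("FlowField", "DragValue"),
    "report_writer": ("DragValue", "Report"),
}


def compose(have, want):
    """Breadth-first search for the shortest service chain from `have` to `want`.

    Returns a list of service names, or None if no chain exists.
    """
    queue = deque([(have, [])])
    seen = {have}
    while queue:
        concept, chain = queue.popleft()
        if concept == want:
            return chain
        for name, (needs, gives) in SERVICES.items():
            if needs == concept and gives not in seen:
                seen.add(gives)
                queue.append((gives, chain + [name]))
    return None


print(compose("Geometry", "DragValue"))
```

Matching on exact concept names is the weakest form of this idea; with an ontology, a service accepting a general concept could also be offered inputs of any more specific class.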

Page 52: Knowledge-Based e-Science Nigel Shadbolt The University of Southampton

Knowledge-Based Collaboration

Page 53: Knowledge-Based e-Science Nigel Shadbolt The University of Southampton

Simon Buckingham Shum (Open)
David De Roure (So’ton)
Marc Eisenstadt (Open)
Nigel Shadbolt (So’ton)
Austin Tate (Edin)

AKT Integrated Feasibility Demonstrator

Page 54: Knowledge-Based e-Science Nigel Shadbolt The University of Southampton

collective sensemaking / discussion capture

following through decisions/coordinating activities

synthesising artifacts

recovering information from meetings

awareness of agent status / sense of ‘presence’

virtual meetings

Integrating multiple methods of communicating

Page 55: Knowledge-Based e-Science Nigel Shadbolt The University of Southampton

mapping real time discussions / group sensemaking

following through decisions/coordinating activities

synthesising artifacts

recovering information from meetings

awareness of colleagues’ availability / sense of ‘presence’

virtual meetings

Testbed: Scientific project management (AKT PI NetMeeting, March 2003)

NetMeeting

BuddySpace

Compendium

I-X Process panels

Page 56: Knowledge-Based e-Science Nigel Shadbolt The University of Southampton

Meeting replay by time, speaker, contribution to discussion, and decision

Page 57: Knowledge-Based e-Science Nigel Shadbolt The University of Southampton

CoAKTinG @ NASA

Distributed Mars-Earth planning and data analysis tools for Mars Habitat field trial in Utah desert, supported from US+UK

Page 58: Knowledge-Based e-Science Nigel Shadbolt The University of Southampton

CoAKTinG NASA testbed

Compendium activity plans for surface exploration, constructed by scientists,

interpreted by software agents

Page 59: Knowledge-Based e-Science Nigel Shadbolt The University of Southampton

CoAKTinG NASA testbed

Compendium science data map, generated by software agents, for interpretation by Mars+Earth scientists

Page 60: Knowledge-Based e-Science Nigel Shadbolt The University of Southampton

CoAKTinG NASA testbed

Compendium-based photo analysis by geologists on ‘Mars’

Page 61: Knowledge-Based e-Science Nigel Shadbolt The University of Southampton

CoAKTinG NASA testbed

Compendium scientific feedback map from Earth scientists to Mars colleagues

Page 62: Knowledge-Based e-Science Nigel Shadbolt The University of Southampton

CoAKTinG NASA testbed

Compendium ‘portal map’ for Earth scientists, with the key links they need to prepare for a virtual meeting to analyse Mars scientists’ data

Page 63: Knowledge-Based e-Science Nigel Shadbolt The University of Southampton

CoAKTinG NASA testbed

Meeting Replay tool for Earth scientists, synchronising video of Mars crew’s discussion with their Compendium maps

Page 64: Knowledge-Based e-Science Nigel Shadbolt The University of Southampton

Lessons from CoAKTinG

Initially achieved a lot with lightweight integration via XML-based architectures like Jabber

More recently, the large-scale 3store RDF repository, with taxonomic capabilities and RDQL, provided a deeper integration

Substantial amounts of integration via the common representation of RDF in the repository

Real-time services for annotation of streaming media are particularly challenging

Achieved collaborative virtual team support, and a follow-on VRE project
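What "taxonomic capabilities" add to a plain triple lookup can be shown with a small in-memory sketch: a query for instances of a class also returns instances of its subclasses. This illustrates the idea only; it is not the repository's API or RDQL syntax, and the triples are made up.

```python
SUBCLASS_OF = "rdfs:subClassOf"
TYPE = "rdf:type"

# Made-up triples for illustration.
STORE = {
    ("Professor", SUBCLASS_OF, "Academic"),
    ("Academic", SUBCLASS_OF, "Person"),
    ("alice", TYPE, "Professor"),
    ("bob", TYPE, "Person"),
}


def match(store, s=None, p=None, o=None):
    """Return the triples matching a pattern; None acts as a wildcard."""
    return [
        t
        for t in store
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    ]


def subclasses(store, cls):
    """All classes at or below `cls`, following rdfs:subClassOf transitively."""
    result, frontier = {cls}, [cls]
    while frontier:
        current = frontier.pop()
        for child, _, _ in match(store, p=SUBCLASS_OF, o=current):
            if child not in result:
                result.add(child)
                frontier.append(child)
    return result


def instances_of(store, cls):
    """Instances typed with `cls` or with any of its subclasses."""
    return sorted(
        {s for c in subclasses(store, cls) for s, _, _ in match(store, p=TYPE, o=c)}
    )


print(instances_of(STORE, "Person"))
```

Without the subclass step, a query for `Person` would miss `alice`, who is only recorded as a `Professor`.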

Page 65: Knowledge-Based e-Science Nigel Shadbolt The University of Southampton

Knowledge Mapping for e-Science

Page 66: Knowledge-Based e-Science Nigel Shadbolt The University of Southampton

The Network Effect

We need tools to assist our understanding of emerging and established scientific work

Social Network Analysis

Communities of Interest/Practice

Who is doing what, where

With what impact

Page 67: Knowledge-Based e-Science Nigel Shadbolt The University of Southampton

Existing Citation Ratings…

Use of the literature

Citeseer’s ‘most cited author’ page. People use it to rate themselves and others.

25 x ‘D Johnson’ in Citeseer: inaccurate!

Pure citation count not enough to prove ‘impact’?

Page 68: Knowledge-Based e-Science Nigel Shadbolt The University of Southampton

Hubs and Authorities

Begin with existing measures: document count and citation count.

Apply Kleinberg (1998) ‘hubs/authorities’ analysis to data.

Note that higher citation count may not mean higher authority rating: quality citations are what count.
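A minimal sketch of the hubs/authorities iteration shows how the two measures can disagree. The citation graph is made up: paper X has three citations, all from papers that cite nothing else, while paper Y has two citations from papers that also cite other well-cited work.

```python
def normalise(scores):
    """Scale scores to unit Euclidean length (left unchanged if all zero)."""
    length = sum(v * v for v in scores.values()) ** 0.5 or 1.0
    return {n: v / length for n, v in scores.items()}


def hits(citations, iterations=50):
    """Hubs/authorities iteration; `citations` maps a paper to the papers it cites."""
    nodes = set(citations) | {c for refs in citations.values() for c in refs}
    hub = dict.fromkeys(nodes, 1.0)
    authority = dict.fromkeys(nodes, 1.0)
    for _ in range(iterations):
        # Authority: sum of the hub scores of the papers citing you.
        authority = dict.fromkeys(nodes, 0.0)
        for citing, refs in citations.items():
            for cited in refs:
                authority[cited] += hub[citing]
        authority = normalise(authority)
        # Hub: sum of the authority scores of the papers you cite.
        hub = normalise(
            {n: sum(authority[c] for c in citations.get(n, [])) for n in nodes}
        )
    return hub, authority


def citation_counts(citations):
    """Plain citation count per cited paper."""
    counts = {}
    for refs in citations.values():
        for cited in refs:
            counts[cited] = counts.get(cited, 0) + 1
    return counts


# Made-up citation graph.
citations = {
    "h1": ["Y", "A", "B"],
    "h2": ["Y", "A", "B"],
    "w1": ["X"],
    "w2": ["X"],
    "w3": ["X"],
}
hub, authority = hits(citations)
counts = citation_counts(citations)
print(counts["X"], counts["Y"])
print(authority["X"] < authority["Y"])
```

In this toy graph the iteration concentrates authority on the larger co-citation cluster, so X's raw count advantage does not translate into authority.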

Page 69: Knowledge-Based e-Science Nigel Shadbolt The University of Southampton

Turning Points and Centrality

Next step: include Chen’s Citespace/Pathfinder algorithm.

Allows us to find turning points in scientific development: Kuhn’s “paradigm shift” moment.

‘Centrality’ measure to be applied to same Citeseer data.
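The slide does not say which centrality measure is meant. Assuming betweenness centrality, a common choice for spotting nodes that bridge otherwise separate clusters, a sketch using Brandes' algorithm on an unweighted, undirected graph looks like this. The graph is made up.

```python
from collections import deque


def betweenness(graph):
    """Brandes' algorithm for an undirected, unweighted graph
    given as {node: [neighbours]}."""
    score = {v: 0.0 for v in graph}
    for source in graph:
        stack = []
        predecessors = {v: [] for v in graph}
        paths = {v: 0 for v in graph}      # number of shortest paths from source
        distance = {v: -1 for v in graph}  # -1 marks "not yet reached"
        paths[source], distance[source] = 1, 0
        queue = deque([source])
        while queue:  # breadth-first search from the source
            v = queue.popleft()
            stack.append(v)
            for w in graph[v]:
                if distance[w] < 0:
                    distance[w] = distance[v] + 1
                    queue.append(w)
                if distance[w] == distance[v] + 1:
                    paths[w] += paths[v]
                    predecessors[w].append(v)
        dependency = {v: 0.0 for v in graph}
        while stack:  # accumulate dependencies, farthest nodes first
            w = stack.pop()
            for v in predecessors[w]:
                dependency[v] += paths[v] / paths[w] * (1 + dependency[w])
            if w != source:
                score[w] += dependency[w]
    # Each unordered pair was counted once from each endpoint.
    return {v: s / 2 for v, s in score.items()}


# Two made-up clusters, {a, b} and {c, d}, joined only through paper t.
graph = {
    "a": ["b", "t"],
    "b": ["a", "t"],
    "c": ["d", "t"],
    "d": ["c", "t"],
    "t": ["a", "b", "c", "d"],
}
centrality = betweenness(graph)
print(centrality["t"], centrality["a"])
```

Here every shortest path between the two clusters passes through `t`, which is the structural signature of a candidate turning point.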

Page 70: Knowledge-Based e-Science Nigel Shadbolt The University of Southampton

Lessons Learnt

The content is primary
  It needs rich semantic annotation via ontologies
  Services emerge or are designed to exploit the content

Ontologies
  Mediation and Aggregation
  Lightweight works
  Mapping and alignment

Socio-technical issues
  Ontologies and KA
  Usability and requirements

Importance of Reference
  A fundamental problem

Push versus Pull
  Important to understand the balance here

Fidelity
  Need for trust, provenance and quality services