1
Streams, Structures, Spaces, Scenarios, and Societies (5S): A Formal Digital Library Framework and Its Applications
Marcos André Gonçalves
Doctoral defense
Virginia Tech, Blacksburg, VA 24061 USA
2
Acknowledgments
Funding: CAPES, NSF, AOL
Collaborators: Pavel Calado, Lilian Cassell, Marco Cristo, Patrick Fan, Ed Fox, Robert France, Filip Jagodzinski, Rohit Kelapure, Neill Kipp, Aaron Krowne, Alberto Laender, Claudia Medeiros, Naren Ramakrishnan, Berthier Ribeiro-Neto, Rao Shen, Hussein Suleman, Ricardo Torres, Layne Watson, Baoping Zhang, Qinwei Zhu, …
3
Publications and Accomplishments
Book chapters: 4 published + 1 in press
Journal/Magazine papers: 8 published + 1 under revision + 1 accepted
Conference/Workshop papers: 25 published
Other publications (poster and demo papers): 4 published
Awards: 3 (Lewis Trustee Award, AOL-CIT Fellowship Honorable Mention, JCDL'04 Best Student Paper)
Helped supervise three Masters students
4
Outline
Motivation: the problem
Hypotheses and research questions
Part 1: Theory
  5S: introduction, formal definitions
  The formal ontology
Part 2: Tools/Applications
  Language
  Visualization
  Generation
  Logging
Part 3: Quality
Conclusions, Future Work
5
Motivation
Digital Libraries (DLs): what are they?
  No definitional consensus
  Conflicting views
  Makes interoperability a hard problem
DLs are not benefiting from formal theories as other CS fields are: DB, IR, PL, etc.
DL construction: difficult, ad hoc, lacking support for tailoring/customization
Conceptual modeling, requirements analysis, and methodological approaches are rarely supported in DL development.
  Lack of specific DL models, formalisms, languages
6
Hypotheses
A formal theory for DLs can be built based on 5S.
The formalization can serve as a basis for modeling and building high-quality DLs.
7
Research Questions
1. Can we formally elaborate 5S?
2. How can we use 5S to formally describe digital libraries?
3. What are the fundamental relationships among the Ss and high-level DL concepts?
4. How can we allow digital librarians to easily express those relationships?
5. Which are the fundamental quality properties of a DL? Can we use the formalized DL framework to characterize those properties?
6. Where in the life cycle of digital libraries can key aspects of quality be measured and how?
9
Informal 5S Definitions: DLs are complex systems that
  help satisfy info needs of users (societies)
  provide info services (scenarios)
  organize info in usable ways (structures)
  present info in usable ways (spaces)
  communicate info with users (streams)
10
5Ss
Ss | Examples | Objectives
Streams | Text; video; audio; image | Describes properties of the DL content such as encoding and language for textual material or particular forms of multimedia data
Structures | Collection; catalog; hypertext; document; metadata | Specifies organizational aspects of the DL content
Spaces | Measure; measurable, topological, vector, probabilistic | Defines logical and presentational views of several DL components
Scenarios | Searching, browsing, recommending | Details the behavior of DL services
Societies | Service managers, learners, teachers, etc. | Defines service managers, responsible for running DL services; actors, that use those services
11
5S and DL formal definitions and compositions (April 2004 TOIS)
[Figure: dependency diagram of the formal definitions, each labeled with its definition number (d.N).]
Definitions shown, as labeled on the slide: relation (d.1), function (d.2), sequence (d.3), tuple (d.4), language (d.5), graph (d.6), grammar (d.7), streams (d.9), structures (d.10), event (d.10), measurable (d.12), measure (d.13), probability (d.14), vector (d.15), topological (d.16) spaces, spaces (d.18), state (d.18), scenarios (d.21), services (d.22), transmission (d.23), societies (d.24), structural metadata specification (d.25), descriptive metadata specification (d.26), structured stream (d.29), digital object (d.30), collection (d.31), metadata catalog (d.32), repository (d.33), indexing service (d.34), searching service (d.35), hypertext (d.36), browsing service (d.37), digital library (minimal) (d.38)
12
Glossary: Concepts in the Minimal DL and Representing Symbols
Concept | Symbol
Digital object | do
Metadata specification | ms
Set of metadata specifications | mss
Collection | C
Catalog | DMC
Repository | R
Event | e
Scenario | Sc
Services | Se
Actor | Ac
Service Manager | SM
Operation | op
Society | Soc
13
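5S
[Figure: the minimal DL concepts placed within the 5Ss (Streams, Structures, Spaces, Scenarios, Societies). Streams: text, audio, image, video. Concept symbols shown: do, ms, mss, R, C, DMC, Ic, Se, Sc, e, SM, Ac, op. Spaces: Top, Pr, Metric, Measurable, Measure, Vec. Concepts are divided into Static/Passive and Dynamic/Active.]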
15
Digital Library Formal Ontology
[Figure: the concepts of the previous diagram (text, audio, image, video, do, ms, mss, R, C, DMC, Ic, Se, Sc, e, SM, Ac, op, Top, Pr, Metric, Measurable, Measure, Vec) within Streams, Structures, Spaces, Scenarios, and Societies, now connected by labeled relationships: describes, stores, is_version_of/cites/links_to, belongs_to, contains, is_a, extends, reuses, redefines, invokes, precedes, happens_before, executes, participates_in, recipient, runs, inherits_from/includes, association, uses, employs, produces.]
16
Ontology: Applications
Expand definition of minimal DL by characterizing typical DL services in the context of "employs" and "produces" relationships
Use characterization to:
  reason about how DL services can be built from other DL components
  as well as be composed with other services through extension or reuse
17
Ontology: Applications
18
Ontology: Taxonomy of Services
Information Satisfaction Services: Binding, Browsing, Customizing, Disseminating, Expanding (query), Filtering, Recommending, Requesting, Searching
Infrastructure Services
  Add Value: Annotating, Classifying, Clustering, Evaluating, Extracting, Indexing, Linking, Logging, Measuring, Rating, Reviewing (peer), Surveying, Training (classifier), Translating, Visualizing
  Repository-Building
    Preservational: Conserving, Converting, Copying/Replicating, Translating (format)
    Creational: Acquiring, Authoring, Cataloging, Crawling (focused), Describing, Digitizing, Harvesting, Submitting
19
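Composition of key infrastructure services
[Figure: graph linking the services Acquiring, Authoring, Digitizing, Submitting, Describing, Cataloguing, Indexing, and Linking to the components universal collection, C, DMC, Ic, Hypertext, digital objects (do), and metadata specifications (ms); edges are labeled e and p, plus one "describes" edge.]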
20
Composition of additional services
[Figure: graph of Information Satisfaction Services (fundamental: Searching, Browsing, Requesting; composite: Recommending, Filtering, Binding, Visualizing, Expanding query) and Infrastructure Services (Add_Value: Rating, Training, Indexing). Inputs and outputs shown include query, anchor, handle, user model/expr, query/category, query', class Ct, transformer, Ic, the collection C, sets of digital objects, and rating triples (do, ac, r); Society actors invoke the services; edges are labeled e and p.]
22
Approach
[Figure: relationships among Domain Concepts (theory), Modeling Language (Meta-Model), Model, DL Architecture, Running DL, DL Actors, and the "Real" World ("real" world object). Edge labels: instance of, used to compose, abstracted from, represented by, interpreted as.]
23
Part 2: Tools/Applications
[Figure: tool chain. Elements shown: DL Expert, 5S Meta Model, 5SGraph, DL Designer, 5SL DL Model, 5SGen, component pool (ODLSearch, ODLBrowse, ODLRate, ODLReview, …), Tailored DL, its users (Practitioner, Researcher, Teacher), and the Logging Module (XMLLog).]
25
5SL: a DL Modeling Language
Domain-specific languages
  Address a particular class of problems by offering specific abstractions and notations for the domain at hand
  Advantages: domain-specific analysis, program management, visualization, testing, maintenance, modeling, and rapid prototyping
XML-based realization of 5S
  Interoperability
  Use of many standard sub-languages (e.g., MIME types, XML Schemas, UML notations)
26
5SL: The Minimal DL Metamodel
[Figure: class diagram. Meta-model primitives: Stream (Meta-)Model (Text, Audio, Video, Image), Structural (Meta-)Model, Spatial (Meta-)Model, Scenarios (Meta-)Model, Societal (Meta-)Model. Classes shown: Document, Collection, Catalog, Metadata, Index, Retrieval Model, User Interface, Service, Scenario, Event, Community, Actor, Service Manager, Search Manager, Index Manager, Browsing Manager, Interface Manager, Repository Manager. Edge labels: uses, runs, receiver.]
27
<document name='ETD'>
  <stream_enumeration>
    <stream value='ETDText'/>
    <stream value='ETDAudio'/>
    ...
  </stream_enumeration>
  <structured_stream>
    %XMLSchema%
  </structured_stream>
</document>
Example of Document declaration in the Structures Model
<Society>
  <Actor>
    <Community name='Patron'>
      <Attribute name='name' type='String'/>
      <Attribute name='ID' type='Integer'/>
    </Community>
    <Community name='Student'>
      <Service>Converting</Service>
    </Community>
    <Community name='ETDReviewer'>
      <Service>Reviewing</Service>
    </Community>
    <Community name='ETDCataloguer'>
      <Service>Cataloguing</Service>
    </Community>
  </Actor>
  ………
Example of Actors declaration in the Societies Model
<SERVICE name='Searching'>
  <SCENARIO name='SimpleSearching'>
    <NOTE>Simple scenario for an NDLTD site searching service</NOTE>
    <EVENT>
      <SENDER>Patron</SENDER>
      <RECEIVER>InterfaceManager</RECEIVER>
      <OPERATION name='SearchCriteria'/>
      <PARAMETER>collection</PARAMETER>
      <PARAMETER>query</PARAMETER>
    </EVENT>
    <EVENT>
      <SENDER>InterfaceManager</SENDER>
      <RECEIVER>SearchManager</RECEIVER>
      <OPERATION name='Search'/>
      <PARAMETER>collection</PARAMETER>
      <PARAMETER>query</PARAMETER>
    </EVENT>
    <EVENT>
      <SENDER>SearchManager</SENDER>
      <RECEIVER>InterfaceManager</RECEIVER>
      <PARAMETER name='Results'>WtdSet</PARAMETER>
    </EVENT>
    ….
Example of Service declaration in the Scenario Model
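A scenario in 5SL is a sequence of events, each with a sender, a receiver, and an operation. The sketch below shows one way such a declaration could be read back as a list of events. It is only an illustration: the function name list_events is hypothetical, the embedded XML keeps just the first two events of the slide, and the closing tags are assumed because the slide elides the rest of the declaration.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the slide's example; closing tags are assumed, since the
# slide elides the remainder of the declaration.
SCENARIO_XML = """
<SERVICE name='Searching'>
  <SCENARIO name='SimpleSearching'>
    <EVENT>
      <SENDER>Patron</SENDER>
      <RECEIVER>InterfaceManager</RECEIVER>
      <OPERATION name='SearchCriteria'/>
      <PARAMETER>collection</PARAMETER>
      <PARAMETER>query</PARAMETER>
    </EVENT>
    <EVENT>
      <SENDER>InterfaceManager</SENDER>
      <RECEIVER>SearchManager</RECEIVER>
      <OPERATION name='Search'/>
      <PARAMETER>collection</PARAMETER>
      <PARAMETER>query</PARAMETER>
    </EVENT>
  </SCENARIO>
</SERVICE>
"""


def list_events(xml_text):
    """Return the events of every scenario, in document order (hypothetical helper)."""
    root = ET.fromstring(xml_text)
    events = []
    for scenario in root.iter("SCENARIO"):
        for event in scenario.findall("EVENT"):
            operation = event.find("OPERATION")
            events.append({
                "scenario": scenario.get("name"),
                "sender": event.findtext("SENDER"),
                "receiver": event.findtext("RECEIVER"),
                # An event may carry no OPERATION element (e.g., a result message).
                "operation": operation.get("name") if operation is not None else None,
                "parameters": [p.text for p in event.findall("PARAMETER")],
            })
    return events


for ev in list_events(SCENARIO_XML):
    print(ev["sender"], "->", ev["receiver"], ":", ev["operation"], ev["parameters"])
```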
29
5SGraph: A DL Modeling Tool
Helps users model their own instances of a digital library (DL) in the 5S language (5SL).
A simple modeling process which enables rapid generation of digital libraries
Features
  5SGraph loads and displays a metamodel in a structured toolbox.
  The structured editor of 5SGraph provides a top-down visual building environment for the DL designer.
  5SGraph produces syntactically correct 5SL files according to the visual model built by the designer.
30
Overview of 5SGraph
[Figure: screenshot of 5SGraph showing the workspace (instance model) and the structured toolbox (metamodel).]
31
5SGraph: Other Key Features
Flexible and extensible architecture
Reuse of models
  Load, save, and change common (sub-)models
Synchronization of views
Enforcing of semantic constraints
32
5SGraph Evaluation: Usability Study
Measure | Task 1 | Task 2 | Task 3
Completion Rate (%) | 100 | 100 | 100
Mean Task Time (min) | 11.3 | 11.4 | 15.1
Mean Closeness to Expertise | 0.483 | 0.752 | 0.712
Mean Goal Achievement (%) | 97.4 | 97.4 | 98.2
[Charts: pre-understanding vs. post-understanding ratings (scale 0 to 10); satisfaction and usefulness ratings (scale 0 to 10, x-axis 1 to 17).]
34
5SGen
Version 1: MARIAN as the target system
  Focused on rich structures: semantic networks
  Behavior attached to nodes/links
Version 2: shifted for later work to a componentized (ODL) approach
  Focused on scenarios/societies
  Structures/Spaces encapsulated within components (e.g., relational tables, indexes)
35
5SGen, Version 2: ODL, Services, Scenarios
[Figure: 5SGen generation pipeline, with numbered steps.]
Societies path: 5SL-Societies Model (1), XPath/JDOM Transform (2), XMI: Class Model (3), Xmi2Java (4), Java Classes Model (5)
Scenarios path: 5SL-Scenario Model (6), XPath/JDOM Transform (7), StateChart Model (8), Scenario Synthesis (9), Deterministic FSM (10), SMC (11), Java Finite State Machine Class Controller (12)
JSP User Interface View (13)
Component Pool: ODLSearch, ODLBrowse, ... (each with a Java wrapping, imported into the generated code)
The DL Designer supplies the models and binds the components; the output is the Generated DL Services.
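The scenario path of the pipeline ends in a deterministic finite state machine that controls the generated service. As a rough illustration of that idea only, the sketch below merges several event sequences into one deterministic transition table by sharing common prefixes. This is not the 5SGen scenario synthesis algorithm, whose details are not given in the slides; the function names and the "Browse" event label are hypothetical.

```python
def scenarios_to_fsm(scenarios):
    """Merge event sequences into a deterministic transition table.

    Each scenario is a list of event labels. State 0 is the shared initial
    state; sequences with a common prefix share the corresponding states.
    """
    transitions = {}  # (state, event_label) -> next_state
    next_state = 1
    for events in scenarios:
        state = 0
        for label in events:
            key = (state, label)
            if key not in transitions:
                transitions[key] = next_state
                next_state += 1
            state = transitions[key]
    return transitions


def accepts_prefix(transitions, events):
    """Return True if the event sequence can be followed from the initial state."""
    state = 0
    for label in events:
        state = transitions.get((state, label))
        if state is None:
            return False
    return True


fsm = scenarios_to_fsm([
    ["SearchCriteria", "Search", "Results"],
    ["SearchCriteria", "Browse"],  # hypothetical second scenario
])
print(fsm)
```

Because every (state, event) pair maps to at most one next state, the table is deterministic by construction.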
36
5SGen Proof of Concept: prototyping
CITIDEL, VIADUCT, NDLTD Union Catalog, BDBComp
38
XML-based DL Log Standard
Log analysis is a source of information on:
  How patrons really use DL services
  How systems behave while supporting user information-seeking activities
Used to:
  Evaluate and enhance services
  Guide allocation of resources
Common practice in the web setting
  Supported by web servers, proxy caches
DL logging can be more detailed.
39
DL Logging Features
Captures high-level user and system behaviors
Organized according to the 5S framework
Hierarchical organization (XML-based)
Centered on the notion of events
Records events related to initial user inputs and final system outputs
Helps to understand user interactions and the perceived value of responses
40
The XML Log Format
[Figure: element hierarchy of the XML log format. Elements shown: Log, Transaction, SessionId, MachineInfo, Statement, Timestamp, Event, SessionInfo, RegisterInfo, ErrorInfo, Action, Search, Browse, Store, SysInfo, Update, SearchBy, QueryString, Catalog, Collection, PresentationInfo, StatusInfo, Timeout.]
42
Describing Quality in Digital Libraries
What's a "good" digital library?
  Central concept: Quality!
Hypotheses of this work:
  Formal theory can help to define "what's a good digital library" by:
    New formalizations of quality indicators for DLs within our 5S framework
    Contextualizing these indicators/measures within the Information Life Cycle
43
Quality Dimensions
DL Concept | Dimensions of Quality
Digital object | Accessibility, Pertinence, Preservability, Relevance, Similarity, Significance, Timeliness
Metadata specification | Accuracy, Completeness, Conformance
Collection | Completeness, Impact Factor
Catalog | Completeness, Consistency
Repository | Completeness, Consistency
Services | Composability, Efficiency, Effectiveness, Extensibility, Reusability, Reliability
44
Digital Objects: Accessibility
A digital object is accessible by a DL actor or patron if it
1. exists in the DL collections
2. is retrievable from the repository
3. is not restricted from access by metadata on rights for an actor or actor’s society
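The three conditions can be read as a conjunction of simple checks. The sketch below is one possible encoding, under stated assumptions: collections are modeled as sets of handles, the repository as a mapping from handle to content, and rights metadata as a set of actors or societies denied access. The names DigitalObject, restricted_for, and is_accessible are hypothetical, not part of 5S.

```python
from dataclasses import dataclass, field


@dataclass
class DigitalObject:
    handle: str
    # Assumed encoding of rights metadata: actors or societies denied access.
    restricted_for: set = field(default_factory=set)


def is_accessible(do, actor, actor_societies, collections, repository):
    """Check the three accessibility conditions for one actor."""
    in_collection = any(do.handle in c for c in collections)   # condition 1
    retrievable = do.handle in repository                      # condition 2
    principals = {actor} | set(actor_societies)
    unrestricted = not (principals & do.restricted_for)        # condition 3
    return in_collection and retrievable and unrestricted


collections = [{"etd-1", "etd-2"}]
repository = {"etd-1": b"...", "etd-2": b"..."}
open_etd = DigitalObject("etd-1")
restricted_etd = DigitalObject("etd-2", restricted_for={"Patron"})

print(is_accessible(open_etd, "alice", ["Patron"], collections, repository))
print(is_accessible(restricted_etd, "alice", ["Patron"], collections, repository))
```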
45
Digital Objects: Pertinence
$Inf(do_i)$ = information carried by a digital object or any of its descriptions
$IN(ac_j)$ = information need of an actor
$Context_{jk}$ = an amalgam of societal factors which can impact the judgment of pertinence by $ac_j$ at time $k$. Factors include time, place, the actor's history of interaction, task, and factors implicit in the interaction and ambient environment.
46
Digital Objects: Pertinence
The pertinence of a digital object $do_i$ to a user $ac_j$ is an indicator function
$Pertinence(do_i, ac_j): Inf(do_i) \times IN(ac_j) \times Context_{jk} \to \{0, 1\}$, defined as:
  1, if $Inf(do_i)$ is judged by $ac_j$ to be informative with regard to $IN(ac_j)$ in context $Context_{jk}$;
  0, otherwise
47
Digital Objects: Relevance
$Relevance(do_i, q)$ =
  1, if $do_i$ is judged by an external judge to be relevant to $q$;
  0, otherwise
Relevance estimate: $Rel(do_i, q) = \frac{\vec{do_i} \cdot \vec{q}}{|\vec{do_i}| \, |\vec{q}|}$
Objective, public, social notion
  Established by a general consensus in the field, not a subjective, private judgment by an actor with an information need
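The relevance estimate has the form of the cosine of the angle between the document vector and the query vector. A minimal sketch follows, assuming both are sparse term-weight vectors stored as dictionaries; the weighting scheme and the function name rel are assumptions for illustration, since the slide does not specify them.

```python
import math


def rel(doc_vec, query_vec):
    """Cosine of the angle between two sparse term-weight vectors."""
    shared_terms = set(doc_vec) & set(query_vec)
    dot = sum(doc_vec[t] * query_vec[t] for t in shared_terms)
    doc_norm = math.sqrt(sum(w * w for w in doc_vec.values()))
    query_norm = math.sqrt(sum(w * w for w in query_vec.values()))
    if doc_norm == 0 or query_norm == 0:
        return 0.0  # an empty vector has no direction; treat as not relevant
    return dot / (doc_norm * query_norm)


# Hypothetical term weights for one document and one query.
doc = {"digital": 2.0, "library": 1.0, "quality": 1.0}
query = {"digital": 1.0, "library": 1.0}
print(rel(doc, query))
```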
48
Metadata Specifications and Metadata Format: Completeness
Refers to the degree to which values are present in the description, according to a metadata standard. As far as an individual property is concerned, only two situations are possible: either a value is assigned to the property in question, or not.
$Completeness(ms_x) = 1 - \frac{\text{no. of missing attributes in } ms_x}{\text{total attributes of the schema to which } ms_x \text{ conforms}}$
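As a worked illustration of the completeness formula, the sketch below counts the schema attributes that have no value in a record. The attribute names and the sample record are hypothetical, and treating an empty string or None as "missing" is an assumption.

```python
def completeness(record, schema_attributes):
    """1 - (missing attributes / total attributes of the schema)."""
    # Assumption: an absent key, None, or an empty string counts as missing.
    missing = sum(1 for name in schema_attributes if not record.get(name))
    return 1 - missing / len(schema_attributes)


# Hypothetical four-attribute schema and one record with no subject.
schema_attributes = ["title", "creator", "date", "subject"]
record = {"title": "A Thesis", "creator": "A. Student", "date": "2004", "subject": ""}
print(completeness(record, schema_attributes))  # 1 - 1/4
```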
49
Metadata Specifications and Metadata Format: Completeness
OCLC NDLTD Union Catalog
[Figure: bar chart of completeness (axis 0 to 1) per site: GWUD, LSU, VTETD, MIT, UBC, PHYSNET, VTINDIV, VANDERBILT, NCSU, USASK, PITT, HKU, HUMBOLT, OCLC, BGMYU, DRESDEN, VIENNA, GATECH, ETSU, USF, MUENCHEN, UTENN, CCSD, WATERLOO, NSYSU, LAVAL, UPSALLA, CALTECH, UCL, WagUniv.]
50
Metadata Specifications and Metadata Format: Conformance
An attribute $att_{xy}$ of a metadata specification $ms_x$ is cardinally conformant to a metadata format/standard if:
  it appears at least once, if $att_{xy}$ is marked as mandatory;
  its value is from the domain defined for $att_{xy}$;
  it does not appear more than once, if it is not marked as repeatable.
$Conformance(ms_x) = \frac{\sum_{att_{xy} \in ms_x} \text{degree of conformance of } att_{xy}}{\text{total attributes}}$
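The three conformance conditions and the averaging step can be sketched as follows. Two assumptions are made for illustration: the degree of conformance of an attribute is taken to be binary (1 if all three conditions hold, 0 otherwise), and "total attributes" is taken to be the number of attributes in the schema. The schema below is hypothetical and is not ETD-MS.

```python
def non_empty(value):
    return isinstance(value, str) and value.strip() != ""


def is_year(value):
    return isinstance(value, str) and len(value) == 4 and value.isdigit()


# Hypothetical schema: cardinality rules plus a domain check per attribute.
SCHEMA = {
    "title":   {"mandatory": True,  "repeatable": False, "domain": non_empty},
    "creator": {"mandatory": True,  "repeatable": True,  "domain": non_empty},
    "date":    {"mandatory": True,  "repeatable": False, "domain": is_year},
    "subject": {"mandatory": False, "repeatable": True,  "domain": non_empty},
}


def attribute_conforms(values, rule):
    """Apply the three conditions to the list of values of one attribute."""
    if rule["mandatory"] and not values:
        return False                                # must appear at least once
    if not rule["repeatable"] and len(values) > 1:
        return False                                # must not be repeated
    return all(rule["domain"](v) for v in values)   # values must be in the domain


def conformance(record, schema):
    """Average the (assumed binary) degree of conformance over the schema."""
    conforming = sum(
        1 for name, rule in schema.items()
        if attribute_conforms(record.get(name, []), rule)
    )
    return conforming / len(schema)


# The date attribute is repeated although it is not repeatable.
record = {"title": ["A Thesis"], "creator": ["X", "Y"], "date": ["2004", "2005"]}
print(conformance(record, SCHEMA))  # 3 of 4 attributes conform
```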
51
Metadata Specifications and Metadata Format: Conformance
Based on ETD-MS
[Figure: bar chart of conformance (axis 0.75 to 1) per site: GWUD, LSU, VTETD, MIT, UBC, PHYSNET, VTINDIV, VANDERBILT, NCSU, USASK, PITT, HKU, HUMBOLT, OCLC, BGMYU, DRESDEN, VIENNA, GATECH, ETSU, USF, MUENCHEN, UTENN, CCSD, WATERLOO, NSYSU, LAVAL, UPSALLA, CALTECH, UCL, WagUniv.]
52
Services: Efficiency/Effectiveness
Effectiveness
  Very common measures: Precision, Recall, F1, 10-precision, R-Precision
  Other services may have different measures: e.g., Recommending, etc.
Efficiency
  Let $t(e)$ be the time of an event $e$
  Let $e_{ix}$ and $e_{fx}$ be the initial and the final events of service $se_x$
  For service $se_x$, efficiency is defined as:
  $Efficiency(se_x) = t(e_{fx}) - t(e_{ix})$
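Both kinds of measure are straightforward to compute once the events and judgments are available. The sketch below uses the standard definitions of precision, recall, and F1 over sets of document identifiers, and computes efficiency as the time difference between the final and initial events. The identifiers, event names, and timestamps are hypothetical.

```python
def precision_recall_f1(retrieved, relevant):
    """Standard set-based effectiveness measures."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


def efficiency(event_times, initial_event, final_event):
    """t(final event) - t(initial event); lower means a faster service."""
    return event_times[final_event] - event_times[initial_event]


# Hypothetical search: four documents returned, three judged relevant overall.
p, r, f1 = precision_recall_f1(["d1", "d2", "d3", "d4"], ["d1", "d2", "d5"])
print(p, r, f1)

# Hypothetical timestamps (seconds) for the first and last event of the service.
times = {"e_i": 10.0, "e_f": 10.25}
print(efficiency(times, "e_i", "e_f"))
```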
53
Services: Extensibility and Reusability
A service Y reuses a service X if the behavior of Y incorporates the behavior of X.
A service Y extends a service X if it subsumes the behavior of X and potentially includes additional subflows of events.
54
Services: Extensibility and Reusability (2)
Macro-Reusability(Serv) = no. of reused services / total number of services
Micro-Reusability(Serv) = number of lines of code of managers that implement (run) reused services / total lines of code
55
Services: Extensibility and Reusability
Service | Component Based | LOC for implementing service | LOC reused from component | Total LOC
Searching – Back-end | Yes | - | 1650 | 1650
Search Wrapping | No | 100 | - | 100
Recommending | Yes | - | 700 | 700
Recommend Wrapping | No | 200 | - | 200
Annotating – Back-end | Yes | 50 | 600 | 600
Annotate Wrapping | No | 50 | - | 50
Union Catalog | Yes | - | 680 | 680
User Interface Service | No | 1800 | - | 1600
Browsing | No | 1390 | - | 1390
Comparing (objects) | No | 650 | - | 650
Marking Items | No | 550 | - | 550
Items of Interest | No | 480 | - | 480
Recent Searches/Discussions | No | 230 | - | 230
Collections Description | No | 250 | - | 250
User Management | No | 600 | - | 600
Framework Code | No | 2000 | - | 2000
Total | | 8280 | 3630 | 11910
Macro-Reusability = 4/16 = 0.25
Micro-Reusability = 3630 / 11910 = 0.304
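The two reusability figures can be re-derived from the table. The sketch below records, for each service, whether it is component based and how many lines were reused, and takes the total of 11910 lines directly from the slide's "Total" row rather than recomputing it. The variable names are illustrative.

```python
# (service, component_based, LOC reused from component), as listed in the table.
SERVICES = [
    ("Searching - Back-end", True, 1650),
    ("Search Wrapping", False, 0),
    ("Recommending", True, 700),
    ("Recommend Wrapping", False, 0),
    ("Annotating - Back-end", True, 600),
    ("Annotate Wrapping", False, 0),
    ("Union Catalog", True, 680),
    ("User Interface Service", False, 0),
    ("Browsing", False, 0),
    ("Comparing (objects)", False, 0),
    ("Marking Items", False, 0),
    ("Items of Interest", False, 0),
    ("Recent Searches/Discussions", False, 0),
    ("Collections Description", False, 0),
    ("User Management", False, 0),
    ("Framework Code", False, 0),
]
TOTAL_LOC = 11910  # taken from the "Total" row of the slide

reused_services = sum(1 for _, component_based, _ in SERVICES if component_based)
reused_loc = sum(loc for _, component_based, loc in SERVICES if component_based)

macro = reused_services / len(SERVICES)  # 4 / 16
micro = reused_loc / TOTAL_LOC           # 3630 / 11910

print(f"Macro-Reusability = {macro:.2f}")
print(f"Micro-Reusability = {micro:.4f}")
```

The micro value evaluates to roughly 0.3048, which the slide reports truncated as 0.304.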
56
Quality and the Information Life Cycle
[Figure: the information life cycle annotated with quality indicators. Life-cycle labels: Creation, Distribution, Seeking, Utilization; Active, Semi-Active, Inactive; Authoring, Modifying, Describing, Organizing, Indexing, Storing, Archiving, Networking, Accessing, Filtering, Searching, Browsing, Recommending, Retention, Mining, Discard. Quality indicators placed around the cycle: Similarity, Pertinence, Accuracy, Completeness, Conformance, Relevance, Timeliness, Accessibility, Preservability, Significance.]
57
Quality Model: Evaluation
Focus groups
  3 librarians
Major points
  Focus on DLs, not traditional libraries
  Some indicators may have more theoretical than practical use in some contexts
  Liked minimalist approach
  Interesting and potentially useful mainly for education and evaluation
59
Conclusions
We have answered the almost 40-year-old challenge of Licklider to build a unified CS/LIS theory by:
  Proposing and formalizing the first comprehensive formal framework for digital libraries
We have shown how to move from theory to practice by:
  Applying the framework to the problems of modeling, generating, and evaluating (by logging and assessing the quality of) digital libraries
  Materializing these applications into languages, tools, formats, etc.
  Explaining and evaluating these applications (usability studies, focus groups, prototyping, etc.)
60
Future Work
Theory
  Apply to formally describe other systems
  Complete formal definitions of all services with further events
  Load axioms in knowledge base to automatically assess quality of models (correctness, etc.)
Applications/Tools
  Language
    Make different versions uniform
    Extend with METS, less complex scenarios, society models
    New metamodels
      Domain/application oriented (e.g., archaeology, education)
      For traditional libraries
61
Future Work
Applications/Tools
  Visualization
    Integration with other tools through Wizard
    New visualizations
    Applying as educational tool
  Generation
    Use of Web services
    Incorporation of native XML repositories
    Improvement of scenario algorithms
  Logging
    Promote use
    Consider privacy issues
    New actions
    Deal with scalability issues
62
Future Work
Quality
  Development of more usage-oriented indicators
    Current indicators are mostly system-oriented
    Focus on log format and evaluation
  Development of Quality ToolKit (5SQual) for DL managers with the following features:
    Mapping tool to map local log format to standard XML Log format
    Components to implement all indicators
    Visualization of data and indicators
    Broken into several logical pieces to be used in the different phases of the Information Life Cycle
Others, e.g., personalization
  Create theories, tools, languages, methods for personalization based on 5S
63
Questions/Discussion?
Thanks!