Exploring Digital Libraries: Integrating Browsing, Searching, and Visualization

Post on 08-Jan-2016

DESCRIPTION

Exploring Digital Libraries: Integrating Browsing, Searching, and Visualization. Paper by: Rao Shen, Naga Srinivas Vemuri, Weiguo Fan, Ricardo da S. Torres, Edward A. Fox. Slides by fox@vt.edu (http://fox.cs.vt.edu), with some modifications by lillian.cassel@villanova.edu.

TRANSCRIPT

Exploring Digital Libraries: Integrating Browsing, Searching, and Visualization
Paper by: Rao Shen, Naga Srinivas Vemuri, Weiguo Fan, Ricardo da S. Torres, Edward A. Fox
Slides by fox@vt.edu http://fox.cs.vt.edu with some modifications by lillian.cassel@villanova.edu
Original version presented at JCDL 2006

Acknowledgements (Selected)
Sponsors: NSF grant ITR-0325579, ASOR, CWRU, ETANA, Vanderbilt U., Virginia Tech
Faculty/Staff: Lillian Cassel, Debra Dudley, Manuel Perez
VT (Former) Students: Marcos A. Gonçalves, Doug Gorton, Aaron Krowne, Ming Luo

Introduction
What's exploring?
Searching, browsing, investigating, studying, or analyzing for purposes of discovery; pursuing truth or facts about something
Are browsing and searching duals, or can they be converted to each other when certain conditions are met?
Can we generalize these DL exploring services within a formal DL framework?
Can the formal generalization guide development of exploring services for domain-focused DLs?

Related Work on Integrating Services in DLs
[Concept map: systems integrating searching and browsing, grouped by decade (1980s, 1990s, 2000s). Systems named: I3R, RABBIT, CODER, DataWeb, PESTO, SenseMaker, MIX, ScentTrails, BBQ, ODL, MARIAN.]

Generalize DL exploring services such as browsing, searching, clustering, and visualization
Exploration Space (Espa) is a Space
Espa = (Q, Contents, OP_Set)
Q is a set of conceptual representations for user information needs
Contents: associated with collection C
OP_Set is a set of operations on Q and Contents
{OPviz, OPclu, OPs, OPb} ⊆ OP_Set

Exploring Services Formalization
Sample OP_Set: {OPviz, OPclu, OPs, OPb}
OPviz: maps a set of digital objects to a visual mark
OPclu: gets similarity of a pair of subsets of collection and their associated contents
OPs: associates a query with a digital object and its contents
OPb: associates a traverse link with contents of the target node (i.e., follows a hypertext link)

Exploring Services Formalization (Cont.)
An Exploring Service (Eser) is a set of scenarios over an exploration space (Espa).
Eser = (sc1, sc2, ..., sci, ..., scn), where sci is a sequence of events
Each event is associated with one or more of the operations in Espa

Exploring Services Formalization (Cont.)
Events ei and their associated operations in OP_Set:
Searching: OPs
Browsing: OPb
Clustering: OPclu
Visualization: OPviz
[State diagram]

Reading the paper

Overview first
What is this paper about?
What is the main point or are the main points?
What is the structure of the paper?
Is this what you would expect to see in any well-organized conference paper?
Are there sections that are specific to this project?

Related work
What is the oldest work cited?
What is the most recent work cited?
How large a body of work contributed to this project?
How closely related are the works that this paper cites?
Is there a good reason for each reference?
How many of the cited works are by the same author (or some of the same authors) or from the same research laboratory?

Definitions and notations
What specific terms are defined?
Are these general terms that have particular meanings in this paper or are they new terms with no known meaning?
Are there general terms defined, or notations that provide shorthand for use in the later discussions?
List the terms. Discuss the meaning of each with a classmate. Is anything unclear?
Try to answer each other's questions or formulate a question for the class to address together.
Note: reference #10 is one we read earlier in the semester.

Definition 2: A structure is a tuple (G, L, F), where G = (V, E) is a directed graph with vertex set V and edge set E, L is a set of label values, and F is a labeling function F: (V ∪ E) → L.

Definition 16: A digital object is a tuple do = (h, SM, ST, StructuredStreams) where
h ∈ H, where H is a set of universally unique handles (labels);
SM = {sm1, sm2, ..., smn} is a set of streams;
ST = {st1, st2, ..., stm} is a set of structural metadata specifications;
StructuredStreams = {stsm1, stsm2, ..., stsmp} is a set of StructuredStream functions defined from the streams in the SM set (the second component) of the digital object and from the structures in the ST set (the third component).

The operations
Exactly what operations are defined and are of interest in this paper?
What relationships exist between and among the operations defined?

New understanding
The paper states: "Our theory-based approach to describing DL exploring services allows us to understand browsing and searching in a new way."
What are all the exploring services discussed in the paper?
How are these explored in the context of ETANA-DL (which we initially looked at early in the semester)?

The Author's presentation
The paper as presented by the author
Slides provided by Dr. Edward A. Fox

Exploring Digital Libraries: Integrating Browsing, Searching, and Visualization
Excerpt from JCDL 2006, Chapel Hill, NC, June 12, 2006
Rao Shen, Naga Srinivas Vemuri, Weiguo Fan, Ricardo da S. Torres, and Edward A. Fox
fox@vt.edu http://fox.cs.vt.edu
Some adaptations by lillian.cassel@villanova.edu
For CSC 9010 - Special Topics - Digital Libraries and other Web-based information presentation

Exploring Services Formalization (Cont.)
Theory-based approach to describing DL exploring services guides us to design and implement exploring services for ETANA-DL
Multi-dimensional browsing
Searching and browsing integration
Visualization
Usability evaluation

An Integrated DL
ETANA brings together several separate and different collections of materials into an integrated DL.
Virtual Nimrin (http://www.case.edu/affil/nimrin/menu/nimrin.htm)
Madaba Plains (http://www.madabaplains.org/home.html)
Lahav Website (http://www.cobb.msstate.edu/dig/lahav/)
Megiddo (http://www.tau.ac.il/humanities/archaeology/megiddo/index.html)
And others

ETANA-DL approach
Applying and extending Digital Library (DL) techniques to solve key problems: making primary data available, data preservation, and interoperability
Modeling archaeological information systems using 5S to better understand the domain and design the system and the supporting services
Rapidly prototyping DLs that handle heterogeneous archaeological data using componentized frameworks:
eliciting requirements
refining metamodel and union schema
modeling sites
mapping
harvesting
providing useful services

ETANA-DL Architecture
[Figure: ETANA-DL architecture. Labels: DigBase and DigKit; sites Lahav, Nimrin, Umayri, Hisban, Megiddo, Jalul, New Sites; Database Wrappers; ETANA-DL Union Catalog; User Interface; services Search, Browse, Recommend, Note, Personalize, Review, Visualizations, Archaeology Specific; Work in progress.]

ETANA-DL Website
http://digbase.etana.org:8080/etana/servlet/Start

Exploring Service in ETANA-DL
Multi-dimensional Browsing
Searching and Browsing Integration

The Important Point
These are independent digital libraries or databases
The idea is to give an appearance of a single, integrated site with access to all the information in all the sources.
Harvesting: OAI-PMH
Then how to make it all appear like one collection of materials?

DL Integration
What is DL Integration
Hide distribution
Hide heterogeneity
Enable autonomy of individual component
Why Integration
Island-DLs: inability to seamlessly and transparently access knowledge across DLs
Use various autonomous DLs in concert
3 new sites
2 new types of artifacts

EtanaViz: Initial Interface
EtanaViz: Bone records from Nimrin
EtanaViz: Total Number of Animal Bones across Nimrin Culture Phases
EtanaViz: Percentages of Animal Bones across Nimrin Culture Phases

Impression about ETANA-DL services
Browse: 4.0
Search: 4.0
EtanaViz: 4.0
Save navigation path (SNP): 4.5
Search within browsing context (SWBC): 4.5

To answer these questions, we again turn to 5S.

CODER:
A Retrieval and Hypertext System using SGML and a Lexicon
A Testbed for Artificial Intelligence Methods in Information Retrieval
Working Toward a Comprehensive Testbed for AI in IR
Proposed and supervised development of the COmposite Document Expert/extended/effective Retrieval System, 1985-92, which was used as a testbed for the study of artificial intelligence concepts in the field of information retrieval, and applied to electronic mail digests, navy messages, and literature (and thesaurus data) on cardiology.
Expert-Based Retrieval
A Knowledge-Based System for Composite Document Analysis and Retrieval

Marian: Proposed and supervised development since 1991 of the MARIAN system, an alternative to searching on the VTLS system for library catalog data. The campus production service version of this runs on a collection of PC and NeXTstep machines in the Computing Center and is designed to support scores of simultaneous users. It builds upon data and code from the CED lexicon and CODER (both developed here in the 1980s). With support from NLM, and new support from NSF, it is being revamped in Java to become a key part of our research on digital libraries. URL: http://www.dlib.vt.edu/products/marian.html
MARIAN is an indexing, search, and retrieval system optimized for digital libraries.

ODL: Open Digital Libraries (ODLs) are systems built as networks of extended Open Archives.

RB++ categorizes the collection offline and uses a uniform category structure to present overviews of the collection and the retrieval results. It provides visualized category overviews of an information space and allows dynamic filtering and exploration of the result set by tightly coupling the browsing and searching functions.

Cat-a-Cone used ConeTree to display the category labels of the documents retrieved, while the retrieved documents are organized as pages in a WebBook.

Hieraxes, in combination with a grid display, offer a simple approach to searching result sets by using categorical and hierarchical axes. Users can see an overview by color-coded dots or bar charts arranged in a grid and organized by familiar labeled categories. They can probe further by zooming in on desired categories or switching to another hierarchical variable.

Grouper was a dynamic clustering interface to web search results. It introduced the Suffix Tree Clustering (STC) algorithm.

Kartoo is a web interface that organizes search results retrieved from relevant web search engines by topic and displays them on a 2-dimensional map. Theoretically, Kartoo provides a node-link graph.

Flamenco investigates how to build an intuitive interface for exploration and discovery within information collections using Hierarchical Faceted Categories (HFC).

Do not read, explain, not the letter
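
As a rough illustration of the formalization in the slides, where an exploration space is Espa = (Q, Contents, OP_Set) and an exploring service is a set of scenarios, each a sequence of events tied to operations, the following is a minimal Python sketch. It is not the authors' implementation. The class and function names, the substring matching used for OPs, and the toy records are all assumptions made for illustration, and only OPs (searching) and OPb (browsing) are modeled.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


@dataclass(frozen=True)
class DigitalObject:
    """Toy stand-in for a digital object: a handle plus its contents."""
    handle: str
    contents: str
    links: Tuple[str, ...] = ()  # handles reachable by a hypertext link


@dataclass
class ExplorationSpace:
    """Espa = (Q, Contents, OP_Set), in a deliberately simplified form."""
    queries: List[str]                  # Q: representations of information needs
    contents: Dict[str, DigitalObject]  # Contents, keyed by handle
    op_set: Dict[str, Callable]         # OP_Set, keyed by operation name


def op_s(space: ExplorationSpace, query: str) -> List[str]:
    """OPs (assumed semantics): handles of objects whose contents contain the query."""
    q = query.lower()
    return [h for h, do in space.contents.items() if q in do.contents.lower()]


def op_b(space: ExplorationSpace, handle: str) -> str:
    """OPb (assumed semantics): follow a link and return the target's contents."""
    return space.contents[handle].contents


Event = Tuple[str, str]  # (operation name, argument)


def run_scenario(space: ExplorationSpace, scenario: List[Event]) -> list:
    """A scenario is a sequence of events; each event invokes one operation."""
    return [space.op_set[op](space, arg) for op, arg in scenario]


def build_demo_space() -> ExplorationSpace:
    # Hypothetical records, invented for this example only.
    objects = {
        "do:1": DigitalObject("do:1", "Animal bone from Nimrin", ("do:2",)),
        "do:2": DigitalObject("do:2", "Pottery sherd from Megiddo"),
    }
    return ExplorationSpace(
        queries=["bone"],
        contents=objects,
        op_set={"OPs": op_s, "OPb": op_b},
    )


if __name__ == "__main__":
    space = build_demo_space()
    # One scenario: a search event followed by a browse event.
    print(run_scenario(space, [("OPs", "bone"), ("OPb", "do:2")]))
```

Keeping operations in a dictionary keyed by name mirrors the idea that searching and browsing are both members of one OP_Set, so a scenario can mix them freely within a single sequence of events.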
