Sept 20-21, 2001 R. Scott Cost - CADIP, UMBC 1
CARROT II
Collaborative Agent-based Routing and Retrieval of Text, Version 2
CADIP Fall Research Symposium
Overview
A distributed, agent-based system for large-scale, high-bandwidth information retrieval and visualization.
Carrot I, implemented ~1997, demonstrated the distribution of queries to various backend systems through a single broker, using Telltale, with TKQML as a communication mechanism.
Outline
Project Review
  Goals
  Overview
  Issues
  Architecture
Progress Report
C2 Project Goals
Build a powerful, high-bandwidth distributed IR system
Create a testbed for research in a variety of IR issues
Foster new and ongoing IR research at UMBC
Basic C2 Approach
A client submits a query to some agent in a distributed C2 system.
That agent uses metadata about its collection and the collections around it to decide whether to handle the query or forward it to another agent.
Results are assembled and returned to the client.
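The handle-or-forward decision can be sketched as follows. This is only an illustration, not the C2 implementation (C2 agents are Java-based and use MG): the metadata form (per-agent term frequencies), the scoring rule, and the names `score`, `route`, `agent_a` and `agent_b` are all assumptions made for the example.

```python
from collections import Counter


def score(query_terms, term_counts):
    """Score a collection by how often the query terms occur in it."""
    return sum(term_counts.get(term, 0) for term in query_terms)


def route(query, local_name, metadata):
    """Return the name of the agent that should handle the query.

    metadata maps agent name -> Counter of term frequencies (an assumed
    metadata form). Ties go to the local agent, so a query is only
    forwarded when another collection looks strictly better.
    """
    terms = query.lower().split()
    return max(
        metadata,
        key=lambda name: (score(terms, metadata[name]), name == local_name),
    )


# Hypothetical metadata held by agent_a about itself and its neighbour.
metadata = {
    "agent_a": Counter({"agent": 5, "routing": 3}),
    "agent_b": Counter({"retrieval": 4, "text": 6}),
}

print(route("text retrieval", "agent_a", metadata))  # forwarded to agent_b
print(route("agent routing", "agent_a", metadata))   # handled by agent_a
```

Keeping ties local is a design choice of this sketch: forwarding costs a message, so it is only done when the metadata suggests a clear benefit.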
How does it work?
Single IR engine is replicated across multiple machines
Each engine gets a portion of the total document collection
Engines exchange metadata describing their collections
Engines receive queries, and either answer or forward them as appropriate
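A minimal sketch of the first three steps, assuming round-robin partitioning and term-frequency summaries as the metadata. Both choices, and the names `partition`, `describe` and `engine_0`/`engine_1`, are illustrative assumptions; the slides do not specify how C2 splits the collection or what its metadata looks like.

```python
from collections import Counter


def partition(docs, n_engines):
    """Deal documents out round-robin so each engine gets a portion."""
    shards = [[] for _ in range(n_engines)]
    for i, doc in enumerate(docs):
        shards[i % n_engines].append(doc)
    return shards


def describe(shard):
    """Summarise a shard as term frequencies (an assumed metadata form)."""
    counts = Counter()
    for doc in shard:
        counts.update(doc.lower().split())
    return counts


docs = [
    "agents route queries",
    "engines index text",
    "agents exchange metadata",
    "text retrieval engines",
]
shards = partition(docs, 2)

# After the exchange, every engine holds the summary of every other engine.
exchanged = {f"engine_{i}": describe(shard) for i, shard in enumerate(shards)}
print(exchanged["engine_0"]["agents"], exchanged["engine_1"]["text"])
```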
Research Issues/Questions
Heterogeneity (information sources)
Metadata (form, order, comparison)
Query Management (routing, standing)
Results Fusion
Corpus Management
Integration with Parallel Telltale, RAMA
  Index-based parallelism
  Storage-based parallelism
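Results fusion appears here only as an open research issue, and the slides do not say how C2 combines result lists. As one possible illustration, the sketch below uses a well-known baseline (min-max score normalisation followed by summing, often called CombSUM), not the project's method. The function names and sample scores are invented for the example.

```python
def normalise(results):
    """Min-max scale the scores of one result list to the range [0, 1]."""
    scores = [s for _, s in results]
    lo, hi = min(scores), max(scores)
    if hi == lo:
        return [(doc, 1.0) for doc, _ in results]
    return [(doc, (s - lo) / (hi - lo)) for doc, s in results]


def fuse(result_lists):
    """Sum normalised scores per document and rank by the total."""
    merged = {}
    for results in result_lists:
        if not results:
            continue
        for doc, s in normalise(results):
            merged[doc] = merged.get(doc, 0.0) + s
    # Highest total first; document id breaks ties deterministically.
    return sorted(merged.items(), key=lambda item: (-item[1], item[0]))


# Two engines that score on different scales (hypothetical values).
engine_a = [("d1", 12.0), ("d2", 6.0), ("d3", 0.0)]
engine_b = [("d2", 0.75), ("d4", 0.5), ("d3", 0.25)]

for doc, total in fuse([engine_a, engine_b]):
    print(doc, total)
```

Normalising first matters because engines may report scores on incomparable scales; without it, the engine with the larger raw scores would dominate the merged ranking.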
Flexible System
Form of system can change dramatically, based on:
  How system is distributed
  How metadata is distributed
  How queries are handled
  How fusion is handled
  Whether or not system adapts dynamically to query performance and/or load…
Some example scenarios
Two peer agents, each managing a corpus (IR System is MG).
  Each agent advertises metadata to the other.
  Queries directed at either, routed to appropriate agent.
Based on TREC WT10g Collection
  ~1,700,000 documents from the WWW
  N agents, one for each of the ~12,000 servers represented in collection
  Topology of system inferred from link topology in collection of web pages
An agent starts and runs a C2 system for a specific purpose.
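For the WT10g scenario, inferring the agent topology from page links might look like the sketch below: one agent per server, with a connection wherever a page on one server links to a page on another. The function name `agent_topology`, the directed-edge representation and the sample URLs are assumptions for illustration, not details taken from the project.

```python
from urllib.parse import urlparse


def agent_topology(links):
    """Map each server to the set of other servers its pages link to.

    links is an iterable of (source_url, target_url) pairs. Every server
    that appears gets an entry, so it gets an agent even without out-links.
    """
    neighbours = {}
    for src, dst in links:
        a, b = urlparse(src).netloc, urlparse(dst).netloc
        neighbours.setdefault(a, set())
        neighbours.setdefault(b, set())
        if a != b:  # links within one server do not connect agents
            neighbours[a].add(b)
    return neighbours


# Hypothetical page-level links.
links = [
    ("http://a.example/x.html", "http://b.example/y.html"),
    ("http://a.example/x.html", "http://a.example/z.html"),
    ("http://b.example/y.html", "http://c.example/"),
]
topology = agent_topology(links)
print(sorted(topology))
```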
C2 Architecture
C2 Agents
  Form the core of the C2 system
C2 Infrastructure Elements
  Provide effective communication and control support
C2 Support Elements
  Control and provide access to system
C2 Agent
Java-based software agent
Communicates using the Jackal system
Runs a local corpus and metadata engine (currently MG)
Basic Node Architecture
(diagram; labels: Agent, Jackal, Other nodes, IR Engine Wrapper, Decision Interface)
IR System: Manages local corpus and metadata
C2 Infrastructure
Provides for efficient control of system
Hierarchical
Several Types of Agent:
  Master
  Node
  Platform
  Cluster
Infrastructure
Master Agent
  Node Agent: Controls one physical node
    Platform: Controls one JVM
      Cluster Agent: Controls one Jackal instance
        C2 Agent
        C2 Agent
      Next Cluster…
    Next Platform…
  Next Node…
Infrastructure…
Infrastructure hierarchy allows for efficient propagation of control information
Communication and coordination is localized to reduce overhead
Shape of tree can be modified to change performance
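Propagation of a control command down such a hierarchy can be sketched as a simple tree walk in which each agent talks only to its direct children. The class `ControlAgent`, its methods, the agent names and the `"shutdown"` command are hypothetical; the real system communicates through Jackal and is not shown here.

```python
class ControlAgent:
    """One agent in the control tree (master, node, platform, cluster, ...)."""

    def __init__(self, name):
        self.name = name
        self.children = []
        self.received = []

    def add(self, child):
        """Attach a child agent and return it, so calls can be chained."""
        self.children.append(child)
        return child

    def propagate(self, command):
        """Record the command, then pass it on to direct children only.

        Returns the number of messages sent within this subtree.
        """
        self.received.append(command)
        sent = 0
        for child in self.children:
            sent += 1 + child.propagate(command)
        return sent


# Master -> Node -> Platform -> Cluster -> two C2 agents.
master = ControlAgent("master")
node = master.add(ControlAgent("node-1"))
platform = node.add(ControlAgent("platform-1"))
cluster = platform.add(ControlAgent("cluster-1"))
agents = [cluster.add(ControlAgent(f"c2-agent-{i}")) for i in (1, 2)]

messages = master.propagate("shutdown")
print(messages)
```

In this sketch the master sends a single message regardless of how many C2 agents exist; the fan-out happens lower in the tree, which is one way to read the point about localized communication.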
C2 Support
Master
  Controls the C2 system
ANS
  White pages communications support
Collection Manager
  Controls distribution of documents/collections to C2 Agents
Logger Agent
  Logs system operational information
C2 Tools
Query Agent
  Supports the controlled presentation, collection and analysis of large batches of queries
C2 System Visualizer
  Presents a graphical view of the flow of queries through the system
C2 Tools: Visualizer
(screen shot)
For More Information …
For more details on the goals and design of the project, individuals are referred to documents on the Project site: http://acm.org/~cost/carrot2/info.htm
3/6/12 Plan (From 9/2000)
3: Clear design, working prototype.
6: Fully operational system, testing on real data.
12: Publication-ready results for one or more research questions. Tentative target of CIKM.
50-75% complete: System still in test with scalability issues, design publications in press.
3/6/12 Plan (From 9/2001)
3: Exercise system and prepare initial results for publication.
6: Expand system.
12: To be determined.
External Publication Plans
WWW 2002
Autonomous Agents 2002
SIGIR 2002
Academic Milestones
Monitoring and Control of a Distributed IR System
  M.S. Thesis, Srikanth Kallurkar (Fall ’01)
Integrating C2 as an Information Source for ITTALKS
  M.S. Project, Yogesh Nagappa (Fall ’01)
Integrating Telltale into the C2 System
  691 Project, Jonathan Kessler and Matt Siegel (Fall ’01)
Visualization of a Distributed IR System
  691 Project, Tom Laufert (Fall ’01)
Data Fusion in C2 Agents
  691 Project, Mithun Sheshagiri (Fall ’01)
Query Caching in the C2 System
  M.S. Thesis, Hemali Majithia (Spring ’02)
A User-friendly Interface to the C2 System
  Jacquelyn Nicole Winston, High School Intern (Spring ’01)