    An Effective Search Method for Distributed Information

    Systems Using a Self-Organizing Information Retrieval Network

    Kouichi Abe and Toshihiro Taketa

    Department of Electrical and Information Engineering, Faculty of Engineering,

    Yamagata University, Yonezawa, Japan 992-8510

    Hiroshi Nunokawa

    Faculty of Software and Information Science, Iwate Prefectural University, Morioka, Japan 020-0173

    Norio Shiratori

    Research Institute of Electrical Communication, Tohoku University, Sendai, Japan 980-8577


    In this paper the authors propose a search system

    composed of a self-organizing information retrieval net-

    work as a method to effectively search for information

    managed in a distributed fashion. An information retrieval

    network is a virtual network consisting of World Wide Web

    (WWW) server hosts that have search functions. A search

    system making use of the features of the authors' method

    can provide the following benefits: (1) information searches

    on a newly added WWW server host; (2) information

    searches that avoid a problem host; (3) information searches

    from any host. In this paper the authors demonstrate the

    validity of their idea by creating an information search

    system based on their idea and performing experiments on

    a university LAN. © 2000 Scripta Technica, Electron

    Comm Jpn Pt 1, 84(3): 29–37, 2001

    Key words: WWW; information retrieval network;

    self-organization; agent; information search.

    1. Introduction

    Currently a significant amount of research is being

    performed on information searches for information stored

    in a distributed fashion on the World Wide Web (WWW).

    Retrieval methods that use conventional information re-

    trieval networks can only designate static paths to the host.

    As a result, such methods are deficient in that they cannot

    respond to information searches on a newly added host or

    to problems on a network. Therefore, in this paper the

    authors propose a search method composed of a self-organ-

    izing information retrieval network in order to resolve the

    above problems. Here, an information retrieval network

    shall refer to a virtual network composed of various WWW

    servers and hosts.* A search system for WWW information

    created with this search method as its foundation will be

    able to offer the following benefits as its self-organizing

    features are used.

    Electronics and Communications in Japan, Part 1, Vol. 84, No. 3, 2001.

    Translated from Denshi Joho Tsushin Gakkai Ronbunshi, Vol. J82-B, No. 5, May 1999, pp. 809–817.

    *A host running a WWW server.


    (1) Searching for information newly added to a

    WWW server host

    A newly created WWW server host will automatically

    be included in the information retrieval network

    by virtue of the composition of the self-organizing

    information retrieval network, and the information

    on that WWW server host can then be searched.


    (2) Searching for information while avoiding

    hosts with problems

    On a self-organizing information retrieval network,

    each WWW server host controls its own path information.

    As a result, even when a problem (e.g., a host failure)

    arises on a WWW server host in the system, an effective

    information retrieval path can be determined by using

    estimates obtained from the search results (referred

    to as the expected search value), and the information

    search can be continued.

    (3) Searching for information from any host

    In a self-organizing information retrieval net-

    work, each WWW server host manages by itself only

    the WWW information in its own host in order to take

    advantage of the self-organizing features. A database

    is then created based on this information, and search

    services are provided for the WWW information. Con-

    sequently, the user can search for information on any

    WWW server host on the system.
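
    Benefit (2) can be illustrated with a minimal sketch of how a next host might be chosen. The paper does not define the expected search value at this point, so the sketch assumes it is simply a numeric score per host, with higher meaning more promising. The function name `choose_next_host`, the host names, and the scores are all hypothetical.

```python
from typing import Dict, Optional, Set


def choose_next_host(
    expected_value: Dict[str, float],
    visited: Set[str],
    failed: Set[str],
) -> Optional[str]:
    """Pick the unvisited, non-failed host with the highest expected search value.

    Assumption: the expected search value is a plain float per host.
    Returns None when no candidate host remains.
    """
    candidates = {
        host: value
        for host, value in expected_value.items()
        if host not in visited and host not in failed
    }
    if not candidates:
        return None
    return max(candidates, key=candidates.get)


# Hypothetical path information held locally by one WWW server host.
host_list = {"hostA": 0.9, "hostB": 0.6, "hostC": 0.3}

# With hostA marked as failed, the search continues via hostB.
next_host = choose_next_host(host_list, visited=set(), failed={"hostA"})
```

    Because each host keeps its own scores, a failed host is skipped locally and the search continues along another path without any central coordinator.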

    In this paper, the authors set up the Agent-based

    Total Resource Access System (ATRAS), an information

    retrieval system designed on the basis of the search

    method composed of a self-organizing information

    retrieval network, in a university Local Area Network

    (LAN). They perform experiments to show the advantages

    of a search system for WWW information that uses

    this method, and demonstrate the validity of this

    method.

    In Section 2 the basic technology for the authors'

    method, including agents, self-management of distrib-

    uted information, and migration searches, is de-

    scribed, and the information retrieval mechanism

    using a self-organizing information retrieval system

    and the authors' method is explained. In Section 3 an

    ATRAS information retrieval system designed to

    prove the validity of the search method consisting of

    a self-organizing information retrieval system is de-

    scribed. In Section 4 information search experiments

    using an ATRAS prototype system are performed in

    order to demonstrate the validity of the proposed

    method, and the results are described. In Section 5 the

    authors consider the validity of the search system for

    WWW information introduced by this method based

    on the ATRAS experimental results. Section 6 offers a

    conclusion and identifies future topics of study.

    2. Self-Organization of an Information

    Retrieval Network

    In this section the authors first explain agents, self-

    management of distributed information, and migration

    searches, the basic technology necessary to effectively use

    the self-organizing information retrieval network. Next the

    authors describe a self-organizing information retrieval net-

    work. Finally, the authors explain the information retrieval

    mechanism using their method.

    2.1. Agents

    Considerable research is being done on information

    searches using agents. However, the definition of an agent

    varies with each different research project [3–10]. In Ref.

    11, an agent is described as a computer system based in

    hardware or software that is autonomous, social, adaptable,

    and self-motivated. In Ref. 12 Internet agents are explained,

    with autonomous, permanent, individual, cooperative,

    adaptive, and dynamic mentioned as characteristics of an agent.


    In this paper, an agent shall be defined as a program

    (or process) that is autonomous and social. Here, autono-

    mous means that the agent determines its next action in

    accordance with its own decisions. In other words, it oper-

    ates independently of the user. Social means that the agents

    can exchange data among themselves based on the particu-

    lar communications protocols each agent uses (or via an

    agent communications language). In addition, agents may

    be equipped with features other than autonomous or social

    behavior (for instance, dynamism).

    In the authors' method five types of agents are used,

    as shown in Fig. 1. Each agent is explained below.

    Fig. 1. Agents.

    (i) InfoManager

    There is one InfoManager on each WWW server host.

    It manages the WWW information on the host and provides

    search services for WWW information (refer to Section 2.2).

    It creates an information search catalog (WWW information

    database) from existing WWW search systems and, in the

    same fashion, from HyperText Markup Language (HTML)

    documents, and then manages it. The information

    catalog (WWW Database) is created and managed for each

    system page and user page.
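
    As a rough illustration of the kind of catalog InfoManager might keep, the sketch below builds a keyword index from HTML documents. The paper does not describe the catalog format, so the inverted-index layout, the names `build_catalog` and `search`, the `limit` argument, and the sample pages are all assumptions made for this example.

```python
import re
from collections import defaultdict
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect the text nodes of an HTML document."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        self.chunks.append(data)


def build_catalog(pages):
    """Map each lowercase word to the set of page URLs that contain it.

    pages: dict of URL -> HTML source (hypothetical input format).
    """
    catalog = defaultdict(set)
    for url, html in pages.items():
        parser = TextExtractor()
        parser.feed(html)
        parser.close()
        text = " ".join(parser.chunks).lower()
        for word in re.findall(r"[a-z0-9]+", text):
            catalog[word].add(url)
    return catalog


def search(catalog, keyword, limit):
    """Return at most `limit` URLs matching the keyword, in sorted order."""
    return sorted(catalog.get(keyword.lower(), set()))[:limit]


# Hypothetical pages on one WWW server host: a system page and a user page.
pages = {
    "/index.html": "<html><body><h1>Agent search</h1></body></html>",
    "/~user/a.html": "<p>An agent moves between hosts.</p>",
}
catalog = build_catalog(pages)
hits = search(catalog, "Agent", limit=10)
```

    A catalog built this way covers only the pages on its own host, which matches the idea that each host manages and serves searches over its own WWW information.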

    (ii) InfoSeeker

    InfoSeeker performs information searches by mov-

    ing between WWW server hosts on an information retrieval

    network and then gathers together the results (refer to

    Section 2.3). Internally it has a search history table that

    stores the WWW server hosts that have been searched. The

    maximum number of retrieved hosts, the maximum number

    of retrieved hits, the maximum retrieval time, the maximum

    number of migration errors, and the maximum number of

    search results are also stored as information retrieval

    parameters. Here, the maximum number of retrieved hosts is

    the maximum number of WWW server hosts that InfoSeeker

    can move to. The maximum number of

    retrieved hits is the maximum number of search results that

    can be collected. The maximum retrieval time is the time

    limit for InfoSeeker to perform a retrieval. The maximum

    number of migration errors is the allowed number of fail-

    ures for movement between WWW server hosts. The maxi-

    mum number of search results is a value that restricts the

    number of search results returned from a single information

    catalog to InfoManager. Among these parameters, all ex-

    cept the maximum number of search results are used as

    conditions for the termination of information retrieval.
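
    The five retrieval parameters and the termination rule described above can be sketched as follows. This is only an illustration: the class and field names are hypothetical, and treating each limit as reached when the counter is greater than or equal to it is an assumption, since the text does not state the exact comparison.

```python
from dataclasses import dataclass


@dataclass
class RetrievalParameters:
    """The five InfoSeeker parameters named in the text (field names assumed)."""
    max_hosts: int                # maximum number of retrieved hosts
    max_hits: int                 # maximum number of retrieved hits
    max_time_s: float             # maximum retrieval time, in seconds (unit assumed)
    max_migration_errors: int     # maximum number of migration errors
    max_results_per_catalog: int  # limits results per catalog, not a stop condition


@dataclass
class SeekerState:
    """Counters a migrating InfoSeeker might carry (hypothetical)."""
    hosts_visited: int = 0
    hits_collected: int = 0
    elapsed_s: float = 0.0
    migration_errors: int = 0


def should_terminate(params: RetrievalParameters, state: SeekerState) -> bool:
    """Stop when any of the four terminating limits is reached.

    max_results_per_catalog is deliberately ignored here, as the text says
    it is the one parameter not used as a termination condition.
    """
    return (
        state.hosts_visited >= params.max_hosts
        or state.hits_collected >= params.max_hits
        or state.elapsed_s >= params.max_time_s
        or state.migration_errors >= params.max_migration_errors
    )


params = RetrievalParameters(
    max_hosts=3, max_hits=10, max_time_s=5.0,
    max_migration_errors=2, max_results_per_catalog=4,
)
done = should_terminate(params, SeekerState(hosts_visited=3))
```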

    (iii) SeekersManager

    SeekersManager manages the InfoSeeker informa-

    tion retrieval parameters and verification information.

    (iv) GateKeeper

    GateKeeper makes decisions regarding security man-

    agement and host destinations. It also manages the list of

    WWW server hosts (Hosts List) that InfoSeeker can move

    to next. It creates the information retrieval network using

    this host list (refer to Section 2.4). The host list is obtained

    from the default GateKeeper only when GateKeeper is


