An effective search method for distributed information systems using a self-organizing information retrieval network


Post on 06-Jun-2016






<ul><li><p>An Effective Search Method for Distributed Information Systems Using a Self-Organizing Information Retrieval Network</p><p>Kouichi Abe and Toshihiro Taketa</p><p>Department of Electrical and Information Engineering, Faculty of Engineering, Yamagata University, Yonezawa, Japan 992-8510</p><p>Hiroshi Nunokawa</p><p>Faculty of Software and Information Science, Iwate Prefectural University, Morioka, Japan 020-0173</p><p>Norio Shiratori</p><p>Research Institute of Electrical Communication, Tohoku University, Sendai, Japan 980-8577</p><p>SUMMARY</p><p>In this paper the authors propose a search system composed of a self-organizing information retrieval network as a method to effectively search for information managed in a distributed fashion. An information retrieval network is a virtual network consisting of World Wide Web (WWW) server hosts that have search functions. A search system making use of the features of the authors' method can provide the following benefits: (1) information searches on a newly added WWW server host; (2) information searches that avoid a problem host; (3) information searches from any host. In this paper the authors demonstrate the validity of their idea by creating an information search system based on their idea and performing experiments on a university LAN. © 2000 Scripta Technica, Electron Comm Jpn Pt 1, 84(3): 29–37, 2001</p><p>Key words: WWW; information retrieval network; self-organization; agent; information search.</p><p>1. 
Introduction</p><p>Currently a significant amount of research is being performed on information searches for information stored in a distributed fashion on the World Wide Web (WWW). Retrieval methods that use conventional information retrieval networks can only designate static paths to the host. As a result, such methods are deficient in that they cannot respond to information searches on a newly added host or to problems on a network. Therefore, in this paper the authors propose a search method composed of a self-organizing information retrieval network in order to resolve the above problems. Here, an information retrieval network shall refer to a virtual network composed of various WWW servers and hosts.* A search system for WWW information created with this search method as its foundation will be able to offer the following benefits by using its self-organizing features.</p><p>Electronics and Communications in Japan, Part 1, Vol. 84, No. 3, 2001. Translated from Denshi Joho Tsushin Gakkai Ronbunshi, Vol. J82-B, No. 5, May 1999, pp. 809–817.</p><p>*A host running a WWW server.</p></li><li><p>(1) Searching for information newly added to a WWW server host</p><p>A newly created WWW server host will automatically be included in the information retrieval network by virtue of the composition of the self-organizing information retrieval network, and the information on that WWW server host can then be searched.</p><p>(2) Searching for information while avoiding hosts with problems</p><p>In a WWW server host on a self-organizing information retrieval network, each WWW server host itself can control the path information for the host. 
As a result, even when a problem (e.g., a host failure) arises in a WWW server host in the system, an effective information retrieval path can be determined by using the estimates obtained from the search results (referred to as the expected search value), and the information search can be continued.</p><p>(3) Searching for information from any host</p><p>In a self-organizing information retrieval network, each WWW server host manages by itself only the WWW information in its own host in order to take advantage of the self-organizing features. A database is then created based on this information, and search services are provided for the WWW information. Consequently, the user can search for information on any WWW server host on the system.</p><p>In this paper, the authors set up an Agent-based Total Resource Access System (ATRAS) information retrieval system, designed based on search methods composed of a self-organizing information retrieval system, in a university Local Area Network (LAN), perform experiments to show the advantages obtained using the search system for WWW information using this method, and demonstrate the validity of this system.</p><p>In Section 2 the basic technology for the authors' method, including agents, self-management of distributed information, and migration searches, is described, and the information retrieval mechanism using a self-organizing information retrieval system and the authors' method is explained. In Section 3 an ATRAS information retrieval system designed to prove the validity of the search method consisting of a self-organizing information retrieval system is described. 
In Section 4 information search experiments using an ATRAS prototype system are performed in order to demonstrate the validity of the proposed method, and the results are described. In Section 5 the authors consider the validity of the search system for WWW information introduced by this method based on the ATRAS experimental results. Section 6 offers a conclusion and identifies future topics of study.</p><p>2. Self-Organization of an Information Retrieval Network</p><p>In this section the authors first explain agents, self-management of distributed information, and migration searches, the basic technology necessary to effectively use the self-organizing information retrieval network. Next the authors describe a self-organizing information retrieval network. Finally, the authors explain the information retrieval mechanism using their method.</p><p>2.1. Agents</p><p>Considerable research is being done on information searches using agents. However, the definition of an agent varies with each different research project [3–10]. In Ref. 11, an agent is described as a computer system based in hardware or software that is autonomous, social, adaptable, and self-motivated. In Ref. 12 Internet agents are explained, with autonomous, permanent, individual, cooperative, adaptive, and dynamic mentioned as characteristics of an agent.</p><p>In this paper, an agent shall be defined as a program (or process) that is autonomous and social. Here, autonomous means that the agent determines its next action in accordance with its own decisions. In other words, it operates independently of the user. Social means that the agents can exchange data among themselves based on the particular communications protocols each agent uses (or via an agent communications language). 
In addition, agents may be equipped with features other than autonomous or social behavior (for instance, dynamism).</p><p>In the authors' method five types of agents are used, as shown in Fig. 1. Each agent is explained below.</p><p>(i) InfoManager</p><p>There is one InfoManager on each WWW server host. It manages the WWW information on the host and provides search services for WWW information (refer to Section 2.2). It creates an information search catalog (WWW information database) from existing WWW search systems and in the same fashion from HyperText Markup Language (HTML) documents and then manages it. The information catalog (WWW Database) is created and managed for each system page and user page.</p><p>Fig. 1. Agents.</p></li><li><p>(ii) InfoSeeker</p><p>InfoSeeker performs information searches by moving between WWW server hosts on an information retrieval network and then gathers together the results (refer to Section 2.3). Internally it has a search history table that stores the WWW server hosts that have been searched. The maximum number of retrieved hosts, the maximum number of retrieved hits, the maximum retrieval time, the maximum number of migration errors, and the maximum number of search results are also stored as information retrieval parameters. Here, the maximum number of retrieved hosts is the maximum number of WWW server hosts that InfoSeeker can move to. The maximum number of retrieved hits is the maximum number of search results that can be collected. The maximum retrieval time is the time limit for InfoSeeker to perform a retrieval. The maximum number of migration errors is the allowed number of failures for movement between WWW server hosts. 
The maximum number of search results is a value that restricts the number of search results returned from a single information catalog to InfoManager. Among these parameters, all except the maximum number of search results are used as conditions for the termination of information retrieval.</p><p>(iii) SeekersManager</p><p>SeekersManager manages the InfoSeeker information retrieval parameters and verification information.</p><p>(iv) GateKeeper</p><p>GateKeeper makes decisions regarding security management and host destinations. It also manages the list of WWW server hosts (Hosts List) that InfoSeeker can move to next. It creates the information retrieval network using this host list (refer to Section 2.4). The host list is obtained from the default GateKeeper only when GateKeeper is launched for the very first time. Each individual GateKeeper stores several default GateKeepers on the information retrieval network. In addition, the information retrieval network consisting of one GateKeeper as its foundation is referred to as a domain. An information retrieval framework (the foundation of an information retrieval environment using the authors' method) can be expanded by connecting the GateKeepers in several domains (refer to Fig. 2).</p><p>(v) Communicator</p><p>Communicator provides the user interface. It receives search requests from the user, then issues search requests to the SeekersManager and creates a list of the search results for the user.</p><p>2.2. Self-management of distributed information</p><p>Self-management of distributed information refers to an information management method in which each WWW server host manages only the WWW information in that host, creates the information catalog, and provides search functions for WWW information. 
This method forms the foundation for the creation of the information retrieval system that can make use of a distributed environment (load distribution, resource distribution, and so on). This is realized by agents (InfoManager) that manage WWW server information and have search functions.</p><p>2.3. Migration searches</p><p>Migration refers to a mechanism in which an agent itself moves to a remote host and then operates at that destination. A migration search is a search method in which an agent (InfoSeeker) moves between each WWW server host on the information retrieval network while at the same time performing information retrieval on each host and collecting the search results. In addition, information retrieval using migration searches can be made parallel by using more than one agent (InfoSeeker) during one search.</p><p>2.4. Self-organizing information retrieval network</p><p>In this paper the information retrieval network is a virtual network set up between each WWW server host based on the self-management of distributed information.</p><p>Fig. 2. Information retrieval framework.</p></li><li><p>When the agent (InfoSeeker) migrates from a WWW server host that is not in the host list of the destination host (the path table of WWW server hosts that InfoSeeker can move to next), the WWW server host at the migration point of origin is automatically recorded in the host list of the WWW server host at the migration destination. In this paper, this is referred to as the self-organization of an information retrieval network. This is realized through the following mechanism in the agent (GateKeeper) that manages the host list. 
If the host list managed by the agent (GateKeeper) of host A is \(A_{\mathrm{list}}\) and the host list managed by the agent (GateKeeper) of host B is \(B_{\mathrm{list}}\), then \(A_{\mathrm{list}}\) and \(B_{\mathrm{list}}\) can be represented as follows:</p><p>\(A_{\mathrm{list}} = \{a_1, w_1, \ldots, a_m, w_m\}\) (\(m\): the number of hosts)</p><p>\(B_{\mathrm{list}} = \{b_1, w_1, \ldots, b_n, w_n\}\) (\(n\): the number of hosts)</p><p>Here, \(a_i\) (\(i = 1, \ldots, m\)) and \(b_j\) (\(j = 1, \ldots, n\)) are the WWW server host names to which the agent (InfoSeeker) performing the migration search can move. \(w_i\) (\(i = 1, \ldots, m\)) and \(w_j\) (\(j = 1, \ldots, n\)) are the estimates (hereafter referred to as the expected search values) of the search results obtained from the WWW server hosts \(a_i\) and \(b_j\). If we set the total amount of information that matches all of the search requests up to the present time in the WWW server host \(a\) to \(R_a\), and the total number of migrations performed to the WWW server host \(a\) from the current host to \(M_a\), then the expected search value \(w_a\) for the migration destination WWW server host \(a\) can be obtained from \(w_a = R_a / M_a\). As a result, the expected search value varies depending on the results of the migration search. The number of hosts (\(m\) and \(n\)) for which migration can occur from the current host is referred to as the degree of self-organization. As the degree increases, the number of candidates that can be selected as the next host rises, and the possibility of finding the search information grows larger.</p><p>If the next destination host \(B_{\mathrm{out}}\) from the host B is the host A, then \(B_{\mathrm{out}}\) can be expressed as</p><p>Furthermore, if the partner host connected to host A (in this instance host B) is \(A_{\mathrm{in}}\), then \(A_{\mathrm{in}}\) can be expressed as</p><p>Here,</p><p>(:= means substitution)</p><p>refers to the self-organization of the information retrieval network. 
In other words, the self-organization of the information retrieval network means that the connections between each host are set up automatically. The expected search value for a newly added host is set to the average of the expected search values of all hosts already listed in the host table. Furthermore, when the number of hosts in the host table exceeds the maximum allowed number of records, hosts are erased in sequence, starting from the host with the smallest expected search value, until the maximum allowed number of records is no longer exceeded. The decision mechanism for the next migration destination host is as follows:</p><p>(1) The expected search value for each host in the host table is calculated.</p><p>(2) The selection probability S for each host is calculated from the expected search value using the following equation.</p><p>(3) The next migration destination host is determined by making the rate at which each host is selected equal to the selection probability.</p><p>This method avoids concentrating the migration destination hosts in one area. In addition, compared to a decision method using simple random numbers, this method can choose as the next host one with a high probability of yielding search results. Proving the validity of this decision method is a topic for future study.</p><p>2.5. Information retrieval mechanism using a self-organizing information retrieval network</p><p>Figure 3 shows the framework for information retrieval using the self-organizing information retrieval network described in Section 2.4. The operation of the network will be explained step by step (numbers in parentheses correspond to the numbers in Fig. 3). 
Note also that the explanation uses the agent names given in Section 2.1.</p><p>(1) The user requests information retrieval from Communicator.</p><p>(2) Comm...</p></li></ul>
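The host-list bookkeeping of Section 2.4 (expected search value, registration of an unknown origin host, eviction, and choice of the next destination) can be sketched roughly as follows. This is only an illustration under stated assumptions, not the ATRAS implementation. The names HostEntry, HostList, register_origin, record_result, and choose_next are hypothetical. The equation for the selection probability S is not reproduced in this text, so the sketch assumes S is proportional to the expected search value (S_i = w_i / sum of all w_k). It also assumes that the averaged starting value of a new host is used only until the first migration to that host.

```python
import random
from dataclasses import dataclass


@dataclass
class HostEntry:
    """One record in a GateKeeper host list (hypothetical structure)."""
    matches: float = 0.0      # R_a: total matching information found on the host
    migrations: int = 0       # M_a: migrations made to the host from here
    seed_value: float = 0.0   # assumed starting value until the first migration

    @property
    def expected(self) -> float:
        # Expected search value w_a = R_a / M_a.
        if self.migrations == 0:
            return self.seed_value
        return self.matches / self.migrations


class HostList:
    """Host list kept by one GateKeeper (illustrative only)."""

    def __init__(self, max_records: int):
        self.max_records = max_records
        self.entries = {}  # host name -> HostEntry

    def register_origin(self, host: str) -> None:
        # Self-organization: record the origin host if it is not yet listed.
        if host in self.entries:
            return
        values = [e.expected for e in self.entries.values()]
        # A new host starts with the average of the existing expected values.
        seed = sum(values) / len(values) if values else 0.0
        self.entries[host] = HostEntry(seed_value=seed)
        # Erase hosts with the smallest expected value while over the limit.
        while len(self.entries) > self.max_records:
            worst = min(self.entries, key=lambda h: self.entries[h].expected)
            del self.entries[worst]

    def record_result(self, host: str, hits: float) -> None:
        # Update R_a and M_a after a migration search on `host`.
        entry = self.entries[host]
        entry.matches += hits
        entry.migrations += 1

    def selection_probabilities(self) -> dict:
        # ASSUMPTION: S_i = w_i / sum(w_k); uniform if every w_k is zero.
        if not self.entries:
            return {}
        total = sum(e.expected for e in self.entries.values())
        if total == 0:
            return {h: 1.0 / len(self.entries) for h in self.entries}
        return {h: e.expected / total for h, e in self.entries.items()}

    def choose_next(self, rng=random) -> str:
        # Pick the next destination at a rate equal to its probability.
        probs = self.selection_probabilities()
        if not probs:
            raise ValueError("host list is empty")
        hosts = list(probs)
        return rng.choices(hosts, weights=[probs[h] for h in hosts], k=1)[0]
```

Under these assumptions, a GateKeeper would call register_origin when an InfoSeeker arrives from an unlisted host, record_result after each search, and choose_next when the InfoSeeker is ready to migrate. Weighted random choice, rather than always taking the largest expected search value, reflects the stated aim of not concentrating migrations in one area.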

