
ANALYTICA CHIMICA ACTA

ELSEVIER Analytica Chimica Acta 348 (1997) 465-470

Automatic generation of hypermedia

Christian L. Macher, Andreas Gloor, Ernö Pretsch*

Department of Organic Chemistry, Swiss Federal Institute of Technology, Universitätstr. 16, CH-8092 Zürich, Switzerland

Accepted 23 January 1997

Abstract

It is demonstrated that a hypermedia representation of data stored in a database file can be generated automatically. Not only data nodes containing information, but also navigation tools (buttons, menu items) with their program codes, are created by the system. The user only has to input the principle of organization in terms of chemical structures.

Keywords: Data mining; Hypermedia creation; Databases

1. Introduction

Today, databases providing access to a large variety of archived data are indispensable for chemical research and development. Besides supporting the lookup of data through a wide range of sophisticated search options [1], they can also be used to predict properties of unknown compounds on the basis of stored values for related ones. Prediction modules have been described for miscellaneous properties, including NMR chemical shifts [2], boiling points [3], lipophilicity [4,5], and mutagenic activity [6]. Collections providing such tools are called intelligent databases. However, in spite of the availability of these advanced systems, classical databases still have one inherent drawback: they do not support associative exploration, i.e., browsing through the data. Therefore, no results can be found without formulating an explicit query. Hypermedia, on the other hand, are specifically suited for this kind of data inspection and thus provide a welcome complement to databases. They allow the implementation of elaborate navigation structures over data nodes. As an example, SpecTool® [7-9], a hypermedia application for the spectra interpretation of organic compounds, contains about 2000 nodes (called cards or pages) and roughly 50 different jump possibilities from each of them, i.e., a total of about 100000 connections. Programs for generating hypermedia, such as HyperCard or ToolBook, also offer high flexibility and many advantages over classical expert system shells. This has been shown by the successful hypermedia implementation [10] of information systems developed earlier by using such shells [11]. A more recent example of hypermedia, the highly popular World Wide Web, demonstrates the possibilities as well as the limitations of a large unstructured data collection. Thus, one inherent danger of great flexibility is that the manifold possibilities become confusing and the user loses orientation, i.e., gets 'lost in hyperspace'. Easily understood tree-like structures superimposed over the network help to avoid this. Since they define one parent for each node up to the root, orientation becomes straightforward. For example, the data in SpecTool® are structured according to three principles, namely, the spectroscopic methods, compound classes, and kind of data. They are connected by means of parallel trees which not only help the orientation within one set but also allow facile cross navigation.

One drawback of hypermedia, so far, is the tedious manual work needed to generate them and to fill data nodes. Therefore, it is desirable to create hypermedia automatically from data collections. This work presents a prototype system for performing this task and shows that both the navigation and the data representation can be generated by a program.

*Corresponding author. Tel.: +41 1 632 2926; fax: +41 1 632 1164; e-mail: [email protected].

0003-2670/97/$17.00 © 1997 Elsevier Science B.V. All rights reserved.

PII S0003-2670(97)00097-4

HyperCard 2.3 was used on a Macintosh Quadra 650 computer for the generation of hypermedia.

Fig. 1. Schematic representation of the PropTool hyperspace. The entries on the horizontal axis represent the different properties, i.e., octanol/water partition coefficient (log P), water solubility (s), vapor pressure (P), activity coefficient (γ), and Henry's law constant (H), with the possibility of adding further items. The entries on the vertical axis correspond to the substance classes. Their automatic generation for reference data and user data is described in this paper.

Fig. 2. An entry of the database used for automatic hypermedia generation. [The entry is an SDfile record for benzaldehyde: an atom block with coordinates, a bond block, and the data items <name>, <PARTITION.KOW> 1.480, <VAPOUR.PRESSURE> 2.90, <AQUEOUS.SOLUBILITY> 2.90, <HENRY.KH> -1.39, and an activity coefficient item (1484).]

INPUT

- Input of substructure(s) with ChemIntosh®
- Name substructure(s)
- Select Master File
- Select Substructure File

SUBSTRUCTURE SEARCH

Find all molecules in the Master File that contain the above defined substructure(s)

GENERATION OF STACKS

- Two stacks per substructure are created, one for Validated Data and the other one for User Data
- The stacks are named according to the name of the substructure
- The so created stacks are embedded in the navigation system

GENERATION OF THE CARDS IN THE STACKS

Cards are created according to the number of molecules found in the Master File via the substructure search

- Menus with the substructure names are created
- Buttons for each substructure are created on a Navigation Card

FILLING THE CARDS WITH INFORMATION

- Pictures of the structures of molecules are drawn
- The related property information is drawn

Fig. 3. Flow chart of the hypermedia generation process.


3. Results and discussion

The program Hyperspace Generator described here is implemented within PropTool, a hypermedia application for estimating and accessing physicochemical parameters [12]. The hyperspace of PropTool is organized according to three principles: kind of data (validated reference data, user data, tools), property (octanol/water partition coefficient, water solubility, vapor pressure, activity coefficient, Henry's law constant) and chemical substance class (see Fig. 1). Hyperspace Generator automatically generates the structure of the data section of PropTool and writes the reference data into new nodes. To demonstrate its capability of organizing unstructured data, a data file with connectivity and property information of 245 molecules [5] was generated in a standardized format (SDfile, cf. Fig. 2) and used as input.
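To illustrate the kind of input involved, the following Python sketch reads the property items of an SDfile. It is not the authors' HyperTalk code. It assumes the common SDfile layout in which each data item is a `> <TAG>` header line followed by its value and a blank line, and each record ends with `$$$$`. The function name `parse_sd_records` and the abbreviated sample record (tags and values modelled on the entry in Fig. 2, with a placeholder header instead of a real atom block) are illustrative assumptions.

```python
def parse_sd_records(text):
    """Yield one dict of data items (tag -> value text) per SDfile record."""
    for block in text.split("$$$$"):  # "$$$$" closes each record
        lines = block.strip("\n").splitlines()
        if not any(line.strip() for line in lines):
            continue  # skip the empty remainder after the last delimiter
        items = {}
        tag = None
        for line in lines:
            if line.startswith(">"):
                # Data header such as "> <PARTITION.KOW>": keep the text in <...>
                start, end = line.find("<"), line.rfind(">")
                tag = line[start + 1:end] if 0 <= start < end else None
                if tag is not None:
                    items[tag] = []
            elif tag is not None:
                if line.strip():
                    items[tag].append(line.strip())
                else:
                    tag = None  # a blank line ends the current data item
        yield {key: "\n".join(value) for key, value in items.items()}


# Abbreviated, hypothetical record: the structure block is a placeholder only.
SAMPLE = """\
Benzaldehyde
  sketch

  0  0  0  0  0  0  0  0  0  0999 V2000
M  END
> <name>
Benzaldehyde

> <PARTITION.KOW>
1.480

> <HENRY.KH>
-1.39

$$$$
"""

records = list(parse_sd_records(SAMPLE))
log_kow = float(records[0]["PARTITION.KOW"])
```

Values are kept as text and converted only where needed, so that non-numeric items such as the name pass through unchanged.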

The flow chart of the hypermedia generation process is shown in Fig. 3. Only the substructures that define the principle of organization (i.e., the compound classes in Fig. 1) have to be input manually with their drawings and names. The former are used for the substructure search, the latter for naming the new navigation tools, i.e., the buttons and menu items that are created automatically. The user interface for this input is shown in Fig. 4. The NewSubstructureFile button starts the input procedure, whereas DeleteDataStacks deletes the previously generated hyperspace. Thus, the user can design the hyperspace iteratively. The DrawStructure button is used to enter a new substructure, which is drawn with the commercially available program ChemIntosh® [13] and transferred to Hyperspace Generator via the clipboard. Both the connectivity table and the picture of the substructure are stored together with its name entered on the input card (Fig. 4). The upper part of this card allows the input to be checked: upon selecting a name, the corresponding structure is shown on the top left.
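The stored input can be pictured as a small name-keyed collection. The Python sketch below is only an illustration of that idea, not the HyperCard implementation: the class names `Substructure` and `SubstructureFile`, their fields, and the placeholder connectivity and picture values are all assumptions.

```python
from dataclasses import dataclass, field


@dataclass
class Substructure:
    name: str          # used later to label buttons, menu items and stacks
    connectivity: str  # connection table text, used for the substructure search
    picture: bytes     # drawing shown when the name is selected


@dataclass
class SubstructureFile:
    entries: dict = field(default_factory=dict)

    def add(self, sub):
        """Store a substructure under its name; duplicate names are rejected."""
        if sub.name in self.entries:
            raise ValueError(f"substructure {sub.name!r} already defined")
        self.entries[sub.name] = sub

    def names(self):
        """Names in input order, as listed on the input card."""
        return list(self.entries)

    def lookup(self, name):
        """Return the stored entry so that its picture can be displayed."""
        return self.entries[name]


sub_file = SubstructureFile()
sub_file.add(Substructure("Aldehydes", "<connection table>", b"<picture>"))
sub_file.add(Substructure("Bromides", "<connection table>", b"<picture>"))
```

Keying the entries by name mirrors the check described above, where selecting a name brings up the corresponding structure.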

Fig. 4. Input card of the Hyperspace Generator. [The card shows the fields "List of Substructure Names" and "Name" and the buttons NewSubstructureFile, DeleteDataStacks, AddToSubFile, PerformSearch and DrawStructure.]


Fig. 5. Navigation card (shown for log P, Reference Data), generated automatically by the Hyperspace Generator.

After ending the manual input, the database is selected and the hypermedia generation is started with the PerformSearch button (Fig. 4). First, the master file (cf. Fig. 2) is searched for each structure entered. If an item exhibiting a substructure in question is found, it is copied into a new file automatically generated for this substructure. Of course, an entry can occur simultaneously in several files if it contains several of the desired substructures. The next step is to create two stacks for each substructure, namely, one for the data found and another for later user entries (Reference and User Data, respectively, see Fig. 1). The substructure names as entered above are used for identification. In each stack, the necessary number of cards is generated to accommodate the items found (six entries per card). Then, a pull-down menu (not shown) and a card containing the buttons of the substructures are generated as navigation tools (see Fig. 5). The corresponding jump commands are automatically set. In a final step (cf. Fig. 3), the structures and properties are written in the data cards, one of which is given in Fig. 6.
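The sequence just described (search, grouping, two stacks per substructure, six entries per card, one navigation button per substructure) can be sketched in Python as follows. This is a simplified illustration rather than the authors' HyperTalk scripts. In particular, the `contains` predicate is a stand-in for a real substructure search, molecules are toy dictionaries tagged with hypothetical group labels, and all function and key names are assumptions.

```python
ENTRIES_PER_CARD = 6  # stated in the text: six entries per data card


def group_by_substructure(molecules, substructures, contains):
    """Map each substructure name to the molecules that contain it."""
    groups = {name: [] for name in substructures}
    for mol in molecules:
        for name, pattern in substructures.items():
            if contains(mol, pattern):
                groups[name].append(mol)  # one molecule may join several groups
    return groups


def build_stack(name, kind, entries):
    """Split the entries into cards of at most ENTRIES_PER_CARD items."""
    cards = [entries[i:i + ENTRIES_PER_CARD]
             for i in range(0, len(entries), ENTRIES_PER_CARD)]
    return {"name": f"{name} {kind}", "cards": cards}


def generate_hyperspace(molecules, substructures, contains):
    """Return the generated stacks and the buttons of the navigation card."""
    stacks, navigation = {}, []
    groups = group_by_substructure(molecules, substructures, contains)
    for name, found in groups.items():
        reference = build_stack(name, "Reference Data", found)
        user = build_stack(name, "User Data", [])  # empty, for later user entries
        stacks[reference["name"]] = reference
        stacks[user["name"]] = user
        # Each button carries its jump target, set without user intervention.
        navigation.append({"label": name, "target": reference["name"]})
    return stacks, navigation


# Toy input: group labels replace connection tables (assumption for brevity).
molecules = [{"name": f"aldehyde-{i}", "groups": {"aldehyde"}} for i in range(7)]
molecules.append({"name": "bromo-aldehyde", "groups": {"aldehyde", "bromide"}})
substructures = {"Aldehydes": "aldehyde", "Bromides": "bromide"}

stacks, navigation = generate_hyperspace(
    molecules, substructures, lambda mol, pattern: pattern in mol["groups"]
)
card_sizes = [len(card) for card in stacks["Aldehydes Reference Data"]["cards"]]
```

With eight matching molecules the aldehyde reference stack needs two cards, and the molecule carrying both groups appears in both reference stacks, as the text notes for entries containing several of the desired substructures.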

4. Conclusions

The results show that data items stored in a database can be automatically organized into structurally related groups that are then displayed in a hypermedia environment. The principle of organization must be provided by a chemist, whereas all other steps, including the generation of linking commands, are performed automatically. The HyperCard environment is particularly suited for developing such an application because anything that can be done manually can also be performed by a program (script) written in HyperTalk, its programming language. Although only a small prototype system is used here to demonstrate the principles, there is in principle no limitation, so that any large database can be searched with this procedure. It is, therefore, conceivable that hypermedia presenting subsets of large factual or spectroscopic databases can be designed for special purposes. With the advent of the new programming language Java, generating the display at runtime, locally or via the Internet, also becomes possible.

Fig. 6. Example of an automatically generated data card. [The card shown is card 1 of 2 and lists partition coefficients, log Kow, at 25 °C.]

Acknowledgements

This work was partly supported by the Swiss Federal Office of Environment, Forests and Landscape (BUWAL). Our special thanks go to Dr. D. Wegmann for reviewing this manuscript.

References

[1] P. Willett, Similarity and Clustering in Chemical Information Systems, Wiley, New York, 1987.

[2] W. Bremser, Anal. Chim. Acta, 103 (1978) 355.

[3] M.D. Wessel and P.C. Jurs, J. Chem. Inf. Comput. Sci., 35 (1995) 841.

[4] N. Bodor, A. Harget and M.-J. Huang, J. Am. Chem. Soc., 113 (1991) 9480.

[5] N. Bodor and M.-J. Huang, J. Pharm. Sci., 81 (1992) 272.

[6] X.H. Song, M. Xiao and R.Q. Yu, Computers and Chemistry, 18 (1994) 391.

[7] M. Cadisch, M. Farkas, J.-T. Clerc and E. Pretsch, J. Chem. Inf. Comput. Sci., 32 (1992) 286.

[8] A. Gloor, M. Cadisch, T. Kocsis, R. Bürgin Schaller, H.-J. Hediger, J.T. Clerc and E. Pretsch, Anal. Chim. Acta, 295 (1994) 93.

[9] A. Gloor, M. Cadisch, R. Bürgin Schaller, M. Farkas, T. Kocsis, J.T. Clerc, E. Pretsch, R. Aeschimann, M. Badertscher, T. Brodmeier, A. Fürst, H.J. Hediger, M. Junghans, M. Kubinyi, M.E. Munk, H. Schriber, D. Wegmann, SpecTool: A Hypermedia Book for Structure Elucidation of Organic Compounds with Spectroscopic Methods, Chemical Concepts, D-69442 Weinheim, 1994, 1996.

[10] B. Bourguignon, P. Vankeerberghen and D.L. Massart, J. Chromatogr., 592 (1992) 51.

[11] A. Peeters, L. Buydens, D.L. Massart and P.J. Schoenmakers, Chromatographia, 26 (1988) 101.

[12] Ch.L. Macher, R.P. Schwarzenbach, E. Pretsch, in: J. Gasteiger (Ed.), Software Development in Chemistry 10, Proceedings of the 10th Workshop, Computer in Chemistry, Hochfilzen, 20-21.11.1995, Gesellschaft Deutscher Chemiker, Frankfurt am Main, 1996, p. 97.

[13] ChemIntosh 3.4, SoftShell International Ltd., Grand Junction, CO.