The Functional Data Model as the Basis for an Enriched Database Query Language

Journal of Intelligent Information Systems 12, 139-164 (1999)
© 1999 Kluwer Academic Publishers. Manufactured in The Netherlands.

ROBERT AYRES (r.ayres@rmcs.cranfield.ac.uk)
Department of Informatics and Simulation, Cranfield University, Royal Military College of Science, Shrivenham, Swindon, Wiltshire SN6 8LA, UK

Editors: Peter M.D. Gray, Peter J.H. King, Larry Kerschberg

Abstract. Conventional database languages rely on the user specifying what relations are to be used when evaluating a query. Consequently they preclude queries which involve searching for unspecified connections or associations in the database. In this paper we present Hydra, a functional language with all the facilities to define, update and query a database, which also enables users to carry out associational queries. Hydra uses a graph-based data model in which nodes represent values or entities and arcs the relationships between them. Associational facilities are made possible by the provision of built-in functions which find paths through the database graph. The mappings between sets of nodes in the database graph are represented as functions at the Hydra language level and it is as lists of such functions that associational results are returned. The use of a functional language is important since such languages allow functions to be returned as results; such an approach could not be adopted in a logic-based language which would not permit predicates to be returned as answers. Hydra also allows users to define general computational functions which are not considered to form part of the database. This use of two sets of functions achieves a computationally complete system which extends the query power of previous database systems without compromising their expressive or query power.

Keywords: functional data model, functional programming, graph databases, semantic networks

1. Introduction

In current database query languages it is not possible to express a query that corresponds to a question such as:

    Is there a connection between John and Mary?

One of the reasons for this is that the record-oriented data model used by most systems limits the semantic expressiveness of the database. A separate problem is that standard query languages rely on the user specifying the relations to use in evaluating the query. It is the nature of the query above that the user does not know which relations are involved.

There are a number of domains where such open-ended queries need to be supported. Biologists model ecosystems in terms of food-webs and are often interested in the way that species can be linked by the food chain or shared habitats. Likewise, in sociology social network analysis is concerned with studying groups of actors (such as people, companies, clubs, etc.) in terms of networks of relationships and associations (Wasserman and Faust, 1994). Such applications have normally relied on special-purpose programs with graph-searching facilities. The drawback of this approach is that the data collected cannot be managed as an ordinary database with all the advantages of integrity constraints, general query facilities and so on. It would of course be possible to build special-purpose applications over a relational (or other) system but this would be to produce specific solutions to a general problem: data in many application domains is usefully viewed as a network.

Figure 1. Portion of an instance-level Hydra database.

In this paper we present Hydra, a computationally complete functional language extended with the facilities to define and query a database. In this respect it is similar to previous functional database languages, such as FQL (Buneman and Frankel, 1979) and FDL (Poulovassilis and King, 1990). The novelty of Hydra lies in the facilities it provides to query the way in which database values are associated with each other. These associational facilities are provided without losing any of the retrieval power of previous query languages. Hydra uses a restricted functional data model in which the database becomes a graph with the nodes representing entities or scalar values and labelled arcs between nodes capturing attributes and relationships. A portion of a Hydra-style database graph is shown in figure 1. The advantage of such a representation scheme for data is that associational queries can be processed by searching for a path through this database graph. For example, finding an association between John and Mary can be seen as equivalent to looking for a direct or indirect connection between the database nodes which correspond to those entities.

In Hydra the database graph is built up by giving defining equations for a set of functions which correspond to the different kinds of association recorded in the database. For example, the database fragment shown in figure 1 can be constructed by the Hydra script shown in figure 2, in which a class of person entities is created and populated before two database functions are declared.

    entity person;
    create person John, Mary, Sue;
    age :: person -> int;
    child :: person -> [person];
    child John =+ Mary, Sue;
    age John = 55;
    age Sue = 30;

Figure 2. Portion of Hydra script to create a database.

The defining equations for these functions can be seen as creating the instance-level arcs of the database graph. Conventional queries can be carried out by applying database functions to parameters, for example age John; which evaluates to 55. More importantly, though, the Hydra language contains built-in primitives to search for database associations. Hence the question "What is the connection between John and Mary?" can be answered using the built-in Hydra function trail as follows:

    trail 1 John Mary;

which returns the list

    [[John, child, Mary]]

containing all the direct paths connecting John and Mary.
The user can specify longer search paths as in

    trail 3 55 30;

which will return the result

    [[55, ~age, John, child, Sue, age, 30]]

where the ~age function corresponds to an arc traversed backwards.

The benefit of integrating associational features in the way they have been in Hydra is that they are general purpose and can be used on any database. This offers advantages even in applications where the data might not normally be modelled in terms of a network. For example, the user can carry out queries corresponding to questions such as "Find out everything known about John" or "Find an entity which is directly or indirectly associated with 55, Mary, and Sue". In Hydra these questions can be answered with the user-defined functions known and centre (introduced later) as follows:

    known John;
    centre [55, Mary, Sue];

The former returns the result

    [(ml.age, [55]), (child, [Mary, Sue]), (~child, [])]

representing everything that is known about John. The latter query returns the list [John] containing all the values in the database graph with a direct link to each of 55, Mary, and Sue.

These queries are conceptually simple but cannot be carried out in standard database systems. Such systems generally use records as the fundamental data structure; this means that the semantics of connections are not preserved in the data. Moreover, standard query languages are first order and do not allow a second order query (one quantified over the database schema) to be formulated. A further problem, and one which motivates the use of a functional query language, is that imperative or logic-based languages treat values and functions (or predicates) in different ways and so do not provide a framework where functions can be returned as query results.

The use of a graph-based data model combined with a functional language has other advantages.
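The known and centre queries shown earlier in this section are both quantified over the schema, which a short Python sketch can make concrete. This is an illustration under stated assumptions, not the Hydra definitions (which the paper gives later): the names ENTITIES, SCHEMA, ARCS, type_of and neighbours are hypothetical, single-valued functions are returned as plain lists rather than through ml, and centre is read as "values with a direct link to each argument".

```python
# Hypothetical in-memory stand-ins for the database of figure 2.
ENTITIES = {"John": "person", "Mary": "person", "Sue": "person"}
SCHEMA = {"age": ("person", "int"), "child": ("person", "person")}  # name -> (domain, range)
ARCS = [("child", "John", "Mary"), ("child", "John", "Sue"),
        ("age", "John", 55), ("age", "Sue", 30)]


def type_of(value):
    # Anything that is not a declared entity is treated as an integer here.
    return ENTITIES.get(value, "int")


def known(value):
    """Pair every function applicable to value (forwards or backwards) with its result."""
    t = type_of(value)
    result = []
    for fn, (domain, _) in SCHEMA.items():
        if domain == t:
            result.append((fn, [o for f, s, o in ARCS if f == fn and s == value]))
    for fn, (_, rng) in SCHEMA.items():
        if rng == t:
            result.append(("~" + fn, [s for f, s, o in ARCS if f == fn and o == value]))
    return result


def neighbours(value):
    linked = set()
    for _, source, target in ARCS:
        if source == value:
            linked.add(target)
        if target == value:
            linked.add(source)
    return linked


def centre(values):
    """Values directly linked to every element of values (assumed reading of centre)."""
    common = None
    for v in values:
        n = neighbours(v)
        common = n if common is None else common & n
    return sorted(common or [], key=str)


print(known("John"))               # [('age', [55]), ('child', ['Mary', 'Sue']), ('~child', [])]
print(centre([55, "Mary", "Sue"]))  # ['John']
```

Neither function names a relation in advance; both iterate over the schema, which is the sense in which such queries are second order.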
Schema design is simple, largely a question of identifying the relevant entity types to be modelled, and the schema can be extended without needing to change the existing design or data organisation. The use of a graph-based data model means that the database is naturally viewed as a network and this has been exploited in the development of a graphical interface, VisualQ (see figure 3). VisualQ allows the naive user to use a small set of queries based on the associational primitives to explore a Hydra database and place the result on a free-form canvas. The use of a canvas allows multimedia data to be naturally integrated into a database view.

Apart from the ability to return functions as results, the use of a functional language to define and query the database has further advantages. It provides a declarative query language which, by allowing the user to define other functions (which are not considered to form part of the database), results in a computationally complete language. Finally, the use of lazy implementation techniques (Jones, 1987), where the evaluation of expressions is delayed until their result is required to evaluate some outer expression, is appropriate for a database query language since it minimises retrievals from secondary storage.

Figure 3. The VisualQ graphical interface to Hydra.

Hydra is not the first system to provide specialised graph-searching facilities. Other systems, such as GraphLog (Consens and Mendelzon, 1990, 1993), have not, however, integrated these with a full database system. GraphLog, for example, uses a directed graph as a way of representing a logic program graphically and does not provide the unrestricted searching capabilities available in Hydra. Where such facilities have been integrated with a database they have been restricted to particular data types, as with the RM/T model (Codd, 1979) or in certain geographical information systems where there are special features for representing and querying transportation systems, as in GraphDB (Güting, 1994).

In the rest of the paper we first give an overview of the data modelling issues which must be addressed. Section 3 gives an overview of the Hydra language concentrating particularly on the novel features of the language; examples of how these features may be used are given in Section 4. Section 5 gives an overview of a graphical interface, VisualQ, which was developed as a front-end to the language. Finally, in Section 6 we outline our conclusions and discuss further work.

2. Data modelling issues

The motivation behind the design of Hydra was to produce a language which could express and process queries concerned with retrieving direct or indirect associations between database elements. For this to be possible the language must use a data model in which all such associations are explicitly represented.

This requirement effectively rules out the use of a record-oriented data model such as the relational model. As has been pointed out by Kent (1979), records have a number of weaknesses for data modelling. One of these is that there is not necessarily a one-to-one correspondence between application entities and database records. In a relational database an entity may be represented by several records appearing in different tables. For example, an individual may be both an employee and a customer of a company, so records corresponding to the individual may appear in both employee and customer tables in the company's database. Similarly, records only distinguish between entities to the extent that they record distinguishing attributes of those entities: a record only really implies existence of at least one occurrence of an entity or relationship.
These issues are addressed in object-oriented models by providing object-identifiers (or surrogates) which can be used to directly model application entities. However, there is a further problem with record-based systems: the semantics of the application links captured by attributes in records (or foreign key relationships) are not preserved in the database and so cannot be directly retrieved.

These data modelling problems are avoided in graph databases in which application data is modelled in terms of a set of binary relationships between sets of entities or scalar values (Kent, 1979). Using this approach the database becomes a labelled digraph. The retrieval of associations between database entities can consequently be treated as a search for a path through the database graph. Other queries are also possible: finding all the neighbours of a given node, for instance, is equivalent to retrieving everything known about the entity or value represented by the node.

2.1. The data model of Hydra

The underlying data model of Hydra is a labelled, directed graph whose nodes correspond to atomic values such as strings or integers, or surrogates representing application entities. A Hydra database can thus be thought of as a set of binary relations between sets of atomic values. For instance, the two relations

    age(Person, Integer)
    child(Person, Person)

constitute a database schema where Person is a set of entities corresponding to people in the application domain. The instance level database would be built up by introducing surrogates corresponding to Person entities and defining the extents of the two relations.

In order to accommodate a binary relational database within the syntactic framework of a functional language these relations are represented as functions at the level of the Hydra language.
Two kinds of functions may be defined: a relation such as age would be defined as a single-valued function of type

    person -> integer

and a relation such as child as a multi-valued function of type

    person -> [person]

where [person] denotes list of person. A function such as age will be partial, so a null value is also introduced at the language level to cater for situations where the application of a single-valued function to a value is undefined. List-valued functions are used to accommodate relations such as child since sets cannot be easily supported in functional languages. This means that an arbitrary (but consistent) order is imposed on the result of an application of a multi-valued function.

A Hydra database is declared by giving type declarations for a set of single and multi-valued functions representing application data. The instance-level data is entered by giving definitions to the functions introduced. Such database functions are termed primary in the Hydra language to distinguish them from other, general computational functions which the user may also define.

Application of a function, such as age to an entity of type person, corresponds to following the relation forwards (from person to integer); in order to follow a relation backwards (from an age to people with that age) Hydra provides a set of converse functions. For each database function, such as age or child, the system automatically maintains the converse function, denoted by prefixing the function with a tilde (e.g., ~age, ~child). Hence the function ~age has type

    integer -> [person]

and when applied to an integer returns a list of people with that age. All converse functions are list-valued, the lists they return corresponding to sets of results on which a consistent, but arbitrary, order is imposed.

An important point about this data model is that it entails no loss of expressive power compared to the relational (or any other record-oriented) data model.
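The behaviour of single-valued, multi-valued and converse functions described above can be mimicked in a few lines of Python. The sketch below is only an illustration: PrimaryFunction, define and converse are hypothetical names, Python's None stands in for Hydra's null value, and insertion order plays the part of the arbitrary but consistent ordering.

```python
class PrimaryFunction:
    """A database function over atomic values with an automatically kept converse."""

    def __init__(self, name, multi_valued=False):
        self.name = name
        self.multi_valued = multi_valued
        self.forward = {}   # argument -> list of results, in insertion order
        self.backward = {}  # result -> list of arguments (the converse)

    def define(self, arg, *values):
        if not self.multi_valued:
            if len(values) != 1:
                raise ValueError("a single-valued function takes exactly one value")
            # Redefinition replaces the old value and its converse entry.
            for old in self.forward.pop(arg, []):
                self.backward[old].remove(arg)
        for v in values:
            if v not in self.forward.setdefault(arg, []):
                self.forward[arg].append(v)
                self.backward.setdefault(v, []).append(arg)

    def __call__(self, arg):
        results = self.forward.get(arg, [])
        if self.multi_valued:
            return list(results)
        return results[0] if results else None  # None stands in for a null

    def converse(self, value):
        # Converse functions are always list-valued.
        return list(self.backward.get(value, []))


age = PrimaryFunction("age")
child = PrimaryFunction("child", multi_valued=True)
age.define("John", 55)
age.define("Sue", 30)
child.define("John", "Mary", "Sue")

print(age("John"), age("Mary"))   # 55 None
print(age.converse(55))           # ['John']
print(child.converse("Sue"))      # ['John']
```

Updating both directions inside define is what lets a backward lookup cost the same as a forward one.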
Just as any n-ary relation can be re-expressed using n + 1 binary relations so can it be re-expressed using n + 1 functions.

This data model is clearly very close to that of Daplex (Shipman, 1981; Kulkarni and Atkinson, 1986) but with some differences. The insistence, in the data model, on atomic types and functions of at most one parameter ensures that associations between values can always be seen as simple paths through the database graph. Were constructed types (or functions of more than one parameter) to appear in the database then the process of finding associations between entities or values (the motivation behind the language) would be complicated and such paths might not have a simple representation in terms of user-defined functions.

A further difference is that Hydra, like FDL, treats what Daplex calls base and derived functions in a uniform manner; indeed, functions can combine both extensional and intensional defining equations. As a consequence the distinction made in relational systems between base and derived tables has no analogue in Hydra.

3. Overview of Hydra

The use of a graph-oriented data model ensures that, in principle, a query system can search for arbitrary associations between database nodes and return the associations found as results. However, standard query languages are first-order, that is, they do not allow the user to carry out queries which are quantified over the relations in a database schema. Moreover, they do not provide a framework in which functions or relationships can be returned as results. In contrast, functional languages, in which functions are first-class citizens, allow functions to be returned as results and so provide an ideal framework for the inclusion of associational facilities.
In Hydra, such facilities are provided by augmenting the set of built-in functions with a small set of second-order primitives which use schema information to find actual or potential associations between entities and return the results in the form of lists of functions.

Hydra is a polymorphic functional language with a syntax similar to that of Miranda (Turner, 1985). It incorporates many of the features of modern functional languages such as: user-defined types, polymorphism, higher-order functions, lazy evaluation, and list comprehensions. It allows the user to build up a database by declaring and defining a special class of functions, called primary functions. The user may also define general purpose (computational) functions, termed secondary, which are not considered to form part of the database.

Below we give an overview of Hydra presenting, in turn, its type system, its database definition and query facilities, its general computational facilities, and finally the associational query facilities which are provided by built-in functions.

3.1. Type system

One of the characteristics of queries which correspond to questions such as:

    What is the connection between John and Bill?

or

    Find everything known about John.

is that the types of the results cannot be known in advance. In particular, the way that results of such queries are represented in Hydra, as lists of database values (nodes) or of database values and functions (arcs), gives rise to heteromorphic lists which are not well-typed according to the conventional polymorphic type system generally used by functional languages.

However, extending the query power of the database language clearly requires such heteromorphic lists to be supported. In Hydra this has been done by augmenting the type system with a universal type which is taken to be the union of all other types and by preserving type information at run time. We review the conventional features of the type system first before introducing its novel features.

Atomic types.
In Hydra atomic values are those which are considered to have no internal structure. It is precisely atomic values which may appear as nodes in the database graph. Three kinds of built-in atomic type are supported: integers, represented by a non-empty string of digits such as 0 or 190; strings, which are enclosed in double quotes as in "" (the empty string), "abc1" or "n\"n\"" (using the backslash convention of C); and booleans, represented by True and False. Standard operations are provided on all these types and future implementations will also include support for a real type.

The only atomic types which the user may introduce are classes of database surrogates. These are introduced using the keyword entity, hence the declarations

    entity person;
    entity location;

introduce the entity classes person and location. The entity classes of Hydra are like enumerated types except that they are dynamic: entities may be added or removed at any point. Hydra uses visible surrogates which may either be introduced by the user as in

    create person RobertoBonni;
    create person ColinNewmarch, JohnSmith;

or generated by the system as in

    create person;

which will automatically generate a surrogate such as Person001. The system ensures that all surrogates are unique. Surrogates (entities) can be removed as follows

    delete Person001;

and the system will automatically update function definitions to remove references to Person001. The use of visible surrogates means that, from a functional programming perspective, surrogates can be treated as nullary constructor functions. The surrogate strategy of Hydra (along with features for changing surrogates) is discussed in greater detail elsewhere (Ayres and King, 1995).

Constructed types. Hydra provides two built-in constructed types: lists and tuples. Lists are enclosed in square brackets and their elements separated by commas. Hence [1,2,3] is a list of integers whose type is designated as [int].
Lists are constructed using the normal list constructor operator (represented by :) so [1,2,3] is really just a syntactic variant for the expression 1:2:3:[] where [] is the empty list. Tuples of two or more items are enclosed in ordinary brackets and the items separated by commas. Thus (1,True) is a pair formed from an integer and a boolean whose type is designated as (int,bool).

The user may define new sum-of-product types and their constructors. For example, the types

    day ::= Mon | Tue | Wed | Thur | Fri | Sat | Sun;
    date ::= JUL int int
           | GREG int int int
           | DAYCOUNT int date;

represent days of the week and dates measured in different calendars. User-defined data types can be recursive, as with the third alternative for date which represents a date as a day count from a base date. Examples of valid dates are thus:

    JUL 321 1997;
    GREG 12 7 1996;
    DAYCOUNT 10319 (JUL 365 1979);

Polymorphism. Hydra, in keeping with other functional languages, supports polymorphic or generic types. For instance, the type definition

    tree a ::= TREE (tree a) (tree a)
             | LEAF a

defines a generic binary tree type which can be instantiated to particular tree types by replacing the polymorphic type variable a with a specific type. Hence:

    TREE (LEAF 1) (TREE (LEAF 2) (LEAF 1));

is a value of type tree int. Such polymorphic types encapsulate the essential structure of a set of related types and allow generic manipulation functions to be encoded.

The universal type. In addition to the entity classes and standard type features outlined above, Hydra provides a special universal type denoted by ?. The universal type represents the union of all types (functional and non-functional) and is needed to accommodate some of the results which may be produced by the associational primitives (introduced below). For example, using the associational features it is quite simple to construct a heteromorphic list such as [John, age, 29].
Such a value would give rise to a type error in conventional functional languages but in Hydra it can be assigned the type [?].

One implication of the universal type is that it is no longer possible to use the type inference techniques of standard functional languages (Milner, 1978). In Hydra the type of a function must be declared before its definition is given. This is not a disadvantage in a database context since type declarations of functions also serve as integrity constraints.

A separate implication of the support for heteromorphic structures (such as the list above) is that type information must be preserved at run time. This is primarily so that the system can determine how to display values of differing types but type information is also exploited by some of the features of the language to be introduced below.

3.2. Database definition, update, and query facilities

A database schema is declared in Hydra by introducing entity classes and giving the type declarations of one or more primary functions. Primary functions are specifically intended to model application data and must be consistent with the underlying data model outlined above. Hence they must be declared with an atomic domain and a range which is either of atomic type or of list of atomic type. For example, a single-valued function to represent people's ages may be declared as

    primary age :: person -> int;

and a multi-valued function to record the locations frequented by individuals as

    primary frequents :: person -> [location];

The instance-level database is built up by defining the primary functions; these definitions can be incrementally updated. For a single-valued function, such as age, we can give definitions as follows

    age RobertoBonni = 32;
    age ColinNewmarch = 56;

Where multi-valued functions are concerned the definitions are set-oriented.
The definition

    frequents RobertoBonni =+ KingGeorge, RonsGym;

means that RobertoBonni frequents the two locations KingGeorge and RonsGym in addition to whatever locations he is already recorded as frequenting.

The definitions of primary functions can combine both extensional and intensional equations. For example, a possible default definition for the function age is

    age x = 21;

Information is retrieved by applying primary functions to parameters as in the expression

    age RobertoBonni;

which is evaluated using best-fit pattern matching. The evaluator first looks for a defining equation of the form age RobertoBonni = ... and finds

    age RobertoBonni = 32;

so returning the result 32. Had the query age JohnSmith; been entered the evaluator would have failed to find a definition of the form age JohnSmith = ... and then looked for one with a variable on the left hand side, found the default equation given above, and so returned the result 21.

As mentioned above, converse functions are automatically maintained by the system: to find all the people who are 32 years old the user can enter the query

    ~age 32;

which returns the list [RobertoBonni] given the database so far built up.

Defining equations for functions can be removed from the database by simply giving their defining pattern without a right hand side. Thus

    age RobertoBonni = ;

removes the definition of age for RobertoBonni and

    age x = ;

removes the default definition for age. The entire function (declaration and definition) can be removed by the command

    delete age;

For multi-valued functions there is the facility to carry out set-oriented deletions. Thus the statement

    frequents RobertoBonni =- KingGeorge, RonsGym;

removes KingGeorge and RonsGym from the locations frequented by RobertoBonni and of course the declaration

    frequents RobertoBonni = ;

removes all the locations which RobertoBonni has been defined as frequenting.

Null values.
Support for single-valued primary functions means that a null value has to be introduced into the language so that a result can be returned when a primary function is undefined. Hydra supports typed null values which are designated by prefixing the name of an atomic type with a question mark. Hence ?person is a null value of type person, and age JohnSmith will evaluate to ?int.

Null values are associated with types in this way so that they can be used as parameters for some of the type-sensitive query facilities presented below.

More complex queries. Hydra provides a function like which retrieves all database entities of the same type as its parameter. Hence

    like JohnSmith;

returns a list of all the person entities in the database

    [RobertoBonni, ColinNewmarch, JohnSmith]

The same result would have been returned if the null person had been used instead as in like ?person;. The like primitive can be used with parameters of any type to return all the values of the same type that have so far been defined or used. Hence

    like Mon;

returns the list [Mon, Tue, Wed, Thur, Fri, Sat, Sun] and

    like age;

returns the singleton list [age]. The behaviour of like is slightly different for integers and strings since these values are predefined. Hence the expression like ?int; returns the list [21,32,56] of all the integers which are explicitly used in function definitions.

When used in conjunction with list abstractions the like primitive makes it possible to express SQL-like queries. Hence the expression

    [age x | x [...]

[...] side of the list abstraction. Hence

    [x | x [...] 35];

gives [ColinNewmarch], the list of all persons over 35 years of age.

Just as we can convert any relational data model into an equally expressive data model in Hydra, so any query which might be made on the original relational database can be re-expressed in Hydra using like combined with list abstractions and possibly some user-defined computational functions (discussed below).

Function composition.
Hydra provides a number of built-in function composition operators. For instance, if the user has defined a function partner as follows

    primary partner :: person -> person;

then using the built-in function composition operator (denoted by a full stop) the user can form the function

    age.partner

corresponding to the relationship "age of partner of" or even

    ~age.age.partner

corresponding to "people of the same age as the partner of". It is not possible to combine list-valued functions in the same way since the expression partner.~age (intended to correspond to the relation "partners of people of age") is not well-formed: the range of ~age is [person] so we cannot directly compose partner with the function. To overcome this problem Hydra provides two further, specialised composition operators. The first (denoted by the infix operator ..) allows a single-valued function to be composed with a multi-valued function to produce a multi-valued function. Hence

    partner..~age

is well-formed and correctly encodes the "partners of people of age" relation. Two multi-valued functions can be composed using a further operator (denoted by ...) to produce a single multi-valued function. For example, if the user has defined a multi-valued function child to associate an individual with his or her children then

    child...child

represents the relationship grandchildren.

Type-sensitive features. In Hydra it is possible to define database functions which take or return any kind of atomic value. For instance the user can define a function icon which will, for any atomic value, return the name of a bitmap to be used when displaying the value (the icon function is used in this way by the graphical interface discussed later). The function is declared as

    primary icon :: ? -> string;

Note that the use of ? in this context really means atomic rather than universal since primary functions are only defined on atomic types.
The user can give definitions for icon such as

    icon KingGeorge = "pub.xbm";
    icon JohnSmith = "johnsmith.xbm";

to associate bitmaps with particular values. Atomic types can be used in pattern specifications to associate icons with classes of value in the absence of an exact match. The equations

    icon (x::person) = "person.xbm";
    icon (x::location) = "building.xbm";

associate default bitmaps with entities of type person or location. Of course, an overall default can also be specified by omitting the type specification as in

    icon x = "point.xbm";

The best-fit pattern matching means that for any parameter to the icon function the evaluator will first look for a precise match, then for a match on the basis of the parameter's type, and finally for a general default equation.

The equality test of Hydra is heteromorphic and makes use of type information to compare values. Hence the tests

    True == 1;
    JohnSmith == [JohnSmith];

are both well-formed and evaluate to False. As well as comparing atomic and constructed values it is also possible to compare functions on the basis of their syntactic identity. Hence the test

    age == parent;

is well-formed and returns False. This facility does compromise referential transparency (the property of declarative languages where two identical subexpressions always evaluate to the same result) to the extent that it is possible for the user to give two functions, f and g say, identical definitions and yet for the test f == g to evaluate to False. In practice, though, this is unlikely to pose a problem and, from a database perspective, given that such functions will certainly have different application semantics, it is questionable whether a test purely on the basis of the functions' denotations would be preferable. These issues are discussed in greater detail elsewhere (Ayres and King, 1995).

3.3. General computational facilities

The computational power of primary functions is limited by the type restrictions imposed on their definitions.
To extend the language to computational completeness a further class of functions, termed secondary, is supported. These are user-defined functions that are not considered to form part of the database, so their types and definitions are unconstrained by data-modelling considerations.

As with primary functions, secondary functions must be declared before their definition is given. A polymorphic function to return the length of a list can be declared and defined as

    secondary length :: [a] -> int;
    length (x:xs) = 1 + length xs |
    length [] = 0 ;

and used in expressions such as length (like ?person) to return the number of person surrogates defined in the database.

Secondary functions are evaluated using top-to-bottom pattern matching; different defining equations are separated by vertical bars. This approach has been taken since secondary functions are likely to have a relatively small number of defining equations and top-to-bottom pattern matching can be implemented more efficiently than the best-fit approach used with primary functions. A further difference is that the definition of a secondary function must be given in one go and cannot be updated (though it can be removed altogether, as in the command delete length; and then redefined). Secondary functions are treated in this way because their definitions, which do not track an application domain, are stable and unlikely to be updated. Their implementation can also be optimised using all the standard functional program compilation techniques (Jones, 1987); the current implementation of Hydra uses a supercombinator implementation strategy for secondary functions.

A function to test whether a value is null can be defined using a secondary function as follows

    secondary isnull :: a -> bool;
    isnull x = not (x == x);

This definition makes use of the property of Hydra nulls that a test such as ?int == ?int returns False.
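The null test above relies only on equality being non-reflexive for nulls. A minimal Python sketch of the same idea follows; the Null class is a hypothetical stand-in for a Hydra typed null such as ?int and is not part of any Hydra implementation.

```python
class Null:
    """Hypothetical stand-in for a typed null such as ?int."""

    def __init__(self, type_name: str) -> None:
        self.type_name = type_name

    def __eq__(self, other: object) -> bool:
        # A null is equal to nothing, not even to itself.
        return False

    def __repr__(self) -> str:
        return "?" + self.type_name


def isnull(x: object) -> bool:
    """Mirror of the equation `isnull x = not (x == x)`."""
    return not (x == x)


null_int = Null("int")
print(isnull(null_int))  # True
print(isnull(2))         # False
print(isnull("waiter"))  # False
```

An IEEE 754 NaN behaves the same way in Python, since float("nan") == float("nan") is False, so isnull(float("nan")) also returns True.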
The isnull function can be used in a function ml defined as

    secondary ml :: a -> [a];
    ml x = if (isnull x) [] [x];

Thus (ml ?int) returns [] and (ml 2) returns [2]. The purpose of ml is to coerce single-valued functions to behave in the same way as multi-valued ones. Hence, the expression

    ml.age JohnSmith;

returns a singleton list if the age of JohnSmith has been defined and an empty list otherwise. The ml function is used by the associational primitives discussed below.

3.4. Associational facilities

The novel feature of Hydra lies in the associational facilities it provides. These enable the user to determine the way in which database entities or values are related to each other and to program generalised searches of the database. These associational facilities are provided through the mechanism of built-in primitives which make use of schema information to determine what functions can be applied to atomic values or the ways in which atomic values are associated with each other. There are four primitives (from, to, trail and link), which are introduced below.

The primitive from returns all the primary functions which can be applied to a value. Given the schema shown in figure 4 (where multi-valued functions are represented with double-headed arrows), the query

    from JohnSmith;

returns the answer

    [works_at, ml.sex, pers_desc]

The from function is evaluated by first determining the type of its parameter and then inspecting the schema to return a list of primary functions with the same domain. The order of the functions in this list (and in the lists returned by other associational functions) is implementation dependent but consistent between updates to the database.

Figure 4. Portion of schema of a criminal intelligence database.

Figure 5. Instance level fragment of criminal intelligence database.

The from primitive may be used to find all the information held on JohnSmith with the query

    [(f, f JohnSmith) | f <- from JohnSmith];

[...] person -> [location] and of type person -> [string].
With the universal type, however, the list can be assigned the type [person -> [?]]. Consequently, the type of from is taken to be

    a -> [a -> [?]]

Note that if from is applied to a value which is not atomic (and thus could not appear in the database) the empty list is returned.

The primitive to is similar to from except that instead of returning primary functions it returns converse functions. Hence (given the schema in figure 4) the query

    to KingGeorge;

returns the result

    [~place, ~works_at]

The primitive trail is concerned with finding paths through the database graph which connect entities or values. The query

    trail 1 JohnSmith RomaRest;

searches the database for direct connections between JohnSmith and RomaRest. The result is returned in the form of a list of lists, each one of which represents a separate path connecting the two values. A path is represented as an alternating sequence of nodes and arcs; thus an answer to the above query might be

    [[JohnSmith, works_at, RomaRest]]

If the original query had been expressed instead as

    trail 1 RomaRest JohnSmith;

the same connection would have been found but expressed instead as

    [[RomaRest, ~works_at, JohnSmith]]

The first parameter to trail limits the length of path which will be searched for. Thus

    trail 3 JohnSmith Inc045;

will return the result

    [[JohnSmith, works_at, RomaRest, ~place, Inc045],
     [JohnSmith, works_at, RomaRest, ~works_at, MarcoBonni,
      ~involved, Inc045]]

corresponding to two connections: JohnSmith works at the place where the incident occurred, and JohnSmith works at the same place as MarcoBonni, who was involved in the same incident. The paths which traverse the string constants "waiter" and "male" are not returned in the result. This is because a constraint that scalar values may only appear at the ends is imposed on the paths retrieved: intermediate nodes on a path must be entities.
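The search performed by trail can be approximated by a bounded depth-first traversal. The Python sketch below is an illustration under stated assumptions rather than Hydra's evaluator: it works directly on a hypothetical list of (entity, function, value) facts instead of consulting the schema first, it skips any path that would revisit a node, and it sorts the answer by path length only to make the output predictable.

```python
from typing import List, Set, Tuple

# Hypothetical instance data, loosely modelled on the examples in the text.
FACTS: List[Tuple[str, str, str]] = [
    ("JohnSmith", "works_at", "RomaRest"),
    ("MarcoBonni", "works_at", "RomaRest"),
    ("Inc045", "place", "RomaRest"),
    ("Inc045", "involved", "MarcoBonni"),
    ("JohnSmith", "sex", "male"),
    ("MarcoBonni", "sex", "male"),
]
SCALARS: Set[str] = {"male"}  # values that are not entities


def arcs(node: str) -> List[Tuple[str, str]]:
    """Arcs leaving `node`: forward as f, backward as the converse ~f."""
    out = []
    for subject, func, value in FACTS:
        if subject == node:
            out.append((func, value))
        if value == node:
            out.append(("~" + func, subject))
    return out


def trail(max_len: int, start: str, goal: str) -> List[List[str]]:
    """All loop-free paths of at most `max_len` arcs from start to goal."""
    paths: List[List[str]] = []

    def extend(path: List[str], node: str, depth: int) -> None:
        if depth > 0 and node == goal:
            paths.append(path)
            return
        if depth == max_len:
            return
        if depth > 0 and node in SCALARS:
            return  # a scalar may only appear at an end of a path
        visited = path[0::2]  # nodes sit at the even positions
        for label, nxt in arcs(node):
            if nxt not in visited:
                extend(path + [label, nxt], nxt, depth + 1)

    extend([start], start, 0)
    return sorted(paths, key=len)


for p in trail(3, "JohnSmith", "Inc045"):
    print(p)
```

With this data the call yields one path via ~place and one via MarcoBonni, matching the two connections described for trail 3 JohnSmith Inc045. The route through "male" is never extended, because the SCALARS check enforces the condition that a scalar value may appear only at an end of a path.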
This condition is imposed to eliminate paths which probably have little interest or significance, such as the path [JohnSmith, sex, "male", ~sex, MarcoBonni].

The evaluation of trail is carried out by first using the schema information to determine what connections could exist between the two values (given their types); the database is then queried to confirm the existence of actual paths. The evaluator automatically eliminates any paths which contain a loop, that is, paths in which the same node appears more than once.

The last primitive, link, carries out the same search as trail above but returns the answer in the form of simple or composed functions corresponding to the paths found. Thus

    link 1 JohnSmith RomaRest;

might give a result such as [works_at], signifying that there is only one direct connection between the two entities. Searching for longer connections results in composed functions being returned. For example

    link 3 JohnSmith Inc045;

will return

    [~place...works_at, ~involved...~works_at...works_at]

Applying any of the functions in the result list to JohnSmith will return a list of entities which will include Inc045 as one of the elements.

The associational primitives outlined above can be used with typed nulls to directly query schema information. Hence from ?person returns a list of primary functions with the type person as domain; to ?person a list of converse functions with person as domain; and link 2 ?person ?person shows all the ways in which person entities could be directly or indirectly associated.

4. Examples

The advantage of integrating associational primitives into Hydra is that it becomes possible to define functions which carry out complex searches on the database. For example, the primitives from and to may be used to find everything that is known about a database value.
Such a facility is most simply encoded as a secondary function, which we call known, declared and defined as

    secondary known :: a -> [(a -> [?], [?])];
    known x = [(f, f x) | f <-

[...]

    secondary append :: [a] -> [a] -> [a];
    append (x:xs) ys = x : append xs ys |
    append [] ys = ys ;

Given the instance-level database fragment shown in figure 5, the query

    known MarcoBonni;

returns

    [(works_at,[RomaRest]), (ml.sex,["male"]),
     (pers_desc,["waiter","tall"]),
     (~involved,[Inc045])]

where Inc045 is an incident entity. It is a relatively simple matter to extend the definition of known so that it will expand the database around a given node to a specified depth.

A similar function to known, which finds all the neighbours of a given node in the database, can be defined as

    secondary nbrs :: a -> [?];
    nbrs x = edups [n | f <-

[...]

    secondary member :: a -> [a] -> bool;
    member x (y:ys) = (x == y) or (member x ys) |
    member x [] = False ;

Consequently a query such as nbrs JohnSmith will give the result

    ["waiter", "limps", "male", RomaRest]

The nbrs function has been defined to be used as a building block in a function centre to be discussed below.

Suppose someone wishes to search the criminal intelligence database to find the identity of a limping man who is believed to be connected with a particular restaurant (RomaRest in our example). If the way in which this man is associated with the restaurant is known (as a customer or the proprietor, for instance) the search is relatively simple. However, the situation can easily arise where the manner in which the known entities or values are associated with the unknown person is not known. In conventional databases this situation is problematical and can only be resolved by trying out large numbers of queries or writing programs to scan the database.
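To make the behaviour of known and nbrs concrete, here is a small Python sketch. It is a hedged illustration only: the SCHEMA, TYPE_OF and FACTS tables are hypothetical stand-ins for the schema and instance data of figures 4 and 5, functions are represented by their names rather than as first-class values, and every result is returned as a list (as if each single-valued function had been wrapped with ml).

```python
from typing import Dict, List, Tuple

# Hypothetical schema: function name -> (domain type, range type).
SCHEMA: Dict[str, Tuple[str, str]] = {
    "works_at": ("person", "location"),
    "sex": ("person", "string"),
    "pers_desc": ("person", "string"),
    "involved": ("incident", "person"),
    "place": ("incident", "location"),
}
TYPE_OF: Dict[str, str] = {
    "JohnSmith": "person", "MarcoBonni": "person",
    "RomaRest": "location", "Inc045": "incident",
}
FACTS: List[Tuple[str, str, str]] = [
    ("JohnSmith", "works_at", "RomaRest"),
    ("JohnSmith", "sex", "male"),
    ("JohnSmith", "pers_desc", "waiter"),
    ("JohnSmith", "pers_desc", "limps"),
    ("MarcoBonni", "works_at", "RomaRest"),
    ("MarcoBonni", "sex", "male"),
    ("MarcoBonni", "pers_desc", "waiter"),
    ("MarcoBonni", "pers_desc", "tall"),
    ("Inc045", "involved", "MarcoBonni"),
    ("Inc045", "place", "RomaRest"),
]


def type_of(x: str) -> str:
    """Assumption: anything not listed as an entity is a string scalar."""
    return TYPE_OF.get(x, "string")


def from_(x: str) -> List[str]:
    """Names of the functions whose domain is the type of x."""
    return [f for f, (dom, _) in SCHEMA.items() if dom == type_of(x)]


def to(x: str) -> List[str]:
    """Names of the converse functions (~f) whose domain is the type of x."""
    return ["~" + f for f, (_, rng) in SCHEMA.items() if rng == type_of(x)]


def apply_fn(fname: str, x: str) -> List[str]:
    """Apply a named function, or its converse, to x; always returns a list."""
    if fname.startswith("~"):
        return [s for s, f, v in FACTS if f == fname[1:] and v == x]
    return [v for s, f, v in FACTS if f == fname and s == x]


def known(x: str) -> List[Tuple[str, List[str]]]:
    """Every applicable function paired with its result for x."""
    return [(f, apply_fn(f, x)) for f in from_(x) + to(x)]


def nbrs(x: str) -> List[str]:
    """All distinct neighbours of x, in first-seen order."""
    seen = [n for _, result in known(x) for n in result]
    return list(dict.fromkeys(seen))


print(known("MarcoBonni"))
print(nbrs("JohnSmith"))
```

With this data, known("MarcoBonni") pairs works_at, sex, pers_desc and ~involved with their results, and nbrs("JohnSmith") yields RomaRest, male, waiter and limps; the order simply follows the order of the tables.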
Given that, in Hydra, we can find all the neighbouring nodes of any item in the database, it is simple to find entities or values which are directly connected with any number of known values or entities.

A function to do this, centre, which takes a list of entities or values and returns a list of all those database nodes which have a direct association with each of the items in the parameter list, can be declared as

    secondary centre :: [?] -> [?];

and defined using the nbrs function as follows:

    centre [] = [] |
    centre [x] = nbrs x |
    centre (x:xs) = inter (nbrs x) (centre xs);

where inter takes two lists and returns a list of items which appear in both the lists. It is declared and defined as

    secondary inter :: [a] -> [a] -> [a];
    inter (x:xs) ys = if (member x ys)
                         (x : inter xs ys)
                         (inter xs ys) |
    inter [] ys = [] ;

Hence a query such as

    centre [RomaRest, "limps", "male"];

will return the singleton list [JohnSmith].

It is relatively simple to generalise the definition of centre so that it can cope with indirect as well as direct associations, or so that it can find the entities or values which are associated with the highest number of elements in the parameter list. The important point about functions such as known and centre is that they are easily encoded and provide useful facilities which are not available in standard database systems.

5. A graphical query interface

One problem which arises in Hydra is that the semantics of the results returned may often be obscured. For example an answer such as

    [[John, ~involved, Inc003, involved, Bill],
     [John, works_at, KingGeorge, ~place, Inc003, involved, Bill]]

to the query trail 4 John Bill; obscures the fact that the two paths overlap. Clearly a graphical representation of the result would be preferable.
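The overlap between the two paths can be made explicit by breaking each path into its arcs. The following Python sketch assumes only the alternating node/arc list representation described for trail; the function name arcs_of is hypothetical.

```python
from typing import List, Set, Tuple

Arc = Tuple[str, str, str]

# The two paths quoted in the text for `trail 4 John Bill;`.
PATHS: List[List[str]] = [
    ["John", "~involved", "Inc003", "involved", "Bill"],
    ["John", "works_at", "KingGeorge", "~place", "Inc003", "involved", "Bill"],
]


def arcs_of(path: List[str]) -> List[Arc]:
    """Split an alternating node/arc/node list into (node, arc, node) triples."""
    return [(path[i], path[i + 1], path[i + 2]) for i in range(0, len(path) - 2, 2)]


arc_sets: List[Set[Arc]] = [set(arcs_of(p)) for p in PATHS]
shared = set.intersection(*arc_sets)  # arcs that occur in every path
merged = set.union(*arc_sets)         # arcs of the combined graph

print(shared)       # {('Inc003', 'involved', 'Bill')}
print(len(merged))  # 4
```

The five arcs in the textual answer collapse to four distinct arcs once merged, a sharing that a flat list of paths leaves the reader to spot; this is one problem with a purely textual interface.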
A further problem with a textual interface is that it excludes multimedia datatypes such as pictures or text.

These weaknesses motivated the development of VisualQ, a graphical interface to Hydra (Abreu, 1995; Ayres and Abreu, 1997). VisualQ exploits the restricted functional data model of Hydra by allowing the user to draw out the database in the form of a graph. Starting from a blank canvas the user can place nodes corresponding to values in the database and browse through the database. Entities are represented by icons and connections between entities or values (corresponding to primary function definitions) are shown as labelled arcs. An advantage of using a free-form canvas is that multimedia data (such as pictures or text) can be easily incorporated.

The VisualQ interface is shown in figure 6. This shows part of a criminal intelligence database in which a particular incident (Inc100, represented by a running-man icon) and some associated information is displayed. The text block on the left gives a description of the incident, a bank robbery. Entities connected to the incident, a car and a suspect along with his description, are also shown. The user has also managed to retrieve from the database a known criminal, MarcoBonni, who fits the suspect's description, along with his photo. This association was retrieved by using a VisualQ option to invoke the centre function using the three known attributes of Sus100a.

VisualQ is at the same time a drafting tool and a database query interface. Thus the components of the diagram can be dragged on the canvas to make it more readable. Also a limited set of queries can be invoked by clicking on values shown on the canvas. Those entities or values which are displayed with a question mark underneath them have connections to other values which are not shown on the canvas.
Clicking on the car entity F234GHJ automatically invokes the query known F234GHJ; on the underlying database and the results of this query are shown in the sub-window in the middle of the canvas. This sub-window shows all the functions (underlined) which can be applied to the entity and, indented, the results of applying them. Thus we can see that F234GHJ is a red Vauxhall Cavalier. Values in the sub-window which are prefixed by *** already appear on the canvas. Entities or values in the sub-window can be transferred to the canvas by clicking on them and their connection with F234GHJ will automatically be drawn in. When VisualQ draws an entity (such as F234GHJ) on the canvas it carries out the query icon F234GHJ; on the underlying database to determine what icon is to be used to represent the entity.

In figure 6 the user has selected the two entities F234GHJ and MarcoBonni by highlighting them. The underlying trail function of Hydra can now be invoked (through the option on the left of the window's menu bar) to determine if there is any connection between MarcoBonni and the car used in the incident. The result of the trail query is automatically placed on the canvas, giving the view shown in figure 7, where the user has retrieved an additional incident description. It appears from this result that the car used in the bank robbery was dumped at a location near the place of work (an Italian restaurant) of MarcoBonni. Note that the association labelled near between the place where the car was dumped and the restaurant is a purely intensionally defined primary function which uses the map coordinates of locations to determine nearby locations held in the database.

This partial overview of VisualQ demonstrates that the results of Hydra queries can be presented in a form appropriate for the naive user. Currently VisualQ only uses a small number of underlying Hydra functions but is being extended to present the results of other Hydra-defined queries.
This will produce a flexible system in which new Hydra functions can be quickly incorporated into a user-friendly interface.

Figure 6. The VisualQ graphical interface with the ends of a trail query highlighted.

Figure 7. Result of trail query formatted by VisualQ.

6. Conclusion

The use of a restricted functional data model in the development of Hydra has yielded several benefits:

• It allows the language to incorporate features to query the way in which database entities or values are associated with each other, a query capability which is not available in standard database systems and which is provided without compromising any standard query features.
• The restricted model ensures that the database has a simple network representation which has been exploited in the VisualQ interface.
• The functional view makes possible a relatively clean integration of the database with a computationally complete programming language. The way in which functional languages treat functions as first-class citizens makes it possible to return functions themselves as query results in certain circumstances. This integration would not be so easy in a logic-oriented language, such as Prolog, due to the syntactic distinction made between predicates and atoms.

The only expense of the data model, from a user perspective, is the distinction which must be made between primary and secondary functions. However, we maintain that the benefits of this separation (making the data model explicit, allowing associational queries, permitting different update and evaluation behaviours) far outweigh the inconvenience it imposes.

Further work. Currently the Hydra system is still under development and there is more than one avenue of research which needs to be investigated. The main ones are:

• To investigate the properties of the type system to formally demonstrate its soundness and possibly to modify it to provide support for class hierarchies.
• To investigate changing the data model.
  There are several kinds of change which need to be investigated. These include altering it to incorporate temporal or certainty information. The simplicity of the data model makes it a good candidate for experimentation with incorporating further semantics. The other avenue is to investigate whether to incorporate complex objects but limit the search space for associational queries so that any internal structure of such objects would be effectively ignored when looking for associations. This would widen the applicability of the data model to domains where there is no advantage in decomposing information down to its most basic values and connections, for example in holding co-ordinates in geographic or scientific data.
• To enhance the interface to ensure that the full query power of Hydra is not obscured. An obvious first step is to incorporate facilities to provide the standard query facilities of relational databases and then extend these, possibly with further Hydra-specific facilities. The interface also needs to be enhanced to permit schema and data definition and update.

References

Abreu, R. (1995). A Visual Query Interface to the Associational Functional Database Language Hydra. Master's thesis, Birkbeck College, University of London.

Ayres, R. and Abreu, R. (1997). VisualQ: A Graphical Interface to Facilitate Database Exploration. Submitted for publication.

Ayres, R. and King, P.J.H. (1995). Entities, Functions, and Surrogates in Functional Database Languages. In B. Werner (Ed.), Proceedings of Basque International Conference on Information Technology, BIWIT '95. San Sebastian, Spain: IEEE Computer Society Press.

Ayres, R. and King, P.J.H. (1996). Querying Graph Databases Using a Functional Language Extended with Second Order Facilities. In R. Morrison and J. Kennedy (Eds.), Advances in Databases, 14th British National Conference on Databases, BNCOD14. Edinburgh, UK: Springer-Verlag.

Buneman, P. and Frankel, R.E. (1979). FQL: A Functional Query Language. In P.A. Bernstein (Ed.), Proceedings ACM SIGMOD '79 Conference (pp. 52-58). ACM.

Codd, E.F. (1979). Extending the Database Relational Model to Capture More Meaning. ACM Transactions on Database Systems, 4(4), 397-434.

Consens, M.P. and Mendelzon, A.O. (1990). GraphLog: A visual formalism for real life recursion. Proceedings of the Ninth ACM SIGACT-SIGMOD-SIGART Symposium on Principles of Database Systems (pp. 404-416).

Consens, M.P. and Mendelzon, A.O. (1993). Hy+: A hygraph-based query and visualisation system. Proceedings of the 1993 ACM SIGMOD International Conference on the Management of Data (pp. 511-516). ACM Press.

Güting, R.H. (1994). GraphDB: Modelling and querying graphs in databases. Proceedings of the 20th International Conference on Very Large Data Bases. Santiago, Chile.

Jones, P.S.L. (1987). The Implementation of Functional Programming Languages. Prentice-Hall International.

Kent, W. (1979). Limitations of Record-Based Information Models. ACM Transactions on Database Systems, 4(1), 107-131.

Kulkarni, K.G. and Atkinson, M.P. (1986). EFDM: Extended Functional Data Model. The Computer Journal, 29, 38-46.

Milner, R. (1978). A Theory of Type Polymorphism in Programming. Journal of Computer and System Science, 17(3), 348-375.

Poulovassilis, A. and King, P.J.H. (1990). Extending the functional data model to computational completeness. Proceedings of EDBT '90 (vol. LNCS 416, pp. 75-91). Venice, Italy: Springer-Verlag.

Shipman, D. (1981). The Functional Model and the Data Language DAPLEX. ACM Transactions on Database Systems, 6(1), 140-173.

Turner, D.A. (1985). Miranda: A Non-Strict Functional Language with Polymorphic Types. In J.P. Jouannaud (Ed.), Functional Programming Languages and Computer Architectures. Springer-Verlag. Lecture Notes in Computer Science No. 201.

Wasserman, S. and Faust, K. (1994). Social Network Analysis: Methods and Applications. Cambridge University Press.
