
CONCURRENCY AND COMPUTATION: PRACTICE AND EXPERIENCE
Concurrency Computat.: Pract. Exper. 2010; 22:1118–1137
Published online 22 December 2009 in Wiley InterScience (www.interscience.wiley.com). DOI: 10.1002/cpe.1552

Adaptable cache service and application to grid caching

Laurent d’Orazio1,∗,†, Claudia Roncancio2 and Cyril Labbé2

1 LIMOS CNRS UMR 6158 ISIMA, Complexe Scientifique des Cézeaux, 63173 Aubière Cedex, France
2 LIG CNRS UMR 5217 Bâtiment IMAG C-220, rue de la Chimie, 38400 Saint Martin d’Hères, France

SUMMARY

Caching is an important element to tackle performance issues in largely distributed data management. However, caches are efficient only if they are well configured according to the context of use. As a consequence, they are usually built from scratch. Such an approach is expensive and time consuming in grids, where varied characteristics lead to many heterogeneous cache requirements. This paper proposes a framework facilitating the construction of sophisticated and dynamically adaptable caches for heterogeneous applications. The framework has enabled the evaluation of several configurations for distributed data querying systems and has led us to propose innovative approaches for semantic and cooperative caching. This paper also reports results obtained in bioinformatics data management on grids, showing the relevance of our proposals. Copyright © 2009 John Wiley & Sons, Ltd.

Received 27 February 2009; Revised 2 July 2009; Accepted 19 October 2009

KEY WORDS: cache; semantic; cooperation; data querying; adaptability; grid

1. INTRODUCTION

Grids are widely used to supply computing and storage resources required by many scientific domains, such as particle physics, meteorology or bioinformatics. In such contexts, data querying involves many sites and large amounts of data. The submitted queries are sent to the relevant, often distant, data sources to be evaluated, and the corresponding result sets are then supplied. The load on data sources and networks increases with the number of queries. This may lead to long response times when the clients are numerous and the involved data are voluminous.

∗Correspondence to: Laurent d’Orazio, LIMOS CNRS UMR 6158 ISIMA, Complexe Scientifique des Cézeaux, 63173 Aubière Cedex, France.

†E-mail: [email protected]

Contract/grant sponsor: French Ministry for Research (ACI Masse de données)

Copyright © 2009 John Wiley & Sons, Ltd.


Caching is usually employed to tackle such problems in many contexts, particularly on the Internet [1–3] or in distributed database management systems [4–6]. On the one hand, caching reduces the response time perceived by users, duplicates frequently referenced data and reduces the load on both data sources and the network. On the other hand, caching increases availability, since it allows the cache content to be accessed even when data sources or networks are unavailable. This may occur because of failures, but also because grids are variable environments where data sources may appear or disappear due, for example, to disconnection or maintenance.

A cache is efficient only if it is well configured according to its context of use. Caches are therefore usually built from scratch and require careful tuning and dynamic adaptation. Such work becomes quite time consuming and difficult in environments, such as grids, that have heterogeneous and dynamic characteristics.

This paper proposes ACS, an adaptable cache service. ACS makes it possible to build finely configured cache services, including sophisticated approaches such as semantic analysis, in-cache query evaluation capabilities and load balancing over distributed cooperative caches. ACS has been validated by building various types of caches. Experiments have been performed in a real context, with bioinformatics data (the Swiss-Prot data source [7]) on a French grid platform (Grid'5000 [8]). The results obtained show that caches built with ACS greatly improve performance, especially the response times perceived by bioinformaticians. They also confirm the interest of this kind of 'framework' to facilitate the construction of ad hoc caching solutions.

This paper is organized as follows. Section 2 presents ACS, a framework to build cache services. Sections 3 and 4, respectively, focus on using ACS to build semantic and cooperative caches. Section 5 presents the prototype and our experiments in a middleware for data management on grids. Related work is described in Section 6. Finally, Section 7 concludes this paper and gives research perspectives.

2. ADAPTABLE CACHE SERVICE

The ACS framework enables building caches for various contexts. It supports both static and dynamic adaptation. It enables creating a cache service specific to a context, as well as making it evolve to adopt new strategies. ACS is component based. This allows dynamic adaptation of caches to ensure a high level of performance, even in variable environments. Such adaptations may be simple parameterizations, or compositions which modify the cache's architecture by adding, removing or replacing components (for example, changing the replacement policy).

This section presents the main functionalities of a cache service: content management, replacement management and resolution management. Optional functionalities are beyond the scope of this paper and will not be detailed.

The ContentManager provides mechanisms to manipulate elements in the cache. Whenever an element is to be accessed, the cache is asked to search for it. Since searching is a frequent operation, it has to be particularly efficient. The cache content is usually represented by a data structure (vector, tree, hash table, etc.) named content, chosen according to the application context. The implementation of the content management methods is directly linked to this choice. It also has to be noted that


according to the ContentManager chosen, it is possible to supply memory or disk caches. The replacement policy, captured in the ReplacementManager, is in charge of determining the set of entries that have to be removed from the cache, in particular when the cache is full. The ResolutionManager captures the process executed to retrieve an element when a cache miss occurs. In addition to handling communication with servers, it may also be in charge of cooperative caching. Cooperative caching is detailed in Section 4.

The CacheManager provides an interface to make a cache usable by clients (users, applications, other caches). This component is used as the glue for the internal composition of all components of the cache, as well as for the external composition with other caches. External composition is based on the load and lookup methods supplied by the CacheManager. load is used by applications or lower-level caches requiring the object to be supplied, initiating a resolution in case of a miss. lookup, which is mandatory in some cooperative caching approaches (see Section 4), supplies a particular access to the cache content. It enables requesting an element from the cache without triggering the resolution protocol if a miss occurs. A miss then results in a null answer.
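As an illustration, the load/lookup contract described above can be sketched in Java. The component names (CacheManager, ContentManager, ResolutionManager) come from ACS, but the signatures and the in-memory implementations below are our own simplifications, not the actual ACS API:

```java
// Hypothetical sketch of the ACS component contracts: the interface names
// echo the paper's components, the signatures are assumed for illustration.
import java.util.HashMap;
import java.util.Map;

interface ContentManager<K, V> {
    V get(K key);
    void put(K key, V value);
}

interface ResolutionManager<K, V> {
    V resolve(K key); // contact a server (or sibling caches) on a miss
}

class CacheManager<K, V> {
    private final ContentManager<K, V> content;
    private final ResolutionManager<K, V> resolution;

    CacheManager(ContentManager<K, V> content, ResolutionManager<K, V> resolution) {
        this.content = content;
        this.resolution = resolution;
    }

    /** load: answer from the cache, resolving the miss if needed. */
    V load(K key) {
        V v = content.get(key);
        if (v == null) {
            v = resolution.resolve(key);
            if (v != null) content.put(key, v);
        }
        return v;
    }

    /** lookup: answer only from the local content; null on a miss. */
    V lookup(K key) {
        return content.get(key);
    }
}

public class AcsSketch {
    static CacheManager<String, String> demoCache() {
        Map<String, String> store = new HashMap<>();
        ContentManager<String, String> cm = new ContentManager<String, String>() {
            public String get(String k) { return store.get(k); }
            public void put(String k, String v) { store.put(k, v); }
        };
        ResolutionManager<String, String> rm = k -> "server:" + k; // fake server
        return new CacheManager<>(cm, rm);
    }

    public static void main(String[] args) {
        CacheManager<String, String> c = demoCache();
        System.out.println(c.lookup("q1")); // null: lookup never resolves
        System.out.println(c.load("q1"));   // server:q1, now cached
        System.out.println(c.lookup("q1")); // server:q1
    }
}
```

The distinction matters for cooperation: a sibling probed with lookup can answer null cheaply instead of cascading resolutions.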

3. ACS AND SEMANTIC CACHING

Semantic caching [4–6] exploits both the resources in the cache and the knowledge contained in the queries themselves. As a consequence, it enables effective reasoning, delegating part of the computation process to the cache and reducing both data transfer and the load on the servers. When a query is posed to a cache, it is split into two disjoint pieces: (1) a probe query, which retrieves the portion of the result available in the local cache, and (2) a remainder query, which retrieves any missing data in the answer from the relevant server. If a remainder query exists, it is sent to the server for processing.

ACS is particularly interesting for building semantic caches. In fact, it enables deploying fine and complex caches for many contexts. Such fine configuration is facilitated by the separation of concerns principle. In particular, ACS distinguishes content management and semantic capabilities, which are themselves decomposed into query analysis and evaluation.
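The probe/remainder decomposition can be illustrated on a deliberately simplified query model, where a query is represented by the set of keys it matches (an assumption made only for this sketch; real semantic caches reason over predicates):

```java
// Simplified illustration of the probe/remainder split, assuming a toy
// query model where a query is just the set of keys it matches.
import java.util.Set;
import java.util.TreeSet;

public class ProbeRemainder {
    /** Part of the query answerable from the cached entry (probe). */
    static Set<Integer> probe(Set<Integer> query, Set<Integer> cached) {
        Set<Integer> p = new TreeSet<>(query);
        p.retainAll(cached);
        return p;
    }

    /** Part that must still be fetched from the server (remainder). */
    static Set<Integer> remainder(Set<Integer> query, Set<Integer> cached) {
        Set<Integer> r = new TreeSet<>(query);
        r.removeAll(cached);
        return r;
    }

    public static void main(String[] args) {
        Set<Integer> cached = Set.of(2006);            // e.g. year=2006 in cache
        Set<Integer> query = Set.of(2005, 2006, 2007); // e.g. year>2004 ∧ year<2008
        System.out.println(probe(query, cached));      // [2006]
        System.out.println(remainder(query, cached));  // [2005, 2007]
    }
}
```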

3.1. Content management

In a semantic cache, different granularities are used for data management. For example, in multidimensional databases the cache content is composed of chunks [9], coming from the decomposition of the data according to the different dimensions. In this paper, we focus on more general granularities, with query result caching and predicate-based caching.

3.1.1. Query result caching

With query result caching [4], an entry is identified by a query and its value is the set of objects satisfying such a query. Figure 1(a) shows such a content. An entry is, for example, (year=2006, {obj1,obj2}). The query year=2006 is the entry's identifier, whereas {obj1,obj2} is its value.


Figure 1. Content management in semantic caching: (a) semantic query result cache; (b) semantic predicate-based cache; and (c) semantic dual cache.

Query result caching enables efficient access to elements in the cache, since an entry aggregates objects according to the submitted queries. Replacement is quite simple, since the eviction of an entry does not impact other elements present in the cache. However, when an object is referenced by several queries, it is replicated in each entry: the number of copies of an element equals the number of queries by which it is referenced. This leads to non-optimal storage management. Figure 1(a) illustrates the duplication of objects obj1 and obj4. In fact, obj1 is associated with queries year=2006 and species=virus, and obj4 is associated with queries species=virus and author=Blanchet. To avoid replication, one may choose to ensure that an object is always associated with a single query. In that case, however, cache management is more complex and constrained, making it impossible to keep two popular queries in the cache if they are not disjoint.

ACS proposes generic components to easily build query result caches. In particular, content management and resolution management are similar to those of a basic cache. However, it is possible to use semantic replacement policies, such as the Manhattan strategy [4], which considers the percentage of utilization of an entry.

3.1.2. Predicate-based caching

In predicate-based caching [5], query results are decomposed into two levels: an index associates a query with a list of identifiers, each identifier giving access to the corresponding object. Figure 1(b) shows such a content. obj1 and obj2, in the answer of year=2006, have identifiers id1 and id2, respectively.


With predicate-based caching, storage management is close to optimal, since objects are shared by several queries without replication. When a query is deleted from the cache, the replacement policy removes only the objects that are not referenced by other queries. Such strong consistency may be problematic. For example, a query may be removed from the cache without freeing storage resources, if all the corresponding objects are referenced by other queries or if the answer is empty (as illustrated, respectively, by queries species=virus and year=2010). It may also be noted that indirect access to objects might be inefficient, in particular for large numbers of elements. View cache [10] is a variant of predicate-based cache proposed in centralized systems. In view caching, objects are not stored: a view cache only keeps evaluations, associating a query with the identifiers of the objects. The objects are then accessed directly on the data source.

ACS provides specific components to support predicate-based caching. These components concern content and replacement management. PredicateBasedContentManagers enable accessing objects via queries, whereas storage management is done at object granularity. Similarly, with PredicateBasedReplacementManagers, choosing a victim is done according to queries, whereas freeing storage space is done in terms of objects. Other functionalities, such as resolution, can be implemented with generic components.
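The two-level organization and its reference-counted eviction can be sketched as follows. The class and field names are illustrative, not the actual ACS components:

```java
// Sketch of the two-level predicate-based content described above
// (index: query -> object ids; store: id -> object, shared between queries).
// Names and structures are illustrative, not the actual ACS classes.
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class PredicateBasedCache {
    final Map<String, List<String>> index = new HashMap<>(); // query -> ids
    final Map<String, String> store = new HashMap<>();       // id -> object

    void put(String query, Map<String, String> result) {
        index.put(query, new ArrayList<>(result.keySet()));
        store.putAll(result); // shared objects are stored only once
    }

    /** Evict a query; free only objects no other query references. */
    void evict(String query) {
        List<String> ids = index.remove(query);
        if (ids == null) return;
        for (String id : ids) {
            boolean shared = index.values().stream().anyMatch(l -> l.contains(id));
            if (!shared) store.remove(id);
        }
    }

    public static void main(String[] args) {
        PredicateBasedCache c = new PredicateBasedCache();
        c.put("year=2006", Map.of("id1", "obj1", "id2", "obj2"));
        c.put("species=virus", Map.of("id1", "obj1", "id4", "obj4"));
        c.evict("year=2006");
        System.out.println(c.store.keySet()); // id1 kept (still shared), id2 freed
    }
}
```

Note how evicting year=2006 frees id2 but not id1, mirroring the sharing behavior (and its pitfalls) discussed above.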

3.1.3. Dual caching

ACS allowed us to build and compare several cache solutions. We also proposed a new approach, called dual cache [11], for managing query results. Dual cache is based on the flexible cooperation between a query cache and an object cache (see Figure 1(c)). On the one hand, the query cache associates queries with the identifiers of the corresponding objects. On the other hand, the object cache enables retrieving objects using their identifiers. When a query is submitted to a dual cache, it is first forwarded to the query cache. It may result in a hit or a miss. There is a query hit if entries of the query cache can be used to answer the query. In that case, a list of identifiers is retrieved. If this list is not empty, it is used to load the list of the corresponding objects via the object cache. Figure 1(c) shows (year=2006, {id1,id2}) in the query cache and (id1,obj1) and (id2,obj2) in the object cache. It also shows obj3 in the object cache, although it is not referenced by the entries of the query cache. Conversely, the object obj6 is not present in the object cache, but is referenced in the query cache.

Dual cache optimizes storage resources, avoiding replication of objects shared by several queries and enabling more cache entries to be stored. It allows storing query results in the query cache without keeping the corresponding objects in the object cache. As a consequence, the load on the servers may be lower and access may be more efficient, in particular if data sources provide efficient index mechanisms. In addition, transfers at object granularity help reduce the amount of data transferred, avoiding retrieving already stored objects. Flexible configuration of query caches and object caches is particularly relevant to establish fine cooperation, allowing different cooperation schemes for query caches and object caches.
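The dual-cache lookup path can be sketched as two composed maps with independent miss handling. The method names and the fake server calls below are assumptions for illustration:

```java
// Illustrative dual-cache lookup path: a query cache (query -> ids) in front
// of an object cache (id -> object). Misses at either level fall back to a
// fake server here; method names are assumptions, not the ACS API.
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class DualCache {
    final Map<String, List<String>> queryCache = new HashMap<>();
    final Map<String, String> objectCache = new HashMap<>();

    List<String> load(String query) {
        List<String> ids = queryCache.get(query);
        if (ids == null) {                       // query miss: ask the server
            ids = serverEvaluate(query);
            queryCache.put(query, ids);
        }
        List<String> objects = new ArrayList<>();
        for (String id : ids) {                  // fetch only the missing objects
            String obj = objectCache.computeIfAbsent(id, DualCache::serverFetch);
            objects.add(obj);
        }
        return objects;
    }

    // Stand-ins for the data source.
    static List<String> serverEvaluate(String query) { return List.of("id1", "id2"); }
    static String serverFetch(String id) { return "obj-" + id; }

    public static void main(String[] args) {
        DualCache c = new DualCache();
        c.objectCache.put("id1", "obj1");        // id1 is already cached
        System.out.println(c.load("year=2006")); // [obj1, obj-id2]
    }
}
```

Only id2 is fetched from the server: the object-granularity transfer avoids re-retrieving obj1, which is the saving the paragraph above describes.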

3.2. Query management

The QueryManager of the semantic cache isolates two distinct but cooperating components to offer analysis and evaluation capabilities. By analysis capabilities, we mean the semantic


process of comparing a submitted query with the cache content to deduce semantic overlap or semantic mismatch. By evaluation capabilities, we mean operators to locally evaluate queries on objects in the cache.

3.2.1. Query analysis

Queries are analyzed to identify the cache entries useful in answering a submitted query q. When q is submitted, four types of hit may arise.

Exact hit: q is already pre-calculated in the cache. This is the best situation. It is handled by the CacheManager. In the other cases, more complex analyses are performed by the QueryAnalyzer component.

Extended hit: There is no exact hit, but the answer can be obtained from the content of the cache. Two situations may arise. In the first case, the identifier of an entry e1 of the cache is equivalent to q; in that case all objects referenced by e1 answer q. In the second case, the identifier of an entry e2 of the cache subsumes q; thus, the result of q can be obtained from the answer to e2. However, some kind of filtering process is required, for example, a selection or a projection. It can be achieved locally by the QueryManager.

Example 1. Consider the cache in Figure 1(c) and let q be year>2005 ∧ year<2007. There is an extended hit using the entry whose identifier is year=2006. In that case, no filtering process is necessary.

Example 2. If q is year=2006 ∧ species=bacteria, there is an extended hit using the entry year=2006. In that case, the answer may be calculated by evaluating the submitted query on the objects identified by id1 and id2.

Partial hit: Two cases arise: either (1) q subsumes an entry's identifier e, or (2) q overlaps e and q ⊄ e. The answer of e is a part of the global answer of q. In this situation q is split into a probe query, which is the part known by the query cache, and a remainder query corresponding to the missing part [4]. Objects in the answer to the probe query are retrieved as in the case of an exact hit. Remainder queries or miss resolution events can be solved by data servers or cooperative sites.

Example 3. If q is year>2005, there is a partial hit with the entry year=2006, which contributes to the answer of q. The other part of the answer is retrieved by the remainder query year>2005 ∧ ¬(year=2006).

Example 4. Suppose a submitted query year=2006 ∧ author=Blanchet and a query cache containing an entry for year=2006 ∧ species=virus whose value is V. In that case, one part of the answer is obtained via a filtering on V, whereas the other part is retrieved by submitting the remainder query year=2006 ∧ author=Blanchet ∧ ¬(species=virus) to the data sources.

Query miss: q is totally disconnected from all entries of the query cache.

The QueryAnalyzer component chooses the relevant queries. The CacheManager takes charge of the decomposition into probe and remainder queries. As query matching can be a very


complex process, it is crucial to be able to judge when it is worth doing. The complexity of this process is strongly related to the evaluation capabilities of the QueryManager.
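The four hit types above can be sketched for the special case of closed integer ranges standing in for predicates (a simplifying assumption; the paper's analysis covers general conjunctive predicates):

```java
// Toy classification of a submitted range query against one cached entry,
// using closed integer ranges as stand-ins for predicates. The four hit
// kinds follow the paper's terminology; the range model is an assumption.
public class HitClassifier {
    enum Hit { EXACT, EXTENDED, PARTIAL, MISS }

    /** Classify query [qLo,qHi] against cached entry [eLo,eHi]. */
    static Hit classify(int qLo, int qHi, int eLo, int eHi) {
        if (qLo == eLo && qHi == eHi) return Hit.EXACT;
        if (qLo >= eLo && qHi <= eHi) return Hit.EXTENDED; // entry subsumes q
        if (qHi < eLo || qLo > eHi) return Hit.MISS;       // disjoint
        return Hit.PARTIAL;                                // overlap only
    }

    public static void main(String[] args) {
        // Cached entry: year=2006, i.e. the range [2006, 2006].
        System.out.println(classify(2006, 2006, 2006, 2006)); // EXACT
        System.out.println(classify(2006, 2006, 2000, 2010)); // EXTENDED
        System.out.println(classify(2006, 2010, 2006, 2006)); // PARTIAL
        System.out.println(classify(1990, 1995, 2006, 2006)); // MISS
    }
}
```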

3.2.2. Query evaluation

The evaluation capabilities are defined by the operators (selection, projection, ordering, grouping, etc.) that can be evaluated by the cache on the objects it contains. As a matter of fact, when the objects relevant to a submitted query q are present, the cache uses its own evaluation capabilities to process the result. This is performed by the QueryEvaluator component, which is used when query inclusion or partial covering is detected. As the cache can be assumed to work with a small amount of data compared with servers, it could be argued that a rich set of query operators should be provided by the cache. However, this would be interesting only if the querying capabilities are handled efficiently. Two questions arise: (1) When is it worth using such query capabilities? (2) Which of the cache side and the server side is better suited to evaluate certain operators? For example, if it is known that a special sorting algorithm fits an identified access pattern better than the general one used by a server, then it could be added to the cache querying capabilities. Moreover, reducing the number of operators in the cache also benefits query analysis.

The features built in a QueryManager provide high flexibility for deploying a cache architecture. Supplying and configuring this component is the result of a context-dependent tradeoff. It is important to take into account the complexity of typical queries, the resources allocated to the semantic cache, and server and network loads. It is also important to know the main purpose of the cache: if its final goal is to save the servers' resources, then an enhanced QueryManager is required. All this knowledge has to be considered to choose the appropriate level of functionality for the QueryManager.

4. ACS AND COOPERATIVE CACHING

Generally, resolving a cache miss consists in retrieving data via servers. However, servers might become a bottleneck, due to computation and/or data transfer. That is why using other caches (called siblings) to resolve a cache miss, contacting servers only as a last resort, can help reduce the load on the servers and balance the amount of data transferred. Such a technique, referred to as cooperative caching, is well known in file systems and on the Internet.

4.1. Vertical resolution

Vertical resolution, used in file systems [12] or on the Internet [1,13,14], consists in resolving a cache miss via a Parent Cache (PC) (Figure 2(a)). If the PC does not contain the required object, it retrieves it from the servers and adds it to its own storage space. Since a PC is accessed by many 'lower-level caches', this solution reduces the load on the servers. For example, the result of a request from cache c1 can be used to answer the same request from c2.

Vertical resolution is quite easy to deploy and administrate: lower-level caches only have to consider a PC in the resolution protocol. Vertical resolution is particularly employed on the


Figure 2. Cache miss resolution: (a) vertical and (b) horizontal.

Internet, where caches are deployed on proxies [15]. It has to be noted that vertical resolution is efficient only for users sharing some interests [16]. PCs may have to support a high load; they also constitute sensitive points in the global system.

ACS captures vertical resolution in the ParentResolutionManager. Vertical resolution is triggered using the load method of the ParentResolutionManager. Since a PC must supply an answer, it is important to note that a PC access is done with the load method of the CacheManager. A PC may use simple or cooperative resolution. As a consequence, it is possible to have hierarchies of caches, when a PC also uses a ParentResolutionManager. The top-level cache uses a resolution protocol contacting the servers.
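A minimal sketch of this hierarchy, under the assumption that miss resolution can be modeled as a function from key to value (either the parent's load or the server):

```java
// Sketch of vertical (parent) resolution: a child's miss is resolved through
// the parent cache's load, and only the top level contacts the server.
// The class names echo the paper's components; the code is illustrative.
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

class SimpleCache {
    private final Map<String, String> content = new HashMap<>();
    private final Function<String, String> resolver; // parent::load or server

    SimpleCache(Function<String, String> resolver) { this.resolver = resolver; }

    String load(String key) {
        return content.computeIfAbsent(key, resolver); // resolve on miss
    }
}

public class VerticalResolution {
    static int demo() {
        int[] serverHits = {0};
        Function<String, String> server = k -> { serverHits[0]++; return "data:" + k; };
        SimpleCache parent = new SimpleCache(server);   // top level: asks the server
        SimpleCache c1 = new SimpleCache(parent::load); // children: ask the parent
        SimpleCache c2 = new SimpleCache(parent::load);

        c1.load("q");
        c2.load("q"); // served by the parent: the server is contacted only once
        return serverHits[0];
    }

    public static void main(String[] args) {
        System.out.println(demo()); // 1
    }
}
```

The second client's request never reaches the server, which is exactly the load reduction attributed to the PC above.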

4.2. Horizontal resolution

Horizontal resolution is based on the concept of sibling caches (see Figure 2(b)). When a request results in a miss, it is forwarded to the siblings. They supply the answer if it is present in their storage, and null otherwise. If no sibling can supply the answer, the request is sent to the servers.

Horizontal caching is particularly relevant for scalability, since resources grow with the number of caches in the system. It enables load balancing and increases availability in case of server failure. In addition, the unavailability of a cache is not critical, data being accessible via other caches or servers. However, locating data may be difficult, in particular with a high number of caches. Two main techniques are proposed in the literature: flooding and catalogue-based resolution.

4.2.1. Flooding resolution

With flooding resolution [17–20], a miss results in forwarding the request to all siblings. The first positive answer is considered or, in case all answers are null, the request is forwarded to the servers. The answer is then added to the cache on which the query was first submitted. Flooding enables good load balancing: considering the first positive answer avoids retrieving answers from heavily loaded siblings. Unfortunately, flooding is problematic since the number of requests increases quadratically with the number of siblings. As a consequence, caches and communication links may be heavily loaded. A solution is to consider different groups of caches according to their interests [21].


ACS captures flooding resolution in the FloodingResolutionManager, which is used jointly with a ResolutionManager. The ResolutionManager is used when requests miss on all siblings. The FloodingResolutionManager forwards the request to all siblings, using the lookup method of their CacheManager‡. If these requests fail, the servers are contacted, using the load method of the ResolutionManager.
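The flooding protocol can be sketched as follows; note that siblings are probed with lookup, which never triggers a further resolution (the names and the sequential probing are simplifications of the actual broadcast):

```java
// Illustrative flooding resolution: on a miss, probe every sibling with
// lookup (which never cascades); fall back to the server only if all
// siblings answer null. Names and structure are illustrative.
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

public class FloodingResolution {
    final Map<String, String> content = new HashMap<>();
    final List<FloodingResolution> siblings = new ArrayList<>();
    final Function<String, String> server;

    FloodingResolution(Function<String, String> server) { this.server = server; }

    String lookup(String key) { return content.get(key); } // never resolves

    String load(String key) {
        String v = content.get(key);
        if (v == null) {
            for (FloodingResolution s : siblings) {   // flood the siblings
                v = s.lookup(key);
                if (v != null) break;                 // first positive answer
            }
            if (v == null) v = server.apply(key);     // last resort: server
            content.put(key, v);
        }
        return v;
    }

    static int demo() {
        int[] serverHits = {0};
        Function<String, String> srv = k -> { serverHits[0]++; return "data:" + k; };
        FloodingResolution a = new FloodingResolution(srv);
        FloodingResolution b = new FloodingResolution(srv);
        a.siblings.add(b);
        b.siblings.add(a);
        a.load("q"); // misses everywhere: one server contact
        b.load("q"); // served by sibling a: no new server contact
        return serverHits[0];
    }

    public static void main(String[] args) {
        System.out.println(demo()); // 1
    }
}
```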

4.2.2. Catalogue-based resolution

With catalogue-based resolution, a catalogue is consulted to determine which siblings can answer a request. This reduces the number of requests sent to siblings and, as a consequence, bandwidth consumption. However, to be worthwhile, the catalogue has to reflect the effective content of the caches. Strong or weak consistency can be applied, according to the considered context. A catalogue may be shared [22–29], reducing the cost of administration and synchronization, or local [28,30,31], increasing reliability and load balancing.

ACS captures catalogue-based resolution in the CatalogueResolutionManager, used with a ResolutionManager. The CatalogueResolutionManager consults the associated Catalogue using the lookup method. If the request results in a hit on the Catalogue, the request is forwarded to the corresponding siblings, using the lookup method of their CacheManager, in order to prevent them from initiating a resolution if a cache miss occurs. If these requests fail, the servers are contacted, using the load method of the ResolutionManager. These interactions are valid for both local and shared catalogues.
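A shared catalogue can be sketched as a map from keys to the sibling believed to hold them, so that only one sibling is probed instead of flooding all of them. The strongly consistent, insertion-time advertisement below is a simplifying assumption:

```java
// Sketch of a shared catalogue mapping keys to the sibling believed to hold
// them; only that sibling is probed, instead of flooding all siblings.
// The catalogue is kept strongly consistent on insertion for simplicity.
import java.util.HashMap;
import java.util.Map;

public class CatalogueResolution {
    static final Map<String, CatalogueResolution> catalogue = new HashMap<>();
    final Map<String, String> content = new HashMap<>();
    final String name;

    CatalogueResolution(String name) { this.name = name; }

    void put(String key, String value) {
        content.put(key, value);
        catalogue.put(key, this);        // advertise the new entry
    }

    String lookup(String key) { return content.get(key); }

    /** Resolve via the catalogue; null means "fall back to the server". */
    static String resolveViaSibling(String key) {
        CatalogueResolution holder = catalogue.get(key);
        return holder == null ? null : holder.lookup(key);
    }

    public static void main(String[] args) {
        CatalogueResolution c1 = new CatalogueResolution("c1");
        c1.put("year=2006", "{obj1,obj2}");
        System.out.println(resolveViaSibling("year=2006")); // {obj1,obj2}
        System.out.println(resolveViaSibling("year=2010")); // null -> server
    }
}
```

With weak consistency, the catalogue may lag behind the caches, which is why a catalogue hit is still confirmed with lookup on the designated sibling.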

4.3. Non-resolution-based cooperative caching

Other cooperative caching approaches have been proposed in the literature. For example, some of them aim at sharing storage resources among all caches in order to supply a big virtual cache [24,32–36]. In that case, cooperation aims at distributing queries over the caches, for example according to semantics. Then, if a cache miss occurs, data are retrieved from the servers. Studying such distributed caching is beyond the scope of this paper; in the following, we therefore only consider resolution-based cooperative caching.

5. PROTOTYPE AND VALIDATIONS

The prototype of ACS (introduced in Section 5.1) has been used to build various types of caches. In particular, we have proposed sophisticated approaches for bioinformatics data management on the grid. Section 5.2 introduces this experimental context. Section 5.3 focuses on the validation of ACS in terms of development. Section 5.4 presents performance analysis tools used to evaluate caches built with ACS. Finally, Section 5.5 illustrates the results obtained in our experiments.

‡No resolution is initiated if a cache miss occurs.


Table I. Description of the Grid5000 nodes used in the experiments.

Site       Machine                Processor                                 Memory (GB)   Disk
Lille      IBM eServer 326        AMD Opteron 248, 2.2 GHz                  4             SATA
Nancy      HP ProLiant DL145 G2   2x AMD Opteron 246, 2.0 GHz               2             SATA
Rennes     Sun Fire V20z          2x AMD Opteron 248, 2.2 GHz               2             SCSI
Sophia     Sun Fire X4100         2x dual-core AMD Opteron 275, 2.2 GHz     4             SAS
Toulouse   Sun Fire V20z          AMD Opteron 248, 2.2 GHz                  2             SCSI

5.1. Prototype

A prototype of ACS§ has been developed in Julia, a Java implementation of the Fractal component model [37]. The Fractal component model supplies introspection, reflectivity and dynamic reconfiguration. Java has been chosen since the Java virtual machine enables execution on heterogeneous platforms. In addition, it supplies many useful predefined classes (JDK 1.5 compatible).

The library of ACS supplies several components: for example, the LRU, MRU, FIFO and LIFO replacement policies, implemented by the LRUReplacementManager, MRUReplacementManager, FIFOReplacementManager and LIFOReplacementManager components. These components are generic and can be used for any type of data.
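In Java, an LRU replacement policy in the spirit of the generic LRUReplacementManager can be obtained in a few lines from java.util.LinkedHashMap in access-order mode (this is a standard-library idiom, not the ACS implementation):

```java
// A minimal LRU replacement manager: java.util.LinkedHashMap in access
// order evicts the least recently used entry once capacity is exceeded.
import java.util.LinkedHashMap;
import java.util.Map;

public class LruCache<K, V> extends LinkedHashMap<K, V> {
    private final int capacity;

    public LruCache(int capacity) {
        super(16, 0.75f, true); // true = access order, not insertion order
        this.capacity = capacity;
    }

    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        return size() > capacity; // victim selection: least recently used
    }

    public static void main(String[] args) {
        LruCache<String, String> c = new LruCache<>(2);
        c.put("a", "1");
        c.put("b", "2");
        c.get("a");        // touch "a": "b" becomes the LRU victim
        c.put("c", "3");   // evicts "b"
        System.out.println(c.keySet()); // [a, c]
    }
}
```

Swapping this policy for MRU or FIFO amounts to changing the victim-selection rule, which is precisely the kind of component substitution ACS supports.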

5.2. Experimental context

ACS has been used in a context of massive genomic data management. Genomic applications are computation and data-transfer intensive, which makes caches relevant. Experiments were performed using Swiss-Prot, a database of protein sequences widely used in bioinformatics. It consists of a large ASCII file (0.75 GB) composed of about 210 000 records, each uniquely identified. A record is composed of a protein sequence and its annotation. Records follow an attribute-value model, each line being composed of the name of an attribute and its value. Updates on Swiss-Prot are rare (usually once a week). As a consequence, the considered context is mainly read-only and is well suited to caching, particularly client caching.

In this context, caches are deployed in a specialized architecture, driven by Gedeon [38], a middleware for data management. Gedeon can be seen as a hybrid data management system, between a DBMS and a file system. It provides querying capabilities, enabling the evaluation of queries on files' content and associated metadata. Gedeon has been deployed on Grid5000, the very large French platform for grid experiments. Clusters on four sites have been used; see Table I for a description of the nodes of each site. For all clusters, nodes within a site are interconnected via a 1 Gb s−1 network, whereas sites are interconnected via a wide area network at 10 Gb s−1.

For our experiments, two kinds of architectures have been used. The first one, called simple

server, consists in a basic client–server architecture. Clients request data from a node which is in charge of the whole data source. With such an architecture, the data source is a bottleneck, since

§Available at the following address: http://ligforge.imag.fr/projects/acs/.

Copyright © 2009 John Wiley & Sons, Ltd. Concurrency Computat.: Pract. Exper. 2010; 22:1118–1137. DOI: 10.1002/cpe


a single node is in charge of the whole evaluation process. The union of servers architecture is a good solution to this problem. In such an approach, the data source is horizontally partitioned into n equally sized files, each managed by a specific node. When a query is submitted, it is forwarded to the n nodes for parallel evaluation (horizontal partitioning avoids interactions between the different servers). The results are then aggregated on the client side to build the final answer.
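The scatter–gather evaluation of the union of servers can be sketched as follows. Partition servers are modeled here as plain functions; all class and method names are ours, not Gedeon's API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Function;

// Sketch of the "union of servers" scheme: a query is forwarded to the n
// partition servers in parallel, and the partial results are aggregated on
// the client side (hypothetical names, not the Gedeon middleware API).
public class UnionOfServers<Q, R> {
    private final List<Function<Q, List<R>>> servers; // one stub per partition
    private final ExecutorService pool;

    public UnionOfServers(List<Function<Q, List<R>>> servers) {
        this.servers = servers;
        this.pool = Executors.newFixedThreadPool(servers.size());
    }

    public List<R> evaluate(Q query) {
        List<Future<List<R>>> partials = new ArrayList<>();
        for (Function<Q, List<R>> s : servers) {       // scatter
            partials.add(pool.submit(() -> s.apply(query)));
        }
        List<R> result = new ArrayList<>();
        try {
            for (Future<List<R>> f : partials) {       // gather
                result.addAll(f.get());
            }
        } catch (Exception e) {
            throw new RuntimeException("partition evaluation failed", e);
        }
        return result;
    }

    public void shutdown() { pool.shutdown(); }
}
```

Because the partitions are disjoint, the gather step is a plain concatenation; no duplicate elimination between servers is needed.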

5.3. ACS validation

This section aims at validating ACS by showing how easily caches can be built with it. All of the presented caches are memory caches; disk caches are the object of future work. We first present a 'basic query cache' without semantic analysis and evaluation, that is, a cache that only considers exact hits. We then present semantic and cooperative caches.

5.3.1. Basic query caching

This query cache uses well-known and efficient strategies: replacement is done using an LRU policy and the content is managed via a hash table. Since ACS supplies components based on these strategies, no development was required for these aspects. In fact, most of the development process focused on resolution and on the management of the servers' answers. Indeed, in the considered context, the servers supply data flows, so we proposed a specific ResolutionManager able to manipulate data flows. Since the size of an answer may be bigger than the size of the cache, we used an AdmissionManager based on size (such an AdmissionManager is supplied by the component library of ACS). In total, the components reused from the ACS library represent 88% of the code of the basic query cache proposed for this application.
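A size-based admission policy of this kind can be sketched as follows; this is a simplified illustration, and the class and method names are ours, not the actual ACS AdmissionManager interface.

```java
// Sketch of a size-based admission policy: an answer larger than the whole
// cache can never be admitted; a smaller one may still require evictions.
// Hypothetical names, not the ACS component interface.
public class SizeAdmissionManager {
    private final long capacityBytes;   // total size of the cache
    private long usedBytes;             // size currently occupied

    public SizeAdmissionManager(long capacityBytes) {
        this.capacityBytes = capacityBytes;
    }

    // Reject answers that could never fit, even in an empty cache.
    public boolean isAdmissible(long answerBytes) {
        return answerBytes <= capacityBytes;
    }

    // True when the answer fits without evicting existing entries.
    public boolean fitsWithoutEviction(long answerBytes) {
        return usedBytes + answerBytes <= capacityBytes;
    }

    public void admitted(long answerBytes) { usedBytes += answerBytes; }
    public void evicted(long answerBytes)  { usedBytes -= answerBytes; }
}
```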

5.3.2. Semantic caching

This section presents the construction of semantic caches using ACS. It first presents the analysis and evaluation components used by the different semantic caches. It then focuses on the construction of the different semantic caches with ACS: a query result cache, a predicate-based cache and a dual cache.

5.3.2.1. Semantic management. Discussions with bioinformaticians led us to first consider a simple semantic cache, handling data selection exclusively. As a consequence, the proposed QueryAnalyzer and QueryEvaluator only consider this operation; other operations will be considered in future work.

The QueryEvaluator applies selections to cache entries. For that we reused the evaluation component used in the Gedeon middleware. The development of the QueryEvaluator has thus not been costly, most of the work consisting in linking the middleware, coded in C, and the cache, developed in Java, through system calls.

Selection queries consist of conjunctions of terms. For our experiments, we considered only extended hits corresponding to equivalence and to query inclusion in an entry. Such overlaps are particularly relevant, since they enable the query result to be obtained using only one cache entry, minimizing the cost of evaluation. Obviously, equivalence is the most favorable case, since no


evaluation is required and the analysis may be interrupted as soon as the first equivalent entry is found. For an inclusion, we chose to analyze all the cache entries in order to evaluate the query on the smallest entry including the posed query, so as to minimize the evaluation process.

Only simple equivalences and inclusions are considered, to reduce the analysis cost. Two queries are equivalent only if they are composed of the same set of terms. The QueryAnalyzer considers inclusions based on additions of terms: a query q1 contains a query q2 if all the terms of q1 are present in q2. For example, q1 = Eukaryota ∈ OC contains the query q2 = Eukaryota ∈ OC ∧ Alveolata ∈ OC. Other inclusions are not considered by such a QueryAnalyzer.

The query search process implemented by the QueryAnalyzer transforms queries into bit vectors, so that a cache entry is identified by a bit vector; bit vector comparisons provide very good performance. The transformation process is based on a dynamic translation table between a selection term and a position in the bit vector. When a query is posed, it is decomposed according to the conjunctions, in order to obtain only primitive terms of the form attribute operation value. Each term is then looked up in the translation table; if it is absent from the table, it is associated with the first available position. In the case of an empty cache, the terms of the first queries will be associated with the first positions. When a query is removed from the cache, all its composing terms that are not referenced by other queries in the cache are deleted. It has to be noted that two consecutive positions are associated with each term: one position for the term and the other for its negation. Such an approach enables the absence of a term to be distinguished from its negation. Obviously, it is important to reserve a large enough number of positions; this number depends on the number of cacheable queries and on their complexity. This approach is inspired by semantic signatures [39], except for the dynamic association between positions and terms. With the considered QueryAnalyzer, bit vectors enable equivalent queries to be detected efficiently, as their bit vectors are identical.

Example 5. With an empty cache and a translation table based on a bit vector of size 10, the query Eukaryota∈OC ∧ Alveolata∈OC is transformed into the vector 1010000000, where position 0 corresponds to Eukaryota∈OC and position 2 to Alveolata∈OC (¬(Eukaryota∈OC) and ¬(Alveolata∈OC) being associated with positions 1 and 3, respectively). The bit vector associated with Alveolata∈OC ∧ Eukaryota∈OC is also 1010000000.

Bit vectors are also relevant to detect inclusions of queries due to a refinement via additional selections. In fact, the inclusion of a vector associated with a query q2 in a vector associated with a query q1 is detected if all the bits set to 1 in q1 are also set to 1 in q2. Consider, for instance, the preceding example and the query Eukaryota∈OC. Its vector is 1000000000. All bits set to 1 in this vector are also set to 1 in 1010000000, which is associated with the query Eukaryota∈OC ∧ Alveolata∈OC.

5.3.2.2. Content management. In order to compare the different configurations of semantic caches, we have implemented with ACS a query result cache, a predicate-based cache and a dual cache. The main objective was to compare the management of semantic regions, which is why the cache configurations for the other behaviors were kept equivalent: the replacement policy used is LRU and the data structure employed is a hash table.
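The bit-vector translation, equivalence test and inclusion test of Section 5.3.2.1 can be sketched as follows. This is a simplified illustration with hypothetical names: term strings stand for primitive selection terms, a leading '!' denoting negation.

```java
import java.util.BitSet;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Sketch of the bit-vector query analysis (hypothetical names, not the ACS
// QueryAnalyzer). Each term occupies two consecutive positions: one for the
// term and one for its negation, distinguishing absence from negation.
public class BitVectorAnalyzer {
    private final Map<String, Integer> translation = new HashMap<>();
    private int next = 0; // first available position

    // Transform a conjunction of primitive terms into a bit vector.
    public BitSet encode(List<String> terms) {
        BitSet v = new BitSet();
        for (String t : terms) {
            boolean negated = t.startsWith("!");
            String key = negated ? t.substring(1) : t;
            Integer pos = translation.get(key);
            if (pos == null) {               // dynamic translation table:
                pos = next;                  // reserve two positions, term at
                translation.put(key, pos);   // pos, its negation at pos + 1
                next += 2;
            }
            v.set(negated ? pos + 1 : pos);
        }
        return v;
    }

    // Equivalence: identical bit vectors (same set of terms).
    public static boolean equivalent(BitSet q1, BitSet q2) {
        return q1.equals(q2);
    }

    // Inclusion: q1 contains q2 if every bit set in q1 is also set in q2.
    public static boolean contains(BitSet q1, BitSet q2) {
        BitSet copy = (BitSet) q1.clone();
        copy.andNot(q2);       // bits set in q1 but not in q2
        return copy.isEmpty();
    }
}
```

With this encoding, the two queries of Example 5 yield the same vector regardless of term order, and the refinement Eukaryota∈OC ∧ Alveolata∈OC is detected as included in Eukaryota∈OC.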


Using the library supplied by ACS and the ResolutionManager developed for the basic query cache, building a semantic query result cache based on the principles presented in [4] only requires implementing a specific CacheManager. First, such a CacheManager checks the admissibility of a query's answer, via an AdmissionManager based on a size policy. It also includes semantic management to consider extended hits resulting from an equivalence between the posed query and an entry, or an inclusion of the posed query in an entry. Finally, only 19% of the final code was written specifically for this semantic query result cache; in other words, ACS allowed the reuse of 81% of the code.

To develop semantic predicate-based caches, based on the specification proposed in [5], no functional code was written at all. On the one hand, the library of ACS supplies a ContentManager enabling entries to be managed in terms of predicates and objects, as well as a ReplacementManager based on an LRU policy for this type of cache. On the other hand, the ResolutionManager and the CacheManager employed for the query result cache can be reused. As a consequence, the development of a semantic predicate-based cache focuses on the bindings of the different components.

The construction of a dual cache is based on the construction of a query cache and an object cache. Specific components are then required to manage the cooperation between these two caches. First, we developed a ResolutionManager for the object cache, enabling objects to be retrieved via their identifiers. The cooperation between the query and object caches is then captured in their respective CacheManagers. The CacheManager of the object cache enables the CacheManager of the query cache to access objects in the case of exact or extended hits. It also enables objects to be added after a cache miss resolution. In order to manage semantic caching, the CacheManager of the query cache must consider interactions with the QueryAnalyzer, whereas the CacheManager of the object cache has to consider interactions with the QueryEvaluator.
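The interplay between the two caches of a dual cache can be sketched as follows. This is a toy illustration with hypothetical names and a toy identifier scheme, not the actual ACS component interfaces: the query cache maps queries to lists of object identifiers, and the object cache maps identifiers to objects.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Sketch of the dual-cache lookup flow: a query-cache hit resolves object
// identifiers against the object cache, falling back to cheap identifier
// accesses on the server when an object has been evicted.
public class DualCache {
    private final Map<String, List<String>> queryCache = new HashMap<>();
    private final Map<String, String> objectCache = new HashMap<>();

    public interface Server {
        List<String> evaluate(String query);      // full query evaluation
        String fetchById(String id);              // cheap identifier access
    }

    public List<String> lookup(String query, Server server) {
        List<String> ids = queryCache.get(query);
        if (ids == null) {                        // query miss: evaluate
            List<String> objects = server.evaluate(query);
            ids = new ArrayList<>();
            for (String o : objects) {
                String id = "id:" + o;            // toy identifier scheme
                ids.add(id);
                objectCache.put(id, o);
            }
            queryCache.put(query, ids);
            return objects;
        }
        List<String> objects = new ArrayList<>(); // query hit: resolve ids
        for (String id : ids) {
            String o = objectCache.get(id);
            if (o == null) {                      // object evicted: fetch by id
                o = server.fetchById(id);
                objectCache.put(id, o);
            }
            objects.add(o);
        }
        return objects;
    }
}
```

The sketch makes visible why a dual cache can keep many queries cheaply: evicting objects does not invalidate the cached identifier lists, which can still be resolved by inexpensive identifier accesses.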

5.3.3. Cooperative caching

In order to manage horizontal resolution in large-scale environments, we proposed to adopt a generic notion of proximity [40]. This allows relevant cooperative cache networks to be managed, where the proximity between caches is calculated according to physical (load, bandwidth, etc.) and/or semantic (data, interest, etc.) parameters. This generic notion of proximity is very flexible. Besides enabling existing approaches to be expressed, such as peer-to-peer cooperation (where the proximity can be measured according to the number of hops), proximity facilitates the setting up of dynamically adaptable networks of caches, which may be crucial in variable environments. Proximity is particularly useful in a dual cache, since different proximities can be used for query caches and object caches.

Using a physical proximity based on the characteristics of the infrastructure for object caches enables fine load balancing. We have established networks of object caches located in the same cluster. This takes advantage of the high communication capabilities provided, permitting efficient transfers.

Using a semantic proximity to build cooperative semantic caches reduces the load on data sources. Cooperation between query caches makes it possible to avoid evaluating some queries, the corresponding objects being accessed via their identifiers, generally using efficient index mechanisms. Semantic proximity was based on communities. However, it has to be noted that semantic networks


were built at deployment time according to the chosen workload. Dynamically managed semantic cache networks have not been experimented with.

5.3.4. Cooperative and semantic caching

We have proposed cooperative semantic caches, based on a dual cache with a flooding-based horizontal resolution. This choice is motivated by the fact that data are distributed on many sites and there are several users.

We have constructed a flexible cooperative semantic cache, using a flooding-based ResolutionManager, with semantic proximity for query caches and physical proximity for object caches. This was facilitated by the generic FloodingResolutionManager supplied by ACS. However, two context-specific components had to be proposed: the TopologyManager and the SiblingsAnswerManager.

We have used the same TopologyManager for query caches and object caches. Communications are based on RMI. As a consequence, the information associated with each sibling cache, managed by the TopologyManager, is the address of the cache, as well as its name. The difference between the two types of caches comes from the establishment of the topologies. Query caches are grouped according to their semantic proximity, whereas object caches are grouped according to their location. Two query caches cooperate if their users belong to the same community. We consider that two users belong to the same community if at least 70% of their queries concern the same group of species. In this context, we distinguish four communities: users interested in Eukaryota, in Archaea, in Viruses and in Bacteria. Object caches cooperate only if they are deployed on the same cluster. It has to be noted that, for both query caches and object caches, topologies are established at deployment time.

Several AnswerManagers have been proposed. Query caches and object caches treat positive answers differently. Whereas an object cache forwards the obtained result to a sibling without any operation, a query cache contacts an object cache to retrieve the corresponding objects via their identifiers. Concerning the management of an answer to be ignored, query caches and object caches set the pointer to null, in order to enable the garbage collector to delete the corresponding answer.
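The community-based cooperation rule can be sketched as follows; the 70% threshold is taken from the text, while the class and method names are ours.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Sketch of the community-based semantic proximity: a user belongs to a
// community when at least 70% of its queries concern one group of species,
// and two query caches cooperate when their users share a community.
public class CommunityProximity {
    private static final double THRESHOLD = 0.70;

    // The dominant species group of a user's query history, or null when no
    // group reaches the 70% threshold.
    static String community(List<String> queryGroups) {
        Map<String, Integer> counts = new HashMap<>();
        for (String g : queryGroups)
            counts.merge(g, 1, Integer::sum);
        for (Map.Entry<String, Integer> e : counts.entrySet())
            if ((double) e.getValue() / queryGroups.size() >= THRESHOLD)
                return e.getKey();
        return null;
    }

    // Two query caches cooperate if their users belong to the same community.
    static boolean cooperate(List<String> user1, List<String> user2) {
        String c1 = community(user1), c2 = community(user2);
        return c1 != null && c1.equals(c2);
    }
}
```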

5.4. Tools for performance analysis

This section presents the tools used to evaluate caches built with ACS, in particular a dual cache instance and a semantic and cooperative cache. It first details the workload used for our experiments, and then presents the performance metrics studied.

5.4.1. Workload generation

There are two main ways to generate workloads to test this kind of system. The first one is to use real traces. This approach seems to give a good approximation of real use cases, but a trace is ultimately just a particular case and often does not represent reality in its whole generality. Furthermore, if the main purpose is to understand why a solution is adapted to a given context, the use of traces will not highlight the mechanisms in action. The second approach is to use synthetic


workloads. Their main drawback is to be synthetic, but this type of workload can be tuned easily, and if traces are available they can be used for the choice of the model and for its calibration. The choice of a model is crucial to give a good representation of a target context. Our purpose here is to understand the way the different caching solutions work, which is why we chose a synthetic workload.

Classical synthetic workloads used in benchmarks, such as TPC¶ or Polygraph‖ for instance, do not consider semantically related queries, whereas we consider this an important behavior for semantic caching. We use Rx, a synthetic semantic workload [41]. In such a micro benchmark, queries correspond to progressive refinements: the first query is general and the following ones are more and more precise, thus reducing the set of matching elements. In an Rx workload, x is the ratio of subsumed queries. For example, with R50, half of the queries are issued by constraining former queries. In the presented experiments, the workload is composed of queries corresponding to a single selection term, or to conjunctions of two to four selection terms. In order to simulate a context with semantic locality, we chose R40 and R60 workloads for our experiments.

In addition to the semantic locality, we introduce the notion of community. A community groups users having the same interests: the requests from the members of a community tend to focus on a particular subset of records. In the particular case of Swiss-Prot, we created groups of interest according to the tree of life. Each record belongs to one of four groups: Eukaryota, Archaea, Viruses and Bacteria. For each group, we defined a community of users supposed to be specifically interested in this group. In our experiments, 60% of the queries issued by any user concern the records shared by its community; the remaining 40% are uniformly distributed among the other data.
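A simplified Rx-style generator can be sketched as follows, under our own simplifying assumptions (each refinement extends the immediately preceding query by one term; names are ours, not those of the benchmark of [41]).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

// Sketch of an Rx-style workload: with probability x/100 the next query
// refines the previous one by adding a selection term (capped at four
// terms, as in the experiments); otherwise a fresh single-term query is
// drawn. Hypothetical simplification of the micro benchmark.
public class RxWorkload {
    private final Random rnd;
    private final int x;                 // ratio of subsumed queries (%)
    private final List<String> terms;    // available selection terms

    public RxWorkload(int x, List<String> terms, long seed) {
        this.x = x;
        this.terms = terms;
        this.rnd = new Random(seed);
    }

    public List<List<String>> generate(int n) {
        List<List<String>> queries = new ArrayList<>();
        List<String> previous = null;
        for (int i = 0; i < n; i++) {
            List<String> q;
            if (previous != null && previous.size() < 4
                    && rnd.nextInt(100) < x) {
                q = new ArrayList<>(previous);                   // refinement:
                q.add(terms.get(rnd.nextInt(terms.size())));     // add a term
            } else {
                q = new ArrayList<>();                           // fresh query
                q.add(terms.get(rnd.nextInt(terms.size())));
            }
            queries.add(q);
            previous = q;
        }
        return queries;
    }
}
```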

5.4.2. Performance metrics

One of the most important metrics to study is the mean response time, which is strongly related to the hit ratio. But the servers' load and the amount of data transferred from servers to clients are also important metrics: as a matter of fact, using a cache saves server and network resources. As a consequence, the selected performance metrics are the mean response time, the hit ratio and the amount of transferred data.
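Assuming a client-side log of individual accesses, these three metrics can be computed as follows (the record structure and names are ours).

```java
import java.util.List;

// Sketch of the selected metrics computed from a log of per-access records
// (response time in seconds, hit flag, transferred bytes); hypothetical names.
public class Metrics {
    public static class Access {
        final double responseTime; final boolean hit; final long transferred;
        public Access(double responseTime, boolean hit, long transferred) {
            this.responseTime = responseTime;
            this.hit = hit;
            this.transferred = transferred;
        }
    }

    public static double meanResponseTime(List<Access> log) {
        return log.stream().mapToDouble(a -> a.responseTime).average().orElse(0);
    }

    public static double hitRatio(List<Access> log) {
        if (log.isEmpty()) return 0;
        long hits = log.stream().filter(a -> a.hit).count();
        return (double) hits / log.size();
    }

    public static long transferredData(List<Access> log) {
        return log.stream().mapToLong(a -> a.transferred).sum();
    }
}
```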

5.5. Experimental results

These experiments aim at analyzing the impact of different semantic caches in a grid context. The description of the Grid5000 nodes used for this experiment is presented in Table I. Query evaluation involves a union of three servers. As usual in grids, the clients are distributed: fifty clients, uniformly distributed on four clusters, generate queries according to a given workload and each of them uses its own cache. For a single experiment all caches are of the same type, either predicate-based cache, query result cache or dual cache.

¶ http://www.tpc.org/.
‖ http://polygraph.ircache.net/.


It has to be noted that experiments in a smaller context were performed to compare a system without cache to systems with different semantic caches. These experiments clearly showed that the system without cache is not scalable: all queries are directly submitted to the servers, overloading them. It was thus quite impossible to measure the performance of a system without cache in a large-scale context, which is why in the following we focus on semantic caches. In addition, we consider caching at the middleware level; consequently, traditional block caching, which would require modifying Gedeon (which manages streams of data), has not been taken into account.

The results of a first experiment are illustrated in Table II. The total number of submitted queries is 5000; the servers are deployed on clusters located in Lille, Rennes and Sophia-Antipolis; the clients, distributed on clusters in Lille, Rennes, Sophia-Antipolis and Toulouse, generate queries according to the R60 workload and each uses its own 0.5 GB cache (with 0.01 GB allocated to the object cache for the dual cache). It can be noted that caches lead to a dramatic reduction in the number of communications with the servers, since many queries result in exact or extended hits. As a consequence, the amount of data transferred from servers to clients is highly reduced. We can then conclude that a system with caches is more robust, as it saves server and network resources. The experiment shows that the dual cache is the best approach for such a context, since with a dual cache the mean response time is the lowest (about a 35% reduction compared with the other semantic caches). Such an improvement can be explained by the reduction of the amount of data transferred between servers and clients, as well as by the highest combined exact and extended hit rate. In fact, a dual cache enables a large number of queries to be kept, regardless of the storage of the corresponding objects. That is why, even if the load on the servers is the same for all semantic caches (about 24%), it has to be noted that for dual caches only a third (about 8%) consists of query evaluations, whereas two thirds consist of identifier lists. As a consequence, when data accesses are more efficient by identifiers than by the corresponding queries, using a dual cache is recommended.

Another experiment shows the impact of cache cooperation in the dual cache. In this experiment, queries are generated according to a community-based R40 workload. In addition to the workload used, this experiment differs from the previous one by the clusters used for the servers (Nancy, Rennes and Sophia-Antipolis), the size of the cache (0.325 GB, with 0.01 GB allocated to the object cache) and the total number of submitted queries (2500). Table III presents the results for the different dual caches. The results show that using cooperation reduces the mean response time (about a 70% reduction). On the one hand, semantic proximity between query caches reduces the number of evaluations on the servers, as well as the amount of data transferred, since data sources are accessed via identifiers, avoiding the retrieval of already stored objects. On the other hand, the physical proximity between object caches enables load balancing, reducing the bandwidth

Table II. Specific performance metrics for semantic caches in a grid context.

Semantic cache          Response   Exact     Extended   Load on       Queries evaluated   Transferred
                        time (s)   hit (%)   hit (%)    servers (%)   on servers (%)      data (GB)
Query result cache      73.52      19.16     56.38      24.46         24.46               187.526
Predicate-based cache   71.01      26.46     49.70      23.84         23.84               185.464
Dual cache              47.26      52.94     39.02      23.34         8.04                132.197


Table III. Specific performance metrics for dual caches in a grid context.

Dual          Response   Evaluation on   Transferred   Transferred data between
cache         time (s)   servers (%)     data (GB)     clients and servers (GB)
Basic         103.3      34              30.4          30.4
Cooperative   24.4       9               25.1          11.5

consumption between the clients and servers. In fact, some objects are retrieved via object cachesbelonging to the same cluster, providing high-speed data access.

6. RELATED WORKS

This paper tackles different domains related to caching, in particular adaptable (see Section 2), semantic (see Section 3) and cooperative (see Section 4) caching. This section presents some of the main works related to adaptable and grid caching.

Different adaptable cache services are available, such as Memcached [42], Cache4j [43], ShiftOne Java Object Cache [44], OpenSymphony's OSCache [45], Ehcache [46] and Java Caching System [47]. In our opinion, Perseus and CaLi are the most relevant for grid applications. Perseus [48] is a framework for building persistent object managers, supplying a cache service. Building cache services with Perseus is particularly interesting since such services are dynamically adaptable, thanks to the Fractal component model; however, Perseus does not consider semantic or cooperative caching. CaLi [49] is a C++ framework for building local or distributed caches. CaLi makes it possible to build efficient caches, considering cooperative caching, but it does not support semantic caching, making it impossible to use local evaluation resources. In addition, CaLi caches cannot be dynamically adapted, making them difficult to use in variable environments.

None of the mentioned works have been deployed in a grid. Uddin Ahmed et al. [50] and Cardenas et al. [51,52] do so, adopting a mediation-like approach. Intelligent Cache Management [50] focuses on the problem of network latency. It proposes to store data in distributed databases replicated across the grid. User SQL queries are submitted to the cache, which decomposes them into sub-queries for local and remote domains, and afterwards builds the final results. Cardenas et al. [51,52] propose a distributed cache service for grid computing. Such a service offers semantic cache functionalities by using a hierarchical cache architecture: a kind of global cache federates grid node caches by using a global catalogue, and a metadata catalogue helps to localize data in data sources. Finally, some proposals focus on load balancing and fault tolerance in large-scale environments, such as dCache [53], a cache for grids used for data management in particle physics. Our proposal is orthogonal to these works. It has to be noted that some very context-specific cooperative caches are proposed in the literature. For example, Trystram et al. [54] propose a cache solution for parallel multiple sequence alignments. Such a solution is composed of a cache for pairwise alignments and a second one to store multiple alignments; pairwise entries are used to compute multiple ones. Pomares et al. [55] propose a cache solution specific to health information systems, based on the combination of three types of caches, which could be implemented using ACS. To the best of our knowledge, ACS is the most complete proposal for constructing a very large variety of caching solutions.


7. CONCLUSION

This paper has presented ACS, a framework to build adaptable cache services. ACS enables sophisticated caches to be built for various contexts, considering both semantic and cooperative caching. It facilitates the implementation of cache services; as a consequence, developers can focus more on their needs than on the implementation process. In addition, caches built with ACS are dynamically adaptable, making them relevant for variable contexts. The results obtained with experiments in real environments are particularly interesting.

In the near future, we aim at considering synchronization and further issues in context-aware caching [56]. A good way to introduce a generic solution for synchronization protocols is to couple ACS with a replication framework, such as RS2.7 [57]. Such a solution will enable high flexibility in choosing a well-suited consistency protocol, enabling, for example, performance-degrading usage patterns such as false sharing to be considered for block or page caching. In variable environments, it is important to supply caches able to adapt themselves and to provide continuously good performance according to the context. Important work on efficient monitoring has to be done to obtain worthy autonomic caching solutions.

ACKNOWLEDGEMENTS

This work was supported by the French Ministry of Research through the ACI 'Masses de Donnees'. We acknowledge the contributions to this work of Fabrice Jouanot, Yves Denneulin, Olivier Valentin, and members of the Gedeon project. We also thank the anonymous referees for their many helpful comments.

REFERENCES

1. Chankhunthod A, Danzig PB, Neerdaels C, Schwartz MF, Worrell KJ. A hierarchical internet object cache. Proceedingsof the USENIX Annual Technical Conference, San Diego, U.S.A., 1996; 153–164.

2. Chidlovskii B, Roncancio C, Schneider M-L. Semantic cache mechanism for heterogeneous web querying. ComputerNetworks 1999; 31:1347–1360.

3. Barish G, Obraczka K. World wide web caching: Trends and techniques. Communications Magazine, IEEE 2000;38(5):178–184.

4. Dar S, Franklin MJ, Jonsson BT, Srivastava D, Tan M. Semantic data caching and replacement. Proceedings of theInternational Conference on Very Large Data Bases, Bombay, India, 1996; 330–341.

5. Keller AM, Basu J. A predicate-based caching scheme for client-server database architectures. The Very Large DataBases Journal 1996; 5(1):35–47.

6. Ren Q, Dunham MH, Kumar V. Semantic caching and query processing. IEEE Transactions on Knowledge and DataEngineering 2003; 15(1):192–210.

7. Boeckmann B, Bairoch A, Apweiler R, Blatter M-C, Estreicher A, Gasteiger E, Martin MJ, Michoud K, O’Donovan C,Phan I, Pilbout S, Schneider M. The swiss-prot protein knowledgebase and its supplement trembl in 2003. Nucleic AcidsResearch 2003; 31(1):365–370.

8. Cappello F, Caron E, Dayde M, Desprez F, Jegou Y, Primet P, Jeannot E, Lanteri S, Leduc J, Melab N, Mornet G, Namyst R,Quetier B, Richard O. Grid’5000: A large scale and highly reconfigurable grid experimental testbed. Proceedings of theIEEE/ACM International Workshop on Grid Computing, Seattle, U.S.A., 2005; 99–106.

9. Deshpande PM, Ramasamy K, Shukla A, Naughton JF. Caching multidimensional queries using chunks. Proceedings ofthe ACM SIGMOD International Conference on Management of Data, Seattle, U.S.A., 1998; 259–270.

10. Roussopoulos N. An incremental access method for viewcache: Concept, algorithms, and cost analysis. ACM Transactionson Database Systems 1991; 16(3):535–563.

11. d’Orazio L, Roncancio C, Labbe C, Jouanot F. Semantic caching in large scale querying systems. Revista ColombianaDe Computacion 2008; 9(1):33–57.

Copyright q 2009 John Wiley & Sons, Ltd. Concurrency Computat.: Pract. Exper. 2010; 22:1118–1137DOI: 10.1002/cpe

1136 L. D’ORAZIO, C. RONCANCIO AND C. LABBE

12. Smith AJ. Cache memories. ACM Computing Survey 1982; 14(3):473–530.13. Danzig PB, Hall RS, Schwartz MF. A case for caching file objects inside internetworks. SIGCOMM Computer

Communication Review 1993; 23(4):239–248.14. Bestavros A, Carter RL, Crovella ME, Cunha CR, Heddaya A, Mirdad SA. Application-level document caching in the

internet. Proceedings of the International Workshop in Distributed and Networked Environments, Whistler, Canada, 1995;166–173.

15. Glassman S. A caching relay for the world wide web. Proceedings of the International World Wide Web Conference,Geneva, Switzerland, 1994; 165–173.

16. Muntz D, Honeyman P. Multi-level caching in distributed file systems—or—your cache ain’t nuthin’ but trash. Proceedingsof the USENIX Winter Conference, San Francisco, U.S.A., 1992; 305–313.

17. Leff A, Wolf JL, Yu PS. Replication algorithms in a remote caching architecture. IEEE Transactions on Parallel andDistributed Systems 1993; 4(11):1185–1204.

18. Malpani R, Lorch J, Berger D. Making world wide web caching servers cooperate. Proceedings of the InternationalConference on the World Wide Web, Boston, U.S.A., 1995.

19. Wessels D, Claffy K. ICP and the squid web cache. IEEE Journal on Selected Areas in Communication 1998; 16(3):345–357.

20. Vixie P, Wessels D. Hyper text caching protocol (htcp/0.0), 2000.21. Tay TT, Feng Y, Wijeysundera MN. A distributed internet caching system. Proceedings of the IEEE Conference on

Local Computer Networks, Tampa, U.S.A., 2000; 624–633.22. Franklin MJ, Carey MJ, Livny M. Global memory management in client–server database architectures. Proceedings of

the International Conference on Very Large Data Bases, Vancouver, Canada, 1992; 596–609.23. Dahlin MD, Mather CJ, Wang RY, Anderson TE, Patterson DA. A quantitative analysis of cache policies for scalable

network file systems. Proceedings of the ACM SIGMETRICS Conference on Measurement and Modeling of ComputerSystems, Nashville, U.S.A., 1994; 150–160.

24. Dahlin M, Wang RY, Anderson TE, Patterson DA. Cooperative caching: Using remote client memory to improve filesystem performance. Proceedings of the Symposium on Operating Systems Design and Implementation, Monterey, U.S.A.,1994; 267–280.

25. Gadde S, Rabinovich M, Chase J. Reduce, reuse, recycle: An approach to building large internet caches. Proceedingsof the International Workshop on Hot Topics in Operating Systems, Cape Cod, U.S.A., 1997; 93–98.

26. Makpangou M, Pierre G, Khoury C, Dorta N. Replicated directory service for weakly consistent distributed caches.Proceedings of the International Conference on Distributed Computing Systems, Austin, U.S.A., 1999; 92–100.

27. Povey D, Harrison J. A distributed internet cache. Proceedings of the Australian Computer Science Conference, Sydney,Australia, 1997.

28. Rabinovich M, Chase J, Gadde S. Not all hits are created equal: Cooperative proxy caching over a wide-area network.Computer Networks and ISDN Systems 1998; 30(22–23):2253–2259.

29. Menaud J-M, Issarny V, Banatre M. A new protocol for efficient cooperative transversal web caching. Proceedings ofthe International Symposium on Distributed Computing, Andros, Greece, 1998; 288–302.

30. Fan L, Cao P, Almeida J, Broder AZ. Summary cache: A scalable wide-area web cache sharing protocol. IEEE/ACMTransactions on Networking 2000; 8(3):281–293.

31. Rousskov A, Wessels D. Cache digests. Computer Networks and ISDN Systems 1998; 30(22–23):2155–2168.

32. Valloppillil V, Ross KW. Cache array routing protocol v1.0. Internet Draft, 1998.

33. Khaleel A, Reddy NAL. Evaluation of data and request distribution policies in clustered servers. Proceedings of the International Conference on High Performance Computing, Calcutta, India, 1999; 55–60.

34. Karger D, Sherman A, Berkheimer A, Bogstad B, Dhanidina R, Iwamoto K, Kim B, Matkins L, Yerushalmi Y. Web caching with consistent hashing. Computer Networks 1999; 31(11–16):1203–1213.

35. Cecchet E. Whoops!: A clustered web cache for DSM systems using memory mapped networks. Proceedings of the International Conference on Distributed Computing Systems, Vienna, Austria, 2002; 806–811.

36. Aldinucci M, Torquati M. Accelerating Apache farms through ad-hoc distributed scalable object repository. Proceedings of the European Conference on Parallel and Distributed Computing, Pisa, Italy, 2004; 596–605.

37. Bruneton E, Coupaye T, Leclercq M, Quéma V, Stefani J-B. An open component model and its support in Java. Proceedings of the International Symposium on Component-Based Software Engineering, Edinburgh, U.K., 2004; 7–22.

38. Valentin O, Jouanot F, d'Orazio L, Denneulin Y, Roncancio C, Labbe C, Blanchet C, Sens P, Bonnard C. Gedeon, un intergiciel pour grille de données [Gedeon, a middleware for data grids]. Proceedings of the French Conference on Operating Systems, Perpignan, France, 2006.

39. Chidlovskii B, Borghoff UM. Semantic caching of web queries. The Very Large Data Bases Journal 2000; 9(1):2–17.

40. d'Orazio L, Jouanot F, Denneulin Y, Labbe C, Roncancio C, Valentin O. Distributed semantic caching in grid middleware. Proceedings of the International Conference on Database and Expert Systems Applications, Regensburg, Germany, 2007; 162–171.

Copyright © 2009 John Wiley & Sons, Ltd. Concurrency Computat.: Pract. Exper. 2010; 22:1118–1137; DOI: 10.1002/cpe


41. Luo Q, Naughton JF, Krishnamurthy R, Cao P, Li Y. Active query caching for database web servers. Proceedings of the International Workshop on the World Wide Web and Databases, Dallas, U.S.A., 2001; 92–104.

42. Memcached. Available at: http://www.danga.com/memcached/ [2009].

43. Cache4j. Available at: http://cache4j.sourceforge.net/ [2009].

44. Shiftone Java object cache. Available at: http://jocache.sourceforge.net/ [2009].

45. Oscache. Available at: http://www.opensymphony.com/oscache/ [2009].

46. Ehcache. Available at: http://ehcache.sourceforge.net/ [2009].

47. Java caching system. Available at: http://jakarta.apache.org/jcs/ [2009].

48. Garcia-Banuelos L, Duong P-Q, Collet C. A component based infrastructure for customized persistent object management. Proceedings of the International Workshop on Parallel and Distributed Databases: Innovative Applications and New Architectures, Prague, Czech Republic, 2003.

49. Zola J. Cali, efficient library for cache implementation. Proceedings of the Mexican International Conference on Computer Science, Colima, Mexico, 2004; 415–420.

50. Ahmed MU, Zaheer RA, Qadir MA. Intelligent cache management for data grid. Proceedings of the Australian Workshop on Grid Computing and e-Research, Newcastle, Australia, 2005; 5–12.

51. Cardenas Y, Pierson J-M, Brunie L. Uniform distributed cache service for grid computing. Proceedings of the International Workshop on Database and Expert Systems Applications, Copenhagen, Denmark, 2005; 351–355.

52. Cardenas Y, Pierson J-M, Brunie L. Management of a cooperative cache in grids with grid cache services. Concurrency and Computation: Practice and Experience 2007; 19(16):2141–2155.

53. Fuhrmann P, Gulzow V. dCache, storage system for the future. Proceedings of the European Conference on Parallel and Distributed Computing, Dresden, Germany, 2006; 1106–1113.

54. Trystram D, Zola J. Parallel multiple sequence alignment with decentralized cache support. Proceedings of the European Conference on Parallel and Distributed Computing, Lisbon, Portugal, 2005; 1217–1226.

55. Pomares A, Roncancio C, Abasolo J. Virtual objects in large scale health information systems. Proceedings of the HealthGrid Conference, Chicago, U.S.A., 2008.

56. Jouanot F, d'Orazio L, Roncancio C. Context-aware cache management in grid middleware. Proceedings of the International Conference on Data Management in Grid and P2P Systems, Turin, Italy, 2008; 34–45.

57. Drapeau S, Roncancio C, Dechamboux P. Rs2.7, un canevas adaptable de duplication [Rs2.7, an adaptable replication framework]. Technique et Science Informatiques 2003; 22(10):1297–1324.
