
Multicache-Based Content Management for Web Caching

Kai Cheng and Yahiko Kambayashi

Graduate School of Informatics, Kyoto University

Kyoto JAPAN

WISE'2000

Outline of the Presentation

• Introduction
  – Localizing Web Contents
  – Why Content Management
  – Contributions of Our Work
• Multicache-Based Content Management
• Content Management Scheme for LRU-SP
• Experimental Evaluation
• Concluding Remarks


Web Caching For Localizing Web Contents

[Figure: Internet Traffic for WWW (1994: 19%, 1996: 40%, 1997: 75%)]

• World Wide Content Access/Delivery
  – Bandwidth Constraints
  – “Hot-Spot” Servers
  – Inherent Latency (200-300 ms)

• Web Caching For Localizing Web Contents
  – Reduce Network Traffic
  – Distribute Server Load
  – Reduce Response Times
  – Can We Expect More?


Characteristics and Implications

Traditional Caching | Web Caching         | Implications
Process-Oriented    | Human-User-Oriented | User Preferences
System-Level        | Application-Level   | Semantic Information
Data-Block-Based    | Document-Based      | Varying Sizes, Types
Memory-Based        | Disk-Based          | Persistent Storage, Large Size


Limitations of Current Caching Schemes

• Documents Managed as Physical Units, Not Semantic Units

• Only Physical Properties Are Used

• Less Organized, Less Structured

• Only Simple Control Logic Supported

Beyond Simple Priority Queues, Towards Sophisticated Content Management


Content Management

• Basic Features
  – Larger Cache Space
  – Sophisticated Control Logic

• More Challenging
  – Sophisticated Replacement Policies With User-Oriented Performance Metrics
  – Documents Managed as Semantic Units


Contributions of This Work

• A Multicache Architecture for Implementing Sophisticated Content Management

• A Study of Content Management for LRU-SP

• Simulations Comparing LRU-SP Against SzLRU, SgLRU and LRV


Previous Work

• Classification (Cache Data)
  – LRV, LNC-W3-U, etc.

• Segmentation (Cache Space)
  – Segmented FIFO, FBR, 2Q, etc.

• Features
  – Differentiating Data With Different Properties

• Shortcomings
  – No Sophisticated Categories
  – No Semantic-Based Classification


Managing LFU Contents in Multiple Priority Queues

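[Figure: LFU contents kept in three FIFO queues labelled by reference count (1, 2, >2); hits move objects between queues, and objects leave ("Outs") in first-in-first-out order. Objects with their reference counts: A(10) B(8) C(6) D(3) E(2) F(2) F(1) G(1) H(1).]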
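The queue arrangement in this slide can be sketched in Python. This is a minimal sketch under our own assumptions, not the authors' implementation: the class name `MultiQueueLFU` is hypothetical, there are exactly three FIFO queues (objects referenced once, twice, and more than twice), a hit moves an object to the tail of the next queue up, and a replacement removes the oldest object from the lowest non-empty queue. Capacity is counted in objects rather than bytes.

```python
from collections import OrderedDict


class MultiQueueLFU:
    """Hypothetical sketch: approximate LFU with one FIFO queue per reference-count class."""

    LEVELS = (1, 2, 3)  # level 3 stands for the ">2" queue

    def __init__(self, capacity):
        self.capacity = capacity                          # maximum number of cached objects
        self.queues = {lvl: OrderedDict() for lvl in self.LEVELS}
        self.refs = {}                                    # object -> reference count

    @staticmethod
    def _level(nref):
        return min(nref, 3)

    def access(self, key):
        """Reference `key`; return True on a hit, False on a miss."""
        if key in self.refs:
            old = self._level(self.refs[key])
            self.refs[key] += 1
            new = self._level(self.refs[key])
            if new != old:                                # promote to the next queue's tail
                del self.queues[old][key]
                self.queues[new][key] = None
            return True
        if len(self.refs) >= self.capacity:
            self._evict()
        self.refs[key] = 1                                # new objects enter queue 1
        self.queues[1][key] = None
        return False

    def _evict(self):
        """Remove the oldest object (FIFO head) of the lowest non-empty queue."""
        for lvl in self.LEVELS:
            if self.queues[lvl]:
                victim, _ = self.queues[lvl].popitem(last=False)
                del self.refs[victim]
                return victim
        return None
```

Each access or eviction touches at most two of the three queues, so both run in constant time. That is the usual reason for approximating LFU with a few FIFO queues instead of a priority heap.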


Basics of Cache

• Space
  – Limited Storage Space

• Contents
  – Objects Selected for Caching

• Policies
  – Replacement Policies

• Constraints
  – Special Conditions



Constraints for Cache

• Admission Constraints
  – Define Conditions for Objects Eligible for Caching
  – e.g. (size < 2MB) && !(Source = local)

• Freshness Constraints
  – Define Conditions for Objects Fresh Enough for Re-Use
  – e.g. (Type = news) && (Last-Modified < 1 week)

• Miscellaneous Constraints
  – e.g. (Time = end-of-day), (Total-Size < 95% * Cache-Size)
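The example constraints above can be written as plain predicates that the router checks before caching or re-using an object. The `WebObject` fields, the reading of 2MB as 2 * 1024 * 1024 bytes, and the reading of "Last-Modified < 1 week" as "modified less than one week ago" are assumptions made for this sketch; the function names are hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

MB = 1024 * 1024  # assumption: 1MB = 2**20 bytes


@dataclass
class WebObject:
    # Hypothetical attributes, chosen to match the fields named in the examples.
    url: str
    size: int                 # bytes
    source: str               # e.g. "local" or "remote"
    doc_type: str             # e.g. "news"
    last_modified: datetime


def admissible(obj):
    """Admission constraint: (size < 2MB) && !(Source = local)."""
    return obj.size < 2 * MB and obj.source != "local"


def fresh(obj, now):
    """Freshness constraint: (Type = news) && (Last-Modified < 1 week).

    Read here as: a news object modified less than one week before `now`.
    """
    return obj.doc_type == "news" and now - obj.last_modified < timedelta(weeks=1)


def below_high_water(total_size, cache_size):
    """Miscellaneous constraint: Total-Size < 95% * Cache-Size."""
    return total_size < 0.95 * cache_size
```

Keeping each constraint as a separate predicate lets the router combine them freely, for example checking `admissible` on download and `fresh` on every hit.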


Multicache Architecture

[Figure: Web Cache With Multiple Subcaches. Requests and responses pass between the Client and the WWW through the Central Router, which works with the Constraints, the Cache Knowledge Base (CKB), the In-Cache index, the Judge and several Subcaches.]


Components of the Architecture

• Central Router
  – Controls and Mediates the Cache

• Cache Knowledge Base (CKB)
  – A Set of Rules Used to Allocate Objects, e.g.
    R1. Allocate(X, 1) :- url(X, U), match(U, *.jp), content(X, baseball)

• Subcaches
  – Keep Objects With Special Characteristics

• Cache Judge
  – Makes Final Decisions From a Set of Eviction Candidates
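Rule R1 can be approximated in Python by treating each CKB rule as a condition paired with a subcache ID. The sketch assumes that `match(U, *.jp)` is a glob match on the URL's host name, that the content topic is already available as a `content` field, that rules fire in order with the first match winning, and that unmatched objects go to a default subcache 0. All of these, and the names `r1`, `CKB_RULES` and `allocate`, are hypothetical.

```python
from fnmatch import fnmatch
from urllib.parse import urlparse


def r1(obj):
    """R1. Allocate(X, 1) :- url(X, U), match(U, *.jp), content(X, baseball)."""
    host = urlparse(obj["url"]).hostname or ""
    return fnmatch(host, "*.jp") and obj.get("content") == "baseball"


# Each rule: (name, body, subcache ID in the rule head).
CKB_RULES = [("R1", r1, 1)]
DEFAULT_SUBCACHE = 0  # assumption: objects matching no rule go to subcache 0


def allocate(obj, rules=CKB_RULES, default=DEFAULT_SUBCACHE):
    """Fire the rules in order; the first rule whose body holds gives the subcache ID."""
    for _name, body, subcache_id in rules:
        if body(obj):
            return subcache_id
    return default
```

For instance, `allocate({"url": "http://www.example.co.jp/npb/", "content": "baseball"})` selects subcache 1, while the same topic under a `.com` host falls through to the default.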


The Procedural Description

The Central Router services each request. Suppose the current request is for document p:

1. Locate p using the In-Cache Index.

2. If p is not in the cache, download p, then:
   i. Validate the constraints; if they fail, do not cache p and return to serving the next request;
   ii. Fire the rules in the CKB; let the resulting subcache ID be K;
   iii. While subcache K does not have enough space for p:
     – Subcache K selects an eviction candidate;
     – If space is shared, the other subcaches do the same;
     – The Judge assesses the eviction candidates;
     – Purge the victim;
   iv. Cache p in subcache K.

3. If p is already in a subcache, perform steps i) to iv) to re-cache p.
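The procedure above can be sketched as a small Python class. It is an illustration under stated assumptions, not the authors' code: `Multicache`, `allocate`, `admit` and `judge` are hypothetical names standing in for the CKB, the constraints and the Cache Judge; every subcache keeps its objects in LRU order and offers its least recently used object as its eviction candidate; and with `share_space=True` the subcaches draw on one pooled capacity, so all of them propose candidates.

```python
from collections import OrderedDict


class Multicache:
    """Hypothetical sketch of the Central Router procedure.

    allocate(obj) -> subcache ID      stands in for firing the CKB rules
    admit(obj) -> bool                stands in for the constraints
    judge(candidates) -> (ID, key)    stands in for the Cache Judge
    """

    def __init__(self, capacities, allocate, admit, judge, share_space=False):
        self.capacities = dict(capacities)                # subcache ID -> capacity in bytes
        self.subcaches = {k: OrderedDict() for k in self.capacities}  # LRU order
        self.used = {k: 0 for k in self.capacities}
        self.index = {}                                   # in-cache index: key -> subcache ID
        self.allocate, self.admit, self.judge = allocate, admit, judge
        self.share_space = share_space

    def request(self, key, fetch):
        """Serve one request; return (object, hit?)."""
        k = self.index.get(key)                           # 1. locate p by the in-cache index
        if k is not None:                                 # 3. p is cached: take it out, re-cache
            obj = self._purge(k, key)
            self._cache(key, obj)
            return obj, True
        obj = fetch(key)                                  # 2. p is not cached: download it
        self._cache(key, obj)
        return obj, False

    def _limit(self, k):
        return sum(self.capacities.values()) if self.share_space else self.capacities[k]

    def _used(self, k):
        return sum(self.used.values()) if self.share_space else self.used[k]

    def _cache(self, key, obj):
        if not self.admit(obj):                           # i. constraints fail: skip p
            return
        k = self.allocate(obj)                            # ii. CKB rules give subcache K
        size = obj["size"]
        if size > self._limit(k):                         # p could never fit
            return
        while self._used(k) + size > self._limit(k):      # iii. not enough space for p
            ids = list(self.subcaches) if self.share_space else [k]
            candidates = []
            for j in ids:                                 # each subcache proposes its LRU object
                if self.subcaches[j]:
                    victim = next(iter(self.subcaches[j]))
                    candidates.append((j, victim, self.subcaches[j][victim]))
            j, victim = self.judge(candidates)            # the Judge picks one candidate
            self._purge(j, victim)                        # purge the victim
        self.subcaches[k][key] = obj                      # iv. cache p in subcache K (MRU end)
        self.used[k] += size
        self.index[key] = k

    def _purge(self, k, key):
        obj = self.subcaches[k].pop(key)
        self.used[k] -= obj["size"]
        del self.index[key]
        return obj
```

Passing the CKB, the constraints and the Judge in as functions keeps the router generic: a policy such as LRU-SP only has to supply its own allocation rule and victim-selection function.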


Content Management for LRU-SP

• LRU (Least Recently Used)
  – Primarily Designed for Equal-Sized Objects; Uses Only Recency of Reference

• Extended LRUs
  – Size-Adjusted LRU (SzLRU)
  – Segmented LRU (SgLRU)

• LRU-SP (Size-Adjusted and Popularity-Aware LRU)
  – Makes SzLRU Aware of the Degree of Popularity


Probability of Re-Reference as a Function of Current Reference Times

[Figure: probability of re-reference (y-axis, 0 to 0.9) against the number of references so far (x-axis, 1 to 15); curves: Next Reference, Next K References After First.]


Cost-to-Size Ratio Model

• An Object A in Cache Saves Cost nref * (1/atime)
  – nref is the frequency of reference of A
  – atime is the time since the last access to A; (1/atime) is the dynamic frequency of A

• When Put in Cache, A Takes Up Space size
  – Cost-to-size ratio = nref / (size * atime)

• The Object With the Least Ratio Is the Least Beneficial One
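A short worked example of the ratio, using made-up values for three cached objects: the object with the smallest nref / (size * atime) is the one the model treats as least beneficial. The units (KB for size, seconds for atime) and the object values are assumptions of this sketch.

```python
def cost_to_size_ratio(nref, size, atime):
    """Benefit per unit of space of keeping an object: nref / (size * atime)."""
    return nref / (size * atime)


# Hypothetical cached objects: name -> (nref, size in KB, atime in seconds)
objects = {
    "A": (10, 4, 60),     # popular, small, recently used
    "B": (2, 100, 600),   # rarely used, large, old
    "C": (5, 20, 120),
}

ratios = {name: cost_to_size_ratio(*v) for name, v in objects.items()}
least_beneficial = min(ratios, key=ratios.get)
print(least_beneficial)  # B
```

B is chosen because it is large, old and rarely referenced: its ratio is 2 / (100 * 600), about 3.3e-5, against about 0.042 for A and 0.0021 for C.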


Content Management of LRU-SP

• CKB Rule
  – Allocate(X, log(size/nref)) :- Size(X, size), Freq(X, nref)

• Subcaches
  – Each Managed by Least Recently Used (LRU)

• Judge
  – Finds the Candidate With the Largest (size * atime) / nref
  – The Larger, Older and Colder an Object Is, the Sooner It Will Be Purged
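Putting the CKB rule, the LRU subcaches and the Judge together gives the following sketch of LRU-SP. It rests on assumptions the slides do not fix: the log in the allocation rule is taken as base 2, floored and clamped to the available subcaches; time is a logical clock that ticks once per request, so atime is at least 1 for any cached object; all subcaches share one byte capacity; and a hit re-runs the allocation rule so that a popular object can move to a lower-numbered subcache. The class name `LRUSP` and its methods are hypothetical.

```python
import math
from collections import OrderedDict


class LRUSP:
    """Hypothetical LRU-SP sketch (log base 2, logical clock, shared byte capacity)."""

    def __init__(self, capacity, num_subcaches=16):
        self.capacity = capacity
        self.used = 0
        self.clock = 0                                    # logical time, one tick per request
        self.subcaches = [OrderedDict() for _ in range(num_subcaches)]  # each in LRU order
        self.where = {}                                   # key -> subcache index
        self.meta = {}                                    # key -> [size, nref, last_access]

    def _allocate(self, size, nref):
        # CKB rule Allocate(X, log(size/nref)), floored and clamped to a valid index.
        ratio = size / nref
        k = math.floor(math.log2(ratio)) if ratio >= 1 else 0
        return min(max(k, 0), len(self.subcaches) - 1)

    def _place(self, key):
        size, nref, _ = self.meta[key]
        k = self._allocate(size, nref)
        self.subcaches[k][key] = None                     # append at the MRU end
        self.where[key] = k

    def _evict_one(self):
        # Each non-empty subcache offers its LRU object; the Judge purges the
        # candidate with the largest (size * atime) / nref.
        best, best_score = None, -1.0
        for sub in self.subcaches:
            if sub:
                key = next(iter(sub))
                size, nref, last = self.meta[key]
                score = size * (self.clock - last) / nref
                if score > best_score:
                    best, best_score = key, score
        del self.subcaches[self.where.pop(best)][best]
        self.used -= self.meta.pop(best)[0]
        return best

    def access(self, key, size):
        """Reference `key` of `size` bytes; return True on a hit."""
        self.clock += 1
        if key in self.meta:                              # hit: update nref and recency, re-cache
            del self.subcaches[self.where.pop(key)][key]
            self.meta[key][1] += 1
            self.meta[key][2] = self.clock
            self._place(key)
            return True
        if size > self.capacity:                          # can never be cached
            return False
        while self.used + size > self.capacity:
            self._evict_one()
        self.meta[key] = [size, 1, self.clock]
        self.used += size
        self._place(key)
        return False
```

Within one subcache, objects have size/nref values roughly within a factor of two of each other, so the least recently used entry is a good stand-in for that subcache's largest (size * atime) / nref. The Judge therefore looks at only one candidate per subcache, which keeps each eviction at O(number of subcaches), a constant independent of how many objects are cached.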


Multicache Architecture for LRU-SP

[Figure: the CKB feeds LRU Subcaches ①, ② and ③ (objects A, B, C; hits on A and B), and the Judge compares one eviction candidate from each subcache (a, b, c).]

Computational Complexity: O(1)


Predicted Results

• A higher Hit Rate is expected for LRU-SP, because it uses three indicators of document popularity: size, reference frequency (nref) and time since last access (atime).

• However, a higher Hit Rate usually comes at the cost of a lower Byte Hit Rate: among documents of similar popularity, favoring small documents means fewer bytes are served from the cache.
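The trade-off between the two metrics shows up in a small calculation. Hit Rate counts requests served from the cache, while Byte Hit Rate weights each request by its size; the function name `hit_rates` and the trace below are invented for illustration.

```python
def hit_rates(trace):
    """trace: list of (size_in_bytes, was_hit) pairs, one per request.

    Hit Rate      = hits / requests
    Byte Hit Rate = bytes served from the cache / bytes requested
    """
    requests = len(trace)
    hits = sum(1 for _, hit in trace if hit)
    total_bytes = sum(size for size, _ in trace)
    hit_bytes = sum(size for size, hit in trace if hit)
    return hits / requests, hit_bytes / total_bytes


# Hypothetical trace: a cache favoring small documents hits the three small
# requests but misses the one large request.
trace = [(2_000, True), (3_000, True), (5_000, True), (90_000, False)]
hr, bhr = hit_rates(trace)
print(f"Hit Rate = {hr:.2f}, Byte Hit Rate = {bhr:.2f}")  # Hit Rate = 0.75, Byte Hit Rate = 0.10
```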


Experimental Results: Better Than Expected

[Figure: two panels comparing LRU-SP, SzLRU, SgLRU and LRV over cache sizes 0.15 to 8 (x-axis); y-axis ranges 0 to 0.25 and 0 to 0.3.]


Results & Explanations

• LRU-SP obtained a much higher Hit Rate than SzLRU, SgLRU and LRV.

• LRU-SP also obtained a high Byte Hit Rate, especially when the cache space exceeds 3% of the total required space.

• Truly Popular Objects Are Kept in the Cache, So Both Hit Rate and Byte Hit Rate Improve.

• LRU-SP incurs only O(1) time complexity in content management.


Concluding Remarks

• The Multicache-Based Architecture Has Proved Effective in Balancing High Performance and Low Overhead

• Possible To Incorporate Semantic Information as Well as User Preferences In Caching

• It Can Work With General Database Systems to Support Web Information Integration (Future Work)

Thank You! And Welcome To

http://www.isse.kuis.kyoto-u.ac.jp
