
Multicache-Based Content Management for Web Caching

Kai Cheng and Yahiko Kambayashi

Graduate School of Informatics, Kyoto University

Kyoto JAPAN

WISE'2000

Outline of the Presentation

• Introduction

– Why Content Management

– Contributions of Our Work

• Multicache-Based Content Management

• Content Management Scheme for LRU-SP

• Experimental Evaluation

• Concluding Remarks


1.1. Why Content Management

[Figure: User, Network, and Servers, with flows labeled ①, ②, ③, ④]

Maximize Hit Rates (r = ② / ①) (or Weighted HR)


Can Web Do Without Caching?

• Bandwidth Scarcity = Weakest Part

– Unrealistic to Update All Resources

• “Hot-Spot” Servers

– Server Overload Is Unpredictable

• Inherent Latency: Bounded by Light Speed and Distance

– Even With Sufficient Bandwidth and Server Capacity

– Transoceanic Data Transfer: 200ms to 300ms

Caching Is Necessary To Adaptively Reduce Remote Data Requests


1.2. Why Content Management

Traditional Caching | Web Caching | Implications

Process Oriented | Human-User Oriented | User Preferences

System-Level | Application-Level | Semantic Information

Data Block Based | Document-Based | Varying Sizes, Types

Memory-Based | Disk-Based | Persistent Storage, Large Size

Replacement policies based on empirical formulas have difficulty dealing with these!


Deploying Content Management

• To Support

– Larger Cache Space

– Sophisticated Control Logic

• To Support

– Sophisticated Replacement Policies With

  • User-Oriented Performance Metrics

  • Document Treated as Semantic Unit


1.3. Contributions of This Work

• A Multicache Architecture for Implementing Sophisticated Content Management, Including a New Cache Definition

• A Study of Content Management for LRU-SP

• Simulations to Compare LRU-SP Against Others


Previous Work

• Classifications in Approximate Implementations of Complicated Caching Schemes

– LRV, LNC-W3-U, etc.

• Segmentation in Traditional Caching As Tradeoffs Between Performance and Complexity

– Segmented FIFO, FBR, 2Q, etc.

• Disadvantages

– Both Are Built-in, Ad Hoc Implementations, Rather than An Independent Mechanism

– Cannot Support Sophisticated Category or Semantic-Based Classification


Managing LFU Contents in Multiple Priority Queues

[Figure: LFU contents A(10) B(8) C(6) D(3) E(2) F(2) F(1) G(1) H(1), with reference counts in parentheses, split across three queues labeled 1, 2, and >2. Each queue is kept in First In First Out Order. Other labels: References, Hit, Outs.]


Cache Components

• Space

– Limited Storage Space

• Contents

– Objects Selected for Caching

• Policies

– Replacement Policies

• Constraints

– Special Conditions

[Figure: a cache made up of Space, Contents, Policies, and Constraints]


Constraints for Cache

• Admission Constraints

– Define Conditions for Objects Eligible For Caching

e.g. (size < 2MB) && !(Source = local)

• Freshness Constraints

– Define Conditions for Objects Fresh Enough For Re-Use

e.g. (Type = news) && (Last-Modified < 1week)

• Miscellaneous Constraints

e.g. (Time = end-of-day), (Total-Size < 95% * Cache-Size)
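The admission and freshness examples read naturally as boolean predicates over object metadata. The following is a minimal Python sketch, not the authors' implementation. The `CachedObject` record and its field names are hypothetical, and "Last-Modified < 1week" is assumed to mean that the object was modified less than one week ago.

```python
from dataclasses import dataclass

MB = 1024 * 1024
WEEK = 7 * 24 * 3600  # seconds


@dataclass
class CachedObject:
    # Hypothetical metadata record; field names are assumptions.
    size: int                      # bytes
    source: str                    # "local" or "remote"
    doc_type: str                  # e.g. "news"
    seconds_since_modified: float  # age relative to Last-Modified


def admissible(obj: CachedObject) -> bool:
    # Admission constraint: (size < 2MB) && !(Source = local)
    return obj.size < 2 * MB and obj.source != "local"


def fresh(obj: CachedObject) -> bool:
    # Freshness constraint: (Type = news) && (Last-Modified < 1week)
    return obj.doc_type == "news" and obj.seconds_since_modified < WEEK
```

Keeping each constraint as a separate predicate lets the router evaluate them independently of the replacement policy.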


Multicache Architecture

[Figure: Web Cache With Multiple Subcaches. Request/Response traffic between Client and Web Servers passes through the CENTRAL ROUTER, which is connected to the IN-CACHE index, the CONSTRAINTS, the CKB (Cache Knowledge Base), the JUDGE, and a row of SUBCACHEs.]


Components of the Architecture

• Central Router

– Control and Mediate the Cache

• Cache Knowledge Base (CKB)

– A Set of Rules to Allocate Objects

R1. Allocate(X, 1) :- url(X, U), match(U, *.jp), content(X, baseball)

• Subcaches

– Cache for Keeping Objects With Special Properties

• Cache Judge

– Make Final Decisions From A Set of Eviction Candidates
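Rule R1 can be read as "send an object to subcache 1 when its URL matches *.jp and its content is about baseball". The following is a hedged Python sketch of that idea. The function names, the `topics` argument, and the `DEFAULT_SUBCACHE` fallback are illustrative assumptions, and the pattern is assumed to apply to the host name of the URL.

```python
from fnmatch import fnmatch
from urllib.parse import urlparse

DEFAULT_SUBCACHE = 0  # hypothetical fallback when no rule fires


def rule_r1(url, topics):
    """R1: Allocate(X, 1) :- url(X, U), match(U, *.jp), content(X, baseball)."""
    host = urlparse(url).hostname or ""
    if fnmatch(host, "*.jp") and "baseball" in topics:
        return 1
    return None  # rule does not fire


def allocate(url, topics, rules=(rule_r1,)):
    # Fire the rules in order; the first one that matches decides the subcache.
    for rule in rules:
        subcache_id = rule(url, topics)
        if subcache_id is not None:
            return subcache_id
    return DEFAULT_SUBCACHE
```

A rule set kept outside the router, as here, can be extended without touching the replacement logic of any subcache.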


The Procedural Description

Central Router services each request. Suppose the current request is for document p:

1. Locate p using the In-cache Index

2. If p is not in cache, download p;

i. Validate Constraints; if false, loop;

ii. Fire rules in CKB; let subcache ID = K;

iii. While there is not enough space in subcache K for p

– Subcache K selects an eviction candidate;

– If space is shared, other subcaches do the same;

– Judge assesses the eviction candidates;

– Purge the victim;

iv. Cache p in subcache K

3. If p is in a subcache, do i) to iv) to re-cache p.
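The procedure can be sketched as a small router class. This is an illustrative Python sketch under simplifying assumptions, not the authors' implementation: each subcache is a plain LRU list with its own byte capacity, there is no space sharing (so the judge always sees a single candidate), and the `admit` and `allocate` callables stand in for the constraints and the CKB rules. All class and parameter names are hypothetical.

```python
from collections import OrderedDict


class Subcache:
    """LRU subcache with a byte capacity (assumed policy for this sketch)."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.used = 0
        self.items = OrderedDict()  # url -> size, least recently used first

    def candidate(self):
        return next(iter(self.items))  # least recently used object

    def remove(self, url):
        self.used -= self.items.pop(url)

    def add(self, url, size):
        self.items[url] = size
        self.used += size


class Router:
    def __init__(self, subcaches, admit, allocate):
        self.subcaches = subcaches  # subcache id -> Subcache
        self.admit = admit          # stands in for the constraints
        self.allocate = allocate    # stands in for the CKB rules
        self.index = {}             # in-cache index: url -> subcache id

    def request(self, url, size):
        hit = url in self.index                    # step 1
        if hit:                                    # step 3: re-cache via i-iv
            self.subcaches[self.index.pop(url)].remove(url)
        if not self.admit(url, size):              # step i
            return hit
        k = self.allocate(url, size)               # step ii
        sub = self.subcaches[k]
        if size > sub.capacity:                    # can never fit
            return hit
        while sub.used + size > sub.capacity:      # step iii
            victim = sub.candidate()               # single candidate, no judge needed
            sub.remove(victim)
            del self.index[victim]
        sub.add(url, size)                         # step iv
        self.index[url] = k
        return hit
```

With space sharing enabled, step iii would instead collect one candidate from every subcache and let the judge pick the victim among them.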


Content Management for LRU-SP

• LRU (Least Recently Used)

– Primarily Designed for Equal Sized Objects, and Only Recency of Reference In Use

• Extended LRUs

– Size-Adjusted LRU (SzLRU)

– Segmented LRU (SgLRU)

• LRU-SP (Size-Adjusted and Popularity-Aware LRU)

– Make SzLRU Aware of Popularity Degree


Probability of Re-Reference As a Function of Current Reference Times

[Figure: probability of re-reference (0 to 1) plotted against current reference times (1 to 15). Series: Next Reference, Next K References After First.]


Cost-to-Size Ratio Model

• An Object A In Cache Saves Cost nref * (1/atime)

– nref is the frequency of reference

– atime is the time since last access, (1/atime) is the dynamic frequency of A

• When Put In Cache, It Takes Up Space size

– Cost-to-size ratio = nref / (size * atime)

• The Object With Least Ratio Is Least Beneficial One
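A small worked example may help make the ratio concrete. The Python sketch below evaluates nref / (size * atime) for three objects and picks the one with the least ratio. The dictionary layout and the sample numbers are made up for illustration.

```python
def cost_to_size_ratio(nref, size, atime):
    # Saved cost nref * (1/atime), divided by the space the object occupies.
    return nref / (size * atime)


def least_beneficial(objects):
    # The object with the least ratio is the best eviction candidate.
    return min(
        objects,
        key=lambda o: cost_to_size_ratio(o["nref"], o["size"], o["atime"]),
    )


# Illustrative values only: sizes in bytes, atime in arbitrary time units.
objects = [
    {"name": "A", "nref": 10, "size": 1000, "atime": 5},
    {"name": "B", "nref": 1, "size": 5000, "atime": 50},
    {"name": "C", "nref": 4, "size": 100, "atime": 10},
]
victim = least_beneficial(objects)
```

Here B is selected: it is the largest object, the one referenced least often, and the one left untouched the longest, so all three factors push its ratio down.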


Content Management of LRU-SP

• CKB Rule:

– Allocate(X, log(size/nref)) :- Size(X, size), Freq(X, nref)

• Subcaches

– Least Recently Used (LRU)

• Judge

– Find the One With Largest (size*atime)/nref

– The Larger, Older, and Colder an Object Is, the Faster It Will Be Purged
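The three parts (CKB rule, LRU subcaches, judge) can be combined into one compact sketch. The Python code below is an illustration under stated assumptions rather than the authors' implementation: the logarithm is taken base 2 and rounded down, all subcaches share one byte budget, sizes are positive, and atime is measured with a logical clock that ticks once per request. The class and attribute names are hypothetical.

```python
import math
from collections import OrderedDict


class LRUSP:
    def __init__(self, capacity):
        self.capacity = capacity  # shared byte budget (assumption)
        self.used = 0
        self.clock = 0            # logical time, one tick per request
        self.queues = {}          # class id -> OrderedDict(url -> meta), LRU first
        self.where = {}           # url -> class id

    @staticmethod
    def class_of(size, nref):
        # CKB rule: Allocate(X, log(size/nref)); base 2 and floor are assumed.
        return math.floor(math.log2(size / nref))

    def _judge(self):
        # Compare only the LRU object of each subcache; evict the one with
        # the largest (size * atime) / nref.
        best_url, best_score = None, -1.0
        for queue in self.queues.values():
            if not queue:
                continue
            url, meta = next(iter(queue.items()))
            atime = self.clock - meta["last"]
            score = meta["size"] * atime / meta["nref"]
            if score > best_score:
                best_url, best_score = url, score
        return best_url

    def _remove(self, url):
        meta = self.queues[self.where.pop(url)].pop(url)
        self.used -= meta["size"]
        return meta

    def access(self, url, size):
        self.clock += 1
        hit = url in self.where
        if hit:
            meta = self._remove(url)
            meta["nref"] += 1
        else:
            meta = {"size": size, "nref": 1}
        meta["last"] = self.clock
        if meta["size"] > self.capacity:
            return hit
        while self.used + meta["size"] > self.capacity:
            self._remove(self._judge())
        k = self.class_of(meta["size"], meta["nref"])
        self.queues.setdefault(k, OrderedDict())[url] = meta
        self.where[url] = k
        self.used += meta["size"]
        return hit
```

Because the judge inspects one candidate per subcache, and the number of subcaches grows only with the logarithm of the size range, each eviction decision touches a small, bounded number of objects.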


Predicted Results

• A higher Hit Rate can be expected for LRU-SP, because it uses three indicators of document popularity.

• However, higher Hit Rates usually come at the cost of lower Byte Hit Rates, because smaller documents contribute fewer bytes of hit data.
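The tension between the two metrics is easy to see on a toy trace. The Python sketch below computes both rates from a list of (size, was_hit) pairs. The function name and the trace values are illustrative, and a non-empty trace is assumed.

```python
def hit_rates(trace):
    """Return (hit rate, byte hit rate) for (size_in_bytes, was_hit) pairs."""
    requests = hits = total_bytes = hit_bytes = 0
    for size, was_hit in trace:
        requests += 1
        total_bytes += size
        if was_hit:
            hits += 1
            hit_bytes += size
    return hits / requests, hit_bytes / total_bytes


# Three small hits and one large miss: many hits, few bytes served from cache.
hr, bhr = hit_rates([(1, True), (1, True), (1, True), (7, False)])
```

In this trace three of four requests hit (hit rate 0.75), yet only 3 of 10 bytes come from the cache (byte hit rate 0.3), which is the effect a policy favoring small documents tends to produce.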


Experiment Results

[Figure: two plots comparing LRU-SP, SzLRU, SgLRU, and LRV. x-axis ticks: 0.15, 0.3, 0.5, 0.8, 1.5, 2, 3, 4, 5, 6, 7, 8. The y-axis runs from 0 to 0.25 in the first plot and from 0 to 0.3 in the second.]


Explanations

• LRU-SP obtained a much higher Hit Rate than SzLRU, SgLRU, or LRV.

• LRU-SP also obtained a higher Byte Hit Rate when cache space exceeds 3% of total required space.

• LRU-SP only incurs O(1) time complexity in content management.

• LRU-SP is a significantly improved algorithm


Concluding Remarks

• Multicache-Based Architecture Has Proved Ideal for Realizing a Good Balance Between High Performance and Low Overhead

• It Is Capable of Incorporating Semantic Information as Well as User Preference In Caching

• It Can Work With Data Management Systems to Support Web Information Integration
