how to live with low/intermittent bandwidth/connectivity
Post on 23-Feb-2016
How to live with low/intermittent bandwidth/connectivity
Krithi Ramamritham, IIT Bombay
krithi@cse.iitb.ernet.in
2
Web Content
• Web sites have traditionally served static content
• But dynamic content generation has come into vogue
  – generated on the fly by running dynamic scripts, e.g., Active Server Pages (ASP), Java Server Pages (JSP), Servlets
  – allows generation of different content for the same request
3
Dynamic Web Pages…
[Figure: a web page of a news content site, composed of an Ad Component, four Headline Components, a Personalized Component, and a Navigation Component]
4
Generic Architecture
[Figure: data sources (servers, sensors) connect through the network to end-hosts (wired hosts, mobile hosts)]
5
Coherency of Dynamic Data
• Strong coherency
  – The client and source are always in sync with each other
  – Strong coherency is expensive!
• Relax strong coherency
  – Time domain: t-coherency
    • The client is never out of sync with the source by more than t time units
    • e.g., traffic data not stale by more than a minute
  – Value domain: v-coherency
    • The difference in the data values at the client and the source is bounded by v at all times
    • e.g., only interested in temperature changes larger than 1 degree
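The two relaxed coherency notions can be written as simple predicates. This is a minimal sketch, not part of the original slides: the function names are hypothetical, and the time check uses the time of the last synchronization as a conservative stand-in for "out of sync".

```python
def t_coherent(last_sync_time: float, now: float, t: float) -> bool:
    """Time domain: the client copy was synchronized within the last t time units."""
    return (now - last_sync_time) <= t


def v_coherent(client_value: float, source_value: float, v: float) -> bool:
    """Value domain: client and source values differ by at most v."""
    return abs(client_value - source_value) <= v


# Traffic data: must not be stale by more than a minute (t = 60 seconds).
traffic_ok = t_coherent(last_sync_time=100.0, now=150.0, t=60.0)
traffic_stale = not t_coherent(last_sync_time=100.0, now=161.0, t=60.0)

# Temperature: only changes larger than 1 degree matter (v = 1.0).
temperature_ok = v_coherent(client_value=20.0, source_value=20.5, v=1.0)
temperature_violated = not v_coherent(client_value=20.0, source_value=21.5, v=1.0)
```

A source or proxy would evaluate predicates like these to decide whether a cached copy still meets the client's requirement.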
6
Generic Architecture
[Figure: data sources (servers, sensors) connect through the network to proxies/caches, which connect through the network to end-hosts (wired host, mobile host)]
7
The Push Approach
• Proxy registers the data item of interest and the coherency requirement with the server
• Server pushes interesting changes
+ Achieves strong consistency
+ Keeps network overhead to a minimum
-- Poor scalability (server has to maintain state and has to keep connections open)
-- Low resiliency
[Figure: Server → Proxy → User, with a push on each hop]
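A minimal sketch of the push approach, assuming a value-domain requirement v per registration. The class and method names (`PushServer`, `register`, `update`, `Proxy.receive`) are hypothetical and only illustrate the per-proxy state the server has to keep.

```python
class PushServer:
    """Source that keeps per-registration state and pushes only interesting changes."""

    def __init__(self):
        self._subscriptions = {}  # item -> list of registration records

    def register(self, item, v, on_push):
        # The server must remember each proxy's requirement and last pushed value.
        record = {"v": v, "last_pushed": None, "on_push": on_push}
        self._subscriptions.setdefault(item, []).append(record)

    def update(self, item, value):
        for sub in self._subscriptions.get(item, []):
            last = sub["last_pushed"]
            # Push the first value, then only changes larger than v.
            if last is None or abs(value - last) > sub["v"]:
                sub["last_pushed"] = value
                sub["on_push"](item, value)


class Proxy:
    def __init__(self):
        self.cache = {}
        self.pushes = 0

    def receive(self, item, value):
        self.cache[item] = value
        self.pushes += 1


server = PushServer()
proxy = Proxy()
server.register("temperature", 1.0, proxy.receive)
for reading in [20.0, 20.4, 21.5, 21.9, 23.0]:
    server.update("temperature", reading)
```

Of the five source updates, only the three that move the value by more than 1 degree since the last push reach the proxy. The `_subscriptions` table is the state that limits scalability.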
8
The Pull Approach
• Proxy pulls after the Time to Live (TTL), also called Time To next Refresh (TTR / TNR), expires
+ Can be implemented using the HTTP protocol
+ Stateless and hence generally scalable with respect to state space and computation
– Weak cache consistency
– Heavy polling for stringent coherence requirements or highly dynamic data
– Network overheads higher than for Push
[Figure: Server, Proxy, User; the proxy pulls from the server and pushes to the user]
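A minimal sketch of pull with a fixed TTR. `Source` and `PullProxy` are hypothetical names, and time is passed in explicitly so the behaviour is easy to follow.

```python
class Source:
    """Stateless source: it only answers requests and counts them."""

    def __init__(self):
        self.value = 0
        self.requests = 0

    def get(self):
        self.requests += 1
        return self.value


class PullProxy:
    """Proxy that re-fetches an item only after its TTR has elapsed."""

    def __init__(self, source, ttr):
        self.source = source
        self.ttr = ttr
        self._cached = None
        self._fetched_at = None

    def read(self, now):
        expired = self._fetched_at is None or (now - self._fetched_at) >= self.ttr
        if expired:
            self._cached = self.source.get()
            self._fetched_at = now
        return self._cached


source = Source()
proxy = PullProxy(source, ttr=10.0)

source.value = 1
first = proxy.read(now=0.0)    # nothing cached yet: pulls from the source
source.value = 2
stale = proxy.read(now=5.0)    # within the TTR: serves the old copy
fresh = proxy.read(now=10.0)   # TTR expired: pulls again
```

The read at time 5 returns the old value although the source has changed, which is the weak consistency listed above. A shorter TTR narrows that window at the cost of more polling.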
9
Typical End-to-end Web Site Architecture
[Figure: Users → Web Server Cluster → Application Server Cluster → Data]
10
WS vs. AS
• Web servers
  – Do well-defined and quantifiable local work
    • e.g., processing HTTP headers, serving static content
• Application servers
  – Run multi-layer programs
    • e.g., scripts involving calls to backends
[Figure: Web Switch → Web Server Cluster → Application Server Cluster]
11
Inside the Application Layer: 3-tier model
• PRESENTATION (HTML): JSP, ASP
• BUSINESS LOGIC (Objects): Servlets, COM+, EJB
• DATA CONNECTOR (Row Set): JDBC, ODBC
• Backends: Legacy Systems, Databases
• ADDT’L SERVICES: Commerce, Content Mgt., Personalization
12
Inside the Application Layer…
[Figure: code blocks run in the PRESENTATION, BUSINESS LOGIC, and DATA CONNECTOR (JDBC, ODBC) tiers, backed by Legacy Systems, Databases, and ADDT’L SERVICES (Commerce, Content Mgt., Personalization)]
1. JSP invokes a Servlet
2. Servlet contacts CMS
3. CMS requests data
4. DBMS calls storage system
13
Performance and Scalability Issues
• Computationally-intensive logic executed at multiple tiers
• Cross-tier communication
• Object instantiation and cleanup processing
• External I/O calls
• Database connection pool latencies
• Content conversion and formatting
14
Optimizing the Application Layer: Traditional Means
• Optimize each tier independently:
  – Presentation-level caches built inside application server processes
  – Main-memory database employed over persistent DBMS
  – Persistent object storage techniques employed inside content management systems
  … and so on
[Figure: each tier (PRESENTATION: JSP, ASP; BUSINESS LOGIC: Servlets, COM+, EJB; DATA CONNECTOR: JDBC, ODBC; ADDT’L SERVICES) has its own local cache and optimization code]
15
Query result caching
• Many application server products offer this feature
-- mitigates only local database access latency
-- only a subset of query results may be reused in page generation
-- page fragments may not all be from databases
16
Middle tier database caching
• Caching database tables in main memory
  – Oracle 9i Cache
  – Main-memory databases, e.g., TimesTen
-- mitigates only database access latency
-- caching at table granularity results in poor cache utilization
-- main-memory databases are difficult to integrate and maintain and can be expensive
17
Page Level Caching
• Dynamically generated HTML pages are cached
+ Can completely offload work from web/app server
– Low reusability for highly personalized web pages
– URL may not uniquely identify a page, increasing the risk of delivering incorrect pages
– Often introduces excessive invalidations, e.g., even if a single element on the page changes
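The URL-as-key problem can be illustrated with a small sketch. Everything here (`render_page`, `get_page`, the `/home` URL, the user names) is hypothetical and stands in for a real page generator.

```python
page_cache = {}
render_count = 0


def render_page(url, user):
    """Stand-in for an expensive dynamic page generation script."""
    global render_count
    render_count += 1
    return f"<html>news for {url}; hello {user}</html>"


def get_page(url, user, key_includes_user):
    # Keying on the URL alone ignores the personalized part of the page.
    key = (url, user) if key_includes_user else url
    if key not in page_cache:
        page_cache[key] = render_page(url, user)
    return page_cache[key]


# URL-only key: bob is served the page that was generated for alice.
wrong = [get_page("/home", user, False) for user in ("alice", "bob")]

page_cache.clear()

# (URL, user) key: correct pages, but nothing is shared between users.
right = [get_page("/home", user, True) for user in ("alice", "bob")]
```

With the URL-only key the second request is a cache hit on the wrong page. Adding the user to the key fixes correctness but forces one render per user, which is the low reusability noted above.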
18
Optimizing the Application Layer: Issues
• Traditional techniques impact specific components within the application, but not the entire application
– No mitigation of component-to-component interaction latencies
– Different synchronization and invalidation policies risk data integrity
– Each optimization scheme consumes programmer time for development and maintenance
19
Key ideas
• Re-use program results to eliminate redundant work
• Facilitate single-point, architecture-wide optimization
Apply to both programmatic objects and result fragments
20
Optimizing the Application Layer
[Figure: a single cache sits alongside all tiers (PRESENTATION: JSP, ASP; BUSINESS LOGIC: Servlets, COM+, EJB; DATA CONNECTOR: JDBC, ODBC), which are backed by Legacy Systems, Databases, and ADDT’L SERVICES (Commerce, Content Mgt., Personalization)]
The cache enables the results of programs to be re-used.
21
Usually….
[Figure: code blocks run in the PRESENTATION, BUSINESS LOGIC, and DATA CONNECTOR (JDBC, ODBC) tiers, backed by Legacy Systems, Databases, and ADDT’L SERVICES (Commerce, Content Mgt., Personalization)]
1. JSP invokes a Servlet
2. Servlet contacts CMS
3. CMS requests data
4. DBMS calls storage system
Plus, at each step there are communication delays and logic processing delays
22
Novel Solution…
[Figure: code blocks in the PRESENTATION, BUSINESS LOGIC, and DATA CONNECTOR (JDBC, ODBC) tiers are wrapped in Chutney tags; through an application programming interface, the tags call a real-time storage engine that maps Function and Parameter(s) to Result]
• Tags trigger calls to the storage engine.
• The storage engine can store any program output, but this is most commonly an HTML fragment or a programmatic object.
• When the Result of a Function with a specific Parameter set is already known (and up to date), the work normally necessary to produce that Result is bypassed.
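The Function / Parameter(s) / Result idea can be sketched as a keyed result store. This is an illustration only, not the product's API: `ResultStore`, `call`, `invalidate`, and `headline_fragment` are hypothetical names, and parameters are assumed to be hashable.

```python
class ResultStore:
    """Maps (function, parameters) to a previously computed result."""

    def __init__(self):
        self._results = {}
        self.hits = 0
        self.misses = 0

    def call(self, function, *params):
        key = (function.__name__, params)
        if key in self._results:
            self.hits += 1            # result known: the work is bypassed
            return self._results[key]
        self.misses += 1
        result = function(*params)    # result unknown: do the work once
        self._results[key] = result
        return result

    def invalidate(self, function, *params):
        self._results.pop((function.__name__, params), None)


def headline_fragment(section):
    """Stand-in for a code block doing logic, database calls, and formatting."""
    return f"<div class='headlines'>{section.upper()} headlines</div>"


store = ResultStore()
a = store.call(headline_fragment, "sports")   # miss: the code block runs
b = store.call(headline_fragment, "sports")   # hit: stored fragment is reused
store.invalidate(headline_fragment, "sports")
c = store.call(headline_fragment, "sports")   # miss again after invalidation
```

The stored value can be any program output. An HTML fragment is used here because the slide names it as the most common case.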
23
Code Blocks Perform Work
[Figure: a page generation script is a sequence of code blocks, each followed by a "Write to Out"; each code block performs application logic, database calls, and HTML formatting]
24
Code Blocks <-> Components
[Figure: each code block in the page generation script (code block, Write to Out, ...) produces one component of the web page: Ad Component, Headline Components, Personalized Component, Navigation Component. Example: news content site]
Certain components can be cached
25
DCA: Our Solution
[Figure: in the page generation script, a code block (application logic, database calls, HTML formatting) sits between a start tag and an end tag; a request goes to the Dynamic Content Accelerator, which returns the code block output, so the work is bypassed]
26
DCA in a Typical End-to-end Web Site Architecture
• A single instance of the DCA serves a rack of application servers
• Application servers communicate with DCA through a lightweight API
[Figure: Users → Web Server Cluster → Application Server Cluster → Data, with the Dynamic Content Accelerator attached to the application server cluster]
27
Cache Management
• A critical aspect of any caching solution
• DCA supports novel cache management strategies:
– Prediction-based cache replacement
– Observation-based cache invalidation
28
Cache Replacement
• Prediction-based replacement
  – Fragments having the lowest probability of access are replaced
  – Least-Likely-to-be-Used (LLU)
  – Access probabilities based on:
    • Current user navigational patterns over site graph (in the form of clickstreams)
    • Historical user navigational patterns over site graph (in the form of association rules)
[Figure: site graph News → Sports → Hockey → {Schedules, Scores, Players, Teams}]
(News, Sports, Hockey) → Schedules = 20%
(News, Sports, Hockey) → Players = 15%
(News, Sports, Hockey) → Teams = 10%
(News, Sports, Hockey) → Scores = 55%
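A minimal sketch of LLU victim selection, using the probabilities from the hockey example. The function name `llu_victim` is hypothetical, and the step that derives the probabilities from clickstreams and association rules is assumed to have happened already.

```python
def llu_victim(cached_fragments, access_probability):
    """Return the cached fragment with the lowest predicted access probability."""
    return min(cached_fragments, key=lambda f: access_probability.get(f, 0.0))


# Predicted next-page probabilities for a user at (News, Sports, Hockey).
probabilities = {
    "Schedules": 0.20,
    "Players": 0.15,
    "Teams": 0.10,
    "Scores": 0.55,
}

cache = {"Schedules", "Players", "Teams", "Scores"}
victim = llu_victim(cache, probabilities)
cache.discard(victim)  # make room for a new fragment
```

With these numbers the Teams fragment is replaced first, since it is the least likely to be requested next. An LRU policy would instead look only at past access times.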
29
Cache Invalidation
• DCA supports common cache invalidation techniques:
– Time-based: Each cache element is assigned a TTL
– Event-based: Updates to the database send an invalidation message to the cache
– On demand: Manual invalidation of selected elements
• DCA supports additional invalidation techniques….
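The three common techniques can be sketched in one small cache. `FragmentCache` and its methods are hypothetical names; the event-based path assumes each element records which database tables it was built from.

```python
class FragmentCache:
    def __init__(self):
        self._entries = {}  # key -> (value, expires_at, source tables)

    def put(self, key, value, now, ttl, tables=()):
        self._entries[key] = (value, now + ttl, frozenset(tables))

    def get(self, key, now):
        entry = self._entries.get(key)
        if entry is None:
            return None
        value, expires_at, _ = entry
        if now >= expires_at:            # time-based: the TTL has run out
            del self._entries[key]
            return None
        return value

    def on_database_update(self, table):
        # Event-based: an update message names the table that changed.
        stale = [k for k, (_, _, tables) in self._entries.items() if table in tables]
        for key in stale:
            del self._entries[key]

    def invalidate(self, key):
        # On demand: manual invalidation of a selected element.
        self._entries.pop(key, None)


cache = FragmentCache()
cache.put("ad", "<div>ad</div>", now=0, ttl=30)
cache.put("prices", "<table>prices</table>", now=0, ttl=300, tables=["products"])
cache.put("nav", "<ul>nav</ul>", now=0, ttl=300)

before_expiry = cache.get("ad", now=10)
after_expiry = cache.get("ad", now=30)

cache.on_database_update("products")
after_event = cache.get("prices", now=40)

cache.invalidate("nav")
after_manual = cache.get("nav", now=40)
```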
30
Cache Invalidation…
• Other invalidation techniques supported:
  – Observation-based
    • User-initiated updates are observed in scripts; each such update sends an invalidation message to the cache
    • Most appropriate for auction sites, online trading sites
    • Invalidation does not require communication with the databases
  – Keyword-based:
    • Elements can be associated with keywords; e.g., a retailer may wish to invalidate all “seasonal” items
  – Regular expression-based:
    • Elements can be invalidated based on regular expression matching
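A minimal sketch of these three techniques. `TaggedCache`, `place_bid`, and the key layout (`product/...`, `auction/<id>/...`) are hypothetical and chosen only for illustration.

```python
import re


class TaggedCache:
    def __init__(self):
        self._values = {}
        self._keywords = {}

    def put(self, key, value, keywords=()):
        self._values[key] = value
        self._keywords[key] = set(keywords)

    def keys(self):
        return sorted(self._values)

    def _drop(self, keys):
        for key in keys:
            self._values.pop(key, None)
            self._keywords.pop(key, None)
        return len(keys)

    def invalidate_keyword(self, keyword):
        """Keyword-based: drop every element associated with the keyword."""
        return self._drop([k for k, kws in self._keywords.items() if keyword in kws])

    def invalidate_regex(self, pattern):
        """Regular expression-based: drop every element whose key matches."""
        rx = re.compile(pattern)
        return self._drop([k for k in self._values if rx.search(k)])


def place_bid(cache, item_id, amount):
    """Observation-based: the script performing the update invalidates the cache."""
    # The database write for the bid would happen here (omitted in this sketch).
    return cache.invalidate_regex(rf"^auction/{re.escape(str(item_id))}/")


cache = TaggedCache()
cache.put("product/101", "<div>sweater</div>", keywords=["seasonal"])
cache.put("product/102", "<div>scarf</div>", keywords=["seasonal", "sale"])
cache.put("product/200", "<div>desk</div>")
cache.put("auction/7/summary", "<div>summary</div>")
cache.put("auction/7/bids", "<div>bids</div>")
cache.put("auction/8/summary", "<div>summary</div>")

seasonal_dropped = cache.invalidate_keyword("seasonal")
auction_dropped = place_bid(cache, 7, 50)
remaining = cache.keys()
```

In `place_bid` the invalidation is issued by the same script that observes the user's update, so no message from the database is needed.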
31
Performance Study…
Test Site
– Fictitious online retail site, allows browsing of product catalog
– Pages generated using JSP scripts
– Site content stored in Oracle database
– Database schema based on Dublin Core Metadata Open Standard
– Contains 200,000 products and 44,000 categories
– Each page consists of 3 components, each involving a database call
32
Performance Study…
Test Setup
– Content Database Server: Oracle 8.1.6
– Web/Application Server: WebLogic 6.0 running on cluster of 2 machines
– Server machines:
  • have 1 GB RAM, dual P III-933 MHz processors
  • run Windows 2K Advanced Server
33
Testing Methodology...
• Baseline Parameters:
  – Cache size, i.e., percentage of fragments that fit into cache: 75%
  – Cache replacement policy: LLU
• User load is varied by sending requests from client machines running Radview’s WebLoad
• Simulated users navigate site according to Zipf 80-20 distribution (i.e., 80% of users follow 20% of navigation links)
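A simulated user's link choice under an 80-20 rule might look like the sketch below. It is a two-bucket approximation, not a full Zipf sampler and not the WebLoad configuration used in the study; `pick_link` and the link names are hypothetical.

```python
import random


def pick_link(links, rng, hot_fraction=0.2, hot_probability=0.8):
    """Pick from the first 20% of links with probability 0.8, else from the rest."""
    hot_count = max(1, round(len(links) * hot_fraction))
    hot, cold = links[:hot_count], links[hot_count:]
    if not cold or rng.random() < hot_probability:
        return rng.choice(hot)
    return rng.choice(cold)


rng = random.Random(42)  # fixed seed so the run is repeatable
links = [f"/category/{i}" for i in range(10)]
hot_links = set(links[:2])

picks = [pick_link(links, rng) for _ in range(10_000)]
hot_share = sum(p in hot_links for p in picks) / len(picks)
```

Over many requests `hot_share` settles near 0.8, so a cache holding only the popular fragments already serves most of the traffic.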
34
Performance Impact
80% faster response times through existing application infrastructure
Source: Fortune 100 client results
[Figure: average response time (seconds, 0 to 60) versus number of users (0 to 500), non-Chutney vs. Chutney]
35
Chutney Throughput Impact
250% increase in transaction rates
Source: Fortune 100 client results
[Figure: transactions per second (0 to 700) versus number of users (0 to 500), non-Chutney vs. Chutney]
36
Alternative: CDNs
[Figure: Sources → Repositories → Clients; Content Distribution Networks (e.g., Akamai) provide a push-based core infrastructure]
37
Conclusion
• Increased use of dynamic page generation technologies
  => increases load on application servers
  => serious performance and scalability problems for e-business sites
• DCA (Dynamic Content Acceleration)
  => significantly reduces the load on the server-side infrastructure, allows e-business sites to scale
  => significantly outperforms existing middle tier caching solutions
IIT Bombay’s aAQUA Community Forum
Farmers get information and get their questions answered
-- In the local context
-- In their local language
www.aAQUA.org
Capitalizes on existing human and infrastructural resources:
Agri-extension center – KVK, Baramati
NGO – Vigyan Ashram, Pabal
Government – MCIT
39
Access over low bandwidth: Resource Optimization
Resource constraints
• Low/unpredictable bandwidth => disconnected operation/access
Exploit
• caching
• prefetching (through prediction of future needs)
• profiling by user type, location => offline aAQUA
Data characteristics
• Static data – text, images – land records, photos: can be cached/hoarded
• Dynamic data – weather/price information: cached info needs to be refreshed carefully
• Continuous media – VoIP, video data: QoS considerations
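The distinction between hoarded static data and carefully refreshed dynamic data can be sketched as a cache that keeps working while disconnected. This is not the aAQUA implementation: `OfflineCache`, the keys, and the one-hour `max_age` are hypothetical.

```python
class OfflineCache:
    """Serves cached data when disconnected; refreshes aged data when connected."""

    def __init__(self, fetch):
        self._fetch = fetch
        self._store = {}  # key -> (value, fetched_at)

    def get(self, key, now, connected, max_age=None):
        cached = self._store.get(key)
        # max_age=None means static data: a hoarded copy never goes stale.
        fresh = cached is not None and (max_age is None or now - cached[1] <= max_age)
        if connected and not fresh:
            value = self._fetch(key)
            self._store[key] = (value, now)
            return value, "fetched"
        if cached is not None:
            return cached[0], ("cached" if fresh else "stale")
        return None, "unavailable"


fetch_log = []


def fetch(key):
    fetch_log.append(key)
    return f"data for {key}"


cache = OfflineCache(fetch)

# Static data: hoard once, then use offline indefinitely.
r1 = cache.get("land-record/12", now=0, connected=True)
r2 = cache.get("land-record/12", now=10_000, connected=False)

# Dynamic data: a copy older than an hour is flagged or refreshed.
r3 = cache.get("price/onion", now=0, connected=True, max_age=3600)
r4 = cache.get("price/onion", now=7200, connected=False, max_age=3600)
r5 = cache.get("price/onion", now=7200, connected=True, max_age=3600)

# Never fetched and no connectivity: nothing to serve.
r6 = cache.get("weather/pune", now=0, connected=False, max_age=3600)
```

Returning a status next to the value lets the interface tell the user when a price or weather reading is older than intended.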