Energy Aware Lossless Data Compression
Kenneth Barr and Krste Asanović
MIT Laboratory for Computer Science
Kenneth Barr and Krste Asanovic – Mobisys 2003
Energy Aware Lossless Data Compression: Introduction
• Motivation
– Compression can save wireless network energy.
• Observation
– Energy_add < 1 nJ
– Energy_send > 1000 nJ
• Approach
– Can we use 1000 “adds” to eliminate a bit?
– Reconsider slow compressors that perform many operations to achieve the best compression ratios?
– Should we choose the fastest compressor because E = Pt?
Energy Aware Lossless Data Compression: Introduction
• Results
– There’s no easy answer. For minimum energy, one must characterize hardware, software, and workload.
– Energy saved on Skiff
  • Compared to default (zlib-6): 31% (web) to 57% (text) savings
  • Asymmetric strategy can save 11%-12% over the best symmetric pair.
[Figure: file size (network energy) vs. energy to achieve reduced file size]
Energy Aware Lossless Data Compression: Agenda
• Experimental Setup
– Hardware
– Benchmarks
• Observed Energy
– Compression Applications
– Compute, network, memory
• Energy Analysis
– What impacts compression energy?
• Lowering Overall Energy of Transmission
– Understanding cache behavior
– Sleep mode affects choice
– Asymmetric compression
• Conclusion and Future Work
Compaq Personal Server (aka Skiff)
• CPU similar to iPAQ
• Spread out and exposed to facilitate measurement
• Network: Enterasys five volt 802.11b (Cardbus)
Skiff enables power measurement
[Figure: Skiff power-measurement diagram. Blocks: StrongARM SA-110 CPU; flash, DRAM, and memory controller; wireless ethernet card; peripherals (wired ethernet, Cardbus, RS232, clocks, GPIO, et al.). Resistors: Rcpu, Rperi, Rnet, Rmem. Power: 12V DC supply; 3.3V, 5V, and 2V regulators; GND.]
• Measurement
– Three power planes (after cutting traces)
– PC-Card power measured with extender card
– 2 measurements (supply voltage and current) per plane
– 5 x 6.5sec samples at 60Hz sample rate; multimeter controlled via RS-232
• Error
– Missed events are possible due to slow sample rate, but not a problem in practice
– Error sources analyzed in a Compaq tech report
– Total error (hardware + averaging): <1%
– Higher error with simulation-based power estimation, but simulator is useful for instruction and event counts
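The per-plane measurement described above (supply voltage and current, sampled at a fixed rate) can be turned into an energy figure by integrating instantaneous power over time. The following is a minimal sketch, assuming evenly spaced samples; the function name `plane_energy_joules` and its parameters are hypothetical and are not taken from the authors' tooling.

```c
#include <stddef.h>

/* Hypothetical helper: integrate power over evenly spaced samples.
 * volts[i] and amps[i] are the i-th voltage and current readings for one
 * power plane; sample_hz is the sampling rate (60 Hz on the slide).
 * Uses a simple rectangle rule: E = sum(V * I) * dt. */
double plane_energy_joules(const double *volts, const double *amps,
                           size_t n, double sample_hz)
{
    double dt = 1.0 / sample_hz;   /* seconds between samples */
    double energy = 0.0;

    for (size_t i = 0; i < n; i++) {
        energy += volts[i] * amps[i] * dt;   /* P = V * I, E = P * dt */
    }
    return energy;
}
```

Calling it once per power plane and summing the results would give a total for the board, under the same even-sampling assumption.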
Benchmarks
• Workload
– 1MB English text from “Calgary Corpus”
  • A novel and structured bibliography
– 1MB web data from most popular sites (according to “Lycos Top 50” searches and Nielsen NetRatings)
  • No pre-compressed images (gif, jpg) were used
  • Mostly .html, .css, and .js
  • No sites had Java class files
• Compressors
– Represent major algorithms (LZ77, LZ78, PPM, BWT)
– Chosen due to popularity, maturity, documentation, code quality, and portability
  • bzip2 (BWT)
  • Unix compress (LZ78)
  • LZO (“realtime” LZ77)
  • PPMd (PPM)
  • zlib (LZ77)
Compression for portable devices
• Goal: choose a compressor that strikes best balance between compressed file size (~ network energy) and time to achieve that size (~ compute energy)
[Figure: a portable client sends a compressed request (HTTP GET, NFS Read, etc…) to a wall-powered server, which returns a compressed response (HTML Document, source code, etc…)]
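The balance stated in the goal above can be written as a sum: compute energy spent compressing plus network energy for the bytes that remain. The sketch below is illustrative only; the type `compressor_cost`, both function names, and every parameter are assumptions introduced here, not measurements or code from the talk.

```c
/* Illustrative cost of running one compressor on a given input.
 * Both fields are hypothetical inputs that would come from measurement. */
typedef struct {
    double out_fraction;   /* compressed size / original size, in (0, 1] */
    double cpu_joules;     /* compute + memory energy spent compressing */
} compressor_cost;

/* Total energy to send `bytes` of input: compute energy plus network
 * energy for the bytes left after compression. */
double total_send_joules(double bytes, compressor_cost c,
                         double net_joules_per_byte)
{
    return c.cpu_joules + bytes * c.out_fraction * net_joules_per_byte;
}

/* Returns 1 if compressing first costs less than sending the raw data. */
int compression_pays_off(double bytes, compressor_cost c,
                         double net_joules_per_byte)
{
    double raw_joules = bytes * net_joules_per_byte;
    return total_send_joules(bytes, c, net_joules_per_byte) < raw_joules;
}
```

A compressor with a better ratio but a large `cpu_joules` can lose to a weaker, cheaper one, which is the trade-off the goal describes.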
Energy required to receive 1MB text
• Receiving and uncompressing usually saves energy (compared to receiving uncompressed data)
Decompressor   Throughput (Mb/sec)
bzip2              2.59
compress          11.65
LZO              109.44
PPMd               1.42
zlib              41.15
Energy required to send 1MB text
• Compressing prior to sending can actually increase total energy!
• Web data (not shown) is easier to compress and requires less energy than “none” for all except bzip2
Compressor     Throughput (Mb/sec)
bzip2              0.91
compress           3.70
LZO               24.22
PPMd               1.57
zlib               0.82
Large effect of varying parameters
• Parameters: size of input blocks, size of data structures, amount of effort
• Use such a chart to choose best compressor for platform+data combo
Energy per operation: Skiff
• Microbenchmarks verify that computation is cheap…
Component  Operation             Energy (nJ)
CPU        32-bit ADD                 0.86
Network    Send bit (near)          417.00
           Send bit (far)          1095.00
           Receive bit (near)       329.00
           Receive bit (far)        863.00
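The table above implies a break-even budget: the number of 32-bit ADDs that cost as much energy as moving one bit over the radio. A minimal sketch of that arithmetic follows; the macro and function names are made up for illustration, and only the energy values come from the table.

```c
/* Per-operation energies on the Skiff, in nanojoules (from the table). */
#define ADD_NJ           0.86
#define SEND_NEAR_NJ   417.0
#define SEND_FAR_NJ   1095.0
#define RECV_NEAR_NJ   329.0
#define RECV_FAR_NJ    863.0

/* Number of 32-bit ADDs whose energy equals that of one network bit
 * costing bit_nj nanojoules. */
double adds_per_bit(double bit_nj)
{
    return bit_nj / ADD_NJ;
}
```

With these figures, `adds_per_bit(SEND_NEAR_NJ)` is roughly 485 and `adds_per_bit(SEND_FAR_NJ)` is roughly 1273, which is where the "1000 adds per bit" framing comes from. The budget counts ADDs only; it ignores memory traffic.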
Instructions per bit
• We don’t execute an unreasonable number of instructions (though there is quite a variation between applications!)
                                  bzip2  compress  LZO  PPMd  zlib
Compress:
  Instructions per bit removed      116        10    7    76    74
Decompress:
  Instructions per bit restored      31         6    2    10     5

Compress:
  Instructions per bit removed      284         9    2    60    23
Decompress:
  Instructions per bit restored      20         5    1    79     3
Energy per operation: Skiff
• Computation is cheap, cache misses are not.
• By their nature, compressors can have many cache misses.
Component  Operation               Energy (nJ)
CPU        32-bit ADD                   0.86
Network    Send bit (near)            417.00
           Send bit (far)            1095.00
           Receive bit (near)         329.00
           Receive bit (far)          863.00
Memory     Load Hit                     2.72
           Load Miss                  125.00
           Load Miss + Writeback      181.00
           Store Hit                    2.41
           Store Miss                  78.34
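To see why cache misses matter, the per-operation figures above can be combined into a rough energy estimate for a stretch of code. This is a simplified sketch: it assumes every counted instruction costs the same as a 32-bit ADD and counts only loads, and the names `op_counts`, `compute_energy_nj`, and `misses_per_far_bit` are hypothetical.

```c
/* Skiff per-operation energies in nanojoules (from the table). */
#define ADD_NJ          0.86
#define LOAD_HIT_NJ     2.72
#define LOAD_MISS_NJ  125.0
#define SEND_FAR_NJ  1095.0

/* Hypothetical event counts, e.g. taken from a simulator. */
typedef struct {
    unsigned long adds;         /* instructions, each costed as an ADD */
    unsigned long load_hits;
    unsigned long load_misses;
} op_counts;

/* Rough energy estimate in nanojoules for the given event counts. */
double compute_energy_nj(op_counts c)
{
    return c.adds * ADD_NJ
         + c.load_hits * LOAD_HIT_NJ
         + c.load_misses * LOAD_MISS_NJ;
}

/* Load misses that cost as much as sending one bit at the "far" distance. */
double misses_per_far_bit(void)
{
    return SEND_FAR_NJ / LOAD_MISS_NJ;
}
```

Under these assumptions fewer than nine load misses cost as much as sending a "far" bit, so a compressor that misses often can spend its whole energy budget in the memory system.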
Memory Footprints
• Requiring many memory accesses leads to high energy
• But a large memory footprint can be used wisely (eg, PPMd)
Understanding cache behavior
• Skiff cache is 16KB. No L2 Cache.
– iPAQ cache only 8KB. Cache problems can be exacerbated.
– X-Scale cache is 32KB. May still be a problem for apps tuned for the desktop
• Suggestions for Unix Compress (which apply to other apps)
– A 1K buffer speeds I/O, but cuts into 16KB cache
– It’s not the size of the allocation, it’s how you use it (e.g., a large, sparse hash table gives fewer collisions and thus fewer misses due to probing)
– Merge adjacent tables into a single structure to bring in “code” with “fcode” (the slide’s figure contrasts the original and merged layouts). Merged structure:
struct entry {
    int fcode;            /* 32 bits on the StrongARM */
    unsigned short code;  /* 16 bits; alignment padding follows */
} table[SIZE];
Wasted space due to types and alignment padding
Understanding cache behavior
• Suggestions (continued)
– Compact structures to put more usable data in cache; less wasted space
struct entry {
    signed fcode : 20;    /* 20-bit signed field */
    unsigned code : 12;   /* 12-bit unsigned field; 20 + 12 = 32 bits */
} table[SIZE];
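The effect of compaction can be checked with `sizeof`. The sketch below reuses the two layouts from the slides under the hypothetical names `entry_merged` and `entry_compact` (renamed so both can coexist); the helper functions are also made up for illustration. Bit-field layout is implementation-defined in C, so the exact sizes depend on the compiler and ABI.

```c
#include <stddef.h>

/* Merged layout from the slide (renamed here for illustration). */
struct entry_merged {
    int fcode;
    unsigned short code;
};

/* Compact bit-field layout from the slide (renamed here for illustration). */
struct entry_compact {
    signed fcode : 20;
    unsigned code : 12;
};

size_t merged_size(void)  { return sizeof(struct entry_merged); }
size_t compact_size(void) { return sizeof(struct entry_compact); }

/* How many whole table entries fit in a cache of cache_bytes. */
size_t entries_in_cache(size_t cache_bytes, size_t entry_bytes)
{
    return cache_bytes / entry_bytes;
}
```

On common ABIs with a 32-bit `int`, the merged entry typically occupies 8 bytes and the compact one 4, so a 16KB cache would hold about twice as many compact entries. The trade-off is range: a 20-bit signed field and a 12-bit unsigned field only suit tables whose values fit those widths.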
Understanding cache behavior: results
• Merging tables has little effect
• Sparse arrays have dramatic effect even though logical table is much larger than cache
• Compacting array removes 92% of cache misses from 11-merge
– Not much energy left to be saved
– But, program runs 1.5 times faster
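The sparse-table effect can be illustrated by counting the extra probes that collisions cause at different table sizes. The sketch below uses linear probing with a multiplicative hash as a simple stand-in; it is not necessarily the hashing or probing scheme Unix compress uses, and the function name, the key generator, and the constants are all assumptions made for the example.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Insert n_keys pseudo-random keys into an open-addressing table with
 * 2^log2_slots slots and return how many extra probes collisions caused.
 * Requires 1 <= log2_slots <= 31 and n_keys < number of slots. */
unsigned long count_collision_probes(unsigned log2_slots, unsigned n_keys)
{
    size_t slots = (size_t)1 << log2_slots;
    unsigned long extra_probes = 0;
    uint32_t key = 12345u;                       /* arbitrary seed */
    uint32_t *table;

    if (n_keys >= slots) return 0;               /* table would fill up */
    table = calloc(slots, sizeof *table);        /* 0 marks an empty slot */
    if (table == NULL) return 0;

    for (unsigned i = 0; i < n_keys; i++) {
        key = key * 1664525u + 1013904223u;      /* simple LCG key stream */
        uint32_t k = key | 1u;                   /* keep keys nonzero */
        size_t idx = (size_t)((k * 2654435761u) >> (32 - log2_slots));

        while (table[idx] != 0 && table[idx] != k) {
            idx = (idx + 1) & (slots - 1);       /* linear probe, wrap */
            extra_probes++;                      /* one more slot touched */
        }
        table[idx] = k;
    }
    free(table);
    return extra_probes;
}
```

Comparing, say, `count_collision_probes(12, 3000)` with `count_collision_probes(14, 3000)` shows the sparser table needing fewer extra probes; each probe avoided is a memory access that might otherwise have missed in the cache.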
Asymmetric Compression
• No need for the same compression method in both directions
– Client compresses its requests using its lowest-energy compressor
– Server supplies data (transcoding if necessary) so that client requires minimal energy to decompress
• Server can maintain state for a flow as it may be hard to compress individual small blocks
[Figure: a portable client sends a compressed request (HTTP GET, NFS Read, etc…) to a wall-powered server, which returns a compressed response (HTML Document, source code, etc…)]
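Asymmetric selection amounts to two independent minimizations over client-side energy, one per direction. A minimal sketch follows, assuming per-codec energy figures are already known from measurement; the type `codec_cost` and both function names are hypothetical.

```c
#include <stddef.h>

/* Hypothetical per-codec energy figures, measured on the client. */
typedef struct {
    const char *name;
    double client_compress_j;    /* compress and send a request */
    double client_decompress_j;  /* receive and decompress a response */
} codec_cost;

/* Index of the codec that costs the client least when sending (n > 0). */
size_t best_for_sending(const codec_cost *codecs, size_t n)
{
    size_t best = 0;
    for (size_t i = 1; i < n; i++) {
        if (codecs[i].client_compress_j < codecs[best].client_compress_j)
            best = i;
    }
    return best;
}

/* Index of the codec that costs the client least when receiving (n > 0). */
size_t best_for_receiving(const codec_cost *codecs, size_t n)
{
    size_t best = 0;
    for (size_t i = 1; i < n; i++) {
        if (codecs[i].client_decompress_j < codecs[best].client_decompress_j)
            best = i;
    }
    return best;
}
```

When the two functions return different indices, the client would compress requests with one codec and ask the server to respond in another. Server energy is deliberately left out because the server is wall-powered.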
Overall results
• Energy savings over mod_gzip default (e.g., compress12 vs. zlib-6):
– Text: 57%
– Web: 31%
• Asymmetric compression energy savings over best symmetric scheme (e.g., compress12+zlib9 vs. compress12+compress12)
– Text: 11%
– Web: 12%
• Asymmetric energy savings over no compression
– Text: 45%
– Web: 73%
[Figure: energy by combination (compressor + decompressor)]
Exploiting low-power sleep mode
• Idle power will affect choice of compressor on unloaded processor
• Low power idle?
– Getting some work done quickly and going to sleep is best choice
• High idle power?
– It is best to spend time doing a good job; otherwise the platform wastes power while idle
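The idle-power argument above can be made concrete with a fixed time window: the processor is active while compressing and idle for the rest. The sketch below is a simplified model with hypothetical names (`plan`, `window_energy_joules`) and assumes constant active and idle power.

```c
/* One way to handle a transfer within a fixed time window.
 * Both fields are hypothetical inputs. */
typedef struct {
    double active_seconds;   /* time the CPU spends compressing */
    double network_joules;   /* energy to send the resulting bytes */
} plan;

/* Energy over the whole window: active power while compressing, idle
 * power for the remainder, plus the network cost of what gets sent.
 * Assumes active_seconds <= window_s. */
double window_energy_joules(plan p, double window_s,
                            double active_w, double idle_w)
{
    return active_w * p.active_seconds
         + idle_w * (window_s - p.active_seconds)
         + p.network_joules;
}
```

For example, take a fast plan (1 s active, 4 J network) and a slow plan (5 s active, 2 J network) in a 10 s window at 1 W active power. With 0.05 W idle power the fast plan uses less energy; with 0.9 W idle power the slow plan does, because the idle time saved by finishing early is no longer cheap.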
Changing component energy affects choice of compressor
• If CPU and memory decrease in energy while network remains constant?
– Aggressive compression becomes possible, if not better
[Figure: Total Energy as CPU and Memory Energy Decrease. x-axis: Network Energy / Average CPU+Memory Energy (10 to 105); y-axis: Joules (0.00 to 6.00); series: bzip2, compress, lzo, ppmd, zlib]
• If network improves while CPU and memory remain constant?
– Little change in choice
– All files compress to same order of magnitude; energy dominated by CPU and memory of compressor
[Figure: Total Energy as Network Energy Decreases. x-axis: Network Energy / CPU + Memory Energy (10 down to 0); y-axis: Joules (0.00 to 6.00); series: bzip2, compress, lzo, ppmd, zlib]
Related work
• Using sophisticated error correcting codes can reduce the number of bits to send, but processing the codes can outweigh the energy savings
– Energy efficiency of error correction on wireless links (Havinga 1999)
• Energy efficient lossy compression: recast the problem or trade energy for quality
– CMU Odyssey (Satyanarayanan et al. 1994-2000)
– Algorithmic transforms for efficient scalable computation (Sinha et al. 2000)
– Adaptive image compression for wireless multimedia communication (Taylor and Dey 2001)
• Recognize the importance of low-power idle mode
– Critical power slope (Miyoshi et al. 2002)
• Many other compression and optimization techniques
– Several noted in my Master’s Thesis (Barr 2002)
Conclusion & Future Work
• Conclusions
– Compression to save transmission energy is not always a net win. Default compressor can double send energy!
– The fastest compressor is not always best; the smallest file is not always best.
– However, knowledge of component energy and input data combined with wise choice of algorithms and parameters can give large energy savings:
• Up to 57% over default scheme
• Up to 12% over optimal symmetric scheme
• Future work
– Developing a hardware energy profiler for iPAQ that fits on a PC-Card to measure energy portably in an active system. Use its findings to choose the best application, or to change dynamically.
– Explore further implementation tweaks for cache-friendly behavior on portable systems.
Backup
• Compression ratio? Text vs web?
– See paper
• Why not compress on the NIC?
– Regardless, same set of tradeoffs
– Higher bandwidth links -> less need.
– Multiple flows mean less correlation
– Better ratios at the application layer (application-specific compression can be employed, large context can be maintained).
• Applications
– Difficult for interactive or small packet traffic
– If you have the choice over what format to receive (eg, bzip2? No!)
– Room full of conference attendees sharing an access point