TRANSCRIPT
Cell Broadband Engine
© 2006 IBM Corporation
Chip Multiprocessing and the Cell Broadband Engine
Michael Gschwind
ACM Computing Frontiers 2006
Cell Broadband Engine
© 2006 IBM Corporation · Chip Multiprocessing and the Cell Broadband Engine · M. Gschwind, Computing Frontiers 2006
Computer Architecture at the turn of the millennium

“The end of architecture” was being proclaimed
– Frequency scaling as the performance driver
– State-of-the-art microprocessors:
  • multiple instruction issue
  • out-of-order execution
  • register renaming
  • deep pipelines

Little or no focus on compilers
– Questions about the need for compiler research
Academic papers focused on microarchitecture tweaks
The age of frequency scaling
Dennard constant-field scaling:
– Scaling: voltage V/α, oxide thickness tox/α, wire width W/α, gate length L/α, diffusion depth xd/α, substrate doping α·NA
– Results: higher density ~α², higher speed ~α, power per circuit ~1/α², power density ~constant

[Figure: cross-section of a scaled MOSFET, with gate, n+ source/drain, wiring, and substrate dimensions each reduced by α. Source: Dennard et al., JSSC 1974.]
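The scaling results above can be checked with a few lines of arithmetic; the value of α below is purely illustrative:

```python
# Toy check of Dennard's constant-field scaling results.
# alpha is the dimensionless scaling factor (illustrative value).
alpha = 1.4

# Each device shrinks by alpha in both dimensions -> density grows as alpha^2.
density_gain = alpha ** 2

# Gate delay shrinks by alpha -> circuit speed grows as alpha.
speed_gain = alpha

# Voltage and current both scale by 1/alpha -> power per circuit falls as 1/alpha^2.
power_per_circuit = 1 / alpha ** 2

# Power density = (power per circuit) x (circuits per area): stays constant.
power_density = power_per_circuit * density_gain

print(density_gain, speed_gain, power_per_circuit, power_density)
```

The last line is the key result: the α² density gain and the 1/α² power reduction per circuit cancel exactly, which is why classic scaling kept power density flat.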
[Figure: performance (BIPS) vs. total FO4 per pipeline stage; performance rises toward the left of the plot, i.e. as pipelines get deeper (fewer FO4 per stage).]
Frequency scaling
A trusted standby
– Frequency as the tide that raises all boats
– Increased performance across all applications
– Kept the industry going for a decade
Massive investment to keep going
– Equipment cost
– Power increase
– Manufacturing variability
– New materials
… but reality was closing in!
[Figure: BIPS and BIPS³/W, relative to their optimal-FO4 values, vs. total FO4 per stage; the power-performance optimum lies at a much shallower pipeline than the raw performance optimum.]

[Figure: active power and leakage power vs. gate length (log-log axes); leakage grows to rival active power as devices shrink.]
Power crisis
[Figure: classic scaling trends vs. gate length Lgate (µm, log-log): oxide thickness Tox (Å), Vdd, and Vt; actual Vdd and Vt flattened out instead of following classic scaling.]

The power crisis is not “natural”
– Created by deviating from ideal scaling theory
– Vdd and Vt not scaled by α
  • additional performance with increased voltage

Laws of physics
– Pchip = ntransistors × Ptransistor

Marginal performance gain per transistor is low
– significant power increase
– decreased power/performance efficiency
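The slide's power law can be made concrete with the standard dynamic-power model, P = n · C · V² · f; the numbers below are illustrative assumptions, not measurements:

```python
# Sketch of why deviating from ideal scaling blows up chip power:
# dynamic power per transistor ~ C * Vdd^2 * f, and chip power is that
# times the transistor count. All numbers are illustrative.
def chip_power(n_transistors, c_farads, vdd, freq_hz):
    """Dynamic chip power in watts: P_chip = n * C * Vdd^2 * f."""
    return n_transistors * c_farads * vdd ** 2 * freq_hz

# Ideal scaling: Vdd scaled down along with the devices.
p_scaled = chip_power(2e8, 1e-15, 0.9, 3.2e9)

# Deviation: keep Vdd high to buy extra frequency.
p_unscaled = chip_power(2e8, 1e-15, 1.2, 4.0e9)

print(p_unscaled / p_scaled)  # more than 2x the power for ~25% more frequency
```

Because voltage enters squared, a modest voltage bump multiplies power far faster than it multiplies performance, which is exactly the "marginal performance gain per transistor is low" point above.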
The power inefficiency of deep pipelining
[Figure: BIPS and BIPS³/W, each relative to its value at the optimal FO4 point, vs. total FO4 per stage; the performance-optimal design point is a much deeper pipeline than the power-performance-optimal one. Source: Srinivasan et al., MICRO 2002.]

Deep pipelining increases the number of latches and the switching rate
⇒ power increases with at least f²

Latch insertion delay limits the gains of pipelining
Hitting the memory wall
Latency gap
– Memory speedup lags behind processor speedup
– Limits ILP

Chip I/O bandwidth gap
– Less bandwidth per MIPS

Latency gap becomes an application bandwidth gap
– Typically (much) less than chip I/O bandwidth
[Figure: normalized growth from 1990 to 2003 of MFLOPS vs. memory latency; compute throughput grew far faster than memory latency improved. Sources: McKee, Computing Frontiers 2004; Burger et al., ISCA 1996.]

usable bandwidth = (no. of in-flight requests × avg. request size) / round-trip latency
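The slide's relation (usable bandwidth = in-flight requests × average request size / round-trip latency, a Little's-law style bound) is easy to evaluate; the request count, line size, and latency below are illustrative assumptions:

```python
# Usable memory bandwidth as limited by outstanding requests and latency:
# usable BW = (in-flight requests * avg request size) / round-trip latency.
def usable_bandwidth(inflight, request_bytes, roundtrip_s):
    return inflight * request_bytes / roundtrip_s

# Illustrative numbers: 8 outstanding 128-byte cache lines, 150 ns round trip.
bw = usable_bandwidth(8, 128, 150e-9)
print(bw / 1e9, "GB/s")  # far below a chip's raw I/O peak
```

The formula shows the two levers a design can pull: bigger transfers and more of them in flight, which is exactly the direction the Cell memory subsystem takes.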
Our Y2K challenge
10× performance of desktop systems
1 TeraFlop / second with a four-node configuration
1 Byte bandwidth per 1 Flop – “golden rule for balanced supercomputer design”
scalable design across a range of design points
mass-produced and low cost
Cell Design Goals
Provide the platform for the future of computing
– 10× the performance of desktop systems shipping in 2005

Computing density as the main challenge
– Dramatically increase performance per X
  • X = area, power, volume, cost, …

Single-core designs offer diminishing returns on investment
– In power, area, design complexity, and verification cost
Necessity as the mother of invention
Increase power-performance efficiency
– Simple designs are more efficient in terms of power and area

Increase memory subsystem efficiency
– Increase data transaction size
– Increase the number of concurrently outstanding transactions
⇒ Exploit a larger fraction of chip I/O bandwidth

Use CMOS density scaling
– Exploit density instead of frequency scaling to deliver increased aggregate performance

Use compilers to extract parallelism from applications
– Exploit application parallelism to translate aggregate performance into application performance
Exploit application-level parallelism
Data-level parallelism
– SIMD parallelism improves performance with little overhead

Thread-level parallelism
– Exploit application threads with a multi-core design approach

Memory-level parallelism
– Improve memory access efficiency by increasing the number of parallel memory transactions

Compute-transfer parallelism
– Transfer data in parallel with execution by exploiting application knowledge
Concept phase ideas
– Chip multiprocessor
– Efficient use of the memory interface
– Large architected register set
– High frequency and low power!
– Reverse Vdd scaling for low power
– Modular design, design reuse
– Control the cost of coherency
Cell Architecture
Heterogeneous multi-core system architecture
– Power Processor Element for control tasks
– Synergistic Processor Elements for data-intensive processing

Synergistic Processor Element (SPE) consists of
– Synergistic Processor Unit (SPU)
– Synergistic Memory Flow Control (SMF)
[Block diagram: the PPE (PPU with PXU, L1, and L2; 64-bit Power Architecture with VMX) and eight SPEs (each an SPU with SXU and Local Store, plus an SMF) attach to the Element Interconnect Bus (EIB, up to 96B/cycle). The MIC connects dual XDR™ memory and the BIC connects FlexIO™ I/O. The SPEs and the memory and I/O interfaces connect at 16B/cycle, the L2 at 32B/cycle.]
Shifting the Balance of Power with Cell Broadband Engine
Data processor instead of control system
– Control-centric code stable over time
– Big growth in data processing needs
  • Modeling
  • Games
  • Digital media
  • Scientific applications

Today’s architectures are built on a 40-year-old data model
– Efficiency as defined in 1964
– Big overhead per data operation
– Parallelism added as an afterthought
Powering Cell – the Synergistic Processor Unit
[Block diagram of the SPU: instruction fetch and the instruction line buffer (ILB) feed issue/branch logic that issues 2 instructions per cycle; a unified vector register file (VRF) serves the permute unit (PERM), load/store unit (LSU), vector floating-point unit (VFPU), and vector fixed-point unit (VFXU); the local store is a single-port SRAM with wide (16B to 128B) access paths, and the SMF attaches to it. Source: Gschwind et al., Hot Chips 17, 2005.]
Density Computing in SPEs
Today, execution units are only a fraction of core area and power
– The bigger fraction goes to other functions
  • Address translation and privilege levels
  • Instruction reordering
  • Register renaming
  • Cache hierarchy

Cell changes this ratio to increase performance per area and power
– Architectural focus on data processing
  • Wide datapaths
  • More, and wider, architectural registers
  • Data privatization and a single-level processor-local store
  • All code executes in a single (user) privilege level
  • Static scheduling
Streamlined Architecture

Architectural focus on simplicity
– Aids achievable operating frequency
– Optimize circuits for the common performance case
– Compiler aids in layering traditional hardware functions
– Leverage 20 years of architecture research

Focus on statically scheduled data parallelism
– Focus on data-parallel instructions
  • No separate scalar execution units
  • Scalar operations mapped onto the data-parallel dataflow
– Exploit wide data paths
  • data processing
  • instruction fetch
– Address impediments to static scheduling
  • Large register set
  • Reduce latencies by eliminating non-essential functionality
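Mapping scalar operations onto the data-parallel dataflow ("scalar layering") can be sketched in a few lines; the 4-wide lists and slot-0 convention below are illustrative stand-ins for 128-bit SIMD registers and a preferred slot, not the actual SPU mechanism:

```python
# Toy model of scalar layering: the machine has only a SIMD adder, and a
# scalar add is executed by occupying one lane of it. Purely illustrative.
WIDTH = 4  # lanes per vector register

def simd_add(a, b):
    """The only adder in the model: element-wise over WIDTH lanes."""
    return [x + y for x, y in zip(a, b)]

def splat(scalar):
    """Place a scalar into a vector; the extra lanes carry don't-care copies."""
    return [scalar] * WIDTH

def scalar_add(x, y):
    """Scalar add layered onto the SIMD datapath: compute in all lanes,
    read the result back from the preferred slot (slot 0)."""
    return simd_add(splat(x), splat(y))[0]

print(scalar_add(3, 4))       # 7, computed on the vector adder
print(simd_add([1, 2, 3, 4], [10, 20, 30, 40]))  # [11, 22, 33, 44]
```

The point of the design choice is visible in the model: no second (scalar) adder exists, so no area or transfer penalty separates scalar from SIMD code.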
SPE Highlights

RISC organization
– 32-bit fixed-length instructions
– Load/store architecture
– Unified register file

User-mode architecture
– No page translation within the SPU

SIMD dataflow
– Broad set of operations (8, 16, 32, 64 bit)
– Graphics SP-Float
– IEEE DP-Float

256KB Local Store
– Combined I & D

DMA block transfer
– using Power Architecture memory translation
[Annotated SPE die photo, 14.5mm² in 90nm SOI, showing the local store arrays (LS), register file (GPR), odd and even fixed-point units (FXU ODD/EVN), floating-point units (SFP, DP), control logic, channel interface, DMA, SMM, ATO, and bus interface logic. Source: Kahle, Spring Processor Forum 2005.]
Synergistic Processing

[Concept diagram: compute density emerges from mutually reinforcing design choices: data-parallel select, static prediction, a simpler microarchitecture, sequential fetch, a single-port local store, high frequency, a shorter pipeline, large basic blocks, deterministic latency, a large register file, instruction bundling, DLP, wide data paths, static scheduling, scalar layering, and ILP optimization.]
Efficient data-sharing between scalar and SIMD processing
Legacy architectures separate scalar and SIMD processing
– Data sharing between SIMD and scalar processing units is expensive
  • Transfer penalty between register files
– Defeats the data-parallel performance improvement in many scenarios

A unified register file facilitates data sharing for efficient exploitation of data parallelism
– Allows exploitation of data parallelism without a data-transfer penalty
  • Data parallelism is then always an improvement
SPU Pipeline

Stage legend: IF instruction fetch, IB instruction buffer, ID instruction decode, IS instruction issue, RF register file access, EX execution, WB write back.

Front end (shared): IF1 IF2 IF3 IF4 IF5 → IB1 IB2 → ID1 ID2 ID3 → IS1 IS2 → RF1 RF2

Back end, by instruction class:
– Fixed-point instruction: EX1 EX2 → WB
– Permute instruction: EX1 EX2 EX3 EX4 → WB
– Load/store instruction: EX1 EX2 EX3 EX4 EX5 EX6 → WB
– Floating-point instruction: EX1 EX2 EX3 EX4 EX5 EX6 → WB
– Branch instruction
SPE Block Diagram

[Block diagram: the instruction issue unit / instruction line buffer and register file (with result forwarding and staging) feed the permute, load-store, floating-point, fixed-point, branch, and channel units. The local store (256kB, single-port SRAM) provides 128B read and 128B write ports; the DMA unit connects it to the on-chip coherent bus. Datapath widths range from 8 to 128 bytes per cycle.]
Synergistic Memory Flow Control
[Block diagram: the SMF comprises a DMA engine with a DMA queue, an MMU with RMT and atomic facility, MMIO and load/store translation, and bus interface control, connecting the SPU and local store to the data, snoop, and control buses.]

SMF implements memory management and mapping

SMF operates in parallel to the SPU
– Independent compute and transfer
– Command interface from the SPU
  • DMA queue decouples SMF & SPU
  • MMIO interface for remote nodes

Block transfer between system memory and local store

SPE programs reference system memory using the user-level effective address space
– Ease of data sharing
– Local store to local store transfers
– Protection
Synergistic Memory Flow Control
Bus interface
– Up to 16 outstanding DMA requests
– Requests up to 16 KByte
– Token-based bus access management

Element Interconnect Bus
– Four 16-byte data rings
– Multiple concurrent transfers
– 96B/cycle peak bandwidth
– Over 100 outstanding requests
– 200+ GByte/s @ 3.2+ GHz

Source: Clark et al., Hot Chips 17, 2005
[Diagram: the EIB’s four 16-byte rings and central data arbiter connect twelve elements: SPE0–SPE7, the PPE, the MIC, BIF/IOIF0, and IOIF1.]
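The limits on the slide (DMA requests up to 16 KByte, up to 16 outstanding per SPE) imply that a large block transfer is issued as a series of requests. A sketch of the splitting step, with illustrative names (this is not the real SMF command interface):

```python
# Split a large block transfer into DMA-sized requests, respecting the
# 16 KB per-request limit stated on the slide. Names are illustrative.
MAX_REQUEST = 16 * 1024  # bytes per DMA request

def split_dma(ea_start, total_bytes):
    """Yield (effective_address, size) pairs, each at most MAX_REQUEST."""
    offset = 0
    while offset < total_bytes:
        size = min(MAX_REQUEST, total_bytes - offset)
        yield ea_start + offset, size
        offset += size

requests = list(split_dma(0x10000, 100 * 1024))
print(len(requests))   # a 100 KB transfer becomes 7 requests
print(requests[-1])    # the last request carries the 4 KB remainder
```

In practice the requests would also be issued in groups bounded by the 16-entry DMA queue, so the whole transfer stays pipelined without stalling the SPU.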
Memory efficiency as key to application performance
Greatest challenge is translating peak into application performance
– Peak Flops are useless without a way to feed them data

A cache miss provides too little data, too late
– Inefficient for streaming / bulk data processing
– Initiates the transfer only when its results are already needed
– Application-controlled data fetch avoids not-on-time data delivery

SMF is a better way to look at an old problem
– Fetch data blocks based on the algorithm
– Blocks reflect application data structures
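Fetching blocks ahead of use is classically done with double buffering: compute on one buffer while the next block is in flight. A toy Python loop sketches the structure; `dma_fetch` is an illustrative stand-in for an asynchronous SMF transfer, not a real API:

```python
# Toy double-buffering loop: overlap fetching block i+1 with computing
# on block i. dma_fetch stands in for an asynchronous DMA get.
def dma_fetch(data, block):
    return data[block]

def process(blocks):
    out = []
    nxt = dma_fetch(blocks, 0)            # prime the first buffer
    for i in range(len(blocks)):
        cur = nxt                         # block already "in local store"
        if i + 1 < len(blocks):
            nxt = dma_fetch(blocks, i + 1)  # transfer overlaps compute
        out.append(sum(cur))              # compute on the current block
    return out

print(process([[1, 2], [3, 4], [5, 6]]))  # [3, 7, 11]
```

The important property is that, after the priming fetch, the compute step never waits for data that was not requested well in advance.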
Traditional memory usage
Long latency memory access operations exposed
Cannot overlap 500+ cycles with computation
Memory access latency severely impacts application performance
[Timeline: execution alternates between computation, memory protocol overhead, memory idle time, and memory contention, with most of the time spent waiting on memory.]
Exploiting memory-level parallelism

Reduce the performance impact of memory accesses through concurrent accesses

Carefully scheduled memory accesses in numeric code

Out-of-order execution increases the chance of discovering more concurrent accesses
– But overlapping a 500-cycle latency with computation using OoO is illusory

Bandwidth limited by queue size and round-trip latency
Exploiting MLP with Chip Multiprocessing

More threads ⇒ more memory-level parallelism
– overlap accesses from multiple cores
A new form of parallelism: CTP
Compute-transfer parallelism
– Concurrent execution of compute and transfer increases efficiency
  • Avoids costly and unnecessary serialization
– An application thread has two threads of control
  • SPU computation thread
  • SMF transfer thread

Optimize memory access and data transfer at the application level
– Exploit programmer and application knowledge about data access patterns
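The two-threads-of-control idea can be modeled with ordinary host threads: one thread plays the SMF (delivering blocks), while the other computes on blocks as they arrive. All names and the queue-based handoff below are illustrative, not the Cell programming interface:

```python
# Toy model of compute-transfer parallelism: a "transfer" thread feeds a
# bounded queue while the "compute" thread consumes from it.
import queue
import threading

def smf_thread(blocks, q):
    for b in blocks:
        q.put(b)       # stands in for a DMA delivering a block
    q.put(None)        # sentinel: no more data

def spu_thread(q, results):
    while (b := q.get()) is not None:
        results.append(sum(b))   # compute overlaps the remaining transfers

q, results = queue.Queue(maxsize=2), []
blocks = [[1, 2], [3, 4], [5, 6]]
t = threading.Thread(target=smf_thread, args=(blocks, q))
t.start()
spu_thread(q, results)
t.join()
print(results)  # [3, 7, 11]
```

The bounded queue mirrors the finite DMA queue: the transfer side runs ahead of the compute side, but only by a fixed number of blocks.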
Memory access in Cell with MLP and CTP

Super-linear performance gains observed
– decouple data fetch and use
Synergistic Memory Management

[Concept diagram: high throughput and timely data delivery emerge from mutually reinforcing choices: multicore operation, many outstanding transfers, sequential fetch, shared I & D, reduced miss complexity, parallel compute & transfer, improved code generation, decoupled fetch and access, explicit data transfer, storage density, a single-port local store, application control, reduced coherence cost, low-latency access, block fetch, and a deeper fetch queue.]
Cell BE Implementation Characteristics
241M transistors, 235mm²

Design operates across a wide frequency range
– Optimize for power & yield

> 200 GFlops (SP) @ 3.2 GHz
> 20 GFlops (DP) @ 3.2 GHz
Up to 25.6 GB/s memory bandwidth
Up to 75 GB/s I/O bandwidth
100+ simultaneous bus transactions
– 16+8 entry DMA queue per SPE
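The single-precision figure is consistent with a simple back-of-envelope product; the per-unit breakdown below (8 SPEs, 4-wide SIMD, fused multiply-add counted as 2 flops) is an assumption for illustration, not stated on the slide:

```python
# Back-of-envelope check of the "> 200 GFlops (SP)" peak number.
spes = 8              # SPEs contributing SP flops (assumed)
simd_width = 4        # 32-bit lanes per 128-bit SIMD operation
flops_per_fma = 2     # fused multiply-add counts as two flops
freq = 3.2e9          # Hz

peak_sp = spes * simd_width * flops_per_fma * freq
print(peak_sp / 1e9, "GFlops")  # 204.8, i.e. "> 200 GFlops (SP)"
```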
[Plot: relative frequency increase vs. power consumption as supply voltage rises; power grows much faster than frequency. Source: Kahle, Spring Processor Forum 2005.]
Cell Broadband Engine
System configurations
[Diagram: the same Cell design scales across system configurations: a single Cell BE processor with dual XDR™ memory and an IOIF; two processors coupled directly over the BIF; four processors joined through a switch on the BIF; and a single processor with two I/O interfaces (IOIF0 and IOIF1).]

Target systems: game console systems, blades, HDTV, home media servers, supercomputers.
Compiling for the Cell Broadband Engine
The lesson of “RISC computing”
– The architecture provides fast, streamlined primitives to the compiler
– The compiler uses the primitives to implement higher-level idioms
– If the compiler can’t target it, don’t include it in the architecture

Compiler focus throughout the project
– Prototype compiler soon after the first proposal
– The Cell compiler team has made significant advances in
  • Automatic SIMD code generation
  • Automatic parallelization
  • Data privatization

[Diagram: the Cell design balances programmability, programmer productivity, and raw hardware performance.]
Single source automatic parallelization

Single source compiler
– Address legacy code and programmer productivity
– Partition a single source program into SPE and PPE functions

Automatic parallelization
– Across multiple threads
– Vectorization to exploit SIMD units

Compiler-managed local store
– Data and code blocking and tiling
– “Software cache” managed by the compiler

Range of programmer input
– Fully automatic
– Programmer-guided
  • pragmas, intrinsics, OpenMP directives

Prototype developed in IBM Research
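The "software cache" idea is that the compiler rewrites loads in SPE code into a lookup that checks a directory in local store and DMAs the line in on a miss. A minimal sketch, with the line size, dictionary-based directory, and function name all illustrative choices rather than the real scheme:

```python
# Minimal sketch of a compiler-managed software cache: loads become a
# directory lookup; a miss "DMAs" a whole line into the local store.
LINE = 128        # bytes per cached line (illustrative)
cache = {}        # tag -> line data (the lines resident in local store)
misses = 0

def sw_cache_load(memory, addr):
    """Return memory[addr], caching LINE-sized blocks on first touch."""
    global misses
    tag = addr // LINE
    if tag not in cache:                  # miss: fetch the whole line
        misses += 1
        base = tag * LINE
        cache[tag] = memory[base:base + LINE]
    return cache[tag][addr % LINE]

mem = list(range(1024))
vals = [sw_cache_load(mem, a) for a in (0, 1, 127, 128, 500)]
print(vals, misses)  # later touches of a fetched line hit in local store
```

A real implementation would also bound the directory and write back dirty lines; the sketch only shows why block granularity turns many loads into local-store hits.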
Compiler-driven performance
Code partitioning
– Thread-level parallelism
– Handle local store size
– “Outlining” into SPE overlays

Data partitioning
– Data access sequencing
– Double buffering
– “Software cache”

Data parallelization
– SIMD vectorization
– Handling of data with unknown alignment
Source: Eichenberger et al., PACT 2005
Raising the bar with parallelism…
Data parallelism and static ILP in both core types
– Results in low overhead per operation

Multithreaded programming is key to great Cell BE performance
– Exploit application parallelism with 9 cores
– Regardless of whether code exploits DLP & ILP
– A challenge regardless of homogeneity/heterogeneity

Leverage parallelism between data processing and data transfer
– A new level of parallelism exploiting bulk data transfer
– Simultaneous processing on SPUs and data transfer on SMFs
– Offers superlinear gains beyond MIPS-scaling
… while understanding the trade-offs

Uniprocessor efficiency is actually low
– Gelsinger’s Law captures historic (performance) efficiency
  • 1.4× performance for 2× transistors
– Marginal uniprocessor (performance) efficiency is 40% (or lower!)
  • And power efficiency is even worse

The “true” bar is marginal uniprocessor efficiency
– A multiprocessor “only” has to beat a uniprocessor to be the better solution
– Much low-hanging fruit to be picked by multithreading applications
  • Embarrassing application parallelism which has not been exploited
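The 40% figure follows directly from the Gelsinger's-Law numbers on this slide: the second half of a doubled transistor budget buys only 0.4× base performance for 1.0× base transistors.

```python
# Marginal uniprocessor efficiency from Gelsinger's Law:
# doubling transistors historically yielded only ~1.4x performance.
perf_gain, transistor_gain = 1.4, 2.0
marginal_efficiency = (perf_gain - 1.0) / (transistor_gain - 1.0)
print(marginal_efficiency)  # ~0.4, i.e. 40%
```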
Cell: a Synergistic System Architecture
Cell is not a collection of different processors, but a synergistic whole
– Operation paradigms, data formats, and semantics are consistent
– Shared address translation and memory protection model

SPEs optimized for efficient data processing
– SPEs share Cell system functions provided by the Power Architecture
– SMF implements the interface to memory
  • Copy in / copy out to local storage

Power Architecture provides system functions
– Virtualization
– Address translation and protection
– External exception handling

The Element Interconnect Bus integrates the system as a data transport hub
A new wave of innovation
CMPs everywhere– same ISA CMPs– different ISA CMPs– homogeneous CMP– heterogeneous CMP– thread-rich CMPs– GPUs as compute engine
Compilers and software must and will follow…
Novel systems– Power4– BOA– Piranha– Cyclops– Niagara– BlueGene– Power5– Xbox360– Cell BE
Conclusion
Established performance drivers are running out of steam

Need a new approach to building high-performance systems
– Application- and system-centric optimization
– Chip multiprocessing
– Understand and manage memory access
– Compilers to orchestrate data and instruction flow

Need a new push on understanding workload behavior

Need renewed focus on compilation technology

An exciting journey is ahead of us!
© Copyright International Business Machines Corporation 2005, 2006. All Rights Reserved. Printed in the United States, May 2006.

The following are trademarks of International Business Machines Corporation in the United States, or other countries, or both: IBM, IBM Logo, Power Architecture.

Other company, product and service names may be trademarks or service marks of others.

All information contained in this document is subject to change without notice. The products described in this document are NOT intended for use in applications such as implantation, life support, or other hazardous uses where malfunction could result in death, bodily injury, or catastrophic property damage. The information contained in this document does not affect or change IBM product specifications or warranties. Nothing in this document shall operate as an express or implied license or indemnity under the intellectual property rights of IBM or third parties. All information contained in this document was obtained in specific environments, and is presented as an illustration. The results obtained in other operating environments may vary.

While the information contained herein is believed to be accurate, such information is preliminary, and should not be relied upon for accuracy or completeness, and no representations or warranties of accuracy or completeness are made.

THE INFORMATION CONTAINED IN THIS DOCUMENT IS PROVIDED ON AN "AS IS" BASIS. In no event will IBM be liable for damages arising directly or indirectly from any use of the information contained in this document.