Innovations in Energy Use: Future Information Technology
TRANSCRIPT
June 18, 2008
S. J. Ben Yoo
UC Davis Campus CITRIS Director
Dept. of Electrical and Computer Engineering, UC Davis
[email protected]
Innovations in Energy Use
Future Information Technology
Next Generation Data Center
Data Centers and Super Computers
MegaWatts of Power
100s of racks
$ for Power and Cooling in Data Centers
Courtesy: IBM research
Today’s Data Center:
> 10 MegaWatts of Power
100s of racks
Data Center on a Cell Phone/PDA
Pico_DataCenter
1 – 4 Watt
200 W – 500 W example Pico_DataCenter Cluster
Data Center on a Chip
Power Consumption in Electronic IP routers (e.g. CISCO CRS-1)
[Pie chart: power breakdown of 7%, 11%, 8.5%, 25%, 3.5%, 10%, 10%, 25% across the components below]
ASICs (assume ~12 x 90 nm devices)
Memories in forwarding engine
Packet buffer memories (DRAM)
I/O (optics/framers/MACs/L2 functions)
Control plane (2 x route processors & control-plane portion of linecards included)
Switch fabric (both the centralized components and the part on the linecards)
Blowers (depends very much on box topology; could be much higher)
Power supply efficiency loss (both the 48 V input supplies and the linecard DC:DC converters included)
Capacity: 320 Gb/s (640 Gb/s LAN)
Power: 10.8 kW
Physical size: 213 x 91 x 60 cm
Data by Garry Epps of CISCO
Conventional system:
- 1,152 slots of 40 Gb/s line cards in 80 shelves (72 linecard shelves and 8 fabric shelves)
- Power consumption: ~0.85 MW
- Weight: 56,656 kg
- Footprint: 50 m²
- Each port protocol-specific, up to OC-768

All-optical system on a chip:
- 1024 x 1024 switching fabric on one semiconductor chip
- Power consumption: ~500 W (200 W)
- Weight: 10 kg
- Footprint: 0.1 m²
- Each port protocol-independent up to OC-768; can achieve packet/burst/circuit switching
[Diagram: conventional racks of line cards, each built from MAC and buffer-memory blocks, versus a 46 Tb/s optical routing system on a chip]
20 nJ/b system vs. 10 pJ/b system
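These energy-per-bit labels can be checked against the numbers above; a quick sanity check (not on the original slide):

$$\frac{0.85\,\text{MW}}{1152 \times 40\,\text{Gb/s}} \approx \frac{8.5 \times 10^{5}\,\text{W}}{4.6 \times 10^{13}\,\text{b/s}} \approx 18\,\text{nJ/b} \;(\sim 20\,\text{nJ/b}), \qquad \frac{500\,\text{W}}{4.6 \times 10^{13}\,\text{b/s}} \approx 11\,\text{pJ/b} \;(\sim 10\,\text{pJ/b})$$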
Overlay Photos: IEEE Spectrum, October 2006
Next Generation Health Care
Q&A
In collaboration with Prof. Bernd Hamann (UCD IDAV) and Tom Nesbitt (UCDMC)
• Real-time 3-D visualization will require ~1 Tb/s bandwidth in the future:
  1000 x 1000 x 1000 pixels/frame x 28 bit/pixel x 30 frames/s ≈ 1 Tb/s
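Written out (a quick verification of the slide's estimate):

$$10^{3} \times 10^{3} \times 10^{3}\,\frac{\text{pixels}}{\text{frame}} \times 28\,\frac{\text{bit}}{\text{pixel}} \times 30\,\frac{\text{frames}}{\text{s}} = 8.4 \times 10^{11}\,\text{b/s} \approx 1\,\text{Tb/s}$$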
Real-time, On-line, Collaborative Healthcare
Chip-Scale Optical Router Micro-system
S. J. B. Yoo, “Ultra-Low Latency Multi-Protocol Optical Routers for the Next Generation Internet,” U.S. Patent 6,925,257 B2 (2005).
S. J. B. Yoo, “Integrated Optical Router,” U.S. Patent 6,768,827 (2004).
S. J. B. Yoo, “Ultra-Low Latency Multi-Protocol Optical Routers for the Next Generation Internet,” U.S. Patent 6,519,062 (2000).
S. J. B. Yoo, “Wavelength Converter with Modulated Absorber,” U.S. Patent 6,563,627 (2001).
S. J. B. Yoo, “Compact Optical Receiver with Optical Signal Processing Capabilities,” U.S. patent pending (2001).
S. J. B. Yoo and G. K. Chang, “High-Throughput, Low-Latency Next Generation Internet Using Optical Tag Switching,” U.S. Patent 6,111,673 (1997).
Moore’s law vs. Processor Performance
Center for Computing of the Future
Overview
Energy Efficient Large-Scale Computing with Nanophotonic Interconnects
S. J. Ben Yoo (UC Davis), Venkatesh Akella (UC Davis), Raj Amirtharajah (UC Davis), Bevan Baas (UC Davis), Keren Bergman (Columbia), Van Carey (UC Berkeley), Shanhui Fan (Stanford), Soheil Ghiasi (UC Davis), James Harris (Stanford), Saif Islam (UC Davis), Michal Lipson (Cornell), Kai Liu (UC Davis), David Miller (Stanford), James Shackelford (UC Davis), and many others
UC Davis, Stanford, UC Berkeley, Cornell, Columbia
[email protected]
530-752-7063
http://sierra.ece.ucdavis.edu
Data Centers and Super Computers
MegaWatts of Power
100s of racks
$ for Power and Cooling in Data Centers
Courtesy: IBM research
Moore’s Law: density, not performance
Ref. “Cramming more components onto integrated circuits” by Gordon Moore, Electronics, Vol. 38, April 19, 1965
Ideal vs. Actual Scaling
Courtesy: IBM research
‘Power Cliff’ and Opportunities
Courtesy: IBM research
Key Challenges
• Performance/watt is not riding the Moore’s law curve
• Gap between peak versus average performance
• Pin count limits the I/O bandwidth
• Leakage and process variation get worse as we go to 32 nm and beyond
• Parallelism is the way forward, but interconnect becomes the key, both on-chip and off-chip
The Rise of Chip Multiprocessors (CMPs)
• Emerging trend: replicate computational logic to maintain processing throughput while lowering clock frequencies and supply voltages.
• Parallel architectures: better processing performance per watt
• Power: the GHz race is grinding to a halt
  $P_{dyn} \propto V_{dd}^{2} \cdot f, \qquad P_{leak} \propto V_{dd} \cdot \exp(-qV_t/(kT))$
  – 2 cores at $V_{dd}/2$ and $f/2$: ~4x reduction in dynamic power at equal performance
• As the number of cores grows, the key to performance is scalable, fast, and power-efficient interconnection networks
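As a worked check of the claim above (the standard CMOS scaling argument, not spelled out on the slide): with $P_{dyn} \propto V_{dd}^{2} f$, replacing one core at $(V_{dd}, f)$ by two cores at $(V_{dd}/2, f/2)$ gives

$$P_{dyn}' \propto 2 \times \left(\frac{V_{dd}}{2}\right)^{2} \cdot \frac{f}{2} = \frac{V_{dd}^{2} f}{4},$$

a ~4x reduction in dynamic power while aggregate throughput $2 \times f/2 = f$ stays the same (assuming the workload parallelizes and the cores still meet timing at the lower voltage).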
Balancing Computing and Communications
• Amdahl’s rule => match computation and communications for best operation.
  – Currently, there is a growing gap between actual computer performance and the theoretical maximum performance rating.
  – Current trends show decreasing Bytes/FLOP.
• A 10 TeraFLOPS chip will need 10 TB/s, or ~100 Tb/s, of communications! (US-wide Internet traffic is ~3 Tb/s today)
• Electronic communications on-chip will no longer keep up with the demand and power-efficiency requirements
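The 10 TeraFLOPS figure follows from the 1 byte/FLOP balance point shown in the design-space figures below:

$$10\,\text{TFLOPS} \times 1\,\frac{\text{byte}}{\text{FLOP}} = 10\,\text{TB/s} = 80\,\text{Tb/s} \sim 100\,\text{Tb/s}$$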
Balancing Bandwidth and FLOPS
Figure: Courtesy Robert Drost,
Stanford IFC Meeting, Dec 2006
Design Space
1 byte per flop
Amdahl’s rule => match computation and communications for best operation.
A 10 TeraFLOPS chip will need 10 TB/s, or ~100 Tb/s, of communications! (US-wide Internet traffic is ~3 Tb/s today)
Bandwidth versus Memory Capacity
Figure: Courtesy Robert Drost,
Stanford IFC Meeting, Dec 2006
PicoBlade Design Space
1 byte per flop
Electrical interconnects => local interconnects
Courtesy: IBM research
Optical interconnects => highway interconnects
Source: IBM
Issues - Next Generation Computing Systems
• Performance/power ratio is a real issue
• Memory I/O bottleneck, power bottleneck
• Optics and electronics can help each other
• Optics is especially good at interconnecting in parallel without impedance, crosstalk, and timing-jitter concerns
• Parallel processing is good, but software, virtualization, and resource management must work
• We need a systematic approach
Imagine:
• A next-generation computing system with
  – Massively parallel ‘pico-blades’ optically interconnected with a massive number of wavelengths
  – 3-D integration of the above on a very compact ‘chip’ by nanophotonics & nanoelectronics
  – Intelligent virtualization and resource management
[Timeline: Today, Phase I, Phase II]
What is Proposed:
• 10-year project aiming at x100~x1000 improvement in power efficiency and miniaturization
• New computing architecture combining optics, electronics, and embedded intelligence
• New virtualization and resource management
• Hardware
  – Micro/nano photonics
  – Nano electronics
  – Non-volatile nano memory
• Thermal engineering
• Systems integration
• Emulation and experimental testbed studies
Proposed Center Thrusts
• Thrust 1: Next Generation Computing System Architecture
• Thrust 2: Resource Management and Virtualization
• Thrust 3: Computing System Enabling Technologies
  – Nano-photonics
  – Nano-electronics
  – Nano-memory
  – Novel materials
• Thrust 4: Systems Integration
  – PicoBlade cluster data center
  – 3-D chip-integrated multicore
  – Hardware-software integration
• Thrust 5: Testbed and Application Studies
  – Emulation & simulation studies
  – Learn from research data center testbeds and try prototypes
    • Partnership with HP Data Center Research Testbed, UC Berkeley Data Center Testbed, LBL Data Center, etc.
  – Healthcare applications
Thrust Leader: Venkatesh Akella (UC Davis, Computer Engineering)
Co-Thrust Leader: Keren Bergman (Columbia Univ., Electrical Engineering)
Project Leaders: S.J. Ben Yoo (UC Davis, Electrical Engineering), Van Carey (UC Berkeley, Mechanical Engineering), Bevan Baas (UC Davis, Computer Engineering), Rajeevan Amirtharajah (UC Davis, Computer Engineering)
Thrust 1: Next Generation Computing System / Data Center Architecture
What’s in a blade?
High Level System View
• PicoBlade composition exploration:
  a) Multi-core CPUs + DRAM
  b) Multi-core CPUs + DRAM + Flash
  c) Multi-core CPUs + DRAM + Flash + energy-efficient hard disk (distributed disk farm, or for caching)
• Example system cluster (see the sketch after this list):
  – PicoBlades interconnected with optical interconnect
  – Energy-efficient computation balanced with high-speed communication
  – Hierarchical optical networking
1 – 4 Watt PicoBlade
200 W – 500 W example cluster of PicoBlades
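A minimal Python sketch of how the quoted power envelopes bound the cluster size. Only the 1-4 W blade and 200-500 W cluster budgets come from the slide; the 20% power overhead for optics and cooling is an assumption.

```python
# Sketch: how many PicoBlades fit in an example cluster power envelope.
# Blade (1-4 W) and cluster (200-500 W) budgets are from the slide;
# the fractional overhead reserved for optics/cooling is assumed.

def blades_per_cluster(cluster_w: float, blade_w: float, overhead: float = 0.2) -> int:
    """Blades that fit once a fractional power overhead is reserved."""
    usable = cluster_w * (1.0 - overhead)
    return int(usable // blade_w)

for cluster_w in (200.0, 500.0):
    for blade_w in (1.0, 4.0):
        print(f"{cluster_w:5.0f} W cluster, {blade_w:.0f} W blades -> "
              f"~{blades_per_cluster(cluster_w, blade_w)} blades")
```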
Optical vs Electrical Interconnects
[Figures (Chen et al., SLIP 2005): delay distribution of a 1 cm optical interconnect; power-delay-product comparison of electrical and optical interconnects for a 1 cm wire; power consumption (mW) of a 1 cm optical datapath; comparison of bandwidth density today]
PicoBlade – Rationale
• Goal: performance/watt ~ 2000 (100x better than HS21 at the same technology node)
• System architecture: an integrated and balanced approach to blade design
  – Processor + I/O + memory, with focus on energy per transaction
• Heterogeneous compute engines
• Codesign of electrical and optical interconnect
  – Electrical local interconnects and optical highway interconnects
  – Where to draw the line between electrical and optical interconnect in the context of 3D chips
  – Optical interconnect between layers in 3D
  – Use of multiple wavelengths
  – Optical clock distribution
• Codesign of interconnect and processor cores
  – Do we need large L2 caches if low-latency, high-bandwidth interconnect is available via optical links?
  – Heterogeneous cores, including FPGA fabrics for some cores
  – Simple cores communicating with a large memory (DRAM) via 3D interconnect technology
Supporting Evidence
1. AsAP (ISSCC 2006, HotChips 2006) - UC Davis
   • 36-processor, 180 nm, scalable GALS array
   • 600+ MHz @ 2.0 V, 600+ MOPS in 0.66 mm²
   • 5-10x performance and 10-75x lower energy than 8-way VLIW TI C62x
   • 167-processor, 1.1 GHz, 65 nm chip in development: working!
2. 3D Die Stacking (MICRO 2006) - Intel
   • 32 MB stacked DRAM reduces cycles/memory access (latency) by 15% to 55% with a 0.08-degree rise in peak temperature
   • Off-chip BW is reduced by 66% on average
   • 3D scaling with 3D partitioning of a Pentium can increase performance by 8% and reduce power by 34%
   • RMS workloads
3. PicoServer (ASPLOS 2006) - Michigan
   • 10x improvement in performance/watt over Pentium 4
   • On the same die area, 12 CPUs with no L2 outperform 8 CPUs with an on-chip cache by 15%, with 55% lower power
   • Tier-1 servers (webserver workloads)
4. MICRO 2006 - Cornell
   • On-chip optical buses (2D, 4-12 wavelengths, Si waveguides, 64-core CMP in 32 nm)
   • 26% - 39% improvement in latency
[Diagram: AsAP single-processor tile with IMem, DMem, and OSC FIFOs]
Example: Optical Switch Fabric
Rapid tuning (~1 ns) of tunable wavelength converters (T_WC) achieves switching in the wavelength, time, and space domains.
Scalable to 42 Petabit/sec capacity with 32 x (256² x 256²) connectivity.
[Diagram: tunable wavelength converters (T_WC) feeding a λ-router (AWGR) followed by fixed wavelength converters (F_WC), under a switch controller; switching illustrated in the TIME, WAVELENGTH, and SPACE domains]
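The 42 Pb/s figure is consistent with the stated connectivity under an assumed per-port line rate of 20 Gb/s (the rate is my assumption, not on the slide):

$$32 \times 256^{2} \approx 2.1 \times 10^{6}\ \text{ports}, \qquad 2.1 \times 10^{6} \times 20\ \text{Gb/s} \approx 42\ \text{Pb/s},$$

which also matches the "2 million x 2 million" scalability note on the following slide.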
Chip-Scale Optical Router Micro-system
S. J. B. Yoo, “Ultra-Low Latency Multi-Protocol Optical Routers for the Next Generation Internet,” U.S. Patent 6,925,257 B2 (2005).
S. J. B. Yoo, “Integrated Optical Router,” U.S. Patent 6,768,827 (2004).
S. J. B. Yoo, “Ultra-Low Latency Multi-Protocol Optical Routers for the Next Generation Internet,” U.S. Patent 6,519,062 (2000).
S. J. B. Yoo, “Wavelength Converter with Modulated Absorber,” U.S. Patent 6,563,627 (2001).
S. J. B. Yoo, “Compact Optical Receiver with Optical Signal Processing Capabilities,” U.S. patent pending (2001).
S. J. B. Yoo and G. K. Chang, “High-Throughput, Low-Latency Next Generation Internet Using Optical Tag Switching,” U.S. Patent 6,111,673 (1997).
Conventional system:
- 1,152 slots of 40 Gb/s line cards in 80 shelves (72 linecard shelves and 8 fabric shelves)
- Power consumption: ~0.85 MW
- Weight: 56,656 kg
- Footprint: 50 m²
- Each port protocol-specific, up to OC-768

All-optical system on a chip (scalable to 2 million x 2 million, 42 Petabit/sec interconnection):
- 1024 x 1024 switching fabric on one semiconductor chip
- Power consumption: ~500 W (200 W)
- Weight: 10 kg
- Footprint: 0.1 m²
- Each port protocol-independent up to OC-768; can achieve packet/burst/circuit switching

[Diagram: conventional racks of line cards, each built from MAC and buffer-memory blocks, versus a 46 Tb/s optical routing system on a chip]
High Level System View
• PicoBlade composition exploration:
  a) Multi-core CPUs + DRAM
  b) Multi-core CPUs + DRAM + Flash
  c) Multi-core CPUs + DRAM + Flash + energy-efficient hard disk (distributed disk farm, or for caching)
• Example system cluster:
  – PicoBlades interconnected with optical interconnect
  – Energy-efficient computation balanced with high-speed communication
  – Research topics: cluster size and inter-cluster communication topology
1 – 4 Watt PicoBlade
200 W – 500 W example cluster of PicoBlades
[Diagram: bus waveguide carrying λ1–λ4 from an off-chip optical comb light source, with per-wavelength modulators, tunable filters, and photodetectors (PD)]
Hierarchical Ring Interconnect Architecture vs. Mesh Interconnect Architecture
[3-D stack diagram: CPU array + caches; optical coherent ring with modulators, off-chip laser, and detectors; memory; heat sink]
3-D nanophotonic-electronic multi-core architectures
CPU array + caches
Optical Interconnect+ modulators+ offchip
Laser+ detectors
Memory
HEAT SINK
Flash DiskCPU array + caches
CPU array + caches
Optical coherent ring+ modulators+ offchip
Laser+ detectors
Memory
HEAT SINK
Hierarchically Networked Pico-Blade Clusters
200 W – 500 W example cluster of PicoBlades
1 – 4 Watt PicoBlade
~1000-core processor
Optically interconnected multi-chip PicoBlade
Optically interconnected PicoBlade cluster
Thrust Leader: Soheil Ghiasi (UC Davis, Computer Engineering)
Project Leader: Venkatesh Akella (UC Davis, Computer Engineering)
Thrust 2: Computing System / Data Center Resource Management and Virtualization
Overall Goal
• Energy-efficient operation
  – Driven by the architecture
• Key architectural features
  – Integrated O/E for energy-efficient high throughput
    » Inter-processor
    » Memory access (static, dynamic, non-volatile)
  – Rethinking the power-hungry memory hierarchy
    » Distributed storage
    » Flattened memory hierarchy
[Diagram: processor arrays alongside an AsAP single-processor tile with IMem, DMem, and OSC FIFOs]
AsAP 1.0 [Baas et al.]: optically integrated building blocks
• Optical interconnect
  – High throughput
  – Energy efficient
• Eliminate energy-hungry L2 and L3 caches
  – Local non-volatile memory
  – Flattened memory hierarchy
• Distributed data storage and discovery?
• Moving data or task migration?
Distributed Data Storage
[Plot: power efficiency (mW/Gbps) vs. bit rate (Gbps), with points near 2.26 and 1.96 mW/Gbps]
[Diagram: bus waveguide with an off-chip light source carrying λ1–λ4 to modulators, a tunable filter, and photodetectors (PD), linking processor arrays]
Wavelength Assignment and Scheduling
• Technology limitations of optical interconnects
  – Full-range “tuning” is expensive in terms of area, energy, and practicality
• Wavelength is a new resource
  – It should be managed judiciously
  – It impacts task assignment and application partitioning
• Does it really restrict the applications we can realize?
  – Minimum wavelengths to admit a given app (see the sketch below)
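A hypothetical Python sketch of the "minimum wavelengths" question (the flows and conflict model are invented for illustration; greedy coloring only upper-bounds the optimum):

```python
# Flows that communicate concurrently over the same bus waveguide must use
# distinct wavelengths; finding the minimum count is a graph-coloring
# problem. Greedy coloring gives an upper bound.

def assign_wavelengths(flows, conflicts):
    """flows: list of flow ids; conflicts: set of frozenset({a, b}) pairs
    that overlap in time on the shared bus. Returns flow -> wavelength index."""
    assignment = {}
    for f in flows:
        used = {assignment[g] for g in flows
                if g in assignment and frozenset({f, g}) in conflicts}
        wl = 0
        while wl in used:          # first free wavelength
            wl += 1
        assignment[f] = wl
    return assignment

flows = ["A->B", "A->C", "B->C", "C->D"]                       # hypothetical app
conflicts = {frozenset(p) for p in [("A->B", "A->C"), ("A->C", "B->C")]}
print(assign_wavelengths(flows, conflicts))
# {'A->B': 0, 'A->C': 1, 'B->C': 0, 'C->D': 0} -> 2 wavelengths admit this app
```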
Power Management
• Energy management
  – State-transition model
  – Markov chain analysis (see the sketch below)
• Scaling to many processors
• Distributed version
  – Little global information
  – Throttle voltage locally, or assign work to a remote processor?
[State diagram: transitions among idle, working, and standby; wavelengths λ1–λ4] [De-Micheli et al.] [Pedram et al.]
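A minimal sketch of the Markov-chain analysis named above, with made-up transition probabilities and per-state powers (the slide only names the idle/working/standby states):

```python
# Steady-state power of a 3-state power-management Markov chain.
# All numbers below are assumptions for illustration.
import numpy as np

states = ["working", "idle", "standby"]
P = np.array([[0.90, 0.08, 0.02],   # per-epoch transition probabilities,
              [0.30, 0.60, 0.10],   # rows sum to 1
              [0.20, 0.10, 0.70]])
power_w = np.array([4.0, 1.0, 0.1])  # assumed W per state for a PicoBlade

# Steady state pi solves pi P = pi with sum(pi) = 1: the left eigenvector
# of P for eigenvalue 1, i.e. an eigenvector of P.T.
w, v = np.linalg.eig(P.T)
pi = np.real(v[:, np.argmin(np.abs(w - 1.0))])
pi /= pi.sum()

print(dict(zip(states, np.round(pi, 3))),
      "average power:", round(float(pi @ power_w), 2), "W")
```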
Managing Heterogeneous Cores
Courtesy of IDF2006
Intelligent Resource Management
Courtesy of IDF2006
Resilient and Load-Balanced Operation
Courtesy of IDF2006
Thrust Leader: Shanhui Fan (Stanford, EE)
Co-Thrust Leader: Rajeevan Amirtharajah (UC Davis, CE)
Project Leaders: S.J. Ben Yoo (UC Davis, EE), David Miller (Stanford, EE), Kai Liu (UC Davis, Physics), Michal Lipson (Cornell, EE), M. Saiful Islam (UC Davis, EE), James Harris (Stanford, EE), Bevan Baas (UC Davis, CE)
Thrust 3: Co-Designed Nano Photonics and Nano Electronics Technology
3-D nanophotonic-electronic multi-core architectures
[3-D stack diagrams: CPU array + caches; optical interconnect layer (coherent ring, modulators, off-chip laser, detectors); memory; flash disk; heat sink]
Optical Comb Source
Flattest optical comb, with 22 modes above -1.7 dBc and 31 modes above -20 dBc.
[Spectrum: relative power (dB) vs. wavelength (1548–1554 nm) showing 31 optical comb lines above -20 dBc]
~Continuum (>1000 ch) Optical Comb Generation
[Spectra: intensity (a.u.) vs. wavelength, 1350–1650 nm overview plus zoomed panels at 1520–1540 nm, 1545–1565 nm, and 1570–1595 nm]
Si Wire Ring Resonator for Low-Power Switching/Routing
[Micrographs, 10 μm scale: racetrack resonator and photonic wire; return bend with ±2 dB loss; ports E1, Y1, E2, Y2]
The coupler ports are related by a 2x2 transfer matrix:

$$\begin{bmatrix} Y_1 \\ Y_2 \end{bmatrix} = \begin{bmatrix} a & b \\ c & d \end{bmatrix} \begin{bmatrix} E_1 \\ E_2 \end{bmatrix}$$

R. Baets et al., LEOS Annual Meeting 2004
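A short Python sketch of the standard all-pass specialization of this transfer-matrix model, where the ring feeds one port back to the other (the self-coupling t and round-trip amplitude a are assumed values, not from the slide):

```python
# All-pass ring resonator: closing the 2x2 coupler with the ring's
# round-trip feedback gives E_out/E_in = (t - a e^{i phi}) / (1 - t a e^{i phi}).
import numpy as np

t, a = 0.95, 0.97                       # assumed coupling and round-trip loss
phi = np.linspace(-np.pi, np.pi, 1001)  # round-trip phase detuning
E_out = (t - a * np.exp(1j * phi)) / (1 - t * a * np.exp(1j * phi))
T = np.abs(E_out) ** 2                  # power transmission vs. detuning

print(f"on-resonance dip: {T.min():.3f}, off-resonance: {T.max():.3f}")
```

Tuning the resonance (e.g., by carrier injection, as in the low-drive-power modulators below) shifts the dip across a wavelength channel, which is the switching/routing mechanism.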
Low drive power optical modulators
Prof. Lipson’s Group: Si Nano PICs with 58 µA @ 1.8 V switching
Courtesy of Axel Scherer, Caltech
Nano Photonic Crystal Waveguides
Plasmonic Devices: Matching Optics and Electronics
[Plot: E_x field vs. position (nm) across a metal-dielectric-metal stack with ε = 1, ε = -125.915, and ε = 10.24; effective index n_eff = 9.744]
Matching Photonic and Electronic Scales
• Optical power in; electrical data imprinted on the waveform by a modulator (CMOS Mach-Zehnder or MQWM)
• Plasmonics couple the optical length scale to the nanodevice scale
• Multiwavelength optical clock delivery can simplify Tx/Rx architecture
  – Multiple phases Φ1–ΦN, one on each wavelength, with very low jitter may eliminate local PLLs and clock recovery
Courtesy of Mark Brongersma, Stanford
Courtesy of M. Saif Islam, UC Davis, HP
Massively Parallel Nanowire Interconnection
• Good ohmic contacts
• Not labor-intensive or costly
• Mass-manufacturable
[Plot: contact noise (A/R, 1e-16 to 1e-10) for Si nanowire vs. carbon nanotube]
Reza, Bosman, Islam, Kamins, Sharma, and Williams, submitted to IEEE Trans. Nanotechnology, 2005
Low-noise contacts: nanotube and nano-bridge, Hooge parameter ~10^-3
MRAMs
[Image: polarons; www.lbl.gov/.../sabl/2005/March/03-polarons.html]
Nanomagnets
Nanoscale architecture: magnetic fingerprints
R. K. Dumas et al., PRB 75, 134405 (2007). Leo Falicov Award, 2006 AVS; 1st place, Margaret Burbidge Award, 2005 APS-CA; Jared Wong, 1st place, Steven Chu Award, 2006 APS-CA
Prof. Kai Liu (UC Davis)
[Images, 100 nm scale: 2D multilayer, 2D+1D, 1D nanowire, multilayered nanowire, 0D nanodots, 0D antidots, 0D core/shell nanoparticles; single-domain and vortex states]
Synthesis: smaller size, higher density, periodic arrays; tunable geometry, orientation, and distribution control; potential applications in spintronics, magnetic recording media, MRAM, ...
Task 3 Summary
• Nanophotonic interconnection plane
  – Si-nanowire switches/routers/modulators
  – Plasmonic waveguide interfaces
  – Plasmonic photonic crystals, AWGs, lasers
  – Nanowire interconnects and lasers
• 3-D interconnects
  – Plasmonic vias
  – Nanowire vias
• Next-generation nonvolatile memories
  – MRAMs
  – Negative-index imaging memory
Thrust Leader: Bevan Baas (UC Davis, CE)
Co-Thrust Leader: M. Saiful Islam (UC Davis, EE)
Project Leaders: S.J. Ben Yoo (UC Davis, EE), David Miller (Stanford, EE), Kai Liu (UC Davis, Physics), Van Carey (UC Berkeley, ME), Rajeevan Amirtharajah (UC Davis, CE)
Thrust 4: System-on-Chip Integration
3-D nanophotonic-electronic multi-core architectures
[3-D stack diagrams: CPU array + caches; optical interconnect layer (coherent ring, modulators, off-chip laser, detectors); memory; flash disk; heat sink]
Thrust 5: Testbed and Application Studies
• Emulation studies of the next-generation data center/supercomputer
• Simulation studies of the next-generation data center/supercomputer
• Learn from research data center testbeds and try out new prototypes
In collaboration with Prof. Bernd Hamann (UCD IDAV) and Tom Nesbitt (UCDMC)
• Real-time 3-D visualization will require ~1 Tb/s bandwidth in the future:
  1000 x 1000 x 1000 pixels/frame x 28 bit/pixel x 30 frames/s ≈ 1 Tb/s
Overlay Photos: IEEE Spectrum, October 2006
Data Centers for Health Care and Telemedicine
Bird’s Eye View of the Schedule
Overlay Photos: IEEE Spectrum, October 2006
Workplans and Deliverables - Phase I
• Phase I: April 2008 - March 2013; Phase II: April 2013 - March 2018
• Year 1:
  – Next-generation data center architecture comparisons and full simulations
  – Virtualization and resource management plans
  – PicoBlade I architecture design
  – Plasmonic, AWG, and ring photonic devices operational
• Year 2:
  – Next-generation data center architecture refinement
  – Virtualization and resource management implementation
  – PicoBlade I architecture prototyping
  – Plasmonic, AWG, and ring photonic devices with electronic interface operational
• Year 3:
  – PicoBlade I completion
  – Ring-based interconnects operational
• Year 4:
  – PicoBlade I cluster with virtualization operational
  – Optical interconnect plane
  – AsAP II integration operational
Workplans and Deliverables - Phase II
• Year 5:
  – PicoBlade II completion
  – Plasmonic, AWG, and ring photonic devices with electronic interface; integration with vertical vias to AsAP III
• Year 6:
  – Next-generation architecture x100 performance/watt completion
• Year 7:
  – Optical interconnect-memory-processor plane design
  – Testbed with PicoBlade II cluster operational
• Year 8:
  – Medical application with PicoBlade II cluster operational
  – Optical interconnect-memory-processor plane co-design
  – PicoBlade III design
• Year 9:
  – Optical interconnect-memory-processor plane integration
• Year 10:
  – Next-generation architecture x1000 performance/watt completion
NanoPhotonic Interconnect Memory Processor Plane Integration
Next Generation Data Centers and Computing Systems
S. J. B. Yoo, V. Akella, R. Amirtharajah, B. Baas, K. Bergman, V. Carey, S. Fan, S. Ghiasi, J. S. Harris Jr., M. S. Islam, M. Lipson, K. Liu, D. A. B. Miller, J. Shackelford, and many others
Figure: J. D. Joannopoulos, P. R. Villeneuve, and S. Fan, Nature, vol. 386, pp. 143 (1997)
A 10-year team effort towards x100~1000 improved performance/power
We appreciate your partnership
What is Proposed:
• 10-year project aiming at x100~x1000 improvement in power efficiency and miniaturization
• New computing architecture combining optics, electronics, and embedded intelligence
• New virtualization and resource management
• Hardware: micro/nano photonics, nano electronics, non-volatile nano memory
• Thermal engineering
• Systems integration
• Emulation and experimental testbed studies
CITRIS NeT on CalREN-XD
[Map: I-WIRE optical network linking NCSA/UIUC, ANL, UIC (multiple carrier hubs), StarLight/Northwestern Univ, Illinois Inst. of Tech, and Univ of Chicago; DTF backplane (4 x λ: 40 Gbps) reaching Pasadena and San Diego; Abilene via Chicago, Indianapolis, and Urbana]
Source: Charlie Catlett, Argonne
StarLight: Int’l Optical Peering Point (see www.startap.net)
CITRIS-net, CalREN-XD, ... Global Grid
[Map of sites: UCD, UCDMC, UCB, LBL, UCSC, Stanford, SLAC, LLNL, NASA Res. Park, UCSB, UCLA, USC, ISI, UCI, UCSD, SDSC, SPAWAR, SDSU, JPL, Caltech, UCR, UCM]
Networking Topics
• Optical Networking
• Heterogeneous Networking
• Data Center Networking
• CITRIS Testbeds
• Healthcare IT
Photonic Networking Trends
Optical Internetworking
[Diagram: routers and SONET NEs around an optical network core, with switches, WDM links, and OXCs]
[Evolution of the transport stack: IP / ATM / SONET / DWDM -> IP & MPLS / SONET / DWDM -> IP & GMPLS / thin SONET / DWDM & optical switching -> IP & GMPLS / DWDM & optical label switching]
Progress in Optical Networks
[Chart: capacity grows from single-channel pt-to-pt to WDM and optically amplified systems; topology evolves from ring to mesh with optical add/drop; function moves from static to dynamic, through optical circuit switching, optical burst switching, optical packet switching, and optical label switching]
All-Optical Label Switching Router
[Diagram: client networks attached to OLS edge routers (OLE) and core routers (OLR); label reader, fiber delay, DEMUX, switching fabric, and NC&M; Label Processing Module-TI (LP-TI); CI interfaces; IP router and ATM client machine; UNAS; scope traces at 500 ps/div]
[Board photo: switch controller with forwarding look-up table, address lines ADDR0–ADDR5, RS232 connector and driver, micro-controller, tunable laser current driver, TEC controller, power connector, and SRAM buffers]
Chip-Scale Optical Router Micro-system
S. J. B. Yoo, “Ultra-Low Latency Multi-Protocol Optical Routers for the Next Generation Internet,” U.S. Patent 6,925,257 B2 (2005).
S. J. B. Yoo, “Integrated Optical Router,” U.S. Patent 6,768,827 (2004).
S. J. B. Yoo, “Ultra-Low Latency Multi-Protocol Optical Routers for the Next Generation Internet,” U.S. Patent 6,519,062 (2000).
S. J. B. Yoo, “Wavelength Converter with Modulated Absorber,” U.S. Patent 6,563,627 (2001).
S. J. B. Yoo, “Compact Optical Receiver with Optical Signal Processing Capabilities,” U.S. patent pending (2001).
S. J. B. Yoo and G. K. Chang, “High-Throughput, Low-Latency Next Generation Internet Using Optical Tag Switching,” U.S. Patent 6,111,673 (1997).
Conventional system:
- 1,152 slots of 40 Gb/s line cards in 80 shelves (72 linecard shelves and 8 fabric shelves)
- Power consumption: ~0.85 MW
- Weight: 56,656 kg
- Footprint: 50 m²
- Each port protocol-specific, up to OC-768

All-optical system on a chip:
- 1024 x 1024 switching fabric on one semiconductor chip
- Power consumption: ~500 W (200 W)
- Weight: 10 kg
- Footprint: 0.1 m²
- Each port protocol-independent up to OC-768; can achieve packet/burst/circuit switching

[Diagram: conventional racks of line cards, each built from MAC and buffer-memory blocks, versus a 46 Tb/s optical routing system on a chip]
20 nJ/b system vs. 10 pJ/b system
Today’s Internet – Success and Failure
Successes: distributed management, heterogeneous scalability, a transparent hourglass architecture
But… it fails often, requires human intervention, is vulnerable to security attacks, does not work towards team results, and never learns, so it is prone to making the same mistakes
The future Internet will face more security attacks, more diverse physical layers, and more demanding applications
Example – Denial of Service
• Malicious attacks: each network element sees only its own perspective (a surge in traffic) and fails to see the pattern
• Vulnerability and failures: service is restored only after patches are developed days later (too late)
Learn from Biological Systems
• Biological systems work remarkably well through coordination between brain, reflex, sensors, and actuators
• The brain reasons from partial and conflicting information from sensors by extracting spatio-temporal patterns
• Reflexes provide rapid, hard-wired responses
• Brain + reflex provides rapid yet reasonable responses
• Cognitive learning capabilities
• Homeostasis: excellent control of critical functions
• Immunization and vaccination (antibodies)
• Teamwork: colonies of ants and bees are capable of locating a source of food, finding the best path to that source, and transmitting that information to other members of their team
Cognitive Network Control and Management: Brain-Reflex-like Signaling and Control
Brain (inter-element control): slow but elaborate; performance monitoring based on labels; anomaly detection; overall view of the network (topology); listens to and instructs the Reflex
Reflex (distributed control): rapid and reflex-like; packet forwarding; anomaly detection; communicates with the Brain
Network Control and Management
[Diagram: a sensor network, storage area network, legacy IP network (core router IP NE), wireline MPLS network, optical label switching network, wireline O-CDMA LAN, satellite network, and reconfigurable wireless network, all exchanging DATA + LABEL]
Next Generation Heterogeneous Networking
• The Internet is becoming more and more diverse in applications and technologies
• Any application on IP; IP on any networking technology
• End-to-end principle
Pros and Cons of Serial vs. Parallel 100 G
100 G serial transport:
• Uses a single wavelength (can be multi-level)
• Needs 100 G (or 2 x 50 G) electronics
• Better spectral efficiency, but more sensitive to dispersion and PMD (see the sketch below)
100 G parallel transport (OTN VCAT), with one laser per lane:
• Uses multiple wavelengths & modulators
• Needs 10 G electronics, with possible synchronization
• Manageable dispersion and PMD, but poorer spectral efficiency
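A rough Python sketch of the dispersion sensitivity mentioned above: uncompensated chromatic-dispersion reach scales roughly as 1/B² per carrier, so ten 10 G lanes tolerate about 100x the dispersion of one 100 G carrier. The 60 km baseline at 10 Gb/s is an assumed ballpark, not a figure from the slide.

```python
# Dispersion-limited reach per carrier, using the 1/B^2 rule of thumb for
# intensity-modulated signals. Baseline reach is an assumption.

def dispersion_limited_reach_km(bitrate_gbps: float,
                                reach_at_10g_km: float = 60.0) -> float:
    """Scale an assumed ~60 km NRZ reach at 10 Gb/s by (10/B)^2."""
    return reach_at_10g_km * (10.0 / bitrate_gbps) ** 2

for label, per_lane_gbps in (("serial 1x100G", 100.0), ("parallel 10x10G", 10.0)):
    print(f"{label}: ~{dispersion_limited_reach_km(per_lane_gbps):.1f} km uncompensated")
```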
Universal 100 G ~ 10 G transmitter with built-in dispersion equalization
At a glance, this is useful for parallel 40G/100G Tx/Rx with independent ASK, PSK, DPSK, QPSK, DQPSK, etc. (see the sketch below)
[Diagram: DEMUX feeding parallel AM + PM stages, recombined by a MUX]
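An illustrative Python sketch (not from the slide) of why one AM stage plus one PM stage per wavelength suffices for the listed formats: every symbol is just an (amplitude, phase) pair applied to the optical carrier.

```python
# Constellations a cascaded AM + PM stage can synthesize per wavelength.
import numpy as np

def symbols(fmt: str) -> np.ndarray:
    """Return the complex constellation for a given modulation format."""
    if fmt == "ASK":              # on-off amplitude, fixed phase
        return np.array([0.0, 1.0])
    if fmt in ("PSK", "DPSK"):    # fixed amplitude, 2 phases (DPSK: differentially coded)
        return np.exp(1j * np.array([0.0, np.pi]))
    if fmt in ("QPSK", "DQPSK"):  # fixed amplitude, 4 phases
        return np.exp(1j * (np.pi / 4 + np.pi / 2 * np.arange(4)))
    raise ValueError(f"unsupported format: {fmt}")

for fmt in ("ASK", "PSK", "QPSK"):
    print(fmt, np.round(symbols(fmt), 3))
```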
[Plots: spectrum, intensity (dB) vs. frequency (±300 GHz); intensity (a.u.) and phase (1 rad/div) vs. time (±300 ps)]
Pre-chirping 100 Gb/s OOK signal for 1000 km transmission
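A conceptual numpy sketch of pre-chirping under purely linear dispersion (pulse width, beta2, and distance are assumed values; the slide's actual simulation parameters are not given): the transmitter applies the inverse of the fiber's quadratic spectral phase, so the link output matches the input.

```python
# Pre-compensate chromatic dispersion: multiply the transmit spectrum by the
# inverse of the fiber's transfer function exp(i*beta2/2*omega^2*L).
import numpy as np

beta2 = -21.7e-27      # s^2/m, typical SMF value near 1550 nm
L = 1.0e6              # 1000 km link
n, dt = 4096, 1e-12    # 1 ps time grid
t = (np.arange(n) - n // 2) * dt
field = np.exp(-t**2 / (2 * (8e-12) ** 2))          # 8 ps Gaussian "bit"

omega = 2 * np.pi * np.fft.fftfreq(n, dt)
prechirp = np.exp(-1j * beta2 / 2 * omega**2 * L)   # transmitter pre-chirp
fiber = np.exp(+1j * beta2 / 2 * omega**2 * L)      # linear fiber propagation

received = np.fft.ifft(np.fft.fft(field) * prechirp * fiber)
print("peak amplitude preserved:",
      np.isclose(np.abs(received).max(), 1.0, atol=1e-6))
```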
[Plots: intensity (a.u.) and phase (1 rad/div) vs. time (±300 ps); real/imaginary constellations at transmitter and receiver]
OOK constellation at transmitter vs. OOK constellation at receiver
Pre-chirped 100 Gb/s OOK signal at receiver after 1000 km transmission (simulated)
Peebles Africa
• Fixed or mobile platform wireless mesh networking
• Rapidly reconfigurable and self-forming cognitive networking
• Low-cost, high-performance delivery of:
  • ~54 Mb/s for current 802.11 technology (~200 km)
  • ~1 Gb/s Ethernet for current GigE (10 km => 200 km)
  • ~10 Gb/s Ethernet for optical wireless (~100 km)
  • ~1 Tb/s (100 x 10 Gb/s) for WDM optical wireless (~100 km)
Wireless Mesh Telemedicine and Emergency Response
Wireless Mesh Networks
• Multipath wireless mesh with network coding
• Hierarchical wireless mesh networking
Example: On-Demand Ambulance Networking - Cognitive Ad Hoc Wireless Mesh Networking with Mobility Management and Intelligent Beam Forming
CITRIS NeT
Intelligent Network Elements
• Observe-Analyze-Act cognitive learning
  – Supervised learning, unsupervised learning, reinforcement learning
• Neural net PDP
• Spatio-temporal data mining
• Statistical summary
• Label and tag switching
• Hierarchical intelligence (genetic programming)
w/ Profs. Vemuri, Wu, Katz