applied micro circuits corporation challenges in making
TRANSCRIPT
CONFIDENTIAL
Keith MorrisKeith MorrisDirector of MarketingDirector of Marketing
Switching & Network ProcessingSwitching & Network [email protected]@amcc.com
Challen ges in Makin g HighlyChallen ges in Makin g HighlyIntegrated Network ProcessorsIntegrated Network Processors
Applied Micro Circuits CorporationApplied Micro Circuits Corporation
Switching & Network Processing 2CONFIDENTIAL
AgendaAgenda
z NPU Definitionz Challen ges & Solutions
z Physicalz Performancez Flexibilityz Scalability & Re-use
z Puttin g it all to gether – Inte grated NPU
Switching & Network Processing 3CONFIDENTIAL
NPU DefinitionNPU Definition
NPUSoftware
Programmable
MEMORY
Co-Processors
NetworkingInterfaces
Context & Queuing(Big & Fast)
Fast EthernetGigabit Ethernet
SPIx
Back Plane /Fabric
Interface
Special Functions &Host CPU
Switching & Network Processing 4CONFIDENTIAL
Context Handling – The Ultimate BottleneckContext Handling – The Ultimate Bottleneck
NPU or ASIC Performance
Read & Write time of Flow Context
ApplicationPerformance
Varies ByApplication
&NPU/ASIC
Architecture
Switching & Network Processing 6CONFIDENTIAL
Challenge: Physical ConstraintsChallenge: Physical Constraints
System Power Dissipation
Small Silicon Die Area
40ns
Packet Arrival Rate
Efficient nPcoreArchitecture
Switching & Network Processing 7CONFIDENTIAL
Networking Optimized Execution PipelineNetworking Optimized Execution Pipeline
T0 T1 T2 T3 T4 T5T0 T1
Fetch
Decode
Read
Task Sel
Execute
Task Sel
FetchFetch
Decode
Read
Task Sel
Execute
Task Sel
Fetch
Decode
Read
Task Sel
Fetch
Decode
Task Sel
Wr Back
Case Statements and Conditional Jumps donot cause pipeline breaka ges
Extremely important for deterministicperformance in decision rich data pathprocessin g
Overlappin g Virtual Processors
Switching & Network Processing 8CONFIDENTIAL
Zero Cycle Task Switching – Hides LatencyZero Cycle Task Switching – Hides Latency
Pool of Virtual Processors
0
1
2
n
Resource 1
Resource 2
Command FIFO
TIME
Elapsed Time for Red Packet Task
Request FIFO
Switching & Network Processing 9CONFIDENTIAL
Frequency, Power and Die Area for nPcoresFrequency, Power and Die Area for nPcores
0.00
10.00
20.00
30.00
40.00
50.00
60.00
70.00
80.00
1998 1999 2000 2001 2002
Year
No
rma
lize
d v
al
u T echnology (um)
Frequency (M Hz)
Power Performance(M IPS/power)Die Area Performance(M IPS/mm^2)
30 –70 fold increasein NPU performance
ratios in 4 years
Switching & Network Processing 10CONFIDENTIAL
Challenge: Scaling Processing PerformanceChallenge: Scaling Processing Performance
Parallel – Sin gle Stage
Pipeline – Multi-Sta ge
z Complex software partitionin gz Very sensitive to latency
z Each sta ge must complete within thesmallest packet time
z One sta ge becomes the bottleneckz Poor Scalability
z Eventually some sta ges may need tobecome parallel
9
8z Allows simplest pro grammin g model
z Run to completionz Not sensitive to latency
z Elapsed time to process a packet canbe very lon g if required
z Highly scalablez 100’s of tasks, multiple processors
Processor
Switching & Network Processing 11CONFIDENTIAL
Multi-Stage – Complex and InefficientMulti-Stage – Complex and Inefficient
Application Code
N N threadsthreads
core 1core 1
N N threadsthreads
core 2core 2
N N threadsthreads
core core NN
programsegment
A
programsegment
B
programsegment
N
need addt’l code to stitch each
segment together
Developer must:• Subdivide
algorithm• Load-balance
segments• Stitch
segmentstogether
Performanceissues with:
• Underutilizedcores
• 1 segment canoverrun next
• Hand-offsbetween cores
• Locking sharedresources
Switching & Network Processing 12CONFIDENTIAL
AMCC: 1-Stage – Fundamentally SimplerAMCC: 1-Stage – Fundamentally Simpler
nPcore 1nPcore 1 Thread 1Thread 1Thread 2Thread 2
Thread Thread NN
nPcore 2nPcore 2
nPcore nPcore NN
Thread 1Thread 1Thread 2Thread 2
Thread Thread NN
Thread 1Thread 1Thread 2Thread 2
Thread Thread NN
;; cannot handle.;;;; An encapsulation header is used to communicate the;; source-port-id and the errno value as follows:;;;; +====+====+====+====+====+====+====+====+;; | _0 | _1 | _2 | _3 | _4 | _5 | _6 | _7 |;; +====+====+====+====+====+====+====+====+;; | 81 | 00 | S-PORT | ERRNO | N/A |;; +----+----+----+----+----+----+----+----+;;;;===================================================================
pkt_fwd_to_cpu
#define DB_HDL r6
;; retrieve the control connection switch header from thenPkernel ;; unicast port table.
set_entry_db ucast_port_table, DB_HDL, APP_CPU_PORTID read_first_db ucast_port_table, DB_HDL, r0
;; setup the encap header
• Same program image loadedon each thread
• A packet runs to completionon a single thread
9
Switching & Network Processing 13CONFIDENTIAL
Challenge: Flexibility vs. PerformanceChallenge: Flexibility vs. Performance
SolutionFlexibility
100%
SolutionEfficiency
100%Software Implementation (CPU)
100%Hardware Implementation (ASIC)
1
NPUTarget
Operating Range
Switching & Network Processing 14CONFIDENTIAL
Solution: Co-ProcessorsSolution: Co-Processors
Input Controller
Output ControllerTransform
Req FIFO
Cm
d F
IFO
Dat
a F
IFO
Channel
nPcore
Mem
ory
PolicyEngine
SearchMachine
SPU
XSCI/F
OFFCHIP
Switching & Network Processing 15CONFIDENTIAL
High Speed Single-Step Packet ClassificationHigh Speed Single-Step Packet Classification
SoftwareSoftwareCode BlockCode Block
_________________________________________________________________________________
IPv4
_________________________________________________________________________________
IPv6
_________________________________________________________________________________
OAMcell
___________________________________________
___________________________________________
___________________________________________
___________________________________________
If-else (1)If-else (2)
If-else (3)If-else (4)
SoftwareSoftwareCode BlockCode Block
_________________________________________________________________________________
IPv4
_________________________________________________________________________________
IPv6
_________________________________________________________________________________
OAMcell
___________________________________________
___________________________________________
___________________________________________
___________________________________________
If-else (1)If-else (2)
If-else (3)If-else (4)
•Policy Enginesaves many instructions compared to using RISCsoftware to classify packets !
Is this a cell or a packet?
If a cell, which type?
L2 or L3 packet?
•Policy Enginealso simplifiesprogramming, makingit modular•RISC for classificationis instruction hungry,not modular
Switching & Network Processing 16CONFIDENTIAL
Efficient Packet Modification & EncapsulationEfficient Packet Modification & Encapsulation
Transform
&0'��
&0'��
&0'��
&0'��
&0'��
&0'�� &0'�� &0'�� &0'�� &0'��
nPcore
Switching & Network Processing 17CONFIDENTIAL
PHYSICAL PORT• OC-192
SUB-PORTS• Fixed sub-division of the physical port• OC-192, -48, -12, -3, GE• Flows are scheduled according tominimum rate, class of service andweight at this level
VIRTUAL PIPES• A collection of flows that have anaggregate maximum rate• Pipe ensures that provisioned rate isnot exceeded• For example - all flows for anindividual subscriber, network or traffictype
Complex Bandwidth ProvisioningComplex Bandwidth Provisioning
FLOWS• Smallest scheduled entity• May have guaranteed minimum bandwidth• Individually weighted within a pipe
Switching & Network Processing 18CONFIDENTIAL
Challenge: Scalability & Re-useChallenge: Scalability & Re-use
16 x 2.5Gbps
40GbpsToday
160GbpsToday
640Gbps
“Tomorrow”
“Tomorrow”
DS-3 to OC-48c OC-3 to OC-192c OC-12 to 4 x 0C-192
One Code Base
+MPLS +DiffServ +IPv6+VPN
16 x 10Gbps 16 x 40Gbps
Maximum Software ROI
Switching & Network Processing 19CONFIDENTIAL
Solution: nP Programming InfrastructureSolution: nP Programming Infrastructure
nP
nPkernel
nPdebug
Tra
ffic
Man
ager
Fra
mer
nPk M
essageH
andler
StandardnPMs
nPapplication Ingress Data Alg
nPk M
essageH
andler
External MemorySearchCoprocessor
nPapplication Egress Data Alg
nPk DBDrivers
nPk DBDrivers
nPk Drivers
nPk DBDrivers
nPapplicationnPMs
On Chip Coprocessor
Switching & Network Processing 20CONFIDENTIAL
“Runtime Libraries”:“Runtime Libraries”:Reducin g code customer must actually writeReducin g code customer must actually write
z 99+% of total Lines of Code (LOC) are on the control CPUz Next, nPkernel alone provides up to 95% of actual NPU code, i.e.
95% of the 1%z So NPU-resident portion of app often only 100-200 total LOCz AMCC sample app u-code provides most of that 100-200 LOC, all in
some casesz Actual customer-generated code typically much less than 5% of 1% !!
5%
nPkernel
A pp lication
Switching & Network Processing 21CONFIDENTIAL
Putting it all together - Integrated NPUPutting it all together - Integrated NPU
I/O a
nd F
ram
er b
lock
s(I
nteg
rate
d M
AC
s) nPCoreCluster
Algorithmicsearch
Packet Transform Engine
FrameI/F
Channel Buffers Checksums
Workspace PolicyEngine
SPI4 P2NPSI
SP
I4.2
- C
hip
to C
hip
Inte
rcon
nect
SEARCH I/F DRAM I/F DRAM I/F QDR I/F
Data Coherency(SPU)
CPU Interface Debug JTAGGPIO
Non Blocking Memory Access Unit
TrafficManagementCo-Processor
Line
Inte
rfac
es
Cache
Switching & Network Processing 22CONFIDENTIAL
SummarySummary
z NPU Definition – No performance penaltyz Challen ges & Solutions
z Physical – More MHz won’t cut itz Performance – Cannot be at the price of usabilityz Flexibility – Doesn’t mean 100% softwarez Scalability & Re-use – Any Protocol, Any Speed,
One Architecture
z Puttin g it all to getherz Avoiding System Problems & Side Effects