INTEL CONFIDENTIAL, FOR INTERNAL USE ONLY
HAsim On-Chip Network Model Configuration
Michael Adler
The Front End Multiplexed

[Figure: block diagram of the multiplexed front end. Stages: FET, IMEM, PCResolve, InstQ, I$, ITLB. A Branch Pred and Line Pred feed the PC; the instruction queue has first/deq/slot and enq-or-drop controls; redirect, fault, mispredict, training, and prediction paths are shown, along with vaddr/paddr translation and rspImm/rspDel responses, with inputs from the Back End. A legend marks which blocks are ready to simulate.]
On-Chip Networks in a Time-Multiplexed World
Problem: On-Chip Network

[Figure: three cores (CPU 0, CPU 1, CPU 2), each with an L1/L2 $, connected through routers (r) to a Memory Control node; each link carries separate msg and credit channels.]

• Problem: routing wires to/from each router
• Similar to the “global controller” scheme
• Also, utilization is low
Multiplexing On-Chip Network Routers

[Figure: four virtual routers (Router 0..3) multiplexed onto one physical router, with a schedule table of the current router versus its “to”/“from” neighbor slots. Messages between virtual routers pass through reorder buffers implementing the permutations σ(x) = (x + 1) mod 4, σ(x) = (x + 2) mod 4, and σ(x) = (x + 3) mod 4.]

Simulate the network without a network.
On-Chip Network Model Multiplexed Topology

[Figure: module and port graph of the multiplexed topology. The L2 Coherence LLC Hub exchanges CorePvtCache_to_UncoreQ / Uncore_to_CorePvtCacheQ data (portDataEnq, 1) and credit (cred, 1) ports with the cores, and OCN_LANE_SEND_Core_0..2 / OCN_LANE_RECV_Core_0..2 lanes (portDataEnq and cred, 1 each) with the Core_OCN_Connection. The LLC connects to the hub via LLCHub_to_LLC_req / LLC_to_LLCHub_rsp and to memory via LLC_to_MEM_req / MEM_to_LLC_rsp. The Mesh Network carries mesh_interconnect_enq_{N,E,S,W} and mesh_interconnect_credit_{N,E,S,W} ports, plus Core_OCN_Connection_InQ/OutQ enq and credit ports. The MemoryController attaches through ocn_to_memctrl and memctrl_to_ocn enq/credit ports.]
HAsim’s Network Model is Abstract

• In a software model the target network can be built at run-time
• Dynamism is expensive in FPGAs, and recompilation is slow
• Solution: constrained dynamism
  – Fixed parameters: max nodes, max edges per node, max VCs
  – Dynamic:
    • Number of active contexts (nodes)
    • Endpoints of each edge (indirection table)
    • Routing table
    • Address mapping of distributed LLC
Topology Manager

• Software – runs once at startup, so no need to optimize
• HASIM_CHIP_TOPOLOGY_CLASS:
  – Manages streaming of parameters to the FPGA
  – Iterates over all software topology mapping classes until convergence
• Namespace defined by dictionaries
  – .dic files are preprocessed by LEAP tools
  – Hierarchy of enumerated types
How do I…

• Map address ranges to LLC segments?
• Map target cores to nodes?
• Pick a number of memory controllers and map them to nodes?
• Define a target machine network topology?
• Manage interleaving for multiplexing the network and cores?
Map Address Ranges to LLC Segments (SW)

Build a table of n_llc_map_entries, where each entry is an index to a portion of the distributed LLC.

icn-mesh.cpp:

    for (int addr_idx = 0; addr_idx < n_llc_map_entries; addr_idx++)
    {
        bool is_last = (addr_idx + 1 == n_llc_map_entries);
        topology->SendParam(TOPOLOGY_NET_LLC_ADDR_MAP,
                            &cores_net_pos[addr_idx % num_cores],
                            sizeof(TOPOLOGY_VALUE),
                            is_last);
    }
Map Address Ranges to LLC Segments (FPGA)

Consume the table that was streamed in from SW.

last-level-cache-no-coherence.bsv:

    // Define a node that will stream in the topology.  This builds a node
    // on a ring.  The node looks for messages tagged TOPOLOGY_NET_MEM_CTRL_MAP
    // and emits associated payloads.
    let ctrlAddrMapInit <- mkTopologyParamStream(`TOPOLOGY_NET_MEM_CTRL_MAP);

    // Allocate a local memory and initialize it with the streamed-in entries.
    LUTRAM#(Bit#(TLog#(TMul#(8, MAX_NUM_MEM_CTRLS))), STATION_ID)
        memCtrlDstForAddr <- mkLUTRAMWithGet(ctrlAddrMapInit);

    // Map an address to a node ID using the table.
    function STATION_ID getMemCtrlDstForAddr(LINE_ADDRESS addr);
        // Use the low bits of the address as the index (resize does this).
        return memCtrlDstForAddr.sub(resize(addr));
    endfunction
Map Address Ranges to LLC Segments (LLC Hub)

    rule . . .
        // Incoming request from core
        if (m_reqFromCore matches tagged Valid .req)
        begin
            // Which instance of the distributed cache is responsible?
            let dst = getLLCDstForAddr(req.physicalAddress);

            if (dst == local_station_id)
            begin
                // Local cache handles the address.
                if (can_enq_reqToLocalLLC &&& ! isValid(m_new_reqToLocalLLC))
                begin
                    // Port to LLC is available.  Send the local request.
                    did_deq_reqFromCore = True;
                    m_new_reqToLocalLLC = tagged Valid LLC_MEMORY_REQ { src: tagged Invalid,
                                                                       mreq: req };
                    debugLog.record(cpu_iid, $format("1: Core REQ to local LLC, ") + fshow(req));
                end
            end
            else if (can_enq_reqToRemoteLLC && ! isValid(m_new_reqToRemoteLLC))
            begin
                // Remote cache instance handles the address and the OCN request
                // port is available.
                //
                // These requests share the OCN request port since only one type
                // of request goes to a given remote station.  Memory stations
                // get memory requests above.  LLC stations get core requests here.
                did_deq_reqFromCore = True;
                m_new_reqToRemoteLLC = tagged Valid tuple2(dst, req);
                debugLog.record(cpu_iid, $format("1: Core REQ to LLC %0d, ", dst) + fshow(req));
            end
        end
        . . .
    endrule
Map Cores and Memory Controllers to Nodes

• All computed (currently) in icn-mesh.cpp
• Given the number of target cores and the number of memory controllers:
  – Builds a rectangle of cores as close to square as possible
  – Adds a row of memory controllers at the top and bottom
  – Topology streamed to FPGA using the same mechanism as the address mapping

E.g., 15 cores and 3 memory controllers:

    x M M x
    C C C C
    C C C C
    C C C C
    C C C x
    x M x x
Network Topology: Map Cores/Memory Controllers to Nodes

• Multiplexed order of nodes is the same as the order of cores
  – No permutations required for the local port
• Nodes are connected to:
  – A core
  – A memory controller
  – Nothing
• The node doesn’t care what is connected!
• Hide the indirection in ports
Network Topology: Map Cores/Memory Controllers to Nodes

In icn-mesh.bsv:

    //
    // Local ports are a dynamic combination of CPUs, memory controllers, and
    // NULL connections.
    //
    // localPortMap indicates, for each multiplexed port instance ID, the type
    // of local port attached (CPU, memory controller, NULL).
    //
    let localPortInit <- mkTopologyParamStream(`TOPOLOGY_NET_LOCAL_PORT_TYPE_MAP);
    LUTRAM#(Bit#(TLog#(TAdd#(TAdd#(MAX_NUM_CPUS, 1), NUM_STATIONS))), Bit#(2))
        localPortMap <- mkLUTRAMWithGet(localPortInit);

    PORT_SEND_MULTIPLEXED#(MAX_NUM_CPUS, OCN_MSG)
        enqToCores <- mkPortSend_Multiplexed("Core_OCN_Connection_InQ_enq");
    PORT_SEND_MULTIPLEXED#(MAX_NUM_MEM_CTRLS, OCN_MSG)
        enqToMemCtrl <- mkPortSend_Multiplexed("ocn_to_memctrl_enq");
    PORT_SEND_MULTIPLEXED#(NUM_STATIONS, OCN_MSG)
        enqToNull <- mkPortSend_Multiplexed_NULL();

    let enqToLocal <- mkPortSend_Multiplexed_Split3(enqToCores, enqToMemCtrl,
                                                    enqToNull, localPortMap);
Network Topology: Defining Inter-Node Edges

[Figure: each network node has five ports: Local, N, E, S, and W.]
Network Multiplexing

• Logically, there are n nodes in the network
• Each has a local port connected either to a core, to memory, or to nothing
• Network connection mapping and routing will determine the topology
• Topology manager defines the routing table
• Note: dateline not yet implemented
Network Topology and Routing
Torus:
Network Topology and Routing
Mesh (connections identical, routing table ignores some edges):
Network Topology and Routing
Bi-directional ring:
Network Topology and Routing
Uni-directional ring:
Final Problem: Multiplexing On-Chip Network Routers

[Figure: repeat of the earlier router-multiplexing diagram — four virtual routers (Router 0..3) on one physical router, with a schedule table and reorder buffers implementing σ(x) = (x + 1) mod 4, σ(x) = (x + 2) mod 4, and σ(x) = (x + 3) mod 4.]
Network Topology: Communication Across Multiplexed Nodes

• Each node talks to a different multiplexed node instance
• Naïve port binding would have each node talk only to itself
• A-Ports are already buffered
• Bury the transformation in A-Ports
• Retain simple read-next / write-next port semantics within models
Network Topology: Communication Across Multiplexed Nodes

icn-mesh.bsv:

    // Initialization from topology manager
    ReadOnly#(STATION_IID) meshWidth <- mkTopologyParamReg(`TOPOLOGY_NET_MESH_WIDTH);
    ReadOnly#(STATION_IID) meshHeight <- mkTopologyParamReg(`TOPOLOGY_NET_MESH_HEIGHT);

    // Outbound and inbound ports are loopbacks to the same multiplexed module.
    // Ports connect to logically different nodes but physically to the same
    // simulator object.
    Vector#(NUM_PORTS, PORT_SEND_MULTIPLEXED#(NUM_STATIONS, MESH_MSG)) enqTo = newVector();
    Vector#(NUM_PORTS, PORT_RECV_MULTIPLEXED#(NUM_STATIONS, MESH_MSG)) enqFrom = newVector();

    // Outbound port is a normal A-Port.  It has no buffering.
    enqTo[portEast] <- mkPortSend_Multiplexed("mesh_interconnect_enq_E");

    // Inbound port provides buffering for multiplexing.  Instead of forwarding
    // messages FIFO it must transform the messages so they cross to the correct
    // multiplexed instance when instances (nodes) are traversed sequentially.
    enqFrom[portWest] <- mkPortRecv_Multiplexed_ReorderLastToFirstEveryN(
        "mesh_interconnect_enq_E", 1, meshWidth, meshHeight);
    . . .
    enqFrom[portEast] <- mkPortRecv_Multiplexed_ReorderFirstToLastEveryN(
        "mesh_interconnect_enq_W", 1, meshWidth, meshHeight);
    enqFrom[portSouth] <- mkPortRecv_Multiplexed_ReorderFirstNToLastN(
        "mesh_interconnect_enq_N", 1, meshWidth);
    enqFrom[portNorth] <- mkPortRecv_Multiplexed_ReorderLastNToFirstN(
        "mesh_interconnect_enq_S", 1, meshWidth);