Multicore Training: C6614/6612 Memory System (MPBU Application Team)



C6614/6612 Memory System
MPBU Application Team
Multicore Training

Agenda
- Overview of the 6614/6612 TeraNet
- Memory system, DSP CorePac point of view
  - Overview of the memory map
  - MSMC and external memory
- Memory system, ARM point of view
  - Overview of the memory map
  - ARM Subsystem access to memory
- ARM-DSP CorePac communication: SysLib and its libraries
  - MSGCOM
  - PktLib
  - Resource Manager


TCI6614 Functional Architecture (block diagram)
- C66x CorePac cores at 1.0 GHz / 1.2 GHz, each with 32KB L1P cache, 32KB L1D cache, and 1024KB L2 cache
- Memory subsystem: MSMC with 2MB MSM SRAM, 64-bit DDR3 EMIF, EMIF16
- ARM Cortex-A8 with 32KB L1P cache, 32KB L1D cache, and 256KB L2 cache
- Multicore Navigator: Queue Manager and Packet DMA
- Network Coprocessor: Ethernet switch, SGMII x2, Packet Accelerator, Security Accelerator
- Coprocessors and accelerators: BCP x2, VCP2 x4, RSA x2, FFTC, TCP3d, TAC x2, RAC
- I/O and system: SRIO x4, PCIe x2, UART x2, AIF2 x6, SPI, I2C, HyperLink, USIM, EDMA x3, semaphore, boot ROM, power management, debug and trace, PLL
- All blocks are interconnected through the TeraNet

C6614 TeraNet Data Connections (switch-fabric diagram)
- The 256-bit TeraNet 2A and 2B switch fabrics run at CPUCLK/2; the 128-bit TeraNet 3A runs at CPUCLK/3.
- Masters and slaves include the CorePacs (through their XMCs), MSMC/DDR3, the EDMA channel controllers (TPCC) and transfer controllers, the Network Coprocessor, HyperLink, SRIO, PCIe, AIF2, FFTC, RAC, TAC, TCP3d/TCP3e, VCP2, QMSS, the debug subsystem, and the ARM (which reaches DDR3 through an MPU).

(Diagram: connection from the ARM to TeraNet 2B.) Note: CPTs see physical addresses; in the MSMC there is one CPT per bank.

SoC Memory Map (1/2)

Start Address   End Address   Size    Description
0080 0000       0087 FFFF     512K    L2 SRAM
00E0 0000       00E0 7FFF     32K     L1P
00F0 0000       00F0 7FFF     32K     L1D
0220 0000       0220 007F     128B    Timer 0
0264 0000       0264 07FF     2K      Semaphores
0270 0000       0270 7FFF     32K     EDMA CC
027D 0000       027D 3FFF     16K     TETB Core 0
0C00 0000       0C3F FFFF     4M      Shared L2
1080 0000       1087 FFFF     512K    L2 Core 0 Global
12E0 0000       12E0 7FFF     32K     Core 2 L1P Global

SoC Memory Map (2/2)

Start Address   End Address   Size      Description
2000 0000       200F FFFF     1M        System Trace Mgmt Configuration
2180 0000       33FF FFFF     296M+32K  Reserved
3400 0000       341F FFFF     2M        QMSS Data
3420 0000       3FFF FFFF     190M      Reserved
4000 0000       4FFF FFFF     256M      HyperLink Data
5000 0000       5FFF FFFF     256M      Reserved
6000 0000       6FFF FFFF     256M      PCIe Data
7000 0000       73FF FFFF     64M       EMIF16 Data, NAND Memory (CS2)
8000 0000       FFFF FFFF     2G        DDR3 Data

KeyStone Memory Topology
- L1D: 32KB cache/SRAM (cache options: 0KB, 4KB, 8KB, 16KB, 32KB)
- L1P: 32KB cache/SRAM (cache options: 0KB, 4KB, 8KB, 16KB, 32KB)
- L2: 1MB cache/SRAM (cache options: 0KB, 32KB, 64KB, 128KB, 256KB, 512KB)
- MSM: 2MB shared SRAM
- DDR3: up to 8GB through one 64-bit DDR3 EMIF
Each C66x CorePac connects its L1D/L1P/L2 to the MSMC SRAM and to the peripherals over 256-bit TeraNet paths.

MSMC Block Diagram
- 2048 KB shared RAM with error detection and correction (EDC)
- One 256-bit CorePac slave port per CorePac (CorePac 0-3); each CorePac reaches the MSMC through its XMC and MPAX
- A system slave port for shared SRAM (SMS) and a system slave port for external memory (SES), each with its own Memory Protection and Extension Unit (MPAX)
- An MSMC system master port to the TeraNet (SCR_2_B) and an MSMC EMIF master port to DDR3
- The MSMC datapath arbitrates among all ports

XMC External Memory Controller

The XMC is responsible for the following:

- Address extension/translation
- Memory protection for addresses outside the C66x CorePac
- Shared memory access path
- Cache and prefetch support

User Control of XMC:

- MPAX (Memory Protection and Extension) registers
- MAR (Memory Attributes) registers

Each core has its own set of MPAX and MAR registers!

The MPAX Registers

MPAX (Memory Protection and Extension) registers:
- Translate between logical (32-bit) and physical (36-bit) addresses.
- 16 registers (64 bits each) control up to 16 memory segments.
- Each register translates the logical memory of its segment into physical memory.

(Diagram: the MPAX registers map segments of the C66x CorePac 32-bit logical memory map, for example 0x0C00_0000 and 0x8000_0000, into the 36-bit system physical memory map, for example 0:0C00_0000 and 8:0000_0000.)

The MAR Registers

MAR (Memory Attributes) registers:
- 256 registers (32 bits each) control 256 memory segments.
- Each segment is 16MB, covering logical addresses 0x0000_0000 through 0xFFFF_FFFF.
- The first 16 registers are read-only; they control the internal memory of the core.
- Each register controls the cacheability of its segment (bit 0) and its prefetchability (bit 3). All other bits are reserved and set to 0.
- All MAR bits are set to zero after reset.

XMC: Typical Use Cases
- Speed up processing by letting the shared memory (used as a shared L3) be cached in each core's private L2.
- Use the same logical address in all cores, with each core pointing to a different physical memory.
- Use part of the shared memory to communicate between cores: make that part non-cacheable, but leave the rest cacheable.
- Utilize 8GB of external memory, for example 2GB per core.

A configuration sketch using the MPAX and MAR registers follows the list.
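The following is a minimal DSP-side C sketch of the last use case (a private 2GB DDR3 window per core) combined with MAR-based cacheability control. It assumes the standard KeyStone MPAX layout (MPAXH holds the logical base address and a segment-size code, MPAXL holds the 36-bit replacement address and permission bits); the register base addresses, segment index, MAR index, and permission value are illustrative assumptions and must be checked against the device documentation before use.

```c
#include <stdint.h>

/* Assumed per-core register bases (XMC config space and CorePac MAR block);
 * verify against the C66x CorePac user guide before using on real hardware. */
#define XMC_MPAX_BASE   0x08000000u   /* MPAXL0/MPAXH0 pairs, 8 bytes apart */
#define MAR_BASE        0x01848000u   /* MAR0..MAR255, one 32-bit reg each  */

static inline void mpax_map(uint32_t idx, uint32_t lbase, uint64_t paddr,
                            uint32_t segsz_code, uint32_t perm)
{
    volatile uint32_t *mpaxl = (volatile uint32_t *)(XMC_MPAX_BASE + idx * 8u);
    volatile uint32_t *mpaxh = mpaxl + 1;

    /* MPAXH: logical base address (bits 31:12) | segment-size code (bits 4:0)
     * MPAXL: replacement address (physical bits 35:12 in bits 31:8) | permissions */
    *mpaxh = (lbase & 0xFFFFF000u) | (segsz_code & 0x1Fu);
    *mpaxl = (uint32_t)(((paddr >> 12) << 8)) | (perm & 0xFFu);
}

static inline void mar_set(uint32_t mar_index, int cacheable, int prefetch)
{
    volatile uint32_t *mar = (volatile uint32_t *)(MAR_BASE + mar_index * 4u);
    *mar = (cacheable ? 1u : 0u) | (prefetch ? (1u << 3) : 0u);   /* bit 0, bit 3 */
}

/* Run by each CorePac for itself (every core has its own MPAX/MAR set). */
void configure_core_memory(uint32_t core_id)
{
    /* Give each core its own 2GB window of the 8GB DDR3 space:
     * logical 0x8000_0000 maps to physical 0x8:0000_0000 + core_id * 2GB.
     * Size code 0x1E selects a 2GB segment (size = 2^(code+1) bytes).
     * Segment index 2 and permission 0x3F (full access) are illustrative. */
    mpax_map(2, 0x80000000u,
             0x800000000ull + (uint64_t)core_id * 0x80000000ull,
             0x1E, 0x3F);

    /* Make one 16MB DDR3 region used for inter-core messages non-cacheable,
     * and keep the rest of DDR3 cacheable and prefetchable.
     * MAR 128 covers logical 0x8000_0000..0x80FF_FFFF (index = address >> 24). */
    mar_set(128, 0 /* non-cacheable */, 0 /* no prefetch */);
    for (uint32_t i = 129; i < 256; i++)
        mar_set(i, 1 /* cacheable */, 1 /* prefetchable */);
}
```

On a real device, the choice of MPAX segment, the priority of overlapping segments, and the cache operations required around MPAX changes all need care; see the KeyStone architecture documentation.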


ARM Core (Cortex-A8 block diagram)

ARM Subsystem Memory Map (diagram)

ARM Subsystem Ports
- 32-bit ARM addressing (MMU or kernel)
- 31 bits of addressing into external memory: the ARM can address only 2GB of external DDR (no MPAX translation), 0x8000_0000 to 0xFFFF_FFFF
- 31 bits are used to access SoC memory or to address internal memory (ROM)

ARM Visibility Through the TeraNet Connection
The ARM can see:
- QMSS data at address 0x3400_0000
- HyperLink data at address 0x4000_0000
- PCIe data at address 0x6000_0000
- Shared L2 at address 0x0C00_0000
- EMIF16 data at address 0x7000_0000 (NAND, NOR, asynchronous SRAM)

ARM Access to SoC Memory
Do you see a problem with HyperLink access? Addresses in the 0x4xxx_xxxx range are part of the internal ARM memory map.

What about the cache, and the data from the shared memory and the async EMIF16? The next slide presents a page from the device errata.

Errata Users Note Number 10 (from the device errata):

Description   Virtual Address from Non-ARM Masters   Virtual Address from ARM
QMSS          0x3400_0000 to 0x341F_FFFF             0x4400_0000 to 0x441F_FFFF
HyperLink     0x4000_0000 to 0x4FFF_FFFF             0x3000_0000 to 0x3FFF_FFFF
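As a minimal Linux user-space sketch (assuming a root shell and a kernel with /dev/mem enabled), the program below maps the QMSS data region from the ARM using the ARM-view address from the table above. The region size and the simple read at the end are illustrative only.

```c
#include <fcntl.h>
#include <stdint.h>
#include <stdio.h>
#include <sys/mman.h>
#include <unistd.h>

/* From the errata table: non-ARM masters use 0x3400_0000 for QMSS data,
 * but the ARM must use 0x4400_0000 for the same region. */
#define QMSS_DATA_ARM_VIEW  0x44000000u
#define QMSS_DATA_SIZE      0x00200000u   /* 2MB region */

int main(void)
{
    int fd = open("/dev/mem", O_RDWR | O_SYNC);
    if (fd < 0) { perror("open /dev/mem"); return 1; }

    /* Map the QMSS data region into this process (O_SYNC gives an uncached view). */
    volatile uint32_t *qmss = mmap(NULL, QMSS_DATA_SIZE,
                                   PROT_READ | PROT_WRITE, MAP_SHARED,
                                   fd, QMSS_DATA_ARM_VIEW);
    if (qmss == MAP_FAILED) { perror("mmap"); close(fd); return 1; }

    /* The pointer could now be handed to a user-space driver; here we only
     * read the first word to show that the mapping is usable. */
    printf("first word of QMSS data region: 0x%08x\n", qmss[0]);

    munmap((void *)qmss, QMSS_DATA_SIZE);
    close(fd);
    return 0;
}
```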

ARM Endianness
- The ARM uses only little endian.
- The DSP CorePac can use little endian or big endian.
- The User's Guide shows how to mix ARM little-endian code with DSP CorePac big-endian code.

MCSDK Software Layers (diagram)
- Demonstration applications: HUA/OOB, I/O benchmarks, image processing
- Software framework components: inter-processor communication (IPC), instrumentation, communication protocols such as TCP/IP networking (NDK), algorithm libraries (DSPLIB, IMGLIB, MATHLIB), resource manager, transports (IPC, NDK), OS abstraction layer
- SYS/BIOS RTOS
- Low-level drivers (LLDs) and the chip support library (CSL): EDMA3, PCIe, PA, QMSS, SRIO, CPPI, FFTC, HyperLink, TSIP
- Platform/EVM software: bootloader, platform library, power-on self test (POST)
- Hardware

SysLib Library: An IPC Element
The stack, from top to bottom: Application, System Library (SYSLIB), Low-Level Drivers (LLD), Hardware Accelerators.
- Hardware accelerators: Queue Manager Subsystem (QMSS), Network Coprocessor (NETCP)
- LLDs: CPPI LLD, PA LLD, SA LLD
- SYSLIB components: Resource Manager (ResMgr), Packet Library (PktLib), MsgCom Library, NetFP Library
- Service access points: resource management SAP, packet SAP, communication SAP, fast path SAP

MsgCom Library
Purpose: To exchange messages between a reader and a writer.
Reader and writer applications can reside:
- On the same DSP core
- On different DSP cores
- On both the ARM and a DSP core
Channel- and interrupt-based communication:
- A channel is defined by the reader (message destination) side.
- Multiple writers (message sources) are supported.

Channel Types
- Simple queue channels: Messages are placed directly into a destination hardware queue that is associated with a reader.
- Virtual channels: Multiple virtual channels are associated with the same hardware queue.
- Queue DMA channels: Messages are copied between the writer and the reader using the infrastructure PKTDMA.
- Proxy queue channels: Indirect channels that work over BSD sockets; they enable communication between a writer and a reader that are not connected to the same Navigator.

Interrupt Types
- No interrupt: The reader polls until a message arrives.
- Direct interrupt: For low-delay systems; special queues must be used.
- Accumulated interrupts: Special queues are used; the reader receives an interrupt when the number of messages crosses a defined threshold.

Blocking and Non-Blocking
- Blocking: The reader can be blocked until a message is available.
- Non-blocking: The reader polls for a message; if there is no message, it continues execution.

A reader-side sketch of the polling and blocking options follows.
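To make the polling and blocking options concrete, here is a small C sketch written against the same illustrative channel API the case slides below use (Create/Get and the PktLib calls). The declarations at the top are stand-ins for those illustrative names, not a real TI API.

```c
#include <stddef.h>

/* Illustrative declarations mirroring the naming on the case slides.
 * They are placeholders, not the real MsgCom/PktLib prototypes. */
typedef void *ChHandle;
typedef void  Tibuf;
extern ChHandle Create(const char *name);
extern Tibuf   *Get(ChHandle ch);            /* returns NULL if no message waiting */
extern Tibuf   *GetBlocking(ChHandle ch);    /* blocks until a message arrives     */
extern void     PktLibFree(Tibuf *msg);
extern void     process(Tibuf *msg);
extern int      otherWorkPending(void);

void readerLoop(void)
{
    ChHandle hCh = Create("MyCh1");

    for (;;) {
        /* Non-blocking: poll for a message and keep doing other work. */
        Tibuf *msg = Get(hCh);
        if (msg != NULL) {
            process(msg);
            PktLibFree(msg);        /* the reader frees the message when done */
        } else if (!otherWorkPending()) {
            /* Blocking: nothing else to do, so wait until a message arrives
             * (a direct or accumulated interrupt sits behind the channel). */
            msg = GetBlocking(hCh);
            process(msg);
            PktLibFree(msg);
        }
    }
}
```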

Case 1: Generic Channel Communication
Zero-copy construction, core-to-core

1. The Reader creates a channel ahead of time with a given name (e.g., MyCh1): hCh = Create(MyCh1);
2. When the Writer has information to send, it looks for the channel: hCh = Find(MyCh1);
3. The Writer asks for a buffer and writes the message into it: Tibuf *msg = PktLibAlloc(hHeap);
4. The Writer puts the buffer into the channel: Put(hCh, msg); The Navigator does its magic.
5. When the Reader calls Get, it receives the message: Tibuf *msg = Get(hCh);
6. The Reader must free the message after it is done reading: PktLibFree(msg);
When the channel is no longer needed, the Reader deletes it: Delete(hCh);
(Logical flow only; all naming is illustrative.)

Open items (noted on each of the case slides): recycling policies on Tx completion queues; API naming convention.

Case 2: Low-Latency Channel Communication (Single and Virtual Channels)

Zero-copy construction, core-to-core

1. The Reader creates a channel based on a pending queue, ahead of time, with a given name (e.g., MyCh2): hCh = Create(MyCh2);
2. The Reader waits for a message by pending on a (software) semaphore: Get(hCh); or Pend(MySem);
3. When the Writer has information to send, it looks for the channel: hCh = Find(MyCh2);
4. The Writer asks for a buffer and writes the message into it: Tibuf *msg = PktLibAlloc(hHeap);
5. The Writer puts the buffer: Put(hCh, msg); The Navigator generates an interrupt, and the ISR (chRx driver) posts the semaphore of the correct channel.
6. The Reader starts processing the message and frees it when done: PktLibFree(msg);
The virtual channel structure enables a single interrupt to post the semaphore of one of many channels (e.g., MyCh2 and MyCh3 sharing one receive driver).
(Logical flow only; all naming is illustrative. A reader-side sketch follows.)

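A reader-side sketch of the low-latency flow in Case 2, again using the illustrative names from the slides (Create, Get, Pend, MySem). The idea that the channel's receive ISR posts the semaphore comes from the slide; the declarations themselves are placeholders rather than a real API.

```c
/* Illustrative declarations in the spirit of the slide, not a real API. */
typedef void *ChHandle;
typedef void *SemHandle;
typedef void  Tibuf;
extern ChHandle  Create(const char *name);   /* channel backed by a pending queue  */
extern SemHandle SemCreate(void);
extern void      Pend(SemHandle sem);        /* blocks until the ISR posts it      */
extern Tibuf    *Get(ChHandle ch);
extern void      PktLibFree(Tibuf *msg);
extern void      process(Tibuf *msg);

void lowLatencyReader(void)
{
    /* Channel created ahead of time; the Navigator interrupt's ISR posts
     * MySem when a message lands in the channel's pending queue. */
    ChHandle  hCh   = Create("MyCh2");
    SemHandle MySem = SemCreate();

    for (;;) {
        Pend(MySem);                 /* sleep until the ISR posts the semaphore     */
        Tibuf *msg = Get(hCh);       /* the message is already waiting, so this is fast */
        if (msg != NULL) {
            process(msg);
            PktLibFree(msg);
        }
        /* With virtual channels, one interrupt/ISR can post the semaphore of
         * whichever of several channels (MyCh2, MyCh3, ...) received data. */
    }
}
```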

Case 3: Reduce Context Switching
Zero-copy construction, core-to-core

1. The Reader creates a channel based on an accumulator queue, ahead of time, with a given name (e.g., MyCh4): hCh = Create(MyCh4);
2. When the Writer has information to send, it looks for the channel: hCh = Find(MyCh4);
3. The Writer asks for a buffer and writes the message into it: Tibuf *msg = PktLibAlloc(hHeap);
4. The Writer puts the buffer: Put(hCh, msg); The Navigator adds the message to an accumulator queue.
5. When the number of messages reaches a watermark, or after a pre-defined timeout, the accumulator sends an interrupt to the core.
6. The Reader gets and processes the messages, freeing each one after it is done: Tibuf *msg = Get(hCh); PktLibFree(msg);
(Logical flow only; all naming is illustrative. A configuration sketch follows.)

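For Case 3 the interesting part is the channel configuration: the accumulator batches messages and interrupts the core only when a watermark or timeout is reached. The sketch below shows that idea with an illustrative configuration structure; the field names (accumulatorChannel, threshold, timeoutUs) are assumptions for illustration, not the real MsgCom configuration.

```c
#include <stdint.h>
#include <stddef.h>

/* Illustrative declarations, not the real MsgCom API. */
typedef void *ChHandle;
typedef void  Tibuf;
typedef struct {
    uint32_t accumulatorChannel;  /* accumulator channel granted by the resource manager */
    uint32_t threshold;           /* watermark: messages per interrupt                   */
    uint32_t timeoutUs;           /* interrupt anyway after this much time               */
} AccumChannelConfig;             /* field names are assumptions                         */
extern ChHandle CreateAccum(const char *name, const AccumChannelConfig *cfg);
extern Tibuf   *Get(ChHandle ch);
extern void     PktLibFree(Tibuf *msg);
extern void     process(Tibuf *msg);

ChHandle createBatchedChannel(void)
{
    /* One interrupt per ~32 messages (or every 100 us if traffic is slow)
     * keeps the reader core from being interrupted for every single message. */
    AccumChannelConfig cfg = { .accumulatorChannel = 8,
                               .threshold = 32,
                               .timeoutUs = 100 };
    return CreateAccum("MyCh4", &cfg);
}

void drainOnInterrupt(ChHandle hCh)
{
    /* Called after the accumulator interrupt: drain everything that was batched. */
    Tibuf *msg;
    while ((msg = Get(hCh)) != NULL) {
        process(msg);
        PktLibFree(msg);
    }
}
```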

Case 4: Generic Channel Communication
ARM-to-DSP communication via the Linux kernel VirtQueue

1. The Reader creates a channel ahead of time with a given name (e.g., MyCh5).
2. When the Writer has information to send, it looks for the channel (Find). The kernel is aware of the user-space handle.
3. The Writer asks for a buffer. The kernel dedicates a descriptor to the channel and provides the Writer with a pointer to a buffer associated with that descriptor. The Writer writes the message into the buffer.
4. The Writer puts the buffer. The kernel pushes the descriptor into the right queue. The Navigator does a loopback (copies the descriptor data) and frees the kernel queue, then loads the data into another descriptor and sends it to the appropriate core.
5. When the Reader calls Get, it receives the message.
6. The Reader must free the message after it is done reading.
(Logical flow only; all naming is illustrative.)

Case 5: Low-Latency Channel Communication

ARM-to-DSP communication via the Linux kernel VirtQueue

1. The Reader creates a channel based on a pending queue, ahead of time, with a given name (e.g., MyCh6).
2. The Reader waits for the message by pending on a (software) semaphore.
3. When the Writer has information to send, it looks for the channel (Find). The kernel space is aware of the handle.
4. The Writer asks for a buffer. The kernel dedicates a descriptor to the channel and provides the Writer with a pointer to a buffer associated with that descriptor. The Writer writes the message into the buffer.
5. The Writer puts the buffer. The kernel pushes the descriptor into the right queue. The Navigator does a loopback (copies the descriptor data) and frees the kernel queue, then loads the data into another descriptor, moves it to the right queue, and generates an interrupt. The ISR posts the semaphore of the correct channel.
6. The Reader starts processing the message and frees it when done.
The virtual channel structure enables a single interrupt to post the semaphore of one of many channels.
(Logical flow only; all naming is illustrative.)

Case 6: Reduce Context Switching

ARM-to-DSP communication via the Linux kernel VirtQueue

1. The Reader creates a channel based on one of the accumulator queues, ahead of time, with a given name (e.g., MyCh7).
2. When the Writer has information to send, it looks for the channel (Find). The kernel space is aware of the handle.
3. The Writer asks for a buffer. The kernel dedicates a descriptor to the channel and gives the Writer a pointer to a buffer associated with that descriptor. The Writer writes the message into the buffer.
4. The Writer puts the buffer. The kernel pushes the descriptor into the right queue. The Navigator does a loopback (copies the descriptor data) and frees the kernel queue, then loads the data into another descriptor and adds the message to an accumulator queue.
5. When the number of messages reaches a watermark, or after a pre-defined timeout, the accumulator sends an interrupt to the core.
6. The Reader starts processing the messages and frees each one after it is done.
(Logical flow only; all naming is illustrative.)

Code Example

Reader:
  hCh = Create(MyChannel, ChannelType, struct *ChannelConfig); // The Reader specifies what channel it wants to create
  // For each message:
  Get(hCh, &msg);          // Either a blocking or a non-blocking call
  pktLibFreeMsg(msg);      // Not part of the IPC API; how the Reader frees the message can be application specific
  Delete(hCh);

Writer:
  hHeap = pktLibCreateHeap(MyHeap); // Not part of the IPC API; how the Writer allocates messages can be application specific
  hCh = Find(MyChannel);
  // For each message:
  msg = pktLibAlloc(hHeap);
  Put(hCh, msg);           // Note: if Copy = PacketDMA, msg is freed by the Tx DMA
  msg = pktLibAlloc(hHeap);
  Put(hCh, msg);

Packet Library (PktLib)
Purpose: A high-level library to allocate and manipulate the packets used by the different channel types.
- Enhances packet manipulation capabilities
- Enhances heap manipulation

Heap Allocation
- Heap creation supports shared heaps and private heaps.
- A heap is identified by name. It contains data buffer packets or zero-buffer packets.
- Heap size is determined by the application.
Typical PktLib functions:
- Pktlib_createHeap
- Pktlib_findHeapByName
- Pktlib_allocPacket
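As a rough illustration of how these calls fit together, here is a C sketch. The function names come from the slide; the header path, the Pktlib_HeapCfg field names, and the exact argument lists follow SYSLIB PktLib conventions but are assumptions that should be verified against the PktLib documentation in the MCSDK.

```c
#include <stdint.h>
#include <string.h>
/* PktLib header from the MCSDK; the exact path may differ per release. */
#include <ti/runtime/pktlib/pktlib.h>

Pktlib_HeapHandle createMessageHeap(void)
{
    Pktlib_HeapCfg cfg;
    int32_t        errCode;

    /* Field names below are assumptions for illustration only. */
    memset(&cfg, 0, sizeof(cfg));
    cfg.name                 = "MyMsgHeap";  /* heaps are identified by name */
    cfg.sharedHeap           = 1;            /* shared between cores         */
    cfg.numPkts              = 64;           /* data-buffer packets          */
    cfg.dataBufferSize       = 1024;
    cfg.numZeroBufferPackets = 16;           /* zero-buffer packets          */

    /* Create the heap; errCode reports why creation failed, if it does. */
    return Pktlib_createHeap(&cfg, &errCode);
}

void sendOneMessage(Pktlib_HeapHandle heap)
{
    /* Allocate a packet sized for the message to be sent. */
    Ti_Pkt *pkt = Pktlib_allocPacket(heap, 256);
    if (pkt == NULL)
        return;                              /* heap exhausted */
    /* ... fill the packet and hand it to a MsgCom channel ... */
}
```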

Packet Manipulations
- Merge multiple packets into one (linked) packet
- Clone a packet
- Split a packet into multiple packets
Typical PktLib functions:
- Pktlib_packetMerge
- Pktlib_clonePacket
- Pktlib_splitPacket
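The exact prototypes of these functions are defined by the PktLib header in the MCSDK release; the sketch below only illustrates the intended usage pattern, and the argument lists shown are assumptions, not the real API.

```c
#include <stdint.h>
#include <stddef.h>
#include <ti/runtime/pktlib/pktlib.h>   /* header path may differ per release */

/* Split one received packet into a header part and a payload part, clone the
 * payload for a second consumer, then merge header and payload back into one
 * chain. NOTE: the argument lists are illustrative assumptions; check pktlib.h. */
void splitCloneMerge(Pktlib_HeapHandle heap, Ti_Pkt *rxPkt, uint32_t hdrSize)
{
    Ti_Pkt *hdrPkt  = NULL;
    Ti_Pkt *dataPkt = NULL;

    /* Split: the first hdrSize bytes stay in hdrPkt, the rest go to dataPkt. */
    if (Pktlib_splitPacket(&hdrPkt, &dataPkt, rxPkt, hdrSize) < 0)
        return;

    /* Clone: a zero-buffer packet from the heap references dataPkt's buffer,
     * so two readers can see the payload without copying it. */
    Ti_Pkt *clone = Pktlib_allocPacket(heap, 0);
    if (clone != NULL)
        Pktlib_clonePacket(dataPkt, clone);

    /* Merge: link the header and payload back into one (linked) packet. */
    Ti_Pkt *merged = Pktlib_packetMerge(hdrPkt, dataPkt, NULL);
    (void)merged;
    /* PktLib garbage collection later reclaims the cloned and split packets. */
}
```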

PktLib: Additional Features
- Cleanup and garbage collection (especially for cloned and split packets)
- Heap statistics
- Cache coherency

Resource Manager (ResMgr) Library
Purpose: Provides a set of utilities to manage and distribute system resources among multiple users and applications.
The application asks for a resource. If the resource is available, the application gets it; otherwise, an error is returned.

ResMgr Controls
- General-purpose queues
- Accumulator channels
- Hardware semaphores
- Direct-interrupt queues
- Memory region requests

An illustrative request sketch follows.
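The slides do not show the ResMgr API itself, so the sketch below uses hypothetical function and type names (ResMgr_init, ResMgr_allocHwSemaphore, ResMgr_allocAccumulatorChannel are assumptions) purely to illustrate the request-or-error pattern described above; the real SYSLIB Resource Manager API differs.

```c
#include <stdint.h>
#include <stdio.h>

/* Hypothetical ResMgr interface, declared here only for illustration. */
typedef void *ResMgr_Handle;
extern ResMgr_Handle ResMgr_init(const char *instanceName);
extern int32_t ResMgr_allocHwSemaphore(ResMgr_Handle h, uint32_t *semIndex);
extern int32_t ResMgr_allocAccumulatorChannel(ResMgr_Handle h, uint32_t *channel);

int setupResources(void)
{
    ResMgr_Handle h = ResMgr_init("core0");
    uint32_t semIndex, accChannel;

    /* Ask for a hardware semaphore: success fills semIndex, failure returns
     * a negative error because the resource is already taken or exhausted. */
    if (ResMgr_allocHwSemaphore(h, &semIndex) < 0) {
        printf("no hardware semaphore available\n");
        return -1;
    }

    /* Same pattern for an accumulator channel used by a MsgCom channel. */
    if (ResMgr_allocAccumulatorChannel(h, &accChannel) < 0) {
        printf("no accumulator channel available\n");
        return -1;
    }

    printf("granted semaphore %u, accumulator channel %u\n", semIndex, accChannel);
    return 0;
}
```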
