Please be aware that an important notice concerning availability, standard warranty, and use in critical applications of Texas Instruments semiconductor products and disclaimers thereto appears at the end of this document.

Application Report

Multicore Navigator for Dummies
High-Performance and Multicore Processors

Ran Katzur

Purpose

The purpose of this document is to answer the following questions about Multicore Navigator:

• Why do we need Multicore Navigator?
• How does Multicore Navigator work?
• What does Multicore Navigator do and how do we use it?
• Which application is the most interesting application?
• Who should use Multicore Navigator?
• Where is the software release?

OK, so I should have used what and not which, but you understand why I did it. Besides, this document will not answer all of these questions, and obviously I was unable to squeeze "When" in, but I will do my best.

When TI introduced the KeyStone architecture as a generic “real” multicore family of devices, it introduced the Multicore Navigator to off-load standard tasks from the CPU cores. This document is devoted to the Multicore Navigator.

What do I mean by “real” multicore device?

There is a difference between a device that has many cores and a "real" multicore device. A real multicore device is designed to facilitate various cooperation methods between the cores and to facilitate resource optimization and resource management.

Resources such as internal and external shared memory, buses, peripherals, and coprocessors should be shared between cores, and the system must reduce or eliminate resource conflicts between cores.


1 The History of Multicore Navigator (According to Me) or Why Multicore Navigator

Please consider the disclaimer.

When multicore design started, it became clear that to achieve high performance in any device, there must be standard high-bit-rate peripherals to move data in and out of the device quickly. Several high-bit-rate peripherals were chosen for inclusion in the KeyStone devices. The highest bit-rate peripheral is SRIO, with up to four lanes, each lane providing up to 5 Gbaud. SRIO thus provides up to 20 Gbaud in three modes: direct IO, type 9 messages, and type 11 messages.

The second high-bit-rate bus is PCI Express. Supporting PCI Express enables the use of PC-based cards with KeyStone devices. The PCI Express peripheral has two lanes, each of which can carry up to 5 Gbaud. PCI Express supports root complex, endpoint, and legacy endpoint modes.

Two Ethernet-port peripherals were added to some KeyStone devices. Each port supports 10/100/1000 Mbps.

In addition to the standard high-bit-rate peripherals that enable devices to send and receive data from any generic input/output device that supports one of the above interfaces, a high-bit-rate proprietary bus called HyperLink was added. The original thought was to connect the internal buses of two KeyStone devices so that they work as one device; this arrangement would double the processing power while saving power and cost by using one external memory and operating one set of peripherals for both devices. The HyperLink proprietary bus, which was specified to deliver four lanes of 12.5 Gbaud, has some limitations on the distance between the two devices and the type of connection. As it turned out, many customers saw the value of a very high-bit-rate path in and out of multicore devices; this prompted TI to ask a third party to develop FPGA reference code so that HyperLink could be used to move data between the KeyStone device and a very high-bit-rate source or destination device.

Because high-bit-rate peripherals move lots of data in and out of the device, there is also a need to off-load the CPU cores from moving that data inside the device. So it made sense to add a DMA master inside all the peripherals to move data in and out without the intervention of any CPU core.

But then, engineers noticed an interesting thing. Some peripherals write (or read) directly to memory, so a DMA master will do the trick. However, other peripherals—such as the Ethernet interface, SRIO messages, and the antenna interface (for wireless devices)—need to route the data. The application may want to route SRIO messages from one source to one core and from a different source to a different core. The application may want to route different Ethernet traffic to different channels in different cores, or to route the antenna traffic directly to a core or a coprocessor to do FFT. Therefore, there is a need for a generic routing mechanism that off-loads the routing tasks from the cores.

And this is how the Navigator was conceived: a system that moves data in and out of high-bit-rate peripherals and routes the data. Instead of the simple built-in DMA master that serves peripherals that do not need routing, a new system DMA master that works with a central routing facility was invented.

Disclaimer:

I have no idea if this history is true. I just think that this is the way it could have happened, and let's be honest, we all know that perception is more important than reality, so this is my perception. As a famous journalist once said: "The value of a story is not measured by its truthfulness." (Another baseless story.)


A routing mechanism needs some kind of packaging that carries routing information as well as data; because the new DMA must move these packages, it was called PKTDMA (packet DMA).

The second part of the system is a central routing facility. TI chose to implement the routing system based on a set of queues and a mechanism to move packages between queues. The central routing facility is called the "queue manager subsystem" (QMSS), and the complete system got the name Navigator. To add a priority scheme to the traffic and enable various notification methods and quality of service for peripherals, two processors were introduced into the QMSS to monitor the traffic and manage priorities. An interrupt generation and control block was added to implement multiple methods of notification.

The next step in the development of the Multicore Navigator was the realization that, since there is already a central facility to route packages inside the device, it could also be used to move data and messages between different cores. Indeed, multicore programming requires cooperation between cores, including moving data, synchronization, and messages. It makes sense to use the existing routing facility (the QMSS) to deliver packages between cores. So a special PKTDMA was invented, one that does not serve a peripheral but serves traffic between cores inside the device. And just like roads and bridges that serve vehicle traffic, the new special PKTDMA was named "Infrastructure PKTDMA" and the Navigator's name was changed to "Multicore Navigator".

The Multicore Navigator supports two types of applications: routing for peripherals, as originally designed, and multicore cooperation, which includes all types of communication between cores.

At the same time, engineers looked into the theory and implementation of parallel processing, since multicore is naturally a good model for parallel processing. A major issue with multiple processors is allocating computational resources in the most efficient way. Dynamic allocation of resources is always more efficient than static allocation, but dynamic allocation requires a central unit that can see the activity of all the resources, plus an algorithm and logic to monitor and assign resources. The QMSS seemed to be an ideal candidate for dynamic resource allocation: first, it is a central facility that can see the activity of all the cores; second, it has central processing power and logic on which an algorithm can be executed; and third, having an interrupt generation block, it can initiate threads on all cores. From these functions came the third type of Multicore Navigator usage—dynamic load optimization.

So, at this point the Multicore Navigator can be used for three types of tasks:
• Peripheral routing
• Inter-core communication (data movement, messages, synchronization)
• Dynamic load optimization between cores

What about the future?

I noticed a strange phenomenon with my PC. Even though it runs at least ten times faster than the computer I had ten years ago, Microsoft Word does not work any faster. How can that be? There is no vacuum in the computer business; as soon as application developers see available resources, they add features that consume the additional resources and slow down the system, so the hardware developers must come up with even faster computers.

This is what I think will happen with the Multicore Navigator.

Currently, the dynamic load optimization can process up to a million threads per second. To process more threads, more computational power must be given to the Multicore Navigator.

With more computational power, new features will be developed, features that I cannot even envision now.


2 High Level Description of How Multicore Navigator Works

2.1 Basic Terminology

The Multicore Navigator was developed as a routing system, so to understand how it works, one has to look at the most common routing system—the mail service.

Figure 1

We are all familiar with the mail truck. It comes to my house six times a week. It picks up postcards, letters, and packages from my house and brings me postcards, letters, and packages that were sent to me.

All around the country, each neighborhood has its own mail truck. They all collect and deliver mail. But where does the mail that they collect go, and where do they get the mail to deliver? The trucks move data from (and to) the customers’ locations to (and from) the central office. The routing operation is done in the central office.

Let’s start by setting the analogy between the mail service and the Multicore Navigator. In our case, the mail truck that moves postcards, letters, and packages is the PKTDMA. Each peripheral (that is, each customer) has one.

The queue manager subsystem (QMSS) is analogous to the mail central office. This is the place where the routing takes place.

The equivalents of postcards, letters, and packages in the Multicore Navigator are called descriptors. Just like the mail items, descriptors contain the routing information (an address). There are different types of descriptors:

• Monolithic descriptors are like postcards. The routing information and the data that they carry are in the same structure; like a postcard, the address and the data are on the same page. Monolithic descriptors can be used to transfer messages, synchronization signals, and other information with little data.

• Host descriptors are like envelopes. Just as envelopes have the routing information on the outside and the data on a separate page or pages inside the envelope, host descriptors have a static link to a buffer that contains data. The buffer is permanently linked with the descriptor and is routed with the descriptor.


• Larger packages are analogous to special types of host descriptors. There is a way to chain several host descriptors together and link them into one package. Then the data that is stored in multiple buffers—each is connected to a different descriptor in the chain—is treated as a single package. The first descriptor in the chain is called start-host-descriptor, all other descriptors in the chain aside from the last one are called middle-host-descriptors, and the last descriptor in the chain is called the end-host-descriptor.

The equivalents of mailboxes are queues, but queues are much more generic than mailboxes. Queues can play the following roles:

• You can think of queues as receive mailboxes; descriptors are pushed into a receive queue (mailbox) and the receiver pops the descriptors out of it.

• Some queues behave like a storage area for descriptors that are not in use. When the code defines descriptors, these descriptors are stored in one or more storage areas; these special queues are called free descriptor queues (FDQs), and an FDQ contains descriptors that are ready to be used.

• Queues are also transmit mailboxes. Just like in the U.S. when one puts a letter into a mailbox and wants to signal the postman to pick it up, there is a small red flag that needs to be set. The same is true with transmit queues; they have their own red-flag—a signal that alerts the system that there is a descriptor in the queue waiting to be routed. See the picture below.

Figure 2

• And, of course, if I am cheap and do not want to spend money on stamps when I send a letter to a friend, I can drive to my friend's house and deposit the letter in his receive mailbox. (This is illegal in the US; only the post office, in addition to the owner, can deposit letters in a receive mailbox.) The same is true for the application. It can push a descriptor directly to the receive queue. In that case, the red flag does not need to be set.

• In the Multicore Navigator, descriptors are 100 percent recycled. After being used, descriptors are freed; that is, they are stored in the FDQ (special queues called “Free Descriptor queue” —how creative!)


• And, of course, just like the post office, where a letter with a wrong address that cannot be delivered is put into a special bin to be treated later, there are "error queues" into which descriptors that cannot be delivered are pushed. These error queues enable postmortem analysis of the errors.

And, just as the postal service has supervisors who monitor the performance of the system, the Multicore Navigator has two RISC processors that monitor the traffic; that is, they monitor the number of descriptors in certain queues and manage the delivery of the descriptors.

Figure 3

This is the place to mention that the Multicore Navigator has 8192 hardware queues inside the QMSS module and can support up to 512K descriptors. Each descriptor has an index number. Note that the queue manager subsystem has internal memory that can store up to 16K descriptor indexes. If the system requires more than 16K descriptors, an index memory external to the QMSS module must be allocated.


2.2 PKTDMA Operation Inside a Peripheral

The PKTDMA that is part of a peripheral converts packets encapsulated inside descriptors into a bitstream that is sent out by the peripheral, and builds descriptors from data received by the peripheral as a bitstream. The following figure shows the interfaces to the PKTDMA: packets and a pending-flag notification on the queue side, a bitstream on the peripheral side, and receive instructions.

Figure 4

Outgoing traffic arrives as descriptors to the transmit queue. The descriptor might have routing information that may be conveyed to the other side.

How does the PKTDMA know that there is a descriptor in the transmit queue? Transmit queues have a pending signal that alerts the PKTDMA that a descriptor is available in the transmit queue—just like the red flag that alerts the mail carrier that an outgoing letter is waiting in the mailbox. This pending signal is a hard connection between the queues and the PKTDMA. Each PKTDMA can have multiple transmit queues connected to the peripheral. For example, the SRIO peripheral supports up to 16 channels of SRIO traffic, so there are 16 transmit queues connected to the SRIO, each connected to a different SRIO channel. Pushing a descriptor into queue number 672—trust me on this one, this is the real queue number—will generate a pending signal to the PKTDMA inside the SRIO peripheral, and as a result the PKTDMA will pop the descriptor from the queue and send the information as a bitstream on channel 0. If the application wants to use channel 1, for example, the descriptor should be pushed into queue 673, and so on.

So, for outbound traffic, the processing information is given in the descriptor (the location of the buffers and of any protocol-specific information) and in the number of the queue into which the descriptor was pushed, because the queue number determines which preconfigured channel will be used.
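As a concrete illustration, here is a minimal C sketch of selecting an SRIO channel simply by choosing the transmit queue number. The queue numbers (672 plus the channel number) match the mapping described above; the push helper is a hypothetical stand-in for the QMSS LLD push call or the memory-mapped queue push register documented in SPRUGR9E.

#include <stdint.h>

#define SRIO_TX_BASE_QUEUE  672u   /* queue 672 is SRIO channel 0, per the text above */

/* Hypothetical push helper: the real mechanism is a write of the descriptor
 * address to the queue's push register (or a QMSS LLD queue-push call).     */
extern void qmss_queue_push(uint32_t queue_num, void *descriptor);

/* Send one descriptor on a given SRIO channel (0..15) by picking the queue. */
static void srio_send_on_channel(uint32_t channel, void *descriptor)
{
    uint32_t queue_num = SRIO_TX_BASE_QUEUE + channel;   /* 672..687            */
    qmss_queue_push(queue_num, descriptor);              /* raises the pending flag */
}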



This is not the case for incoming traffic. The bitstream that arrives at the peripheral might carry routing information (ID, mailbox number, and letter for type 11 SRIO, for example), but the peripheral has no idea where to forward the information inside the device. Thus, some receive instructions are needed to tell the PKTDMA what to do with incoming traffic and where to send it. The Multicore Navigator uses a multiple-register structure called "RX Flow" to tell the PKTDMA what to do with incoming data. The PKTDMA packages incoming data into descriptors and pushes the descriptors into the appropriate receive queue according to the RX Flow instructions.

To summarize, the PKTDMA inside a peripheral gets a pending signal when a descriptor is pushed into one of its transmit queues. It pops the descriptor and sends the data associated with the descriptor via the peripheral. After processing the descriptor, the PKTDMA pushes the free descriptor back into a free descriptor queue to be used again. In the ingress direction, the PKTDMA receives a bitstream. It collects the bitstream information and, following the instructions in the RX Flow, pops a descriptor (one or more) from the free descriptor queue, loads the data into the descriptor or into the buffer that is associated with the descriptor, and pushes the descriptor or descriptors into a receive queue.

2.3 The Infrastructure PKTDMA

You know the joke about how a mathematician boils water if the kettle is in the other room?

First, he brings the tea kettle into his office, fills it with water, and plugs it in.

And if the next question is how do you boil water if the kettle is in your office:

Move it back to the other room, reducing it to a problem that has already been solved.

The same is true for the infrastructure PKTDMA. The system knows how to convert descriptors to a bitstream and back, so how does the system send descriptors to the descriptors' destination?

First it converts the descriptors into a bitstream, then it loops the bitstream back into the PKTDMA and builds descriptors out of the bitstream.


The following figure shows how the infrastructure PKTDMA works: descriptors come in from a transmit queue, the bitstream is looped back inside the PKTDMA, and descriptors come out into a receive queue.

Figure 5

2.4 Send Message and Transmit Queues

We already mentioned that messages and data are sent using descriptors. As an example, let's assume that a core wants to send a message; that is, some data from a source to a destination using, for example, SRIO type 11.

Monolithic descriptors are usually used to transfer short messages. If larger data needs to be sent, host descriptors are used. Table 3-1 of the Multicore Navigator User Guide (SPRUGR9E) describes the host descriptor header; the monolithic descriptor header is described in Table 3-3. The header includes the destination address.

For real-time considerations, the code fills in as much information in the descriptor header as possible when the descriptors are created during initialization; we will talk about this later. But assume that the code uses the same descriptors to send data to different destinations. In that case, the sender must add the routing information to the descriptor.



In SRIO, the routing information is the destination ID, either 8 or 16 bits depending on the mode of operation, plus additional routing information. For type 11 SRIO, mailbox and letter values are used to channel messages or data to different places inside the same destination. You can think of the ID as the device ID, while the mailbox and letter values are used to send messages to a certain core or a certain thread.

So the process of sending data to a destination is the following (a minimal code sketch follows the list):
• The source pops a (host) descriptor from a queue where ready-to-be-used descriptors are stored, that is, a free descriptor queue.
• The source adds the data to be sent either to the descriptor itself (if a monolithic descriptor) or to the buffer that is linked to the descriptor.
• The source adds the destination address information to the descriptor, if this information is not already there.
• The source pushes the descriptor into a transmit queue that is associated with SRIO. At that point, the source is done. The message will be sent out.
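The following C sketch mirrors those four steps. It is illustrative only: the helper functions and the simplified descriptor layout are hypothetical placeholders for the QMSS/CPPI LLD calls and the descriptor fields defined in SPRUGR9E (the real transmit queue for SRIO channel 0 is queue 672, as noted earlier; the FDQ number here is invented).

#include <stdint.h>
#include <string.h>

#define SRIO_CH0_TX_QUEUE  672u       /* SRIO channel 0 transmit queue             */
#define MY_TX_FDQ          2048u      /* hypothetical free descriptor queue number */

/* Hypothetical helpers standing in for the QMSS LLD queue operations. */
extern void *qmss_queue_pop(uint32_t queue_num);              /* returns NULL if empty */
extern void  qmss_queue_push(uint32_t queue_num, void *desc);

/* Hypothetical, simplified view of a host descriptor; the authoritative
 * layout is in SPRUGR9E Table 3-1.                                       */
typedef struct {
    uint32_t packet_info;      /* type, packet length, return (free) queue, ...   */
    uint32_t dest_info;        /* e.g. SRIO type 11 destination ID/mailbox/letter */
    uint32_t buffer_len;
    uint8_t *buffer_ptr;       /* data buffer linked to this host descriptor      */
} host_desc_t;

int send_srio_message(const void *data, uint32_t len, uint32_t dest_info)
{
    /* 1. Pop a ready-to-use descriptor from the free descriptor queue. */
    host_desc_t *desc = (host_desc_t *)qmss_queue_pop(MY_TX_FDQ);
    if (desc == NULL)
        return -1;                      /* no free descriptor available right now */

    /* 2. Load the payload into the buffer linked to the descriptor. */
    memcpy(desc->buffer_ptr, data, len);
    desc->buffer_len = len;

    /* 3. Add the routing information if it is not preloaded. */
    desc->dest_info = dest_info;

    /* 4. Push into the SRIO transmit queue; the PKTDMA takes over from here. */
    qmss_queue_push(SRIO_CH0_TX_QUEUE, desc);
    return 0;
}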

2.4.1 What Happens to the Descriptor Next?

Transmit queues are always pending queues; that is, when there are descriptors in a transmit queue, a pending signal is generated to alert the connected PKTDMA that there are descriptors in the queue.

KeyStone SRIO has four lanes that can be configured as one port with four lanes, four distinct ports with one lane each, and other configurations. (We may show all the possible SRIO configurations in a document called "SRIO for Dummies," if it is ever written.) It has 16 channels, each of which can be linked to a port. When more than one channel has descriptors to transmit, the channel with the lower channel number has higher priority. Each channel is connected to a queue. Queue number 672 is the SRIO queue connected to channel 0, queue 673 is the SRIO queue connected to channel 1, and so on, up to queue 687, which is connected to channel 15. So, when a core pushes a descriptor into an SRIO queue, the PKTDMA pops the descriptor, extracts the data from the buffer that is associated with the descriptor, extracts the address from the descriptor header, and sends the data with the address information via the SRIO channel that is connected to the queue. As soon as the data is sent, the PKTDMA frees the descriptor and pushes it back into the free descriptor queue that is associated with the descriptor. The free descriptor queue number is part of the descriptor header; this value is usually configured when the descriptors are created.

Here we encounter one of the interesting features of the Multicore Navigator. The source core can continue execution even if the previous data has not been sent yet. All it has to do is pop a new descriptor from the free descriptor queue and generate more data.

Imagine a case where the data processing time varies and the transmit rate is fixed. If the core generates descriptors faster than the SRIO transmits them, the transmit queue will accumulate descriptors and queue them to be transmitted. If the data changes such that the processing time is longer, the SRIO does not have to wait for the sender core; it transmits descriptors from the queue. This type of loose coupling is very efficient for algorithms whose processing time varies.


How much delay can accrue between generating data and transmitting data? Think about it. The limit is the number of descriptors that are available. By controlling the number of descriptors created for this purpose (and we will see that different descriptors can be created for different purposes), the system architect can set the maximum delay. If there are no descriptors in the free descriptor queue, the source core will stop generating data and wait for a free descriptor to arrive (meaning that the data in that descriptor was already sent).
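A minimal sketch of that back-pressure behavior, reusing the hypothetical names from the earlier sketch: the number of descriptors preallocated in the free descriptor queue bounds how far the producer can run ahead of the SRIO transmitter.

#include <stdint.h>

/* Same hypothetical names as in the previous sketch. */
#define SRIO_CH0_TX_QUEUE  672u
#define MY_TX_FDQ          2048u
typedef struct { uint32_t packet_info, dest_info, buffer_len; uint8_t *buffer_ptr; } host_desc_t;
extern void *qmss_queue_pop(uint32_t queue_num);
extern void  qmss_queue_push(uint32_t queue_num, void *desc);
extern void  generate_data(uint8_t *buf, uint32_t *len);   /* hypothetical application code */

/* Producer loop: the FDQ depth (set when the descriptors were created) bounds
 * how many unsent packets can be outstanding, and therefore the maximum delay
 * between generating data and transmitting it.                                */
void producer_loop(void)
{
    for (;;) {
        host_desc_t *desc;

        /* Back-pressure: if every descriptor is still waiting to be sent,
         * stall here until the PKTDMA recycles one into the FDQ.           */
        while ((desc = (host_desc_t *)qmss_queue_pop(MY_TX_FDQ)) == NULL)
            ;   /* or yield/sleep instead of busy-waiting                   */

        generate_data(desc->buffer_ptr, &desc->buffer_len);
        qmss_queue_push(SRIO_CH0_TX_QUEUE, desc);
    }
}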

Host descriptors are statically linked with a buffer. When the descriptor is freed and pushed into the free descriptor queue, the buffer that is linked with it is also freed and can be used again by the system. This scheme reduces the possibility of buffer memory leaks. Buffers are freed when the descriptors are freed, and this is always done by the PKTDMA for descriptors that are pushed into transmit queues.

The following table lists the transmit queues in the KeyStone architecture:

Figure 6

2.5 Receive Message and Receive Queues

It was already stated that a receive queue is like a mailbox where your mail arrives. There are multiple ways to receive mail (descriptors). Let's look at some typical situations.


First, consider a simple case: a simple mailbox where the consumer is waiting for a letter. The consumer has to check the mailbox to see if mail has arrived. The consumer may stand in front of the mailbox waiting for the mail, or go inside the house and check every so often to see if the mail has arrived.

The same is true for a simple receive queue and the polling method. The destination core checks a receive queue to see if there is a descriptor (or descriptors) there. The check can be blocking (execution stalls until a descriptor is available) or non-blocking (execution continues even if there is no descriptor, and the core rechecks later). Note that we have not yet mentioned how the system knows to bring the descriptor to the receive queue; we will discuss it later. Right now, we assume that there is a queue that was designated by the core as a receive queue. It should not be one of the transmit queues; usually it is one of the general purpose queues.

Figure 7
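A minimal sketch of the non-blocking polling approach, again using hypothetical queue helpers (the real operation is a read of the queue's pop register or a QMSS LLD pop call; the queue numbers are invented for illustration):

#include <stdint.h>

#define MY_RX_QUEUE  900u   /* hypothetical general purpose queue chosen as the receive queue */
#define MY_RX_FDQ    2049u  /* hypothetical receive-side free descriptor queue                */

extern void *qmss_queue_pop(uint32_t queue_num);   /* returns NULL if the queue is empty */
extern void  qmss_queue_push(uint32_t queue_num, void *desc);
extern void  process_data(void *desc);             /* hypothetical application code      */

/* Non-blocking poll: do some other work between checks. */
void poll_receive_queue(void)
{
    void *desc = qmss_queue_pop(MY_RX_QUEUE);
    if (desc != NULL) {
        process_data(desc);                 /* consume the data                       */
        qmss_queue_push(MY_RX_FDQ, desc);   /* recycle the descriptor back to its FDQ */
    }
    /* If desc is NULL, simply return and recheck later. */
}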

What about other receive features? Let's examine several scenarios.

2.5.1 Express Mail or Time-Sensitive Messages

Just as express mail requires the mail carrier to come to my house and ring the bell, sometimes a destination core needs to be notified immediately when a descriptor arrives. That is, an interrupt needs to be sent, very similar to the signal that is sent when a descriptor is pushed into a transmit queue. There are special queues for which pushing a descriptor generates a system event that can be mapped as an interrupt to a core. These are queues 652 to 671.
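A hedged sketch of how an application might react to such a queue-pend interrupt. The registration call and queue numbers below are hypothetical placeholders (on a real KeyStone device the queue-pend event is routed through the chip-level interrupt controller, as described in the device documentation); the handler simply drains the queue.

#include <stdint.h>

#define MY_PEND_QUEUE  652u   /* one of the queue-pend queues (652..671)              */
#define MY_RX_FDQ      2049u  /* hypothetical free descriptor queue used for recycling */

extern void *qmss_queue_pop(uint32_t queue_num);
extern void  qmss_queue_push(uint32_t queue_num, void *desc);
extern void  process_data(void *desc);                                   /* hypothetical */
extern void  register_queue_pend_isr(uint32_t queue, void (*isr)(void)); /* hypothetical */

/* ISR: drain every descriptor waiting in the pend queue. */
static void my_queue_pend_isr(void)
{
    void *desc;
    while ((desc = qmss_queue_pop(MY_PEND_QUEUE)) != NULL) {
        process_data(desc);
        qmss_queue_push(MY_RX_FDQ, desc);
    }
}

void setup_notification(void)
{
    register_queue_pend_isr(MY_PEND_QUEUE, my_queue_pend_isr);
}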

2.5.2 More Complex Scenarios—Adding Logic

One conceivable problem with the above method is the time it takes to respond to an interrupt. When an interrupt occurs and execution is modified, a context switch takes place. This context switch has a performance cost. If the system is such that many messages may arrive one after the other, the system may use queues that are monitored by the PDSP logic, part of the QMSS. The user can define complex logic such as:
• Get an interrupt only if there are N descriptors in the queue
• Get an interrupt only after a descriptor has been in the queue for N microseconds
• Get an interrupt if there are N descriptors or the first descriptor has been waiting longer than N microseconds (not the same N, of course)

QMSS: Queue Mapping

Queue Range    Count   Hardware Type   Purpose
696 to 703     8                       General purpose
896 to 8191    7296                    General purpose


This is the place to mention special queues that have an interesting counting feature. These are the starvation queues, which count each time a core tries to pop a descriptor from the queue while the queue is empty. The counters are reset when a descriptor is pushed into the queue.

Figure 8 Special Queues

2.6 Other General Purpose Queues and Their Purpose

We already mentioned the free descriptor queues. These are the queues that hold descriptors that are ready to be used. When these queues are opened, the descriptors are usually generated at the same time, and part of their information is loaded. When a sender needs a descriptor, it pops one from a free descriptor queue. After the PKTDMA or a core finishes with the descriptor, it pushes it back into the free descriptor queue.

Simple receive queues are usually general purpose queues, and so are error queues. Error queues are used to do post-mortems on errors that can occur. The user can identify different types of errors—for example, an illegal address in a descriptor—and, instead of releasing these descriptors, the application pushes them into error queues. Then the user can look at the error queues and analyze what types of errors occurred.

QMSS: Queue Mapping

Queue Range    Count   Hardware Type    Purpose
652 to 671     20      queue pend       CPintC0/intC1 auto-notification queues
704 to 735     32      pdsp/firmware    High-priority accumulation queues
736 to 799     64                       Starvation counter queues
832 to 863     32                       Queues for traffic shaping (supported by specific firmware)

Remember the used car scam: someone buys your car but, due to a "mistake," sends you a cashier's check for a bigger sum. All you have to do is deposit the check and send him the difference between the car price and his cashier's check.

Well, if you are the scammer, you do not want to give out your address; instead, you rent a post office box. And you are too lazy to go and open the box for a single check, so you ask the clerk to call you when there are 50 letters in the box. If you get only one check, you want the clerk to call you if the check has been sitting there more than a week.

This is what the accumulator queues do for a customer (a core). The system can be programmed to send an interrupt when there are N (20, 50, any number you want) descriptors in the queue, or when there is a descriptor that was pushed more than N microseconds ago.
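A hedged sketch of how such accumulator behavior might be configured. The structure and function below are illustrative placeholders, not the real accumulator command layout; the actual programming of the QMSS accumulator firmware (channel, list address, thresholds, timer ticks) is described in SPRUGR9E and exposed by the QMSS LLD.

#include <stdint.h>

/* Illustrative accumulator configuration; field names are hypothetical. */
typedef struct {
    uint32_t queue_num;        /* monitored queue, e.g. one of 704..735             */
    uint32_t list_address;     /* where the firmware writes the accumulated          */
                               /* descriptor pointers for the core to read           */
    uint32_t entry_threshold;  /* interrupt after N descriptors ("50 letters")       */
    uint32_t timer_ticks;      /* or after a descriptor has waited this long         */
} acc_cfg_t;

extern int program_accumulator(const acc_cfg_t *cfg);   /* hypothetical wrapper */

int setup_accumulation(uint32_t queue_num, uint32_t list_addr)
{
    acc_cfg_t cfg = {
        .queue_num       = queue_num,
        .list_address    = list_addr,
        .entry_threshold = 50,    /* call me when 50 "letters" are in the box       */
        .timer_ticks     = 100,   /* or when the first one has waited long enough   */
    };
    return program_accumulator(&cfg);
}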


3 QMSS Architecture

3.1 Overview of the Architecture

So far we have mentioned descriptors, queues, the PKTDMA, and the queue manager subsystem. Now let's think about how it all works.

First, I want to tell you that there are 8192 queues inside the queue manager subsystem and that the descriptors must be allocated somewhere. A descriptor's size is a multiple of 16 bytes and is at least 32 bytes.

The number of descriptors in a system must be less than 512K, but for a reason that will become clear soon, it is better to have at most 16K descriptors. Let's assume there are 16K descriptors; conceivably, all of them can be pushed to any queue. So how big must each queue be—big enough for 16K descriptors? Who will pay for all this silicon?

You understand that this is not the way it works. For the 16K-descriptor case, there is a 16K-entry vector in the QMSS that has a line for each descriptor. Descriptor N corresponds to line number N in this vector. Among other things, line N tells the system which descriptor follows descriptor N in the queue and which descriptor precedes descriptor N in the queue.

Please understand this point. The Nth row tells the system the index of the descriptor that was pushed into the same queue after descriptor N was pushed, and the index of the descriptor that was pushed into the same queue before it. If no new descriptor was pushed into the same queue, the next-descriptor index is NULL (0x7ffff).

So you see, this vector links the current descriptor index with the index of the next descriptor pushed into the same queue and with the previous descriptor; it is a doubly linked list. Thus we gave this vector a special name: we call this 16K vector the Link RAM.

And a queue? A queue is actually a set of four registers, one of which points to the index of the first descriptor that was pushed into the queue. At initialization, all queues have index 0x7ffff (no descriptor is pushed into the queue) and all Link RAM links have the value 0x7ffff (no descriptor is linked to another descriptor).


When descriptors are generated, they are pushed into the storage area, one or more FDQs. If all descriptors are stored in queue number 1, the doubly linked list will look like the following:

Now assume the system pushed descriptor number 1234 to queue number 5678. The hardware register that corresponds to queue number 5678 will change its value from 0x7ffff to 1234.

Now assume that another descriptor—this time descriptor number 9012—is pushed into the same queue. The hardware goes to the queue register and sees the value 1234, then it goes to the Link RAM at index 1234 and changes the link from 0x7ffff to 9012. The following figures describe the indices involved:

(Diagrams: initially, the head register of queue 1 points to descriptor index 0, and the Link RAM chains all the stored descriptors with next and previous indexes. After the two pushes, the head register of queue 5678 holds index 1234, Link RAM location 1234 has next = 9012, and Link RAM location 9012 has next = 0x7ffff.)


What happens when a pop from the queue is issued? At the first pop, the first descriptor (1234) is sent out, the value 9012 moves to the queue head register, and location 1234 in the Link RAM gets the value 0x7ffff.

The second pop sends out descriptor 9012 and the queue head becomes 0x7ffff (the value of location 9012 in the Link RAM). If another pop from the same queue is issued, the return value indicates an empty queue.
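To make the bookkeeping concrete, here is a small software model of the scheme described above: a head register per queue plus a Link RAM of next indexes. It is only a teaching model of the index manipulation; for simplicity it keeps just the next-index chain (the real Link RAM also keeps a previous index, and a real queue also tracks a tail pointer and entry count in its other registers, so the hardware never has to walk the chain). Pushing 1234 and then 9012 into queue 5678 and popping twice reproduces the sequence in the example above.

#include <stdint.h>

#define EMPTY_INDEX   0x7ffffu     /* "NULL" index, as in the Link RAM */
#define NUM_QUEUES    8192u
#define NUM_DESC      (16u * 1024u)

static uint32_t queue_head[NUM_QUEUES];   /* one head register per queue */
static uint32_t link_ram_next[NUM_DESC];  /* next-descriptor index chain  */

void model_init(void)
{
    for (uint32_t q = 0; q < NUM_QUEUES; q++) queue_head[q] = EMPTY_INDEX;
    for (uint32_t d = 0; d < NUM_DESC;  d++) link_ram_next[d] = EMPTY_INDEX;
}

/* Push descriptor index 'idx' to the tail of queue 'q'. */
void model_push(uint32_t q, uint32_t idx)
{
    link_ram_next[idx] = EMPTY_INDEX;
    if (queue_head[q] == EMPTY_INDEX) {          /* queue was empty        */
        queue_head[q] = idx;
        return;
    }
    uint32_t last = queue_head[q];               /* walk to the tail...    */
    while (link_ram_next[last] != EMPTY_INDEX)   /* (the hardware keeps a  */
        last = link_ram_next[last];              /*  tail register and     */
    link_ram_next[last] = idx;                   /*  avoids this walk)     */
}

/* Pop the head of queue 'q'; returns EMPTY_INDEX if the queue is empty. */
uint32_t model_pop(uint32_t q)
{
    uint32_t idx = queue_head[q];
    if (idx == EMPTY_INDEX)
        return EMPTY_INDEX;                      /* empty queue            */
    queue_head[q]      = link_ram_next[idx];     /* next becomes the head  */
    link_ram_next[idx] = EMPTY_INDEX;
    return idx;
}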

However, it is important to understand that even though the discussion is about descriptors, we have only discussed descriptor indexes. How does the translation from indexes into descriptors actually happen? For the explanation, you have to go to the next section.

3.2 Link RAM, Descriptors, Memory Region, and Buffers

We already stated that the QMSS has an internal Link RAM that can hold up to 16K indexes and that the user can define an external Link RAM so that the system can support up to 512K descriptors (internal and external together). But the Link RAM deals only with indexes; where are the descriptors?



The descriptors are allocated in what we call a "memory region." A memory region can be in any global memory location, and each memory region hosts N descriptors (N must be a power of 2), all of the same size. This is the place to mention that all descriptors are a multiple of 16 bytes in size, aligned to 16 bytes, and at least 32 bytes. Up to 20 different memory regions can be defined in a system, and the order in which the memory regions are defined is important: the base addresses must be in ascending order; that is, the base address of the first memory region is lower than the base address of the second memory region, and so on.

When a memory region is defined, the descriptors are defined with it. One of the parameters of the memory region define function is the index of the first descriptor of the region.

Location N of the Link RAM holds the index of the next descriptor in the queue (if there is one). The descriptor index itself, together with the base address of the memory region and the size of the descriptors in it, enables the hardware to figure out the physical location (the address) of any descriptor, so the hardware does the translation between indexes and descriptors.
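A sketch of that translation under the stated assumptions (descriptors of one fixed size per region, a known region base address, and a known index of the region's first descriptor); the real hardware keeps the equivalent information in the memory-region registers described in SPRUGR9E.

#include <stdint.h>

/* Illustrative memory-region record. */
typedef struct {
    uintptr_t base_address;   /* start of the region (16-byte aligned)    */
    uint32_t  desc_size;      /* multiple of 16, at least 32 bytes        */
    uint32_t  first_index;    /* index of the first descriptor in region  */
    uint32_t  num_desc;       /* power of 2                               */
} mem_region_t;

/* Translate a descriptor index into its physical address. */
void *descriptor_address(const mem_region_t *r, uint32_t index)
{
    if (index < r->first_index || index >= r->first_index + r->num_desc)
        return 0;                                   /* not in this region */
    return (void *)(r->base_address
                    + (uintptr_t)(index - r->first_index) * r->desc_size);
}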

There are two types of descriptors: monolithic descriptors, which carry the information as part of the descriptor and are usually used for message exchange, and host descriptors, which are associated with buffers and are usually used to transfer data. Just as the user must allocate descriptors using a memory region (the function to do it is called "insert memory region" or something like that), the user must allocate buffers for the host descriptors and connect the buffers to the descriptors. It is important to understand the linkage between a host descriptor and a buffer. When a destination frees a descriptor, it also frees the buffer that is associated with the descriptor. Although all descriptors in the same memory region share the same size, the buffers associated with each descriptor need not be the same size, although using one size can make life easier. Note that a buffer can also be associated with a descriptor during runtime.

3.3 Descriptor Configuration

The following is taken from the Multicore Navigator User Guide and describes the structure of a host descriptor. More information about each field is provided in the user guide (SPRUGR9E).

Based on the application, some of the fields are preloaded into the descriptors when the descriptors are created, and some may be loaded by the application just before the descriptors are pushed into a queue (not an FDQ).


Similarly, the monolithic descriptor structure from the user guide is given by:

Again, the user guide gives a detailed description of each field in the monolithic descriptor header.
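For orientation only, here is a rough C view of the kinds of fields the descriptor headers carry, based on what this document has mentioned (packet length, a linked buffer, chaining to the next descriptor, and the return free descriptor queue). The field names, widths, and ordering are illustrative; the authoritative bit-level layouts are in SPRUGR9E Tables 3-1 and 3-3.

#include <stdint.h>

/* Illustrative host descriptor view (not the real bit layout). */
typedef struct host_desc_view {
    uint32_t               packet_info;   /* descriptor type, packet length, ...        */
    uint32_t               buffer_len;    /* length of valid data in the linked buffer  */
    uint8_t               *buffer_ptr;    /* buffer linked to the descriptor            */
    struct host_desc_view *next_desc;     /* chaining for start/middle/end host desc.   */
    uint32_t               return_queue;  /* FDQ to push the descriptor into when freed */
    uint32_t               src_dst_tag;   /* optional routing/tag information           */
} host_desc_view_t;

/* Illustrative monolithic descriptor view: header and payload share one block. */
typedef struct {
    uint32_t packet_info;                 /* type, packet length, return queue, ...     */
    uint32_t src_dst_tag;
    uint8_t  payload[];                   /* data follows the header in the same memory */
} mono_desc_view_t;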


3.4 RX Flow Configuration

When a descriptor is pushed into a transmit queue, the instructions about what to do with the information are given in the definition of the queue (for example, if the information goes to SRIO, the queue is an SRIO transmit queue) and in the descriptor configuration (for example, the destination address of SRIO type 11 is given in the descriptor). The free descriptor queue into which the PKTDMA pushes the descriptor after it finishes processing is also part of the descriptor information.

However, when a stream of bits arrives at the receive side of a peripheral, the PKTDMA must be told what to do with the information. It is clear that eventually the bitstream will be loaded into a descriptor buffer (or a monolithic descriptor), but the knowledge of which descriptor to use, which queue to push the descriptor into, and so on must be given to the PKTDMA. This is done using a set of registers called an RX Flow. There are multiple RX Flows in each peripheral; for example, the SRIO has 20 RX Flow sets. The user can define up to 20 destination scenarios based on incoming parameters such as receive channel number, source ID, and so on.
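An illustrative sketch of what an RX Flow configuration conveys. The structure and helper below are placeholders with hypothetical names and queue numbers; the actual RX Flow register fields, and the low-level driver that programs them, are documented in SPRUGR9E and the CPPI/QMSS LLD.

#include <stdint.h>

/* Illustrative RX Flow settings (hypothetical field names). */
typedef struct {
    uint32_t flow_id;          /* which of the peripheral's RX Flows to program     */
    uint32_t dest_queue;       /* receive queue to push the filled descriptors into */
    uint32_t free_desc_queue;  /* FDQ the PKTDMA pops empty descriptors from        */
    uint32_t desc_is_host;     /* 1 = host descriptors, 0 = monolithic              */
} rx_flow_cfg_t;

extern int configure_rx_flow(const rx_flow_cfg_t *cfg);   /* hypothetical wrapper */

int setup_srio_rx(void)
{
    rx_flow_cfg_t cfg = {
        .flow_id         = 0,
        .dest_queue      = 900,    /* hypothetical general purpose receive queue */
        .free_desc_queue = 2049,   /* hypothetical receive-side FDQ              */
        .desc_is_host    = 1,
    };
    return configure_rx_flow(&cfg);
}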

Even the infrastructure PKTDMA needs an RX Flow. The descriptors on the transmit side are freed by the PKTDMA, and new descriptors are popped from a free descriptor queue; that queue number is defined by the RX Flow, and so is the destination queue into which the data-loaded descriptors are pushed. When a core pops the descriptor and processes the data, it is the responsibility of that core to free the descriptor and push it back into the free descriptor queue.

It becomes obvious that descriptors are always in a queue, except during the processing time in which data is loaded into the descriptors or consumed from them. The following figure shows the life cycle of a descriptor during runtime.


• The transmitting core pops a free descriptor from the transmit-side free descriptor queue.
• The core generates the transmit data and loads it into the descriptor or into the buffer that is associated with the descriptor.
• The core pushes the descriptor into a transmit queue; the PKTDMA sends the data out through the peripheral transmit channel.
• After sending the data out, the PKTDMA pushes the descriptor back into the transmit-side free descriptor queue.
• On the receive side, the PKTDMA pops a free descriptor from the receive-side free descriptor queue, loads the received data, and pushes the descriptor into a destination (receive) queue according to the RX Flow.
• The receiving core pops the descriptor, processes the data, and pushes the descriptor back into the receive-side free descriptor queue.


4 OEM - Multicore System Implementation

OEM stands for Open Event Machine. This is an open API standard that can be used to schedule threads on a system with multiple (conceivably hundreds of) processing elements, with load balancing that optimizes the performance of the overall system.

The following URL points to the download page of the OEM API: http://sourceforge.net/projects/eventmachine/

The TI implementation of the OEM API is based on the Multicore Navigator. The main element of the OEM, the scheduler, is executed on one of the PDSP RISC processors that are part of the QMSS.

Two important concepts of the OEM API standard are queues and events. Queues are implemented by hardware queues in the QMSS (there are 8192 queues, so some of them are used for the OEM). Events are carried by host descriptors and are moved between the queues.
