Multi-core Architecture


Posted on 27-Nov-2014






Computing technology is advancing at a remarkable pace, and with it the processing power of computers has grown enormously. Processor design has now moved from single-core to multi-core architectures in order to keep increasing performance. This book describes the overall architecture of multi-core processors, their advantages and drawbacks, and the ongoing research in multi-core technology.


THE NEW TREND IN PROCESSOR MAKING

Nabendu Karmakar

The revolution in computer systems has moved ahead enormously. From the age of heavy, bulky computers we have moved to the thinnest notebooks. From the age of the 4-bit Intel 4004 we have moved up to the Intel Core i7 Extreme. From the first computer, ENIAC, we have reached palmtops. Computing has changed in many ways: machines have been upgraded, and we have moved from single-core to multi-core processors. The single-core processor, which served the computing world for quite a long time, is now vanishing; multi-core CPUs are in charge. With plenty of new functionality, great features, and steady upgrades, multi-core processors are surely the product of the future.

Contents

1. Computers & Processors
   1.1 Processors
2. A brief history of the Microprocessor
   2.1 Moore's Law
3. Single-core Processors: A step behind
4. Past efforts to increase efficiency
5. Need for Multi-core CPUs
6. Terminology
7. Multi-core Basics
8. Multi-core Implementation
   8.1 Intel & AMD Dual-core Processors
   8.2 The CELL Processor
   8.3 Tilera TILE64
9. Scalability potential of Multi-core Processors
10. Multi-core Challenges
   10.1 Power & Temperature
   10.2 Cache Coherence
   10.3 Multithreading
11. Open Issues
   11.1 Improved Memory Management
   11.2 System Bus and Interconnection Networks
   11.3 Parallel Programming
   11.4 Starvation
   11.5 Homogeneous vs. Heterogeneous Cores
12. Multi-core Advantages
   12.1 Power and cooling advantages of multi-core processors
   12.2 Significance of sockets in a multi-core architecture
   12.3 Evolution of software toward multi-core technology
13. Licensing Considerations
14. Single-core vs. Multi-core
15. Commercial Incentives
16. Last Words
17.
References Used

1. Computers & Processors:

Computers are machines that perform tasks or calculations according to a set of instructions, or programs. The first fully electronic computer, ENIAC (Electronic Numerical Integrator and Computer), introduced in 1946, was a huge machine that required teams of people to operate. Compared to those early machines, today's computers are amazing: not only are they thousands of times faster, they can fit on our desk, on our lap, or even in our pocket.

Computers work through an interaction of hardware and software. Hardware refers to the parts of a computer that we can see and touch, including the case and everything inside it. The most important piece of hardware is a tiny rectangular chip inside the computer called the central processing unit (CPU), or microprocessor. It is the "brain" of the computer: the part that translates instructions and performs calculations. Hardware items such as the monitor, keyboard, mouse, and printer are often called hardware devices, or simply devices. Software refers to the instructions, or programs, that tell the hardware what to do. A word-processing program that you can use to write letters is one type of software. The operating system (OS) is software that manages the computer and the devices connected to it; Windows is a well-known operating system.

1.1 Processors:

Processors are said to be the brain of a computer system, telling the entire system what to do and what not to do. The data in an instruction tells the processor what to do. Individual instructions are very basic things, like reading data from memory or sending data to the user display, but they are processed so rapidly that we experience the results as the smooth operation of a program.
A processor is made up of a large number of transistors, typically integrated onto a single die. In computing, the processor is the unit that reads and executes program instructions, which are fixed-length (typically 32 or 64 bits) or variable-length chunks of data.

Processors were originally developed with only one core. The core is the part of the processor that actually performs the reading and executing of instructions. Single-core processors can process only one instruction at a time, so speeding up the processor sped up the overall system as well.

A multi-core processor is composed of two or more independent cores. One can describe it as an integrated circuit that contains two or more individual processors (called cores in this sense). Manufacturers typically integrate the cores onto a single integrated circuit die (known as a chip multiprocessor, or CMP), or onto multiple dies in a single chip package. A many-core processor is one in which the number of cores is large enough that traditional multiprocessor techniques are no longer efficient, largely due to congestion in supplying sufficient instructions and data to the many processors. This threshold is roughly in the range of several tens of cores, and probably requires a network on chip.

A dual-core processor contains two cores (such as the AMD Phenom II X2 or Intel Core Duo), a quad-core processor contains four cores (such as the AMD Phenom II X4 and Intel's 2010 Core line, which includes three levels of quad-core processors), and a hexa-core processor contains six cores (such as the AMD Phenom II X6 or Intel Core i7 Extreme Edition 980X).

A multi-core processor implements multiprocessing in a single physical package. Designers may couple cores in a multi-core device tightly or loosely. For example, cores may or may not share caches, and they may implement message passing or shared memory for inter-core communication.
Common network topologies used to interconnect cores include bus, ring, two-dimensional mesh, and crossbar. Homogeneous multi-core systems include only identical cores, unlike heterogeneous multi-core systems. Just as with single-processor systems, cores in multi-core systems may implement architectures such as superscalar, VLIW, vector processing, SIMD, or multithreading. Multi-core processors are widely used across many application domains, including general-purpose, embedded, network, digital signal processing (DSP), and graphics.

The amount of performance gained by the use of a multi-core processor depends very much on the software algorithms and their implementation. In particular, the possible gains are limited by the fraction of the software that can be parallelized to run on multiple cores simultaneously; this effect is described by Amdahl's law. In the best case, so-called embarrassingly parallel problems may realize speedup factors near the number of cores, or even beyond that if the problem is split up finely enough to fit within each core's cache(s), so that the much slower main memory is avoided. Many typical applications, however, do not realize such large speedup factors. The parallelization of software remains a significant ongoing topic of research.

2. A brief history of the Microprocessor:

Intel manufactured the first microprocessor, the 4-bit 4004, in the early 1970s; it was basically just a number-crunching machine. Shortly afterwards Intel developed the 8008 and 8080, both 8-bit, and Motorola followed suit with its 6800, which was equivalent to Intel's 8080.
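The Amdahl's-law limit mentioned above can be made concrete: if a fraction p of a program is parallelizable across n cores, the best possible speedup is S = 1 / ((1 - p) + p/n). A small sketch (illustrative, not from the original text):

```python
def amdahl_speedup(p, n):
    """Upper bound on speedup when a fraction p of the work
    can be parallelized across n cores (Amdahl's law)."""
    return 1.0 / ((1.0 - p) + p / n)

# Even with 95% of the program parallelized, 4 cores yield well
# under a 4x speedup, and adding more cores hits diminishing returns.
for cores in (1, 2, 4, 8, 16):
    print(cores, round(amdahl_speedup(0.95, cores), 2))
```

Note how the serial 5% dominates as the core count grows: the speedup can never exceed 1 / (1 - p) = 20, no matter how many cores are added.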
The companies then fabricated 16-bit microprocessors: Motorola had its 68000, and Intel the 8086 and 8088; the 8086 would be the basis for Intel's 32-bit 80386 and, later, its popular Pentium line, which appeared in the first consumer PCs.

Fig 1. World's first single-core CPU

2.1 Moore's Law:

One of the guiding principles of computer architecture is known as Moore's Law. In 1965 Gordon Moore stated that the number of transistors on a chip would roughly double each year (he later refined this, in 1975, to every two years). What is often quoted as Moore's Law is Dave House's revision that computer performance will double every 18 months. The graph in Figure 2 plots many of the early microprocessors briefly discussed here.

As Figure 2 shows, the number of transistors has roughly doubled every two years, and Moore's Law continues to reign; for example, Intel was set to produce the world's first two-billion-transistor microprocessor, Tukwila, later in 2008. House's prediction, however, needs another correction. Throughout the 1990s and the earlier part of this decade, microprocessor frequency was synonymous with performance: higher frequency meant a faster, more capable computer. Since processor frequency has reached a plateau, we must now consider other aspects of the overall performance of a system, such as power consumption, temperature dissipation, frequency, and number of cores. Multi-core processors are often run at slower frequencies but have much better performance than a single-core processor, because two heads are better than one.

Fig 2. Depiction of Moore's Law

3. Single-core Processors: A step behind:

A single-core processor is a processor that contains only one core. This kind of processor was the trend in early computing systems.
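The two-year doubling rule from Moore's 1975 revision can be sketched numerically. The 1971 Intel 4004 had roughly 2,300 transistors; using that as a baseline (the exact base count and a clean two-year period are simplifying assumptions for this sketch):

```python
def transistor_estimate(year, base_year=1971, base_count=2300,
                        doubling_period=2.0):
    # Exponential growth: the count doubles every `doubling_period` years.
    return base_count * 2 ** ((year - base_year) / doubling_period)

# Rough trajectory from the 4004 era onward.
for year in (1971, 1981, 1991, 2001):
    print(year, int(transistor_estimate(year)))
```

Ten years is five doublings, a 32-fold increase, which is why the curve in a Moore's-law plot is drawn on a logarithmic axis.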
At a high level, the single-core processor architecture consists of several parts: the processor core, two levels of cache, a memory controller (MCT), three coherent HyperTransport (cHT) links, and a non-blocking crossbar switch that connects the parts together. A single-core Opteron processor design is illustrated in Figure 3. The cHT links may be connected to another processor or to peripheral devices. The NUMA design is apparent from the diagram: each processor in a system has its own local memory, memory to which it is closer than any other processor is. Memory commands may come from the local core, or from another processor or a device over a cHT link. In the latter case the command comes from the cHT link to the crossbar and from there to the MCT. The local processor core does not see, and does not have to process, outside memory commands, although some commands may cause data in its cache to be invalidated or flushed.

Fig 3. Single-core processor block diagram

4. Past efforts to increase efficiency:

As touched upon above, from the introduction of Intel's 8086 through the Pentium 4, an increase in performance from one generation to the next was seen as an increase in processor frequency. For example, the Pentium 4 ranged in speed (frequency) from 1.3 to 3.8 GHz over its eight-year lifetime. The physical size of chips decreased while the number of transistors per chip increased; clock speeds increased, which boosted heat dissipation across the chip to dangerous levels.

To gain performance within a single core, many techniques are used. Superscalar processors, with the ability to issue multiple instructions concurrently, are the standard. In these pipelines, instructions are pre-fetched, split into sub-components, and executed out of order. A major focus of computer architects is the branch instruction.
Branch instructions are the equivalent of a fork in the road: the processor has to gather all necessary information before making a decision. To speed up this process, the processor predicts which path will be taken; if the wrong path is chosen, the processor must throw out any data computed while taking the wrong path and backtrack to take the correct path. Often, even when an incorrect branch is taken, the effect is equivalent to having waited to take the correct path. Branches are also removed using loop unrolling, and sophisticated predictors, some based on neural networks, are used to minimize the misprediction rate. Other techniques used for performance enhancement include register renaming, trace caches, reorder buffers, dynamic/software scheduling, and data value prediction.

There have also been advances in power- and temperature-aware architectures. There are two flavors of power-sensitive architectures: low-power and power-aware designs. Low-power architectures minimize power consumption while satisfying performance constraints, e.g. in embedded systems where low power and real-time performance are vital. Power-aware architectures maximize performance parameters while satisfying power constraints. Temperature-aware design uses simulation to determine where hot spots lie on the chip and revises the architecture to decrease the number and effect of hot spots.

5. Need for Multi-core CPUs:

It is well recognized that computer processors have increased in speed and decreased in cost at a tremendous rate for a very long time. This observation was first made popular by Gordon Moore in 1965, and is commonly referred to as Moore's Law. Specifically, Moore's Law states that the advancement of electronic manufacturing technology makes it possible to double the number of transistors per unit area about every 12 to 18 months.
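The branch prediction discussed above can be illustrated with the classic two-bit saturating counter, a common textbook scheme (this toy model is for illustration and is not taken from the original text):

```python
class TwoBitPredictor:
    """Saturating counter: states 0-1 predict not-taken, 2-3 predict
    taken. Two wrong guesses in a row are needed to flip the prediction,
    so a single anomalous branch outcome does not retrain the predictor."""

    def __init__(self):
        self.state = 0  # start out predicting not-taken

    def predict(self):
        return self.state >= 2  # True means "predict taken"

    def update(self, taken):
        if taken:
            self.state = min(3, self.state + 1)
        else:
            self.state = max(0, self.state - 1)

# A loop branch that is taken 9 times and then falls through once:
# the predictor mispredicts only while warming up and at loop exit.
predictor = TwoBitPredictor()
mispredictions = 0
for taken in [True] * 9 + [False]:
    if predictor.predict() != taken:
        mispredictions += 1
    predictor.update(taken)
print(mispredictions)
```

On this pattern the counter misses 3 times out of 10; on the wrong path, a real processor would have to discard all speculatively computed results, which is exactly the cost the text describes.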
It is this advancement that has fueled the phenomenal growth in computer speed and accessibility over more than four decades. Smaller transistors have made it possible to increase the number of transistors that can be applied to processor functions and to reduce the distance signals must travel, allowing processor clock frequencies to soar. This simultaneously increases system performance and reduces system cost. All of this is well understood.

Lately, however, Moore's Law has begun to show signs of failing. It is not actually Moore's Law that is showing weakness, but the performance increases people expect, which occur as a side effect of Moore's Law. One often associates performance with high processor clock frequencies. In the past, reducing the size of transistors has meant reducing the distances between transistors and decreasing transistor switching times. Together, these two effects have contributed significantly to faster processor clock frequencies.

Another reason processor clocks could increase is the number of transistors available to implement processor functions. Most processor functions, for example integer addition, can be implemented in multiple ways. One method uses very few transistors, but the path from start to finish is very long. Another method shortens the longest path, but uses many more transistors. Clock frequencies are limited by the time it takes a clock signal to cross the longest path within any stage: longer paths require slower clocks. Having more transistors to work with allows more sophisticated implementations that can be clocked more rapidly.

But there is a downside. As processor frequencies climb, the amount of waste heat produced by the processor climbs with it. Within the last few years, the ability to cool the processor inexpensively has become a major factor limiting how fast a processor can go. This is offset, somewhat...
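The trade-off described above (few transistors with a long critical path versus many transistors with a short one) is the classic ripple-carry versus carry-lookahead adder choice. A toy delay model in Python, assuming a unit gate delay purely for illustration, shows how the critical path scales with operand width:

```python
import math

def ripple_carry_delay(bits, gate_delay=1.0):
    # Few transistors, long path: the carry ripples through
    # every bit position in turn, so delay grows linearly.
    return bits * gate_delay

def lookahead_delay(bits, gate_delay=1.0):
    # More transistors, short path: carries are computed in a
    # tree whose depth grows only logarithmically with width.
    return math.ceil(math.log2(bits)) * gate_delay

# Widening the adder hurts the cheap design far more than the big one.
for width in (8, 16, 32, 64):
    print(width, ripple_carry_delay(width), lookahead_delay(width))
```

Since the clock period must cover the longest path in any stage, the logarithmic design can be clocked far faster at 64 bits, at the price of many more transistors; this is exactly the "more transistors buy shorter paths" argument in the text.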

