9.4.0 - introduction to the nehalem architecture-2009

Upload: denish-mistry

Post on 07-Apr-2018

  • 8/6/2019 9.4.0 - Introduction to the Nehalem Architecture-2009

    Introduction to the Nehalem Architecture

    We Begin Once More

    This is an article we have all been anticipating for years now, as it introduces the most dramatic shift in Intel processing technology since the introduction of the front-side bus. And ironically, it is this shift that will finally remove the FSB from Intel products for good. The Nehalem core architecture has been the focus of most of Intel's Developer Forums for the last 24 months, and the culmination of the technology, marketing and products begins today.

    Intel's Core i7 processors will bring a dramatic set of changes to the enthusiast and PC community in general, including a new processor, new CPU socket, new memory architecture, new chipset, new motherboards and new overclocking methods. All of that and more will be addressed in our review today, so be prepared for a LOT of valuable information.

    The Nehalem Architecture - Years of data summed up

    We have done more than our share of technical documentation of the architecture and design, enough so that I feel that duplicating all of it here would be somewhat of a disservice to our frequent readers. I will highlight the most important architectural shifts in the Nehalem design here, but I still encourage you to read over my much more in-depth look at the processor design published in August: Inside the Nehalem: Intel's New Core i7 Microarchitecture.

    Here you can see a die shot of the new Nehalem processor: in this iteration, a four-core design with two separate QPI links and a large L3 cache in relation to the rest of the chip. The primary goal of Nehalem was to take the big performance advantages that the Core 2 CPUs have and modularize them. Now with the Nehalem design, which will be branded as the Intel Core i7, Intel can easily create a range of processors from 1 core to 8 cores depending on the application and market demands. Eight-core CPUs will be found in servers, while you'll find dual-core machines in the mobile market several months after the initial desktop introduction. The number of QPI (QuickPath Interconnect) links can also vary in order to improve CPU-to-CPU communication.

    At a high level, the Nehalem core adds some key features to the processor designs we currently have with Penryn: SSE instructions get bumped to revision 4.2, branch prediction and pre-fetch algorithms are improved, and simultaneous multi-threading (SMT) makes a return after a brief hiatus following the NetBurst architecture.
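
    Among the instructions SSE4.2 adds is CRC32, which computes the CRC-32C (Castagnoli) checksum in hardware. To illustrate what that single instruction replaces, here is a minimal bit-at-a-time software version. It is a plain reference sketch using the reflected polynomial 0x82F63B78, not a description of how the hardware computes the result, and the function name `crc32c` is our own.

```python
# Reflected form of the CRC-32C (Castagnoli) polynomial 0x1EDC6F41,
# the checksum that the SSE4.2 CRC32 instruction computes.
CRC32C_POLY_REFLECTED = 0x82F63B78


def crc32c(data: bytes, crc: int = 0) -> int:
    """Bitwise software CRC-32C; pass a previous result as `crc` to continue it."""
    crc ^= 0xFFFFFFFF  # initial value (also undoes the final XOR when chaining)
    for byte in data:
        crc ^= byte
        for _ in range(8):
            # Shift out one bit; fold in the polynomial when that bit was a 1.
            if crc & 1:
                crc = (crc >> 1) ^ CRC32C_POLY_REFLECTED
            else:
                crc >>= 1
    return crc ^ 0xFFFFFFFF  # final XOR


if __name__ == "__main__":
    # "123456789" is the conventional CRC test vector; CRC-32C gives 0xe3069283.
    print(hex(crc32c(b"123456789")))
```

    The inner loop does eight shift-and-XOR steps per byte; on an SSE4.2 processor the CRC32 instruction consumes 1, 2, 4 or 8 bytes of input per instruction instead.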

    HyperThreading Returns

    I mentioned before that Intel is using Nehalem to mark the return of HyperThreading to its bag of weapons in the CPU battle; the implementation is nearly identical to that of the older NetBurst processors and allows two threads to run on a single CPU core. But SMT (simultaneous multi-threading), or HyperThreading, is also key to keeping the 4-wide execution engine fed with work. With the larger caches and much higher memory bandwidth that the chip provides, this is a very important addition.

    Intel claims that HyperThreading is an extremely power-efficient way to increase performance: it takes up very little die area on Nehalem yet has the potential for great performance gains in certain applications. This is obviously much more efficient than adding another core to the die, but just as obviously the method has drawbacks, since the two threads share a single core's execution resources rather than getting their own.
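
    A toy model makes the "keeping the engine fed" argument concrete. In the sketch below, each thread is a hypothetical per-cycle list of micro-ops it has ready to issue (0 meaning it is stalled, for example waiting on memory), and the core fills up to four issue slots per cycle from whichever threads have work. The instruction streams are invented for illustration, and the model simply discards ready micro-ops beyond the issue width instead of queuing them, so it shows slot utilization rather than real application performance.

```python
from __future__ import annotations

ISSUE_WIDTH = 4  # the 4-wide execution engine described above


def utilization(*threads: list[int]) -> float:
    """Fraction of issue slots filled when the given threads share one core."""
    cycles = max(len(t) for t in threads)
    used = 0
    for c in range(cycles):
        # Micro-ops ready this cycle across all threads (SMT shares the slots).
        ready = sum(t[c] for t in threads if c < len(t))
        used += min(ISSUE_WIDTH, ready)  # any excess is dropped in this toy model
    return used / (ISSUE_WIDTH * cycles)


if __name__ == "__main__":
    # Hypothetical streams that stall in different cycles.
    thread_a = [4, 0, 0, 2, 4, 0, 1, 0]
    thread_b = [0, 3, 2, 0, 0, 4, 2, 2]
    print(f"thread A alone: {utilization(thread_a):.0%}")
    print(f"thread B alone: {utilization(thread_b):.0%}")
    print(f"both with SMT:  {utilization(thread_a, thread_b):.0%}")
```

    Because these two streams stall in different cycles, sharing the core fills far more slots; threads that compete for the same cycles would gain much less, which is one reason SMT gains vary so much from application to application.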

    Here you can see Intel's estimates of how much HyperThreading can help performance in specific applications. Surprisingly, one of the best performers is the 3DMark Vantage CPU test, which simulates AI and physics on the processor, while POV-Ray 3.7 still sees a huge 30% performance boost for this relatively small addition of logic.

    Welcome to the Uncore, we got fun and games...

    A new term Intel is bringing to the world with this modular design is the "uncore": basically all of the sections of the processor that are separate from the cores and their self-contained caches. Features like the integrated memory controller, QPI links and shared L3 cache fall into the "uncore" category. All of these components that you see are completely modular; Intel can add cores, QPI links, integrated graphics (coming later in 2009) and even another IMC if it desires.

    New cache structure, new L3 cache

    The Intel Smart Cache makes a return with the Nehalem core, but this time in a three-level cache hierarchy. The first-level caches comprise a 32 KB instruction cache and a 32 KB data cache, and the L2 cache is a completely new design compared to the Core 2 CPUs out today. Each core receives 256 KB of unified, 8-way associative L2 cache that both has low latency (about 10 cycles load-to-use) and scales well to keep extra load off the L3 cache.
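
    Cache sizes and associativities like these translate directly into a set count and an address split. The sketch below assumes 64-byte cache lines (not stated in the article) and a simple textbook indexing scheme; the class name `CacheGeometry` is hypothetical, and real hardware may map addresses to sets differently.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class CacheGeometry:
    """Textbook set-associative cache model (sizes must be powers of two)."""
    size_bytes: int
    ways: int
    line_bytes: int = 64  # assumption: 64-byte cache lines

    @property
    def sets(self) -> int:
        # Each set holds `ways` lines, so sets = size / (ways * line size).
        return self.size_bytes // (self.ways * self.line_bytes)

    @property
    def offset_bits(self) -> int:
        return (self.line_bytes - 1).bit_length()

    @property
    def index_bits(self) -> int:
        return (self.sets - 1).bit_length()

    def split(self, address: int) -> tuple[int, int, int]:
        """Return (tag, set index, byte offset) for a physical address."""
        offset = address & (self.line_bytes - 1)
        index = (address >> self.offset_bits) & (self.sets - 1)
        tag = address >> (self.offset_bits + self.index_bits)
        return tag, index, offset


if __name__ == "__main__":
    l2 = CacheGeometry(256 * 1024, ways=8)        # per-core L2
    l3 = CacheGeometry(8 * 1024 * 1024, ways=16)  # shared L3 on quad-core parts
    print(l2.sets, l2.index_bits)  # 512 9
    print(l3.sets, l3.index_bits)  # 8192 13
    print(l2.split(0x12345))       # (2, 141, 5)
```

    Under the 64-byte-line assumption, the 256 KB 8-way L2 has 512 sets, so the low 6 address bits pick a byte within the line and the next 9 bits pick the set.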

    The L3 cache layer is completely new to Intel, though AMD's Barcelona chip introduced a similar design late in 2007. This L3 is an inclusive cache that scales with the number of cores on the processor; quad-core processors will have as much as 8 MB with 16-way associativity. Any perceived latency on the L3 will depend on the frequency ratio between the core and uncore sections of the CPU, something we haven't gotten enough information on yet.
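
    Inclusivity has a practical payoff: because every line held in a core's private caches must also be present in the L3, an L3 miss tells the chip that no core has the line, so the other cores do not need to be snooped. The toy model below (hypothetical class and method names, a tiny LRU L3 and unbounded per-core caches) shows the rule that keeps that guarantee intact: evicting a line from the L3 also invalidates it in every core.

```python
from __future__ import annotations

from collections import OrderedDict


class InclusiveL3:
    """Toy inclusive shared cache: every line held by any core is also in the L3."""

    def __init__(self, capacity_lines: int, num_cores: int) -> None:
        self.capacity = capacity_lines
        self.lines = OrderedDict()  # line address -> None, least recently used first
        self.private = [set() for _ in range(num_cores)]  # each core's L1/L2 contents

    def access(self, core: int, line: int) -> None:
        if line in self.lines:
            self.lines.move_to_end(line)  # mark as most recently used
        else:
            if len(self.lines) >= self.capacity:
                victim, _ = self.lines.popitem(last=False)
                # Back-invalidation: a line leaving the L3 must also leave every
                # private cache, or the inclusion property would break.
                for cache in self.private:
                    cache.discard(victim)
            self.lines[line] = None
        self.private[core].add(line)

    def cores_to_snoop(self, line: int) -> list[int]:
        # An L3 miss proves no core holds the line, so nobody needs snooping.
        if line not in self.lines:
            return []
        return [c for c, cache in enumerate(self.private) if line in cache]

    def inclusion_holds(self) -> bool:
        return all(cache.issubset(self.lines) for cache in self.private)


if __name__ == "__main__":
    l3 = InclusiveL3(capacity_lines=2, num_cores=2)
    l3.access(0, 0xA)
    l3.access(1, 0xB)
    l3.access(1, 0xC)  # L3 is full: evicts 0xA, which also leaves core 0
    print(l3.cores_to_snoop(0xA))  # []
    print(l3.cores_to_snoop(0xB))  # [1]
    print(l3.inclusion_holds())    # True
```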

    Bring out yer' dead! (front-side bus)

    One of the features that Intel HAS been talking about for a while is the move away from the front-side bus architecture to something called Intel QuickPath Interconnect. Previously known only as CSI (Common System Interface), QuickPath is Intel's answer to AMD's HyperTransport technology, and it performs a very similar function.

    Starting with Nehalem and moving forward, Intel's processors will feature a point-to-point, direct-connect architecture that transmits data from socket to socket as well as from the CPU to the chipset, all while scaling nicely as the number of CPUs and QPI links goes up. Part of the reason QPI was needed on Nehalem is the new integrated memory controller on the processor. As AMD demonstrated many years ago, an IMC allows for higher peak memory bandwidth and lower memory latency, though Intel is taking it a step further by offering a three-channel DDR3 memory controller on each CPU. QPI is also a requirement for efficient chip-to-chip communication, where one CPU might need to access data stored in memory attached to another processor's memory controller.

    The QPI design supports 6.4 gigatransfers per second (GT/s), or 12.8 GB/s of bandwidth in each direction, for 25.6 GB/s of total bandwidth between two points. Future versions of QPI will scale up to faster speeds as well. You can also tell from the four-CPU diagram above that QPI will scale well with as many as four CPUs: each processor in this case would require four total QPI connections and would be only one hop from any other CPU's memory.
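
    These figures can be checked with a little arithmetic. Dividing 12.8 GB/s by 6.4 GT/s implies 2 bytes of payload per transfer in each direction; that per-transfer width is inferred from the numbers above rather than stated in them. The sketch also counts links under an assumed fully connected topology (each CPU wired to every other CPU, plus one link to the chipset), which reproduces the four QPI connections per processor in the four-CPU case. The function names are hypothetical.

```python
from __future__ import annotations

# Inferred from the article's figures: 12.8 GB/s per direction / 6.4 GT/s.
BYTES_PER_TRANSFER = 2


def qpi_bandwidth_gbs(gigatransfers_per_s: float) -> tuple[float, float]:
    """Return (per-direction, total) bandwidth in GB/s for one QPI link."""
    per_direction = gigatransfers_per_s * BYTES_PER_TRANSFER
    return per_direction, 2 * per_direction  # both directions run at once


def fully_connected_links(num_cpus: int) -> tuple[int, int]:
    """Return (CPU-to-CPU links in the system, QPI links needed per CPU).

    Assumes every CPU links directly to every other CPU, so any remote memory
    is one hop away, plus one extra link per CPU to the chipset.
    """
    cpu_to_cpu = num_cpus * (num_cpus - 1) // 2
    per_cpu = (num_cpus - 1) + 1  # one per peer CPU, one for the chipset
    return cpu_to_cpu, per_cpu


if __name__ == "__main__":
    print(qpi_bandwidth_gbs(6.4))    # (12.8, 25.6)
    print(fully_connected_links(4))  # (6, 4)
```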

    An Integrated Memory Controller, with three channels!

    The Intel Nehalem integrated memory controller (IMC) is actually pretty scalable in its own right: besides offering extremely high bandwidth and low latency, it allows the number of memory channels to vary, supports both buffered and unbuffered memory, and lets memory speeds be adjusted, all based on the market the processor is targeted at. Low-cost parts with only dual-channel memory should cost considerably less than top-end three-channel systems.

    At launch, the DDR3 memory controller on Nehalem will only OFFICIALLY support DDR3-1066 memory speeds. While that is pretty lame, I was told on numerous occasions that the memory controller will run at speeds of DDR3-1600 to DDR3-2000, but official support stops at the JEDEC specifications. The IMC in Nehalem will also force Intel to adopt NUMA (non-uniform memory access), since for the first time in Intel's desktop processors memory will be attached in different places rather than only to the north bridge.
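
    Peak theoretical memory bandwidth is simply channels × transfer rate × bytes per transfer. The sketch below assumes the standard 64-bit (8-byte) DDR3 channel, which the article does not spell out, and uses the nominal rate from each speed grade's name; the dual-channel row is a hypothetical lower-end configuration. These are peak figures, and sustained bandwidth is lower in practice.

```python
BYTES_PER_TRANSFER = 8  # assumption: standard 64-bit DDR3 channel


def peak_bandwidth_gbs(channels: int, megatransfers_per_s: int) -> float:
    """Theoretical peak bandwidth in GB/s (1 GB = 10**9 bytes)."""
    return channels * megatransfers_per_s * BYTES_PER_TRANSFER / 1000


if __name__ == "__main__":
    configs = [
        ("3 x DDR3-1066 (official launch speed)", 3, 1066),
        ("2 x DDR3-1066 (hypothetical dual-channel part)", 2, 1066),
        ("3 x DDR3-1600 (unofficial)", 3, 1600),
    ]
    for label, channels, rate in configs:
        print(f"{label}: {peak_bandwidth_gbs(channels, rate):.1f} GB/s")
```

    Three channels of DDR3-1066 come to roughly 25.6 GB/s of peak bandwidth, versus about 17.1 GB/s for two channels at the same speed.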

    New Core Power Controls

    The Nehalem core also has a new trick in its bag that enables it to lower the power consumption of a core to nearly 0 watts, something that wasn't possible on previous designs. You can see in the image above what the total power consumption of a core was typically made up of with the Core 2 series of processors: clocks and logic are the majority of it, yes, but a third or more is due to transistor leakage, which couldn't be turned off in prior designs.

    How does Nehalem change this? With the independent power controller in the PCU (Power Control Unit) and the separate power planes that each core rests on, the power consumption of each core is completely independent of the others. You can see in this diagram that although Core 3 is loaded the entire time, both Core 2 and Core 0 are able to power down to practically 0 watts when their workload is complete.
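
    A back-of-the-envelope model shows why this matters. The sketch below uses a hypothetical 10 W per-core figure and treats leakage as one third of a loaded core's power (the low end of the estimate above). An idle core that is merely clock-gated still leaks; a core whose power plane is switched off draws essentially nothing. It is a simplification for illustration, not a description of Intel's actual power states.

```python
from __future__ import annotations

CORE_POWER_W = 10.0       # hypothetical power of one fully loaded core
LEAKAGE_FRACTION = 1 / 3  # the article: "a third or more" is leakage


def core_power(loaded: bool, power_gated: bool) -> float:
    """Simplified power draw of one core, in watts."""
    if loaded:
        return CORE_POWER_W  # clocks + logic + leakage
    if power_gated:
        return 0.0  # power plane switched off: no switching and no leakage
    return CORE_POWER_W * LEAKAGE_FRACTION  # clocks stopped, transistors still leak


def package_power(loads: list[bool], power_gating: bool) -> float:
    """Total power for a set of cores, given which ones are busy."""
    return sum(core_power(loaded, power_gating) for loaded in loads)


if __name__ == "__main__":
    loads = [False, False, False, True]  # Core 3 busy, Cores 0-2 idle
    print(f"idle cores clock-gated: {package_power(loads, False):.1f} W")
    print(f"idle cores power-gated: {package_power(loads, True):.1f} W")
```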

    Turbo Mode: free performance?

    Perhaps the most interesting bit of news out of Intel's Nehalem was something called Turbo Mode, a feature directly enabled by the PCU discussed above. With modern processors, the debate has raged over whether users are better off getting a quad-core CPU at a lower frequency or a dual-core CPU at a higher frequency. Intel is hoping that with Turbo Mode users will get the best of both worlds.

    The idea is pretty straightforward: if four cores running together have a combined power consumption (and heat dissipation) of X, then when only two cores are loaded (with the other two idle) there is additional power headroom to overclock the working cores to a higher frequency.

    For enthusiasts and gamers this should be an exciting turn of events. While Intel wasn't very specific at this point, I imagine we'll see gains of 200-300 MHz going from the full quad-core clock rate to a dual-core or single-core rate (based on which cores are idle at the time). This means that if you purchase a 3.2 GHz Nehalem-based Core i7 processor, you will likely see clock rates as high as 3.5 GHz when running single-threaded or dual-threaded applications. Gamers should also take note of this!
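
    The headroom argument can be put into numbers with a deliberately crude model, which is our own simplification and not Intel's algorithm: per-core power is assumed to scale with the cube of frequency (dynamic power scales with frequency times voltage squared, and voltage is assumed to rise roughly in step with frequency), idle cores are assumed to draw nothing, and the package budget equals all four cores at the base clock. The 3.5 GHz ceiling is a hypothetical cap matching the estimate above.

```python
def turbo_clock_ghz(base_ghz: float, total_cores: int, active_cores: int,
                    max_turbo_ghz: float) -> float:
    """Clock the active cores could share within the all-core power budget.

    Simplified model: per-core power grows with the cube of frequency,
    idle cores draw nothing, and the budget equals all cores at base clock.
    """
    if not 1 <= active_cores <= total_cores:
        raise ValueError("active_cores must be between 1 and total_cores")
    # total * base**3 = active * f**3  =>  f = base * (total / active) ** (1/3)
    budget_limited = base_ghz * (total_cores / active_cores) ** (1 / 3)
    # A fixed ceiling stands in for the chip's other electrical limits.
    return min(budget_limited, max_turbo_ghz)


if __name__ == "__main__":
    for active in (4, 2, 1):
        clock = turbo_clock_ghz(3.2, total_cores=4, active_cores=active, max_turbo_ghz=3.5)
        print(f"{active} active core(s): {clock:.2f} GHz")
```

    Under these assumptions the power budget alone would allow about 4 GHz with two cores active, so the fixed ceiling, not power, sets the result in this model.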

    Intel claims that, thanks to the PCU inside the chip, the Nehalem core is aware of its surroundings and conditions. If your system is running very cool (say you have water cooling, for example), the chip will recognize that it is well under its own TDP and push the clocks even faster. This is possible even while all four cores are loaded, as the diagram above shows. The on-board microcontroller tunes voltages based on a given frequency, operating conditions and specific silicon characteristics. In some ways it appears that the Nehalem core will be self-aware enough to find out how far it can be pushed without burning up.
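
    The behavior described here amounts to a feedback loop. The sketch below is a hypothetical governor, not Intel's PCU firmware: every control interval it raises the clock one step while measured package power and temperature stay under their limits and backs off a step when either is exceeded. The step size, the 130 W power limit, the 100 °C temperature limit and the sensor readings are all illustrative values, and the voltage tuning mentioned above is not modeled.

```python
from dataclasses import dataclass, field


@dataclass
class TurboGovernor:
    """Hypothetical closed-loop turbo controller (not Intel's PCU firmware)."""
    base_ghz: float
    max_ghz: float
    step_ghz: float = 0.1         # hypothetical frequency step
    power_limit_w: float = 130.0  # hypothetical package power limit (TDP)
    temp_limit_c: float = 100.0   # hypothetical temperature limit
    clock_ghz: float = field(init=False, default=0.0)

    def __post_init__(self) -> None:
        self.clock_ghz = self.base_ghz

    def update(self, power_w: float, temp_c: float) -> float:
        """One control interval: adjust the clock from the latest sensor readings."""
        if power_w > self.power_limit_w or temp_c > self.temp_limit_c:
            # Over a limit: step down, but this sketch only manages the range above base.
            self.clock_ghz = max(self.base_ghz, self.clock_ghz - self.step_ghz)
        elif power_w < self.power_limit_w and temp_c < self.temp_limit_c:
            # Headroom left (e.g. a cool, water-cooled system): step up.
            self.clock_ghz = min(self.max_ghz, self.clock_ghz + self.step_ghz)
        # Exactly at a limit: hold the current clock.
        return self.clock_ghz


if __name__ == "__main__":
    gov = TurboGovernor(base_ghz=3.2, max_ghz=3.5)
    # Hypothetical (package watts, degrees C) readings with all four cores loaded.
    for power, temp in [(90, 55), (95, 58), (100, 60), (140, 70), (120, 68)]:
        print(f"{gov.update(power, temp):.1f} GHz")
```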

    The Intel Core i7 CPUs

    So now that we know the guts of everything that makes up the Intel Core i7 series of processors, what do the physical specimens themselves have to offer? As you might expect, in exterior appearance the Nehalem-based CPUs are really quite plain and look remarkably similar to their older brother, the Core 2:

    Intel Core i7-965 on the left... or is it the right?

    The actual die is hidden by a heat spreader very similar to the ones we have become used to on the Core and Core 2 series of processors, and they do in fact use a very similar mounting mechanism. Rather than having pins on the CPU, the pins are located in the motherboard socket, in theory preventing a lot of damage to the processors by end users during installation. This time around, though, the pin count is upped from 775 to 1366, making the new Nehalem Core i7 processor socket known as LGA1366. How very intuitive!

    For a size comparison I have provided the above image: the two Core i7 CPUs rest on the bottom row while the Core 2 Extreme QX9770 LGA775 (left) and AMD Phenom X4 processor (right) take the high road. You can see that the new Intel Core i7 CPUs are indeed just larger versions of the Core 2 packaging, with a couple of notches on the left- and right-hand sides to prevent installing the processor in the wrong orientation in the CPU socket.

    One more for good measure: the 1366-pin Nehalem-based designs on the bottom clearly have many more contacts than the 940-pin Phenom or 775-pin Core 2.

    One note about today's product announcement: Intel does not anticipate having product for sale in the channel until sometime "in November", so today we are actually bringing you a sneak peek of the products as they will be available later in the month. The SKUs coming to market this year include:

    Three CPUs will be available for purchase this month, ranging from the Core i7-965 Extreme Edition down to the Core i7-920. All three processors share a surprisingly large number of specifications, including memory speed, thermal dissipation and cache size. Of course, all are built on Intel's Hi-K 45nm process technology. As we have seen in previous generations, only the Core i7-965 Extreme Edition will have the overspeed protection removed to allow for a much more robust overclocking experience. The QPI speed on the EE CPU is also higher, allowing for a faster connection to the north bridge and its PCI Express 2.0 lanes, though the performance advantages of this rate change are still a question in my mind.

    All three CPUs have a transistor count of 731M on a die size of about 263 mm², significantly smaller than the current beast of the processing world, NVIDIA's GT200, which sports 1.4 billion transistors!

    Pricing is still just an estimate, of course, since these parts are not going on sale today, but Intel has set 1k-unit pricing as follows:

    Core i7-965 Extreme Edition: $999

    Core i7-940: $562

    Core i7-920: $284

    Two things stand out to me about the pricing of these first three Core i7 CPUs. First, I am pleasantly surprised that Intel chose to offer such a low-cost option in its first round of releases, with the Core i7-920 selling for under $300. Intel probably could have gotten away with keeping all three SKUs over the $500 mark for such a new technology, but now we will see some mainstream PCs adopt the new core architecture pretty quickly.

    The second point is that I seriously doubt we will see the Core i7-965 processor selling for $999 any time soon. An unfortunate trend that has grown with the Core 2 Extreme processors is that they take a LONG time to come down to their 1k pricing; take the Core 2 Extreme QX9650 as an example of a CPU that has yet to reach its $999 price tag after exactly one year on the market.

    The latest CPU-Z, revision 1.48, is already set up to properly recognize and report the specifications and speeds of the Intel Core i7 processors. It will become an invaluable piece of software as our tweaking and overclocking process begins.