088949 – ADVANCED COMPUTER ARCHITECTURES

Prof. Cristina Silvano
email: [email protected]

Dipartimento di Elettronica, Informazione e Bioingegneria (DEIB)
Politecnico di Milano

http://home.deib.polimi.it/silvano/aca-milano.htm

AA 2016/2017 – Second Semester

Upload: others

Post on 22-Jan-2020

20 views

Category:

Documents


0 download

TRANSCRIPT


Cristina Silvano – Politecnico di Milano - 2 -

Goals of the ACA course

Provide an overview of the most recent and advanced computer architectures

Introduce the basic microarchitectural mechanisms found in modern microprocessor architectures

Provide the reasoning behind the adoption of advanced computer architectures

ADVANCED COMPUTER ARCHITECTURES: AN OVERVIEW


Advanced Computer Architectures: Supercomputers

The first supercomputer to reach Petascale peak performance (10^15 FLOPS) was the IBM Roadrunner, installed in 2008 at Los Alamos National Laboratory (New Mexico).

Research on supercomputing is pushing towards the Exascale (10^18 FLOPS, a billion billion operations per second), expected to be reached in 2023.


How to measure performance: FLOPS, Floating Point Operations per Second

Name         FLOPS
zettaFLOPS   10^21
exaFLOPS     10^18
petaFLOPS    10^15
teraFLOPS    10^12
gigaFLOPS    10^9
megaFLOPS    10^6
kiloFLOPS    10^3
FLOPS        1
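The prefix table can be turned into a small helper that prints a raw FLOPS figure in the most readable unit. This is only a minimal sketch: the function name `format_flops` and the two-decimal formatting are illustrative choices, not part of the course material.

```python
# SI prefixes from the table above, largest first.
PREFIXES = [
    ("zetta", 1e21),
    ("exa", 1e18),
    ("peta", 1e15),
    ("tera", 1e12),
    ("giga", 1e9),
    ("mega", 1e6),
    ("kilo", 1e3),
]


def format_flops(flops: float) -> str:
    """Scale a FLOPS value to the largest prefix that does not exceed it."""
    for name, factor in PREFIXES:
        if flops >= factor:
            return f"{flops / factor:.2f} {name}FLOPS"
    # Values below 1e3 are reported without a prefix.
    return f"{flops:.2f} FLOPS"


print(format_flops(93.01e15))  # a Petascale machine
print(format_flops(1e18))      # the Exascale target
```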


Top500 ranking of the world’s most powerful supercomputers (Nov. 2016)


No. 1 Sunway TaihuLight: 93.01 PetaFlops (Linpack performance), 125.43 PetaFlops (peak performance), with 15.37 MW power dissipation. Site: National Supercomputing Center in Wuxi (China)

No. 2 Tianhe-2 (Milky-Way-2): 33.86 PetaFlops (Linpack performance), 54.9 PetaFlops (peak performance), with 17.8 MW power dissipation. Site: National Super Computer Center in Guangzhou (China)

No. 3 Titan: 17.59 PetaFlops (Linpack performance), 27.11 PetaFlops (peak performance), with 8.2 MW power dissipation. Site: Oak Ridge National Laboratory (USA)
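Two derived metrics help compare the three systems: the fraction of peak performance actually achieved on Linpack, and the energy efficiency in GigaFlops per watt. The sketch below computes both from the figures quoted on the slide; the helper names are illustrative, and the power values are taken as given here without checking how they were measured.

```python
# (name, Linpack PetaFlops, peak PetaFlops, power in MW), values from the slide.
SYSTEMS = [
    ("Sunway TaihuLight", 93.01, 125.43, 15.37),
    ("Tianhe-2", 33.86, 54.9, 17.8),
    ("Titan", 17.59, 27.11, 8.2),
]


def linpack_efficiency(linpack_pflops: float, peak_pflops: float) -> float:
    """Fraction of the theoretical peak sustained on Linpack."""
    return linpack_pflops / peak_pflops


def gflops_per_watt(linpack_pflops: float, power_mw: float) -> float:
    """Energy efficiency in GigaFlops/W.

    1 PetaFlops = 1e6 GigaFlops and 1 MW = 1e6 W, so the factors cancel.
    """
    return linpack_pflops / power_mw


for name, linpack, peak, power in SYSTEMS:
    print(
        f"{name}: {linpack_efficiency(linpack, peak):.1%} of peak, "
        f"{gflops_per_watt(linpack, power):.2f} GigaFlops/W"
    )
```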

Top500 ranking: the Italian most powerful supercomputer (Nov. 2016)


No. 12 in Top500 and No. 3 in Europe: Marconi (Intel Xeon Phi): 6.22 PetaFlops (Linpack performance), 10.83 PetaFlops (peak performance), with 241,808 cores. Site: Casalecchio di Reno, Bologna (Italy)

Marconi is Cineca's Tier-0 system, co-designed by Cineca and Lenovo and based on the Lenovo NeXtScale platform and the Intel® Xeon Phi™ product family, alongside the Intel® Xeon® processor E5-2600 v4 product family.

By July 2017, this system is planned to reach a total computational power of about 20 Pflop/s using next-generation Intel Xeon processors (Skylake).

No. 3 Titan – Cray XK7, Opteron 2.2 GHz, NVIDIA K20X


Exascale Supercomputers

To build an Exascale supercomputer within a 20 MW power budget, as projected for 2023, energy efficiency must improve towards a goal of 50 GigaFlops/W.

No. 1 Sunway TaihuLight delivers 6 GigaFlops/W, placing it only 4th in the Green500 list, which ranks supercomputers by energy efficiency.

The greenest supercomputer in the current Green500 list achieves 9.4 GigaFlops/W: NVIDIA DGX-1 (Xeon E5-2698v4 and NVIDIA Tesla P100).

The top positions of the Green500 are currently occupied by heterogeneous computing systems.

This dominance is expected to continue over the coming years on the way to the 20 MW Exascale supercomputer target.
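The 50 GigaFlops/W goal follows from dividing the Exascale target by the 20 MW power budget. The short sketch below reproduces that arithmetic and estimates how far the efficiency figures quoted above are from the goal; the function names are illustrative.

```python
EXAFLOPS = 1e18         # target performance, in FLOPS
POWER_BUDGET_W = 20e6   # 20 MW power budget, in watts


def required_gflops_per_watt(flops: float, watts: float) -> float:
    """Energy efficiency (GigaFlops/W) needed to deliver `flops` within `watts`."""
    return flops / watts / 1e9


target = required_gflops_per_watt(EXAFLOPS, POWER_BUDGET_W)


def improvement_needed(current_gflops_per_watt: float) -> float:
    """Factor by which the current efficiency must grow to reach the target."""
    return target / current_gflops_per_watt


print(f"target: {target:.0f} GigaFlops/W")
print(f"gap from 9.4 GigaFlops/W (NVIDIA DGX-1): x{improvement_needed(9.4):.1f}")
print(f"gap from 6 GigaFlops/W (Sunway TaihuLight): x{improvement_needed(6.0):.1f}")
```

Even the most efficient system in the list would need to improve by roughly a factor of five.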


US Dept. of Energy Announced Summit and Sierra Supercomputers


Applications driving the demand for more computing performance


Astrophysics

Biology

Climate

Business Analytics

Advanced Computer Architectures: Intel® Core™ i7-3770T Processor

160 mm² die @ 22 nm, 1.40 billion transistors

Next generations: Broadwell, Skylake, Kaby Lake at 14 nm (2014); Cannonlake at 10 nm (2H 2017); Ice Lake at 10 nm (2018)

# of Cores                   4
# of Threads                 8
Clock Speed                  2.5 GHz
Max Turbo Frequency          3.7 GHz
Intel® Smart Cache           8 MB
Instruction Set              64-bit
Instruction Set Extensions   SSE4.1/4.2, AVX
Embedded Options Available   No
Lithography                  22 nm
Max TDP                      45 W
Recomm. Customer Price       TRAY: $294.00
Max Memory Size              32 GB
Memory Types                 DDR3-1333/1600
# of Memory Channels         2
Max Memory Bandwidth         25.6 GB/s
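The 25.6 GB/s figure can be cross-checked against the memory type and channel count in the table. The sketch below assumes a 64-bit (8-byte) data bus per channel, which is standard for DDR3 but is not stated on the slide; the function name is illustrative.

```python
def peak_memory_bandwidth_gbs(mega_transfers_per_s: int, channels: int,
                              bus_width_bits: int = 64) -> float:
    """Peak bandwidth in GB/s = MT/s x bytes per transfer x channels / 1000.

    bus_width_bits=64 is an assumption (standard DDR3 channel width).
    """
    bytes_per_transfer = bus_width_bits // 8
    return mega_transfers_per_s * bytes_per_transfer * channels / 1000


# DDR3-1600 performs 1600 million transfers per second per channel.
print(peak_memory_bandwidth_gbs(1600, channels=2))
# The slower supported memory type, DDR3-1333.
print(peak_memory_bandwidth_gbs(1333, channels=2))
```

With two channels of DDR3-1600 the result matches the 25.6 GB/s listed as the maximum memory bandwidth.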

NVIDIA Fermi GPU


NVIDIA Kepler GPU


Kepler GK110 Architecture
• 7.1B Transistors
• 15 SMX units (2880 cores)
• >1 TFLOP FP64
• 1.5 MB L2 Cache
• 384-bit GDDR5
• PCI Express Gen3
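The core count and the FP64 figure in the list above can be related with a back-of-the-envelope calculation. The sketch below derives the cores per SMX from the slide and estimates peak double-precision throughput. Two inputs are assumptions not given on the slide: 64 FP64 units per SMX, and a hypothetical 0.7 GHz clock chosen only for illustration.

```python
SMX_UNITS = 15       # from the slide
TOTAL_CORES = 2880   # from the slide

# Single-precision cores per SMX, derived from the two slide figures.
cores_per_smx = TOTAL_CORES // SMX_UNITS


def peak_fp64_tflops(smx_units: int, clock_ghz: float,
                     fp64_units_per_smx: int = 64) -> float:
    """Estimated peak FP64 throughput in TFLOPS.

    Assumes each FP64 unit retires one fused multiply-add (2 FLOPs) per cycle.
    fp64_units_per_smx=64 is an assumption, not a figure from the slide.
    """
    flops_per_cycle = smx_units * fp64_units_per_smx * 2
    return flops_per_cycle * clock_ghz / 1000


print(cores_per_smx)
# 0.7 GHz is a hypothetical clock used only to illustrate the formula.
print(peak_fp64_tflops(SMX_UNITS, clock_ghz=0.7))
```

Under these assumptions the estimate lands above 1 TFLOP, in line with the ">1 TFLOP FP64" bullet.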

NVIDIA Tesla P100 with Pascal GP100 GPU


NVIDIA Tesla P100 compared to prior generations

Advanced Computer Architectures: Smart Phones

4.7-inch display
• 12MP camera
• 5MP video camera
• Retina HD display with 3D Touch
• A9 chip, 64-bit
• M9 coprocessor
• iOS 10
• 32GB, 128GB

• 12MP camera
• 5MP video camera
• Retina HD display with 3D Touch
• A9 chip, 64-bit
• M9 coprocessor
• iOS 10
• 32GB, 128GB

iPhone 7, 4.7-inch display
• New 12MP camera
• 7MP video camera
• Retina HD display with 3D Touch
• Waterproof
• Stereo audio
• A10 Fusion chip, 64-bit
• M10 coprocessor
• iOS 10
• 32GB, 128GB, 256GB

iPhone 7 Plus, 5.5-inch display
• New 12MP camera ++
• 7MP video camera
• Retina HD display with 3D Touch
• Waterproof
• Stereo audio
• A10 Fusion chip, 64-bit
• M10 coprocessor
• iOS 10
• 32GB, 128GB, 256GB

Apple A8 System-on-Chip

Apple A8 is a 64-bit ARM-based SoC introduced in September 2014 for the iPhone 6 and iPhone 6 Plus.

Apple states that it has 25% more CPU performance and 50% more graphics performance while drawing 50% of the power compared to its predecessor, the A7.

The A8 features the second generation of the Apple-designed 64-bit 1.4 GHz ARMv8-A dual-core CPU, called Cyclone Gen 2, and an integrated PowerVR Series 6XT GX6450 quad-core GPU.

The A8 is manufactured on a 20 nm process by TSMC, which replaced Samsung as the manufacturer of Apple's mobile device processors. It contains 2 billion transistors and has 1 GB of LPDDR3 RAM included in the package.

On October 16, 2014, Apple introduced a variant of the A8, the A8X, in the iPad Air 2, with improved graphics and CPU performance due to one extra core and a higher frequency.


Apple A9 System-on-Chip


Apple A9 is a 64-bit ARM-based SoC introduced in September 2015 for the iPhone 6S and iPhone 6S Plus.

Apple states that it has 70% more CPU performance and 90% more graphics performance compared to its predecessor, the A8.

This is one of the most powerful mobile chips on the market today, along with the Samsung Exynos 8890 and the Qualcomm Snapdragon 820.

The A9 features the Apple-designed 64-bit 1.85 GHz ARMv8-A dual-core CPU, called Twister, and an integrated PowerVR Series 7XT GT7600 six-core GPU.

The A9 is manufactured by two companies: by Samsung on a 14 nm FinFET process and by TSMC on a 16 nm FinFET process.

The A9 has 2 GB of LPDDR4 RAM included in the package.

Apple introduced a variant of the A9, the A9X, in the iPad Pro, with the M9 motion coprocessor embedded in it.

Apple A10 Fusion


Apple A10 Fusion is a 64-bit ARM-based SoC designed by Apple and introduced in September 2016 for the iPhone 7 and iPhone 7 Plus.

Apple states that it has 40% more CPU performance and 50% more graphics performance compared to its predecessor, the A9.

The A10, with a die area of 125 mm² and 3.3 billion transistors (including GPU and cache), features two Apple-designed 64-bit 2.34 GHz ARMv8-A cores called Hurricane and two energy-efficient 64-bit cores codenamed Zephyr (similar to the ARM big.LITTLE technology).

The A10 integrates a newly designed PowerVR Series 7XT GT7600 six-core GPU. The A10 is manufactured by TSMC on a 16 nm FinFET process.
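The generation-over-generation claims quoted for the A8, A9 and A10 can be compounded to get a rough idea of the cumulative gain over the A7. This is only a sketch: the percentages are Apple's own statements as reported above, they may not refer to the same benchmarks, and multiplying them is an assumption made here for illustration.

```python
from math import prod

# (CPU gain, GPU gain) over the previous generation, as stated on the slides.
CLAIMED_GAINS = {
    "A8 vs A7": (0.25, 0.50),
    "A9 vs A8": (0.70, 0.90),
    "A10 vs A9": (0.40, 0.50),
}


def cumulative_speedup(gains) -> float:
    """Compound a sequence of relative gains (0.25 means +25%)."""
    return prod(1.0 + g for g in gains)


cpu = cumulative_speedup(c for c, _ in CLAIMED_GAINS.values())
gpu = cumulative_speedup(g for _, g in CLAIMED_GAINS.values())
print(f"A10 vs A7 (compounded claims): CPU x{cpu:.2f}, GPU x{gpu:.2f}")
```

Under this assumption the compounded claims give roughly a 3x CPU and 4x GPU improvement across three generations.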

Energy efficiency underlies all markets

Energy efficiency is of paramount importance for all application markets (automotive, consumer, mobile, healthcare and beyond) and for target systems ranging from sensors, cyber-physical systems and embedded systems up to servers and HPC systems.

Squeezing of computing cores

Year   Technology node
2005   65 nm (1.4 mm²)
2007   45 nm
2009   32 nm
2011   22 nm
2013   14 nm

Source: ARM9, STMicroelectronics
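The shrinking of a core across technology nodes can be approximated with ideal scaling, where area scales with the square of the feature size. The sketch below assumes that the 1.4 mm² figure is the area of the core at 65 nm and that scaling is ideal; real designs deviate from this, so the numbers are indicative only.

```python
BASE_NODE_NM = 65      # reference node from the slide
BASE_AREA_MM2 = 1.4    # assumed to be the core area at 65 nm
NODES_NM = [65, 45, 32, 22, 14]


def ideal_scaled_area(node_nm: float,
                      base_node_nm: float = BASE_NODE_NM,
                      base_area_mm2: float = BASE_AREA_MM2) -> float:
    """Area of the same design at `node_nm` under ideal scaling.

    Both linear dimensions shrink with the feature size,
    so the area scales quadratically.
    """
    return base_area_mm2 * (node_nm / base_node_nm) ** 2


for node in NODES_NM:
    print(f"{node} nm: {ideal_scaled_area(node):.3f} mm^2")
```

Under these assumptions the same core would occupy less than one twentieth of its original area at 14 nm, which is what makes room for many cores on a single die.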

… entering the multi/many‐core era

What are the barriers of further scaling?

Transistor density increases ~2x every 2 years

Frequency wall

Power wall

Utilisation wall

… the end of the Dennard scaling… entering the dark silicon era

The dark silicon problem

The power wall and the utilisation wall are the main barriers to efficient scaling in the multi/many-core era.

Dark silicon: the fraction of the die that cannot be used at a given time because of the power budget.
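A first-order model illustrates why the dark fraction grows once Dennard scaling ends. The sketch below assumes that each generation scales linear dimensions by a factor S of about 1.4: transistor count grows by S², frequency by S, capacitance per transistor shrinks by 1/S, and the supply voltage stays constant. Full-chip power then grows by S² per generation while the power budget is fixed. These scaling factors are modelling assumptions, not figures from the slides.

```python
def usable_fraction(generations: int, s: float = 1.4) -> float:
    """Fraction of the die that can run at full frequency within a fixed power budget.

    Post-Dennard assumption: power per generation grows by
    S^2 (transistors) * S (frequency) * 1/S (capacitance) * 1 (Vdd^2) = S^2.
    """
    power_growth_per_generation = s ** 2
    return min(1.0, 1.0 / power_growth_per_generation ** generations)


def dark_fraction(generations: int, s: float = 1.4) -> float:
    """Fraction of the die that must stay off (dark) under the same assumptions."""
    return 1.0 - usable_fraction(generations, s)


for g in range(5):
    print(f"generation {g}: {dark_fraction(g):.0%} dark")
```

With S around 1.4 the usable fraction roughly halves at every generation, which is the trend the utilisation wall describes.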

ACA COURSE INFORMATION


Contact Information

Office hours for students: Monday 14.00 - 15.00 at DEIB, Via Ponzio 34/5, first floor. Internal phone number: 3692 (please send an email to make an appointment).

Main contact: students can contact Prof. Cristina Silvano by e-mail ([email protected]), indicating:

Subject: ACA COURSE Milano, Your_Surname, Your_Name, Your_POLIMI_ID_NUMBER


ACA Teaching Assistants

Prof. Giovanni Agosta, e-mail ([email protected])

Prof. Gerardo Pelosi, e-mail ([email protected])


ACA Course Info

Teaching Activity: The course is worth 5 CFU and is organized into 30 hours of lectures and 20 hours of written/tool-based exercises that put the concepts presented during the lectures into practice.

Prerequisites: Basic concepts of logic design and computer architecture.


ACA Final Exam

FINAL EXAM: The final exam consists of a written exam. For each written exam, a maximum score of 32 points is assigned across 6 questions: up to 16 points for the exercise part (3 questions) and up to 16 points for the theory part (3 questions).

Students may ask the instructor for an OPTIONAL project. The project must be completed before the written exam session (firm deadline). The project gives an additional score of up to 12 points. The additional points given by the project are added to the score of the written exam only if the score of the written exam is sufficient (>= 18 points).


ACA Teaching Material

Additional information in slides and papers is available through Beep and the course webpage: http://home.deib.polimi.it/silvano/aca-milano.htm

If you are using Mozilla Firefox as your web browser, please use the SAVE AS option to save the PDF file on your laptop, so that the PDF slides are displayed and printed correctly.

Reference Book: "Computer Architecture, A Quantitative Approach", John Hennessy, David Patterson, Morgan Kaufmann, Fourth Edition / Fifth Edition

The ACA course is offered in English.

Teaching materials (slides/papers/textbook) are available in English.

The final exam can be taken in English.

Teaching support is available in English and Italian.

Students with surnames M-Z must follow the parallel ACA course session held by Prof. Donatella Sciuto.

The objectives and program of the two ACA course sessions are aligned.

The text of the final written exam is the same.


ACA Course