Introduction to the Intel® Xeon Phi™ Coprocessor - Intel Software Conference 2013
Post on 06-May-2015
© 2013, Intel Corporation. All rights reserved. Intel and the Intel logo are trademarks of Intel Corporation in the U.S.
and/or other countries. *Other names and brands may be claimed as the property of others.
Introduction to the Intel® Xeon Phi™ Coprocessor
Leo Borges (leonardo.borges@intel.com)
Intel - Software and Services Group
iStep-Brazil, August 2013
Introduction
High-level overview of the Intel® Xeon Phi™ platform: Hardware and Software
Intel Xeon Phi Case Studies
Intel Xeon Phi Ecosystem
Conclusions & References
Large Scale Clusters for Test & Optimization
Tera-Scale Research
Leading Performance, Energy Efficient
Platform Building Blocks
Dedicated, Renowned Applications Expertise
Broad Software Tools Portfolio
Defined HPC Application Platform
Many Integrated Core Architecture
Manufacturing Process Technologies
Exa-Scale Labs
A long term commitment to the HPC market segment
Intel in High-Performance Computing
HPC Processor Solutions
Common Intel Environment
Portable code, common tools
Multi-Core: Xeon®
• General Purpose Architecture
• Leadership Per Core Performance
• FP/core via AVX
• Multi-Core Performance
– EN: General purpose perf/watt
– EP: Max perf/watt w/ Higher Memory BW / freq and QPI, ideal for HPC
– Xeon EX: Additional sockets & big memory
– EP 4S: Additional compute density
Many-Core: Intel® Xeon Phi™ Coprocessor
• Trades a “big” IA core for multiple lower performance IA cores, resulting in higher performance for a subset of highly parallel applications
Highly parallel and vectorized applications, or with need for higher memory bandwidth, will run even faster on Intel® Xeon Phi™ Coprocessors
Most applications will still run best on multi-core Intel® Xeon® processors
Optimizing code often delivers significant performance gains
RUNNING EXISTING SERIAL SOFTWARE
RUNNING OPTIMIZED SOFTWARE
Big Gains for Selected Applications
Medical imaging and biophysics
Computer Aided Design & Manufacturing
Climate modeling & weather prediction
Financial analyses, trading
Energy & oil exploration
Digital content creation
Evaluating Your Applications for Intel® Xeon Phi™
[Flowchart: each question below branches on YES or NO]
• Can your workload scale to over 100 threads?
• Can your workload benefit from large vectors?
• Can your workload benefit from more memory bandwidth?
Use Intel® Xeon Phi™ coprocessors for applications that scale with:
• Threads • Vectors • Memory Bandwidth
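The three evaluation questions can be sketched as a small predicate. This is only an illustration in Python: the function name `suits_xeon_phi` is hypothetical, and the wiring is an assumption (thread scaling is treated as a prerequisite, after which either vector or bandwidth benefit is enough), since the exact branches of the flowchart are not recoverable from the slide text.

```python
def suits_xeon_phi(scales_past_100_threads: bool,
                   benefits_from_large_vectors: bool,
                   benefits_from_memory_bandwidth: bool) -> bool:
    """Hypothetical helper encoding the three slide questions.

    Assumption: a workload must first scale to over 100 threads; after that,
    benefiting from either large vectors or more memory bandwidth qualifies it.
    """
    if not scales_past_100_threads:
        return False
    return benefits_from_large_vectors or benefits_from_memory_bandwidth


# A threaded, vectorizable kernel qualifies; a mostly serial code does not.
print(suits_xeon_phi(True, True, False))
print(suits_xeon_phi(False, True, True))
```

Under this reading, a workload that answers NO to the threading question stays on the multi-core Intel® Xeon® processor regardless of the other two answers.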
Introduction
High-level overview of the Intel® Xeon Phi™ platform: Hardware and Software
Intel Xeon Phi Case Studies
Intel Xeon Phi Ecosystem
Conclusions & References
Intel Many Integrated Core (MIC, pronounced “Mike”)
Product Family/Architecture for Highly Parallel Applications
• Based on a large number of smaller, low-power Intel Architecture cores
• 512-bit wide vector engine
• Complements the Intel Xeon processor product line
• Provides breakthrough performance for highly parallel apps
– Familiar x86 programming model
– Same source code supports both Intel Xeon processor & Intel Xeon Phi coprocessor
– Initially a coprocessor with PCI Express form factor
First products announced at SC12: Code named Knights Corner (KNC)
• Up to 61 cores, 4 threads per core
• Up to 16GB GDDR5 memory (up to 352 GB/s)
• 225-300W (Cooling: Both passive & active SKUs)
• x16 PCIe Form-Factor (requires IA host)
Intel® Xeon Phi™ Product Family: Based on the Intel MIC Architecture
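The thread and vector-lane counts implied by the Knights Corner figures on this slide can be worked out directly. A minimal sketch in Python; the constant names are illustrative, and the lane counts assume 64-bit double-precision and 32-bit single-precision elements.

```python
# Figures quoted on the slide for Knights Corner (KNC).
CORES = 61
THREADS_PER_CORE = 4
VECTOR_BITS = 512

# Hardware threads an application can keep busy on one coprocessor.
hardware_threads = CORES * THREADS_PER_CORE

# Elements processed per vector instruction (assumed element widths).
dp_lanes = VECTOR_BITS // 64   # double precision
sp_lanes = VECTOR_BITS // 32   # single precision

print(hardware_threads, dp_lanes, sp_lanes)
```

With up to 244 hardware threads and 8 or 16 lanes per vector, code that neither threads nor vectorizes uses only a small fraction of the card, which is why the evaluation slide asks about both.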
Each Xeon Phi can be addressed as an Individual Node in the Cluster
6 to 16 GB GDDR5 memory
INTEL CONFIDENTIAL
Intel® Xeon Phi™ Coprocessors
• 3 Family: Outstanding Parallel Computing Solution. Performance/$ leadership
– Models: 3120P, 3120A
– 6GB GDDR5, 240 GB/s, >1 TFlops DP
• 5 Family: Optimized for High Density Environments. Performance/watt leadership
– Models: 5120P, 5120D
– 8GB GDDR5, >300 GB/s, >1 TFlops DP
• 7 Family: Highest Level of Features. Performance leadership
– Models: 7120P, 7120X
– 16GB GDDR5, 352 GB/s, >1.2 TFlops DP, Turbo
Introduction
High-level overview of the Intel® Xeon Phi™ platform: Hardware and Software
Performance Considerations
Intel Xeon Phi Case Studies
Intel Xeon Phi Ecosystem
Conclusions & References
Application Characterization
Based on memory access and flops required
• Temporal/spatial locality of data
• Bandwidth Requirement
[Figure: math kernels, applications, and market segments placed along an axis from Bandwidth Limited to Core Limited; one surviving axis value: 6 GB/s. Legend: (Y: Math Kernel; B: Applications; W: Segment)]
• Entries shown: Stream-triad; BLAS1 & BLAS2; Linpack; DGEMM; Sparse Matrix-Vector; SPECfp2000; Reservoir Simulation; FTDT; Kirchhoff Migration; Fluid Dynamics; Ocean Models; FFT; Option pricing; Molecular Dynamic; RTM
• Segment labels shown: All; Mfg & Scientific; Scientific; Oil & Gas; Mil HPC; FSI
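One way to make the bandwidth-limited versus core-limited split concrete is a roofline-style estimate, which the slide itself does not name; it is used here only as an illustration. The sketch below is Python and takes the 7 Family figures quoted earlier in the deck (352 GB/s, about 1.2 TFlops DP) as peaks. The byte counts are assumptions: Stream-triad is charged 24 bytes per 2 flops, as the STREAM benchmark counts it, and DGEMM is assumed to move each of its three matrices exactly once.

```python
def attainable_gflops(peak_gflops: float, bandwidth_gbs: float,
                      flops_per_byte: float) -> float:
    """Roofline estimate: performance is capped by compute or by memory traffic."""
    return min(peak_gflops, bandwidth_gbs * flops_per_byte)


PEAK_DP_GFLOPS = 1200.0   # "> 1.2 TFlops DP" (7 Family slide)
BANDWIDTH_GBS = 352.0     # "352 GB/s" (7 Family slide)

# Stream-triad, a[i] = b[i] + s * c[i]: 2 flops per 3 doubles (24 bytes).
triad_intensity = 2 / 24

# DGEMM on n x n matrices: 2*n^3 flops over 3 matrices of n^2 doubles
# (idealized: every matrix element crosses the memory bus once).
n = 10752
dgemm_intensity = (2 * n**3) / (3 * n**2 * 8)

triad_gflops = attainable_gflops(PEAK_DP_GFLOPS, BANDWIDTH_GBS, triad_intensity)
dgemm_gflops = attainable_gflops(PEAK_DP_GFLOPS, BANDWIDTH_GBS, dgemm_intensity)

print(f"triad: {triad_gflops:.1f} GF/s (bandwidth limited)")
print(f"dgemm: {dgemm_gflops:.1f} GF/s (core limited)")
```

Under these assumptions the triad estimate sits far below the compute peak, while DGEMM reaches it, which matches where the two kernels appear on the characterization axis.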
Synthetic Benchmarks: Intel® Xeon Phi™ Coprocessor and Intel® MKL
Higher is Better. Each entry lists 2S Intel® Xeon® first, then Intel Xeon Phi.
• STREAM Triad (GB/s): 75 vs. 171 (up to 2.2X)
• SMP Linpack (GF/s): 330 vs. 802 (up to 2.4X)
• DGEMM (GF/s): 347 vs. 887 (up to 2.5X)
• SGEMM (GF/s): 728 vs. 1,796 (up to 2.4X)
Chart annotations: ECC ON; 84% Efficient, 83% Efficient, 75% Efficient
Notes
1. Intel® Xeon® Processor E5-2680 used for all: SGEMM Matrix = 12800 x 12800, DGEMM Matrix 10752 x 10752, SMP Linpack Matrix 26000 x 26000
2. Intel® Xeon Phi™ coprocessor SE10P (ECC on) with “Gold” SW stack: SGEMM Matrix = 12800 x 12800, DGEMM Matrix 12800 x 12800, SMP Linpack Matrix 26872 x 28672
3. Average single-node results from measurements across a set of nodes from the TACC+ Stampede* Cluster
+ Texas Advanced Computing Center (TACC) at the University of Texas at Austin.
++ Measured on the TACC+ Stampede Cluster
Coprocessor results: Benchmark run 100% on coprocessor, no help from Intel® Xeon® processor host (aka native)
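The "UP TO" labels on the synthetic-benchmark chart can be reproduced from the bar values. A short Python sketch, assuming the lower value in each pair is the 2S Intel® Xeon® result and the higher one is the Intel Xeon Phi result; the dictionary layout is illustrative.

```python
# (2S Xeon, Xeon Phi) bar values as they appear on the chart.
results = {
    "STREAM Triad (GB/s)": (75, 171),
    "SMP Linpack (GF/s)": (330, 802),
    "DGEMM (GF/s)": (347, 887),
    "SGEMM (GF/s)": (728, 1796),
}

# Speedup of one coprocessor over the two-socket host for each benchmark.
speedups = {name: phi / xeon for name, (xeon, phi) in results.items()}

for name, ratio in speedups.items():
    print(f"{name}: {ratio:.2f}x")
```

Truncating each ratio to one decimal place gives the 2.2X, 2.4X, 2.5X, and 2.4X labels shown on the slide.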
Performance per Watt
Intel® Xeon Phi™ Coprocessor vs. 2S Intel® Xeon® processor (Intel MKL)
1 Intel® Xeon Phi™ Coprocessor vs. 2 Socket Intel® Xeon® processor
Relative Performance per Watt (Normalized to 1.0 Baseline of a 2 socket Intel® Xeon® processor E5-2670), Higher is Better:
• 2S Intel® Xeon® Processor: 1.00
• SMP Linpack: 3.91
• DGEMM: 4.63
• SGEMM: 4.81
Software and workloads used in performance tests may have been optimized for performance only on Intel microprocessors. Performance tests, such as SYSmark and MobileMark, are measured using specific computer systems, components, software, operations and functions. Any change to any of those factors may cause the results to vary. You should consult other information and performance tests to assist you in fully evaluating your contemplated purchases, including the performance of that product when combined with other products.
Source: Intel Measured results as of October 26, 2012. Configuration Details: Please reference slide speaker notes.
For more information go to http://www.intel.com/performance
Notes:
1. 2 X Intel® Xeon® Processor E5-2670 (2.6GHz, 8C, 115W)
2. Intel® Xeon Phi™ coprocessor 5110P (ECC on) with Gold RC SW stack (Coprocessor power only)
Coprocessor results: Benchmark run 100% on coprocessor, no help from Intel® Xeon® processor host (aka native)
Introduction
High-level overview of the Intel® Xeon Phi™ platform: Hardware and Software
Native, Offload and Variations
Intel Xeon Phi Case Studies
Intel Xeon Phi Ecosystem
Conclusions & References
Wide Spectrum of Execution Models
Range of Models to Meet Application Needs
From Multicore Centric (Intel® Xeon® processors) to Many-core Centric (Intel® Many Integrated Core co-processors):
• Multi-core-hosted: General purpose serial and parallel computing. Main( ), Foo( ), and MPI_*() all run on the multicore host
• Offload: Codes with highly-parallel phases. Main( ), Foo( ), and MPI_*() run on the multicore host; Foo( ) also runs on the many-core coprocessor
• Symmetric: Codes with balanced needs. Main( ), Foo( ), and MPI_*() run on both multicore and many-core
• Many-core-hosted: Highly-parallel codes. Main( ), Foo( ), and MPI_*() all run on the many-core coprocessor
The Intel Manycore Platform Software Stack (MPSS) provides Linux on the coprocessor
[Diagram: Host Processor and Intel® Xeon Phi™ Coprocessor, each running user-level code over system-level code, connected by the PCI-E bus]
• Host Processor: Linux* OS; Intel® Xeon Phi™ Coprocessor support libraries, tools, and drivers
• Intel® Xeon Phi™ Coprocessor: Linux* OS; Intel® Xeon Phi™ Coprocessor communication and application-launch support
Runs either as an accelerator for offloaded host computation…
• Host-side offload application: User code; offload libraries, user-level driver, user-accessible APIs and libraries
• Target-side offload application: User code; offload libraries, user-accessible APIs and libraries
Advantages
• More memory available
• Better file access
• Host better on serial code
• Better uses resources
…Or runs as a native or MPI* compute node via IP or OFED
• Target-side “native” application: User code; standard OS libraries plus any 3rd-party or Intel libraries
• Virtual terminal session: ssh or telnet connection to coprocessor IP address
• IB fabric
Advantages
• Simpler model
• No directives
• Easier port
• Good kernel test
Use if
• Not serial
• Modest memory
• Complex code
Flexible: Enables Multiple Programming Models
[Diagrams: CPU and MIC pairs connected over a network, showing where MPI ranks, data, and offloaded work reside]
• Coprocessor only: Homogeneous network of many-core CPUs (MPI and data on the MIC)
• Symmetric: Heterogeneous network of homogeneous CPUs (MPI and data on both CPU and MIC)
• Host+Offload: Homogeneous network of heterogeneous nodes (MPI and data on the CPU, with offload to the MIC)
Comprehensive set of SW tools for Xeon and Xeon Phi Programming
• Code Analysis: Advisor XE, VTune Amplifier XE, Inspector XE, Trace Analyzer
• Programming Models: Intel Cilk Plus, Threading Building Blocks, OpenMP, OpenCL, MPI, Offload/Native/MYO
• Libraries & Compilers: Math Kernel Library, Integrated Performance Primitives, Intel Compilers
Options for Thread Parallelism
Listed from greatest ease of use / code maintainability to greatest programmer control:
• Intel® Math Kernel Library
• OpenMP*
• Intel® Threading Building Blocks
• Intel® Cilk™ Plus
• OpenCL*
• Pthreads* and other threading libraries
Choice of unified programming to target Intel® Xeon® and Intel® Xeon Phi™ Architecture!
Introduction
High-level overview of the Intel® Xeon Phi™ platform: Hardware and Software
Intel Xeon Phi Case Studies
Intel Xeon Phi Ecosystem
Conclusions & References
Parallelizing for High Performance
Example: A Two-Step Process with SAXPY
• STARTING POINT: Unoptimized serial code running on multi-core Intel® Xeon® processors. Current performance: 67.097 seconds
• STEP 1. OPTIMIZE CODE: Parallelize and vectorize code and continue to run on multi-core Intel Xeon processors. 0.46 seconds (145X faster)
• STEP 2. USE COPROCESSORS: Run all or part of the optimized code on Intel® Xeon Phi™ coprocessors. 0.197 seconds (2.3X faster)
• Overall: 340X faster
The Following Performance Results are Based on Already Optimized Code
SOURCE: INTEL MEASURED RESULTS AS OF NOVEMBER, 2012
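For reference, SAXPY computes y = a*x + y element by element, and the speedup labels follow from the three quoted run times. The Python sketch below is illustrative only: the `saxpy` helper is a plain, unoptimized stand-in rather than the code Intel measured, and the variable names are assumptions.

```python
def saxpy(a, x, y):
    """Return a*x + y elementwise (reference version, not tuned)."""
    return [a * xi + yi for xi, yi in zip(x, y)]


# Run times quoted on the slide, in seconds.
t_serial = 67.097       # unoptimized serial code on Intel Xeon
t_optimized = 0.46      # parallelized and vectorized on Intel Xeon
t_coprocessor = 0.197   # optimized code on Intel Xeon Phi

step1 = t_serial / t_optimized         # gain from optimizing the code
step2 = t_optimized / t_coprocessor    # further gain from the coprocessor
overall = t_serial / t_coprocessor     # combined gain

print(saxpy(2.0, [1.0, 2.0], [10.0, 20.0]))
print(f"step 1: {step1:.1f}x, step 2: {step2:.2f}x, overall: {overall:.1f}x")
```

Most of the overall gain comes from step 1, which is the slide's point: optimize on the host first, then move to the coprocessor.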
Performance Proof-Point: Government and Academic Research
BQCD
• Application: Hybrid Monte-Carlo program that simulates lattice QCD with dynamical Wilson fermions. It is one of the main production programs used for quark simulation by the QCDSF collaboration (DEISA) and beyond.
• Status: Many optimizations already in released version; more optimizations and alternative offload model version in development
• Demonstrated Results:
- No source code changes
- Recompiled, selected run-time parameters to get maximum performance
“The performance improvement for BQCD using the Intel Xeon Phi coprocessor was reached in record time, requiring only recompilation. We are confident that larger speed-ups can be obtained with modest modifications of the code.”
Prof. Dr. Tilo Wettig
Principal Investigator of the QPACE project
[Chart: BQCD Scalability, Gflops/Sec (Higher is Better); x-axis values 1, 2, 4, 8; y-axis 0 to 300]
• 2S Intel® Xeon® Processor E5-2670
• Intel® Xeon Phi™ coprocessor, native (pre-production HW/SW)
• 2S Intel Xeon E5-2670 + Intel® Xeon Phi™ coprocessor, symmetric (pre-production HW/SW)
SOURCE: INTEL MEASURED MARCH’13
Performance Proof-Point: Energy Industry
CGG: WAVE EQUATION MIGRATION (WEM)
• Application: Seismic imaging technique used to obtain a subsurface depth image from input seismic data
• Status: See presentation at the Rice O&G HPC workshop, http://rice2013.og-hpc.org/technical-program
• Execution Model: Fully Hybrid MPI+OpenMP using symmetric mode
– Highly scalable on cluster
• Code Optimization:
– Minimal source code changes for dynamic load balancing
Speedup (Higher is Better):
• 2S Intel® Xeon® processor E5-2670, 4 MPI / 4 OMP: 1
• Intel® Xeon Phi™ Coprocessor (pre-production HW/SW), 12 MPI / 20 OMP: 2.57
• 2S Intel Xeon processor E5-2670 (4/4) + Intel® Xeon Phi™ coprocessor (12/20) (pre-production HW/SW): 3.57
• 2S Intel Xeon processor E5-2670 (4/4) + 2x Intel® Xeon Phi™ coprocessor (12/20 + 12/20) (pre-production HW/SW): 6.14
SOURCE: ARSLAN ET AL., CGG 2013, MARCH’13
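The four WEM speedups are consistent with host and coprocessor throughput simply adding up in symmetric mode. The Python check below uses only the numbers from the chart; reading the match as a sign of effective dynamic load balancing is an interpretation, not a claim made on the slide.

```python
# Speedups relative to the 2S Intel Xeon E5-2670 host, from the chart.
host_only = 1.00
one_phi_only = 2.57
reported_host_plus_one_phi = 3.57
reported_host_plus_two_phi = 6.14

# If work is balanced so that no device waits, throughputs add.
additive_one_phi = host_only + one_phi_only
additive_two_phi = host_only + 2 * one_phi_only

print(f"host + 1 Phi: additive {additive_one_phi:.2f}, reported {reported_host_plus_one_phi}")
print(f"host + 2 Phi: additive {additive_two_phi:.2f}, reported {reported_host_plus_two_phi}")
```

Both additive estimates equal the reported values to two decimal places.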
Performance Proof-Point: Financial Services
MONTE CARLO EUROPEAN OPTIONS
• Application: Monte Carlo algorithms are used to evaluate complex instruments, portfolios, and investments. Performance depends on raw computational power and the performance of exp2()
• Status: Case Study available
• Highlights: Dramatic performance scaling for both single-precision and double-precision calculations
• Demonstrated Results:
- Intel® Xeon Phi™ coprocessor fast exp2() and FMA instructions deliver high performance, high accuracy for single precision computations
- Compiler based loop unrolling delivers high performance
- Cache blocking further optimizes cache utilization, reduces cache misses, and makes outer loop vectorization possible
• Read the Case Study: software.intel.com/en-us/articles/case-study-achieving-high-performance-on-monte-carlo-european-option-on-intel-xeon-phi
Speedup (Higher is Better), 2S Intel® Xeon® processor E5-2670 vs. 2S Intel Xeon processor E5-2670 + Intel® Xeon Phi™ Coprocessor (pre-production HW/SW):
• Single Precision: 1 vs. 10.36
• Double Precision: 1 vs. 3.34
SOURCE: INTEL MEASURED RESULTS AS OF JULY, 2013
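To show why this workload leans on the exponential function, here is a minimal Monte Carlo pricer for a European call in Python. It is a sketch under stated assumptions, not the case-study code: it assumes a geometric Brownian motion model, uses `math.exp` where the slide mentions exp2(), and every function name and parameter value below is hypothetical. A closed-form Black-Scholes value is included only as a sanity check.

```python
import math
import random


def european_call_mc(spot, strike, rate, vol, years, paths, seed=0):
    """Estimate a European call price by simulating terminal prices."""
    rng = random.Random(seed)
    drift = (rate - 0.5 * vol * vol) * years
    diffusion = vol * math.sqrt(years)
    total = 0.0
    for _ in range(paths):
        z = rng.gauss(0.0, 1.0)
        # One exponential per path: this call dominates the inner loop.
        terminal = spot * math.exp(drift + diffusion * z)
        total += max(terminal - strike, 0.0)
    return math.exp(-rate * years) * total / paths


def black_scholes_call(spot, strike, rate, vol, years):
    """Closed-form reference price under the same model."""
    d1 = (math.log(spot / strike) + (rate + 0.5 * vol * vol) * years) / (vol * math.sqrt(years))
    d2 = d1 - vol * math.sqrt(years)

    def cdf(v):
        return 0.5 * (1.0 + math.erf(v / math.sqrt(2.0)))

    return spot * cdf(d1) - strike * math.exp(-rate * years) * cdf(d2)


mc_price = european_call_mc(100.0, 100.0, 0.05, 0.2, 1.0, paths=100_000)
bs_price = black_scholes_call(100.0, 100.0, 0.05, 0.2, 1.0)
print(f"Monte Carlo: {mc_price:.3f}, closed form: {bs_price:.3f}")
```

Each path is independent and costs one exponential plus a few multiply-adds, so the loop both threads and vectorizes well, which fits the slide's emphasis on fast exp2() and FMA instructions.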
Performance Proof-Point: Government and Academic Research
WEATHER RESEARCH AND FORECASTING (WRF)
• Application: Weather Research and Forecasting (WRF)
• Status: WRF V3.5 was released 4/18/13
• Code Optimization:
– Approximately two dozen files with less than 2,000 lines of code were modified (out of approximately 700,000 lines of code in about 800 files, all Fortran standard compliant)
– Most modifications improved performance for both the host and the co-processors
• Performance Measurements: Pre-release of WRF 3.5 (V3.5Pre) and NCAR supported CONUS2.5KM benchmark (a high resolution weather forecast)
• Acknowledgments: There were many contributors to these results, including the National Renewable Energy Laboratory and The Weather Channel Companies
Speedup (Higher is Better):
• 2S Intel® Xeon® processor E5-2670 with eight-node cluster configuration: 1
• 2S Intel® Xeon® processor E5-2670 + Intel® Xeon Phi™ coprocessor (pre-production HW/SW) with eight-node cluster configuration: 1.4
SOURCE: INTEL MEASURED RESULTS AS OF JULY, 2013
Performance Proof-Point: Government and Academic Research
SANDIA MANTEVO miniFE
• Application: Sandia National Laboratories' best approximation to an unstructured implicit finite element or finite volume application in fewer than 8000 lines of code
• Status: available at http://software.sandia.gov/trac/mantevo/browser/trunk/packages
• Demonstrated Results:
- Porting was easy using OpenMP
- Substituting an Intel MKL routine for the sparse matrix-vector product accelerated performance and will simplify future optimization
- The Intel MPI Library enables rapid performance improvement when adding an Intel® Xeon Phi™ coprocessor
• Read the Case Study: software.intel.com/en-us/articles/running-minife-on-intel-xeon-phi-coprocessors
Speedup (Higher is Better):
• 2S Intel® Xeon® processor E5-2670: 1
• 2S Intel Xeon processor E5-2670 + Intel® Xeon Phi™ coprocessor (pre-production HW/SW): 2.2
SOURCE: INTEL MEASURED RESULTS AS OF MARCH, 2012
“The programming models available for the Intel MIC Architecture are open-standard and portable between traditional processors and Intel Xeon Phi coprocessors. This should allow us to leverage code development across multiple platforms.”
James A. Ang, Ph.D.
Extreme-scale Computing, Sandia National Laboratories
DEMONSTRATED PERFORMANCE BENEFITS: Intel® Xeon Phi™ Coprocessor
• Seismic: Acceleware 8th Order Isotropic Variable Velocity (note 2): UP TO 2.23X
• Finite Element Analysis: Sandia National Labs MiniFE (note 1): UP TO 2X
• China Oil & Gas Geoeast Pre-stack Time Migration (note 3): UP TO 3.54X
Notes:
1. 8 node cluster, each node with 2S Xeon* (comparison is cluster performance with and without 1 Xeon Phi* per node) (Hetero)
2. 2S Xeon* vs. 1 Xeon Phi* (preproduction HW/SW & Application running 100% on coprocessor unless otherwise noted)
3. 2S Xeon* vs. 2S Xeon* + 2 Xeon Phi* (offload)
SOURCE: INTEL MEASURED RESULTS AS OF NOVEMBER, 2012
DEMONSTRATED PERFORMANCE BENEFITS: Intel® Xeon Phi™ Coprocessor
• Finance: Monte Carlo SP (note 3): UP TO 10.75X
• Black-Scholes SP (note 3): UP TO 7X
• Physics: Jefferson Lab Lattice QCD: UP TO 2.7X
• Embree Ray Tracing: Intel Labs Ray Tracing (note 2): SPEED-UP 2.11X
Notes:
1. 2S Xeon* vs. 1 Xeon Phi* (preproduction HW/SW & Application running 100% on coprocessor unless otherwise noted)
2. Intel Measured Oct. 2012
3. Includes additional FLOPS from transcendental function unit
SOURCE: INTEL MEASURED RESULTS AS OF NOVEMBER, 2012
Introduction
High-level overview of the Intel® Xeon Phi™ platform: Hardware and Software
Intel Xeon Phi Case Studies
Intel Xeon Phi Ecosystem
Conclusions & References
Implementation Proof-Point: Government and Academic Research
Texas Advanced Computing Center (TACC)
• System: TACC Stampede is a 10 petaflop supercomputer, one of the largest computing systems in the world for open science research. It became operational on January 7, 2013
• Status: In Service
• Workloads: Runs hundreds of applications for thousands of users around the world
• Performance:
– More than 7 petaflops using Intel® Xeon Phi™ coprocessors (note 1)
– More than 2 petaflops using the Intel® Xeon® processor E5 family (note 1)
• More Information:
– SC12 interview: insidehpc.com/2012/12/06/video-intel-xeon-phi-powers-7-tacc-stampede-super/
– TACC HPC systems overview: www.tacc.utexas.edu/resources/hpc
Note 1: http://www.tacc.utexas.edu/resources/hpc/stampede
Tianhe-2 System: #1 June 2013 Top500 List
System: Located in Southwest China, it contains 16,000 nodes composing the world's largest (public) installation of Intel Ivy Bridge processors and Xeon Phi coprocessors. Each cluster node is formed with
• 2 twelve-core Intel® Xeon® Ivy Bridge CPUs @ 2.2GHz
• 3 Intel® Xeon Phi™ cards, each with 57 cores @ 1.1GHz
Performance: Theoretical peak of 54.9 Pflop/s
• 6.8 Pflop/s from 32,000 Xeon Ivy Bridge sockets
• 48.1 Pflop/s from 48,000 Xeon Phi cards
• for a total of 3,120,000 cores
30.65 Pflop/s sustained Linpack.
More Information: "Visit to the National University for Defense Technology Changsha, China." Jack Dongarra, University of Tennessee, and Oak Ridge National Laboratory. June 2013. www.netlib.org/utk/people/JackDongarra/PAPERS/tianhe-2-dongarra-report.pdf
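The Tianhe-2 peak figures can be cross-checked from the per-node configuration. The Python sketch below rests on assumptions not stated on the slide: 16 double-precision flops per cycle per Xeon Phi core (8 lanes with fused multiply-add), 8 per Xeon core (4-wide AVX add plus multiply), and 12 cores per Xeon socket, which is the count that the quoted totals of 3,120,000 cores and 6.8 Pflop/s imply.

```python
NODES = 16_000
XEON_SOCKETS = 2 * NODES    # 32,000 sockets
PHI_CARDS = 3 * NODES       # 48,000 cards

XEON_CORES, XEON_GHZ = 12, 2.2   # assumption: 12 cores per socket
PHI_CORES, PHI_GHZ = 57, 1.1

XEON_FLOPS_PER_CYCLE = 8    # assumption: AVX, 4 DP lanes x (add + multiply)
PHI_FLOPS_PER_CYCLE = 16    # assumption: 8 DP lanes x 2 (fused multiply-add)

# cores * GHz * flops/cycle gives Gflop/s; divide by 1e6 for Pflop/s.
xeon_pflops = XEON_SOCKETS * XEON_CORES * XEON_GHZ * XEON_FLOPS_PER_CYCLE / 1e6
phi_pflops = PHI_CARDS * PHI_CORES * PHI_GHZ * PHI_FLOPS_PER_CYCLE / 1e6
total_pflops = xeon_pflops + phi_pflops

total_cores = XEON_SOCKETS * XEON_CORES + PHI_CARDS * PHI_CORES
linpack_efficiency = 30.65 / total_pflops

print(f"Xeon: {xeon_pflops:.2f}  Phi: {phi_pflops:.2f}  total: {total_pflops:.2f} Pflop/s")
print(f"cores: {total_cores:,}  Linpack efficiency: {linpack_efficiency:.1%}")
```

Under these assumptions the totals land on the quoted 6.8, 48.1, and 54.9 Pflop/s and 3,120,000 cores, and the sustained Linpack result corresponds to a little under 56% of theoretical peak.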
A Growing Software Ecosystem: Developing today on Intel® Xeon Phi™ coprocessors
Shown at SC’12, November 2012
Introduction
High-level overview of the Intel® Xeon Phi™ platform: Hardware and Software
Intel Xeon Phi Case Studies
Intel Xeon Phi Ecosystem
Conclusions & References
Conclusions
Intel® Xeon Phi™ coprocessor advantages:
• Comparable performance potential to other accelerators
• Faster time to solution due to reduced development effort
• Better investment protection with a single code base for processors and coprocessors
Flexible and Wide range of programming models: from pure Native to Offloaded – and all variants between
All with the familiar Intel development environment
Intel® Xeon Phi™ Coprocessor Developer Site: http://software.intel.com/mic-developer
One Stop Shop for:
Tools & Software Downloads
Getting Started Development Guides
Video Workshops, Tutorials, & Events
Code Samples & Case Studies
Articles, Forums, & Blogs
Associated Product Links
Thank you.
INFORMATION IN THIS DOCUMENT IS PROVIDED “AS IS”. NO LICENSE, EXPRESS OR IMPLIED, BY ESTOPPEL OR OTHERWISE, TO ANY INTELLECTUAL PROPERTY RIGHTS IS GRANTED BY THIS DOCUMENT. INTEL ASSUMES NO LIABILITY WHATSOEVER AND INTEL DISCLAIMS ANY EXPRESS OR IMPLIED WARRANTY, RELATING TO THIS INFORMATION INCLUDING LIABILITY OR WARRANTIES RELATING TO FITNESS FOR A PARTICULAR PURPOSE, MERCHANTABILITY, OR INFRINGEMENT OF ANY PATENT, COPYRIGHT OR OTHER INTELLECTUAL PROPERTY RIGHT.
Software and workloads used in performance tests may have been optimized for performance only on Intel microprocessors. Performance tests, such as SYSmark and MobileMark, are measured using specific computer systems, components, software, operations and functions. Any change to any of those factors may cause the results to vary. You should consult other information and performance tests to assist you in fully evaluating your contemplated purchases, including the performance of that product when combined with other products.
Copyright © , Intel Corporation. All rights reserved. Intel, the Intel logo, Xeon, Xeon Phi, Core, VTune, and Cilk are trademarks of Intel Corporation in the U.S. and other countries.
Optimization Notice
Intel’s compilers may or may not optimize to the same degree for non-Intel microprocessors for optimizations that are not unique to Intel microprocessors. These optimizations include SSE2, SSE3, and SSSE3 instruction sets and other optimizations. Intel does not guarantee the availability, functionality, or effectiveness of any optimization on microprocessors not manufactured by Intel. Microprocessor-dependent optimizations in this product are intended for use with Intel microprocessors. Certain optimizations not specific to Intel microarchitecture are reserved for Intel microprocessors. Please refer to the applicable product User and Reference Guides for more information regarding the specific instruction sets covered by this notice.
Notice revision #20110804
Legal Disclaimer & Optimization Notice
Copyright© 2012, Intel Corporation. All rights reserved. *Other brands and names are the property of their respective owners.