Transcript
Page 1: Introduction to National Supercomputer center in Tianjin TH-1A Supercomputer

Introduction to National Supercomputer Center in Tianjin

TH-1A Supercomputer

Page 2: Introduction to National Supercomputer center in Tianjin TH-1A Supercomputer

Agenda

- National Supercomputer Center in Tianjin (NSCC-TJ)
- TH-1A system
  - Hardware sub-system
  - Software sub-system
- Applications

Page 3: Introduction to National Supercomputer center in Tianjin TH-1A Supercomputer

NSCC-TJ

- National SuperComputer Center in Tianjin
  - Sponsored by
    - Chinese Ministry of Science and Technology
    - Tianjin Binhai New Area
- Public information infrastructure
  - To accelerate the economy, education and industry of Northern China
  - To provide high performance computing service to the whole of China
  - Open platform for research and education

Page 4: Introduction to National Supercomputer center in Tianjin TH-1A Supercomputer

NSCC-TJ

[Site photo labels]
Transformer station & air conditioner
Computer room, total area: 2400 m²
Main building
Office

Page 5: Introduction to National Supercomputer center in Tianjin TH-1A Supercomputer

NSCC-TJ

The first floor of central computing room: 1200 m²

Page 6: Introduction to National Supercomputer center in Tianjin TH-1A Supercomputer

NSCC-TJ

The second floor of central computing room: visualization environment, 1200 m²

Page 7: Introduction to National Supercomputer center in Tianjin TH-1A Supercomputer

NSCC-TJ

Electric transformer station

Page 8: Introduction to National Supercomputer center in Tianjin TH-1A Supercomputer

2011-6-28 TH-1 8

NSCC-TJ

Cooling water station

Page 9: Introduction to National Supercomputer center in Tianjin TH-1A Supercomputer

NSCC-TJ

- Layout of computing room

Page 10: Introduction to National Supercomputer center in Tianjin TH-1A Supercomputer

TH-1A system

Page 11: Introduction to National Supercomputer center in Tianjin TH-1A Supercomputer

TH-1A system

- Enhanced system based on TH-1 system (Sep. 2009)
- Installed in NSCC-TJ, Aug. 2010
- Debugging and performance testing, Sept. to Oct. 2010
- On service, after Nov. 2010

Items          Configuration
Processors     14336 Intel CPUs + 7168 NVIDIA GPUs + 2048 FT CPUs
Memory         262 TB in total
Interconnect   Proprietary high-speed interconnecting network
Storage        2 PB
Cabinets       120 compute / service cabinets
               14 storage cabinets
               6 communication cabinets

Page 12: Introduction to National Supercomputer center in Tianjin TH-1A Supercomputer

TH-1A system

- TH-1A System Architecture
  - Hybrid MPP structure: CPU & GPU
  - Proprietary compute nodes
  - Connected by proprietary high-speed interconnect network
  - Global shared parallel storage system
  - Custom software stack

Page 13: Introduction to National Supercomputer center in Tianjin TH-1A Supercomputer

TH-1A hardware sub-system

[Diagram: compute sub-system (CPU+GPU nodes), service sub-system (operation nodes), storage sub-system (MDS, OSS), communication sub-system, monitor and diagnosis sub-system]

Page 14: Introduction to National Supercomputer center in Tianjin TH-1A Supercomputer

Compute sub-system

- 7,168 compute nodes
- 2 six-core CPUs and 1 GPU per node
- CPU
  - Xeon X5670 (Westmere)
  - Processor speed: 2.93 GHz
- GPU
  - NVIDIA Tesla M2050
  - Connected with CPU by PCI-E
- 32 GB memory per node
- 2U height
- Peak performance
  - 4,701,061 Gflops
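The peak figure on this slide can be cross-checked from the per-node configuration. The sketch below is not part of the original slides. It assumes 4 double-precision flops per cycle per Westmere core and a double-precision peak of 515.2 Gflops per Tesla M2050 (448 cores x 1.15 GHz x 2, halved for double precision); both values are assumptions brought in for the check.

```python
# Values taken from the slides.
COMPUTE_NODES = 7168
CPUS_PER_NODE = 2
GPUS_PER_NODE = 1
CORES_PER_CPU = 6
CPU_CLOCK_GHZ = 2.93

# Assumptions, not stated on the slides.
CPU_FLOPS_PER_CYCLE = 4    # double-precision flops per core per cycle
GPU_PEAK_GFLOPS = 515.2    # Tesla M2050 double-precision peak

def peak_gflops():
    """Return (cpu_part, gpu_part, total) theoretical peak in Gflops."""
    cpus = COMPUTE_NODES * CPUS_PER_NODE
    gpus = COMPUTE_NODES * GPUS_PER_NODE
    cpu_part = cpus * CORES_PER_CPU * CPU_CLOCK_GHZ * CPU_FLOPS_PER_CYCLE
    gpu_part = gpus * GPU_PEAK_GFLOPS
    return cpu_part, gpu_part, cpu_part + gpu_part

cpu_part, gpu_part, total = peak_gflops()
print(f"CPU {cpu_part:,.0f} + GPU {gpu_part:,.0f} = {total:,.0f} Gflops")
```

Under these assumptions the GPUs contribute roughly 79% of the theoretical peak, which is why the programming framework described later focuses on keeping the GPU busy.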

Page 15: Introduction to National Supercomputer center in Tianjin TH-1A Supercomputer

Service sub-system

- 1,024 service nodes
- 2 eight-core domestic CPUs
- CPU: FT-1000
  - SoC
  - 1.0 GHz
  - Eight cores, eight threads per core
  - Peak performance 8 Gflops
- 32 GB memory per node
- For login, compile, and applications that need throughput computing
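The node counts on the compute and service slides are consistent with the totals in the configuration table (2048 FT CPUs, 262 TB of memory). A short check, written as a sketch: the one-flop-per-cycle figure for FT-1000 is an assumption chosen only because it reproduces the 8 Gflops stated on the slide.

```python
# Values taken from the slides.
COMPUTE_NODES = 7168
SERVICE_NODES = 1024
MEM_PER_NODE_GB = 32
FT_CPUS_PER_NODE = 2
FT_CORES = 8
FT_CLOCK_GHZ = 1.0

# Assumption: one floating-point operation per core per cycle.
FT_FLOPS_PER_CYCLE = 1

total_mem_gb = (COMPUTE_NODES + SERVICE_NODES) * MEM_PER_NODE_GB
ft_cpus = SERVICE_NODES * FT_CPUS_PER_NODE
ft_peak_gflops = FT_CORES * FT_CLOCK_GHZ * FT_FLOPS_PER_CYCLE

print(f"Total memory: {total_mem_gb} GB (about {total_mem_gb / 1000:.0f} TB)")
print(f"FT CPUs: {ft_cpus}, peak per FT-1000: {ft_peak_gflops} Gflops")
```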

Page 16: Introduction to National Supercomputer center in Tianjin TH-1A Supercomputer

Proprietary interconnection network

- Interconnection signal speed: 10 Gbps
- Bi-directional bandwidth: 160 Gbps
- Hierarchical fat-tree structure
  - First stage: 16 nodes connected by a 16-port switching board
  - Second stage: all parts connected to eleven 384-port switches

Page 17: Introduction to National Supercomputer center in Tianjin TH-1A Supercomputer

Proprietary interconnection network

- High-radix router ASIC: NRC
  - Feature size: 90 nm
  - Die size: 17.16 mm x 17.16 mm
  - Package: FC-PBGA
  - 2577 pins
  - Throughput of single NRC: 2.56 Tbps
- Network interface ASIC: NIC
  - Same feature size and package as NRC
  - Die size: 10.76 mm x 10.76 mm
  - 675 pins

Page 18: Introduction to National Supercomputer center in Tianjin TH-1A Supercomputer

Proprietary interconnection network

[Photo captions]
Leaf switch blade and root switch blade of 384-port switch
Back plane of 384-port switch, about 700 mm x 600 mm
16-port switch board in cabinet

Page 19: Introduction to National Supercomputer center in Tianjin TH-1A Supercomputer

Proprietary interconnecting network

- Switching board and high-radix switch
  - Based on network interface ASIC and router ASIC
  - Reduced user communication protocol
  - Throughput: 61.44 Tbps

[Photo: front and back of two 384-port high-radix switches]
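The bandwidth figures on the interconnect slides (10 Gbps signal speed, 160 Gbps bi-directional, 2.56 Tbps per NRC, 61.44 Tbps per switch) fit together if each port carries 8 lanes and each NRC has 16 ports. Neither number appears on the slides, so both are labeled as assumptions in this sketch.

```python
# Values taken from the slides.
SIGNAL_GBPS = 10      # signal speed per lane
SWITCH_PORTS = 384    # ports on one high-radix switch

# Assumptions, not stated on the slides.
LANES_PER_PORT = 8
NRC_PORTS = 16
DIRECTIONS = 2        # bi-directional link

port_gbps = SIGNAL_GBPS * LANES_PER_PORT * DIRECTIONS
nrc_tbps = NRC_PORTS * port_gbps / 1000
switch_tbps = SWITCH_PORTS * port_gbps / 1000

print(f"Per port: {port_gbps} Gbps")
print(f"Per NRC: {nrc_tbps} Tbps, per 384-port switch: {switch_tbps} Tbps")
```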

Page 20: Introduction to National Supercomputer center in Tianjin TH-1A Supercomputer

Storage sub-system

- Capacity: 2 PB
- Connected by proprietary interconnection network
- Lustre-based parallel file system

Page 21: Introduction to National Supercomputer center in Tianjin TH-1A Supercomputer

Monitor and diagnosis sub-system

- Real-time monitoring of hardware parameters
- Precise fault location
- Alarm and immediate action against emergencies
- Self-feedback cooling adjustment for environment status
- I2C & JTAG diagnosis mechanism
- Large-scale console
- Remote monitoring and management
- Rich monitor & control functions

Page 22: Introduction to National Supercomputer center in Tianjin TH-1A Supercomputer

Computing cabinet

- Node: 2 CPUs and 1 GPU
- Blade: 2 nodes
- Frame
  - 8 computing blades
  - 16-port switching board
  - 1 monitor and diagnosis board
- Cabinet
  - 4 frames, 64 nodes
  - Close-coupled chilled water cooling
  - 128 CPUs, 64 GPUs
  - 56 kW cooling capacity in a cabinet
- Footprint
  - 700 m²
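The packaging hierarchy (node, blade, frame, cabinet) multiplies out to the per-cabinet counts on this slide. The sketch below also derives two numbers that are not on the slide: the count of cabinets needed for 7,168 compute nodes, and the cooling capacity available per node.

```python
# Values taken from the slides.
CPUS_PER_NODE = 2
GPUS_PER_NODE = 1
NODES_PER_BLADE = 2
BLADES_PER_FRAME = 8
FRAMES_PER_CABINET = 4
COMPUTE_NODES = 7168
COOLING_KW_PER_CABINET = 56

nodes_per_frame = NODES_PER_BLADE * BLADES_PER_FRAME
nodes_per_cabinet = nodes_per_frame * FRAMES_PER_CABINET
cpus_per_cabinet = nodes_per_cabinet * CPUS_PER_NODE
gpus_per_cabinet = nodes_per_cabinet * GPUS_PER_NODE

# Derived values, not stated on the slides.
compute_cabinets = COMPUTE_NODES // nodes_per_cabinet
watts_per_node = COOLING_KW_PER_CABINET * 1000 / nodes_per_cabinet

print(f"{nodes_per_cabinet} nodes, {cpus_per_cabinet} CPUs, "
      f"{gpus_per_cabinet} GPUs per cabinet")
print(f"{compute_cabinets} cabinets for {COMPUTE_NODES} compute nodes")
print(f"Cooling capacity per node: {watts_per_node:.0f} W")
```

Each frame holds 16 nodes, which matches the 16-port switching board used as the first stage of the fat tree.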

Page 23: Introduction to National Supercomputer center in Tianjin TH-1A Supercomputer

TH-1A software sub-system

- Software stack

Page 24: Introduction to National Supercomputer center in Tianjin TH-1A Supercomputer

Operating system

- Kylin Linux
  - Compute node kernel
  - Provide virtual running environment
    - Isolated running environments for different users
    - Custom software package installation
  - QoS support
  - Power-aware computing

Page 25: Introduction to National Supercomputer center in Tianjin TH-1A Supercomputer

Compiler system

- C, C++, Fortran, Java
- OpenMP, MPI, OpenMP/MPI
- CUDA, OpenCL
- Heterogeneous programming framework
  - Accelerate large-scale, complex applications, especially applications still under development or whose full source code is not available
  - Use the computing power of CPUs and GPUs while hiding GPU programming from users
    - Inter-node homogeneous parallel programming (users)
    - Intra-node heterogeneous parallel computing (computer experts)

Page 26: Introduction to National Supercomputer center in Tianjin TH-1A Supercomputer

Compiler system

- Heterogeneous programming framework
  - Inter-node homogeneous parallel programming (JASMIN)
    - Patch-based object data structures
    - MPI communication, dynamic load balancing support
    - Zero-copy optimization in communication library

Page 27: Introduction to National Supercomputer center in Tianjin TH-1A Supercomputer

Compiler system

- Heterogeneous programming framework
  - Intra-node heterogeneous parallel computing
    - Compiler-optimized / hand-tuned threaded code
    - Optimizations include
      - Adaptive partitioning: balance the workloads between CPUs and GPU
      - Asynchronous data transfer / computing: overlap CPU operations with GPU operations
      - Software pipelining: overlap GPU computing with data transfer between host and GPU device memory
      - ……
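To illustrate why software pipelining helps, the sketch below models a stream of patches that each need a host-to-device transfer, a kernel, and a device-to-host transfer. It is a timing model only, not the framework's implementation. It assumes the three stages can run concurrently on different patches (for example, a GPU with separate copy engines for each direction), and the stage durations are made-up numbers.

```python
def serial_time(n_patches, h2d, kernel, d2h):
    """Total time when every patch runs transfer, compute, transfer in order."""
    return n_patches * (h2d + kernel + d2h)

def pipelined_time(n_patches, h2d, kernel, d2h):
    """Total time when the three stages overlap across successive patches.

    Each stage is modeled as one resource that handles one patch at a time.
    A stage starts a patch once it is free and the previous stage has
    finished that patch.
    """
    h2d_done = kernel_done = d2h_done = 0.0
    for _ in range(n_patches):
        h2d_done = h2d_done + h2d
        kernel_done = max(kernel_done, h2d_done) + kernel
        d2h_done = max(d2h_done, kernel_done) + d2h
    return d2h_done

# Hypothetical stage durations in arbitrary time units.
t_serial = serial_time(8, h2d=1.0, kernel=2.0, d2h=1.0)
t_pipelined = pipelined_time(8, h2d=1.0, kernel=2.0, d2h=1.0)
print(f"serial: {t_serial}, pipelined: {t_pipelined}")
```

In this model the pipelined time approaches the number of patches multiplied by the slowest stage, so the transfers are hidden whenever the kernel dominates.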

Page 28: Introduction to National Supercomputer center in Tianjin TH-1A Supercomputer

Compiler system

- Heterogeneous programming framework
  - An example: 3-D short-range molecular simulations
    - For each time step
      - Split workload (force calculation) between CPU and GPU
        - For each patch allocated to GPU
          - Start asynchronous operations: transfer the patch data to GPU, compute the patch, get results from GPU
        - For each patch allocated to CPU
          - Launch threads on CPU cores to compute the patch
        - CPU waits for GPU completion event
        - Adjust the split value according to the CPU/GPU performance (patches per second + empirical)
      - Other workload (velocity, position) computed on CPU
    - Performance: one NVIDIA M2050 GPU is 3 times faster than one Intel X5670 CPU
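The per-time-step procedure on this slide can be sketched in plain Python. This is an illustration under stated assumptions, not the framework's code: a single worker thread stands in for the asynchronous GPU stream, `compute_patch` is a placeholder for the force calculation, the throughput values are fixed at a 3:1 GPU-to-CPU ratio rather than measured, and `damping` is a hypothetical stand-in for the "empirical" factor.

```python
from concurrent.futures import ThreadPoolExecutor

def compute_patch(patch):
    """Placeholder for the force calculation on one patch."""
    return sum(x * x for x in patch)

def adjust_split(split, cpu_rate, gpu_rate, damping=0.5):
    """Move the GPU share toward the ratio of observed patch throughputs."""
    target = gpu_rate / (gpu_rate + cpu_rate)
    return split + damping * (target - split)

def time_step(patches, split, gpu_pool):
    """Compute all patches, giving the first `split` fraction to the 'GPU'."""
    n_gpu = round(split * len(patches))
    gpu_patches, cpu_patches = patches[:n_gpu], patches[n_gpu:]
    # Start asynchronous work for the patches allocated to the GPU.
    gpu_futures = [gpu_pool.submit(compute_patch, p) for p in gpu_patches]
    # Meanwhile the CPU computes its own patches.
    cpu_results = [compute_patch(p) for p in cpu_patches]
    # The CPU waits for the GPU completion events.
    gpu_results = [f.result() for f in gpu_futures]
    return gpu_results + cpu_results

patches = [[float(i), float(i + 1)] for i in range(16)]
split = 0.5  # initial fraction of patches sent to the GPU
with ThreadPoolExecutor(max_workers=1) as gpu_pool:
    for step in range(5):
        forces = time_step(patches, split, gpu_pool)
        # Assumed throughputs (patches per second); a real run would time them.
        split = adjust_split(split, cpu_rate=1.0, gpu_rate=3.0)

print(f"GPU share after 5 steps: {split:.3f}")
```

With a fixed 3:1 throughput ratio the GPU share moves toward 0.75. Damping keeps the split from oscillating when measured rates are noisy.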

Page 29: Introduction to National Supercomputer center in Tianjin TH-1A Supercomputer

Programming environment

- Virtual running environments
  - Provide services on demand
- Parallel toolkits
  - Based on Eclipse
  - To integrate all kinds of tools
    - Editor, debugger, profiler
- Workflow support
  - Support QoS negotiation
  - Reserve resources for future requirements

Page 30: Introduction to National Supercomputer center in Tianjin TH-1A Supercomputer

Visualization system

- Application area
  - Numerical weather forecast
  - Computational fluid dynamics
  - Oil exploration
  - Other large-scale data
- Computing platform
  - Tianhe-1A
- Render server
  - 128 CPU + 64 GPU
- Display device
  - 3x6 multi-channel display wall

Page 31: Introduction to National Supercomputer center in Tianjin TH-1A Supercomputer

Applications

- Oil exploration
- High-end equipment development
- Bio-medical research
- Animation design
- New energy research
- New material research
- Weather and climate forecasting
- Engineering design, simulation and analysis
- Remote sensing data processing
- Financial risk analysis

Page 32: Introduction to National Supercomputer center in Tianjin TH-1A Supercomputer

Thanks

