fujitsu 'phi' turnkey solution system gpgpu and xeon phi ... management of cluster...

Integrated ready to use XEON-PHI based platform Dr. Pierre Lagier ISC2014 - Leipzig FUJITSU “PHI” Turnkey Solution Copyright 2014 FUJITSU


Page 1: FUJITSU 'PHI' Turnkey Solution System GPGPU and XEON Phi ... Management of cluster resources Manage serial and parallel jobs Fair share usage between ... FUJITSU "PHI" Turnkey Solution

Integrated ready to use XEON-PHI based platform

Dr. Pierre Lagier

ISC2014 - Leipzig

FUJITSU “PHI” Turnkey Solution

Copyright 2014 FUJITSU

Page 2

PHI Turnkey Solution challenges

System performance challenges

Best architecture design for parallel I/O and fine system tuning, including the integration of SSD technology

Known application performance bottlenecks, such as the relationship between the PCI bus and the host sockets

Application challenges

Hybrid programming model and related performance issues (latency, synchronization overhead, MPI sustained bandwidth between PHI boards) must be addressed in parallel with system performance challenges

How will end users benefit from a Web portal (PRIMERGY Gateway) that hides the heterogeneity and related issues?

Environment challenges

Full integration of all software components with the Cluster Deployment Manager tool.

Page 3

FUJITSU “PHI” CLUSTER

Page 4

The “PHI” Cluster RX350/S7

[Diagram labels: Network, User files, Parallel File System, GigaBit Switch, TrueScale Switch]

Page 5

PHI Cluster: Network RX350/S7

Network

Ethernet and IB Network

• Single GigaBit switch
  • Same shared subnet bridging XEON-PHI and all compute nodes
  • Only one cluster of heterogeneous compute nodes, XEON-PHI and Dual Ivy-Bridge

• TrueScale IB switch
  • Dual rail IB per compute node

[Diagram labels: TrueScale Switch, GigaBit Switch]

Page 6

PHI Cluster: File Systems


RX350/S7

Efficiency Driven IOs

• HOME file system
  • On login node
  • NFS mounted on the compute nodes

• Local scratch
  • On each SB node
  • Mounted on connected XP node

• Parallel file system (FHGFS)
  • Integrated into the SB nodes, with one Intel SSD per node (3 TB global storage capacity across the minimal 8-node configuration)
  • Each SB node is MDS/DS/Client
  • Each XP node is client
  • Simple policies: each client prefers its local MDS and DS; striping factor of 8 across all SB nodes
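To make the striping policy concrete, here is a minimal sketch of how a striping factor of 8 spreads a file over the 8 SB nodes. It is a simplified round-robin model, not the FHGFS implementation: the chunk size, the `first_target` parameter (standing in for the "local preferred" policy) and both function names are assumptions introduced for illustration only.

```python
from collections import Counter

NUM_SB_NODES = 8          # minimal configuration stated on the slide
STRIPE_FACTOR = 8         # striping factor stated on the slide
CHUNK_SIZE = 512 * 1024   # assumption: the slides do not give a chunk size


def chunk_targets(file_size: int, first_target: int = 0) -> list[int]:
    """Return the storage-target index of each chunk, assigned round-robin.

    first_target is a hypothetical stand-in for the "local preferred" policy:
    the first chunk is placed on the node the client runs on.
    """
    num_chunks = -(-file_size // CHUNK_SIZE)  # ceiling division
    return [(first_target + i) % STRIPE_FACTOR for i in range(num_chunks)]


def bytes_per_target(file_size: int, first_target: int = 0) -> Counter:
    """Sum how many bytes of the file land on each storage target."""
    placed = Counter()
    for i, target in enumerate(chunk_targets(file_size, first_target)):
        start = i * CHUNK_SIZE
        placed[target] += min(CHUNK_SIZE, file_size - start)
    return placed


if __name__ == "__main__":
    layout = bytes_per_target(10 * 1024 * 1024)  # a 10 MiB example file
    for target in sorted(layout):
        print(f"SB node {target}: {layout[target]} bytes")
    # Plain arithmetic on the slide's figures: 3 TB shared by 8 nodes.
    print(f"capacity per node: {3 / NUM_SB_NODES} TB")
```

With every file striped over all 8 nodes, a single large read or write can draw on all 8 SSDs at once, which is the motivation for pairing a striping factor of 8 with one SSD per node.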

[Diagram: eight nodes, each labelled /fhgfs]

Page 7

FUJITSU SOFTWARE STACK

Page 8

FUJITSU HPC SW Stack

A mature software stack includes specific software for:

Deploying nodes and managing software packages

A workload manager for job and resource management

Parallel execution environment with libraries

Tools for application development (as needed)

Storage options (NFS, PFS)

These HPC software layers are always the same

Variety exists only in the actual components used

[Diagram: software stack layers]

• Application programs

• Graphical end-user interface

• Parallel environment: Parallel Middleware; Scientific Libraries; Parallel File System; Compilers, performance and profiling tools

• Workload manager: Management of cluster resources; Manage serial and parallel jobs; Fair share usage between users

• Cluster deployment and management: Automated installation and configuration; Administrator interface; Operation and monitoring; Cluster checker; User environment management

• GPGPU and XEON Phi software support

• Operating System: RedHat Linux, CentOS; OS Drivers

• Fujitsu PRIMERGY HPC Clusters

Fujitsu HPC Cluster Suite

Fujitsu SW Stack coverage

Page 9

Foundation for application solution development

PRIMERGY HPC Gateway

PRIMERGY HPC Gateway is the user interface component of the FUJITSU Software HPC Cluster Suite

Intuitive web environment incorporating application workflows, direct simulation monitoring, data access and collaboration

Value proposition based on simplifying HPC end-use and integrating application expertise, to tune business processes and better manage projects

The Gateway delivers additional value by simplifying HPC usage – shipping since 05/2013

Page 10

THE “GROMACS” PLATFORM

Page 11

The GROMACS Environment

Based on GROMACS 5.0

Verlet cutoff scheme tuned for the XEON-PHI

Running natively on XEON-PHI with Fujitsu OpenMP tuning

MPI tuning still in progress

Fully integrated into the HPC Gateway

Transparent run on the front node or on XEON-PHI, depending on the tools used as well as the cutoff scheme (Verlet on XEON-PHI)

Integration of key components of GROMACS (grompp, mdrun,…) with form based parameter control

Real time display of energies at run time
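As an illustration of form based parameter control, the sketch below turns a dictionary of form fields into the two GROMACS invocations named above. It is a guess at the general shape, not the Gateway's actual code: the field names, file names, thread count and function names are hypothetical, and a native XEON-PHI build may use a different binary name than `gmx`. The flags used (`-f`, `-c`, `-p`, `-o` for grompp; `-s`, `-ntomp` for mdrun) and the `cutoff-scheme` option are standard in GROMACS 5.0.

```python
import shlex


def mdp_fragment(cutoff_scheme: str = "Verlet", nsteps: int = 10000) -> str:
    """Render the .mdp lines controlled by the (hypothetical) form."""
    return f"cutoff-scheme = {cutoff_scheme}\nnsteps = {nsteps}\n"


def build_commands(form: dict) -> list[list[str]]:
    """Build the grompp and mdrun argument lists from form fields."""
    grompp = [
        "gmx", "grompp",
        "-f", form["mdp"],          # run parameters
        "-c", form["structure"],    # starting coordinates
        "-p", form["topology"],     # system topology
        "-o", form["tpr"],          # portable run input produced by grompp
    ]
    mdrun = [
        "gmx", "mdrun",
        "-s", form["tpr"],
        "-ntomp", str(form["omp_threads"]),  # OpenMP threads per rank
    ]
    return [grompp, mdrun]


if __name__ == "__main__":
    # Illustrative values only; none of them come from the slides.
    form = {
        "mdp": "md.mdp",
        "structure": "conf.gro",
        "topology": "topol.top",
        "tpr": "topol.tpr",
        "omp_threads": 120,
    }
    print(mdp_fragment(), end="")
    for cmd in build_commands(form):
        print(shlex.join(cmd))
```

Keeping the commands as argument lists rather than shell strings means user-entered form values are never interpreted by a shell.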

True improvement

Protein-Water test case (dhfr)

1.6 times faster on XEON-PHI than dual socket Sandy-Bridge
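Reading "1.6 times faster" as a ratio of wall-clock times (an assumption; the slide does not say whether it refers to run time or to throughput such as ns/day), the figure translates into run time as follows, with $T_{\mathrm{SB}}$ and $T_{\mathrm{PHI}}$ introduced here as notation for the two run times:

```latex
\[
S = \frac{T_{\mathrm{SB}}}{T_{\mathrm{PHI}}} = 1.6
\quad\Longrightarrow\quad
T_{\mathrm{PHI}} = \frac{T_{\mathrm{SB}}}{1.6} = 0.625\,T_{\mathrm{SB}}
\]
```

Under that reading, the dhfr run on XEON-PHI takes 62.5% of the dual socket Sandy-Bridge time, a 37.5% reduction.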

Page 12

GROMACS GUI
