Pavlos M. Mattheakis
Ioannis Papaefstathiou
SIGNIFICANTLY REDUCING MPI INTERCOMMUNICATION LATENCY AND POWER OVERHEAD IN BOTH EMBEDDED AND HPC SYSTEMS
Motivation
• The number of nodes in HPC systems increases exponentially (IBM, 2012)
• Asynchronous MPI messages increase linearly with the number of nodes
  • Rainer Keller et al., "Characteristics of the Unexpected Message Queue of MPI Applications", EuroMPI 2010
• MPI protocol implementations do not scale
  • Keith D. Underwood et al., "The Impact of MPI Queue Usage on Message Latency", ICPP 2004
Background: MPI Asynchronous Commands
[Figure: queued messages (3, 2, 1) matched entry by entry against posted receives]
Scenario: a node buffers N messages in the UMQ, then executes N receives, each crossing the whole UMQ:
• N(N+1)/2 MPI message comparisons in total: the first receive scans N entries, the next N-1, and so on (a linear-scan sketch follows below)
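To make the cost concrete, here is a minimal C sketch of a linked-list UMQ with a linear-scan match, assuming a simple (src, tag) envelope; the structure and field names are illustrative, not the implementation profiled in the paper.

```c
#include <stddef.h>

/* Minimal model of an unexpected-message-queue (UMQ) entry; field names
 * are illustrative, not the paper's data structure. */
typedef struct umq_entry {
    int src;                 /* sender rank */
    int tag;                 /* MPI tag     */
    struct umq_entry *next;  /* singly linked queue */
} umq_entry;

/* Linear scan of the UMQ for a posted receive (src, tag), counting every
 * envelope comparison.  Unlinks and returns the matching entry, or NULL. */
static umq_entry *umq_match(umq_entry **head, int src, int tag,
                            long *comparisons)
{
    for (umq_entry **p = head; *p != NULL; p = &(*p)->next) {
        ++*comparisons;
        if ((*p)->src == src && (*p)->tag == tag) {
            umq_entry *hit = *p;
            *p = hit->next;          /* remove the matched entry */
            return hit;
        }
    }
    return NULL;    /* no match: the receive would be posted in the PRQ */
}
```

If each of the N receives matches the entry furthest from the head, the first scan costs N comparisons, the next N-1, and so on, for a total of N + (N-1) + ... + 1 = N(N+1)/2.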
Previous Work
• Offloaded MPI stack on the NIC processor
  • Myricom Lanai Z8ES, 2009
  • Quadrics Elan4, 2002
• List manager accelerator
  • "Accelerating List Management for MPI", CLUSTER 2005
  • "An MPI Implementation for Multiple Processors Across Multiple FPGAs", FPL 2006
• TCAM-based accelerator
  • "A Hardware Acceleration Unit for MPI Queue Processing", IPDPS 2005
• Microcoded architecture
  • "An Architecture to Perform NIC-Based MPI Matching", CLUSTER 2007
Contributions
• Novel scheme for processing asynchronous MPI messages
• Profiled with real-world applications
• Implemented in a state-of-the-art CMOS technology
Novel Scheme
1. Hash the posted receives according to the sender
2. Use an ordered list for receives with wildcards (see the sketch after this list)
3. Keep the message fields accessed during the search in SRAM
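A minimal C sketch of the kind of structure steps 1 and 2 describe, assuming posted receives are bucketed by sender rank and wildcard (MPI_ANY_SOURCE) receives sit in a separate ordered list; the bucket count, names, and tie-breaking field are assumptions, not the accelerator's actual design.

```c
#include <stddef.h>

#define ANY_SOURCE  (-1)   /* stand-in for MPI_ANY_SOURCE */
#define NUM_BUCKETS 64     /* assumption: would grow with the process count */

typedef struct recv_entry {
    int src;                   /* sender rank, or ANY_SOURCE */
    int tag;
    unsigned long seq;         /* posting order, used to break ties */
    struct recv_entry *next;
} recv_entry;

typedef struct {
    recv_entry *bucket[NUM_BUCKETS];  /* posted receives hashed by sender */
    recv_entry *wildcards;            /* ordered list of ANY_SOURCE receives */
} posted_recv_queue;

static unsigned hash_src(int src) { return (unsigned)src % NUM_BUCKETS; }

/* Match an incoming message (src, tag): only the sender's bucket and the
 * wildcard list are searched instead of the whole PRQ.  MPI semantics
 * require the oldest matching receive, hence the seq comparison.
 * (Sketch only: matched entries are not unlinked, MPI_ANY_TAG is ignored.) */
static recv_entry *prq_match(posted_recv_queue *q, int src, int tag)
{
    recv_entry *best = NULL;

    for (recv_entry *b = q->bucket[hash_src(src)]; b; b = b->next)
        if (b->src == src && b->tag == tag) { best = b; break; }

    for (recv_entry *w = q->wildcards; w; w = w->next)
        if (w->tag == tag) {
            if (best == NULL || w->seq < best->seq)
                best = w;
            break;
        }

    return best;
}
```

Because both the bucket chain and the wildcard list are kept in posting order, the first hit in each is the oldest candidate there, and comparing posting sequence numbers preserves MPI matching order while searching only a fraction of the posted receives.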
Profiling with real-world applications
• Imperas OVPsim used as the simulation platform
• OR1K processors running a lightweight MPI implementation
• MPI accelerator modeled in C
Simulation Accuracy
• The average PRQ search depth for the IS benchmark was used to evaluate the accuracy of the simulation (the metric is sketched below)
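For reference, the metric can be stated simply: the average PRQ search depth is the mean number of queue entries traversed per matching attempt. A minimal sketch of how such a counter could be kept, with names that are assumptions rather than the simulator's API:

```c
/* Running average of the PRQ search depth: one sample per matching
 * attempt.  Names are illustrative, not the simulator's actual API. */
typedef struct {
    unsigned long long total_depth;  /* sum of entries traversed */
    unsigned long long attempts;     /* number of matching attempts */
} search_depth_stats;

static void record_search(search_depth_stats *s, unsigned depth)
{
    s->total_depth += depth;
    s->attempts    += 1;
}

static double average_search_depth(const search_depth_stats *s)
{
    return s->attempts ? (double)s->total_depth / (double)s->attempts : 0.0;
}
```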
Implementation Flow
Untimed functional C → synthesizable SystemC → Verilog HDL → netlist

| Module      | Frequency | Area (µm²) | Power (mW) |
|-------------|-----------|------------|------------|
| Accelerator | 1 GHz     | 95055      | 39.08      |
Results – Brightwell’s Benchmark
• Benchmark introduced in "Implications of Application Usage Characteristics for Collective Communication Offload", IJHPC 2006
• Send N messages from Node 1 to Node 0 with tags [0, N-1]; post N receives at Node 0 with tags [0, N-1]
• Measure the time needed to unload the UMQ at Node 0 (a minimal MPI sketch follows below)
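A minimal MPI sketch of the access pattern described above, assuming small eager-protocol messages and a barrier to let them accumulate in the UMQ before the receives are posted; N, the synchronization, and the timing details are assumptions, not the benchmark's actual code.

```c
#include <mpi.h>
#include <stdio.h>

#define N 1024   /* number of messages; illustrative value */

int main(int argc, char **argv)
{
    int rank, size, payload = 0;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);
    if (size < 2) { MPI_Finalize(); return 0; }

    if (rank == 1) {
        /* Node 1: send N small (eager) messages with tags 0..N-1. */
        for (int tag = 0; tag < N; ++tag)
            MPI_Send(&payload, 1, MPI_INT, 0, tag, MPI_COMM_WORLD);
    }

    /* Let the messages land in Node 0's UMQ before any receive is posted. */
    MPI_Barrier(MPI_COMM_WORLD);

    if (rank == 0) {
        /* Node 0: time how long it takes to post the N matching receives,
         * i.e. to unload the unexpected message queue. */
        double t0 = MPI_Wtime();
        for (int tag = 0; tag < N; ++tag)
            MPI_Recv(&payload, 1, MPI_INT, 1, tag, MPI_COMM_WORLD,
                     MPI_STATUS_IGNORE);
        printf("UMQ unload time for %d messages: %f s\n", N, MPI_Wtime() - t0);
    }

    MPI_Finalize();
    return 0;
}
```

Whether each receive finds its message at the head or deeper in the queue depends on the relative order of sends and receives; reversing one of the tag orders would force full-queue traversals.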
Results – NASPB Suite
• Queue-processing speedup for the MPI processor with hashing scales with the number of processors
• This requires scaling the number of hash buckets as well, so that the expected per-bucket chain length stays roughly constant
Conclusions and Future Work
• Novel scheme for reducing the processing time in MPI queues
• Simulated in a Multiprocessor Environment
• Synthesized in a state-of-the-art CMOS technology
• Profiled on real-world applications
• Future Directions
  • Further measurements, not only benchmark-style applications
  • Integration of more MPI tasks
• Special thanks to
  • Dimitrios S. Nikolopoulos, Professor, Queen's University Belfast
  • Nikos Taburatzis, MSc student, Technical University of Crete
Questions?