NVIDIA’s Experience with Open64

Mike Murphy

NVIDIA

© NVIDIA Corporation 2008

Outline

Why Open64
How we use Open64
What we did to Open64
Future work in Open64

Compiling CUDA for GPUs

[Figure: a C/C++ CUDA application is passed through NVCC, which separates it into GPU code and CPU code; these are combined into the executable.]

Why Open64

We had a low-level code generator for graphics code, but CUDA needed high-level optimization of C/C++ code. The options were:

Build our own: would take too long
gcc: good long-term support
Open64: best performance (kudos to PathScale)

NVCC processing of GPU code

cudafe -> C code for GPU -> nvopencc (Open64) -> PTX -> OCG -> object code

Changes: Rehosting Open64

Our compiler has to run on 32- and 64-bit Linux, 32- and 64-bit Windows, and Mac OS.
The main Open64 source tree supports only Linux.

This is an area where sharing our changes can help grow the user base by making Open64 easier to port.

For Windows we build using Cygwin’s MinGW.

Changes: Memory and registers

We don’t have a stack or fast memory.
Therefore we want to keep data in registers.
Inline everything and optimize as much as possible.
Try to keep small structs in registers by expanding struct copies into field copies (instead of taking the address and generating a loop to do a byte copy).
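To make the struct-copy point concrete, here is a minimal Python sketch over a hypothetical toy IR. `StructType`, the tuple-shaped ops, `expand_struct_copy`, and `byte_copy_loop` are illustrative assumptions, not Open64 (WHIRL or CG) data structures. It contrasts the two lowerings of a whole-struct assignment `dst = src`: one scalar move per field, or taking both addresses and copying bytes in a loop.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StructType:
    name: str
    fields: tuple  # ((field_name, size_in_bytes), ...)

    @property
    def size(self) -> int:
        return sum(size for _, size in self.fields)


def expand_struct_copy(dst: str, src: str, ty: StructType) -> list:
    """Lower `dst = src` into one scalar move per field.

    Each field becomes an independent value, so a register allocator can
    keep it in a register and the struct never needs an address.
    """
    return [("mov", f"{dst}.{name}", f"{src}.{name}", size)
            for name, size in ty.fields]


def byte_copy_loop(dst: str, src: str, ty: StructType) -> list:
    """The alternative: take both addresses and copy ty.size bytes in a
    loop, which forces dst and src into addressable memory."""
    return [("addr", "p", dst),
            ("addr", "q", src),
            ("copy_bytes", "p", "q", ty.size)]


# Example: a 12-byte struct of three floats.
float3 = StructType("float3", (("x", 4), ("y", 4), ("z", 4)))

if __name__ == "__main__":
    for op in expand_struct_copy("a", "b", float3):
        print(op)
```

The field-wise form produces three independent 4-byte moves that can each live in a register. The byte-loop form references `&a` and `&b`, which is exactly what a target with no stack or fast memory wants to avoid.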

Changes: Vector loads and stores

Coalesce adjacent loads and stores for performance.
Do this in CG (the code generator):

Iterate through ops, trying to add each one to a vector.
Check for intervening kills.
Change alignment and use dummy registers for padding if that helps create a wider vector (e.g., may use a 4-word vector for a 3-word struct).
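The steps above can be sketched for loads as follows. This is a hedged Python sketch over a hypothetical toy IR, not NVIDIA's CG code. The `Load`, `Store`, `Def`, and `VecLoad` classes, the 2- and 4-word vector widths, and the `_pad` dummy register are all assumptions made for illustration. The sketch also assumes two simplifications: that alignment has already been raised so every vector access is legal, and that `Def` instructions write one register but read nothing the loads produce. Stores would be coalesced symmetrically.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Load:
    dst: str     # destination register
    base: str    # base address register
    offset: int  # offset from base, in 4-byte words


@dataclass(frozen=True)
class Store:
    base: str
    offset: int
    src: str


@dataclass(frozen=True)
class Def:
    """Any other instruction; in this toy it only writes register `reg`."""
    reg: str


@dataclass(frozen=True)
class VecLoad:
    dsts: tuple  # "_pad" marks a dummy padding register
    base: str
    offset: int


def _vectorize(group, allow_padding):
    """Split a run of contiguous loads into 4-, 2- and 1-word accesses."""
    out, i = [], 0
    while i < len(group):
        rest = len(group) - i
        if rest >= 4 or (rest == 3 and allow_padding):
            width = min(rest, 4)
        elif rest >= 2:
            width = 2
        else:
            width = 1
        chunk = group[i:i + width]
        if width == 1:
            out.append(chunk[0])
        else:
            dsts = tuple(ld.dst for ld in chunk)
            if width == 3:
                # Pad to a 4-word vector; assumes the object's alignment was
                # raised so reading the ignored 4th word is safe.
                dsts += ("_pad",)
            out.append(VecLoad(dsts, chunk[0].base, chunk[0].offset))
        i += width
    return out


def coalesce_loads(ops, allow_padding=True):
    """Merge loads of consecutive words from the same base into vector loads.

    A vector load is placed where its first scalar load was; independent
    instructions seen in between are emitted after it in original order.
    """
    out, group, pending, written = [], [], [], set()

    def flush():
        out.extend(_vectorize(group, allow_padding))
        out.extend(pending)
        group.clear()
        pending.clear()
        written.clear()

    for op in ops:
        if isinstance(op, Load):
            group_dsts = {ld.dst for ld in group}
            extends = (bool(group)
                       and op.base == group[0].base
                       and op.offset == group[-1].offset + 1
                       and op.dst not in group_dsts   # no duplicate lanes
                       and op.base not in group_dsts  # base not overwritten
                       and op.dst not in written)     # no intervening write
            if not extends:
                flush()
            group.append(op)
        elif isinstance(op, Def) and group and op.reg != group[0].base:
            pending.append(op)  # hoisting the loads above it is safe
            written.add(op.reg)
        else:
            # Kills: a store (may alias the loaded words) or a redefinition
            # of the base register ends the current run.
            flush()
            out.append(op)
    flush()
    return out
```

For example, loads of words 0, 1, and 2 from `b`, with an unrelated `Def("t")` between them, become one 4-wide `VecLoad` with a `_pad` lane when padding is allowed. Without padding they become a 2-wide vector plus a scalar load. A `Store` between two loads ends the run, because in this toy any store is treated as possibly aliasing.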

Changes: 16-bit optimization

It is cheaper to use 16-bit registers and operations.
But C promotes shorts to int.
So we added a pass in CG that converts back to 16-bit:

Mark 16-bit loads, stores, and converts.
Propagate 16-bitness forwards and backwards.
Unmark 16-bitness where a value cannot be 16-bit.
Change the remaining registers and instructions to 16-bit.
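Because of C's integer promotion, `c = a + b` on `short` variables compiles to a 32-bit add between sign-extending loads and a truncating store. Below is a hedged Python sketch of the four steps above over a hypothetical toy IR. The opcode names (`ld.s16`, `st.s16`, `cvt.s16`), the `Op` class, and `narrow_to_16bit` are assumptions, and address operands are omitted. It relies on one fact: the low 16 bits of add, sub, mul, and, or, and xor depend only on the low 16 bits of their inputs. Registers linked by those ops can therefore be narrowed together. The unmarking here is deliberately conservative, and values live out of the code fragment are not modeled.

```python
from typing import NamedTuple, Optional

SIXTEEN_BIT_OPS = {"ld.s16", "st.s16", "cvt.s16"}
# Ops whose low 16 result bits depend only on the low 16 bits of their inputs.
SAFE_OPS = {"add", "sub", "mul", "and", "or", "xor"}


class Op(NamedTuple):
    opcode: str
    dst: Optional[str]  # None for stores
    srcs: tuple = ()

    def regs(self) -> list:
        return ([self.dst] if self.dst else []) + list(self.srcs)


def _spread(regset: set, ops) -> set:
    """Grow regset through SAFE_OPS until no safe op links a member to a
    non-member (propagates both forwards and backwards)."""
    changed = True
    while changed:
        changed = False
        for op in ops:
            if op.opcode not in SAFE_OPS:
                continue
            regs = set(op.regs())
            if regset & regs and not regs <= regset:
                regset |= regs
                changed = True
    return regset


def narrow_to_16bit(ops) -> set:
    """Return the registers that can be rewritten as 16-bit registers."""
    # 1. Mark registers touched by 16-bit loads, stores and converts.
    marked = {r for op in ops if op.opcode in SIXTEEN_BIT_OPS for r in op.regs()}
    # 2. Propagate 16-bitness forwards and backwards through safe ops.
    _spread(marked, ops)
    # 3. Unmark: any other op needs the full 32-bit value, and so does
    #    everything tied to it through a safe op.
    wide = {r for op in ops
            if op.opcode not in SIXTEEN_BIT_OPS | SAFE_OPS for r in op.regs()}
    _spread(wide, ops)
    # 4. The remaining marked registers (and their instructions) become 16-bit.
    return marked - wide


# short a, b, c; c = a + b;  (the add is 32-bit after integer promotion)
example = [
    Op("ld.s16", "r1"),
    Op("ld.s16", "r2"),
    Op("add", "r3", ("r1", "r2")),
    Op("cvt.s16", "r4", ("r3",)),
    Op("st.s16", None, ("r4",)),
]

if __name__ == "__main__":
    print(sorted(narrow_to_16bit(example)))
```

In the example every register can be narrowed. If `r3` were also used by a division, which needs the full sign-extended value, the add's whole web (`r1`, `r2`, `r3`) would be unmarked. Only `r4`, on the far side of the `cvt.s16` truncation, would stay 16-bit.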

Future work

1 person -> 4 people working with Open64
New application TBA
Merging changes into trunk (thanks to Sun Chan and Shin!)

Investigating register pressure in WOPT: we want better control of register pressure during optimization.

Investigating using other features (LNO, IPA, etc.)

Questions?

http://www.nvidia.com/CUDA

