cuda programming within mathematica

17
CUDA Programming within Mathematica Wolfram White Paper ®

Upload: eugen-anytzash

Post on 23-Oct-2014

254 views

Category:

Documents


7 download

TRANSCRIPT

Page 1: CUDA Programming Within Mathematica

CUDA Programmingwithin Mathematica

Wolfram White Paper

®

Page 2: CUDA Programming Within Mathematica
Page 3: CUDA Programming Within Mathematica

IntroductionCUDA is short for Common Unified Device Architecture. NVIDIA developed the language to program graphics cards and let you run C-like code on the Graphical Processing Unit (GPU). CUDApromises significant performance increases, and CUDA-designed programs will scale to multicoresystems.

Considering CUDA’s advanced technology, you may expect its programming to be enormouslycomplicated. Enter Mathematica, the easiest way to program for CUDA and unlock GPU perfor-mance potential. Unlike programming in C or developing CUDA wrapper code, now you don’thave to be a programming wizard to use CUDA.

Mathematica offers an intuitive environment—even featuring built-in ready-to-use examples forcommon application areas, such as image processing, medical imaging, statistics, and finance—that makes CUDA programming a breeze, even if you’ve never used Mathematica before.

And if you have used Mathematica, you’ll be amazed by the massive boost in computationalpower, as well as application performance, enhancing speed by factors easily exceeding 100.

In this document, we describe the benefits of CUDA integration in Mathematica and providesome applications for which it is suitable, such as image processing and financial engineering.

Motivations for CUDACUDA is different from regular C code, which runs on the Central Processing Unit (CPU). In CUDAterminology, the device is the GPU, while the host is the CPU.

There are two motivations for using CUDA. The first is the promise of significant speed, while theother is the promise that a CUDA-designed program will scale to multicore systems. With CPUmanufacturers indicating that hundred- and thousand-core systems will be included in userdesktops, the CUDA programming paradigm, which scales well to thousands of CPU cores,becomes more important. GPUs currently have thousands of cores, and correct formulation of theproblem and memory usage can give you a significant performance boost.

2 CUDA Programming within Mathematica

Page 4: CUDA Programming Within Mathematica

A Brief Introduction to MathematicaMathematica has been one of the most powerful languages for technical computing for morethan 20 years. Wolfram Research introduces fully integrated GPU programming capabilities inMathematica, which brings a whole new meaning to high-performance computing. For CUDAdevelopers, the new integration means unlimited access to Mathematica’s vast computingabilities.

Mathematica is a sophisticated development environment that combines a flexible programminglanguage with a wide range of symbolic and numeric computational capabilities, production ofhigh-quality visualizations, built-in application area packages, and a range of immediate deploy-ment options. With direct integration of dynamic libraries, instant interface construction, andautomatic C code generation and linking, Mathematica provides the most sophisticated build-to-deploy environment in the market.

Some highlights of Mathematica’s features include:

Full-featured, unified development environment

Through its unique interface and integrated features for computation, development, and deploy-ment, Mathematica provides a streamlined workflow.

Unified data representation

At the core of Mathematica is the foundational idea that everything—data, programs, formulas,graphics, documents—can be represented as symbolic entities, called expressions. This unifiedrepresentation makes Mathematica’s language and functions extremely flexible, streamlined, andconsistent.

Multiparadigm programming language

Mathematica provides its own highly declarative functional language, as well as several differentprogramming paradigms, such as procedural and rule-based programming. Programmers canchoose their own style for writing code with minimal effort. Along with comprehensive documen-tation and resources, Mathematica’s flexibility greatly reduces the cost of entry for new users.

Unique symbolic-numeric hybrid system

The fundamental principle of Mathematica is full integration of symbolic and numeric computingcapabilities. Through its full automation and preprocessing mechanisms, users can enjoy the fullpower of a hybrid computing system without the knowledge of specific methodologies andalgorithms.

Extensive scientific and technical area coverage

Mathematica provides thousands of built-in functions and packages that cover a broad range ofscientific and technical computing areas, such as statistics, control systems, data visualization, andimage processing. All functions are carefully designed and tightly integrated with the system,which enables users to break the barriers between specialized areas and explore new possibilities.

CUDA Programming within Mathematica 3

Page 5: CUDA Programming Within Mathematica

Built-in high-performance computing

Mathematica fully supports multi-threaded, multicore processors without extra packages. Manyfunctions automatically utilize the power of multicore processors, and built-in generic parallelfunctions make high-performance programming a simple process.

Easy data access and connectivity

Mathematica natively supports hundreds of formats for importing and exporting, as well asunlimited access to data from Wolfram|Alpha®, Wolfram Research’s computational knowledgeengine’. It also provides APIs for accessing common database and programming languages, suchas SQL, C/C++, and .NET.

Platform-dependent deployment options

Through its interactive notebooks, Mathematica Player’, and browser plug-ins, Mathematicaprovides a wide range of options for deployment. Built-in code generation functionality can beused to create standalone programs for independent distribution.

Benefits of CUDA Integration in Mathematica For users who want to tap into the power of GPU computing, CUDA integration in Mathematicaprovides benefits in both development and performance. The full integration and automation ofMathematica’s CUDA capability means a more productive and efficient development cycle. Inaddition, it brings unprecedented levels of performance improvements without extra develop-ment time and cost.

Simplified Development Cycle

Automation of development project management

Like many other development frameworks, programming with CUDA may require programmersto manage project setup, platform dependencies, and device configuration. CUDA integration inMathematica makes the process completely transparent and fully automated. Users do not haveto set up any project files or worry about platform dependency.

4 CUDA Programming within Mathematica

Page 6: CUDA Programming Within Mathematica

Automated GPU memory and thread management

In a typical CUDA program, programmers should write memory and thread management codemanually, in addition to a CUDA kernel function:

With Mathematica, memory and thread management for the GPU is completely automatic, whichcompletely abolishes the need for extra code writing:

The memory between the host and the device is synchronized only as needed, to avoid datatransfer (which is known to be very costly) unless absolutely needed. For advanced applications,full control of how and when the memory needs to be copied between the host and GPU devicesis provided.

Automatic compilation and binding of dynamic libraries

Mathematica’s CUDA support streamlines the whole programming process. It goes one stepfurther by compiling and establishing the connection with Mathematica in a completely transpar-ent process, thereby greatly simplifying the development process for the end user. With a singlecommand, your CUDA kernel code is compiled, linked, and bound to Mathematica, and is readyto use.

CUDA integration provides full access to Mathematica’s native language and built-in functions. Italso provides free exchange of data between Mathematica and users’ CUDA programs.

With Mathematica’s comprehensive symbolic and numerical functions, built-in application areasupport, and graphical interface building functions, users can not only combine the power ofMathematica and GPU computing, but also spend more time on developing and optimizing coreCUDA kernel algorithms.

CUDA Programming within Mathematica 5

Page 7: CUDA Programming Within Mathematica

With Mathematica’s comprehensive symbolic and numerical functions, built-in application areasupport, and graphical interface building functions, users can not only combine the power ofMathematica and GPU computing, but also spend more time on developing and optimizing coreCUDA kernel algorithms.

Ready-to-use applications

CUDA integration in Mathematica provides several ready-to-use CUDA functions that cover abroad range of topics such as mathematics, image processing, financial engineering, and more.Examples will be given in Section 6.

Zero device configuration

As long as NVIDIA’s CUDA Toolkit is installed on the development system, there is nothing forusers to configure or set up to utilize the GPU computing power. Mathematica automaticallyfinds, configures, and makes CUDA devices available to the users.

Performance ImprovementsGPUs have been using many cores for quite some time now. A current example is the NVIDIAGeForce GTX 480, which is a consumer card with 480 CUDA cores, support for double precision,and a retail price of less than $500.

Mathematica provides support for multicore systems via built-in functions such as Parallelize,ParallelMap, etc., that free you from the cumbersome details of launching multiple kernels andprocessors and let you concentrate on the core implementation of your algorithms.

This provides considerable speed up in many areas such as:

† On a NVIDIA GT 240 with an Intel i7 950, FinancialDerivative improvements of up to 35x for binomial and 80x for Monte Carlo methods.

† Several random number generation algorithms have been implemented and speed ups of 50x have been observed.

† Volumetric rendering, implicit algebraic functions, and 3D fractals can be computed in real time, with a high frame rate.

Mathematicaʼs CUDALink: Integrated GPU ProgrammingCUDALink is a built-in Mathematica package that provides a simple and powerful interface forusing CUDA within Mathematica’s streamlined workflow.

CUDALink provides you with carefully tuned linear algebra, discrete Fourier transform, andimage processing algorithms. You can also write your own CUDALink modules with minimaleffort. Using CUDALink from within Mathematica gives you access to Mathematica’s features,including visualization, import/export, and programming capabilities.

6 CUDA Programming within Mathematica

Page 8: CUDA Programming Within Mathematica

The CUDALink package included with Mathematica at no additional cost offers:

† Automatic compilation of CUDA programs

† Automatic memory handling between host and GPU devices

† The ability to write your own CUDALink modules with little effort

† Support for Microsoft Visual Studio and GCC compilers

† Multiple-GPU support

† Support for single and double arithmetic precision

† Access to Mathematica’s flexible programming language, automatic interface builders, and full-featured development environment

† Access to Mathematica’s computable data, import/export capabilities, visualization features, and more

† Ready-to-use functionality with zero configuration needed if current video drivers and compilers are already installed

† Support for 32-bit and 64-bit Windows, Mac OS X, and Linux systems

System Requirements

To utilize Mathematica’s CUDALink, the following operating systems and software are required:

† Operating System: Windows, Linux, and Mac OS X, both 32- and 64-bit architecture. Mac OS X users need at least Mac OS X 10.6.3.

† NVIDIA CUDA enabled products. See www.nvidia.com/object/cuda_gpus.html for further information.

† Mathematica 8.0 or higher

† CUDA Toolkit 3.1 or higher

† A recent NVIDIA driver

† On Windows, Microsoft Visual Studio (Windows platforms) 2005 or higher

Getting Started with CUDALink

Programming the GPU in Mathematica is straightforward. It begins with loading the CUDALinkpackage into Mathematica:

Needs@"CUDALink`"D

The following function verifies that the system has CUDA support:

CUDAQ@D

True

Now, we will create a simple example that negates colors of a 3-channel image.

First, write a CUDA kernel function as a string, and assign it to a variable:

kernel = "__global__ void cudaColorNegateHint *img, int *dim, int channelsL 8

int width = dim@0D, height = dim@1D;int xIndex = threadIdx.x + blockIdx.x * blockDim.x;int yIndex = threadIdx.y + blockIdx.y * blockDim.y;int index = channels * HxIndex + yIndex*widthL;if HxIndex < width && yIndex < heightL 8

for Hint c = 0; c < channels; c++Limg@index + cD = 255 - img@index + cD;<<";

CUDA Programming within Mathematica 7

Page 9: CUDA Programming Within Mathematica

Pass that string to a built-in function CUDAFunctionLoad, along with the kernel function nameand the argument specification. The last argument denotes the dimension of threads per block tobe launched.

colorNegate = CUDAFunctionLoad@kernel, "cudaColorNegate",88_Integer, _, "InputOutput"<,8_Integer, _, "Input"<, _Integer<, 816, 16<D;

Several things are happening at this stage. Mathematica automatically compiles the kernelfunction as a dynamic library. There is no need for users to add system interface or memorymanagement code. After compilation, the function is automatically bound to Mathematica and isready to be called.

Now you can apply this new CUDA function to any image format that Mathematica can handle.

i = ;

colorNegate@i, ImageDimensions@iD, ImageChannels@iDD

: >

Finally, CUDAFunctionUnload can be called to unbind the function from Mathematica.

CUDAFunctionUnload@colorNegateD

Accessing System Information

CUDALink supplies several functions that make it easy to acquire detailed system information forGPU programming.

For instance, CUDAQ tells whether the current hardware and system configuration supportCUDALink:

Needs@"CUDALink`"DCUDAQ@D

True

8 CUDA Programming within Mathematica

Page 10: CUDA Programming Within Mathematica

CUDAInformation generates a detailed report on supported CUDA devices. The returned datafrom CUDAInformation is a valid Mathematica input form, which means that it can be used tooptimize CUDA kernel code programmatically. Several other functions are also provided thatreturn in-depth information about Mathematica, operating systems, hardware, and C/C++compilers that are currently used by CUDALink.

Example of a report generated by CUDAInformation.

Retargetable Code Generation

CUDALink provides you with the ability to perform on-the-fly compilation and execution, or tocompile executables or libraries to be used later. CUDALink’s portable simple compiling mecha-nism for CUDA code uses the NVCC compiler. Code can be compiled into an executable or anexternal library for reuse.

CUDA Programming within Mathematica 9

Page 11: CUDA Programming Within Mathematica

Mathematica also provides SymbolicC, which provides a hierarchical view of C code as Mathemati-ca’s own language. This makes it well suited to creating, manipulating, and optimizing C code. Inconjunction with this capability, users can generate CUDA kernel code for several differenttargets for greater portability, less platform dependency, and better code optimization.

Several built-in functions perform code generation, depending on the target:

SymbolicC

CUDACodeGenerate takes a CUDA kernel function or program and generates SymbolicC output.The SymbolicC output can then be used to render CUDA code that calls the correct functions tobind the CUDA code to Mathematica.

CUDASymbolicCGenerate produces SymbolicC output for CUDA kernel as well as C++ wrappercode.

Dynamic library

CUDALibraryGenerate generates CUDA interface code and compiles it into a library that can beloaded into Mathematica.

CUDALibraryFunctionGenerate takes a CUDA kernel function and generates the code requiredto use it. Then it takes the sources, compiles it to a dynamic library, and returns Mathematica’slibrary function, which can be called with Mathematica’s automatic dynamic library bindingcapability.

String

CUDACodeStringGenerate generates CUDA interface code in string form, which can be exportedto other development platforms.

Integration with Mathematica Functions

When you use CUDALink, you can use all of Mathematica’s features including visualization,import/export, and programming capabilities. Combining Mathematica’s full-featured develop-ment environment and CUDA integration, you can focus on innovating your algorithms in CUDAkernels, rather than spending time on repetitive tasks, such as interface building.

Manipulate: Mathematica’s automatic interface generator

Mathematica provides extensive built-in interface functions including both standard andadvanced controls. Users can also customize controls using Mathematica’s highly declarativeinterface language. Furthermore, Mathematica provides a fully automated interface generatingfunction called Manipulate.

10 CUDA Programming within Mathematica

Page 12: CUDA Programming Within Mathematica

ManipulateBoperationB , xF,

8x, 0, 9<, 8operation, 8CUDAErosion, CUDADilation<<F

By specifying possible ranges for variables, Manipulate automatically chooses appropriatecontrols and creates a user interface around it.

Example of a user interface built with Manipulate.

Support for import and export

Mathematica natively supports hundreds of file formats and their subformats for importing andexporting. Supported formats include: common image formats (JPEG, PNG, TIFF, BMP, etc.), videoformats (AVI, MOV, H264, etc.), audio formats (WAV, AU, AIFF, FLAC, etc.), medical imagingformats (DICOM), data formats (Excel, CSV, MAT, etc.), and various raw formats for furtherprocessing.

Any supported data formats will be automatically converted to Mathematica’s unified datarepresentation, or an expression, which can be used in all Mathematica functions, includingCUDALink functions.

CUDA Programming within Mathematica 11

Page 13: CUDA Programming Within Mathematica

Not only does Mathematica provide access to local resources, but any URL can be used to accessdata online. The following code imports an image from a given URL:

image = Import@"http:êêgallery.wolfram.comê2dêpopupê00_contourMosaic.pop.jpg"D;

The function Import automatically recognizes the file format and converts it into Mathematicaexpression. This can be directly used by CUDALink functions, such as CUDAImageAdd:

output = CUDAImageAddBimage, F

All outputs from Mathematica functions, including the ones from CUDALink functions, are alsoexpressions, and can be easily exported to one of the supported formats using the Exportfunction. For example, the following code exports CUDA output into PNG format:

Export@"masked.png", outputD

masked.png

$ImportFormats and $ExportFormats give full lists of supported import and export formats.

OpenCL Compatibility

In addition to CUDALink, Mathematica supports OpenCL with the built-in package OpenCLLink,which provides the same benefits and functionality of GPU programming as CUDALink overOpenCL architecture.

Mathematicaʼs CUDALink ApplicationsIn addition to support for user-defined CUDA functions and automatic compilation, CUDALinkincludes several ready-to-use functions that support NVIDIA’s CUFFT (Fourier analysis) andCUBLAS (linear algebra) libraries and cover application areas such as image processing, financialderivatives, and random number generation.

Image Processing

CUDALink offers many image processing functions that have been carefully tuned for the GPU.These include pixel operations such as image arithmetic and composition; morphological opera-tors such as such as erosion, dilation, opening, and closing; and image convolution and filtering.All of these operations work on either images or arrays of real and integer numbers.

12 CUDA Programming within Mathematica

Page 14: CUDA Programming Within Mathematica

Image convolution

CUDALink’s convolution is similar to Mathematica’s ListConvolve and ImageConvolve func-tions. It will operate on images, lists, or CUDA memory references, and it can use Mathematica’sbuilt-in filters as the kernel.

CUDAImageConvolveB ,-1 0 1-2 0 2-1 0 1

F

Convolving a microscopic image with a Sobel mask to detect edges.

CUDALink supports simple pixel operations on one or two images, such as adding or multiplyingpixel values from two images.

CUDAImageMultiplyB , F

Multiplication of two images.

Morphological operations

CUDALink supports fundamental operations such as erosion, dilation, opening, and closing.CUDAErosion , CUDADilation, CUDAOpening , and CUDAClosing are equivalent to Mathemati-ca’s built-in Erosion, Dilation , Opening, and Closing functions. More sophisticated morpho-logical operations can be built using these fundamental operations.

CUDA Programming within Mathematica 13

Page 15: CUDA Programming Within Mathematica

CUDALink supports fundamental operations such as erosion, dilation, opening, and closing.CUDAErosion , CUDADilation, CUDAOpening , and CUDAClosing are equivalent to Mathemati-ca’s built-in Erosion, Dilation , Opening, and Closing functions. More sophisticated morpho-logical operations can be built using these fundamental operations.

Video Processing

CUDALink’s built-in image processing functions can also be applied to videos to perform real-time filtering. Many common formats such as H.264, QuickTime, and DivX are supported. WithGPU computing power, CUDALink’s video processing function can easily handle full high-resolu-tion video (1080p) filtering in 30 frames per second.

Processing multiple video streams with different filters in real time.

Linear Algebra

You can perform various linear algebra functions with the GPU. Examples include vector addi-tion, products, and other operations, finding minimum or maximum elements, or transposingrows and columns of an image.

Fourier Analysis

The Fourier analysis capabilities of the CUDALink package include forward and inverse Fouriertransforms.

The CUDAFourier function can operate on a list of 1D, 2D, or 3D real or complex numbers, or ondata held in a chunk of device memory that is registered with CUDALink’s Fourier memorymanager.

14 CUDA Programming within Mathematica

Page 16: CUDA Programming Within Mathematica

Fluid Dynamics

Computational fluid dynamics examples are included with CUDALink. They simulate a largenumber of particles in real time using the GPU’s computing power.

Fluid simulation with multi-particles.

CUDALink’s options pricing function uses the binomial or Monte Carlo method, depending onthe type of option selected. Computing options on the GPU can be dozens of times faster thanusing the CPU and parallel processing.

Volumetric Rendering

CUDALink includes functions to read and display volumetric data in 3D, with interactive inter-faces for transfer functions and other volume-rendering parameter controls.

Volumetric rendering of a medical image.

CUDA Programming within Mathematica 15

Page 17: CUDA Programming Within Mathematica

Pricing and Licensing InformationWolfram Research offers many flexible licensing options for both organizations and individuals.You can choose a convenient, cost-effective plan for your workgroup, department, directorate,university, or just yourself, including network licensing for groups.

Visit us online for more information:www.wolfram.com/mathematica-purchase

SummaryThanks to Mathematica’s integrated platform design, all functionality is included without theneed to buy, learn, use, and maintain multiple tools and add-on packages.

With its simplified development cycle, automatic memory management, multicore computing,and built-in functions for many applications—plus full integration with all of Mathematica’sother computation, development, and deployment capabilities—Mathematica’s built-in CUD-ALink package provides a powerful interface for GPU computing.

Recommended Next Steps

Request your free Mathematica trial versionwww.wolfram.com/mathematica-trial

Request a technical demo:www.wolfram.com/mathematica-demo

Learn more about CUDA programming in Mathematica:

U.S. and Canada: Europe:1-800-WOLFRAM (965-3726) +44-(0)[email protected] [email protected]

Outside U.S. and Canada (except Europe and Asia): Asia:+1-217-398-0700 +81-(0)[email protected] [email protected]

© 2010 Wolfram Research, Inc. Mathematica is a registered trademark and Mathematica Player is a trademark ofWolfram Research, Inc. Wolfram|Alpha is a registered trademark and computational knowledge engine is a trademarkof Wolfram Alpha LLC—A Wolfram Research Company. All other trademarks are the property of their respectiveowners. Mathematica is not associated with Mathematica Policy Research, Inc. or Mathtech, Inc. MKT3014 20606990910.hk

16 CUDA Programming within Mathematica