4/18/00 Spring 2000 FFTw workshop 1
AHPCC/NCSA WORKSHOP
Fast Fourier Transform Using FFTw
Guobin Ma¹ (gbma@ahpcc.unm.edu),
Sirpa Saarinen² (sirpa@ncsa.uiuc.edu),
and Paul M. Alsing¹
¹AHPCC, ²NCSA
http://www.ahpcc.unm.edu/Workshop/FFTW
Contents
FFT basics (Paul)
  What is the FFT and why use it
FFTw
  Outline of FFTw (Guobin)
    Characteristics
    C routines
  Performance and C example codes (Sirpa)
  Fortran wrappers and example codes (Guobin)
  Exercises (skipped)
FFT Basic
What is FFT and why FFT
by Paul Alsing
Fourier Transform: frequency analysis of time series data.
DFT: Discrete Fourier Transform (N time/frequency points).
FFT: Fast Fourier Transform, an efficient implementation of the DFT, ~O(N log2 N).
Continuous transform pair:
$$H(f) = \int_{-\infty}^{\infty} h(t)\, e^{2\pi i f t}\, dt, \qquad h(t) = \int_{-\infty}^{\infty} H(f)\, e^{-2\pi i f t}\, df$$
Sampling:
$$t_k = k\,\Delta t,\quad h_k = h(t_k),\quad k = 0, 1, \ldots, N-1; \qquad f_n = \frac{n}{N\,\Delta t},\quad n = -N/2, \ldots, N/2 - 1$$
Discrete transform pair:
$$H_n = \sum_{k=0}^{N-1} h_k\, e^{2\pi i k n / N}, \qquad h_k = \frac{1}{N} \sum_{n=0}^{N-1} H_n\, e^{-2\pi i k n / N}$$
Aliasing issues:
Let fc = Nyquist frequency = 1/(2Δt). A sine wave of frequency fc is sampled at 2 points per cycle, the peak and the trough.
Frequency components with |f| > fc are falsely folded back into the range -fc < f < fc.
Fourier Transform: radix 2, Danielson-Lanczos
$$\begin{aligned}
H_n &= \sum_{k=0}^{N-1} e^{2\pi i n k/N}\, h_k \\
&= \sum_{k=0}^{N/2-1} e^{2\pi i n (2k)/N}\, h_{2k} + \sum_{k=0}^{N/2-1} e^{2\pi i n (2k+1)/N}\, h_{2k+1} \\
&= \sum_{k=0}^{N/2-1} e^{2\pi i n k/(N/2)}\, h_{2k} + W^n \sum_{k=0}^{N/2-1} e^{2\pi i n k/(N/2)}\, h_{2k+1} \\
&= H_n^e + W^n H_n^o, \qquad W = e^{2\pi i/N},
\end{aligned}$$
where $H_n^e$ is the $n$th component of the FT of length N/2 formed from the even components of the original $h_k$'s; $H_n^o$ is the $n$th component of the FT of length N/2 formed from the odd components of the original $h_k$'s.
Fourier Transform: radix 2, Danielson-Lanczos (cont.)
$$\begin{aligned}
H_n &= H_n^{e} + W^{n} H_n^{o} \\
&= H_n^{ee} + W^{2n} H_n^{eo} + W^{n} H_n^{oe} + W^{n} W^{2n} H_n^{oo} \\
&= H_n^{eee} + W^{4n} H_n^{eeo} + W^{2n} H_n^{eoe} + W^{2n} W^{4n} H_n^{eoo} \\
&\quad + W^{n} H_n^{oee} + W^{n} W^{4n} H_n^{oeo} + W^{n} W^{2n} H_n^{ooe} + W^{n} W^{2n} W^{4n} H_n^{ooo}, \quad \ldots
\end{aligned}$$
$H_n$ is of length N
$H_n^{e}, H_n^{o}$ are of length N/2
$H_n^{ee}, H_n^{eo}, H_n^{oe}, H_n^{oo}$ are of length N/4
$H_n^{eee}, H_n^{eeo}, H_n^{eoe}, H_n^{eoo}, H_n^{oee}, H_n^{oeo}, H_n^{ooe}, H_n^{ooo}$ are of length N/8
The splitting continues for $\log_2 N$ steps (3 steps for N = 8).
Fourier Transform: radix 2, butterfly Cooley-Tukey algorithm
We finally get down to 1-point transforms such as
$$H_n^{eoeeoeo\cdots oee} = h_m \quad \text{(some input value) for some } 0 \le m \le N-1 .$$
The question is: which value of m corresponds to which pattern of e's and o's?
The answer is: let {e=0, o=1}. Reverse the pattern of e's and o's and you will have the value of m in binary.
Bit reversal: the Cooley-Tukey algorithm first rearranges the data into bit-reversed order, then builds up the transform in log2 N stages of N operations each, N log2 N operations in total (decimation in time).
[Figure: reordering of the eight three-letter e/o patterns (eee, eeo, eoe, eoo, oee, oeo, ooe, ...) into bit-reversed order.]
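The rule "let e = 0, o = 1 and reverse the pattern" is a plain bit reversal of the array index. A small sketch, with hypothetical helper names `bit_reverse` and `bit_reverse_permute`:

```c
#include <assert.h>

/* Reverse the lowest `bits` bits of m (the e/o pattern read backwards). */
unsigned bit_reverse(unsigned m, int bits)
{
    assert(bits >= 0 && bits <= 31);
    unsigned r = 0;
    for (int b = 0; b < bits; b++) {
        r = (r << 1) | (m & 1u);
        m >>= 1;
    }
    return r;
}

/* Reorder data[0..N-1], N = 2^bits, into bit-reversed order in place. */
void bit_reverse_permute(double *data, int bits)
{
    unsigned N = 1u << bits;
    for (unsigned i = 0; i < N; i++) {
        unsigned j = bit_reverse(i, bits);
        if (j > i) {              /* swap each pair only once */
            double tmp = data[i];
            data[i] = data[j];
            data[j] = tmp;
        }
    }
}
```

For N = 8 the indices 0..7 end up in the order 0, 4, 2, 6, 1, 5, 3, 7.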
Ordering of time series (coordinate space) and frequencies in Fourier (momentum) space.
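For an in-order N-point transform, the output slots hold the non-negative frequencies first and the negative ones afterwards. The helper below is a sketch of that mapping; `fft_freq_index` is a hypothetical name, and assigning slot N/2 to the positive side is a convention chosen here.

```c
#include <assert.h>

/* Signed frequency index n for output slot j of an N-point transform:
 * slots 0..N/2 hold n = 0..N/2, slots N/2+1..N-1 hold n = j - N.
 * The physical frequency is then f_n = n / (N * dt). */
int fft_freq_index(int j, int N)
{
    assert(N > 0 && j >= 0 && j < N);
    return (j <= N / 2) ? j : j - N;
}
```

For N = 8 the slots map to 0, 1, 2, 3, 4, -3, -2, -1.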
Example Application: Quantum Mechanics
Propagation of the (dimensionless) Schrödinger wave function
$$i\,\frac{\partial \psi(x,t)}{\partial t} = \left[-\frac{1}{2}\frac{\partial^2}{\partial x^2} + V(x,t)\right]\psi(x,t) = H\,\psi(x,t), \qquad H = T + V, \quad \psi(x,0)\ \text{given}$$
$$\psi(x,t+\Delta t) = e^{-iH\Delta t}\,\psi(x,t) \approx e^{-iT\Delta t/2}\, e^{-iV\Delta t}\, e^{-iT\Delta t/2}\,\psi(x,t)$$
In coordinate space, $e^{-iV\Delta t}$ is diagonal:
$$e^{-iV\Delta t}\,\psi(x_j,t) = e^{-iV(x_j,t)\,\Delta t}\,\psi(x_j,t), \qquad j = 1, \ldots, N$$
In Fourier (momentum) space, $e^{-iT\Delta t/2}$ is diagonal:
$$e^{-iT\Delta t/2}\,\hat\psi(k_j,t) = e^{-i(k_j^2/2)\,\Delta t/2}\,\hat\psi(k_j,t), \qquad j = 1, \ldots, N$$
Serial FFTs
x data is contiguous in memory (Fortran).
Transpose the data to keep the y transforms contiguous in memory.
[Figure: an (x, y) array and its transpose (y, x).]
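The transpose step can be sketched as a simple out-of-place copy. The layout assumed here is Fortran order, with x varying fastest; `transpose_xy` is a hypothetical helper name.

```c
#include <assert.h>
#include <stddef.h>

/* Out-of-place transpose of an nx-by-ny array stored with x varying
 * fastest (Fortran order): in[x + nx*y] -> out[y + ny*x].
 * After the call, all data along y are contiguous in out. */
void transpose_xy(int nx, int ny, const double *in, double *out)
{
    assert(nx > 0 && ny > 0 && in != out);
    for (int y = 0; y < ny; y++)
        for (int x = 0; x < nx; x++)
            out[(size_t)y + (size_t)ny * (size_t)x] =
                in[(size_t)x + (size_t)nx * (size_t)y];
}
```

The usual 2D pattern is then: transform along x, transpose, transform along the (now contiguous) y direction.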
Parallel FFTs
In parallel, all x transforms are local operations on each processor (no communication).
In performing the transpose, processors must perform an all-to-all communication.
[Figure: the (x, y) array distributed over processors P0 to P3, before and after the transpose.]
Outline of FFTw
By Guobin Ma
Characteristics of FFTw
C routines generated by a Caml-Light (ML) program
1D/nD, real/complex data
Arbitrary input size, not necessarily 2^n
Serial/parallel, shared/distributed memory
High performance: typically faster than other publicly available FFT codes
Portable, automatically adapts to the machine
Two Phases of FFTw
Hardware-dependent algorithm
Planner
  'Learns' the fastest way on your machine
  Produces a data structure, the 'plan'
  Reusable
Executor
  Computes the transform
Applies to all FFTw operation modes: 1D/nD, complex/real, serial/parallel
C Routines of FFTw
Routines
  1D/nD complex
  1D/nD real
  Corresponding parallel (MPI) ones
Arguments
Special notes
Data formats
1D Complex Transform
Typical call
    #include <fftw.h>
    …
    {
        fftw_complex in[N], out[N];
        fftw_plan p;
        …
        /* create the plan once: size, direction, planner flags */
        p = fftw_create_plan(N, FFTW_FORWARD, FFTW_ESTIMATE);
        …
        fftw_one(p, in, out);   /* execute the transform */
        …
        fftw_destroy_plan(p);
    }
1D Complex Transform (cont.) Routines
fftw_plan fftw_create_plan(int n, fftw_direction dir, int flags);
void fftw_one(fftw_plan plan, fftw_complex *in, fftw_complex *out);
fftw_plan fftw_create_plan_specific(int n, fftw_direction dir, int flags,
fftw_complex *in, int istride,fftw_complex *out, int ostride);
1D Complex Transform (cont.) Routines (cont.)
void fftw(fftw_plan plan, int howmany, fftw_complex *in, int istride, int idist,
          fftw_complex *out, int ostride, int odist);
void fftw_destroy_plan(fftw_plan plan);
1D Complex Transform (cont.)
Arguments
plan: data structure containing all the information needed to compute the transform
n: data size
dir: FFTW_FORWARD (-1), FFTW_BACKWARD (+1)
flags: FFTW_MEASURE, FFTW_ESTIMATE, FFTW_OUT_OF_PLACE, FFTW_IN_PLACE, FFTW_USE_WISDOM, combined with |
howmany: number of transforms / input arrays
in, istride, idist: input arrays; element i of transform j is in[i*istride+j*idist]
out, ostride, odist: output arrays, addressed in the same way
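The stride/dist addressing rule can be checked with a one-line helper. This is only a sketch of the indexing arithmetic; `batch_offset` is a hypothetical name and not an FFTw function.

```c
#include <assert.h>
#include <stddef.h>

/* Offset of element i of transform j in a batched array, following the
 * in[i*istride + j*idist] rule.
 * batch_offset is a hypothetical helper, not an FFTW function. */
size_t batch_offset(int i, int j, int stride, int dist)
{
    assert(i >= 0 && j >= 0 && stride > 0 && dist > 0);
    return (size_t)i * (size_t)stride + (size_t)j * (size_t)dist;
}
```

Two common layouts for howmany transforms of length n: stored back to back (stride = 1, dist = n), or interleaved (stride = howmany, dist = 1).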
1D Complex Transform (cont.)
Notes
out of place (default): in[N], out[N]
in place: saves memory, costs more time; ostride and odist are ignored, and out is not used for output
in-order output: zero frequency at out[0]
unnormalized: a forward transform followed by a backward transform scales the data by a factor of N
nD Complex Transform
Routines, similar to the 1D case, except …
fftwnd_plan fftwnd_create_plan(int rank, const int *n, fftw_direction dir, int flags);
void fftwnd_one(fftwnd_plan plan, …);
fftwnd_plan fftwnd_create_plan_specific(int rank, const int *n, fftw_direction dir, …);
void fftwnd(fftwnd_plan plan, …);
void fftwnd_destroy_plan(fftwnd_plan plan);
(… stands for the arguments that are the same as in the 1D case)
nD Complex Transform (cont.)
Arguments
rank: dimensionality of the arrays to be transformed
n: pointer to an array of rank integers giving the size of each dimension, e.g. n = {8, 4, 5}
row-major for C, column-major for Fortran
Special routines for the 2D and 3D cases
  nd -> 2d, 3d in the routine names
  n_dim -> nx, ny or nx, ny, nz (explicit sizes replace the size array)
1D Real Transform
Routines, similar to the 1D complex case, except …
rfftw_plan rfftw_create_plan(int n, fftw_direction dir, int flags);
void rfftw_one(rfftw_plan plan, fftw_real *in, fftw_real *out);
rfftw_plan rfftw_create_plan_specific(int n, fftw_direction dir, int flags, fftw_real *in, int istride, fftw_real *out, int ostride);
void rfftw(rfftw_plan plan, int howmany, fftw_real *in, int istride, int idist, fftw_real *out, int ostride, int odist);
void rfftw_destroy_plan(rfftw_plan plan);
1D Real Transform (cont.)
Arguments
dir: FFTW_REAL_TO_COMPLEX = FFTW_FORWARD = -1
     FFTW_COMPLEX_TO_REAL = FFTW_BACKWARD = +1
others have the same meaning as before
nD Real Transform
Routines, similar to the 1D real case, but …
rfftwnd_plan rfftwnd_create_plan(int rank, const int *n, fftw_direction dir, int flags);
void rfftwnd_one_real_to_complex(rfftwnd_plan plan, fftw_real *in, fftw_complex *out);
void rfftwnd_one_complex_to_real(rfftwnd_plan plan, fftw_complex *in, fftw_real *out);
void rfftwnd_real_to_complex(rfftwnd_plan plan, int howmany, fftw_real *in, int istride, int idist, fftw_complex *out, int ostride, int odist);
nD Real Transform (cont.)
Routines (cont.)
void rfftwnd_complex_to_real(rfftwnd_plan plan, int howmany, fftw_complex *in, int istride, int idist, fftw_real *out, int ostride, int odist);
void rfftwnd_destroy_plan(rfftwnd_plan plan);
Special 2D and 3D routines
nD Array Format
nD arrays are stored as a single contiguous block
C order, row-major order
  First index varies most slowly, last most quickly
Fortran order, column-major order
  First index varies most quickly, last most slowly
Static array: no problem
Dynamic array: may cause problems in the nD case unless it is allocated as one contiguous block
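The two storage orders differ only in how a multi-index is flattened into an offset within the contiguous block. A sketch for three dimensions, with hypothetical helper names and zero-based indices:

```c
#include <assert.h>
#include <stddef.h>

/* Offset of element (i, j, k) in an n1 x n2 x n3 array stored as one
 * contiguous block. Indices are zero-based in both helpers. */
size_t index_row_major(int i, int j, int k, int n1, int n2, int n3)
{
    assert(i >= 0 && i < n1 && j >= 0 && j < n2 && k >= 0 && k < n3);
    /* C order: last index varies most quickly */
    return ((size_t)i * (size_t)n2 + (size_t)j) * (size_t)n3 + (size_t)k;
}

size_t index_col_major(int i, int j, int k, int n1, int n2, int n3)
{
    assert(i >= 0 && i < n1 && j >= 0 && j < n2 && k >= 0 && k < n3);
    /* Fortran order: first index varies most quickly */
    return (size_t)i + (size_t)n1 * ((size_t)j + (size_t)n2 * (size_t)k);
}
```

A dynamically allocated nD array can be kept contiguous by allocating n1*n2*n3 elements in a single call and addressing them through one of these helpers, rather than building an array of row pointers.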
Parallel FFTw
Multi-thread (skipped)
MPI
  nD complex
    Routines
    Notes
    Data layout
  1D complex
  nD real
nD Complex MPI FFTw
Routines, similar to the uniprocessor case, except mpi…
fftwnd_mpi_plan fftwnd_mpi_create_plan(MPI_Comm comm, int rank, const int *n, fftw_direction dir, int flags);
void fftwnd_mpi_local_sizes(fftwnd_mpi_plan p, int *local_first, int *local_first_start, int *local_second_after_transpose, int *local_second_start_after_transpose, int *total_local_size);
local_data = (fftw_complex*) malloc(sizeof(fftw_complex) * total_local_size);
work = (fftw_complex*) malloc(sizeof(fftw_complex) * total_local_size);
nD Complex MPI FFTw (cont.)
Routines (cont.)
void fftwnd_mpi(fftwnd_mpi_plan p, int n_fields, fftw_complex *local_data, fftw_complex *work, fftw_mpi_output_order output_order);
void fftwnd_mpi_destroy_plan(fftwnd_mpi_plan p);
nD Complex MPI FFTw (cont.)
Notes
First argument: comm, the MPI communicator
Data layout
All fftw_mpi transforms are in place
work:
  Optional, same size as local_data; the extra storage buys greater efficiency
output_order: normal/transposed
  transposed: improves performance, but the user must reshape the data manually, and this may cause problems in some cases
nD Complex MPI FFTw (cont.)
Data layout
Distributed data
  Divided according to rows (1st dimension) in C
  Divided according to columns (last dimension) in Fortran
Given the plan, all other parameters regarding the data layout are determined by fftwnd_mpi_local_sizes
total_local_size = (n1/np)*n2*…*nk*n_fields
transposed order: n2 will be the 1st dimension in the output
  inverse transform: n = [n2, n1, n3, ..., nk]
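To get a feeling for how the first dimension is split over processes, the helper below computes an even block ("slab") distribution. It is purely illustrative: `slab_partition` is a hypothetical name, the rule for spreading the remainder is an assumption, and in a real program the values must come from the library's local-sizes routine, whose choices may differ.

```c
#include <assert.h>

/* Hypothetical block distribution of n1 rows over np processes.
 * NOT an FFTW routine: the actual layout is whatever the library's
 * local-sizes call reports, and it may differ from this rule. */
void slab_partition(int n1, int np, int rank, int *local_n, int *local_start)
{
    assert(n1 > 0 && np > 0 && rank >= 0 && rank < np);
    int base = n1 / np;
    int extra = n1 % np;      /* the first `extra` ranks get one more row */
    *local_n = base + (rank < extra ? 1 : 0);
    *local_start = rank * base + (rank < extra ? rank : extra);
}
```

With n1 = 10 rows on np = 4 processes, this rule gives local sizes 3, 3, 2, 2 starting at rows 0, 3, 6, 8.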
1D Complex MPI FFTw
Routines, similar to the nD case, except no nd…
fftw_mpi_plan fftw_mpi_create_plan(MPI_Comm comm, int n, fftw_direction dir, int flags);
void fftw_mpi_local_sizes(fftw_mpi_plan p, int *local_n, int *local_n_start, int *local_n_after_transpose, int *local_start_after_transpose, int *total_local_size);
1D Complex MPI FFTw (cont.) Routines (cont.)
void fftw_mpi(fftw_mpi_plan p, int n_fields, fftw_complex *local_data, fftw_complex *work, fftw_mpi_output_order output_order);
void fftw_mpi_destroy_plan(fftw_mpi_plan p);
Generally gives worse speedup than the nD case; best suited to large transform sizes
nD Real MPI FFTw
Similar to the uniprocessor and complex MPI cases
Speedup of about 2 and half the storage, at the expense of a more complicated data format
Can have transposed-order output data
No 1D real MPI FFTw
Break
FFTw Performance
By Sirpa Saarinen
http://www.ncsa.uiuc.edu/MEDIA/agppt/myFFTW2.ppt
C Example Codes
By Sirpa Saarinen
http://www.ncsa.uiuc.edu/MEDIA/agppt/myFFTW2.ppt
FFTW Fortran Wrappers and Example Codes
By Guobin Ma
FFTw Fortran-Callable Wrappers
Routine names: append _f77 to the prefix of the C routine name
  fftw/fftwnd/rfftw/rfftwnd -> fftw_f77/fftwnd_f77/rfftw_f77/rfftwnd_f77
  fftw_mpi/fftwnd_mpi -> fftw_f77_mpi/fftwnd_f77_mpi
e.g. fftwnd_create_plan(3, n_dim, FFTW_FORWARD, FFTW_ESTIMATE | FFTW_IN_PLACE)
  -> fftwnd_f77_create_plan(plan, 3, n_dim, FFTW_FORWARD, FFTW_ESTIMATE + FFTW_IN_PLACE)
FFTw Fortran-Callable Wrappers
Notes
Any function that returns a value is converted into a subroutine with an additional (first) parameter.
No null in Fortran: an array must be allocated and passed for out.
nD arrays: column-major, Fortran order
plan variables: must be declared as integer
Constants
  FFTW_FORWARD, FFTW_BACKWARD, FFTW_IN_PLACE, …
  combined with '+' instead of '|'
  Defined in the files fortran/fftw_f77.i, fftw_f90.i, fftw_f90_mpi.i
Fortran Examples
Source codes at AHPCC (tested on Turing, BB, SGI):
  ~gbma/workshop/fftw/codes or
  http://www.arc.unm.edu/~gbma/Workshop/FFTW/codes
Complex data
  1D serial, fftw_1d.f90
  1D parallel, fftw_1d_p.f90
  nD serial, fftw_3d.f90
  nD parallel
    Normal order, fftw_3d_p_n.f90
    Transposed order, fftw_3d_p_t.f90
Fortran Examples (cont.)
1D case
  Input: $f(x) = \int_{-N/2}^{N/2} k\, e^{ikx}\, dk$
  Forward output: $F(k) = k = (0, 1, 2, \ldots, N/2-1, N/2, -N/2+1, \ldots, -1)$
  Inverse output: $f(x)$
nD case
  Input: $f(x,y,z) = \int (k_x + k_y + k_z)\, e^{i(k_x x + k_y y + k_z z)}\, dk_x\, dk_y\, dk_z$
  Forward output: $F(k_x,k_y,k_z) = k_x + k_y + k_z$, with each of $k_x, k_y, k_z = (0, 1, 2, \ldots, -1)$
  Inverse output: $f(x,y,z)$
1D Serial Fortran Example
FFTw codes
...
call fftw_f77_create_plan(plan_forward,N, &
FFTW_FORWARD, FFTW_ESTIMATE)
call fftw_f77_create_plan(plan_reverse,N, &
FFTW_BACKWARD,FFTW_ESTIMATE)
...
call fftw_f77_one(plan_forward,in,out)
...
call fftw_f77_one(plan_reverse,out,in)
...
call fftw_f77_destroy_plan(plan_forward)
call fftw_f77_destroy_plan(plan_reverse)
1D Parallel Fortran Example
FFTw codes
...
call fftw_f77_mpi_create_plan(p_fwd,MPI_COMM_WORLD,N, &
FFTW_FORWARD,FFTW_ESTIMATE)
...
call fftw_f77_mpi_local_sizes(p_fwd, local_n, local_start, &
local_n_after_trans, local_start_after_trans, total_local_size)
...
allocate( psi_local(0:total_local_size-1) )
...
allocate( work(0:total_local_size-1) )
1D Parallel Fortran Example (cont.)
FFTw codes (cont.)
...
call fftw_f77_mpi(p_fwd,1,psi_local,work,USE_WORK)
...
call fftw_f77_mpi_destroy_plan(p_fwd)
...
call fftw_f77_mpi_create_plan(p_rvs,MPI_COMM_WORLD,N, &
FFTW_BACKWARD,FFTW_ESTIMATE)
...
call fftw_f77_mpi(p_rvs,1,psi_local,work,USE_WORK)
...
call fftw_f77_mpi_destroy_plan(p_rvs)
nD Serial Fortran Example
FFTw codes
call fftwnd_f77_create_plan(p_fwd,nd,n_dim, &
FFTW_FORWARD,FFTW_ESTIMATE + FFTW_IN_PLACE)
call fftwnd_f77_one(p_fwd,psi,0)
call fftwnd_f77_destroy_plan(p_fwd)
call fftwnd_f77_create_plan(p_rvs,nd,n_dim, &
FFTW_BACKWARD,FFTW_ESTIMATE + FFTW_IN_PLACE)
call fftwnd_f77_one(p_rvs,psi,0)
call fftwnd_f77_destroy_plan(p_rvs)
nD Parallel Fortran Example
FFTw codes, normal order, nD local array
n_dim(1)=nx; n_dim(2)=ny; n_dim(3)=nz
call fftwnd_f77_mpi_create_plan(p_fwd,MPI_COMM_WORLD,&
nd,n_dim,FFTW_FORWARD,FFTW_ESTIMATE)
call fftwnd_f77_mpi_local_sizes(p_fwd, local_nlast, &
local_last_start, local_nlast2_after_trans, &
local_last2_start_after_trans, total_local_size)
allocate( psi_local(0:nx-1,0:ny-1,0:local_nlast-1) )
allocate( work(0:nx-1,0:ny-1,0:local_nlast-1) )
nD Parallel Fortran Example (cont.)
FFTw codes, normal order, nD local array (cont.)
call fftwnd_f77_mpi(p_fwd,1,psi_local,work,USE_WORK,order)
call fftwnd_f77_mpi_destroy_plan(p_fwd)
call fftwnd_f77_mpi_create_plan(p_rvs,MPI_COMM_WORLD, &
nd,n_dim,FFTW_BACKWARD,FFTW_ESTIMATE)
call fftwnd_f77_mpi(p_rvs,1,psi_local,work,USE_WORK,order)
call fftwnd_f77_mpi_destroy_plan(p_rvs)
nD Parallel Fortran Example (cont.)
FFTw codes, transposed order, 1D local array
n_dim(1)=nx; n_dim(2)=ny; n_dim(3)=nz
call fftwnd_f77_mpi_create_plan(p_fwd,MPI_COMM_WORLD,&
nd,n_dim,FFTW_FORWARD,FFTW_ESTIMATE)
call fftwnd_f77_mpi_local_sizes(p_fwd, local_nlast, &
local_last_start, local_nlast2_after_trans, &
local_last2_start_after_trans, total_local_size)
allocate( psi_local(0:total_local_size-1) )
allocate( work(0:total_local_size-1) )
nD Parallel Fortran Example (cont.)
FFTw codes, transposed order, 1D local array (cont.)
call fftwnd_f77_mpi(p_fwd,1,psi_local,work,USE_WORK,order)
call fftwnd_f77_mpi_destroy_plan(p_fwd)
n_dim(1)=nx; n_dim(2)=nz; n_dim(3)=ny
call fftwnd_f77_mpi_create_plan(p_rvs,MPI_COMM_WORLD, &
nd,n_dim,FFTW_BACKWARD,FFTW_ESTIMATE)
call fftwnd_f77_mpi(p_rvs,1,psi_local,work,USE_WORK,order)
call fftwnd_f77_mpi_destroy_plan(p_rvs)
nD Parallel Fortran Example (cont.)
Notes
Normal order
  Easy to code, 'low' performance
Transposed order
  'High' performance, complicated to code, user must reorder the data
Use work
  High efficiency, large memory space
Run the Examples at AHPCC
Copy files to your directory
  cp ~gbma/workshop/fftw/codes/*.* .
Compile
  make filename.tur
  make filename.bb
  make filename.sgi
  with link specification -lfftw -lfftw_mpi (only for MPI)
Run
  BB: qsub -I -l nodes=2
      mpirun -np 2 -machinefile $PBS_NODEFILE filename.bb
  Turing: filename.tur
  SGI: mpirun -np 2 filename.sgi
References
Numerical Recipes in FORTRAN, by William T. Vetterling et al., New York: Cambridge University Press, 1992
Numerical Integration, by P. J. Davis & P. Rabinowitz, Waltham, Mass.: Blaisdell Pub. Co., 1967
www.fftw.org
FFTW User's Manual, by M. Frigo & S. G. Johnson
Acknowledgement
Brian Baltz
  installation of FFTw at AHPCC
  running MPI at AHPCC
John Greenfield
  setting up the grid access
Andrew Pineda
  computer work environment at AHPCC
Brian Smith & Susan Atlas
  many stimulating discussions
Many others ...