a new lanczos-based low rank algorithm for inhomogeneous dynamical mean-field theory
DESCRIPTION
A New Lanczos-Based Low Rank Algorithm for Inhomogeneous Dynamical Mean-Field Theory. Pierre Carrier University of Minnesota. Yousef Saad University of Minnesota. James K. Freericks Georgetown University. Inhomogeneous-DMFT Low-Rank Lanczos for finding diagonal of inverse - PowerPoint PPT PresentationTRANSCRIPT
A New Lanczos-Based Low Rank Algorithm for Inhomogeneous Dynamical Mean-Field Theory
Pierre Carrier
University of Minnesota
Yousef Saad
University of Minnesota
James K. Freericks
Georgetown University
•Inhomogeneous-DMFT
•Low-Rank Lanczos for finding diagonal of
inverse
•Implementation using Co-Array
Fortran=fortran2008
•Towards 3D petascale computations (CPU time vs
memory)
Y24.00001: APS March Meeting 2012, Pierre Carrier, Yousef Saad, James K. Freericks
Slides available at http://www.msi.umn.edu/~carrierp/images/APS2012.pdf
NISE-2011
M.-T. Tran, Phys. Rev. B 73, 205110 (2006)
W. Metzner & D. Vollhardt, Phys. Rev. Lett. 62, 324 (1989)
Transport in multilayered nanostructures, the dynamical mean-field theory (Imperial college Press, 2006)
Mapping
•Inhomogeneous-DMFT
Y24.00001: APS March Meeting 2012, Pierre Carrier, Yousef Saad, James K. Freericks
effective medium
Dyson equation Diagonal of inverse
The Green’s functions are complex-symmetric matrices
and, in general, non-diagonally dominant
... ...
fixed-point iteration
-self-energies-hopping matrix-density-potentials-...
-Standard Lanczos for Hermitian matrices:
G = )(( )( )
Y24.00001: APS March Meeting 2012, Pierre Carrier, Yousef Saad, James K. Freericks
T
leads to G-1
R. W. Freund, SIAM J. Sci. Stat. Comput. 13, 425 (1992): Use indefinite dot-product into standard Lanczos
[See also Proc. of the HPCMP Users Group Conference, page 223 (2010)]
G = )(( )( ) Need to deal with possible “breakdowns”...
-Lanczos for complex-symmetric matrices:
T
• Low-Rank Lanczos for finding diagonal of inverse
-Standard Lanczos for Hermitian matrices:
G = )(( )( )
Y24.00001: APS March Meeting 2012, Pierre Carrier, Yousef Saad, James K. Freericks
T
leads to G-1
R. W. Freund, SIAM J. Sci. Stat. Comput. 13, 425 (1992): Use indefinite dot-product into standard Lanczos
[See also other approach shown in Proc. of the HPCMP Users Group Conference, page 223 (2010)]
G = )(( )( ) Need to deal with possible “breakdowns”...
-Lanczos for complex-symmetric matrices:
T
• Low-Rank Lanczos for finding diagonal of inverse
G = )(( )( )
Standard Lanczos isprohibitive when N is large ( N
~100X100X100 sites)
Y24.00001: APS March Meeting 2012, Pierre Carrier, Yousef Saad, James K. Freericks
Large, dense +
requires re-orthogonalizationAlternative: reduce the number of Lanczos
steps and keep the solution exactuse a low-rank matrix
T
• Low-Rank Lanczos for finding diagonal of inverse
1234567890123456789012345
1234567890123456789012345
Example in 2D:decomposition
into4 domainsgives block diagonal matrices
9 interface sites
1111111111222222
2222221111111111
Gij=
Y24.00001: APS March Meeting 2012, Pierre Carrier, Yousef Saad, James K. Freericks
9 inte
rface
sit
es
• Low-Rank Lanczos for finding diagonal of inverse
(G= )
Y24.00001: APS March Meeting 2012, Pierre Carrier, Yousef Saad, James K. Freericks
interface sites
ALL DETAILS: http://www.csi.cuny.edu/cunyhpc/workshops2011/september_2011/workshops.php
• Low-Rank Lanczos for finding diagonal of inverse
(G= )
Y24.00001: APS March Meeting 2012, Pierre Carrier, Yousef Saad, James K. Freericks
( )-1
-=
interface sitesnumber of interface sites
= Schur complement’s
rank
ALL DETAILS: http://www.csi.cuny.edu/cunyhpc/workshops2011/september_2011/workshops.php
• Low-Rank Lanczos for finding diagonal of inverse
(G= )
Y24.00001: APS March Meeting 2012, Pierre Carrier, Yousef Saad, James K. Freericks
( )-1
-=
interface sitesnumber of interface sites
= Schur complement’s
rank
[G]-1 =( )-1
-DomainDecomp( )
ALL DETAILS: http://www.csi.cuny.edu/cunyhpc/workshops2011/september_2011/workshops.php
Domain decompositio
n
Rather complicated expression
• Low-Rank Lanczos for finding diagonal of inverse
(G= )
Y24.00001: APS March Meeting 2012, Pierre Carrier, Yousef Saad, James K. Freericks
( )-1
-=
interface sitesnumber of interface sites
= Schur complement’s
rank
[G]-1 =( )-1
-DomainDecomp( )
ALL DETAILS: http://www.csi.cuny.edu/cunyhpc/workshops2011/september_2011/workshops.php
Domain decompositio
n
X=[G]-1 - ( )-1=( )( )(
)Lanczos on
interface sites
Lanczos-based
low-rank
swap
• Low-Rank Lanczos for finding diagonal of inverse
(G= )
Y24.00001: APS March Meeting 2012, Pierre Carrier, Yousef Saad, James K. Freericks
( )-1
-=
interface sitesnumber of interface sites
= Schur complement’s
rank
[G]-1 =( )-1
-DomainDecomp( )
ALL DETAILS: http://www.csi.cuny.edu/cunyhpc/workshops2011/september_2011/workshops.php
Domain decompositio
n
X=[G]-1 - ( )-1=( )( )(
)Lanczos on
interface sites
Lanczos-based
low-rank
swap
• Low-Rank Lanczos for finding diagonal of inverse
SAME LOW-RANK AS OR
(G= )
Y24.00001: APS March Meeting 2012, Pierre Carrier, Yousef Saad, James K. Freericks
( )-1
-=
interface sitesnumber of interface sites
= Schur complement’s
rank
[G]-1 =( )-1
-DomainDecomp( )
ALL DETAILS: http://www.csi.cuny.edu/cunyhpc/workshops2011/september_2011/workshops.php
Domain decompositio
n
Lanczos-based
low-rankX=[G]-1 - ( )
-1=( )( )(
)Lanczos on
interface sites
This is the low-rank matrix
T
• Low-Rank Lanczos for finding diagonal of inverse
Lanczos-based
low-rankX=[G]-1 - ( )
-1=
Y24.00001: APS March Meeting 2012, Pierre Carrier, Yousef Saad, James K. Freericks
ALL DETAILS: http://www.csi.cuny.edu/cunyhpc/workshops2011/september_2011/workshops.php
X
( )-1
Parallel complex-GMRES
on Gu = qj
Complex-GMRESper each block B
(-1)-
AT EACH LANCZOS STEP
ALSO USED AS PRECONDITIONER
(rank of Schur complement) S << N (rank of full G matrix)
•Implementation using Co-Array Fortran=fortran2008
( )( )( )
Lanczos on interface sites
T
•Implementation using Co-Array Fortran=fortran2008
Y24.00001: APS March Meeting 2012, Pierre Carrier, Yousef Saad, James K. Freericks
•IDMFT SCF loop; Falicov-Kimball; 3D•Distributed sparse matrices (CSR format)•Includes parallel complex- GMRES, Block-GMRES and Lanczos •Written entirely in CAF- fortran 2008:
-CAF is available on any Cray machines using PrgEnv-cray “ftn -hcaf <routine.f08>”-“allocate( GreensFunction(subarray)[domain, 0:Matsubara]”-CO_SUM = MPI_Allreduce(..., SUM, ...)
...
Matsubara=0Matsubara=1 Matsubara=2 Matsubara=62
domain=1
domain=2
domain=3
domain=4
domain=5
Example: “Greens(34)[1,0] = Greens(56)[3,62]”
(Schur)
Greens(56)
Greens(34)
ALL DETAILS: http://www.csi.cuny.edu/cunyhpc/workshops2011/september_2011/workshops.php
Y24.00001: APS March Meeting 2012, Pierre Carrier, Yousef Saad, James K. Freericks
Diagonally dominant matrices require relatively more additional iterations; Null space is more often visited
•Implementation using Co-Array Fortran=fortran2008
Freund’s Lanczos breaking down behavior:
Freunds’ algorithm works well, ~20% more iterations can be required before convergenceHopper
Y24.00001: APS March Meeting 2012, Pierre Carrier, Yousef Saad, James K. Freericks
Goal for 3D: More domains and larger systems
Time Scaling is equivalent to that of domain decomposition
Tested in the past in 2D, sequential, F90, see:http://www.sciencedirect.com/science/article/pii/S1875389211003257
Hopper
•Towards 3D petascale computations (CPU time vs memory)
Y24.00001: APS March Meeting 2012, Pierre Carrier, Yousef Saad, James K. Freericks
-Small number of Lanczos iteration-Large memory per block
Memory too large due to the size of each blocks
-Large number of Lanczos iteration-Small memory per block
Memory too large due to the size of
Lanczos re-orthogonalization
•Towards 3D petascale computations (CPU time vs memory)MEMORY conflict
with the low-rank Lanczos algorithm:
Y24.00001: APS March Meeting 2012, Pierre Carrier, Yousef Saad, James K. Freericks
optimum subdomain size for low-rank Lanczos
•Towards 3D petascale computations (CPU time vs memory)
The cusp appears at (331) in 2D
case
Hopper
Y24.00001: APS March Meeting 2012, Pierre Carrier, Yousef Saad, James K. Freericks
•Towards 3D petascale computations (CPU time vs memory)
100000
Number of sites
Using 2X2X2 Lanczos domains only
~363~507
~675~867
~1323~1875
~2883
Maximum on Cray XE6 (1333MB/processor)
number of Lanczos steps
41X41X41 sites max~4800 Lanczos
steps
31X31X31
TOTAL MEMORY/proc.
Initialization MEMORY/proc.
Indexing MEMORY/proc.
Hopper
Y24.00001: APS March Meeting 2012, Pierre Carrier, Yousef Saad, James K. Freericks
allocate( GreensFunction(subarray)[subdomains, Lanczos, Matsubara])
SLIDES available at http://www.msi.umn.edu/~carrierp/images/APS2012.pdfCODES available at: http://www.msi.umn.edu/~carrierp/coarrayFortran/index.htm
•Towards 3D petascale computations (CPU time vs memory)81X81X81=531,441 sites
~19,683 Lanczos steps
2(2)X2(2)X2(2) = 64 Lanczos (subdomains)
160X160X160= 4,173,281 sites~77,763 Lanczos steps
2(4)X2(4)X2(4) = 512 Lanczos (subdomains)
-other patterns of interface-out-of-core Lanczos vectors
Thank you
Y24.00001: APS March Meeting 2012, Pierre Carrier, Yousef Saad, James K. Freericks
•Bob Numrich (CUNY)•Jok Tang (Vortech)•Woo-Sun Yang (NERSC)•Haw-ren Fang (U Minnesota)SLIDES available at
http://www.msi.umn.edu/~carrierp/images/APS2012.pdfCODES available at: http://www.msi.umn.edu/~carrierp/coarrayFortran/index.htm
•What is a general IDMFT loop implementation
...
IDMFT_loop:Do
Matsubara=0 Matsubara=1 Matsubara=2 Matsubara=nmax
sync all (MPI_barrier)
-Density( )...
...
-Define the diagonal from the Dyson equation.
-Solve:
-Impurity solvers:
End do IDMFT_loopY24.00001: APS March Meeting 2012, Pierre Carrier, Yousef Saad, James K. Freericks
-Update the self-energy
Fixed
-poin
t it
era
tion
full lattice
effective medium
THE MAIN DIFFICULTY OF THE ALGORITHM IS TO FIND (nmax+1) SIMULTANEOUSLY SEVERAL
DIAGONAL OF THE INVERSE OF THE LATTICE DYSON EQUATION (COMPLEX SYMMETRIC SPARSE MATRICES)
Fixed-point iteration
Mapping•What is a general IDMFT loop implementation
self-energylocal
noninteracting
Y24.00001: APS March Meeting 2012, Pierre Carrier, Yousef Saad, James K. Freericks
Hopping+
Chemical and trap
potentials
Complexor real
Matsubarafrequencie
s