a new lanczos-based low rank algorithm for inhomogeneous dynamical mean-field theory

23
A New Lanczos-Based Low Rank Algorithm for Inhomogeneous Dynamical Mean-Field Theory Pierre Carrier University of Minnesota Yousef Saad University of Minnesota James K. Freericks Georgetown University Inhomogeneous-DMFT Low-Rank Lanczos for finding diagonal of inverse Implementation using Co-Array Fortran=fortran2008 Towards 3D petascale computations (CPU time vs memory) .00001: APS March Meeting 2012, Pierre Carrier, Yousef Saad, James K. Freericks Slides available at http://www.msi.umn.edu/~carrierp/images/APS2012.pdf NISE-2011

Upload: binta

Post on 27-Jan-2016

38 views

Category:

Documents


1 download

DESCRIPTION

A New Lanczos-Based Low Rank Algorithm for Inhomogeneous Dynamical Mean-Field Theory. Pierre Carrier University of Minnesota. Yousef Saad University of Minnesota. James K. Freericks Georgetown University. Inhomogeneous-DMFT Low-Rank Lanczos for finding diagonal of inverse - PowerPoint PPT Presentation

TRANSCRIPT

Page 1: A New Lanczos-Based Low Rank Algorithm for Inhomogeneous Dynamical Mean-Field Theory

A New Lanczos-Based Low Rank Algorithm for Inhomogeneous Dynamical Mean-Field Theory

Pierre Carrier

University of Minnesota

Yousef Saad

University of Minnesota

James K. Freericks

Georgetown University

•Inhomogeneous-DMFT

•Low-Rank Lanczos for finding diagonal of

inverse

•Implementation using Co-Array

Fortran=fortran2008

•Towards 3D petascale computations (CPU time vs

memory)

Y24.00001: APS March Meeting 2012, Pierre Carrier, Yousef Saad, James K. Freericks

Slides available at http://www.msi.umn.edu/~carrierp/images/APS2012.pdf

NISE-2011

Page 2: A New Lanczos-Based Low Rank Algorithm for Inhomogeneous Dynamical Mean-Field Theory

M.-T. Tran, Phys. Rev. B 73, 205110 (2006)

W. Metzner & D. Vollhardt, Phys. Rev. Lett. 62, 324 (1989)

Transport in multilayered nanostructures, the dynamical mean-field theory (Imperial college Press, 2006)

Mapping

•Inhomogeneous-DMFT

Y24.00001: APS March Meeting 2012, Pierre Carrier, Yousef Saad, James K. Freericks

effective medium

Dyson equation Diagonal of inverse

The Green’s functions are complex-symmetric matrices

and, in general, non-diagonally dominant

... ...

fixed-point iteration

-self-energies-hopping matrix-density-potentials-...

Page 3: A New Lanczos-Based Low Rank Algorithm for Inhomogeneous Dynamical Mean-Field Theory

-Standard Lanczos for Hermitian matrices:

G = )(( )( )

Y24.00001: APS March Meeting 2012, Pierre Carrier, Yousef Saad, James K. Freericks

T

leads to G-1

R. W. Freund, SIAM J. Sci. Stat. Comput. 13, 425 (1992): Use indefinite dot-product into standard Lanczos

[See also Proc. of the HPCMP Users Group Conference, page 223 (2010)]

G = )(( )( ) Need to deal with possible “breakdowns”...

-Lanczos for complex-symmetric matrices:

T

• Low-Rank Lanczos for finding diagonal of inverse

Page 4: A New Lanczos-Based Low Rank Algorithm for Inhomogeneous Dynamical Mean-Field Theory

-Standard Lanczos for Hermitian matrices:

G = )(( )( )

Y24.00001: APS March Meeting 2012, Pierre Carrier, Yousef Saad, James K. Freericks

T

leads to G-1

R. W. Freund, SIAM J. Sci. Stat. Comput. 13, 425 (1992): Use indefinite dot-product into standard Lanczos

[See also other approach shown in Proc. of the HPCMP Users Group Conference, page 223 (2010)]

G = )(( )( ) Need to deal with possible “breakdowns”...

-Lanczos for complex-symmetric matrices:

T

• Low-Rank Lanczos for finding diagonal of inverse

Page 5: A New Lanczos-Based Low Rank Algorithm for Inhomogeneous Dynamical Mean-Field Theory

G = )(( )( )

Standard Lanczos isprohibitive when N is large ( N

~100X100X100 sites)

Y24.00001: APS March Meeting 2012, Pierre Carrier, Yousef Saad, James K. Freericks

Large, dense +

requires re-orthogonalizationAlternative: reduce the number of Lanczos

steps and keep the solution exactuse a low-rank matrix

T

• Low-Rank Lanczos for finding diagonal of inverse

Page 6: A New Lanczos-Based Low Rank Algorithm for Inhomogeneous Dynamical Mean-Field Theory

1234567890123456789012345

1234567890123456789012345

Example in 2D:decomposition

into4 domainsgives block diagonal matrices

9 interface sites

1111111111222222

2222221111111111

Gij=

Y24.00001: APS March Meeting 2012, Pierre Carrier, Yousef Saad, James K. Freericks

9 inte

rface

sit

es

• Low-Rank Lanczos for finding diagonal of inverse

Page 7: A New Lanczos-Based Low Rank Algorithm for Inhomogeneous Dynamical Mean-Field Theory

(G= )

Y24.00001: APS March Meeting 2012, Pierre Carrier, Yousef Saad, James K. Freericks

interface sites

ALL DETAILS: http://www.csi.cuny.edu/cunyhpc/workshops2011/september_2011/workshops.php

• Low-Rank Lanczos for finding diagonal of inverse

Page 8: A New Lanczos-Based Low Rank Algorithm for Inhomogeneous Dynamical Mean-Field Theory

(G= )

Y24.00001: APS March Meeting 2012, Pierre Carrier, Yousef Saad, James K. Freericks

( )-1

-=

interface sitesnumber of interface sites

= Schur complement’s

rank

ALL DETAILS: http://www.csi.cuny.edu/cunyhpc/workshops2011/september_2011/workshops.php

• Low-Rank Lanczos for finding diagonal of inverse

Page 9: A New Lanczos-Based Low Rank Algorithm for Inhomogeneous Dynamical Mean-Field Theory

(G= )

Y24.00001: APS March Meeting 2012, Pierre Carrier, Yousef Saad, James K. Freericks

( )-1

-=

interface sitesnumber of interface sites

= Schur complement’s

rank

[G]-1 =( )-1

-DomainDecomp( )

ALL DETAILS: http://www.csi.cuny.edu/cunyhpc/workshops2011/september_2011/workshops.php

Domain decompositio

n

Rather complicated expression

• Low-Rank Lanczos for finding diagonal of inverse

Page 10: A New Lanczos-Based Low Rank Algorithm for Inhomogeneous Dynamical Mean-Field Theory

(G= )

Y24.00001: APS March Meeting 2012, Pierre Carrier, Yousef Saad, James K. Freericks

( )-1

-=

interface sitesnumber of interface sites

= Schur complement’s

rank

[G]-1 =( )-1

-DomainDecomp( )

ALL DETAILS: http://www.csi.cuny.edu/cunyhpc/workshops2011/september_2011/workshops.php

Domain decompositio

n

X=[G]-1 - ( )-1=( )( )(

)Lanczos on

interface sites

Lanczos-based

low-rank

swap

• Low-Rank Lanczos for finding diagonal of inverse

Page 11: A New Lanczos-Based Low Rank Algorithm for Inhomogeneous Dynamical Mean-Field Theory

(G= )

Y24.00001: APS March Meeting 2012, Pierre Carrier, Yousef Saad, James K. Freericks

( )-1

-=

interface sitesnumber of interface sites

= Schur complement’s

rank

[G]-1 =( )-1

-DomainDecomp( )

ALL DETAILS: http://www.csi.cuny.edu/cunyhpc/workshops2011/september_2011/workshops.php

Domain decompositio

n

X=[G]-1 - ( )-1=( )( )(

)Lanczos on

interface sites

Lanczos-based

low-rank

swap

• Low-Rank Lanczos for finding diagonal of inverse

SAME LOW-RANK AS OR

Page 12: A New Lanczos-Based Low Rank Algorithm for Inhomogeneous Dynamical Mean-Field Theory

(G= )

Y24.00001: APS March Meeting 2012, Pierre Carrier, Yousef Saad, James K. Freericks

( )-1

-=

interface sitesnumber of interface sites

= Schur complement’s

rank

[G]-1 =( )-1

-DomainDecomp( )

ALL DETAILS: http://www.csi.cuny.edu/cunyhpc/workshops2011/september_2011/workshops.php

Domain decompositio

n

Lanczos-based

low-rankX=[G]-1 - ( )

-1=( )( )(

)Lanczos on

interface sites

This is the low-rank matrix

T

• Low-Rank Lanczos for finding diagonal of inverse

Page 13: A New Lanczos-Based Low Rank Algorithm for Inhomogeneous Dynamical Mean-Field Theory

Lanczos-based

low-rankX=[G]-1 - ( )

-1=

Y24.00001: APS March Meeting 2012, Pierre Carrier, Yousef Saad, James K. Freericks

ALL DETAILS: http://www.csi.cuny.edu/cunyhpc/workshops2011/september_2011/workshops.php

X

( )-1

Parallel complex-GMRES

on Gu = qj

Complex-GMRESper each block B

(-1)-

AT EACH LANCZOS STEP

ALSO USED AS PRECONDITIONER

(rank of Schur complement) S << N (rank of full G matrix)

•Implementation using Co-Array Fortran=fortran2008

( )( )( )

Lanczos on interface sites

T

Page 14: A New Lanczos-Based Low Rank Algorithm for Inhomogeneous Dynamical Mean-Field Theory

•Implementation using Co-Array Fortran=fortran2008

Y24.00001: APS March Meeting 2012, Pierre Carrier, Yousef Saad, James K. Freericks

•IDMFT SCF loop; Falicov-Kimball; 3D•Distributed sparse matrices (CSR format)•Includes parallel complex- GMRES, Block-GMRES and Lanczos •Written entirely in CAF- fortran 2008:

-CAF is available on any Cray machines using PrgEnv-cray “ftn -hcaf <routine.f08>”-“allocate( GreensFunction(subarray)[domain, 0:Matsubara]”-CO_SUM = MPI_Allreduce(..., SUM, ...)

...

Matsubara=0Matsubara=1 Matsubara=2 Matsubara=62

domain=1

domain=2

domain=3

domain=4

domain=5

Example: “Greens(34)[1,0] = Greens(56)[3,62]”

(Schur)

Greens(56)

Greens(34)

ALL DETAILS: http://www.csi.cuny.edu/cunyhpc/workshops2011/september_2011/workshops.php

Page 15: A New Lanczos-Based Low Rank Algorithm for Inhomogeneous Dynamical Mean-Field Theory

Y24.00001: APS March Meeting 2012, Pierre Carrier, Yousef Saad, James K. Freericks

Diagonally dominant matrices require relatively more additional iterations; Null space is more often visited

•Implementation using Co-Array Fortran=fortran2008

Freund’s Lanczos breaking down behavior:

Freunds’ algorithm works well, ~20% more iterations can be required before convergenceHopper

Page 16: A New Lanczos-Based Low Rank Algorithm for Inhomogeneous Dynamical Mean-Field Theory

Y24.00001: APS March Meeting 2012, Pierre Carrier, Yousef Saad, James K. Freericks

Goal for 3D: More domains and larger systems

Time Scaling is equivalent to that of domain decomposition

Tested in the past in 2D, sequential, F90, see:http://www.sciencedirect.com/science/article/pii/S1875389211003257

Hopper

•Towards 3D petascale computations (CPU time vs memory)

Page 17: A New Lanczos-Based Low Rank Algorithm for Inhomogeneous Dynamical Mean-Field Theory

Y24.00001: APS March Meeting 2012, Pierre Carrier, Yousef Saad, James K. Freericks

-Small number of Lanczos iteration-Large memory per block

Memory too large due to the size of each blocks

-Large number of Lanczos iteration-Small memory per block

Memory too large due to the size of

Lanczos re-orthogonalization

•Towards 3D petascale computations (CPU time vs memory)MEMORY conflict

with the low-rank Lanczos algorithm:

Page 18: A New Lanczos-Based Low Rank Algorithm for Inhomogeneous Dynamical Mean-Field Theory

Y24.00001: APS March Meeting 2012, Pierre Carrier, Yousef Saad, James K. Freericks

optimum subdomain size for low-rank Lanczos

•Towards 3D petascale computations (CPU time vs memory)

The cusp appears at (331) in 2D

case

Hopper

Page 19: A New Lanczos-Based Low Rank Algorithm for Inhomogeneous Dynamical Mean-Field Theory

Y24.00001: APS March Meeting 2012, Pierre Carrier, Yousef Saad, James K. Freericks

•Towards 3D petascale computations (CPU time vs memory)

100000

Number of sites

Using 2X2X2 Lanczos domains only

~363~507

~675~867

~1323~1875

~2883

Maximum on Cray XE6 (1333MB/processor)

number of Lanczos steps

41X41X41 sites max~4800 Lanczos

steps

31X31X31

TOTAL MEMORY/proc.

Initialization MEMORY/proc.

Indexing MEMORY/proc.

Hopper

Page 20: A New Lanczos-Based Low Rank Algorithm for Inhomogeneous Dynamical Mean-Field Theory

Y24.00001: APS March Meeting 2012, Pierre Carrier, Yousef Saad, James K. Freericks

allocate( GreensFunction(subarray)[subdomains, Lanczos, Matsubara])

SLIDES available at http://www.msi.umn.edu/~carrierp/images/APS2012.pdfCODES available at: http://www.msi.umn.edu/~carrierp/coarrayFortran/index.htm

•Towards 3D petascale computations (CPU time vs memory)81X81X81=531,441 sites

~19,683 Lanczos steps

2(2)X2(2)X2(2) = 64 Lanczos (subdomains)

160X160X160= 4,173,281 sites~77,763 Lanczos steps

2(4)X2(4)X2(4) = 512 Lanczos (subdomains)

-other patterns of interface-out-of-core Lanczos vectors

Page 21: A New Lanczos-Based Low Rank Algorithm for Inhomogeneous Dynamical Mean-Field Theory

Thank you

Y24.00001: APS March Meeting 2012, Pierre Carrier, Yousef Saad, James K. Freericks

•Bob Numrich (CUNY)•Jok Tang (Vortech)•Woo-Sun Yang (NERSC)•Haw-ren Fang (U Minnesota)SLIDES available at

http://www.msi.umn.edu/~carrierp/images/APS2012.pdfCODES available at: http://www.msi.umn.edu/~carrierp/coarrayFortran/index.htm

Page 22: A New Lanczos-Based Low Rank Algorithm for Inhomogeneous Dynamical Mean-Field Theory

•What is a general IDMFT loop implementation

...

IDMFT_loop:Do

Matsubara=0 Matsubara=1 Matsubara=2 Matsubara=nmax

sync all (MPI_barrier)

-Density( )...

...

-Define the diagonal from the Dyson equation.

-Solve:

-Impurity solvers:

End do IDMFT_loopY24.00001: APS March Meeting 2012, Pierre Carrier, Yousef Saad, James K. Freericks

-Update the self-energy

Fixed

-poin

t it

era

tion

Page 23: A New Lanczos-Based Low Rank Algorithm for Inhomogeneous Dynamical Mean-Field Theory

full lattice

effective medium

THE MAIN DIFFICULTY OF THE ALGORITHM IS TO FIND (nmax+1) SIMULTANEOUSLY SEVERAL

DIAGONAL OF THE INVERSE OF THE LATTICE DYSON EQUATION (COMPLEX SYMMETRIC SPARSE MATRICES)

Fixed-point iteration

Mapping•What is a general IDMFT loop implementation

self-energylocal

noninteracting

Y24.00001: APS March Meeting 2012, Pierre Carrier, Yousef Saad, James K. Freericks

Hopping+

Chemical and trap

potentials

Complexor real

Matsubarafrequencie

s