QuickPIC: a highly efficient fully parallelized PIC code for plasma-based acceleration Chengkun Huang, V. K. Decyk, M. Zhou, W. Lu, W. B. Mori (UCLA), J.H. Cooley, T.M. Antonsen Jr. (U. Maryland), B. Feng, T. Katsouleas (USC) Jorge Vieira (IST)


Page 1: Chengkun Huang et al- QuickPIC: a highly efficient fully parallelized PIC code for plasma-based acceleration

QuickPIC: a highly efficient fully parallelized

PIC code for plasma-based acceleration

Chengkun Huang, V. K. Decyk, M. Zhou, W. Lu, W. B. Mori (UCLA),

J.H. Cooley, T.M. Antonsen Jr. (U. Maryland),

B. Feng, T. Katsouleas (USC)

Jorge Vieira (IST)

Page 2: Chengkun Huang et al- QuickPIC: a highly efficient fully parallelized PIC code for plasma-based acceleration

06/26/06 SCIDAC 2006 2

Particle Accelerators: Why Plasmas?

Conventional accelerators

• Limited by peak power and breakdown

• 20-100 MeV/m

Plasma

• No breakdown limit

• 10-100 GeV/m

• Plasma Wake Field Accelerator (PWFA): a high energy electron bunch

• Laser Wake Field Accelerator (LWFA): a single short pulse of photons

Dawson & Tajima 1979

PIC model required!

Page 3: Chengkun Huang et al- QuickPIC: a highly efficient fully parallelized PIC code for plasma-based acceleration


Plasma/Laser Wakefield Acceleration

• Accelerating force $F_z$: uniform accelerating field

• Focusing force $F_r$: linear focusing field

$$E_{z,\max} \approx \frac{m c\,\omega_{pe}}{e} \approx 0.96\,\sqrt{n\,[\mathrm{cm^{-3}}]}\ \mathrm{V/cm}$$

[Figure: schematic of the wake behind the driver, showing the plasma ions (+) and electrons (-) with the forces $F_r$ and $F_z$ marked.]
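As a worked example of the scaling on this slide, E_z,max ≈ 0.96 √(n [cm^-3]) V/cm, the short sketch below evaluates it for the two plasma densities quoted in the PWFA and LWFA parameter tables of this talk (~2E16 and ~1.5E18 cm^-3). The function name is illustrative only and is not part of QuickPIC.

```python
import math


def max_accel_field_v_per_cm(density_cm3: float) -> float:
    """Estimate E_z,max ~ 0.96 * sqrt(n [cm^-3]) in V/cm (formula from the slide)."""
    return 0.96 * math.sqrt(density_cm3)


# Densities taken from the PWFA and LWFA parameter tables in this talk.
for label, density in [("PWFA example", 2e16), ("LWFA example", 1.5e18)]:
    field_gv_per_m = max_accel_field_v_per_cm(density) * 100.0 / 1e9  # V/cm -> GV/m
    print(f"{label}: n = {density:.1e} cm^-3 -> E_z,max ~ {field_gv_per_m:.1f} GV/m")
```

Both values (roughly 14 and 118 GV/m) fall in the 10-100 GeV/m range quoted for plasma accelerators, up to rounding.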

Page 4: Chengkun Huang et al- QuickPIC: a highly efficient fully parallelized PIC code for plasma-based acceleration


Plasma Accelerator Progress: “Accelerator Moore’s Law”

[Figure: plasma accelerator progress chart. Labeled points: RAL, LBL, Osaka, UCLA, ANL, E164X, E167, ILC. Annotations: “Current Energy Frontier” and “How do we get here?”]

Page 5: Chengkun Huang et al- QuickPIC: a highly efficient fully parallelized PIC code for plasma-based acceleration


Challenge in PIC modeling

Typical PWFA simulation parameters and requirements

• Beam energy: 50 GeV

• Beam charge: ~1.8E10 e-

• Beam spot size: 7 μm, 7 μm, 45 μm

• Plasma density: ~2E16 cm^-3

• Density ratio $n_{b,\mathrm{peak}}/n_0$: 26

• Collisionless skin depth $c/\omega_p$: 37 μm

Simulation codes

Osiris

• Feature: full electromagnetic PIC code

• Grid size limit: ~0.05 $c/\omega_p$

• Timestep limit: $\Delta t < 0.05\,\omega_p^{-1}$

• Total time of simulation per GeV stage: 5,234 node-hours

QuickPIC

• Feature: quasi-static PIC code; beam and plasma evolution time scales are separated

• Grid size limit: ~0.05 $c/\omega_p$

• Timestep limit: $\Delta t < 0.05\,\omega_\beta^{-1} = 0.05\,\sqrt{2\gamma}\,\omega_p^{-1}$

• Total time of simulation per GeV stage: 67 node-hours

For a 50 GeV energy gain, 261,712 node-hours are needed for the Osiris simulation.
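The arithmetic behind this comparison can be sketched as follows. The sketch assumes the relation ω_β = ω_p / √(2γ) for the betatron frequency (which is what makes the quasi-static timestep limit √(2γ) times larger than the full-PIC one) and an electron rest energy of 0.511 MeV. The per-GeV costs are the slide's numbers, and all function and variable names are illustrative.

```python
import math

ELECTRON_REST_ENERGY_MEV = 0.511  # assumed constant, not stated on the slide


def lorentz_factor(beam_energy_gev: float) -> float:
    """Relativistic gamma of an electron beam with the given energy."""
    return beam_energy_gev * 1e3 / ELECTRON_REST_ENERGY_MEV


def timestep_ratio(beam_energy_gev: float) -> float:
    """Quasi-static timestep limit divided by the full-PIC limit: sqrt(2*gamma)."""
    return math.sqrt(2.0 * lorentz_factor(beam_energy_gev))


# Node-hours per GeV stage, as quoted on the slide.
OSIRIS_NODE_HOURS_PER_GEV = 5234
QUICKPIC_NODE_HOURS_PER_GEV = 67

energy_gain_gev = 50.0
print(f"gamma at 50 GeV            ~ {lorentz_factor(energy_gain_gev):.0f}")
print(f"timestep ratio sqrt(2g)    ~ {timestep_ratio(energy_gain_gev):.0f}")
print(f"Osiris / QuickPIC cost     ~ {OSIRIS_NODE_HOURS_PER_GEV / QUICKPIC_NODE_HOURS_PER_GEV:.0f}x")
print(f"Osiris cost for 50 GeV     ~ {OSIRIS_NODE_HOURS_PER_GEV * energy_gain_gev:,.0f} node-hours")
```

The last line gives 261,700 node-hours from the rounded per-GeV figure; the slide's 261,712 presumably comes from the unrounded value.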

Page 6: Chengkun Huang et al- QuickPIC: a highly efficient fully parallelized PIC code for plasma-based acceleration


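Challenge in PIC modeling

Typical LWFA simulation parameters

• Laser wavelength: 0.8 μm

• Laser intensity: ~200 TW

• Laser duration: 30 fs

• Plasma density: ~1.5E18 cm^-3

• Density ratio $n/n_{\mathrm{crit}}$: 0.87E-3

• Collisionless skin depth $c/\omega_p$: 3.7 μm

Simulation codes

Osiris/Vorpal

• Feature: full electromagnetic PIC code

• Grid size limit: ~0.05 $\lambda$

• Timestep limit: $\Delta t < 0.2\,\omega_0^{-1}$

• Total time of simulation per GeV stage: ~1.2×10^5 node-hours

Full PIC (Vorpal) with PGC

• Feature: Ponderomotive Guiding Center PIC code

• Grid size limit: ~0.05 $c/\omega_p$

• Timestep limit: $\Delta t < 0.05\,\omega_p^{-1}$

• Total time of simulation per GeV stage: ~5,000 node-hours

QuickPIC

• Feature: quasi-static PIC code; laser and plasma evolution time scales are separated

• Grid size limit: ~0.05 $c/\omega_p$

• Timestep limit: $\Delta t < 0.05\,t_r$

• Total time of simulation per GeV stage: ~192 node-hours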

Page 7: Chengkun Huang et al- QuickPIC: a highly efficient fully parallelized PIC code for plasma-based acceleration


Quasi-static Model

• There are two intrinsic time scales: a fast time scale associated with the plasma motion and a slow time scale associated with the betatron motion of an ultra-relativistic electron beam.

• The quasi-static approximation eliminates the need to follow the fast plasma motion for the whole simulation.

• Ponderomotive Guiding Center approximation: the high-frequency laser oscillation can be averaged out, so the laser pulse is represented by its envelope.
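To illustrate why the laser can be represented by its envelope, the sketch below checks numerically that the cycle average of the squared field of a slowly varying pulse is close to half the squared envelope. The Gaussian envelope, the normalized units (one laser period equals one time unit) and all names are assumptions made for this example; they are not QuickPIC's laser model.

```python
import math

OMEGA0 = 2.0 * math.pi  # laser frequency; one laser period = 1 time unit (assumed units)
DURATION = 20.0         # envelope width, many laser periods long (assumed)


def envelope(t: float, a0: float = 1.0) -> float:
    """Slowly varying Gaussian envelope of the normalized laser amplitude."""
    return a0 * math.exp(-((t / DURATION) ** 2))


def full_field(t: float) -> float:
    """Rapidly oscillating field: envelope times the carrier."""
    return envelope(t) * math.cos(OMEGA0 * t)


def cycle_averaged_a2(t_center: float, samples: int = 200) -> float:
    """Average of full_field(t)**2 over one laser period centered on t_center."""
    period = 2.0 * math.pi / OMEGA0
    dt = period / samples
    start = t_center - 0.5 * period
    return sum(full_field(start + (k + 0.5) * dt) ** 2 for k in range(samples)) / samples


for t in (0.0, 10.0, 20.0):
    print(f"t = {t:5.1f}: <a^2> = {cycle_averaged_a2(t):.4f}, envelope^2 / 2 = {envelope(t) ** 2 / 2:.4f}")
```

Because the averaged quantity depends only on the envelope, a code using this approximation does not need to resolve the laser period.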

Page 8: Chengkun Huang et al- QuickPIC: a highly efficient fully parallelized PIC code for plasma-based acceleration


Implementation

The driver evolution can be calculated in a 3D moving box, while the plasma response can be solved slice by slice, with the longitudinal index acting as a time-like variable.
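The two-level loop structure described here can be sketched with a deliberately simplified 1D toy. In this toy (an assumption for illustration, not QuickPIC's field equations) each slice's wake is a driven harmonic oscillator advanced along the longitudinal index, and the driver is then advanced with one large step using the resulting wake. All names and parameter values are hypothetical.

```python
import math


def solve_plasma_response(beam_density, d_xi):
    """Sweep slices from head to tail; the slice index plays the role of time.

    Toy model: a driven oscillator in units where the plasma wavenumber is 1.
    """
    displacement, velocity = 0.0, 0.0  # unperturbed plasma ahead of the driver
    wake = []
    for n_b in beam_density:  # slice i depends only on the slices before it
        velocity += (-displacement - n_b) * d_xi
        displacement += velocity * d_xi
        wake.append(displacement)
    return wake


def advance_driver(beam_energy, wake, d_t):
    """Advance the driver by one large step using the wake on each slice (toy update)."""
    return [energy + w * d_t for energy, w in zip(beam_energy, wake)]


def run(n_slices=64, n_steps=5, d_xi=0.1, d_t=10.0):
    # Gaussian driver near the head of the moving box.
    beam_density = [math.exp(-(((i - 10) * d_xi) ** 2) / 0.5) for i in range(n_slices)]
    beam_energy = [1000.0] * n_slices
    wake = []
    for _ in range(n_steps):  # slow loop: driver evolution
        wake = solve_plasma_response(beam_density, d_xi)  # fast sweep over slices
        beam_energy = advance_driver(beam_energy, wake, d_t)
    return beam_energy, wake


energy, wake = run()
print(f"wake range over the box: {min(wake):.3f} to {max(wake):.3f}")
```

The point of the sketch is the nesting: the inner sweep resolves the fast plasma response once per driver step, so the outer loop can take steps set by the slow driver evolution.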

Page 9: Chengkun Huang et al- QuickPIC: a highly efficient fully parallelized PIC code for plasma-based acceleration


Code Features

• Object-oriented design in modern Fortran, easily ported to major operating systems.

• Parallelized in both the plasma and the particle/laser beam solvers.

• Uses fast Sin/Cos transforms to perform the FFT.

• Can accommodate beam drivers, a laser driver and external injection simultaneously.

• Includes the ionization process and radiation damping.

• Can model a plasma channel with an arbitrary profile.

• Includes ion motion.

Page 10: Chengkun Huang et al- QuickPIC: a highly efficient fully parallelized PIC code for plasma-based acceleration


Benchmark with full PIC code

[Figure: four panels plotting the longitudinal wakefield (in units of $mc\omega_p/e$) against position (in units of $c/\omega_p$). Panels: e- driver, e+ driver, e- driver with ionization, laser driver. Curves: OSIRIS, QuickPIC (l=2), QuickPIC (l=4).]

100+ CPU savings with “no” loss in accuracy

Page 11: Chengkun Huang et al- QuickPIC: a highly efficient fully parallelized PIC code for plasma-based acceleration


Modeling self-ionized PWFA experiment

[Figure: E (GeV) and beam current (kA) versus z (μm).]

E164X experiment: $E_{\max} \sim 4$ GeV (initial energy chirp considered)

QuickPIC simulation: $E_{\max} \sim 5 - 0.5$ (betatron radiation) $= 4.5$ GeV

The experiment conducted at SLAC has shown a 4 GeV energy gain of the electron beam in 30 cm of plasma.

The QuickPIC simulation has shown a 4.5 GeV energy gain with similar features in the energy diagnostics.

Page 12: Chengkun Huang et al- QuickPIC: a highly efficient fully parallelized PIC code for plasma-based acceleration


A TeV class afterburner

Page 13: Chengkun Huang et al- QuickPIC: a highly efficient fully parallelized PIC code for plasma-based acceleration


Simulation result of 1 TeV PWFA

500 GeV energy gain in 25m! Energy spread ~ 5%

Wakefield evolution is stable

Page 14: Chengkun Huang et al- QuickPIC: a highly efficient fully parallelized PIC code for plasma-based acceleration


Laser wakefield simulation

QuickPIC simulation for LWFA in the blow-out regime

Page 15: Chengkun Huang et al- QuickPIC: a highly efficient fully parallelized PIC code for plasma-based acceleration


Modeling LWFA in a plasma channel

Page 16: Chengkun Huang et al- QuickPIC: a highly efficient fully parallelized PIC code for plasma-based acceleration


An accurate and efficient tool for LWFA

[Figure: 3D OSIRIS and QuickPIC results compared at 0.8 mm, 2.6 mm and 3.4 mm.]

Tremendous time-saving!

Page 17: Chengkun Huang et al- QuickPIC: a highly efficient fully parallelized PIC code for plasma-based acceleration


Exploiting more parallelism: Pipelining

• The pipelining technique exploits parallelism in a sequential operation stream and can be adopted at various levels.

• Modern CPU designs include an instruction-level pipeline to improve performance by increasing the throughput.

• In scientific computation, a software-level pipeline is less common due to hidden parallelism in the algorithm.

• We are implementing a software-level pipeline in QuickPIC.

[Figure labels: moving window, plasma response.]

Instruction pipeline

• Operand: instruction stream

• Operation: IF, ID, EX, MEM, WB

• Stages: 5 ~ 31

Software pipeline

• Operand: plasma slice

• Operation: plasma/beam update

• Stages: 1 ~ (# of slices)/2

Page 18: Chengkun Huang et al- QuickPIC: a highly efficient fully parallelized PIC code for plasma-based acceleration


Pipelining: scaling QuickPIC to 10,000+ processors

Without pipelining: the beam is not advanced until the entire plasma response is determined.

[Figure: the initial plasma slab passes through the whole beam (solve plasma response), and only then is the beam updated.]

With pipelining: each section is updated when its input is ready; the plasma slab flows in the pipeline.

[Figure: the beam is divided into sections 1 to 4; each section solves the plasma response and updates the beam as the initial plasma slab reaches it.]
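The scheduling idea on this slide can be sketched as an idealized linear pipeline: stage s can start its part of beam step t once stage s-1 has finished that step and handed the plasma slab on, which happens at tick t + s. The sketch assumes equal work per stage and ignores transfer cost; the function names are illustrative and do not describe QuickPIC's actual implementation.

```python
def pipeline_schedule(n_stages: int, n_steps: int) -> dict:
    """Map each clock tick to the (stage, step) pairs that are busy during it."""
    schedule = {}
    for step in range(n_steps):
        for stage in range(n_stages):
            # Stage `stage` receives the slab for `step` after `stage` hand-offs.
            schedule.setdefault(step + stage, []).append((stage, step))
    return schedule


def ideal_speedup(n_stages: int, n_steps: int) -> float:
    """Speedup over one processor group that handles every section in turn."""
    serial_ticks = n_stages * n_steps
    pipelined_ticks = n_steps + n_stages - 1  # fill time plus one tick per step
    return serial_ticks / pipelined_ticks


for tick, busy in sorted(pipeline_schedule(n_stages=4, n_steps=3).items()):
    print(f"tick {tick}: " + ", ".join(f"stage {s} on step {t}" for s, t in sorted(busy)))

print(f"ideal speedup, 4 stages, 1000 steps: {ideal_speedup(4, 1000):.2f}")
```

Once the pipeline is full, all stages work at the same time on different beam steps, so the ideal speedup approaches the number of stages when the run is much longer than the fill time.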

Page 19: Chengkun Huang et al- QuickPIC: a highly efficient fully parallelized PIC code for plasma-based acceleration


Performance in pipeline mode

[Figure: “Speedup in pipeline mode”, plotting speedup (0 to 70) against the number of stages in the pipeline (0 to 80). Curves: Actual, Ideal.]

• A preliminary benchmark shows that pipeline operation can reach performance very close to the ideal situation.

• The time to transfer a plasma slice between successive stages is small and does not depend on the number of stages.

• Speedup will saturate when the overhead (the time to transfer a slice) becomes a significant part of the total time spent in each stage.

• In each stage, the number of processors is chosen according to the transverse size of the problem.

Typical

• # of CPUs in each stage: 16

• # of stages: 32

• Total: 512

High Res.

• # of CPUs in each stage: 128

• # of stages: 128

• Total: 16,384
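The saturation behavior described on this slide can be illustrated with a simple cost model, which is an assumption of this sketch rather than a measured QuickPIC result: each stage spends work_time / n_stages computing plus a fixed transfer_time handing a slice to the next stage. The timing values are hypothetical; the two processor layouts are the ones listed on the slide.

```python
def pipeline_speedup(n_stages: int, work_time: float, transfer_time: float) -> float:
    """Speedup over a single stage under the simple per-stage cost model."""
    time_per_step = work_time / n_stages + transfer_time
    return work_time / time_per_step


# Hypothetical timings: the transfer costs 0.1% of the single-stage work per step.
for n_stages in (8, 32, 128, 1024, 8192):
    print(f"{n_stages:5d} stages -> speedup {pipeline_speedup(n_stages, 1.0, 0.001):7.1f}")

# Processor totals for the two layouts on the slide: (CPUs per stage, stages).
layouts = {"Typical": (16, 32), "High Res.": (128, 128)}
for name, (cpus_per_stage, n_stages) in layouts.items():
    print(f"{name}: {cpus_per_stage} x {n_stages} = {cpus_per_stage * n_stages:,} CPUs")
```

In this model the speedup tracks the number of stages while the transfer time is negligible and levels off at work_time / transfer_time, consistent with the saturation described in the bullets.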

Page 20: Chengkun Huang et al- QuickPIC: a highly efficient fully parallelized PIC code for plasma-based acceleration


Summary

By taking advantage of the two different time scales in PWFA/LWFA problems, QuickPIC allows a 100-1000 times time-saving for simulations of state-of-the-art experiments.

QuickPIC enables scientific discovery in plasma-based acceleration by exploring regions of parameter space that are not easily accessible through conventional PIC codes.

We are working to scale QuickPIC to petascale platforms using the software pipelining technique. An initial benchmark shows a very promising performance enhancement.