
  • User-Level Threads and OpenMP

    Sangmin Seo
    Assistant Computer Scientist, Argonne National Laboratory
    [email protected]
    November 7, 2016

  • Top500 Supercomputers Today

    1. Sunway TaihuLight (Sunway MPP): 10,649,600 cores
    2. Tianhe-2 (Intel Xeon + Xeon Phi): 3,120,000 cores
    3. Titan (Cray XK7 + Nvidia K20x): 560,640 cores
    4. Sequoia (Blue Gene/Q): 1,572,864 cores
    5. K computer (SPARC64): 705,024 cores
    6. Mira (Blue Gene/Q): 786,432 cores

    [Charts: number of cores per socket in the Top500 and total number of cores in the Top500]


  • Massive On-node Parallelism

    • To address massive on-node parallelism, the number of work units (e.g., threads) must increase by 100x
    • MPI+OpenMP is sufficient for many apps, but the implementation is poor
      – Today, MPI+OpenMP == MPI+Pthreads
    • The Pthread abstraction is too generic and not suitable for HPC
      – Lack of fine-grained scheduling, memory management, network management, signaling, etc.
    • A better runtime can significantly improve MPI+OpenMP performance and support other emerging programming models

    [Diagram: an MPI process with many OpenMP threads per core]

  • Outline

    • Motivation
    • User-Level Threads (ULTs)
    • Argobots
    • BOLT: OpenMP over Lightweight Threads
    • Summary

  • User-Level Threads (ULTs)

    • What is a user-level thread (ULT)?
      – Provides thread semantics in user space
      – Execution model: cooperative timesharing
        • More than one ULT can be mapped to a single kernel thread
        • ULTs on the same OS thread do not execute in parallel
      – Can be implemented with coroutines (a coroutine-style sketch follows the diagram below)
        • Enable explicit suspend and resume of progress by preserving execution state
        • Some languages such as Python and Go use coroutines for asynchronous I/O

    [Diagram: two ULTs (ULT1, ULT2) time-sharing one kernel thread via context switches along a timeline; kernel threads are bound to cores]
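    A minimal sketch of the cooperative model described above, using POSIX ucontext rather than any particular ULT library: two ULTs on one kernel thread explicitly switch to each other by saving and restoring execution state.

        /* Cooperative user-level "threads" with ucontext (illustrative only). */
        #include <stdio.h>
        #include <ucontext.h>

        static ucontext_t main_ctx, ult1_ctx, ult2_ctx;
        static char stack1[64 * 1024], stack2[64 * 1024];

        static void ult1(void) {
            printf("ULT1: start\n");
            swapcontext(&ult1_ctx, &ult2_ctx);   /* explicit yield to ULT2 */
            printf("ULT1: resumed\n");
            swapcontext(&ult1_ctx, &main_ctx);   /* finish, return to main */
        }

        static void ult2(void) {
            printf("ULT2: start\n");
            swapcontext(&ult2_ctx, &ult1_ctx);   /* yield back to ULT1 */
        }

        int main(void) {
            getcontext(&ult1_ctx);
            ult1_ctx.uc_stack.ss_sp = stack1;
            ult1_ctx.uc_stack.ss_size = sizeof(stack1);
            ult1_ctx.uc_link = &main_ctx;        /* where to go if ult1 returns */
            makecontext(&ult1_ctx, ult1, 0);

            getcontext(&ult2_ctx);
            ult2_ctx.uc_stack.ss_sp = stack2;
            ult2_ctx.uc_stack.ss_size = sizeof(stack2);
            ult2_ctx.uc_link = &main_ctx;
            makecontext(&ult2_ctx, ult2, 0);

            swapcontext(&main_ctx, &ult1_ctx);   /* run ULT1 until it yields back */
            printf("main: done\n");
            return 0;
        }

    Because the switches are explicit, no preemption or locking is involved: the two ULTs never run in parallel on this single kernel thread.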

  • User-Level Threads (ULTs)

    • Why ULTs?
      – Conventional threads (e.g., Pthreads) are too expensive to express massive parallelism
      – If we create more Pthreads than the number of cores (i.e., oversubscription):
        • Context-switch and synchronization overhead will increase dramatically
      – ULTs can mitigate the high overhead of Pthreads, but they need explicit control
    • Where to use?
      – To better overlap computation and communication/IO
        • The low context-switching overhead of ULTs gives more opportunities to hide communication/IO latency
      – To exploit fine-grained task parallelism
        • Lightweight ULTs are more suitable for expressing massive task parallelism

    [Diagram: multiple ULTs multiplexed onto a single OS thread]

  • Pthreads vs. ULTs

    • Average time for creating and joining one thread (a sketch of such a measurement follows below)
      • Pthread: 6.6us - 21.2us (avg. 34,953 cycles)
      • ULT (Argobots): 78ns - 130ns (avg. 191 cycles)
      • ULT is 64x - 233x faster than Pthread
    • How fast is a ULT?
      • L1 cache access: 1.112ns, L2 cache access: 5.648ns, memory access: 18.4ns
      • Context switch (2 processes): 1.64us
      • *measured using LMbench3

    [Chart: average create & join time per thread (ns, log scale) vs. number of threads (1-2048), Pthread vs. ULT (Argobots)]
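    A minimal sketch (not the exact benchmark used for the numbers above) of how the Pthread create/join cost can be measured; the ULT case would replace pthread_create/pthread_join with the corresponding Argobots calls.

        #include <pthread.h>
        #include <stdio.h>
        #include <time.h>

        #define NITER 10000

        static void *empty(void *arg) { return NULL; }   /* thread does no work */

        int main(void) {
            struct timespec t0, t1;
            clock_gettime(CLOCK_MONOTONIC, &t0);
            for (int i = 0; i < NITER; i++) {
                pthread_t th;
                pthread_create(&th, NULL, empty, NULL);
                pthread_join(th, NULL);
            }
            clock_gettime(CLOCK_MONOTONIC, &t1);
            double ns = (t1.tv_sec - t0.tv_sec) * 1e9 + (t1.tv_nsec - t0.tv_nsec);
            printf("avg create+join: %.1f ns\n", ns / NITER);
            return 0;
        }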

  • Growing Interest in ULTs

    • ULT and task libraries
      – Converse threads, Qthreads, MassiveThreads, Nanos++, Maestro, GNU Pth, StackThreads/MP, Protothreads, Capriccio, State Threads, TiNy-threads, etc.
    • OS support
      – Windows fibers, Solaris threads
    • Languages and programming models
      – Cilk, OpenMP task, C++11 task, C++17 coroutine proposal, Stackless Python, Go goroutines, etc.
    • Pros
      – Easy to use with a Pthreads-like interface
    • Cons
      – The runtime tries to do something smart (e.g., work-stealing)
      – This may conflict with the characteristics and demands of applications

  • Argobots

    A lightweight low-level threading and tasking framework (http://www.mcs.anl.gov/argobots/)

    Overview
    • Separation of mechanisms and policies
    • Massive parallelism
      – Execution Streams guarantee progress
      – Work Units execute to completion
        • User-level threads (ULTs) vs. tasklets
    • Clearly defined memory semantics
      – Consistency domains
        • Provide eventual consistency
      – Software can manage consistency

    Argobots Innovations
    • Enabling technology, but not a policy maker
      – High-level languages/libraries such as OpenMP or Charm++ have more information about the user application (data locality, dependencies)
    • Explicit model:
      – Enables dynamism, but always managed by high-level systems

    [Diagram: programming models (MPI, OpenMP, Charm++, PaRSEC, ...) on top of Argobots; execution streams bound to processor cores, each with private pools and a shared pool holding lightweight work units (U = user-level thread, T = tasklet)]

    *Current team members: Pavan Balaji, Sangmin Seo, Halim Amer (ANL), L. Kale, Nitin Bhat (UIUC)

  • Argobots Execution Model

    • Execution Streams (ESs)
      – Sequential instruction stream
        • Can consist of one or more work units
      – Mapped efficiently to a hardware resource
      – Implicitly managed progress semantics
        • One blocked ES cannot block other ESs
    • User-level Threads (ULTs)
      – Independent execution units in user space
      – Associated with an ES when running
      – Yieldable and migratable
      – Can make blocking calls
    • Tasklets
      – Atomic units of work
      – Asynchronous completion via notifications
      – Not yieldable; migratable before execution
      – Cannot make blocking calls
    • Scheduler
      – Stackable scheduler with pluggable strategies
    • Synchronization primitives
      – Mutex, condition variable, barrier, future
    • Events
      – Communication triggers

    [Diagram: ES1..ESn, each with a scheduler (S) and a pool of ULTs (U), tasklets (T), and events (E)]

  • Explicit Mapping ULT/Tasklet to ES

    • The user needs to map work units to ESs
    • No smart scheduling, no work-stealing unless the user wants to use it
    • Benefits
      – Allows locality optimization
        • Execute work units on the same ES
      – No expensive lock is needed between ULTs on the same ES
        • They do not run in parallel
        • A flag is enough

    [Diagram: ES1 running U0, U1, T1, T2, U2, U3; ES2 running U4, U5]

    A code sketch of this explicit mapping follows below.
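    A hedged sketch of explicit mapping with the public Argobots API (function names as in the Argobots documentation; exact signatures may differ across versions): one primary ES plus one extra ES, with ULTs pushed explicitly into each ES's main pool, so no work-stealing happens between the two pools.

        #include <stdio.h>
        #include <abt.h>

        #define NUM_ULTS 4

        static void hello(void *arg) {
            int rank;
            ABT_xstream_self_rank(&rank);                 /* which ES am I running on? */
            printf("ULT %d running on ES %d\n", (int)(size_t)arg, rank);
        }

        int main(int argc, char **argv) {
            ABT_xstream xstreams[2];
            ABT_pool pools[2];
            ABT_thread threads[2][NUM_ULTS];

            ABT_init(argc, argv);

            /* ES0 is the primary execution stream; create one secondary ES. */
            ABT_xstream_self(&xstreams[0]);
            ABT_xstream_create(ABT_SCHED_NULL, &xstreams[1]);

            /* Fetch each ES's main pool; work pushed there stays on that ES. */
            for (int i = 0; i < 2; i++)
                ABT_xstream_get_main_pools(xstreams[i], 1, &pools[i]);

            /* Explicitly map ULTs to ESs by choosing the pool. */
            for (int i = 0; i < 2; i++)
                for (int j = 0; j < NUM_ULTS; j++)
                    ABT_thread_create(pools[i], hello,
                                      (void *)(size_t)(i * NUM_ULTS + j),
                                      ABT_THREAD_ATTR_NULL, &threads[i][j]);

            for (int i = 0; i < 2; i++)
                for (int j = 0; j < NUM_ULTS; j++) {
                    ABT_thread_join(threads[i][j]);
                    ABT_thread_free(&threads[i][j]);
                }

            ABT_xstream_join(xstreams[1]);
            ABT_xstream_free(&xstreams[1]);
            ABT_finalize();
            return 0;
        }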

  • Stackable Scheduler with Pluggable Strategies

    • Associated with an ES
    • Can handle ULTs and tasklets
    • Can handle schedulers
      – Allows schedulers to be stacked hierarchically
    • Can handle asynchronous events
    • Users can write schedulers
      – Provides mechanisms, not policies
      – Replace the default scheduler
        • E.g., FIFO, LIFO, priority queue, etc.
    • A ULT can explicitly yield to another ULT (see the snippet below)
      – Avoids scheduler overhead

    [Diagram: a scheduler per ES handling ULTs, tasklets, events, and stacked schedulers; yield() goes back through the scheduler, yield_to(target) switches directly to the target ULT]
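    A short sketch of the two yielding styles mentioned above, using the Argobots public API (names from the Argobots documentation; treat exact behavior as version-dependent):

        #include <abt.h>

        void worker(void *arg) {
            ABT_thread target = *(ABT_thread *)arg;
            /* Let the ES's scheduler pick the next work unit from its pool ... */
            ABT_thread_yield();
            /* ... or bypass the scheduler and switch directly to a specific ULT. */
            ABT_thread_yield_to(target);
        }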

  • Performance: Create/Join Time

    • Ideal scalability
      – If the ULT runtime is perfectly scalable, the create/join time per ULT should be the same regardless of the number of ESs

    [Chart: create/join time per ULT (cycles, log scale) vs. number of execution streams (workers, 1-72) for Qthreads, MassiveThreads (H), MassiveThreads (W), Argobots (ULT), and Argobots (Tasklet)]

  • Argobots' Position

    Argobots is a low-level threading/tasking runtime!

    [Diagram: on each node, applications and domain-specific languages (DSLs) sit on high-level programming models/libraries, which sit on Argobots and communication libraries on top of the node OS]

  • Argobots Ecosystem

    [Diagram: the Argobots runtime (ESs running ULTs, tasklets, and events) underneath several clients: MPI+Argobots (ULTs per ES driving MPI and communication libraries), Charm++ applications, CilkBots (a Cilk "worker" per Argobots ES with an RWS ULT and fused ULTs 1..N), PaRSEC task graphs (PO/GE/TR/SY kernels), OpenMP, Mercury RPC (origin/target RPC processing), and OmpSs]

    More connections: TASCEL, XMP, ROSE, GridFTP, Kokkos, RAJA, etc.

  • OpenMP

    • Directive-based programming model
    • Commonly used for shared-memory programming within a node
    • Many different implementations
      – Typically on top of the Pthreads library
      – Intel, GCC, Clang, IBM, etc.

    Sequential code:
        for (i = 0; i < N; i++) {
            do_something();
        }

    OpenMP code:
        #pragma omp parallel for
        for (i = 0; i < N; i++) {
            do_something();
        }

  • Nested Parallel Loop: Microbenchmark

        int in[1000][1000], out[1000][1000];
        extern int compute(int v);
        void lib_compute(int x);

        void outer(void)
        {
            /* A thread for each CPU is created by default;
               each thread executes a portion of the outer loop. */
            #pragma omp parallel for
            for (int i = 0; i < 1000; i++)
                lib_compute(i);
        }

        void lib_compute(int x)
        {
            /* Each outer thread creates more threads for the second loop;
               each inner thread executes a portion of the inner loop. */
            #pragma omp parallel for
            for (int j = 0; j < 1000; j++)
                out[x][j] = compute(in[x][j]);
        }

    Contribution: Adrián Castelló (Universitat Jaume I)

  • Nested Parallel Loop: Implementations

    • GCC
      – Does not reuse the idle threads in nested parallel constructs
      – All thread teams inside a parallel region need to be created
    • ICC
      – Reuses idle threads
        • If there are not enough threads available, new threads are created
        • All created threads are OS threads and add overhead
    • Implementation using Argobots
      – Creates ULTs or tasklets for both the outer loop and the inner loop
      – One ES for each core; one work unit (WU) per portion of the outer loop; each work unit executes a portion of the inner loop
      – Synchronization points for the outer loop and the inner loop

    [Diagram: ES0..ESN-1, each with a scheduler, executing outer-loop work units that spawn inner-loop work units, with outer- and inner-loop synchronization points]

    A run configuration for this nested benchmark is sketched below.
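    One way to enable nesting for the microbenchmark above on a Pthreads-based OpenMP runtime, using the standard OpenMP API of that era (omp_set_nested() is deprecated in newer specifications; the exact defaults differ per implementation):

        #include <omp.h>

        void setup_nesting(int outer_threads, int inner_threads) {
            omp_set_nested(1);                   /* allow nested parallel regions */
            omp_set_max_active_levels(2);        /* outer level + inner level */
            omp_set_num_threads(outer_threads);  /* team size for the outer loop */
            (void)inner_threads;                 /* inner team size is set per region,
                                                    e.g. #pragma omp parallel for num_threads(inner_threads) */
        }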

  • Nested Parallel Loop: Performance

    • Execution time for 36 threads in the outer loop (lower is better)
    • The GCC OpenMP implementation does not reuse idle threads in nested parallel regions; all the teams of threads need to be created in each iteration
    • Some overhead is added by creating ULTs instead of tasks

    [Charts: time (s, log scale) vs. number of OMP threads / Argobots ULTs or tasks for the inner loop (1-35); GCC/Pthreads vs. GCC/Argobots ULTs vs. GCC/Argobots tasks, and ICC/Pthreads vs. ICC/Argobots ULTs vs. ICC/Argobots tasks]

  • Nested Parallel Loop: Analysis

    • How does each implementation manage the threads in nested parallel regions?

    Parameters:
      • C: number of cores (or threads created by the user at the beginning)
      • N: number of iterations of the outer loop
      • M: number of iterations of the inner loop
      • T: number of threads for the inner loop
    Example -> C: 36, N: 1,000, M: 1,000, T: 36

    Impl.          | # created threads    | # reused threads      | # created ULTs | # created tasks
    GCC            | C + N*(T-1) = 35,036 | 0                     | ---            | ---
    ICC            | C + C*(T-1) = 1,296  | (N-C)*(T-1) = 33,740  | ---            | ---
    Argobots tasks | C = 36               | 0                     | C = 36         | N*T = 36,000

    (Thread creation: sequential; Argobots work-unit creation: parallel)

  • BOLT: A Lightning-Fast OpenMP Implementation

    • About BOLT
      – BOLT is a recursive acronym that stands for "BOLT is OpenMP over Lightweight Threads"
      – https://www.mcs.anl.gov/bolt/
    • Objective
      – An OpenMP framework that exploits lightweight threads and tasks
        • Improved nested massive parallelism
        • Enhanced fine-grained task parallelism
        • Better interoperability with MPI and other internode programming models

  • Approach & Development

    • Basic approach
      – The compiler simply generates runtime API calls, while the runtime creates ULTs/tasklets and manages them over a fixed set of computational resources
      – Use Argobots as the underlying threading and tasking mechanism
      – ABI compatibility with Intel OpenMP compilers, LLVM/Clang, and GCC (i.e., can be used with these compilers)
    • Development
      – Runtime
        • Based on the Intel OpenMP Runtime API
        • Generates Argobots work units from OpenMP pragmas
        • Can generate ULTs or tasklets depending on code characteristics
      – Compiler (planned)
        • LLVM/Clang
        • Passes characteristics of a parallel region or task (e.g., existence of blocking calls) to the runtime
        • Extends pragmas with the option "nonblocking" (a hypothetical sketch of such usage follows below)
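    A purely hypothetical illustration: the "nonblocking" option is a planned extension named on this slide, not existing OpenMP or BOLT syntax, and its placement in the pragma below is an assumption.

        /* HYPOTHETICAL syntax: a compiler hint that this region makes no
           blocking calls, so the runtime could use tasklets instead of ULTs. */
        #pragma omp parallel for nonblocking
        for (int i = 0; i < N; i++)
            out[i] = compute(in[i]);   /* pure computation, no blocking calls */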

  • BOLT Execution Model

    • OpenMP threads and tasks are translated into Argobots work units (i.e., ULTs and tasklets)
    • Shared pools are utilized to handle nested parallelism
    • A customized Argobots scheduler manages scheduling of work units across execution streams
    • (A plain OpenMP example combining both construct types follows below)

    [Diagram: #pragma omp parallel produces OpenMP threads mapped to ULTs (U); #pragma omp task produces OpenMP tasks mapped to ULTs/tasklets (T); one execution stream per CPU core, each with a private pool, plus a shared pool]
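    A plain OpenMP snippet containing the two construct types this slide mentions; under BOLT, the parallel region's threads and the tasks become Argobots work units as described above (process() is a hypothetical work function used only for illustration).

        #include <omp.h>

        void process(int i);   /* hypothetical work function */

        void run(int n) {
            /* The parallel region's threads are mapped to Argobots work units. */
            #pragma omp parallel
            {
                #pragma omp single
                for (int i = 0; i < n; i++) {
                    /* Each task is mapped to an Argobots work unit (ULT or tasklet). */
                    #pragma omp task firstprivate(i)
                    process(i);
                }
                /* implicit barrier at the end of the parallel region */
            }
        }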

  • Prototype Implementation of BOLT Runtime

    • Based on Intel's open-source OpenMP runtime
      – http://openmp.llvm.org/
    • Kept the original runtime API for ABI compatibility
    • Designed and implemented the threading layer using Argobots and modified the runtime internal layer

    [Diagram: runtime API layer on top of the runtime internal layer; the threading layer underneath is switched from Pthreads to Argobots]

  • OpenMP Pragma Translation

    1. A set of N threads is created at runtime
       – If they have not been created yet
       – Commonly as many as the number of CPU cores
    2. The number of iterations is divided among all the threads
    3. A synchronization point is added after the for loop
       – Implicit barrier at the end of parallel for

        #pragma omp parallel for    /* (1, 2) */
        for (i = 0; i < N; i++) {
            do_something();
        }                           /* (3) */

  • OpenMP Compiler & BOLT Runtime

    • Clang and the Intel compiler translate "#pragma omp parallel" into a call to the Intel OpenMP runtime API, which the BOLT runtime implements:

        __kmpc_fork_call(...) {
            __kmp_fork_call(...);   /* BOLT: create execution streams (if needed),
                                       add a ULT or tasklet to each ES, launch the work */
            __kmp_join_call(...);   /* BOLT: join the work units created */
        }

  • parallel for

        #pragma omp parallel for
        for (i = 0; i < N; i++) {
            do_something();
        }

    • The pragma creates threads, divides all iterations among them, and adds a synchronization point after the loop
    • Implementation using Argobots (a sketch follows below)
      – One execution stream for each CPU core (or hardware thread)
      – Each work unit (WU) executes a portion of the for loop
      – A synchronization point is added

    [Diagram: ES0..ESK, each with a scheduler (S), executing work units that each cover a portion of the loop]
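    A hedged sketch (not BOLT's actual code) of how such a parallel for could be lowered onto Argobots work units: one ULT per loop portion, pushed to the pool of each execution stream, then joined as the synchronization point. It assumes the caller already created num_es ESs and fetched one main pool per ES into pools[]; do_something() is the loop body from the slide.

        #include <abt.h>

        typedef struct { int begin, end; } chunk_t;

        extern void do_something(void);

        static void chunk_body(void *arg) {
            chunk_t *c = (chunk_t *)arg;
            for (int i = c->begin; i < c->end; i++)
                do_something();            /* one portion of the original loop */
        }

        void parallel_for(int n, int num_es, ABT_pool *pools) {
            ABT_thread ults[num_es];
            chunk_t chunks[num_es];
            int chunk = (n + num_es - 1) / num_es;

            for (int e = 0; e < num_es; e++) {
                chunks[e].begin = e * chunk;
                chunks[e].end = (e + 1) * chunk < n ? (e + 1) * chunk : n;
                ABT_thread_create(pools[e], chunk_body, &chunks[e],
                                  ABT_THREAD_ATTR_NULL, &ults[e]);
            }
            /* Implicit barrier at the end of "parallel for": join every work unit. */
            for (int e = 0; e < num_es; e++) {
                ABT_thread_join(ults[e]);
                ABT_thread_free(&ults[e]);
            }
        }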

  • OpenUH OpenMP Validation Suite 3.1

                                    | GCC 6.1 | ICC 17.0.0 + Intel OpenMP | ICC 17.0.0 + BOLT runtime (Argobots) | BOLT (clang + Argobots)
    # of tested OpenMP constructs   |   62    |   62                      |   62                                 |   62
    # of used tests                 |  123    |  123                      |  123                                 |  123
    # of successful tests           |  118    |  118                      |  122                                 |  122
    # of failed tests               |    5    |    5                      |    1                                 |    1
    Pass rate (%)                   |  95.9   |  95.9                     |  99.2                                |  99.2

    • The BOLT prototype works well functionally!

  • Nested Parallel Loop Microbenchmark

    • The number of threads for the outer loop was fixed at 36

    [Chart: execution time (us, log scale) vs. number of threads for the inner loop (1-35) for gcc 6.1.0, icc 17.0.0, BOLT (ULT), BOLT (ULT+tasklet), and Argobots (ULT)]

  • Application Study: KIFMM

    • Kernel-Independent Fast Multipole Method (KIFMM)
      – Offloads dgemv operations to Intel MKL
    • Evaluated the efficiency of the nested parallelism support in Intel OpenMP and BOLT during the Downward stage
      – 9 threads for the application (outer parallel region)

    [Chart: execution time (s) vs. number of threads for Intel MKL (1, 2, 4, 8) for Intel OMP core-close, Intel OMP core-true, Intel OMP no-binding, and BOLT]

  • Application Study: ACME mini-app

    • ACME (Accelerated Climate Modeling for Energy)
      – Implementing additional levels of parallelism through OpenMP nested parallel loops for upcoming many-core machines
    • Preliminary results of testing the transport_se mini-app version of HOMME (ACME's CAM-SE dycore)

    [Chart: normalized execution time (%, lower is better) for ICC + Intel OpenMP (15.0.0) vs. ICC + BOLT (Argobots) under the configurations H=16,V=1; H=8,V=2; H=4,V=4; H=4,V=8; H=8,V=4, including oversubscribed cases; BOLT is up to 3.16x faster]

  • Summary

    • Massive on-node parallelism is inevitable
      – We need runtime systems that utilize such parallelism
    • User-level threads (ULTs)
      – Lightweight threads more suitable for fine-grained dynamic parallelism and computation-communication overlap
    • Argobots
      – A lightweight low-level threading/tasking framework
      – Provides efficient mechanisms, not policies, to users (library developers or compilers)
        • They can build their own solutions
    • BOLT: OpenMP over Lightweight Threads
      – More efficient support of nested parallelism with Argobots ULTs and tasklets
      – Preliminary results show that BOLT is promising

  • Argo Concurrency Team

    • Argonne National Laboratory (ANL)
      – Pavan Balaji (co-lead)
      – Sangmin Seo
      – Abdelhalim Amer
      – Pete Beckman (PI)
    • University of Illinois at Urbana-Champaign (UIUC)
      – Laxmikant Kale (co-lead)
      – Marc Snir
      – Nitin Kundapur Bhat
    • University of Tennessee, Knoxville (UTK)
      – George Bosilca
      – Thomas Herault
      – Damien Genet
    • Pacific Northwest National Laboratory (PNNL)
      – Sriram Krishnamoorthy

    Past team members:
      • Cyril Bordage (UIUC)
      • Prateek Jindal (UIUC)
      • Jonathan Lifflander (UIUC)
      • Esteban Meneses (University of Pittsburgh)
      • Huiwei Lu (ANL)
      • Yanhua Sun (UIUC)

  • BOLT Collaborations

    • Maintainers
      – Argonne National Laboratory
        • Sangmin Seo
        • Abdelhalim Amer
        • Pavan Balaji
    • Contributors
      – Universitat Jaume I de Castelló
        • Adrián Castelló
        • Rafael Mayo
        • Enrique S. Quintana-Ortí
      – Barcelona Supercomputing Center (BSC)
        • Antonio J. Peña
        • Jesus Labarta
      – RIKEN
        • Jinpil Lee
        • Mitsuhisa Sato

  • Try Argobots & BOLT

    • Argobots
      – http://www.mcs.anl.gov/argobots/
      – git repository: https://github.com/pmodels/argobots
      – Wiki: https://github.com/pmodels/argobots/wiki
      – Doxygen: http://www.mcs.anl.gov/~sseo/public/argobots/
    • BOLT
      – http://www.mcs.anl.gov/bolt/
      – git repository: https://github.com/pmodels/bolt-runtime

  • Funding Acknowledgments

    • Funding grant providers
    • Infrastructure providers

  • Programming Models and Runtime Systems Group

    Group Lead
      – Pavan Balaji (computer scientist and group lead)

    Current Staff Members
      – Abdelhalim Amer (postdoc)
      – Yanfei Guo (postdoc)
      – Rob Latham (developer)
      – Lena Oden (postdoc)
      – Ken Raffenetti (developer)
      – Sangmin Seo (assistant computer scientist)
      – Min Si (postdoc)

    Past Staff Members
      – Antonio Pena (postdoc)
      – Wesley Bland (postdoc)
      – Darius T. Buntinas (developer)
      – James S. Dinan (postdoc)
      – David J. Goodell (developer)
      – Huiwei Lu (postdoc)
      – Min Tian (visiting scholar)
      – Yanjie Wei (visiting scholar)
      – Yuqing Xiong (visiting scholar)
      – Jian Yu (visiting scholar)
      – Junchao Zhang (postdoc)
      – Xiaomin Zhu (visiting scholar)

    Current and Recent Students
      – Ashwin Aji (Ph.D.)
      – Abdelhalim Amer (Ph.D.)
      – Md. Humayun Arafat (Ph.D.)
      – Alex Brooks (Ph.D.)
      – Adrian Castello (Ph.D.)
      – Dazhao Cheng (Ph.D.)
      – Hoang-Vu Dang (Ph.D.)
      – James S. Dinan (Ph.D.)
      – Piotr Fidkowski (Ph.D.)
      – Priyanka Ghosh (Ph.D.)
      – Sayan Ghosh (Ph.D.)
      – Ralf Gunter (B.S.)
      – Jichi Guo (Ph.D.)
      – Yanfei Guo (Ph.D.)
      – Marius Horga (M.S.)
      – John Jenkins (Ph.D.)
      – Feng Ji (Ph.D.)
      – Ping Lai (Ph.D.)
      – Palden Lama (Ph.D.)
      – Yan Li (Ph.D.)
      – Huiwei Lu (Ph.D.)
      – Jintao Meng (Ph.D.)
      – Ganesh Narayanaswamy (M.S.)
      – Qingpeng Niu (Ph.D.)
      – Ziaul Haque Olive (Ph.D.)
      – David Ozog (Ph.D.)
      – Renbo Pang (Ph.D.)
      – Nikela Papadopoulou (Ph.D.)
      – Sreeram Potluri (Ph.D.)
      – Sarunya Pumma (Ph.D.)
      – Li Rao (M.S.)
      – Gopal Santhanaraman (Ph.D.)
      – Thomas Scogland (Ph.D.)
      – Min Si (Ph.D.)
      – Brian Skjerven (Ph.D.)
      – Rajesh Sudarsan (Ph.D.)
      – Lukasz Wesolowski (Ph.D.)
      – Shucai Xiao (Ph.D.)
      – Chaoran Yang (Ph.D.)
      – Boyu Zhang (Ph.D.)
      – Xiuxia Zhang (Ph.D.)
      – Xin Zhao (Ph.D.)

    Advisory Board
      – Pete Beckman (senior scientist)
      – Rusty Lusk (retired, STA)
      – Marc Snir (division director)
      – Rajeev Thakur (deputy director)

  • Q&A

    Thank you for your attention!

    Questions?