for most users and applicaons, using default sengs work · 2010. 11. 3. · gnu so‐so fortran,...

35

Upload: others

Post on 19-Aug-2020

3 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: For most users and applicaons, using default sengs work · 2010. 11. 3. · GNU so‐so Fortran, outstanding C and C++ (if you ignore vectorizaon ) Obviously ... Note that a 40‐byte
Page 2: For most users and applicaons, using default sengs work · 2010. 11. 3. · GNU so‐so Fortran, outstanding C and C++ (if you ignore vectorizaon ) Obviously ... Note that a 40‐byte

  Formostusersandapplica1ons,usingdefaultse5ngsworkverywell

  Foruserswhowanttoexperimenttogetthebestperformancetheycan,thefollowingpresenta1ongivesyousomeinforma1ononcompilersandse5ngstotry  Whileitdoesn’tcoverabsolutelyeverything,thepresenta7ontriestoaddresssomeofthetunableparameterswhichwehavefoundtoprovideincreasedperformanceinthesitua7onsdiscussed

Cray Inc. Preliminary and Proprietary 2

Page 3: For most users and applicaons, using default sengs work · 2010. 11. 3. · GNU so‐so Fortran, outstanding C and C++ (if you ignore vectorizaon ) Obviously ... Note that a 40‐byte

xtpe‐mc12Ifnomoduleisloaded,andno‘arch’specifiedinthecompilerop;ons,thecompilersdefaulttothenodetypeonwhichthecompilerisrunning:Whichmaynotbethesameasthecompute

nodes!

Cray Inc. Preliminary and Proprietary 3

Page 4: For most users and applicaons, using default sengs work · 2010. 11. 3. · GNU so‐so Fortran, outstanding C and C++ (if you ignore vectorizaon ) Obviously ... Note that a 40‐byte

Thebestcompilerisnotthesameforeveryapplica1on

Cray Inc. Preliminary and Proprietary 4

Page 5: For most users and applicaons, using default sengs work · 2010. 11. 3. · GNU so‐so Fortran, outstanding C and C++ (if you ignore vectorizaon ) Obviously ... Note that a 40‐byte

  PGI–VerygoodFortran,okayCandC++  Goodvectoriza7on  Goodfunc7onalcorrectnesswithop7miza7onenabled  Goodmanualandautoma7cprefetchcapabili7es  VeryinterestedintheLinuxHPCmarket,althoughthatisnottheironlyfocus  Excellentworkingrela7onshipwithCray,goodbugresponsiveness

  Pathscale–GoodFortran,C,probablygoodC++  Outstandingscalarop7miza7onforloopsthatdonotvectorize  FortranfrontendusesanolderversionoftheCCEFortranfrontend  OpenMPusesanon‐pthreadsapproach  Scalarbenefitswillnotgetasmuchmileagewithlongervectors

  (NotNERSCsupported)Intel–GoodFortran,excellentCandC++(ifyouignorevectoriza1on)  Automa7cvectoriza7oncapabili7esaremodest,comparedtoPGIandCCE  Useofinlineassemblyisencouraged  Focusismoreonbestspeedforscalar,non‐scalingapps  TunedforIntelarchitectures,butactuallyworkswellforsomeapplica7onson

AMD

…from Cray’s Perspective

5 Cray Inc. Preliminary and Proprietary

Page 6: For most users and applicaons, using default sengs work · 2010. 11. 3. · GNU so‐so Fortran, outstanding C and C++ (if you ignore vectorizaon ) Obviously ... Note that a 40‐byte

  GNUso‐soFortran,outstandingCandC++(ifyouignorevectoriza1on)  Obviously,thebestforgcccompatability  Scalarop7mizerwasrecentlyrewriPenandisverygood  Vectoriza7oncapabili7esfocusmostlyoninlineassembly  Notethelastthreereleaseshavebeenincompa7blewitheachother(4.3,4.4,

and4.5)andrequiredrecompila7onofFortranmodules  CCE–OutstandingFortran,verygoodC,andokayC++

  Verygoodvectoriza7on  VerygoodFortranlanguagesupport;onlyrealchoiceforCoarrays  Csupportisquitegood,withUPCsupport  Verygoodscalarop7miza7onandautoma7cparalleliza7on  Cleanimplementa7onofOpenMP3.0,withtasks  SoledeliveryfocusisonLinux‐basedCrayhardwaresystems  Bestbugturnaround7me(ifitisn’t,letusknow!)  Cleanestintegra7onwithotherCraytools(performancetools,debuggers,

upcomingproduc7vitytools)  Noinlineassemblysupport

6 Cray Inc. Preliminary and Proprietary

…from Cray’s Perspective

Page 7: For most users and applicaons, using default sengs work · 2010. 11. 3. · GNU so‐so Fortran, outstanding C and C++ (if you ignore vectorizaon ) Obviously ... Note that a 40‐byte

  Usedefaultop1miza1onlevels  It’stheequivalentofmostothercompilers–O3or–fast

  Use–O3,fp3(or–O3–hfp3,orsomevaria1on)  ‐O3onlygivesyouslightlymorethan–O2  ‐hfp3givesyoualotmorefloa7ngpointop7miza7on,esp.32‐bit

  Ifanapplica1onisintolerantoffloa1ngpointreassocia1on,tryalower–hfpnumber–try–hfp1first,only–hfp0ifabsolutelynecessary  MightbeneededforteststhatrequirestrictIEEEconformance  Orapplica7onsthathave‘validated’resultsfromadifferentcompiler

  Donotsuggestusing–Oipa5,‐Oaggress,andsoon–highernumbersarenotalwayscorrelatedwithbeVerperformance

  Compilerfeedback:‐rm(Fortran)‐hlist=m(C)  Ifyouknowyoudon’twantOpenMP:‐xompor‐Othread0  mancray[n;mancraycc;mancrayCC

7 Cray Inc. Preliminary and Proprietary

Page 8: For most users and applicaons, using default sengs work · 2010. 11. 3. · GNU so‐so Fortran, outstanding C and C++ (if you ignore vectorizaon ) Obviously ... Note that a 40‐byte

  PGI  ‐fast–Mipa=fast(,safe)  Ifyoucanbeflexiblewithprecision,alsotry‐Mfprelaxed  Compilerfeedback:‐Minfo=all‐Mneginfo  manpgf90;manpgcc;manpgCC;orpgf90‐help

  Pathscale  ‐OfastNote:thisisaliPlelooserwithprecisionthanothercompilers  Compilerfeedback:‐LNO:simd_verbose=ON  maneko(“EveryKnownOp7miza7on”)

  GNU  ‐O3–ffast‐math–funroll‐loops  Compilerfeedback:‐iree‐vectorizer‐verbose=2  mangfortran;mangcc;mang++

  Intel  ‐fast  Compilerfeedback:  manifort;manicc;maniCC

Cray Inc. Preliminary and Proprietary 8

Page 9: For most users and applicaons, using default sengs work · 2010. 11. 3. · GNU so‐so Fortran, outstanding C and C++ (if you ignore vectorizaon ) Obviously ... Note that a 40‐byte

Usethextpe‐mc12moduleanditisallautoma1c

TheOpenMPthreadedBLAS/LAPACKlibraryisthedefaultifthextpe‐mc12moduleisloaded.Theserialversionisusedif

‘OMP_NUM_THREADS’isnotsetorsetto1.

Cray Inc. Preliminary and Proprietary 9

Page 10: For most users and applicaons, using default sengs work · 2010. 11. 3. · GNU so‐so Fortran, outstanding C and C++ (if you ignore vectorizaon ) Obviously ... Note that a 40‐byte

ExperimentwiththeEnvironmentVariableGrabBag

Cray Inc. Preliminary and Proprietary 10

Page 11: For most users and applicaons, using default sengs work · 2010. 11. 3. · GNU so‐so Fortran, outstanding C and C++ (if you ignore vectorizaon ) Obviously ... Note that a 40‐byte

  OnlyrelevantformixedMPI/SHMEM/UPC/CAFcodes  NormallywanttoleaveenabledsoMPICH2andDMAPPcansharethe

samememoryregistra1oncache  Mayhavetodisableforcodesthatcallshmem_inita[erMPI_Init.  Mayhavetosettodisableifonegetsatracebacklikethis:

Cray Inc. Preliminary and Proprietary 11

Rank 834 Fatal error in MPI_Alltoall: Other MPI error, error stack: MPI_Alltoall(768).......................: MPI_Alltoall(sbuf=0x2aab9c301010, scount=2596, MPI_DOUBLE, rbuf=0x2aab7ae01010, rcount=2596, MPI_DOUBLE, comm=0x84000004) failed MPIR_Alltoall(469)......................: MPIC_Isend(453).........................: MPID_nem_lmt_RndvSend(102)..............: MPID_nem_gni_lmt_initiate_lmt(580)......: failure occurred while attempting to send RTS packet MPID_nem_gni_iStartContigMsg(869).......: MPID_nem_gni_iSendContig_start(763).....: MPID_nem_gni_send_conn_req(626).........: MPID_nem_gni_progress_send_conn_req(193): MPID_nem_gni_smsg_mbox_alloc(357).......: MPID_nem_gni_smsg_mbox_block_alloc(268).: GNI_MemRegister GNI_RC_ERROR_RESOURCE)‏

Page 12: For most users and applicaons, using default sengs work · 2010. 11. 3. · GNU so‐so Fortran, outstanding C and C++ (if you ignore vectorizaon ) Obviously ... Note that a 40‐byte

  Defaultis8192bytes  Maximumsizemessagethatcangothroughtheeagerprotocol.  Mayhelpforappsthataresendingmediumsizemessages,anddobeVer

whenlooselycoupled.Doesapplica1onhavealargeamountof1meinMPI_Waitall?Se5ngthisenvironmentvariablehighermayhelp.

  Maxvalueis131072bytes.  Rememberforthispathithelpstopre‐postreceivesifpossible.  Notethata40‐byteCH3headerisincludedwhenaccoun1ngforthe

messagesize.

Cray Inc. Preliminary and Proprietary 12

Page 13: For most users and applicaons, using default sengs work · 2010. 11. 3. · GNU so‐so Fortran, outstanding C and C++ (if you ignore vectorizaon ) Obviously ... Note that a 40‐byte

  Defaultisenabled  Controlswhetherornottousealazymemoryderegistra1onpolicyinside

UDREG.  Mayhelpforappsthat

  TransferringalotofmessagesviaLMTpath  Anduse4KBpages

  Disablingresultsinasignificantdropinmeasuredbandwidthforlargetransfers~40‐50%.

  MaybebeVerofftryingtouselargepagesthanse5ngthisto'disabled'.

Cray Inc. Preliminary and Proprietary 13

Page 14: For most users and applicaons, using default sengs work · 2010. 11. 3. · GNU so‐so Fortran, outstanding C and C++ (if you ignore vectorizaon ) Obviously ... Note that a 40‐byte

  Defaultis6432Kbuffers(2Mtotal)  Controlsnumberof32KDMAbuffersavailableforeachranktouseinthe

Eagerprotocoldescribedearlier  Mayhelptomodestlyincrease.Butotherresourcesconstrainthe

usabilityofalargenumberofbuffers.

Cray Inc. Preliminary and Proprietary 14

Page 15: For most users and applicaons, using default sengs work · 2010. 11. 3. · GNU so‐so Fortran, outstanding C and C++ (if you ignore vectorizaon ) Obviously ... Note that a 40‐byte

  Defaultvalueis1024.  ControlsthethresholdatwhichtheGNInetmodswitchesfromusingFMA

forRDMAread/writeopera1onstousingtheBTE.  SinceBTEismanagedinthekernel,BTEini1atedRDMArequestscan

progresseveniftheapplica1onisn'tinMPI,allowingpossiblyforslightlybeVerchancesofge5ngsomeoverlapofcommunica1onwithcomputa1on.

  OwingtoOpteron/HTquirks,theBTEiso[enbeVerformovingdatato/frommemoriesthatarefartherfromtheGemini.

Cray Inc. Preliminary and Proprietary 15

Page 16: For most users and applicaons, using default sengs work · 2010. 11. 3. · GNU so‐so Fortran, outstanding C and C++ (if you ignore vectorizaon ) Obviously ... Note that a 40‐byte

  Defaultvalueis8192bytes.  Specifiesthresholdatwhichthesharedmemorychannelswitchestoa

single‐copy(XPMEM)protocolforintra‐nodemessagesfromadoublecopyprotocol.

Cray Inc. Preliminary and Proprietary 16

Page 17: For most users and applicaons, using default sengs work · 2010. 11. 3. · GNU so‐so Fortran, outstanding C and C++ (if you ignore vectorizaon ) Obviously ... Note that a 40‐byte

  Star1ngwithMPT5.0.2,thedefaultissinglecopyviaXPMEMisenabled(=‘0’).InolderversionssinglecopyviaXPMEMisdisabled(=‘1’).

  SpecifieswhetherornottouseaXPMEM‐basedsingle‐copyprotocolforintra‐nodemessagesofsizeMPICH_SMP_SINGLE_COPY_SIZEbytesorlarger.

Cray Inc. Preliminary and Proprietary 17

Page 18: For most users and applicaons, using default sengs work · 2010. 11. 3. · GNU so‐so Fortran, outstanding C and C++ (if you ignore vectorizaon ) Obviously ... Note that a 40‐byte

Thisallowsformoreasyncmessagetransfer.

Buttheaddi1onalcopyonthereceivingsidemayoffsetthegain.

Thisisimportantenoughthatwearemen;oningtwice!

Cray Inc. Preliminary and Proprietary 18

Page 19: For most users and applicaons, using default sengs work · 2010. 11. 3. · GNU so‐so Fortran, outstanding C and C++ (if you ignore vectorizaon ) Obviously ... Note that a 40‐byte

MemoryAlloca1on:Makeitlocal

Cray Inc. Preliminary and Proprietary 19

Page 20: For most users and applicaons, using default sengs work · 2010. 11. 3. · GNU so‐so Fortran, outstanding C and C++ (if you ignore vectorizaon ) Obviously ... Note that a 40‐byte

  Linuxhasa“firsttouchpolicy”formemoryalloca1on  *allocfunc7onsdon’tactuallyallocateyourmemory  Memorygetsallocatedwhen“touched”

  Problem:Acodecanallocatemorememorythanavailable  Linuxassumes“swapspace,”wedon’thaveany  Applica7onswon’tfailfromover‐alloca7onun7lthememoryisfinallytouched

  Problem:Memorywillbeputonthecoreofthe“touching”thread  Onlyaproblemifthread0allocatesallmemoryforanode

  Solu1on:Alwaysini1alizeyourmemoryimmediatelya[eralloca1ngit  Ifyouover‐allocate,itwillfailimmediately,ratherthanastrangeplaceinyour

code  Ifeverythreadtouchesitsownmemory,itwillbeallocatedontheproper

socket/die.

Cray Inc. Preliminary and Proprietary 20

Page 21: For most users and applicaons, using default sengs work · 2010. 11. 3. · GNU so‐so Fortran, outstanding C and C++ (if you ignore vectorizaon ) Obviously ... Note that a 40‐byte

Isyournearestneighborreallyyournearestneighbor?Anddoyouwantthemtobeyournearestneighbor?

Cray Inc. Preliminary and Proprietary 21

Page 22: For most users and applicaons, using default sengs work · 2010. 11. 3. · GNU so‐so Fortran, outstanding C and C++ (if you ignore vectorizaon ) Obviously ... Note that a 40‐byte

  Thedefaultorderingcanbechangedusingthefollowingenvironmentvariable:  MPICH_RANK_REORDER_METHOD

  Thesearethedifferentvaluesthatyoucansetitto:  0:Round‐robinplacement–Sequen7alranksareplacedonthenextnodeinthe

list.Placementstartsoverwiththefirstnodeuponreachingtheendofthelist.  1:(DEFAULT)SMP‐styleplacement–Sequen7alranksfillupeachnodebefore

movingtothenext.  2:Foldedrankplacement–Similartoround‐robinplacementexceptthateach

passoverthenodelistisintheoppositedirec7onofthepreviouspass.  3:Customordering.Theorderingisspecifiedinafilenamed

MPICH_RANK_ORDER.  Whenisthisuseful?

  Point‐to‐pointcommunica7onconsumesasignificantfrac7onofprogram7meandaloadimbalancedetected

  Alsoshowntohelpforcollec7ves(alltoall)onsubcommunicators  SpreadoutIOacrossnodes

Cray Inc. Preliminary and Proprietary 22

Page 23: For most users and applicaons, using default sengs work · 2010. 11. 3. · GNU so‐so Fortran, outstanding C and C++ (if you ignore vectorizaon ) Obviously ... Note that a 40‐byte

  GYRO8.0  B3‐GTCproblemwith1024processes

  RunwithalternateMPIorderings

Cray Inc. Preliminary and Proprietary 23

Reordermethod Comm.1me

1–SMP(Default) 11.26s

0–round‐robin 6.94s

2–folded‐rank 6.68s

Note:• Therankreorderingonlyworksonnodes.Ifyouwanttopackwithinanodeinaspecialwayusetheaprun–cc‘cpulist’.• Hencetogetabitmoreoutofthefolded‐rankop7onuseaprun–cc0,6,12,18,19,13,7,1,2,8,14,20,21,15,9,3,4,10,16,22,23,17,11,5Thisfoldsacrossnodes,andfoldswithindiesonanode.

Page 24: For most users and applicaons, using default sengs work · 2010. 11. 3. · GNU so‐so Fortran, outstanding C and C++ (if you ignore vectorizaon ) Obviously ... Note that a 40‐byte

  TGYRO1.0  SteadystateturbulenttransportcodeusingGYRO,NEO,TGLFcomponents

  ASTRAtestcase  TestedMPIorderingsatlargescale  Originallytes7ngweak‐scaling,butfoundreorderingveryuseful

Reordermethod

TGYROwall1me(min)

20480Cores

40960Cores

81920Cores

Default 99m 104m 105m

Round‐robin 66m 63m 72m Huge win!

Cray Inc. Preliminary and Proprietary 24

Page 25: For most users and applicaons, using default sengs work · 2010. 11. 3. · GNU so‐so Fortran, outstanding C and C++ (if you ignore vectorizaon ) Obviously ... Note that a 40‐byte

Cray Inc. Preliminary and Proprietary 25

  Communica1onisnearest‐neighbor.(Haloexchange)

  Defaultorderingresultsin12x1x1blockoneachnode.(Istanbulexample,12cores)

  Acustomreorderingisnowgenerated:3x2x2blockspernode,resul1nginmoreon‐nodecommunica1on

Applica1ondataisina3Dspace,X*Y*Z

Note:Usingblocksorslabswithinanodemayhelpsomecommunica1ons.Fora6x4chunk,youcouldtrya46x1,or43x2chunks,witheachdiege5ngonechunk.

Lower is better

Page 26: For most users and applicaons, using default sengs work · 2010. 11. 3. · GNU so‐so Fortran, outstanding C and C++ (if you ignore vectorizaon ) Obviously ... Note that a 40‐byte

X X o o

X X o o

o o o o

o o o o

  NodesmarkedXheavilyuseasharedresource

  Ifthesharedresourceis:  Memorybandwidth:scaPertheX's  Networkbandwidthtoothers,againscaPer

  Networkbandwidthamongthemselves,concentrate

  Checkoutpat_report,grid_order,andmgrid_orderforgenera1ngcustomrankordersbasedon:  Measureddata  Communica7onpaPerns  Datadecomposi7on

Cray Inc. Preliminary and Proprietary Slide 26

Page 27: For most users and applicaons, using default sengs work · 2010. 11. 3. · GNU so‐so Fortran, outstanding C and C++ (if you ignore vectorizaon ) Obviously ... Note that a 40‐byte

GeminilovestouseHugepages

Cray Inc. Preliminary and Proprietary 27

Page 28: For most users and applicaons, using default sengs work · 2010. 11. 3. · GNU so‐so Fortran, outstanding C and C++ (if you ignore vectorizaon ) Obviously ... Note that a 40‐byte

  TheGeminiperformbeVerwithHUGEpagesthanwith4Kpages.

  HUGEpagesuselessGEMINIresourcesthan4kpages(fewerbytes).

  YourcodemayrunwithfewerTLBmisses(hencefaster).

Cray Inc. Preliminary and Proprietary 28

Page 29: For most users and applicaons, using default sengs work · 2010. 11. 3. · GNU so‐so Fortran, outstanding C and C++ (if you ignore vectorizaon ) Obviously ... Note that a 40‐byte

  Linkinthehugetlbfslibraryintoyourcode‘‐lhugetlbfs’  SettheHUGETLB_MORECOREenvinyourrunscript.

  Example:exportHUGETLB_MORECORE=yes  Usetheaprunop1on–m###htoaskfor###MegofHUGEpages.  Example:aprun–m500h(Request500MegsofHUGEpagesasavailable,use4Kpagesthereaier)

  Example:aprun–m500hs(Request500MegsofHUGEpages,ifnotavailableterminatelaunch)

  Note:IfnotenoughHUGEpagesareavailable,thecostoffillingtheremainingwith4Kpagesmaydegradeperformance.

Cray Inc. Preliminary and Proprietary 29

Page 30: For most users and applicaons, using default sengs work · 2010. 11. 3. · GNU so‐so Fortran, outstanding C and C++ (if you ignore vectorizaon ) Obviously ... Note that a 40‐byte

Butisn’tthatasystemcall?

TrickQues1on:No,butit’sexpensiveandshouldbedoneasinfrequentlyaspossible!

Cray Inc. Preliminary and Proprietary 30

Page 31: For most users and applicaons, using default sengs work · 2010. 11. 3. · GNU so‐so Fortran, outstanding C and C++ (if you ignore vectorizaon ) Obviously ... Note that a 40‐byte

  GNUmalloclibrary  malloc,calloc,realloc,freecalls

  Fortrandynamicvariables

  Malloclibrarysystemcalls  Mmap,munmap=>forlargeralloca7ons  Brk,sbrk=>increase/decreaseheap

  Malloclibraryop1mizedforlowsystemmemoryuse  Canresultinsystemcalls/minorpagefaults

Cray Inc. Preliminary and Proprietary 31

Page 32: For most users and applicaons, using default sengs work · 2010. 11. 3. · GNU so‐so Fortran, outstanding C and C++ (if you ignore vectorizaon ) Obviously ... Note that a 40‐byte

  Detec1ng“bad”mallocbehavior  Profiledata=>“excessivesystem7me”

  Correc1ng“bad”mallocbehavior  Eliminatemmapusebymalloc  Increasethresholdtoreleaseheapmemory

  Useenvironmentvariablestoaltermalloc  MALLOC_MMAP_MAX_=0  MALLOC_TRIM_THRESHOLD_=536870912(orappropriatesize) (onlytrimsheapwhenthisamounttotalisfreed)

  Possibledownsides  Heapfragmenta7on  Userprocessmaycallmmapdirectly  Userprocessmaylaunchotherprocesses

  PGI’s–Msmartallocdoessomethingsimilarforyouatcompile1me

Cray Inc. Preliminary and Proprietary 32

Page 33: For most users and applicaons, using default sengs work · 2010. 11. 3. · GNU so‐so Fortran, outstanding C and C++ (if you ignore vectorizaon ) Obviously ... Note that a 40‐byte

Op1on

‐D

‐n

‐N

‐m

‐d

‐S

‐ss

Debug(showsthelayoutaprunwilluse)

NumberofMPItasksNote:Ifyoudonotspecifythenumberoftaskstoaprun,thesystemwilldefault

to1.

NumberoftasksperNode

MemoryrequiredperTask

NumberofthreadsperMPITask.Note:IfyouspecifyOMP_NUM_THREADSbutdonotgivea–dop1on,aprunwill

allocateyourthreadstoasinglecore.YoumustuseOMP_NUM_THREADStospecifythenumberofthreadsperMPItask,andyoumustuse–dtotellaprunhowtoplacethosethreads

NumberofPestoallocateperNUMANode

StrictmemorycontainmentperNUMANode

Descrip1on

33 Cray Inc. Preliminary and Proprietary

Page 34: For most users and applicaons, using default sengs work · 2010. 11. 3. · GNU so‐so Fortran, outstanding C and C++ (if you ignore vectorizaon ) Obviously ... Note that a 40‐byte

  Torunusing1376MPItaskswith4threadsperMPItask:exportOMP_NUM_THREADS=4aprun‐ss‐N4‐d4‐n1376./xhpl_mp

  Torunwithoutthreading:exportOMP_NUM_THREADS=1aprun–ss–N16‐n5504./xhpl_mp

Cray Inc. Preliminary and Proprietary 34

Page 35: For most users and applicaons, using default sengs work · 2010. 11. 3. · GNU so‐so Fortran, outstanding C and C++ (if you ignore vectorizaon ) Obviously ... Note that a 40‐byte

  Thankstothefollowingpeopleforcontribu1ngexper1seandslidestothispresenta1on:  JeffLarkin,CrayORNLCenterofExcellence  CrayPerformanceTeam  CrayMPTGroup  CrayProgrammingEnvironmentGroup

Cray Inc. Preliminary and Proprietary 35