Sandia National Laboratories is a multi-mission laboratory managed and operated by Sandia Corporation, a wholly owned subsidiary of Lockheed Martin Corporation, for the U.S. Department of Energy’s National Nuclear Security Administration under contract DE-AC04-94AL85000.
Increasing Coherence Between Simulation and Data Analytics
Chesapeake Large Scale Data Analytics Conference
Annapolis, MD
October 25, 2016
Rob Leland
Vice President, Science & Technology
Chief Technology Officer
Sandia National Laboratories
SAND2016-10762 C
Outline
§ A tale of two visions
§ Some background
§ A charge from the National Strategic Computing Initiative
§ Answers to three key questions
  § Why is an increasing coherence between simulation and analytics important?
  § What is really meant by “increasing coherence” between the two?
  § How might coherence be furthered in practice?
§ A unifying vision
Vision 1: From a scientific perspective
From The Fourth Paradigm: Data-Intensive Scientific Discovery by Jim Gray
Data analysis complements theory, experiment, and computation
Vision 2: From a national security perspective
Graph matching example of data analytics: a key analytic primitive, used to find a specific instance of an abstract pattern of interest
From Coffman, Greenblatt, and Marcus, Graph-Based Technologies for Intelligence Analysis, Communications of the ACM, 47, March 2004.
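The slide describes graph matching only by its purpose. The following is a minimal sketch of the primitive. It assumes undirected graphs stored as adjacency sets and uses plain backtracking search for a subgraph monomorphism. This is not the technique of Coffman et al. The function `find_matches` and the example graphs are hypothetical.

```python
from typing import Dict, List, Set

Graph = Dict[str, Set[str]]  # undirected graph: vertex -> set of neighbors


def find_matches(pattern: Graph, data: Graph) -> List[Dict[str, str]]:
    """Return every one-to-one mapping of pattern vertices onto data vertices
    under which each pattern edge lands on a data edge (subgraph monomorphism)."""
    order = list(pattern)  # pattern vertices are assigned in this fixed order
    matches: List[Dict[str, str]] = []

    def extend(mapping: Dict[str, str]) -> None:
        if len(mapping) == len(order):
            matches.append(dict(mapping))  # complete match found; store a copy
            return
        p = order[len(mapping)]
        used = set(mapping.values())
        for d in data:
            if d in used:
                continue
            # every pattern neighbor of p that is already placed must be adjacent to d
            if all(mapping[q] in data[d] for q in pattern[p] if q in mapping):
                mapping[p] = d
                extend(mapping)
                del mapping[p]  # backtrack and try the next candidate

    extend({})
    return matches


# Hypothetical example: look for a "triangle" (three mutually linked entities).
data = {"a": {"b", "c"}, "b": {"a", "c"}, "c": {"a", "b", "d"}, "d": {"c"}}
triangle = {"x": {"y", "z"}, "y": {"x", "z"}, "z": {"x", "y"}}
for m in find_matches(triangle, data):
    print(m)
```

The sketch reports each triangle once per symmetry of the pattern, which is six times here. Practical matchers prune candidates by vertex labels and degree and suppress such duplicates; this sketch does not attempt either.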
Some background
§ Simulation
  § Computations to understand physical phenomena or conduct engineering
§ Large Scale Data Analytics (LSDA)
  § Data Analytics = Discovering meaningful patterns in data
  § Large Scale = Requiring leading-edge processing and storage capabilities
§ LSDA is increasing in importance
  § Pervasive
    § Commerce, finance, healthcare, science, engineering, national security, ...
  § Lasting societal significance
    § Internet search, genomics, climate modeling, Higgs particle, ...
§ LSDA is getting “harder”
  § Captured data growing exponentially with time
  § Individual analysis becoming more sophisticated
  § More people examining more data more frequently
  § Aggregate work growing much faster than Moore’s Law
The Economist: [cover image, “Data, Data, Everywhere,” Feb 25th, 2010]
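The last bullet follows because the growth trends it lists multiply: more data, costlier analyses, more analysts, and more frequent runs all compound. The following is a minimal sketch of that arithmetic. The four annual growth factors are hypothetical values chosen only for illustration and are not taken from the talk. Moore's Law is modeled as doubling every two years.

```python
import math


def combined_growth(annual_factors: dict) -> float:
    """Annual growth of aggregate analysis work, assuming the factors are
    independent and multiply (work = data x per-analysis cost x analysts x frequency)."""
    total = 1.0
    for factor in annual_factors.values():
        total *= factor
    return total


def doubling_time_years(annual_factor: float) -> float:
    """Years needed to double at a constant annual growth factor."""
    return math.log(2) / math.log(annual_factor)


# Hypothetical annual growth factors, for illustration only.
factors = {
    "captured data": 1.4,
    "per-analysis sophistication": 1.2,
    "number of analysts": 1.1,
    "analysis frequency": 1.1,
}
MOORE_ANNUAL = 2 ** 0.5  # doubling every two years

work = combined_growth(factors)
print(f"aggregate work: x{work:.2f}/year, doubles every {doubling_time_years(work):.2f} years")
print(f"Moore's Law:    x{MOORE_ANNUAL:.2f}/year, doubles every {doubling_time_years(MOORE_ANNUAL):.2f} years")
```

None of the hypothetical factors grows as fast as Moore's Law on its own. Their product still roughly doubles every year, about twice the Moore's Law pace.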
National Strategic Computing Initiative (NSCI)
NSCI Strategic Objectives
§ (1) Accelerating delivery of a capable exascale computing system that integrates hardware and software capability to deliver approximately 100 times the performance of current 10 petaflop systems across a range of applications representing government needs.
§ (2) Increasing coherence between the technology base used for modeling and simulation and that used for data analytic computing.
§ (3) Establishing, over the next 15 years, a viable path forward for future HPC systems even after the limits of current semiconductor technology are reached (the "post-Moore's Law era").
§ (4) Increasing the capacity and capability of an enduring national HPC ecosystem by employing a holistic approach that addresses relevant factors such as networking technology, workflow, downward scaling, foundational algorithms and software, accessibility, and workforce development.
§ (5) Developing an enduring public-private collaboration to ensure that the benefits of the research and development advances are, to the greatest extent, shared between the United States Government and industrial and academic sectors.
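The target in objective (1) works out to the exaflop scale that gives exascale computing its name:

```latex
100 \times \left(10 \times 10^{15}\ \text{flop/s}\right) = 10^{18}\ \text{flop/s} = 1\ \text{exaflop/s}
```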
Q1: Why is increasing coherence between simulation and analytics important?
§ For simulation
  § HPC simulation must ride on some commodity curve
  § Larger market forces behind analytics
  § Can exploit commodity component technology from analytics
§ For analytics
  § Large Scale Data Analytics problems becoming ever more sophisticated
  § Requiring more coupled methods
  § Can exploit architectural lessons from HPC simulation
§ For both: Integration of simulation and analytics in the same workflow
  § Automation of analysis of data from simulation
  § Creation of synthetic data via simulation to augment analysis
  § Automated generation and testing of hypotheses
  § Exploration of new scientific and technical scenarios
  § ...
Mutual inspiration, technical synergy, and economies of scale in the creation, deployment, and use of HPC resources
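The following toy sketch illustrates what integrating simulation and analytics in the same workflow can look like. Instead of writing every time step to disk for later post-processing, an analysis hook inspects each new state as it is produced and can stop the run early. The model is a hypothetical 1D explicit heat-diffusion solver. The names `run_coupled` and `peak_monitor` and the 0.3 threshold are illustrative assumptions, not a Sandia workflow.

```python
from typing import Callable, List

Field = List[float]


def diffuse_step(u: Field, alpha: float) -> Field:
    """One explicit finite-difference step of 1D heat diffusion with fixed ends
    (stable for alpha <= 0.5)."""
    interior = [u[i] + alpha * (u[i - 1] - 2 * u[i] + u[i + 1]) for i in range(1, len(u) - 1)]
    return [u[0]] + interior + [u[-1]]


def run_coupled(u: Field, steps: int, alpha: float,
                analyze: Callable[[int, Field], bool]) -> int:
    """Advance the simulation, handing each new state to an in-situ analysis hook.
    Stops early when the hook returns True; returns the number of steps taken."""
    for step in range(1, steps + 1):
        u = diffuse_step(u, alpha)
        if analyze(step, u):
            return step
    return steps


history: List[float] = []


def peak_monitor(step: int, u: Field) -> bool:
    """Hypothetical analysis: record the peak value and stop once it falls below 0.3."""
    peak = max(u)
    history.append(peak)
    return peak < 0.3


initial = [0.0] * 21
initial[10] = 1.0  # a single hot spot in the middle of a cold rod
taken = run_coupled(initial, steps=100, alpha=0.25, analyze=peak_monitor)
print(f"stopped after {taken} steps; peak history: {history}")
```

Passing the analysis in as a callback keeps the solver unaware of what is being analyzed. The same loop could host the other integration patterns listed above, such as flagging a scenario for further exploration.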
A challenge, because simulation and analytics differ in many respects…
Data structures describing simulation and analytics differ
Graphs from simulations may be irregular, but have more locality than those derived from analytics
[Figure panels. Computational simulation of physical phenomena: climate modeling; car crash. Large scale data analytics: Internet connectivity; yeast protein interactions.]
Figures from Leland et al., courtesy of Yelick, LBNL.
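One simple way to quantify "more locality" is to number the vertices in their natural storage order and measure how far apart the endpoints of each edge are in that order. Short distances mean neighbors sit close together in memory. The following sketch compares a 2D grid mesh, a hypothetical stand-in for a simulation mesh, with a uniformly random graph that has the same number of edges, a stand-in for an analytics graph. The mean-index-distance metric is one choice among many, not the one used for the figures.

```python
import random
from typing import List, Tuple

Edge = Tuple[int, int]


def grid_edges(n: int) -> List[Edge]:
    """Edges of an n x n 2D grid mesh with vertices numbered row by row."""
    edges = []
    for r in range(n):
        for c in range(n):
            v = r * n + c
            if c + 1 < n:
                edges.append((v, v + 1))  # right neighbor
            if r + 1 < n:
                edges.append((v, v + n))  # neighbor in the next row
    return edges


def random_edges(num_vertices: int, num_edges: int, seed: int = 0) -> List[Edge]:
    """Uniformly random distinct edges (no self-loops) on num_vertices vertices."""
    rng = random.Random(seed)
    edges = set()
    while len(edges) < num_edges:
        a, b = rng.sample(range(num_vertices), 2)
        edges.add((min(a, b), max(a, b)))
    return sorted(edges)


def mean_index_distance(edges: List[Edge]) -> float:
    """Average |u - v| over all edges; small values mean neighbors are stored close together."""
    return sum(abs(u - v) for u, v in edges) / len(edges)


n = 32
mesh = grid_edges(n)
rand = random_edges(n * n, len(mesh))
print(f"grid mesh:    {mean_index_distance(mesh):.1f}")
print(f"random graph: {mean_index_distance(rand):.1f}")
```

With row-major numbering, no mesh edge spans more than `n` positions. Edges of the random graph span about a third of the whole vertex range on average, so nearly every neighbor access lands somewhere unrelated in memory.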
Computation and communication patterns differ
[Figure legend. Black = time spent computing; Green = time spent communicating; White = time spent waiting for data to be communicated.]
§ The U.S. road map, which has spatial locality and is thus the most similar of the three in structure to the computational patterns that would arise in typical physical simulations.
§ The Erdős-Rényi graph, a well-studied example in graph theory work.
§ A scale-free graph, an example more reflective of real-world networks.
Figure from Leland et al., courtesy of Johnson, PNNL.
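The gap between the Erdős-Rényi and scale-free cases can be seen in their degree distributions. Scale-free graphs contain hubs. One plausible reading of the figure is that in a distributed computation, whoever owns a hub does far more work and communication while the others wait. The following sketch generates both kinds of graph with about the same number of edges and compares maximum with mean degree. It uses Barabási–Albert preferential attachment as the scale-free model; the figure's actual generator is not stated. All parameters are hypothetical.

```python
import random
from collections import Counter
from typing import List, Tuple

Edge = Tuple[int, int]


def erdos_renyi(n: int, p: float, seed: int = 0) -> List[Edge]:
    """G(n, p): each of the n(n-1)/2 possible edges appears independently with probability p."""
    rng = random.Random(seed)
    return [(i, j) for i in range(n) for j in range(i + 1, n) if rng.random() < p]


def barabasi_albert(n: int, m: int, seed: int = 0) -> List[Edge]:
    """Preferential attachment: each new vertex links to m distinct earlier vertices,
    chosen with probability proportional to their current degree."""
    rng = random.Random(seed)
    edges = [(i, j) for i in range(m + 1) for j in range(i + 1, m + 1)]  # seed clique
    pool = [v for e in edges for v in e]  # each vertex listed once per incident edge
    for new in range(m + 1, n):
        targets = set()
        while len(targets) < m:
            targets.add(rng.choice(pool))  # degree-proportional choice
        for t in targets:
            edges.append((t, new))
            pool.extend((t, new))
    return edges


def degree_summary(n: int, edges: List[Edge]) -> Tuple[int, float]:
    """Return (maximum degree, mean degree)."""
    deg = Counter(v for e in edges for v in e)
    return max(deg.values()), 2 * len(edges) / n


n, m = 1000, 3
scale_free = barabasi_albert(n, m)
# Choose p so the random graph has about as many edges as the scale-free one.
uniform = erdos_renyi(n, len(scale_free) / (n * (n - 1) / 2))
print("Erdos-Renyi (max, mean degree):", degree_summary(n, uniform))
print("scale-free  (max, mean degree):", degree_summary(n, scale_free))
```

The two graphs have nearly the same mean degree. The scale-free graph's maximum degree is much larger, and that per-vertex imbalance is what a static partitioning struggles with.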
Memory performance demands differ
A key differentiator in the performance of simulation and analytics
[Figure: benchmarks plotted in simulation and analytics regions. Area of the circle = relative data intensiveness (i.e., total amount of unique data accessed over a fixed interval of instructions).]
Standard benchmarks include:
§ LINPACK (smallest data intensiveness; barely visible on graph)
§ STREAM
§ SPECfp
§ SPECint
Figure from Murphy & Kogge, with the radius of the LINPACK data point doubled to make it visible.
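The caption's definition of data intensiveness can be sketched directly. Take an address trace, cut it into fixed-size windows, and count the unique data touched in each window. This is a simplified stand-in for the Murphy & Kogge measurement. It assumes one memory reference per instruction, 64-byte cache lines, and two hypothetical traces: a unit-stride sweep and random lookups.

```python
import random
from typing import List


def data_intensiveness(trace: List[int], window: int, line_bytes: int = 64) -> float:
    """Mean number of unique bytes (rounded up to whole cache lines) touched per
    window of `window` consecutive memory references."""
    totals = []
    for start in range(0, len(trace) - window + 1, window):
        lines = {addr // line_bytes for addr in trace[start:start + window]}
        totals.append(len(lines) * line_bytes)
    return sum(totals) / len(totals)


WORDS = 1 << 20  # an 8 MiB array of 8-byte words (hypothetical working set)
N = 4096         # references per trace

sequential = [8 * i for i in range(N)]  # unit-stride sweep, STREAM/stencil-like
rng = random.Random(1)
gather = [8 * rng.randrange(WORDS) for _ in range(N)]  # random lookups, graph-like

print(data_intensiveness(sequential, window=1024))
print(data_intensiveness(gather, window=1024))
```

In the sweep, eight consecutive references share each 64-byte line, so a window of 1024 references touches only 8 KiB. The random lookups touch almost a new line on every reference, roughly eight times more unique data per window.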
Application code characteristics differ
Contrasting properties:

| Application code property | Simulation | Analytics |
|---|---|---|
| Spatial locality | High | Low |
| Temporal locality | Moderate | Low |
| Memory footprint | Moderate | High |
| Computation type | May be floating-point dominated* | Integer intensive |
| Input-output orientation | Output dominated | Input dominated |

*Increasingly, simulation work has become less floating-point dominated.
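The locality rows of the table can be illustrated with a cache simulation. The following sketch uses a deliberately simplified cache: fully associative, LRU replacement, 512 lines of 64 bytes, none of which are taken from the talk. It feeds the cache two hypothetical traces: a three-point stencil sweep, typical of simulation, and random gathers, typical of analytics.

```python
import random
from collections import OrderedDict
from typing import Iterable


def lru_hit_rate(trace: Iterable[int], num_lines: int = 512, line_bytes: int = 64) -> float:
    """Hit rate of a fully associative LRU cache for a trace of byte addresses."""
    cache: "OrderedDict[int, None]" = OrderedDict()
    hits = total = 0
    for addr in trace:
        line = addr // line_bytes
        total += 1
        if line in cache:
            hits += 1
            cache.move_to_end(line)  # mark as most recently used
        else:
            cache[line] = None
            if len(cache) > num_lines:
                cache.popitem(last=False)  # evict the least recently used line
    return hits / total


M = 1 << 16  # words in the simulated array (8-byte words)
# Three-point stencil: for each interior point read x[i-1], x[i], x[i+1].
stencil = [8 * j for i in range(1, M - 1) for j in (i - 1, i, i + 1)]
rng = random.Random(2)
# Same number of references, scattered uniformly over a 1 Mi-word (8 MiB) array.
gather = [8 * rng.randrange(1 << 20) for _ in range(len(stencil))]

print(f"stencil hit rate: {lru_hit_rate(stencil):.3f}")
print(f"gather hit rate:  {lru_hit_rate(gather):.3f}")
```

The stencil benefits from both kinds of locality. Neighboring words share a cache line (spatial), and each word is re-read in the next two iterations (temporal), so it misses only on first touching each line. The random gather almost never finds its line in cache.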
Q2: So what do we really mean by “increasing coherence” between simulation and analytics?
§ NOT one system ostensibly optimized for both simulation and analytics
§ Greater commonality in underlying componentry and design principles
§ Greater interoperability, allowing interleaving of both types of computations
…A more common hardware and software roadmap between simulation and analytics
And yet, there is hope…
Simulation and analytics are evolving to become more similar in their architectural needs
§ Current challenges for the LSDA community
  § Data movement
  § Power consumption
  § Memory/interconnect bandwidth
  § Scaling efficiency
…similar to HPC simulation
§ Instruction mix for Sandia’s HPC engineering codes
  § Memory operations: 40%
  § Integer operations: 40%
  § Floating point: 10%
  § Other: 10%
…similar to LSDA
§ Common design impacts of energy cost trends
  § Increased concurrency (processing threads, cores, memory depth)
  § Increased complexity and burden on system software, languages, tools, runtime support, and codes
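One rough consequence of the instruction mix for Sandia's HPC engineering codes can be worked out with Amdahl's law. This rests on a simplifying assumption not made in the talk: every instruction class takes the same time per instruction, so the 10% floating-point share is also 10% of run time. Speeding up only floating-point work by a factor s then gives

```latex
S(s) = \frac{1}{(1 - f) + f/s}, \qquad f = 0.10
\qquad\Longrightarrow\qquad
\lim_{s \to \infty} S(s) = \frac{1}{0.90} \approx 1.11
```

Under that assumption, even infinitely fast floating-point hardware would cut run time by only 10%. The pace is set by the 80% of instructions that are memory or integer operations, the same kinds that dominate analytics.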
Energy cost of moving data is becoming dominant
[Figure: Energy cost for various common operations. Vertical axis: energy cost, in picojoules (pJ), per 64-bit floating-point operation. Series: cost estimates for technology year.]
From Dan McMorrow, Technical Challenges of Exascale Computing, JSR-12-310, JASON, MITRE Corporation, April 2013.
Emerging architectural and system software synergies
Similar needs:

| Architectural characteristic | Simulation | Analytics |
|---|---|---|
| Computation | Memory address generation dominated | Same |
| Primary memory | Low power, high bandwidth, semi-random access | Same |
| Secondary memory | Emerging technologies may offset cost, allowing much more memory | …require extremely large memory spaces |
| Storage | Integration of another layer of memory hierarchy to support checkpoint/restart | …to support out-of-core data set access |
| Interconnect technology | High bisection bandwidth (for relatively coarse-grained access) | …(for fine-grained access) |
| System software (node-level) | Low dependence on system services; increasingly adaptive resource management for structured parallelism | …highly adaptive resource management for unstructured parallelism |
| System software (system-level) | Increasingly irregular workflows | Irregular workflows |
Q3: How might coherence be furthered in practice?
§ Making it an element of national strategy
  § ✓ Via the NSCI
§ Building this into exascale computing efforts
  § Also a component of the NSCI
§ Communicating with and enlisting the technical communities concerned
  § This forum and similar events
§ Further developing the vision
  § Today’s dialogue session!
Acknowledgements
Additional references
§ The Economist, “Data, Data, Everywhere,” Feb 25th, 2010.
§ R. C. Murphy and P. M. Kogge, “On the Memory Access Patterns of Supercomputer Applications: Benchmark Selection and Its Implications,” IEEE Transactions on Computers 56 (7, July 2007): 937–945.
§ R. Murphy, “Power Issues,” presentation to JASON 2012, June 2012.
§ Peter Kogge (editor) et al., ExaScale Computing Study: Technology Challenges in Achieving Exascale Systems, DARPA, 2008.
§ Dan McMorrow, Technical Challenges of Exascale Computing, JSR-12-310, JASON, MITRE Corporation, April 2013.
§ Tony Hey, Stewart Tansley, and Kristin Tolle (editors), The Fourth Paradigm: Data-Intensive Scientific Discovery, Microsoft Research, 2009.
§ Jim Gray, The Fourth Paradigm: Data-Intensive Scientific Discovery.