
Post on 11-Apr-2018


Architecture and Programming Model for Interactive Real Time Computing

Jack Dennis, Arvind: MIT-CSAIL
Xiaoming Li, Lian-Ping Wang, Guang Gao: University of Delaware

Supported by AFOSR Grant FA9550-13-1-0213
Program Manager: Dr. Frederica Darema

Project Goal

Demonstrate the effectiveness of Fresh Breeze technology for DDDAS (Dynamic Data Driven Application Systems)

Motivating DDDAS Applications

Domain application: turbulent particle-laden flows

• Turbulent flows laden with solid particles or liquid droplets in a complex geometry: flow drag, device erosion and damage, visibility, etc.
• Multiple scales and coupled multi-physical processes
  - Varying scales: physics at fluid-particle interfaces → particle-particle interactions → system scale (particle distribution, turbulence modulation by particles)
  - Particle-wall interactions: turbulence and drag modulation, erosion
• Related application systems
  - High-speed gas turbine combustor
  - Operation of aircraft and engines in polluted environments
  - Explosions in military operations
  - Treatment of particulate nuclear waste

Environmental applications: sediment transport, rain formation, air-sea interactions

The mesoscopic simulation system of multiphase turbulent flows (based on the kinetic Boltzmann equation)

Physical considerations

Interface and particle scales:
• No-slip boundary condition
• Hydrodynamic force / torque
• Viscous boundary layer
• Vortex shedding and wakes

Channel domain scale:
• Turbulent boundary layer with solid particles
• Distribution of particles
• Impact of particles on the drag force at the channel wall
• Effects of particle size, particle inertia, and sedimentation

[Figure: local flow vorticity and particle positions on a 2D slice]

Computational approach and challenges:
• Resolving multi-scale physics
• Scalable computation, visualization, and analysis

Wang's group at UD pioneers this new approach. Wang will host the International Conference for Mesoscopic Methods in Engineering and Science (ICMMES) in 2018 at UD.

The measurement system: plenoptic (light field) particle tracking velocimetry
(with Jingyi Yu at UD; a CRI (CISE Research Infrastructure) project recently funded by NSF)

• A plenoptic camera is a single-shot, multi-view acquisition device.
• It allows 3D reconstruction of particles via stereo matching.
• It robustly handles heavy occlusions.
• Two plenoptic cameras with color pattern-coded particles allow observation of position, translational velocity, and rotational velocity.

Measurements at the particle scale:
• Translational and rotational accelerations
• RMS fluctuations
• Particle-particle and particle-wall interactions

Measurements at the channel scale:
• Mean velocity profiles of both phases
• Turbulent statistics of both phases
• Particle distributions

Particulate Multiphase Flow as a DDDAS

[Figure: DDDAS loop for particulate multiphase flow. Labeled elements: Sensors; the channel and the light field camera; PTV Views 1-4; Distribute; Merge; DNS simulator with different input data; Simulation; interface-resolved DNS; Fresh Breeze system architecture; Reduced model: D = f(Re, dp/H, ϕ, …), coarse-grained models; Interactive visualization; Comparison, analysis, decision, and control; Applications; Key questions. Annotations: "Wide parameter ranges but limited instrument resolution and incomplete data" and "Limited parameter ranges but complete space/time 4D data".]

Particulate Multiphase Flow as a DDDAS: data flow

[Figure: data flow. Labeled elements: light field camera; Input data: particle size, volume fraction, density ratio, grid resolution, etc.; Coding choices: numerical method, computer algorithm, implementation; Results data: fluid flow data, particle data, reference single-phase flow data.]

Mahali Space Weather Monitoring

Fluctuations in propagation of GPS signals are used to measure electron density in the ionosphere.

[Figure: Satellites 1-2 and Receivers 1-4. Panel A) Normal Atmospheric Condition]

Receivers need to be redirected in real time (pointing angle, satellite signal band) according to satellite motion, time of day, and analysis of received data.

Mahali Data Collection

[Figure: Mahali data collection. Labeled elements: Receivers 0-7; Global Data Network; Data Processing System.]

DDDAS Requirements

• Real-time interaction with sensors and effectors: hence input/output facilities capable of rapid response to large numbers of independent events.
• Ability to execute intense processing functions to react appropriately to a changing situation.
• These requirements imply the need for effective dynamic management of memory and processing resources.
• Data streams are the natural programming interface to input/output devices.

DDDAS Development Support

• The funJava programming language for modular programming of DDDAS.
• The Fresh Breeze architecture for parallel computing with fine-grain execution of myriad codelets.
• The Kiva system simulator, capable of cycle-accurate simulation of systems with thousands of components.
• The Fresh Breeze compiler for generating codelets for highly parallel computation from funJava programs.

funJava: A Functional Programming Language for DDDAS

• A language in which all forms of parallelism are readily expressed: expression parallel, data parallel, producer-consumer, and transaction processing.
• A high-level programming language in which data streams are first-class data objects.
• Retains the type security feature of the Java language.

Data Uniformity: Trees of Chunks

[Figure: a root chunk and data chunks (e.g. 128 bytes each)]

• Cycle-free heap
• Arrays as trees of chunks
• Stream as a chain of chunks
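A minimal Python sketch of how an array might be laid out as a tree of fixed-size chunks. The slide gives only the example chunk size of 128 bytes; the fan-out of 16 slots per chunk, the immutable-tuple representation, and the names `SLOTS`, `build_tree`, and `get` are assumptions made for illustration, not the Fresh Breeze implementation.

```python
SLOTS = 16  # assumption: a 128-byte chunk holds 16 eight-byte slots


def build_tree(values):
    """Pack values into leaf chunks, then group chunks until one root remains.

    Chunks are modelled as immutable tuples, so a chunk can only refer to
    chunks that already exist and the structure cannot contain a cycle.
    Returns (root_chunk, depth).
    """
    level = [tuple(values[i:i + SLOTS]) for i in range(0, len(values), SLOTS)] or [()]
    depth = 1
    while len(level) > 1:
        level = [tuple(level[i:i + SLOTS]) for i in range(0, len(level), SLOTS)]
        depth += 1
    return level[0], depth


def get(root, depth, index):
    """Follow one slot per level from the root to the leaf holding `index`."""
    node = root
    for level in range(depth - 1, 0, -1):
        stride = SLOTS ** level      # elements covered by one slot at this level
        node = node[index // stride]
        index %= stride
    return node[index]


data = list(range(1000))
root, depth = build_tree(data)
print(depth, get(root, depth, 777))
```

With 1000 elements and a fan-out of 16, the sketch needs three levels, so any element is reached in three slot lookups.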

Application Composability: Codelet

• A block of instructions scheduled for execution when needed data objects are available.
• Results are made available to successor codelets.
• Data objects are trees of chunks.

[Figure: a Codelet with data objects A and B]
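The firing rule for codelets (run when the needed data objects are available, then hand results to successors) can be sketched in a few lines of Python. This is a sequential software illustration under stated assumptions: the `Codelet` class, the `run` function, and the object names are hypothetical, and Fresh Breeze performs this scheduling in its own task scheduler rather than in a loop like this one.

```python
class Codelet:
    """A block of work with named input objects and one named output object."""

    def __init__(self, name, inputs, output, fn):
        self.name = name
        self.inputs = inputs
        self.output = output
        self.fn = fn


def run(codelets, available):
    """Fire each codelet once all of its input objects are available."""
    objects = dict(available)
    pending = list(codelets)
    order = []
    progress = True
    while pending and progress:
        progress = False
        for c in list(pending):
            if all(name in objects for name in c.inputs):
                # The result becomes an object that successor codelets can use.
                objects[c.output] = c.fn(*(objects[n] for n in c.inputs))
                pending.remove(c)
                order.append(c.name)
                progress = True
    return objects, order


codelets = [
    Codelet("add", ["sq_a", "sq_b"], "result", lambda x, y: x + y),
    Codelet("square_a", ["A"], "sq_a", lambda a: a * a),
    Codelet("square_b", ["B"], "sq_b", lambda b: b * b),
]
objects, order = run(codelets, {"A": 3, "B": 4})
print(order, objects["result"])
```

Although `add` is listed first, it fires last, because its inputs only exist after the two squaring codelets have produced them.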

Illustration: Non-deterministic Stream Processing

[Figure: labeled elements: Timer; GPS Signal Strength (not synchronized); Filtering & Analysis; GPS Receiver Control.]

funJava: natively supports non-deterministic stream merging.
vs.
Synchronous stream processing: potential timing hazards.
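A rough Python analogue of a non-deterministic merge, using threads and a queue from the standard library. It is not funJava and says nothing about how funJava implements merging; the function name `merge` and the sentinel `_DONE` are hypothetical. The point it illustrates is that items from unsynchronized sources are delivered in arrival order, while the order within each source is preserved.

```python
import queue
import threading

_DONE = object()  # sentinel marking the end of one input stream


def merge(*streams):
    """Yield items from several iterables in arrival order."""
    q = queue.Queue()

    def pump(stream):
        for item in stream:
            q.put(item)
        q.put(_DONE)

    for s in streams:
        threading.Thread(target=pump, args=(s,), daemon=True).start()

    remaining = len(streams)
    while remaining:
        item = q.get()
        if item is _DONE:
            remaining -= 1
        else:
            yield item


timer = (("tick", i) for i in range(3))
gps = (("signal", i) for i in range(5))
for source, value in merge(timer, gps):
    print(source, value)
```

The interleaving of `tick` and `signal` items can differ from run to run, which is the non-determinism the slide refers to; a consumer that instead waited for one item from each source in lockstep would stall whenever one source fell behind.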

Advantages of Fresh Breeze Stream Processing

• First instance of support for high-level modular programming with streams.
• Fine-grain concurrency without timing hazards.
• High performance: pipeline processing of ten million stream elements per second.

A DDDAS Application Implemented in Fresh Breeze

Mahali Space Weather Monitoring

Fluctuations in propagation of GPS signals are used to measure electron density in the ionosphere.

[Figure: Satellites 1-2 and Receivers 1-4. Panel A) Normal Atmospheric Condition]

Receivers are redirected (pointing angle, satellite signal band) according to satellite motion, time of day, and analysis of received data.

Mahali in Stream Processing Representation

[Figure: stream processing graph. Labeled elements: GPS Receiver 0, 1, …, n; Tag 0, 1, …, n; Stream Merge; Timer; Filter & Analysis; Select 0, 1, …, n. Stream types on the edges: Stream<Data>, Stream<Tagged Data>, Stream<Tagged Command>, Stream<Command>.]
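One plausible reading of the figure labels is a pipeline in which each receiver's data stream is tagged, the tagged streams are merged, a filter and analysis stage emits tagged commands, and a per-receiver select stage strips the tag. The Python sketch below follows that reading. The wiring is inferred from the labels, the Timer input is omitted, the merge is a fixed round-robin stand-in for a non-deterministic merge, and the threshold rule and `"retune"` command are invented for illustration.

```python
from itertools import chain, zip_longest


def tag(receiver_id, stream):
    """Tag k: turn Stream<Data> into Stream<Tagged Data>."""
    return ((receiver_id, sample) for sample in stream)


def stream_merge(*tagged_streams):
    """Stand-in for Stream Merge: a fixed round-robin interleaving."""
    missing = object()
    rows = zip_longest(*tagged_streams, fillvalue=missing)
    return (item for item in chain.from_iterable(rows) if item is not missing)


def filter_and_analysis(tagged_data, threshold=0.5):
    """Hypothetical rule: a weak signal yields a command for that receiver."""
    for receiver_id, strength in tagged_data:
        if strength < threshold:
            yield (receiver_id, "retune")


def select(receiver_id, tagged_commands):
    """Select k: keep commands addressed to receiver k and drop the tag."""
    return [cmd for rid, cmd in tagged_commands if rid == receiver_id]


readings = {0: [0.9, 0.2], 1: [0.4, 0.8, 0.1]}
tagged = [tag(rid, data) for rid, data in readings.items()]
commands = list(filter_and_analysis(stream_merge(*tagged)))
per_receiver = {rid: select(rid, commands) for rid in readings}
print(per_receiver)
```

Tagging is what lets a single analysis stage serve every receiver: the tag travels with each sample through the merge and is used again to route each command back to the receiver it concerns.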

A Fresh Breeze Mahali Simulation

[Figure: simulated system. Labeled elements: IO Processors 0-3; Memory Units 0-3; Receivers 0-7; a 4x4 packet routing network (commands); a 4x4 packet routing network (responses).]

• IO Processors are Fresh Breeze Processing Units with capabilities for communicating with GPS receivers.
• Each IO Processor has IO ports for two GPS receivers.
• The Load Balancer and Task Schedulers are not shown.

A Mahali Scenario

• Kiva simulation runs have been performed for systems with four, eight, and sixteen processors.
• The simulations confirm the ability of each processor to handle input data at rates up to 6 GB/s or 50 M packets per second.
• The simulations demonstrate the ability to perform real-time interactions together with analytic computation.
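If the two rate figures describe the same traffic, they imply a packet size and a per-packet time budget. The short calculation below works this out, assuming 1 GB = 10^9 bytes; the variable names are illustrative.

```python
BYTES_PER_SECOND = 6 * 10**9      # 6 GB/s, assuming 1 GB = 10**9 bytes
PACKETS_PER_SECOND = 50 * 10**6   # 50 M packets per second

bytes_per_packet = BYTES_PER_SECOND / PACKETS_PER_SECOND
ns_per_packet = 10**9 / PACKETS_PER_SECOND

print(bytes_per_packet, "bytes per packet")
print(ns_per_packet, "ns per packet")
```

Under that assumption each packet carries 120 bytes, which is of the same order as the 128-byte example chunk size, and each processor has 20 ns on average to accept a packet.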

funJava Matrix Multiply

Multiplication of square matrices

Matrix Size    Number of Tasks
16             1,558
32             6,812
64             34,600
128            202,304
256            1,330,288
512            10,563,804

For each matrix size the computation is run on nine configurations ranging from one processor to 256 processors. These simulation runs stressed the Kiva simulator for target systems with as many as 64,515 Kiva components.

[Figure: speedup (log scale, 1.0 to 100.0) versus number of processors (1, 2, 4, 8, 16, 32, 64, 128, 256) for matrix sizes 16, 32, 64, 128, 256, and 512.]

No manual or runtime code optimization. No system adaptation.
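The task counts in the table can be examined with a small calculation of how much the count grows each time the matrix size doubles. The numbers are taken from the table; the interpretation in the note that follows is an observation about those numbers, not a statement from the slides.

```python
# Number of tasks per matrix size, copied from the table.
tasks = {
    16: 1_558,
    32: 6_812,
    64: 34_600,
    128: 202_304,
    256: 1_330_288,
    512: 10_563_804,
}

sizes = sorted(tasks)
# Growth factor in task count when the matrix size doubles from n/2 to n.
growth = {n: tasks[n] / tasks[n // 2] for n in sizes[1:]}

for n, factor in growth.items():
    print(f"{n // 2:>3} -> {n:>3}: x{factor:.2f}")
```

The factor rises from about 4.4 to about 7.9. Doubling the size multiplies an n² quantity by 4 and an n³ quantity by 8, so the trend is consistent with a cubic term dominating the task count at the larger sizes.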

Fresh Breeze Benefits for DDDAS

• Hardware support for fine-grain tasking allows computation to be distributed over many processing cores.
• Direct communication with IO devices in support of low-latency real-time interaction.
• Low energy: no runtime software overhead; no cost for cache consistency.
• Expression of DDDAS in a high-level programming language supporting component-based software.
• High performance: pipeline processing of ten million stream elements per second.

Achievements

• funJava: a high-level functional programming language with stream processing and real-time IO.
• Fresh Breeze architecture for efficient execution of DDDAS.
• Incorporation of task scheduling and load balancing for 500 or more processing cores.
• Demonstration of linear speedup to over 500 processing cores.
• A codelet compiler that generates high-performance code without manual tuning.
• Demonstration of real-time interactive data collection derived from Mahali space weather monitoring.
• Parallel multiscale mesoscopic simulations of multiphase flows on thousands of processors.
• Light field reconstruction of multiphase flows at various scales.

Publications

Published:

• Jack B. Dennis. IFIP Working Group 2.8 Functional Programming.
  October 14-18, 2013 Meeting, Aussois, France
  August 10-15, 2014 Meeting, Estes Park, Colorado
  May 25-29, 2015, Kefalonia, Greece
• Wang L-P, Peng C, Guo ZL, Yu ZS, 2016, Lattice Boltzmann Simulation of Particle-Laden Turbulent Channel Flow, Computers & Fluids, 124:226-236.
• Wang L-P, Ardila OGC, Ayala O, Gao H, Peng C, 2016, Study of local turbulence profiles relative to the particle surface in particle-laden turbulent flows, ASME J. of Fluids Engr., 138:041203.
• Wang L-P, Peng C, Guo ZL, Yu ZS, 2016, Flow Modulation by Finite-Size Neutrally Buoyant Particles in a Turbulent Channel Flow, ASME J. of Fluids Engr., 138:041103.
• Yu ZS, Lin ZW, Shao XM, Wang L-P, A parallel fictitious domain method for the interface-resolved simulation of particle-laden flows and its application to the turbulent channel flow, Engr. Appl. Comput. Fluid Mech., 10:160-170.
• Peng C, Teng Y, Hwang B, Guo ZL, Wang L-P, 2016, Implementation issues and benchmarking of lattice Boltzmann method for moving particle simulations in a viscous flow, Computers & Mathematics with Applications, doi:10.1016/j.camwa.2015.08.027.
• Zong Y, Guo ZL, Wang L-P, 2015, Designing Correct Fluid Hydrodynamics on A Rectangular Grid using MRT Lattice Boltzmann Approach, Computers & Mathematics with Applications, doi:10.1016/j.camwa.2015.05.021.
• Lin ZW, Shao XM, Yu ZS, Wang L-P, 2016, Effects of finite-size heavy particles on the turbulent flows in a square duct, J. Hydrodynamics, accepted.

Submitted:

• Chen SY, Peng C, Teng YH, Wang L-P, 2015, Improving lattice Boltzmann simulation of moving particles in a viscous flow using local grid refinement, Computers & Fluids.
• Lin ZW, Shao XM, Yu ZS, Wang L-P, 2015, Effects of finite-size neutrally buoyant particles on the turbulent flows in a square duct, J. Fluid Mech.
• Lin ZW, Shao XM, Yu ZS, Wang L-P, 2015, Effects of particle inertia on the interactions between the turbulent channel flow and the finite-size particles, Physics of Fluids.
• Min HD, Peng C, Guo ZL, Wang L-P, 2016, An inverse design analysis of mesoscopic implementation of non-uniform forcing in MRT lattice Boltzmann models, Computers & Mathematics with Applications.
• Peng C, Guo ZL, Wang L-P, 2016, A lattice-BGK model for the Navier-Stokes equations based on a rectangular grid, Computers & Mathematics with Applications.
• Peng C, Min HD, Guo ZL, Wang L-P, 2016, A hydrodynamically-consistent MRT lattice Boltzmann model on a 2D rectangular grid, J. Comp. Phys.
• Wang P, Guo ZL, Xu K, Wang L-P, 2016, A comparative study of DUGKS and LBE methods for laminar flows and decaying homogeneous isotropic turbulence flows, Phys. Rev. E.
• Bo YT, Wang P, Guo ZL, Wang L-P, 2016, Parallel implementation and validation of DUGKS for three-dimensional Taylor-Green vortex flow and turbulent channel flow, Computers & Mathematics with Applications.
• Wang L-P, Min HD, Peng C, Geneva N, Guo ZL, 2016, A lattice-Boltzmann scheme of the Navier-Stokes equation on a three-dimensional cuboid lattice, Computers & Fluids.

Further Work

• Develop a complete simulation model for a typical fielded Mahali data collection network.
• Extend the Fresh Breeze simulation to model a system with multiple multi-core nodes and a shared DRAM memory.
• Implement garbage collection.
• Extend the expressive power of funJava and the Fresh Breeze compiler.
• Evaluate the energy efficiency of Fresh Breeze systems.
• Study other DDDAS to assess any limitations of our approach.
