Big Data Meets HPC – Exploiting HPC Technologies for Accelerating Big Data Processing
TRANSCRIPT
Dhabaleswar K. (DK) Panda, The Ohio State University
E-mail: [email protected]
http://www.cse.ohio-state.edu/~panda
Keynote Talk at HPCAC-Switzerland (Mar 2016)
HPCAC-Switzerland (Mar '16) – Network Based Computing Laboratory
Introduction to Big Data Applications and Analytics
• Big Data has become one of the most important elements of business analytics
• Provides groundbreaking opportunities for enterprise information management and decision making
• The amount of data is exploding; companies are capturing and digitizing more information than ever
• The rate of information growth appears to be exceeding Moore's Law
4V Characteristics of Big Data
• Commonly accepted 3 V's of Big Data: Volume, Velocity, Variety
  (Michael Stonebraker: Big Data Means at Least Three Different Things, http://www.nist.gov/itl/ssd/is/upload/NIST-stonebraker.pdf)
• 4/5 V's of Big Data – 3 V's + Veracity, Value
Courtesy: http://api.ning.com/files/tRHkwQN7s-Xz5cxylXG004GLGJdjoPd6bVfVBwvgu*F5MwDDUCiHHdmBW-JTEz0cfJjGurJucBMTkIUNdL3jcZT8IPfNWfN9/dv1.jpg
Data Generation in Internet Services and Applications
• Web pages (content, graph)
• Clicks (ad, page, social)
• Users (OpenID, FB Connect, etc.)
• E-mails (Hotmail, Y! Mail, Gmail, etc.)
• Photos, movies (Flickr, YouTube, Video, etc.)
• Cookies / tracking info (see Ghostery)
• Installed apps (Android Market, App Store, etc.)
• Location (Latitude, Loopt, Foursquare, Google Now, etc.)
• User-generated content (Wikipedia & co, etc.)
• Ads (display, text, DoubleClick, Yahoo, etc.)
• Comments (Discuss, Facebook, etc.)
• Reviews (Yelp, Y! Local, etc.)
• Social connections (LinkedIn, Facebook, etc.)
• Purchase decisions (Netflix, Amazon, etc.)
• Instant messages (YIM, Skype, GTalk, etc.)
• Search terms (Google, Bing, etc.)
• News articles (BBC, NY Times, Y! News, etc.)
• Blog posts (Tumblr, WordPress, etc.)
• Microblogs (Twitter, Jaiku, Meme, etc.)
• Link sharing (Facebook, Delicious, Buzz, etc.)
Number of apps in the Apple App Store, Android Market, Blackberry, and Windows Phone (2013):
• Android Market: <1200K
• Apple App Store: ~1000K
Courtesy: http://dazeinfo.com/2014/07/10/apple-inc-aapl-ios-google-inc-goog-android-growth-mobile-ecosystem-2014/
Velocity of Big Data – How Much Data Is Generated Every Minute on the Internet?
The global Internet population grew 18.5% from 2013 to 2015 and now represents 3.2 billion people.
Courtesy: https://www.domo.com/blog/2015/08/data-never-sleeps-3-0/
Not Only in Internet Services – Big Data in Scientific Domains
• Scientific data management, analysis, and visualization
• Application examples
  – Climate modeling
  – Combustion
  – Fusion
  – Astrophysics
  – Bioinformatics
• Data-intensive tasks
  – Run large-scale simulations on supercomputers
  – Dump data on parallel storage systems
  – Collect experimental / observational data
  – Move experimental / observational data to analysis sites
  – Visual analytics – help understand data visually
Typical Solutions or Architectures for Big Data Analytics
• Hadoop: http://hadoop.apache.org
  – The most popular framework for Big Data analytics
  – HDFS, MapReduce, HBase, RPC, Hive, Pig, ZooKeeper, Mahout, etc.
• Spark: http://spark-project.org
  – Provides primitives for in-memory cluster computing; jobs can load data into memory and query it repeatedly
• Storm: http://storm-project.net
  – A distributed real-time computation system for real-time analytics, online machine learning, continuous computation, etc.
• S4: http://incubator.apache.org/s4
  – A distributed system for processing continuous unbounded streams of data
• GraphLab: http://graphlab.org
  – Consists of a core C++ GraphLab API and a collection of high-performance machine learning and data mining toolkits built on top of the GraphLab API
• Web 2.0: RDBMS + Memcached (http://memcached.org)
  – Memcached: a high-performance, distributed memory object caching system
Big Data Processing with Hadoop Components
• Major components included in this tutorial:
  – MapReduce (Batch)
  – HBase (Query)
  – HDFS (Storage)
  – RPC (Inter-process communication)
• Underlying Hadoop Distributed File System (HDFS) used by both MapReduce and HBase
• Model scales, but the high amount of communication during intermediate phases can be further optimized
[Figure: Hadoop framework stack – user applications over MapReduce and HBase, over Hadoop Common (RPC), over HDFS]
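To make the three MapReduce phases concrete, here is a minimal, framework-free sketch of the model in plain Python (no Hadoop involved); the shuffle step in the middle is the communication-heavy phase the talk targets for optimization.

```python
from collections import defaultdict

def map_phase(records):
    # Map: emit (word, 1) pairs, as a word-count Mapper would
    for line in records:
        for word in line.split():
            yield (word, 1)

def shuffle_phase(pairs):
    # Shuffle: group values by key -- in a real cluster this is the
    # all-to-all network transfer between map and reduce tasks
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups):
    # Reduce: aggregate each key's values
    return {key: sum(values) for key, values in groups.items()}

records = ["big data meets hpc", "big data processing"]
counts = reduce_phase(shuffle_phase(map_phase(records)))
print(counts["big"])   # 2
print(counts["hpc"])   # 1
```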
Spark Architecture Overview
• An in-memory data-processing framework
  – Iterative machine learning jobs
  – Interactive data analytics
  – Scala-based implementation
  – Standalone, YARN, Mesos
• Scalable and communication intensive
  – Wide dependencies between Resilient Distributed Datasets (RDDs)
  – MapReduce-like shuffle operations to repartition RDDs
  – Sockets-based communication
[Figure: SparkContext on the master coordinates worker nodes; HDFS driver and ZooKeeper support the cluster]
http://spark.apache.org
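The "load data into memory and query it repeatedly" idea can be sketched with a toy lazy-evaluation class. This is not Spark's API, just an illustration of lazy lineage plus an in-memory cache (all names here are invented for the example).

```python
class ToyRDD:
    """Toy stand-in for an RDD: a lazy transformation chain whose
    result can be cached in memory and queried repeatedly."""

    def __init__(self, compute):
        self._compute = compute      # lineage: how to (re)build the data
        self._cache = None           # in-memory copy after .cache()

    def map(self, fn):
        return ToyRDD(lambda: [fn(x) for x in self._materialize()])

    def filter(self, pred):
        return ToyRDD(lambda: [x for x in self._materialize() if pred(x)])

    def cache(self):
        self._cache = self._materialize()
        return self

    def _materialize(self):
        return self._cache if self._cache is not None else self._compute()

    def collect(self):
        return self._materialize()

# Iterative jobs re-read 'data' from memory instead of recomputing it
data = ToyRDD(lambda: list(range(10))).map(lambda x: x * x).cache()
evens = data.filter(lambda x: x % 2 == 0).collect()
print(evens)  # [0, 4, 16, 36, 64]
```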
Memcached Architecture
• Distributed caching layer
  – Allows aggregating spare memory from multiple nodes
  – General purpose
• Typically used to cache database queries and results of API calls
• Scalable model, but typical usage is very network intensive
[Figure: many nodes (main memory, CPUs, SSD, HDD) connected by high-performance networks]
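The way a Memcached client aggregates memory across nodes is client-side key sharding. A minimal sketch, assuming a hypothetical three-server pool (real clients typically use consistent hashing rather than the plain modular hash shown here):

```python
import hashlib

# Hypothetical server names; real deployments list host:port pairs
SERVERS = ["cache01", "cache02", "cache03"]

def server_for(key):
    # Clients shard keys across servers, so the cluster's spare
    # memory behaves like one large cache
    digest = hashlib.md5(key.encode()).hexdigest()
    return SERVERS[int(digest, 16) % len(SERVERS)]

# Each node contributes its memory as one shard of the cache
stores = {s: {} for s in SERVERS}

def cache_set(key, value):
    stores[server_for(key)][key] = value

def cache_get(key):
    return stores[server_for(key)].get(key)

cache_set("user:42", {"name": "Ada"})
assert cache_get("user:42") == {"name": "Ada"}
# A miss would fall through to the database in a real deployment
assert cache_get("user:99") is None
```

Every get/set is a network round trip, which is why the slide calls typical usage "very network intensive."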
Data Management and Processing on Modern Clusters
• Substantial impact on designing and utilizing data management and processing systems in multiple tiers
  – Front-end data accessing and serving (online)
    • Memcached + DB (e.g., MySQL), HBase
  – Back-end data analytics (offline)
    • HDFS, MapReduce, Spark
[Figure: Internet-facing front-end tier (web servers, Memcached + DB (MySQL), NoSQL DB (HBase)) for data accessing and serving; back-end tier (HDFS, MapReduce, Spark) for data analytics apps/jobs]
Open Standard InfiniBand Networking Technology
• Introduced in Oct 2000
• High-performance data transfer
  – Interprocessor communication and I/O
  – Low latency (<1.0 microsec), high bandwidth (up to 12.5 GigaBytes/sec -> 100 Gbps), and low CPU utilization (5-10%)
• Multiple operations
  – Send/Recv
  – RDMA Read/Write
  – Atomic operations (very unique)
    • High-performance and scalable implementations of distributed locks, semaphores, collective communication operations
• Leading to big changes in designing
  – HPC clusters
  – File systems
  – Cloud computing systems
  – Grid computing systems
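To see why RDMA atomics enable distributed locks, consider compare-and-swap on a word in a remote node's memory: any client can attempt the swap without involving the server's CPU. The sketch below simulates only the semantics in Python (real code would use the C verbs API, e.g. atomic work requests); the class and threshold names are invented for illustration.

```python
import threading

class RemoteWord:
    """Simulates a 64-bit word in a server's registered memory that
    clients update with RDMA atomic compare-and-swap."""

    def __init__(self):
        self.value = 0
        self._mu = threading.Lock()  # stands in for NIC-side atomicity

    def compare_and_swap(self, expected, new):
        with self._mu:
            old = self.value
            if old == expected:
                self.value = new
            return old  # RDMA CAS returns the prior value

# A distributed spinlock: 0 = free, nonzero = holder's client id
lock_word = RemoteWord()

def try_acquire(client_id):
    return lock_word.compare_and_swap(0, client_id) == 0

def release(client_id):
    lock_word.compare_and_swap(client_id, 0)

assert try_acquire(7)          # client 7 wins the lock
assert not try_acquire(9)      # client 9 must retry
release(7)
assert try_acquire(9)          # now client 9 succeeds
```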
Interconnects and Protocols in OpenFabrics Stack
[Figure: application/middleware interface (Sockets or Verbs) mapped to protocol, adapter, and switch for each option – kernel TCP/IP over Ethernet (1/10/40/100 GigE); hardware-offload TCP/IP (10/40 GigE-TOE); IPoIB over InfiniBand; user-space RSockets over InfiniBand; SDP over InfiniBand; user-space iWARP over Ethernet; user-space RDMA with RoCE over Ethernet switches; user-space RDMA with native IB verbs over InfiniBand]
How Can HPC Clusters with High-Performance Interconnect and Storage Architectures Benefit Big Data Applications?
Bring HPC and Big Data processing into a "convergent trajectory"!
• What are the major bottlenecks in current Big Data processing middleware (e.g., Hadoop, Spark, and Memcached)?
• Can the bottlenecks be alleviated with new designs by taking advantage of HPC technologies?
• Can RDMA-enabled high-performance interconnects benefit Big Data processing?
• Can HPC clusters with high-performance storage systems (e.g., SSD, parallel file systems) benefit Big Data applications?
• How much performance benefit can be achieved through enhanced designs?
• How to design benchmarks for evaluating the performance of Big Data middleware on HPC clusters?
Designing Communication and I/O Libraries for Big Data Systems: Challenges
[Figure: layered stack – Applications; Big Data middleware (HDFS, MapReduce, HBase, Spark and Memcached); programming models (Sockets – other protocols? upper-level changes?); communication and I/O library (point-to-point communication, threaded models and synchronization, virtualization, I/O and file systems, QoS, fault-tolerance, benchmarks); networking technologies (InfiniBand, 1/10/40/100 GigE and intelligent NICs); commodity computing system architectures (multi- and many-core architectures and accelerators); storage technologies (HDD, SSD, and NVMe-SSD)]
Can Big Data Processing Systems be Designed with High-Performance Networks and Protocols?
• Sockets not designed for high performance
  – Stream semantics often mismatch for upper layers
  – Zero-copy not available for non-blocking sockets
• Current design: Application -> Sockets -> 1/10/40/100 GigE network
• Our approach: Application -> OSU Design -> Verbs interface -> 10/40/100 GigE or InfiniBand
The High-Performance Big Data (HiBD) Project
• RDMA for Apache Spark
• RDMA for Apache Hadoop 2.x (RDMA-Hadoop-2.x)
  – Plugins for Apache, Hortonworks (HDP) and Cloudera (CDH) Hadoop distributions
• RDMA for Apache Hadoop 1.x (RDMA-Hadoop)
• RDMA for Memcached (RDMA-Memcached)
• OSU HiBD-Benchmarks (OHB)
  – HDFS and Memcached micro-benchmarks
• http://hibd.cse.ohio-state.edu
• Users base: 155 organizations from 20 countries
• More than 15,500 downloads from the project site
• RDMA for Apache HBase and Impala (upcoming)
Available for InfiniBand and RoCE
RDMA for Apache Hadoop 2.x Distribution
• High-performance design of Hadoop over RDMA-enabled interconnects
  – High-performance RDMA-enhanced design with native InfiniBand and RoCE support at the verbs level for HDFS, MapReduce, and RPC components
  – Enhanced HDFS with in-memory and heterogeneous storage
  – High-performance design of MapReduce over Lustre
  – Plugin-based architecture supporting RDMA-based designs for Apache Hadoop, CDH and HDP
  – Easily configurable for different running modes (HHH, HHH-M, HHH-L, and MapReduce over Lustre) and different protocols (native InfiniBand, RoCE, and IPoIB)
• Current release: 0.9.9
  – Based on Apache Hadoop 2.7.1
  – Compliant with Apache Hadoop 2.7.1, HDP 2.3.0.0 and CDH 5.6.0 APIs and applications
  – Tested with
    • Mellanox InfiniBand adapters (DDR, QDR and FDR)
    • RoCE support with Mellanox adapters
    • Various multi-core platforms
    • Different file systems with disks and SSDs, and Lustre
  – http://hibd.cse.ohio-state.edu
RDMA for Apache Spark Distribution
• High-performance design of Spark over RDMA-enabled interconnects
  – High-performance RDMA-enhanced design with native InfiniBand and RoCE support at the verbs level for Spark
  – RDMA-based data shuffle and SEDA-based shuffle architecture
  – Non-blocking and chunk-based data transfer
  – Easily configurable for different protocols (native InfiniBand, RoCE, and IPoIB)
• Current release: 0.9.1
  – Based on Apache Spark 1.5.1
  – Tested with
    • Mellanox InfiniBand adapters (DDR, QDR and FDR)
    • RoCE support with Mellanox adapters
    • Various multi-core platforms
    • RAM disks, SSDs, and HDD
  – http://hibd.cse.ohio-state.edu
RDMA for Memcached Distribution
• High-performance design of Memcached over RDMA-enabled interconnects
  – High-performance RDMA-enhanced design with native InfiniBand and RoCE support at the verbs level for Memcached and libMemcached components
  – High-performance design of SSD-assisted hybrid memory
  – Easily configurable for native InfiniBand, RoCE and the traditional sockets-based support (Ethernet and InfiniBand with IPoIB)
• Current release: 0.9.4
  – Based on Memcached 1.4.24 and libMemcached 1.0.18
  – Compliant with libMemcached APIs and applications
  – Tested with
    • Mellanox InfiniBand adapters (DDR, QDR and FDR)
    • RoCE support with Mellanox adapters
    • Various multi-core platforms
    • SSD
  – http://hibd.cse.ohio-state.edu
OSU HiBD Micro-Benchmark (OHB) Suite – HDFS & Memcached
• Micro-benchmarks for Hadoop Distributed File System (HDFS)
  – Sequential Write Latency (SWL) Benchmark, Sequential Read Latency (SRL) Benchmark, Random Read Latency (RRL) Benchmark, Sequential Write Throughput (SWT) Benchmark, Sequential Read Throughput (SRT) Benchmark
  – Support benchmarking of
    • Apache Hadoop 1.x and 2.x HDFS, Hortonworks Data Platform (HDP) HDFS, Cloudera Distribution of Hadoop (CDH) HDFS
• Micro-benchmarks for Memcached
  – Get Benchmark, Set Benchmark, and Mixed Get/Set Benchmark
• Current release: 0.8
• http://hibd.cse.ohio-state.edu
Different Modes of RDMA for Apache Hadoop 2.x
• HHH: Heterogeneous storage devices with hybrid replication schemes are supported in this mode of operation, for better fault-tolerance as well as performance. This mode is enabled by default in the package.
• HHH-M: A high-performance in-memory based setup has been introduced in this package that can be utilized to perform all I/O operations in-memory and obtain as much performance benefit as possible.
• HHH-L: With parallel file systems integrated, HHH-L mode can take advantage of the Lustre available in the cluster.
• MapReduce over Lustre, with/without local disks: Besides HDFS-based solutions, this package also provides support to run MapReduce jobs on top of Lustre alone. Here, two different modes are introduced: with local disks and without local disks.
• Running with Slurm and PBS: Supports deploying RDMA for Apache Hadoop 2.x with Slurm and PBS in different running modes (HHH, HHH-M, HHH-L, and MapReduce over Lustre).
Acceleration Case Studies and Performance Evaluation
• RDMA-based designs and performance evaluation
  – HDFS
  – MapReduce
  – RPC
  – HBase
  – Spark
  – Memcached (basic, hybrid and non-blocking APIs)
  – HDFS + Memcached-based burst buffer
  – OSU HiBD Benchmarks (OHB)
Design Overview of HDFS with RDMA
• Enables high-performance RDMA communication, while supporting traditional socket interface
• JNI layer bridges Java-based HDFS with communication library written in native code
• Design features
  – RDMA-based HDFS write
  – RDMA-based HDFS replication
  – Parallel replication support
  – On-demand connection setup
  – InfiniBand/RoCE support
[Figure: Applications -> HDFS (write and others) -> Java socket interface over 1/10/40/100 GigE / IPoIB network, or Java Native Interface (JNI) -> OSU design -> verbs -> RDMA-capable networks (IB, iWARP, RoCE, ...)]
N. S. Islam, M. W. Rahman, J. Jose, R. Rajachandrasekar, H. Wang, H. Subramoni, C. Murthy and D. K. Panda, High Performance RDMA-Based Design of HDFS over InfiniBand, Supercomputing (SC), Nov 2012
N. Islam, X. Lu, W. Rahman, and D. K. Panda, SOR-HDFS: A SEDA-based Approach to Maximize Overlapping in RDMA-Enhanced HDFS, HPDC '14, June 2014
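The "RDMA path alongside the traditional socket path" idea boils down to a transport-selection strategy. A minimal sketch of that pattern (class names like `RdmaTransport` and `choose_transport` are illustrative, not the OSU implementation):

```python
class SocketTransport:
    name = "sockets"
    def write_block(self, block):
        return f"sent {len(block)} bytes via Java sockets"

class RdmaTransport:
    name = "rdma"
    def __init__(self):
        # The real design crosses a JNI bridge into a native verbs
        # library; connections are set up on demand per peer
        self.connections = {}
    def write_block(self, block, peer="datanode-1"):
        self.connections.setdefault(peer, "connected")  # on-demand setup
        return f"sent {len(block)} bytes via RDMA to {peer}"

def choose_transport(rdma_available):
    # Keep the traditional socket path while preferring verbs if present
    return RdmaTransport() if rdma_available else SocketTransport()

t = choose_transport(rdma_available=True)
msg = t.write_block(b"x" * 1024)
assert "RDMA" in msg
assert t.connections == {"datanode-1": "connected"}
```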
Enhanced HDFS with In-Memory and Heterogeneous Storage (Triple-H)
• Design features
  – Three modes
    • Default (HHH)
    • In-memory (HHH-M)
    • Lustre-integrated (HHH-L)
  – Policies to efficiently utilize the heterogeneous storage devices
    • RAM, SSD, HDD, Lustre
  – Eviction/promotion based on data usage pattern
  – Hybrid replication
  – Lustre-integrated mode:
    • Lustre-based fault-tolerance
[Figure: applications over data placement policies (hybrid replication, eviction/promotion) across RAM disk, SSD, HDD, and Lustre]
N. Islam, X. Lu, M. W. Rahman, D. Shankar, and D. K. Panda, Triple-H: A Hybrid Approach to Accelerate HDFS on HPC Clusters with Heterogeneous Storage Architecture, CCGrid '15, May 2015
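The eviction/promotion idea across heterogeneous tiers can be sketched with a small LRU policy: hot blocks live in RAM, cold blocks trickle down the tier list, and accessed blocks are promoted back. The tier names and capacities below are illustrative assumptions, not the actual Triple-H policy.

```python
from collections import OrderedDict

TIERS = ["ram", "ssd", "hdd"]            # fastest to slowest
CAPACITY = {"ram": 2, "ssd": 3, "hdd": 100}
store = {t: OrderedDict() for t in TIERS}

def put(block_id, data, tier="ram"):
    store[tier][block_id] = data
    store[tier].move_to_end(block_id)
    if len(store[tier]) > CAPACITY[tier]:
        victim, vdata = store[tier].popitem(last=False)  # evict LRU block
        put(victim, vdata, TIERS[TIERS.index(tier) + 1]) # push down a tier

def get(block_id):
    for tier in TIERS:
        if block_id in store[tier]:
            data = store[tier].pop(block_id)
            put(block_id, data, "ram")   # promote on access
            return tier, data
    return None, None

for i in range(4):
    put(f"blk{i}", b"...")
assert list(store["ram"]) == ["blk2", "blk3"]  # hottest blocks in RAM
tier_hit, _ = get("blk0")
assert tier_hit == "ssd"                       # was evicted, now promoted
assert "blk0" in store["ram"]
```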
Design Overview of MapReduce with RDMA
• Enables high-performance RDMA communication, while supporting traditional socket interface
• JNI layer bridges Java-based MapReduce with communication library written in native code
• Design features
  – RDMA-based shuffle
  – Prefetching and caching of map output
  – Efficient shuffle algorithms
  – In-memory merge
  – On-demand shuffle adjustment
  – Advanced overlapping
    • map, shuffle, and merge
    • shuffle, merge, and reduce
  – On-demand connection setup
  – InfiniBand/RoCE support
[Figure: Applications -> MapReduce (JobTracker, TaskTracker, Map, Reduce) -> Java socket interface over 1/10/40/100 GigE / IPoIB network, or JNI -> OSU design -> verbs -> RDMA-capable networks (IB, iWARP, RoCE, ...)]
M. W. Rahman, X. Lu, N. S. Islam, and D. K. Panda, HOMR: A Hybrid Approach to Exploit Maximum Overlapping in MapReduce over High Performance Interconnects, ICS, June 2014
Advanced Overlapping among Different Phases
• A hybrid approach to achieve maximum possible overlapping in MapReduce across all phases, compared to other approaches
  – Efficient shuffle algorithms
  – Dynamic and efficient switching
  – On-demand shuffle adjustment
[Figure: default architecture vs. enhanced overlapping with in-memory merge vs. advanced hybrid overlapping]
M. W. Rahman, X. Lu, N. S. Islam, and D. K. Panda, HOMR: A Hybrid Approach to Exploit Maximum Overlapping in MapReduce over High Performance Interconnects, ICS, June 2014
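The shuffle/merge overlapping idea can be sketched as a producer/consumer pipeline: while one thread is still fetching map outputs, another merges the chunks already received instead of waiting for the full shuffle to finish. A toy sketch (queues stand in for the RDMA transfers):

```python
import queue
import threading

fetched = queue.Queue()
merged = []

def shuffle_fetch(map_outputs):
    for chunk in map_outputs:        # e.g., chunks arriving over RDMA
        fetched.put(sorted(chunk))
    fetched.put(None)                # end-of-shuffle marker

def inmemory_merge():
    # Overlap: merge each chunk as it lands, concurrently with fetching
    while (chunk := fetched.get()) is not None:
        merged.extend(chunk)
    merged.sort()                    # final merge before reduce

map_outputs = [[5, 1], [4, 2], [6, 3]]
t1 = threading.Thread(target=shuffle_fetch, args=(map_outputs,))
t2 = threading.Thread(target=inmemory_merge)
t1.start(); t2.start(); t1.join(); t2.join()
assert merged == [1, 2, 3, 4, 5, 6]
```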
Design Overview of Hadoop RPC with RDMA
• Enables high-performance RDMA communication, while supporting traditional socket interface
• JNI layer bridges Java-based RPC with communication library written in native code
• Design features
  – JVM-bypassed buffer management
  – RDMA or send/recv based adaptive communication
  – Intelligent buffer allocation and adjustment for serialization
  – On-demand connection setup
  – InfiniBand/RoCE support
[Figure: Applications -> Hadoop RPC; default path via Java socket interface over 1/10/40/100 GigE / IPoIB network; our design via JNI -> OSU design -> verbs -> RDMA-capable networks (IB, iWARP, RoCE, ...)]
X. Lu, N. Islam, M. W. Rahman, J. Jose, H. Subramoni, H. Wang, and D. K. Panda, High-Performance Design of Hadoop RPC with RDMA over InfiniBand, Int'l Conference on Parallel Processing (ICPP '13), October 2013.
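"RDMA or send/recv based adaptive communication" typically means picking the transfer mode per message: small RPC messages go over pre-posted send/recv buffers, large payloads over RDMA. A sketch of that decision; the 8 KB threshold is an illustrative assumption, not the value used in the paper:

```python
EAGER_LIMIT = 8 * 1024   # hypothetical small-message threshold
log = []

def send_eager(payload):
    # Small message: copy into a pre-posted send/recv bounce buffer
    log.append(("send_recv", len(payload)))

def send_rdma(payload):
    # Large message: zero-copy RDMA write from the user buffer
    log.append(("rdma_write", len(payload)))

def rpc_send(payload):
    (send_eager if len(payload) <= EAGER_LIMIT else send_rdma)(payload)

rpc_send(b"heartbeat")        # tiny control message
rpc_send(b"x" * (1 << 20))    # 1 MB serialized response
assert log == [("send_recv", 9), ("rdma_write", 1048576)]
```

The trade-off: eager send/recv avoids rendezvous overhead for small messages, while RDMA avoids copies for large ones.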
Performance Benefits – RandomWriter & TeraGen in TACC-Stampede
• Cluster with 32 nodes with a total of 128 maps
• RandomWriter: 3-4x improvement over IPoIB for 80-120 GB file size (execution time reduced by 3x)
• TeraGen: 4-5x improvement over IPoIB for 80-120 GB file size (execution time reduced by 4x)
[Charts: execution time (s) vs. data size (80-120 GB) for RandomWriter and TeraGen, comparing IPoIB (FDR) with the OSU RDMA design]
Performance Benefits – Sort & TeraSort in TACC-Stampede
• Sort with single HDD per node: 40-52% improvement over IPoIB for 80-120 GB data (reduced by 52%); cluster with 32 nodes with a total of 128 maps and 64 reduces
• TeraSort with single HDD per node: 42-44% improvement over IPoIB for 80-120 GB data (reduced by 44%); cluster with 32 nodes with a total of 128 maps and 57 reduces
[Charts: execution time (s) vs. data size (80-120 GB), IPoIB (FDR) vs. OSU-IB (FDR)]
Evaluation of HHH and HHH-L with Applications
• MR-MSPolygraph on OSU RI with 1,000 maps
  – HHH-L reduces the execution time by 79% over Lustre, 30% over HDFS
• CloudBurst on TACC Stampede
  – With HHH: 19% improvement over HDFS (60.24 s with HDFS (FDR) vs. 48.3 s with HHH (FDR))
[Chart: execution time (s) vs. concurrent maps per host (4, 6, 8) for HDFS, Lustre, and HHH-L; reduced by 79%]
Evaluation with Spark on SDSC Gordon (HHH vs. Tachyon/Alluxio)
• For 200 GB TeraGen on 32 nodes
  – Spark-TeraGen: HHH has 2.4x improvement over Tachyon; 2.3x over HDFS-IPoIB (QDR)
  – Spark-TeraSort: HHH has 25.2% improvement over Tachyon; 17% over HDFS-IPoIB (QDR)
[Charts: execution time (s) vs. cluster size : data size (8:50, 16:100, 32:200 GB) for TeraGen and TeraSort, comparing IPoIB (QDR), Tachyon, and OSU-IB (QDR); TeraGen reduced by 2.4x, TeraSort by 25.2%]
N. Islam, M. W. Rahman, X. Lu, D. Shankar, and D. K. Panda, Performance Characterization and Acceleration of In-Memory File Systems for Hadoop and Spark Applications on HPC Clusters, IEEE BigData '15, October 2015
Design Overview of Shuffle Strategies for MapReduce over Lustre
• Design features
  – Two shuffle approaches
    • Lustre read based shuffle
    • RDMA based shuffle
  – Hybrid shuffle algorithm to take benefit from both shuffle approaches
  – Dynamically adapts to the better shuffle approach for each shuffle request, based on profiling values for each Lustre read operation
  – In-memory merge and overlapping of different phases are kept similar to the RDMA-enhanced MapReduce design
[Figure: Map1/Map2/Map3 write to the intermediate data directory on Lustre; Reduce1/Reduce2 obtain map output via Lustre read or RDMA, then in-memory merge/sort and reduce]
M. W. Rahman, X. Lu, N. S. Islam, R. Rajachandrasekar, and D. K. Panda, High Performance Design of YARN MapReduce on Modern HPC Clusters with Lustre and RDMA, IPDPS, May 2015
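The hybrid decision can be sketched as: keep a running profile of Lustre read latency and, per shuffle request, pick whichever path the profile says is currently cheaper. The threshold and latency figures below are illustrative assumptions, not the paper's profiling model.

```python
lustre_samples = []   # profiled latencies of recent Lustre reads (ms)

def record_lustre_read(latency_ms):
    lustre_samples.append(latency_ms)

def choose_shuffle(rdma_latency_ms=2.0):
    # Pick the better approach for this shuffle request
    if not lustre_samples:
        return "lustre_read"          # no profile yet; try the file system
    avg = sum(lustre_samples) / len(lustre_samples)
    return "lustre_read" if avg < rdma_latency_ms else "rdma"

assert choose_shuffle() == "lustre_read"
record_lustre_read(1.2)               # Lustre fast and uncontended
assert choose_shuffle() == "lustre_read"
record_lustre_read(9.5)               # contention pushes latency up
record_lustre_read(8.7)
assert choose_shuffle() == "rdma"     # adapt: fetch over the network
```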
Performance Improvement of MapReduce over Lustre on TACC-Stampede
• Local disk is used as the intermediate data directory
• For 500 GB Sort in 64 nodes: 44% improvement over IPoIB (FDR)
• For 640 GB Sort in 128 nodes: 48% improvement over IPoIB (FDR)
[Charts: job execution time (sec) vs. data size (300-500 GB), and vs. cluster size (4-128 nodes, 20-640 GB), IPoIB (FDR) vs. OSU-IB (FDR); reduced by 44% and 48%]
M. W. Rahman, X. Lu, N. S. Islam, R. Rajachandrasekar, and D. K. Panda, MapReduce over Lustre: Can RDMA-based Approach Benefit?, Euro-Par, August 2014.
Case Study – Performance Improvement of MapReduce over Lustre on SDSC-Gordon
• Lustre is used as the intermediate data directory
• For 80 GB Sort in 8 nodes: 34% improvement over IPoIB (QDR)
• For 120 GB TeraSort in 16 nodes: 25% improvement over IPoIB (QDR)
[Charts: job execution time (sec) vs. data size (GB), comparing IPoIB (QDR), OSU-Lustre-Read (QDR), OSU-RDMA-IB (QDR), and OSU-Hybrid-IB (QDR); reduced by 34% (Sort) and 25% (TeraSort)]
HBase-RDMA Design Overview
• Enables high-performance RDMA communication, while supporting traditional socket interface
• JNI layer bridges Java-based HBase with communication library written in native code
[Figure: Applications -> HBase -> Java socket interface over 1/10/40/100 GigE / IPoIB networks, or JNI -> OSU-IB design -> IB verbs -> RDMA-capable networks (IB, iWARP, RoCE, ...)]
HBase – YCSB Read-Write Workload
• HBase Get latency (QDR, 10 GigE)
  – 64 clients: 2.0 ms; 128 clients: 3.5 ms
  – 42% improvement over IPoIB for 128 clients
• HBase Put latency (QDR, 10 GigE)
  – 64 clients: 1.9 ms; 128 clients: 3.5 ms
  – 40% improvement over IPoIB for 128 clients
[Charts: read and write latency (us) vs. number of clients (8-128), comparing OSU-IB (QDR), IPoIB (QDR), and 10 GigE]
J. Huang, X. Ouyang, J. Jose, M. W. Rahman, H. Wang, M. Luo, H. Subramoni, Chet Murthy, and D. K. Panda, High-Performance Design of HBase with RDMA over InfiniBand, IPDPS '12
HBase – YCSB Get Latency and Throughput on SDSC-Comet
• HBase Get average latency (FDR)
  – 4 client threads: 38 us
  – 59% improvement over IPoIB for 4 client threads
• HBase Get total throughput
  – 4 client threads: 102K ops/sec
  – 2.4x improvement over IPoIB for 4 client threads
[Charts: average latency (ms) and total throughput (K ops/sec) vs. number of client threads (1-4), IPoIB (FDR) vs. OSU-IB (FDR); 59% latency reduction, 2.4x throughput]
Design Overview of Spark with RDMA
• Enables high-performance RDMA communication, while supporting traditional socket interface
• JNI layer bridges Scala-based Spark with communication library written in native code
• Design features
  – RDMA based shuffle
  – SEDA-based plugins
  – Dynamic connection management and sharing
  – Non-blocking data transfer
  – Off-JVM-heap buffer management
  – InfiniBand/RoCE support
[Figure: Spark applications (Scala/Java/Python) and tasks over the BlockManager / BlockFetcherIterator; shuffle server and shuffle fetcher implemented as Java NIO (default), Netty (optional), or RDMA (plug-in); Java sockets over 1/10 Gig Ethernet / IPoIB (QDR/FDR) network, or the RDMA-based shuffle engine (Java/JNI) over native InfiniBand (QDR/FDR)]
X. Lu, M. W. Rahman, N. Islam, D. Shankar, and D. K. Panda, Accelerating Spark with RDMA for Big Data Processing: Early Experiences, Int'l Symposium on High Performance Interconnects (HotI '14), August 2014
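A SEDA (staged event-driven architecture) shuffle splits the work into stages connected by queues, so each stage proceeds concurrently on different blocks. A toy two-stage sketch of the pattern (stage functions and names are invented for illustration):

```python
import queue
import threading

def stage(worker, inbox, outbox):
    # One SEDA stage: its own thread draining its own queue
    def run():
        while (item := inbox.get()) is not None:
            outbox.put(worker(item))
        outbox.put(None)                 # propagate shutdown marker
    t = threading.Thread(target=run)
    t.start()
    return t

recv_q, decode_q, done_q = queue.Queue(), queue.Queue(), queue.Queue()
threads = [
    stage(lambda b: b.decode(), recv_q, decode_q),    # decode stage
    stage(lambda s: ("stored", s), decode_q, done_q)  # store stage
]
for block in [b"shuffle-0", b"shuffle-1"]:
    recv_q.put(block)                    # blocks "arriving" off the wire
recv_q.put(None)

results = []
while (r := done_q.get()) is not None:
    results.append(r)
for t in threads:
    t.join()
assert results == [("stored", "shuffle-0"), ("stored", "shuffle-1")]
```

The queues decouple stages, which is what lets receive, decode, and store overlap instead of serializing per block.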
Performance Evaluation on SDSC Comet – SortBy/GroupBy
• InfiniBand FDR, SSD, 64 worker nodes, 1536 cores (1536M 1536R)
• RDMA-based design for Spark 1.5.1
• RDMA vs. IPoIB with 1536 concurrent tasks, single SSD per node
  – SortBy: total time reduced by up to 80% over IPoIB (56 Gbps)
  – GroupBy: total time reduced by up to 57% over IPoIB (56 Gbps)
[Charts: total time (sec) vs. data size (64-256 GB) for SortBy and GroupBy tests on 64 worker nodes / 1536 cores, IPoIB vs. RDMA; 80% and 57% reductions]
Performance Evaluation on SDSC Comet – HiBench Sort/TeraSort
• InfiniBand FDR, SSD, 64 worker nodes, 1536 cores (1536M 1536R)
• RDMA-based design for Spark 1.5.1
• RDMA vs. IPoIB with 1536 concurrent tasks, single SSD per node
  – Sort: total time reduced by 38% over IPoIB (56 Gbps)
  – TeraSort: total time reduced by 15% over IPoIB (56 Gbps)
[Charts: total time (sec) vs. data size (64-256 GB) for Sort and TeraSort on 64 worker nodes / 1536 cores, IPoIB vs. RDMA; 38% and 15% reductions]
Performance Evaluation on SDSC Comet – HiBench PageRank
• InfiniBand FDR, SSD, 32/64 worker nodes, 768/1536 cores (768/1536M 768/1536R)
• RDMA-based design for Spark 1.5.1
• RDMA vs. IPoIB with 768/1536 concurrent tasks, single SSD per node
  – 32 nodes / 768 cores: total time reduced by 37% over IPoIB (56 Gbps)
  – 64 nodes / 1536 cores: total time reduced by 43% over IPoIB (56 Gbps)
[Charts: total time (sec) vs. data size (Huge, BigData, Gigantic) for PageRank on 32 and 64 worker nodes, IPoIB vs. RDMA; 37% and 43% reductions]
Memcached-RDMA Design
• Server and client perform a negotiation protocol
  – Master thread assigns clients to the appropriate worker thread
• Once a client is assigned a verbs worker thread, it can communicate directly and is "bound" to that thread
• All other Memcached data structures are shared among RDMA and sockets worker threads
• Memcached server can serve both socket and verbs clients simultaneously
• Memcached applications need not be modified; the verbs interface is used if available
[Figure: sockets clients and RDMA clients contact the master thread, which binds them to sockets worker threads or verbs worker threads; all workers operate on shared data (memory slabs, items, ...)]
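The master-thread dispatch described above can be sketched as follows; worker and state names are illustrative, and the negotiation is reduced to a boolean for brevity.

```python
import itertools

shared_data = {}                     # slabs/items shared by all workers

class Worker:
    def __init__(self, kind, wid):
        self.kind, self.wid = kind, wid
        self.clients = []
    def handle_set(self, key, value):
        shared_data[key] = value     # every worker sees the same store

verbs_workers = [Worker("verbs", i) for i in range(2)]
sockets_workers = [Worker("sockets", i) for i in range(2)]
rr = {"verbs": itertools.cycle(verbs_workers),
      "sockets": itertools.cycle(sockets_workers)}

def master_accept(client_id, wants_rdma):
    # Negotiation outcome decides the pool; round-robin within it.
    # After this, the client talks to its worker directly ("bound").
    pool = "verbs" if wants_rdma else "sockets"
    worker = next(rr[pool])
    worker.clients.append(client_id)
    return worker

w1 = master_accept("clientA", wants_rdma=True)
w2 = master_accept("clientB", wants_rdma=False)
w1.handle_set("k", 1)
assert w1.kind == "verbs" and w2.kind == "sockets"
assert shared_data["k"] == 1         # data structures are shared
```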
RDMA-Memcached Performance (FDR Interconnect)
• Memcached Get latency
  – 4 bytes: OSU-IB 2.84 us; IPoIB 75.53 us
  – 2K bytes: OSU-IB 4.49 us; IPoIB 123.42 us
  – Latency reduced by nearly 20x
• Memcached throughput (4 bytes)
  – 4080 clients: OSU-IB 556K ops/sec, IPoIB 233K ops/sec
  – Nearly 2x improvement in throughput
• Experiments on TACC Stampede (Intel Sandy Bridge cluster, IB: FDR)
[Charts: Get latency (us) vs. message size (1 B - 4 KB), and throughput (thousands of transactions per second) vs. number of clients (16-4080), OSU-IB (FDR) vs. IPoIB (FDR)]
Micro-benchmark Evaluation for OLDP Workloads
• Illustration with read-cache-read access pattern using modified mysqlslap load testing tool
• Memcached-RDMA can
  – improve query latency by up to 66% over IPoIB (32 Gbps)
  – improve throughput by up to 69% over IPoIB (32 Gbps)
[Charts: latency (sec) and throughput (Kq/s) vs. number of clients (64-400), Memcached-IPoIB (32 Gbps) vs. Memcached-RDMA (32 Gbps); latency reduced by 66%]
D. Shankar, X. Lu, J. Jose, M. W. Rahman, N. Islam, and D. K. Panda, Can RDMA Benefit On-Line Data Processing Workloads with Memcached and MySQL, ISPASS '15
Performance Benefits of Hybrid Memcached (Memory + SSD) on SDSC-Gordon
• ohb_memlat & ohb_memthr latency & throughput micro-benchmarks
• Memcached-RDMA can
  – improve query latency by up to 70% over IPoIB (32 Gbps)
  – improve throughput by up to 2x over IPoIB (32 Gbps)
  – incur no overhead in hybrid mode when all data fits in memory
[Charts: average latency (us) vs. message size, and throughput (million trans/sec) vs. number of clients (64-1024), comparing IPoIB (32 Gbps), RDMA-Mem (32 Gbps), and RDMA-Hybrid (32 Gbps); 2x throughput]
Performance Evaluation on IB FDR + SATA/NVMe SSDs
• Memcached latency test with Zipf distribution; server with 1 GB memory, 32 KB key-value pair size; total size of data accessed is 1 GB (when data fits in memory) and 1.5 GB (when data does not fit in memory)
• When data fits in memory: RDMA-Mem/Hybrid gives 5x improvement over IPoIB-Mem
• When data does not fit in memory: RDMA-Hybrid gives 2x-2.5x over IPoIB/RDMA-Mem
[Chart: Set/Get latency (us) for IPoIB-Mem, RDMA-Mem, RDMA-Hybrid-SATA, and RDMA-Hybrid-NVMe, for data that fits and does not fit in memory; bars broken down into slab allocation (SSD write), cache check + load (SSD read), cache update, server response, client wait, and miss penalty]
Accelerating Hybrid Memcached with RDMA, Non-blocking Extensions and SSDs
• RDMA-accelerated communication for Memcached Get/Set
• Hybrid 'RAM + SSD' slab management for higher data retention
• Non-blocking API extensions
  – memcached_(iset/iget/bset/bget/test/wait)
  – Achieve near in-memory speeds while hiding bottlenecks of network and SSD I/O
  – Ability to exploit communication/computation overlap
  – Optional buffer re-use guarantees
• Adaptive slab manager with different I/O schemes for higher throughput
[Figure: client's libMemcached library issues blocking and non-blocking API requests/replies through the RDMA-enhanced communication library to the server's hybrid slab manager (RAM + SSD)]
D. Shankar, X. Lu, N. S. Islam, M. W. Rahman, and D. K. Panda, High-Performance Hybrid Key-Value Store on Modern Clusters with RDMA Interconnects and SSDs: Non-blocking Extensions, Designs, and Benefits, IPDPS, May 2016
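The iset/iget + test/wait semantics are issue-then-overlap-then-complete. The sketch below mimics only those semantics with a thread pool (the real extensions overlap RDMA and SSD I/O in native code; the function bodies here are invented stand-ins).

```python
from concurrent.futures import ThreadPoolExecutor
import time

pool = ThreadPoolExecutor(max_workers=4)
store = {}

def slow_backend_set(key, value):
    time.sleep(0.01)                 # stands in for network + SSD latency
    store[key] = value
    return True

def memcached_iset(key, value):
    # Non-blocking set: returns immediately with a completion handle
    return pool.submit(slow_backend_set, key, value)

def memcached_test(handle):
    # Poll: has the operation completed?
    return handle.done()

def memcached_wait(handle):
    # Block until the operation completes
    return handle.result()

h = memcached_iset("k", 42)
overlap_result = sum(range(1000))    # useful work while the set is in flight
assert memcached_wait(h) is True
assert store["k"] == 42
pool.shutdown(wait=True)
```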
Performance Evaluation with Non-Blocking Memcached API
• Data does not fit in memory: non-blocking Memcached Set/Get API extensions can achieve
  – >16x latency improvement vs. blocking API over RDMA-Hybrid/RDMA-Mem w/ penalty
  – >2.5x throughput improvement vs. blocking API over default/optimized RDMA-Hybrid
• Data fits in memory: non-blocking extensions perform similar to RDMA-Mem/RDMA-Hybrid, and >3.6x improvement over IPoIB-Mem
[Chart: average Set/Get latency (us) for IPoIB-Mem, RDMA-Mem, H-RDMA-Def, H-RDMA-Opt-Block, H-RDMA-Opt-NonB-i, and H-RDMA-Opt-NonB-b; bars broken down into miss penalty (backend DB access overhead), client wait, server response, cache update, cache check + load (memory and/or SSD read), and slab allocation (w/ SSD write on out-of-memory). H = hybrid Memcached over SATA SSD; Opt = adaptive slab manager; Block = default blocking API; NonB-i = non-blocking iset/iget API; NonB-b = non-blocking bset/bget API w/ buffer re-use guarantee]
• Design Features
– Memcached-based burst-buffer system
• Hides latency of parallel file system access
• Read from local storage and Memcached
– Data locality achieved by writing data to local storage
– Different approaches of integration with parallel file system to guarantee fault-tolerance
Accelerating I/O Performance of Big Data Analytics through RDMA-based Key-Value Store
[Figure: Architecture of the Memcached-based burst buffer system — Map/Reduce tasks in the application write through an I/O forwarding module to the DataNode's local disk (data locality) and to the burst buffer, which is integrated with Lustre for fault-tolerance.]
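A minimal sketch of the design features listed above, assuming a hypothetical file layout: writes land on local storage (for data locality) and in a Memcached-like cache (for fast re-reads), a flush step copies data to a directory standing in for the parallel file system (fault-tolerance), and reads try the cache first before falling back. This is an illustration of the data path only, not the actual HDFS/RDMA implementation.

```python
import os
import shutil
import tempfile

class BurstBuffer:
    """Sketch of the burst-buffer write/read path (hypothetical layout)."""
    def __init__(self, local_dir, pfs_dir):
        self.cache = {}              # stands in for Memcached (RAM)
        self.local_dir = local_dir   # local disk: gives data locality
        self.pfs_dir = pfs_dir       # stands in for Lustre
    def write(self, name, payload):
        path = os.path.join(self.local_dir, name)
        with open(path, "wb") as f:  # 1) local write for locality
            f.write(payload)
        self.cache[name] = payload   # 2) cached for fast re-reads
        return path
    def flush(self, name):
        # 3) copy to the parallel FS so data survives node failure
        shutil.copy(os.path.join(self.local_dir, name),
                    os.path.join(self.pfs_dir, name))
    def read(self, name):
        if name in self.cache:       # fast path: Memcached hit
            return self.cache[name]
        for d in (self.local_dir, self.pfs_dir):  # local disk, then Lustre
            path = os.path.join(d, name)
            if os.path.exists(path):
                with open(path, "rb") as f:
                    return f.read()
        raise FileNotFoundError(name)

local, pfs = tempfile.mkdtemp(), tempfile.mkdtemp()
bb = BurstBuffer(local, pfs)
bb.write("part-0", b"intermediate data")
bb.flush("part-0")
bb.cache.clear()                     # simulate cache eviction
print(bb.read("part-0"))             # → b'intermediate data' (from local disk)
```

The "different approaches of integration" bullet corresponds to when `flush` runs (synchronously, lazily, or in the background), which trades write latency against the fault-tolerance window.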
Evaluation with PUMA Workloads
Gains on OSU RI with our approach (Mem-bb) on 24 nodes:
• SequenceCount: 34.5% over Lustre, 40% over HDFS
• RankedInvertedIndex: 27.3% over Lustre, 48.3% over HDFS
• HistogramRating: 17% over Lustre, 7% over HDFS
[Figure: Execution time (s) of the SeqCount, RankedInvIndex, and HistoRating workloads under HDFS (32 Gbps), Lustre (32 Gbps), and Mem-bb (32 Gbps), annotated with improvements of 40%, 48.3%, and 17%, respectively.]
N. S. Islam, D. Shankar, X. Lu, M. W. Rahman, and D. K. Panda, Accelerating I/O Performance of Big Data Analytics with RDMA-based Key-Value Store, ICPP '15, September 2015
• RDMA-based Designs and Performance Evaluation
– HDFS
– MapReduce
– RPC
– HBase
– Spark
– Memcached (Basic, Hybrid and Non-blocking APIs)
– HDFS + Memcached-based Burst Buffer
– OSU HiBD Benchmarks (OHB)
Acceleration Case Studies and Performance Evaluation
• The current benchmarks provide some insight into overall performance behavior
• However, they do not provide any information to the designer/developer on:
– What is happening at the lower layer?
– Where are the benefits coming from?
– Which design is leading to benefits or bottlenecks?
– Which component in the design needs to be changed, and what will its impact be?
– Can performance gain/loss at the lower layer be correlated to the performance gain/loss observed at the upper layer?
Are the Current Benchmarks Sufficient for Big Data?
[Figure: Layered stack diagram — Applications over Big Data Middleware (HDFS, MapReduce, HBase, Spark and Memcached), over a Communication and I/O Library (point-to-point communication, threaded models and synchronization, virtualization, I/O and file systems, QoS, fault-tolerance), Programming Models (Sockets; other protocols?), Networking Technologies (InfiniBand, 1/10/40/100 GigE and intelligent NICs), RDMA protocols, Commodity Computing System Architectures (multi- and many-core architectures and accelerators), and Storage Technologies (HDD, SSD, and NVMe-SSD). Current benchmarks cover only the upper layers; no benchmarks exist for the lower layers — can the two be correlated?]
Challenges in Benchmarking of RDMA-based Designs
• A comprehensive suite of benchmarks to
– Compare performance of different MPI libraries on various networks and systems
– Validate low-level functionalities
– Provide insights into the underlying MPI-level designs
• Started with basic send-recv (MPI-1) micro-benchmarks for latency, bandwidth and bi-directional bandwidth
• Extended later to
– MPI-2 one-sided
– Collectives
– GPU-aware data movement
– OpenSHMEM (point-to-point and collectives)
– UPC
• Has become an industry standard
• Extensively used for design/development of MPI libraries, performance comparison of MPI libraries, and even in procurement of large-scale systems
• Available from http://mvapich.cse.ohio-state.edu/benchmarks
• Available in an integrated manner with the MVAPICH2 stack
OSU MPI Micro-Benchmarks (OMB) Suite
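The basic send-recv latency benchmark mentioned above follows the classic ping-pong pattern: run untimed warm-up iterations, time many round trips, and report half the average round-trip time as the one-way latency. The sketch below reproduces that measurement loop over a local socket pair in Python; it is an illustration of the pattern, not OMB itself (osu_latency is written in C against MPI_Send/MPI_Recv).

```python
import socket
import threading
import time

MSG_SIZE, WARMUP, ITERS = 64, 10, 100

def recv_exact(sock, n):
    """Receive exactly n bytes, handling partial reads."""
    buf = b""
    while len(buf) < n:
        buf += sock.recv(n - len(buf))
    return buf

def echo_server(conn):
    # Peer side of the ping-pong: receive each message, send it right back.
    for _ in range(WARMUP + ITERS):
        conn.sendall(recv_exact(conn, MSG_SIZE))

a, b = socket.socketpair()
t = threading.Thread(target=echo_server, args=(b,))
t.start()

msg = b"x" * MSG_SIZE
for _ in range(WARMUP):          # warm-up iterations are not timed
    a.sendall(msg)
    recv_exact(a, MSG_SIZE)

t0 = time.perf_counter()
for _ in range(ITERS):           # timed ping-pong loop
    a.sendall(msg)
    reply = recv_exact(a, MSG_SIZE)
elapsed = time.perf_counter() - t0
t.join()
a.close()
b.close()

# One-way latency = half the average round-trip time.
latency_us = elapsed / ITERS / 2 * 1e6
print(f"{MSG_SIZE} bytes: {latency_us:.2f} us")
```

The same skeleton generalizes to the bandwidth test (send a window of messages before waiting for an acknowledgement) and to sweeping over message sizes, which is how the suite produces its familiar latency/bandwidth curves.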
[Figure: The same layered stack diagram, now annotated with Applications-Level Benchmarks at the top and Micro-Benchmarks at the bottom, indicating an iterative process between the two levels.]
Iterative Process – Requires Deeper Investigation and Design for Benchmarking Next Generation Big Data Systems and Applications
• HDFS Benchmarks
– Sequential Write Latency (SWL) Benchmark
– Sequential Read Latency (SRL) Benchmark
– Random Read Latency (RRL) Benchmark
– Sequential Write Throughput (SWT) Benchmark
– Sequential Read Throughput (SRT) Benchmark
• Memcached Benchmarks
– Get Benchmark
– Set Benchmark
– Mixed Get/Set Benchmark
• Available as a part of OHB 0.8
OSU HiBD Benchmarks (OHB)
N. S. Islam, X. Lu, M. W. Rahman, J. Jose, and D. K. Panda, A Micro-benchmark Suite for Evaluating HDFS Operations on Modern Clusters, Int'l Workshop on Big Data Benchmarking (WBDB '12), December 2012
D. Shankar, X. Lu, M. W. Rahman, N. Islam, and D. K. Panda, A Micro-Benchmark Suite for Evaluating Hadoop MapReduce on High-Performance Networks, BPOE-5, 2014
X. Lu, M. W. Rahman, N. Islam, and D. K. Panda, A Micro-Benchmark Suite for Evaluating Hadoop RPC on High-Performance Networks, Int'l Workshop on Big Data Benchmarking (WBDB '13), July 2013
To be Released
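The core of a Mixed Get/Set benchmark like the one listed above is a driver that pre-populates the store, issues operations in a configurable get/set ratio, and reports per-operation latency. A hedged Python sketch of such a driver follows; the `get_fraction` parameter, key count, and dict-backed store are illustrative stand-ins (OHB drives a real Memcached server over the network).

```python
import random
import statistics
import time

def mixed_benchmark(store, n_ops=1000, get_fraction=0.9,
                    value=b"v" * 64, seed=1):
    """Issue a get/set mix against `store`; return mean latency per op (s)."""
    rng = random.Random(seed)
    keys = [f"key-{i}" for i in range(100)]
    for k in keys:                       # pre-populate so gets always hit
        store[k] = value
    latencies = []
    for _ in range(n_ops):
        k = rng.choice(keys)
        t0 = time.perf_counter()
        if rng.random() < get_fraction:
            _ = store[k]                 # Get
        else:
            store[k] = value             # Set
        latencies.append(time.perf_counter() - t0)
    return statistics.mean(latencies)

# 90% gets / 10% sets, a common read-heavy caching mix.
avg = mixed_benchmark({}, get_fraction=0.9)
print(f"avg latency: {avg * 1e6:.2f} us/op")
```

Sweeping `get_fraction` from 0.0 (pure Set) to 1.0 (pure Get) reproduces the separate Set and Get benchmarks as end points of the same driver.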
• Upcoming releases of RDMA-enhanced packages will support
– HBase
– Impala
• Upcoming releases of OSU HiBD Micro-Benchmarks (OHB) will support
– MapReduce
– RPC
• Advanced designs with upper-level changes and optimizations
– Memcached with Non-blocking API
– HDFS + Memcached-based Burst Buffer
On-going and Future Plans of OSU High Performance Big Data (HiBD) Project
• Discussed challenges in accelerating Hadoop, Spark and Memcached
• Presented initial designs to take advantage of InfiniBand/RDMA for HDFS, MapReduce, RPC, Spark, and Memcached
• Results are promising
• Many other open issues remain to be solved
• Will enable the Big Data community to take advantage of modern HPC technologies to carry out their analytics in a fast and scalable manner
Concluding Remarks
Funding Acknowledgments
Funding Support by
Equipment Support by
Personnel Acknowledgments
Current Students: A. Augustine (M.S.), A. Awan (Ph.D.), S. Chakraborthy (Ph.D.), C.-H. Chu (Ph.D.), N. Islam (Ph.D.), K. Kulkarni (M.S.), M. Li (Ph.D.), M. Rahman (Ph.D.), D. Shankar (Ph.D.), A. Venkatesh (Ph.D.), J. Zhang (Ph.D.)
Past Students: P. Balaji (Ph.D.), S. Bhagvat (M.S.), A. Bhat (M.S.), D. Buntinas (Ph.D.), L. Chai (Ph.D.), B. Chandrasekharan (M.S.), N. Dandapanthula (M.S.), V. Dhanraj (M.S.), T. Gangadharappa (M.S.), K. Gopalakrishnan (M.S.), W. Huang (Ph.D.), W. Jiang (M.S.), J. Jose (Ph.D.), K. Kandalla (Ph.D.), S. Kini (M.S.), M. Koop (Ph.D.), S. Krishnamoorthy (M.S.), R. Kumar (M.S.), P. Lai (M.S.), J. Liu (Ph.D.), M. Luo (Ph.D.), A. Mamidala (Ph.D.), G. Marsh (M.S.), V. Meshram (M.S.), A. Moody (M.S.), S. Naravula (Ph.D.), R. Noronha (Ph.D.), X. Ouyang (Ph.D.), S. Pai (M.S.), S. Potluri (Ph.D.), R. Rajachandrasekar (Ph.D.), G. Santhanaraman (Ph.D.), A. Singh (Ph.D.), J. Sridhar (M.S.), H. Subramoni (Ph.D.), S. Sur (Ph.D.), K. Vaidyanathan (Ph.D.), A. Vishnu (Ph.D.), J. Wu (Ph.D.), W. Yu (Ph.D.)
Current Research Scientists: H. Subramoni, X. Lu
Current Senior Research Associate: K. Hamidouche
Past Research Scientist: S. Sur
Current Post-Docs: J. Lin, D. Banerjee
Past Post-Docs: H. Wang, X. Besseron, H.-W. Jin, M. Luo, E. Mancini, S. Marcarelli, J. Vienne
Current Programmer: J. Perkins
Past Programmer: D. Bureddy
Current Research Specialist: M. Arnold
Second International Workshop on High-Performance Big Data Computing (HPBDC)
HPBDC 2016 will be held with the IEEE International Parallel and Distributed Processing Symposium (IPDPS 2016), Chicago, Illinois, USA, May 27th, 2016
Keynote Talk: Dr. Chaitanya Baru, Senior Advisor for Data Science, National Science Foundation (NSF); Distinguished Scientist, San Diego Supercomputer Center (SDSC)
Panel Moderator: Jianfeng Zhan (ICT/CAS)
Panel Topic: Merge or Split: Mutual Influence between Big Data and HPC Techniques
Six Regular Research Papers and Two Short Research Papers
http://web.cse.ohio-state.edu/~luxi/hpbdc2016
HPBDC 2015 was held in conjunction with ICDCS '15
http://web.cse.ohio-state.edu/~luxi/hpbdc2015
Thank You!
The High-Performance Big Data Project: http://hibd.cse.ohio-state.edu/
Network-Based Computing Laboratory: http://nowlab.cse.ohio-state.edu/
The MVAPICH2 Project: http://mvapich.cse.ohio-state.edu/