Transcript
Page 1: Responsive Storage: Home Automation for Research Data ... · PBS, SLURM, … Lambda Functions ... Deployed Ripple on an ALS and NERSC machine to automate data analysis •At ALS:Detect

ResponsiveStorage:HomeAutomationfor

ResearchDataManagement

RyanChard,KyleChard,JasonAlt,DilworthY.Parkinson,SteveTuecke,andIanFosterArgonneNationalLab,UniversityofChicago,andLawrenceBerkeleyNationalLab

Page 2: Responsive Storage: Home Automation for Research Data ... · PBS, SLURM, … Lambda Functions ... Deployed Ripple on an ALS and NERSC machine to automate data analysis •At ALS:Detect

TheProblem• Datagenerationratesareexploding

• Complexanalyticsprocesses

• Thedatalifecycleofteninvolvesmultipleorganisations,machines,andpeople

Thiscreatesasignificantstrainonresearchers

• Bestmanagementpractices(cataloguing,sharing,purging,etc.)canbeoverlooked

• Usefuldatamaybelost,siloed,orforgotten

Page 3: Responsive Storage: Home Automation for Research Data ... · PBS, SLURM, … Lambda Functions ... Deployed Ripple on an ALS and NERSC machine to automate data analysis •At ALS:Detect

RIPPLE:AprototyperesponsivestoragesolutionTransformstaticdatagraveyardsintoactive,responsivestoragedevices

• Automatedatamanagementprocessesandenforcebestpractices

• Reliableevent-drivenexecution

• Usersfocused:simpleif-trigger-then-actionrules• Accessibletoallendusers,notjustadminsandexpertusers

• Userscansetdatamanagementpoliciesandthenforgetaboutthem

• Combinerulesintoflowstocontrolend-to-enddatatransformations

• Passivelywaitsforfilesystemevents(verylittleoverhead)

• Filesystemagnostic– worksonbothedgeandleadershipplatforms

Page 4: Responsive Storage: Home Automation for Research Data ... · PBS, SLURM, … Lambda Functions ... Deployed Ripple on an ALS and NERSC machine to automate data analysis •At ALS:Detect

RIPPLEArchitecture(updated)Agent:

• Sits locally on the machine

• Detects & filters filesystem events

• Facilitates execution of actions

• Can receive new recipes

Service:

• Serverless architecture

• Lambda functions process events

• Orchestrates execution of actions

• Records and manages execution of flows

RippleRunner

Ripple Agent

SQLite

Filesystem

Docker,PBS,

SLURM,…

Lambda Functions

Monitor

Observers

Ripple Cloud API

Process

SQS Queue of events

Page 5: Responsive Storage: Home Automation for Research Data ... · PBS, SLURM, … Lambda Functions ... Deployed Ripple on an ALS and NERSC machine to automate data analysis •At ALS:Detect

RIPPLEAgentResponsiblefordetectingandreportingeventsofinterest

Filesystemagnostic– usesanappropriatemonitorfortheFS• LeveragesPythonWatchdogobservers

• inotify,polling,kqueue,etc.

• GlobusTransferAPIdetectsglobus events(transfer,create,delete)

RulesareretrievedfromthecloudserviceandstoredinanSQLitedatabase

Hybridfilteringmodel:• Localmonitorcheckseventsagainstactiverules• Iftheymatch,theyarereportedtothecloudforprocessing

Ripple Agent

SQLite

Filesystem

ProcessMonitor

Observers

Page 6: Responsive Storage: Home Automation for Research Data ... · PBS, SLURM, … Lambda Functions ... Deployed Ripple on an ALS and NERSC machine to automate data analysis •At ALS:Detect

RIPPLERunnerAbstractsexecutionenvironmentsandallowsjobsubmission/statuschecksviaAPI

HasaUUIDandpollsforactions– rulescaninvokeactionsonanyrunner

Canbedeployedalmostanywhere:LocallyinitiateDockercontainers,singularityexeccommands,andsubprocesses toactonlocalfiles(metadataextraction,dispatchjobs,etc.)

Cloudrunner(backedbyLambdafunctions)performscloudfunctions:Globustransfers,createsharedendpoints,sendemails,invokeotherLambdafunctionsetc.ThisfunctionsanAPIgatewayexposingtherunnerAPIandproxying requeststhroughtoLambdas

HPCsystemsemployarunnerforexposedbatchsubmission(currentlyjustSLURM)

RippleRunner

Docker,PBS,

SLURM,…

Page 7: Responsive Storage: Home Automation for Research Data ... · PBS, SLURM, … Lambda Functions ... Deployed Ripple on an ALS and NERSC machine to automate data analysis •At ALS:Detect

RIPPLECloudService• Gateway API exposes Ripple service

• Get rules

• Report events

• Update event status

• API either proxies Lambda functions (get rules) or inserts payload into SQS queue.

• Once an event reaches the SQS, it should not be lost

• SQS queue reports to SNS topic, triggering Lambda functions to pull from the queue• Dead letter queue after 3 processing failures

• CloudWatch timer triggers “cleanup” and “checkup” functions to process events still on the queue and outstanding jobs

Lambda Functions

Ripple Cloud API

SQS Queue of events

Page 8: Responsive Storage: Home Automation for Research Data ... · PBS, SLURM, … Lambda Functions ... Deployed Ripple on an ALS and NERSC machine to automate data analysis •At ALS:Detect

RIPPLERulesIFTTT-inspiredprogrammingmodel:

Triggers describewheretheeventiscomingfrom(filesystemcreateevents)andtheconditionstomatch(/path/to/monitor/.*.h5)

Actions describewhatservicetouse(e.g.,globus transfer)andargumentsforprocessing(source/dest endpoints).

Page 9: Responsive Storage: Home Automation for Research Data ... · PBS, SLURM, … Lambda Functions ... Deployed Ripple on an ALS and NERSC machine to automate data analysis •At ALS:Detect

EventDetectionGoal:beabletomonitorHPCstorageworkload(>3milevents/day)

Inotify vspolling

Create/touch/delete10,000filesandrecordeventreportingduration(20ktotal)

Machines:• Laptop• c4.xlargeinstance• Edisonloginnode(gpfs)

Page 10: Responsive Storage: Home Automation for Research Data ... · PBS, SLURM, … Lambda Functions ... Deployed Ripple on an ALS and NERSC machine to automate data analysis •At ALS:Detect

FilteringoverheadGoal:Determineoverheadcausedbyfilteringeventslocally

Measuredifferencesinevent/seconddetection

Filteringrequiresmatchingdirectorypathandfileextension

Pollingisoddasitonlypollsonceeverysecond

Page 11: Responsive Storage: Home Automation for Research Data ... · PBS, SLURM, … Lambda Functions ... Deployed Ripple on an ALS and NERSC machine to automate data analysis •At ALS:Detect

LambdaPerformanceGoal:Understandlambdaperformancefordifferenttasks

ColdvsWarmedfunctions

Actions:• Globustransfer• SMSemail• DynamoDB insert/query

TransfersrequireahandshakewiththeGlobusservice,whichalsocommunicateswiththeendpoints

Page 12: Responsive Storage: Home Automation for Research Data ... · PBS, SLURM, … Lambda Functions ... Deployed Ripple on an ALS and NERSC machine to automate data analysis •At ALS:Detect

UseCase:LargeSynopticSurveyTelescopeDevelopedarepresentativetestbedoftheLSSTstoragerequirements

• Automaticallypropagatedatabetweenstoragetiersandfacilities

• InvokeDockercontainerstoextractmetadataandmaintainafilecatalog

• Compressandarchivefiles

• Recoverdeleted/corruptedfileswhendeleteandmodificationeventsoccurCustodial Store

(Chile)

Archive: ANL’s Sparrow

Archiver

Landing

Magnetic

Forwarder

File Catalog

File Catalog

Custodial Store (NCSA)

Landing

Magnetic

Archive

metadataminidgzip

catalog....

1.

2.3.

4.

6.

7.

Page 13: Responsive Storage: Home Automation for Research Data ... · PBS, SLURM, … Lambda Functions ... Deployed Ripple on an ALS and NERSC machine to automate data analysis •At ALS:Detect

UseCase:AdvancedLightSourceDeployedRippleonanALSandNERSCmachinetoautomatedataanalysis

• AtALS: DetectnewheartbeatbeamlinedataandinitiatetransfertoNERSC

• AtNERSC: Extractmetadata,createsbatchfile,dispatchanalysisjobto

Edisonqueue,detectresultandtransferbacktoALS

• AtALS: createasharedendpoint,notifycollaboratorsofresultviaemail

Page 14: Responsive Storage: Home Automation for Research Data ... · PBS, SLURM, … Lambda Functions ... Deployed Ripple on an ALS and NERSC machine to automate data analysis •At ALS:Detect

NewUseCases

• AutomatedmetadataextractionandingestionintoGlobusSearch• UsessingularityandApacheTika toextractmetadata

• Metadataiswrappedintogmeta (json)documentsandingestedintosearch

• Offlinefeedbackmechanismforworkflows

• Researcherswantahumanqualitycontrolcomponent

• HaveRipplesendsubsetsofdatatoresearchersviaemailtocheckit

• Triggeractionsbasedoncontentofreplymessages

Page 15: Responsive Storage: Home Automation for Research Data ... · PBS, SLURM, … Lambda Functions ... Deployed Ripple on an ALS and NERSC machine to automate data analysis •At ALS:Detect

SummaryEvent-drivenautomationofdatamanagementpractices

Userfocused

Monitoringagentagnostictounderlyingfilesystem

Serverless eventprocessingandactionorchestration

Page 16: Responsive Storage: Home Automation for Research Data ... · PBS, SLURM, … Lambda Functions ... Deployed Ripple on an ALS and NERSC machine to automate data analysis •At ALS:Detect

FutureWorkMoreusecases!

Morerunners

Scalable&highperformanceeventmonitorsforleadershipresources

Programmingmodelforevent-baseddatamanagement

IntegrationwithGlobus


Top Related