responsive storage: home automation for research data ... · pbs, slurm, … lambda functions ......

16
Responsive Storage: Home Automation for Research Data Management Ryan Chard, Kyle Chard, Jason Alt, Dilworth Y. Parkinson, Steve Tuecke, and Ian Foster Argonne National Lab, University of Chicago, and Lawrence Berkeley National Lab

Upload: others

Post on 24-Jul-2020

2 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Responsive Storage: Home Automation for Research Data ... · PBS, SLURM, … Lambda Functions ... Deployed Ripple on an ALS and NERSC machine to automate data analysis •At ALS:Detect

ResponsiveStorage:HomeAutomationfor

ResearchDataManagement

RyanChard,KyleChard,JasonAlt,DilworthY.Parkinson,SteveTuecke,andIanFosterArgonneNationalLab,UniversityofChicago,andLawrenceBerkeleyNationalLab

Page 2: Responsive Storage: Home Automation for Research Data ... · PBS, SLURM, … Lambda Functions ... Deployed Ripple on an ALS and NERSC machine to automate data analysis •At ALS:Detect

TheProblem• Datagenerationratesareexploding

• Complexanalyticsprocesses

• Thedatalifecycleofteninvolvesmultipleorganisations,machines,andpeople

Thiscreatesasignificantstrainonresearchers

• Bestmanagementpractices(cataloguing,sharing,purging,etc.)canbeoverlooked

• Usefuldatamaybelost,siloed,orforgotten

Page 3: Responsive Storage: Home Automation for Research Data ... · PBS, SLURM, … Lambda Functions ... Deployed Ripple on an ALS and NERSC machine to automate data analysis •At ALS:Detect

RIPPLE:AprototyperesponsivestoragesolutionTransformstaticdatagraveyardsintoactive,responsivestoragedevices

• Automatedatamanagementprocessesandenforcebestpractices

• Reliableevent-drivenexecution

• Usersfocused:simpleif-trigger-then-actionrules• Accessibletoallendusers,notjustadminsandexpertusers

• Userscansetdatamanagementpoliciesandthenforgetaboutthem

• Combinerulesintoflowstocontrolend-to-enddatatransformations

• Passivelywaitsforfilesystemevents(verylittleoverhead)

• Filesystemagnostic– worksonbothedgeandleadershipplatforms

Page 4: Responsive Storage: Home Automation for Research Data ... · PBS, SLURM, … Lambda Functions ... Deployed Ripple on an ALS and NERSC machine to automate data analysis •At ALS:Detect

RIPPLEArchitecture(updated)Agent:

• Sits locally on the machine

• Detects & filters filesystem events

• Facilitates execution of actions

• Can receive new recipes

Service:

• Serverless architecture

• Lambda functions process events

• Orchestrates execution of actions

• Records and manages execution of flows

RippleRunner

Ripple Agent

SQLite

Filesystem

Docker,PBS,

SLURM,…

Lambda Functions

Monitor

Observers

Ripple Cloud API

Process

SQS Queue of events

Page 5: Responsive Storage: Home Automation for Research Data ... · PBS, SLURM, … Lambda Functions ... Deployed Ripple on an ALS and NERSC machine to automate data analysis •At ALS:Detect

RIPPLEAgentResponsiblefordetectingandreportingeventsofinterest

Filesystemagnostic– usesanappropriatemonitorfortheFS• LeveragesPythonWatchdogobservers

• inotify,polling,kqueue,etc.

• GlobusTransferAPIdetectsglobus events(transfer,create,delete)

RulesareretrievedfromthecloudserviceandstoredinanSQLitedatabase

Hybridfilteringmodel:• Localmonitorcheckseventsagainstactiverules• Iftheymatch,theyarereportedtothecloudforprocessing

Ripple Agent

SQLite

Filesystem

ProcessMonitor

Observers

Page 6: Responsive Storage: Home Automation for Research Data ... · PBS, SLURM, … Lambda Functions ... Deployed Ripple on an ALS and NERSC machine to automate data analysis •At ALS:Detect

RIPPLERunnerAbstractsexecutionenvironmentsandallowsjobsubmission/statuschecksviaAPI

HasaUUIDandpollsforactions– rulescaninvokeactionsonanyrunner

Canbedeployedalmostanywhere:LocallyinitiateDockercontainers,singularityexeccommands,andsubprocesses toactonlocalfiles(metadataextraction,dispatchjobs,etc.)

Cloudrunner(backedbyLambdafunctions)performscloudfunctions:Globustransfers,createsharedendpoints,sendemails,invokeotherLambdafunctionsetc.ThisfunctionsanAPIgatewayexposingtherunnerAPIandproxying requeststhroughtoLambdas

HPCsystemsemployarunnerforexposedbatchsubmission(currentlyjustSLURM)

RippleRunner

Docker,PBS,

SLURM,…

Page 7: Responsive Storage: Home Automation for Research Data ... · PBS, SLURM, … Lambda Functions ... Deployed Ripple on an ALS and NERSC machine to automate data analysis •At ALS:Detect

RIPPLECloudService• Gateway API exposes Ripple service

• Get rules

• Report events

• Update event status

• API either proxies Lambda functions (get rules) or inserts payload into SQS queue.

• Once an event reaches the SQS, it should not be lost

• SQS queue reports to SNS topic, triggering Lambda functions to pull from the queue• Dead letter queue after 3 processing failures

• CloudWatch timer triggers “cleanup” and “checkup” functions to process events still on the queue and outstanding jobs

Lambda Functions

Ripple Cloud API

SQS Queue of events

Page 8: Responsive Storage: Home Automation for Research Data ... · PBS, SLURM, … Lambda Functions ... Deployed Ripple on an ALS and NERSC machine to automate data analysis •At ALS:Detect

RIPPLERulesIFTTT-inspiredprogrammingmodel:

Triggers describewheretheeventiscomingfrom(filesystemcreateevents)andtheconditionstomatch(/path/to/monitor/.*.h5)

Actions describewhatservicetouse(e.g.,globus transfer)andargumentsforprocessing(source/dest endpoints).

Page 9: Responsive Storage: Home Automation for Research Data ... · PBS, SLURM, … Lambda Functions ... Deployed Ripple on an ALS and NERSC machine to automate data analysis •At ALS:Detect

EventDetectionGoal:beabletomonitorHPCstorageworkload(>3milevents/day)

Inotify vspolling

Create/touch/delete10,000filesandrecordeventreportingduration(20ktotal)

Machines:• Laptop• c4.xlargeinstance• Edisonloginnode(gpfs)

Page 10: Responsive Storage: Home Automation for Research Data ... · PBS, SLURM, … Lambda Functions ... Deployed Ripple on an ALS and NERSC machine to automate data analysis •At ALS:Detect

FilteringoverheadGoal:Determineoverheadcausedbyfilteringeventslocally

Measuredifferencesinevent/seconddetection

Filteringrequiresmatchingdirectorypathandfileextension

Pollingisoddasitonlypollsonceeverysecond

Page 11: Responsive Storage: Home Automation for Research Data ... · PBS, SLURM, … Lambda Functions ... Deployed Ripple on an ALS and NERSC machine to automate data analysis •At ALS:Detect

LambdaPerformanceGoal:Understandlambdaperformancefordifferenttasks

ColdvsWarmedfunctions

Actions:• Globustransfer• SMSemail• DynamoDB insert/query

TransfersrequireahandshakewiththeGlobusservice,whichalsocommunicateswiththeendpoints

Page 12: Responsive Storage: Home Automation for Research Data ... · PBS, SLURM, … Lambda Functions ... Deployed Ripple on an ALS and NERSC machine to automate data analysis •At ALS:Detect

UseCase:LargeSynopticSurveyTelescopeDevelopedarepresentativetestbedoftheLSSTstoragerequirements

• Automaticallypropagatedatabetweenstoragetiersandfacilities

• InvokeDockercontainerstoextractmetadataandmaintainafilecatalog

• Compressandarchivefiles

• Recoverdeleted/corruptedfileswhendeleteandmodificationeventsoccurCustodial Store

(Chile)

Archive: ANL’s Sparrow

Archiver

Landing

Magnetic

Forwarder

File Catalog

File Catalog

Custodial Store (NCSA)

Landing

Magnetic

Archive

metadataminidgzip

catalog....

1.

2.3.

4.

6.

7.

Page 13: Responsive Storage: Home Automation for Research Data ... · PBS, SLURM, … Lambda Functions ... Deployed Ripple on an ALS and NERSC machine to automate data analysis •At ALS:Detect

UseCase:AdvancedLightSourceDeployedRippleonanALSandNERSCmachinetoautomatedataanalysis

• AtALS: DetectnewheartbeatbeamlinedataandinitiatetransfertoNERSC

• AtNERSC: Extractmetadata,createsbatchfile,dispatchanalysisjobto

Edisonqueue,detectresultandtransferbacktoALS

• AtALS: createasharedendpoint,notifycollaboratorsofresultviaemail

Page 14: Responsive Storage: Home Automation for Research Data ... · PBS, SLURM, … Lambda Functions ... Deployed Ripple on an ALS and NERSC machine to automate data analysis •At ALS:Detect

NewUseCases

• AutomatedmetadataextractionandingestionintoGlobusSearch• UsessingularityandApacheTika toextractmetadata

• Metadataiswrappedintogmeta (json)documentsandingestedintosearch

• Offlinefeedbackmechanismforworkflows

• Researcherswantahumanqualitycontrolcomponent

• HaveRipplesendsubsetsofdatatoresearchersviaemailtocheckit

• Triggeractionsbasedoncontentofreplymessages

Page 15: Responsive Storage: Home Automation for Research Data ... · PBS, SLURM, … Lambda Functions ... Deployed Ripple on an ALS and NERSC machine to automate data analysis •At ALS:Detect

SummaryEvent-drivenautomationofdatamanagementpractices

Userfocused

Monitoringagentagnostictounderlyingfilesystem

Serverless eventprocessingandactionorchestration

Page 16: Responsive Storage: Home Automation for Research Data ... · PBS, SLURM, … Lambda Functions ... Deployed Ripple on an ALS and NERSC machine to automate data analysis •At ALS:Detect

FutureWorkMoreusecases!

Morerunners

Scalable&highperformanceeventmonitorsforleadershipresources

Programmingmodelforevent-baseddatamanagement

IntegrationwithGlobus