How the INDIGO-DataCloud computing platform aims at helping scientific communities
RIA-653549
Giacinto DONVITO
INDIGO Technical Director, INFN Bari
INDIGO-DataCloud
• An H2020 project approved in January 2015 in the EINFRA-1-2014 call
  • 11.1M€, 30 months (from April 2015 to September 2017)
• Who: 26 European partners in 11 European countries
  • Coordination by the Italian National Institute for Nuclear Physics (INFN)
  • Including developers of distributed software, industrial partners, research institutes, universities, e-infrastructures
• What: develop an open source Cloud platform for computing and data (“DataCloud”) tailored to science.
• For: multi-disciplinary scientific communities
  • E.g. structural biology, earth science, physics, bioinformatics, cultural heritage, astrophysics, life science, climatology
• Where: deployable on hybrid (public or private) Cloud infrastructures
• INDIGO = INtegrating Distributed data Infrastructures for Global ExplOitation
• Why: answer to the technological needs of scientists seeking to easily exploit distributed Cloud/Grid compute and data resources.
From the Paper “Advances in Cloud”
• EC Expert Group Report on Cloud Computing, http://cordis.europa.eu/fp7/ict/ssai/docs/future-cc-2may-finalreport-experts.pdf
To reach the full promises of CLOUD computing, major aspects have not yet been developed and realised and in some cases not even researched. Prominent among these are open interoperation across (proprietary) CLOUD solutions at IaaS, PaaS and SaaS levels. A second issue is managing multitenancy at large scale and in heterogeneous environments. A third is dynamic and seamless elasticity from in-house CLOUD to public CLOUDs for unusual (scale, complexity) and/or infrequent requirements. A fourth is data management in a CLOUD environment: bandwidth may not permit shipping data to the CLOUD environment and there are many associated legal problems concerning security and privacy. All these challenges are opportunities towards a more powerful CLOUD ecosystem. […] A major opportunity for Europe involves finding a SaaS interoperable solution across multiple CLOUD platforms. Another lies in migrating legacy applications without losing the benefits of the CLOUD, i.e. exploiting the main characteristics, such as elasticity etc.
INDIGO Addresses Cloud Gaps
• INDIGO focuses on use cases presented by its scientific communities to address the gaps identified by the previously mentioned EC Report, with regard to:
  • Redundancy / reliability
  • Scalability (elasticity)
  • Resource utilization
  • Multi-tenancy issues
  • Lock-in
  • Moving to the Cloud
  • Data challenges: streaming, multimedia, big data
  • Performance
• Reusing existing open source components wherever possible and contributing to upstream projects (such as OpenStack, OpenNebula, Galaxy, etc.) for sustainability.
Integrating distributed data infrastructures with INDIGO-DataCloud
INDIGO and other European Projects
• The INDIGO services are being developed according to the requirements collected within many multidisciplinary scientific communities, such as ELIXIR, WeNMR, INSTRUCT, EGI-FedCloud, DARIAH, INAF-LBT, CMCC-ENES, INAF-CTA, LifeWatch-Algae-Bloom, EMSO-MOIST, EuroBioImaging. However, they are implemented so that they can be easily reused by other user communities.
• INDIGO has strong relationships with complementary initiatives, such as EGI-Engage on the operational side and AARC with respect to AuthN/AuthZ policies. Users of EC-funded initiatives such as PRACE and EUDAT are also expected to benefit from the deployment of INDIGO components in such infrastructures.
• Several National/Regional infrastructures are covered by the 26 INDIGO partners, located in 11 European countries.
• INDIGO is mentioned in the recent Important Project of Common European Interest (IPCEI) for the exploitation of HPC and HTC resources at national, regional and European levels.
Work Packages
INDIGO-DataCloud General Architecture
[Figure: layered architecture diagram. Labels recoverable from the figure, by work package:]
• WP6 Services: FutureGateway Portal, FutureGateway Engine, FutureGateway REST API, JSAGA / JSAGA Adaptors, Admin Portlets, User Portlets, Workflow Portlets, Data Analytics, Ophidia plugin, LONI plugin, Taverna and Kepler plugins, Open Mobile Toolkit, Mobile Apps, Other Science Gateways, SGMon GUI Clients, Workflows, Mobile clients, Support services
• WP5 Services: Kubernetes Cluster, IAM Service, PaaS Orchestrator, QoS/SLA, Cloud Provider Ranker, Monitoring, Infrastructure Manager, Accounting, TOSCA; Data Services (Onedata, Dynafed, FTS) over REST / CDMI / WebDAV / POSIX / GridFTP, with OIDC
• WP4 Services: Mesos Cluster, Automatic Scaling Service, Storage Service (S3 / CDMI / POSIX / WebDAV / GridFTP), Smart Scheduling, Spot Instances, Native Docker, QoS Support, Identity Harmonization, Local Repository, Heat/IM, TOSCA
• Non-INDIGO IaaS: Native IaaS API
IaaS Features (1)
• Improved scheduling for allocation of resources by popular open source Cloud platforms, i.e. OpenStack and OpenNebula.
  • Enhancements will address both better scheduling algorithms and support for spot instances. The latter are in particular needed to support allocation mechanisms similar to those available on public clouds such as Amazon and Google.
  • We will also support dynamic partitioning of resources among “traditional batch systems” and Cloud infrastructures (for some LRMS).
• Support for standards in IaaS resource orchestration engines through the use of the TOSCA standard.
  • This overcomes the portability and usability problem caused by the fact that the ways of orchestrating resources differ widely among Cloud computing frameworks.
• Improved IaaS orchestration capabilities for popular open source Cloud platforms, i.e. OpenStack and OpenNebula.
  • Enhancements will include the development of custom TOSCA templates to facilitate resource orchestration for end users, increased scalability of deployed resources and support of orchestration capabilities for OpenNebula.
IaaS Features (2)
• Improved QoS capabilities of storage resources.
  • Better support of high-level storage requirements such as flexible allocation of disk or tape storage space and support for data life cycle. This is an enhancement also with respect to what is currently available in public clouds, such as Amazon Glacier and Google Cloud Storage.
• Improved capabilities for networking support.
  • Enhancements will include flexible networking support in OpenNebula and handling of network configurations through developments of the OCCI standard for both OpenNebula and OpenStack.
• Improved and transparent support for Docker containers.
  • Introduction of native container support in OpenNebula, development of standard interfaces using the OCCI protocol to drive container support in both OpenNebula and OpenStack.
PaaS Features (1)
• Improved capabilities in the geographical exploitation of Cloud resources.
  • End users do not need to know where resources are located, because the INDIGO PaaS layer hides the complexity of both scheduling and brokering.
• Standard interface to access PaaS services.
  • Currently, each PaaS solution available on the market uses a different set of APIs, languages, etc. INDIGO will use the TOSCA standard to hide these differences.
• Support for data requirements in Cloud resource allocations.
  • Resources can be allocated where data is stored.
• Integrated use of resources coming from both public and private Cloud infrastructures.
  • The INDIGO resource orchestrator is capable of addressing both types of Cloud infrastructures through TOSCA templates handled at either the PaaS or IaaS level.
PaaS Features (2)
• Distributed data federations supporting legacy applications as well as high-level capabilities for distributed QoS and Data Lifecycle Management.
  • This includes for example remote POSIX access to data.
• Integrated IaaS and PaaS support in resource allocations.
  • For example, storage provided at the IaaS layer is automatically made available to higher-level resource allocations performed at the PaaS layer.
• Transparent client-side import/export of distributed Cloud data.
  • This supports Dropbox-like mechanisms for importing and exporting data from/to the Cloud. That data can then be easily ingested by Cloud applications through the INDIGO unified data tools.
• Support for distributed data caching mechanisms and integration with existing storage infrastructures.
  • INDIGO storage solutions are capable of providing efficient access to data and of transparently connecting to POSIX filesystems already available in data centers.
PaaS Features (3)
• Deployment, monitoring and automatic scalability of existing applications.
  • For example, existing applications such as web front-ends or R-Studio servers can be automatically and dynamically deployed in highly-available and scalable configurations.
• Integrated support for high-performance Big Data analytics.
  • This includes custom frameworks such as Ophidia (providing a high performance workflow execution environment for Big Data Analytics on large volumes of scientific data) as well as general purpose engines for large-scale data processing such as Spark, all integrated to make use of the INDIGO PaaS features.
• Support for dynamic and elastic clusters of resources.
  • Resources and applications can be clustered through the INDIGO APIs. This includes for example batch systems on-demand (such as HTCondor or Torque) and extensible application platforms (such as Apache Mesos) capable of supporting both application execution and instantiation of long-running services.
AAI Features
• Provide an advanced set of features that includes:
  • User authentication (supporting SAML, OIDC, X.509)
  • Identity harmonization (link heterogeneous AuthN mechanisms to a single VO identity)
  • Management of VO membership (i.e., groups and other attributes)
  • Management of registration and enrolment flows
  • Provisioning of VO structure and membership information to services
  • Management, distribution and enforcement of authorization policies
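The identity harmonization idea listed above (several authentication mechanisms linked to one VO identity) can be illustrated with a small Python sketch. This is not the INDIGO IAM implementation or its API: the class names, the `vo_id` field and the example subjects are all hypothetical and only show the mapping from (mechanism, external subject) pairs to a single account.

```python
from dataclasses import dataclass, field


@dataclass
class VOIdentity:
    """A single VO-level account (hypothetical structure)."""
    vo_id: str
    groups: set = field(default_factory=set)


class IdentityRegistry:
    """Maps (authentication method, external subject) pairs to one VO identity."""

    SUPPORTED = {"saml", "oidc", "x509"}  # mechanisms named on the slide

    def __init__(self):
        self._links = {}

    def link(self, method, subject, identity):
        """Attach an external credential to an existing VO identity."""
        if method not in self.SUPPORTED:
            raise ValueError(f"unsupported authentication method: {method}")
        self._links[(method, subject)] = identity

    def resolve(self, method, subject):
        """Return the linked VO identity, or None if the credential is unknown."""
        return self._links.get((method, subject))


# Example data (made up): one user reachable through two credentials.
registry = IdentityRegistry()
alice = VOIdentity("vo-user-0001", {"structural-biology"})
registry.link("oidc", "alice@idp.example.org", alice)
registry.link("x509", "CN=Alice Example,O=Example,C=IT", alice)
```

Whichever credential the user presents, `resolve` returns the same `VOIdentity` object, so group membership and authorization attributes are managed in one place.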
Storage Quality of Service and the Cloud
• Amazon: S3, Glacier
• Google: Standard, Durable Reduced Availability, Nearline
• HPSS/GPFS: corresponds to the HPSS Classes (customizable)
• dCache: Resilient, disk+tape, TAPE
Next step: Data Life Cycle
• Data Life Cycle is just the time-dependent change of
  • Storage Quality of Service
  • Ownership and Access Control (PI Owned, no access, Site Owned, Public access)
  • Payment model: Pay as you go; Pay in advance for rest of lifetime.
  • Maybe other things
[Figure: timeline with marks at 6 months, 1 year and 10 years]
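Treating the data life cycle as a time-dependent change of storage QoS and ownership can be sketched as a small lookup in Python. The time marks (6 months, 1 year, 10 years) come from the slide's timeline; the QoS and access values assigned to each stage are assumptions made up for illustration, and the `Stage` and `stage_at` names are hypothetical.

```python
from dataclasses import dataclass
from datetime import timedelta


@dataclass(frozen=True)
class Stage:
    starts_after: timedelta  # age of the data at which this stage begins
    qos: str                 # storage quality of service (illustrative values)
    access: str              # ownership / access control (illustrative values)


# Stages must be listed in increasing order of `starts_after`.
LIFECYCLE = [
    Stage(timedelta(0), "disk", "PI owned"),
    Stage(timedelta(days=182), "disk+tape", "PI owned"),      # about 6 months
    Stage(timedelta(days=365), "tape", "site owned"),         # 1 year
    Stage(timedelta(days=3650), "tape", "public access"),     # about 10 years
]


def stage_at(age, lifecycle=LIFECYCLE):
    """Return the last stage whose start time has been reached at the given age."""
    current = lifecycle[0]
    for stage in lifecycle:
        if age >= stage.starts_after:
            current = stage
    return current
```

For example, `stage_at(timedelta(days=400))` selects the third stage, so under these assumed rules data older than one year would sit on tape and be owned by the site.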
Data Federation
[Figure: data federation demo deployment spanning several sites. Labels recoverable from the figure: Amazon S3 (DNS: p-aws-useast); AWS USA; INFN Italy; UPV Spain (VM: demo-onedata-upv-provider); Laptop OSX (boot2docker); Docker Onezone; VM onezone; VM oneprovider; VM oneclient; Docker Oneclient (several instances); NFS Server; VM nfs; POSIX Volume; SAMBA Export.]
Frontend Services / Toolkit
Integration schemas
• We provide graphical user interfaces in the form of scientific gateways and workflows, together with the means to access the INDIGO PaaS services and software stack, and we allow users to define and set up the on-demand infrastructure for the WP2 use cases.
• Setting up the whole use case infrastructure: the administrator will be provided with ready-to-use recipes that can be customized. The final users will be provided with the service end-points and will not be aware of the backend.
• Use the INDIGO features from their own Portals: user communities that have their own Scientific Gateway set up can exploit the FutureGateway REST API to deal with the whole INDIGO software stack.
• Use of the INDIGO tools and portals, including the FutureGateway, Scientific Workflow Systems, Big Data Analytics Frameworks (like Ophidia) and Mobile Applications. In this scenario the final users as well as domain administrators will use the GUI tools. The administrator will use them as described in the first case. In addition, domain-specific users will be provided with specific portlets/workflows/apps that allow graphical interaction with their applications run via the INDIGO software stack.
From CSGF to FutureGateway
• Classic CSGF (before INDIGO): Portlets, GridEngine and JSAGA on Liferay/Glassfish. Communication Portlet-GridEngine-JSAGA is only possible with Java libraries.
• FutureGateway approach (INDIGO): Portlets and Web/Mobile Apps, REST APIs, API Server and JSAGA on Liferay/Tomcat. Communication between the Portlets and the API Server goes via REST APIs, which makes it possible to serve external applications. The API Server interacts with JSAGA via Java libraries.
• The same REST APIs could be used by Mobile Apps
• These APIs make the interaction with the PaaS layer easier
• These REST APIs make it easy for non-INDIGO applications to exploit INDIGO capabilities
Ophidia framework
• Ophidia is a big data analytics framework for eScience
  • Primarily used for the analysis of climate data, exploitable in multiple domains
  • “Datacube” abstraction and OLAP-based approach for big data
  • Support for array-based data analysis and scientific data formats
  • Parallel computing techniques and smart data distribution methods
  • ~100 array-based primitives and ~50 datacube operators
    • e.g.: data sub-setting, data aggregation, array-based transformations, datacube roll-up/drill-down, datacube import, etc.
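To give a feel for the operations named above (sub-setting, array-based primitives, roll-up), here is a toy datacube in plain Python. It does not use Ophidia or its API; the cube layout, the sample values and the function names are assumptions chosen only to illustrate the concepts.

```python
from statistics import mean

# A toy "datacube": each (lat, lon) cell holds an array of values along time.
cube = {
    (40.0, 16.0): [10.0, 12.0, 14.0, 16.0],
    (40.0, 17.0): [11.0, 13.0, 15.0, 17.0],
    (41.0, 16.0): [9.0, 9.0, 11.0, 11.0],
}


def subset(cube, lat_range):
    """Data sub-setting: keep only cells whose latitude lies in [lo, hi]."""
    lo, hi = lat_range
    return {key: arr for key, arr in cube.items() if lo <= key[0] <= hi}


def apply_primitive(cube, primitive):
    """Array-based primitive: reduce each cell's array to a single value."""
    return {key: primitive(arr) for key, arr in cube.items()}


def roll_up(cube, window, reducer=mean):
    """Roll-up: aggregate the time axis into coarser bins of `window` samples."""
    return {
        key: [reducer(arr[i:i + window]) for i in range(0, len(arr), window)]
        for key, arr in cube.items()
    }


# Example: aggregate pairs of time steps for the cells at latitude 40.
coarse = roll_up(subset(cube, (40.0, 40.5)), window=2)
```

A drill-down would go the other way, from the coarse bins back to the original time resolution, which requires keeping the finer-grained cube available.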
INDIGO module for Kepler
• The Kepler scientific workflow system is an open source tool that enables creation, execution and sharing of workflows across a broad range of scientific and engineering disciplines.
• The first version of the INDIGO module has been delivered; new functionalities are gradually being added for the users.
• The INDIGO module is based on the FutureGateway API.
• At the moment, it is possible to build workflows that define a task, prepare its inputs and trigger its execution. While a task is being executed within INDIGO's infrastructure, it is possible to check its status.
• FutureGateway API client: https://github.com/indigo-dc/indigoclient
• Kepler-based actors: https://github.com/indigo-dc/indigokepler
Use case examples
Service Deployment and application execution
We are now working on adding a Calico network configuration
Application execution on the PaaS Layer
• The INDIGO approach to application distribution and execution:
  • Is based on Docker
  • Exploits Mesos + Chronos
  • All application executions are described by TOSCA Templates, submitted via simple APIs or Portlets
  • Input and output are automatically managed by the PaaS layer (via Onedata and external endpoints)
  • Dependencies and retry on failure are supported by means of Chronos
  • Geographical data-aware scheduling is provided by the INDIGO PaaS orchestrator
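The Chronos job templates in this deck carry a `schedule` value such as 'R0/2015-12-25T17:22:00Z/PT5M', an ISO 8601 repeating interval made of a repetition count, a start time and a period. The following Python sketch splits such a string into its three parts. It is an illustration only: `parse_schedule` is a hypothetical helper, it handles just a subset of ISO 8601 durations (days, hours, minutes, seconds), and it does not say how Chronos itself acts on a given repetition count.

```python
import re
from datetime import datetime, timedelta, timezone

# R<count>/<start in UTC>/P[nD][T[nH][nM][nS]]
_SCHEDULE = re.compile(
    r"^R(\d*)/([^/]+)/P(?:(\d+)D)?(?:T(?:(\d+)H)?(?:(\d+)M)?(?:(\d+)S)?)?$"
)


def parse_schedule(text):
    """Split a repeating-interval string into (repetitions, start, period)."""
    match = _SCHEDULE.match(text)
    if match is None:
        raise ValueError(f"unsupported schedule: {text!r}")
    reps, start, days, hours, minutes, seconds = match.groups()
    # An empty count (plain "R") is reported as None.
    repetitions = int(reps) if reps else None
    start_time = datetime.strptime(start, "%Y-%m-%dT%H:%M:%SZ").replace(
        tzinfo=timezone.utc
    )
    period = timedelta(
        days=int(days or 0),
        hours=int(hours or 0),
        minutes=int(minutes or 0),
        seconds=int(seconds or 0),
    )
    return repetitions, start_time, period


repetitions, start_time, period = parse_schedule("R0/2015-12-25T17:22:00Z/PT5M")
```

For the value used in the templates this yields a repetition count of 0, a start time of 25 December 2015 at 17:22 UTC and a period of five minutes.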
chronos_job_1:
  type: tosca.nodes.indigo.Container.Application.Docker.Chronos
  properties:
    schedule: 'R0/2015-12-25T17:22:00Z/PT5M'
    description: 'Execute app'
    command: /bin/bash run.sh
    uris: []
    retries: 3
    environment_variables:
      INPUT_ONEDATA_SPACE: { get_input: InputOnedataSpace }
      INPUT_PATH: { get_input: InputPath }
      ....
  artifacts:
    image:
      file: indigodatacloud/ambertools_app
      type: tosca.artifacts.Deployment.Image.Container.Docker
  requirements:
    - host: docker_runtime1

chronos_job_upload:
  type: tosca.nodes.indigo.Container.Application.Docker.Chronos
  properties:
    schedule: 'R0/2015-12-25T17:22:00Z/PT5M'
    description: 'Upload output data'
    command: /bin/bash run.sh
    retries: 3
    environment_variables:
      PROVIDER_HOSTNAME: <ONEDATA_PROVIDER_IP>
      ONEDATA_TOKEN: <ROBOTToken>
      ONEDATA_SPACE: <path>
      INPUT_FILENAME: <input filename>
      OUTPUT_FILENAME: <input filename --> coincides with amber-job-01 OUTPUT_FILENAME>
      OUTPUT_PROTOCOL: http(s)|ftp(s)|S3|Swift|WebDav
      OUTPUT_URL: <output URL>
      OUTPUT_CREDENTIALS: <e.g. username:password>
  artifacts:
    image:
      file: indigodatacloud/jobuploader
      type: tosca.artifacts.Deployment.Image.Container.Docker
  requirements:
    - host: docker_runtime1
    - job_predecessor: chronos_job_1

docker_runtime1:
  type: tosca.nodes.indigo.Container.Runtime.Docker
  capabilities:
    host:
      properties:
        num_cpus: 0.5
        mem_size: 512 MB
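The `job_predecessor` requirement in the fragment above expresses that the upload job must run after the application job. A minimal Python sketch of how an execution order could be derived from such requirements is shown below. It mirrors only the relevant parts of the fragment as plain dictionaries, and `job_order` is a hypothetical helper; how Chronos and the INDIGO orchestrator actually process dependencies is not described in the slides.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

CHRONOS_TYPE = "tosca.nodes.indigo.Container.Application.Docker.Chronos"

# Reduced copy of the node templates above: only types and requirements.
node_templates = {
    "chronos_job_1": {
        "type": CHRONOS_TYPE,
        "requirements": [{"host": "docker_runtime1"}],
    },
    "chronos_job_upload": {
        "type": CHRONOS_TYPE,
        "requirements": [
            {"host": "docker_runtime1"},
            {"job_predecessor": "chronos_job_1"},
        ],
    },
    "docker_runtime1": {
        "type": "tosca.nodes.indigo.Container.Runtime.Docker",
    },
}


def job_order(templates):
    """Return Chronos job names so that every predecessor comes first."""
    graph = {}
    for name, node in templates.items():
        if node["type"] != CHRONOS_TYPE:
            continue  # runtimes and other nodes are not jobs
        graph[name] = {
            req["job_predecessor"]
            for req in node.get("requirements", [])
            if "job_predecessor" in req
        }
    # TopologicalSorter expects a mapping from node to its predecessors.
    return list(TopologicalSorter(graph).static_order())


order = job_order(node_templates)
```

With the two jobs above, the application job comes first and the upload job second; a cyclic dependency would make `static_order` raise `graphlib.CycleError`.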
tosca_definitions_version: tosca_simple_yaml_1_0
imports:
  - indigo_custom_types: https://raw.githubusercontent.com/indigo-dc/tosca-types/master/custom_types.yaml
description: >
  TOSCA examples for specifying a Chronos Job that runs an application using Onedata storage.
inputs:
  input_onedata_token:
    type: string
    description: User token required to mount the user's INPUT Onedata space
    required: yes
  output_onedata_token:
    type: string
    description: User token required to mount the user's OUTPUT Onedata space. It can be the same as the input token
    required: yes
  # data_locality:
  #   type: boolean
  #   description: Flag that controls the INPUT data locality: if yes the orchestrator will select the best provider, if no the user has to specify the provider to be used
  #   required: yes
  input_onedata_providers:
    type: list
    description: List of favorite Onedata providers to be used to mount the Input Onedata space. If not provided, data locality algo will be applied.
    entry_schema:
      type: string
    default: ['']
    required: no
  output_onedata_providers:
    type: list
    description: List of favorite Onedata providers to be used to mount the Output Onedata space. If not provided, the same provider(s) used to mount the input space will be used.
    entry_schema:
      type: string
    default: ['']
    required: no
  input_onedata_space:
    type: string
    required: yes
  output_path:
    type: string
    description: Path to the output data inside the Output Onedata space
    required: yes
  output_filenames:
    type: list
    description: List of filenames generated by the application run
    entry_schema:
      type: string
    required: yes
  cpus:
    type: float
    description: Amount of CPUs for this job
    required: yes
  mem:
    type: float
    description: Amount of Memory (MB) for this job
    required: yes
topology_template:
  node_templates:
    chronos_job:
      type: tosca.nodes.indigo.Container.Application.Docker.Chronos
      properties:
        schedule: 'R0/2015-12-25T17:22:00Z/PT5M'
        name: 'JOB_ID_TO_BE_SET_BY_THE_ORCHESTRATOR'
        description: 'Execute app'
        command: '/bin/bash run.sh'
        uris: []
        retries: 3
        environment_variables:
          INPUT_ONEDATA_TOKEN: { get_input: input_onedata_token }
          OUTPUT_ONEDATA_TOKEN: { get_input: output_onedata_token }
          INPUT_ONEDATA_PROVIDERS: { get_input: input_onedata_providers }
          OUTPUT_ONEDATA_PROVIDERS: { get_input: output_onedata_providers }
          INPUT_ONEDATA_SPACE: { get_input: input_onedata_space }
          ....
      artifacts:
        image:
          file: indigodatacloud/ambertools_app
          type: tosca.artifacts.Deployment.Image.Container.Docker
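The template above wires user-supplied inputs into the job's environment through the TOSCA `get_input` function. The Python sketch below shows one way such references could be substituted once the inputs are known. It is a simplified illustration, not the orchestrator's code: `resolve_inputs` is a hypothetical helper, the token and space values are placeholders, and defaults or type validation declared in the `inputs` section are ignored.

```python
def resolve_inputs(value, inputs):
    """Recursively replace {"get_input": name} mappings with the input value."""
    if isinstance(value, dict):
        if set(value) == {"get_input"}:
            name = value["get_input"]
            if name not in inputs:
                raise KeyError(f"missing TOSCA input: {name}")
            return inputs[name]
        return {key: resolve_inputs(item, inputs) for key, item in value.items()}
    if isinstance(value, list):
        return [resolve_inputs(item, inputs) for item in value]
    return value  # plain scalars are left untouched


# Two of the environment variables from the template, as parsed YAML would look.
environment_variables = {
    "INPUT_ONEDATA_TOKEN": {"get_input": "input_onedata_token"},
    "INPUT_ONEDATA_SPACE": {"get_input": "input_onedata_space"},
}

# Placeholder values that a user would supply at submission time.
user_inputs = {
    "input_onedata_token": "EXAMPLE-TOKEN",
    "input_onedata_space": "example-space",
}

resolved = resolve_inputs(environment_variables, user_inputs)
```

After resolution, each environment variable holds a concrete string, which is what the containerized application would see at run time.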
UC: A web portal that exploits a batch system to run applications
• A user community maintains a “vanilla” version of the portal and computing image plus some specific recipes to customize software tools and data
  • Portal and computing are part of the same image that can take different roles.
  • Customization may include creating special users, copying (and registering in the portal) reference data, installing (and again registering) processing tools.
• Typically the web portal image also has a batch queue server installed.
• All the running instances share a common directory.
• Different credentials: end-user and application deployment.
UC Inspiration: Galaxy on the cloud
• Galaxy can be installed on a dedicated machine or as a front-end to a batch queue.
• Galaxy exposes a web interface and executes all the interactions (including data uploading) as jobs in a batch queue.
• Requires a shared directory among the working nodes and the front-end.
• It supports a separate storage area for different users, managing them through the portal.
UC: A web portal that exploits a batch system to run applications
1) The web portal is instantiated, installed and configured automatically exploiting Ansible recipes and TOSCA Templates.
2) A remote POSIX share is automatically mounted on the web portal using Onedata.
3) The same POSIX share is also automatically mounted on the worker nodes using Onedata.
4) End-users can see and access the same files via simple web browsers or similar.
5) A batch system is dynamically and automatically configured via TOSCA Templates.
6) The portal is automatically configured in order to execute jobs on the batch cluster.
7) The batch cluster is automatically scaled up and down according to the job load on the batch system.
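Step 7 above, scaling the batch cluster with the job load, can be sketched as a simple sizing rule in Python. This is an assumption-laden illustration rather than the logic INDIGO uses: the function names, the one-job-per-slot model and the node limits are all hypothetical, with the limits standing in for whatever bounds the submitted TOSCA template imposes.

```python
import math


def desired_nodes(queued_jobs, running_jobs, slots_per_node, min_nodes, max_nodes):
    """Number of worker nodes needed for the current load, clamped to the limits."""
    needed = math.ceil((queued_jobs + running_jobs) / slots_per_node)
    return max(min_nodes, min(max_nodes, needed))


def scaling_action(current_nodes, target_nodes):
    """Describe the change required to move from the current to the target size."""
    delta = target_nodes - current_nodes
    if delta > 0:
        return ("add", delta)
    if delta < 0:
        return ("remove", -delta)
    return ("keep", 0)


# Example: 10 queued and 4 running jobs, 4 job slots per node, 2 nodes running.
target = desired_nodes(10, 4, slots_per_node=4, min_nodes=1, max_nodes=5)
action = scaling_action(current_nodes=2, target_nodes=target)
```

Looking at queued and running jobs rather than CPU load alone means that idle nodes are released as soon as the queue drains, while a backlog triggers growth even if the existing nodes are not yet saturated.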
UC: Use Case Lifecycle
• Preliminary
  • The use case administrator creates the “vanilla” images of the portal + computing image.
  • The use case administrator, with the support of INDIGO experts, writes the TOSCA specification of the portal, queue and computing configuration.
• Group-specific
  • The use case administrator, with the support of INDIGO experts, writes specific modules for portal-specific configurations.
  • The use case administrator deploys the virtual appliance.
• Daily work
  • Users access the portal as if it were locally deployed and submit jobs to the system as if it had been provisioned statically.
UC: A Graphic Overview
[Figure: deployment workflow for the web portal use case. Steps recoverable from the figure: 1) Stage Data; 1.a.1) build, push; 1.a.2) Dockerfile (commit); 1.b) Automated Build; 2) Deploy TOSCA with Vanilla VM/Container; 4) Install/Configure; 5) Mount; 6) Access Web Portal. Other labels: User; Provider; Champion + JRA; TOSCA Documents and Dockerfiles per Use Case; INDIGO-DataCloud Docker Hub Organization; FutureGateway API Server; WP6; WP5; WP4; Orchestrator; IM; Other PaaS Core Services; Cloud Site; OpenStack; OpenNebula; Heat; Clues; Virtual Elastic Cluster; Front-End; Public IP; Galaxy; WN (worker nodes).]
A possible Phenomenal-INDIGO integration scenario
• Phenomenal already relies on a very rich set-up exploiting Mesos
• INDIGO is able to provide a customizable environment where an a priori complex cluster could be deployed in an automatic way:
  • Using a specific TOSCA Template built with the expertise of the INDIGO PaaS developers
• INDIGO could provide to Phenomenal:
  • (Automatic) resource provisioning exploiting any kind of cloud environment (private or public)
  • Reaction to the monitored status of the instantiated services
  • Advanced and flexible AAI solution
  • Advanced and flexible data management solution
  • Advanced scheduling across many cloud providers based on:
    • SLA/QoS, data location and availability monitoring, ranked with highly flexible rules
  • Easy-to-use web interface both for the end users and for the service admins/developers
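Scheduling across providers based on SLA/QoS, data location and availability can be pictured as filtering and sorting a list of candidates. The Python sketch below uses one fixed, made-up rule set: unavailable providers are dropped, providers hosting the data come first, and ties are broken by an SLA priority. The `Provider` fields and the ordering are assumptions for illustration and do not reproduce the rules of the INDIGO Cloud Provider Ranker.

```python
from dataclasses import dataclass


@dataclass
class Provider:
    name: str
    available: bool      # result of availability monitoring
    sla_priority: int    # lower value means a better SLA/QoS match
    hosts_dataset: bool  # True if the required data is stored at this site


def rank_providers(providers):
    """Return usable providers, best candidate first."""
    candidates = [p for p in providers if p.available]
    return sorted(
        candidates,
        key=lambda p: (not p.hosts_dataset, p.sla_priority, p.name),
    )


# Example data (made up).
providers = [
    Provider("site-a", available=True, sla_priority=2, hosts_dataset=False),
    Provider("site-b", available=True, sla_priority=3, hosts_dataset=True),
    Provider("site-c", available=False, sla_priority=1, hosts_dataset=True),
]

ranking = [p.name for p in rank_providers(providers)]
```

Here `site-b` wins despite its weaker SLA priority because it holds the data, and `site-c` is excluded because it is not available. A rule engine with configurable rules, as the slide suggests, would replace the hard-coded sort key.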
Phenomenal exploiting INDIGO
[Figure: deployment workflow for a Mesos cluster. Steps recoverable from the figure: 1) Stage Data; 1.a.1) build, push; 1.a.2) Dockerfile (commit); 1.b) Automated Build; 2) Deploy TOSCA with Vanilla VM/Container; 4) Install/Configure; 5) Mount; 6) Access Mesos Services. Other labels: User; Provider; Champion + JRA; TOSCA Documents and Dockerfiles per Use Case; INDIGO-DataCloud Docker Hub Organization; FutureGateway API Server; WP6; WP5; WP4; Orchestrator; IM; Other PaaS Core Services; Cloud Site; OpenStack; OpenNebula; Heat; Clues; Virtual Elastic Mesos Cluster; Mesos Masters; Public IP; Chronos/Marathon; Workers.]
INDIGO FAQ
• How does INDIGO achieve resource redundancy and high availability?
  • This is achieved at multiple levels:
    • At the data level, redundancy can be implemented by exploiting the capability of INDIGO's Onedata to replicate data across different data centers.
    • At the site level, it is possible to ask for copies of data to be kept, for example, on both disk and tape using the INDIGO QoS storage features.
    • For services, the INDIGO architecture uses Mesos and Marathon to provide automatic service high availability and load balancing. This automation is easily obtainable for stateless services; for stateful services it is application-dependent, but it can normally be integrated into Mesos through, for example, a custom framework (examples of which are provided by INDIGO).
• How does INDIGO achieve resource scalability?
  • First of all, we can distinguish between vertical (scale up) and horizontal (scale out) scalability. INDIGO provides both:
    • Mesos and Marathon handle vertical scalability by deploying Docker containers with an increasing amount of resources.
    • The INDIGO PaaS Orchestrator handles horizontal scalability through requests made at the IaaS level to add resources when needed.
INDIGO FAQ
• How does INDIGO achieve resource scalability?
  • The INDIGO software does this in a smart way; for example, it does not look at CPU load only:
    • In the case of a dynamically instantiated LRMS, it checks the status of jobs and queues and adds or removes computing nodes accordingly.
    • In the case of a Mesos cluster, if there are applications to start and there are no free resources, INDIGO starts up more nodes. This happens within the limits of the submitted TOSCA templates. In other words, any given user stays within the limits of the TOSCA template they have submitted; this also holds for accounting purposes.
• How do you know when and where resources are available?
  • We are extending the Information System available in the European Grid Infrastructure (EGI) to inform the INDIGO PaaS orchestrator about the available IaaS infrastructures and about the services they provide. It is therefore possible for the INDIGO orchestrator to optimally choose a certain IaaS infrastructure given, for example, the location of a certain dataset.
Conclusions
• The first official release will be at the end of July
• The first prototype is already available:
  • Not all the services and features are available
  • This is for internal evaluation, but some services can already be tested
• A lot of important developments are being carried out with the original developer communities so that the code maintenance is not (only) in our hands
Thank you
https://www.indigo-datacloud.eu
Better Software for Better Science.