ARCHER SP Service Quarterly Report Quarter 4 2016


Page 1: ARCHER SP Service Quarterly Report3 1. The Service 1.1 Service Highlights This is the report for the ARCHER SP Service for the Reporting Periods: October 2016, November 2016 and December

ARCHER SP Service Quarterly Report

Quarter 4 2016


Document Information and Version History

Version:      1.0
Status:       Final
Author(s):    Alan Simpson, Anne Whiting, Andy Turner, Mike Brown, Stephen Booth, Jo Beech-Brandt
Reviewer(s):  Alan Simpson

Version  Date      Comments, Changes, Status  Authors, contributors, reviewers
0.1      04/01/17  Initial Draft              Anne Whiting
0.2      10/01/17  Added graphs               Jo Beech-Brandt
0.3      12/01/17  Reviewed and updated       Andy Turner
0.4      12/01/17  Updated after review       Anne Whiting
0.5      12/01/17  Reviewed and updated       Alan Simpson
0.6      12/01/17  Updated after review       Anne Whiting
1.0      13/01/17  Version for EPSRC          Alan Simpson


1. The Service

1.1 Service Highlights

This is the report for the ARCHER SP Service for the Reporting Periods: October 2016, November 2016 and December 2016.

• Utilisation on the system during 16Q4 was 94% as compared to 95% in 16Q3. The continued high utilisation of the service supports the need for ongoing investment in HPC.

• Cray KNL System was made available to users in October:

o The Cray 12-node XC40 KNL system was made available to users on 20 October 2016. SAFE functionality was modified to allow users to request access to the KNL system. This is a novel development system designed to allow experimentation and technology evaluation.

o The system remains freely available to all users with no kAU usage limit to encourage use and experimentation.

o During the implementation and configuration of the system, much useful experience was gained of CLE6, which will support the eventual upgrade of ARCHER to CLE6.

o There are 188 user accounts on the KNL as of 4 January 2017. 3589 jobs were submitted to the KNL in 16Q4 using 3540 kAUs. The KNL utilisation was 47% for the quarter.

• We have worked with the CSE team to enable the upload of data from Cray ALPS and RUR to the SAFE to provide richer data on the use of ARCHER. This will help understand user requirements for the service and plan for future services. The data now available in SAFE includes: memory usage, energy use, application use, process-placement information.

• To prevent unauthorised access to user data due to the worldwide Linux security vulnerability (“Dirty COW”), user access to ARCHER was removed between 24 and 26 October 2016. More details are in Section 2.3.3.

• The first 2 steps of the 3-stage process to upgrade to CLE 5.2 UP04 have been successfully completed. Testing is underway on the TDS to ensure a smooth transition for the user community.

• SAFE functionality has just been introduced to allow users to register publications against projects, making it easier for them to record publications in ResearchFish. This functionality will also provide a useful input to the benefits realisation data collected.

• The Stage 1 document review element of the external ISO 9001:2015 audit has been passed.

1.2 Forward Look

• The final stage of the CLE upgrade to 5.2 UP04 will take place on 25 January 2017.

• Further work will continue to prepare ARCHER to upgrade to CLE6, delivering the new functionality that this brings and ensuring the continued full support of the system’s operating system by Cray.

• As part of improving security for users as well as investigating possible future service requirements, we are currently testing the use of two-factor authentication for ARCHER access using the Google authenticator. It is hoped that this functionality can be made available as an option for users early in 2017.

• The next ARCHER Champions Workshop will take place in Leeds on 10 February, co-located with HPC-SIG following the decision for the meetings to be bi-annual and distributed around the UK.

• Work will continue to prepare for the full ISO 9001:2015 certification audit which is taking place 20–21 February 2017.

• The Annual User Survey will be released to users at the beginning of January. The results from the survey will be circulated and published along with the Q2 2017 report.

• We will repeat the scheduler analysis that we undertook throughout summer 2016 to ensure that the scheduler configuration remains well suited to the ARCHER workload.


2. Contractual Performance Report

This is the contractual performance report for the ARCHER SP Service.

2.1 Service Points and Service Credits

The Service Levels and Service Points for the SP service are defined as below in Schedule 2.2.

• 2.6.2 - Phone Response (PR): 90% of incoming telephone calls answered personally within 2 minutes for any Service Period. Service Threshold: 85.0%; Operating Service Level: 90.0%.

• 2.6.3 - Query Closure (QC): 97% of all administrative queries, problem reports and non in-depth queries shall be successfully resolved within 2 working days. Service Threshold: 94.0%; Operating Service Level: 97.0%.

• 2.6.4 - New User Registration (UR): Process New User Registrations within 1 working day.

Definitions:

Operating Service Level: The minimum level of performance for a Service Level which is required by the Authority if the Contractor is to avoid the need to account to the Authority for Service Credits.

Service Threshold: This term is not defined in the contract. Our interpretation is that it refers to the minimum allowed service level. Below this threshold, the Contractor is in breach of contract.

Non In-Depth: This term is not defined in the contract. Our interpretation is that it refers to Basic queries which are handled by the SP Service. This includes all Admin queries (e.g. requests for Disk Quota, Adjustments to Allocations, Creation of Projects) and Technical Queries (Batch script questions, high level technical ‘How do I?’ requests). Queries requiring detailed technical and/or scientific analysis (debugging, software package installations, code porting) are referred to the CSE Team as In-Depth queries.

Change Request: This term is not defined in the contract. There are times when SP receives requests that may require changes to be deployed on ARCHER. These requests may come from the users, the CSE team or Cray. Examples may include the deployment of new OS patches, the deployment of Cray bug fixes, or the addition of new systems software. Such changes are subject to Change Control and may have to wait for a Maintenance Session. The nature of such requests means that they cannot be completed in 2 working days.

2.1.1 Service Points

In the previous Service Quarter the Service Points can be summarised as follows:

Period       Oct 16              Nov 16              Dec 16              16Q4
Metric       Service   Service   Service   Service   Service   Service   Service
             Level     Points    Level     Points    Level     Points    Points
2.6.2 – PR   100%      -5        100%      -5        100%      -5        -15
2.6.3 – QC   99.0%     -2        99.9%     -2        99.8%     -2        -6
2.6.4 – UR   1WD       0         1WD       0         1WD       0         0
Total                  -7                  -7                  -7        -21

The details of the above can be found in Section 2.2 of this report.


2.1.2 Service Failures

There were no SEV1 Service Failures during this Quarter. The SEV3 “Dirty COW” incident is discussed in Section 2.3.3. Details of planned maintenance sessions can be found in Section 2.3.2.

2.1.3 Service Credits

As the Total Service Points are negative (-21), no Service Credits apply in 16Q4.
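
The tally behind this conclusion can be reproduced with a short sketch. The monthly Service Points are copied from the table in Section 2.1.1. The function and variable names are illustrative, and the rule that Service Credits arise only when the quarterly total is positive is an assumption inferred from the sentence above, not a statement of the contract.

```python
# Monthly Service Points per metric, copied from the Section 2.1.1 table.
SERVICE_POINTS = {
    "2.6.2 PR": {"Oct 16": -5, "Nov 16": -5, "Dec 16": -5},
    "2.6.3 QC": {"Oct 16": -2, "Nov 16": -2, "Dec 16": -2},
    "2.6.4 UR": {"Oct 16": 0, "Nov 16": 0, "Dec 16": 0},
}


def monthly_totals(points):
    """Sum the Service Points of all metrics for each month."""
    totals = {}
    for per_month in points.values():
        for month, value in per_month.items():
            totals[month] = totals.get(month, 0) + value
    return totals


def quarterly_total(points):
    """Sum the Service Points of all metrics over the whole quarter."""
    return sum(monthly_totals(points).values())


def service_credits_apply(total_points):
    """Assumption: credits are only due when the quarterly total is positive."""
    return total_points > 0


if __name__ == "__main__":
    total = quarterly_total(SERVICE_POINTS)
    print(monthly_totals(SERVICE_POINTS))
    print(total, service_credits_apply(total))
```

Each month sums to -7 and the quarter to -21, so under the stated assumption no Service Credits are due.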

2.2 Detailed Service Level Breakdown

2.2.1 Phone Response (PR)

                            Oct 16   Nov 16   Dec 16   16Q4
Phone Calls Received        22 (6)   25 (5)   33 (5)   80 (16)
Answered within 2 Minutes   22       25       33       80
Service Level               100.0%   100.0%   100.0%   100.0%

The volume of telephone calls remained low in 16Q4. Of the total of 80 calls received above, only 16 were actual ARCHER user calls that either resulted in queries or answered user questions directly.

2.2.2 Query Closure (QC)

                         Oct 16   Nov 16   Dec 16   16Q4
Self-Service Admin       689      646      358      1693
Admin                    162      189      146      497
Technical                24       27       32       83
Total Queries            875      862      536      2273
Total Closed in 2 Days   862      860      532      2254
Service Level            99.5%    99.8%    99.3%    99.2%

The above table shows the queries closed by SP during the period. In addition to the Admin and Technical queries, the following Change Requests were resolved in 16Q4:

                  Oct 16   Nov 16   Dec 16   16Q4
Change Requests   2        4        1        7
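
The Service Level figures in Sections 2.2.1 and 2.2.2 are ratios of requests handled in time to requests received, which are then compared against the Service Threshold and Operating Service Level from Section 2.1. A minimal sketch of that check follows, using the 16Q4 totals from the tables above. The function names and the three status labels are illustrative assumptions, not terms from the contract.

```python
def service_level(handled_in_time, received):
    """Return the percentage of requests handled within the target time."""
    if received <= 0:
        raise ValueError("received must be positive")
    return 100.0 * handled_in_time / received


def assess(level, threshold, operating_level):
    """Compare a service level (in percent) with the two contractual bounds."""
    if level < threshold:
        return "below Service Threshold"
    if level < operating_level:
        return "below Operating Service Level"
    return "meets Operating Service Level"


# 16Q4 totals: (handled in time, received, threshold %, operating level %)
QUARTER = {
    "Phone Response": (80, 80, 85.0, 90.0),
    "Query Closure": (2254, 2273, 94.0, 97.0),
}

if __name__ == "__main__":
    for metric, (ok, total, threshold, operating) in QUARTER.items():
        level = service_level(ok, total)
        print(f"{metric}: {level:.1f}% ({assess(level, threshold, operating)})")
```

For the quarterly totals this gives 100.0% for Phone Response and 99.2% for Query Closure, in line with the 16Q4 column, and both sit above their Operating Service Levels.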


2.2.3 User Registration (UR)

                                      Oct 16   Nov 16   Dec 16   16Q4
No of Requests                        93       69       56       218
Closed in One Working Day             90       69       56       215
Average Closure Time (Hrs)            1.2      0.6      0.4      0.8
Average Closure Time (Working Days)   0.1      0.1      0.1      0.1
Service Level                         1WD      1WD      1WD      1WD

Note: In October, there were 3 user registration requests (out of the total of 93) that took between 1WD and 2WD to process due to the “Dirty COW” outage. However, the average time taken for user registrations during this month was still less than 2 hours. To avoid double counting, these requests are not included in the above metrics for “Admin and Technical” Query Closure.

2.3 Additional Metrics

2.3.1 Target Response Times

The following metrics are also defined in Schedule 2.2, but have no Service Points associated.

Target Response Times
1 During core time, an initial response to the user acknowledging receipt of the query
2 A Tracking Identifier within 5 minutes of receiving the query
3 During Core Time, 90% of incoming telephone calls should be answered personally (not by computer) within 2 minutes
4 During UK office hours, all non telephone communications shall be acknowledged within 1 Hour

1 – Initial Response
This is sent automatically when the user raises a query to the address helpdesk@archer.ac.uk. Users may choose not to receive such emails by mailing support@archer.ac.uk.

2 – Tracking Identifier
This is sent automatically when the user raises a query to the address helpdesk@archer.ac.uk. Users may choose not to receive such emails by mailing support@archer.ac.uk. The tracking identifier is set in the SAFE regardless of which option the user selects.

3 – Incoming Calls
These are covered in the previous section of the report. Service Points apply.

4 – Query Acknowledgement
Acknowledgment of the query is defined as when the Helpdesk assigns the new incoming query to the relevant Service Provider. This should happen within 1 working hour of the query arriving at the Helpdesk. The Helpdesk processed the following number of incoming queries during the Service Quarter:

                           Oct 16   Nov 16   Dec 16   16Q4
CRAY                       4        13       23       40
ARCHER_CSE                 116      126      27       269
ARCHER_SP                  1317     1186     746      3249
Total Queries Assigned     1437     1325     796      3558
Total Assigned in 1 Hour   1437     1325     796      3558
Service Level              100%     100%     100%     100%

The Service Desk assigns queries to all groups supporting the service i.e. SP, CSE and Cray. The above table includes queries handled by the other groups supporting the service as well as internally generated queries used to manage the operation of the service.

2.3.2 Maintenance

A change in the maintenance arrangements was agreed with the Authority during this quarter. There is now a single day each month (the fourth Wednesday) that is marked as a full maintenance session, with a maximum of 8 hours taken. There is an additional “at-risk” session that is scheduled for the second Wednesday of each month. This reduces the number of sessions taken, which then reduces user impact since the jobs running on the service have to be drained down once and not twice. It also eases the planning for training courses running on ARCHER.

Such Maintenance Periods are defined as “Permitted Maintenance” and recorded in the Maintenance Schedule. A 6-month forward plan of maintenance has been agreed with the Authority. Where possible, SP will perform maintenance on an ‘At-risk’ basis, thus maximising the Availability of the Service.

The following planned maintenance took place in the Service Quarter.

Date       Start   End    Duration         Type          Notes                       Reason
26/10/16   0800    1229   4 hrs 29 mins    Full Outage   EPSRC Approved 0800–2000    Patching for security vulnerability
16/11/16   0800    1835   10 hrs 35 mins   Full Outage   EPSRC Approved 0800–2000    Upgrade to CLE 5.2 UP04
23/11/16   0900    1534   6 hrs 34 mins    Full Outage   EPSRC Approved 0900–1700    Upgrade to CLE 5.2 UP04
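
The monthly arrangement described in this section (full session on the fourth Wednesday, at-risk session on the second Wednesday) is easy to turn into dates when planning around it. The sketch below uses only the Python standard library. The helper names `nth_weekday` and `maintenance_dates` are illustrative and not part of any ARCHER or SAFE tooling.

```python
import datetime

WEDNESDAY = 2  # datetime.date.weekday() numbering: Monday is 0


def nth_weekday(year, month, weekday, n):
    """Return the date of the n-th given weekday (Monday=0) in a month."""
    first = datetime.date(year, month, 1)
    # Days from the 1st of the month to the first occurrence of the weekday.
    offset = (weekday - first.weekday()) % 7
    day = 1 + offset + 7 * (n - 1)
    # datetime.date raises ValueError if the month has no such day.
    return datetime.date(year, month, day)


def maintenance_dates(year, month):
    """At-risk session on the second Wednesday, full session on the fourth."""
    return {
        "at-risk": nth_weekday(year, month, WEDNESDAY, 2),
        "full": nth_weekday(year, month, WEDNESDAY, 4),
    }


if __name__ == "__main__":
    for year, month in [(2016, 10), (2016, 11), (2017, 1)]:
        print(year, month, maintenance_dates(year, month))
```

Under this rule the full session for January 2017 falls on 25 January, which matches the date given for the final stage of the CLE upgrade in Section 1.2. The full sessions on 26 October and 23 November 2016 listed above also fall on the fourth Wednesday.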

2.3.3 “Dirty COW” Issue

To prevent unauthorised access to user data due to the recent worldwide Linux security vulnerability (https://en.wikipedia.org/wiki/Dirty_COW, http://www.bbc.co.uk/news/technology-37728010), user access to ARCHER was removed between 24 and 26 October 2016. Once a vendor fix was received to prevent the vulnerability, this was tested, applied and access to the service restored as quickly as possible.

2.3.4 Quality Tokens

No quality tokens have been received this quarter.


3. Service Statistics

This section contains statistics on the ARCHER service as requested by EPSRC, SAC and SMB.

3.1 Utilisation

Utilisation over the quarter was 94%. The plot below shows a steady increase in utilisation over the lifetime of the service to Dec 2015 and since then the service has effectively been operating at maximum capacity as shown by the steady utilisation value:

The utilisation by the Research Councils, relative to their respective allocations, is presented below. This bar chart shows the usage of ARCHER by the two Research Councils presented as a percentage of the total Research Council allocation on ARCHER. It can be seen that the EPSRC utilisation exceeded their 77% target this quarter and was 81%, whereas NERC utilisation was 17% with their target being 23%.


The cumulative allocation utilisation for the quarter by the Research Councils is shown below:

The cumulative allocation utilisation for the quarter by EPSRC broken down by different project types (see below) shows that the majority of usage comes from the scientific Consortia (as expected) with significant usage from research grants, ARCHER Leadership projects and ARCHER RAP projects. The times used by Instant Access projects, training projects and general service usage are very small.


3.2 Scheduling Coefficient Matrix

The colour in the matrix indicates the value of the Scheduling Coefficient. This is defined as the ratio of runtime to runtime plus wait time. Hence, a value of 1 (green) indicates that a job ran with no time waiting in the queue, a value of 0.5 (pale yellow) indicates a job queued for the same amount of time that it ran, and anything below 0.5 (orange to red) indicates that a job queued for longer than it ran.
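
The definition above translates directly into a one-line calculation. The following sketch is illustrative only: the function name and the choice of seconds as the unit are assumptions, and any consistent time unit works.

```python
def scheduling_coefficient(run_time, wait_time):
    """Ratio of run time to run time plus wait time (same units for both)."""
    if run_time < 0 or wait_time < 0:
        raise ValueError("times must be non-negative")
    total = run_time + wait_time
    if total == 0:
        raise ValueError("run_time and wait_time cannot both be zero")
    return run_time / total


if __name__ == "__main__":
    one_hour = 3600  # seconds
    print(scheduling_coefficient(one_hour, 0))             # no queueing
    print(scheduling_coefficient(one_hour, one_hour))      # queued as long as it ran
    print(scheduling_coefficient(one_hour, 2 * one_hour))  # queued longer than it ran
```

A job that never waits scores 1, a job that waits exactly as long as it runs scores 0.5, and a job that waits twice as long as it runs scores about 0.33, which would fall in the orange-to-red range of the matrix.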

3.3 Additional Usage Graphs

The following charts provide different views of the distribution of job sizes on ARCHER. The usage heatmap below provides an overview of the usage on ARCHER over the quarter for different job sizes/lengths. The colour in the heatmap indicates the number of kAU expended for each class, and the number in the box is the number of jobs of that class.


Analysis of Job Sizes

The first graph shows that, in terms of numbers, there are a significant number of jobs using no more than 256 cores. However, the second graph reveals that most of the kAUs were spent on jobs between 257 cores and 8192 cores. The number of kAUs used is closely related to money and shows better how the investment in the system is utilised.
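
The contrast between the two graphs comes from weighting: counting jobs treats a small job and a large job alike, while usage weights each job by the resource it consumed. A small sketch with made-up jobs illustrates the effect. The job list is hypothetical and not taken from ARCHER data, core-hours (cores multiplied by hours) stand in for kAUs because the conversion factor is not given in this report, and the three size classes are simply built from the 256 and 8192 core boundaries mentioned above.

```python
from collections import defaultdict

# Hypothetical jobs as (cores, run time in hours); not taken from ARCHER data.
JOBS = [
    (24, 0.5), (24, 1.0), (48, 2.0), (96, 0.25), (192, 1.0), (256, 3.0),
    (512, 12.0), (1024, 24.0), (4096, 6.0), (12288, 2.0),
]


def size_class(cores):
    """Assign a job to one of three illustrative core-count classes."""
    if cores <= 256:
        return "<=256"
    if cores <= 8192:
        return "257-8192"
    return ">8192"


def summarise(jobs):
    """Return (job count, core-hours) per size class."""
    counts = defaultdict(int)
    core_hours = defaultdict(float)
    for cores, hours in jobs:
        label = size_class(cores)
        counts[label] += 1
        core_hours[label] += cores * hours
    return dict(counts), dict(core_hours)


if __name__ == "__main__":
    counts, usage = summarise(JOBS)
    for label in ("<=256", "257-8192", ">8192"):
        print(f"{label:>9}: {counts.get(label, 0)} jobs, "
              f"{usage.get(label, 0.0):.0f} core-hours")
```

In this toy data set the smallest class holds most of the jobs but only a small share of the core-hours, which is the same pattern the two graphs show for the real workload.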


Analysis of Jobs Length

From the first graph, it would appear that the system is dominated by short jobs. However, the second graph shows that actual usage of the system is more spread and dominated by jobs of up to 27 hours with a second peak for jobs at 48-51 hours.


Core Hours per Job Analysis


Appendix – Infrastructure report

There is nothing to report regarding infrastructure work within this quarter.