
Page 1:

Clouds and the modernized role of the Scientific Computer Center

University of Freiburg, eScience dept. Dirk von Suchodoletz

16/03/2018 – “Externe Dienste in der DFN-Cloud”

Page 2:

Structure of the talk

Virtualization and clouds – paradigm shift for (university) computer centers

Open (and hidden) agenda for “doing cloud”

ViCE project and (large scale) research infrastructures

Organizational challenges, cooperation

Technical challenges

OpenStack in operation

Sizing considerations

Costs and cost recovery aka “business model”

Page 3:

Challenges of CC operations

From the drivers of technology to the driven

Today's challenges of (university) computer centers:

Very diverse scientific communities with a broad set of software and tool demands

Different, contradicting demands regarding software environments

Short-notice demands for hardware that then has to be used for at least five years (economic amortization of equipment)

Personnel to operate all the (small, diverse) hardware servers is expensive

Save money and hardware resources – most resources are underutilized most of the time

Save rack space and energy – the computer center has a significant energy bill each year

Page 4:

Open and hidden agendas

Technological changes offer chances to rethink things

Typical open agenda – why to think about and implement a cloud strategy:

Standardize a standard business (servers of various flavours)

Fast start for new projects

Efficient use of resources (not necessarily single servers, but projects are often shorter than the hardware write-down period; scale up and down with demand, …)

Recalibration, redistribution, splitting of tasks (regarding machine management, monitoring, ...)

Complement established virtualization solutions (like ESX, Xen or the like)

Avoid vendor lock-in, multi-vendor strategy

Page 5:

Open and hidden agendas

Hidden agenda:

Re-calibration of one's own strategy

Regain control over hardware infrastructure (server rooms, rack space, monitoring, energy, ...)

Offer (hardware) resources without actually having them

Rethink internal structures, prepare your personnel for future challenges

Re-establish the computer center as the entity to ask for IT resources

Push security, data protection policies etc.

Be cool again :-)

...

Page 6:

Organizational challenges

Challenging to develop a cloud strategy individually

Cooperation is key

Days of the full-service-stack computer centers are mostly over

Tradition of cooperative infrastructure projects in Baden-Wuerttemberg

As bwHPC state-sponsored: personnel + hardware

Support structural change

Page 7:

Research infrastructure: bwCloud

OpenStack-based, state-wide self-service cloud catering for science, computer center services and student use (science, operations, education)

State-wide cooperation of the university computer centers in Karlsruhe, Ulm, Mannheim and Freiburg

Page 8:

bwCloud Project(s)

CC UFR part of the 2 + 3 cloud introduction and service creation project

4 partners in the first phase (till 2016)

Project management in Mannheim

Definition of the software stack to use, complementary to traditional virtualization tech (like ESX, HyperV, XEN)

4 + 1 E13 FTE in the first, 5 + 1 FTE in the second period

Keystone / bwServices app → Freiburg only

bwCloud sites: coordinated independence

Same OpenStack version, but each site is free in how it deploys the base system

Common ticket queue

Shared documents like “Fair use”

A bit of a federally blown-up structure

Page 9:

bwCloud SCOPE

Single-socket AMD hardware procured last year (~0.9 million)

256 GByte RAM per compute node, 2× 10 GbE

Single vendor, same type of hardware

Getting installed at the moment

Site        # Compute (Nova)   # Storage headnodes (Ceph)   # JBODs   JBOD capacity (gross, summed up)
Freiburg    27                 4                            2         504 TB
Karlsruhe   27                 4                            2         504 TB
Mannheim    27                 4                            2         504 TB
Ulm         36                 4                            2         1,200 TB
Total       116                16                           8         2,016 TB

Page 10:

bwCloud SCOPE

Page 11:

Users of the bwCloud

Rising number of users during the project period

Not all users necessarily active

[Chart: number of registered users (“Anzahl reg. Nutzer”) and number of running instances (“Anzahl laufende Instanzen”) over the project period, both rising to roughly 1,000]

Page 12:

Users of the bwCloud

Users matched to educational institutions

Significant numbers at the main bwCloud sites

But: e.g. Furtwangen is quite active

Wide range of uses (but no statistics)

[Chart: number of registered projects (= users, “Anzahl reg. Projekte (= Nutzer_innen)”) per institution, ranging from 1 to 175]

Page 13:

bwCloud partitioning (=regions)

Two regions: Local (university internal), external customers/DMZ

Automatic assignment derived from user attributes

Local region:
  Allow IP addresses of the own university's core network
  Allow easier access to certain (internal) resources like mounts of the Isilon storage system etc.

External customers:
  Ideally placed in a DMZ
  Student VMs, external users from all over the state without privileged requirements

→ Main feature: State-wide Self-service infrastructure
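The region split is visible to users mainly when they point their client at one of the regions. A minimal, hedged sketch with the openstacksdk – the cloud and region names below are placeholders for illustration, not the actual bwCloud configuration:

```python
# Hypothetical sketch: connect to one specific region of a multi-region cloud
# with the openstacksdk. "bwcloud" and "local" are placeholder names.
import openstack

# Credentials are taken from clouds.yaml or OS_* environment variables.
conn = openstack.connect(cloud="bwcloud", region_name="local")

# All further calls go to that region only, e.g. listing the available images.
for image in conn.image.images():
    print(image.name)
```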

Page 14:

Technical challenges

OpenStack is a dynamic, non-trivial piece of software

Commercial support is available, but costly

You want to have the expertise in-house nevertheless

Proper sizing of the cloud is challenging (demand vs. actual usage, how to scale up, ...)

How to handle access to certain resources, e.g. the relevant storage resources of projects

Assignment of IPv4 networks still very relevant

IPv6 desirable in the long run, but:
  Reachability often unclear, e.g. from mobile telephony networks, local WLAN, etc.
  Not every site has a proper/complete IPv6 connectivity/strategy yet

Page 15:

Technical challenges

Original setup – hyper-converged

Experience taught that too many Ceph nodes hurt

Difficult to bring down single or multiple compute nodes if a distributed Ceph is running on them

Caching could be centralized

Compute nodes are PXE-booted and dynamically configured

Page 16:

bwCloud network

Hardware: single/dual 10 GbE connection of each cloud host

Internal (management) network is in the 10.X.Y.Z range, coordinated with the other eScience infrastructures

Various models of IP assignment (per project, ...)

IPv4:
  Challenge to provide enough IPv4 addresses at the other sites
  Reservation for 4,096 addresses in the 192.52.?? range (see the Neutron sketch below)

IPv6 (long run):
  IPv6 as an alternative, but
  Reachability unclear, e.g. from mobile telephony networks, local WLAN, etc.
  Not every site has a proper/complete IPv6 connectivity/strategy yet
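For illustration, a hedged sketch of how a routable IPv4 provider network could be registered in Neutron using the openstacksdk. The network name, the physical network label and the CIDR (an IPv4 documentation range) are placeholders, not the actual bwCloud allocation:

```python
# Hypothetical sketch: register an external (provider) network plus an IPv4
# subnet in Neutron via the openstacksdk. Requires admin rights; all names
# and the CIDR are placeholders.
import openstack

conn = openstack.connect(cloud="bwcloud")

net = conn.network.create_network(
    name="public-external",
    is_router_external=True,           # marks the network as external
    provider_network_type="flat",
    provider_physical_network="extnet",
)

subnet = conn.network.create_subnet(
    name="public-external-v4",
    network_id=net.id,
    ip_version=4,
    cidr="192.0.2.0/24",               # TEST-NET-1, stand-in for the real range
    gateway_ip="192.0.2.1",
)

print(subnet.allocation_pools)
```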

Page 17:

Freiburg strategy

Unified high level infrastructure strategy for the relevant compute-heavy research infrastructures

Page 18:

Sizing considerations

Load of the cloud hardware is difficult to predict

Demand is influenced by the available sizing, operation modes, flexibility

Difficult to distinguish (in)active users precisely

Have enough spare capacity for upcoming research projects

How to define flavours? (see the back-of-the-envelope sketch below)
  CPU core overbooking up to 1:16
  RAM cannot be (safely) overbooked → 1:1.2
  Storage should not be an issue; long-run performance not yet clear (HDD with SSD cache)
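To make the overbooking ratios concrete, a back-of-the-envelope calculation for a single site. The physical core count per node is an assumption – the slides only state single-socket AMD nodes with 256 GB RAM:

```python
# Rough sizing sketch for one bwCloud site using the overbooking ratios above.
# In Nova these ratios correspond to the cpu_allocation_ratio and
# ram_allocation_ratio options of the compute nodes.
NODES = 27               # compute nodes at one site (see hardware table)
CORES_PER_NODE = 32      # assumption, not stated in the slides
RAM_PER_NODE_GB = 256    # from the hardware slide

CPU_RATIO = 16.0         # vCPU overbooking up to 1:16
RAM_RATIO = 1.2          # RAM overbooking 1:1.2

vcpus = NODES * CORES_PER_NODE * CPU_RATIO
vram_gb = NODES * RAM_PER_NODE_GB * RAM_RATIO

print(f"schedulable vCPUs: {vcpus:,.0f}")       # 13,824 with these assumptions
print(f"schedulable RAM:   {vram_gb:,.0f} GB")  # ~8,294 GB
```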

Page 19:

Sizing considerations

Lots of students from various universities (of applied sciences)

Small projects, courses, teaching

Larger requirements negotiated individually

Flavour name   # vCPUs   RAM (MB)   “Popularity” (# running VMs of that flavour)
M1.small       2         2048       1069
M1.medium      2         4096       889
M1.nano        1         512        825
M1.tiny        1         1024       514
M1.large       4         8192       398
M1.xlarge      8         16384      301

Most popular flavours
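A minimal sketch of how flavours such as the ones above could be defined through the openstacksdk (admin rights required). The root disk sizes are assumptions, since the slides only list vCPUs and RAM:

```python
# Hypothetical sketch: create the most popular flavours via the openstacksdk.
# Disk sizes are placeholders; vCPU and RAM values follow the table above.
import openstack

conn = openstack.connect(cloud="bwcloud")

FLAVOURS = [
    # (name, vcpus, ram in MB, root disk in GB – disk values assumed)
    ("m1.nano",   1,   512, 10),
    ("m1.tiny",   1,  1024, 10),
    ("m1.small",  2,  2048, 20),
    ("m1.medium", 2,  4096, 20),
    ("m1.large",  4,  8192, 40),
    ("m1.xlarge", 8, 16384, 80),
]

for name, vcpus, ram, disk in FLAVOURS:
    conn.compute.create_flavor(name=name, vcpus=vcpus, ram=ram, disk=disk)
```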

Page 20:

Costs and cost recovery

Costs are a tricky topic in university (public-service) computer centers

Services of the CC are usually provided free of charge

How to compete reasonably with commercial services?

Don't be too shy: even if you (or your customers) understand mobile telephony plans, that does not mean you understand Amazon pricing

Funding institutions start to rephrase requirements

Re-train reviewers

At the moment: mostly models where research groups bring project money in

Page 21:

Cost base / comparison

Variants of VMs: Tiny, Nano, ...

Challenges

“Business model” for extended demands

Accounting and billing

Page 22:

Costs and cost recovery

Flavour name   # vCPUs   RAM (MB)
Nano           1         512
Tiny           1         1024
Small1         2         2048
Small2         2         4096
Medium1        4         8192
Medium2        4         16384
Large1         8         16384
Large2         8         32768
Large2         8         65536

Free of charge: special Shibboleth entitlement “bwCloud-Basic” in bwIDM

Potentially chargeable: entitlement “bwCloud-Extended”
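How such an entitlement ends up controlling access depends on the site setup. As one hedged illustration (not necessarily how bwIDM/bwCloud implement it), a Keystone federation mapping could place users carrying the “bwCloud-Basic” entitlement into a local group; group, domain and attribute values below are placeholders:

```python
# Hypothetical sketch: a Keystone federation mapping rule that maps a
# Shibboleth entitlement to a local group. All names are placeholders.
import json

rules = [
    {
        "local": [
            {"user": {"name": "{0}"}},   # {0} = first remote attribute below
            {"group": {"name": "bwcloud-basic", "domain": {"name": "Default"}}},
        ],
        "remote": [
            {"type": "REMOTE_USER"},
            {"type": "eduPersonEntitlement", "any_one_of": ["bwCloud-Basic"]},
        ],
    }
]

# Written to a file, the rules could be registered with:
#   openstack mapping create --rules rules.json bwcloud-basic-mapping
with open("rules.json", "w") as fh:
    json.dump(rules, fh, indent=2)
```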

Page 23:

Preliminary conclusions

Re-socialize the “Gollum administrators” (my precious server/hardware/software/...)

Refocus on applications and users instead of hardware and basic infrastructure

Permanent learning is key

Fast recovery instead of complex HA or redundancy

View servers as herds of cattle instead of unique pets – should be easily replaceable

Recentralized procurement of standard equipment

Contradict the myth of the unemployed administrator

There is still too much to do even after getting rid of all the hardware

Page 24:

Preliminary conclusions

Existing significant (research) infrastructures attract additional money

Research groups start to trust the computer center again

Entrusting precious project money and accepting that others might use the resource too (of course they profit from administration and additional services like procurement, expertise, ...)

Funny things will happen: ranging from *coin mining to Tor nodes, open DNS resolvers, …

But: don't blame the cloud for novel problems (which you might have had anyway)

Long term perspective to be proven, supporting project runs till the end of 2019

Page 25:

The team is key: HPC, Cloud and ViCE people

bwCloud (FR): Manuel Messner, Eric Rasche (de.NBI)

bwCloud project management (thanks for some slide content!): Janne Schulz (Mannheim)

bwHPC (FR): Bernd Wiebelt, Michael Janczyk

bwLehrpool / ViCE: J. Bauer, J. Vollmer, Ch. Hauser, S. Rettberg, Chr. Rössler

Project description: https://www.alwr-bw.de/kooperationen/vice

https://www.bw-cloud.org

Thank you / Questions!?