Clouds and the modernized role of the Scientific Computer Center
University of Freiburg, eScience department – Dirk von Suchodoletz
16/03/2018 – “Externe Dienste in der DFN-Cloud” (“External Services in the DFN Cloud”)
Structure of the talk
Virtualization and clouds – paradigm shift for (university) computer centers
Open (and hidden) agenda for “doing cloud”
ViCE project and (large-scale) research infrastructures
Organizational challenges, cooperation
Technical challenges
OpenStack in operation
Sizing considerations
Costs and cost recovery aka “business model”
Challenges of CC operations
From the drivers of technology to the driven
Today’s challenges of (university) computer centers:
Very diverse scientific communities and a broad set of software and tool demands
Different, contradicting demands regarding software environments
Short-notice demands for hardware that then has to be used for at least five years (economic amortization of the equipment)
Personnel to operate all the (small, diverse) hardware servers is expensive
Need to save money and hardware resources – most resources are underutilized most of the time
Need to save rack space and energy – the computer center has a significant energy bill each year
Open and hidden agendas
Technological changes offer chances to rethink established practice
Typical open agenda – why to think about and implement a cloud strategy:
Standardize a standard business (servers of various flavours)
Fast start for new projects
Efficient use of resources (not necessarily of single servers, but projects are often shorter than the hardware write-down period; scale up and down with demand, …)
Recalibration, redistribution and splitting of tasks (machine management, monitoring, ...)
Complement established virtualization solutions (like ESX, Xen or similar)
Avoid vendor lock-in, multi-vendor strategy
Open and hidden agendas
Hidden agenda:
Re-calibration of the computer center’s own strategy
Regain control over the hardware infrastructure (server rooms, rack space, monitoring, energy, ...)
Offer (hardware) resources without actually having them
Rethink internal structures, prepare your personnel for future challenges
Re-establish the computer center as the entity to ask for IT resources
Push security, data protection policies etc.
Be cool again :-)
...
Organizational challenges
Challenging to develop a cloud strategy individually
Cooperation is key
Days of the full-service-stack computer centers are mostly over
Tradition of cooperative infrastructure projects in Baden-Wuerttemberg
State-sponsored, as with bwHPC: personnel + hardware
Support structural change
Research infrastructure: bwCloud
OpenStack-based, state-wide self-service cloud to cater for scientific use, computer center services and student use (science, operations, education)
State-wide cooperation of the university computer centers in Karlsruhe, Ulm, Mannheim and Freiburg
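To illustrate the self-service idea, here is a minimal sketch using the openstacksdk Python library; the cloud name, image, flavour and network names are placeholder assumptions, not values from the talk:

    import openstack

    # Connect with credentials from clouds.yaml; the cloud name is a placeholder.
    conn = openstack.connect(cloud="bwcloud")

    # Pick an image, flavour and network by name (all names are examples only).
    image = conn.compute.find_image("Ubuntu 16.04")
    flavor = conn.compute.find_flavor("m1.small")
    network = conn.network.find_network("public")

    # Launch a VM without any interaction with computer center staff.
    server = conn.compute.create_server(
        name="my-research-vm",
        image_id=image.id,
        flavor_id=flavor.id,
        networks=[{"uuid": network.id}],
    )
    server = conn.compute.wait_for_server(server)
    print(server.status)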
bwCloud Project(s)
CC UFR is part of the 2 + 3 cloud introduction and service creation project
4 partners in the first phase (till 2016)
Project management in Mannheim
Definition of the software stack to use, complementary to traditional virtualization tech (like ESX, Hyper-V, Xen)
4 + 1 E13 FTE in the first period, 5 + 1 FTE in the second
Keystone / bwServices app → Freiburg only
bwCloud sites: coordinated independence
Same OpenStack version, but each site is free to decide how to deploy the base system
Common ticket queue
Shared documents like the “Fair use” policy
A somewhat federally blown-up structure
bwCloud SCOPE
Single-socket AMD hardware procured last year (~0.9 million)
256 GByte RAM per compute node, 2× 10 GbE
Single vendor, same type of hardware
Being installed at the moment
Site        # Compute (Nova)   # Storage headnodes (Ceph)   # JBODs   JBOD capacity (summed up)
Freiburg    27                 4                            2         504 TB (gross)
Karlsruhe   27                 4                            2         504 TB (gross)
Mannheim    27                 4                            2         504 TB (gross)
Ulm         36                 4                            2         1,200 TB (gross)
Total       116                16                           8         2,016 TB (gross)
Users of the bwCloud
Rising number of users during the project period
Not all users necessarily active
[Chart: number of registered users and number of running instances per month over the project period]
Users of the bwCloud
Users matched to their educational institutions
Significant numbers at the main bwCloud sites
But: e.g. Furtwangen is quite active
Wide range of uses (but no statistics)
[Chart: number of registered projects (= users) per home institution]
bwCloud partitioning (=regions)
Two regions: Local (university internal), external customers/DMZ
Automatic assignment derived from user attributes (see the sketch below)
Local region:
IP addresses from the own university’s core network
Easier access to certain (internal) resources like mounts of the Isilon storage system etc.
External customers:
Ideally placed in a DMZ
Student VMs, external users from all over the state without privileged requirements
→ Main feature: state-wide self-service infrastructure
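A minimal sketch of how such an attribute-based region assignment could look; the attribute name and the hosting-site domain are illustrative assumptions, not the actual bwIDM configuration:

    # Illustrative sketch only: attribute name and site domain are assumptions.
    HOSTING_SITE = "uni-freiburg.de"   # the university operating this bwCloud site

    def assign_region(user_attributes: dict) -> str:
        """Return 'local' for users of the hosting university, 'external' otherwise."""
        home_org = user_attributes.get("schacHomeOrganization", "")
        if home_org == HOSTING_SITE:
            return "local"     # university-internal region: core-network IPs, internal mounts
        return "external"      # DMZ region: student VMs, users from all over the state

    # Example usage
    print(assign_region({"schacHomeOrganization": "uni-freiburg.de"}))  # -> local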
Technical challenges
OpenStack is a dynamic, non-trivial piece of software
Commercial support is available, but costly
You want to have the in-house expertise nevertheless
Proper sizing of the cloud is challenging (demand vs. actual usage, how to scale up, ...)
How to handle access to certain resources, e.g. relevant storage resources of projects
Assignment of IPv4 networks still very relevant
IPv6 desirable in the long run, but reachability is often unclear, e.g. from mobile telephony networks, local WLAN, etc.
Not every site has a proper/complete IPv6 connectivity/strategy yet
Technical challenges
Original setup – hyper-converged
Experience taught that too many Ceph nodes hurt
Difficult to take down single or multiple compute nodes if distributed Ceph is running on them
Caching could be centralized
Compute nodes are PXE-booted and dynamically configured
bwCloud network
Hardware: single/dual 10 GbE connection of each cloud host
Internal (management) network is in the 10.X.Y.Z range, coordinated with the other eScience infrastructures
Various models of IP assignment (per project, ...)
IPv4: a challenge to provide enough IPv4 addresses at the other sites
Reservation of 4,096 addresses in the 192.52.?? range (see the sketch below)
IPv6 (long run): IPv6 as an alternative, but reachability is unclear, e.g. from mobile telephony networks, local WLAN, etc.
Not every site has a proper/complete IPv6 connectivity/strategy yet
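A small worked example with Python's ipaddress module, just to make the numbers concrete: 4,096 IPv4 addresses correspond to a /20 prefix. The concrete prefix and the even split across the four sites are assumptions for illustration, not details from the talk:

    import ipaddress

    # 4,096 addresses correspond to a /20 prefix (2**(32-20) == 4096).
    # The prefix below is a placeholder for the actual reserved range.
    reserved = ipaddress.ip_network("192.52.0.0/20")
    print(reserved.num_addresses)   # 4096

    # One possible (hypothetical) split: a /22 (1,024 addresses) per site.
    sites = ["Freiburg", "Karlsruhe", "Mannheim", "Ulm"]
    for site, subnet in zip(sites, reserved.subnets(new_prefix=22)):
        print(site, subnet, subnet.num_addresses)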
Freiburg strategy
Unified high-level infrastructure strategy for the relevant compute-heavy research infrastructures
Sizing considerations
Load of the cloud hardware is difficult to predict
Demand is influenced by the available sizing, operation modes, flexibility
Difficult to distinguish (in)active users precisely
Have enough spare capacity for upcoming research projects
How to define flavours?
CPU core overbooking up to 1:16 (see the capacity sketch below)
RAM cannot be (safely) overbooked → 1:1.2
Storage should not be an issue; long-run performance not yet clear (HDD with SSD cache)
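To make the overbooking ratios concrete, a back-of-the-envelope capacity sketch in Python; the physical core count per node is an assumption (the talk only specifies single-socket AMD nodes with 256 GB RAM), the ratios are the ones stated above:

    # Back-of-the-envelope node capacity under the stated overcommit ratios.
    physical_cores = 32          # assumed cores per single-socket AMD node (not from the talk)
    ram_gb = 256                 # RAM per compute node (from the talk)

    cpu_overcommit = 16.0        # CPU cores can be overbooked up to 1:16
    ram_overcommit = 1.2         # RAM only up to 1:1.2

    schedulable_vcpus = physical_cores * cpu_overcommit
    schedulable_ram_gb = ram_gb * ram_overcommit
    print(f"vCPUs per node: {schedulable_vcpus:.0f}")       # 512
    print(f"RAM per node:   {schedulable_ram_gb:.0f} GB")   # ~307 GB

    # For a popular m1.small flavour (2 vCPUs, 2 GB RAM) the RAM limit,
    # not the CPU limit, caps a node at roughly 150 such VMs.
    by_cpu = schedulable_vcpus // 2      # 256 VMs by vCPU budget
    by_ram = schedulable_ram_gb // 2     # 153 VMs by RAM budget
    print("m1.small VMs per node:", int(min(by_cpu, by_ram)))

In OpenStack these ratios map to the Nova scheduler options cpu_allocation_ratio and ram_allocation_ratio.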
Sizing considerations
Lots of students from various universities (of applied sciences)
Small projects, courses, teaching
Larger requirements negotiated individually
Most popular flavours:

Flavour name   vCPUs   RAM (in MB)   “Popularity” (number of running VMs with this flavour)
M1.small       2        2048         1069
M1.medium      2        4096          889
M1.nano        1         512          825
M1.tiny        1        1024          514
M1.large       4        8192          398
M1.xlarge      8       16384          301
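As a hedged sketch, this is how such a flavour set could be created with the openstacksdk Python library; admin credentials, the cloud name and the disk size are assumptions, while the vCPU/RAM values are those from the table above:

    import openstack

    # Requires admin rights; cloud name and disk size are placeholders.
    conn = openstack.connect(cloud="bwcloud-admin")

    # vCPU and RAM values taken from the flavour table above.
    flavours = [
        ("m1.nano",   1,   512),
        ("m1.tiny",   1,  1024),
        ("m1.small",  2,  2048),
        ("m1.medium", 2,  4096),
        ("m1.large",  4,  8192),
        ("m1.xlarge", 8, 16384),
    ]

    for name, vcpus, ram_mb in flavours:
        conn.compute.create_flavor(name=name, vcpus=vcpus, ram=ram_mb, disk=20)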
Costs and cost recovery
Costs are a tricky topic in university (public service) computer centers
Services of the CC are usually provided free of charge
How to compete reasonably with commercial services?
Don’t be too shy: even if you (or your customers) understand mobile telephony plans, that does not mean you understand Amazon pricing
Funding institutions start to rephrase their requirements
Re-train reviewers
At the moment: mostly models where research groups bring project money in
Cost base / comparison
Variants of VMs: Tiny, Nano, ...
Challenges:
“Business model” for extended demands
Accounting and billing
Costs and cost recovery
Flavour name   vCPUs   RAM (in MB)
Nano           1         512
Tiny           1        1024
Small1         2        2048
Small2         2        4096
Medium1        4        8192
Medium2        4       16384
Large1         8       16384
Large2         8       32768
Large2         8       65536
Free of charge: special Shibboleth entitlement “bwCloud-Basic” in bwIDM
Potentially chargeable: entitlement “bwCloud-Extended” (see the sketch below)
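A minimal sketch of how the two entitlements could be evaluated at login; the exact entitlement strings and attribute handling in bwIDM are not given in the talk, so the values below are placeholders:

    # Illustrative only: entitlement strings are placeholders, not the real bwIDM values.
    BASIC = "bwCloud-Basic"        # free of charge
    EXTENDED = "bwCloud-Extended"  # potentially chargeable

    def service_class(entitlements):
        """Map Shibboleth entitlements delivered via bwIDM to a bwCloud service class."""
        if EXTENDED in entitlements:
            return "extended"   # larger flavours, individually negotiated, may be billed
        if BASIC in entitlements:
            return "basic"      # free self-service tier
        return "denied"         # no bwCloud entitlement at all

    print(service_class(["bwCloud-Basic"]))   # -> basic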
Preliminary conclusions
Re-socialize the “Gollum administrators” (my precious server/hardware/software/...)
Refocus on applications and users instead of on hardware and basic infrastructure
Permanent learning is key
Fast recovery instead of complex HA or redundancy
View servers as herds of cattle instead of unique pets – should be easily replaceable
Recentralized procurement of standard equipment
Contradict the myth of the unemployed administrator
There is still too much to do even after getting rid of all the hardware
Preliminary conclusions
Existing significant (research) infrastructures attract additional money
Research groups start to trust the computer center again
Entrusting precious project money and accepting that others might use the resource too (of course they profit from administration and additional services like procurement, expertise, ...)
Funny things will happen: ranging from *coin mining to Tor nodes, open DNS resolvers …
But: Don't blame the cloud for novel problems (you might have had anyway)
Long-term perspective still to be proven; the supporting project runs until the end of 2019
The team is key: HPC, Cloud and ViCE people
bwCloud (FR): Manuel Messner, Eric Rasche (de.NBI)
bwCloud project management (thanks for some slide content!): Janne Schulz (Mannheim)
bwHPC (FR): Bernd Wiebelt, Michael Janczyk
bwLehrpool / ViCE: J. Bauer, J. Vollmer, Ch. Hauser, S. Rettberg, Chr. Rössler
Project description: https://www.alwr-bw.de/kooperationen/vice
https://www.bw-cloud.org
Thank you / Questions!?