Content
• SWITCH and SWITCHengines at a glance: Konrad Jaggi, Jens-Christian Fischer (SWITCH)
• Use of SWITCHengines for Big Data: Anthony Strittmatter (University of St. Gallen)
• SCALE-UP Project: Patrik Schnellmann (SWITCH)
• How to get my Big Data cluster: Piyush Harsh (ZHAW ICClab)
• How to use SWITCHengines @HSG: Christian Lazur (University of St. Gallen)
• Discussion and Hands-on
How can I use SWITCHengines?
Order a NEW SWITCHengine@HSG:
Step 1: Sign up for Cloud Services at https://cloud-id.switch.ch/
Step 2: Write an e-mail to [email protected]. The e-mail must contain a valid billing address.
Step 3: You will receive an e-mail with a link to your SWITCHengines end-user portal. Click on it.
Step 4: Log in and start your engines.
Today: Only steps 3 and 4 are relevant ☺
Costs
Cost of a «running» SWITCHengine:
- CPU
- RAM
- Disk storage
- IP address
When your SWITCHengine is «shut down»:
- Disk storage
- IP address
Item pricing (current as of 22 Feb. 2017)
Item                  CHF/day   CHF/month   CHF/year
CPU core              0.4932    15.00       180.00
RAM (1 GB)            0.2466    7.50        90.00
Disk storage (1 GB)   0.0015    0.04583     0.55
SSD storage (1 GB)    0.0066    0.20        2.40
IPv4 address          0.0274    0.83        10.00
https://www.switch.ch/engines/
Bundle                                                       CHF/day   CHF/month   CHF/year
Small (1 CPU, 1 GB RAM, 20 GB disk, 1 IPv4)                  0.80      24.25       291.00
Medium (2 CPU, 2 GB RAM, 20 GB disk, 1 IPv4)                 1.54      46.75       561.00
Compute Intensive (8 CPU, 16 GB RAM, 50 GB disk, 1 IPv4)     7.99      243.13      2’917.50
High IOPS DB Server (4 CPU, 32 GB RAM, 200 GB SSD, 1 IPv4)   11.10     337.50      4’050.00
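As a cross-check of the two tables, the bundle prices can be recomputed from the per-item yearly rates. The sketch below is a minimal Python illustration under stated assumptions: the rate dictionary simply transcribes the item pricing table (Feb. 2017), the function name `yearly_cost` is our own and not part of any SWITCH tool, and we assume SSD storage is billed like disk storage while a VM is shut down.

```python
"""Recompute SWITCHengines bundle prices from the per-item yearly rates.

Rates are transcribed from the item pricing table (current as of 22 Feb. 2017);
they are illustrative and may no longer match SWITCH's current price list.
"""

# CHF per unit per year, from the item pricing table.
YEARLY_RATE_CHF = {
    "cpu_core": 180.00,
    "ram_gb": 90.00,
    "disk_gb": 0.55,
    "ssd_gb": 2.40,
    "ipv4": 10.00,
}

# Items still charged while a VM is shut down: disk storage and the IP address.
# Assumption: SSD storage is treated like disk storage here.
CHARGED_WHEN_SHUT_DOWN = {"disk_gb", "ssd_gb", "ipv4"}


def yearly_cost(resources: dict[str, float], running: bool = True) -> float:
    """Return the yearly cost in CHF for a VM with the given resources.

    `resources` maps item names from YEARLY_RATE_CHF to quantities.
    If `running` is False, only storage and the IP address are charged.
    """
    total = 0.0
    for item, quantity in resources.items():
        if running or item in CHARGED_WHEN_SHUT_DOWN:
            total += YEARLY_RATE_CHF[item] * quantity
    # Round to centimes to hide floating-point noise such as 11.000000000000002.
    return round(total, 2)


small = {"cpu_core": 1, "ram_gb": 1, "disk_gb": 20, "ipv4": 1}
medium = {"cpu_core": 2, "ram_gb": 2, "disk_gb": 20, "ipv4": 1}
compute = {"cpu_core": 8, "ram_gb": 16, "disk_gb": 50, "ipv4": 1}

if __name__ == "__main__":
    for name, vm in [("Small", small), ("Medium", medium), ("Compute Intensive", compute)]:
        print(f"{name}: {yearly_cost(vm):.2f} CHF/year running, "
              f"{yearly_cost(vm, running=False):.2f} CHF/year shut down")
```

Running, the Small, Medium and Compute Intensive bundles come out at 291.00, 561.00 and 2’917.50 CHF/year, matching the bundle table; shut down, the Small bundle still costs 21.00 CHF/year for its disk and IPv4 address.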
For Beginners…
1. Setting up a SWITCHengine (in German): http://www.intranet.unisg.ch/~/media/Internet/Content/Dateien/Intranet/Services_Richtlinien/IT/Services/Anleitung%20SWITCHengine%20Einrichten.pdf?fl=de
2. Tips and tricks for daily work (in German): http://www.intranet.unisg.ch/~/media/Internet/Content/Dateien/Intranet/Services_Richtlinien/IT/Services/Tipps%20und%20Tricks%20f%c3%bcr%20die%20t%c3%a4gliche%20Arbeit.pdf?fl=de
© 2017 SWITCH | 1
[email protected] [email protected]
University of St. Gallen, 27 April 2017
How can SWITCHengines boost your research?
Big Data and Faster Research
Agenda
• SWITCH
• SCALE-UP project
• SWITCHengines
• Use Cases
• Roadmap for next months
Foundation Purpose
Excerpt from the Deed of Foundation, Berne, 22 October 1987
“The foundation has as its objective to create, promote and offer the necessary basis for the effective use of modern methods of telecomputing in teaching and research in Switzerland, to be involved in and to support such methods. It is a non-profit foundation that does not pursue commercial aims.”
Integrated Offer
• Video Management
• Collaboration
• Procurement
• Infrastructure & Data Services
• Network
• Registry
• Trust & Identity
• Security
Infrastructure and Data Services
SWITCH made – Swiss made
• Swiss law and data location
• In accordance with the needs of, and controlled by, the institutions
• Flexible usage and charging model
• Simple administration; integrated into the academic network of SWITCH; security and identity services included
• Support for academic use cases
• Created together with you
The SCALE-UP project
Goals
• Create academic services on the cloud infrastructure
• User group in focus: researchers and lecturers
Duration
• August 2015 – December 2017
Project partners
• 8 project partners from universities
Funding
• Co-funded by the program “Scientific Information” of swissuniversities, with matching funds from the institutions
Topics
• Research in the Cloud
• Classroom in the Cloud
• Big Data Analytics
• Statistical Workbench
• Scientific Data Pools
• Collaborative Apps
• Container Technologies
• Reporting / Accounting / Billing
• VM Management Tools
• Virtual Private Cloud
• Marketplace
Big Data Analytics
Goal
• Learn how to use Big Data analysis tools
• Provision a ready-to-use Hadoop and Apache Spark cluster
Work Package Lead
• Piyush Harsh, ZHAW
Sandbox on SWITCHengines with Zeppelin, Spark and Jupyter: https://help.switch.ch/engines/documentation/switch-official-images/zeppelin/
Scientific Data Pools
Goal
• Store large quantities of data
• Manage your data sets and share them with other researchers
Work Package Lead
• Sofiane Sarni, EPFL
Examples
• 1000 Genomes, Common Crawl, Google Ngrams (see http://datasets.cloud.switch.ch)
Virtual Private Cloud
Goal
• Integrate SWITCHengines virtual machines into the campus network
• Easy access to systems behind the firewall
• Profit from scalability and redundancy
Work Package Lead
• Tom Schönenberger, FHS St.Gallen
• Currently in beta: we are looking for institutions that want to pilot this service during the second half of 2017.
SWITCHengines Status
By Joydeep (Own work) [CC BY-SA 3.0 (http://creativecommons.org/licenses/by-sa/3.0)], via Wikimedia Commons
Engines across Switzerland
Institutions shown on the map: UniGE, UNIL, UniBE, UZH, ETH Zurich, UniFR, ZHAW, EPFL, USI, UniBAS, UniSG, HSR, SUPSI, NTB, BFH, HSLU, FHSG, FHNW, PHSG, PSI, HES-SO, PHZH, FFHS
29+ core institutions, 10+ extended entities
Datacenters in Zurich and Lausanne
• At Swiss universities
• Fully integrated into the SWITCH network infrastructure
Current Status
• Long-term “native” cloud DevOps know-how (design and operations)
• SWITCHengines in production internally since 2014
• Public services since 2015/2016
• Several SWITCH services run on it
• Over 1000 individual users and around 100 research and education projects online
Expansion of Datacenters
• The Lausanne datacenter of SWITCHengines was expanded with 16 compute and 16 storage nodes, adding:
  – 768 vCPU cores, for a total of 1216 cores
  – 4 TB of RAM, for a total of 7.4 TB
  – 768 TB of raw storage, for a total of 1.5 PB
• The Zurich datacenter added another 16 compute nodes, adding:
  – 768 vCPU cores, for a total of 2248 cores
  – 4 TB of RAM, for a total of 12.6 TB
• Plus planning for 3 PB in Q2 2017
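The expansion figures can be broken down per node and summed across both datacenters. The short Python check below uses only the numbers on the slide; two assumptions are ours: the added capacity is spread evenly over the 16 new nodes per site, and 1 TB is counted as 1024 GB for the per-node RAM figure.

```python
"""Back-of-the-envelope breakdown of the 2017 SWITCHengines expansion.

All inputs are taken from the "Expansion of Datacenters" slide. Assumption:
the added capacity is spread evenly across the 16 new nodes at each site.
"""

NEW_NODES_PER_SITE = 16

# vCPU cores and RAM (TB): capacity added and resulting total per site.
sites = {
    "Lausanne": {"vcpu_added": 768, "vcpu_total": 1216, "ram_tb_added": 4, "ram_tb_total": 7.4},
    "Zurich": {"vcpu_added": 768, "vcpu_total": 2248, "ram_tb_added": 4, "ram_tb_total": 12.6},
}
STORAGE_TB_ADDED_LAUSANNE = 768  # raw storage added with 16 storage nodes

# Per-node figures for the new hardware.
vcpu_per_new_node = 768 // NEW_NODES_PER_SITE  # vCPU cores per compute node
ram_gb_per_new_node = 4 * 1024 // NEW_NODES_PER_SITE  # assumes 1 TB = 1024 GB
storage_tb_per_new_node = STORAGE_TB_ADDED_LAUSANNE // NEW_NODES_PER_SITE  # raw TB per storage node

# Capacity each site had before the expansion.
vcpu_before = {name: s["vcpu_total"] - s["vcpu_added"] for name, s in sites.items()}
ram_tb_before = {name: round(s["ram_tb_total"] - s["ram_tb_added"], 1) for name, s in sites.items()}

# Combined capacity of both sites after the expansion.
vcpu_combined = sum(s["vcpu_total"] for s in sites.values())
ram_tb_combined = round(sum(s["ram_tb_total"] for s in sites.values()), 1)

if __name__ == "__main__":
    print(f"Per new compute node: {vcpu_per_new_node} vCPU, {ram_gb_per_new_node} GB RAM")
    print(f"Per new storage node: {storage_tb_per_new_node} TB raw")
    print(f"Before expansion: {vcpu_before} vCPU, {ram_tb_before} TB RAM")
    print(f"After expansion (both sites): {vcpu_combined} vCPU, {ram_tb_combined} TB RAM")
```

Under these assumptions each new compute node contributes 48 vCPU cores and about 256 GB of RAM, each new storage node 48 TB of raw storage, and the two sites together reach 3464 vCPU cores and 20 TB of RAM.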
Use Cases: Projects & National Services
• 5 “Scientific Information” projects by swissuniversities:
  – DLCM: Data Lifecycle Management
  – GeoData 4 Swiss Edu: GIS
  – HEPIA: Simulation
  – NEICH: High Energy Physics
  – SCITAS: eScience
• Various SWITCH internal services:
  – SWITCHdrive (52 TB data, 22,000 users)
  – SWITCHfilesender
  – SWITCHtube…
Use Case: Machine Learning
“We require a secure IT environment which enables us to compute the estimation results in a reasonable amount of time. A big advantage is that the data remains in Switzerland.”
– Anthony Strittmatter, Assistant Professor for Econometrics
Use Case: Geodata
“In our research we ask ourselves how to organize the spatial pattern to ensure, e.g., a quality of service public in the future … (by using) SWITCHengines, we can run our systems optimally and have enough power and performance.”
– Dirk Engelke, Professor of Spatial Development at HSR
Use Case: Managing Big Datasets
“I could imagine going for SWITCHengines … so we would not have to care about hardware maintenance and still get the compute power when we need it. Furthermore, I could profit from the scalability: with a higher number of machines I get the results for my research much faster!”
– Andri Lareida, Institute for Informatics, University of Zurich
Engineering (2016/2017)
• Local SSD VMs
• OpenStack upgrades: Kilo / Liberty / Mitaka
• Expansion Zurich/Lausanne (total 48 compute nodes, 32 storage nodes)
• Control plane split
• IPv6 for private network & VMs
• Swift object store
• Network performance improvements
• Sending bills
• Storage improvements: snapshot creation/deletion
• 17 patches contributed upstream
• Monitoring / Grafana
• Centralized logging (ELK)
• FIWARE: cloud as a service
• Network redundancy
• Ceph storage software upgraded from Hammer to Jewel
Local SSD storage / Containers
VMs with local SSD storage
• For high-IOPS tasks, you can use VMs that have local SSD storage (instead of shared Ceph storage)
• Size up to 400 GB
Containers
• For many projects, the question is less one of configuration than of how many machines are needed
• Generally, our users envision Docker containers (which then start subcontainers from the application)
Reporting / Administration
Reporting
• Reporting to customers about usage and costs of VMs and storage
Administration
• In 2017, we will be able to delegate admin tasks to the institutions (projects / subprojects / roles / groups / quota management)
Swiss Institute for Empirical Economic Research (SEW)
Prof. Christina Felfe
Dr. Alex Krumer
Prof. Michael Lechner
Michael Knaus
Carina Steckenleiter
Daniel Goller
Gabriel Okasa
Michael Zimmert
Examples of Research Questions
1. ALMP (active labour market policy) evaluations (rich social security data)
   – Who benefits most from participation?
   – Selection of sequential training courses?
2. Industrial organisation (unstructured data scraped from the internet)
   – Are the trade reactions to the Volkswagen emissions scandal coherent with classical models of adverse selection?
3. Sports economics (minute-by-minute soccer data)
   – Prediction of Bundesliga outcomes (www.sew.unisg.ch/soccer_analytics)
Experience with SWITCHengines
• 32 GB RAM, 4 cores, Windows operating system
• Remote desktop connection
• Installation of the statistical software
• SWITCHengines is responsible for maintenance
• SWITCHfilesender & SWITCHdrive
• In our experience, the connection to the virtual machine is stable and fast.
• After use, the instance can be paused to save costs, and it can be relaunched within 5 minutes.
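The saving from pausing can be estimated from the item prices, since a shut-down VM is billed only for disk storage and its IP address. The sketch below is an assumption-laden illustration, not a SWITCH billing tool: it uses the Feb. 2017 yearly rates, a VM with 4 cores and 32 GB RAM as described above plus a hypothetical 100 GB standard disk, and it assumes charges accrue pro rata per day. The function name `cost_with_pausing` is our own.

```python
"""Estimate the yearly cost of a VM that is paused when not in use.

Rates are the SWITCHengines per-item prices as of 22 Feb. 2017 (CHF/year).
The VM size is a hypothetical example, not the actual SEW configuration,
and charges are assumed to accrue pro rata per day.
"""

DAYS_PER_YEAR = 365

# CHF per unit per year.
RATE_CPU_CORE = 180.00
RATE_RAM_GB = 90.00
RATE_DISK_GB = 0.55
RATE_IPV4 = 10.00


def cost_with_pausing(cores: int, ram_gb: int, disk_gb: int, running_days: int) -> float:
    """Return CHF per year when the VM runs `running_days` days and is shut down otherwise.

    CPU and RAM are billed only while running; disk and the IPv4 address are
    billed for the whole year because they are also charged while shut down.
    """
    if not 0 <= running_days <= DAYS_PER_YEAR:
        raise ValueError("running_days must be between 0 and 365")
    compute_per_year = cores * RATE_CPU_CORE + ram_gb * RATE_RAM_GB
    always_billed = disk_gb * RATE_DISK_GB + RATE_IPV4
    return round(compute_per_year * running_days / DAYS_PER_YEAR + always_billed, 2)


if __name__ == "__main__":
    for days in (365, 73, 30):
        print(f"{days:3d} running days: {cost_with_pausing(4, 32, 100, days):8.2f} CHF/year")
```

With these assumptions, running the VM all year costs 3,665 CHF, while running it 73 days (about 20% of the year) and keeping it shut down otherwise costs 785 CHF.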
Possible Current Disadvantages (of SWITCHengines and Cloud Computing in General)
1. Very expensive (approx. 4,000 CHF/year)
   – Much computing power for a short time period
   – Flexible working environment
   – No maintenance
2. Scalability of instance?
   – Automatic shutdown when CPU is not used?
   – Automatic (re-)scaling of RAM and cores?
3. Data security?
   – Security leak when remote connection is used?
   – Is SWITCHengines ‘safe enough’ to store social-security data?
Computational Social Science Workshop
• Jointly with the University of Konstanz
• Next date: 25 September
• http://bigdata.unisg.ch
Thank you for your attention! [email protected] www.anthonystrittmatter.com
What is DISCO?
Cluster administration is hard!
● Hadoop + Spark have hundreds of configuration parameters
● It takes time to master them!
DISCO provides a one-stop platform for provisioning Big Data clusters on demand:
➔ Create clusters on demand and size the resources to your needs
➔ Use the cluster, analyze data, evaluate results
➔ Once done, delete it and minimize your costs!
➔ Optimized defaults based on Hortonworks, Cloudera and Amazon implementations
Ideally suited for:
★ Students, researchers, lecturers
★ Transfer projects, consultancies, etc.
★ Analytics and ML labs: clusters shared with teams of students
What is DISCO?
★ Cloud orchestration framework for deploying distributed computing clusters on SWITCHengines
★ Automatic one-click provisioning of Big Data processing frameworks
★ Dashboard tailored to professor/student interaction
  ○ But you can generalize the roles: admin, user groups!
★ Easily extensible if new components are needed
★ A privileged end user (professor) can select from a list what is to be installed and make the cluster available to a group of students
Frameworks
● Currently, the following Big Data processing frameworks can be provisioned automatically via DISCO:
… but new components can be added in just minutes
Example Use Case: ZHAW (Prof. Kurt Stockinger)
DISCO is being used to provision and manage the data-analytics clusters for lab exercises.
● Professor: Dr. Kurt Stockinger
● Course: Information Engineering 2
● Number of students enrolled: 35
● Tools used in the lab: Apache Spark + Hadoop (DataFrames, RDD, machine learning, HDFS)
Demo
● Walk through the professor’s interface
● Walk through the student’s interface
● Key entities in DISCO:
  ○ Infrastructure
  ○ Clusters
  ○ Groups
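To make the roles and key entities concrete, here is a minimal, hypothetical Python model of how a privileged user (professor) might create a cluster with selected components on an infrastructure and share it with a student group. The class names, fields and the `share_with`/`can_access` methods are our own illustration; they are not DISCO's actual data model or API (see the DISCO wiki for that).

```python
"""Hypothetical model of DISCO's key entities: infrastructure, clusters, groups.

Illustrates the professor/student workflow described on the slides; this is
not DISCO's real schema or API.
"""
from dataclasses import dataclass, field


@dataclass
class Infrastructure:
    """A cloud backend that clusters are deployed on (e.g. a SWITCHengines project)."""
    name: str


@dataclass
class Group:
    """A set of users, e.g. the students enrolled in one course."""
    name: str
    members: set[str] = field(default_factory=set)


@dataclass
class Cluster:
    name: str
    owner: str                       # privileged user who created the cluster
    infrastructure: Infrastructure
    components: list[str]            # frameworks selected from the offered list
    shared_with: list[Group] = field(default_factory=list)

    def share_with(self, group: Group) -> None:
        """Make the cluster available to a group (sharing twice has no effect)."""
        if group not in self.shared_with:
            self.shared_with.append(group)

    def can_access(self, user: str) -> bool:
        """The owner and members of any shared group may use the cluster."""
        return user == self.owner or any(user in g.members for g in self.shared_with)


if __name__ == "__main__":
    infra = Infrastructure("switchengines-zh")  # hypothetical name
    students = Group("info-eng-2", {"alice", "bob"})
    lab = Cluster("lab-cluster", "prof", infra, ["Hadoop", "Spark"])
    lab.share_with(students)
    print(lab.can_access("alice"), lab.can_access("mallory"))  # True False
```

Keeping groups separate from clusters mirrors the generalization mentioned on the slides: the same sharing check works whether the roles are professor and students or admin and user groups.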
Further information
● https://icclab.github.io/disco/
● Wiki: https://github.com/icclab/disco/wiki