Zoe: Swarming Spark applications
Daniele Venzano
Research Engineer, EURECOM
2
My background
Software engineering (2010)
• Linux embedded systems, kernel drivers, graphical interfaces
Research (2012)
• Code analysis, OpenFlow, automatic bug detection
More research (now)
• Virtualization, networking, distributed systems performance
3
DSG and Eurecom
Research center on the French Riviera
Like this?
4
DSG and Eurecom
Research center on the French Riviera
Or more like this?
5
DSG and Eurecom
Engineering research center
• Academic research in telecommunication, multimedia, networks and security
• Close ties with local and international companies
Distributed Systems Group
• Focusing on data-intensive applications (so-called “big data”) at all levels
• Performance impact of virtualization, storage and network technologies (that’s me!)
• Data processing frameworks (Hadoop, Spark)
• Machine learning algorithms
6
Docker at the Distributed Systems Group
Started investigating Docker in 2012
• Virtualization platform for Big Data research
Summer 2015
• Built Swarm cluster
• Planning to shift from VMs to containers for most use cases
Bigfoot project
7
Use cases
Internally at Eurecom:
• Laboratory sessions for Data Science course
  • ~100 students, fixed configuration, throw-away environments
• Academic research
  • Very dynamic loads, all kinds of software combinations, higher priorities near deadlines
Companies have similar use cases
• Production jobs
  • Fixed configuration, periodic executions
• Research teams
Smart airports
Power load forecasting
Customer location forecasting
8
The last 3 years: OpenStack + Sahara
Public/private cloud with VM-based virtualization
We contributed Spark support to Sahara
Users can create clusters on-demand
Assumes infinite resources
Slow
• Create an HDFS+Spark cluster: 5 to 10 minutes
• Swarm takes a few seconds for the same task
Supporting new services/versions requires code changes
Users make static allocations
9
Why build on top of Docker and Swarm?
Swarm has a simple, documented API
• Start solving our problem immediately
Packaging software is very easy
• Freedom to experiment
Fast deployments
• No static allocation, automatic resizing
Swarm does only one thing and does it well
10
Zoe
Application scheduler on top of Swarm
• Queues requests when resources are scarce
Users can submit their own applications
• And create their own container images!
Dynamically resizes active applications
• Frees unused resources to speed up other apps
Can coexist with other Swarm users
MSC Zoe
Launch: August 2015
Tonnage: 197,362 t
Capacity: 19,224 TEU
Length: 395.4 m
Engine: 83,800 HP
Crew: 22
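The idea of queueing requests when resources are scarce can be sketched in a few lines of Python. This is only an illustration, not Zoe's actual code: the `Request` and `AdmissionQueue` names, and the use of memory in GB as the only resource, are assumptions made for the example.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class Request:
    name: str
    memory_gb: int  # hypothetical resource requirement, memory only


class AdmissionQueue:
    """FIFO queue that starts requests only while capacity remains."""

    def __init__(self, capacity_gb: int) -> None:
        self.free_gb = capacity_gb
        self.pending = deque()  # requests waiting for resources
        self.running = {}       # name -> Request currently holding resources

    def submit(self, request: Request) -> None:
        self.pending.append(request)
        self._start_what_fits()

    def finish(self, name: str) -> None:
        request = self.running.pop(name)
        self.free_gb += request.memory_gb
        self._start_what_fits()

    def _start_what_fits(self) -> None:
        # Strict FIFO: stop at the first request that does not fit.
        while self.pending and self.pending[0].memory_gb <= self.free_gb:
            request = self.pending.popleft()
            self.free_gb -= request.memory_gb
            self.running[request.name] = request


queue = AdmissionQueue(capacity_gb=16)
queue.submit(Request("spark-notebook", 8))
queue.submit(Request("spark-batch", 12))  # queued: only 8 GB are free
queue.finish("spark-notebook")            # frees 8 GB, so spark-batch starts
```

A strict FIFO queue keeps the behavior predictable, at the cost of letting a large request at the head block smaller ones behind it.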
11
What is a Zoe application?
12
Zoe architecture
[Diagram: Zoe scheduler and Swarm]
• Users submit application descriptions
• Zoe schedules requests
• Images from private registry or Docker Hub
• Monitoring data
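To give a feel for what an application description might contain, here is a hypothetical one written as a Python dictionary. The field names, image names and sizes are assumptions made for illustration, not Zoe's actual schema.

```python
# Hypothetical description format, NOT Zoe's real schema.
app_description = {
    "name": "spark-notebook",
    "services": [
        {"name": "spark-master", "image": "example/spark-master", "memory_gb": 2, "count": 1},
        {"name": "spark-worker", "image": "example/spark-worker", "memory_gb": 8, "count": 3},
        {"name": "notebook", "image": "example/spark-notebook", "memory_gb": 4, "count": 1},
    ],
}


def total_memory_gb(description: dict) -> int:
    """Memory a scheduler would need to reserve to start every service."""
    return sum(s["memory_gb"] * s["count"] for s in description["services"])


def images(description: dict) -> list:
    """Distinct container images to pull from a registry, sorted by name."""
    return sorted({s["image"] for s in description["services"]})


print(total_memory_gb(app_description))
print(images(app_description))
```

A description like this carries what a scheduler needs to decide whether the application fits in the cluster and which images to fetch.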
13
Automatic resize of running applications
[Diagram layers: Volumes, Data layer, Applications]
Example: a data layer is not needed if there are no users
• Data is kept in volumes
• The data layer can be restarted when needed
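The data layer example can be sketched as a small state model. This is a toy illustration under assumptions: the `Cluster` class and the `hdfs-data` volume name are hypothetical, and no containers are actually started or stopped.

```python
from dataclasses import dataclass, field


@dataclass
class Cluster:
    # Volumes outlive the containers that mount them.
    volumes: set = field(default_factory=lambda: {"hdfs-data"})
    data_layer_running: bool = False
    applications: set = field(default_factory=set)

    def start_application(self, name: str) -> None:
        # Bring the data layer back on demand before the application starts.
        if not self.data_layer_running:
            self.data_layer_running = True
        self.applications.add(name)

    def stop_application(self, name: str) -> None:
        self.applications.discard(name)
        # With no users left, the data layer containers can be stopped;
        # the volumes, and the data in them, stay in place.
        if not self.applications:
            self.data_layer_running = False


cluster = Cluster()
cluster.start_application("spark-job")
cluster.stop_application("spark-job")
print(cluster.data_layer_running, cluster.volumes)
```

Because the data lives in volumes rather than in the containers, stopping the data layer frees its resources without losing anything.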
14
Examples of scheduling policies
FIFO – First In First Out
Priority based
• Researchers near deadlines have higher priority
• Fits nicely with the Swarm priority model
Deadline
• Finish this work by 3 p.m.
• Streaming analysis latency must be less than 200 ms
Size-based
• Run the smallest applications first
• Need to know the runtime in advance
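Three of these policies reduce to different orderings of the same queue, as the sketch below shows. It is a minimal illustration, not Zoe's scheduler: the `Job` fields are assumptions, and the deadline policy is left out because it needs more than a sort key.

```python
from dataclasses import dataclass


@dataclass
class Job:
    name: str
    submitted_at: int         # hypothetical submission timestamp
    priority: int             # higher means more urgent
    estimated_runtime_s: int  # must be known in advance for size-based


def fifo(jobs):
    return sorted(jobs, key=lambda j: j.submitted_at)


def by_priority(jobs):
    # Highest priority first, ties broken by submission order.
    return sorted(jobs, key=lambda j: (-j.priority, j.submitted_at))


def size_based(jobs):
    # Smallest estimated runtime first, ties broken by submission order.
    return sorted(jobs, key=lambda j: (j.estimated_runtime_s, j.submitted_at))


jobs = [
    Job("long-training", submitted_at=1, priority=0, estimated_runtime_s=3600),
    Job("paper-deadline", submitted_at=2, priority=5, estimated_runtime_s=600),
    Job("quick-query", submitted_at=3, priority=0, estimated_runtime_s=60),
]

for policy in (fifo, by_priority, size_based):
    print(policy.__name__, [j.name for j in policy(jobs)])
```

The size-based ordering is only as good as the runtime estimates, which is why it requires knowing the runtime in advance.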
15
Zoe implementation
Two client implementations
• Web interface
• Command line for scripting
Simple FIFO scheduler
Docker images for Spark, HDFS, IPython and Spark notebooks
Open source on GitHub, images available on the Docker Hub
16
Zoe - future
Date set: version 1.0 in March 2016
Big plans for Zoe
• One full-time programmer
• All the companies we spoke to are very interested
Features for 1.0 and after:
• Create Zoe applications with more and more services
• Automatic resizing of applications
• Use the new volume management
• Monitoring
• Advanced scheduling
17
Using Docker Swarm for data-intensive apps
L2 networking for Docker containers
Service discovery via DNS
[Diagram: two Docker bridges, each with eth0 and eth1; containers c1, c2, c3, c4]
What about Swarm 1.0 multi-host networking?
- We need hostnames to be visible from outside
- Will run measurements on overlay network performance
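With DNS-based service discovery, a client only needs to know a container's hostname. The sketch below uses the standard `socket` module to resolve a name and builds a Spark master URL from it. The hostname `spark-master` is hypothetical, and port 7077 assumes Spark's standalone mode with its default port.

```python
import socket


def resolve(hostname: str) -> list:
    """Return the sorted, de-duplicated addresses the resolver gives for hostname."""
    infos = socket.getaddrinfo(hostname, None)
    # Each entry ends with a sockaddr tuple whose first element is the address.
    return sorted({info[4][0] for info in infos})


def spark_master_url(hostname: str, port: int = 7077) -> str:
    """Build a standalone-mode Spark master URL from a discovered hostname."""
    return f"spark://{hostname}:{port}"
```

For example, `resolve("spark-master")` would return the container's address only if that name is registered in DNS, and `spark_master_url("spark-master")` gives `spark://spark-master:7077`.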
18
Key takeaways
1. Zoe is a data-intensive application scheduler that targets data scientists and private clouds
2. It is very easy to build cloud applications on top of Swarm
3. Data-intensive frameworks like Spark can run easily and efficiently on top of Swarm
4. Networking between Docker containers on different hosts can be made transparent
Thank you!
Daniele Venzano
http://[email protected]