florin dinu t. s. eugene ng rice university
DESCRIPTION
Synergy2Cloud: Introducing Cross- Sharing of Application Experiences Into the Cloud Management Cycle. Florin Dinu T. S. Eugene Ng Rice University. Multi-Tenant Clouds . Resource contention Performance variation Failures. App1. App2. - PowerPoint PPT PresentationTRANSCRIPT
![Page 1: Florin Dinu T. S. Eugene Ng Rice University](https://reader035.vdocuments.mx/reader035/viewer/2022062501/568166a7550346895dda9943/html5/thumbnails/1.jpg)
Florin Dinu T. S. Eugene NgRice University
Synergy2Cloud: Introducing Cross-Sharing of Application Experiences
Into the Cloud Management Cycle
![Page 2: Florin Dinu T. S. Eugene Ng Rice University](https://reader035.vdocuments.mx/reader035/viewer/2022062501/568166a7550346895dda9943/html5/thumbnails/2.jpg)
2
Multi-Tenant Clouds
• Resource contention
• Performance variation
• Failures
Challenging to ensure good application performance
App1 App2
![Page 3: Florin Dinu T. S. Eugene Ng Rice University](https://reader035.vdocuments.mx/reader035/viewer/2022062501/568166a7550346895dda9943/html5/thumbnails/3.jpg)
3
Our Position: Cross-Sharing Experiences
Red would benefit if Green shared info about the failure
?
!
![Page 4: Florin Dinu T. S. Eugene Ng Rice University](https://reader035.vdocuments.mx/reader035/viewer/2022062501/568166a7550346895dda9943/html5/thumbnails/4.jpg)
4
Ensuring Performance with Good DecisionsGood provisioning decisions
Need information to guide provisioning decisions
App1: How to best scale next?
![Page 5: Florin Dinu T. S. Eugene Ng Rice University](https://reader035.vdocuments.mx/reader035/viewer/2022062501/568166a7550346895dda9943/html5/thumbnails/5.jpg)
5
Ensuring Performance with Good Decisions
Detect node failure & restart tasks
Need information to guide runtime decisions
Good runtime decisions
![Page 6: Florin Dinu T. S. Eugene Ng Rice University](https://reader035.vdocuments.mx/reader035/viewer/2022062501/568166a7550346895dda9943/html5/thumbnails/6.jpg)
6
• Generating an efficient execution plan [NSDI 12]
• Scheduling a known execution plan [SOSP 09]
• Runtime decisions
• Stragglers [OSDI 08, OSDI 10]
• Managing traffic [SIGCOMM 2011]
• Failures [HPDC 12]
All these decisions can benefit from extra information
Ensuring Performance with Good Decisions
![Page 7: Florin Dinu T. S. Eugene Ng Rice University](https://reader035.vdocuments.mx/reader035/viewer/2022062501/568166a7550346895dda9943/html5/thumbnails/7.jpg)
7
Obtaining Information Today: Operator
App 1App 2
Operator monitoring can be useful but has important shortcomings
Careful on resources
Application agnostic data
![Page 8: Florin Dinu T. S. Eugene Ng Rice University](https://reader035.vdocuments.mx/reader035/viewer/2022062501/568166a7550346895dda9943/html5/thumbnails/8.jpg)
8
Obtaining Information Today: Apps
Application monitoring also has several limitations
Limited view of environment
Limited to owned resources
Limited by scale of app
![Page 9: Florin Dinu T. S. Eugene Ng Rice University](https://reader035.vdocuments.mx/reader035/viewer/2022062501/568166a7550346895dda9943/html5/thumbnails/9.jpg)
9
Commonality among cloud apps
HARDWARE
OS
LARGE SCALE FWK
Virtualized. Few instance types available
Few OS image typesavailable
e.g. MapReduce. Many apps built on top
Unique opportunity in the cloud
![Page 10: Florin Dinu T. S. Eugene Ng Rice University](https://reader035.vdocuments.mx/reader035/viewer/2022062501/568166a7550346895dda9943/html5/thumbnails/10.jpg)
10
Cross-Sharing Application ExperiencesSharing
Benefiting from information shared by others
I need to scale out
fast
![Page 11: Florin Dinu T. S. Eugene Ng Rice University](https://reader035.vdocuments.mx/reader035/viewer/2022062501/568166a7550346895dda9943/html5/thumbnails/11.jpg)
11
Today: Rudimentary Sharing
Too slow. Requires human involvement. Needs to be automatic.
![Page 12: Florin Dinu T. S. Eugene Ng Rice University](https://reader035.vdocuments.mx/reader035/viewer/2022062501/568166a7550346895dda9943/html5/thumbnails/12.jpg)
12
Hurdle to Overcome: Isolation
Isolation impedes sharing and measurement
![Page 13: Florin Dinu T. S. Eugene Ng Rice University](https://reader035.vdocuments.mx/reader035/viewer/2022062501/568166a7550346895dda9943/html5/thumbnails/13.jpg)
13
But Isolation is Important
• More predictable performance• Improved security
• Today, we are striving to isolate at all levels– Network [NSDI 11, CONEXT 10, Xen]– Storage [VMWare]– CPU [Xen]– Caches [CCSW 09]
Is complete isolation the way to go?
![Page 14: Florin Dinu T. S. Eugene Ng Rice University](https://reader035.vdocuments.mx/reader035/viewer/2022062501/568166a7550346895dda9943/html5/thumbnails/14.jpg)
14
Cross-Sharing Application Experiences
Examples Incentives Challenges
![Page 15: Florin Dinu T. S. Eugene Ng Rice University](https://reader035.vdocuments.mx/reader035/viewer/2022062501/568166a7550346895dda9943/html5/thumbnails/15.jpg)
15
Cross-Sharing Example: Performance
Shared performance information may help scheduling
Dedicated Storage
Different cloud
Latency
Thro
ughp
ut
![Page 16: Florin Dinu T. S. Eugene Ng Rice University](https://reader035.vdocuments.mx/reader035/viewer/2022062501/568166a7550346895dda9943/html5/thumbnails/16.jpg)
16
Cross-Sharing Example: Scalability
Sharing can inform scaling decisions
• Red needs to scale fast
• No time for testing
• Which VM type to use?
• Information from green may be useful?
![Page 17: Florin Dinu T. S. Eugene Ng Rice University](https://reader035.vdocuments.mx/reader035/viewer/2022062501/568166a7550346895dda9943/html5/thumbnails/17.jpg)
17
Cross-Sharing Example: Failures
• Detect compute node failures
• Green deployed at larger scale
• Green detects failure faster
• Red has few vantage points
• Red can benefit from green’s experience
Some applications are better suited to discover failures
?
!
![Page 18: Florin Dinu T. S. Eugene Ng Rice University](https://reader035.vdocuments.mx/reader035/viewer/2022062501/568166a7550346895dda9943/html5/thumbnails/18.jpg)
18
Incentives for the Operator
• 1. App performance = cloud performance• 2. Apps = huge monitoring platform• 3. $$$• 4. If you can’t beat them, join them
Use
3rd partyShare
![Page 19: Florin Dinu T. S. Eugene Ng Rice University](https://reader035.vdocuments.mx/reader035/viewer/2022062501/568166a7550346895dda9943/html5/thumbnails/19.jpg)
Incentives for the Applications
• 1. More info => better decisions • 2. Obtain info quickly – no trial and error needed• 3. Info about resources that they don’t use• 4. Some apps better suited than others
!?
![Page 20: Florin Dinu T. S. Eugene Ng Rice University](https://reader035.vdocuments.mx/reader035/viewer/2022062501/568166a7550346895dda9943/html5/thumbnails/20.jpg)
20
Addressing Challenges
• Increasing information expressiveness
• Verifying information authenticity
• Establishing similarity between applications
Synergy between applications and operator
![Page 21: Florin Dinu T. S. Eugene Ng Rice University](https://reader035.vdocuments.mx/reader035/viewer/2022062501/568166a7550346895dda9943/html5/thumbnails/21.jpg)
21
Challenge – Increasing Expressiveness
Another
Data Center
Limited infrastructure knowledge
Path is congested
![Page 22: Florin Dinu T. S. Eugene Ng Rice University](https://reader035.vdocuments.mx/reader035/viewer/2022062501/568166a7550346895dda9943/html5/thumbnails/22.jpg)
22
Challenge – Increasing Expressiveness
Operator should provide abstract location information
Effects of isolation – Info meaningul in context of red only
100% CPU usage
Empty slots
What does 100% CPU
mean for me?
![Page 23: Florin Dinu T. S. Eugene Ng Rice University](https://reader035.vdocuments.mx/reader035/viewer/2022062501/568166a7550346895dda9943/html5/thumbnails/23.jpg)
23
Challenges - AuthenticityStorage
I have 20 instances connecting
simultaneously to storage and my throughput is….
• Verify where possible (ownership)
• Cull outliers with statistics
• Filter incoming sources of shared data
Fake Performance Info vs Bad Performance Info
![Page 24: Florin Dinu T. S. Eugene Ng Rice University](https://reader035.vdocuments.mx/reader035/viewer/2022062501/568166a7550346895dda9943/html5/thumbnails/24.jpg)
24
Challenges – Establishing Similarity
HARDWARE
OS
LARGE SCALE FWK
ALLOCATION
50 vs 3 VMs
Impact runtime
Use delayed ACK?
Configuration, code tweaks
• Difficult in the general case
• Not all layers impact every shareable information (compute failures)
• Detect paths in the layers that impact the shared information
• Encode similarity between paths (e.g. hash codes)
Small changes in the code
![Page 25: Florin Dinu T. S. Eugene Ng Rice University](https://reader035.vdocuments.mx/reader035/viewer/2022062501/568166a7550346895dda9943/html5/thumbnails/25.jpg)
25
Synergy2Cloud.com
Our first step for making cross-sharing a reality
![Page 26: Florin Dinu T. S. Eugene Ng Rice University](https://reader035.vdocuments.mx/reader035/viewer/2022062501/568166a7550346895dda9943/html5/thumbnails/26.jpg)
26
Conclusion
• Our position: cross-sharing experiences a promising direction
• Win-win for both applications and operator
• Not trivial - several interesting challenges ahead