CS 294-42: Project Suggestions
Ion Stoica (http://www.cs.berkeley.edu/~istoica/classes/cs294/11/)
September 14, 2011
Projects
This is a project-oriented class: reading papers should be a means to a great project, not a goal in itself!
Strongly prefer groups of two
Perfectly fine to have the same project in cs262
Today, I’ll present some suggestions
But you are free to come up with your own proposal
Main goal: just do a great project
Where I’m Coming From
Key challenge: maximize the economic value of data, i.e., extract value from data while reducing costs (e.g., storage, computation)
Where I’m Coming From: Tools to Extract Value from Big Data
Scalability, response time, accuracy
Provide high cluster utilization for heterogeneous workloads
Support diverse SLAs
Predictable performance
Isolation
Consistency
Caveats
Cloud computing is HOT, but there is a lot of NOISE!
Not easy to differentiate between narrow engineering solutions and fundamental tradeoffs
Not easy to predict the importance of the problem you solve
Cloud computing is akin to a Gold Rush!
Background: Mesos
Rapid innovation in cloud computing
No single framework is optimal for all applications
Running each framework on its own dedicated cluster is:
Expensive
Hard to share data
(Figure: Dryad, Pregel, Cassandra, Hypertable)
Need to run multiple frameworks on the same cluster
Background: Mesos – Where We Want to Go
(Figure: Hadoop, Pregel, and MPI running on a shared cluster)
Today: static partitioning (uniprogramming)
Mesos: dynamic sharing (multiprogramming)
Background: Mesos – Solution
Mesos is a common resource-sharing layer over which diverse frameworks can run
(Figure: Hadoop, MPI, … running on top of Mesos, which spans the cluster nodes)
Background: Workload in Datacenters
Frontend (web servers, databases)
Decision-driven processes
Exploratory queries (e.g., Dremel)
Production jobs (e.g., compute summaries)
Analytics jobs
(Figure: workloads arranged by priority, from high to low, and by response, from interactive (low-latency) to batch)
Datacenter OS: Resource Management, Scheduling
Hierarchical Scheduler (for Mesos)
Allow administrators to organize users into groups
Provide resource guarantees per group
Share available resources (fairly) across groups
Research questions:
What is the right abstraction when using multiple resources?
How to implement it using resource offers?
What policies are compatible at different levels in the hierarchy?
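To make "guarantees per group, then fair sharing of the rest" concrete, here is a minimal sketch for a single resource only. It assumes a hypothetical policy: each group first receives the lesser of its guarantee and its demand, and the leftover capacity is then split by max-min fairness (water-filling) among groups whose demand is still unmet. The function name `allocate` and the single-resource model are illustrative assumptions, not a Mesos API, and the sketch does not use resource offers.

```python
from __future__ import annotations


def allocate(capacity: float, groups: dict[str, tuple[float, float]]) -> dict[str, float]:
    """Two-level allocation of a single resource (hypothetical policy).

    groups maps a group name to (guarantee, demand). Guarantees are assumed
    to sum to at most `capacity`.
    """
    # Step 1: honor each group's guarantee, but never beyond its demand.
    alloc = {g: float(min(guarantee, demand)) for g, (guarantee, demand) in groups.items()}
    remaining = capacity - sum(alloc.values())

    # Step 2: split the leftover max-min fairly (water-filling) among
    # groups whose demand is still unmet.
    unmet = {g for g, (_, demand) in groups.items() if demand > alloc[g]}
    while remaining > 1e-9 and unmet:
        share = remaining / len(unmet)
        for g in sorted(unmet):
            need = groups[g][1] - alloc[g]
            give = min(share, need)
            alloc[g] += give
            remaining -= give
            if give >= need:  # this group's demand is now fully met
                unmet.discard(g)
    return alloc


print(allocate(100, {"A": (30, 80), "B": (30, 20), "C": (20, 100)}))
# {'A': 45.0, 'B': 20.0, 'C': 35.0}
```

In this example, group B uses only 20 of its 30 guaranteed units, and the unused 10 flow to A and C. Whether that kind of policy composes cleanly when groups are nested several levels deep is exactly the question the slide raises.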
Cross Application Resource Management
An app uses many services (e.g., file systems, key-value stores, databases)
If an app has high priority and a service it uses does not, the app’s SLA (Service Level Agreement) might be violated
Research questions:
Abstraction, e.g., resource delegation, priority propagation?
Clean-slate mechanisms vs. incremental deployability
This is also highly challenging in single-node OSes!
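One way to read "priority propagation" is that each request carries its caller's priority, and the shared service schedules by that tag instead of by one fixed service-wide priority. The sketch below assumes exactly that. The class names, the integer priority scale (lower value means more urgent), and the single in-process queue are all hypothetical.

```python
import heapq
import itertools
from dataclasses import dataclass


@dataclass
class Request:
    app: str
    priority: int  # propagated from the calling app; lower value = more urgent
    op: str


class SharedService:
    """Hypothetical shared service (e.g., a key-value store) that orders its
    work by the priority each request carries from its caller, rather than by
    one fixed priority for the whole service."""

    def __init__(self) -> None:
        self._queue: list[tuple[int, int, Request]] = []
        self._seq = itertools.count()  # FIFO order among equal priorities

    def submit(self, req: Request) -> None:
        heapq.heappush(self._queue, (req.priority, next(self._seq), req))

    def serve_next(self) -> Request:
        _, _, req = heapq.heappop(self._queue)
        return req


svc = SharedService()
svc.submit(Request(app="nightly-batch", priority=5, op="scan"))
svc.submit(Request(app="web-frontend", priority=0, op="get"))
print(svc.serve_next().app)  # web-frontend
```

The high-priority front-end request is served first even though it arrived later. A real system would also need to propagate the tag across RPC boundaries and into the service's own resource usage (disk, CPU), which is where the clean-slate vs. incremental question comes in.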
Resource Management using VMs
Most cluster resource managers (e.g., Mesos) use Linux containers; thus, their schedulers assume no task migration
Research questions:
Develop a scheduler for VM environments (e.g., extend DRF)
Tradeoffs between migration, delay, and preemption
Task Granularity Selection (Yanpei Chen)
Problem: the number of tasks per stage in today’s MapReduce apps is (highly) suboptimal
Research question: derive algorithms to pick the number of tasks to optimize various performance metrics (e.g., utilization, response time, network traffic) subject to various constraints (e.g., capacity, network)
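As a toy illustration of the optimization, the sketch below picks a task count by brute force under a hypothetical cost model: a fixed number of slots, identical tasks that run in waves of `ceil(n / slots)`, a fixed per-task overhead, and a capacity constraint that no task may process more than `max_task_input` units. The model and all names are assumptions for illustration; a realistic model would also account for network and shuffle costs.

```python
import math


def stage_time(n_tasks: int, total_work: float, slots: int, overhead: float) -> float:
    """Toy cost model: tasks run in ceil(n_tasks / slots) waves; each task
    processes total_work / n_tasks units plus a fixed per-task overhead."""
    waves = math.ceil(n_tasks / slots)
    return waves * (total_work / n_tasks + overhead)


def pick_task_count(total_work: float, slots: int, overhead: float,
                    max_task_input: float, max_tasks: int = 10_000):
    """Return (n_tasks, estimated_time) minimizing stage_time, subject to a
    capacity constraint: no task may process more than max_task_input units.
    Returns None if no count in range satisfies the constraint."""
    min_tasks = math.ceil(total_work / max_task_input)
    best = None
    for n in range(min_tasks, max_tasks + 1):
        t = stage_time(n, total_work, slots, overhead)
        if best is None or t < best[1]:
            best = (n, t)
    return best


print(pick_task_count(1000, slots=10, overhead=1.0, max_task_input=50))  # (20, 102.0)
```

Without the capacity constraint, one wave of 10 tasks would be best (101.0). The constraint forces at least 20 tasks, and the model then prefers exactly two full waves over a partially filled third one.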
Resource Revocation
Which task should we revoke/preempt? Two questions:
Which slot has the least impact on the giving framework?
Is the slot acceptable to the receiving framework?
Research questions:
Identify a feasible slot for the receiving framework with the least impact on the giving framework
Lightweight protocol design
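The two questions above can be combined into a simple selection rule: filter to slots the receiving framework can accept, then take the one whose preemption hurts the giving framework least. In this sketch, "least impact" is assumed to mean the least work lost (the most recently started task), and acceptability is assumed to be a size check plus an optional node whitelist (e.g., for data locality). Both metrics and the `Slot` type are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Slot:
    node: str
    cpus: int
    mem_gb: int
    task_runtime_s: float  # how long the giving framework's task has been running


def pick_slot_to_revoke(slots, need_cpus: int, need_mem_gb: int,
                        acceptable_nodes: Optional[set] = None) -> Optional[Slot]:
    """Choose which slot to preempt for a receiving framework.

    Feasible: big enough for the receiver's task and, if given, on a node the
    receiver accepts (e.g., for data locality).
    Least impact (assumed metric): the least work lost, i.e. the task that
    has been running for the shortest time.
    """
    feasible = [
        s for s in slots
        if s.cpus >= need_cpus and s.mem_gb >= need_mem_gb
        and (acceptable_nodes is None or s.node in acceptable_nodes)
    ]
    return min(feasible, key=lambda s: s.task_runtime_s, default=None)


slots = [Slot("n1", 2, 4, 600.0), Slot("n2", 4, 8, 30.0), Slot("n3", 1, 2, 5.0)]
print(pick_slot_to_revoke(slots, need_cpus=2, need_mem_gb=4).node)  # n2
```

Slot n3 would lose the least work but is too small for the receiver, so n2 is chosen. A lightweight protocol would have to gather exactly this information (receiver constraints, giver impact) without a round trip per candidate slot.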
Control Plane Consistency Model
What type of consistency is “good enough” for various control plane functions?
File system metadata (Hadoop)
Routing (Nicira)
Scheduling
Coordinated caching
…
Research questions:
What are the tradeoffs between performance and consistency?
Develop a generic framework for the control plane
Decentralized vs. Centralized Scheduling
Decentralized schedulers
E.g., Mesos, Hadoop 2.0
Delegate decisions to apps (i.e., frameworks, jobs)
Advantages: scale and separation of concerns (i.e., apps know best where and which tasks to run)
Centralized schedulers
Know all app requirements
Advantage: can make optimal decisions
Research challenges:
Evaluate centralized vs. decentralized schedulers
Characterize the class of workloads for which a decentralized scheduler is good enough
Opportunistic Scheduling
Goal: schedule interactive jobs (e.g., <100ms latency)
Existing schedulers: high overhead (e.g., Mesos needs to decide on every offer)
Research challenges:
Tradeoff between utilization and response time
Evaluate a hybrid approach
Background: Dominant Resource Fairness
Implement fair (proportional) allocation for multiple types of resources
Key properties:
Strategy-proof: users cannot gain an advantage by lying about their demands
Sharing incentive: users are incentivized to share a cluster rather than partition it
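DRF allocates so as to equalize each user's dominant share, i.e., the largest fraction of any single resource that the user holds. A simplified progressive-filling sketch, assuming every task of a user has the same demand vector: repeatedly launch one task for the user with the lowest dominant share among those whose next task still fits, and stop when no task fits. The function name and data layout are illustrative choices, not Mesos code.

```python
from fractions import Fraction


def drf(capacity, demands):
    """Progressive-filling sketch of Dominant Resource Fairness.

    capacity: total amount of each resource, e.g. (cpus, mem_gb)
    demands:  user -> per-task demand vector, same resource order as capacity
    Assumes every task demands a positive amount of at least one resource.
    Returns the number of tasks launched per user.
    """
    used = [0] * len(capacity)
    tasks = {u: 0 for u in demands}
    dominant = {u: Fraction(0) for u in demands}  # exact shares, no float ties

    def fits(d):
        return all(used[r] + d[r] <= capacity[r] for r in range(len(capacity)))

    while True:
        candidates = [u for u in demands if fits(demands[u])]
        if not candidates:
            return tasks
        # Launch one task for the user with the lowest dominant share.
        u = min(candidates, key=lambda v: dominant[v])
        d = demands[u]
        for r in range(len(capacity)):
            used[r] += d[r]
        tasks[u] += 1
        dominant[u] = max(Fraction(tasks[u] * d[r], capacity[r]) for r in range(len(capacity)))


print(drf((9, 18), {"A": (1, 4), "B": (3, 1)}))  # {'A': 3, 'B': 2}
```

With 9 CPUs and 18 GB, A's tasks are memory-heavy and B's are CPU-heavy. A ends with (3 CPU, 12 GB) and B with (6 CPU, 2 GB), so both dominant shares equal 2/3.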
DRF for Non-linear Resources/Demands
DRF assumes resources and demands are additive
E.g., if task 1 needs (1 CPU, 1 GB) and task 2 needs (1 CPU, 3 GB), both tasks together need (2 CPU, 4 GB)
Sometimes demands are non-linear
E.g., shared memory
Sometimes resources are non-linear
E.g., disk throughput, caches
Research challenge:
DRF-like scheduler for non-linear resources and demands (could be two projects here!)
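The shared-memory case can be made concrete by comparing two ways of aggregating demand. The additive model simply sums per-task vectors, as DRF assumes. A shared-aware model counts each shared segment only once, however many tasks map it. The `Task` fields and segment naming below are hypothetical, and the sketch only covers non-linear demands, not non-linear resources such as disk throughput.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Task:
    cpus: float
    private_mem_gb: float
    shared_segment: Optional[str] = None  # name of a shared-memory segment, if any
    shared_mem_gb: float = 0.0


def additive_demand(tasks):
    """Additive model (as DRF assumes): total demand is a plain sum."""
    return (sum(t.cpus for t in tasks),
            sum(t.private_mem_gb + t.shared_mem_gb for t in tasks))


def shared_aware_demand(tasks):
    """Non-linear model: a shared segment is counted once, however many tasks
    map it (assumes all tasks report the same size for a given segment)."""
    segments = {t.shared_segment: t.shared_mem_gb for t in tasks if t.shared_segment}
    return (sum(t.cpus for t in tasks),
            sum(t.private_mem_gb for t in tasks) + sum(segments.values()))


pair = [Task(1, 1, "seg", 2), Task(1, 1, "seg", 2)]
print(additive_demand(pair))      # (2, 6)
print(shared_aware_demand(pair))  # (2, 4)
```

The additive model over-charges memory by the size of the duplicated segment, which would distort dominant shares for users whose tasks share state.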
DRF for OSes
DRF was designed for clusters using a resource offer mechanism
Redesign DRF to support multi-core OSes
Research questions:
Is the resource offer the best abstraction?
How to best leverage preemption? (In Mesos, tasks are not preempted by default)
How to support gang scheduling?
Storage & Data Processing
Resource Isolation for Storage Services
Share storage (e.g., a key-value store) between:
Frontend, e.g., web services
Backend, e.g., analytics on the freshest data
Research challenge:
Isolation mechanism: protect front-end performance from the back-end workload
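As one possible isolation mechanism (a swapped-in technique; the slide does not name one), back-end requests can be rate-limited with a token bucket while front-end requests pass through untouched. The sketch assumes the caller supplies timestamps in seconds, and `IsolatedStore` is a hypothetical wrapper, not a real key-value store API.

```python
from typing import Optional


class TokenBucket:
    """Token bucket: refills at `rate` tokens per second, holds at most `burst`."""

    def __init__(self, rate: float, burst: float) -> None:
        self.rate = rate
        self.burst = burst
        self.tokens = burst
        self.last: Optional[float] = None

    def allow(self, now: float, cost: float = 1.0) -> bool:
        # Refill for the time elapsed since the previous call, capped at burst.
        elapsed = 0.0 if self.last is None else now - self.last
        self.tokens = min(self.burst, self.tokens + elapsed * self.rate)
        self.last = now
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False


class IsolatedStore:
    """Hypothetical key-value store wrapper: front-end reads are never
    throttled; back-end (analytics) reads must pass a token bucket."""

    def __init__(self, backend_rate: float, backend_burst: float) -> None:
        self.data: dict = {}
        self.backend_limiter = TokenBucket(backend_rate, backend_burst)

    def get(self, key, now: float, frontend: bool):
        if not frontend and not self.backend_limiter.allow(now):
            raise RuntimeError("back-end request throttled")
        return self.data.get(key)
```

With `rate=10` and `burst=5`, ten back-end reads at the same instant yield five successes, and half a second later five more tokens are available. A rate limit caps back-end load but does not by itself guarantee front-end latency; that gap is part of the research challenge.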
“Quicksilver” DB
Goal: interactive queries with bounded error on “unbounded” data
Trade off efficiency against accuracy
Query response time target: < 100 ms
Approach: random pre-sampling across different dimensions (columns)
Research question: given a query and an error bound, find:
The smallest sample that can compute the result
The sample that minimizes disk (or memory) access times
(Talk with Sameer, if interested)
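For one simple query type, an AVG over a column, the "smallest sample for a given error bound" has a textbook answer under the normal approximation: n = (z * sigma / e)^2, where sigma is the column's standard deviation (assumed known or estimated) and z is the normal quantile for the chosen confidence. The sketch below applies that formula and then picks the smallest pre-computed sample that is large enough. The function names and the list of sample sizes are hypothetical, and the formula ignores the finite-population correction.

```python
import math
from statistics import NormalDist


def required_sample_size(sigma: float, error_bound: float, confidence: float = 0.95) -> int:
    """Rows needed so a sample mean is within +/- error_bound of the true mean
    with the given confidence (normal approximation, no finite-population
    correction)."""
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    return math.ceil((z * sigma / error_bound) ** 2)


def pick_sample(sample_sizes, sigma: float, error_bound: float, confidence: float = 0.95):
    """Smallest pre-computed sample that meets the error bound, or None."""
    need = required_sample_size(sigma, error_bound, confidence)
    fits = [n for n in sample_sizes if n >= need]
    return min(fits) if fits else None


print(required_sample_size(sigma=10, error_bound=1))               # 385
print(pick_sample([100, 1_000, 10_000], sigma=10, error_bound=1))  # 1000
```

The research question goes further: other aggregates, selective predicates, and the physical layout of samples (to minimize disk or memory accesses) all change which sample is best.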
Split-Privacy DB (1/2)
Partition data and computation:
Private
Public (stored on cloud)
Goal: use the cloud without revealing the computation result
Example: operation f(x, y) = x + y, where x is private and y is public
Pick a random number a and compute x' = x + a
Compute f(x', y) = r' = x' + y
Recover the result: r = r' - a = (x' - a) + y = x + y
(Figure: a private DB with f_private and a public DB with f_public, whose outputs combine into the result)
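The masking example above can be run directly. This sketch assumes integer values and does the arithmetic modulo 2^64 with a uniformly random mask a (an added assumption: over a fixed modulus, x' = x + a is uniformly distributed whatever x is, so the public side learns nothing about x or the result). The function names are hypothetical.

```python
import secrets

MODULUS = 2 ** 64  # assumption: values are integers, arithmetic is mod 2^64


def blind(x: int) -> tuple[int, int]:
    """Private side: mask x with a uniformly random a and keep a secret."""
    a = secrets.randbelow(MODULUS)
    return (x + a) % MODULUS, a


def f_public(x_masked: int, y: int) -> int:
    """Cloud side: computes r' = f(x', y) = x' + y without ever seeing x."""
    return (x_masked + y) % MODULUS


def unblind(r_masked: int, a: int) -> int:
    """Private side: recover r = r' - a = x + y (mod 2^64)."""
    return (r_masked - a) % MODULUS


x_masked, a = blind(42)              # x = 42 stays private
r_masked = f_public(x_masked, 100)   # y = 100 is public
print(unblind(r_masked, a))          # 142
```

Addition works because the mask passes through f unchanged; for most other functions it does not, which is why the next slide asks what types of computation can be split this way.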
Split-Privacy DB (2/2)
Partition data and computation:
Private
Public (stored on cloud)
Example: patient data (private); public clinical and genomics data sets
Goal: use the cloud without revealing the computation result
Research questions:
What types of computation can be implemented?
Is this any more powerful than privacy-preserving computation / data mining?
(Figure: a private DB with f_private and a public DB with f_public, whose outputs combine into the result)
RDDs as an OS Abstraction
Resilient Distributed Datasets (RDDs)
Fault-tolerant (in-memory) parallel data structures
Allow Spark apps to efficiently reuse data
Design cross-application RDDs
Research questions:
RDD reconstruction (track software and platform changes)
Enable users to share intermediate results of queries (identify when two apps compute the same RDD)
Cluster-wide RDD caching
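A minimal sketch of "identify when two apps compute the same RDD": fingerprint each dataset by hashing its lineage (operation, argument, and the fingerprints of its parents), and key a shared cache by that fingerprint. Everything here is hypothetical and not Spark's API. In particular, it assumes each operation's argument is a comparable string, whereas real Spark transformations take closures that cannot be compared this easily, which is part of what makes the research question hard.

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class Lineage:
    """Hypothetical lineage node: an operation, its argument, and its parents."""
    op: str            # e.g. "textFile", "filter" (illustrative labels)
    arg: str           # assumption: the operation's argument is a comparable string
    parents: tuple = ()

    def fingerprint(self) -> str:
        h = hashlib.sha256()
        for part in (self.op, self.arg, *(p.fingerprint() for p in self.parents)):
            h.update(part.encode())
            h.update(b"\0")  # separator so ("ab", "c") differs from ("a", "bc")
        return h.hexdigest()


class SharedRDDCache:
    """Cluster-wide cache keyed by lineage fingerprint, shared across apps."""

    def __init__(self) -> None:
        self._store: dict = {}
        self.hits = 0

    def get_or_compute(self, lineage: Lineage, compute):
        key = lineage.fingerprint()
        if key in self._store:
            self.hits += 1
            return self._store[key]
        value = compute()
        self._store[key] = value
        return value


# Two apps independently build the same lineage ("input-path" is a placeholder).
cache = SharedRDDCache()
app1 = Lineage("filter", "level == ERROR", (Lineage("textFile", "input-path"),))
app2 = Lineage("filter", "level == ERROR", (Lineage("textFile", "input-path"),))
cache.get_or_compute(app1, lambda: ["error line"])
cache.get_or_compute(app2, lambda: ["error line"])
print(cache.hits)  # 1
```

Tracking software and platform changes would amount to folding code and library versions into the fingerprint, so a changed operator no longer matches an older cached result.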
Provenance-based Efficient Storage (Peter B and Patrick W)
Reduce storage by deleting data that can be recreated
Generalization of the previous project
Research challenges:
Identify data that can be deterministically recreated, and the code to do so
Use hints?
Tradeoff between re-creation and storage
May take into account access pattern, frequency, performance
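The re-creation vs. storage tradeoff can be written as a simple per-dataset decision. The sketch assumes a hypothetical monthly cost model: data that cannot be deterministically recreated must be kept; otherwise it is kept only if storing it costs no more than re-creating it on every access. The model deliberately ignores access patterns beyond a single frequency, and all parameter names are illustrative.

```python
def should_store(deterministic: bool, size_gb: float, storage_cost_per_gb_month: float,
                 recreate_cost: float, accesses_per_month: float) -> bool:
    """Hypothetical monthly cost model for one dataset.

    Data that cannot be deterministically recreated must be kept. Otherwise,
    keep it only if storing it costs no more than re-creating it on every
    access (assumes re-created copies are not retained between accesses).
    """
    if not deterministic:
        return True
    storage_cost = size_gb * storage_cost_per_gb_month
    recreate_total = accesses_per_month * recreate_cost
    return storage_cost <= recreate_total


print(should_store(True, 100, 0.10, recreate_cost=2.0, accesses_per_month=10))  # True
print(should_store(True, 100, 0.10, recreate_cost=2.0, accesses_per_month=1))   # False
```

A frequently read dataset is worth storing; a rarely read one can be deleted and recreated from its provenance on demand.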
Very-Low-Latency Streaming
Challenges: stragglers, failures
Approaches to reduce latency:
Redundant computations
Speculative execution
Research questions:
Theoretical tradeoff between response time and accuracy?
Achieve target latency and accuracy while minimizing the overhead
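Redundant computation can be sketched as "run several replicas of the same work and keep whichever finishes first", which hides a straggler at the cost of extra work (the overhead the research questions mention). The sketch below uses Python's `concurrent.futures` with simulated replicas. `simulated_replica` and its delays are hypothetical stand-ins for real operators. Speculative execution, which launches a backup copy only once a task looks slow, is a different variant not shown here.

```python
import concurrent.futures as cf
import time


def first_result(fn, replica_args):
    """Run the same computation on several replicas; return the first result.

    Python threads cannot be killed, so slower replicas keep running in the
    background and their results are simply ignored. If the fastest replica
    raises, that exception is re-raised here.
    """
    pool = cf.ThreadPoolExecutor(max_workers=len(replica_args))
    futures = [pool.submit(fn, *args) for args in replica_args]
    done, _ = cf.wait(futures, return_when=cf.FIRST_COMPLETED)
    pool.shutdown(wait=False)  # do not block on stragglers
    return next(iter(done)).result()


def simulated_replica(delay_s: float, name: str) -> str:
    """Stand-in for one replica of a streaming operator (hypothetical)."""
    time.sleep(delay_s)
    return name


print(first_result(simulated_replica, [(0.5, "straggler"), (0.0, "fast")]))  # fast
```

Each extra replica lowers the chance that every copy straggles, but multiplies resource use; quantifying that curve against the latency target is the tradeoff the slide asks about.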