CS 294: Big Data System Research: Trends and Challenges
Fall 2015 (MW 9:30-11:00, 310 Soda Hall)
Ion Stoica and Ali Ghodsi
(http://www.cs.berkeley.edu/~istoica/classes/cs294/15/)
Big Data
First papers:
» 2003: The Google File System paper
» 2004: The MapReduce paper
Today every major systems & networking conference has Big Data sessions
Big Data Impact
Already helped create new businesses
Already helped disrupt existing businesses
» Retail
» Rental
» Taxi
» Home appliances
» …
Big Data Stack
Data Processing Layer
Resource Management Layer
Storage Layer
Hadoop Stack
Data Processing Layer: Hadoop MR, Hive, Pig, Impala, Storm, …
Resource Management Layer: Hadoop Yarn
Storage Layer: HDFS, S3, …
The Berkeley AMPLab
January 2011 – 2017
» 8 faculty
» > 40 students
» Team of 3 software engineers
Organized for collaboration
3 day retreats (twice a year)
Algorithms, Machines, People
AMPCamp3 (August, 2013): 220 campers (100+ companies)
The Berkeley AMPLab
Governmental and industrial funding:
Goal: Next generation of open source data analytics stack for industry & academia: Berkeley Data Analytics Stack (BDAS)
BDAS Stack
Data Processing Layer: Spark, Spark Streaming, Shark SQL, BlinkDB, GraphX, MLlib, MLBase
Resource Management Layer: Mesos
Storage Layer: HDFS, S3, …, Tachyon
BDAS & Hadoop fitting together
Data Processing Layer: Spark, Spark Streaming, Shark SQL, BlinkDB, GraphX, MLlib, MLBase
Resource Management Layer: Mesos, Hadoop Yarn
Storage Layer: HDFS, S3, …, Tachyon
How do BDAS & Hadoop fit together?
Data Processing Layer: Spark (with Spark Streaming, Shark SQL, GraphX, ML library, BlinkDB, MLbase) alongside Hadoop MR, Hive, Pig, Impala, Storm
Resource Management Layer: Mesos, Hadoop Yarn
Storage Layer: HDFS, S3, …, Tachyon
This Class
Learn about state-of-the-art research in Big Data
Work on an exciting project
Hopefully start the next generation of impactful projects
Grading
Project: 60%
Class presentations: 40%
» Around 2 papers per student
» See Randy’s guidelines for leading discussion on papers
• http://bnrg.eecs.berkeley.edu/~randy/Courses/CS294.F07/LeadingPapers.pdf
Administrative Information
Class website: http://www.cs.berkeley.edu/~istoica/classes/cs294/15/
Office Hours (Soda 465D):
» TBA
Create an (anonymized) blog account for paper reviews if you don’t have one yet (e.g., www.blogger.com)
» Send me an e-mail by Monday, August 31, with your blog URL
» Include your preferred e-mail address for the class e-mail list
Papers
Is the problem real?
What is the solution’s main idea (nugget)?
Why is the solution different from previous work?
» Are system assumptions different?
» Is the workload different?
» Is the problem new?
Does the paper (or do you) identify any fundamental/hard trade-offs?
Papers (cont’d)
Do you think the work will be influential in 10 years?
» Why or why not?
Predicting the future is hard, but worth a try
» Look at past examples for inspiration
Streaming Over TCP
Countless papers:
» Why it cannot be done…
» New protocols to do it…
Today
» Virtually all streaming is over TCP
» Trend to stream over HTTP!
Why did it Succeed?
Multicast
Countless papers:
» Why the world will come to a standstill without multicast…
» New protocols to do it…
Today
» Multicast is used only in enterprise settings at best
» Overlay multicast is widely used in the Internet
• CDN based, e.g., World Cup, March Madness, inaugurations, ...
• P2P, mostly popular outside US (e.g., China)
Why Did it Fail?
Shared Memory
Countless papers:
» How shared memory simplifies programming parallel computers
» Many, many systems proposed and built
Today:
» Message passing (MPI) took over as the de facto standard for writing parallel applications
Why Did it Fail?
Network Computer
Big in the 90s
» Promoted by an alliance of Sun, Oracle, Acorn
Promise: many of the advantages of cloud computing
» Easy to manage
» Application sharing
» …
Failed miserably
Why Did it Fail?
Coming Back: ChromeOS
Will it succeed this time?
What are Hard/Fundamental Tradeoffs?
Brewer’s CAP conjecture: “Consistency, Availability, Partition-tolerance”, you can have only two in a distributed system
An in-order, reliable communication protocol cannot minimize overhead and latency simultaneously
Hard to simultaneously maximize evolvability and performance
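As a rough illustration of the CAP trade-off named above, here is a minimal sketch of a two-replica store. The `Replica` and `Cluster` classes, the `mode` labels, and the `partitioned` switch are hypothetical names invented for this example; they do not come from the lecture or from any real system. During a partition, the toy cluster either refuses writes (replicas stay consistent, but the store is unavailable) or accepts them locally (the store stays available, but replicas diverge).

```python
class Replica:
    """One copy of a single replicated value."""

    def __init__(self):
        self.value = None


class Cluster:
    """Toy two-replica store; mode is "CP" or "AP" (hypothetical labels)."""

    def __init__(self, mode):
        if mode not in ("CP", "AP"):
            raise ValueError("mode must be 'CP' or 'AP'")
        self.mode = mode
        self.replicas = [Replica(), Replica()]
        self.partitioned = False  # True when the replicas cannot communicate

    def write(self, replica_id, value):
        """Return True if the write was accepted, False if it was refused."""
        if not self.partitioned:
            # Healthy network: propagate the write to every replica.
            for replica in self.replicas:
                replica.value = value
            return True
        if self.mode == "CP":
            # Prefer consistency: refuse the write, giving up availability.
            return False
        # Prefer availability: accept locally, so replicas may diverge.
        self.replicas[replica_id].value = value
        return True

    def consistent(self):
        """True when both replicas hold the same value."""
        return self.replicas[0].value == self.replicas[1].value


def demo(mode):
    """Write once normally, then once during a partition."""
    cluster = Cluster(mode)
    cluster.write(0, "v1")
    cluster.partitioned = True
    accepted = cluster.write(0, "v2")
    return accepted, cluster.consistent()
```

Calling `demo("CP")` shows a refused write with replicas still in agreement, while `demo("AP")` shows an accepted write with replicas that no longer agree. Real systems sit at many points between these two extremes; the sketch only makes the forced choice under partition concrete.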