l13:distributed stream processingcds.iisc.ac.in/wp-content/uploads/l13.storm_.pdf · 2018-01-04 ·...

26
Indian Institute of Science Bangalore, India भारतीय विान संथान बंगलौर, भारत Department of Computational and Data Sciences ©Yogesh Simmhan & Partha Talukdar, 2016 This work is licensed under a Creative Commons Attribution 4.0 International License Copyright for external content used with attribution is retained by their original authors L13:Distributed Stream Processing Yogesh Simmhan 30 Mar, 2016 SE256:Jan16 (2:1)

Upload: others

Post on 07-Jul-2020

1 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: L13:Distributed Stream Processingcds.iisc.ac.in/wp-content/uploads/L13.Storm_.pdf · 2018-01-04 · Cluster is down until Mar 31, 6PM Assignment B deadline extended till Apr 3, by

Indian Institute of ScienceBangalore, India

भारतीय विज्ञान संस्थान

बंगलौर, भारत

Department of Computational and Data Sciences

©Yogesh Simmhan & Partha Talukdar, 2016This work is licensed under a Creative Commons Attribution 4.0 International LicenseCopyright for external content used with attribution is retained by their original authors

L13:Distributed Stream Processing

Yogesh Simmhan3 0 M a r , 2 0 1 6

SE256:Jan16 (2:1)

Page 2: L13:Distributed Stream Processingcds.iisc.ac.in/wp-content/uploads/L13.Storm_.pdf · 2018-01-04 · Cluster is down until Mar 31, 6PM Assignment B deadline extended till Apr 3, by

CDS.IISc.in | Department of Computational and Data Sciences

Announcements Cluster is down until Mar 31, 6PM Assignment B deadline extended till Apr 3, by midnight

‣ NO Extension will be given!

Assignment B submission instructions posted‣ There will not be any demo. We should be able to run the

program ourselves. The report should be self-contained.Equal weightage will be given to the program and the report.

‣ You will be evaluated on correctness of the program and its outputs, its speed/scalability, accuracy of the results, and analysis of the algorithm/results/scalability in the report.

Final Exam on Apr 27, Wed in the AM Project topics posted

‣ Topic proposal by Apr 4‣ Project due by Apr 28‣ Project demos on Apr 30

30-Mar-16 2

Page 3: L13:Distributed Stream Processingcds.iisc.ac.in/wp-content/uploads/L13.Storm_.pdf · 2018-01-04 · Cluster is down until Mar 31, 6PM Assignment B deadline extended till Apr 3, by

CDS.IISc.in | Department of Computational and Data Sciences

Stream are Commonplace (too)

Web & Social Networks‣ Twitter, Facebook, Internet packets

Cybersecurity‣ Telecom call logs, financial transactions, Malware

Internet of Things‣ Smart Transport/Power/Water networks

‣ Smart watch/phone/TV/…

30-Mar-16 3

Page 4: L13:Distributed Stream Processingcds.iisc.ac.in/wp-content/uploads/L13.Storm_.pdf · 2018-01-04 · Cluster is down until Mar 31, 6PM Assignment B deadline extended till Apr 3, by

CDS.IISc.in | Department of Computational and Data Sciences

IISc Smart Campus: Water Management Plan pumping operations for reliability

‣ Avoid underflow/overflow of water‣ 12 hrs to fill a large OHT, scarcity in summer weeks

Provide safer water‣ Leakages, contamination from decades old network

Reduce water usage for sustainability‣ IISc average: 400 Lit/day, Global standard: 135 Lit/day‣ Lack of visibility on usage footprint, sources‣ Opportunities for water harvesting, recycling

Lower the cost‣ Reduce cost for water use & electricity for pumping

30-Mar-16 4

Page 5: L13:Distributed Stream Processingcds.iisc.ac.in/wp-content/uploads/L13.Storm_.pdf · 2018-01-04 · Cluster is down until Mar 31, 6PM Assignment B deadline extended till Apr 3, by

Over Head Tank (OHT)

Ground Level Reservoir (GLR)

BWSSB Main Inlet

OHT 8

GLR 13

Inlet 4+3

IISc Campus• 440 Acres, 8 Km Perimeter• 50 buildings: Office, Hotel,

Residence, Stores• 10,000 people• Water Use: 40 Lakh Lit/Day• 10MW Power Consumed

30-Mar-16 5

Page 6: L13:Distributed Stream Processingcds.iisc.ac.in/wp-content/uploads/L13.Storm_.pdf · 2018-01-04 · Cluster is down until Mar 31, 6PM Assignment B deadline extended till Apr 3, by

CDS.IISc.in | Department of Computational and Data SciencesOver Head Tanks (OHT)

TPH (near Mechanical) JNT Auditorium Chemical Stores Opposite to CENSE

Opposite to NESARA

Behind old C-Mess

Opposite to Cense (new)

E Type Quarters 30-Mar-16 6

Page 7: L13:Distributed Stream Processingcds.iisc.ac.in/wp-content/uploads/L13.Storm_.pdf · 2018-01-04 · Cluster is down until Mar 31, 6PM Assignment B deadline extended till Apr 3, by

CDS.IISc.in | Department of Computational and Data Sciences

Custom Level + Quality Sensor

Fig. 7. System block diagram

Fig. 8. Scatter plot of measured vs actual distance

Fifty readings are taken every 60cm, and histograms are

plotted at each distance. Finally, all the histograms are merged

into a single scatter plot as shown in figure 8. The spots in this

graph indicate the concentration of readings in an area. Clearly,

accuracy is better at shorter distances. We determined that our

sensor has an accuracy within ±1.5% of full scale, with improved

accuracy at shorter distances. Further experiments at a

swimming pool indicated the sensors performed with the same

accuracy as that observed with the Aluminium board. A picture

of the installation of the sensor on a tank is shown in figure 8.

The system is installed in a Ground Level Reservoir (GLR)

in the campus, with a repeater placed on a nearby Overhead Tank

(OHT) and an access point setup a few hundred metres away.

The data collected every minute from the tank is encapsulated

into packets and transmitted to the repeater, which then forwards

it to the access point. The AP unpacks the received packet and

uploads the level data online to a Google App Engine

application. Figure 10 shows the collected data for half a day. A

few glitches in the data can be seen that can be smoothed after

acquisition. Jitter in readings is only 1-2cm.

Fig. 9. Installation on a tank

Fig. 10. Level data collected from a tank

VII. CONCLUSIONS

We have described an IoT system consisting of custom

sensors interconnected via a sub-GHz based wireless network.

The sensor sub-system uses ultrasonic ranging to determine

water levels in large tanks in the distribution system. Optimised

circuitry and algorithm allowed achieving a range of up to 10m

with an accuracy within ±1.5% of full scale, using low cost

transducers. Initial deployment results are encouraging and as a

next step, we will instrument all the tanks. A further addition is

to include control valves so that we can create a smart

distribution system which distributes just enough water to each

tank to satisfy the local demands.

ACKNOWLEDGMENT

We thank Mr. Alok Rawat for the design of the enclosure.

We thank Mr. Sheetal Kumar and Ms. Anjana of Dept. of Civil

Engineering, Indian Institute of Science, for discussions. We

thank the Robert Bosch Centre for Cyber Physical Systems at

Indian Institute of Science, Bangalore, for funding the project.

REFERENCES

[1] UNDP. Human Development Report, Beyond Scarcity: Power,

Poverty and the Global Water Crisis. United Nations

Development Programme, 2006.

[2] UNDP. Human Development Report, Sustaining Human

Progress: Reducing Vulnerability and Building Resilience.

United Nations Development Programme, 2014.

30-Mar-16 7

Page 8: L13:Distributed Stream Processingcds.iisc.ac.in/wp-content/uploads/L13.Storm_.pdf · 2018-01-04 · Cluster is down until Mar 31, 6PM Assignment B deadline extended till Apr 3, by

CDS.IISc.in | Department of Computational and Data Sciences

Backend

30-Mar-16

ECE BuildingW

W

8

Page 9: L13:Distributed Stream Processingcds.iisc.ac.in/wp-content/uploads/L13.Storm_.pdf · 2018-01-04 · Cluster is down until Mar 31, 6PM Assignment B deadline extended till Apr 3, by

CDS.IISc.in | Department of Computational and Data Sciences

Aadhaar Enrolment

Input is stream of identity enrolment packets

Output is a UIDAI ID (success) or rejection

Each task tagged with Latency (ms)

Each edge tagged with Selectivity‣ Input:output rate, probability of path taken

30-Mar-16Benchmarking Fast Data Platforms for the ‘Aadhaar’ Biometric Database, Shukla, et al, Workshop on Big Data Benchmarking (WBDB), 2015 9

Page 10: L13:Distributed Stream Processingcds.iisc.ac.in/wp-content/uploads/L13.Storm_.pdf · 2018-01-04 · Cluster is down until Mar 31, 6PM Assignment B deadline extended till Apr 3, by

CDS.IISc.in | Department of Computational and Data Sciences

Aadhaar Authentication

30-Mar-16Benchmarking Fast Data Platforms for the ‘Aadhaar’ Biometric Database, Shukla, et al, Workshop on Big Data Benchmarking (WBDB), 2015 10

Page 11: L13:Distributed Stream Processingcds.iisc.ac.in/wp-content/uploads/L13.Storm_.pdf · 2018-01-04 · Cluster is down until Mar 31, 6PM Assignment B deadline extended till Apr 3, by

CDS.IISc.in | Department of Computational and Data Sciences

Stream Processing Continuous DataflowsHow do you “compose” analytics that run

continuously over streaming data?

Application defined as Directed Acyclic Graph (DAG)‣ Vertices are Tasks

‣ Edges are streams

‣ Streams carry tuples/events (name:value) or messages(opaque)

‣ Tasks process one or more messages/tuples and generate zero or more messages/tuples

‣ Message routing?

30-Mar-16 11

Page 12: L13:Distributed Stream Processingcds.iisc.ac.in/wp-content/uploads/L13.Storm_.pdf · 2018-01-04 · Cluster is down until Mar 31, 6PM Assignment B deadline extended till Apr 3, by

CDS.IISc.in | Department of Computational and Data Sciences

Event Processing Programming Models Query Based

‣ Complex Event processing‣ SQL like languages

Programming APIs

Queries or the Programs run on a continuous stream, unlike Hadoop where your data is static for the Batch processor

Need to address diverse streams – Unbounded sequence of events

ExamplesVideo Camera framesTweetsLaser scans from a robotLog data

© Programming Models for IoT and Streaming Data, Qiu

Page 13: L13:Distributed Stream Processingcds.iisc.ac.in/wp-content/uploads/L13.Storm_.pdf · 2018-01-04 · Cluster is down until Mar 31, 6PM Assignment B deadline extended till Apr 3, by

CDS.IISc.in | Department of Computational and Data Sciences

Distributed Stream Processing Systems

Aurora – Early Research System

Borealis – Early Research System

Apache Storm

Apache S4

Apache Samza

Google MillWheel

Amazon Kinesis

LinkedIn Databus

Facebook Puma/Ptail/Scribe/ODS

Azure Stream Analytics

© Programming Models for IoT and Streaming Data, Qiu

Page 14: L13:Distributed Stream Processingcds.iisc.ac.in/wp-content/uploads/L13.Storm_.pdf · 2018-01-04 · Cluster is down until Mar 31, 6PM Assignment B deadline extended till Apr 3, by

CDS.IISc.in | Department of Computational and Data Sciences

Apache StormSee Nathan Marz’s Slides

http://nathanmarz.com/blog/history-of-apache-storm-and-lessons-learned.html

30-Mar-16 14

Page 15: L13:Distributed Stream Processingcds.iisc.ac.in/wp-content/uploads/L13.Storm_.pdf · 2018-01-04 · Cluster is down until Mar 31, 6PM Assignment B deadline extended till Apr 3, by

CDS.IISc.in | Department of Computational and Data Sciences

Reliable Processing

© Mahender Immadi & Thirupathi Guduru, Cerner30-Mar-16 15

Page 16: L13:Distributed Stream Processingcds.iisc.ac.in/wp-content/uploads/L13.Storm_.pdf · 2018-01-04 · Cluster is down until Mar 31, 6PM Assignment B deadline extended till Apr 3, by

CDS.IISc.in | Department of Computational and Data Sciences

Reliable Processing

© Mahender Immadi & Thirupathi Guduru, Cerner30-Mar-16 16

Page 17: L13:Distributed Stream Processingcds.iisc.ac.in/wp-content/uploads/L13.Storm_.pdf · 2018-01-04 · Cluster is down until Mar 31, 6PM Assignment B deadline extended till Apr 3, by

CDS.IISc.in | Department of Computational and Data Sciences

Storm Cluster View

© Mahender Immadi & Thirupathi Guduru, Cerner30-Mar-16 17

Page 18: L13:Distributed Stream Processingcds.iisc.ac.in/wp-content/uploads/L13.Storm_.pdf · 2018-01-04 · Cluster is down until Mar 31, 6PM Assignment B deadline extended till Apr 3, by

CDS.IISc.in | Department of Computational and Data Sciences

Fault Tolerance

© Mahender Immadi & Thirupathi Guduru, Cerner30-Mar-16 18

Page 19: L13:Distributed Stream Processingcds.iisc.ac.in/wp-content/uploads/L13.Storm_.pdf · 2018-01-04 · Cluster is down until Mar 31, 6PM Assignment B deadline extended till Apr 3, by

CDS.IISc.in | Department of Computational and Data Sciences

Fault Tolerance

© Mahender Immadi & Thirupathi Guduru, Cerner30-Mar-16 19

Page 20: L13:Distributed Stream Processingcds.iisc.ac.in/wp-content/uploads/L13.Storm_.pdf · 2018-01-04 · Cluster is down until Mar 31, 6PM Assignment B deadline extended till Apr 3, by

CDS.IISc.in | Department of Computational and Data Sciences

Fault Tolerance

© Mahender Immadi & Thirupathi Guduru, Cerner30-Mar-16 20

Page 21: L13:Distributed Stream Processingcds.iisc.ac.in/wp-content/uploads/L13.Storm_.pdf · 2018-01-04 · Cluster is down until Mar 31, 6PM Assignment B deadline extended till Apr 3, by

CDS.IISc.in | Department of Computational and Data Sciences

Fault Tolerance

© Mahender Immadi & Thirupathi Guduru, Cerner30-Mar-16 21

Page 22: L13:Distributed Stream Processingcds.iisc.ac.in/wp-content/uploads/L13.Storm_.pdf · 2018-01-04 · Cluster is down until Mar 31, 6PM Assignment B deadline extended till Apr 3, by

CDS.IISc.in | Department of Computational and Data Sciences

Fault Tolerance

© Mahender Immadi & Thirupathi Guduru, Cerner30-Mar-16 22

Page 23: L13:Distributed Stream Processingcds.iisc.ac.in/wp-content/uploads/L13.Storm_.pdf · 2018-01-04 · Cluster is down until Mar 31, 6PM Assignment B deadline extended till Apr 3, by

CDS.IISc.in | Department of Computational and Data Sciences

Parallelism

© Mahender Immadi & Thirupathi Guduru, Cerner30-Mar-16 23

Page 24: L13:Distributed Stream Processingcds.iisc.ac.in/wp-content/uploads/L13.Storm_.pdf · 2018-01-04 · Cluster is down until Mar 31, 6PM Assignment B deadline extended till Apr 3, by

CDS.IISc.in | Department of Computational and Data Sciences

Parallelism

© Mahender Immadi & Thirupathi Guduru, Cerner30-Mar-16 24

Page 25: L13:Distributed Stream Processingcds.iisc.ac.in/wp-content/uploads/L13.Storm_.pdf · 2018-01-04 · Cluster is down until Mar 31, 6PM Assignment B deadline extended till Apr 3, by

CDS.IISc.in | Department of Computational and Data Sciences

Other Streaming Platforms

Apache Spark Streaming

IBM InfoSphere Streams

Yahoo S4

30-Mar-16 25

Page 26: L13:Distributed Stream Processingcds.iisc.ac.in/wp-content/uploads/L13.Storm_.pdf · 2018-01-04 · Cluster is down until Mar 31, 6PM Assignment B deadline extended till Apr 3, by

CDS.IISc.in | Department of Computational and Data Sciences

Reading

Ankit Toshniwal, et al. Storm@twitter. In ACM SIGMOD, 2014

Discretized Streams: An Efficient and Fault-Tolerant Model for Stream Processing on Large Clusters, Zaharia, et al, USENIX HotCloud, 2012, https://www.usenix.org/conference/hotcloud12/workshop-program/presentation/zaharia

Leonardo Neumeyer, et al, S4: Distributed Stream Computing Platform. In ICDMW 2010

30-Mar-16 26