IOT Big Data Ingestion and Processing in Hadoop by Silver Spring Networks


Upload: apache-apex

Post on 16-Apr-2017


TRANSCRIPT

Page 1: IOT Big Data Ingestion and Processing in Hadoop by Silver Spring Networks

© 2016 Silver Spring Networks. All rights reserved. 1

Silver Spring Networks
Greg Brosman
Product Manager
SilverLink Data Platform

Page 2: IOT Big Data Ingestion and Processing in Hadoop by Silver Spring Networks

Silver Spring Networks

• Silver Spring Networks helps global utilities and cities connect, optimize, and manage smart energy and smart city infrastructure
• Over 22 million connected devices
• 200B records read per year
• 2 million remote operations per year

Integrate Renewables
Engage Customers
Improve Operational Efficiency
Improve Reliability
Manage Peak
Automate Measurement
Improve Energy Efficiency
Reduce Truck Rolls for Device Maintenance

Page 3: IOT Big Data Ingestion and Processing in Hadoop by Silver Spring Networks

More Devices, More Data

• How can we do more with our network?
  - We deployed a network to support meter reading. It works great, but we’re ready for the next thing to leverage these investments
• How do we manage these new devices and make all this data accessible and secure?
  - There are lots of opportunities to enhance our service by making use of advanced analytics, but we can’t get the data to the right people
• How can we reduce the cost, time, and pain of integrating with 3rd party apps?
  - The ecosystem of 3rd party apps is growing, but we need a scalable way to connect apps with data

Managing the volume, variety, and velocity of data

Page 4: IOT Big Data Ingestion and Processing in Hadoop by Silver Spring Networks


SilverLink Data Platform

• Automatically ingest smart grid data

• Enrich data with valuable context

• Enable real-time and batch applications

• Archive raw and enriched data

• Connect apps through standard APIs

• Explore data through BI tool integrations

Seamlessly connecting apps with sensor data

[Architecture diagram: SilverLink Data Platform]
Applications: Silver Spring Networks Apps, 3rd Party Apps, In-House Apps
SilverLink Data Platform: Security & API Management, Storage & Batch, Real-Time, Data Ingestion
Data Sources: Devices, Silver Spring Networks Data, Utility Data, 3rd Party Data

Page 5: IOT Big Data Ingestion and Processing in Hadoop by Silver Spring Networks


Starfish

• A Worldwide Wireless IPv6 Network Service for the IoT. Starfish enables cities, utilities, enterprises, and developers to connect and manage a new generation of intelligent devices

• Focus areas include water, energy, food, traffic, transportation and safety

• 2016 Global IoT Hackathon Series: an opportunity to develop and test innovations and collaborate with leading IoT technologists

Building a new ecosystem of IoT services

Page 6: IOT Big Data Ingestion and Processing in Hadoop by Silver Spring Networks

IOT Big Data Ingestion & Processing in Hadoop
Darin Nee
Silver Spring Networks

Page 7: IOT Big Data Ingestion and Processing in Hadoop by Silver Spring Networks

Agenda

• Context & scope of our use case
• Tour a DataTorrent app we built
• Some technical hurdles & solutions we came up with
• Q & A

Page 8: IOT Big Data Ingestion and Processing in Hadoop by Silver Spring Networks

Kinds of Data

• Sensor reads
• Meter register reads & interval data
• Threshold events, traps
• Device metadata

Page 9: IOT Big Data Ingestion and Processing in Hadoop by Silver Spring Networks

Data Flow

• NICs collect data from meters
• Head-end software polls NICs
• Some data sent asynchronously to head end
• Agents send data to SilverLink
• Data processing using DataTorrent + more
• Data consumed via APIs and SQL

Page 10: IOT Big Data Ingestion and Processing in Hadoop by Silver Spring Networks

Security

• Encryption of data at rest & in transit
• Ranger & Knox
• Custom requirements to satisfy local laws
• Auditing
• No data leakage across tenants
• Not enough to be secure – need to prove it

Page 11: IOT Big Data Ingestion and Processing in Hadoop by Silver Spring Networks

Multi-Tenancy

• Shared resources to cut costs
• Customers with millions of devices, and pilots with a handful of them
• Centralized management of software & operations
• Challenge in selling shared anything to our customers

Page 12: IOT Big Data Ingestion and Processing in Hadoop by Silver Spring Networks

Scalability

• 23 million network endpoints in service today
• Up to 96 intervals a day
• Each interval has 4 channels
• So, approximately 8 billion intervals per day
• Keep this data forever
• Also, 100 million events a day
• And, sensors that can collect data every 10s
• 19.4 GB per million meters per day
• ½ TB per day
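The figures on this slide can be cross-checked with a few lines of arithmetic. The sketch below is an illustration, not code from the talk: the class and method names are made up, it assumes every endpoint reports the maximum 96 intervals on all 4 channels (so the result is an upper bound), and it assumes the 19.4 GB per million meters figure scales linearly across all 23 million endpoints.

```java
// Back-of-the-envelope check of the slide's scale figures.
// All names are illustrative; only the input numbers come from the slide.
final class ScaleEstimate {
    static final long ENDPOINTS = 23_000_000L;            // network endpoints in service
    static final int INTERVALS_PER_DAY = 96;              // "up to 96", one per 15 minutes
    static final int CHANNELS_PER_INTERVAL = 4;
    static final double GB_PER_MILLION_METERS_PER_DAY = 19.4;

    // Upper bound on channel readings per day if every endpoint reports everything.
    static long intervalValuesPerDay() {
        return ENDPOINTS * INTERVALS_PER_DAY * CHANNELS_PER_INTERVAL;
    }

    // Daily volume in GB, assuming the per-million figure scales linearly.
    static double gigabytesPerDay() {
        return GB_PER_MILLION_METERS_PER_DAY * (ENDPOINTS / 1_000_000.0);
    }

    public static void main(String[] args) {
        System.out.println(intervalValuesPerDay() + " interval values per day (upper bound)");
        System.out.println(gigabytesPerDay() + " GB per day");
    }
}
```

Under these assumptions the upper bound is about 8.8 billion readings per day, in line with the slide's "approximately 8 billion", and the daily volume is about 446 GB, which matches the quoted ½ TB.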

Page 13: IOT Big Data Ingestion and Processing in Hadoop by Silver Spring Networks

High Availability & Disaster Recovery

• Clustering
• Automated Fail-overs
• Rolling upgrades

Page 14: IOT Big Data Ingestion and Processing in Hadoop by Silver Spring Networks

Tech Architecture

• HDFS
• Kafka
• DataTorrent
• Elasticsearch
• OpenTSDB & HBase
• Oozie
• Hive
• Mule
• Apigee
• Tableau

Page 15: IOT Big Data Ingestion and Processing in Hadoop by Silver Spring Networks

Why DT?

• Management UI Console
• Malhar Library + Java
• Support
• Rapid Development
• Stats, Operability, Auto-Scaling

Page 16: IOT Big Data Ingestion and Processing in Hadoop by Silver Spring Networks

Strengths

• Resilient operators (availability)
• Easily partition operators (scalability)
• Any Java programmer can build a simple app
• Facilitate management hand-off to operations
• Easy to detect failures with UI and stats

Page 17: IOT Big Data Ingestion and Processing in Hadoop by Silver Spring Networks

Our focus areas for improvement

• No “back pressure”
• If a container crashes with OOM, it is restored to the same OOM state
• No good way to stop an app and save context
• Can be difficult to navigate logs

Page 18: IOT Big Data Ingestion and Processing in Hadoop by Silver Spring Networks


Example DT App: AMM Export Ingestion

Page 19: IOT Big Data Ingestion and Processing in Hadoop by Silver Spring Networks

Example App: AMM Export Ingestion

Input Operator

• Scans last 2 days’ HDFS directories
• Emits filenames
• Too fast!

Page 20: IOT Big Data Ingestion and Processing in Hadoop by Silver Spring Networks

Example App: AMM Export Ingestion

AMM File Reader

• Parses different types
• Emits Avro tuples
• XML parsing can be slow
• File & tuple sizes vary a lot

Page 21: IOT Big Data Ingestion and Processing in Hadoop by Silver Spring Networks

Example App: AMM Export Ingestion

Enricher

• Adds metadata to every tuple
• External dependency on Elasticsearch
• Uses a thread pool since one YARN container is too big for a single client
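One way to read the thread-pool bullet is that a single blocking client would leave most of the container idle, so lookups are fanned out across several threads. The sketch below illustrates that idea under stated assumptions: `PooledEnricher` and its `metadataLookup` function are hypothetical, and the function stands in for an Elasticsearch query rather than using any real client API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Function;

// Hypothetical enricher: looks up metadata for each device ID on a thread pool.
// The lookup function is a stand-in for an external query; it is not a real client API.
final class PooledEnricher implements AutoCloseable {
    private final ExecutorService pool;
    private final Function<String, String> metadataLookup;

    PooledEnricher(int threads, Function<String, String> metadataLookup) {
        this.pool = Executors.newFixedThreadPool(threads);
        this.metadataLookup = metadataLookup;
    }

    // Enriches a batch of device IDs concurrently and returns results in input order.
    List<String> enrich(List<String> deviceIds) {
        List<Future<String>> pending = new ArrayList<>();
        for (String id : deviceIds) {
            pending.add(pool.submit(() -> id + "|" + metadataLookup.apply(id)));
        }
        List<String> enriched = new ArrayList<>();
        for (Future<String> f : pending) {
            try {
                enriched.add(f.get()); // blocks until that lookup finishes
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("interrupted while enriching", e);
            } catch (ExecutionException e) {
                throw new IllegalStateException("lookup failed", e.getCause());
            }
        }
        return enriched;
    }

    @Override
    public void close() {
        pool.shutdown();
    }
}
```

Collecting the futures in submission order keeps the output aligned with the input batch, even though the individual lookups may complete out of order.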

Page 22: IOT Big Data Ingestion and Processing in Hadoop by Silver Spring Networks

Example App: AMM Export Ingestion

Avro Converter

• Normalizes tuples across schema versions
• Outputs many tuples from one

Page 23: IOT Big Data Ingestion and Processing in Hadoop by Silver Spring Networks

Example App: AMM Export Ingestion

Enriched Persister

• Writes Avro tuples to HDFS files
• Names output files by date, input file, part, etc.
• HDFS can be slow – another external dependency
• Container death causes rewriting of tuples

Page 24: IOT Big Data Ingestion and Processing in Hadoop by Silver Spring Networks

Example App: AMM Export Ingestion

TSDB Writer

• Embedded instance of OpenTSDB
• External dependency on HBase
• Slow during metric creation and HBase region splits

Page 25: IOT Big Data Ingestion and Processing in Hadoop by Silver Spring Networks

AMM Export Ingestion

Continuing to extend the DAG with new operators

Page 26: IOT Big Data Ingestion and Processing in Hadoop by Silver Spring Networks

Scalability & Throughput

• The classic YARN application solution is to spin up more containers
• Not so simple due to external dependencies, and
• Highly variable loads
  - Tuple mix
  - Tuple size
  - Kind of tuple
• Buffering tuples in the DAG
• Static partitioning means the DAG has to be slow
• Throughput: how many tuples an operator can emit per window
• We need dynamic throughput management

Page 27: IOT Big Data Ingestion and Processing in Hadoop by Silver Spring Networks

Throughput Management

We use a Stats Listener to “auto-tune” the throughput rate

Page 28: IOT Big Data Ingestion and Processing in Hadoop by Silver Spring Networks

Throughput Management

First implementation

• Any pair of logical operators
• Adjusts upstream operator throughput every N windows
• Scales it by a factor based on downstream operator backlog threshold levels
• A lagging correction since based on operator stats from prior windows
• Observed overall processing rate across DAG oscillates
• Control theory says this is not going to work since it will never converge to a reasonable value
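A threshold-based tuner of the kind this slide describes might look roughly like the sketch below. It is a standalone illustration, not the presenters' code: the class name, the threshold levels, and the scaling factors are all invented, and nothing here uses the DataTorrent API.

```java
// Hypothetical threshold-based tuner, loosely following the first approach on the slide.
// Threshold levels and factors are invented for illustration.
final class ThresholdThrottle {
    private final int adjustEveryNWindows;
    private long windowsSeen = 0;
    private long throughput; // tuples the upstream operator may emit per window

    ThresholdThrottle(long initialThroughput, int adjustEveryNWindows) {
        this.throughput = initialThroughput;
        this.adjustEveryNWindows = adjustEveryNWindows;
    }

    // Maps a downstream backlog to a multiplicative factor.
    static double factorFor(long backlog) {
        if (backlog > 100_000) return 0.5; // heavily backed up: halve the rate
        if (backlog > 10_000) return 0.9;  // mildly backed up: ease off
        if (backlog < 1_000) return 1.5;   // nearly idle: speed up
        return 1.0;                        // in the comfort band: hold
    }

    // Called once per window with the backlog reported for an earlier window.
    // The rate only changes on every Nth call.
    long onWindow(long reportedBacklog) {
        windowsSeen++;
        if (windowsSeen % adjustEveryNWindows == 0) {
            throughput = Math.max(1L, Math.round(throughput * factorFor(reportedBacklog)));
        }
        return throughput;
    }
}
```

Because each adjustment reacts to a backlog measured in earlier windows, a scheme like this can keep cutting the rate after the backlog has already drained and keep raising it after the backlog has already built up, which is consistent with the oscillation the slide reports.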

Page 29: IOT Big Data Ingestion and Processing in Hadoop by Silver Spring Networks

Throughput Management

Second implementation

• Compute a backlog
• Try to maintain a target backlog that is a multiple of the downstream operator processing rate
• Problem: starvation
  - Stats not reported when throughput set to zero
  - Solution 1: small, positive min throughput
  - Solution 2: fractional/probabilistic emit
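The backlog-target idea and the two starvation fixes can be sketched as follows. This is a guess at the general shape, not the presenters' implementation: the class, its parameters, and the simple "target minus backlog" rule are assumptions made for illustration.

```java
import java.util.Random;

// Hypothetical backlog-target throttle. The control rule and all names are assumptions.
final class BacklogTargetThrottle {
    private final double targetMultiple; // target backlog = multiple * downstream rate
    private final double minThroughput;  // small positive floor to avoid starvation
    private final Random random;

    BacklogTargetThrottle(double targetMultiple, double minThroughput, Random random) {
        this.targetMultiple = targetMultiple;
        this.minThroughput = minThroughput;
        this.random = random;
    }

    // Fractional tuples-per-window allowance for the upstream operator:
    // enough to refill the backlog up to the target, but never below the floor.
    double allowance(long backlog, double downstreamRatePerWindow) {
        double target = targetMultiple * downstreamRatePerWindow;
        return Math.max(minThroughput, target - backlog);
    }

    // Turns a fractional allowance into a whole tuple count: emit the integer part,
    // plus one more tuple with probability equal to the fractional part.
    long tuplesToEmit(double allowance) {
        long whole = (long) Math.floor(allowance);
        double fraction = allowance - whole;
        return whole + (random.nextDouble() < fraction ? 1 : 0);
    }
}
```

With a floor of 0.25, for example, the upstream operator emits about one tuple every four windows on average instead of stopping entirely, so downstream stats keep flowing and the controller can recover once the backlog drains.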

Page 30: IOT Big Data Ingestion and Processing in Hadoop by Silver Spring Networks

Throughput Management

Successes

• Operators don’t run out of memory and crash
• Overall throughput across the DAG is much higher
• Can adapt to a wide mix of loads
• General enough that we are using it in all our apps
• We ingested 4 multi-month pilot datasets successfully
• Reduced the time it takes to ingest 1 day’s worth of data from 1½ hrs to 15 min
• Hands off, automated tuning

Page 31: IOT Big Data Ingestion and Processing in Hadoop by Silver Spring Networks

Throughput Management

Remaining problems

• Throughput management is based on tuple count and not all tuples are the same
• Garbage Collection causes uneven performance
• Slow to converge
• Hard to test and debug

Page 32: IOT Big Data Ingestion and Processing in Hadoop by Silver Spring Networks

Stopping DAGs

• Persist processed state for files & Kafka messages
  - Save Kafka offsets in ZooKeeper
  - Rename input files to .processed
• Checkpoint Listener
  - Wait to persist state until tuple fully transits DAG
  - Prevent loss of data
• However, some tuples get processed twice
• Suspend script
  - Use REST API to set a flag on Input Operator
  - Wait until no more activity
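The "wait to persist state" step can be pictured as a small tracker that holds back the rename until a window is known to have passed through the whole DAG. The sketch below is standalone and hypothetical: `ProcessedStateTracker` and its methods are invented names, the `committed` callback is only modelled on the kind of notification a checkpoint listener receives, and the HDFS rename is represented by returning the new file names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical tracker: remembers which input files finished in which window and
// releases them for renaming only once that window is known to be committed.
final class ProcessedStateTracker {
    private final TreeMap<Long, List<String>> finishedByWindow = new TreeMap<>();

    // Record that a file was fully read during the given window.
    void fileFinished(long windowId, String fileName) {
        finishedByWindow.computeIfAbsent(windowId, w -> new ArrayList<>()).add(fileName);
    }

    // Called when every window up to and including windowId has transited the DAG.
    // Returns the new names; a real operator would rename the files in HDFS here.
    List<String> committed(long windowId) {
        List<String> renamed = new ArrayList<>();
        Map<Long, List<String>> done = finishedByWindow.headMap(windowId, true);
        for (List<String> files : done.values()) {
            for (String f : files) {
                renamed.add(f + ".processed");
            }
        }
        done.clear(); // headMap is a live view, so this drops the entries from the tracker
        return renamed;
    }

    // Number of windows whose files are still waiting for a commit.
    int pendingWindows() {
        return finishedByWindow.size();
    }
}
```

Deferring the rename this way trades duplicates for safety: if the application stops after a file is read but before its window commits, the file is read again on restart, which fits the slide's note that some tuples get processed twice while no data is lost.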

Page 33: IOT Big Data Ingestion and Processing in Hadoop by Silver Spring Networks

Versions

• Hadoop 2.3.0
• DataTorrent 3.1.1