How to Setup Stream Analytics for IoT
Whitepaper
Website: www.mobodexter.com www.paasmer.co
Table of Contents
1. What is Stream Analytics
2. AWS
3. Azure
4. Google
5. Comparison
6. About
What is Stream Analytics
Stream analytics, or streaming analytics, typically means making analytically informed decisions in milliseconds while examining many thousands of events per second, generated by many millions of devices, which can also be enriched by many other disparate sources of data.
Stream analytics is important for institutions and individuals alike. We need to know what is happening now and not miss out on anything important. An event involving a particular machine at my production plant, or someone breaking into my home, matters to me now and not later, as it helps me immediately initiate remedial action based on the event.
IoT is a typical use case for stream analytics: millions of things generate many millions of events that need to be analyzed on the fly so that informed choices can be made, either automatically or through human intervention.
A streaming analytics platform has the following features:
- Data or events are analyzed in near real time. This may be routine monitoring, counting, alerting, and reporting of data.
- The filtered or enriched data may also be fed into complex decision-making systems for training and predictive analytics.
- Every incoming event is processed distinctly.
- Events may be stored for future use.
- Immediate actions are possible after an event is processed, albeit simple actions such as sending alerts, emails, streaming, and so on.
Advantages of Streaming Analytics
- The business value of data diminishes with age; with streaming analytics, immediate action based on data is possible.
- Immediate threats to life and infrastructure are drastically reduced.
- Predictive maintenance cuts future losses.
Now let's look at the available options for performing streaming analytics. We have options from all three major IoT platform providers: AWS, Azure, and Google. In this how-to blog series, I shall concentrate on how to set up stream analytics on all three.
AWS
Configure an Input Stream
Log on to the AWS Kinesis console and select a Kinesis stream or a Kinesis Firehose delivery stream as input. Amazon Kinesis Analytics ingests the data, automatically recognizes standard data formats, and suggests a schema. You can refine this schema, or, if your input data is unstructured, define a new schema using the built-in schema editor.
Application code:
Write your own SQL queries to process the streaming data using the Kinesis Analytics SQL editor and built-in templates, and test them with live streaming data.
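For illustration, the application code for a simple IoT aggregation might look like the sketch below. This is only a sketch: SOURCE_SQL_STREAM_001 is the default name Kinesis Analytics gives to the in-application input stream, while the device_id and temperature columns are assumptions made for this example and would need to match your actual schema.

    -- In-application stream that will hold the query results
    CREATE OR REPLACE STREAM "DESTINATION_SQL_STREAM" (
        device_id       VARCHAR(32),
        avg_temperature DOUBLE
    );

    -- Pump: continuously inserts the SELECT results into the output stream
    CREATE OR REPLACE PUMP "STREAM_PUMP" AS
        INSERT INTO "DESTINATION_SQL_STREAM"
        SELECT STREAM "device_id",
               AVG("temperature") AS avg_temperature
        FROM "SOURCE_SQL_STREAM_001"
        -- one-minute tumbling window per device
        GROUP BY "device_id",
                 STEP("SOURCE_SQL_STREAM_001".ROWTIME BY INTERVAL '60' SECOND);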
Configure an Output Stream
Lastly, point to the destinations where you want the results loaded. Amazon Kinesis Analytics integrates out of the box with Kinesis Streams and Kinesis Firehose, so it is easy to send processed results to Amazon S3, Amazon Redshift, Amazon Elasticsearch Service, or your own custom destination.
Azure
Create a Stream Analytics job
1) In the Azure portal, click the plus sign and then type STREAM ANALYTICS in the text window to the right. Then select Stream Analytics job in the results list.
2) Enter a unique job name and verify the subscription is the correct one for your job. Then
either create a new resource group or select an existing one on your subscription.
3) Then select a location for your job. For speed of processing and reduced data-transfer cost, selecting the same location as the resource group and the intended storage account is recommended.
Note
You should create this storage account only once per region. This storage will be shared across
all Stream Analytics jobs that are created in that region.
4) Check the box to place your job on your dashboard and then click CREATE.
5) You should see 'Deployment started...' displayed in the top right of your browser window. Soon it will change to a notification that the deployment has completed.
Create an Azure Stream Analytics query
After your job is created it's time to open it and build a query. You can easily access your job by
clicking the tile for it.
In the Job Topology pane click the QUERY box to go to the Query Editor. The QUERY editor
allows you to enter a T-SQL query that performs the transformation over the incoming event
data.
Query: Archive your raw data
The simplest form of query is a pass-through query that archives all input data to its designated
output. Download the sample data file from GitHub to a location on your computer.
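For reference, the pass-through query is essentially the sketch below. The [YourInputAlias] and [YourOutputAlias] names are placeholders used only for illustration; substitute the input and output aliases defined on your own job.

    -- Copy every incoming event, unchanged, to the designated output
    SELECT
        *
    INTO
        [YourOutputAlias]
    FROM
        [YourInputAlias]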
1) Paste the query from the PassThrough.txt file.
2) Click the three dots next to your input and select the Upload sample data from file option.
3) A pane opens on the right; in it, select the HelloWorldASA-InputStream.json data file from your download location and click OK at the bottom of the pane.
4) Then click the Test gear in the top-left area of the window to run your test query against the sample dataset. A results window will open below your query when processing is complete.
Query: Filter the data based on a condition
Let’s try to filter the results based on a condition. We would like to show results for only those
events that come from “sensorA.” The query is in the Filtering.txt file.
Note that the case-sensitive query compares a string value. Click the Test gear again to execute the query. The query should return 389 rows out of 1860 events.
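As a sketch, the query in Filtering.txt is along the following lines. The dspl column, assumed here to carry the sensor name in the HelloWorldASA sample data, and the aliases are illustrative only; adjust them to match your own input.

    -- Pass through only the events reported by sensorA
    -- (the string comparison is case-sensitive)
    SELECT
        *
    INTO
        [YourOutputAlias]
    FROM
        [YourInputAlias]
    WHERE
        dspl = 'sensorA'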
Google

Devices
Devices, or things, are physical devices that
interact with the world and collect data. In
general, they can be considered in two
groups: constrained and standard devices.
Constrained devices can be very small and
have very few resources in terms of
compute, storage, and so on. They might be
able to communicate only through networks
that are unable to reach Cloud Platform
directly, such as over Bluetooth Low Energy
(BLE). Standard devices more likely
resemble small computers. They can route
data directly over networks to Cloud
Platform. In order for the data from constrained devices to reach Cloud Platform, it needs to go through some form of gateway device.
Data import
Data import is the process of sending
information from devices to Cloud Platform
services. There are different import targets,
depending on whether that information is
data about the environment or operational
data about the device and the IoT
infrastructure.
Cloud Pub/Sub
Google Cloud Pub/Sub is a messaging
system that can act like a shock absorber,
both for incoming data streams as well as
changes to application architecture. Even
standard devices can have limited ability to
store and retry sending telemetry data.
Cloud Pub/Sub provides a globally durable
messaging service. The service scales to
handle data spikes that can occur when
swarms of devices respond to events in the
physical world, and buffers these spikes
from applications monitoring the data. By
using topics and subscriptions, you can
allow different functions of your application
to subscribe to device-related streams of
data without updating the primary data-
import target. Cloud Pub/Sub also natively
connects to other Cloud Platform services,
gluing together data import, data pipelines,
and storage systems.
Pipelines
Pipelines manage data after it arrives on
Cloud Platform, similar to how parts are
managed on a factory line. This includes
tasks such as:
- Transforming data. You can convert the data into another format, for example, converting a captured device signal voltage to a calibrated unit measure of temperature.
- Aggregating and computing data. By combining data you can add checks, such as averaging data across multiple devices to avoid acting on a single, spurious device, or to ensure you have actionable data if a single device goes offline. By adding computation to your pipeline, you can apply streaming analytics to data while it is still in the processing pipeline.
- Enriching data. You can combine the device-generated data with other metadata about the device, or with other datasets, such as weather or traffic data, for use in subsequent analysis.
- Moving data. You can store the processed data in one or more final storage locations.
Cloud Dataflow
Google Cloud Dataflow is built to perform
all of these pipeline tasks on both batch and
streaming data. With native connectors to
both Cloud Pub/Sub and a variety of
eventual storage destinations, or sinks,
Cloud Dataflow is a fully managed multitool
for data processing.
A Dataflow program expresses a data processing pipeline from start to finish. It uses the Java or Python SDK for creating, transforming, and writing pipelines; more information can be found in the Cloud Dataflow documentation. A Dataflow program typically works as follows:
1) Create a Pipeline object.
2) Use a Read or Create transform to create one or more PCollections for your pipeline data.
3) Apply transforms to each PCollection. Transforms can change, filter, group, analyze, or otherwise process the elements in a PCollection. Each transform creates a new output PCollection, to which you can apply additional transforms until processing is complete.
4) Write or otherwise output the final, transformed PCollections.
5) Run the pipeline.
Comparison
Based on ease of setup for stream analytics, AWS wins hands down: it is the easiest to set up compared to Azure or GCP. Azure offers some additional benefits, such as connectors to Data Lake Insights. GCP offers a programmatic method of setting up and is more likely to be preferred by programmers and developers.
About
MoboDexter, co-founded by ex-Intel veterans in 2013, is based out of New York, Bangalore, and Singapore. It is rapidly establishing itself as an innovative platform leader in the world of enterprise Internet of Things. In the booming and evolving Internet of Things market, MoboDexter has created a unique IoT platform to enable businesses to build their IoT products and solutions. PAASMER is a software suite that bundles all the elements needed to connect sensors, gateways, mobile applications, cloud, and analytics to develop, build, and deploy connected IoT products quickly and efficiently. PAASMER's end goal is to bring artificial intelligence to "Things" so that Things have their own intelligence to act in the best interest of the user. Hence, machine learning and deep learning are integral choices in the platform for our clients to leverage.
The unique aspects of the PAASMER platform that differentiate it from other IoT platforms in the market are:
- Best-in-Class High-Speed Edge Database
- Innovative Edge Analytics
- Modular Edge OS
- Innovative Edge & Cloud Security
- Dynamic Cloud Management
MoboDexter is advised by Gartner Inc. In a recent Gartner survey, the top four verticals seeing steep growth in IoT implementations are healthcare, connected home, M2M, and retail. These are the same four verticals that are the growth focus for PAASMER, which has signed up clients across the world in each of them. Case studies of our client implementations are available on our website.
Raconteur Online wrote: "MoboDexter's IoT Platform as a Service, named PAASMER, has been built with an inside-out approach from the gateway upwards or downwards that makes it more versatile and flexible to integrate than existing platforms."
For more information, visit www.mobodexter.com and www.paasmer.co.