
December 2016, IDC #US41841116

IDC TechScape

IDC TechScape: Internet of Things Analytics and Information Management

Maureen Fleming, Stewart Bond, Carl W. Olofson, David Schubmehl, Dan Vesset, Chandana Gopal, Carrie Solinger

IDC TECHSCAPE FIGURE

FIGURE 1

IDC TechScape: Internet of Things Analytics and Information Management — Current Adoption Patterns

Note: The IDC TechScape represents a snapshot of various technology adoption life cycles, given IDC's current market analysis. Expect these technologies, over time, to follow the adoption curves on which they are currently mapped.

Source: IDC, 2016


IN THIS STUDY

Implementing the analytics and information management (AIM) tier of an Internet of Things (IoT) initiative is about delivering and processing sensor data, deriving insights from that data, and, at the moment of insight, initiating the actions needed to respond as rapidly as possible. To achieve value, insight to action must fall within a useful time window. That means the IoT AIM tier needs to be designed for the shortest time window of the IoT workloads running through the end-to-end system. It is also critical that the correct type of analytics is used to arrive at the insight.

Over time, AIM technology adopted for IoT will diverge from an organization's existing technology investments that perform similar but less time-sensitive or less data volume–intensive functions. Enterprises will want to leverage as much of their existing AIM investments as possible, especially initially, but will want to adopt IoT-aligned technology as they operationalize and identify functionality gaps in how data is moved and managed, how analytics are applied, and how actions are defined and triggered at the moment of insight. This IDC TechScape covering IoT AIM is designed to help enterprises:

Learn more about the newer AIM technologies that support IoT

Align these technologies with the enterprise's technology risk profile to determine what is ready to adopt and what should be monitored

Gain a better understanding of where an IoT team will need to create skills and competencies as it plans to adopt newer AIM technologies

TECHNOLOGY MARKERS OF MOMENTUM

The AIM tier of IoT encompasses the following:

Model discovery, training and design, and the appropriate infrastructure for managing the data associated with these major activities

Software used in production to collect and deliver data reliably to processing targets

Integration to ensure data is in a format useful to target environments

Database options that support ancillary functions not included in most IoT platforms and that enterprises can use to build their own capabilities as needed

Analytical software

Thing registration, state, and device management

Operational intelligence (OI) and monitoring to manage the larger systems and processes of things and related assets

Low-code environments to describe the relationship of events to conditions and actions and to support IoT application development

Refer back to Figure 1, which fits the IoT AIM technologies into the appropriate curves. IoT is an emerging opportunity, and adoption of both IoT-specific and IoT-generalized AIM technologies for IoT is also early. We positioned each technology on the curves as an optimization of market adoption and technology maturity to show relative position as opposed to pure market adoption. If we looked only at market adoption, the labels would generally be too concentrated in the early sections of the curve to be legible.


Table 1 organizes AIM technologies into functional areas, the type of curve, and IDC's assessment of stage of adoption, risk level, speed of adoption, and years to market adoption maturity. IoT AIM consists of generalized AIM useful for IoT as well as IoT-specific technologies organized into the following categories:

IoT data collection

IoT data transport

IoT data event services

IoT data services

IoT value-added data services

IoT analytics

IoT conditions and actions

IoT visibility

The descriptions of each technology are listed in the same order as they are presented in Table 1.

TABLE 1

IDC TechScape Technology Markers of Momentum

| Technology | Curve Type | Stage of Adoption | Speed of Adoption | Risk Level | Market Buzz | Years to Full Adoption |
| --- | --- | --- | --- | --- | --- | --- |
| IoT platform | Incremental | Evaluate | Fast | Medium | Medium | 7 |
| IoT edge data collection | | | | | | |
| Sensor data collection | Incremental | Deploy | Fast | Medium | High | 3 |
| Historian | Incremental | Evaluate | Medium | Low | Low | 5 |
| IoT data transport | | | | | | |
| Managed data transport | Incremental | Test | Fast | Low | Low | 2 |
| Streaming data | Transformational | Test | Fast | Medium | Medium | 5 |
| Streaming integration | Transformational | Evaluate | Medium | High | Low | 8 |
| IoT data event services | | | | | | |
| Thing event store | Opportunistic | Evaluate | Medium | Medium | Medium | 5 |
| Thing registry and device management | Incremental | Deploy | Fast | Low | Medium | 3 |
| Thing state machine | Transformational | Test | Fast | Medium | Medium | 5 |
| IoT data services | | | | | | |
| Dynamic data management | Incremental | Deploy | Fast | Medium | Medium | 5 |
| Graph database | Transformational | Test | Slow | Medium | Low | 10 |
| Hadoop | Incremental | Deploy | Medium | Medium | Medium | 5 |
| In-memory data processing | Transformational | Deploy | Fast | High | High | 5 |
| In-memory relational | Incremental | Deploy | Medium | Low | Medium | 5 |
| Open data platform | Incremental | Evaluate | Medium | High | Medium | 6 |
| IoT value-added data services | | | | | | |
| Blockchain | Transformational | Evaluate | Slow | High | High | 10 |
| Data as a service | Transformational | Evaluate | Fast | Medium | Medium | 5 |
| IoT analytics | | | | | | |
| Rich media analytics | Opportunistic | Deploy | Fast | Medium | High | 10 |
| Statistical analysis | Incremental | Deploy | Fast | Medium | Low | 5 |
| Streaming analytics | Transformational | Evaluate | Medium | Medium | Medium | 5 |
| Supervised machine learning | Incremental | Evaluate | Fast | Medium | High | 10 |
| Unsupervised machine learning | Transformational | Evaluate | Medium | Medium | Medium | 15 |
| IoT conditions and actions | | | | | | |
| Low-code rules | Incremental | Deploy | Medium | Medium | Low | 7 |
| Low-code app platform | Opportunistic | Evaluate | Medium | Low | Medium | 5 |
| IoT visibility | | | | | | |
| Operational intelligence | Opportunistic | Evaluate | Medium | Medium | Low | 7 |

Source: IDC, 2016


IoT Platform

FIGURE 2

IoT Platform Markers of Momentum

Source: IDC, 2016

IoT platforms are a collection of core software components required to support IoT workloads. These include:

Registering and connecting devices to the network

Maintaining sensor state data associated with each device

Analytics

Device management

Application development

Security

Many of the IoT platforms are offered as cloud software or related sets of IoT services, while others can be deployed on-premises in a datacenter or at the edge.

Examples of products include Amazon's AWS IoT Platform, Bosch IoT Suite, Cisco Jasper, GE Digital's Predix, IBM Watson IoT Platform, Microsoft Azure IoT Suite, Oracle IoT Cloud, PTC's ThingWorx, and SAP HANA Cloud Platform IoT services.

Pros:

Is a relatively straightforward way to launch an IoT experiment or initiative

Speeds up the process of operationalizing IoT workloads

Cons:

Locks into a single vendor for core IoT workload functions

Is not comprehensive and will require interoperability with missing pieces of an end-to-end middle tier


IoT Edge Technology

Sensor Data Collection

FIGURE 3

Sensor Data Collection Markers of Momentum

Source: IDC, 2016

Sensor data collection edge technology does exactly what its name implies: it collects data from sensors. The collected data is persisted in memory or on disk until it is converted as needed, analyzed, filtered, and forwarded via data transport technology. If a historian is also in use, the data may be persisted for a longer period to facilitate transaction management and/or replay capabilities.

Sensor data collection software, whether embedded or installed in a gateway device or offered as standalone server or virtual machine software, must be able to capture data transmitted by sensors over a variety of protocols, transform it into a format that can be transmitted over the internet or back into the originating protocol, and provide reliability mechanisms to request retransmission of sensor data as well as security to prevent unauthorized access and untrusted delivery of data. It may also need to filter data to reduce outbound data volumes.
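To make those requirements concrete, here is a minimal sketch of a gateway-side collector loop: it filters small changes to reduce outbound volume, buffers readings, and discards buffered data only after successful delivery. The reading function, field names, and ingest URL are hypothetical placeholders, not features of any product named in this document.

```python
import json
import random
import time
import urllib.request

INGEST_URL = "https://central.example.com/ingest"  # hypothetical central endpoint

def read_sensor_frame():
    # Stand-in for a protocol-specific read (Modbus, OPC, BLE, etc.)
    return {"sensor_id": "temp-1", "value": 20.0 + random.random() * 5}

def changed_enough(reading, last_sent, min_delta=0.5):
    # Edge filtering: drop readings that barely changed, reducing outbound volume
    prev = last_sent.get(reading["sensor_id"])
    return prev is None or abs(reading["value"] - prev) >= min_delta

def forward(batch):
    # Deliver a batch over the internet in an IP-friendly format (JSON over HTTP)
    req = urllib.request.Request(INGEST_URL, data=json.dumps(batch).encode(),
                                 headers={"Content-Type": "application/json"})
    urllib.request.urlopen(req, timeout=10)

buffer, last_sent = [], {}
while True:
    reading = read_sensor_frame()
    if changed_enough(reading, last_sent):
        buffer.append(reading)
        last_sent[reading["sensor_id"]] = reading["value"]
    if len(buffer) >= 100:
        try:
            forward(buffer)
            buffer.clear()        # discard only after successful delivery
        except OSError:
            pass                  # keep buffering; retry on the next cycle
    time.sleep(0.1)
```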

While there are IoT cloud services that directly collect sensor data, they require that transmission use open application or messaging protocols, such as MQTT, HTTP, or AMQP. For that reason, we classify them as IoT data streaming services in the data transport section.

Depending on requirements, there is often a need to collect sensor data from a mobile edge, such as vehicles. When the edge is mobile, data may be collected using specialized embedded devices, such as National Instruments' CompactRIO (cRIO), or devices purpose built by the manufacturer. In that case, communications between the embedded device and a central aggregation source may require different networking or purpose-built communications systems. Embedded sensor data collection and specialized network communications are outside the scope of this IDC TechScape.

Examples of sensor data collectors include Intel's Wind River Intelligent Device Platform XT, MathWorks' ThingSpeak, PTC Kepware's KEPServerEX, and MuleSoft's Anypoint.

Pros:

Decouples sensors from central data processing applications

Provides a level of data persistence at the edge on which edge analytics can be performed for faster response


Improves quality of service in low-bandwidth and interrupted-bandwidth environments

Cons:

Requires compute and storage capabilities at the edge

Increases latency of data from the sensors to a central processing facility

Historian

FIGURE 4

Historian Markers of Momentum

Source: IDC, 2016

A historian maintains a local collection of sensor data and persists the information in storage for analysis and reporting, for transactional integrity, or for replay scenarios. Monitoring and reporting software accesses the data to provide situational awareness. In some industries, such as manufacturing, historians are mature and have long been used to collect and process data into a time series.

There is some debate about whether the edge-specific historian will become obsolete in favor of managing the data in the cloud. Whether historians continue as a permanent store for single-location use cases depends on where the data is least expensive or simplest to maintain, on whether the time window is narrow, and on whether network conditions are unreliable. Where speed is of benefit, historians may need to be upgraded to support low-latency use cases. That said, a historian can itself become a thing that is accessible from a central source, and data from historians can be transferred on a regular basis to a centralized data store for use in training and discovery.

Process historians are available from a range of vendors that sell manufacturing technology, including OSIsoft, Siemens, Honeywell, and GE. Cisco acquired ParStream for its analytics database, which can be deployed at edge locations. IBM offers Informix for the edge.

Pros:

Event history is persisted at the edge, so if there is an issue with data transmission, it can be recovered.

Historical data at the edge can be analyzed for historical trends in isolation.

Data can be replayed in test or simulation environments.

Cons:

Storage capacity is required at the edge.

Storage capacity can be complex to set up and manage.


IoT Data Transport Technology

Managed Data Transport

FIGURE 5

Managed Data Transport Markers of Momentum

Source: IDC, 2016

Managed data transport technology picks data up from files or databases populated by the IoT collector or historian and subsequently sends the files to the target central data processing facility.

Managed data transport technology is more likely to be used:

To support applications where batch or microbatch frequencies meet the IoT data latency requirements of the solution

As a rudimentary bridge between the collector and streaming data technologies, where incompatibility is an issue or where decoupling of the two components is desired

To periodically send data from the historian to the data stores used for the discovery and training required for analytics

We use the term managed data transport because there are underlying choices about which technology to use. It is common to use managed file transfer (MFT) software, and it is also reasonable to use extract, transform, and load (ETL) technology. A file sync and share service can also be used in some applications.
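As a sketch of the microbatch pattern (not of any specific MFT or ETL product), the following loop periodically ships completed spool files from an edge directory to a transfer target and keeps a local archive copy for replay. The paths and the five-minute interval are assumptions for illustration.

```python
import shutil
import time
from pathlib import Path

SPOOL = Path("/var/spool/iot-edge")     # files written by the collector or historian
OUTBOX = Path("/mnt/central-ingest")    # hypothetical mounted or replicated target
ARCHIVE = Path("/var/spool/iot-archive")

def ship_batch():
    # A real MFT product layers checkpointing, encryption, compression,
    # and retry policies on top of this basic move
    for f in sorted(SPOOL.glob("*.csv")):
        shutil.copy2(f, OUTBOX / f.name)   # the transport step
        f.rename(ARCHIVE / f.name)         # retain locally for replay

while True:
    ship_batch()
    time.sleep(300)  # microbatch interval: every five minutes
```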

Examples of software vendors and products in this category include but are not limited to Attunity's MFT and Replicate; Axway; IBM's Sterling MFT, Aspera, and DataStage; Informatica PowerCenter; Box; Dropbox; and Egnyte.

Pros:

Managed data transport technology is relatively mature, and many organizations with IoT projects are likely to already have ETL, MFT, or file sync and share in their portfolio. The remaining work is implementing agents at the stationary or mobile edge to handle secure transport.

Managed data transport technology can also facilitate data transport from the edge to target processing facilities if data streaming technology is not available or feasible.

Managed data transport technology can be used to decouple the collector from transport, offering a higher quality of service in situations where network connectivity is low or unstable.


Cons:

Batch or microbatch will increase data latency between the edge and target processing facilities.

Central processing facilities may need to accommodate spikes of activity with each batch, depending on data volume.

Decoupling has its benefits, but it implicitly adds another component to the solution that will need to be monitored, managed, and maintained.

Depending on the software used for the implementation, it may have a heavier footprint, implying the need for sufficient processing and persistence capacity at the edge.

Streaming Data

FIGURE 6

Streaming Data Markers of Momentum

Source: IDC, 2016

Streaming data is the transport that facilitates the flow of data from a source to a target or, in some cases, multiple targets. Streaming data software transports data that is generated continuously and transmitted simultaneously in small sizes (on the order of kilobytes). Transmission is handled by messaging technology, by specialized agents that forward data, by APIs that continuously post data for delivery to a target, and, in some cases, by application-level coordination of communication using lower-level protocols, such as HTTP or MQTT. Some solutions handle streaming sensor events directly from the sensor client through a gateway to the targets and from a central source through a forwarder to the sensor client. Other solutions pick up sensor events from the collector, which has already converted the protocol to an IP-compatible format. The messaging that transports data streams may also serve as the queuing mechanism at the target, receiving and queuing data from multiple data streams.
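A minimal sketch of the MQTT pattern described above, using the paho-mqtt client (1.x-style API); the broker address, topic, and simulated reading are assumptions for illustration.

```python
import json
import random
import time

import paho.mqtt.client as mqtt  # pip install paho-mqtt (1.x-style API assumed)

BROKER = "broker.example.com"    # hypothetical broker reachable from the gateway
TOPIC = "plant1/machine42/vibration"

client = mqtt.Client(client_id="edge-gateway-42")
client.connect(BROKER, 1883)
client.loop_start()              # background thread handles acks and reconnects

while True:
    event = {"ts": time.time(), "mm_per_s": 2.0 + random.random()}  # simulated reading
    client.publish(TOPIC, json.dumps(event), qos=1)  # QoS 1: at-least-once delivery
    time.sleep(1)
```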

Many organizations have already adopted streaming data for IoT. We list it as transformational because it is a core component of an event-driven architecture, which in its entirety is considered transformational.


Examples of technology in this category include but are not limited to messaging software such as Apache ActiveMQ, Apache Kafka, MQTT-S, RabbitMQ, and ZeroMQ. Software for posting sensor events via REST APIs includes Google's Apigee Link and Red Hat's 3scale. IBM offers Bluemix Message Hub to connect its IoT platform to IBM's Hadoop Bluemix service.

Pros:

Message queuing technologies offer higher quality of service levels over base transport protocols such as HTTP and MQTT.

Message queuing technologies are not new and as such have a lower level of risk associated with them.

Cons:

HTTP and MQTT methods can result in tightly coupled systems, requiring the source to maintain history in the event of data transmission issues. Sending applications will need to manage potential breaks in network connectivity.

Message queuing services add another layer of complexity to the end-to-end solution, bringing additional requirements for monitoring, management, and maintenance.

Message queuing services may add latency to the data transmission.

Streaming Integration

FIGURE 7

Streaming Integration Markers of Momentum

Source: IDC, 2016

Streaming integration technologies provide intermediary functionality between the edge and central processing facilities. Intermediary functionality may be required to perform protocol conversion, data normalization, and/or filtering. Streaming integration technology sits between the collector and the data stream, such as an API gateway into the data stream or a change-data-capture component listening to a collector's local database. It could also be a component that intercepts messages from the stream or the target message queue, processes the data, and forwards or puts the transformed data back into the stream.
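A sketch of that intermediary role, assuming hypothetical inbound and outbound stream handles: collector-specific payloads are normalized into a canonical format, and low-quality readings are filtered before re-entering the stream. The field names and quality flag are illustrative assumptions.

```python
import json
from typing import Optional

def normalize(raw: bytes) -> Optional[dict]:
    # Convert a collector-specific payload into the canonical event format
    # the central stream expects; return None to filter the event out
    event = json.loads(raw)
    if event.get("quality") == "BAD":            # drop unusable readings
        return None
    return {
        "thing_id": event["dev"],                # rename collector fields
        "ts": event["t"],
        "temp_c": (event["temp_f"] - 32) / 1.8,  # unit normalization
    }

def run(source, sink):
    # source and sink stand in for the inbound and outbound streams
    # (e.g., a message queue consumer and producer)
    for raw in source:
        event = normalize(raw)
        if event is not None:
            sink.send(json.dumps(event).encode())
```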


Examples of software vendors and products in this category include but are not limited to Apache NiFi, Hortonworks DataFlow, Informatica PowerCenter Real-Time Edition, Oracle GoldenGate, StreamAnalytix, StreamSets Data Collector, and Striim.

Pros:

Streaming integration is useful if transport and/or data protocol conversion is required between the edge and the stream.

Streaming integration can also be useful to filter, normalize, and reduce the volume of data, relieving pressure on stream bandwidth and central processing capacities.

Much of the functionality is borrowed from existing segments of the data and application integration software markets, so there is low risk associated with the technology.

Cons:

Streaming integration adds more components to the end-to-end solution, resulting in more points of failure that need to be monitored, managed, and maintained, and it can increase latency in the data transmission process.

IoT Data Event Services

Thing Event Store

FIGURE 8

Thing Event Store Markers of Momentum

Source: IDC, 2016

Event stores capture and organize sensor data, adding to the store as new sensor data is delivered. A key attribute is the creation timestamp of the sensor event. Event stores are also created when streaming analytics is deployed. The event store can be queried by end users, applications, and time series–based analytical software. Event stores can also be used to stream events back (replay) for testing and auditing purposes. Event stores are offered by some vendors as part of their IoT portfolio and can also be implemented using an in-memory time series database, a data grid, or a general-purpose database that supports time series.
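As a sketch of the core event store operations (append by creation timestamp, query a time window, replay in order), here is a minimal version built on SQLite from the Python standard library; production event stores add distribution, retention policies, and time series analytical functions.

```python
import sqlite3
import time

conn = sqlite3.connect("events.db")
conn.execute("""CREATE TABLE IF NOT EXISTS thing_events (
    thing_id TEXT, metric TEXT, ts REAL, value REAL)""")
# A time-ordered index makes window queries and replay efficient
conn.execute("CREATE INDEX IF NOT EXISTS idx_ts ON thing_events (thing_id, metric, ts)")

def append(thing_id, metric, value, ts=None):
    conn.execute("INSERT INTO thing_events VALUES (?, ?, ?, ?)",
                 (thing_id, metric, ts or time.time(), value))
    conn.commit()

def window(thing_id, metric, seconds):
    # Return the last N seconds of events, oldest first, e.g., for time
    # series analytics or for replay into a test environment
    cutoff = time.time() - seconds
    return conn.execute(
        "SELECT ts, value FROM thing_events "
        "WHERE thing_id=? AND metric=? AND ts>=? ORDER BY ts",
        (thing_id, metric, cutoff)).fetchall()

append("pump-0042", "vibration_mm_s", 2.7)
print(window("pump-0042", "vibration_mm_s", seconds=3600))
```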


Examples of IoT-optimized products include GE Digital's Predix Time Series, InfluxData, and Basho's Riak TS. General-purpose databases that support time series, usually with a special index and logic that can perform time series analytical functions, include the DataStax version of Cassandra (DSE), Clusterpoint, and SAP HANA.

Pros:

Is highly efficient for low-latency systems

Is part of an IoT event-driven architecture that simplifies access to the time series of sensor data

Can replace or serve as the data management aspect of a historian to provide similar benefits

Cons:

Not broadly deployed

Increases complexity

Thing Registry and Device Management

FIGURE 9

Thing Registry and Device Management Markers of Momentum

Source: IDC, 2016

A thing registry is a database of IoT things, the devices that are part of the IoT network. Each registered thing has an ID, a name, and properties or attributes that are used to connect to, collect information from, and manage devices. Information about hardware and firmware version levels, install date, maintenance dates, and other static information about each thing is typically collected in the registry. The location of a thing may be static if the thing is fixed, but mobile devices would more likely carry location as part of the thing state model.
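A hypothetical registry record, showing the kind of static identity and lifecycle attributes that belong in the registry, as opposed to the fast-changing values that belong in the thing state model:

```python
# Illustrative only: field names are assumptions, not any vendor's schema
thing_record = {
    "id": "pump-0042",
    "name": "Coolant pump 42, building A",
    "hardware_rev": "B1",
    "firmware": "2.4.1",
    "installed": "2016-03-14",
    "last_maintenance": "2016-09-02",
    "protocol": "MQTT",
    "attributes": {"manufacturer": "Acme", "model": "CP-200"},
}
```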

The thing registry is core to an IoT program, enabling connectivity, support of applications, and device monitoring and management. Some organizations also build their own registry using a database; a graph database, for example, is useful for registering a thing and its relationships.

Device management supports bulk operations related to devices, provides diagnostic information, and handles device actions, such as delivering and installing updates. While device management is not technically part of event data services, it is paired with the thing registry and thing state machine, so it made sense to keep these together.


Examples of products include Amazon's AWS Thing Registry, Bosch's IoT Things and IoT Remote Manager, GE Digital's Predix Edge Manager, IBM Watson IoT Platform Foundation Device Management, Microsoft Azure's IoT Hub Device Identity Registry, and PTC's ThingWorx Foundation.

Pros:

A registry provides a central repository of things connected to the network.

A registry can be used for analytics of lifetime, runtimes, service history, and inventories.

A registry can be used to identify location for maintenance, repair, and operations (MRO).

Cons:

The registry will need to be maintained, and unless the things themselves provide the data for the attributes in the registry, manual maintenance could become overwhelming.

Thing State Machine

FIGURE 10

Thing State Machine Markers of Momentum

Source: IDC, 2016

A thing state machine maintains the current status of a thing's sensors. While the thing registry maintains static information about a thing, the state model maintains current status information. Depending on the complexity, a thing state model may also consist of a series of state models. Depending on product capabilities, state machines can consist of direct sensor readings as well as calculated (derived) state. Derived state may also use analytics to arrive at the state, for example, scoring the status of a derived property in the state model. Using an event-driven architecture built around publish-and-subscribe provides a way for multiple thing state models to subscribe to the same sensor data event, depending on the use case. State models may also be propagated from edge to cloud and across clouds. Ultimately, state models provide the thing state data required for custom and packaged IoT-related applications.

Not all IoT platforms have a state model construct; some store state data in a time series database or a relational database instead. Depending on the complexity of the use case, enterprises may choose to build their own state models using NoSQL database technology.
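A minimal sketch of a thing state machine: the latest raw reading is held per sensor, a derived status is computed from the readings, and subscribers are notified only when the derived state changes. The temperature threshold and field names are illustrative assumptions.

```python
class ThingState:
    """Holds the latest raw sensor readings for one thing, computes a
    derived status, and notifies subscribers on state change."""

    def __init__(self, thing_id, temp_limit_c=80.0):
        self.thing_id = thing_id
        self.temp_limit_c = temp_limit_c  # assumed threshold for derived state
        self.readings = {}                # latest value per sensor
        self.status = "unknown"           # derived state
        self.subscribers = []

    def on_event(self, sensor, value):
        self.readings[sensor] = value
        derived = ("overheating"
                   if self.readings.get("temp_c", 0.0) > self.temp_limit_c
                   else "normal")
        if derived != self.status:
            self.status = derived
            for notify in self.subscribers:  # publish-and-subscribe fan-out
                notify(self.thing_id, derived)

state = ThingState("pump-0042")
state.subscribers.append(lambda tid, s: print(tid, "->", s))
state.on_event("temp_c", 85.2)  # prints: pump-0042 -> overheating
```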


Examples of products include Amazon's Device Shadows for AWS IoT, PTC's ThingWorx Thing Model, and Salesforce's Thunder and IoT Cloud.

Pros:

The thing state model is an important asset in an event-driven architecture and for low-code environments, particularly for application development and where nontechnical subject matter experts (SMEs) are developing condition detection and response logic.

The thing state model makes it easier to distribute sensor data to all systems that need it, particularly in decentralized systems where the design has multiple tiers managed by different vendors or products, such as an edge tier, a middle tier for machine-specific use cases, or an interaction tier for customer experience–centric use cases, and where there is an advantage in splitting up the design based on the assets required in each tier.

Cons:

Not all IoT platforms have this capability and may require internal skills to develop and manage on an ongoing basis.

Not all organizations working on IoT projects are structuring around events, and some may be more comfortable using more familiar databases.

IoT Data Services

Dynamic Data Management

FIGURE 11

Dynamic Data Management Markers of Momentum

Source: IDC, 2016

A dynamic data management system can accept data without requiring that the structure and elements of the data be defined in advance. These systems include scalable data collection managers (the most common being Hadoop) and dynamic DBMSs. Because they do not require the use of SQL, dynamic DBMSs are sometimes called NoSQL database systems. There are two categories of dynamic DBMS:

Semischematic, where the data may be governed by a schema, but one is not required. (If no schema is present, any data conforming to the general data format of the DBMS may be entered into the database. If a schema is present, it governs the data and optimizes database operation on that basis.)

Nonschematic, where no schema is required, and any data conforming to the general format of the database may be added


The resulting collection of data may end up being rationalized under a schematic structure (in the case of semischematic systems), mapped on the basis of field names and values, or simply accessed by means of key-value pairs. Types of dynamic data management systems include:

Document-oriented database systems: These manage data blocks containing fields that are identified according to a generally accepted document format. The two most common such formats are Extensible Markup Language (XML) and JavaScript Object Notation (JSON). Examples of products include Amazon DynamoDB, Couchbase, IBM Cloudant, and MongoDB.

Key accessible database systems: These are nonschematic and store data in a way that supports random retrieval by key value or retrieval in key-value order. They are not true database management systems because they merely facilitate the storage and retrieval of data according to certain optimized techniques but do not actually manage the database per se; the applications do that. Examples of products include Amazon SimpleDB, Apache HBase, Basho's Riak, and Oracle NoSQL Database.

This category also includes graph databases and Hadoop, which are covered separately.
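For example, a document-oriented (semischematic) store accepts documents with differing fields and no up-front schema. A minimal sketch using MongoDB's Python driver, assuming a local MongoDB instance; the database, collection, and field names are illustrative:

```python
from pymongo import MongoClient  # pip install pymongo; assumes local MongoDB

db = MongoClient("mongodb://localhost:27017")["iot"]

# No schema is declared up front: each document may carry different fields
db.readings.insert_one({"thing_id": "pump-0042", "temp_c": 71.3, "ts": 1480500000})
db.readings.insert_one({"thing_id": "meter-7", "kwh": 12.4, "phase": "B", "ts": 1480500060})

# Fields can still be indexed and queried by name
db.readings.create_index("thing_id")
for doc in db.readings.find({"temp_c": {"$gt": 70}}):
    print(doc)
```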

Pros:

Faster, more flexible way to manage data, particularly data structures that change rapidly or do not lend themselves to an RDBMS

Low-latency response times

High scalability

Cons:

This technology can't be used for applications that query using SQL.

There are skills gaps compared with SQL-based systems.

Graph Database

FIGURE 12

Graph Database Markers of Momentum

Source: IDC, 2016

Graph DBMS software manages data as graph structures. These contain objects, sometimes called "nodes" or "vertices," with recursive attributed relationships, sometimes called "edges." The attributes of the objects and relationships are called "properties." Unlike a fully schematic database, the structure of a graph database is derived from the relationship structure found in the instance data.


Graph databases are used to capture and analyze extremely complex relationship instance structures. For example, a thing registry could logically be built in a graph database to make it easier to show relationships between things and networks of things as well as data flows. Graph databases are also used to support some types of machine learning.

Graphs are especially useful for discovering previously unknown or little understood relationships. These relationships can include those arising from behavioral patterns or coincident patterns of change. With respect to connected devices, examples include tracking customers through shopping areas using their cell phone location data and correlating this tracking data with that of others to find useful patterns.

Another example comes from the automotive industry, where new cars are heavily instrumented, regularly transmitting data about the condition of the engine and wear on various parts of the vehicle. Combine the geolocation data from vehicles with coincident data about weather and traffic conditions, and it becomes possible to find patterns of relationships between engine and drivetrain wear, fuel consumption, and various combinations of weather (hot versus cold and dry versus wet) and traffic (heavy versus light). These patterns, in turn, may be analyzed to a level of detail that can better inform maintenance service intervals for specific locales and even future design changes.
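A sketch of the thing-registry-in-a-graph idea using the Neo4j Python driver and Cypher; the connection details and the data model (vehicles, engines, sensors) are illustrative assumptions, not a recommended schema.

```python
from neo4j import GraphDatabase  # pip install neo4j; assumes a local Neo4j server

driver = GraphDatabase.driver("bolt://localhost:7687", auth=("neo4j", "password"))

with driver.session() as session:
    # Relationships are first class: a sensor monitors an engine in a vehicle
    session.run(
        "MERGE (v:Vehicle {vin: $vin}) "
        "MERGE (e:Engine {serial: $serial})-[:PART_OF]->(v) "
        "MERGE (:Sensor {id: $sensor})-[:MONITORS]->(e)",
        vin="1HGCM82633A004352", serial="EN-9912", sensor="vib-17")

    # Traversal: every sensor monitoring any component of this vehicle,
    # however deep the PART_OF hierarchy goes
    result = session.run(
        "MATCH (s:Sensor)-[:MONITORS]->()-[:PART_OF*0..]->(v:Vehicle {vin: $vin}) "
        "RETURN s.id", vin="1HGCM82633A004352")
    print([record["s.id"] for record in result])
```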

Examples of products include Neo Technology's Neo4j, IBM's Bluemix Graph, Blazegraph, Ontotext GraphDB, OrientDB, Objectivity's ThingSpan (formerly InfiniteGraph), and DataStax's DSE Graph.

Pros:

Unlike other NoSQL DBMSs, a graph DBMS is driven by instance relationships and so makes analysis of patterns and combinations of relationships relatively easy and fast. Unlike an RDBMS, which requires data to conform to a fixed relationship structure, a graph database reveals the relationships inherent in the data, with very little preparation ahead of the data load.

Because actions and consequences in a complex system generally result in changes to data relationship patterns, graph databases can help drive machine learning and other AI-related operations.

Cons:

Because graph databases can make no assumptions about relationships and patterns of relationships in the data, preloading query optimization is not possible. This is different from an RDBMS, where the relationship structures are fixed in the schema, so query plans are typically optimized. This means that graph databases must be focused on situations where relationship pattern discovery is primary; a graph database is not a substitute for an RDBMS. Because of the overhead involved in relationship management, it is also not a substitute for the relatively simple object-by-object processing of a document-oriented database system (e.g., JSON or XML).

Not all graph databases are good at all graph workloads. Some graph databases do text graphing well but fall down with large-volume object graphs. Some graph databases are better for relationship traversal (such as finding all objects with at least a fifth-degree relationship to a given object), while others are good at statistical patterns based on large numbers of related objects.


This area is still evolving. There is no one standard graph query language (such as SQL for relational databases), although TinkerPop is emerging as a framework, with Gremlin as its language. Neo4j offers a language called Cypher. SPARQL is sometimes used for graphs that represent semantic information structures. GraphQL is a graph data access method that uses a RESTful API, though its name would suggest a query language. There are various efforts under way to develop a common query language.

Hadoop

FIGURE 13

Hadoop Markers of Momentum

Source: IDC, 2016

Apache Hadoop is a cluster-based platform for ingesting and processing large volumes of data using a massively parallel processing (MPP) approach. It exists as a group of closely related Apache open source projects that provide software to manage the cluster, handle the consolidation of result data across the cluster, and perform various administrative functions. Closely related to Apache Hadoop are HDFS, which acts as a cluster-based file system, and HBase, which runs on top of HDFS and acts as a key-value store (a simple NoSQL database that randomly stores and retrieves blocks of data based on unique key-value pairs). Also commonly used in this context is Apache Hive, a facility for defining the data in HBase for retrieval using standard SQL.

The normal mode of processing data, especially new data, in Hadoop is a programming technique called MapReduce. For IoT and machine learning use cases, MapReduce has fallen out of favor as more users turn to the high-speed in-memory processing of Spark, either coding natively or in conjunction with a query processing layer such as Spark SQL. Apache Spark is described in the data services section under in-memory data processing.

Hadoop is commonly used in the following ways:

As an initial ingest engine, accepting data as well as ordering, filtering, and formatting it and then delivering a subset for further processing either in HDFS or on another platform

For the one-time or limited frequency analysis of very large amounts of data

For the long-term storage of data that ought to be retained but is accessed only occasionally

As a clearinghouse or transformation platform as data is moved from system to system, sometimes as a substitute or replacement for an extract, transform, and load facility

As a combination of the aforementioned bullets, commonly called a "data lake"


Apache Hadoop may be downloaded and used directly from the Apache website, but this requires considerable technical expertise and a willingness on the part of the enterprise to act as its own software tech support organization. Most enterprises choose instead to use a commercial packaged distribution of Hadoop, which comes with advanced management tools, professional support, and regular software updates ready to install.

Examples of commercial Hadoop distributions include Cloudera Enterprise, Hortonworks Data Platform (HDP), and MapR Converged Data Platform (which includes an indexed file system called MapR-FS and its companion NoSQL DBMS MapR-DB as substitutes for HDFS and HBase). In addition, IBM bundles Hadoop into IBM BigInsights, Oracle bundles it into Oracle Big Data Appliance (OBDA), and Microsoft similarly offers HDInsight. Amazon offers an AWS-optimized variant called Elastic MapReduce (EMR).

Pros:

Is ultimately flexible and scalable; can accept any data of any size because the processing details depend on code

Is cost effective as a storage platform for huge amounts of searchable data, which is particularly useful for long-term storage of IoT sensor event data

Supports IoT discovery and training, which is critical to the ultimate success of IoT projects but is not part of an IoT platform

Cons:

Hadoop applications must be coded. There is no schema and no optimizer. The user is responsible for the maintenance of the system and must do work that DBMSs normally do, such as data structure management and access optimization.

Hadoop is a batch-oriented system, so real-time processing of streaming data is not possible. Where streaming data is involved, Hadoop usually acts as a companion to a stream data processing engine, serving as a back-end storage facility for later processing of historical data.

Hadoop in its native form is not suitable for random data update and so should not be considered for transaction processing.

In-Memory Data Processing

FIGURE 14

In-Memory Data Processing Markers of Momentum

Source: IDC, 2016


In-memory data processing platforms enable large-scale data-centric operations to be carried out entirely in memory, without reference to storage. This sometimes takes the form of loading the data from some source (such as a database) into memory and maintaining it there for analytic query processing. It can also take place by managing the data in memory as a database, using snapshots and logs or replication to prevent data loss in case of system failure.

The most common in-memory data processing platform of the former type is Apache Spark. Spark runs on a cluster, holds data in memory, and performs MPP-based queries on the data. It is optimized for speed. Spark is most commonly deployed on a Hadoop cluster, using the HDFS (or HBase) layer for its storage, but it also runs on top of the wide column database Apache Cassandra and can even run on its own clusters. This last configuration is becoming more and more common on AWS, where Spark uses the S3 layer for its storage.

Spark is popular for data operations on large data collections where an outcome is expected immediately or nearly immediately, or to speed up time-consuming analytics training. This contrasts with Hadoop MapReduce, which is not typically used for interactive query because of the batch nature of its processing. Spark is also used to collect streaming data, making it available for nearly immediate use.
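A minimal PySpark sketch of that usage pattern: load historical sensor events once, cache them in memory, and run interactive aggregations against the cached data. The input path, field names, and threshold are assumptions for illustration.

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("sensor-rollup").getOrCreate()

# Load a large collection of historical sensor events and pin it in memory...
events = spark.read.json("hdfs:///iot/events/2016/*.json").cache()

# ...then run interactive, cluster-parallel aggregations against it.
# "ts" is assumed to be epoch seconds; casting to timestamp lets us bucket by hour.
hourly = (events
          .withColumn("hour", F.date_trunc("hour", F.col("ts").cast("timestamp")))
          .groupBy("thing_id", "hour")
          .agg(F.avg("temp_c").alias("avg_temp"),
               F.max("temp_c").alias("peak_temp")))

hourly.filter(F.col("peak_temp") > 90).show()
```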

Examples of in-memory data processing include Apache Spark, Apache Flink, Apache Ignite, Databricks, and GridGain. In addition to the Hadoop distributors listed in the Hadoop section, there are many commercial Spark distributions. Databricks is a pure-play Spark distributor.

Pros:

In-memory data processing is much faster than Hadoop MapReduce and is assuming increasing amounts of the latter's workloads.

Spark has a range of other projects and a growing ecosystem around it designed to add value and functionality to the basic platform. These include MLlib for machine learning, GraphX for graph support, Spark Streaming for streaming data ingestion, and Spark SQL. There are also examples of using Spark in combination with GPUs to speed up model training, particularly for highly complex use cases.

Cons:

Like Hadoop, Spark and similar products require a lot of hand coding to make solutions work.

This category is still evolving. Spark, in particular, is evolving rapidly, and new versions are not always fully compatible with previous versions, which means that some adaptation of applications to successive versions of Spark may be necessary.


In-Memory Relational

FIGURE 15

In-Memory Relational Markers of Momentum

Source: IDC, 2016

In-memory relational technology is found in memory-optimized RDBMSs (i.e., RDBMSs optimized for managing data in memory as opposed to in storage). Some of these databases are designed mainly for transaction processing, some mainly for analytical processing, and some do both. Typically, the analytic RDBMSs in this category are columnar, and most use a compression technique that not only saves memory but also organizes the data optimally for query processing by enabling the entire microprocessor data cache to be used with data test operators (e.g., equals, not equals, greater than, and less than). This makes single instruction, multiple data (SIMD) operations possible, greatly increasing processing speed. RDBMSs that mainly process transactions typically hold the data in rows. Those that handle mixed workloads may hold some data in rows and some in columns or, in some cases, use other formats designed to minimize instructions and memory access.

Some of these in-memory relational databases can accept streaming data at speed, allowing queries that include current and previously collected data to execute on a very timely basis. Other databases are simply designed to process transactions very quickly or support complex queries very quickly. All of these RDBMSs use various techniques, including persistent transaction logging and snapshotting, to ensure recoverability, so data loss is no more a concern with them than with storage-based RDBMSs.

Examples of in-memory RDBMSs include Altibase, deepSQL, MemSQL, Oracle TimesTen, SAP HANA, and VoltDB.

Pros:

SQL is the most commonly understood query language in the IT world, and these products are optimized for it.

In-memory relational technology delivers speed with structure in a familiar format.

Cons:

Requires the data to conform to the schema of the database, so it is really only usable where the data is well understood and its format does not change much

Requires systems with large amounts of memory, which could be a cost concern


Open Data Platform

FIGURE 16

Open Data Platform Markers of Momentum

Source: IDC, 2016

In IoT, the open data platform is a combination of technologies, integrated together, that enables the management of and access to relevant IoT and enterprise data regardless of where it sits and what its format is. The components of such a platform usually include a data integration engine capable of dynamic (rather than batch) data integration, composite data frameworks for federation and virtualization, data transports, connectors to established databases, stream processing for incoming streaming data, and a central processing engine, which increasingly is housed within Hadoop.

Examples of the use of such a platform in an IoT context include an automotive service center comparing readings from sensors in your vehicle with data on recent problems and service to determine whether a service call is in order; a bank with a smart app on your phone comparing movements and purchases with your general purchase pattern to determine whether there is a risk that your phone was stolen; and a utility company comparing patterns of electricity usage from smart meter data with historical patterns to determine whether changes in the distribution of power on the grid are warranted.

The term open data platform is synonymous with unified data platform. In some cases, streaming data and streaming integration software are used in conjunction with broad database and data management capabilities to offer a comprehensive open data platform. Enterprises may also choose to build their own from the four core major components.

Examples of products that deliver such functionality include Teradata with its Unified Data Architecture (UDA), Informatica with the Informatica Platform, PluralSoft with a unified data architecture focused on healthcare, and IBM with Watson Data Platform.

Pros:

Ensures data consistency and provides access to data and the ability to find what you need when you need it

Offers an opportunity to provide business context for IoT data through integration of in-motion and at-rest data


Cons:

Open data platforms are early in their development and deployment. Many offer a set of capabilities that need to be assembled for technical and business use cases, and assembly may not be trivial. As these platforms become more widely used, standard architectures and best practices will emerge, but for now, this represents a high-risk component.

Nontrivial assemblies lead to complex monitoring, management, and maintenance.

IoT Value-Added Data Services

Blockchain

FIGURE 17

Blockchain Markers of Momentum

Source: IDC, 2016

Blockchain provides a decentralized chain of trust for transactions against an object. Blockchain originates from bitcoin, and many of the first applications of blockchain technologies are focused on financial services: payments, equities, and money transfers. However, blockchain can be applied beyond financial transaction use cases to provide a chain of trust for any type of transaction against any type of object, real or virtual. The value of the blockchain is that it can be trusted and that it is distributed, not centralized, providing full provenance of the data on the chain.

Blockchain in IoT can be used to validate that data being received from a thing is actually from that thing and not an imposter. Likewise, instructions from a source to update a thing can also use blockchain for validation. Blockchain can also be used to represent the most recent state of a thing, potentially as an alternative to the thing registry and state model, because the blockchain keeps an immutable record of the history of the thing and could represent the current state. Every trusted application that needs access to data about the thing would have a local copy of the thing's chain, and when new blocks are added, the distributed chain is also updated. However, these are still speculations on how the technology could be applied. There is a lot of work and innovation yet to happen before the most appropriate use cases of blockchain in IoT emerge.
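To illustrate the immutable-history idea in miniature (with distribution and consensus omitted), here is a toy hash chain over a thing's history: each record embeds the hash of the previous record, so tampering with any earlier record invalidates every later hash. This is a teaching sketch, not any vendor's implementation.

```python
import hashlib
import json

def add_block(chain, payload):
    # Each block commits to the previous block's hash, forming the chain
    prev_hash = chain[-1]["hash"] if chain else "0" * 64
    body = {"prev": prev_hash, "payload": payload}
    body["hash"] = hashlib.sha256(json.dumps(body, sort_keys=True).encode()).hexdigest()
    chain.append(body)

def verify(chain):
    # Recompute every hash; any modified field breaks the chain from that point on
    for i, block in enumerate(chain):
        expect_prev = chain[i - 1]["hash"] if i else "0" * 64
        body = {"prev": block["prev"], "payload": block["payload"]}
        digest = hashlib.sha256(json.dumps(body, sort_keys=True).encode()).hexdigest()
        if block["prev"] != expect_prev or block["hash"] != digest:
            return False
    return True

history = []
add_block(history, {"thing": "pump-0042", "event": "firmware", "version": "2.4.1"})
add_block(history, {"thing": "pump-0042", "event": "state", "status": "normal"})
print(verify(history))  # True; flipping any field above makes this False
```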

Blockchain in IoT is still very much in its infancy, although some vendors are releasing technology building blocks, including the IBM Watson IoT Platform and Chronicled, which has launched an Ethereum-based IoT registry. Slock.it is a start-up at the intersection of blockchain and IoT applications in the sharing economy.

It is also not clear whether the blockchain technology used in bitcoin will be exactly the same as that used in IoT use cases or whether the term will come to represent, more broadly, any ultra-secure method for guaranteeing decentralized data integrity. For example, Ericsson's data integrity service is based on a keyless signature infrastructure (KSI) that offers similar guarantees and has similar constructs. GE and Ericsson offer this in the GE Predix catalog as a blockchain-enabled service, which digitally signs and verifies data to assure that configurations, firmware, and data have not been compromised, in addition to providing a blockchain-like chain of custody. This does not appear to be based on a bitcoin-style proof of work.

Pros:

Data in a blockchain is tamper proof and does not exist in a single location, so it cannot be maliciously modified.

There is no single thread of communication that can be intercepted, preventing man-in-the-middle attacks from occurring.

Blockchain technology may enable autonomous functioning of smart things without the need for a centralized authority.

Every participant in the chain has the most recent version of the truth and state of each thing.

Cons:

Latency of transaction validation in a blockchain network is still very high because every actor needs to agree that the transaction is valid.

Integration of blockchain technologies with legacy systems in existing organizations may be daunting.

There is still a lot of research and development happening with blockchain, making this a high-risk technology at this point.

Data as a Service

FIGURE 18

Data-as-a-Service Markers of Momentum

Source: IDC, 2016

Data as a service (DaaS) represents the data and/or content that is produced or derived as a by-product of the usual economic activity in the commercial and public sectors. These data assets may take the form of raw data or of various value-added content such as lists, data feeds, scores, algorithms, recommendations, or benchmarks. DaaS is consumed to improve various types of analytics, with the ultimate goal of improving the quality of decisions. DaaS also represents the opportunity for organizations to sell their own data to third parties, either as raw data or as a component that enhances existing products and services.


For example, inclusion of weather or location data (two ubiquitous DaaS options) can enhance predictive asset maintenance or logistics optimization processes. Organizations providing DaaS include commercial enterprises and government agencies that generate the original raw data as well as companies that locate, extract, mine, aggregate, enrich, and/or curate data for resale. There is a broad range of data providers, brokers, and marketplaces.

In IoT, there are a handful of general-purpose DaaS offerings, such as weather and location data, but there are also many specialized, industry-specific, and business process–specific data services. Examples of data services include GE SmartSignal, Michelin solutions, Volkswagen Car-Net, IBM's The Weather Company, Pirelli, MyJohnDeere.com, and Verizon's Precision Market Insights.

Pros:

As consumers of external (third-party) data, organizations can enhance their analytic models with the availability of more data and augment their things master data.

As producers of data or various derived value-added content, organizations have the opportunity to monetize such data assets either directly (by selling data to third parties) or indirectly (by incorporating data into other services they provide).

Cons:

Use of external data can create additional challenges in data integration and data integrity management.

Monetizing one's data is a complex task that requires creation of a strategy and specific plans for packaging, pricing, and ongoing maintenance and delivery of such data products.

IoT Analytics

Rich Media Analytics

FIGURE 19

Rich Media Analytics Markers of Momentum

Source: IDC, 2016

Rich media analytics solutions identify objects, entities, events, attributes, or patterns of behavior (including temporal and spatial events, either in real time or post event) through the detection, determination, and analysis of video and image data. Use cases for these solutions include security, object identification, video monitoring/tracking, image search, automatic alerting, forensic analysis, image categorization, and pattern, image, and shape recognition.


The amount of rich media data that needs to be analyzed and understood is increasing exponentially with the growth of the internet and of mobile devices that capture images and videos on a more or less constant basis. However, IDC estimates that much of this data is useless unless some type of analytics is applied to it.

The market and opportunities for image and video analytics are growing significantly. Many organizations would like to be able to monetize images for ecommerce. In addition, there is increased interest in automated solutions for video surveillance of human and nonhuman activity. Organizations are also looking at using video and images, along with social media data, geolocation information, and transactional sales data, as part of the data needed to understand and improve customer experiences. Video data and video surveillance are being used in a variety of ways by many different organizations. Governments and enterprises are the primary users of image and video analytics today.

Companies offering image and video analytics include Hitachi, Fujitsu, NEC, Sony, JustVisual, HPE, IBM, Clarifai, Cortexica, Ramp, Aventure, IntelliVision, 3VR, Accenture, and ObjectVideo.

Pros:

The exponentially increasing amount of image and video content offers an opportunity to apply rich media analytics technology to extract valuable information and knowledge.

Image and video analytics can add another dimension to text-based knowledge in diverse areas from healthcare to terrorism investigation to Internet of Things.

Image and video analytics can provide real-time feedback and information for cognitive decision making in areas such as robotics, drones, and driverless vehicles.

Cons:

This area is still emerging, and identifying objects, patterns, and visual cues correctly can be prone to errors depending on the algorithms and tools used.

Many of these tools use extensive amounts of machine learning, which is highly processing intensive.

Relating entities and objects from video to textual records and information can be challenging.

Statistical Analysis

FIGURE 20

Statistical Analysis Markers of Momentum

Source: IDC, 2016


Statistical analytics software includes packages that use a range of statistical techniques to create, test, and execute models for analyzing IoT data. This genre falls into the advanced and predictive analytics segment of business intelligence and analytics tools. Sample techniques include descriptive and predictive analysis, regression, and clustering.

Statistical analytics is used to discover relationships in data and make predictions that are hidden, not apparent, or too complex to extract, or where there is not enough data for other types of modeling. An example use case in IoT is predictive maintenance, where analysis of sensor data predicts which components will be in imminent need of maintenance.
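A hedged sketch of that predictive maintenance example using scikit-learn: a logistic regression scores failure risk from sensor-derived features. The data below is synthetic and the features are assumptions; a real model starts from cleansed, validated, prepared historical data, as the cons below stress.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression  # pip install scikit-learn

rng = np.random.default_rng(0)
# Synthetic features per machine: [mean vibration, peak temperature, hours since service]
X = rng.normal([2.0, 70.0, 400.0], [0.5, 8.0, 150.0], size=(500, 3))
# Synthetic label: 1 = component failed within the following 30 days
y = ((X[:, 0] > 2.5) & (X[:, 2] > 500)).astype(int)

model = LogisticRegression().fit(X, y)

# Score a machine currently showing high vibration and overdue service
risk = model.predict_proba([[3.1, 82.0, 610.0]])[0, 1]
print(f"30-day failure risk: {risk:.0%}")
```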

Most statistical analytics packages use programming languages that may be proprietary, open source, or a combination. Most packages also include a graphical user interface that allows analysts to interact with the software and build models with minimal or no programming.

Examples of products include SAS Analytics and SAS Enterprise Miner, IBM SPSS, SAP Predictive Analytics and SAP InfiniteInsight, and Oracle Data Mining. SPSS is a component of IBM Watson IoT Platform. In addition, open source modeling languages like R are commonly used by advanced data analysts.

Pros:

This technology can be used where large gaps exist in data models or where data models are incomplete.

Data can be easily imported from Excel files or other formats.

A variety of statistical techniques for analyzing data can be used. Most packages allow power users to use programming languages for complex analyses that cannot be done with graphical user interfaces.

Cons:

These tools can be fairly complex to use. Users need to be sophisticated in the concepts of statistics, data mining, and programming to take full advantage of the capabilities of these tools.

Statistical packages typically cannot account for all factors that might affect an outcome, especially those that cannot be expressed as structured data.

These tools are not suitable for data manipulation or data preparation. They assume that the data is cleansed, validated, and prepared; bad data will result in poor predictions.


Streaming Analytics

FIGURE 21

Streaming Analytics Markers of Momentum

Source: IDC, 2016

Streaming analytics continuously evaluates and correlates events to detect anomalies and conditions requiring further action. Events are received and correlated one at a time in real time or in microbatches, and the logic typically involves a time window. A condition model managed within the event processing engine describes:

The relationship of two or more events to each other

The relationship of two or more data elements within a single event

The relationship of a new event and the math or logic that should be applied to the event

The comparison of a desired state and the current state

A simple correlation example is evaluating the newest vibration sensor data event associated with a machine against a threshold level, while more complex models ingest streams of data from multiple sources to identify more complex patterns that may be more appropriate for systems of things rather than individual things.

Models can include calculations of statistical probabilities and use of algorithms, rulesets, or code that

describe a condition. Models can describe the presence or an absence of a pattern as well as time

logic between data events within a time window. Streaming analytics in runtime tends to be publish-

and-subscribe, with nodes listening for the results of a previous node. This software often constructs a

series of nodes that handle individual processing steps, such as ingesting a sensor event and

correlating it to others, and if the node has an output (a derived event), it serves as input to the next

node or any node subscribing to that output. The next node may enrich the derived event by looking up

customer information or thing information, with that output serving as input to the next node, which

may apply a rule or algorithm to determine the next best action.
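
The following is a minimal sketch of that node pipeline in Python: a windowing node computes a rolling statistic over a time window, and a downstream node subscribes to its derived events and emits alerts. The event fields, window length, and threshold are illustrative assumptions:

```python
# Hedged sketch: two chained streaming nodes with a time window.
from collections import deque
import time

WINDOW_SECONDS = 60       # assumed time window
THRESHOLD_MM_S = 7.0      # assumed vibration threshold

def window_node(events, window=WINDOW_SECONDS):
    """Keep only events inside the time window; emit the window average."""
    buf = deque()
    for event in events:  # event = {"ts": epoch_seconds, "vibration": float}
        buf.append(event)
        while buf and buf[0]["ts"] < event["ts"] - window:
            buf.popleft()
        yield {"ts": event["ts"],
               "avg_vibration": sum(e["vibration"] for e in buf) / len(buf)}

def alert_node(derived_events, threshold=THRESHOLD_MM_S):
    """Emit a derived alert event when the windowed average crosses a threshold."""
    for d in derived_events:
        if d["avg_vibration"] > threshold:
            yield {"ts": d["ts"], "action": "inspect_machine",
                   "avg_vibration": d["avg_vibration"]}

# A rising vibration stream eventually triggers one alert.
stream = ({"ts": time.time() + i, "vibration": 5.0 + 0.5 * i} for i in range(10))
for alert in alert_node(window_node(stream)):
    print(alert)
```

Production engines such as Spark Streaming or Kafka Streams provide the same windowing and publish-and-subscribe wiring as managed, distributed primitives rather than in-process generators.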

The term streaming analytics was first used as a subcategory of complex event processing (CEP). Today, the term is used synonymously with CEP. There are a variety of software products that handle streaming analytics, both open source and proprietary. Many value-added offerings are extending Spark and Spark Streaming.


Examples of products in this category include Apache Kafka Streams, Apache Storm, Apache Spark Streaming, AWS Kinesis, IBM Streams, Microsoft Azure Stream Analytics, Salesforce Thunder, SAS Event Stream Processing, SQLstream, and TIBCO's StreamBase and BusinessEvents.

Pros:

Can be used when requirements call for low-latency detection of conditions, particularly under high data volumes

A central component of event-driven design that is oriented to decision support and decision automation

Can be used in a compact way at the edge compared with other analytical techniques

Can plug in machine learning as part of a stream, supporting hybrid cognitive/programmatic use cases

Can be used for preprocessing events that need to be correlated before moving to a different analytical environment

Cons:

Some organizations with many applications that rely on mature data management systems may opt not to use streaming analytics, instead improving their data refresh rates and forgoing event-driven design.

Popular open source–based streaming analytics software is less mature, missing many of the key elements present in proprietary streaming analytics.

Developer skills in streaming analytics are scarce.

Supervised Learning

FIGURE 22

Supervised Learning Markers of Momentum

Source: IDC, 2016

Supervised machine learning begins with examples of training data paired with identifying labels (e.g., right or wrong, positive or negative) selected from the categories to be learned. Using these pairs of example data and labels ("training data"), the system learns parameters of statistical models that it can then generalize to unlabeled examples of data items that were not seen in the training data ("test data"). In most cases, the learned models improve over time via a feedback loop that adjusts the model parameters to better reflect additional sets of training or production data. The performance of a learned model can be measured by simple prediction accuracy or by the particular business metric the learned model is designed to support. Performance depends on the degree to which the training data matches the real world, the choice of algorithm, the algorithm's parameters, and the quantity of data.
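
A minimal sketch of that train-and-feedback loop follows, using scikit-learn's SGDClassifier, whose partial_fit() adjusts the model parameters as additional labeled batches arrive; the synthetic data and its labeling rule are assumptions for illustration:

```python
# Hedged sketch: supervised learning with an incremental feedback loop.
import numpy as np
from sklearn.linear_model import SGDClassifier

rng = np.random.default_rng(1)

def labeled_batch(n=200):
    """Hypothetical (data, label) pairs: label 1 when the features sum high."""
    X = rng.normal(size=(n, 2))
    y = (X.sum(axis=1) > 0).astype(int)
    return X, y

model = SGDClassifier(random_state=0)

X0, y0 = labeled_batch()                 # initial training data
model.partial_fit(X0, y0, classes=[0, 1])

X_test, y_test = labeled_batch()         # held-out test data
print("Accuracy before feedback: %.2f" % model.score(X_test, y_test))

for _ in range(5):                       # feedback loop: new labeled batches
    X_new, y_new = labeled_batch()
    model.partial_fit(X_new, y_new)

print("Accuracy after feedback:  %.2f" % model.score(X_test, y_test))
```

Accuracy here is the simple prediction-accuracy measure mentioned above; in a deployment, the score would more often be the business metric the model supports.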


Companies like IBM, IPsoft, Wipro, Intel's Saffron Technology, Infosys, CognitiveScale, and Tata Consultancy Services include machine learning capabilities in their cognitive system platforms that allow developers and enterprises to build cognitively enabled "smart" applications that learn over time. In addition, vendors such as Google, Amazon, Microsoft, and Skytree offer commercial machine learning libraries as standalone tools. There are also many free and open source machine learning packages, including Apache Spark's MLlib, which is designed to make machine learning easy and useful inside the popular Apache Spark framework for cluster computing. In addition, Microsoft recently open sourced its distributed machine learning library, DMTK, under an MIT License. Additional open source software includes Waikato Environment for Knowledge Analysis (Weka) and Massive Online Analysis (MOA) from the University of Waikato and H2O.

Deep learning is a particular type of supervised machine learning based on neural network algorithms that has seen recent commercial success. Google released its second-generation deep learning library, TensorFlow, to open source. Other open source deep learning libraries include Caffe from the University of California, Berkeley; Theano from the University of Montreal; and Torch from Idiap, used extensively by Google and Facebook, as well as Weka and H2O.

Pros:

Supervised learning algorithms can learn quickly from examples and self-correct when changing trends are reflected in new sets of labeled data.

Advances in computing power and ever-expanding sources of data make advanced algorithms possible.

Heavy investment by vendors and venture capital firms is leading to rapid progress.

Cons:

Finding or creating the required labeled data is costly and difficult.

A wide range of options makes vendor selection tricky. Costs run from quite inexpensive with open source to very expensive with large vendor offerings, and the less costly options entail substantial internal resources to make them work.

As advanced as these products have become, there are still challenges in achieving objectives when a particular decision involves large numbers of variables and interdependencies.

Subject matter experts are needed to assist with the initial and ongoing review of training datasets, which may prove costly and time consuming.

Managing the bias-variance trade-off requires balancing the learning algorithm against the amount of available data and the discernible complexity of the function to be learned.


Unsupervised Learning

FIGURE 23

Unsupervised Learning Markers of Momentum

Source: IDC, 2016

Unsupervised machine learning is another variation of machine learning where algorithms detect and discern attributes and features without the benefit of labeled training data. Some algorithms cluster data into meaningful groups by finding centers of data density. Other unsupervised algorithms use dimensionality reduction techniques (like singular value decomposition) to uncover the essential attributes of the data without requiring a human to define those attributes in advance. This is particularly useful for "unstructured" data, such as images or text, where an underlying structure can be automatically inferred, enabling other algorithms to leverage the data.
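
The following is a minimal sketch of both approaches on unlabeled synthetic data, using scikit-learn's k-means for density-based clustering and truncated SVD for dimensionality reduction; the data shape and cluster count are assumptions for illustration:

```python
# Hedged sketch: clustering and dimensionality reduction without labels.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.decomposition import TruncatedSVD

rng = np.random.default_rng(2)

# Unlabeled data drawn from three hypothetical density centers.
X = np.vstack([rng.normal(loc=c, scale=0.5, size=(100, 5))
               for c in (0.0, 3.0, 6.0)])

# Clustering: find centers of data density with no labels provided.
clusters = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X)
print("Cluster centers (first feature):",
      clusters.cluster_centers_[:, 0].round(1))

# Dimensionality reduction via SVD: keep the two strongest latent components.
svd = TruncatedSVD(n_components=2, random_state=0)
X_reduced = svd.fit_transform(X)
print("Explained variance ratio:", svd.explained_variance_ratio_.round(2))
```

The reduced representation (X_reduced) is the kind of automatically inferred structure that downstream algorithms can then leverage, as noted above.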

Unsupervised learning algorithms based on clustering, dimensionality reduction, and neural networks can be found in most major commercial and open source packages, including Apache Mahout, Gensim, Environment for DeveLoping KDD-Applications Supported by Index-Structures (ELKI), and word2vec, to name a few. These algorithms are used in clustering challenges with unstructured data such as image categorization, text analytics, or speech recognition. Companies offering unsupervised learning capabilities include Numenta, Nervana Systems, Loop AI Labs, Luminoso, Clarifai, H2O.ai, and MetaMind.

For specific IoT use cases, IBM offers Streaming Analytics Service and Watson Machine Learning Service, which provides more than 200 algorithms for analytics and detection in streaming IoT data.

Pros:

Unsupervised machine learning does not need labeled training data, enabling faster implementation.

It requires little or no user intervention.

It provides a low cost of ownership through reduced staffing and hardware requirements.

Self-learning systems provide automatic system updates.

Cons:

Unsupervised learning needs a lot of data to develop good models.

The training data needs to be representative.

Systems risk being overtrained.


IoT Conditions and Actions

Low-Code Rules

FIGURE 24

Low-Code Rules Markers of Momentum

Source: IDC, 2016

A major driver of IoT is the ability to connect sensor data to analytics to detect and predict conditions that warrant a response. A consequence of this shift from reactive problem solving to predictive problem anticipation is the large volume of decisions that must be made quickly to determine how to respond. In predictive systems, decision conditions resemble Big Data: there are volume, velocity, and variety consequences that we can almost think of as big decisions.

Rules engines and decision services are not widely adopted in IoT, but there is some recognition that decision automation and decision support are a needed area of investment, operationally as well as in support of sensor-based customer experience initiatives. Rules software abstracts the conditional, decision-oriented logic from system and application logic. This software is used to create rules that assign and route work, standardize how decisions are made, and automate decisions.

While this type of reasoning is typically embedded as code in systems, the abstraction improves the ability to make changes rapidly as situations change. Subject matter experts become the managers of decision assets rather than developers who, without rules, make changes to conditional logic embedded in systems through change requests.
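
A minimal sketch of that abstraction follows: the conditional logic lives in a rules table that subject matter experts could maintain as a decision asset, while the application merely dispatches events to it. The rule conditions, actions, and event fields are illustrative assumptions:

```python
# Hedged sketch: decision logic externalized from application code.
# Rules are data; changing a decision means editing this table, not the app.

RULES = [  # evaluated in order; the first matching rule wins
    {"if": lambda e: e["temp_c"] > 90,
     "then": "shut_down"},
    {"if": lambda e: e["temp_c"] > 75 and e["vibration"] > 5.0,
     "then": "dispatch_technician"},
    {"if": lambda e: e["temp_c"] > 75,
     "then": "raise_alert"},
]

def decide(event, rules=RULES):
    """Return the action of the first rule whose condition matches the event."""
    for rule in rules:
        if rule["if"](event):
            return rule["then"]
    return "no_action"

print(decide({"temp_c": 80, "vibration": 6.2}))  # -> dispatch_technician
print(decide({"temp_c": 60, "vibration": 1.0}))  # -> no_action
```

Commercial rules engines add versioning, authoring interfaces for nontechnical users, and high-volume execution on top of this basic condition-action pattern.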

Examples of low-code rules software include IBM Bluemix Business Rules, which includes a recipe to integrate rules with the Watson IoT Platform. Because rules design requires Eclipse, Business Rules is more oriented toward developers than subject matter experts but is capable of supporting sophisticated operational requirements. IBM also offers Real-Time Insights as part of its IoT platform. Sapiens DECISION is a sophisticated horizontal offering that provides a development environment for SMEs and can be deployed in a decentralized manner or support a centralized on-demand decision service. Salesforce IoT Cloud, currently in preview, provides a consumer-grade experience that allows end users to identify events and map events to conditions and conditions to actions. Red Hat BRMS is based on the open source Drools.

Pros:

Improves the management of decisions as assets in environments where high volumes of decisions need to be made efficiently and rapidly

Works in conjunction with analytics to provide precision in how to route actions


Speeds up development and change management

Cons:

Managing decisions as assets with low-code rules can be complex.

The traditional, more highly adopted rules engines can be too hard for nondevelopers to use, while the newer, easier-to-use rules engines may be too simple for sophisticated use cases.

Low-Code App Platform

FIGURE 25

Low-Code App Platform Markers of Momentum

Source: IDC, 2016

Low-code application platforms combine development and runtime into a single offering. They typically consist of graphical modeling environments to describe workflows, data objects, and forms; point-and-click configurations; and relatively simple scripting. These environments are popular for rapid development as well as for development teams that include both business participants and developers.

In IoT, low-code platforms are useful for automating workflows, for building mobile apps, and for assigning and managing tasks. Moving into the future, low-code platforms will be useful for designing and automating the interactions used to manage an event-based customer experience.
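
To illustrate the underlying idea, the following is a minimal sketch in which a workflow is declared as data, the kind of artifact a graphical modeler would emit, and executed by a generic runtime; the step names, actions, and handlers are illustrative assumptions, not any platform's format:

```python
# Hedged sketch: a declared workflow executed by a generic low-code runtime.

WORKFLOW = {
    "start": "triage",
    "steps": {
        "triage": {"action": "assign_task",
                   "args": {"team": "field_ops"}, "next": "notify"},
        "notify": {"action": "send_push",
                   "args": {"template": "maintenance_due"}, "next": None},
    },
}

HANDLERS = {  # point-and-click configuration would bind these in a real platform
    "assign_task": lambda args, ctx: print("Task assigned to", args["team"]),
    "send_push":   lambda args, ctx: print("Push sent:", args["template"]),
}

def run(workflow, context):
    """Walk the declared steps, dispatching each action to its handler."""
    step_name = workflow["start"]
    while step_name:
        step = workflow["steps"][step_name]
        HANDLERS[step["action"]](step["args"], context)
        step_name = step["next"]

run(WORKFLOW, {"device_id": "pump-17"})
```

Because the workflow is data rather than code, business participants can change the flow without touching the runtime, which is the core of the low-code value proposition.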

IoT-specific examples of products include IBM's Node-RED and PTC's ThingWorx Foundation. Generalized low-code workflow or mobile app environments include Nintex Workflow Cloud, Alfresco Activiti, Bonitasoft's Bonita BPM, BP Logix, Appian, and Salesforce Lightning.

Pros:

Useful for application design involving collaboration between process experts and developers

Fast development cycles

Ability to provide short-term situational apps

Cons:

May not offer the control developers need for specific use cases


IoT Visibility

Operational Intelligence

FIGURE 26

Operational Intelligence Markers of Momentum

Source: IDC, 2016

Operational intelligence (OI) captures operational data in near real time, continuously or in microbatches; correlates the data against relationships within the data streams, key performance indicators (KPIs), service-level agreements (SLAs), or time series data; and delivers the results into a dashboard, ideally with a drill path to explore root cause. The goal is to immediately spot an operating condition that can be fixed within the current time window to improve operating performance. Alerts are an important part of OI, as is linking alerts to third-party actions that trigger and manage the response.
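
A minimal sketch of the pattern follows: a KPI is computed over microbatches of near-real-time events, and an alert fires when it breaches an SLA. The KPI, SLA value, and event fields are illustrative assumptions:

```python
# Hedged sketch: microbatch KPI monitoring against an SLA, with alerts.
from statistics import mean

SLA_MAX_LATENCY_MS = 200.0   # assumed service-level agreement

def kpi_monitor(microbatches):
    """Yield (kpi, breached) per microbatch of {"latency_ms": float} events."""
    for batch in microbatches:
        kpi = mean(e["latency_ms"] for e in batch)
        yield kpi, kpi > SLA_MAX_LATENCY_MS

batches = [
    [{"latency_ms": 120}, {"latency_ms": 150}],
    [{"latency_ms": 260}, {"latency_ms": 240}],   # this batch breaches the SLA
]
for kpi, breached in kpi_monitor(batches):
    status = "ALERT: open drill path to root cause" if breached else "ok"
    print("avg latency %.0f ms -> %s" % (kpi, status))
```

In a product, the alert would typically be linked to a third-party action, such as a ticketing or automation system, that triggers and manages the response.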

The idea behind OI is creating business value by identifying and solving one operational problem at a time to improve profitability.

OI can be tightly focused on a particular process or subprocess within a domain or can be much broader and span domains. With IoT, sensors associated with things can be monitored in near real time and trigger an action when an anomalous condition occurs. Broader use of operational intelligence can monitor networks of things as well as merge data streams from many sources, correlating and detecting conditions across them. A cross-domain example is the use of OI to link sensor data from multiple wind farms to a real-time electricity market data feed, providing the opportunity to dial the production of electricity up or down in real time based on market demand.

The lines between the different intelligence and analytical techniques blur and combine as required. OI is not necessarily used to predict but instead to spot a problem as early in a process as possible, making it almost seem like a prediction. OI is also moving toward the use of machine learning.

OI products include Splunk (when Splunk Forwarder delivers data continuously or in microbatches), Sight Machine, Vitria IoT Analytics Platform, and business activity monitoring products from Software AG, IBM, and Oracle.

TIBCO BusinessEvents, Software AG's Apama, Apache's Storm and Kafka Streams, SAS Event Stream Processing, and other one-event-at-a-time streaming analytics tools can also be used for OI in conjunction with real-time monitoring solutions. These are included under Streaming Analytics in the IoT Analytics section.


Pros:

Produces a higher-level business view of sensor-supported operations

Makes it easier to get started in IoT by identifying low-hanging opportunities where the problems are straightforward to identify and the causes are fairly well known

Shifts from reactively responding to problems to proactively identifying them to speed up resolution

Cons:

Is supplemental rather than core IoT AIM technology

Does not incorporate advanced analytics

TECHNOLOGY ADOPTION OUTLOOK

IoT is the opposite of traditional AIM technology adoption, which involves moving data in batches and then normalizing and loading the data into target systems. Analytical software is used once the data is loaded and at rest, typically to produce reports or statistical analyses that help in decision making or for on-demand decision automation.

IoT AIM is about sensing and responding within a time window; continuously moving and managing sensor events; and handling large volumes of data, continuous decision automation, and decision support using analytics and rules. Data must travel from a sensor to edge collection to central processing, where it is normalized and analyzed against some type of prediction model or algorithm to determine whether further action is required. Once actions are required, response cycle times vary substantially, but the end-to-end cycle time must be faster than the time window allotted to derive business benefit.
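
As a minimal worked example of that constraint, the following adds up assumed stage latencies and compares them with the business time window; all stage names and values are illustrative:

```python
# Hedged sketch: does the end-to-end cycle time fit the business time window?

TIME_WINDOW_S = 5.0   # e.g., act within five seconds of an anomalous reading

stage_latency_s = {   # assumed per-stage latencies, sensor through action
    "sensor_to_edge":      0.3,
    "edge_to_central":     1.2,
    "normalize_and_score": 0.8,
    "trigger_action":      1.5,
}

cycle_time = sum(stage_latency_s.values())
print("End-to-end cycle time: %.1f s (window: %.1f s)"
      % (cycle_time, TIME_WINDOW_S))
if cycle_time > TIME_WINDOW_S:
    print("Cycle time exceeds the window: IoT-aligned AIM technology is needed.")
else:
    print("Within the window: existing AIM investments may suffice for now.")
```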

Four considerations should dominate IoT AIM technology adoption planning:

What is the total time window available to deliver business value when a condition is identified that requires a response? Adoption of AIM technology is required when time windows are narrower than the cycle time of the end-to-end IoT system.

How good is the prediction or insight from your analytics software? Quality problems occur for a variety of reasons, but noisy predictions and wrong or nonactionable predictions are all expensive. Using the best analytics approach for a particular problem requires assessing whether there are data gaps that need to be resolved as well as identifying options and experimenting with them prior to adoption. Different techniques may also be required for different workloads or stages within a workload.

How much technical debt are you accumulating by repurposing existing AIM technology or investing in custom development? In the beginning, it makes sense to keep costs low by leveraging existing AIM technology for an IoT project. But technical debt rapidly accumulates when existing technology doesn't really align with needs and has to be customized or contorted on an ongoing basis to make it work. As IoT initiatives are operationalized, purpose-built tools are almost always a better path once those tools reach the required level of sophistication.

How do technology choices align with your enterprise's adoption risk profile? Different organizations have different approaches to risk. When it becomes clear that there is a need to add new functionality or replace nonperforming existing technology, the selection has to align with the skills of the team implementing and using the technology. We assess the adoption risk and speed of adoption for each of the 25 technologies highlighted in this IDC TechScape, and planning should take both of those factors into account. If a new technology identified in Figure 1 has a higher risk than is acceptable to your organization but a fast rate of market adoption, it is important to begin planning and acquiring skills sooner rather than later for eventual adoption.

LEARN MORE

Related Research

IDC's Worldwide Software Taxonomy, 2016 (IDC #US41572216, July 2016)

Internet of Things Analytics and Information Management Software Taxonomy, 2016 (IDC #US40708515, December 2015)

IDC TechScape Methodology

Unlike other technology assessment frameworks, the IDC TechScape provides a visual representation of the process of technology adoption, dividing technologies into three major categories based on their impact on the organization and assessing their relative maturity within their respective categories. The study examines particular individual categories and provides additional insights about the speed of adoption, technology potential for success (risk), and industry hype. Refer back to Figure 1 for the IDC TechScape for Internet of Things analytics and information management.

The IDC TechScape is a tool for strategic planning and tactical decision making for technology professionals in IT buyer organizations. This audience may include CIOs and senior technology professionals, strategists, and IT buyers from IT or from lines of business.

The document's two functions:

Strategic planning tool: Offers a view into where a technology exists in its overall adoption life cycle. Generally, technologies in the early stages of evaluation and deployment are riskier investments than those further along in the adoption life cycle, as the latter are deployed more broadly. It also sorts technologies into three categories that may help organizations make judgments about which technologies might provide the greatest positive impact on their organization. IT strategists can use this information to prioritize interest in a technology or group of technologies.

Tactical decision-making tool: Because it lays out where a technology exists within its overall adoption life cycle, and a certain level of associated risk may be inferred, an organization can use the IDC TechScape to determine whether it should immediately adopt a particular technology or wait until the risk of adoption is lower.


IDC TechScape Categories and Definitions: Transformational, Incremental, and Opportunistic

Executives use the IDC TechScape model to:

Inform technology adoption decisions based on organizational appetite for risk and potential for transformational change

Support a decision on when a technology or group of technologies might be ready for adoption, given the purchasing organization's preferred appetite for risk, that is, whether an organization should immediately adopt a particular technology or wait until the risk of adoption decreases

The three types of adoption curves in an IDC TechScape are:

Transformational. These technologies will completely reshape markets and investment strategies. They may create new business and/or market opportunities and lead to new enterprise and consumer capabilities. They may differ significantly from current technologies and may have mostly unrecognized market impacts/opportunities. Transformational technologies have already demonstrated that they fundamentally change current best practices.

Incremental. This new generation of technology measurably improves on an existing category of technologies to deliver better business outcomes. In terms of business processes, these technologies deliver small but measurable improvements over current best practices.

Opportunistic. These technologies will grow based on specific use cases, and they have an undetermined or limited capability to improve existing technologies/processes. Their potential changes currently lack a clear impact on current best practices.

Synopsis

Over time, analytics and information management (AIM) technology adopted for IoT will be different from an organization's existing technology investments that perform a similar but less time-sensitive or data volume–intensive function. Enterprises will want to leverage as much of their existing AIM investments as possible, especially initially, but will want to adopt IoT-aligned technology as they operationalize and identify functionality gaps in how data is moved and managed, how analytics are applied, and how actions are defined and triggered at the moment of insight. This IDC TechScape covering IoT AIM is designed to help:

Enterprises learn more about the newer AIM technologies that support IoT

Align these technologies with an enterprise's technology risk profile to determine what is ready to adopt and what should be monitored

Gain a better understanding of where an IoT team will need to create skills and competencies as it plans to adopt newer AIM technologies

According to Maureen Fleming, vice president for IDC's IoT Analytics and Information Management research program, "Implementing the analytics and information management tier of an IoT initiative is about the delivery and processing of sensor data, the insights that can be derived from that data and, at the moment of insight, initiating actions that should then be taken to respond as rapidly as possible. To achieve value, insight to action must fall within a useful time window. That means the IoT AIM tier needs to be designed for the shortest time window of IoT workloads running through the end-to-end system. It is also critical that the correct type of analytics is used to arrive at the insight."


About IDC

International Data Corporation (IDC) is the premier global provider of market intelligence, advisory services, and events for the information technology, telecommunications, and consumer technology markets. IDC helps IT professionals, business executives, and the investment community make fact-based decisions on technology purchases and business strategy. More than 1,100 IDC analysts provide global, regional, and local expertise on technology and industry opportunities and trends in over 110 countries worldwide. For 50 years, IDC has provided strategic insights to help our clients achieve their key business objectives. IDC is a subsidiary of IDG, the world's leading technology media, research, and events company.

Global Headquarters

5 Speen Street

Framingham, MA 01701

USA

508.872.8200

Twitter: @IDC

idc-community.com

www.idc.com

Copyright and Trademark Notice

This IDC research document was published as part of an IDC continuous intelligence service, providing written research, analyst interactions, telebriefings, and conferences. Visit www.idc.com to learn more about IDC subscription and consulting services. To view a list of IDC offices worldwide, visit www.idc.com/offices. Please contact the IDC Hotline at 800.343.4952, ext. 7988 (or +1.508.988.7988) or [email protected] for information on applying the price of this document toward the purchase of an IDC service or for information on additional copies or web rights. IDC and TechScape are trademarks of International Data Group, Inc. IDC TechScape is a registered trademark of International Data Corporation, Ltd. in Japan.

Copyright 2016 IDC. Reproduction is forbidden unless authorized. All rights reserved.