December 2016, IDC #US41841116
IDC TechScape
IDC TechScape: Internet of Things Analytics and Information Management
Maureen Fleming, Stewart Bond, Carl W. Olofson, David Schubmehl, Dan Vesset, Chandana Gopal, Carrie Solinger
IDC TECHSCAPE FIGURE
FIGURE 1
IDC TechScape: Internet of Things Analytics and Information Management —Current Adoption Patterns
Note: The IDC TechScape represents a snapshot of various technology adoption life cycles, given IDC's current market analysis.
Over time, expect these technologies to follow the adoption curve on which they are currently mapped.
Source: IDC, 2016
©2016 IDC #US41841116 2
IN THIS STUDY
Implementing the analytics and information management (AIM) tier of an Internet of Things (IoT)
initiative is about the delivery and processing of sensor data, the insights that can be derived from that
data and, at the moment of insight, initiating actions that should then be taken to respond as rapidly as
possible. To achieve value, insight to action must fall within a useful time window. That means the IoT
AIM tier needs to be designed for the shortest time window of IoT workloads running through the end-
to-end system. It is also critical that the correct type of analytics is used to arrive at the insight.
Over time, AIM technology adopted for IoT will be different from an organization's existing technology
investments that perform a similar but less time-sensitive or data volume–intensive function.
Enterprises will want to leverage as much of their existing AIM investments as possible, especially
initially, but will want to adopt IoT-aligned technology as they operationalize and identify functionality
gaps in how data is moved and managed, how analytics are applied, and how actions are defined and
triggered at the moment of insight. This IDC TechScape covering IoT AIM is designed to help:
Enterprises learn more about the newer AIM technologies that support IoT
Align these technologies with an enterprise's technology risk profile to determine what is ready to adopt and what should be monitored
Gain a better understanding of where an IoT team will need to create skills and competencies as it plans to adopt newer AIM technologies
TECHNOLOGY MARKERS OF MOMENTUM
The AIM tier of IoT encompasses the following:
Model discovery, training and design, and the appropriate infrastructure for managing the data associated with these major activities
Software used in production to collect and deliver data reliably to processing targets
Integration to ensure data is in a format useful to target environments
Database options to support ancillary functions not included in most IoT platforms and to let enterprises build their own capabilities as needed
Analytical software
Thing registration, state, and device management
Operational intelligence (OI) and monitoring to manage the larger systems and processes of things and related assets
Low-code environments to describe the relationship of events to conditions and actions and to support IoT application development
Refer back to Figure 1, which fits the IoT AIM technologies into the appropriate curves. IoT is an
emerging opportunity, and adoption of both IoT-specific and IoT-generalized AIM technologies for IoT
is also early. We positioned each technology on the curves by balancing market adoption against
technology maturity, so the placement shows relative position rather than pure market adoption. If we
looked only at market adoption, the labels would generally be too concentrated in the early sections of
the curve to be legible.
Table 1 organizes AIM technologies into functional areas, the type of curve, and IDC's assessment of
stage of adoption, risk level, speed of adoption, and years to market adoption maturity. IoT AIM
consists of generalized AIM useful for IoT as well as IoT-specific technologies organized into the
following categories:
IoT data collection
IoT data transport
IoT data event services
IoT data services
IoT value-added data services
IoT analytics
IoT conditions and actions
IoT visibility
The descriptions of each technology are listed in the same order as they are presented in Table 1.
TABLE 1
IDC TechScape Technology Markers of Momentum
| Technology | Curve Type | Stage of Adoption | Speed of Adoption | Risk Level | Market Buzz | Years to Full Adoption |
|---|---|---|---|---|---|---|
| IoT platform | Incremental | Evaluate | Fast | Medium | Medium | 7 |
| IoT edge data collection | | | | | | |
| Sensor data collection | Incremental | Deploy | Fast | Medium | High | 3 |
| Historian | Incremental | Evaluate | Medium | Low | Low | 5 |
| IoT data transport | | | | | | |
| Managed data transport | Incremental | Test | Fast | Low | Low | 2 |
| Streaming data | Transformational | Test | Fast | Medium | Medium | 5 |
| Streaming integration | Transformational | Evaluate | Medium | High | Low | 8 |
| IoT data event services | | | | | | |
| Thing event store | Opportunistic | Evaluate | Medium | Medium | Medium | 5 |
| Thing registry and device management | Incremental | Deploy | Fast | Low | Medium | 3 |
| Thing state machine | Transformational | Test | Fast | Medium | Medium | 5 |
| IoT data services | | | | | | |
| Dynamic data management | Incremental | Deploy | Fast | Medium | Medium | 5 |
| Graph database | Transformational | Test | Slow | Medium | Low | 10 |
| Hadoop | Incremental | Deploy | Medium | Medium | Medium | 5 |
| In-memory data processing | Transformational | Deploy | Fast | High | High | 5 |
| In-memory relational | Incremental | Deploy | Medium | Low | Medium | 5 |
| Open data platform | Incremental | Evaluate | Medium | High | Medium | 6 |
| IoT value-added data services | | | | | | |
| Blockchain | Transformational | Evaluate | Slow | High | High | 10 |
| Data as a service | Transformational | Evaluate | Fast | Medium | Medium | 5 |
| IoT analytics | | | | | | |
| Rich media analytics | Opportunistic | Deploy | Fast | Medium | High | 10 |
| Statistical analysis | Incremental | Deploy | Fast | Medium | Low | 5 |
| Streaming analytics | Transformational | Evaluate | Medium | Medium | Medium | 5 |
| Supervised machine learning | Incremental | Evaluate | Fast | Medium | High | 10 |
| Unsupervised machine learning | Transformational | Evaluate | Medium | Medium | Medium | 15 |
| IoT conditions and actions | | | | | | |
| Low-code rules | Incremental | Deploy | Medium | Medium | Low | 7 |
| Low-code app platform | Opportunistic | Evaluate | Medium | Low | Medium | 5 |
| IoT visibility | | | | | | |
| Operational intelligence | Opportunistic | Evaluate | Medium | Medium | Low | 7 |
Source: IDC, 2016
IoT Platform
FIGURE 2
IoT Platform Markers of Momentum
Source: IDC, 2016
IoT platforms are a collection of core software components required to support IoT workloads. These
components include:
Registering and connecting devices to the network
Maintaining sensor state data associated with each device
Analytics
Device management
Application development
Security
Many of the IoT platforms are offered as cloud software or related sets of IoT services, while others
can be deployed on-premises in a datacenter or at the edge.
Examples of products include Amazon's AWS IoT Platform, Bosch IoT Suite, Cisco Jasper, GE
Digital's Predix, IBM Watson IoT Platform, Microsoft Azure IoT Suite, Oracle IoT Cloud, PTC's
ThingWorx, and SAP HANA Cloud Platform IoT services.
Pros:
Is a relatively straightforward way to launch an IoT experiment or initiative
Speeds up the process of operationalizing IoT workloads
Cons:
Locks the enterprise into a single vendor for core IoT workload functions
Is not comprehensive and will require interoperability with missing pieces of an end-to-end middle tier
IoT Edge Technology
Sensor Data Collection
FIGURE 3
Sensor Data Collection Markers of Momentum
Source: IDC, 2016
Sensor data collection edge technology does exactly what its name implies: collects data from
sensors. The data collected is persisted in memory or on disk until such time as it is converted as
needed, analyzed, filtered, and forwarded via data transport technology. If a historian is also in use,
the data may be persisted for a longer period of time to facilitate transaction management and/or
replay capabilities.
Sensor data collection software may be embedded or installed in a gateway device or offered as
standalone server software or virtual machine software. In any of these forms, it must be able to
capture data transmitted by sensors over a variety of protocols and transform that data into a format
that can be transmitted over the internet or back into the originating protocol. It must also provide
reliability mechanisms to request sensor data retransmission and security to prevent unauthorized
access and untrusted delivery of data. It may additionally require filtering to reduce outbound data volumes.
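As a rough illustration of the buffering, conversion, and filtering responsibilities described above, the following Python sketch shows a minimal in-memory collector. The class name EdgeCollector, the Reading fields, and the change-threshold filter are assumptions made for this example and are not taken from any product named in this study.

```python
from collections import deque
from dataclasses import dataclass
import json


@dataclass
class Reading:
    sensor_id: str      # hypothetical sensor identifier
    timestamp: float    # seconds since the epoch
    value: float


class EdgeCollector:
    """Buffers readings, filters small changes, and emits JSON for transport."""

    def __init__(self, min_delta, capacity=1000):
        self.min_delta = min_delta            # suppress changes smaller than this
        self.buffer = deque(maxlen=capacity)  # bounded in-memory persistence
        self._last_value = {}                 # last kept value per sensor

    def collect(self, reading):
        """Buffer the reading if it differs enough from the last kept value."""
        last = self._last_value.get(reading.sensor_id)
        if last is not None and abs(reading.value - last) < self.min_delta:
            return False                      # filtered to reduce outbound volume
        self._last_value[reading.sensor_id] = reading.value
        self.buffer.append(reading)
        return True

    def drain(self):
        """Convert buffered readings to JSON strings and empty the buffer."""
        out = [json.dumps(vars(r)) for r in self.buffer]
        self.buffer.clear()
        return out
```

A real collector would also handle protocol capture, retransmission requests, and security, which this sketch leaves out.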
While there are IoT cloud services that directly collect sensor data, they require that transmission uses
open application or messaging protocols, such as MQTT, HTTP, or AMQP. For that reason, we
classify them as IoT data streaming services in the data transport section.
Depending on requirements, there is often a need to collect sensor data from a mobile edge, such as
vehicles. When the edge is mobile, data may be collected using specialized embedded devices, such
as National Instruments' CompactRIO (cRIO), or devices purpose-built by the manufacturer. In that case, communications
between the embedded device and a central aggregation source may require different networking or
purpose-built communications systems. Embedded sensor data collection and specialized network
communications are outside the scope of this IDC TechScape.
Examples of sensor data collectors include Intel's Wind River Intelligent Device Platform XT,
MathWorks' ThingSpeak, PTC Kepware's KEPServerEX, and MuleSoft's Anypoint.
Pros:
Decouples sensors from central data processing applications
Provides a level of data persistence at the edge on which edge analytics can be performed for faster response at the edge
Improves qualities of service in low and interrupted bandwidth environments
Cons:
Requires compute and storage capabilities at the edge
Increases latency of data from the sensors to a central processing facility
Historian
FIGURE 4
Historian Markers of Momentum
Source: IDC, 2016
A historian maintains a local collection of sensor data and persists the information in storage for
analysis and reporting, for transactional integrity, or for replay scenarios. Monitoring and reporting
software accesses the data to provide situational awareness. In some industries, such as
manufacturing, historians are mature and have been used for a long time to collect and process data
into a time series.
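To make the persist-and-replay idea concrete, here is a minimal Python sketch of a historian backed by SQLite from the standard library. The Historian class, its table layout, and its method names are assumptions for illustration only; commercial process historians are far more capable.

```python
import sqlite3


class Historian:
    """Persists sensor readings as a time series and replays them in order."""

    def __init__(self, path=":memory:"):
        # ":memory:" keeps the example self-contained; a file path would persist.
        self.db = sqlite3.connect(path)
        self.db.execute(
            "CREATE TABLE IF NOT EXISTS readings ("
            "sensor_id TEXT, ts REAL, value REAL)"
        )
        self.db.execute("CREATE INDEX IF NOT EXISTS idx_ts ON readings (ts)")

    def record(self, sensor_id, ts, value):
        """Store one reading with its timestamp."""
        self.db.execute(
            "INSERT INTO readings VALUES (?, ?, ?)", (sensor_id, ts, value)
        )
        self.db.commit()

    def replay(self, start, end):
        """Yield (sensor_id, ts, value) rows with start <= ts < end, oldest first."""
        cursor = self.db.execute(
            "SELECT sensor_id, ts, value FROM readings "
            "WHERE ts >= ? AND ts < ? ORDER BY ts",
            (start, end),
        )
        yield from cursor
```

Replaying a time range in timestamp order is what lets the same data feed test or simulation environments.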
There is some debate about whether the edge-specific historian will become obsolete in favor of the
data being managed in the cloud. Whether historians continue as a permanent store for single-location
use cases depends on where the data is least expensive or simplest to maintain, whether the time
window is narrow, and whether network conditions are unreliable. Where speed is of benefit, historians may need
to be upgraded to support low-latency use cases. That said, a historian can itself become a thing that
is accessible from a central source, and data from historians can be transferred on a regular basis to a
centralized data store for use in training and discovery.
Process historians are available from a range of vendors, including OSIsoft, Siemens, Honeywell, and
GE, that sell manufacturing technology. Cisco acquired ParStream for its analytics database that can
be deployed at edge locations. IBM offers Informix edge.
Pros:
Event history is persisted at the edge, so if there is an issue with data transmission, the data can be recovered.
Historical data at the edge can be analyzed for historical trends in isolation.
Data can be replayed in test or simulation environments.
Cons:
Storage capacity is required at the edge.
Storage capacity can be complex to set up and manage.
IoT Data Transport Technology
Managed Data Transport
FIGURE 5
Managed Data Transport Markers of Momentum
Source: IDC, 2016
Managed data transport technology picks data up from files or databases populated by the IoT
collector or historian and subsequently sends the files to the target central data processing facility.
Managed data transport technology is more likely to be used:
To support applications where batch or microbatch frequencies meet the IoT data latency requirements of the solution
As a rudimentary bridge between the collector and streaming data technologies, where incompatibility is an issue or where decoupling of the two components is desired
To periodically send data from the historian to the data stores used for the discovery and training required for analytics
We use the term managed data transport because there are underlying choices about what technology
to use. It is common to use managed file transfer (MFT) software and also reasonable to use
extract, transform, and load (ETL) technology. A file sync and share service can also be used in some
applications.
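The microbatch pattern can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name write_microbatches, the newline-delimited JSON format, and the file naming scheme are hypothetical, and the actual transfer step (MFT, ETL, or file sync) is left out.

```python
import json
from pathlib import Path


def write_microbatches(records, out_dir, batch_size):
    """Write records as newline-delimited JSON files of at most batch_size rows."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    paths = []
    for start in range(0, len(records), batch_size):
        batch = records[start:start + batch_size]
        # Zero-padded sequence numbers keep files in order for the transfer tool.
        path = out / f"batch-{start // batch_size:05d}.jsonl"
        path.write_text("\n".join(json.dumps(r) for r in batch) + "\n")
        paths.append(path)
    return paths
```

A smaller batch_size lowers latency but produces more files for the transport software to move.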
Examples of software vendors and products in this category include but are not limited to Attunity's
MFT and Replicate; Axway; IBM's Sterling MFT, Aspera, and DataStage; Informatica PowerCenter;
Box; Dropbox; and Egnyte.
Pros:
Managed data transport technology is relatively mature, and many organizations with IoT projects are likely to already have ETL, MFT, or file sync and share in their portfolio. The issue is implementing agents at the stationary or mobile edge to handle secure transport.
Managed data transport technology can also facilitate data transport from the edge to target processing facilities if data streaming technology is not available or feasible.
Managed data transport technology can be used to decouple the collector from transport, offering a higher quality of service in situations where network connectivity is low or unstable.
Cons:
Batch or microbatch will increase data latency between the edge and target processing facilities.
Central processing facilities may need to accommodate spikes of activity with each batch, depending on data volume.
Decoupling may help in one respect, but it implicitly adds another component to the solution that will need to be monitored, managed, and maintained.
Depending on the software used for the implementation, it may require a heavier footprint, implying sufficient processing and persistence capacity at the edge.
Streaming Data
FIGURE 6
Streaming Data Markers of Momentum
Source: IDC, 2016
Streaming data is the transport that facilitates the flow of data from a source to a target or, in some
cases, multiple targets. Streaming data software transports data that is generated continuously and
transmitted simultaneously in small sizes (order of kilobytes). Transmission is handled by messaging
technology, by specialized agents that forward data, or through APIs that continuously post data to
deliver to a target and in some cases, by application-level coordination of communication using lower-
level protocols, such as HTTP or MQTT. Some solutions handle streaming sensor events directly from
the sensor client through a gateway to the targets and from a central source through a forwarder to the
sensor client. Other solutions pick up sensor events from the collector, which has already converted
the protocol to an IP-compatible format. The messaging that transports data streams also may serve
as the queuing mechanism at the target to receive and queue data from multiple data streams.
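The fan-out of one data stream to multiple targets, with the messaging layer also acting as the queue at each target, might be sketched as follows. The Broker class and the topic name are hypothetical stand-ins built on Python's standard queue module; they do not represent the API of any messaging product.

```python
import queue


class Broker:
    """Minimal in-process stand-in for messaging software."""

    def __init__(self):
        self._targets = {}   # topic -> list of per-target queues

    def subscribe(self, topic):
        """Register a target and return the queue it will read from."""
        q = queue.Queue()
        self._targets.setdefault(topic, []).append(q)
        return q

    def publish(self, topic, message):
        """Deliver one small message to every target subscribed to the topic."""
        for q in self._targets.get(topic, []):
            q.put(message)


def drain(q):
    """Return all messages currently queued for a target, in arrival order."""
    items = []
    while True:
        try:
            items.append(q.get_nowait())
        except queue.Empty:
            return items
```

Because each target has its own queue, a slow consumer does not block the source or the other targets.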
Many organizations have already adopted streaming data for IoT. We list it as transformational
because it is a core component of an event-driven architecture, which in its entirety is considered
transformational.
Examples of technology in this category include but are not limited to messaging software such as
Apache's ActiveMQ, Apache Kafka, MQTT-S, RabbitMQ, and ZeroMQ. Software for posting sensor
events via REST APIs includes Google Apigee Link and Red Hat 3scale. IBM offers Bluemix Message
Hub to connect its IoT platform to IBM's Hadoop Bluemix service.
Pros:
Message queuing technologies offer higher quality of service levels over base transport protocols such as HTTP and MQTT.
Message queuing technologies are not new and as such have a lower level of risk associated with them.
Cons:
HTTP and MQTT methods can result in tightly coupled systems, requiring the source to maintain history in the event of data transmission issues. Sending applications will need to manage potential breaks in network connectivity.
Message queuing services add another layer of complexity to the end-to-end solution, creating additional requirements for monitoring, management, and maintenance.
Message queuing services may insert additional latency into the data transmission.
Streaming Integration
FIGURE 7
Streaming Integration Markers of Momentum
Source: IDC, 2016
Streaming integration technologies are used to provide intermediary functionality between the edge
and central processing facilities. Intermediary functionality may be required to perform protocol
conversion, data normalization, and/or filtering. Streaming integration technology sits between the
collector and the data stream, such as an API gateway into the data stream or a change-data-capture
component listening to a collector's local database. It could also be a component that intercepts
messages from the stream or the target message queue, processes the data, and forwards or puts the
transformed data back into the stream.
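Normalization and filtering in the stream can be pictured as a chain of small steps. The Python sketch below assumes two hypothetical source formats (one reporting Fahrenheit under `temp_f`, one already reporting Celsius) purely for illustration; the field names are not drawn from any product listed in this study.

```python
def normalize(events):
    """Map hypothetical vendor-specific fields onto one common shape."""
    for e in events:
        if "temp_f" in e:   # assumed source A reports Fahrenheit
            celsius = round((e["temp_f"] - 32) * 5 / 9, 2)
            yield {"sensor": e["id"], "temp_c": celsius}
        else:               # assumed source B already reports Celsius
            yield {"sensor": e["sensor"], "temp_c": e["temp_c"]}


def only_at_or_above(events, threshold_c):
    """Forward only events at or above the threshold to cut stream volume."""
    for e in events:
        if e["temp_c"] >= threshold_c:
            yield e
```

Because both steps are generators, they can be composed, for example `only_at_or_above(normalize(raw), 20.0)`, and process events one at a time rather than in batches.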
Examples of software vendors and products in this category include but are not limited to Apache NiFi,
Hortonworks DataFlow, Informatica PowerCenter Real-Time Edition, Oracle GoldenGate,
StreamAnalytix, StreamSets Data Collector, and Striim.
Pros:
Streaming integration is useful if transport and/or data protocol conversion is required between the edge and the stream.
Streaming integration can also be useful to filter, normalize, and reduce the volume of data, relieving the pressure on stream bandwidth and central processing capacities.
Much of the functionality is borrowed from existing segments of data and application integration software markets, so there is low risk associated with the technology.
Cons:
Streaming integration adds more components in the end-to-end solution, resulting in more points of failure that need to be monitored, managed, and maintained, and can increase latency in the data transmission process.
IoT Data Event Services
Thing Event Store
FIGURE 8
Thing Event Store Markers of Momentum
Source: IDC, 2016
Event stores capture and organize sensor data, adding to the store when new sensor data is delivered.
A key attribute is the creation timestamp of the sensor event. Event stores are also created when
streaming analytics is deployed. The event store can be queried by end users, applications, and time
series–based analytical software. Event stores can also be used to backstream for testing and auditing
purposes. Event stores are offered by some vendors as part of their IoT portfolio and also can be
implemented using an in-memory time series database, data grid, or a general-purpose database that
supports time series.
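A minimal sketch of an event store keyed on creation timestamp is shown below. It assumes an in-memory store and uses Python's bisect module to keep events ordered; the EventStore class and its methods are illustrative names, not a vendor API.

```python
import bisect


class EventStore:
    """Keeps sensor events ordered by creation timestamp."""

    def __init__(self):
        self._timestamps = []   # sorted creation timestamps
        self._events = []       # events, kept in the same order

    def append(self, timestamp, event):
        """Insert an event, tolerating out-of-order arrival."""
        i = bisect.bisect_right(self._timestamps, timestamp)
        self._timestamps.insert(i, timestamp)
        self._events.insert(i, event)

    def query(self, start, end):
        """Return events with start <= timestamp < end, oldest first."""
        lo = bisect.bisect_left(self._timestamps, start)
        hi = bisect.bisect_left(self._timestamps, end)
        return self._events[lo:hi]
```

The same time-range query can serve end users, applications, and replay for testing or auditing.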
Examples of IoT-optimized products include GE Digital's Predix Time Series, InfluxData, and Basho's
Riak TS. General-purpose databases that support time series, usually with a special index and logic
that can perform time series analytical functions, include the DataStax version of Cassandra (DSE),
Clusterpoint, and SAP HANA.
Pros:
Is highly efficient for low-latency systems
Is part of an IoT event–driven architecture that simplifies access to the time series of sensor data
Can replace or serve as the data management aspect of a historian to provide similar benefits
Cons:
Not broadly deployed
Increases complexity
Thing Registry and Device Management
FIGURE 9
Thing Registry and Device Management Markers of Momentum
Source: IDC, 2016
A thing registry is a database of IoT things. Things are the devices that are part of the IoT network. Each
thing registered has an ID, name, and properties or attributes that are used to connect, collect
information, and manage devices. Information about hardware and firmware version levels, install
date, maintenance dates, and other static information about each thing are typically collected in the
registry. Location of the thing may be static if it is fixed, but devices that are mobile would more likely
have location as part of the thing state model.
Thing registry is core to an IoT program, enabling connectivity, support of applications, and device
monitoring and management. Some organizations also build their own registry using a database. A
graph database, for example, is useful for registering a thing and its relationships.
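For organizations building their own registry, the core structure (an ID, a name, and a set of static properties per thing) could look something like this Python sketch. The Thing and ThingRegistry names and the example property keys are assumptions for illustration only.

```python
from dataclasses import dataclass, field


@dataclass
class Thing:
    thing_id: str
    name: str
    properties: dict = field(default_factory=dict)   # e.g. firmware, install date


class ThingRegistry:
    """In-memory registry of things, keyed by unique ID."""

    def __init__(self):
        self._things = {}

    def register(self, thing):
        """Add a thing, rejecting duplicate IDs."""
        if thing.thing_id in self._things:
            raise ValueError(f"duplicate thing id: {thing.thing_id}")
        self._things[thing.thing_id] = thing

    def get(self, thing_id):
        return self._things[thing_id]

    def find(self, **criteria):
        """Return things whose properties match every given key/value."""
        return [
            t for t in self._things.values()
            if all(t.properties.get(k) == v for k, v in criteria.items())
        ]
```

A query such as `find(firmware="1.2")` is the kind of lookup that supports bulk device management operations.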
Device management supports bulk operations related to devices, provides diagnostic information, and
handles device actions, such as delivering and installing updates. While device management is not
technically part of event data services, it is paired with the thing registry and thing state machine, so it
made sense to keep these together.
Examples of products include Amazon's AWS Thing Registry, Bosch's IoT Things and IoT Remote
Manager, GE Digital's Predix Edge Manager, IBM Watson IoT Platform Foundation Device
Management, Microsoft Azure's IoT Hub Device Identity Registry, and PTC's ThingWorx Foundation.
Pros:
A registry provides a central repository of things connected to the network.
A registry can be used for analytics of lifetime, runtimes, service history, and inventories.
A registry can be used to identify location for maintenance, repair, and operations (MRO).
Cons:
The registry will need to be maintained, and unless the things themselves are providing the data for the attributes in the registry, manual maintenance could become overwhelming.
Thing State Machine
FIGURE 10
Thing State Machine Markers of Momentum
Source: IDC, 2016
A thing state machine maintains the current status of a thing's sensors. While the thing registry
maintains static information about a thing, the state model maintains the current status of information.
Depending on the complexity, a thing state model may also consist of a series of state models.
Depending on product capabilities, state machines can consist of direct sensor readings as well as
calculated — or derived — state. This derived state may also use analytics to arrive at the state, for
example, scoring the status of a derived property in the state model. Using an event-driven
architecture built around publish-and-subscribe provides a way for multiple thing state models to
subscribe to the same sensor data event, depending on the use case. State models may also be
propagated from edge to cloud and across clouds. Ultimately, state models provide the thing state data
required for custom and packaged IoT-related applications.
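The combination of direct readings, derived state, and publish-and-subscribe delivery can be sketched as follows. ThingState, EventBus, the event fields, and the overheating threshold are all hypothetical names and values chosen for this example, not constructs from any platform named here.

```python
class ThingState:
    """Holds the latest sensor readings plus one derived property."""

    def __init__(self, thing_id, overheat_c):
        self.thing_id = thing_id
        self.overheat_c = overheat_c   # hypothetical threshold for derived state
        self.readings = {}             # sensor name -> latest value

    def on_event(self, event):
        """Update current state from a sensor event addressed to this thing."""
        if event["thing_id"] == self.thing_id:
            self.readings[event["sensor"]] = event["value"]

    @property
    def status(self):
        """Derived state computed from the direct temperature reading."""
        temp = self.readings.get("temperature")
        if temp is None:
            return "unknown"
        return "overheating" if temp >= self.overheat_c else "normal"


class EventBus:
    """Publish-and-subscribe: many state models can receive the same event."""

    def __init__(self):
        self._subscribers = []

    def subscribe(self, handler):
        self._subscribers.append(handler)

    def publish(self, event):
        for handler in self._subscribers:
            handler(event)
```

Two state models with different thresholds can subscribe to the same event and derive different statuses, which mirrors the idea of separate models per use case.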
Not all IoT platforms have a state model construct; some instead store state data in a time series
database or a relational database. Depending on the complexity of the use case, enterprises may
choose to build their own state models using NoSQL database technology.
Examples of products include Amazon's Device Shadows for AWS IoT, PTC's ThingWorx Thing Model,
and Salesforce's Thunder and IoT Cloud.
Pros:
The thing state model is an important asset in an event-driven architecture and for low-code environments, particularly for application development and where nontechnical subject matter experts (SMEs) are developing condition detection and response logic.
The thing state model makes it easier to distribute sensor data to all systems that need the data, particularly in decentralized systems where the design of the system has multiple tiers managed by different vendors or products, such as an edge tier or a middle tier for machine-specific use cases or an interaction tier for customer experience–centric use cases, where there is an advantage in splitting up the design based on assets required in each tier.
Cons:
Not all IoT platforms have this capability, so enterprises may need internal skills to develop and manage it on an ongoing basis.
Not all organizations working on IoT projects are structuring around events, and some may be more comfortable using more familiar databases.
IoT Data Services
Dynamic Data Management
FIGURE 11
Dynamic Data Management Markers of Momentum
Source: IDC, 2016
A dynamic data management system can accept data without requiring that the structure and elements
of the data be defined in advance. These include scalable data collection managers (the most common
being Hadoop) and dynamic DBMSs. Because they do not require the use of SQL, dynamic DBMSs
are sometimes called NoSQL database systems. There are two categories of dynamic DBMS:
Semischematic, where the data may be governed by a schema, but one is not required (Any data may be entered into the database that conforms to the general data format of the DBMS if no schema is present. If a schema is present, it governs the data and optimizes database operation on that basis.)
Nonschematic, where no schema is required, and any data conforming to the general format of the database may be added
The resulting collection of data may end up being rationalized under a schematic structure (in the case
of semischematic), mapped on the basis of field names and values or simply accessed by means of
key-value pairs. Types of dynamic data management systems include:
Document-oriented database systems: Document-oriented database systems manage data blocks containing fields that are identified according to a generally accepted document format. The two most common such formats are Extensible Markup Language (XML) and JavaScript Object Notation (JSON). Examples of products include Amazon DynamoDB, Couchbase, IBM Cloudant, and MongoDB.
Key accessible database systems: Key accessible databases are nonschematic and store data in a way that supports random retrieval by key value or retrieval in key-value order. They are not true database management systems because they merely facilitate the storage and retrieval of data according to certain optimized techniques but do not actually manage the database per se; the applications do that. Examples of products include Amazon SimpleDB, Apache HBase, Basho's Riak, and Oracle NoSQL Database.
This category also includes graph databases and Hadoop, which are covered separately.
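The difference between semischematic and nonschematic behavior can be illustrated with a small Python sketch. The Collection class below is a hypothetical toy, not the behavior of any named DBMS: with no schema it accepts any dictionary-shaped document, and with a schema it enforces only the declared fields.

```python
class Collection:
    """Accepts JSON-like documents; a schema is optional (semischematic)."""

    def __init__(self, schema=None):
        self.schema = schema   # optional mapping of field name -> required type
        self.docs = []

    def insert(self, doc):
        """Store a document, validating it only if a schema is present."""
        if not isinstance(doc, dict):
            # Stands in for "the general data format of the DBMS".
            raise TypeError("documents must be dict-like")
        if self.schema is not None:
            for name, expected in self.schema.items():
                if not isinstance(doc.get(name), expected):
                    raise ValueError(
                        f"field {name!r} must be {expected.__name__}"
                    )
        self.docs.append(doc)
```

Passing `schema=None` gives nonschematic behavior, while passing a schema governs the declared fields and still allows extra, undeclared ones.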
Pros:
Faster, more flexible way to manage data, particularly data structures that change rapidly or do not lend themselves to an RDBMS
Low-latency response times
High scalability
Cons:
This technology can't be used for applications that query using SQL.
There are skills gaps compared with SQL-based systems.
Graph Database
FIGURE 12
Graph Database Markers of Momentum
Source: IDC, 2016
Graph DBMS software manages data as graph structures. These contain objects sometimes called
"nodes" or "vertices" with recursive attributed relationships, sometimes called "edges." The attributes
of the objects and relationships are called "properties." Unlike a fully schematic database, the structure
of a graph database is derived from the relationship structure that is found in the instance data.
Graph databases are used to capture and analyze extremely complex relationship instance structures.
For example, a thing registry could logically be built in a graph database to make it easier to show
relationships between things and networks of things as well as data flows. Graph databases are also
used to support some types of machine learning.
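As a simple illustration of nodes, edges, properties, and relationship traversal, the following Python sketch models a small network of things and finds everything within a given number of hops. The Graph class, node names, and edge property keys are assumptions for this example; a graph DBMS would expose this through its own query language.

```python
from collections import deque


class Graph:
    """Tiny property graph: nodes and undirected edges both carry properties."""

    def __init__(self):
        self.nodes = {}   # node id -> properties
        self.edges = {}   # node id -> list of (neighbor id, edge properties)

    def add_node(self, node_id, **props):
        self.nodes[node_id] = props
        self.edges.setdefault(node_id, [])

    def add_edge(self, a, b, **props):
        """Add an undirected relationship carrying its own properties."""
        self.edges[a].append((b, props))
        self.edges[b].append((a, props))

    def within(self, start, max_hops):
        """Return {node id: hop count} for nodes reachable within max_hops."""
        seen = {start: 0}
        frontier = deque([start])
        while frontier:
            node = frontier.popleft()
            if seen[node] == max_hops:
                continue   # do not expand beyond the hop limit
            for neighbor, _ in self.edges[node]:
                if neighbor not in seen:
                    seen[neighbor] = seen[node] + 1
                    frontier.append(neighbor)
        del seen[start]
        return seen
```

The breadth-first search here is a stand-in for the degree-of-relationship queries that graph databases optimize.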
Graphs are especially useful for discovering previously unknown or little understood relationships.
These relationships can include those arising from behavioral patterns or coincident patterns of
change. With respect to connected devices, these could be such things as tracking customers through
shopping areas using their cell phone location data and correlating this tracking data with that of others
to find useful patterns.
Another example comes from the automotive industry where new cars are heavily instrumented,
regularly transmitting data about the condition of the engine and various wear on parts of the vehicle.
Combine that with geospatial data, geolocation data from the vehicles, and coincident data about
weather and traffic conditions, and it becomes possible to find patterns of relationships between
engine and drivetrain wear, fuel consumption, and various combinations of weather (hot versus cold
and dry versus wet) and traffic (heavy versus light). These patterns, in turn, may be analyzed to a level
of detail that can better inform maintenance service intervals for specific locales and even future
design changes.
Examples of products include Neo Technology's Neo4j, IBM's Bluemix Graph, Blazegraph, Ontotext
GraphDB, OrientDB, Objectivity's ThingSpan (formerly known as InfiniteGraph), and DataStax's DSE
Graph.
Pros:
Unlike other NoSQL DBMSs, a graph DBMS is driven by instance relationships and so makes analysis of patterns and combinations of relationships relatively easy and fast. Unlike an RDBMS, which requires data to conform to a fixed relationship structure, a graph database reveals the relationships inherent in the data, with very little preparation ahead of the data load.
Because actions and consequences in a complex system generally result in changes to data relationship patterns, graph databases can help drive machine learning and other AI-related operations.
Cons:
Because graph databases can make no assumptions about relationships and patterns of relationships in the data, preloading query optimization is not possible. This is different from an RDBMS, where the relationship structures are fixed in the schema, so query plans are typically optimized. This means that the work of graph databases must be focused on situations where relationship pattern discovery is primary; it is not a substitute for an RDBMS. Because of the overhead involved in relationship management, it is also not a substitute for the relatively simple object-by-object processing of a document-oriented database system (e.g., JSON or XML).
Not all graph databases are good at all graph workloads. Some graph databases do text graphing well but fall down with large volume object graphs. Some graph databases are better for relationship traversal (such as finding all objects with at least a fifth-degree relationship to a given object), while others are good at statistical patterns based on large numbers of related objects.
This area is still evolving. There is no one standard graph query language (such as SQL for relational), although TinkerPop is emerging as a framework, and Gremlin is its language. Neo4j offers a language called Cypher. SPARQL is sometimes used for graphs that represent semantic information structures. GraphQL is a graph data access method that uses a RESTful API, though its name would suggest a query language. There are various efforts under way to develop a common query language.
Hadoop
FIGURE 13
Hadoop Markers of Momentum
Source: IDC, 2016
Apache Hadoop is a cluster-based platform for the ingesting and processing of large volumes of data
using a massively parallel processing (MPP) approach. It exists through a group of closely related
Apache open source projects that provide software to manage the cluster and handle the consolidation
of result data across the cluster and various administrative functions. Closely related to Apache
Hadoop are HDFS, which acts as a cluster-based file system, and HBase, which runs on top of HDFS
and acts as a key-value store (a simple NoSQL database that randomly stores and retrieves blocks of
data based on unique key-value pairs). Also commonly used in this context is Apache Hive, a facility
for defining the data in HDFS or HBase for retrieval using SQL-like queries (HiveQL).
The normal mode of processing data, especially new data, in Hadoop is a programming technique
called MapReduce. For IoT and machine learning cases, MapReduce has fallen out of favor as more
users are turning to the high-speed in-memory processing of Spark, either coding natively or in
conjunction with a query processing layer such as Spark SQL. Apache Spark is described in the data
services section under in-memory data processing.
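As a rough illustration of the MapReduce technique named above, the following Python sketch computes a per-sensor maximum in three phases (map, shuffle, reduce) on a single machine. It does not use Hadoop APIs; the record layout and function names are assumptions made for illustration only.

```python
from collections import defaultdict

# Hypothetical sensor records: (sensor_id, temperature reading).
READINGS = [("pump-1", 71.2), ("pump-2", 64.0), ("pump-1", 75.9), ("pump-2", 66.3)]

def map_phase(records):
    """Emit one (key, value) pair per input record."""
    for sensor_id, value in records:
        yield sensor_id, value

def shuffle_phase(pairs):
    """Group values by key, as a framework would do between map and reduce."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped

def reduce_phase(grouped):
    """Collapse each key's values into a single result (here, the maximum)."""
    return {key: max(values) for key, values in grouped.items()}

def max_per_sensor(records):
    return reduce_phase(shuffle_phase(map_phase(records)))
```

In a real cluster the map and reduce steps would run in parallel across nodes, with the shuffle moving data between them.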
Hadoop is commonly used in the following ways:
As an initial ingest engine, accepting data as well as ordering, filtering, and formatting it and then delivering a subset for further processing either in HDFS or on another platform
For the one-time or limited frequency analysis of very large amounts of data
For the long-term storage of data that ought to be retained but is accessed only occasionally
As a clearinghouse or transformation platform as data is moved from system to system, sometimes as a substitute or replacement for an extract, transform, and load facility
As a combination of the aforementioned bullets, commonly called a "data lake"
Apache Hadoop may be downloaded and used directly from the Apache website, but this requires
considerable technical expertise and a willingness on the part of the enterprise to act as its own
software tech support organization. Most enterprises choose instead to use a commercial packaged
distribution of Hadoop, which comes with advanced management tools, professional support, and
regular software updates ready to install.
Examples of commercial Hadoop distributions include Cloudera Enterprise, Hortonworks Data
Platform (HDP), and MapR Converged Data Platform (which includes an indexed file system called
MapR-FS and its companion NoSQL DBMS MapR-DB as substitutes for HDFS and HBase). Also, IBM
bundles Hadoop into IBM BigInsights, Oracle bundles it in Oracle Big Data Appliance (OBDA), and
similarly, Microsoft offers HDInsight. Amazon offers an AWS-optimized variant called Elastic
MapReduce (EMR).
Pros:
Is highly flexible and scalable; can accept any data of any size because the processing
details depend on code
Is cost effective as a storage platform for huge amounts of searchable data, which is
particularly useful for IoT long-term storage of sensor event data
Supports IoT discovery and training, which is critical to the ultimate success of IoT projects
but is not part of an IoT platform
Cons:
Hadoop applications must be coded. There is no schema and no optimizer. The user is responsible for the maintenance of the system and must do work that DBMSs normally do,
such as data structure management and access optimization.
This is a batch-oriented system, so real-time processing of streaming data is not possible.
Where streaming data is involved, it is usually a companion to some stream data processing engine, serving as a back-end storage facility for later processing of historical data after the fact.
Hadoop in its native form is not suitable for random data update and so should not be considered for transaction processing.
In-Memory Data Processing
FIGURE 14
In-Memory Data Processing Markers of Momentum
Source: IDC, 2016
In-memory data processing platforms enable large-scale data-centric operations to be carried out
entirely in memory, without reference to storage. This sometimes takes the form of loading the data
from some source (such as a database) into memory and maintaining it there for analytic query
processing. It can also take place by managing the data in memory as a database, using snapshots
and logs, or replication to prevent data loss in case of system failure.
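The second approach described above (managing data in memory while using snapshots and logs to prevent data loss) can be sketched as a toy key-value store. This is a minimal sketch under stated assumptions: the class and its methods are hypothetical, and the "log" and "snapshot" are in-process stand-ins for what a real product would write to durable storage.

```python
import json

class InMemoryStore:
    """Toy in-memory key-value store made recoverable by a log plus snapshots."""

    def __init__(self):
        self.data = {}        # live data, held only in memory
        self.log = []         # stand-in for an append-only log on durable storage
        self.snapshot = "{}"  # stand-in for the last snapshot written to storage

    def put(self, key, value):
        # Record the change in the log before applying it in memory.
        self.log.append(json.dumps({"key": key, "value": value}))
        self.data[key] = value

    def take_snapshot(self):
        # Persist the full state; earlier log entries are no longer needed.
        self.snapshot = json.dumps(self.data)
        self.log.clear()

    def recover(self):
        """Rebuild memory state: load the snapshot, then replay the log."""
        self.data = json.loads(self.snapshot)
        for entry in self.log:
            record = json.loads(entry)
            self.data[record["key"]] = record["value"]
```

Recovery time in this scheme depends on how much log has accumulated since the last snapshot, which is one reason snapshots are taken periodically.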
The most common of the former type of in-memory data processing platform is Apache Spark. This
facility is run on a cluster, holds data in memory, and performs MPP-based queries on the data. It is
optimized for speed. Spark is most commonly deployed on a Hadoop cluster, using the HDFS (or
HBase) layer for its storage, but it is also run on top of the wide column database, Apache Cassandra,
and can even run on its own clusters. This last configuration is becoming more and more common on
AWS, where it uses the S3 layer for its storage.
Spark is popular for data operations on large data collections where an outcome is expected
immediately or nearly immediately or to speed up time-consuming analytics training. This contrasts it
with Hadoop MapReduce, which is not typically used for interactive query because of the batch nature
of its processing. Spark is also used to collect streaming data, making it available for nearly immediate
use.
Examples of in-memory data processing include Apache Spark, Apache Flink, Apache Ignite,
Databricks, and GridGain. In addition to the Hadoop distributors listed in the Hadoop section, there
are many commercial Spark distributions. Databricks is a pure-play Spark distributor.
Pros:
In-memory data processing is much faster than Hadoop MapReduce and is assuming increasing amounts of the latter's workloads.
Spark has a range of other projects and a growing ecosystem around it that are designed to add value and functionality to the basic platform. These include MLlib for machine
learning, GraphX for graph support, Spark Streaming for streaming data ingestion, and Spark SQL. There are also examples of using Spark in combination with GPUs to speed up model training, particularly for highly complex use cases.
Cons:
Like Hadoop, Spark and similar products require a lot of hand coding to make solutions work.
This category is still evolving. Spark, in particular, is evolving rapidly, and new versions are not always fully compatible with previous versions, which means that some adaptation of applications to successive versions of Spark may be necessary.
In-Memory Relational
FIGURE 15
In-Memory Relational Markers of Momentum
Source: IDC, 2016
The in-memory relational technology is found in memory-optimized RDBMSs (i.e., they are optimized
for the management of data in memory as opposed to in storage). Some of these databases are
designed mainly for transaction processing, some mainly for analytical processing, and some do both.
Typically, the analytic RDBMSs in this category are columnar, and most use a compression technique
that not only saves memory but also ensures that the data is organized optimally for query processing by
enabling the entire microprocessor data cache to be used with data test operators (e.g.,
equals, not equals, greater than, and less than). This makes the use of single instruction multiple data
(SIMD) operations possible, greatly increasing processing speed. RDBMSs that mainly process
transactions typically hold the data in rows. Those that handle mixed workloads may hold some data in
rows, some in columns, or in some cases, other formats designed to minimize instructions and
memory access.
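The difference between row and column layouts described above can be illustrated with a small Python sketch. Python cannot demonstrate SIMD execution itself; the sketch only shows why a columnar layout suits a scan that applies one test operator (here, greater than) to every value of one attribute. All names and data are hypothetical.

```python
from array import array

# Row layout: each record is stored together (suits transaction processing).
rows = [("s1", 20.5), ("s2", 31.0), ("s3", 28.4), ("s4", 35.2)]

# Column layout: each attribute is stored contiguously (suits analytic scans).
sensor_ids = [row[0] for row in rows]
temperatures = array("d", (row[1] for row in rows))

def scan_greater_than(column, threshold):
    """Apply one test operator across a whole column; return matching positions."""
    return [i for i, value in enumerate(column) if value > threshold]

def hot_sensors(threshold):
    """Resolve matching positions back to sensor identifiers."""
    return [sensor_ids[i] for i in scan_greater_than(temperatures, threshold)]
```

Because the temperature values sit next to one another in memory, a database engine can load several of them into a vector register and compare them in a single instruction.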
Some of these in-memory relational databases can accept streaming data at speed, allowing queries
that include current and previously collected data to execute on a very timely basis. Other databases
are simply designed to process transactions very quickly or support complex queries very quickly. All
of these RDBMSs use various techniques including persistent transaction logging and snapshotting to
ensure recoverability so that data loss is no more a concern with them than with storage-based
RDBMSs.
Examples of in-memory RDBMSs include Altibase, deepSQL, MemSQL, Oracle TimesTen, SAP
HANA, and VoltDB.
Pros:
SQL is the most commonly understood query language in the IT world, and these products are optimized for it.
In-memory relational technology delivers speed with structure in a familiar format.
Cons:
Requires the data to conform to the schema of the database, so it is really only usable where the data is well understood and its format does not change much
Requires systems with large amounts of memory, which could be a cost concern
Open Data Platform
FIGURE 16
Open Data Platform Markers of Momentum
Source: IDC, 2016
In IoT, the open data platform is a combination of technologies, integrated together, that enable the
management of and access to relevant IoT and enterprise data regardless of where it sits and what its
format is. The components of such a platform usually include a data integration engine capable of
dynamic data integration (rather than batch data integration), composite data frameworks for
federation and virtualization, data transports, connectors to established databases, stream processing
for incoming streaming data, and a central processing engine, which is increasingly being housed
within Hadoop.
Examples of the use of such a platform in an IoT context could include such things as an automotive
service center comparing readings from sensors in your vehicle with data regarding recent problems
and service to determine whether a service call is in order, a bank with a smart app on your phone
comparing movements and purchases with your general purchase pattern to determine if there is a risk
that your phone was stolen, and a utility company comparing patterns of electricity usage from smart
meter data with historical patterns to determine if changes in the distribution of power on the grid are
warranted.
The open data platform is synonymous with the term unified data platform. In some cases, streaming
data and streaming integration software are used in conjunction with broad database and data
management capabilities to offer a comprehensive open data platform. Enterprises may also choose to
build their own from the four core components.
Examples of products that deliver such functionality include Teradata with its Unified Data Architecture
(UDA), Informatica with the Informatica Platform, PluralSoft with a unified data architecture focused on
healthcare, and IBM Watson Data Platform.
Pros:
Ensures data consistency and provides access to data and the ability to find what you need when you need it
Offers an opportunity to provide business context for IoT data through integration of in-motion and at-rest data
Cons:
Open data platforms are early in their development and deployment. Many of these offer a set of capabilities that need to be assembled for technical and business use cases, and assembly may not be trivial. As these platforms become more widely used, standard architectures and
best practices will emerge, but for now, this represents a high-risk component.
Nontrivial assemblies lead to complex monitoring, management, and maintenance.
IoT Value-Added Data Services
Blockchain
FIGURE 17
Blockchain Markers of Momentum
Source: IDC, 2016
Blockchain provides a decentralized chain of trust for transactions against an object. Blockchain
originates from bitcoin, and many of the first applications of blockchain technologies are focused on
financial services: payments, equities, and money transfers. However, blockchain can be applied
beyond financial transaction use cases to provide a chain of trust for any type of transaction against
any type of object — real or virtual. The value of the blockchain is that it can be trusted, and it is
distributed, not centralized, providing full provenance of the data on the chain.
Blockchain in IoT can be used to validate that data being received from a thing is actually from that
thing and not an imposter. Likewise, instructions from a source to update a thing can also use
blockchain for validation. Blockchain can also be used to represent the most recent state of a thing,
potentially as an alternative to the thing registry and state model because the blockchain keeps an
immutable record of the history of the thing, and could represent the current state. Every trusted
application that needs access to data about the thing will have a local copy of the thing's chain. When
new blocks are added, the distributed chain is also updated. However, these are still speculations on
how the technology could be applied. There is a lot of work and innovation yet to happen before the
most appropriate use cases of blockchain in IoT emerge.
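To make the idea of an immutable history of a thing more concrete, the following Python sketch builds a simple hash chain in which each block records the hash of the block before it. It is a minimal sketch under stated assumptions: it omits distribution, consensus, and digital signatures entirely, and the block layout and function names are hypothetical rather than taken from any blockchain product.

```python
import hashlib
import json

GENESIS = "0" * 64  # placeholder "previous hash" for the first block

def block_hash(index, prev_hash, payload):
    """Hash a block's contents together with the previous block's hash."""
    body = json.dumps({"index": index, "prev": prev_hash, "payload": payload},
                      sort_keys=True)
    return hashlib.sha256(body.encode("utf-8")).hexdigest()

def append_block(chain, payload):
    """Append a new state record for a thing to the end of the chain."""
    prev_hash = chain[-1]["hash"] if chain else GENESIS
    index = len(chain)
    chain.append({"index": index, "prev": prev_hash, "payload": payload,
                  "hash": block_hash(index, prev_hash, payload)})

def is_valid(chain):
    """Recompute every hash; any edit to an earlier block breaks the chain."""
    prev_hash = GENESIS
    for index, block in enumerate(chain):
        if block["index"] != index or block["prev"] != prev_hash:
            return False
        if block["hash"] != block_hash(index, prev_hash, block["payload"]):
            return False
        prev_hash = block["hash"]
    return True
```

In this sketch the last block's payload plays the role of the thing's current state, while the earlier blocks preserve its history.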
Blockchain in IoT is still very much in its infancy, although some vendors are releasing technology
building blocks, including the IBM Watson IoT Platform and Chronicled, which has launched an
Ethereum IoT registry based on blockchain. Slock.it is a start-up at the intersection of blockchain and
IoT applications in the sharing economy.
It is also not clear whether the blockchain technology used in bitcoin will be exactly the same as that used in
IoT use cases or whether the term will instead come to represent an ultra-secure method for
guaranteeing decentralized data integrity. For example, Ericsson's data integrity service is based on a
keyless signature infrastructure (KSI) that offers similar guarantees and has similar constructs. GE and
Ericsson offer this in the GE Predix catalog as a blockchain-enabled service, which digitally signs and
verifies data to assure that configurations, firmware, and data have not been compromised and also
provides a blockchain-like chain of custody. This doesn't appear to be based on a bitcoin proof of work.
Pros:
Data in a blockchain is tamper proof and does not exist in a single location, so it cannot be maliciously modified.
There is no single thread of communication that can be intercepted, preventing man-in-the-middle attacks from occurring.
Blockchain technology may enable autonomous functioning of smart things without the need for a centralized authority.
Every participant in the chain has the most recent version of the truth and state of each thing.
Cons:
Latency of transaction validation in a blockchain network is still very high because every
actor needs to agree the transaction is valid.
Integration of blockchain technologies with legacy systems in existing organizations may
be daunting.
There is still a lot of research and development happening with blockchain, making this a
high-risk technology at this point.
Data as a Service
FIGURE 18
Data-as-a-Service Markers of Momentum
Source: IDC, 2016
Data as a service (DaaS) represents the data and/or content that is produced or derived as a
by-product of the usual economic activity in commercial and public sectors. These data assets may be in
the form of raw data or various value-added content such as lists, data feeds, scores, algorithms,
recommendations, or benchmarks. DaaS offerings are consumed to improve various types of analytics, with the
ultimate goal of improving the quality of decisions. DaaS also represents the opportunity for
organizations to sell their own data to third parties, either as raw data or as a component that enhances
existing products and services.
For example, inclusion of weather or location data — two ubiquitous DaaS options — can enhance
predictive asset maintenance or logistics optimization processes. Organizations providing DaaS
include those in commercial enterprises and government agencies that generate the original raw data
and companies that locate, extract, mine, aggregate, enrich, and/or curate data for resale. There is a
broad range of data providers, brokers, and marketplaces.
In IoT, there are a handful of general-purpose DaaS, such as weather and location data, but there are
also many other specialized, industry-specific, and business process–specific data services. Examples
of data services include GE SmartSignal, Michelin solutions, Volkswagen Car-Net, IBM's The Weather
Company, Pirelli, MyJohnDeere.com, and Verizon's Precision Market Insights.
Pros:
As consumers of external (third-party) data, organizations can enhance their analytic models with the availability of more data and augment their things master data.
As producers of data or various derived value-added content, organizations have the opportunity to monetize such data assets either directly (by selling data to third parties) or
indirectly (by incorporating data into other services they provide).
Cons:
Use of external data can create additional challenges in data integration and data integrity management.
Monetizing one's data is a complex task that requires creation of a strategy and specific plans for packaging, pricing, and ongoing maintenance and delivery of such data products.
IoT Analytics
Rich Media Analytics
FIGURE 19
Rich Media Analytics Markers of Momentum
Source: IDC, 2016
Rich media analytics solutions identify objects, entities, events, attributes, or patterns of behavior
(including temporal and spatial events, either in real time or post event) through the detection,
determination, and analysis of video and image data. Use cases for these solutions include security,
object identification, video monitoring/tracking, image search, automatic alerting, forensic analysis,
image categorization, and pattern, image, and shape recognition.
The amount of rich media data that needs to be analyzed and understood is increasing exponentially
with growth of the internet and mobile devices that capture images and videos on a more or less
constant basis. However, IDC estimates that much of this data is useless unless some type of
analytics is applied to it.
The market and opportunities for image and video analytics are growing significantly. Many
organizations would like to be able to monetize images for ecommerce. In addition, there is increased
interest in automated solutions for video surveillance — of human and nonhuman activity.
Organizations are also looking at using video and images as part of the data needed to understand
and improve customer experiences, along with social media data, geolocation information, and
transactional sales data. Video data and video surveillance are being used in a variety of ways by
many different organizations. Governments and enterprises are primary users of image and video
analytics today.
Companies offering image and video analytics include Hitachi, Fujitsu, NEC, Sony, JustVisual, HPE,
IBM, Clarifai, Cortexica, Ramp, Aventure, IntelliVision, 3VR, Accenture, and ObjectVideo.
Pros:
The exponentially increasing amount of image and video content offers an opportunity to apply rich media analytics technology to extract valuable information and knowledge.
Image and video analytics can add another dimension to text-based knowledge in diverse areas from healthcare to terrorism investigation to Internet of Things.
Image and video analytics can provide real-time feedback and information for cognitive decision making in areas such as robotics, drones, and driverless vehicles.
Cons:
This area is still emerging, and identifying objects, patterns, and visual cues correctly can be prone to errors depending on the algorithms and tools used.
Many of these tools use extensive amounts of machine learning, which is highly processing intensive.
Relating entities and objects from video to textual records and information can be challenging.
Statistical Analysis
FIGURE 20
Statistical Analysis Markers of Momentum
Source: IDC, 2016
Statistical analytics software includes packages that use a range of statistical techniques to create,
test, and execute models for analyzing IoT data. This genre falls into the advanced and predictive
analytics software segment of business intelligence and analytics tools. Sample techniques used
include descriptive and predictive analysis, regression, and clustering.
Statistical analytics is used to discover relationships in data that are hidden, not apparent, or too
complex to be extracted by other means, and to make predictions, including when there is not enough
data for other types of modeling. An example use case in IoT is predictive maintenance, where analysis
of sensor data provides predictions about components that will be in imminent need of maintenance.
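The predictive maintenance use case can be sketched with one of the techniques listed above, regression. The following Python example fits a straight line to vibration readings by ordinary least squares and estimates the operating hour at which the trend would reach a maintenance threshold. The data, the threshold, and the assumption of a linear trend are all hypothetical; a real model would be chosen and validated by an analyst.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def hour_of_crossing(threshold, slope, intercept):
    """Operating hour at which the fitted line reaches the threshold, if ever."""
    if slope <= 0:
        return None  # no upward trend, so no crossing is predicted
    return (threshold - intercept) / slope

# Hypothetical readings: operating hours and vibration level (mm/s).
hours = [0, 100, 200, 300]
vibration = [2.0, 2.5, 3.0, 3.5]
slope, intercept = fit_line(hours, vibration)
predicted_hour = hour_of_crossing(5.0, slope, intercept)
```

The statistical packages named in this section wrap this kind of calculation, along with diagnostics and many other techniques, behind a graphical interface or a modeling language.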
Most statistical analytics packages use programming languages that might be proprietary or open
source or a combination. Most packages also include a graphical user interface that allows analysts to
interact with the software and build models with no or minimal programming.
Examples of products include SAS Analytics and SAS Enterprise Miner, IBM SPSS, SAP Predictive
Analytics and SAP Infinite Insight, and Oracle Data Mining. SPSS is a component of IBM Watson IoT
Platform. In addition, open source modeling languages like R are commonly used by advanced data
analysts.
Pros:
This technology can be used where large gaps exist in data models or where data models
are incomplete.
Data can be easily imported from Excel files or other formats.
A variety of statistical techniques for analyzing data can be used. Most packages allow power users to use programming languages for complex analyses that cannot be done
with graphical user interfaces.
Cons:
These tools can be fairly complex to use. Users need to be sophisticated in concepts of statistics and data mining and programming in order to be able to take full advantage of
the capabilities of these tools.
Statistical packages typically cannot account for all factors that might affect an outcome,
especially those that cannot be expressed as structured data.
These tools are not suitable for data manipulation or data preparation. They assume that
the data is cleansed, validated, and prepared, and hence bad data will result in poor predictions.
Streaming Analytics
FIGURE 21
Streaming Analytics Markers of Momentum
Source: IDC, 2016
Streaming analytics continuously evaluates and correlates events to detect anomalies and conditions
requiring further action. Events are received and correlated one at a time in real time or in
microbatches, and the logic typically involves a time window. A condition model managed within the
event processing engine describes:
The relationship of two or more events to each other
The relationship of two or more data elements within a single event
The relationship of a new event and the math or logic that should be applied to the event
The comparison of a desired state and the current state
A simple correlation example is comparing the newest vibration sensor data event associated with a
machine against a threshold level. More complex models ingest streams of data from multiple sources
to identify patterns that may be more appropriate for systems of things than for
individual things.
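A condition model of the simple kind described above can be sketched in a few lines of Python. The example evaluates each vibration event as it arrives and flags the machine when the mean of the readings inside a time window exceeds a threshold. The class name, window length, and threshold are assumptions for illustration and are not drawn from any streaming analytics product.

```python
from collections import deque

class WindowedThreshold:
    """Flag a machine when the mean of recent readings exceeds a threshold."""

    def __init__(self, window_seconds, threshold):
        self.window_seconds = window_seconds
        self.threshold = threshold
        self.events = deque()  # (timestamp, value) pairs inside the window

    def on_event(self, timestamp, value):
        """Process one event and return True if the condition is met."""
        self.events.append((timestamp, value))
        # Evict events that have fallen out of the time window.
        cutoff = timestamp - self.window_seconds
        while self.events and self.events[0][0] <= cutoff:
            self.events.popleft()
        mean = sum(v for _, v in self.events) / len(self.events)
        return mean > self.threshold
```

Events are handled one at a time, and only the contents of the current window are kept in memory, which is what allows this style of logic to run with low latency.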
Models can include calculations of statistical probabilities and use of algorithms, rulesets, or code that
describe a condition. Models can describe the presence or an absence of a pattern as well as time
logic between data events within a time window. Streaming analytics in runtime tends to be publish-
and-subscribe, with nodes listening for the results of a previous node. This software often constructs a
series of nodes that handle individual processing steps, such as ingesting a sensor event and
correlating it to others, and if the node has an output (a derived event), it serves as input to the next
node or any node subscribing to that output. The next node may enrich the derived event by looking up
customer information or thing information, with that output serving as input to the next node, which
may apply a rule or algorithm to determine the next best action.
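The chain of nodes described above (correlate, enrich, then decide) can be sketched with a minimal in-process publish-and-subscribe bus. This is an illustrative sketch only: the topic names, event fields, threshold, and lookup table are hypothetical, and real products distribute these nodes across processes or machines.

```python
from collections import defaultdict

class Bus:
    """Minimal in-process publish-and-subscribe bus."""

    def __init__(self):
        self.subscribers = defaultdict(list)

    def subscribe(self, topic, handler):
        self.subscribers[topic].append(handler)

    def publish(self, topic, event):
        for handler in self.subscribers[topic]:
            handler(event)

def build_pipeline(bus, thing_info):
    """Wire three nodes together; return the list that collects chosen actions."""
    actions = []

    def correlate(event):
        # Node 1: test the sensor event and emit a derived event if needed.
        if event["vibration"] > 5.0:
            bus.publish("derived", {"thing": event["thing"],
                                    "condition": "high_vibration"})

    def enrich(event):
        # Node 2: add thing information looked up from reference data.
        bus.publish("enriched", {**event, **thing_info.get(event["thing"], {})})

    def decide(event):
        # Node 3: apply a rule to choose the next best action.
        actions.append(("schedule_inspection", event["thing"], event.get("site")))

    bus.subscribe("sensor", correlate)
    bus.subscribe("derived", enrich)
    bus.subscribe("enriched", decide)
    return actions
```

Each node knows only the topic it listens to and the topic it publishes to, so nodes can be added or replaced without changing the others.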
The term streaming analytics was first used as a subcategory of complex event processing (CEP).
Today, the term is used synonymously with CEP. There are a variety of software products that handle
streaming analytics, both open source and proprietary. Many value-added offerings are extending
Spark and Spark Streaming.
Examples of products in this category include Apache Kafka Streams, Apache Storm, Apache Spark
Streaming, AWS Kinesis, IBM Streams, Microsoft Azure Stream Analytics, Salesforce Thunder, SAS
Event Stream Processing, SQLStream, and TIBCO's Streambase and BusinessEvents.
Pros:
Can be used when requirements call for low-latency detection of conditions, particularly
under high data volume conditions
A central component of event-driven design that is oriented to decision support and
decision automation
Can be used in a compact way at the edge compared with other analytical techniques
Can plug in machine learning as part of a stream, supporting hybrid cognitive/programmatic use cases
Can be used for preprocessing events that need to be correlated before moving to a different analytical environment
Cons:
Some organizations may opt not to use streaming analytics when they have many applications that rely on mature data management systems. Instead, these organizations will improve their data refresh rates and forgo event-driven design.
Popular open source–based streaming analytics software is less mature, missing many of the key elements that are present in proprietary streaming analytics.
There is a scarcity of developer skills in use of streaming analytics.
Supervised Learning
FIGURE 22
Supervised Learning Markers of Momentum
Source: IDC, 2016
Supervised machine learning begins with examples of training data paired with identifying labels (e.g.,
right or wrong and positive or negative) selected from the categories to be learned. Using these pairs
of example data and labels ("training data"), the system learns parameters of statistical models that it
can then generalize to unlabeled examples of data items that were not seen in the training data ("test
data"). In most cases, the learned models improve over time via a feedback loop that adjusts the
model parameters to better reflect additional sets of training or production data. The performance of a
learned model can be measured by simple prediction accuracy or by the particular business metric the
learned model is designed to support. Performance depends on the degree to which the training data
matches the real world, the choice of algorithm, the algorithm's parameters, and the quantity of data.
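The workflow described above (learn model parameters from labeled examples, then measure prediction accuracy on examples not seen in training) can be sketched with a very simple classifier. The nearest-centroid method used here is chosen only because it is short; it is not a technique named by the source, and the feature values and labels are hypothetical.

```python
def train(examples):
    """Learn one centroid (mean feature vector) per label from (features, label) pairs."""
    sums, counts = {}, {}
    for features, label in examples:
        total = sums.setdefault(label, [0.0] * len(features))
        for i, x in enumerate(features):
            total[i] += x
        counts[label] = counts.get(label, 0) + 1
    return {label: [s / counts[label] for s in total]
            for label, total in sums.items()}

def predict(model, features):
    """Assign the label whose centroid is closest to the given features."""
    def squared_distance(centroid):
        return sum((a - b) ** 2 for a, b in zip(features, centroid))
    return min(model, key=lambda label: squared_distance(model[label]))

def accuracy(model, examples):
    """Fraction of examples whose predicted label matches the true label."""
    hits = sum(predict(model, f) == label for f, label in examples)
    return hits / len(examples)

# Hypothetical labeled sensor readings: (vibration, temperature delta).
training_data = [([1.0, 0.2], "normal"), ([1.2, 0.1], "normal"),
                 ([4.8, 2.9], "faulty"), ([5.1, 3.2], "faulty")]
test_data = [([0.9, 0.3], "normal"), ([5.0, 3.0], "faulty")]
model = train(training_data)
```

Retraining with additional labeled data simply recomputes the centroids, which is a small-scale version of the feedback loop the text describes.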
Companies like IBM, IPsoft, Wipro, Intel's Saffron Technologies, Infosys, CognitiveScale, and Tata
Consultancy Services include machine learning capabilities in their cognitive system platforms that
allow developers and enterprises to build cognitively enabled "smart" applications that learn over time.
In addition, vendors such as Google, Amazon, Microsoft, and Skytree offer commercial machine
learning libraries as standalone tools. There are also many free and open source machine learning
packages, including Apache Spark's MLlib, which is designed to make machine learning easy and
useful inside the popular Apache Spark framework for cluster computing. In addition, Microsoft
recently open sourced its distributed machine learning library, DMTK, under an MIT License.
Additional open source software includes Waikato Environment for Knowledge Analysis (Weka) and
Massive Online Analysis (MOA) from the University of Waikato and H2O.
Deep learning is a particular type of supervised machine learning based on neural network algorithms,
which has seen recent commercial success. Google released its second-generation deep learning
library, TensorFlow, to open source. Other open source deep learning libraries include Caffe from the
University of California, Berkeley; Theano from the University of Montreal; Torch from Idiap, used
extensively by Google and Facebook; and Weka and H2O.
Pros:
Supervised learning algorithms can learn quickly from examples and self-correct when changing trends are reflected in new sets of labeled data.
Advances in computing power and ever expanding sources of data make advanced algorithms possible.
Lots of investment by vendors and venture capital firms is leading to rapid progress.
Cons:
Finding or creating the required labeled data is costly and difficult.
A wide range of options make vendor selection tricky. It can go from being quite inexpensive with open source to very expensive with large vendor offerings. The less costly options will entail lots of internal resources to make it work.
As advanced as these products have become, there are still challenges in achieving objectives when there are large numbers of variables and interdependencies for a
particular decision.
Subject matter experts are needed to assist with the initial and ongoing review of data
training sets, which may prove costly and time consuming.
Learning algorithms must balance the bias-variance trade-off, ideally adjusting it automatically based on the amount of
available data and the discernible complexity of the function to be learned.
Unsupervised Learning
FIGURE 23
Unsupervised Learning Markers of Momentum
Source: IDC, 2016
Unsupervised machine learning is another variation of machine learning where algorithms detect and
discern attributes and features without the benefit of labeled training data. Some algorithms cluster
data into meaningful groups by finding centers of data density. Other unsupervised algorithms use
dimensionality reduction techniques (like singular value decomposition) to uncover the essential
attributes of the data without requiring a human to define those attributes in advance. This is
particularly useful for "unstructured" data, such as images or text, where an underlying structure can
be automatically inferred, enabling other algorithms to leverage the data.
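Clustering without labels can be illustrated with k-means, one common clustering technique, used here as a simple stand-in because the text does not name a specific algorithm. The sketch below groups one-dimensional readings around two centers; the data, the starting centers, and the fixed iteration count are assumptions.

```python
def kmeans_1d(values, centers, iterations=10):
    """Alternate between assigning values to the nearest center and re-centering."""
    centers = list(centers)
    clusters = [[] for _ in centers]
    for _ in range(iterations):
        clusters = [[] for _ in centers]
        for v in values:
            nearest = min(range(len(centers)), key=lambda i: abs(v - centers[i]))
            clusters[nearest].append(v)
        # Move each center to the mean of its cluster; keep it if the cluster is empty.
        centers = [sum(c) / len(c) if c else centers[i]
                   for i, c in enumerate(clusters)]
    return centers, clusters

# Hypothetical unlabeled readings that happen to form two groups.
readings = [1.0, 1.2, 0.8, 9.0, 9.5, 10.1]
centers, clusters = kmeans_1d(readings, centers=[0.0, 5.0])
```

No labels are supplied at any point; the grouping emerges from the data alone, although the number of clusters and the starting centers still have to be chosen.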
Unsupervised learning algorithms based on clustering, dimensionality reduction, and neural networks
can be found in most major commercial and open source packages, including Apache Mahout,
Gensim, Environment for DeveLoping KDD-Applications Supported by Index-Structures (ELKI), and
word2vec, to name a few. These algorithms are used in clustering challenges with unstructured data
such as image categorization, text analytics, or speech recognition. Companies offering unsupervised
learning capabilities include Numenta, Nervana Systems, Loop AI Labs, Luminoso, Clarifai, H2O.ai,
and MetaMind.
For specific IoT use cases, IBM offers Streaming Analytics Service and Watson Machine Learning
Service, which provides more than 200 algorithms for analytics and detection in streaming IoT data.
Pros:
Unsupervised machine learning does not need labeled training data, enabling faster implementation.
It requires little or no user intervention.
It provides low cost of ownership from reduced staffing and hardware requirements.
Self-learning systems provide automatic system updates.
Cons:
Unsupervised learning needs a lot of data to develop good models.
Training data needs to be representative.
Training systems risk being overtrained.
IoT Conditions and Actions
Low-Code Rules
FIGURE 24
Low-Code Rules Markers of Momentum
Source: IDC, 2016
A major driver of IoT is the ability to connect sensor data to analytics to detect and predict conditions
that warrant a response. The consequence of this shift from reactive problem solving to predictive
problem anticipation is the large volume of decisions that must be made quickly to determine how to
respond. In predictive systems, decision conditions are similar to Big Data: there are volume, velocity,
and variety consequences, which we can almost think of as big decisions.
Rules engines and decision services are not widely adopted in IoT, but there is some recognition that
decision automation and decision support are needed areas of investment, both operationally and in
support of sensor-based customer experience initiatives. Rules software abstracts the conditional,
decision-oriented logic from system and application logic. This software is used to create rules that
assign and route work, standardize how decisions are made, and automate decisions.
While this type of reasoning is typically embedded as code in systems, the abstraction improves the
ability to make changes rapidly as situations change. Subject matter experts become the managers of
decision assets rather than developers who, without rules, make changes to conditional logic
embedded in systems through change requests.
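The abstraction described above, in which conditional logic is separated from application logic, can be sketched by holding rules as data and evaluating them with a small generic function. This is a minimal sketch, not the design of any rules product; the rule format, field names, and actions are hypothetical.

```python
import operator

# Supported comparison operators for rule conditions.
OPERATORS = {">": operator.gt, "<": operator.lt, "==": operator.eq}

# Rules live in data, so they can be changed without touching application code.
RULES = [
    {"field": "temperature", "op": ">", "value": 90, "action": "shut_down"},
    {"field": "vibration", "op": ">", "value": 5.0, "action": "schedule_inspection"},
]

def decide(event, rules):
    """Return the actions of every rule whose condition the event satisfies."""
    actions = []
    for rule in rules:
        field = rule["field"]
        if field in event and OPERATORS[rule["op"]](event[field], rule["value"]):
            actions.append(rule["action"])
    return actions
```

Changing a threshold or adding a rule means editing the `RULES` list only, which is the kind of change a subject matter expert could make through a rules tool instead of a change request to developers.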
Examples of low-code rules software include IBM Bluemix Business Rules, which includes a recipe to
integrate rules with the Watson IoT Platform. Because rules design requires Eclipse, Business Rules is
more oriented toward developers than subject matter experts but is capable of supporting
sophisticated operational requirements. IBM also offers Real-Time Insights as part of its IoT platform.
Sapiens DECISION is a sophisticated horizontal offering that provides a development environment for
SMEs; it can be deployed in a decentralized manner and can also support a centralized on-demand decision service.
Salesforce IoT Cloud is in preview. It provides a consumer-grade experience that allows end users to
identify events and map them to conditions and conditions to actions. Red Hat BRMS is based on the
open source Drools.
Pros:
Improves the efficiency of managing decisions as assets in environments where high volumes of decisions need to be made efficiently and rapidly
Works in conjunction with analytics to provide precision in how to route actions
Speeds up development and change management
Cons:
Managing decisions as assets with low-code rules can be complex.
The traditional, more highly adopted rules engines can be too hard to use for nondevelopers, and the newer, easier-to-use rules engines may be too simple for
sophisticated use cases.
Low-Code App Platform
FIGURE 25
Low-Code App Platform Markers of Momentum
Source: IDC, 2016
Low-code application platforms combine development and runtime into a single offering. They typically
consist of graphical modeling environments to describe workflows, data objects, and forms; point-and-
click configurations; and relatively simple scripting. These environments are popular for rapid
development as well as for development teams that include both business participants and developers.
In IoT, low-code platforms are useful for automating workflows, for building mobile apps, and for
assigning and managing tasks. Moving into the future, the low-code platforms will be useful for
designing and automating interactions used to manage an event-based customer experience.
IoT-specific examples of products include IBM's Node-RED and PTC's ThingWorx Foundation.
Generalized low-code workflow or mobile app environments include Nintex Workflow Cloud, Alfresco
Activiti, Bonitasoft's Bonita BPM, BP Logix, Appian, and Salesforce Lightning.
Pros:
Useful for application design involving collaboration between process experts and developers
Fast development cycles
Ability to provide short-term situational apps
Cons:
May not offer the control developers need for specific use cases
IoT Visibility
Operational Intelligence
FIGURE 26
Operational Intelligence Markers of Momentum
Source: IDC, 2016
Operational intelligence (OI) captures operational data in near real time, continuously or in microbatches,
correlates the data against relationships within the data streams, key performance indicators (KPIs),
service-level agreements (SLAs), or time series data, and delivers the results into a dashboard, ideally
with a drill path to explore root cause. The goal is to spot an operating condition immediately that can
be fixed within a current time window to improve operating performance. Alerts are an important part of
OI, as is linking alerts to third-party actions that trigger and manage the response.
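The monitoring pattern described above can be sketched as follows. This is a minimal illustration, assuming a hypothetical KpiMonitor class that compares a rolling average of sensor values against a single KPI threshold; the class name, the alert payload, and the threshold are assumptions made for the example and do not reflect any product named in this study.

```python
from collections import deque


class KpiMonitor:
    """Raise an alert when the rolling average of a metric exceeds a KPI threshold."""

    def __init__(self, threshold, window_size):
        self.threshold = threshold
        # A bounded deque drops the oldest value once the window is full.
        self.window = deque(maxlen=window_size)

    def observe(self, value):
        """Record one value; return an alert dict on a breach, otherwise None."""
        self.window.append(value)
        if len(self.window) < self.window.maxlen:
            return None  # not enough data yet to evaluate the KPI
        average = sum(self.window) / len(self.window)
        if average > self.threshold:
            return {"alert": "kpi_breach", "rolling_average": average}
        return None
```

A caller would feed each incoming event to observe and hand any returned alert to whatever system triggers and manages the response; a production OI tool would add the dashboard, drill path, and correlation across streams that this sketch omits.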
The idea behind OI is creating business value by identifying and solving one operational problem at a
time to improve profitability.
OI can be tightly focused on a particular process or subprocess within a domain or can be much
broader and span domains. With IoT, sensors associated with things can be monitored in near real
time and trigger an action when an anomaly condition occurs. Broader use of operational intelligence
can monitor networks of things as well as merging data streams from many sources and correlating
and detecting conditions. A cross-domain example is the use of OI to link multiple wind farm sensor
data to a real-time electricity market data feed, providing the opportunity to dial up or down the
production of electricity in real time based on market demand.
The lines between the different intelligence and analytical techniques blur and combine as required. OI
is not necessarily used to predict but instead to spot a problem as early in a process as possible, which
can make it almost seem like a prediction. OI is also moving toward the use of machine learning.
OI products include Splunk (when Splunk Forwarder delivers data continuously or in microbatches),
Sight Machine, Vitria IoT Analytics Platform, and business activity monitoring products from Software AG,
IBM, and Oracle.
TIBCO Business Events, Software AG's Apama, Apache's Storm and Kafka Streams, SAS Event
Processing, and other one-event-at-a-time streaming analytics tools can also be used for OI in
conjunction with real-time monitoring solutions. These are included under Streaming Analytics in the
IoT Analytics section.
Pros:
Produces a higher-level business view of sensor-supporting operations
Makes it easier to get started in IoT by identifying low-hanging opportunities where the
problems are straightforward to identify and causes are fairly well known
Shifts from reactively responding to problems to proactively identifying them to speed up
resolution
Cons:
Is supplemental rather than core IoT AIM technology
Does not incorporate advanced analytics
TECHNOLOGY ADOPTION OUTLOOK
IoT AIM is the opposite of traditional AIM technology adoption, which involves moving data in batches and
then normalizing and loading the data into target systems. Analytical software is used once the data is
loaded and at rest, typically to produce reports or statistical analysis that help in decision making or for
on-demand decision automation.
IoT AIM is about sensing and responding within a time window: continuously moving and managing
sensor events, handling large volumes of data, and providing continuous decision automation and decision
support using analytics and rules. Data must travel from a sensor to edge collection to central
processing where it is normalized and analyzed against some type of prediction model or algorithm to
determine whether further action is required. Once actions are required, response cycle times vary
substantially, but the end-to-end cycle time must be shorter than the time window allotted to derive
business benefit.
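The time-window constraint described above reduces to simple arithmetic: add up the latency of each stage and compare the total with the window. The sketch below assumes a hypothetical pipeline whose stage names follow the path described in the text; every latency figure is an invented placeholder, not a measurement.

```python
def end_to_end_cycle_ms(stage_latencies_ms):
    """Total cycle time is the sum of the per-stage latencies."""
    return sum(stage_latencies_ms.values())


def fits_time_window(stage_latencies_ms, time_window_ms):
    """True when the whole cycle completes inside the allotted window."""
    return end_to_end_cycle_ms(stage_latencies_ms) < time_window_ms


def slowest_stage(stage_latencies_ms):
    """Name of the stage contributing the most latency."""
    return max(stage_latencies_ms, key=stage_latencies_ms.get)


# Hypothetical latencies in milliseconds for each stage of the path.
PIPELINE_MS = {
    "sensor_to_edge": 50,
    "edge_to_central": 200,
    "normalize": 100,
    "score_against_model": 150,
    "trigger_action": 300,
}
```

With these placeholder numbers the cycle takes 800 ms, so it fits a 1,000 ms window but not a 500 ms one; the slowest stage is the natural first candidate when the budget is exceeded.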
Four considerations should dominate IoT AIM technology adoption planning:
What is the total time window available to deliver business value when a condition is identified
that requires a response? Adoption of AIM technology is required when time windows are narrower than the cycle time of the end-to-end IoT system.
How good is the prediction or insight from your analytics software? Quality problems occur for a variety of reasons, but noisy predictions and wrong or nonactionable predictions are all expensive. Using the best approach to analytics for a particular problem requires an
assessment of whether there are data gaps that need to be resolved as well as identifying options and experimenting with them prior to adoption. Different techniques may also be required for different workloads or stages within a workload.
How much technical debt are you accumulating by repurposing existing AIM technology or investing in custom development? In the beginning, it makes sense to keep costs low by
leveraging existing AIM technology for an IoT project. But technical debt rapidly accumulates when existing technology doesn't really align with needs and has to be customized or contorted on an ongoing basis to make it work. As IoT initiatives are operationalized, the use
of purpose-built tools is almost always a better path once those tools reach the required level of sophistication.
How do technology choices align with your enterprise's adoption risk profiles? Different organizations have different approaches to risk. When it becomes clear that there is a need to add new functionality or replace nonperforming existing technology, the selection has to align
with the skills of the team implementing and using the technology. We assess the adoption risk
and speed of adoption for each of the 25 technologies highlighted in this IDC TechScape. Planning should take both of those factors into account. If a new technology identified in
Figure 1 has a higher risk than is acceptable to your organization but has a fast rate of market adoption, it is important to begin planning and acquiring skills sooner rather than later for eventual adoption.
LEARN MORE
Related Research
IDC's Worldwide Software Taxonomy, 2016 (IDC #US41572216, July 2016)
Internet of Things Analytics and Information Management Software Taxonomy, 2016 (IDC #US40708515, December 2015)
IDC TechScape Methodology
Unlike other technology assessment frameworks, the IDC TechScape provides a visual representation
of the process of technology adoption, dividing technologies into three major categories based on their
impact on the organization and assessing their relative maturity within their respective categories. The
study examines particular individual categories and provides additional insights about the speed of
adoption, technology potential for success (risk), and industry hype. Refer back to Figure 1 for the IDC
TechScape for Internet of Things analytics and information management.
The IDC TechScape is a tool for strategic planning and tactical decision making for technology
professionals in IT buyer organizations. This audience may include CIOs and senior technology
professionals, strategists, and IT buyers from IT or from lines of business.
The document's two functions:
Strategic planning tool:
Offers a view into where a technology exists in its overall adoption life cycle. Generally, technologies in the early stages of evaluation and deployment are riskier investments than those further along in the adoption life cycle as they are deployed more broadly.
Sorts technologies into three categories that may help organizations make judgements about which technologies might provide the greatest positive impact on their organization.
IT strategists can use this information to prioritize interest in a technology or group of technologies.
Tactical decision-making tool: Because it lays out where a technology exists within its overall adoption life cycle, and a certain level of associated risk may be inferred, an organization can use the IDC TechScape to determine whether or not it should immediately adopt a particular
technology or should wait until the risk of adoption is less.
IDC TechScape Categories and Definitions: Transformational, Incremental, and Opportunistic
Executives use the IDC TechScape model to:
Inform technology adoption decisions based on organizational appetite for risk and potential for transformational change
Support a decision on when a technology or group of technologies might be ready for adoption, given the purchasing organization's preferred appetite for risk — whether or not an organization should immediately adopt a particular technology or wait until the risk of adoption
decreases
The three types of adoption curves in an IDC TechScape are:
Transformational. These technologies will completely reshape markets and investment
strategies. They may create new business and/or market opportunities and lead to new enterprise and consumer capabilities. They may differ significantly from current technologies and may have mostly unrecognized market impacts/opportunities. Transformational technologies
have already demonstrated that they fundamentally change current best practices.
Incremental. This new generation of technology measurably improves on an existing category
of technologies to deliver better business outcomes. In terms of business processes, these technologies deliver small but measurable improvement over current best practices.
Opportunistic. These technologies will grow based on specific use cases, and they have an undetermined or limited capability to improve existing technologies/processes. Their potential changes currently lack a clear impact on current best practices.
Synopsis
Over time, analytics and information management (AIM) technology adopted for IoT will be different
from an organization's existing technology investments that perform a similar, but less time-sensitive
or data volume–intensive function. Enterprises will want to leverage as much of their existing AIM
investments as possible, especially initially, but will want to adopt IoT-aligned technology as they
operationalize and identify functionality gaps in how data is moved and managed, how analytics are
applied, and how actions are defined and triggered at the moment of insight. This IDC TechScape
covering IoT AIM is designed to help:
Enterprises learn more about the newer AIM technologies that support IoT
Align these technologies with an enterprise's technology risk profile to determine what is ready to adopt and what should be monitored
Gain a better understanding of where an IoT team will need to create skills and competencies as it plans to adopt newer AIM technologies
According to Maureen Fleming, vice president for IDC's IoT Analytics and Information Management
research program, "Implementing the analytics and information management tier of an IoT initiative is
about the delivery and processing of sensor data, the insights that can be derived from that data and,
at the moment of insight, initiating actions that should then be taken to respond as rapidly as possible.
To achieve value, insight to action must fall within a useful time window. That means the IoT AIM tier
needs to be designed for the shortest time window of IoT workloads running through the end-to-end
system. It is also critical that the correct type of analytics is used to arrive at the insight."
About IDC
International Data Corporation (IDC) is the premier global provider of market intelligence, advisory
services, and events for the information technology, telecommunications and consumer technology
markets. IDC helps IT professionals, business executives, and the investment community make fact-
based decisions on technology purchases and business strategy. More than 1,100 IDC analysts
provide global, regional, and local expertise on technology and industry opportunities and trends in
over 110 countries worldwide. For 50 years, IDC has provided strategic insights to help our clients
achieve their key business objectives. IDC is a subsidiary of IDG, the world's leading technology
media, research, and events company.
Global Headquarters
5 Speen Street
Framingham, MA 01701
USA
508.872.8200
Twitter: @IDC
idc-community.com
www.idc.com
Copyright and Trademark Notice
This IDC research document was published as part of an IDC continuous intelligence service, providing written
research, analyst interactions, telebriefings, and conferences. Visit www.idc.com to learn more about IDC
subscription and consulting services. To view a list of IDC offices worldwide, visit www.idc.com/offices. Please
contact the IDC Hotline at 800.343.4952, ext. 7988 (or +1.508.988.7988) or [email protected] for information on
applying the price of this document toward the purchase of an IDC service or for information on additional copies
or web rights. IDC and TechScape are trademarks of International Data Group, Inc. IDC TechScape is a
registered trademark of International Data Corporation, Ltd. in Japan.
Copyright 2016 IDC. Reproduction is forbidden unless authorized. All rights reserved.