Building Smart IoT Devices with AutoML: A Practical Guide to Zero-Coding Algorithm Design

Rev. 1.11 – February 7, 2020

SensiML Toolkit © Copyright 2020 by SensiML Corp.

Please visit the SensiML website (https://www.sensiml.com) for more information.

No part of this manual may be photocopied or reproduced in any form without prior written consent from SensiML Corp. SensiML and the SensiML logo are trademarks of SensiML. Other product or brand names are trademarks or registered trademarks of their respective holders.




Table of Contents

AI Moves to the Extreme IoT Edge
    A Smarter AI Pipeline
    Smart Edge AI Tools
About this Guide
How the Smart Edge AI Approach Works
    Smart Sensors
    AutoML
    Data-Based Training
    Data Science Engine
    Good Data Collection
    Optimized Coding
The Key Stages of the Smart Edge AI Process
    Model/Hypothesis Development
    IoT Device Prototype (Physical Design Considerations)
    Sensor Selection
    Sensor Configuration
    Sensor Data Collection
    Data Labeling
    ML Algorithm Development
    Optimized Endpoint Code
    Local IoT Device Insight (Test/Validation of Local IoT Model)
Developing Your Application Model
    Constructing Your Hypothesis
    Defining Your Insights
    Classifier Model/Class Mapping
Prototype IoT Device
Sensor Selection
    Types of Sensors
    Virtual Sensors
    Sensor Interfacing
    Physical Sensor Placement
Sensor Configuration
    Analog Noise Suppression
    Signal Conditioning
    Sampling Rate and Recording Length
Sensor Data Collection
    Understanding Data Inputs
    Sources of Variance
    False Positives
    Population Diversity
    Subject Sample Size and Dataset Sufficiency
    Phasing Data Collection
    Documenting Methodology
Data Labeling
    Enumerating Relevant Metadata Annotation
    Defining Data Labeling Methodology
ML Algorithm Development
    Defining Model Appropriateness
    Accuracy
    Specificity
    Sensitivity
    Precision
    F1 Score
    Performance Measures for Multi-class Datasets
    Confusion Matrices
    Overfitting/Underfitting
    Data Splitting: Train versus Test Data
    Interpreting ML Performance
Converting an Algorithm to Optimized Endpoint Code
Test/Validation of Local IoT Device Insight
    Sample Bias
    Lifelong Learning and Iterative Model Updates
Conclusion
Appendix – Smart Edge AI Test Plan Template (Example)
References


AI Moves to the Extreme IoT Edge

AI holds great promise for IoT device developers seeking to build intelligence into network-edge embedded sensing products, but until recently it has remained beyond reach for most development teams. AI development tools have until now been offered primarily as cloud-based solutions because of two key challenges: first, a lack of low-cost hardware capable of running complex AI processing on the embedded edge device; and second, the prohibitive complexity and know-how required to implement AI algorithms on IoT devices with existing development tools.

Thus, a centralized cloud-based “big data” approach persists, wherein sensors are largely dumb devices with little or no local data processing capability. Large volumes of raw sensor data are sent via (hopefully) high-bandwidth networks to cloud-based systems for processing by server-executed algorithms. These cloud-centric systems use traditional AI frameworks such as Google TensorFlow, Caffe, Apache Spark, and others to generate and manage AI insights for applications involving the originating sensor devices. Such systems require complex manual interaction and data science expertise to execute. They also result in centralized cloud applications facing the inherent network issues of latency and bandwidth demand that challenge large deployments of connected endpoint devices. The net result is many missed opportunities to realize the true benefits of connected intelligent IoT applications. Beyond a few high-volume IoT products where the investment in time and effort to hand-code algorithms can be rationalized, most applications make do with more limited sensor insights and/or the time and expense of shipping raw data for centralized processing.

Enter the newest generation of AI development tools designed specifically for IoT developers. Such tools enable learning AI and more sophisticated sensor algorithms to run directly on edge embedded sensing devices. Some of these tools also automate the process to a large extent, allowing use by developers without extensive data science or embedded algorithm coding expertise. In other words, the same benefits AI-based algorithms have brought to cloud-centric big data analytics can now be implemented directly on the originating IoT sensing node. For the first time, IoT developers can get the real-time responsiveness, adaptive smart devices, network efficiency and resiliency, and security and data privacy that come with localized data processing domains.

Furthering this breakthrough, recent advancements in embedded hardware and AI algorithm automation tools open a new frontier for competitive differentiation of IoT devices. The combination of these new machine learning (ML) advancements with low-cost sensors and microcontrollers empowers IoT developers with modest budgets and team sizes to create their own “smart sensors” quickly and easily. Now IoT domain experts can create complex ML algorithms simply by training the ML algorithm with datasets that define the insights they want.


A Smarter AI Pipeline

The AI pipeline used to generate insightful algorithms without explicit coding refers to the entire process of teaching a device with real-world data, from data input through to the algorithm execution that produces the insight output. Gaining such insight at the IoT device traditionally required writing algorithms by hand to fit such devices. The notion of running cloud-based deep learning algorithms on a microcontroller seemed a bridge too far, so edge IoT was limited to what could be implemented by skilled teams hand-tuning application-specific code for such devices.

Building AI on extreme edge devices (IoT smart home, wearables, industrial IoT sensor nodes, and remote 5G sensors, to name a few) using this newest generation of tools allows the entire algorithm development process to be dramatically streamlined and simplified. The “magic” of this approach comes, ironically, from applying AI to the very process of creating AI itself, automating data science expertise and its associated manual coding. Such automated machine learning workflows are known, appropriately, as AutoML. Combining this with the unique ability to generate code for the smallest-footprint edge devices (the extreme edge, let’s say) yields a new class of development tools for IoT OEMs that we refer to as Smart Edge AI Tools.

The hardware enablers for this new Smart Edge AI Tools approach are the myriad low-cost, high-performance microcontrollers that allow low-cost sensors to be applied to millions of applications and locations previously impractical to monitor. This revolution is a classic example of hardware, driven by Moore’s Law, advancing faster than software: hardware capable of AI at the edge has been available for several years, but only very recently has AI development software caught up to harness these new capabilities. Now Smart Edge AI tools are transforming IoT endpoints from merely dumb data-collector nodes feeding centrally processed cloud AI into contributors to distributed network analytics with the collective power of many truly smart IoT sensor nodes. Local sensors at the endpoints can drive useful new decisions in real time from local processing of sensor data instead of cloud-based central processing.

Smart Edge AI Tools

At the heart of this new data-driven AI approach is a reduction in the human intervention, specialized skill sets, and long lead times needed to develop ML algorithms with traditional hand-coding. AutoML algorithm development can often deliver results equivalent to or better than those of systems in which data scientists intervene using conventional coding methods. Such tools automatically traverse hundreds of thousands of modeling options to converge on solutions that meet or exceed the constraints set by the user.


Figure 1 – Traditional coding process vs. the new Smart Edge AI approach

The best Smart Edge AI tools incorporate automation across multiple aspects of the endpoint AI algorithm pipeline to reduce the manual stages needed, from data collection through model cost/performance evaluation. The more advanced tools automate data collection, cleansing, feature selection, labeling, event detection, classifier algorithm selection, hyperparameter optimization, and tuning. As you evaluate tools for AI model creation, consider how much of the process a given Smart Edge AI tool covers and what remains for you to perform yourself.

SensiML Toolkit includes industry-leading levels of AutoML automation across the

endpoint sensor AI algorithm generation pipeline. Automation includes data collection

and multi-user aggregation of samples, metadata and label annotation, repetitive

segment label prediction, segmenter algorithm creation, feature engineering, classifier

algorithm selection, classifier hyperparameter tuning, and test/validation stages.

A Real-World Example…

To illustrate the power of distributed analytics using Smart Edge AI Tools versus centralized

cloud sensor processing, let’s consider an example application for intelligent agricultural

livestock monitoring. Similar benefits can be applied across virtually all connected IoT

applications, but we’ll use this one to illustrate the typical differences.

In this example application we consider the benefits of deploying smart animal wearable devices

that can be affixed to each cow in a rancher or farmer’s herd to continuously monitor for


notable health indicators important to the farmer. Our example herd of cattle are fortunate as

they are free-range, grass-fed cattle allowed to roam the pasture rather than being confined to

feed pens. While this promotes herd health and happiness, it presents challenges in that the

cattle are remote and distributed across a large area.

Each monitored animal in our example smart farming application is equipped with an array of sensors integrated into a compact worn device that can measure animal body temperature, motion in six axes (X, Y, Z acceleration and rotation using a MEMS IMU sensor), audio (using a digital microphone), environmental temperature and humidity, and location (using GPS). This allows the farmhands to potentially monitor the herd constantly and attend to sickness or needs quickly.

Now let us consider two means of implementing this application: a centralized or ‘edge’ gateway processing approach, and a truly distributed approach using algorithms generated by Smart Edge AI Tools.

The centralized sensor analytics approach requires a constant streaming of updated sensor data

to be sent wirelessly to a processing node that might be in the cloud or might reside on a

centralized gateway or server on the farm. Either way, the application developer must choose

update frequency, data fidelity, and sensor breadth based on the realities of network bandwidth

and coverage, and volume of raw data needed for the analysis. For centralized analytics, this

may mean that the only data continuously monitored are temperature, humidity, and GPS fix as

these are each small datastreams that can be carried over limited cellular networks in remote

areas.

With this compromise, the developer may only be able to offer infrequent sampling of the more complex audio and motion data when the cows are brought into the milking parlor and within Wi-Fi coverage. Alternatively, the centralized analytics solution may demand that many wireless access points be deployed across the farm (at high cost) to support continuous monitoring of all sensor data across the ranch.

No matter the network topology, transmitting continuous complex datastreams means much higher power consumption: radio transmit power can reach 100 mW, and the total power consumed by the radio is several times that. Compare this to local processing, where low-power microcontrollers consume on the order of 100 µA/MHz, and you begin to recognize the power-saving benefit of local processing with small datastream transmission versus minimal processing with large datastream transmission. If the farmer has to make the rounds to replace smart sensor batteries every month, much of the labor-saving benefit of remote monitoring from an IoT device is negated.
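To make that trade-off concrete, here is a back-of-the-envelope calculation in Python. All duty cycles, current draws, and the battery capacity are illustrative assumptions (a 100 mW radio at ~3.3 V draws roughly 30 mA; the MCU figure follows the 100 µA/MHz number above), not measured values for any specific device.

```python
# Rough energy comparison: streaming raw sensor data over a radio
# versus classifying locally on an MCU and sending only results.
# All numbers below are illustrative assumptions, not measurements.

def battery_life_days(avg_current_ma: float, battery_mah: float = 1000.0) -> float:
    """Days of operation for a given average current draw."""
    return battery_mah / avg_current_ma / 24.0

# Assumption: radio draws ~30 mA (about 100 mW at 3.3 V) at a 10% transmit duty cycle
radio_avg_ma = 30.0 * 0.10

# Assumption: MCU at 48 MHz drawing 100 uA/MHz, active 5% of the time,
# plus a very short radio burst (~0.1% duty) to send each classification result
mcu_avg_ma = (0.100 * 48) * 0.05 + 30.0 * 0.001

print(f"Streaming raw data: {battery_life_days(radio_avg_ma):.1f} days")
print(f"Local inference:    {battery_life_days(mcu_avg_ma):.1f} days")
```

Under these assumptions the local-inference device lasts roughly an order of magnitude longer on the same battery, which is the effect the paragraph above describes.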

Now consider the distributed analytics approach. With the ability to practically deploy

algorithms for complex pattern recognition, the smart livestock developer can now offer a

practical solution for gait monitoring, health assessment from motion and animal utterances,

and estrus monitoring (when cows are in heat for mating) based on complex waveforms of


motion and sound impractical to continuously transmit and much less valuable if only

intermittently polled by gateway/server edge processing. Because processing can be done locally at orders of magnitude lower power than transmitting high-bandwidth raw data, battery life is substantially improved as well. Taken together, the benefits of Smart Edge AI are substantial, and those who implement such distributed IoT analytics will find themselves with far more competitive solutions than centralized alternatives.

About this Guide

This guide is for IoT device developers who recognize the benefits of distributed smart IoT

device analytics and are looking to harness this advantage. Doing so quickly and efficiently

means building algorithms capable of running at the endpoint sensor node. For most, this

means looking to the new Smart Edge AI tools and the data-driven process of training algorithms rather than writing top-down code. As both the tools and the process are new, this

guide walks through the various considerations and seeks to arm the reader with the knowledge

to be successful in this new AI based development methodology. It provides IoT developers

with real world, practical advice for implementing this new approach from sensor selection and

data capture to generating local insights.

The objective of this guide is to teach you how to harness the new Smart Edge AI approach for

planning, designing, and executing your own intelligent sensor products. It focuses on helping

you understand the best methodologies to generate optimal results from this new data driven

AI process. The key to success in this new approach is planning your implementation process upfront to ensure the desired results and avoid the common pitfall of bad training data, which leads to poor-performing algorithms.

This guide takes you through the stages needed for mastering this data-driven process to

capture, label, organize, and analyze data so that it can be used to gain insights at the endpoint

device. We walk you through the key upfront considerations that go into building high quality,

accurate, and efficient ML algorithms, including the general principles of ML model design, data

collection methodology and implementing ML based algorithms for embedded IoT sensing

devices.

This guide is brought to you by SensiML, a leading provider of Smart Edge AI developer tools for rapidly creating embedded endpoint AI algorithms. Throughout the guide we will highlight specific benefits of the SensiML Toolkit in boxed text like this, but this guide covers general principles of design, planning and implementing AI based algorithms at the extreme edge regardless of your choice of software tools.


How the Smart Edge AI Approach Works

The Smart Edge AI process enables an intelligent IoT sensor to take raw physical sensor data and

transform it in real time to provide local insights directly at the endpoint. These insights generated at the IoT device also enable you to create a foundation for hierarchical data analytics

by providing the front-end processing of physical sensor inputs feeding network

communications and higher-level, cloud-based processing. By unburdening the network and

downstream computing resources from the real-time signals processing effort, you can provide

an optimal distributed AI system with far less network throughput required, much lower latency,

processing autonomy at the various stages of analysis, and the ability to act in real-time to

critical sensor events. With such an automated means of intelligent endpoint coding, distributed analytics becomes much more practical for the “long tail” of specialized sensor applications across industrial and consumer IoT sectors.

Smart Sensors

An intelligent IoT sensor device (like a fitness wearable, an industrial smart sensor, a smart pet

collar, or elderly fall detection wristband, etc.) includes one or more integrated physical sensors,

a microcontroller, and a means of communicating information to other parts of the system. The

physical sensors combined with intelligent sensor processing algorithms running in some

portion of the microcontroller constitute a smart sensor device. The algorithm code runs in the

microcontroller taking physical sensor data and converting it to specific insights right on the IoT

device.

AutoML

The data-driven approach to building ML algorithms draws on a wide range of statistical modeling, machine learning, and deep learning methods that the ML generator automatically chooses to best match your specific sensor data. AutoML builds the algorithm for you based on training data you provide from your IoT device sensors, freeing you from the complexity of manual ML algorithm construction. Deploying an endpoint AI solution with this new process is an interplay between sensor pre-processing choices and the training/testing data collection and labeling processes that together yield accurate ML algorithms from these inputs. In simpler terms, AutoML converts algorithm development from codifying algorithm expertise for a particular device into “teaching by example”: those example data train the AutoML tool to generate the right algorithm code on your behalf.
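The "teaching by example" idea can be sketched in a few lines of Python. This toy replaces a hand-written rule with a search over candidate models scored on labeled examples, which is the essence of what an AutoML engine does at vastly larger scale; the data, feature, classes, and candidate set here are all hypothetical.

```python
# Toy "AutoML" sketch: labeled examples plus a model search stand in for
# hand-coding a rule. Data and candidate models are hypothetical.

# Labeled "feature" examples: (mean absolute acceleration, activity label)
train = [(0.1, "idle"), (0.2, "idle"), (1.1, "walking"), (1.4, "walking")]
test = [(0.15, "idle"), (1.2, "walking")]

def make_threshold_classifier(threshold):
    # One candidate model family: a simple decision threshold on the feature
    return lambda x: "walking" if x > threshold else "idle"

def accuracy(clf, data):
    return sum(clf(x) == y for x, y in data) / len(data)

# The "search": sweep candidate thresholds, keep whichever scores best
# on the training data (a real tool searches features and models too).
candidates = [make_threshold_classifier(t / 10) for t in range(1, 15)]
best = max(candidates, key=lambda clf: accuracy(clf, train))

print("train accuracy:", accuracy(best, train))
print("test accuracy: ", accuracy(best, test))
```

A real AutoML engine searches over feature transforms, segmenters, classifier families, and hyperparameters rather than a single threshold, but the workflow is the same: supply labeled data, let the tool search, keep the best model.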


Data-Based Training

ML uses statistical techniques to enable programs to learn through training, rather than being

programmed with rules or explicit programming in a language like C or C++. Such

programming rules are not only technically challenging to construct, but also get increasingly

”brittle” as more and more code is added to address the inevitable corner case conditions

typical in real-world sensor algorithms.

Smart Edge AI tools process training data to build models tuned to “ground-truth” data during training, models that then generalize to predict new, as-yet-unseen instances. Corner cases in ML are handled simply as additional data points covering the expected range of variability. While the model still grows in complexity as the dataset demands additional dimensions, branches, or neurons to characterize the observed variance, the underlying approach remains the same.

ML enables devices to contextualize their immediate environments far better using data such as

vision, sound, heat, and vibration. ML systems can process training data over time to

progressively improve performance on a task, providing results that improve with experience.

Once an ML system is trained, it can analyze new data and categorize it in the context of the

training data. This is known as inference and in the case of Smart Edge AI, can be performed

locally on the device where in many cases the output decisions matter most.1

The SensiML Toolkit software, using local inferencing, allows developers to create their own application-specific virtual sensors: most of the raw physical sensor pre-processing occurs on the virtual sensor device's microcontroller unit (MCU), and what gets shipped to the cloud is the classification data they actually care about.
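A minimal sketch of that virtual-sensor pattern (the stand-in classifier and all names are hypothetical): raw samples are windowed and classified locally, and only compact classification records leave the device.

```python
# Hypothetical sketch of the "virtual sensor" pattern: raw sensor
# pre-processing and inference run locally, and only the small
# classification result (not raw samples) is shipped upstream.

def classify_window(window):
    # Stand-in for the on-device model: label a window by its mean level.
    mean = sum(window) / len(window)
    return {"label": "active" if mean > 10 else "idle", "mean": round(mean, 2)}

def virtual_sensor(raw_samples, window_size=4):
    """Consume raw samples, emit only compact classification records."""
    out = []
    for i in range(0, len(raw_samples) - window_size + 1, window_size):
        out.append(classify_window(raw_samples[i:i + window_size]))
    return out

# 8 raw samples in, only 2 small records out.
results = virtual_sensor([1, 2, 1, 2, 20, 22, 21, 19])
```

The bandwidth saving is the point: the payload scales with events detected, not with the raw sample rate.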

Data Science Engine

Building data science decision-making and processing techniques into AI development tools is what separates AutoML tools from machine learning frameworks that provide the algorithms but still rely upon the expertise of the user to correctly apply them. The data science expertise built into the software removes the need for human data science and coding expertise from the workflow and shifts the IP embodiment from logic-based algorithms to application knowledge embodied in the datasets themselves. The system then continues to develop intelligence as the datasets grow over time and benefit from automated, iterative AI approaches to embedded code generation.

Good Data Collection

As you might guess, shifting the challenge from understanding ML techniques and algorithms to

presenting AutoML tools with representative train/test datasets makes data collection the critical


effort for these tools. Sensor data collection and labeling is about providing better training sets

to your ML algorithms. Using the data driven approach means you need to place high emphasis

on pre-planning for what matters in your application and how to properly collect the data to

support it. You collect actual real-world data as would normally be seen by the application

sensor(s).

One of the first questions to ask and resolve in data collection planning is the feasibility of collecting and labeling the desired model data in the first place.

establish conditions to capture examples of each of the desired states or inputs that the smart

device should be capable of recognizing. Where the application supports this, data collection

becomes a process of capturing a statistically significant number of labeled training data

examples for each state of interest. An example would be capturing multiple instances of

different hand gestures to be recognized from motion data of a wearable device. For this

application, it's relatively easy to perform each such gesture a multitude of times, with examples from different users, both left-handed and right-handed, and so construct the desired training dataset.

In other instances, real world data collection for each desired insight state may NOT be practical.

Think about re-creating all of the mechanical failure modes for an expensive piece of factory

machinery. In this example, the purpose of the smart device is often to provide predictive maintenance algorithms that warn in advance of such failures, given their repair cost and machine downtime. Thus, it would be self-defeating to induce such fault states purposefully just for the sake of collecting example training data for a preventative smart device.

In such cases where real-world data collection is not practical, other techniques can be

employed to overcome this obstacle. In some cases, you can create simulation data to

approximate a fault state. In others, it may be preferable to simplify the initial model to normal

or expected behavior versus anomalous behavior. In this case, the expectation is that additional

data will be collected in use which can be used to further train the algorithm when such faults

are observed in normal use. From the outset, the model can provide anomaly detection to

minimally inform when machine behavior is out of expectation.

Regardless of which of the above methods is chosen, haphazard data collection will undermine ML algorithm development time and quality, just as poor programming adds time spent in QA and debug sorting through logical errors and software bugs. The traditional AI adage that more data is better should really be stated as: more good data is better than volumes of unfiltered raw data. Faulty, incomplete, and poorly labeled data injected into training datasets leads to poor results. Faulty and poorly annotated data can also be hard to rectify after the fact without a proper means to discriminate it from the quality data that preceded it. The time to think carefully about this is BEFORE data collection begins, not afterwards. The price otherwise can be the need to recollect data from scratch if the test approach, metadata collection, and labeling cannot be salvaged.


Managing data collection and labeling for ML at the endpoint requires a planned, disciplined methodology as the basis for building optimized algorithms. Without a solid upfront design and plan, your team can end up spending a large fraction of its time fixing preventable data quality issues as it surmounts the learning curve. Such issues are often discovered late in the schedule, after the initial dataset leads to poor model performance, resulting in extended timelines, added project risk, and disillusionment with AI-based approaches.

Over time, datasets can be extended to cover a broader population, more classification types, and more negative cases. Defining which sensor patterns should NOT trigger an event is just as important as defining which patterns should. For example, a glass breakage sensor for a burglar alarm system should certainly be sensitive enough to detect a true window breakage, but should be discriminant enough to disregard car noises, kids playing, people coughing or sneezing, and the myriad other sounds that might trigger a false alarm in a poorly constructed model. Developers often overlook the importance of negative testing in this way, so be sure to give thought to your model's sensitivity and its ability to reject false positives by providing sufficient negative test data.
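One way to make the negative-data requirement concrete is a sanity check on the training set before any modeling begins; the file names and the "other" rejection class below are purely illustrative:

```python
# Hypothetical sketch: a glass-breakage training set that includes
# negative examples so the model can learn to reject confusable sounds.
from collections import Counter

dataset = [
    ("glass_break_clip_01.wav", "glass_break"),
    ("glass_break_clip_02.wav", "glass_break"),
    ("car_horn_clip_01.wav",     "other"),   # negative example
    ("kids_playing_clip_01.wav", "other"),   # negative example
    ("sneeze_clip_01.wav",       "other"),   # negative example
]

counts = Counter(label for _, label in dataset)

def check_has_negatives(counts):
    # A quick sanity gate before training: insist on a rejection class.
    return counts.get("other", 0) > 0
```

A gate like this is cheap to run on every dataset revision and catches class imbalance early, before training time is spent.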

Optimized Coding

This step in the ML pipeline involves transforming the algorithm into code that runs optimally on your target hardware. Because of the inherent storage limitations of low-power sensor microcontrollers, converting an algorithm into power-efficient code for the smart sensor requires optimization. A Smart Edge AI tool creates the right implementation of embedded code delivery and assurance automatically: the process of converting an ML algorithm into efficient embedded executable code utilizes feature and classifier libraries already hand-tuned for low-power, resource-constrained microcontrollers and can instantly generate corresponding library or binary code formats for your chosen ML algorithm and target platform.

Even the best staffed data science development teams can struggle or waste time pursuing

algorithms that may work well but are not appropriate for the level of computing resources

available on endpoint microcontrollers and IoT devices. This is a common pitfall among users

who immediately assume they need to employ deep learning or artificial neural networks

(ANNs) as their algorithm of choice simply because this is considered state of the art rather than

contemplating suitability for their intended usage. ANNs are indeed powerful tools, and for many applications, like processing image data for object recognition, they are the only viable method. But a great many other applications can be better served by feature pre-processing and classic machine learning approaches at far less computational cost.
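For instance, a handful of cheap statistical features of the kind such feature libraries compute, sketched here in illustrative Python rather than embedded C (the feature set and window are hypothetical, not SensiML's):

```python
# Hypothetical sketch of feature pre-processing: a few inexpensive
# statistical features computed per window can feed a classic classifier
# at far lower cost than running a neural network on raw samples.
import math

def extract_features(window):
    n = len(window)
    mean = sum(window) / n
    rms = math.sqrt(sum(x * x for x in window) / n)
    # Zero crossings around the mean: a crude frequency-content measure.
    centered = [x - mean for x in window]
    zero_crossings = sum(
        1 for a, b in zip(centered, centered[1:]) if a * b < 0
    )
    return {"mean": mean, "rms": rms, "zero_crossings": zero_crossings}

feats = extract_features([1, 2, -1, -2, 1, 2, -1, -2])
```

Each feature here costs a single pass over the window with no multiply-accumulate-heavy layers, which is why this style of pipeline fits on a microcontroller.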

The proper fitting of ML algorithm to application can mean the difference between success and an IoT device that is overpriced for its target market because a complex routine required an application processor rather than a microcontroller, or a mobile IoT device with substandard battery life because its algorithm demands excessive computation and/or memory access versus a more suited


algorithm. The beauty of AutoML tools with rich libraries of ML algorithms and hardware specific

profiling capability is the ability to choose the BEST algorithm for a given task.

The SensiML Toolkit software supports a broad array of ML algorithms spanning simple

binary classifiers, to decision trees, hierarchical and ensemble models, regression models,

and neural networks.

The Key Stages of the Smart Edge AI Process

The building of an endpoint AI solution for your IoT device is a process that includes stages

shown in Figure 2 below. Understanding this workflow enables you to define and execute your

sensor data collection to produce optimal training data for your IoT application requirements.

This diagram also shows how these stages fit into the broader scope of distributed cloud and

edge AI and analytics for a connected IoT device network application.

The overall process shown in the diagram describes your specific application model as built on a

data driven ML approach. You define how various sensor data and metadata can be collected,

labeled and curated to form your custom predictive model.

We will describe below each of these steps in brief and then expand upon them later in the

document along with key insights and pitfalls throughout.


Figure 2 - The stages of the Smart Edge AI pipeline

Model/Hypothesis Development

Developing a quality application model requires a solid understanding of the cause-and-effect aspects of the conditions you are attempting to detect and of the sensor data that can reliably and practically be used to infer those states. A domain expert typically has the intuitive understanding of a given application needed to formulate a good working hypothesis before collecting and analyzing a great deal of data for correlation, making them an invaluable asset upfront and throughout the process. Armed with such insight, the domain expert can put their hypothesis to the test by collecting a small initial dataset, confirming the hypothesis, and refining or reformulating the test collection process as required. Only then does it usually make sense to expend the effort collecting a larger dataset to make the model robust for corner cases prior to putting the code into executable form. The dataset is run through data science algorithm optimization and search to arrive at a working algorithm manifested in code on a device.

Thus, think of your application model as an experiment. For a data driven ML model, the

application starts as a hypothesis for how various sensor data and metadata can be collected

and labeled for building a predictive model. The hypothesis is a working theory for how available physical sensors can be used to determine a desired set of classes for a given application; it is then confirmed incrementally to reduce project risk. As part of your model


development process, you will need to define roles between domain expert(s), data collection

personnel, and data managers to execute these experiments as efficiently as possible.

IoT Device Prototype (Physical Design Considerations)

Constructing a prototype IoT device means building a physical device that can be used during development to collect sensor data in the intended application. It should approximate the intended final product as closely as possible from the sensor standpoint, so that the data collected during development will not differ appreciably from that of the final product. Devising the data collection plan is a concurrent task within this prototyping phase. Careful thought should be given to maintaining fidelity of sensor data as it will be in the end product, even if the prototype's form factor, processor, or interconnects differ. If interconnects will involve analog interfacing of raw sensors to the prototype, ensure that differences in sampling rate, noise, and bit depth will not corrupt your training data for the final product. In short, while formulating your design of the physical product and prototypes, you should also be formulating your plan for data collection and labeling for the intended application and AI algorithm.

Sensor Selection

Sensor selection is about choosing the right type, placement, and mounting of physical sensors

for your application. Most applications involve physical processes that can be measured either

directly or indirectly by one or more means. For instance, an application to detect failing motor

bearings might choose to measure bearing temperature using a thermocouple or IR sensor,

bearing noise from a microphone, bearing vibration from a piezo sensor or MEMS

accelerometer, bearing drag from torque sensors or motor current, or some combination of

these. Each can be correlated with bearing wear but will have different response characteristics, signal-to-noise ratios, and other limiting constraints that favor one approach over another for a given application.

Sensor Configuration

Sensor configuration encompasses a number of factors aimed at maximizing the signal inputs

into the ML algorithm from the selected physical sensor(s) to ensure optimal performance. This

step includes considerations such as sample rate, signal gain or amplification, noise suppression,

filtering or signal conditioning, and analog to digital conversion. Even the best sensor can easily

be undermined by improper sensor configuration and signal conditioning leading to a poor

signal-to-noise ratio feeding the downstream ML processing.
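Two of these configuration steps, noise suppression and analog-to-digital conversion, can be sketched as follows (the filter constant and ADC parameters are illustrative, not recommendations):

```python
# Hypothetical sketch of two configuration steps: simple low-pass
# filtering (noise suppression) and n-bit analog-to-digital quantization.

def low_pass(samples, alpha=0.25):
    """Single-pole IIR smoother: y[i] = y[i-1] + alpha * (x[i] - y[i-1])."""
    y, out = samples[0], []
    for x in samples:
        y = y + alpha * (x - y)
        out.append(y)
    return out

def quantize(value, bits=12, full_scale=3.3):
    """Map a 0..full_scale volt reading onto an n-bit ADC code."""
    levels = (1 << bits) - 1
    clamped = max(0.0, min(full_scale, value))
    return round(clamped * levels / full_scale)

smoothed = low_pass([0, 0, 8, 8])
code = quantize(3.3, bits=12, full_scale=3.3)
```

A smaller alpha suppresses more noise but slows the response; fewer ADC bits save power and memory at the cost of quantization error — exactly the trade-offs this step is meant to settle before data collection.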


Sensor Data Collection

Sensor data collection concerns the process of capturing and logging actual sensor data

samples to be used as the training data for ML algorithm development. Beyond just recording

arbitrary data from sensors, when done well, the data collection phase seeks to minimize the

effects of any undesired or unknown effects that might introduce variability into the model and

subsequently lower the predictive performance of the ML algorithm.

SensiML Data Capture Lab provides a means to capture and label datasets accurately, whether a single user or a large team of data collection and test technicians is involved. By streamlining annotation of labels and project metadata based on predefined custom fields, large-scale collections can be done conveniently without custom scripting, field notations, separate spreadsheets, or file conversion headaches.

Data Labeling

Data labeling is often, but not always, combined with sensor data collection. It is the supervised piece of supervised machine learning: labels provide the examples of input sensor data and output classification results used to train the ML model. It follows that labeling is a critical step that can lead to either good or poor model performance, in the same way a bad teacher can misguide a student and undermine their learning with wrong examples. The actual process of labeling may be split from data collection based on what makes sense for a given application. Some labels are readily apparent and can be annotated easily by anyone. Others require expert insight and may not be performed by those doing bulk subject testing and data collection work.
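A simple way to picture the labeling artifact is as annotated segments kept alongside, but separate from, the raw sample stream; the field names below are hypothetical, not the SensiML schema:

```python
# Hypothetical sketch of labeled segments: spans of a recorded sensor
# stream annotated with a ground-truth class plus provenance, kept
# separate from the raw samples so an expert can add or correct labels
# later without touching the data itself.
from dataclasses import dataclass

@dataclass
class LabeledSegment:
    start: int    # first sample index of the event
    end: int      # one past the last sample index
    label: str    # ground-truth class, e.g. "jab"
    labeler: str  # who applied the label (bulk technician vs. expert)

segments = [
    LabeledSegment(0, 120, "jab", labeler="technician"),
    LabeledSegment(150, 300, "hook", labeler="coach"),
]

def labels_for_sample(segments, i):
    """Return all labels covering sample index i."""
    return [s.label for s in segments if s.start <= i < s.end]
```

Recording who labeled each segment supports the split described above: easy labels from bulk collectors, hard ones reviewed by the domain expert.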

ML Algorithm Development

The Smart Edge AI approach to ML algorithm development uses training and test data to

generate the desired results. This automation of data science and ML processes is performed in

the cloud during the development phase to generate the algorithm that you will embed in your

IoT devices for local inferencing at runtime. The distinction of cloud versus edge processing is

that cloud computing is used only during the model selection and optimization process to

automate the work of the data scientist using AutoML tools that refine the algorithm to fit into

the capabilities of the endpoint device itself.


SensiML Toolkit provides automation for the full spectrum of Smart AI functions to automate the process of ML algorithm generation. With over 80 feature extractors and a dozen different ML classifiers tuned for low-power endpoint IoT microprocessors, SensiML Toolkit can provide you with candidate algorithms when supplied with basic performance and constraint parameters.

Optimized Endpoint Code

Once you have generated the optimal algorithm for your IoT device insights, you need functional code that you can load onto your low-power embedded IoT endpoint. Here the ML algorithm is implemented in embedded code and then flashed onto the target IoT device to generate classification results to be tested in real-world settings for accuracy. At this stage, the model can either be deemed acceptable, or additional test data can be added to the data collection phase and the process iterated based on this new data. The challenge is to maintain model fidelity such that the hard work in model optimization and selection is not compromised by simplifications made in the name of power and resource reductions. Using a Smart Edge AI tool will create the right implementation of embedded code delivery and assurance automatically.

The process of converting an ML algorithm into efficient embedded executable code can

be non-trivial. SensiML Toolkit utilizes feature and classifier libraries already hand-tuned

for low-power resource constrained microcontrollers and can instantly generate

corresponding library or binary code formats for your chosen ML algorithm and target

platform.

Local IoT Device Insight (Test/Validation of Local IoT Model)

Once created and loaded onto the IoT device itself, the remaining step in the workflow is to subject the device to new validation testing using data that was not part of the train/test phase. This important step ensures the model generated is generalized and not overfit, i.e., not simply providing good results when presented with the same data used to construct the model in the first place. In some cases, the data collected for model validation may suggest a need for further data collection; it can be contributed back to the train/test dataset and the process repeated.
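The train/test/validation separation described above can be sketched as a simple deterministic split (the fractions are illustrative):

```python
# Hypothetical sketch of keeping validation data separate from the
# train/test data used to build the model, so generalization (not
# memorization) is what gets measured.

def split_dataset(records, train_frac=0.6, test_frac=0.2):
    """Deterministic train/test/validation split by position."""
    n = len(records)
    n_train = int(n * train_frac)
    n_test = int(n * test_frac)
    train = records[:n_train]
    test = records[n_train:n_train + n_test]
    validation = records[n_train + n_test:]  # never seen during modeling
    return train, test, validation

train, test, validation = split_dataset(list(range(10)))
```

In practice the split should also respect subject and session boundaries so that samples from one recording never leak across the partitions.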

Developing Your Application Model

As previously mentioned, a model is the combination of the initial hypothesis put to practice through collection of data that is labeled and then processed through a feasible working ML classifier. This model building process, done upfront, is critical for an effective AI implementation for sensors at your endpoint IoT device. It's common for data collection projects to start out with


too little attention spent on defining the insight objectives up front. The rationale for this

shortcut is the idea that capturing a mountain of data will sort it out in the end. This “more is

better” approach to data collection is reinforced by confusion between supervised ML as needed

for most smart sensor applications, and the branch of AI used for data mining or “big data” applications. Big Data uses unsupervised ML techniques like cluster analysis to discover useful,

previously unknown associations buried in large datasets.

When problems arise in developing data-driven edge IoT inferencing algorithms, the cause is often a failure to account for common pitfalls in the upfront design of the application. It can be expensive and frustrating to discover a fatal flaw in the collection process

only after significant data collection has already been completed. To reduce this risk, develop a

well-planned application model with effective data collection protocols and an iterative process

for data collection. This section walks through key upfront assumptions and considerations to

help ensure you think through the various factors in your application model.

Constructing Your Hypothesis

The hypothesis is put to practice through the collection of data that is labeled then processed

through a feasible ML classifier. It comes down to the datasets you are using to train your

algorithm. This is why planning your data collection and labeling is essential to data driven ML

algorithm development. The hypothesis defines the problems or events you want the algorithm

to act on. It also defines the datasets you need to match the answers to the problems identified.

These are ground truth events based on real world empirical evidence from direct observation.

Hypothesis is the formation of a working theory for how available physical sensors can be used

to determine a desired set of classes for a given application (a good hypothesis would be using

accelerometer and gyro sensors on a user’s wrist to detect a given tennis swing, a bad

hypothesis would be using a temperature and pressure sensor to determine which class of

keyword is being spoken to a home smart hub). You’re relying on the domain expert to have a

sufficient understanding of the application to have a good assumption of what sensor data can

be used with an ML classifier and pre-processing to arrive at an accurate model. That’s the

hypothesis part. The proof-of-concept (a limited data collection) is used to confirm or refute it, and the model is then made robust, typically with much more data.

Defining Your Insights

Insights are where one or more input sensors and/or contextual inputs are used to predict a

discrete state or class label (e.g. for a predictive maintenance demo this might be: ‘machine

normal, ‘faulty bearing’, ‘imbalanced load’). As you consider your set of desired insights, it’s

useful to think not only of those items of immediate interest but also about future insights of

interest that might be far easier to include in data collection from the outset than to recollect

from scratch at a later phase. By listing out all potential areas of interest upfront, you improve


your odds of optimizing data collection and development speed over the long run. Beyond the predicted quantities of immediate interest, it is also worth considering desired future insights for which you might start capturing valuable data now, even though these model insights might not be utilized until a subsequent product release.

Insights generated from ML predictive models can take the form of either continuous or discrete values.

• Continuous events are periodic events requiring continuous classification, such as a predictive maintenance function that reports whether a motor is in a normal, warning, or failure state, or a fitness application that detects user activity such as running, walking, or resting. Continuous events involve predictive regression models.

• Discrete events have trigger event actions: classification occurs after a trigger, such as in a wearable application with different types of gestures. Discrete values are commonly known as predictive classifier models.
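The two insight styles can be contrasted in a small sketch (the threshold classifier and trigger value are hypothetical):

```python
# Hypothetical sketch contrasting the two insight styles: continuous
# classification of every sliding window vs. a discrete
# trigger-then-classify event model.

def classify(window):
    return "active" if max(window) > 5 else "idle"

def continuous_insights(samples, window=3):
    """Classify every sliding window (periodic, continuous output)."""
    return [classify(samples[i:i + window])
            for i in range(len(samples) - window + 1)]

def discrete_insights(samples, trigger=5, window=3):
    """Only classify once a trigger sample exceeds the threshold."""
    events = []
    for i, x in enumerate(samples):
        if x > trigger and i + window <= len(samples):
            events.append((i, classify(samples[i:i + window])))
    return events

cont = continuous_insights([0, 0, 9, 9, 0, 0])
events = discrete_insights([0, 0, 9, 9, 0, 0])
```

The continuous form produces an output for every window regardless of activity; the discrete form stays silent until a trigger fires, which is usually the cheaper pattern on a battery-powered endpoint.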

Classifier Model/Class Mapping

Your training data must contain the correct labeled answers or classes (ground truth) for the

example input data you provide as the basis of your model. Classification is the process of

predicting the class for the given data points. Classes are sometimes called targets, labels or

categories.

SensiML Toolkit uses the term event to refer to a class label, since classification of time-series sensor data typically involves a span of recorded sensor readings during which an event of interest for subsequent detection takes place.

The ML learning algorithm finds patterns in the training data that map the input data attributes

to the provided training classes and then delivers a ML predictive model for classification of

newly presented input data thereafter. Classification predictive modeling is the task of

approximating a mapping function (f) from input variables (X) to discrete output variables (y).

Classification belongs to the category of supervised learning, where the targets or classes are provided along with the input data. Classification has applications in many domains, such as credit approval, medical diagnosis, and target marketing. Defining your classification requirements is an essential upfront step and should be mapped out and reviewed by all those involved in the data collection process. The importance of this step cannot be overstressed, as the cost and time required for performing data collection and labeling is the dominant activity in AutoML workflows. Done right, based on good upfront planning, the algorithm development process can be much faster than traditional hand-coded


algorithm development. Done wrong, additional time is often needed to augment or, worst case, recollect labeled datasets where classes were not properly labeled and/or relevant metadata was not collected. Involving the overall team upfront can often reveal such deficiencies in the data collection methodology before the expense of data collection itself. Below is an example test plan template illustrating the class mapping portion of test plan preparation (the full test plan template and example inputs can be found in the appendix):

Smart Edge AI Test Plan: Boxing Punch Detection Wearable
Revision: 1.0   Last Revised: 12/15/2019   By: SensiML AE Team
Application Summary: Motion classification for recognition of boxing punches from a glove-mounted 3-axis accelerometer and 3-axis gyro sensor device.

Desired Inference Classifications

  Categorical Variable     Class 1         Class 2        Class 3        Class n
  (SensiML Event Group)    (Event 1)       (Event 2)      (Event 3)      (Event n)
  -----------------------  --------------  -------------  -------------  ---------
  Must Include:
  Boxing Punch             Jab             Hook           Uppercut       Overhand
  Should Include:
  Boxing Impact            Knockout Punch  Solid Connect  Glancing Blow  Miss
  May Include:
  Boxing Stance            Upright         Semi-crouch    Full Crouch
  Future Classes:
  Boxing Defense           Bob             Block          Clinch         Cover-Up

Note above that the classification types are bucketed into ‘must include’, ‘should include’, ‘may

include’, and ‘future classes’. The reasoning behind this is to force upfront thought into the


potential additional data that may or may not be included in the product feature plans for an

initial product but may have value later. Again, the difference with data-driven algorithm design

versus code-based algorithm development is the collection and curation of high-quality

datasets. Given the time and expense of signing up user subjects for the sports wearable data

collection example in the test plan shown above, it can be minimal incremental effort to have a

domain expert (like a coach) label not just the type of punch, but the stance and impact as well.

Going back to capture this later could double the cost of re-recruiting subjects, securing lab

space, technician time, and domain experts compared to a trivial addition of coach labeling time

if anticipated beforehand.

On the other hand, adding completely new collection protocols (like the boxing defensive moves) may or may not have merit depending on the additional time/cost to collect longer session

data. Only you will know this answer for your intended application. The intent of the test plan

template is to elicit the consideration of such costs and future plans.
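One lightweight way to make such a class map actionable is to encode the buckets directly, using the example plan's classes; the structure below is illustrative, not a SensiML format:

```python
# Hypothetical encoding of the class-mapping exercise: bucketing
# candidate classes by priority forces the upfront discussion of what
# to label now versus later. Class names come from the example plan.

class_map = {
    "must_include":   {"Boxing Punch":   ["Jab", "Hook", "Uppercut", "Overhand"]},
    "should_include": {"Boxing Impact":  ["Knockout Punch", "Solid Connect",
                                          "Glancing Blow", "Miss"]},
    "may_include":    {"Boxing Stance":  ["Upright", "Semi-crouch", "Full Crouch"]},
    "future":         {"Boxing Defense": ["Bob", "Block", "Clinch", "Cover-Up"]},
}

def labels_to_collect_now(class_map):
    """Everything except future classes still benefits from labels today."""
    out = []
    for bucket in ("must_include", "should_include", "may_include"):
        for classes in class_map[bucket].values():
            out.extend(classes)
    return out

collect_now = labels_to_collect_now(class_map)
```

A machine-readable plan like this can also seed the labeling tool's predefined fields, so collectors see exactly the classes the team agreed to capture.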

A Real-World Example…

Say your product plan is to build a wearable sensor or sensor array for runners. Your first

product iteration might be aimed primarily at novices to provide coaching advice on how to

avoid injury from poor form. Your domain expert, an exercise physiologist, has determined that

heel strike and excess tibial (shinbone) rotation are the key motion parameters they seek to

detect from a device worn on the ankle with accelerometer sensors. Great: you know the problem (the details of test planning will be more involved, as discussed later), so you proceed to enlist 200 test subjects to collect running data with the prototype ankle sensor product.

Fast forward to the next product release and the plan is now to expand the product’s insight to

running performance as well. The exercise physiologist tells you to infer this insight they need

user data on hip rotation and arm swing. If only you had thought ahead in your previous data

collection, you might have outfitted the initial 200 subjects with another sensor capable of

measuring that data as well. It was out of scope at the time, but the incremental cost of

attaching arm and waist worn sensors and capturing data and metadata for later use would have

been far less than recruiting 200 more subjects to start anew. Opportunity missed, and time and

costs increase. This example above applies not only to labeled sensor data but even more so to

associated context or metadata.

Now imagine your exercise physiologist, working with the product design team, raises the

opportunity to improve the wearable sensor’s feedback based on knowing user attributes. They

suggest that running speed is correlated with body mass index (BMI), a popular and simple-to-obtain metric. But alas, you did not think to capture height and weight data for your subjects during the data collection effort, so you're missing the key context data needed as an

input feature to enhance the ML algorithm. In this case, the missed opportunity was involving

the domain expert in the upfront test design process to understand how desired insights might

Page 23: Building Smart IoT Devices with AutoML - SensiML...Building AI on extreme edge devices (IoT smart home, wearables, industrial IoT sensor nodes, remote 5G sensors to name a few) using

Building Smart IoT Devices with AutoML: A Practical Guide to Zero-Coding Algorithm Design 23

drive inputs beyond the sensors themselves and which subject metadata should have been

included from the outset to inform the modeling work. While the above example is not always

foreseeable or practical, it pays to spend the time upfront to consider longer term product

insights.

Prototype IoT Device

As previously covered, any prototype to be used during development to collect sensor data in

the intended application should approximate, as closely as is feasible, the sensor response and data

of the intended final product, so that artifacts or erroneous data are not injected into the

algorithm, leading to poor algorithm performance.

Some changes can readily be made and transformed without the need to recollect data.

Examples include changed orientation of axes for inertial sensors or even changes in sensor

vendors, provided they have nearly identical responses or correlated calibrations. Positional changes

in motion sensors where the IC is moved elsewhere on the device printed circuit board (PCB) can

have more complex implications to repurposing of existing data. Careful thought and validation

of results before and after changes are needed to retain use of data across such physical design

changes.

Sensor Selection

The choice of sensor type for your IoT device starts with a clear definition of what predicted

outcomes are being sought from an algorithm. Sensor choice also includes not just what sensors

to use but also how many, location, orientation, physical coupling, frequency response, and

range. Selecting the sensor inputs that feed your model is a matter of ensuring you have

chosen sensors that maximize the correlation of measurable signal from the physical

sensor to the desired insight while minimizing noise that can mask the desired signal.

Factors that influence the choice of sensor are many and highly specific to the application in

question. While it’s beyond the scope of this guide to give guidance on each and every potential

application, in this section we cover factors that are common across most applications. When it

comes to determining sensor location, the cost of time and effort to collect data far outweighs

the cost of the sensors themselves in most cases and thus favors over-collection at many

different locations simultaneously during trial capture sessions.

Types of Sensors

For time-series data, there's always a physical sensor at the front of the chain that measures

a real-world property and converts it to an analog electrical signal. There is a wide range of


physical sensors available as individual units or in combinations. The following lists the most

common types of sensors.

Physical sensor types

• Temperature Sensor

• Proximity Sensor

• Accelerometer

• Gyroscope

• Vibration sensor

• IR Sensor (Infrared Sensor)

• Pressure Sensor

• Light Sensor

• Ultrasonic Sensor

• Acoustic Emission

• Smoke, Gas and Alcohol Sensor

• Touch Sensor

• Color Sensor

• Humidity Sensor

• Tilt Sensor

• Flow and Level Sensor

SensiML Toolkit supports a wide variety of time-series sensors such as accelerometers,

gyroscopes, magnetometers, microphones, load cells, pressure sensors, strain gauges,

acoustic emission sensors, ultrasonic, and piezo vibration sensors.

Virtual Sensors

Virtual sensors are the combination of individual physical sensors that enable you to monitor

more complex composite activities. For example, for a running wearable, there exists no such

physical sensor as a musculature/skeletal injury “riskometer”. Instead, you use readily available

low-cost MEMS motion sensors like multi-axis accelerometers and gyroscopes combined with

the ML algorithm and context data to create a “virtual” sensor or injury ‘riskometer’. Just like a

physical sensor, this virtual sensor has metrics for sensitivity, noise immunity, and error. You

control those performance parameters partially with use of ML methods to maximize the

algorithm portion, but also though is the selection and configuration of the physical input

sensors themselves.

Sensor Interfacing

When talking about the source physical sensors used as input to the virtual sensor device, there

are generally two types of sensor interfaces:


• Analog sensors – These are purely analog devices; to interface with the

microcontroller (MCU) they require an analog-to-digital converter.

• Digital sensors – While most of these sensors contain an MCU, they are rarely

user-programmable and support only the fixed-function protocol described in their

datasheet for interfacing with a host MCU. Digital sensors tend to have much better

noise immunity and are generally preferred in most cases.

The SensiML Toolkit operates on data received from an analog-to-digital converter (ADC) that

is either integrated in the MCU SoC, integrated as part of the sensor IC, or implemented as a

discrete ADC IC. The SensiML Toolkit receives data from digital sensors via a peripheral

bus (e.g., SPI, I2C, I2S, or UART).

Physical Sensor Placement

A sensor may or may not be physically mounted on the same board as the MCU doing the

sensor processing. It depends on the application. If monitoring a large machine, there may be

multiple wired analog or digital sensors feeding into a virtual sensor MCU board that classifies

whether the machine is operating normally or exhibits one or more fault states based on limit

switches, load cells, temperature sensors, and vibration sensors located at various points on the

machine. When the sensor is remotely located, noise and cable lengths come into play and

dictate which interface makes the most sense.

A Real-World Example…

Consider an application to monitor and classify proper versus suboptimal running form from

motion sensors placed on an athlete. At first, you may not yet know where such motion

sensors are best placed on the subject. Should these be located on the shoe? The ankle?

Calf or thigh or perhaps on the runner’s hip to capture pelvic rotation? Certainly the insights of a

domain expert (in this case an exercise physiologist or coach) would help narrow down the

possibilities. But it’s also worthwhile to consider capturing data from multiple different locations

(more than are envisioned in the final product or system) to understand where the greatest

correlated sensor signal originates for the class labels desired.

When initially formulating a plan for sensor placement, consider overpopulating sensors for the

initial trial if practical. Often the cost and effort of collecting data can far outweigh the cost of

the physical sensors (exceptions include very expensive piezo and acoustic emission sensors and

other more exotic sensor types). Having more data initially can help you quickly converge on ideal

sensor placement and orientation.


Sensor Configuration

Sensor configuration encompasses a number of factors aimed at maximizing the signal inputs

into the ML algorithm from the selected physical sensor(s) to ensure optimal performance.

Configuration of sensors includes considerations such as sample rate, signal gain or

amplification, noise suppression, filtering or signal conditioning, and analog to digital

conversion. Even the best sensor can easily be undermined by improper sensor configuration

and signal conditioning leading to a poor signal-to-noise ratio feeding the downstream ML

processing.

Analog Noise Suppression

Industrial environments often exhibit considerable electromagnetic interference (EMI) which can

adversely impact sensitive analog sensors. Noise can be injected through conducted noise in

shared power supply leads or induced through radio frequency (RF) into nearby sensors, signal

conditioners, microcontroller boards, and associated interconnect cabling. Sources for this noise

are many in the industrial environment and can include:

• Motors

• Transformers

• Contactors

• AC Power Conduits

• Solenoids

• Fluorescent and Arc Lighting

• Variable Frequency Drives

• Switching Power Supplies

Undesired noise can obscure the true signal and diminish model performance unless properly

mitigated. As an example, the figures below show the effect of a nearby DC motor on a sensor that

used shielded coax signal cabling but lacked ideal shielding at signal termination.


Figure 3 - Baseline noise in analog sensor input (left) and the same signal with a nearby DC motor running (right)

Clearly the noise seen in the right image would be undesired and likely to degrade model

performance unless filtered out or suppressed. To reduce EMI noise in your signals, the

following are considered best practices.

1) Use high quality cabling for interconnects. Shielding, wire gauge, outer casing

material, flexure ratings, and terminations are features to review in the selection of

cabling. Better cable can go a long way towards minimizing induced noise issues. At a

minimum, twisted-pair wiring can help, but full braid shielded coax cabling works best.

2) Suppress motor noise at the motor itself. Use of common mode chokes and/or filter

capacitors at the motor terminals can do much to attenuate noise at the source. With

DC brushed motors, arcing between brushes and commutator always occurs to some

degree and can generate RF and conducted electrical noise. The best mitigation is

proper brush alignment to minimize the arc noise, followed by metallic shielding of the

motor case to minimize the transmission of the RF noise.

3) Isolate power source for sensors and data acquisition. Route cabling for sensors and

downstream amplifiers and signal conditioning modules separate from power

conductors for motors, contactors, transformers, and other noise sources. Verify that power

supplies for data acquisition and sensors are not receiving conducted noise from power

supply coupling of other equipment. If cabling must intersect power wires, ensure they

do so at 90 degrees and not in parallel wire runs.

4) Maintain shielding at cable splits and terminations. At any shielded cable split or

termination use connectors with metal back shells that are properly connected to the

shield wire. No portion of the sensor conductor should be unshielded.

5) Use differential versus single-ended inputs. By not referencing different sensors to a

common ground, noise susceptibility can be greatly reduced. Connect the output signal

to the plus (+) differential input and the sensor ground to the minus (-) differential input.

6) Avoid and remove ground loops. For shielded signal cables, ensure only one end of

shield wire is terminated (at the zero-signal reference potential for the signals within the

shield). In special circumstances shields may be terminated at both ends but care must


be taken that there is no difference in potential between the two ends of the shield,

because if there is, a ground loop will be induced.

These are the primary considerations to keep in mind for analog sensor noise suppression. Far

more detail can readily be found in other literature, some of which is listed in the references

section of this document.

Signal Conditioning

Not every signal pre-processing step can be performed digitally after analog-to-digital

conversion, so attention and care must be given to analog-realm signal

conditioning steps. Typical cases where analog signal conditioning can be used effectively to

cleanse raw sensor data include:

• Amplification of very low-voltage or charge-based sensors (examples: piezo sensors,

thermocouples, and biosensor electrodes).

• Passive low-power analog filtering.

• Voltage-current translation for long cable runs.

• Signal Isolation (i.e. opto-isolators to protect compute domain from high

power/voltage).

• Calibration / Linearization (although often possible with post-ADC calibration).

• Sensor Excitation (necessary for some sensors like strain gauges and gas/oxygen

sensors).

The SensiML Toolkit includes a variety of pre-processing algorithms that can be used to cleanse

raw sensor data prior to feature transformation and classification within the algorithm.
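As a simple illustration of digital pre-processing (a generic sketch of one common cleansing step, not a SensiML Toolkit API), a moving-average filter can smooth high-frequency noise out of raw sensor samples before feature extraction:

```python
import numpy as np

def moving_average(signal, window=5):
    """Smooth a raw sensor trace by averaging each run of `window` samples."""
    kernel = np.ones(window) / window
    return np.convolve(signal, kernel, mode="valid")

# A short trace with one noise spike; the filter spreads and damps the spike.
noisy = np.array([1.0, 1.2, 0.9, 1.1, 1.0, 5.0, 1.0, 1.1])
print(moving_average(noisy, window=3))
```

The window length trades noise suppression against smearing of genuine fast transients, so it should be chosen relative to the shortest event of interest.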

Sampling Rate and Recording Length

The general rule of thumb during data collection is to capture train/test data at the highest

fidelity practical. The reason for this is that it's always possible to go back and downsample

original data to a lower data rate. But it’s not possible to add fidelity to a signal that was

captured at too low a sample rate. The sample rate should be at least twice the

maximum frequency component to be measured. This criterion, popularly known as the Nyquist

theorem, dictates faithful reproduction of digitized analog signals. Sampling rates lower than the

Nyquist sample rate will introduce error in the form of aliasing.

Figure 4 shows a graphical example of aliasing that illustrates with a simple sine wave signal the

misleading signal profile produced by a sampling rate too low for the signal being measured.

Aliasing is an effect that causes different signals to become indistinguishable.


Figure 4 - Graphic example of aliasing showing misleading signal profile
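The fold-down effect can be sketched numerically. The helper below is our own illustration (not from this guide): for a sinusoid sampled below the Nyquist rate, the apparent frequency is the true frequency folded about the nearest multiple of the sample rate:

```python
def alias_frequency(f_signal_hz, f_sample_hz):
    """Apparent (aliased) frequency of a sinusoid after uniform sampling.

    If the signal frequency exceeds half the sample rate (the Nyquist
    limit), the sampled data looks like a lower, misleading frequency.
    """
    return abs(f_signal_hz - f_sample_hz * round(f_signal_hz / f_sample_hz))

# A 900 Hz vibration sampled at only 1000 Hz folds down to a phantom 100 Hz
# tone, while a 400 Hz signal (below the 500 Hz Nyquist limit) is faithful.
print(alias_frequency(900, 1000))   # 100
print(alias_frequency(400, 1000))   # 400
```

This is why capturing at the highest practical rate is the safe default: the 400 Hz case can always be downsampled later, but the 100 Hz phantom can never be un-aliased back to 900 Hz.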

Recording length for collected data is dependent on the nature of the event itself. For a discrete

event (examples: motion gesture, sound, or any time-series event with a clear start and end), it’s

no great surprise that the recording time should encapsulate the full event. While a distinct

signature may be possible from a subset of the event window, it’s best to collect as much data

as possible upfront and such schemes to identify subsets can be applied as post-processing of

the original dataset. Much like sample rate, it’s possible to truncate a sampled event window but

impossible without recollecting data to resurrect portions not captured originally.

For continuous events (vibration data, regularly cyclic movements like running/walking), it’s

helpful to capture as large a number of iterations as is practical, to later assess variation over

time and repetitions. For example, running and walking data would preferably contain 50-100

steps per recorded data sample rather than 2-5 strides. The incremental time and storage cost

are minimal, but the cycle-to-cycle variance data might reveal important model information

contained in the longer sample window.

Sensor Data Collection

Capturing an accurately representative data set is the most important step in creating a smart

sensor algorithm using ML methods. The quality of the dataset is paramount to the ML

methodology. Upfront time spent ensuring data collection and labeling will be done in a

high-quality and cost-effective manner is well worth the invested pre-work effort. Figure 5 shows a

sample dataset collected from an accelerometer.

In the data-driven ML development process, your data collection is your custom training

mechanism for algorithm development. With the data science knowledge built into the

algorithm process, data capturing and labeling can be performed by semi-skilled data collection


people under guidance of the IoT device domain expert. Data collection and labeling for ML

algorithm learning is an iterative process of building a suite of datasets showing the events you

want to define as the basis for insights.

Figure 5 - Sample sensor data collection

In this section of the guide, you will learn the key factors to consider carefully at the outset of

an ML data collection project. Working through a data collection plan will vastly improve your

odds for generating a high-quality application model quickly and with minimal iterative rework.

Keep in mind however that many applications are started without full knowledge of the factors

at play that will affect the model. Even the best planned data collection efforts can involve

iterative refinement to the test plan. It’s for this reason that later in the guide we’ll cover pilot

testing and the importance of validating the test plan itself before embarking on large scale

collection.

Training data is the fuel that ML uses to build an algorithm. A good rule of thumb is that you

will need 30 to 50 dataset samples to create a good algorithm. This is an iterative,

trial-and-error process that should not be done all at once.

Understanding Data Inputs

Building models is about using predictive algorithms to classify useful insights from varying

input data. Creating a model is about inferring insights from signal patterns generated from data

inputs. The figure below shows an example of data inputs for motor fault detection and

classification.

There are three sources of variance in any data collection for an ML application.

1. Signal: Useful sensor measured differences correlated to intended insight.

2. Metadata: Knowable contextual differences that can be useful filters.

3. Noise: Unknown differences not correlated to intended insights.


These variances define the foundation for planning your data collection and labeling objectives.

1. Signal: Capture/label examples for each desired insight state across the distribution of

metadata and noise.

2. Metadata: Seek to convert as much noise variance into measurable metadata as

possible.

3. Noise: Seek to suppress; more unexplained variance equals more data required for a

good algorithm.

Figure 6 - Motor fault detection and classification data inputs example
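To make the three variance sources concrete, a capture record in your collection database might pair the raw signal with its ground-truth label and knowable context. The schema below is a hypothetical illustration of that pairing, not a SensiML data format:

```python
from dataclasses import dataclass, field

@dataclass
class CaptureRecord:
    """One labeled sensor capture plus its contextual metadata (illustrative only)."""
    sensor: str                # physical sensor that produced the samples
    sample_rate_hz: int
    samples: list              # raw time-series values (the measured signal)
    label: str                 # ground-truth insight state for train/test
    metadata: dict = field(default_factory=dict)  # knowable context to filter/model on

rec = CaptureRecord(
    sensor="vibration",
    sample_rate_hz=4000,
    samples=[0.01, 0.02, -0.03],
    label="belt_wear_fault",
    metadata={"belt_brand": "BrandA", "machine_hours": 1250},
)
print(rec.label, rec.metadata["belt_brand"])
```

Whatever variance you cannot record this way (the `metadata` dict) remains unexplained noise, which is why converting noise into metadata fields during planning pays off later.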

Sources of Variance

Variance should be thought about in terms of intended and unintended variance.

• Intended variance includes all the sources of variation that are directly associated with

our application model.

• Unintended variance is what we might consider a form of noise, in that we have not

or practically cannot accommodate it through measured inputs, or simply did not

anticipate, think about, or understand it in the upfront planning process.

A Real-World Example…

Take the case of a predictive maintenance sensor for a piece of industrial machinery. You can

anticipate whether a V-belt between two drive pulleys is about to fail based on vibration and


microphone sensors attached to the pulley bearing blocks. You've thought about sources of

variance in your failure model and included the measured sources in the table below.

Intended / Measured Variance

• Sensor Measured Variation: audio signal as measured at the belt pulley; vibration signal as measured at the pulley

• Annotated Metadata Variation: brand of belt installed; technician who installed the belt

• Calculated Metadata Variation: time since belt placed in service; cumulative machine hours since belt last changed

Table 1 - V-Belt intended/measured variances

At the same time, you should also spend time considering all of the other unplanned measures

and sources of variance that could impact your model. For this sample case, these might include

the variances in the table below.

Unintended / Unmeasured Variance

• Potentially Measurable Variation: pulley center distance; pulley angular misalignment; pulley axial misalignment; belt tension; belt wear thickness; belt durometer hardness; belt temperature; ambient ozone reading at the machine

• Potentially Annotatable Metadata Variation: belt plant of manufacture; belt manufacture date; number of plies (belt construction)

Table 2 - V-Belt unintended/unmeasured variances

For this example, the V-belt failure prediction sensor, some of the unintended variation sources

may seem absurd at first. But this exercise is useful to go through at the outset and is one of the

most valuable and overlooked tasks in data collection pre-planning. While many of the

unintended variations and measures may not be addressable, this exercise usually reveals

opportunities to confirm, control, or collect variance data that may prove invaluable during the

algorithm development and modeling phase.


In this V-Belt sample case, this exercise could lead you to recognize or change the following

elements of your failure prediction:

1) You may, for train/test, choose to standardize on a brand, type, maximum age, and

construction of V-belt to eliminate unknown manufacturing variance in the belt itself

from corrupting your model. In turn, your fan belt sensor may then be certified as

accurate only when used with recommended belt specifications and brands.

2) You conclude that belt tension is highly correlated to pulley separation and can be

readily measured with a non-contact displacement sensor on the idler pulley, so you

ensure this is added as an input sensor measured feature.

3) You recognize angular and axial misalignment could contribute substantially to your

implementation outcomes and ensure that operators are trained properly to configure

these contributing sources of wear variation to within a specified tolerance during

train/test. You further stipulate that the accuracy of the resulting model requires that

operators ensure pulleys are aligned within model specifications.

False Positives

When developing an application, the tendency is to focus on the events of interest (e.g., a specific

gesture, a particular machine fault). But for good algorithm performance you also need to carefully

consider those things that are NOT events of interest but might be confused by the algorithm as

such. False positives are a significant problem in any application that attempts to provide

meaningful insight from sensors. Excessive false positives erode confidence in the insights and

can lead to alarm fatigue (think of the din of false buzzers and alarms in a typical hospital ER).

A Real-World Example…

Consider a watch wearable where the device needs to trigger an event when the user brings their

wrist up to their face as if intending to read the watch display. This event triggers a screen wakeup.

But you must train the algorithm not just on the proper triggering event but also

on a set of examples of when NOT to trigger (scratching your head with your hand, wiping

your mouth with your arm, reaching out to turn on a light switch, etc.). You can see that even a

simple event detection algorithm can get complicated. You need to capture not only all the

variations that SHOULD trigger (like left-handed vs right-handed people) but also those that

SHOULD NOT.
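One way to keep this in view during evaluation is to track the false positive rate over your recorded "should NOT trigger" examples explicitly. A minimal sketch (our own helper, not part of any toolkit):

```python
def false_positive_rate(false_positives, true_negatives):
    """Fraction of non-events the model wrongly flags as events."""
    return false_positives / (false_positives + true_negatives)

# Suppose 30 of 1000 recorded non-gesture windows (head scratches, reaches,
# mouth wipes) falsely wake the screen:
print(false_positive_rate(30, 970))   # 0.03
```

Even a 3% rate may be unacceptable in practice: a user who makes hundreds of arm motions per day would see several spurious wakeups daily, so set the target rate from expected real-world event frequency, not from the test-set ratio.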

Population Diversity

As you seek to capture and properly represent variation in your application model, frequently

one of the most important sources is subject-to-subject variance. The challenge is to

understand how to select your training dataset population to be most representative of the data


diversity expected for the desired insights. This is where knowledge of the application domain

expert is invaluable.

The broader the level of population diversity, the larger the dataset that is required to properly

characterize outcomes across the population. Don’t assume a fixed number of subjects will

suffice irrespective of population variance. Instead, ensure a sufficient number of representative

subjects exist for each unique combination of meaningful population groups. A good rule of

thumb is to have at least 10-20 examples for each unique combination. In each case, the

categories of subject variation should be captured and annotated for each subject as metadata.

In the runner example, you would record a running session and also collect information about

each subject’s experience level, height, and weight, and the type of running surface used during

the data collection.
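As a quick planning sanity check, you can enumerate the metadata combinations and multiply by the per-group minimum. The category values below are illustrative choices for the runner example, not prescribed ones:

```python
from itertools import product

# Hypothetical population strata for the running wearable example.
experience = ["novice", "intermediate", "seasoned"]
surface = ["treadmill", "track", "road", "trail"]
min_subjects_per_group = 10   # lower bound of the 10-20 rule of thumb

groups = list(product(experience, surface))
print(f"{len(groups)} population groups -> at least "
      f"{len(groups) * min_subjects_per_group} subjects")
```

Because the subject count multiplies across categories, each added metadata dimension is expensive; this is one reason to phase collection and add strata over time rather than covering every combination at the outset.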

Note that individual sources of metadata can be added over time to the model. Thus it is not

necessary to collect running data for all subject build types and experience levels across every

imaginable running surface from the outset. Rather the developer must realize the model’s

limitations will be subject to the coverage across the possible sources of variation and population

diversity in the dataset to date. Don’t expect the model to universally apply to as-yet

unmeasured subject attributes not yet included in the training dataset. It's more than likely the

model will perform suboptimally in those instances.

A Real-World Example…

If you are creating a fitness wearable device for assessing and coaching on running form, an

expert running coach would understand the differences in running styles. Novices will exhibit

many more examples of poor form while seasoned runners will more likely demonstrate proper

form. Having training subjects solely from just one or the other population risks generating a

model that is incomplete in its characterization of various running forms that are likely to be

encountered by users of the product. In this case, experience is likely just one of several

important factors. Others might include runner height, weight, and even the type of surface

(treadmill, track, road, or unpaved trail).

Subject Sample Size and Dataset Sufficiency

One of the most common questions asked during ML data collection and labeling is “How many

subjects/examples do I need to get a quality model?” Unfortunately, there's no one answer to

this question. For more information, read “How Much Training Data is Required for Machine

Learning.” Figure 6 shows a practical and data-driven way of determining whether you have

enough training data: plotting a learning curve.


Figure 6 - A sample learning curve

The learning curve represents the evolution of the training and test errors as you increase the

size of your training set.

• The training error increases as you increase the size of your dataset, because it becomes

harder to fit a model that accounts for the increasing complexity/variability of your

training set.

• The test error decreases as you increase the size of your dataset, because the model is

able to generalize better from a higher amount of information.

As you can see on the rightmost part of the plot in the previous figure, the two lines

tend to reach an asymptote. Therefore, you will eventually reach a point at which increasing the

size of your dataset will not have an impact on your trained model. The distance between the

test error and training error asymptotes is a representation of your model's overfitting. (An

asymptote is a line that continually approaches a given curve but does not meet it at any finite

distance.) This plot tells you whether you need more data. Basically, if you plot test

and training error for increasingly larger subsets of your training data, and the lines do not seem

to be reaching an asymptote, you should keep collecting more data.
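A learning curve like this can be sketched with a small synthetic experiment. The toy linear model and simulated data below are our own illustration; in practice you would substitute your model and training subsets of increasing size:

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulated sensor feature -> target relation with measurement noise.
x = rng.uniform(-1.0, 1.0, 2000)
y = 3.0 * x + 0.5 + rng.normal(0.0, 0.3, 2000)
x_train, y_train = x[:1000], y[:1000]
x_test, y_test = x[1000:], y[1000:]

def errors_for_subset(n):
    """Fit a line to the first n training samples; return (train MSE, test MSE)."""
    w, b = np.polyfit(x_train[:n], y_train[:n], 1)
    train_mse = float(np.mean((w * x_train[:n] + b - y_train[:n]) ** 2))
    test_mse = float(np.mean((w * x_test + b - y_test) ** 2))
    return train_mse, test_mse

# Test error should fall toward the training error as the subset grows.
for n in (10, 50, 250, 1000):
    tr, te = errors_for_subset(n)
    print(f"n={n:4d}  train MSE={tr:.3f}  test MSE={te:.3f}")
```

If the printed test error is still falling at your largest subset size, the curve has not reached its asymptote and more data collection is warranted.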


Phasing Data Collection

The process of collecting data should be iterative. It’s worth splitting a large data collection

effort into a pilot phase and a volume phase such that the limited pilot phase data can be

assessed and the methodology revised if needed. Experience with sensor data ML projects shows

that even the best planned processes invariably have factors that had not been considered

upfront. Some of these are practical operational factors like developing an efficient process for

prepping subjects, collecting the data, annotating metadata, and organizing the database for

use later. Other factors arise after initial analysis of a limited dataset which may reveal surprises

about level of variance, signal/noise issues, missing metadata, or inappropriate population

selection which require revisions to the test plan.

Documenting Methodology

Ensure that technicians, domain experts, and database curators document the process methodology as

they proceed through the Smart Edge AI process, so there is written documentation to reference

along the way, ensuring a consistent approach is used and helping to debug any issues during the pilot

phase of data collection.

Data Labeling

Data labeling often goes hand-in-hand with sensor data collection: attaching ground-truth labels for the desired predictive insights to the training samples collected, along with relevant contextual metadata that aids the predictive model. We treat this step separately because it may or may not be done in tandem with data collection. The choice is typically driven by practical issues in assessing and annotating ground truth as data is being collected. A hand gesture, for instance, is obvious to anyone collecting the data. Measuring gas turbine operating conditions that lead to compressor blade failure is a different matter: since structural health at the microscopic level cannot be measured in situ during operation, labeling based on separate disassembly and ultrasonic inspection requires separate but coordinated processes and datasets.

Enumerating Relevant Metadata Annotation

As stated in the section on population diversity, you should enumerate all of the potential sources of variance and include these as metadata in your data collection process. In addition to the sensor data itself, it is equally important to collect other contextual inputs, such as metadata that can influence outcomes if known. Bear in mind that to be useful, the contextual data needs to be knowable both at the train/test model generation stage and at runtime when the model is used for prediction.
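As a concrete sketch of this practice, the snippet below validates that each recording carries a required set of contextual fields before it enters the database. All field names here are hypothetical, chosen purely for illustration.

```python
# Hypothetical per-recording metadata annotation for a wearable dataset.
# Field names are illustrative only; choose fields that are knowable both
# at training time and at runtime if they will feed the model.
required_fields = {"subject_id", "device_id", "sample_rate_hz", "session"}

def validate_metadata(record):
    """Ensure every required contextual field was captured with the data."""
    missing = required_fields - record.keys()
    if missing:
        raise ValueError(f"missing metadata fields: {sorted(missing)}")
    return record

rec = validate_metadata({
    "subject_id": "S017",
    "device_id": "imu-03",
    "sample_rate_hz": 100,
    "session": "pilot-phase",
    "handedness": "left",   # optional extra source of variance
})
print(rec["session"])  # pilot-phase
```

Enforcing such a check during the pilot phase catches missing metadata while it is still cheap to recollect.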


Careful thought upfront can identify the right set of metadata to capture and record during data

collection to balance the cost of collecting it versus its immediate usefulness for modeling and

long-term potential for value as more data is developed or the application evolves.

A Real-World Example…

Consider a predictive maintenance model for a motor that predicts bearing failure. If the variance in manufacturing quality across bearing suppliers were found to be substantial, this might be a useful model input, provided the data is easily obtainable in use. In this case, it may not be practical or reasonable for the user to have such component information upfront; thus, while an important input, it is not a practical one for purposes of building our model. Whether or not you choose to capture this information comes down to judgment within the application as to whether the information might be usable in the future. It is possible that such information would be known upon servicing of the motor and thus could augment the model input at a later date.

Defining Data Labeling Methodology

The mechanics of labeling data are usually dictated by the complexity of the label

determination. In many cases, the outcomes desired are straightforward and immediately

obvious and thus can be annotated directly at the time of capture. Such is the case with gesture

recognition where anyone that is conducting the test is probably capable of recognizing the

performed gesture during data capture. But other applications may not be so readily apparent

and require offline analysis by a domain expert to label “ground truth” properly in the training

dataset.

ML Algorithm Development

Thus far, we have focused nearly all our attention on the front-end stages of ML algorithm

building: data collection, physical sensor choices and factors, pre-processing considerations and

test methodologies for collecting high-quality ML training data. These are vital factors in the

steps leading up to ML algorithm development itself as algorithms for ML are only as good as

the data used to train them.

The ultimate success in ML algorithm development beyond the front-end stages relies equally

on the ability to analyze this training data using ML frameworks to generate accurate predictive

ML code that performs well classifying never-before-seen data. This step is often the biggest

hurdle in the ML pipeline as it typically requires human expertise, data science training, and the

expertise in operating the specific AI tools and methods used to select and tune the ML

algorithms themselves based on this knowledge. Volumes have been written on this stage of the

process and are far beyond the scope of this document to cover.


Generating ML algorithms using the AutoML approach delivers sophisticated ML methodologies

to mainstream developers without data science backgrounds. Key to effective implementation of

AutoML tools is sufficient knowledge to understand how to interpret the results of the AutoML

process and evaluate models generated by the AutoML tool for appropriateness, accuracy,

sensitivity, specificity, overfitting, data splits, and efficiency.

The AutoML approach to ML generation is typically done in the cloud to harness the processing

power needed to create the algorithm. Once the algorithm is completed, you can evaluate it for

model appropriateness at the endpoint.

Defining Model Appropriateness

By model appropriateness, we’re talking about having a well-considered selection and

evaluation of possible algorithms for a given dataset and application. Data scientists will often

cite the “No Free Lunch Theorem,”⁵ which in simple terms states that, averaged over all possible

problems, no one classifier outperforms any other. Thus, it’s common practice in machine

learning to try many models to find the one that works best for a particular application and

dataset. Data splitting of training datasets (which we discuss shortly) is used for validation or

cross-validation to assess the predictive value of many different models with the most suitable

one chosen based on the comparative results.

It’s beyond the scope of this document to get into the specifics of each and every type of

classification algorithm. The list is extensive and includes, in rough order of complexity, Naïve Bayes, k-nearest neighbor (KNN), logistic regression, support vector machine (SVM), decision

trees, ensemble models, and artificial neural networks (ANN) also known as deep learning

models.

What is worth noting is that AutoML tools, by their nature, allow rapid iteration across

multiple algorithms, thus 1) making it practical to evaluate many different models against a

given dataset to determine the best performing type for a given problem and 2) allowing

developers with only modest data science background to test and select amongst a variety of

algorithms. The effectiveness of this approach obviously depends upon the variety of supported

algorithms and the level of automation supported by a given AutoML tool.

Unlike many tools that focus only on deep learning of ANN model frameworks, SensiML

supports a broad and growing array of classifiers. This diversity of modeling approaches

increases the likelihood of finding an optimal performing algorithm for a given application

and dataset. When factoring the limited resources of implementing ML in resource

constrained endpoint microcontrollers and IoT devices, this is particularly important to

good performance.


Accuracy

Accuracy represents the ratio of correct predictions provided by a given ML algorithm to the

total number of predictions made by that algorithm. Beyond the rather obvious desired

condition of an algorithm that achieves 100% accuracy, this metric, if used alone, is lacking in

scenarios where performance is less than 100% accurate. To understand more, we need to

further explore cases where an algorithm is inaccurate.

A Real-World Example…

Imagine an algorithm that can use vibration and sound measurement to predict in advance

when a machine is about to suffer a catastrophic failure. Now imagine two scenarios where such

a prediction algorithm fails. In the first case, the algorithm might indicate imminent machine

failure when in fact the machine is perfectly healthy and nowhere near failure. This could result

from erroneous signals, noise, or outside influences that trigger what is known as a false

positive. Now consider a case where the machine indeed fails, but the algorithm did not

anticipate the imminent failure and indicated all is normal. This opposite algorithm failure is

what is known as a false negative. Neither is desirable, but based on the application one type of

failure may be more costly or detrimental than the other (as with medical screening and lab

tests).

Where this is particularly important is in datasets that are class-imbalanced as is often the case

in real-world applications. By imbalanced, we mean skewed datasets where the incidence of one

class far outweighs others. In our machine example, the state of ‘machine healthy’ is usually

many orders of magnitude more prevalent than ‘machine failure imminent’. Thus, an algorithm

may appear very good by always reporting ‘machine healthy’ no matter what, since that’s the

predominant machine state anyway. Obviously, that would not be a very insightful predictive

maintenance algorithm even if it was 99.98% accurate in actual use.
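The arithmetic behind this pitfall is easy to demonstrate. Using invented counts, the snippet below scores a do-nothing model that always reports 'healthy' on a dataset where failures are rare:

```python
# Toy illustration (counts are invented): a "predictor" that always
# reports 'healthy' looks highly accurate on an imbalanced dataset even
# though it never catches a single impending failure.
actual = ["healthy"] * 9990 + ["failing"] * 10
predicted = ["healthy"] * 10000  # the do-nothing model

accuracy = sum(a == p for a, p in zip(actual, predicted)) / len(actual)
caught = sum(a == "failing" and p == "failing"
             for a, p in zip(actual, predicted))

print(f"accuracy = {accuracy:.1%}")   # 99.9% despite zero insight
print(f"failures caught = {caught}")  # 0
```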

Specificity

To combat the issue of accuracy in class imbalanced datasets, we need to look at additional

metrics. One such metric is known as specificity, which measures the ratio of true negatives to all actual negative instances (true negatives plus false positives). Stated differently, it measures how well the algorithm avoids false alarms (false positives). In applications where false positives are to be avoided to the maximum extent possible, specificity is an important metric.

Sensitivity

The reverse case is known as sensitivity (otherwise known as recall): the ratio of predicted true positives to all actual positives (true positives plus false negatives). Stated differently, this is a measure of how good the algorithm is at catching positives. An algorithm


that always predicts a machine about to fail when applied to a machine that most often is

running normally would have high sensitivity (100%) but very low accuracy.

Precision

Yet another metric looks at the ratio of correctly predicted positives. This is known as precision.

An algorithm that gets half of its positive predictions right (and thus half are false positives)

would have a precision of 50%.

F1 Score

From the above metrics, it’s probably clear that no one of these measures alone gives a good

enough picture of overall model performance. And accuracy alone can be most misleading when

class imbalance is high, like a motor that is running normally 99.9% of the time, so that the 0.1% incidence of impending failure looks insignificant yet comes at the high cost of a destroyed machine.

The F1 score, a harmonic mean of precision and sensitivity, calculated as 2 x (precision x

sensitivity) / (precision + sensitivity) attempts to negate the biasing effect of highly imbalanced

class distribution and favors balance between precision and sensitivity. Often it is a better overall

metric than simply looking at accuracy.
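The metrics above, plus F1, all follow directly from the four confusion counts. A minimal sketch with assumed counts for illustration:

```python
# Metric definitions from true/false positive and negative counts.
def metrics(tp, fp, tn, fn):
    accuracy    = (tp + tn) / (tp + fp + tn + fn)
    sensitivity = tp / (tp + fn)           # recall: positives caught
    specificity = tn / (tn + fp)           # negatives correctly cleared
    precision   = tp / (tp + fp)           # positive calls that were right
    f1 = 2 * (precision * sensitivity) / (precision + sensitivity)
    return accuracy, sensitivity, specificity, precision, f1

# Assumed counts for illustration only:
acc, sens, spec, prec, f1 = metrics(tp=8, fp=2, tn=988, fn=2)
print(f"accuracy={acc:.3f} sensitivity={sens:.2f} "
      f"specificity={spec:.3f} precision={prec:.2f} F1={f1:.2f}")
```

Note how accuracy (0.996) stays near perfect while sensitivity, precision, and F1 (0.80) reveal the model's real behavior on the rare class.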

Performance Measures for Multi-class Datasets

You might be wondering: what if the algorithm is predicting from amongst multiple classes? The above explanations of accuracy, specificity, sensitivity, and F1 score were introduced for the simplest case of two-state classification (positive or negative, yes or no). How are these measures applied when the algorithm must select amongst many classes (select from one of a dozen gestures, or predict various machine fault states)? The answer is that these metrics, when

used in multistate classifiers are considered for a given class relative to the sum of all other

classes. So in this way, performance measures can be broken down for each class in a multiclass

classifier and evaluated on the importance of each and all of the classes present.
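This one-vs-rest treatment can be sketched in a few lines. Here recall (sensitivity) is computed per class by treating that class as positive and every other class as negative; the labels and predictions are invented for illustration:

```python
# One-vs-rest metrics for a multiclass classifier: each class in turn is
# treated as "positive" and every other class as "negative".
def per_class_recall(actual, predicted, cls):
    tp = sum(a == cls and p == cls for a, p in zip(actual, predicted))
    fn = sum(a == cls and p != cls for a, p in zip(actual, predicted))
    return tp / (tp + fn) if (tp + fn) else 0.0

actual    = ["jab", "hook", "jab", "uppercut", "hook", "jab"]
predicted = ["jab", "jab",  "jab", "uppercut", "hook", "hook"]

for cls in ("jab", "hook", "uppercut"):
    print(cls, round(per_class_recall(actual, predicted, cls), 2))
```

The same one-vs-rest counting extends to precision, specificity, and F1 for each class.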

Confusion Matrices

In the prior section we mentioned how we can extend two-class algorithm performance measures

to multiclass datasets. But more commonly the performance of multiclass classification

algorithms is assessed by use of a confusion matrix. Figure 7 shows an example of what a

confusion matrix looks like.

On the vertical axis is represented the actual (or ground truth) distribution with one row per

class with each row summing to the total actual class occurrences for each class label. The

horizontal axis, with one column per class tabulates the distribution of algorithm predicted class


labels with each column summing to the total predicted class occurrences for each class label. In

the field of the 2x2 table are cells representing the total number of occurrences of actual and

predicted for each and every combination of actual and predicted class.

A perfectly accurate prediction model would see all occurrences at the same class label for

actual and predicted value and thus only values in the cells diagonally from the upper left to

lower right with all other cells being zero. Erroneous predictions show up in the triangular

regions on either side. Thus as a tabulated representation of multiclass algorithm performance,

the confusion matrix provides a handy and quickly readable format to assess model quality.

Figure 7 – Confusion matrix example
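A confusion matrix as described above can be tabulated in a few lines; the labels and predictions below are invented for illustration:

```python
# Building a confusion matrix as a dict-of-dicts: rows are actual labels,
# columns are predicted labels, diagonal cells are correct predictions.
def confusion_matrix(actual, predicted, labels):
    cm = {a: {p: 0 for p in labels} for a in labels}
    for a, p in zip(actual, predicted):
        cm[a][p] += 1
    return cm

actual    = ["jab", "hook", "jab", "uppercut", "hook", "jab"]
predicted = ["jab", "jab",  "jab", "uppercut", "hook", "hook"]
cm = confusion_matrix(actual, predicted, ["jab", "hook", "uppercut"])

correct = sum(cm[c][c] for c in cm)  # sum of the diagonal
print(cm["hook"]["jab"])  # 1: one hook misread as a jab
print(correct)            # 4 of 6 predictions correct
```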

Overfitting/Underfitting

Having assessed the primary metrics of accuracy, specificity, sensitivity, F1 score, and confusion

matrices, we may think our job is done at the point we achieve algorithms with satisfactory

performance on these measures. But there are other considerations that can lead us into a false

belief we have a good performing algorithm.

Let’s next look at the concept of model fitting: whether the algorithm underfits, overfits, or fits the training dataset appropriately. To help, a visual representation of each state is shown in Figure 8.


Figure 8 – Underfitting, appropriate (good) fitting, or overfitting

Imagine a dataset with two features represented by the x and y axes in the graphs shown above.

By features, we mean model inputs derived from the underlying sensors we measure to predict

an outcome as a class. The classes in the graphs above are represented by the different markers

showing the distribution of the different classes relative to the two features in the space

represented by the x and y axes (known as feature space). Again, for simplicity we’ll consider

only two classes for illustration: “X”s and “O”s.

Now if we’ve selected good features for our dataset and application, we should see separable

regions in feature space that distinguish between the Xs and Os. With real-world data though

the reality is often not 100% clean and we’ll have complicated borders and outliers between the

regions that require our dividing line between classes (known as the decision boundary) to be

more complex. If our decision boundary is too simple to represent the true complexity of the

separable classes in our dataset, our algorithm can be said to be underfitting. Much of the

variance in the model is not being explained by our simple linear decision boundary in the left-

hand picture and thus we have an underperforming model. Now this will be evident in our

performance metrics (accuracy, F1 scores, confusion matrix) so we have clues to this problem

from the prior performance analysis discussed earlier.

Now let’s look at the right side. We’ve come up with a seemingly wonderful model that ensures

every class is where it belongs. We did this by creating an overly complex decision boundary

that meanders through feature space as required to appropriately classify our training dataset.

Great right? Wrong. In short, we’re kidding ourselves with a false sense of fit or what can be

characterized as overfitting.

An analogy we can all probably grasp intuitively is the false sense of predictive value we might obtain from taking a couple of months of stock market asset price data, applying a complex polynomial curve fit, extrapolating that curve into the future, and believing it will continue to hold based on historical data. A similar false sense of accuracy applies in multiclass classification models with overly complex decision boundaries, which is what we call overfitting.
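A quick way to see overfitting numerically is a classifier that simply memorizes its training set, such as a one-nearest-neighbour rule on a toy one-dimensional feature (all data invented): it is perfect on the data it has seen and noticeably worse on held-out points.

```python
# A memorizing classifier: 1-nearest-neighbour on a toy 1-D feature.
# Perfect on its own training data, weaker on held-out points: the
# numerical signature of overfitting. All data points are invented.
def predict_1nn(train, x):
    return min(train, key=lambda pt: abs(pt[0] - x))[1]

train = [(0.1, "O"), (0.2, "O"), (0.9, "X"), (1.0, "X"), (0.55, "X")]
test  = [(0.15, "O"), (0.5, "O"), (0.95, "X")]  # 0.5 is an "O" outlier

train_acc = sum(predict_1nn(train, x) == y for x, y in train) / len(train)
test_acc  = sum(predict_1nn(train, x) == y for x, y in test) / len(test)
print(train_acc, round(test_acc, 2))  # 1.0 on train, lower on held-out data
```

This is exactly why the next section's train/test split matters: without held-out data, the 100% training score would look like success.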


Data Splitting: Train versus Test Data

The key to detecting overfitting is holding back some of our original dataset from the algorithm

training set and using it like new unseen data to test whether our model works when presented

examples it hasn’t been trained against. Thus we choose a fraction of data for training and the

remainder for testing (or validating) our model. If we’ve been successful in creating a model that

isn’t overfitting, the model should still perform well when given this “new” data we’ve set aside

for validation. The properly fit model is said to generalize well.

So where do we split our data? Use too much of it for training and too little remains to test against overfitting. Use too little for training and we end up with a poorer model, because we did not have a sufficient number of samples to characterize all of the actual variance in the data.

Training dataset size translates directly into money. Money from time and effort to collect

samples, money from machine time or test conditions that are difficult to record as needed. We

don’t want to collect more data than we need. Rather, we wish to collect just enough to capture

real-world variance for algorithm training with enough left over for validation and overfitting

checks.

A very common and clever solution is what is called k-fold cross-validation. This involves

chopping up your overall dataset into k subsets of data called folds. Then we iteratively train our

algorithm with all but one of these folds and validate with the holdout fold where we perform

the same train/test using a different holdout fold each time. This yields much more efficient

use of the limited dataset between training and testing and maximizes our ability to tune the

model while limiting overfit.
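The fold rotation described above can be sketched as follows; this is a minimal illustration, not any particular library's implementation:

```python
# Plain k-fold cross-validation: partition indices into k folds, train on
# k-1 folds and validate on the holdout, rotating the holdout each pass.
def kfold_indices(n, k):
    folds = [list(range(i, n, k)) for i in range(k)]
    for holdout in range(k):
        train = [i for f in range(k) if f != holdout for i in folds[f]]
        yield train, folds[holdout]

splits = list(kfold_indices(n=10, k=5))
print(len(splits))   # 5 train/validate passes
print(splits[0][1])  # first holdout fold: [0, 5]
```

Every sample serves as validation data exactly once, which is what makes the limited dataset go further than a single fixed split.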

SensiML supports an enhanced version of k-fold cross-validation called stratified k-fold cross-validation that seeks to ensure each fold has an equal distribution of classes. Data

is thus redistributed to ensure each fold is a reasonable representation of the whole

dataset. SensiML also offers other validation schemes such as stratified shuffle split,

metadata k-fold, and stratified metadata k-fold validation.
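A minimal sketch of the stratification idea (our own illustration, not SensiML's actual implementation): deal each class's samples round-robin across the folds so every fold sees roughly the same class mix.

```python
from collections import defaultdict

# Stratified fold assignment: group sample indices by class, then deal
# each class's samples round-robin across the k folds.
def stratified_folds(labels, k):
    by_class = defaultdict(list)
    for idx, lbl in enumerate(labels):
        by_class[lbl].append(idx)
    folds = [[] for _ in range(k)]
    for idxs in by_class.values():
        for j, idx in enumerate(idxs):
            folds[j % k].append(idx)
    return folds

labels = ["healthy"] * 8 + ["failing"] * 4
folds = stratified_folds(labels, k=4)
# Every fold holds 2 'healthy' and 1 'failing' sample.
print([[labels[i] for i in f].count("failing") for f in folds])  # [1, 1, 1, 1]
```

Without stratification, a rare class could easily end up absent from some folds, making those validation passes meaningless for that class.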

Interpreting ML Performance

There are many more techniques to model training, combining, and validation that seek to

improve performance and minimize the pitfall of overfitting. These are beyond the scope of this

document and well covered in numerous references on machine learning. Our objective is NOT

to make you into a data scientist, but to teach you enough to understand how good

experimental methods, dataset collection, and labeling can be combined with powerful AutoML

toolkits and a basic understanding and appreciation of interpreting ML performance results to

achieve stellar algorithm performance with modest data science skills.


Converting an Algorithm to Optimized Endpoint Code

At this point, we have covered a great deal of ground having addressed sensor inputs, signal

processing, data collection and labeling, AutoML algorithm search, and performance assessment

aspects of endpoint AI algorithm development. But we still do not have functional code that we

can load on our low-power embedded IoT endpoint and test our application. The next step in

the ML pipeline involves transforming our algorithm into code that can be run optimally on our

target hardware. Since our hardware platform of choice for smart IoT endpoints is not a Linux

server with terabytes of storage, GHz of multicore CPU and GPU processors, and Gbps of

network bandwidth, committing an algorithm to practice in the form of power efficient code is

itself a non-trivial task.

This task can be done either iteratively, starting from an idealized algorithm that is in turn simplified to fit the hardware, or integrally with algorithm search and selection. The former is common in the development process: a statistical tool (like MATLAB or R) or an ML framework (like Caffe) is used to arrive at an algorithm, which is then manually coded by firmware engineers in conjunction with data scientists, who provide guidance on deviations from the algorithms produced by the compute-rich modeling tools.

For example, quantized integer math may be substituted for floating-point math to minimize the clock cycles required and/or the memory needed for large arrays. Complex math may be

approximated with lookup tables, compiler optimizations made, DSP extensions and math

libraries employed, profilers used to streamline loops/branches and various other means to

improve cost/performance on the microcontroller chosen for the IoT device endpoint.
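As a sketch of the quantization substitution mentioned above (the parameter names and the 8-bit range are illustrative assumptions, not any specific toolchain's scheme), an affine mapping from a known float range onto integer codes looks like this:

```python
# A hedged sketch of float-to-int quantization: map a known value range
# onto 8-bit integer codes so the MCU can use integer arithmetic instead
# of floating point. Real toolchains add per-layer scales, zero points, etc.
def quantize(x, lo, hi, bits=8):
    """Affine quantization of x in [lo, hi] to an unsigned integer code."""
    levels = (1 << bits) - 1
    q = round((x - lo) / (hi - lo) * levels)
    return max(0, min(levels, q))  # clamp out-of-range inputs

def dequantize(q, lo, hi, bits=8):
    levels = (1 << bits) - 1
    return lo + q / levels * (hi - lo)

q = quantize(0.30, lo=-1.0, hi=1.0)
approx = dequantize(q, lo=-1.0, hi=1.0)
print(q, round(approx, 3))  # small, bounded rounding error vs 0.30
```

The round trip illustrates the fidelity trade-off the text describes: each value is recovered only to within half a quantization step, which is why the substitution must be validated against the original model's outputs.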

The challenge, in so doing, is to maintain model fidelity such that the hard work in model

optimization and selection is not compromised by simplifications made in the name of power

and resource reductions. It is here that many AutoML tools fall short as they do not carry the

implementation to the point of embedded code delivery and assurance. A few such tools do

automate this step in the pipeline as well.

SensiML supports a multi-platform embedded code generation step that integrates code

optimization early in the model selection process. Each feature extractor and classifier

algorithm includes profile data on code size to ensure that candidate models not only fulfill

performance constraints like accuracy and F1 score, but also memory limitations imposed

by the chosen target hardware to be used for the end product. In this way, SensiML Toolkit

extends the AutoML pipeline to ensure bit-exact implementation of efficient code on

device that performs just as expected from the output of the ML algorithm selection and

tuning process.


Test/Validation of Local IoT Device Insight

Once the ML algorithm has been implemented in embedded code and then flashed onto the

target IoT device, you are ready to test it in real-world settings for accuracy. At this stage, the

model can either be deemed acceptable or additional test data added to the data collection

phase and the process iterated based on this new data.

This testing includes empirical tests done on the device in real-world usage but also usually

includes reproducible test files run both in emulation and on the target device. Use of test files

ensures a repeatable test bench with which to consistently measure behavior of the algorithm

over changes as we make modifications. As you perform validation testing, it’s important that

you keep in mind some of the same error sources that can get introduced in model training and

test for these to ensure the model generalizes and performs well.

Sample Bias

Since humans are responsible for collecting both training and test data in building algorithms,

any inherent bias in human decision-making during training is often carried over into test phase.

Sample bias results from omission or skewing of subjects used for train/test datasets in a way

that is not representative of the population of subjects as a whole (i.e. what will be encountered

in subsequent real-world product use). Ways to detect or mitigate sample bias include assigning the training data collection and the test data collection to different individuals, and using methodologies that ensure random (stochastic) selection of the samples/subjects used for both training and testing.

Bias in ML models can trigger costly errors if not caught in the validation stage. Imagine a

product that is released only to discover it has omitted a major segment of subjects found in the

application population. This is where the advice and input of domain experts familiar with the

application space is invaluable. Have domain experts involved closely as the test data selection

process is made and utilize appropriate data de-biasing techniques to remove bias from your

test datasets. Fortunately, ML techniques provide advantages if and when bias is discovered. Because all collected data can be partitioned into either training or testing sets, any bias discovered can be corrected by re-pooling and re-partitioning the data, using stochastic methods to ensure true randomness in the splits.

The other advantage of ML is the speed in which iterative retraining can be performed versus

hand-coded algorithm development methods. With hand-coded algorithms, a given model

implementation is tested against expected usage as hopefully represented by bias free test data

as well as empirical spot checks from new sensor data streams coming from supported usage

conditions. Any errors in model performance that are discovered are isolated and root caused.

Developers seek to understand in this step if the issue is a result of a logical error in the

algorithm or whether it represents conditions and sensor data not anticipated at the outset of

the algorithm creation. Thus hand-coded algorithms involve significant time and manual effort


spent in probing cause/effect scenarios for outlier data and determining how to revise the

algorithm to address these exceptional conditions.

The same process holds with ML-based testing, but the key difference is automated retraining, which is performed in compute time using standardized workflows and tools. Thus a great number of passes through the algorithm development step, with subsequent testing, can be performed using tools that support this process. By retraining with the inclusion of the same test data used to discover the incorrect prediction(s), that data can be re-run through the model tuning or retraining step. In this way, the same dataset collected at the outset can be

augmented with additional new data for errant cases discovered during empirical testing as well.

Without manual effort expended in understanding how to revise the underlying algorithm to

accommodate the wrongly predicted example, the model training process can simply be

repeated as though this data was part of the initial training dataset.

Lifelong Learning and Iterative Model Updates

Taking this closed-loop learning concept further, the true power of AI-based algorithms comes from the adoption of what is known as “lifelong learning”: the ability to continually improve algorithm performance throughout product usage, drawing not only from data collection and labeling performed during the IoT sensor device's product development stage, but even after commercial launch and use in the field. This ability to generate datasets drawing from an entire installed base of IoT devices, rather than just the sample set collected during the limited product development and testing phases, opens up even more learning capabilities.

Such lifelong or continual learning can be segmented into two types: collective learning, benefiting the entire global population of devices, and localized learning, where algorithm performance is adjusted based on new data specific to a given device. Collective learning requires connectivity and transmission of newly labeled data back to the dataset repository used for the AutoML retraining process. Localized learning, or algorithm personalization, uses newly labeled data to re-learn locally on a specific endpoint node.

Because the computational power required to retrain models is typically considerable, a full

retraining of the model is usually not practical on the endpoint IoT sensor device. Instead, the feasible approach, which generates good results in the vast majority of cases, is classifier tuning. In this approach, feature engineering is skipped under the assumption that the underlying feature vector is still valid and errors in prediction can be eliminated by changing classifier hyperparameters. Much less computationally intensive than a full retraining and feature engineering job, this tuning, which involves adding, pruning, or resizing classifier neurons, or adding or modifying decision tree branches, can be accomplished within the constraints of the embedded microcontroller in most cases. By allowing the device to revise its model, the machine learning aspect is realized over time, and performance will improve in use with additional corrected data examples.


Figure 9 – Lifetime learning process

Conclusion

Over the course of this paper we have presented a number of considerations that should be factored into the design of Smart Edge AI or AutoML-based IoT sensor device algorithm development. While there is much to know and to factor into careful upfront planning, the process of building IoT algorithms with automated ML tools can vastly outperform traditional coding approaches in time, cost, and quality. Rather than expending a great deal of resources and time on data science and embedded firmware optimization, the AutoML approach exemplified in Smart Edge AI tools can free developers to focus on their intended application functionality.

[Figure 9 flowchart: sensor data input feeds buffered raw data; feature extraction produces a buffered feature vector; AI algorithm processing yields a prediction. If the prediction is correct, the loop continues. If not, the device either learns locally (tune the edge model with the label-corrected feature vector) or learns globally (retrain the model, adding the label-corrected raw data).]
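The lifetime learning loop of Figure 9 can be sketched as a simple control loop. This is an illustrative Python sketch only: every callback name (`extract_features`, `classify`, `get_label`, `tune_locally`, `queue_for_retrain`) is a hypothetical placeholder, not a real device API.

```python
from collections import deque

def lifetime_learning_loop(samples, extract_features, classify,
                           get_label, tune_locally, queue_for_retrain,
                           window=4):
    """Route each prediction error either to cheap on-device tuning
    (learn locally) or to a retraining queue (learn globally)."""
    raw = deque(maxlen=window)              # buffered raw data
    for s in samples:                       # sensor data input
        raw.append(s)
        if len(raw) < window:
            continue                        # wait until the buffer fills
        fv = extract_features(list(raw))    # buffered feature vector
        pred = classify(fv)                 # AI algorithm processing
        truth = get_label(fv, pred)         # None => no correction available
        if truth is None or truth == pred:
            continue                        # prediction correct (or unlabeled)
        tune_locally(fv, truth)             # learn locally: classifier tuning
        queue_for_retrain(list(raw), truth) # learn globally: full retrain later
```

The key design point the figure makes is visible here: the local path consumes only the compact feature vector, while the global path must retain the much larger raw-data window for a future full retrain.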


Appendix – Smart Edge AI Test Plan Template (Example)

The following template is provided as an example of advance annotation of data collection, metadata collection, and sources of variance to control during train/test data capture for a Smart Edge AI project. The example is a consumer sports-feedback wearable, but a similar approach can be used regardless of the intended application.

Smart Edge AI Test Plan: Boxing Punch Detection Wearable

Revision: 1.0 Last Revised: 12/15/2019 By: SensiML AE Team

Application Summary: Motion classification for recognition of boxing punches from a glove-mounted 3-axis accelerometer and 3-axis gyro sensor device.

Desired Inference Classifications

Priority         Categorical Variable (SensiML Event Group)   Classes (SensiML Events)
Must Include     Boxing Punch                                 Jab, Hook, Uppercut, Overhand
Should Include   Boxing Impact                                Knockout Punch, Solid Connect, Glancing Blow, Miss
May Include      Boxing Stance                                Upright, Semi-crouch, Full Crouch
Future Classes   Boxing Defense                               Bob, Block, Clinch, Cover-Up
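Where the test plan drives tooling, such as a labeling UI or a dataset-coverage check, the classification table is also useful in machine-readable form. The sketch below uses hypothetical names and structure; it is not a SensiML file format.

```python
# Hypothetical machine-readable version of the test-plan classification
# table. Priorities mirror the template's Must/Should/May/Future rows.
EVENT_GROUPS = {
    "Boxing Punch":   {"priority": "must",
                       "events": ["Jab", "Hook", "Uppercut", "Overhand"]},
    "Boxing Impact":  {"priority": "should",
                       "events": ["Knockout Punch", "Solid Connect",
                                  "Glancing Blow", "Miss"]},
    "Boxing Stance":  {"priority": "may",
                       "events": ["Upright", "Semi-crouch", "Full Crouch"]},
    "Boxing Defense": {"priority": "future",
                       "events": ["Bob", "Block", "Clinch", "Cover-Up"]},
}

def must_include_events(groups):
    """Return the labels every data-collection session is required to cover."""
    return [e for g in groups.values()
            if g["priority"] == "must" for e in g["events"]]
```

A coverage script can then compare `must_include_events(EVENT_GROUPS)` against the labels actually present in a captured dataset before the session ends.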


Intended Variance

Metadata Variable   Metadata Values
Annotated Metadata
  Subject ID        Unique User ID#
  Gender            Male, Female
  Experience        Expert, Intermediate, Novice
  Dominant Hand     Left-Handed, Right-Handed, Ambidextrous
Calculated Metadata
  Subject Height    Height (inches)
  Subject Weight    Weight (lbs)

Unintended Variance

Metadata Variable   Metadata Values
Annotated Metadata
  Test Technician   Technician ID#
  Collection Date   m/d/y h:m
Calculated Metadata
  Subject Warm-up   Time (minutes)
  Ambient Temp      Temp (°F)

Sensor Inputs

Sensor               6DoF IMU (Accel/Gyro)
Sample Rate          200 Hz
Full Scale Range     +/- 2 G (accel), +/- 2000 dps (gyro)
Type (Digital, ADC)  Digital
Notes                QuickLogic Chilkat EVB (on-board sensors)
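Raw IMU samples must be scaled by the configured full-scale range before feature extraction. The sketch below assumes the common convention that the full-scale value maps onto the signed 16-bit sample range; the actual sensitivity scale factors should be verified against the Chilkat EVB sensor datasheet.

```python
def lsb_to_units(raw, full_scale):
    """Convert a signed 16-bit IMU sample to physical units, assuming the
    full-scale range maps onto the int16 range [-32768, 32767]."""
    return raw * full_scale / 32768.0

# Full-scale settings from the sensor-input table above.
ACCEL_FS_G = 2.0      # +/- 2 G accelerometer range
GYRO_FS_DPS = 2000.0  # +/- 2000 dps gyro range
```

For example, under this convention an accelerometer reading of 16384 LSB corresponds to 1.0 G, and the gyro's negative rail (-32768 LSB) corresponds to -2000 dps.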

