
State-of-the-art Techniques in Deep Edge Intelligence

Ahnaf Hannan Lodhi
Department of Computer Engineering, Koç University, Istanbul, Turkey
[email protected]

Barış Akgün
Department of Computer Engineering, Koç University, Istanbul, Turkey
[email protected]

Öznur Özkasap
Department of Computer Engineering, Koç University, Istanbul, Turkey
[email protected]

arXiv:2008.00824v3 [cs.AI] 24 Dec 2020

"This work has been submitted to the IEEE for possible publication. Copyright may be transferred without notice, after which this version may no longer be accessible."

Abstract—The potential held by the gargantuan volumes of data being generated across networks worldwide has been truly unlocked by machine learning techniques and more recently Deep Learning. The advantages offered by the latter have seen it rapidly becoming a framework of choice for various applications. However, the centralization of computational resources and the need for data aggregation have long been limiting factors in the democratization of Deep Learning applications. Edge Computing is an emerging paradigm that aims to utilize the hitherto untapped processing resources available at the network periphery. Edge Intelligence (EI) has quickly emerged as a powerful alternative to enable learning using the concepts of Edge Computing. Deep Learning-based Edge Intelligence or Deep Edge Intelligence (DEI) lies in this rapidly evolving domain. In this article, we provide an overview of the major constraints in operationalizing DEI. The major research avenues in DEI have been consolidated under Federated Learning, Distributed Computation, Compression Schemes and Conditional Computation. We also present some of the prevalent challenges and highlight prospective research avenues.

Index Terms—Edge Computing, Edge Intelligence, Deep Learning, Artificial Intelligence, Deep Neural Networks

I. INTRODUCTION

The last decade has witnessed a remarkable ascendance of data- and information-centric services in all spheres of the global domain. The availability of powerful computational resources and the rise of learning systems have further led to unprecedented growth in data-generating devices and sensors. It has been estimated that close to 29 billion devices will be connected to the Internet by 2023 [1], with data traffic expected to reach 131 Exabytes (EB) by the end of 2024 [2]. Furthermore, the requirements for 6G, aiming for data rates of approximately 1 Tbps per user [3], have further reinforced a growing realization that traditional centralized/cloud systems would be unable to efficiently manage the accompanying computation requirements.

The ability to imbue systems with intelligence is at the forefront of this technological revolution. Conventional machine learning techniques have rapidly found applications in multiple domains. Simultaneously, Deep Learning (DL) has risen meteorically through the last decade, delivering unparalleled performance primarily in the Computer Vision and Natural Language Processing (NLP) fields. However, the performance offered by Deep Learning systems comes with significant computation and memory costs in addition to massive data requirements. As of this writing, the current State-Of-The-Art (SOTA) for image classification in the ImageNet Large Scale Visual Recognition Challenge (ILSVRC) is FixEfficientNet-L2 [4], which achieves a Top-5 accuracy of 98% using 480M parameters. However, providing similar performance in related applications at the user end is currently constrained by limited computational resources, exacerbated by the cost of collecting, communicating and storing the required amount of data.

The paradigm shift in the nature of networked services and the transformation of connected devices, coupled with the distributed nature of data, require a decentralized approach for extracting maximum learning benefits. To avoid overwhelming the network and data servers, the computational load must be moved to or closer to the network edge. Edge Computing [5] offers a potentially powerful solution to this problem. The edge computing framework aims to leverage distributed computing concepts to alleviate the computational load on the network core, benefiting from processing power available close to the network edge. The confluence of Artificial Intelligence (AI) and Edge Computing results in Edge Intelligence (EI). Applying Deep Learning to achieve Edge Intelligence offers the added benefit of employing raw data without considerable feature engineering and pre-processing overhead. The computation power of the elements closer to the network edge offers a powerful alternative to centralized computing, albeit in a distributed manner. Successful exploitation may result in elements of cloud services being shifted into close proximity to the data sources, ensuring better data security as well as a reduced load on the network backbone.

Our contribution with this article is an attempt at formalizing the key constraints which must be addressed for realizing efficient DEI applications. To the best of our knowledge, these constraints have previously been discussed only disparately, while 'Device Diversity' and 'Inference Transparency' have not been formally considered before. We further unify broad research avenues and classify the existing work under these categories to provide a concise overview of the evolution of Deep Edge Intelligence. Lastly, we attempt to identify challenges and research directions not only related to the implementation of DEI but also present some potential learning schemes which might be highly suitable for this rapidly evolving domain.


To summarize, the unique highlights of this article are as follows:

a) We consolidate the operational constraints for Deep Learning-based Edge Intelligence to reflect aspects not only related to Edge Intelligence but also to Deep Learning.

b) We classify the avenues of research progression in Deep Learning-based Edge Intelligence. Our aim with this categorization is to provide a broad basis for the current and future research themes.

c) We identify key areas which offer the potential for prospective research uniquely suited for the optimal application of DEI.

With this article, we aim to elaborate on the evolution of research directions for Edge Intelligence (EI) in the presence of the prevalent challenges. Starting from elucidating the concepts of EI and the key drivers behind Deep Edge Intelligence (DEI) in Sections II and III, we present the latest perspectives for achieving DEI in Section-IV. Finally, in Section-V, we provide some open challenges and prospective research directions to achieve robust and efficient learning paradigms for DEI. This is followed by Section-VI, in which we summarize the details of this article along with some thoughts to spur constructive discussion on the topic.

II. EDGE INTELLIGENCE

The true potential of Artificial Intelligence (AI) is unlocked in a connected/networked domain where intelligent services can be extended simultaneously to a large number of users. Some of the more prominent applications include, but are not limited to, facial recognition for smart surveillance, object recognition, language translation, sentiment analysis, and load prediction. Suffice it to say that Machine Learning has also been deployed as the first line of detection of the novel Coronavirus-caused respiratory disease (COVID-19) [6] at major transit points based on thermal imaging. Deploying isolated or centralized processing clusters offers an inefficient solution due to resource wastage and prohibitive costs, with the major disadvantage being the distributed nature of the data itself. The development of a framework that supports distributed operation and access to remote data is an essential requirement for these rapidly transforming applications.

A. Edge Computing

Edge Computing is the emerging domain being developed to address the limitations associated with cloud computing. This paradigm aims at shifting the computation load towards the outer edges of the network, utilizing the devices at the data origin and their geographical proximity. The motivation behind the development of this domain is two-fold: it aims to reduce the computational and communication load on the network core while enabling dynamic resource allocation for applications in cyber-physical systems such as industrial IoT, smart buildings and grids, autonomous transportation, and remote healthcare [7].

Fig. 1. Levels of Network Hierarchy: a) End/User devices: devices/sensors at the origin of data; b) Edge Nodes: intermediate network devices connecting end devices to the network core; c) Cloud Servers: the hub of computation and decision making in existing networks. The two outer levels are jointly referred to as "Edge Levels" and are the focus of Edge Computing.

Conventional networks can be characterized by reduced computational power in the elements at the fringes of the network. However, the addition of devices at the outer levels has also significantly outpaced the increase of processing power at the network core [1]. Edge Computing has emerged to support this dramatic increase in resource requirements by leveraging the untapped potential away from the enterprise data centers. Processing power is obtained through collaborative operation between various entities at the network edge, including user devices, mobile base stations, gateways and access points.

B. Network Hierarchy

Due to the relatively early stage of its development, Edge Computing offers a multitude of terminologies to depict various parts of the network. In general, however, a conventional network can be categorized into three hierarchical levels from the perspective of Edge Intelligence (EI), as depicted in Fig-1:

1) End or user devices lie at the outermost levels of the network. These are the set of elements that are themselves responsible for data generation or house applications which do. These include, but are not limited to, mobile devices such as smartphones, tablets and smart wearables, IoT devices, sensors for smart grids, homes and healthcare, and autonomous cars and drones. These devices are characterized by less computing power compared to other devices on the network.

2) Edge servers or nodes comprise mobile base stations, gateways, access points (APs), micro data-centers, etc., existing at the intermediate level between the edge devices and the cloud/data server level. These are established in proximity of the data sources, having comparatively higher computational and storage capacity than user devices but still considerably lower capacity than the cloud level. Combined with the end devices, these two levels can be thought to jointly form the 'Edge Level'.

3) Cloud or data server level is the innermost level in the network hierarchy, possessing maximal computation and storage ability. Processing is carried out at this level in conventional networks and settings. However, the remoteness of the cloud level from edge devices incurs considerable communication overhead and latency, as well as potential privacy issues.

C. Edge Intelligence

Edge Intelligence (EI) refers to the utilization of the Artificial Intelligence (AI) paradigm in Edge Computing scenarios. Traditionally, intelligent inferences have been generated at the cloud level, which requires data aggregation before undertaking the learning process. Edge Intelligence, on the contrary, relies on establishing AI support away from the cloud level, imparting a certain degree of intelligence at the network edge. It aims at maximally offloading the learning and inference computations to the edge level, thus alleviating resource demands at the cloud level. In general, Edge Intelligence requires a multi-disciplinary approach, drawing on knowledge and efficient practices from fields including, but not limited to, AI, Computer Architecture, Embedded Systems, Compressive Sensing and Distributed Systems for optimal performance.

D. Operational Constraints

Adopting distributed computing brings forth its own set of challenges for optimal operation, including data sharing, security, latency, etc. Additionally, typical learning scenarios have so far utilized centralized frameworks unifying both data and computing resources at one single entity in the form of servers or clusters. Edge Intelligence, on the contrary, aims to exploit the significantly untapped potential of the billions of elements at the network edge. The current research to achieve this goal is primarily driven by the following factors [8], [9]:

1) Cost: Any form of decentralized computation results in major costs related to communication, energy, processing and memory.

• Communication Costs: Remote computations require data to be exchanged between various distributed elements, introducing not only the cost of data communication but also the associated overhead costs in already congested networks. The services at the end devices, whether provided by mobile applications over cellular networks or by IoT networks, are increasingly expected to deliver improved user experiences as well as richer information. The demand for immersive Quality of Experience (QoE) using Augmented Reality / Virtual Reality (AR/VR) alone is expected to result in an 8-fold increase in data traffic [10]. Furthermore, while the emerging networks offer higher speeds, legacy networks would face increasingly difficult prospects in supporting such services. Incorporating Edge Intelligence in such an environment would thus carry its own communication costs.

• Energy: A considerable majority of nodes at the edge level, whether user devices or edge nodes, often operate with a limited energy budget. As opposed to a cloud server, where energy constraints are considered for economical operation, mobile devices at the user end cease operation if they exceed their energy budgets. Thus, all aspects of the learning process, from training to inference, must respect these energy limitations.

• Processing and Memory: Deep Learning models require considerable processing and memory resources to extract and learn deep representations of the data. In addition to a dearth of processing power at the edge level, managing sufficient resources for deep learning models to run in the presence of numerous competing services is a major challenge. In general, deeper networks often outperform shallow ones, and thus higher performance comes with larger storage requirements.

2) Latency: Time-critical applications require real-time or near-real-time inferences. For example, translation of conversations from one language to another, object segmentation in photos, or the analysis, fusion and logical inference of data carried out by the sensors of autonomous vehicles, especially self-driving cars, are some applications where the delay between input and inference can seriously compromise performance. Additionally, offboarding data for remote computation imposes additional time costs due to communication to deeper levels of the network. [11] presented the effects of proximity for a face detection task running on Amazon Web Services, indicating a 50% task completion rate between 200-600 ms depending on the server location.

3) Scalability: With the proliferation of devices at the edge level, sharing data for centralized as well as distributed computing becomes increasingly difficult due to communication and processing bottlenecks. Decentralized computing must be able to seamlessly cater to the billions of devices that contribute data or computing resources without degrading device performance or congesting the network.

4) Privacy and Security: Malicious adversarial actions against intelligent systems can target both the data and the learning framework. Deep Learning-based AI systems carry their own inherent risks [12], [13], [14]. AI in a distributed setting exposes the learning architecture to a certain degree if the learning parameters are being shared. These could then be used for adversarial actions against the learners, leading to degraded performance and crippling critical automated systems. Furthermore, maintaining data integrity as well as prohibiting the exploitation of the associated metadata are challenges faced whenever data is shared over the network. These challenges, in addition to the privacy risk to the data, make Edge Intelligent operation [15], [16] more difficult.

5) Reliability: Conventional Deep Neural Networks (DNNs) do not account for reliability issues. However, in a distributed setting, reliability guarantees need to be established for obtaining the correct loss, in addition to the fact that the state of the network affects the performance of edge computing scenarios [17]. There exists an algorithmic challenge in implementing Edge Intelligence in the presence of network (e.g., packet error and dropout, congestion) and client (e.g., offline, busy, adversarial environment) reliability issues, both for training and inference. The emergence of 5G Ultra-Reliable Low-Latency (URLLC) networks offers promising avenues [18]; however, the learning frameworks themselves need to be adapted to fully utilize the benefits of these modern communication technologies.

6) Inference Transparency: Critical learning applications in domains such as health, finance and security, among others, require interpretability [19] "as an important element of engendering trust in the inference system". Interpretability, however, is yet to have a precise definition in terms of machine learning, though [20], [21] have presented a general framework for gauging the underlying dynamics of learned decision making. [22] presents an elaborate survey of the existing works, categorizing them under model transparency and model functionality for Deep Learning. A distributed setting, however, offers a unique scenario where model transparency not only imposes additional constraints on the model and operating costs, it constitutes an entirely different challenge due to the environment and the nature of distributed models themselves. A recent illustration of the hazards of opaque learning systems in real-life scenarios is the wrongful arrest by Detroit Police caused by a faulty match from a facial recognition system. It is thus imperative that transparency is integrated into learning models, and Edge Intelligence must adhere to the same.

7) Device Diversity in Edge: The edge levels house a myriad of clients with distinct storage and processing capabilities. The devices farther away from the data origins are more capable both in terms of memory and computational power, with central servers being the most resourceful devices. Furthermore, the networks employ multiple communication links and protocols, which results in a highly diverse environment. These device and network characteristics exacerbate the difficulty of developing a collaborative structure for learning. Any Edge Intelligent system must therefore possess inherent tolerance for this heterogeneity. Furthermore, the designed frameworks should consider the nature (e.g., memory, processing, energy, static or mobile, data availability) of the platforms they are intended for. Platforms such as mobiles and autonomous vehicles have relatively better processing power compared to IoT or smart home sensor networks. A DNN designed for IoT will not be optimal for a mobile setting and vice versa. It is therefore imperative that learning systems consider the characteristics of their intended platforms as a tunable parameter.

III. DEEP LEARNING FOR EDGE INTELLIGENCE

AI has revolutionized digital services, powering innovative features and novel experiences such as smart homes, self-driving cars, and AR/VR applications. This surge of data-driven applications has seen intelligence becoming an essential component for keeping up with the growing emphasis on customized experiences while fulfilling contrasting Service Level Requirements (SLR). Edge Intelligence is fast emerging as one of the most viable options to keep pace with this thriving digital world without congesting the core network. Within the learning paradigms, Deep Learning has rapidly evolved to deliver the best performance in several domains with major applications on the user end.

Deep Learning (DL) [23] has found widespread application in statistical learning tasks. Inspired by the anatomy of the human brain, Deep Neural Networks can learn deep representations of data by just observing large volumes of the data itself. DNNs have achieved unprecedented success in computer vision for both images and videos, natural language processing [24], medical diagnosis [25] and security applications. Convolutional Neural Networks (CNNs), Recurrent Neural Networks (RNNs) including Long Short-Term Memory (LSTM) networks, AutoEncoders, Generative Adversarial Networks (GANs), Deep Reinforcement Learning (DRL) [26] and their variants are some of the most commonly employed DL schemes.

A. Deep Edge Intelligence (DEI)

Traditionally, Deep Learning has been applied in a centralized environment owing to resource constraints and data requirements. Deep Learning-based Edge Intelligence encompasses executing DL models within Edge Computing settings. The Edge Intelligence resulting from the confluence of Deep Learning and Edge Computing, however, demands careful consideration of the operating requirements to maintain a delicate balance between the often conflicting requirements already discussed in Section-II. We term this emerging domain Deep Edge Intelligence (DEI). Contrary to conventional Deep Learning, environment characteristics including device capability, network capacity/mode and acceptable performance thresholds form key constraints at the outset of the design process for DEI.

B. Deep Edge Intelligent Configurations

As with all machine learning frameworks, Deep Learning involves two distinctive phases: a) training and b) inference. The modalities involved in both of these processes operating in a distributed environment significantly alter the way DEI problems are formulated. Training requires data aggregation and computing resources to enable a DNN to learn. Aggregating data from a distributed environment is bound to result in considerable communication and time costs, in addition to posing a significant risk to user privacy and load on the cloud network.


TABLE I: Summary of Related Surveys for Deep Edge Intelligence (DEI)

[17] Scope: DEI-based Edge-operations management and applications.
     Highlights: AI-based network and resource management operations for EC; AI applications feasible for deployment in the Edge environment.
     Major challenges identified: DNN model finalization; data management.

[27] Scope: DEI techniques for optimized training and inference.
     Highlights: DEI performance indicators; DEI techniques for inference and training; DEI development platforms.
     Major challenges identified: DNN and device limitations; computation-aware management.

[28] Scope: DEI applications.
     Highlights: EI applications for autonomous structures and operations of the future, including driver-less cars, smart grids, homes and cities.
     Major challenges identified: consolidation of emerging DEI enablers for optimal performance.

[8]  Scope: Modes of DEI deployment across Edge nodes for training and inference.
     Highlights: identifies various scenarios for optimal execution of DEI schemes on end-devices; model splitting between devices, edge nodes and cloud servers.
     Major challenges identified: resource management; DEI benchmarking; convergence of DEI with network abstraction schemes.

[29] Scope: Optimized DEI.
     Highlights: hardware, software and run-time optimization; DNN security in the edge environment.
     Major challenges identified: joint hardware-software optimization for DNNs; hardware-aware DNN tuning; conditional computation.

[9]  Scope: DEI research for training and inference, for both the "Intelligent Edge" and applications.
     Highlights: application settings for the Intelligent Edge through automated caching, computation offloading and management operations; DEI scenarios for both inference and training in addition to supporting hardware; software support for DEI and model design frameworks.
     Major challenges identified: redefining DNN performance criteria for DEI; joint DNN optimization for training and inference; widespread adoption of the Intelligent Edge and emphasis on Transfer Learning.

[30] Scope: DEI for caching and offloading in addition to training and inference.
     Highlights: identifies core areas being utilized for each of the four domains of focus; modes and acceleration of training and inference; edge-optimized model design; communication and model compression in addition to cache operation deployment.
     Major challenges identified: data disparity and uniform accessibility; pitfalls of deploying centrally trained models; privacy and security issues.

Inference, on the other hand, requires that inputs be transmitted to the cloud at the cost of latency and privacy, while a consistent memory footprint is required for the DNN at the cloud level. To address these issues, DEI resorts to collaborative operations spread among various elements at all levels of the network. In principle, such models can be thought of as operating in a a) Centralized (Cloud-Centric) Mode, b) Decentralized (Edge-Centric) Mode, or c) Hybrid (Joint Cloud-Edge) Mode.

A summary of the surveys on the prevalent research directions for harmonizing Deep Learning with Edge operations is provided in Table-I. The areas covered in these works range from application-centric to optimization-centric and offer many prospective avenues for DEI. [17] classifies the AI solutions in the Edge environment as AI for Edge, for managing network operations, and AI on Edge, for various AI applications operating in Edge networks. [27] provides an overview of the existing literature according to training and inference operations, providing performance metrics. They further characterize these architectural configurations according to multiple training and inference scenarios distributed between various levels of the network. [28] provides a review of the application-specific Deep Learning approaches for Edge Intelligence. [8] elucidates the modes of Deep Learning operation for Edge Intelligence, while [29] discusses current works on software and hardware optimization for Deep Learning in a distributed setting. [9] presents a very comprehensive overview merging various aspects of Deep Learning-based solutions for improved communications, network operations and implementations for edge networks. Finally, [30] provides an extensive review extending the operational classification of the existing Edge Intelligence literature to include caching and offloading in addition to training and inference.

IV. DEEP EDGE INTELLIGENCE (DEI): ENABLING MECHANISMS

The realization of Deep Edge Intelligence has been made possible by some key enabling factors. These techniques are most often applied in conjunction with each other to obtain optimal performance in various settings. There also exists considerable overlap in the scenarios in which these techniques are jointly applied, allowing inter-disciplinary solutions to the challenges of DEI. These key enabling mechanisms include: a) Federated Learning, b) Compression Schemes, c) Distributed Computation and d) Conditional Computation, as shown in Fig-2. Various works have elucidated these research areas considering their usage implications in training or inference processes, or their utilization in model design or modifications. Automating operations and intelligent edge applications are other lenses under which these domains have been discussed. However, these factors have always contributed in one form or another in all settings, which is why we focus holistically on the key concepts that are rapidly enabling the advancement of DEI.

Fig. 2. Deep Edge Intelligence domains with major sub-domains

A. Federated Learning

Edge Intelligence tries to leverage the resources available mostly at the edge level of the network. However, this domain has a heterogeneous nature both in terms of the devices and the communication protocols that link them. Federated Learning, proposed in [31], enables these entities (clients) to learn collaboratively, coordinated by a central server, without ever exchanging raw data. This mechanism decentralizes the learning process by enabling clients to start from a common DNN and train on the locally available data. Federated Learning ensures the privacy of the users since no data is communicated between the clients during the training process. Subsequently, multiple clients are solicited by the central server to share their parameters, which are then aggregated by the server itself (Fig-3). The updated model is then broadcast to the clients, which repeat the process until convergence. The inference process takes place on the client itself, and thus data is never transmitted over the network, which ensures user and client privacy. [32], in a comprehensive survey, classifies the challenges for Federated Learning as either "algorithmic" or "practical" in nature. Communication and model optimization are the two key parameters used to gauge the effectiveness of an FL algorithm. The other key issues which dictate the current research avenues in the federated setting are:

a) Non-IID and Imbalanced Data: Various works have tried to build on the spatio-temporal relationships present in the data, particularly within a geographical vicinity. Most such works deal with Edge Intelligent caching; however, this cannot be used as a generally acceptable property of data. As no prior information is available about the data being held at the various clients, the data is characterized as non-Independent and Identically Distributed (non-IID) and is also assumed to be class-imbalanced. Furthermore, differential privacy may be ensured by restricting participation by overly eager clients or by adding noise to the data at the clients, thus preventing memorization.

b) Inconsistent Participation: Adverse network conditions, as well as offline devices, are catered for by the central server by orchestrating a pool of devices to share their parameters. Additionally, access to training data as well as processing capability across the edge is non-uniform. The aggregation protocols should be robust to client/network outages or degraded contributions in a federated setting.

c) Privacy and Security: Federated Learning maintains privacy by communicating only model parameters or updates. Furthermore, the application of secure aggregation aims to add another layer of security to the communicated data. However, malicious action against the participating devices in the form of data poisoning or update corruption can compromise the aggregation process, ultimately affecting all the clients. Thus, there remains a need to introduce trust guarantees for the clients to ensure secure FL.

d) Synchronous vs Asynchronous Operation: A typical federated setting entails some level of synchronization to ensure timely updates for the client model. However, varying network conditions, as well as client capabilities and presence, can seriously degrade synchronous Federated Learning. While asynchronous orchestration of clients in a federated setting proves robust to the challenges caused by adverse network and communication conditions, such designs also incur higher latency and may give rise to convergence issues. As an example, [33] establishes that the consensus mechanisms implemented through standard gossip algorithms in the presence of compressed communication do not converge.

Fig. 3. Centralized Federated Learning

[34], while introducing Federated Learning, identifies data-related challenges as the key issues for a federated setting. Experimental results indicate that the proposed Federated Averaging possesses a degree of robustness to class-imbalanced and non-IID data. [35] provides theoretical guarantees for local Stochastic Gradient Descent (SGD) running on convex objective functions in batch learning problems, operating on individual clients and averaged on a central server. [36] presents Temporally Weighted Aggregation Federated Learning (TWAFL): an asynchronous mechanism for aggregating and updating the parameters of shallow and deep layers of DNNs at different frequencies while aggregating the model parameters using temporal weighting. The authors report a speed-up of the convergence process by assigning higher weights to recent updates. [37] proposes In-Edge AI, which uses Deep Reinforcement Learning in a federated setting for caching and computation offloading. The paper uses a Double Deep Q-Network for both caching and offloading operations in federated settings and reports performance on par with centralized DRL algorithms. An attempt to mitigate the reliability issues associated with central aggregation has been conducted in [38], where BlockFL, a blockchained Federated Learning architecture, is proposed. The model updates computed locally by the clients are uploaded to an associated miner which, after performing the Proof of Work (PoW), uploads the local updates to the distributed ledger. The global update is then computed at the individual clients by accessing the distributed ledger. This decentralization, however, is achieved at the cost of greater latency during the training process.
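To make the aggregation step concrete, the following is a minimal sketch of the Federated-Averaging idea in plain NumPy. The local_train callable is a hypothetical stand-in for a few epochs of client-side SGD; as in [34], client updates are weighted by their local sample counts.

```python
import numpy as np

def federated_averaging(global_weights, clients, local_train, rounds=10):
    """Sketch of Federated Averaging (FedAvg).

    global_weights : list of np.ndarray (the shared DNN parameters)
    clients        : list of local datasets (these never leave the devices)
    local_train    : callable(weights, data) -> updated weights
                     (hypothetical stand-in for a few epochs of local SGD)
    """
    for _ in range(rounds):
        updates, sizes = [], []
        for data in clients:                 # server orchestrates a pool of clients
            local_w = local_train([w.copy() for w in global_weights], data)
            updates.append(local_w)          # only parameters are shared, never raw data
            sizes.append(len(data))
        total = float(sum(sizes))
        # Weighted average of the client parameters, layer by layer
        global_weights = [
            sum(n / total * upd[layer] for n, upd in zip(sizes, updates))
            for layer in range(len(global_weights))
        ]
        # The updated model is then broadcast back to the clients (next iteration)
    return global_weights
```

A real deployment would sample only a subset of clients per round and could apply secure aggregation to the shared parameters.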

B. Compression Schemes

Compression schemes aim to reduce the communication costs and memory footprint in DEI settings. To this end, the major applications of compression lie in two areas: a) Model Compression and b) Communication Compression.

a) Model Compression: DNN models can vary from thousands to hundreds of millions of parameters. Model compression aims to reduce the storage requirements of DNN models as well as to speed up the training process without degrading the accuracy of predictions. Model compression can be achieved by five major methods [30]: dimensionality reduction, pruning and sparsification, quantization, knowledge distillation and layer design. Dimensionality reduction is achieved through low-rank approximation and matrix factorization, which aim to construct reduced-rank representative matrices or decompose the dense original matrices, such as the weights of DNN layers. However, this process requires fine-tuning of the extracted matrices, often accompanied by a loss of accuracy. Network pruning entails the elimination of insignificant parameters within a DNN, leading to a sparse network. Pruning can be applied to individual weights (fine-grained) or entire layers (coarse-grained). Once sparsification is achieved, the network is retrained while keeping the pruned elements frozen. The requirement of deploying high-precision DNNs is evaluated and downgraded accordingly during the network quantization process; the existing redundancies in DNNs are exploited both for eliminating parameters and reducing precision while obtaining acceptable performance. Knowledge distillation, inspired by transfer learning, entails the construction of a smaller DNN which mimics the behavior of a larger one. The process involves creating a deeper teacher model which is then used to train a more compact student model. Layer design aims at introducing compactness in the DNN at the architectural level; pooling has been one of the ways conventional deep networks have introduced compactness into the network architecture. The principles of model compression apply to multiple DNN settings and have found active use cases in federated and distributed settings. The sketch below illustrates two of these methods.
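The following toy example, in NumPy, illustrates fine-grained magnitude pruning followed by simple 8-bit linear quantization of a weight matrix; it is an illustrative sketch rather than any specific scheme from the works cited below.

```python
import numpy as np

def magnitude_prune(weights, sparsity=0.9):
    """Zero out the smallest-magnitude weights (fine-grained pruning)."""
    threshold = np.quantile(np.abs(weights), sparsity)
    mask = np.abs(weights) >= threshold        # surviving connections
    return weights * mask, mask                # mask is kept frozen during retraining

def quantize_int8(weights):
    """Linear 8-bit quantization: store int8 values plus one fp32 scale."""
    scale = np.abs(weights).max() / 127.0 or 1.0
    q = np.clip(np.round(weights / scale), -127, 127).astype(np.int8)
    return q, scale                            # dequantize with q * scale

w = np.random.randn(256, 256).astype(np.float32)
w_pruned, mask = magnitude_prune(w, sparsity=0.9)  # ~90% of weights removed
q, scale = quantize_int8(w_pruned)                 # 4x smaller storage per weight
```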

[39] explores the application of a low-rank approximated Convolutional Neural Network (CNN) model for inference generation when deployed in an IoT network. Convolutional layers have benefited from compact design involving smaller filter sizes, reduced channel depth and downsampling, all aspects which were combined in SqueezeNet to achieve a 50x reduction in the number of parameters with a Top-5 accuracy of 83%, comparable to that of AlexNet. [40] further builds on this and explores the feasibility of deploying their proposed model on edge devices. RONA, a pRivate mOdel compressioN frAmework based on knowledge distillation, is proposed in [41]. The presented framework emphasizes differential privacy, and its performance is explored on a mobile platform. [42] presents an energy-aware pruning mechanism for CNNs where the energy consumption of the respective layers is evaluated. Pruning is conducted on the largest layers with the highest energy requirements, and the network is retrained after each pruning round. Finally, the entire model is globally fine-tuned to achieve the best energy-accuracy trade-off.

b) Communication Compression: At the other end of the spectrum are the applications of compression techniques to the communication involved in operating DEI. Parameter sharing during the training process in federated deep learning, and feature uploads for both inference and distributed computation, are some of the key beneficiaries of communication compression.


Dimensionality reduction of the DNN results in structured updates that consist of a smaller number of parameters. On the other hand, the updates being shared by the clients may themselves be compressed using quantization and various coding schemes. In addition to the communication itself, the frequency of communication also becomes critical at larger scales involving a large number of devices. Gossip training provides an alternative to centralized training, relying instead on gossip algorithms to achieve faster convergence, reduced upstream communication cost and complete decentralization. Furthermore, adaptive client selection for participation in the aggregation is another active area of research that can yield reduced communication costs. Following such a mechanism, only the clients with valuable contributions are selected for aggregation in a given round.

Deep Gradient Compression (DGC) [43] reports that 99.9% of the gradients exchanged in distributed SGD are redundant, and successfully employs gradient compression for distributed training. [44] shows that, without appropriate customization, regular gossip training algorithms result in poor convergence and far greater communication costs. The authors instead propose GossipGraD for better scalability, reducing the overall communication complexity from Θ(log(p)) to O(1). To further reduce edge-cloud communication in a federated setting, [33] presents CHOCO-SGD, a communication-efficient distributed Stochastic Gradient Descent (SGD) and aggregation method. They also present CHOCO-Gossip, a converging gossip algorithm for the distributed average consensus problem under compressed communication conditions.
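The core idea behind DGC-style gradient compression can be sketched as top-k sparsification with a local residual that accumulates the entries not yet sent; the code below is a simplified illustration and omits DGC's momentum correction and other refinements.

```python
import numpy as np

class TopKCompressor:
    """Send only the k largest-magnitude gradient entries per step;
    accumulate the rest locally so no information is permanently lost."""

    def __init__(self, shape, ratio=0.001):
        self.residual = np.zeros(shape)       # gradient mass not yet communicated
        self.ratio = ratio                    # e.g. 0.1% of entries, as in DGC

    def compress(self, grad):
        acc = grad + self.residual
        k = max(1, int(acc.size * self.ratio))
        flat = np.abs(acc).ravel()
        idx = np.argpartition(flat, -k)[-k:]  # indices of the k largest magnitudes
        sparse = np.zeros_like(acc).ravel()
        sparse[idx] = acc.ravel()[idx]
        self.residual = acc - sparse.reshape(acc.shape)  # carry over the unsent remainder
        return idx, acc.ravel()[idx]          # only (indices, values) go on the wire
```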

C. Distributed Computation

Varying combinations of limited power, memory and computation resources act as major bottlenecks for edge devices, while sending inputs upstream can cause network congestion, adversely affecting privacy and latency. Distributed computation sees a collaboration of available processing resources in achieving adequate processing power. Distributed computations can be categorized into two groups: a) Pipelining and b) Model Parallelism and Concurrency.

a) Pipelining: Pipelining [45] serves as an effective tool for distributing specific DNN computations or layers among various on-device processors or other nodes across the network. The former method is used in a general multi-processor environment to speed up processing, while the latter is referred to as Model Segmentation in the context of DEI. Implementing a segmented structure of a DNN can result in reduced energy costs and improved inference latency. Computation Offloading, on the other hand, can be considered an offshoot of model segmentation; however, it is more adaptive in terms of collaborating with nodes based on the prevalent network conditions and various requirements. Model Segmentation and Computation Offloading are frameworks that reflect collaborative processing among various devices spread across the network hierarchy.

• Model Segmentation: Model Segmentation is implemented by partitioning the DNN layers among various devices across the levels of the network, as depicted in Fig-4. This enables the architecture to leverage collaborative operation using the resource-constrained devices at the edge. Various blocks or layers are housed at different levels across the network. With split models, only the learned features, instead of the inputs, are shared. Additionally, the features communicated by the preceding portion of the DNN to the subsequent layers are much smaller than the input size and thus have the potential to reduce inference latency. Segmented models also achieve improved latency and energy efficiency (a minimal sketch follows this list).

Fig. 4. Model Segmentation

• Computation Offloading: Computation Offloading is an optimization-based method which enables tasks to be computed off-device. Offloading techniques enable the complete or partial offloading of DNN computations deeper along the network. However, the decision to offload computation needs to consider trade-offs between various factors including, but not limited to, inference latency and accuracy, network status, and target device resources and scheduling. Computation offloading can be achieved in the following configurations [9]: Partial Offloading takes place when some of the tasks are uploaded to other nodes and the cloud. Horizontal Offloading allocates concurrent tasks to edge devices, which then communicate their results to the cloud server. A natural extension of such an offloading scheme for highly resource-limited end-devices is to only perform input pre-processing to produce a Region of Interest (RoI), which is then fed to the classifier(s). Vertical Segmentation entails that the tasks are allocated hierarchically along the network upstream; each preceding result becomes an input to the subsequent stage.
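As an illustration of the model segmentation idea referenced in the list above, the toy sketch below splits a two-stage network between an end device and an edge server; the layer sizes and random weights are invented for the example, and the network transfer is represented by an ordinary function call.

```python
import numpy as np

rng = np.random.default_rng(0)
W_dev  = rng.standard_normal((3072, 256))  # device-side layers: 3072-dim input -> 256-dim features
W_edge = rng.standard_normal((256, 10))    # edge-side layers: features -> 10 classes

def device_forward(x):
    """Runs on the end device; emits a feature vector smaller than the raw input."""
    return np.maximum(x @ W_dev, 0.0)      # ReLU features (256 floats vs 3072 input values)

def edge_forward(features):
    """Runs on the edge server; completes the inference from the features."""
    logits = features @ W_edge
    return int(np.argmax(logits))

x = rng.standard_normal(3072)              # e.g. a flattened 32x32x3 image
features = device_forward(x)               # only this compact tensor crosses the network
prediction = edge_forward(features)        # the raw input never leaves the device
```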

b) Model Parallelism and Concurrency: The systems present across the network are highly heterogeneous in nature. Smart mobile devices in particular have seen a proliferation of multi-core processor systems which may be utilized in an edge intelligent environment. Model parallelism divides the computation among various on-device processors. For a convolutional layer in a DNN, filters along the depth dimension can be assigned to separate processors to run simultaneously. Attempts have also been made to configure DNNs to maximize parallelism and minimize communication by introducing redundant operations within the DNN architecture [46]. Model concurrency, on the other hand, overlaps operations such as the forward and backward passes and weight updates by executing them simultaneously.


GPipe [47] pertains to model segmentation, where a DNN is partitioned into K cells. Each of these cells can then be placed on a different accelerator, which is then trained on micro-batches. [48] proposes Neurosurgeon, a framework to partition layers across the network based on their compute and data characteristics. [49] explores the feasibility of model parallelism techniques for highly resource-constrained IoT networks and proposes an evenly distributed data processing pipeline. JointDNN, proposed in [50], presents a collaborative learning method between edge and cloud while also employing compressed communication. Finally, DNNs have also been converted into Directed Acyclic Graphs (DAGs), which are then simplified using pruning; the process is repeated to eventually yield a much simpler graph network. In [51], the authors present the Dynamic Adaptive Surgery Scheme (DADS) to segment DNNs under various network conditions by treating inference minimization as a min-cut problem.

D. Conditional Computation

DNNs with greater depth have displayed the ability to learn deeper representations of the input. However, this depth is often accompanied by prohibitive computation, latency and memory costs. Conditional computation mechanisms try to establish the best trade-off between accuracy and a combination of the constraints discussed in Section-II-D, which they achieve by exploiting redundancies. A conditionally computing learning system may selectively deactivate certain network operations if their execution does not yield a considerable improvement in inference quality. This mechanism is called "Early Exit of Inference (EEoI)". In terms of a DNN, it can be compared to dropout layers, the only difference being that in the case of EEoI, entire computation blocks are restricted from operating. The principle of conditional computation may also be extended to selecting the most valuable contributors in a Federated Learning setting. In such a case, only those clients which have a significant impact on the network learning or performance may become part of the aggregation. Another application of EEoI lies in the domain of Edge Caching, which involves skipping DNN execution if the features of the current input resemble those of a cached inference.

Early Exit of Inference: It has been established that the shallow layers of a DNN learn more general features, whereas deeper layers learn more specific ones. Early Exit of Inference (EEoI), based on this observation, is a mechanism to obtain a balance between inference latency and accuracy. EEoI enhances efficiency by allowing inferences to be generated by earlier layers instead of having the input traverse the entire depth of the DNN. This mechanism is employed whenever an inference can be generated with high confidence. EEoI has also been combined with Model Segmentation to allow computation offloading in cases where the inference does not meet the required thresholds. A major benefit of EEoI compared with Model Segmentation and Computation Offloading is its potential for reduced communication overhead and improved inference latency.
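A minimal early-exit sketch follows, with the stage and exit-head callables left as hypothetical placeholders: each internal classifier produces a class distribution, and the input stops traversing the network as soon as a confidence threshold is met.

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def early_exit_inference(x, stages, exit_heads, threshold=0.9):
    """stages     : non-empty list of callables, the DNN split into sequential blocks
       exit_heads : list of callables mapping a block's output to class logits
       threshold  : minimum top-class probability required to exit early"""
    h = x
    for depth, (stage, head) in enumerate(zip(stages, exit_heads)):
        h = stage(h)                     # run the next block of layers
        probs = softmax(head(h))
        if probs.max() >= threshold:     # confident enough: skip the deeper layers
            return int(probs.argmax()), depth
    return int(probs.argmax()), depth    # fell through to the final (deepest) exit
```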

EEoI has also found application in the context of Edge Caching, which is based on identifying the "redundancy of requests" [30]. Edge Caching is inspired by the same mechanism used to speed up memory access in computing systems. The efficiency of Edge Caching systems is conditional upon effectively identifying the redundancies in communication or computation. Edge Caching has conventionally been categorized into "what", "where" and "how" to cache at the Edge Network. Attempts to answer these questions through the application of various DL techniques have been made, among others, in [52], [53], [54]. These and other related works have explored these questions through centralized learning, predicting popular content based on the features of the user requests or training a DNN using the outputs from heuristic algorithms. These methods certainly improve caching performance and yield high Quality of Experience (QoE). However, the facilitation of DEI deployment for Edge Caching is achieved by addressing the problem of "how to access" the cache. DNNs deployed in a geographically co-located manner are likely to receive similar requests. Caching the results of previous inferences can enable future inputs to skip DNN execution if their features match the features of a stored inference with a certain confidence. In this manner, EEoI is achieved in the case of a successful cache 'hit'.
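The cache 'hit' logic described above can be sketched as nearest-neighbour matching over stored input features; the cosine-similarity threshold and the choice of feature extractor are illustrative assumptions.

```python
import numpy as np

class InferenceCache:
    """Cache (feature, inference) pairs at an edge node; a sufficiently
    similar input reuses the stored inference and skips DNN execution."""

    def __init__(self, threshold=0.95):
        self.features, self.results = [], []
        self.threshold = threshold           # cosine-similarity confidence level

    def lookup(self, feat):
        for cached, result in zip(self.features, self.results):
            sim = feat @ cached / (np.linalg.norm(feat) * np.linalg.norm(cached))
            if sim >= self.threshold:
                return result                # cache hit: early exit, no DNN run
        return None                          # cache miss: caller runs the full DNN

    def store(self, feat, result):
        self.features.append(feat)
        self.results.append(result)
```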

[55] proposes Boomerang, a cooperative DNN framework for IoT employing the EEoI mechanism by training with multiple exit points. Shallow-Deep Networks [56] introduce the concept of 'overthinking' in DNNs to describe redundant and wasteful computations, particularly in CNNs. The authors term their use of EEoI Internal Classifiers (ICs), which they introduce at various stages of the DNN to combine shallow and deeper layers as required. [57] proposes Cachier, which stores the inferences as well as the corresponding input features at the Edge Nodes. An offline machine learning model trained on features from possible queries is then used to find the best match for the features of the current input. CacheNet, proposed in [58], is a mechanism to cache DL models on end devices as well as edge nodes. The end device houses multiple smaller submodels trained on knowledge partitions, whereas baseline models are kept at the edge node. The end device uses a 'hint' mechanism to select the model most suitable for the input data, falling back to the edge model for inference in case of a 'miss'.

V. OPEN CHALLENGES AND FUTURE RESEARCH DIRECTIONS

DEI offers a powerful solution for ensuring the ubiquity of AI in current networks, with considerable space for future-proofing. Despite the obvious benefits, there still exist several open challenges and research directions that may be explored to enhance the effectiveness of DEI. The open challenges can be categorized under three broad categories, as depicted in Fig-5:

1) DNN Design
2) Learning Domain
3) Data Issues


Fig. 5. Some open research directions for DEI

Each of these domains presents its own unique set of challenges, addressing which is essential for DEI to flourish. We present some research avenues and open challenges under each of these areas in the subsequent pages.

A. DNN Design

The DNN architectures for DEI should be designed with attention to the operating edge environment in order to perform optimally. Special emphasis should be laid on adapting the models to the edge constraints, which in turn brings its own set of challenges. Going forward, DL architecture and hyperparameter search, and explainability, are two significant challenges for DEI.

a) DL Architecture and Hyperparameter Search: DL is being employed to address increasingly challenging problems. Since the early days of MNIST classification, models have evolved to become deeper and more sophisticated in terms of the layers and the connections they employ. As elaborated in Section-II, the edge network exhibits significant diversity in devices as well as data (IID vs. non-IID). DNN architecture designs need to cater for these additional conditions for optimal performance. DNN architecture engineering for such a varied environment may be excessively challenging and time-consuming for traditional human efforts. The success of Automated Machine Learning (AutoML), which allows for automated DNN architecture search, is evident from the fact that the current ILSVRC winner [4] is based on an architecture refined by AutoML. Although AutoML frameworks such as those offered by Google and Microsoft Azure provide powerful tools, they explore the model design space for centralized learning settings. Model segmentation, concurrency and federation are some aspects essential to DEI which are currently absent from such automated search. [59] and [60] propose AutoML frameworks for refining and designing mobile-based DNN architectures, respectively. However, the approach is still limited to a family of devices, and AutoML may be required to cater for the additional constraints imposed by the edge environment. AutoML incorporating edge constraints may be employed to accelerate the design and testing of DNNs for DEI.

b) Explainability: The concept of model explainability was briefly introduced in Section II; however, the prospect of interpretability in the context of DEI is itself a very challenging topic. Inference explainability for DL has garnered considerable importance as a means to provide context to the statistical metrics used to gauge the efficacy of the learner network. Explainability for conventional models has so far been achieved by consolidating information across the depth of the network. In conventional DNNs, the notion of explainability has been addressed through "Attention Mechanisms" [61], [62] or inference-time/post-hoc processes such as Gradient-Weighted Class Activation Mapping (Grad-CAM) [63]. For DEI, however, the shallow nature and segmentation of models are key challenges that need to be catered for to incorporate interpretability. Furthermore, devising mechanisms that provide explainability without considerable communication and computation costs is a highly challenging prospect. A minimal Grad-CAM sketch is given below.
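For reference, this is a compact sketch of the Grad-CAM procedure of [63] using forward/backward hooks; the toy network and the choice of target layer are illustrative assumptions.

```python
# Minimal Grad-CAM sketch: gradients of the top class w.r.t. a chosen
# convolutional feature map weight that map into a localization heatmap.
import torch
import torch.nn as nn
import torch.nn.functional as F

net = nn.Sequential(
    nn.Conv2d(3, 8, 3, padding=1), nn.ReLU(),
    nn.Conv2d(8, 16, 3, padding=1), nn.ReLU(),  # target layer (index 2)
    nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(16, 10),
)
acts, grads = {}, {}
target = net[2]
target.register_forward_hook(lambda m, i, o: acts.update(a=o))
target.register_full_backward_hook(lambda m, gi, go: grads.update(g=go[0]))

x = torch.randn(1, 3, 32, 32)
logits = net(x)
logits[0, logits.argmax()].backward()  # gradient of the predicted class

# Channel weights = global-average-pooled gradients; the weighted sum
# of activations gives the class-discriminative localization map.
w = grads["g"].mean(dim=(2, 3), keepdim=True)
cam = F.relu((w * acts["a"]).sum(dim=1))
cam = cam / (cam.max() + 1e-8)  # normalized spatial heatmap
print(cam.shape)  # (1, 32, 32)
```

For DEI, the open question is where such a map can be computed when the feature maps and the classifier head live on different devices.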

B. Learning Domain

Learning algorithms dictate the effectiveness of DEI. However, on-device resources at the edge network are scarce, and the homogeneity, availability and quality of data are all highly conditional. A considerable amount of research is being carried out to address the former issue in the form of distributed computation schemes and specialized hardware such as NVIDIA Tensor Cores, Tensor Processing Units and Application-Specific Integrated Circuits (ASICs) [45]; however, limited research has been carried out in exploring algorithmic alternatives. Evolutionary Computation and Meta- and Multi-Task Learning are two powerful algorithmic families, discussed below in the context of DEI as promising alternatives for research.

a) Evolutionary Computation for DEI: Actions in a network propagate widely in the spatio-temporal domain. With intelligent clients becoming prevalent in emerging networks, predicting the interactions between the various entities is likely to become highly complex. Google DeepMind's AlphaStar [64] offers an interesting case study in this respect, for an environment characterized by real-time interaction, partial observability, complex action and state spaces, and non-unique strategic perspectives, all aspects present to varying degrees in current and future networks. While the highlight of the original work is the efficacy of fusing various strands of AI research, [65] highlights its effective use of Population-Based Training (PBT), a central notion of Evolutionary Computation. Additionally, PBT is both asynchronous and distributed [66] and exploits scalability to overcome the prohibitive memory and time costs associated with training large single networks. Federated Learning may be considered a variant of PBT; however, its aggregation schemes merge models without attaching considerable significance to individual conditions and performance. PBT may find significant usage in DEI environments where co-located clients are enabled to learn competitively or collaboratively in a game-theoretic setting, especially for autonomous operations. To the best of our knowledge, very limited work has been carried out to explore the opportunity that Evolutionary Computation affords DEI. A toy PBT loop is sketched below.
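The following toy sketch shows the exploit/explore cycle at the heart of PBT [65]: a population of workers trains independently, and underperformers periodically copy the weights and hyperparameters of a top performer and perturb them. The training step and scoring here are stubs for illustration only.

```python
# Toy Population-Based Training (PBT) loop with a stubbed objective.
import copy
import random

class Worker:
    def __init__(self, lr):
        self.lr, self.weights, self.score = lr, [random.gauss(0, 1)], 0.0

    def train_step(self):
        # Stub "training": nudge the weight and score it against a
        # stand-in objective (distance to 1.0).
        self.weights[0] += self.lr * random.gauss(0.1, 0.05)
        self.score = -abs(self.weights[0] - 1.0)

population = [Worker(lr=10 ** random.uniform(-3, -1)) for _ in range(8)]
for step in range(50):
    for w in population:
        w.train_step()
    if step % 10 == 9:  # periodic exploit/explore
        ranked = sorted(population, key=lambda w: w.score)
        for loser in ranked[:2]:                 # bottom of the ranking
            winner = random.choice(ranked[-2:])  # a top performer
            loser.weights = copy.deepcopy(winner.weights)   # exploit
            loser.lr = winner.lr * random.choice([0.8, 1.2])  # explore
print("best score:", max(w.score for w in population))
```

In a DEI setting, each worker would correspond to a co-located client, with the exploit step performed over the network.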

b) Meta & Multi-Task Learners: DL models in the DEI environment are much more likely to come across tasks or data they have not been trained on, and maintaining a distinct model for each expected scenario is a prohibitive prospect. Therefore, AI domains that enable generalized learning may be able to address this challenge. Meta-Learning is one such paradigm, in which a learner trained on certain data can learn a new task from relatively few examples. A meta-learning-based model-agnostic learning framework for gradient-descent-based learners, including DL, was proposed in [67]. For a distributed setting, a federated meta-learning system was proposed in [68]: a set of nodes collaborates in a federated fashion to learn their tasks, and once an external node is required to perform a certain task, the coordinating platform shares the model with that target node, enabling it to adapt the model in a few iterations. [69] presents a meta-learning-based personalized Federated Learning scheme: the authors propose determining an initialization point for the model of all clients, which enables each to learn its respective task in a few iterations. Similarly, Multi-Task Learning (MTL) is another domain which may prove advantageous for DEI models. The goal of MTL is to learn models for multiple tasks simultaneously; [70] proposes Federated Multi-Task Learning, in which the model weights of the clients learning their tasks are constrained to be related. A compact meta-learning sketch follows.
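Below is a compact first-order approximation of the MAML idea of [67] on toy sine-regression tasks: an inner loop adapts a copy of the model to each sampled task, and the outer loop updates the shared initialization from the adapted models' query losses. The task distribution, step sizes and first-order simplification are assumptions for brevity.

```python
# First-order MAML-style sketch (FOMAML) on toy sine-regression tasks.
import copy
import math
import random
import torch
import torch.nn as nn

def sample_task():
    amp, phase = random.uniform(0.5, 2.0), random.uniform(0, math.pi)
    def data(n=10):
        x = torch.rand(n, 1) * 10 - 5
        return x, amp * torch.sin(x + phase)
    return data  # closure over one task's amplitude and phase

model = nn.Sequential(nn.Linear(1, 40), nn.ReLU(), nn.Linear(40, 1))
meta_opt = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.MSELoss()

for meta_step in range(100):
    meta_opt.zero_grad()
    for _ in range(4):                       # tasks per meta-batch
        task = sample_task()
        fast = copy.deepcopy(model)          # task-specific copy
        inner_opt = torch.optim.SGD(fast.parameters(), lr=1e-2)
        for _ in range(5):                   # few-shot inner adaptation
            x, y = task()
            inner_opt.zero_grad()
            loss_fn(fast(x), y).backward()
            inner_opt.step()
        xq, yq = task()                      # query set, same task
        qloss = loss_fn(fast(xq), yq)
        # First-order step: copy the adapted model's gradients back
        # onto the shared initialization and accumulate across tasks.
        grads = torch.autograd.grad(qloss, fast.parameters())
        for p, g in zip(model.parameters(), grads):
            p.grad = g if p.grad is None else p.grad + g
    meta_opt.step()
```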

C. Data-associated Challenges

The proliferation of AI has been possible because of the accessibility of data. With its distributed nature, DEI presents a multitude of challenges in securing and efficiently utilizing data. The risks to privacy and security, and the issues arising from data diversity, are two major concerns at the heart of the effective utilization of data.

a) Privacy and Security: The abundance of data, while offering numerous advantages, also holds significant risks to individual privacy and security in the hands of malicious actors. DEI requires varying levels of participation from the networked elements, from minimized data sharing in Federated Learning to offloading computation all the way to the cloud. The threat to DEI is two-fold: compromising user privacy or compromising learner integrity. Privacy is currently ensured via Secure Multi-Party Computation (MPC), Differential Privacy and transparency [32]. Security, on the other hand, may be targeted by overwhelming the devices, corrupting data and model updates, and maliciously exploiting knowledge of the DNN models for inference evasion. A DL-based unsupervised method for adversarial attack detection in the Edge Computing scenario has been proposed in [7]; the suggested scheme actively learns the features of attacks in an unsupervised manner. Similarly, [71] highlights the vulnerabilities of a Federated Learning setting: the authors envisage a model replacement scheme in which a malicious actor aims to replace the global model with a backdoored model while ensuring that the latter survives the averaging process. [72] is a representative work that highlights the problem of inference evasion in a scenario where a cat image is falsely classified as a Covid-19-positive chest X-ray by both DNNs and a Bayesian Neural Network (BNN). This work addresses the problems which arise when common DL models are faced with out-of-distribution data; it also highlights that, armed with knowledge about the DNN, malicious agents can effectively bypass DL-based filtering and security measures. One privacy-preserving ingredient is sketched below.
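As a concrete illustration of one Differential Privacy ingredient from the toolbox surveyed in [32], the sketch below clips each client update and adds Gaussian noise before server-side averaging. The clip norm and noise scale are assumed values, not a calibrated (epsilon, delta) guarantee.

```python
# Per-client update clipping plus Gaussian noise before aggregation,
# in the style of differentially private federated averaging.
import numpy as np

CLIP_NORM = 1.0
NOISE_STD = 0.5  # assumed; real deployments calibrate this to a budget

def privatize(update, rng):
    norm = np.linalg.norm(update)
    clipped = update * min(1.0, CLIP_NORM / (norm + 1e-12))
    return clipped + rng.normal(0.0, NOISE_STD * CLIP_NORM, update.shape)

rng = np.random.default_rng(0)
client_updates = [rng.normal(size=8) for _ in range(5)]  # stand-in grads
noisy = [privatize(u, rng) for u in client_updates]
global_update = np.mean(noisy, axis=0)  # server-side aggregation
print(global_update.round(3))
```

Clipping bounds any single client's influence on the global model, which is also what makes the averaging step harder for the backdoor attack of [71] to dominate.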

b) Data Diversity: Data plays a pivotal role in the successful execution of any learning system, including DL. Traditionally, data aggregation for centralized learning maintained a level of homogeneity. The data-related challenges for DEI, on the other hand, are:

1) Accessibility, Quality and Bias: With DEI becoming pervasive at the edge, a key requirement is for DEI models to gain access to user data, which requires a formal mechanism to be in place. An inherent property of this mechanism has to be ensuring the privacy of the users and data origins. Additionally, there is no existing way to gauge the quality of data provided by the various edge entities: the data may be noisy, feature-poor, mislabelled or downright malicious. Furthermore, any data used for DEI learning will carry bias, which must be catered for by the DEI models.

2) Heterogeneity: The myriad devices residing at the edge follow a wide array of protocols for data collection, so the data accessible to DEI is very likely to be non-uniform. Attempts to homogenize this data will lose valuable information, the very information that allows DL models to learn deep distinctive features. Another aspect of data heterogeneity is associated with the environment as well as the source where the data is collected [30]. Environments or sensors can be overly noisy, which adversely affects the data being collected; furthermore, differing sensitivity and quantization levels of sensors measuring the same quantity, even in the same environment, may result in large variations in the data generated. Additionally, data augmentation is commonly used to improve the learning process of DL models. In a distributed environment, augmentation across all data instances may exacerbate the adverse properties of the data, and these effects are likely to propagate across devices due to model sharing. A common way to emulate such heterogeneity when evaluating DEI schemes is sketched below.
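One widely used way to emulate label heterogeneity when benchmarking federated or DEI schemes is to partition a dataset across clients with a Dirichlet prior over class proportions, where a smaller concentration parameter yields more skew. The stand-in labels, client count and alpha below are illustrative assumptions.

```python
# Dirichlet-based non-IID partitioning of class labels across clients.
import numpy as np

rng = np.random.default_rng(0)
labels = rng.integers(0, 10, size=1000)  # stand-in for real labels
NUM_CLIENTS, ALPHA = 5, 0.3              # smaller ALPHA -> more skew

client_indices = [[] for _ in range(NUM_CLIENTS)]
for c in range(10):                      # split each class separately
    idx = np.flatnonzero(labels == c)
    rng.shuffle(idx)
    shares = rng.dirichlet([ALPHA] * NUM_CLIENTS)
    cuts = (np.cumsum(shares)[:-1] * len(idx)).astype(int)
    for client, part in enumerate(np.split(idx, cuts)):
        client_indices[client].extend(part.tolist())

for i, idx in enumerate(client_indices):
    counts = np.bincount(labels[idx], minlength=10)
    print(f"client {i}: {counts}")  # per-client class histogram
```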

VI. CONCLUSION

This paper aims to consolidate the current trends in research on Edge Intelligence vis-à-vis Deep Learning. Due to the relative nascence of the Edge Intelligence field, there exists considerable diversity in the nomenclature and the areas being identified. In formulating this report, we have aimed to conceptually consolidate the areas currently being researched in Deep Edge Intelligence. Starting with an emphasis on the fact that data and devices are on track to render centralized computation prohibitive, we introduced the concepts behind Edge Intelligence and subsequently focused on Deep Edge Intelligence (DEI). We summarized some of the key factors governing the current and prospective areas of research, among which, to the best of our knowledge, 'Model Transparency' and 'Device Limitations' have not been explicitly covered previously. These enablers, or their combinations, have been employed jointly in various DNN models to achieve optimal performance in a DEI setting. The major challenges for DEI range from DNN design to learning frameworks for efficiently dealing with the constraints of DEI.

It is worth noting that, despite these early successes, the network and data environment is predicted to grow at a breakneck pace, with 6G communication envisioned to disrupt the way services are currently provided. One of the key challenges in developing lasting frameworks for DEI is to establish theoretical guarantees; such work will be able to identify potential solutions that can persevere in the 6G era and beyond. Furthermore, better communication links will enable an exponential proliferation of heterogeneous data at the edge network. Effective learning strategies such as Multi-Task Learning, Self-Supervised Learning and Meta-Learning are some of the key domains whose application to DEI can lead to more generalized learning schemes.

REFERENCES

[1] "Cisco annual internet report (2018–2023) white paper." [Online]. Available: https://www.cisco.com/c/en/us/solutions/collateral/executive-perspectives/annual-internet-report/white-paper-c11-741490.html

[2] "Ericsson Mobility Report June 2019," p. 36, 2019. [Online]. Available: https://www.ericsson.com/49d1d9/assets/local/mobility-report/documents/2019/ericsson-mobility-report-june-2019.pdf

[3] "Key drivers and research challenges for 6G ubiquitous wireless intelligence," p. 36. [Online]. Available: http://jultika.oulu.fi/Record/isbn978-952-62-2354-4

[4] H. Touvron, A. Vedaldi, M. Douze, and H. Jegou, "Fixing the train-test resolution discrepancy: FixEfficientNet," arXiv preprint arXiv:2003.08237, 2020.

[5] W. Shi, J. Cao, Q. Zhang, Y. Li, and L. Xu, "Edge computing: Vision and challenges," IEEE Internet of Things Journal, vol. 3, no. 5, pp. 637–646, 2016.

[6] D. S. W. Ting, L. Carin, V. Dzau, and T. Y. Wong, "Digital technology and COVID-19," Nature Medicine, vol. 26, no. 4, pp. 459–461, 2020.

[7] Y. Chen, Y. Zhang, S. Maharjan, M. Alam, and T. Wu, "Deep learning for secure mobile edge computing in cyber-physical transportation systems," IEEE Network, vol. 33, no. 4, pp. 36–41, Jul 2019.

[8] J. Chen and X. Ran, "Deep learning with edge computing: A review," Proceedings of the IEEE, vol. 107, no. 8, pp. 1655–1674, Aug 2019.

[9] X. Wang, Y. Han, V. C. M. Leung, D. Niyato, X. Yan, and X. Chen, "Convergence of edge computing and deep learning: A comprehensive survey," IEEE Communications Surveys & Tutorials, vol. 22, no. 2, pp. 869–904, 2020.

[10] T. Taleb, K. Samdanis, B. Mada, H. Flinck, S. Dutta, and D. Sabella, "On multi-access edge computing: A survey of the emerging 5G network edge cloud architecture and orchestration," IEEE Communications Surveys & Tutorials, vol. 19, no. 3, pp. 1657–1681, 2017.

[11] M. Satyanarayanan, "The emergence of edge computing," Computer, vol. 50, no. 1, pp. 30–39, 2017.

[12] N. Akhtar and A. Mian, "Threat of adversarial attacks on deep learning in computer vision: A survey," IEEE Access, vol. 6, pp. 14410–14430, 2018.

[13] N. Papernot, P. McDaniel, S. Jha, M. Fredrikson, Z. B. Celik, and A. Swami, "The limitations of deep learning in adversarial settings," in 2016 IEEE European Symposium on Security and Privacy (EuroS&P), Mar 2016, pp. 372–387.

[14] X. Yuan, P. He, Q. Zhu, and X. Li, "Adversarial examples: Attacks and defenses for deep learning," IEEE Transactions on Neural Networks and Learning Systems, vol. 30, no. 9, pp. 2805–2824, Sep 2019.

[15] Y. Xiao, Y. Jia, C. Liu, X. Cheng, J. Yu, and W. Lv, "Edge computing security: State of the art and challenges," Proceedings of the IEEE, vol. 107, no. 8, pp. 1608–1631, Aug 2019.

[16] Y. Chen, Y. Zhang, S. Maharjan, M. Alam, and T. Wu, "Deep learning for secure mobile edge computing in cyber-physical transportation systems," IEEE Network, vol. 33, no. 4, pp. 36–41, Jul 2019.

[17] S. Deng, H. Zhao, W. Fang, J. Yin, S. Dustdar, and A. Y. Zomaya, "Edge intelligence: The confluence of edge computing and artificial intelligence," IEEE Internet of Things Journal, pp. 1–1, 2020.

[18] J. Park, S. Samarakoon, M. Bennis, and M. Debbah, "Wireless network intelligence at the edge," Proceedings of the IEEE, vol. 107, no. 11, pp. 2204–2239, Nov 2019.

[19] Z. C. Lipton, "The mythos of model interpretability," Queue, vol. 16, no. 3, pp. 31–57, 2018.

[20] C. Molnar, Interpretable Machine Learning, 2019. [Online]. Available: https://christophm.github.io/interpretable-ml-book/

[21] F. Doshi-Velez and B. Kim, "Towards a rigorous science of interpretable machine learning," arXiv preprint arXiv:1702.08608, 2017.

[22] S. Chakraborty, R. Tomsett, R. Raghavendra, D. Harborne, M. Alzantot, F. Cerutti, M. Srivastava, A. Preece, S. Julier, R. M. Rao, et al., "Interpretability of deep learning models: A survey of results," in 2017 IEEE SmartWorld, Ubiquitous Intelligence and Computing, Advanced and Trusted Computing, Scalable Computing and Communications, Cloud and Big Data Computing, Internet of People and Smart City Innovation (SmartWorld/SCALCOM/UIC/ATC/CBDCom/IOP/SCI). IEEE, Aug 2017, pp. 1–6. [Online]. Available: https://ieeexplore.ieee.org/document/8397411/

[23] Y. LeCun, Y. Bengio, and G. Hinton, "Deep learning," Nature, vol. 521, no. 7553, pp. 436–444, 2015.

[24] D. W. Otter, J. R. Medina, and J. K. Kalita, "A survey of the usages of deep learning for natural language processing," IEEE Transactions on Neural Networks and Learning Systems, 2020.

[25] G. Litjens, T. Kooi, B. E. Bejnordi, A. A. A. Setio, F. Ciompi, M. Ghafoorian, J. A. Van Der Laak, B. Van Ginneken, and C. I. Sanchez, "A survey on deep learning in medical image analysis," Medical Image Analysis, vol. 42, pp. 60–88, 2017.

[26] W. Liu, Z. Wang, X. Liu, N. Zeng, Y. Liu, and F. E. Alsaadi, "A survey of deep neural network architectures and their applications," Neurocomputing, vol. 234, pp. 11–26, 2017.

[27] Z. Zhou, X. Chen, E. Li, L. Zeng, K. Luo, and J. Zhang, "Edge intelligence: Paving the last mile of artificial intelligence with edge computing," Proceedings of the IEEE, vol. 107, no. 8, pp. 1738–1762, Aug 2019.

[28] F. Wang, M. Zhang, X. Wang, X. Ma, and J. Liu, "Deep learning for edge computing applications: A state-of-the-art survey," IEEE Access, vol. 8, pp. 58322–58336, 2020.

[29] A. Marchisio, M. A. Hanif, F. Khalid, G. Plastiras, C. Kyrkou, T. Theocharides, and M. Shafique, "Deep learning for edge computing: Current trends, cross-layer optimizations, and open research challenges," in 2019 IEEE Computer Society Annual Symposium on VLSI (ISVLSI), Jul 2019, pp. 553–559.

[30] D. Xu, T. Li, Y. Li, X. Su, S. Tarkoma, and P. Hui, "A survey on edge intelligence," arXiv:2003.12172 [cs], Mar 2020. [Online]. Available: http://arxiv.org/abs/2003.12172

[31] H. B. McMahan, E. Moore, D. Ramage, S. Hampson, and B. A. y Arcas, "Communication-efficient learning of deep networks from decentralized data," arXiv:1602.05629 [cs], Feb 2017. [Online]. Available: http://arxiv.org/abs/1602.05629

[32] P. Kairouz, H. B. McMahan, B. Avent, A. Bellet, M. Bennis, A. N. Bhagoji, K. Bonawitz, Z. Charles, G. Cormode, R. Cummings, et al., "Advances and open problems in federated learning," arXiv:1912.04977 [cs, stat], Dec 2019. [Online]. Available: http://arxiv.org/abs/1912.04977

[33] A. Koloskova, S. U. Stich, and M. Jaggi, "Decentralized stochastic optimization and gossip algorithms with compressed communication," arXiv preprint arXiv:1902.00340, 2019.

[34] H. B. McMahan, E. Moore, D. Ramage, S. Hampson, and B. A. y Arcas, "Communication-efficient learning of deep networks from decentralized data," arXiv:1602.05629 [cs], Feb 2017. [Online]. Available: http://arxiv.org/abs/1602.05629


[35] S. U. Stich, "Local SGD converges fast and communicates little," arXiv:1805.09767 [cs, math], May 2019. [Online]. Available: http://arxiv.org/abs/1805.09767

[36] Y. Chen, X. Sun, and Y. Jin, "Communication-efficient federated deep learning with asynchronous model update and temporally weighted aggregation," arXiv:1903.07424 [cs, stat], Mar 2019. [Online]. Available: http://arxiv.org/abs/1903.07424

[37] X. Wang, Y. Han, C. Wang, Q. Zhao, X. Chen, and M. Chen, "In-edge AI: Intelligentizing mobile edge computing, caching and communication by federated learning," IEEE Network, vol. 33, no. 5, pp. 156–165, Sep 2019.

[38] H. Kim, J. Park, M. Bennis, and S.-L. Kim, "Blockchained on-device federated learning," IEEE Communications Letters, vol. 24, no. 6, pp. 1279–1283, Jun 2020.

[39] P. Maji, D. Bates, A. Chadwick, and R. Mullins, "ADaPT: Optimizing CNN inference on IoT and mobile devices using approximately separable 1-D kernels," in Proceedings of the 1st International Conference on Internet of Things and Machine Learning, 2017, pp. 1–12.

[40] M. J. Shafiee, F. Li, B. Chwyl, and A. Wong, "SquishedNets: Squishing SqueezeNet further for edge device scenarios via deep evolutionary synthesis," arXiv preprint arXiv:1711.07459, 2017.

[41] J. Wang, W. Bao, L. Sun, X. Zhu, B. Cao, and S. Y. Philip, "Private model compression via knowledge distillation," in Proceedings of the AAAI Conference on Artificial Intelligence, vol. 33, 2019, pp. 1190–1197.

[42] T.-J. Yang, Y.-H. Chen, and V. Sze, "Designing energy-efficient convolutional neural networks using energy-aware pruning," arXiv:1611.05128 [cs], Apr 2017. [Online]. Available: http://arxiv.org/abs/1611.05128

[43] Y. Lin, S. Han, H. Mao, Y. Wang, and W. J. Dally, "Deep gradient compression: Reducing the communication bandwidth for distributed training," arXiv preprint arXiv:1712.01887, 2017.

[44] J. Daily, A. Vishnu, C. Siegel, T. Warfel, and V. Amatya, "GossipGraD: Scalable deep learning using gossip communication based asynchronous gradient descent," arXiv preprint arXiv:1803.05880, 2018.

[45] T. Ben-Nun and T. Hoefler, "Demystifying parallel and distributed deep learning: An in-depth concurrency analysis," ACM Computing Surveys (CSUR), vol. 52, no. 4, pp. 1–43, 2019.

[46] U. Muller and A. Gunzinger, "Neural net simulation on parallel computers," in Proceedings of 1994 IEEE International Conference on Neural Networks (ICNN'94), vol. 6. IEEE, 1994, pp. 3961–3966.

[47] Y. Huang, Y. Cheng, A. Bapna, O. Firat, D. Chen, M. Chen, H. Lee, J. Ngiam, Q. V. Le, Y. Wu, et al., "GPipe: Efficient training of giant neural networks using pipeline parallelism," Curran Associates, Inc., 2019, pp. 103–112. [Online]. Available: http://papers.nips.cc/paper/8305-gpipe-efficient-training-of-giant-neural-networks-using-pipeline-parallelism.pdf

[48] Y. Kang, J. Hauswald, C. Gao, A. Rovinski, T. Mudge, J. Mars, and L. Tang, "Neurosurgeon: Collaborative intelligence between the cloud and mobile edge," ACM SIGARCH Computer Architecture News, vol. 45, no. 1, pp. 615–629, 2017.

[49] R. Hadidi, J. Cao, M. S. Ryoo, and H. Kim, "Collaborative execution of deep neural networks on internet of things devices," arXiv preprint arXiv:1901.02537, 2019.

[50] A. E. Eshratifar, M. S. Abrishami, and M. Pedram, "JointDNN: An efficient training and inference engine for intelligent mobile cloud computing services," IEEE Transactions on Mobile Computing, 2019.

[51] C. Hu, W. Bao, D. Wang, and F. Liu, "Dynamic adaptive DNN surgery for inference acceleration on the edge," in IEEE INFOCOM 2019 - IEEE Conference on Computer Communications. IEEE, 2019, pp. 1423–1431.

[52] Z. Chang, L. Lei, Z. Zhou, S. Mao, and T. Ristaniemi, "Learn to cache: Machine learning for network edge caching in the big data era," IEEE Wireless Communications, vol. 25, no. 3, pp. 28–35, 2018.

[53] Z. Chen, Q. He, L. Liu, D. Lan, H.-M. Chung, and Z. Mao, "An artificial intelligence perspective on mobile edge computing," in 2019 IEEE International Conference on Smart Internet of Things (SmartIoT), Aug 2019, pp. 100–106.

[54] S. Rathore, J. H. Ryu, P. K. Sharma, and J. H. Park, "DeepCachNet: A proactive caching framework based on deep learning in cellular networks," IEEE Network, vol. 33, no. 3, pp. 130–138, 2019.

[55] L. Zeng, E. Li, Z. Zhou, and X. Chen, "Boomerang: On-demand cooperative deep neural network inference for edge intelligence on the industrial internet of things," IEEE Network, vol. 33, no. 5, pp. 96–103, 2019.

[56] Y. Kaya, S. Hong, and T. Dumitras, "Shallow-deep networks: Understanding and mitigating network overthinking," in International Conference on Machine Learning, 2019, pp. 3301–3310.

[57] U. Drolia, K. Guo, J. Tan, R. Gandhi, and P. Narasimhan, "Cachier: Edge-caching for recognition applications," in 2017 IEEE 37th International Conference on Distributed Computing Systems (ICDCS). IEEE, 2017, pp. 276–286.

[58] Y. Fang, S. M. Shalmani, and R. Zheng, "CacheNet: A model caching framework for deep learning inference on the edge," arXiv preprint arXiv:2007.01793, 2020.

[59] Y. He, J. Lin, Z. Liu, H. Wang, L.-J. Li, and S. Han, "AMC: AutoML for model compression and acceleration on mobile devices," in Proceedings of the European Conference on Computer Vision (ECCV), 2018, pp. 784–800.

[60] M. Tan, B. Chen, R. Pang, V. Vasudevan, M. Sandler, A. Howard, and Q. V. Le, "MnasNet: Platform-aware neural architecture search for mobile," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2019, pp. 2820–2828.

[61] A. Vaswani, N. Shazeer, N. Parmar, J. Uszkoreit, L. Jones, A. N. Gomez, L. Kaiser, and I. Polosukhin, "Attention is all you need," Curran Associates, Inc., 2017, pp. 5998–6008. [Online]. Available: http://papers.nips.cc/paper/7181-attention-is-all-you-need.pdf

[62] S. Jetley, N. A. Lord, N. Lee, and P. H. S. Torr, "Learn to pay attention," p. 14, 2018.

[63] R. R. Selvaraju, M. Cogswell, A. Das, R. Vedantam, D. Parikh, and D. Batra, "Grad-CAM: Visual explanations from deep networks via gradient-based localization," International Journal of Computer Vision, Oct 2019. [Online]. Available: http://arxiv.org/abs/1610.02391

[64] O. Vinyals, I. Babuschkin, W. M. Czarnecki, M. Mathieu, A. Dudzik, J. Chung, D. H. Choi, R. Powell, T. Ewalds, P. Georgiev, et al., "Grandmaster level in StarCraft II using multi-agent reinforcement learning," Nature, vol. 575, no. 7782, pp. 350–354, Nov 2019.

[65] K. Arulkumaran, A. Cully, and J. Togelius, "AlphaStar: An evolutionary computation perspective," in Proceedings of the Genetic and Evolutionary Computation Conference Companion. ACM, Jul 2019, pp. 314–315. [Online]. Available: https://dl.acm.org/doi/10.1145/3319619.3321894

[66] M. Nowostawski and R. Poli, "Parallel genetic algorithm taxonomy," in 1999 Third International Conference on Knowledge-Based Intelligent Information Engineering Systems. Proceedings (Cat. No.99TH8410), 1999, pp. 88–92.

[67] C. Finn, P. Abbeel, and S. Levine, "Model-agnostic meta-learning for fast adaptation of deep networks," arXiv preprint arXiv:1703.03400, 2017.

[68] S. Lin, G. Yang, and J. Zhang, "Real-time edge intelligence in the making: A collaborative learning framework via federated meta-learning," arXiv:2001.03229 [cs, stat], May 2020. [Online]. Available: http://arxiv.org/abs/2001.03229

[69] A. Fallah, A. Mokhtari, and A. Ozdaglar, "Personalized federated learning: A meta-learning approach," arXiv:2002.07948 [cs, math, stat], Jun 2020. [Online]. Available: http://arxiv.org/abs/2002.07948

[70] V. Smith, C.-K. Chiang, M. Sanjabi, and A. S. Talwalkar, "Federated multi-task learning," Curran Associates, Inc., 2017, pp. 4424–4434. [Online]. Available: http://papers.nips.cc/paper/7029-federated-multi-task-learning.pdf

[71] E. Bagdasaryan, A. Veit, Y. Hua, D. Estrin, and V. Shmatikov, "How to backdoor federated learning," arXiv:1807.00459 [cs], Aug 2019. [Online]. Available: http://arxiv.org/abs/1807.00459

[72] A. Mallick, C. Dwivedi, B. Kailkhura, G. Joshi, and T. Y.-J. Han, "Can your AI differentiate cats from Covid-19? Sample efficient uncertainty estimation for deep learning safety," p. 9.