
Federated Imitation Learning: A Novel Framework for Cloud Robotic Systems with Heterogeneous Sensor Data

Boyi Liu1,3, Lujia Wang1, Ming Liu2 and Cheng-Zhong Xu4

Abstract— Humans are capable of learning a new behavior by observing others perform the skill. Similarly, robots can also implement this through imitation learning. Furthermore, with external guidance, humans can master the new behavior more efficiently. So, how can robots achieve this? To address the issue, we present a novel framework named FIL. It provides a heterogeneous knowledge fusion mechanism for cloud robotic systems. A knowledge fusion algorithm in FIL is then proposed. It enables the cloud to fuse heterogeneous knowledge from local robots and generate guide models for robots with service requests. After that, we introduce a knowledge transfer scheme to help local robots acquire knowledge from the cloud. With FIL, a robot is capable of utilizing knowledge from other robots to improve the accuracy and efficiency of its imitation learning. Compared with transfer learning and meta-learning, FIL is more suitable for deployment in cloud robotic systems. Finally, we conduct experiments on a self-driving task for robots (cars). The experimental results demonstrate that the shared model generated by FIL improves the imitation learning efficiency of local robots in cloud robotic systems.

I. INTRODUCTION

In traditional imitation learning scenarios, demonstrations provide a descriptive medium for specifying robotic tasks. Prior work has shown that robots can acquire a range of complex skills through demonstrations, such as table tennis [1], drawer opening [2], and multi-stage manipulation tasks [3]. Nevertheless, there exist a number of problems in the application of imitation learning. For example, large amounts of data are required, and the data are sometimes heterogeneous. These drawbacks result in long training times for the robot and limited generalization performance. A cloud robotic system [4] can be adopted to increase the learning efficiency of robots, and a federated imitation learning algorithm is proposed to fuse the shared knowledge of robots.

*This paper was recommended for publication by Okamura, Allison upon evaluation of the Associate Editor and Reviewers' comments. This research is supported by the Shenzhen Science and Technology Innovation Commission (Grant Number JCYJ2017081853518789), the Guangdong Science and Technology Plan Guangdong-Hong Kong Cooperation Innovation Platform (Grant Number 2018B050502009) and the National Natural Science Foundation of China (Grant Number 61603376) awarded to Dr. Lujia Wang; and by the National Natural Science Foundation of China (Grant Number U1713211), the Shenzhen Science, Technology and Innovation Commission (Grant Number JCYJ20160428154842603) and the Basic Research Project of the Shanghai Science and Technology Commission (Grant No. 16JC1401200) awarded to Prof. Ming Liu.

1 Boyi Liu and Lujia Wang are with the Cloud Computing Lab, Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences. [email protected]; [email protected]

2 Ming Liu is with the Department of ECE, Hong Kong University of Science and Technology. [email protected]

3 Boyi Liu is also with the University of Chinese Academy of Sciences.
4 Cheng-Zhong Xu is with the University of Macau. [email protected]

Fig. 1. The child on the upper left acquires the ability to ride a bicycle by observing an adult. This is the process of imitation learning in humans. Correspondingly, the upper-right robot acquires skills from training data. This is the process of imitation learning in robots. The bottom-left child not only acquires bicycling skills by observing an adult, but also gets help from an adult. This makes his learning more efficient. Inspired by this, in this work, FIL enables the bottom-right robot not only to acquire skills from training data, but also to get knowledge from other robots through the cloud robotic system.

The cloud fuses knowledge that comes from local robots. However, data heterogeneity hinders the process. This issue is generally regarded as a major challenge of cloud robotic systems [5]. To overcome it, a novel framework named FIL is proposed. It improves the imitation learning of local robots under heterogeneous data conditions. As shown in Fig. 1, it is inspired by the observation that humans can learn more effectively if they have external guidance. With FIL, a robot is capable of taking advantage of knowledge from other robots. The student/teacher imitation learning in local robots is improved with the guide model provided by FIL. To evaluate it, we conduct an autonomous driving task. The experimental results indicate that FIL enables robots to absorb knowledge from other robots and apply it to increase imitation learning efficiency and accuracy. Videos of the results can be found on the supplementary website1. Overall, this paper makes the following contributions:

• We present a novel framework named FIL. It provides a knowledge fusion mechanism for cloud robotic systems.

• We propose a knowledge fusion algorithm in FIL. It enables the cloud to fuse knowledge from local robots and generate guide models for robots with service requests.

• Based on transfer learning, we present a knowledge transfer scheme to facilitate local robots in acquiring knowledge from the cloud.

1 The video is available at https://sites.google.com/view/federated-imitation


II. RELATED WORK

In this work, FIL is proposed to increase the learning efficiency of each local robot in the cloud robotic system. The cloud generates a shared model and provides guidance services for each local robot. Methods that can achieve similar goals include transfer learning and meta-learning. Compared with these two approaches, FIL achieves heterogeneous knowledge fusion without raw-data sharing. The related work is introduced as follows.

A. Transfer learning

Transfer learning aims at improving the performance of target learners on target domains by transferring the knowledge contained in different but related source domains, which is similar to the goal of FIL. In recent years, the transfer learning research community has mainly focused on deep transfer learning [6]. The techniques used in deep transfer learning fall into four categories: instance-based transfer learning, mapping-based transfer learning, network-based transfer learning, and adversarial-based transfer learning [7].

The above approaches have made progress on knowledge transfer between domains. Unfortunately, the difference between domains in a cloud robotic system can be large. Sometimes the data are collected with different kinds of sensors. Therefore, the datasets of local robots in the cloud robotic system may be heterogeneous, and it is not possible to apply transfer learning techniques directly. For this reason, the proposed framework first fuses heterogeneous knowledge rather than using transfer learning directly.

B. Meta-learning

Meta-learning and the proposed framework have the same ultimate aim. Meta-learning, or learning to learn, is the science of systematically observing how different machine learning approaches perform on a wide range of learning tasks, and then learning from this experience, or meta-data, to learn new tasks much faster than otherwise possible [8]. Applications of meta-learning in robotics have achieved good results. Among these, One-Shot Visual Imitation Learning [9] and Domain-Adaptive Meta-Learning (DAML) [10] from Abbeel's lab are representative. Compared with the former, which enables robots to perform imitation learning from action videos of robots, DAML is an improved approach that enables the robot to imitate human actions directly. DAML enables a robot to learn to visually recognize and manipulate a new object after observing just one video demonstration from a human user. To enable this, DAML uses a meta-training phase in which it acquires a rich prior over human imitation, using both human and robot demonstrations involving other objects. DAML extends a prior meta-learning approach to allow for learning cross-domain correspondences and includes a temporal adaptation loss function.

However, the meta-training phase is essential in DAML and other meta-learning approaches. It means that we have to obtain the data of all local robots, because the step of calculating the loss to update gradients requires all task data (the corresponding local data in the cloud robotic system). That is an impossible task considering the limited communication. Moreover, greater benefits from meta-learning require more data and a higher-capacity model. Nevertheless, it is difficult for the cloud to get all the data of local robots. Therefore, meta-learning approaches are unsuitable for the cloud robotic system in this work, although they might achieve higher accuracy.

C. Knowledge sharing in cloud robotic systems

Knowledge sharing is the key to this work: local robots share knowledge with each other. For this purpose, [13] presented a mental simulation service for the existing OPENEASE cloud engine. With this service available, agents can delegate computationally expensive simulations to more powerful computing services. In addition, the authors created a SWI-Prolog library so that developers and robots can describe the world state, the robot's own abilities, and the problem. [11] proposed a framework for robots to reason on a central knowledge base which holds ontologies and execution logs from other robots; this approach achieves knowledge exchange through cloud robotics. [12] presented a platform that supports large-scale on-demand data collection for robots. All three approaches can realize the fusion of local knowledge. Unfortunately, they cannot fuse knowledge from heterogeneous data, which is addressed by this work.

III. METHODOLOGY

In this section, we present the details of the proposed framework and algorithms, which include the knowledge acquisition technique, the framework of FIL, the knowledge fusion algorithm, and the knowledge transfer algorithm.

A. Knowledge acquisition by imitation learning

Local robots acquire knowledge through imitation learning in FIL. Imitation learning is commonly posed either as behavioural cloning [14] or as inverse reinforcement learning [15], both of which require demonstrations. Imitation learning has empowered recent advances in learning robotic manipulation tasks by addressing shortcomings of reinforcement learning such as exploration [16] and reward specification [17]. The knowledge acquisition approach used by local robots in FIL belongs to behavioural cloning, which focuses on learning the expert's policy using supervised learning. The way behavioural cloning works is quite simple: given demonstrations of robots, we divide them into state-action pairs, treat these pairs as i.i.d. examples, and finally apply supervised learning.
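As a minimal illustrative sketch (not the authors' code), this behavioural-cloning step could be implemented as follows in PyTorch, with random tensors standing in for recorded demonstrations and a small arbitrary network standing in for the policy:

    import torch
    import torch.nn as nn
    from torch.utils.data import DataLoader, TensorDataset

    # Stand-in demonstration data; in practice these are (image, steering)
    # state-action pairs recorded from the human expert.
    states  = torch.randn(512, 3, 64, 64)   # camera observations
    actions = torch.rand(512, 1) * 2 - 1    # steering angles in [-1, 1]
    loader  = DataLoader(TensorDataset(states, actions), batch_size=64, shuffle=True)

    policy = nn.Sequential(                  # tiny image-to-steering regressor
        nn.Conv2d(3, 16, 5, stride=2), nn.ReLU(),
        nn.Conv2d(16, 32, 5, stride=2), nn.ReLU(),
        nn.Flatten(),
        nn.Linear(32 * 13 * 13, 64), nn.ReLU(),  # 13x13 map after two stride-2 convs
        nn.Linear(64, 1), nn.Tanh())             # steering output in [-1, 1]

    optimizer = torch.optim.Adam(policy.parameters(), lr=1e-4)
    for epoch in range(10):
        for s, a in loader:
            loss = nn.functional.mse_loss(policy(s), a)  # fit expert actions
            optimizer.zero_grad(); loss.backward(); optimizer.step()

The only commitment behavioural cloning makes is the supervised loss on expert actions; the network itself can be swapped for the VGG-style architecture of Fig. 6.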

B. Framework of FIL

The framework of FIL operates in a Cloud-Robot-Environment setup, consisting of local robots, cloud servers, communication services and computing devices. Local robots learn skills through imitation learning, and the cloud server fuses knowledge. We develop a federated learning algorithm to fuse private models into the shared model in the cloud. With the shared model, the cloud server is capable of generating guide models corresponding to the requests of local robots. After that, the local robots perform transfer learning based on the guide model, and the final policy is quickly obtained. As illustrated in Fig. 2, we explain the methodology of FIL with the example of a self-driving task.

Fig. 2. FIL framework. The work assumes that there are three agents fulfilling the task. They perform imitation learning to acquire policy models with heterogeneous data: RGB images, depth images and semantic segmentation images. The raw training data can be shared neither between agents nor between agents and the cloud. Only the parameters of the policy models are uploaded and fused, and the cloud then provides guide models when robots request them.

For the self-driving task, we typically collect three types of data, and the three agents use three different types of dataset separately. The datasets of local robots are labeled but are not sent to the cloud. Three different policy models are obtained by local training. RGB images are used for training by Agent A, yielding a private policy model (Private Model A); the actions of the robot are determined by its output, which might be actions or parameters. Similar processes occur in Agent B and Agent C. The outputs of the three private models have the same type, while the inputs to each model differ: RGB images, depth images, and semantic segmentation images. The parameters of all three models are then uploaded to the cloud and fused there. Henceforth, the cloud is capable of generating guide models for different types of input. When a local robot requests a service, the cloud provides a guide model corresponding to its type of sensor data. FIL can be performed either online or offline. As presented in Algorithm 1, the whole framework can be summarized in the following steps:
• Step 1: Imitation learning performed by local robots;
• Step 2: Transmitting the parameters of private models;
• Step 3: Fusing knowledge in the cloud;
• Step 4: Responding to local requests and generating guide models for them.
Note that Step 1 and Step 2 are simultaneous when FIL is performed online, and the labels of the cloud data are updated simultaneously.

Algorithm 1: Processing Algorithm in FIL

Initialize the policy network with random weights θ;
Input: n: number of local robots; q: update frequency.
while cloud server is running do
    θ_t^i ← robot i performs imitation learning
    if t % q == 0 then
        for i = 0; i < n; i++ do
            send θ_t^i to the cloud
        end
        labels = fuse(θ_t^1, θ_t^2, ..., θ_t^n)
    end
    if service request == True then
        generate θ_cloud based on the labels; send θ_cloud to the local robots;
        θ_local = transfer(θ_cloud)
    end
end
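The cloud side of Algorithm 1 can be pictured with the following Python sketch; load_policy, fuse_to_labels, train_guide_model and cloud_dataset are hypothetical names for the operations described in Section III-C, not parts of a released system:

    # Cloud-side orchestration of FIL (illustrative, not the authors' code).
    private_params = {}                  # robot_id -> latest uploaded parameters

    def on_parameter_upload(robot_id, params):
        # Step 2: robots periodically push private-model parameters;
        # raw sensor data never leaves the robot.
        private_params[robot_id] = params

    def on_service_request(sensor_type):
        # Step 3: rebuild the private models and fuse their outputs on the
        # cloud's own multi-sensor dataset into per-scene labels.
        models = [load_policy(p) for p in private_params.values()]
        labels = fuse_to_labels(models, cloud_dataset)
        # Step 4: train and return a guide model for the requested sensor type.
        return train_guide_model(cloud_dataset[sensor_type], labels)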

C. Knowledge fusion algorithm in the cloud

Knowledge fusion has been the focus of many approaches since the end of the 20th century. These studies concentrate on the construction of knowledge bases [18] [19] or knowledge representations [20] [21]. However, such approaches, which define the knowledge representation of the local robots, are unsuitable for cloud robotic systems: we cannot determine how local robots obtain data and express knowledge. In our work, the proposed framework aims to make local robots learn efficiently and intelligently. In other words, it is one demonstration of robot lifelong learning.

Existing machine learning approaches fuse knowledge by centralizing the storage of training data. Consequently, these approaches are unattainable in cloud robotic systems with large-scale local datasets, considering the limited communication. In [22], the federated learning system was introduced, in which mobile devices perform model training locally on their own data according to the model released by the model owner. Such a design enables mobile devices to collaboratively learn a shared model while keeping all the training data on the device. Similarly, multi-sensor data fusion is required by many robotic tasks, such as the navigation of self-driving cars and mobile robot SLAM, but it is impossible to upload all kinds of sensor data to the cloud. Therefore, we propose a federated imitation learning framework for cloud robotic systems to improve the ability of robots. In the proposed framework and algorithm, there is no need to upload raw sensor data to the cloud; only the parameters of the models are shared with the cloud.

Fig. 3 presents the knowledge fusion algorithm. The primary responsibility of the cloud is to label scenes. Before this, some data collection for cloud training is necessary, and the types of these data should at least cover the types of the local datasets. For example, in the self-driving task of this example, RGB images, depth images and semantic segmentation images should be collected for the cloud. Thus, one scene has three types of sensor data, and each type has a corresponding uploaded private model which provides its own labelling suggestion to the cloud.

Fig. 3. Knowledge fusion algorithm deployed on the cloud in a self-driving case. Agents obtain private models by performing imitation learning. The cloud stores different types of sensor data for many scenes. The corresponding sensor data in the cloud dataset are input to the private models, and the numerical characteristics of the private models' outputs are then calculated to label each scene. Note that these data are not uploaded from local robots but collected by the cloud, and the cloud dataset is much larger than each local dataset. With multiple types of sensor data, the cloud is capable of generating guide models corresponding to the sensor type of the local robot.

These suggestions are then aggregated to produce a final label for each scene. The calculation used for labelling can draw on methods from ensemble learning and can be defined according to the application scenario; in this case, we choose the median of the outputs, since the private models output the steering of the agent. Agents usually make decisions such as following the road, turning, and obstacle avoiding. Generally speaking, the output of the model includes two extreme cases, turning and not turning, where errors occur most, and the median avoids extreme values in the evaluation. For example, if there were 5 local models and their outputs for one scene were -0.1, -0.3, 0.4, 0.4, 0.5, we would take 0.4 as the label of the current scene. As data are labeled, the cloud models are trained immediately. Formulas (1) to (5) summarize the whole process:

    R^{emp}_{D_i}(\theta) = \frac{1}{N} \sum_{n=1}^{N} L( y_i^{(n)}, f( x_i^{(n)}; \theta ) )    (1)

In Formula (1), D_i = {(x^{(n)}, y^{(n)})}_{n=1}^{N} is the dataset of local robot i, L represents the loss function, \theta represents the parameters of the model, x_i are the original data, y_i are the labels, and N is the number of samples in the dataset. R^{emp}_{D_i} represents the empirical risk on the training set.

    \theta_i^{*} = \arg\min_{\theta} R^{struct}_{D}(\theta)    (2)

Formula (2) presents the training target of the local robots. It is the structural risk minimization criterion, where R^{struct}_{D} represents the structural risk on the dataset.

    l_{in} = f_{\theta_i^{*}}(scene_{in})    (3)

In the above formula, scene_{in} is the training data in the cloud; i indexes the sensor type and n the scene.

    Me_{in} = Median(l_{in})    (4)

    \theta^{*}_{i(cloud)} = \arg\min_{\theta} \frac{1}{M} \sum_{n=1}^{M} L( Me_{in}, f( scene^{(n)}_{in}; \theta ) ) + \frac{1}{2} \lambda \|\theta\|^{2}    (5)

In Formulas (4) and (5), Me_{in} is the median of l_{in}, and \theta^{*}_{i(cloud)} is the training target in the cloud. M represents the number of sample data. \|\theta\|^{2} is the L2-norm regularization term, which is used to reduce the parameter space and avoid over-fitting; \lambda controls the intensity of the regularization. y denotes the true labels in the dataset, l the prediction of the trained model, and the subscript "in" the n-th scene of the i-th type of data.

Note that we only use the shared model in the cloud as a guide model for local robots. The shared model maintained in the cloud is a cautious policy model: it will not make serious mistakes in private unstructured environments, but its action might not be the best. Thus, it is necessary for every local robot to train its own policy model based on the shared model received from the cloud. This is the transfer learning process in FIL.

D. Transfer the shared model

There have been many valuable studies on transfer learning. Current research mainly focuses on transferring through the relationship between the source domain distribution and the target domain distribution. This method is unsuitable for cloud robotic systems because it requires the raw data of local robots. Under these constraints, Layer Transfer is the transfer learning method that can be implemented. Layer Transfer means that some layers of the model trained on source data are copied directly, and the remaining layers are trained on target data. The advantage is that the target data only need to train fewer parameters, thus avoiding over-fitting, with faster training and higher accuracy. On different tasks, the layers that need to be transferred are often different. For example, in speech recognition, we usually copy the last layers and retrain the first layers, because the first layers of a speech recognition neural network model the speaker's pronunciation, while the last layers perform the recognition itself and have nothing to do with the speaker. In image recognition, we usually copy the front layers and retrain the back layers, because the first layers of an image recognition neural network identify basic geometric structure, so they can be transferred, while the latter layers are often abstract and cannot be transferred. So, which layers to transfer is decided case by case.

Fig. 4. A transfer learning approach of FIL

As presented in Fig. 4, in this work we use the front layers as feature extractors in the imitation learning case. The decision model in the cloud can be used as the initial model for local training. In this way, the cloud model plays a guiding role: it speeds up local training and increases the accuracy of local robots. In the training of local robots, the feature extraction layers are frozen and only the fully connected layers are trained. If necessary, relevant parameters can also be adjusted in the back-propagation process; for example, some work may increase the learning rate. After transfer learning, local robots successfully utilize knowledge from other robots in cloud robotic systems.
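A minimal sketch of this Layer Transfer step in PyTorch, assuming the guide model exposes a VGG-style features/classifier split; load_guide_model and local_demo_loader are hypothetical stand-ins for the cloud download and the robot's own demonstration data:

    import torch

    guide = load_guide_model()             # hypothetical: policy received from the cloud

    # Freeze the convolutional feature extractor copied from the cloud model.
    for p in guide.features.parameters():
        p.requires_grad = False

    # Retrain only the fully connected head on the robot's own demonstrations.
    opt = torch.optim.Adam(guide.classifier.parameters(), lr=1e-4)
    for s, a in local_demo_loader:         # local state-action pairs
        loss = torch.nn.functional.mse_loss(guide(s), a)
        opt.zero_grad(); loss.backward(); opt.step()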

IV. EXPERIMENTS

In this section, we present our experimental setup and results. To verify the effectiveness of FIL, we have to answer two questions: 1) Is FIL capable of generating an effective shared model based on shared knowledge in cloud robotic systems? 2) Does the shared model of FIL improve the learning process or accuracy of local robots? To answer the first question, we conducted experiments to generate cloud models and compare their performance with general models. To answer the second question, we conducted experiments to compare the learning process and accuracy of local robots with and without FIL.

A. Experimental setup

A self-driving car can be regarded as an advanced robot, so cars are sufficient for verifying robot control algorithms. In this work, we used Microsoft AirSim and CARLA as our simulators to evaluate the presented approach. In addition to providing high-quality environments with realistic vehicle physics, AirSim and CARLA have a Python API which allows for easy data collection and control.

To collect training data, a human driver is presented with a first-person view of the environment (central camera). The simulated vehicle is controlled by the driver using the keyboard. The car should be kept at a speed of around 6 m/s, collisions with cars or pedestrians should be avoided, but traffic lights and stop signs are not considered. As Fig. 5 shows, we used three different types of sensor data: RGB images, depth images and semantic segmentation images. Any of these three types of sensor data enables the agent to perform obstacle avoidance tasks within tolerable error.

Fig. 5. As presented in subfigure (a), we collected training data for local robots in three different environments, each corresponding to a different type of data. As presented in subfigure (b), we collected different types of simultaneous data in many different environments.

The policy network mainly consists of convolutional layers and fully connected layers, followed by a linear layer with a softmax; the value function is given by a linear layer. Its architecture, presented in Fig. 6, is similar to VGG-16.

Fig. 6. Network architecture of imitation learning in experiments
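Since the channel counts of Fig. 6 are only partially recoverable from the text, the following PyTorch sketch shows an illustrative VGG-16-style policy of this kind (assuming 224x224 inputs; the fully connected sizes are placeholders except for the single steering output):

    import torch.nn as nn

    def conv_block(c_in, c_out, n):
        # n 3x3 convolutions followed by 2x2 max-pooling, as in VGG.
        layers = []
        for _ in range(n):
            layers += [nn.Conv2d(c_in, c_out, 3, padding=1), nn.ReLU()]
            c_in = c_out
        return layers + [nn.MaxPool2d(2)]

    policy = nn.Sequential(
        *conv_block(3, 64, 2),       # conv1
        *conv_block(64, 128, 2),     # conv2
        *conv_block(128, 256, 3),    # conv3
        *conv_block(256, 512, 3),    # conv4
        *conv_block(512, 512, 3),    # conv5
        nn.Flatten(),                # 7x7 spatial map for a 224x224 input
        nn.Linear(512 * 7 * 7, 100), nn.ReLU(),   # fc6
        nn.Linear(100, 50), nn.ReLU(),            # fc7 (size illustrative)
        nn.Linear(50, 1), nn.Tanh())              # fc8 -> single steering value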

As for the cloud robotic system, the server and agents communicate via HTTP requests, with the data server using the Django framework to respond to asynchronous requests. Our cloud programs run on the Microsoft Azure cloud. We conduct local robot experiments with a single NVIDIA Quadro RTX 6000, which allows us to run our simulator and receive photo-realistic images for training.
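As an illustration of this request/response pattern only (the endpoint names, payload format and helper functions are assumptions, not the authors' implementation), a Django view pair could look like:

    # views.py (illustrative Django endpoints for FIL parameter exchange)
    import json
    from django.http import JsonResponse
    from django.views.decorators.csrf import csrf_exempt

    @csrf_exempt
    def upload_parameters(request):
        # A robot POSTs its private-model parameters as JSON;
        # no raw sensor data is ever transmitted.
        payload = json.loads(request.body)
        store_private_params(payload["robot_id"], payload["params"])  # hypothetical helper
        return JsonResponse({"status": "ok"})

    def request_guide_model(request):
        # A robot asks for a guide model matching its sensor type.
        sensor_type = request.GET.get("sensor_type", "rgb")
        params = generate_guide_model(sensor_type)                    # hypothetical helper
        return JsonResponse({"guide_params": params})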

B. Evaluation of the shared-model generation method in FIL

In this experiment, the robot (car) faces tasks such as avoiding collisions and making timely turns. The observations (images) are recorded by one central camera. The recorded control signal is the steering angle, scaled between -1 and 1, with the extreme values corresponding to full left and full right, respectively. Considering actual driving, we map the steering angle to between -0.69 and 0.69 radians (i.e., an output s corresponds to 0.69·s radians).

Once the three private policy networks have been trained, the cloud works to fuse their knowledge. As mentioned before, we assume that three companies train their policies by imitation learning with heterogeneous sensor data. Sharing the training data between agents, or sending the raw data to the cloud, is forbidden, so the cloud server only gets the parameters of the three local networks and performs the knowledge fusion algorithm. Beforehand, different types of sensor data are collected simultaneously from different environments in CARLA and AirSim.

Fig. 7. Performance comparison of shared models and original local models. The red lines indicate the steering angles output by the policy models. The blue lines indicate the steering angles of human driving. The performance of the models is mainly reflected in turning and going straight: the first row shows the performance of each model on the turning task, and the second row on the going-straight task. The agents are not trained separately on these two tasks because the output of the policy model is the steering angle rather than discrete actions.

Every scene in the cloud is then labelled. Local policy network 1 is generated for company 1 based on RGB images, local policy network 2 for company 2 based on semantic segmentation images, and local policy network 3 for company 3 based on depth images. The parameters of these three private networks are then uploaded to the cloud. Finally, the cloud obtains labels for its own datasets and generates policy networks corresponding to the different sensor requests. In this experiment, three policy networks are generated; they are named cloud policies. The process in the cloud is unsupervised: there are no manual labels, only cloud-generated ones. For evaluation, we labeled some scenes to score the results. In the simulation environment, the controller uses the policy network to control the robot.

TABLE I
RESULTS OF LOCAL POLICIES AND CLOUD POLICIES

Controller                                | Hit the obstacle | Miss turns | Mistakes in straight
Local controller for RGB images           | 3.45%            | 12%        | 16.67%
Cloud controller for RGB images           | 0.69%            | 0          | 0
Local controller for depth images         | 0                | 20%        | 0
Cloud controller for depth images         | 0                | 4%         | 0
Local controller for segmentation images  | 0                | 12%        | 6.67%
Cloud controller for segmentation images  | 0                | 4%         | 0

Fig. 7 presents the results of these six policy networks in some of the main scenes that challenge the car. From Fig. 7, it is clear that the cloud models present higher accuracy than the local models, so they can avoid the errors of local models trained on a single training set collected by one type of sensor. We conducted 3 experiments, each with a different starting point, and evaluated the performance of the robots on obstacle avoidance, turning and going-straight tasks. The results are summarized in Table I. It can be seen from the experimental results that the cloud knowledge improves on the local controllers trained using general imitation learning. The controllers based on cloud policies perform better, especially the controller for RGB images.

C. Evaluation for the knowledge-transfer ability of FIL

We conducted the experiment illustrated in Fig. 8 to evaluate the knowledge-transfer ability of FIL.

Fig. 8. Challenges in the self-driving task

As presented in Fig. 9, we compare the six models in a neighborhood environment. For every type of training data, we obtained a pair of policies, a transferred policy and a general policy, so three pairs of policies were generated in the experiment. The performance of controllers based on these policies in key challenging tasks is presented in Fig. 10, and the results are summarized in Table II. From the results, we can see that the imitation learning models obtained in the cloud robotic system perform significantly better in accuracy than general models trained by traditional imitation learning without shared knowledge. FIL improves the training process of imitation learning with the help of shared knowledge: there is a pre-trained model from the cloud to transfer to local imitation learning, so local robots do not need to learn from scratch. We present the comparison of the training process in Fig. 11. From the figure, we can see that the transferred policies have both a lower starting error and a lower final error. Local policy models transferred by FIL also generalize better: controllers based on policies transferred from FIL perform better in different weather conditions than policies trained by general imitation learning. As presented in Fig. 12, we conducted experiments with the controllers in different weather conditions; the results are presented in Table II. They show that models from FIL improve the accuracy of controllers over general imitation learning in bad weather.


Fig. 9. Procedure of the experiment for evaluating knowledge-transfer ability. Firstly, we performed imitation learning in three different environments with three types of sensor data (RGB images, depth images and semantic segmentation images) and obtained three policies. Secondly, the parameters of the three private policies were sent to the cloud, and the cloud generated labels for its own dataset. Note that there are semantic segmentation differences between CARLA and AirSim, so the semantic segmentation images go through a unification step. Thirdly, three policies corresponding to the different sensor data were generated. After that, we performed transfer learning with the corresponding type of images and obtained the transferred policies. For comparison, we also conducted the general imitation learning experiment shown on the right part of the figure. Finally, we compared the performance of these policies in a new environment (the neighborhood in AirSim).

Fig. 10. Performance comparison of general policies and transferred policies. The red lines indicate the steering angles output by the general policies. The blue lines indicate the steering angles of human driving.

TABLE II
PERFORMANCE COMPARISON OF STANDARD CONTROLLERS AND TRANSFERRED CONTROLLERS IN DIFFERENT BAD WEATHER CONDITIONS

Controller                                      | Error rate in normal | in rain | in snow | in fog  | in dust
Standard controller for RGB images              | 17.39%               | 26.09%  | 30.43%  | 34.78%  | 52.17%
Transferred controller for RGB images           | 4.35%                | 8.70%   | 17.39%  | 31.82%  | 39.13%
Standard controller for depth images            | 13.04%               | 4.35%   | 8.70%   | 8.70%   | 17.39%
Transferred controller for depth images         | 10.87%               | 4.35%   | 8.70%   | 8.70%   | 15.22%
Standard controller for segmentation images     | 2.17%                | 2.17%   | 6.52%   | 21.74%  | 36.36%
Transferred controller for segmentation images  | 2.17%                | 2.17%   | 5.43%   | 17.39%  | 31.82%

(Bold values in the original denote winning results.)

Fig. 11. Training process comparison of general learning and transfer learning in FIL. Each panel (RGB images, depth images, segmentation images) plots validation error rate against training steps for the standard and transferred policies.

Fig. 12. Bad weather conditions: rain, snow, fog, and dust. We conducted data collection and comparison experiments in these environments.

V. CONCLUSION

In this work, we propose an imitation learning framework for cloud robotic systems with heterogeneous sensor data sharing, named Federated Imitation Learning (FIL). FIL is capable of improving the imitation learning efficiency and accuracy of local robots by taking advantage of knowledge from other robots in the cloud robotic system. Additionally, we propose the knowledge fusion algorithm and introduce a transfer method in FIL. Our approach is able to fuse the heterogeneous knowledge of local robots. Finally, the framework and algorithms are verified on a self-driving task.

For future work, we plan to study the scalability of the platform: how should FIL scale when there are more autonomous cars or more types of sensor data? Although FIL can deal with this issue by simply expanding the cloud dataset, further work on the convergence justification of the fusion process is needed.

REFERENCES

[1] K. Mülling, J. Kober, O. Kroemer, and J. Peters, "Learning to select and generalize striking movements in robot table tennis," The International Journal of Robotics Research, vol. 32, no. 3, pp. 263-279, 2013.

[2] M. Rana, M. Mukadam, S. R. Ahmadzadeh, S. Chernova, and B. Boots, "Towards robust skill generalization: Unifying learning from demonstration and motion planning," in International Conference on Robot Learning, 2018.

[3] T. Zhang et al., "Deep imitation learning for complex manipulation tasks from virtual reality teleoperation," in IEEE International Conference on Robotics and Automation (ICRA), 2018, pp. 1-8.

[4] B. Liu, L. Wang, M. Liu, and C. Xu, "Lifelong federated reinforcement learning: a learning architecture for navigation in cloud robotic systems," IEEE Robotics and Automation Letters (RA-L), vol. 4, no. 4, pp. 4555-4562, 2019.

[5] Robotics VO, "A Roadmap for U.S. Robotics: From Internet to Robotics," accessed online October 23, 2015.

[6] C. Tan, F. Sun, T. Kong, et al., "A survey on deep transfer learning," in 27th International Conference on Artificial Neural Networks, pp. 270-279, 2018.

[7] F. Zhuang et al., "A comprehensive survey on transfer learning," arXiv preprint arXiv:1911.02685, 2019.

[8] J. Vanschoren, "Meta-learning: A survey," arXiv preprint arXiv:1810.03548, 2018.

[9] C. Finn, T. Yu, T. Zhang, et al., "One-shot visual imitation learning via meta-learning," in 1st Annual Conference on Robot Learning, pp. 357-368, 2017.

[10] T. Yu, P. Abbeel, S. Levine, et al., "One-shot hierarchical imitation learning of compound visuomotor tasks," in International Conference on Intelligent Robots and Systems (IROS), 2019.

[11] A. K. Bozcuoglu, G. Kazhoyan, Y. Furuta, S. Stelter, M. Beetz, K. Okada, and M. Inaba, "The exchange of knowledge using cloud robotics," IEEE Robotics and Automation Letters, vol. 3, no. 2, pp. 1072-1079, 2018.

[12] A. Mandlekar, Y. Zhu, A. Garg, et al., "ROBOTURK: A crowdsourcing platform for robotic skill learning through imitation," in Conference on Robot Learning, pp. 879-893, 2018.

[13] A. K. Bozcuoglu and M. Beetz, "A cloud service for robotic mental simulations," in International Conference on Robotics and Automation (ICRA), pp. 2653-2658, 2017.

[14] F. Codevilla, M. Müller, A. López, V. Koltun, and A. Dosovitskiy, "End-to-end driving via conditional imitation learning," in IEEE International Conference on Robotics and Automation (ICRA), 2018, pp. 1-9.

[15] S. Arora and P. Doshi, "A survey of inverse reinforcement learning: Challenges, methods and progress," arXiv preprint arXiv:1806.06877, 2018.

[16] A. Nair et al., "Overcoming exploration in reinforcement learning with demonstrations," in IEEE International Conference on Robotics and Automation (ICRA), 2018.

[17] P. Sermanet, K. Xu, and S. Levine, "Unsupervised perceptual rewards for imitation learning," in Robotics: Science and Systems (RSS), 2017.

[18] X. Dong, E. Gabrilovich, G. Heitz, et al., "Knowledge vault: A web-scale approach to probabilistic knowledge fusion," in 20th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 601-610, 2014.

[19] K. Premaratne, D. A. Dewasurendra, and P. H. Bauer, "Evidence combination in an environment with heterogeneous sources," IEEE Transactions on Systems, Man, and Cybernetics-Part A: Systems and Humans, vol. 37, no. 3, pp. 298-309, 2007.

[20] Y. Zhang, M. Saberi, and E. Chang, "A semantic-based knowledge fusion model for solution-oriented information network development: a case study in intrusion detection field," Scientometrics, vol. 117, no. 2, pp. 857-886, 2018.

[21] M. Ruta et al., "A knowledge fusion approach for context awareness in vehicular networks," IEEE Internet of Things Journal, vol. 5, no. 4, pp. 2407-2419, 2018.

[22] H. B. McMahan, E. Moore, D. Ramage, et al., "Federated learning of deep networks using model averaging," in International Conference on Artificial Intelligence and Statistics (AISTATS), JMLR: W&CP vol. 54, 2017.