Self-Supervised Generalisation with Meta Auxiliary Learning

Shikun Liu¹  Andrew J. Davison¹  Edward Johns¹

Abstract

Learning with auxiliary tasks has been shown to improve the generalisation of a primary task. However, this comes at the cost of manually-labelling additional tasks which may, or may not, be useful for the primary task. We propose a new method which automatically learns labels for an auxiliary task, such that any supervised learning task can be improved without requiring access to additional data. The approach is to train two neural networks: a label-generation network to predict the auxiliary labels, and a multi-task network to train the primary task alongside the auxiliary task. The loss for the label-generation network incorporates the multi-task network's performance, and so this interaction between the two networks can be seen as a form of meta learning. We show that our proposed method, Meta AuXiliary Learning (MAXL), outperforms single-task learning on 7 image datasets by a significant margin, without requiring additional auxiliary labels. We also show that MAXL outperforms several other baselines for generating auxiliary labels, and is even competitive when compared with human-defined auxiliary labels. The self-supervised nature of our method leads to a promising new direction towards automated generalisation. The source code is available at https://github.com/lorenmt/maxl.

1. Introduction

Auxiliary learning is a method to improve the generalisation ability of a primary task, by training on additional auxiliary tasks alongside this primary task. If the auxiliary tasks and the primary task are based on similar representations of data, then the sharing of learned features across the tasks results in additional relevant features, which otherwise would not have been learned from training only on the primary task. The broader support of these features, across new interpretations of input data, allows for better generalisation of the primary task to unseen data. Auxiliary learning is similar to multi-task learning (Caruana, 1998), except that only the performance of the primary task is of importance, and the auxiliary tasks are included purely to assist the primary task.

¹Department of Computing, Imperial College London. Correspondence to: Shikun Liu <[email protected]>.

[Figure 1: input data is passed through a neural network which outputs a primary task prediction, supervised by ground-truth labels (e.g. "dog"), and an auxiliary task prediction, supervised by generated labels.]

Figure 1. Illustration of our proposed MAXL framework. We aim to automatically generate auxiliary task labels which maximise the performance of a primary task.


We now rethink this generalisation by considering that, for a particular primary task, not all auxiliary tasks are created equal. In supervised auxiliary learning (Liebel & Körner, 2018; Toshniwal et al., 2017), auxiliary tasks can be manually chosen to complement the primary task. However, this requires both domain knowledge to choose the auxiliary tasks, and labelled data to train the auxiliary tasks. Unsupervised auxiliary learning (Flynn et al., 2016; Zhou et al., 2017; Zhang et al., 2018; Jaderberg et al., 2017) removes the need for labelled data, but at the expense of a limited set of auxiliary tasks which may not be beneficial for the primary task. By combining the merits of both supervised and unsupervised auxiliary learning, the ideal framework would be one with the flexibility to automatically determine the optimal auxiliary tasks, but without the requirement of manually-labelled data for these auxiliary tasks.

In this paper, we propose to achieve such a framework with a simple and general meta-learning algorithm, which we call Meta AuXiliary Learning (MAXL). We first observe that in supervised learning, defining a task can equate to defining the labels for that task. Therefore, for a given primary task, an optimal auxiliary task is one which has optimal labels. The goal of MAXL is to automatically discover these auxiliary labels using only the labels for the primary task, by progressively refining the labels to maximise the performance of the primary task.

arXiv:1901.08933v1 [cs.LG] 25 Jan 2019


The approach is to train two neural networks. First, a multi-task network, which trains the primary task and the auxiliary task, as in standard auxiliary learning, and second, a label-generation network, which predicts the labels for the auxiliary task. The key idea behind MAXL is to then use the performance of the primary task, when trained alongside the auxiliary labels in one iteration, to improve the auxiliary labels for the next iteration. This is achieved by defining the loss for the label-generation network as a function of the multi-task network's performance on primary task training data. The two networks are therefore tightly coupled and trained end-to-end, and since the learning of the auxiliary labels is guided by the learning performance of the primary task, the interaction between the two networks can be seen as a form of meta learning.

In our experiments on image classification, we show three key results. First, MAXL outperforms single-task learning across seven image datasets, without requiring manually-defined auxiliary labels. Second, MAXL outperforms a number of baseline methods for creating auxiliary labels. Third, when the manually-defined hierarchical labels of CIFAR-100 are used as auxiliary labels (e.g. if the primary task predicts "Dog", the auxiliary task predicts "Labrador"), MAXL is at least as competitive, despite using no human knowledge to create its auxiliary labels. This last result is the most important, and shows that MAXL is able to remove the need for manual labelling of auxiliary tasks. This brings the advantages of auxiliary learning to new datasets previously not compatible with auxiliary learning, due to the lack of auxiliary labels.

2. Related Work

This work brings together ideas from a number of related areas of machine learning.

Multi-task & Transfer Learning  The aim of multi-task learning (MTL) is to achieve shared representations by simultaneously training a set of related learning tasks. In this case, the knowledge shared across domains is encoded into the feature representations to improve the performance of each individual task, since knowledge distilled from related tasks is interdependent. The success of deep neural networks has led to some recent methods advancing multi-task architecture design, such as applying a linear combination of task-specific features (Misra et al., 2016; Doersch & Zisserman, 2017; Kokkinos, 2017). (Liu et al., 2018) applied soft-attention modules as feature selectors, allowing learning of both task-shared and task-specific features in an end-to-end manner. Transfer learning is another common approach to improve generalisation, by incorporating knowledge learned from one or more related domains. Pre-training a model with a large-scale dataset such as ImageNet (Deng et al., 2009) has become a standard practice in many vision-based applications.

Auxiliary Learning  Whilst in multi-task learning the goal is high test accuracy across all tasks, auxiliary learning differs in that high test accuracy is only required for a single primary task, and the role of the auxiliary tasks is to assist in generalisation of this primary task. Applying related learning tasks is one straightforward approach to assist primary tasks. (Toshniwal et al., 2017) applied auxiliary supervision with phoneme recognition at intermediate low-level representations to improve the performance of conversational speech recognition. (Liebel & Körner, 2018) chose auxiliary tasks which can be obtained with low effort, such as global descriptions of a scene, to boost the performance for single scene depth estimation and semantic segmentation. By carefully choosing a pair of learning tasks, we may also perform auxiliary learning without ground-truth labels, in an unsupervised manner. (Jaderberg et al., 2017) introduced a method for improving the learning agents in Atari games, by building unsupervised auxiliary tasks to predict the onset of immediate rewards from a short historical context. (Flynn et al., 2016; Zhou et al., 2017) proposed image synthesis networks to perform unsupervised monocular depth estimation by predicting the relative pose of multiple cameras. A concurrent work from (Du et al., 2018) proposed to use cosine similarity as an adaptive task weighting to determine when a defined auxiliary task is useful. Differing from these works, which require prior knowledge to manually define suitable auxiliary tasks, our proposed method requires no additional task knowledge, since it generates useful auxiliary knowledge in a purely unsupervised fashion. The most similar work to ours is (Zhang et al., 2018), in which meta learning was used in auxiliary data selection. However, this still requires manually-labelled data from which these selections are made, whilst our method is able to generate auxiliary data from scratch.

Meta Learning  Meta learning (or learning to learn) aims to induce the learning algorithm itself. Early works in meta learning explored automatically learning update rules for neural models (Bengio et al., 1990; 1992; Schmidhuber, 1992). Recent approaches have focussed on learning optimisers for deep networks based on LSTMs (Ravi & Larochelle, 2016) or synthetic gradients (Andrychowicz et al., 2016; Jaderberg et al., 2016). Meta learning has also been studied for finding optimal hyper-parameters (Li et al., 2017) and a good initialisation for few-shot learning (Finn et al., 2017). (Santoro et al., 2016) also investigated few-shot learning via an external memory module. (Vinyals et al., 2016; Snell et al., 2017) realised few-shot learning in the instance space via a differentiable nearest-neighbour approach. Related to meta learning, our framework is designed to learn to generate useful auxiliary labels, which themselves are used in another learning procedure.


3. Meta Auxiliary Learning

We now introduce our method for automatically generating optimal labels for an auxiliary task, which we call Meta AuXiliary Learning (MAXL). In this paper, we consider only a single auxiliary task, although our method is general and could be modified to include several auxiliary tasks. We focus only on classification tasks for both the primary and auxiliary tasks, but the overall framework could also be extended to regression. As such, the auxiliary task is defined as a sub-class labelling problem, where each primary class is associated with a number of auxiliary classes, in a two-level hierarchy. For example, if manually-defined labels were used, a primary class could be "Dog", and one of the auxiliary classes could be "Labrador".

3.1. Problem Setup

The goal of MAXL is to generate labels for the auxiliary task which, when trained alongside a primary task, improve the performance of the primary task. To accomplish this, we train two networks: a multi-task network, which trains on the primary and auxiliary task in a standard multi-task learning setting, and a label-generation network, which generates the labels for the auxiliary task.

We denote the multi-task network as a function $f_{\theta_1}(x)$ with parameters $\theta_1$ which takes an input $x$, and the label-generation network as a function $g_{\theta_2}(x)$ with parameters $\theta_2$ which takes the same input $x$. Parameters $\theta_1$ are updated by losses of both the primary and auxiliary tasks, as is standard in auxiliary learning. However, $\theta_2$ is updated only by the performance of the primary task.

In the multi-task network, we apply a hard parameter sharing approach (Ruder, 2017), in which we predict both the primary and auxiliary classes using the shared set of features $\theta_1$. At the final feature layer, $f_{\theta_1}(x)$, we then further apply task-specific layers to output the corresponding prediction for each task, using a SoftMax function. We denote the primary task predictions by $f^{\text{pri}}_{\theta_1}(x)$, the auxiliary task predictions by $f^{\text{aux}}_{\theta_1}(x)$, the ground-truth primary task labels by $y^{\text{pri}}$, and the generated auxiliary task labels by $y^{\text{aux}}$.
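As a minimal PyTorch sketch (not the authors' implementation) of this hard parameter sharing, a shared backbone can feed two task-specific SoftMax heads; the backbone architecture, feature dimension, and class counts below are illustrative assumptions:

```python
import torch.nn as nn
import torch.nn.functional as F

class MultiTaskNet(nn.Module):
    """Shared backbone f_theta1 with task-specific SoftMax heads."""
    def __init__(self, feat_dim=128, n_primary=10, n_auxiliary=20):
        super().__init__()
        # Shared feature extractor (placeholder architecture).
        self.backbone = nn.Sequential(
            nn.Conv2d(3, 32, 3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(32, feat_dim), nn.ReLU(),
        )
        # Task-specific layers on top of the shared features.
        self.primary_head = nn.Linear(feat_dim, n_primary)
        self.auxiliary_head = nn.Linear(feat_dim, n_auxiliary)

    def forward(self, x):
        feat = self.backbone(x)
        # f^pri(x) and f^aux(x): per-task SoftMax predictions.
        return (F.softmax(self.primary_head(feat), dim=1),
                F.softmax(self.auxiliary_head(feat), dim=1))
```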

In the label-generation network, we define a hierarchical structure $\psi$ which determines the number of auxiliary classes for each primary class. We found during experiments that sharing auxiliary classes across all primary classes did not allow enough flexibility to learn optimal auxiliary labels. Therefore, we assign each primary class its own set of auxiliary classes, or nodes at the network's output. This is achieved by use of a masked SoftMax function, to ensure that each output node represents an auxiliary class corresponding to only one primary class, as described further in Section 3.3. Given input data $x$, the label-generation network then takes in the hierarchy $\psi$ together with the ground-truth primary task label $y^{\text{pri}}$, and applies Mask SoftMax to predict the auxiliary labels, denoted by $y^{\text{aux}} = g^{\text{gen}}_{\theta_2}(x, y^{\text{pri}}, \psi)$. A visualisation of the overall MAXL framework is shown in Figure 2. Note that we allow soft assignment for the generated auxiliary labels, rather than enforcing one-hot encoding, which we found during experiments enables greater flexibility to learn optimal auxiliary labels.

[Figure 2: (a) the two networks in MAXL: the multi-task network $f_{\theta_1}(x)$ with SoftMax outputs $f^{\text{pri}}_{\theta_1}(x)$ (primary task, target $y^{\text{pri}}$) and $f^{\text{aux}}_{\theta_1}(x)$ (auxiliary task, target $y^{\text{aux}}$), and the label-generation network $g_{\theta_2}(x)$ with Mask SoftMax output $g^{\text{gen}}_{\theta_2}(x, y^{\text{pri}}, \psi)$; (b) vanilla SoftMax versus Mask SoftMax.]

Figure 2. (a) Illustration of the two networks which make up MAXL. Dashed white boxes represent data generated by neural networks, solid white boxes represent given data, and coloured boxes represent functions. The double arrow represents equivalence. (b) Illustration of vanilla SoftMax and Mask SoftMax with 2 primary classes. Vanilla SoftMax outputs over all 4 auxiliary classes, whereas Mask SoftMax outputs over a balanced hierarchical structure $\psi = [2, 2]$ to constrain the corresponding prediction space for each primary class.
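A minimal sketch of such a masked SoftMax, under the assumption that the auxiliary output nodes are laid out in contiguous blocks, one block per primary class, with block sizes given by $\psi$:

```python
import torch

def mask_softmax(logits, y_pri, psi):
    """logits: [B, sum(psi)] auxiliary logits; y_pri: [B] primary labels;
    psi: list of auxiliary class counts per primary class."""
    # Which primary class owns each auxiliary output node.
    owner = torch.repeat_interleave(torch.arange(len(psi)), torch.tensor(psi))
    # Binary mask: 1 where a node belongs to the sample's primary class.
    mask = owner.unsqueeze(0) == y_pri.unsqueeze(1)
    # SoftMax restricted to the allowed block; other logits -> -inf.
    return torch.softmax(logits.masked_fill(~mask, float('-inf')), dim=1)

# With psi = [2, 2], primary class 0 owns auxiliary nodes {0, 1} and
# primary class 1 owns nodes {2, 3}, matching Figure 2(b).
probs = mask_softmax(torch.randn(4, 4), torch.tensor([0, 1, 0, 1]), [2, 2])
```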

3.2. Model Objectives

The multi-task network is trained in a tightly-coupled manner with the label-generation network, with two stages per epoch. In the first stage, the multi-task network is trained using the auxiliary labels from the label-generation network. In the second stage, the label-generation network is updated by computing its gradients with respect to the primary task prediction performance of the multi-task network. We train both networks in an iterative manner until convergence.

In the first stage of each epoch, given target auxiliary labels as determined by the label-generation network, the multi-task network is trained to predict these labels for the auxiliary task, alongside the ground-truth labels for the primary task. For both the primary and auxiliary tasks, we apply the focal loss (Lin et al., 2017) with a focusing parameter γ = 2, defined as:

\mathcal{L}(\hat{y}, y) = -y\,(1 - \hat{y})^{\gamma} \log(\hat{y}), \qquad (1)

where ŷ is the predicted label and y is the ground-truth label. The focal loss helps to focus on the incorrectly predicted labels, which we found improved performance during our experimental evaluation compared with the regular cross-entropy log loss.
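To make the loss concrete, here is a minimal PyTorch sketch of this focal loss for probability-vector predictions and (soft or one-hot) targets; the function name and tensor shapes are our own illustrative choices, not taken from the MAXL codebase:

import torch

def focal_loss(pred, target, gamma=2.0, eps=1e-12):
    # pred, target: [N, C] probability tensors; target may be one-hot or soft.
    # Implements L(y_hat, y) = -y (1 - y_hat)^gamma log(y_hat), summed over
    # classes and averaged over the batch.
    return -(target * (1.0 - pred) ** gamma * torch.log(pred + eps)).sum(dim=1).mean()

pred = torch.softmax(torch.randn(8, 10), dim=1)
target = torch.nn.functional.one_hot(torch.randint(0, 10, (8,)), num_classes=10).float()
print(focal_loss(pred, target))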

To update the parameters θ1 of the multi-task network, we define the multi-task objective as follows:

\underset{\theta_1}{\arg\min}\; \Big( \mathcal{L}\big(f^{\text{pri}}_{\theta_1}(x_{(i)}),\, y^{\text{pri}}_{(i)}\big) + \mathcal{L}\big(f^{\text{aux}}_{\theta_1}(x_{(i)}),\, y^{\text{aux}}_{(i)}\big) \Big), \qquad (2)

where (i) represents the i-th batch from the training data, and y^aux_(i) = g^gen_θ2(x_(i), y^pri_(i), ψ) is generated by the label-generation network.

In the second stage of each epoch, the label-generation network is then updated by encouraging auxiliary labels to be chosen such that, if the multi-task network were to be trained using these auxiliary labels, the performance of the primary task would be maximised on this same training data. Leveraging the performance of the multi-task network to train the label-generation network can be considered as a form of meta learning. Therefore, to update the parameters θ2 of the label-generation network, we define the meta objective as follows:

\underset{\theta_2}{\arg\min}\; \mathcal{L}\big(f^{\text{pri}}_{\theta_1^{+}}(x_{(i)}),\, y^{\text{pri}}_{(i)}\big). \qquad (3)

Here, θ1+ represents the weights of the multi-task network, were it to be trained with one gradient update using the multi-task loss defined in Equation 2:

\theta_1^{+} = \theta_1 - \alpha \nabla_{\theta_1} \Big( \mathcal{L}\big(f^{\text{pri}}_{\theta_1}(x_{(i)}),\, y^{\text{pri}}_{(i)}\big) + \mathcal{L}\big(f^{\text{aux}}_{\theta_1}(x_{(i)}),\, y^{\text{aux}}_{(i)}\big) \Big), \qquad (4)

where α is the learning rate.

The trick in this meta objective is that we perform a derivative over a derivative (a Hessian matrix) to update θ2, by using a retained computational graph of θ1+ in order to compute derivatives with respect to θ2. This second-derivative trick was also proposed in several other meta-learning frameworks, such as (Finn et al., 2017) and (Zhang et al., 2018).
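As an illustration of this trick, the following minimal PyTorch sketch performs one inner update of a toy multi-task network with create_graph=True, then backpropagates the primary loss at θ1+ into the label-generation parameters θ2. All architectures, shapes and names here are toy stand-ins rather than the paper's actual networks (the official implementation is at https://github.com/lorenmt/maxl):

import torch
import torch.nn.functional as F

torch.manual_seed(0)

# Toy stand-ins: θ1 is a shared trunk plus two heads; θ2 is a small
# label-generation network. All sizes are illustrative only.
trunk    = torch.randn(4, 4,  requires_grad=True)   # shared layer
pri_head = torch.randn(10, 4, requires_grad=True)   # primary head
aux_head = torch.randn(6, 4,  requires_grad=True)   # auxiliary head
theta1   = [trunk, pri_head, aux_head]
label_gen = torch.nn.Linear(4, 6)                   # θ2

x = torch.randn(16, 4)                              # one batch of inputs
y_pri = torch.randint(0, 10, (16,))
alpha, beta = 0.01, 1e-3

def multi_task_loss(params, y_aux):
    trunk, pri_head, aux_head = params
    h = torch.tanh(x @ trunk)
    pri = F.cross_entropy(h @ pri_head.t(), y_pri)
    # cross-entropy against the soft generated labels
    aux = -(y_aux * F.log_softmax(h @ aux_head.t(), dim=1)).sum(1).mean()
    return pri + aux

y_aux = F.softmax(label_gen(x), dim=1)              # generated auxiliary labels

# Inner update (Eq. 4): create_graph=True retains the graph, so θ1+
# remains a differentiable function of θ2.
grads = torch.autograd.grad(multi_task_loss(theta1, y_aux), theta1,
                            create_graph=True)
theta1_plus = [p - alpha * g for p, g in zip(theta1, grads)]

# Meta objective (Eq. 3): primary loss evaluated at θ1+, then
# backpropagated through the inner update into θ2.
trunk_p, pri_p, _ = theta1_plus
meta_loss = F.cross_entropy(torch.tanh(x @ trunk_p) @ pri_p.t(), y_pri)
meta_loss.backward()

with torch.no_grad():                               # manual SGD step on θ2
    for p in label_gen.parameters():
        p -= beta * p.grad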

However, we found that the generated auxiliary labels can easily collapse, such that the label-generation network always generates the same auxiliary label. This leaves parameters θ2 in a local minimum without producing any extra useful knowledge. Therefore, to encourage the network to learn more complex and informative auxiliary tasks, we further apply an entropy loss H(y^aux) as a regularisation term in the meta objective. A detailed explanation of the entropy loss and the collapsing label problem is given in Section 3.4.

Finally, the entire MAXL algorithm is defined as follows:

Algorithm 1 The MAXL algorithm
Initialise: Network parameters: θ1, θ2; Hierarchical structure: ψ
Initialise: Learning rates: α, β; Entropy weighting: λ
while not converged do
    for each training iteration i do
        # fetch one batch of training data
        (x_(i), y_(i)) ∈ (x, y)
        # training step
        Compute: y^aux_(i) = g_θ2(x_(i), y^pri_(i), ψ)
        Compute: L_pri = L(f^pri_θ1(x_(i)), y^pri_(i))
        Compute: L_aux = L(f^aux_θ1(x_(i)), y^aux_(i))
        Update: θ1 ← θ1 − α∇_θ1(L_pri + L_aux)
    end
    for each training iteration i do
        # fetch one batch of training data
        (x_(i), y_(i)) ∈ (x, y)
        # retain multi-task training computational graph
        Compute: y^aux_(i) = g_θ2(x_(i), y^pri_(i), ψ)
        Compute: L_pri = L(f^pri_θ1(x_(i)), y^pri_(i))
        Compute: L_aux = L(f^aux_θ1(x_(i)), y^aux_(i))
        Compute: θ1+ = θ1 − α∇_θ1(L_pri + L_aux)
        # meta-training step (second derivative trick)
        Compute: L_pri+ = L(f^pri_θ1+(x_(i)), y^pri_(i))
        Update: θ2 ← θ2 − β∇_θ2(L_pri+ + λH(y^aux_(i)))
    end
end

3.3. Mask SoftMax for Hierarchical Predictions

For the output layer of the multi-task and label-generation networks, one solution is to share all auxiliary classes across all primary classes. As such, the label-generation network would predict labels across all auxiliary classes, regardless of the ground-truth primary class. However, during experiments we found that the learned auxiliary labels did not improve primary task generalisation, and auxiliary label learning was trapped in a local minimum. To solve this, we assign each primary class its own distinct set of auxiliary classes, such that the label-generation network only predicts labels for this set, and outputs zeros for all other auxiliary classes. This allows the labelling of each auxiliary class to target the performance of a specific primary class, rather than the more difficult task of targeting all primary classes. The relationship between a primary class and its set of auxiliary classes can be considered a hierarchy, which we denote ψ, and which defines the number of auxiliary classes per primary class.

To implement this, we designed a modified SoftMax function, which we call Mask SoftMax, to predict auxiliary labels only for certain auxiliary classes. This takes the ground-truth primary task label y and the hierarchy ψ, and creates a binary mask M = B(y, ψ). The mask is zero everywhere, except for ones across the set of auxiliary classes associated with y. For example, consider a primary task with 2 classes y = 0, 1, and a hierarchy of ψ = [2, 2] as in Figure 2b. In this case, the binary masks are M = [1, 1, 0, 0] for y = 0, and [0, 0, 1, 1] for y = 1.

Applying this mask element-wise to the standard SoftMax function then allows the label-generation network to assign auxiliary labels only to relevant auxiliary classes:

\text{SoftMax:}\quad p(y_i) = \frac{\exp y_i}{\sum_i \exp y_i}, \qquad (5)

\text{Mask SoftMax:}\quad p(y_i) = \frac{\exp (M \odot y_i)}{\sum_i \exp (M \odot y_i)}, \qquad (6)

where p(y_i) represents the probability of the generated auxiliary label y over class i, and ⊙ represents element-wise multiplication.
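A minimal PyTorch sketch of Mask SoftMax follows. Note that one simple way to realise the masking is to apply it after exponentiation, so that masked-out classes receive exactly zero probability; the helper names and shapes are our own assumptions, not the paper's code:

import torch

def build_mask(y_pri, psi):
    # Binary mask M = B(y, ψ): ones over the auxiliary classes assigned to
    # each sample's primary class, zeros elsewhere.
    offsets = torch.tensor([0] + psi).cumsum(0)
    mask = torch.zeros(y_pri.shape[0], int(offsets[-1]))
    for n, y in enumerate(y_pri.tolist()):
        lo, hi = int(offsets[y]), int(offsets[y + 1])
        mask[n, lo:hi] = 1.0
    return mask

def mask_softmax(logits, mask, eps=1e-12):
    # Exponentiate, zero out irrelevant classes, then renormalise so all
    # probability mass stays on the auxiliary classes of the primary class.
    e = torch.exp(logits) * mask
    return e / (e.sum(dim=1, keepdim=True) + eps)

# Example from the text: 2 primary classes, ψ = [2, 2].
mask = build_mask(torch.tensor([0, 1]), psi=[2, 2])
print(mask)                               # [[1,1,0,0], [0,0,1,1]]
print(mask_softmax(torch.randn(2, 4), mask))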

Note that no domain knowledge is required to define the hierarchy, and MAXL performs well across a range of values for ψ, as shown in Section 4.2.

3.4. The Collapsing Class Problem

During experiments, we found that the label-generation network would often predict close to zero for some auxiliary classes, regardless of the input. This collapsing class problem is a local minimum in learning, and results in no additional features being learned from the auxiliary labels. We found that this phenomenon is particularly apparent when we either have a large learning rate for training the label-generation network, or a large auxiliary class prediction space ψ.

To avoid this, we introduced an additional regularisation loss, which we call the entropy loss H(y_(i)). This encourages high entropy across the auxiliary class prediction space, which in turn encourages the label-generation network to fully utilise all auxiliary classes. The entropy loss calculates the KL divergence between the predicted auxiliary label space y_(i) and a uniform distribution U, for each i-th batch. This is equivalent (up to an additive constant) to calculating the entropy of the predicted label space, and is defined as:

\mathcal{H}(y_{(i)}) = \sum_{k=1}^{K} \bar{y}_k \log \bar{y}_k, \qquad \bar{y}_k = \frac{1}{N} \sum_{i=1}^{N} y_{k,(i)}, \qquad (7)

where K is the total number of auxiliary classes, and N is the training batch size.
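A minimal sketch of this regulariser, assuming y_aux is an [N, K] tensor of generated label distributions:

import torch

def entropy_loss(y_aux, eps=1e-12):
    # Eq. (7): average the predictions over the batch, then sum y log y.
    # Minimising this (the negative entropy) pushes the batch-averaged
    # auxiliary prediction towards the uniform distribution.
    y_bar = y_aux.mean(dim=0)
    return (y_bar * torch.log(y_bar + eps)).sum()

y_aux = torch.softmax(torch.randn(32, 20), dim=1)   # N = 32, K = 20
print(entropy_loss(y_aux))   # more negative = higher entropy = better spread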

4. Experiments

In this section, we present experimental results to evaluate MAXL with respect to several baselines and datasets on image classification.

4.1. Experimental Setup

Datasets   We evaluated on seven different datasets, with varying sizes and complexities. One of these, CIFAR-100 (Krizhevsky & Hinton, 2009), contains a manually-defined 2-level hierarchical structure, which we expanded into a 4-level hierarchy by manually assigning data for the new levels (see Appendix A), to create a hierarchy of {3, 10, 20, 100} classes. This hierarchy was then used to generate human-defined auxiliary labels. For the other six datasets, MNIST (LeCun et al., 1998), SVHN (Goodfellow et al., 2013), CIFAR-10 (Krizhevsky & Hinton, 2009), STL-10 (Coates et al., 2011), CINIC-10 (Darlow et al., 2018) and PASCAL VOC-2012 (Everingham et al.), in which a hierarchy is either not available or difficult to access, we ran an ablative analysis of MAXL by studying its performance across a range of hierarchical structures ψ[i] = 2, 3, 5, 10, 20, ∀i.

Baselines   We compared MAXL to a number of baselines. Single Task trains only with the primary class label and does not employ auxiliary learning. Random uses auxiliary learning, and assigns each training image to random auxiliary classes in a randomly generated, well-balanced hierarchy. Nearest Neighbour is a differentiable, nearest-neighbour clustering method (Snell et al., 2017). First, 10% of training images are randomly selected, together with their manually-defined auxiliary labels, and embedded using a pre-trained ImageNet network (Deng et al., 2009). Clustering then creates a prototype for each auxiliary class, and the auxiliary label for every other training image is defined by the most similar prototype in this embedding. Similar to MAXL, the performance of the clustering on the primary task is then used to refine the clustering over training epochs. Finally, Human also uses auxiliary learning, but the auxiliary labels are assigned using the human-defined hierarchy of CIFAR-100, where the auxiliary classes are at a lower (finer-grained) level than the primary classes. Note that due to the need for a manually-defined hierarchy, Nearest Neighbour and Human were only evaluated on CIFAR-100. For CIFAR-100, we use the VGG-16 network architecture (Simonyan & Zisserman, 2014) for all baselines, and for both the multi-task and label-generation networks in MAXL. For the other six datasets, the backbone network architectures are indicated in Table 1.

Training   For all experiments, we used a learning rate of 0.01 for the multi-task network. For MAXL's label-generation network, we found that a smaller learning rate of 10−3 was necessary to help prevent the class collapsing problem. For all training, we dropped the learning rate by half after every 50 epochs, and trained for a total of 200 epochs, using vanilla stochastic gradient descent. We applied an L1-norm weight decay of 5 · 10−4 to the label-generation network, with no regularisation on the multi-task network. We chose the weighting of the entropy regularisation loss term to be λ = 0.2, based on empirical performance.
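For reference, the stated schedule could be set up in PyTorch roughly as follows; the module definitions are placeholders, and only the learning rates, schedule, and L1 weighting come from the text above:

import torch

# Placeholder modules standing in for the two networks.
multi_task = torch.nn.Linear(4, 10)
label_gen  = torch.nn.Linear(4, 6)

# SGD with the stated learning rates, halved every 50 of 200 epochs.
opt1 = torch.optim.SGD(multi_task.parameters(), lr=0.01)
opt2 = torch.optim.SGD(label_gen.parameters(), lr=1e-3)
sched1 = torch.optim.lr_scheduler.StepLR(opt1, step_size=50, gamma=0.5)
sched2 = torch.optim.lr_scheduler.StepLR(opt2, step_size=50, gamma=0.5)

# L1 weight decay on the label-generation network only, added to its loss.
def l1_penalty(net, weight=5e-4):
    return weight * sum(p.abs().sum() for p in net.parameters())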

4.2. Comparison to Single Task Learning

First, we compare MAXL to a single-task learning baseline, to determine whether MAXL performs better despite neither method using manually-defined auxiliary labels. Since the power of MAXL lies in its ability to work without domain knowledge, we tested MAXL across a range of hierarchies ψ, to study whether it works well without needing to tune this hierarchy for each dataset. Here, the hierarchies are well balanced such that ψ[i] is the same for all i (for all primary classes). Table 1 shows the test accuracy of MAXL across several hierarchies, together with the test accuracy for single-task learning, where only the primary task labels are used. We see that MAXL consistently outperforms single-task learning across all six datasets, despite both methods using the same training data. We also see that MAXL outperforms single-task learning across all tested values of ψ, showing the robustness of our method without needing domain knowledge or a hyperparameter search to select an optimal ψ.

Datasets         Backbone     Single    MAXL, ψ[i] =
                                        2       3       5       10      20
MNIST            2-layer NN   97.78     97.89   98.04   97.99   97.81   97.95
SVHN             2-layer NN   84.28     84.56   85.15   84.77   84.46   84.30
CIFAR-10         VGG-16       84.14     88.12   88.37   88.79   88.32   88.53
STL-10           VGG-16       66.85     70.11   70.21   70.35   71.87   72.86
CINIC-10         ResNet-50    71.77     72.43   72.91   73.03   73.20   73.18
PASCAL-VOC12 S   ResNet-50    30.47     33.52   34.96   34.67   34.24   33.91

Table 1. Comparison of MAXL with single-task learning, across a range of hierarchies. The best performance for each dataset is marked in bold. Note that 'PASCAL-VOC12 S' represents a smaller version of the standard 'PASCAL-VOC12' dataset, with the resolution reduced to [96 × 96] for faster training.

4.3. Comparison to Auxiliary Learning Baselines

Next, we compare MAXL to a number of baseline methods for generating auxiliary labels, using CIFAR-100. This dataset has a manually-defined hierarchy, which is used in the Human and Nearest Neighbour baselines. However, MAXL and the Random baseline do not require any human knowledge to generate auxiliary labels. Instead, as in Section 4.2, a hierarchy ψ is defined by splitting all auxiliary classes into sets, with one set per primary class. We create a well-balanced hierarchy by assigning an equal number of auxiliary classes per primary class. If the total number of auxiliary classes is not divisible by the number of primary classes, then we allow some primary classes to be assigned one more or one fewer auxiliary class than other primary classes. We run each experiment three times and average the results, and for cases when the hierarchy is unbalanced by one auxiliary class, we randomly choose which primary classes are assigned each number of auxiliary classes in ψ.
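A small sketch of constructing such a well-balanced hierarchy; the function name is hypothetical:

def balanced_psi(num_primary, total_aux):
    # Split total_aux auxiliary classes as evenly as possible across the
    # primary classes; at most one class of imbalance between any two.
    base, rem = divmod(total_aux, num_primary)
    return [base + 1 if i < rem else base for i in range(num_primary)]

print(balanced_psi(20, 100))   # [5, 5, ..., 5]
print(balanced_psi(3, 10))     # [4, 3, 3]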

[Figure 3: six panels (PRI 3 | AUX 10, PRI 3 | AUX 20, PRI 3 | AUX 100, PRI 10 | AUX 20, PRI 10 | AUX 100, PRI 20 | AUX 100), each plotting test performance (accuracy) against training epoch for Single Task, Random, Nearest Neighbour, MAXL and Human.]

Figure 3. Learning curves for the CIFAR-100 test dataset, comparing MAXL with baseline methods for generating auxiliary labels. Our version of CIFAR-100 has a four-level hierarchy of {3, 10, 20, 100} classes per level, and we use this to create the hierarchy ψ for auxiliary learning. Learning curves show the mean accuracy over the last 10 epochs.

Test accuracy curves are presented in Figure 3, using all possible combinations of the numbers of primary classes and total auxiliary classes in CIFAR-100. We observe that MAXL outperforms both Single Task and Random, with all three methods using the same training data (no ground-truth auxiliary labels). We also observe that MAXL outperforms Nearest Neighbour, despite this baseline using 10% of the manually-defined auxiliary labels from the dataset. We therefore see that MAXL is able to learn auxiliary tasks effectively by tightly coupling the auxiliary label generation and the primary task training, in a manner superior to assigning these auxiliary labels independently. Finally, we observe that MAXL performs similarly to Human, despite this baseline using manually-defined auxiliary labels for the entire training dataset. With the performance of MAXL similar to that of a system using human-defined auxiliary labels, we see strong evidence that MAXL is able to learn to generalise effectively in a self-supervised manner.

4.4. Understanding the Utility of Auxiliary Labels

In (Du et al., 2018), the cosine similarity between gradients produced by the auxiliary and primary losses was used to determine the task weighting in the overall loss function. We use this same idea to visualise the utility of a set of auxiliary labels for improving the performance of the primary task. Intuitively, a cosine similarity of −1 indicates that the auxiliary labels work against the primary task. A cosine similarity of 0 indicates that the auxiliary labels have no impact on the primary task. And a cosine similarity of 1 indicates that the auxiliary labels are learning the same features as the primary task, and so offer no useful additional information. Therefore, the cosine similarity for the gradient produced from optimal auxiliary labels should lie between 0 and 1, to ensure that they assist the primary task.
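A sketch of this measurement, computing the cosine similarity between the two loss gradients on the shared parameters; the helper and the toy networks are our own illustration, following the idea of Du et al. (2018):

import torch
import torch.nn.functional as F

def grad_cosine(loss_pri, loss_aux, shared_params):
    # Flatten each loss's gradients on the shared parameters into a single
    # vector, then take the cosine similarity between the two vectors.
    g_pri = torch.autograd.grad(loss_pri, shared_params, retain_graph=True)
    g_aux = torch.autograd.grad(loss_aux, shared_params, retain_graph=True)
    v_pri = torch.cat([g.flatten() for g in g_pri])
    v_aux = torch.cat([g.flatten() for g in g_aux])
    return F.cosine_similarity(v_pri, v_aux, dim=0)

# Tiny usage example with a shared linear layer and two heads.
shared = torch.nn.Linear(4, 4)
h = torch.tanh(shared(torch.randn(8, 4)))
loss_pri = F.cross_entropy(h @ torch.randn(4, 10), torch.randint(0, 10, (8,)))
loss_aux = F.cross_entropy(h @ torch.randn(4, 6), torch.randint(0, 6, (8,)))
print(grad_cosine(loss_pri, loss_aux, list(shared.parameters())))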

[Figure 4: cosine similarity against training epoch for Human, MAXL, Random and Nearest Neighbour.]

Figure 4. Cosine similarity measurement between the auxiliary loss gradient and primary loss gradient, on the shared representation in the multi-task network. These results are for a hierarchy of 20 primary classes and 100 total auxiliary classes.

In Figure 4, we show the cosine similarity measurements of gradients in the shared layers of the multi-task network, trained on 20 primary classes and 100 total auxiliary classes from CIFAR-100. We observe that all baseline methods reach their maximal similarity at around 20 epochs, which then drops significantly afterwards. In comparison, MAXL produces auxiliary gradients with high similarity throughout the entire training period. Whilst we cannot say what the optimal cosine similarity should be, it is clear that the auxiliary labels MAXL produces affect primary task performance in a very different way to the other baselines. Similar trends across all six hierarchies of Figure 3 are shown in Appendix B.

4.5. Visualisations of Generated Knowledge

In Figure 5, we visualise 2D embeddings of examples from the CIFAR-100 test dataset, on two different hierarchies. The visualisations are computed using t-SNE (Maaten & Hinton, 2008) on the final feature layer of the multi-task network, and compared across three methods: our MAXL method, the Human baseline, and the Single Task baseline.

[Figure 5: two rows of t-SNE panels (Single Task, Human, MAXL), for PRI 3 | AUX 10 and PRI 20 | AUX 100.]

Figure 5. t-SNE visualisation of the learned final layer of the multi-task network, trained on CIFAR-100 with two different hierarchies. Colours represent the primary classes.

This visualisation shows the separability of primary classes after being trained with the multi-task network. We see that both MAXL and Human show better separation of the primary classes than Single Task, owing to the generalisation effect of the auxiliary learning. It again shows the effectiveness of MAXL, whilst requiring no additional human knowledge.

We also show examples of images assigned to the same auxiliary class by MAXL's label-generation network. Figure 6 shows example images with the highest prediction probabilities for three random auxiliary classes from CIFAR-100, using the hierarchy of 20 primary classes and 100 total auxiliary classes (5 auxiliary classes per primary class), which showed the best performance of MAXL in Figure 3. In addition, we also present examples on MNIST, in which 3 auxiliary classes were used for each of the 10 primary classes.

[Figure 6: grids of example images for Auxiliary Class #1, #2 and #3, for CIFAR-100 primary classes (aquatic mammals, household furniture, invertebrates, nature scenes, flowers) and MNIST primary classes (digits 0, 3, 5, 7, 9).]

Figure 6. Visualisation of 5 test examples with the highest prediction probability, for each of 3 randomly selected auxiliary classes, for different primary classes. We present the visualisation for CIFAR-100 (top) when trained with 20 primary classes and 5 auxiliary classes per primary class, and for MNIST (bottom) when trained with 10 primary classes and 3 auxiliary classes per primary class.

To our initial surprise, only some of the generated auxiliary labels visualised in both datasets show human-understandable knowledge. For example, we can observe that auxiliary classes #1 and #2 of digit nine are clustered by the direction of the 'tail', and auxiliary classes #2 and #3 of digit seven are clustered by the presence of the 'horizontal line'. But in most cases, there are no obvious similarities within each auxiliary class in terms of shape, colour, style, structure or semantic meaning. However, this makes more sense when we re-consider the role of the label-generation network, which is to assign auxiliary labels which assist the primary task, rather than to group images in terms of semantic or visual similarity. The label-generation network would therefore be more effective if it were to group images in terms of a shared aspect of reasoning which the primary task data does not include. If the multi-task network is then able to improve its ability to determine the auxiliary class of an image in such a cluster, then the learned features will help in overcoming this challenging aspect of reasoning. It therefore makes sense that the examples within an auxiliary class do not share semantic or visual similarity, but instead share a more complex underlying property.

Further, we discovered that the generated auxiliary knowledge is not deterministic, since the top predicted candidates differ when we re-train the network from scratch. We therefore speculate that using a human-defined hierarchy is just one of a potentially infinite number of local optima, and that on each run of training, the label-generation network produces another of these local optima.

5. Conclusion & Future Work

In this paper, we have presented Meta AuXiliary Learning (MAXL) for generating optimal auxiliary labels which, when trained alongside a primary task in a multi-task setup, improve the performance of the primary task. Rather than employing domain knowledge and human-defined auxiliary tasks, as is typically required, MAXL is self-supervised and, combined with its general nature, has the potential to automate the process of generalisation to new levels.

Our evaluation on multiple datasets has shown the performance of MAXL in an image classification setup, where the auxiliary task is to predict sub-class, hierarchical labels for an image. We have shown that MAXL significantly outperforms other baselines for generating auxiliary labels, and even when human-defined knowledge is used to manually construct the auxiliary labels, MAXL is competitive.

The general nature of MAXL also opens up questions about how self-supervised auxiliary learning may be used to learn generic auxiliary tasks, beyond sub-class image classification. We also ran preliminary experiments on predicting arbitrary vectors, such that the auxiliary task becomes a regression, but results so far have been inconclusive. However, the ability of MAXL to potentially learn flexible auxiliary tasks which can automatically be tuned for the primary task now offers an exciting direction towards automated generalisation across a wide range of more complex tasks.

Acknowledgements

We would like to thank Michael Bloesch, Fabian Falck and Stephen James for insightful discussions.


References

Andrychowicz, M., Denil, M., Gomez, S., Hoffman, M. W., Pfau, D., Schaul, T., Shillingford, B., and De Freitas, N. Learning to learn by gradient descent by gradient descent. In Advances in Neural Information Processing Systems, pp. 3981–3989, 2016.

Bengio, S., Bengio, Y., Cloutier, J., and Gecsei, J. On the optimization of a synaptic learning rule. In Preprints Conf. Optimality in Artificial and Biological Neural Networks, pp. 6–8. Univ. of Texas, 1992.

Bengio, Y., Bengio, S., and Cloutier, J. Learning a synaptic learning rule. Universite de Montreal, Departement d'informatique et de recherche operationnelle, 1990.

Caruana, R. Multitask learning. In Learning to Learn, pp. 95–133. Springer, 1998.

Coates, A., Ng, A., and Lee, H. An analysis of single-layer networks in unsupervised feature learning. In Proceedings of the Fourteenth International Conference on Artificial Intelligence and Statistics, pp. 215–223, 2011.

Darlow, L. N., Crowley, E. J., Antoniou, A., and Storkey, A. J. CINIC-10 is not ImageNet or CIFAR-10. arXiv preprint arXiv:1810.03505, 2018.

Deng, J., Dong, W., Socher, R., Li, L.-J., Li, K., and Fei-Fei, L. ImageNet: A large-scale hierarchical image database. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 248–255, 2009.

Doersch, C. and Zisserman, A. Multi-task self-supervised visual learning. In IEEE International Conference on Computer Vision (ICCV), 2017.

Du, Y., Czarnecki, W. M., Jayakumar, S. M., Pascanu, R., and Lakshminarayanan, B. Adapting auxiliary losses using gradient similarity. arXiv preprint arXiv:1812.02224, 2018.

Everingham, M., Van Gool, L., Williams, C. K. I., Winn, J., and Zisserman, A. The PASCAL Visual Object Classes Challenge 2012 (VOC2012) Results. http://www.pascal-network.org/challenges/VOC/voc2012/workshop/index.html.

Finn, C., Abbeel, P., and Levine, S. Model-agnostic meta-learning for fast adaptation of deep networks. In International Conference on Machine Learning, pp. 1126–1135, 2017.

Flynn, J., Neulander, I., Philbin, J., and Snavely, N. DeepStereo: Learning to predict new views from the world's imagery. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 5515–5524, 2016.

Goodfellow, I. J., Bulatov, Y., Ibarz, J., Arnoud, S., and Shet, V. Multi-digit number recognition from street view imagery using deep convolutional neural networks. arXiv preprint arXiv:1312.6082, 2013.

Jaderberg, M., Czarnecki, W. M., Osindero, S., Vinyals, O., Graves, A., Silver, D., and Kavukcuoglu, K. Decoupled neural interfaces using synthetic gradients. arXiv preprint arXiv:1608.05343, 2016.

Jaderberg, M., Mnih, V., Czarnecki, W. M., Schaul, T., Leibo, J. Z., Silver, D., and Kavukcuoglu, K. Reinforcement learning with unsupervised auxiliary tasks. International Conference on Learning Representations, 2017.

Kokkinos, I. UberNet: Training a universal convolutional neural network for low-, mid-, and high-level vision using diverse datasets and limited memory. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2017.

Krizhevsky, A. and Hinton, G. Learning multiple layers of features from tiny images. Technical report, Citeseer, 2009.

LeCun, Y., Bottou, L., Bengio, Y., and Haffner, P. Gradient-based learning applied to document recognition. Proceedings of the IEEE, 86(11):2278–2324, 1998.

Li, Z., Zhou, F., Chen, F., and Li, H. Meta-SGD: Learning to learn quickly for few shot learning. arXiv preprint arXiv:1707.09835, 2017.

Liebel, L. and Korner, M. Auxiliary tasks in multi-task learning. arXiv preprint arXiv:1805.06334, 2018.

Lin, T.-Y., Goyal, P., Girshick, R., He, K., and Dollar, P. Focal loss for dense object detection. arXiv preprint arXiv:1708.02002, 2017.

Liu, S., Johns, E., and Davison, A. J. End-to-end multi-task learning with attention. arXiv preprint arXiv:1803.10704, 2018.

Maaten, L. v. d. and Hinton, G. Visualizing data using t-SNE. Journal of Machine Learning Research, 9(Nov):2579–2605, 2008.

Misra, I., Shrivastava, A., Gupta, A., and Hebert, M. Cross-stitch networks for multi-task learning. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 3994–4003, 2016.

Ravi, S. and Larochelle, H. Optimization as a model for few-shot learning. 2016.

Ruder, S. An overview of multi-task learning in deep neural networks. arXiv preprint arXiv:1706.05098, 2017.

Santoro, A., Bartunov, S., Botvinick, M., Wierstra, D., and Lillicrap, T. One-shot learning with memory-augmented neural networks. arXiv preprint arXiv:1605.06065, 2016.

Schmidhuber, J. Learning complex, extended sequences using the principle of history compression. Neural Computation, 4(2):234–242, 1992.

Simonyan, K. and Zisserman, A. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014.

Snell, J., Swersky, K., and Zemel, R. Prototypical networks for few-shot learning. In Advances in Neural Information Processing Systems, pp. 4077–4087, 2017.

Toshniwal, S., Tang, H., Lu, L., and Livescu, K. Multitask learning with low-level auxiliary tasks for encoder-decoder based speech recognition. arXiv preprint arXiv:1704.01631, 2017.

Vinyals, O., Blundell, C., Lillicrap, T., Wierstra, D., et al. Matching networks for one shot learning. In Advances in Neural Information Processing Systems, pp. 3630–3638, 2016.

Zhang, Y., Tang, H., and Jia, K. Fine-grained visual categorization using meta-learning optimization with sample selection of auxiliary data. arXiv preprint arXiv:1807.10916, 2018.

Zhou, T., Brown, M., Snavely, N., and Lowe, D. G. Unsupervised learning of depth and ego-motion from video. In CVPR, 2017.


A. 4-Level CIFAR-100 Dataset

3 Class              10 Class            20 Class                          100 Class
animals              large animals       reptiles                          crocodile, dinosaur, lizard, snake, turtle
                                         large carnivores                  bear, leopard, lion, tiger, wolf
                                         large omnivores and herbivores    camel, cattle, chimpanzee, elephant, kangaroo
                     medium animals      aquatic mammals                   beaver, dolphin, otter, seal, whale
                                         medium-sized mammals              fox, porcupine, possum, raccoon, skunk
                     small animals       small mammals                     hamster, mouse, rabbit, shrew, squirrel
                                         fish                              aquarium fish, flatfish, ray, shark, trout
                     invertebrates       insects                           bee, beetle, butterfly, caterpillar, cockroach
                                         non-insect invertebrates          crab, lobster, snail, spider, worm
                     people              people                            baby, boy, girl, man, woman
vegetations          vegetations         flowers                           orchids, poppies, roses, sunflowers, tulips
                                         fruit and vegetables              apples, mushrooms, oranges, pears, peppers
                                         trees                             maple, oak, palm, pine, willow
objects and scenes   household objects   food containers                   bottles, bowls, cans, cups, plates
                                         household electrical devices      clock, keyboard, lamp, telephone, television
                                         household furniture               bed, chair, couch, table, wardrobe
                     construction        large man-made outdoor things     bridge, castle, house, road, skyscraper
                     natural scenes      large natural outdoor scenes      cloud, forest, mountain, plain, sea
                     vehicles            vehicles 1                        bicycle, bus, motorcycle, pickup truck, train
                                         vehicles 2                        lawn-mower, rocket, streetcar, tank, tractor

Table 2. Building the CIFAR-100 dataset into a 4-level hierarchy.

B. Cosine Similarity on CIFAR-100 Dataset

[Figure 7: six panels (PRI 3 | AUX 10, PRI 3 | AUX 20, PRI 3 | AUX 100, PRI 10 | AUX 20, PRI 10 | AUX 100, PRI 20 | AUX 100), each plotting cosine similarity against training epoch for Human, MAXL, Random and Nearest Neighbour.]

Figure 7. Cosine similarity measurement between the auxiliary loss gradient and primary loss gradient, on the shared representation in the multi-task network. The results are for all six hierarchies defined in the CIFAR-100 dataset.