
Decision Jungles

Tobias Pohlen

March 8, 2015


Outline

- Literature
- Introduction
- Training
- Implementation Details
- Experiments and Results

Literature

- Jamie Shotton, Toby Sharp, Pushmeet Kohli, Sebastian Nowozin, John Winn, and Antonio Criminisi. Decision Jungles: Compact and Rich Models for Classification. Advances in Neural Information Processing Systems 26, pages 234-242. Curran Associates, Inc., 2013.
- Jamie Shotton, Toby Sharp, Pushmeet Kohli, Sebastian Nowozin, John Winn, and Antonio Criminisi. Decision Jungles: Compact and Rich Models for Classification. Supplemental material, 2013.
- Piotr Dollár. Piotr's Image and Video Matlab Toolbox (PMT). http://vision.ucsd.edu/~pdollar/toolbox/doc/index.html

The Classification Problem

Objective
- Solve the multiclass classification problem

Definition (Multiclass classification problem)
Given
- a training set X = {(x_1, y_1), ..., (x_N, y_N)} ⊂ R^n × {1, ..., C}
- training examples x_i ∈ R^n
- class labels y_i ∈ {1, ..., C}

Problem
- Assign a previously unseen data point x ∈ R^n to one of the classes 1, ..., C

Binary Decision Trees

Definition (Binary decision tree)
A binary decision tree is a binary tree G = (V, E) with the following properties:

An internal node v is augmented with
- a feature dimension d_v ∈ {1, ..., n}
- a threshold θ_v ∈ R

A leaf node v is augmented with
- a class label c_v
- or a class histogram h_v : {1, ..., C} → R

[Figure: example tree. Root x_1 ≤ 3; left child x_2 ≤ 2 with leaves 1 and 3; right child x_2 ≤ 3.5 with leaves 2 and 3]

Classifying Data Points

Definition (Classifier semantics)
A data point x ∈ R^n is assigned to a class by passing it down the tree according to the splits defined by d_v and θ_v.

Example
Classify x = (x_1, x_2) = (2, 4).

[Figure: the example tree from the previous slide. Since x_1 = 2 ≤ 3 and x_2 = 4 > 2, the point takes the left branch at the root, then the right branch, and ends in the leaf labeled 3]
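The traversal described above is straightforward to turn into code. Below is a minimal C++ sketch of a tree node and the classification routine; it is illustrative only (not the LibJungle API) and assumes the usual convention that a point goes to the left child when x_{d_v} ≤ θ_v, with 0-based feature indices.

    #include <iostream>
    #include <memory>
    #include <vector>

    // A node of a binary decision tree. Leaves are marked by label >= 0.
    struct TreeNode {
        int feature = -1;                 // d_v: feature dimension tested at this node
        double threshold = 0.0;           // theta_v: split threshold
        int label = -1;                   // c_v: class label if this node is a leaf
        std::unique_ptr<TreeNode> left;   // taken when x[feature] <= threshold
        std::unique_ptr<TreeNode> right;  // taken otherwise
    };

    // Pass the point down the tree until a leaf is reached.
    int classify(const TreeNode* node, const std::vector<double>& x) {
        while (node->label < 0) {
            node = (x[node->feature] <= node->threshold) ? node->left.get()
                                                         : node->right.get();
        }
        return node->label;
    }

    int main() {
        // The example tree from the slides: root x1 <= 3, left child x2 <= 2 (leaves 1, 3),
        // right child x2 <= 3.5 (leaves 2, 3). Feature indices are 0-based (x1 -> 0, x2 -> 1).
        auto root = std::make_unique<TreeNode>();
        root->feature = 0; root->threshold = 3.0;
        root->left = std::make_unique<TreeNode>();
        root->left->feature = 1; root->left->threshold = 2.0;
        root->left->left   = std::make_unique<TreeNode>(); root->left->left->label   = 1;
        root->left->right  = std::make_unique<TreeNode>(); root->left->right->label  = 3;
        root->right = std::make_unique<TreeNode>();
        root->right->feature = 1; root->right->threshold = 3.5;
        root->right->left  = std::make_unique<TreeNode>(); root->right->left->label  = 2;
        root->right->right = std::make_unique<TreeNode>(); root->right->right->label = 3;

        std::cout << classify(root.get(), {2.0, 4.0}) << "\n";  // prints 3
        return 0;
    }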

Random Decision Trees

Let E be some objective function.

Deterministic decision trees
At each node v, determine d_v and θ_v such that

  E(d_v, θ_v) = min_{d ∈ {1,...,n}, θ ∈ R} E(d, θ)

Random decision trees
At each node v, determine d_v and θ_v such that

  E(d_v, θ_v) = min_{d ∈ F, θ ∈ R} E(d, θ)

where
- F ⊆ {1, ..., n} is a random selection of feature dimensions

Random Forests

Definition
A random forest F = (G_1, ..., G_m) is an ensemble of random decision trees G_i.

Classification
A data point x ∈ R^n is assigned to the class that receives the most votes among the trees.
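The voting rule amounts to an argmax over label counts. A small C++ sketch, independent of any particular tree implementation (the function name is illustrative):

    #include <unordered_map>
    #include <vector>

    // Majority vote over the class labels predicted by the individual trees.
    // Ties are resolved in favour of the label that reaches the winning count first.
    int majorityVote(const std::vector<int>& predictions) {
        std::unordered_map<int, int> counts;
        int bestLabel = -1, bestCount = 0;
        for (int label : predictions) {
            int c = ++counts[label];
            if (c > bestCount) { bestCount = c; bestLabel = label; }
        }
        return bestLabel;
    }

    // Example: majorityVote({1, 3, 3, 2, 3}) returns 3.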

Random Forests: Discussion

- Initially proposed by Breiman in 2001 [1]
- High classification accuracy by learning uncorrelated trees
- Fast training due to random feature selection
- Fast evaluation

Random Forests: Problem

- High memory consumption: O(2^d)
- Memory consumption grows exponentially with the depth d of the trees
- This is especially a problem in memory-constrained scenarios, e.g.
  - embedded systems
  - mobile devices

Decision DAGs: Concept

Idea: Instead of a tree graph, use a directed acyclic graph (DAG).

[Figure: the same set of splits shown first as a full binary tree (root x_1 ≤ 3; second level x_2 ≤ 2 and x_2 ≤ 3.5; third level x_1 ≤ 1, x_2 ≤ 3, x_2 ≤ 3, x_1 ≤ 5; eight leaves) and then as a DAG in which equivalent nodes and leaves are merged (third level x_1 ≤ 1, x_2 ≤ 3, x_1 ≤ 5; three leaves labeled 1, 2, 3)]

Decision DAGs: Memory consumption

Control the memory consumption by limiting the width of the DAG with a merging schedule

  s : N → N, d ↦ s(d)

If s(d) ≤ S for all d, then the memory consumption is O(dS), where d is the depth.

A typical choice is

  s : N → N, d ↦ s(d) = min(2^d, 2^D)

where D ∈ N is a constant.

Example (D = 7): s : N → N, d ↦ min(2^d, 128)

[Figure: the merged DAG from the previous slide]
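As a small illustration of the schedule above, a C++ sketch of s(d) = min(2^d, 2^D), using the identity min(2^d, 2^D) = 2^min(d, D). The function name is illustrative, not taken from LibJungle.

    #include <algorithm>
    #include <cstdint>

    // Merging schedule s(d) = min(2^d, 2^D): the maximum number of nodes at depth d.
    // The DAG grows like a full binary tree up to depth D and then keeps a constant
    // width of 2^D nodes per level.
    std::int64_t scheduleWidth(int d, int D) {
        return std::int64_t{1} << std::min(d, D);
    }

    // Example with D = 7: scheduleWidth(3, 7) == 8, scheduleWidth(10, 7) == 128.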

Decision DAGs: Parameters

Definition
A decision DAG is a directed acyclic graph G = (V, E) with the following properties:

An internal node v is augmented with
- a feature dimension d_v ∈ {1, ..., n}
- a threshold θ_v ∈ R
- a left child node l_v ∈ V
- a right child node r_v ∈ V

[Figure: the merged DAG from the previous slides]
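These parameters map naturally onto a flat array of nodes per level, where the child fields are indices that several parents may share. A minimal C++ sketch with illustrative field names (not the LibJungle data structures); the histogram field mirrors the class histograms at tree leaves and is an assumption here:

    #include <vector>

    // One node of a decision DAG. The children are stored as indices into the node
    // array of the next level, so several parent nodes may point to the same child;
    // this sharing is what distinguishes a DAG node from a tree node.
    struct DagNode {
        int feature = -1;               // d_v: feature dimension tested at this node
        double threshold = 0.0;         // theta_v: split threshold
        int left = -1;                  // l_v: child index taken when x[feature] <= threshold
        int right = -1;                 // r_v: child index taken otherwise
        std::vector<double> histogram;  // class histogram at the leaf level (assumption,
                                        // analogous to the leaf histograms of decision trees)
    };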

Decision Jungles

Definition
A random decision DAG is a decision DAG whose parameters are sampled from some probability distribution.

Definition
A decision jungle J = (G_1, ..., G_m) is an ensemble of random decision DAGs G_i.

Decision jungles were proposed by J. Shotton et al. at NIPS 2013 [2].

Decision DAGs: Training

Binary decision trees
At each node v, optimize
- the feature d_v
- the threshold θ_v

Decision DAGs
At each node v, optimize
- the feature d_v
- the threshold θ_v
- the left child node l_v
- the right child node r_v

Conclusion
The graph structure and the thresholds/features need to be optimized simultaneously.

Decision DAGs: Training

Technically, the graph shown is also a decision DAG. Shotton et al. assumed a level-wise graph structure for optimization.

[Figure: the level-wise DAG from the previous slides]

Decision DAGs: Training

The DAG is trained level-wise. Let s be a merging schedule (e.g. s(d) = min(2^d, 128)).

1: G ← ({root}, ∅)
2: for d = 1, 2, ... do
3:   Add s(d) new nodes to G
4:   Initialize the parameters of the former leaf nodes
5:   Optimize the parameters of the former leaf nodes
6: end for

Decision DAGs: Training

[Figure: animation of level-wise training. The DAG is grown one level at a time; the splits of the newly added level (shown as "x? ≤ ?") are optimized before the next level is added. The final example DAG contains the splits x_2 ≤ 4, x_2 ≤ 1, x_1 ≤ 6, x_1 ≤ -1, x_2 ≤ 0, x_1 ≤ 1 and leaves labeled 1, 2, 1]

Level Optimization

Naming convention
- parent nodes p_1, ..., p_k (the level currently being optimized)
- child nodes c_1, ..., c_l (the level below)
- Each parent p_i carries a feature dimension d_{p_i}, a threshold θ_{p_i}, a left child node l_{p_i}, and a right child node r_{p_i}
- S_{p_i} and S_{c_j} are the training sets at nodes p_i and c_j respectively

[Figure: two rows of nodes, parents p_1, ..., p_k above children c_1, ..., c_l]

Objective Function I

Goal: Find the optimal parameters for the parent nodes in terms of an objective function E.

Definition
Let X ⊂ R^n × {1, ..., C} be a training set. The entropy H(X) is defined as

  H(X) = - Σ_{i=1}^{C} p(i) log_2 p(i)

where

  p(i) = |{(x, y) ∈ X : y = i}| / |X|
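In an implementation the entropy is typically computed from a histogram of class counts rather than from the raw set. A minimal C++ sketch (illustrative, not LibJungle code):

    #include <cmath>
    #include <vector>

    // Shannon entropy H(X) = -sum_i p(i) log2 p(i), computed from a histogram of class
    // counts; empty bins contribute nothing and an empty set gets entropy 0 by convention.
    double entropy(const std::vector<double>& classCounts) {
        double total = 0.0;
        for (double c : classCounts) total += c;
        if (total <= 0.0) return 0.0;
        double h = 0.0;
        for (double c : classCounts) {
            if (c > 0.0) {
                double p = c / total;
                h -= p * std::log2(p);
            }
        }
        return h;
    }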

Objective Function II

The objective function E is defined in terms of the entropies at the child nodes:

  E(Θ_1, ..., Θ_k) = Σ_{i=1}^{l} |S_{c_i}| H(S_{c_i})

where
- Θ_i = (d_{p_i}, θ_{p_i}, l_{p_i}, r_{p_i}) are the parameters of p_i

Objective Function III

The connection between the Θ_1, ..., Θ_k and the S_{c_1}, ..., S_{c_l} becomes apparent from the definition of S_{c_i}:

  S_{c_i} =   ⋃_{j=1,...,k : l_{p_j} = c_i} {(x, y) ∈ S_{p_j} : x_{d_{p_j}} ≤ θ_{p_j}}
            ∪ ⋃_{j=1,...,k : r_{p_j} = c_i} {(x, y) ∈ S_{p_j} : x_{d_{p_j}} > θ_{p_j}}
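Putting the last three slides together: given the parent parameters Θ_1, ..., Θ_k and the training sets S_{p_i}, the child sets S_{c_i} are obtained by routing each sample left or right, and E is the sum of the weighted child entropies. A minimal C++ sketch with illustrative names (not the LibJungle API); class labels are assumed to be 0-based here.

    #include <cmath>
    #include <cstddef>
    #include <vector>

    struct Sample { std::vector<double> x; int label; };   // a training example (x, y)

    struct ParentParams {     // Theta_i = (d_{p_i}, theta_{p_i}, l_{p_i}, r_{p_i})
        int feature;          // d_{p_i}
        double threshold;     // theta_{p_i}
        int leftChild;        // l_{p_i}: index into the child level
        int rightChild;       // r_{p_i}
    };

    double entropy(const std::vector<double>& counts) {
        double total = 0.0, h = 0.0;
        for (double c : counts) total += c;
        if (total <= 0.0) return 0.0;
        for (double c : counts)
            if (c > 0.0) { double p = c / total; h -= p * std::log2(p); }
        return h;
    }

    // E(Theta_1, ..., Theta_k) = sum_i |S_{c_i}| H(S_{c_i}): route every sample reaching a
    // parent node to one of its two children and accumulate per-child class histograms.
    double objective(const std::vector<ParentParams>& parents,
                     const std::vector<std::vector<Sample>>& parentSets,   // S_{p_i}
                     std::size_t numChildren, std::size_t numClasses) {
        std::vector<std::vector<double>> childHist(numChildren,
                                                   std::vector<double>(numClasses, 0.0));
        for (std::size_t i = 0; i < parents.size(); ++i) {
            const ParentParams& p = parents[i];
            for (const Sample& s : parentSets[i]) {
                int child = (s.x[p.feature] <= p.threshold) ? p.leftChild : p.rightChild;
                childHist[child][s.label] += 1.0;              // labels assumed 0-based
            }
        }
        double e = 0.0;
        for (const auto& hist : childHist) {
            double size = 0.0;
            for (double c : hist) size += c;
            e += size * entropy(hist);                         // |S_c| * H(S_c)
        }
        return e;
    }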

LSEARCH Optimization Algorithm

1: function LSEARCH(Θ_{p_1}, ..., Θ_{p_k})
2:   while something changes do
3:     for i = 1, ..., k do
4:       F ← random feature selection
5:       (d_{p_i}, θ_{p_i}) ← argmin_{d ∈ F, θ ∈ R} E(..., Θ_{p_{i-1}}, (d, θ, l_{p_i}, r_{p_i}), Θ_{p_{i+1}}, ...)
6:     end for
7:     for i = 1, ..., k do
8:       l_{p_i} ← argmin_{l ∈ {c_1,...,c_l}} E(..., Θ_{p_{i-1}}, (d_{p_i}, θ_{p_i}, l, r_{p_i}), Θ_{p_{i+1}}, ...)
9:       r_{p_i} ← argmin_{r ∈ {c_1,...,c_l}} E(..., Θ_{p_{i-1}}, (d_{p_i}, θ_{p_i}, l_{p_i}, r), Θ_{p_{i+1}}, ...)
10:     end for
11:   end while
12:   return Θ_{p_1}, ..., Θ_{p_k}
13: end function

Intermediate Discussion

This is where the technical section of the paper ends.

Open questions
- Does the algorithm converge to a local minimum?
- Does the algorithm terminate in a finite number of steps?
- How can the two minimization steps be implemented efficiently?

Main issue
- There is no code available

In the following, I present the findings of my research.

Termination Theorem

Theorem
The LSEARCH algorithm terminates.

Termination Theorem Proof I

Proof
- E takes on only a finite number of discrete values
- There are only finitely many combinations of d_v, l_v and r_v
- There are infinitely many choices for θ_v
- We can factor R by the following equivalence relation:

    x ∼ y :⇔ ∀ λ ∈ [0, 1] : E(d, x, l, r) = E(d, λx + (1 - λ)y, l, r)

- R/∼ is finite
- Hence the joint parameter space is (up to ∼) finite

Termination Theorem Proof II

Proof (continued)
- If the algorithm did not terminate, it would cycle through some configurations
- Let γ_1, ..., γ_r be those configurations of Θ_1, ..., Θ_k
- γ_{i+1} = LSEARCH(γ_i) and γ_1 = LSEARCH(γ_r)
- Observation: parameters only change when the objective function decreases
- Hence

    E(γ_1) > E(γ_2) > ... > E(γ_{r-1}) > E(γ_r)

Termination Theorem Proof III

Proof (continued)
- But because of the cycling, it must also hold that

    E(γ_r) > E(γ_1)

- Therefore

    E(γ_1) > E(γ_1),

  which is a contradiction. Hence there is no cycle and the algorithm terminates.

Optimality Theorem

From the termination proof, the following theorem follows immediately.

Theorem
The LSEARCH optimization algorithm converges to a local minimum of the objective function in a finite number of iterations.

Proof
Termination theorem + only parameters that decrease the objective function are accepted.

Implementation

- Efficiently implementing the algorithm is not trivial
- Evaluating the objective function is expensive:

    E(Θ_1, ..., Θ_k) = Σ_{i=1}^{l} |S_{c_i}| H(S_{c_i})

  - First the S_{c_i} have to be determined
  - Then the entropies have to be calculated
- Exploit the problem structure in order to find an efficient implementation

Threshold Optimization I

1: (d_{p_i}, θ_{p_i}) ← argmin_{d ∈ F, θ ∈ R} E(..., Θ_{p_{i-1}}, (d, θ, l_{p_i}, r_{p_i}), Θ_{p_{i+1}}, ...)

First we note that only S_{l_{p_i}} and S_{r_{p_i}} can change.

Corollary
It holds that

  argmin_{d ∈ F, θ ∈ R} E(..., Θ_{p_{i-1}}, (d, θ, l_{p_i}, r_{p_i}), Θ_{p_{i+1}}, ...)
  = argmin_{d ∈ F, θ ∈ R} Σ_{j=1}^{l} |S_{c_j}| H(S_{c_j})
  = argmin_{d ∈ F, θ ∈ R} |S_{l_{p_i}}| H(S_{l_{p_i}}) + |S_{r_{p_i}}| H(S_{r_{p_i}})

Threshold Optimization II

[Figure: the parent nodes p_1, ..., p_k with the edge from p_i to its right child r_{p_i} highlighted]

Observation
- There is only a constant contribution from the other parents
- Only the contribution from p_i varies

Idea
- Precompute the contribution from the other parent nodes in histograms

Threshold Optimization III

Testing multiple thresholds for a fixed feature dimension efficiently
- Sort the training set according to the feature dimension
- Subsequently test thresholds between neighboring points

[Figure: data points plotted along the axis x_d with a candidate threshold θ placed between two neighboring points]

- At each iteration, only a single data point moves from the right to the left child node
- This technique is due to Piotr Dollár [3]
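A sketch of this sweep in C++: the samples reaching p_i are sorted by the chosen feature, all of them start in the right child, and one sample at a time is moved to the left child while two class histograms are updated incrementally. The contributions of the other parents enter as precomputed base histograms, as suggested on the previous slide. Names are illustrative (not the LibJungle API), labels are assumed 0-based, and the left child is assumed to receive points with x_d ≤ θ.

    #include <algorithm>
    #include <cmath>
    #include <cstddef>
    #include <limits>
    #include <utility>
    #include <vector>

    double entropy(const std::vector<double>& counts) {
        double total = 0.0, h = 0.0;
        for (double c : counts) total += c;
        if (total <= 0.0) return 0.0;
        for (double c : counts)
            if (c > 0.0) { double p = c / total; h -= p * std::log2(p); }
        return h;
    }

    // |S| * H(S) for a class histogram.
    double weightedEntropy(const std::vector<double>& counts) {
        double total = 0.0;
        for (double c : counts) total += c;
        return total * entropy(counts);
    }

    // Sweep all thresholds between neighboring feature values for one parent p_i.
    // 'points' holds (feature value x_d, class label) for the samples reaching p_i.
    // 'leftBase' / 'rightBase' are the class histograms contributed to the two child
    // nodes by the other parents (precomputed once). Returns the best threshold and
    // the corresponding value of |S_l| H(S_l) + |S_r| H(S_r).
    std::pair<double, double> bestThreshold(std::vector<std::pair<double, int>> points,
                                            const std::vector<double>& leftBase,
                                            const std::vector<double>& rightBase) {
        std::sort(points.begin(), points.end());               // sort by feature value
        std::vector<double> left = leftBase, right = rightBase;
        for (const auto& p : points) right[p.second] += 1.0;    // start with everything on the right

        double bestTheta = -std::numeric_limits<double>::infinity();
        double bestScore = weightedEntropy(left) + weightedEntropy(right);
        for (std::size_t i = 0; i + 1 < points.size(); ++i) {
            // Move exactly one data point from the right to the left child.
            left[points[i].second]  += 1.0;
            right[points[i].second] -= 1.0;
            if (points[i].first == points[i + 1].first) continue;  // no threshold separates equal values
            double score = weightedEntropy(left) + weightedEntropy(right);
            if (score < bestScore) {
                bestScore = score;
                bestTheta = 0.5 * (points[i].first + points[i + 1].first);
            }
        }
        return {bestTheta, bestScore};
    }

With the running histograms, each candidate threshold costs only a histogram update plus an entropy evaluation over the C classes.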

Threshold Optimization: Discussion

In summary
- Precompute the contributions from the other parents
- Subsequently test different thresholds
- These steps allow us to evaluate the objective function at each iteration in constant time

Notes
- The steps are proven to be correct
- The derivations are rather technical
- See the seminar paper for formal details

Experiments from the Paper

Kinect Body Dataset [5]
- Estimate a human pose from a single depth image
- 31 classes

[Image by Shotton et al. [2]]

Results: Test accuracy

[Figure: test accuracy results from the paper; image by Shotton et al. [2]]

Results: Feature evaluations

[Figure: number of feature evaluations, results from the paper; image by Shotton et al. [2]]

Interpretation

Conclusions
- Decision DAGs trained using the LSEARCH algorithm...
  - consume less memory than binary decision trees
  - perform significantly better than trees of the same size (i.e. the same number of nodes)
- The proposed DAG structure works better than trees of fixed width
  - Fixed-width tree: at each level, only the M nodes with the highest entropy are split

Questions
- How do decision jungles perform compared to random forests (disregarding model size)?
  - Training time?
  - Absolute test accuracy?
  - Evaluation time?

My Experiments

- Decision jungle results are obtained using my LibJungle C++ library [4]
  - Efficient multi-threaded implementation of decision jungles
- Baseline results are obtained using Piotr Dollár's MATLAB Toolbox [3]
  - Very efficient and well-tested implementation of random forests
  - Fair comparison: the crucial parts are implemented in C

Evaluation Data: MNIST Data Set

- Handwritten digits 0-9 (10 classes)
- Grayscale images, 28×28 pixels
- 60,000 training images
- 10,000 test images
- Available at http://yann.lecun.com/exdb/mnist/

Experiment 1: Iteration Limit

Algorithm
1: function LSEARCH(Θ_{p_1}, ..., Θ_{p_k})
2:   while something changes do
3:     ...
4:   end while
5:   return Θ_{p_1}, ..., Θ_{p_k}
6: end function

Experiment
- We set an iteration limit on the outer while-loop of the LSEARCH optimization algorithm
- Evaluate the performance of a single DAG vs. a single tree

Results: Test Accuracy

[Figure: test accuracy (0.85 to 1) vs. maximum number of LSEARCH iterations (0 to 60) for a single DAG and a single tree]

Results: Depth

[Figure: model depth (20 to 60) vs. maximum number of LSEARCH iterations (0 to 60) for a single DAG and a single tree]

Results: Training Time

[Figure: training time in seconds (0 to 400) vs. maximum number of LSEARCH iterations (0 to 60) for a single DAG and a single tree]

Results: Convergence Speed

[Figure: training error (0 to 0.8) vs. number of levels trained (0 to 50) for DAGs with iteration limits I = 5, 15, 55 and for a single tree]

Experiment 1: Interpretation

Pros
- DAGs outperform trees by a large margin
- DAGs consume considerably less memory

Cons
- Evaluation time for DAGs is roughly twice that of trees
- Training DAGs takes significantly longer than training trees

Experiment 2: Ensembles

Question
- How do decision jungles perform compared to random forests?

Experiment
- Train up to 30 DAGs/trees
- Evaluate the performance of the ensemble each time after adding a DAG/tree
- Perform the experiment for different depth limits (10, 15, 45)
- Perform the experiment with and without bagging

Results: Without Bagging

[Figure: test accuracy (0.8 to 1) vs. ensemble size (5 to 30) for decision jungles with depth limits L = 10, 15, 45 and for a random forest]

Results: With Bagging

[Figure: test accuracy (0.8 to 1) vs. ensemble size (5 to 30) for decision jungles with depth limits L = 10, 15, 45 and for a random forest, trained with bagging]

Experiment 3

Algorithm
1: G ← ({root}, ∅)
2: for d = 1, 2, ... do
3:   Add s(d) new nodes to G
4:   Initialize the parameters of the former leaf nodes
5:   Optimize the parameters of the former leaf nodes
6: end for

Two possibilities
- Initialize the parameters randomly
- Initialize l_{p_i} and r_{p_i} such that parent nodes with high entropy do not share child nodes

Goal
- Speed up convergence

Results: Convergence Speed

[Figure: training error (0 to 0.3) vs. number of levels trained (10 to 50) for random and deterministic initialization]

Results: Test Accuracy

[Figure: test accuracy (0.7 to 1) vs. number of levels trained (10 to 30) for random and deterministic initialization]

Experiment 4: Various Data Sets

We compare the test accuracy of decision jungles and random forests.

Data set          Size (train/test)   Features   Attributes    #DAGs
MNIST             60,000 / 10,000     784        numerical     8 / 15
USPS              3,823 / 1,797       64         numerical     8 / 15
CONNECT 4         67,557 / -          42 (126)   categorical   8 / 15
LETTER RECOG.     20,000 / -          16         numerical     8 / 15
SHUTTLE           43,500 / 14,500     9          numerical     8 / 15

Data sets are from the UCI Machine Learning Repository [6].

Experiment 4: Results I

                       Decision jungles (8 DAGs)    Random forests (8 trees)
Data set               Mean      Stdev.             Mean      Stdev.
MNIST                  95.72%    0.13%              95.14%    0.20%
USPS                   94.65%    0.5%               94.44%    0.30%
CONNECT 4              81.17%    0.22%              80.99%    0.46%
LETTER RECOGNITION     94.73%    0.57%              94.29%    0.43%
SHUTTLE                99.98%    0.01%              99.99%    0.00%

The DAGs are trained without bagging.

Experiment 4: Results II

                       Decision jungles (15 DAGs)   Random forests (15 trees)
Data set               Mean      Stdev.             Mean      Stdev.
MNIST                  96.38%    0.09%              96.23%    0.16%
USPS                   95.95%    0.2%               95.93%    0.52%
CONNECT 4              81.98%    0.15%              81.47%    0.66%
LETTER RECOGNITION     95.73%    0.55%              95.58%    0.48%
SHUTTLE                99.99%    0.01%              99.99%    0.01%

The DAGs are trained without bagging.

LibJungle C++ Library

- C++ implementation of decision jungles
- Implements all speed-ups discussed in the seminar paper
- Can be used as a static library
- Open-source license (BSD)
- Available at https://bitbucket.org/geekStack/libjungle

Summary

- Goal: find a memory-efficient alternative to random forests
- Idea: use DAGs and limit their width
- Train ensembles of random decision DAGs (called decision jungles)
- Train a DAG level-wise by minimizing an objective function
- Efficiently implement the optimization using histograms
- Decision jungles perform as well as random forests
  - Sometimes even better
- Evaluation is twice as expensive
- Training takes significantly longer

Questions are welcome.

The seminar paper is available at geekstack.net/paper

Thanks for your attention!

Further Reading

[1] Leo Breiman. Random Forests. Machine Learning, 45, 2001.
[2] Jamie Shotton, Toby Sharp, Pushmeet Kohli, Sebastian Nowozin, John Winn, and Antonio Criminisi. Decision Jungles: Compact and Rich Models for Classification. Advances in Neural Information Processing Systems 26, 2013.
[3] Piotr Dollár. Piotr's Image and Video Matlab Toolbox (PMT). http://vision.ucsd.edu/~pdollar/toolbox/doc/index.html
[4] Tobias Pohlen. LibJungle - Decision Jungle Library. https://bitbucket.org/geekStack/libjungle
[5] Jamie Shotton, Ross Girshick, Andrew Fitzgibbon, Toby Sharp, Mat Cook, Mark Finocchio, Richard Moore, Pushmeet Kohli, Antonio Criminisi, Alex Kipman, and Andrew Blake. Efficient Human Pose Estimation from Single Depth Images. IEEE Trans. Pattern Anal. Mach. Intell., 35, pages 2821-2840, 2013.
[6] K. Bache and M. Lichman. UCI Machine Learning Repository. http://archive.ics.uci.edu/ml