learning(large)scale(( hierarchical(models( ·...

36
Learning LargeScale Hierarchical Models Russ Salakhutdinov Department of Computer Science and Statistics University of Toronto

Upload: others

Post on 04-Jul-2020

1 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Learning(Large)Scale(( Hierarchical(Models( · Learning(Large)Scale((Hierarchical(Models(Russ(Salakhutdinov(Department of Computer Science and Statistics ! University of Toronto

Learning  Large-­‐Scale    Hierarchical  Models  

Russ  Salakhutdinov  

Department of Computer Science and Statistics !University of Toronto  

Page 2: Learning(Large)Scale(( Hierarchical(Models( · Learning(Large)Scale((Hierarchical(Models(Russ(Salakhutdinov(Department of Computer Science and Statistics ! University of Toronto

Rela8onal  Data/    Social  Network  

Product    Recommenda8on  

Drama8c  increase  in  both  computa8onal  power  and  the  amount  of  data  available  from  web,  video  cameras,  laboratory  measurements.  

Discovering  Structure  

Images  &  Video   Text  &  Speech   Gene  Expression  Medical  Imaging  

Page 3: Learning(Large)Scale(( Hierarchical(Models( · Learning(Large)Scale((Hierarchical(Models(Russ(Salakhutdinov(Department of Computer Science and Statistics ! University of Toronto

Rela8onal  Data/    Social  Network  

Product    Recommenda8on  

Drama8c  increase  in  both  computa8onal  power  and  the  amount  of  data  available  from  web,  video  cameras,  laboratory  measurements.  

Discovering  Structure  

•   Discover  underlying  structure,  seman8c  rela8ons,  or  invariances  from  data.  •   Robust,  adap8ve  models  models  that  can  deal  with  missing  measurements,  nonsta8onary  distribu8ons,  mul8modal  data.      

sunset,  pacific  ocean,  beach,  seashore  

Mul8modal  Data  

Images  &  Video   Text  &  Speech   Gene  Expression  Medical  Imaging  

Page 4: Learning(Large)Scale(( Hierarchical(Models( · Learning(Large)Scale((Hierarchical(Models(Russ(Salakhutdinov(Department of Computer Science and Statistics ! University of Toronto

Rela8onal  Data/    Social  Network  

Product    Recommenda8on  

Drama8c  increase  in  both  computa8onal  power  and  the  amount  of  data  available  from  web,  video  cameras,  laboratory  measurements.  

Discovering  Structure  

•   Discover  underlying  structure,  seman8c  rela8ons,  or  invariances  from  data.  •   Robust,  adap8ve  models  models  that  can  deal  with  missing  measurements,  nonsta8onary  distribu8ons,  mul8modal  data.      

sunset,  pacific  ocean,  beach,  seashore  

Mul8modal  Data  

Images  &  Video   Text  &  Speech   Gene  Expression  Medical  Imaging  

Deep  Learning  

Page 5: Learning(Large)Scale(( Hierarchical(Models( · Learning(Large)Scale((Hierarchical(Models(Russ(Salakhutdinov(Department of Computer Science and Statistics ! University of Toronto

Research  Direc8ons  

•   Deep  Learning  

•   Learning  More  Structured  Models:  Transfer  and  One-­‐Shot  Learning  

•   Mul8modal  Learning  

Page 6: Learning(Large)Scale(( Hierarchical(Models( · Learning(Large)Scale((Hierarchical(Models(Russ(Salakhutdinov(Department of Computer Science and Statistics ! University of Toronto

Image  

Low-­‐level  features:  Edges  

Input:  Pixels  

Example:  Deep  Boltzmann  Machines  

(Salakhutdinov & Hinton, AIStats 2009, Neural Computation 2012)!

Page 7: Learning(Large)Scale(( Hierarchical(Models( · Learning(Large)Scale((Hierarchical(Models(Russ(Salakhutdinov(Department of Computer Science and Statistics ! University of Toronto

Image  

Higher-­‐level  features:  Combina8on  of  edges  

Low-­‐level  features:  Edges  

Input:  Pixels  

Example:  Deep  Boltzmann  Machines  •   Learn  hierarchies  of  nonlinear  features.  •   Unsupervised  feature  learning  –  no  need          to  rely  on  human-­‐craSed  input  features.  

(Salakhutdinov & Hinton, AIStats 2009, Neural Computation 2012)!

Page 8: Learning(Large)Scale(( Hierarchical(Models( · Learning(Large)Scale((Hierarchical(Models(Russ(Salakhutdinov(Department of Computer Science and Statistics ! University of Toronto

Image  

Higher-­‐level  features:  Combina8on  of  edges  

Input:  Pixels  

Example:  Deep  Boltzmann  Machines  •   Learn  hierarchies  of  nonlinear  features.  •   Unsupervised  feature  learning  –  no  need          to  rely  on  human-­‐craSed  input  features.  

(Salakhutdinov & Hinton, AIStats 2009, Neural Computation 2012)!

BoTom-­‐up  +  Top-­‐down  

Page 9: Learning(Large)Scale(( Hierarchical(Models( · Learning(Large)Scale((Hierarchical(Models(Russ(Salakhutdinov(Department of Computer Science and Statistics ! University of Toronto

Bag  of  words  

Reuters  dataset:  804,414    newswire  stories:  unsupervised  

Deep  Genera8ve  Model  

Page 10: Learning(Large)Scale(( Hierarchical(Models( · Learning(Large)Scale((Hierarchical(Models(Russ(Salakhutdinov(Department of Computer Science and Statistics ! University of Toronto

Legal/JudicialLeading Economic Indicators

European Community Monetary/Economic

Accounts/Earnings

Interbank Markets

Government Borrowings

Disasters and Accidents

Energy Markets

Bag  of  words  

Reuters  dataset:  804,414    newswire  stories:  unsupervised  

Deep  Genera8ve  Model  

Page 11: Learning(Large)Scale(( Hierarchical(Models( · Learning(Large)Scale((Hierarchical(Models(Russ(Salakhutdinov(Department of Computer Science and Statistics ! University of Toronto

Deep  Genera8ve  Model  

NeZlix  dataset:    480,189  users    17,770  movies    Over  100  million  ra8ngs  

(Salakhutdinov et. al. ICML 2007 )!

Page 12: Learning(Large)Scale(( Hierarchical(Models( · Learning(Large)Scale((Hierarchical(Models(Russ(Salakhutdinov(Department of Computer Science and Statistics ! University of Toronto

Deep  Genera8ve  Model  

NeZlix  dataset:    480,189  users    17,770  movies    Over  100  million  ra8ngs  

Learned  features:  ``genre’’  

Fahrenheit  9/11  Bowling  for  Columbine  The  People  vs.  Larry  Flynt  Canadian  Bacon  La  Dolce  Vita  

Independence  Day  The  Day  ASer  Tomorrow  Con  Air  Men  in  Black  II  Men  in  Black  

Friday  the  13th  The  Texas  Chainsaw  Massacre  Children  of  the  Corn  Child's  Play  The  Return  of  Michael  Myers  

Scary  Movie  Naked  Gun    Hot  Shots!  American  Pie    Police  Academy  

State-­‐of-­‐the-­‐art  performance    on  the  NeZlix  dataset.     (Salakhutdinov et. al. ICML 2007 )!

Page 13: Learning(Large)Scale(( Hierarchical(Models( · Learning(Large)Scale((Hierarchical(Models(Russ(Salakhutdinov(Department of Computer Science and Statistics ! University of Toronto

Deep  Genera8ve  Model  

25  ms  windowed  frames  

•  Speech  Recogni8on:  Spoken  Query  Detec8on:  For  each  keyword,  es8mate  uTerance’s  probability  of  containing  that  keyword.      

(Zhang, Salakhutdinov, Chang, Glass, ICASSP, 2012)!

Page 14: Learning(Large)Scale(( Hierarchical(Models( · Learning(Large)Scale((Hierarchical(Models(Russ(Salakhutdinov(Department of Computer Science and Statistics ! University of Toronto

Deep  Genera8ve  Model  

25  ms  windowed  frames  

Learning  Algorithm   AVG  EER  GMM  Unsupervised     16.4%  DBM  Unsupervised     14.7%  DBM  (1%  labels)   13.3%  DBM  (30%  labels)   10.5%  DBM  (100%  labels)   9.7%  

•  Speech  Recogni8on:  Spoken  Query  Detec8on:  For  each  keyword,  es8mate  uTerance’s  probability  of  containing  that  keyword.      

(Zhang, Salakhutdinov, Chang, Glass, ICASSP, 2012)!

Page 15: Learning(Large)Scale(( Hierarchical(Models( · Learning(Large)Scale((Hierarchical(Models(Russ(Salakhutdinov(Department of Computer Science and Statistics ! University of Toronto

Research  Direc8ons  

•   Deep  Learning  

•   Learning  More  Structured  Models:  Transfer  and  One-­‐Shot  Learning  

•   Mul8modal  Learning  

Page 16: Learning(Large)Scale(( Hierarchical(Models( · Learning(Large)Scale((Hierarchical(Models(Russ(Salakhutdinov(Department of Computer Science and Statistics ! University of Toronto

Face  Recogni8on  

Due  to  extreme  illumina8on  varia8ons,  deep  models  perform  quite  poorly  on  this  dataset.    

Yale  B  Extended  Face  Dataset  4  subsets  of  increasing  illumina8on  varia8ons  

Page 17: Learning(Large)Scale(( Hierarchical(Models( · Learning(Large)Scale((Hierarchical(Models(Russ(Salakhutdinov(Department of Computer Science and Statistics ! University of Toronto

Consider  More  Structured  Models:  undirected  +  directed  models.  

Deep  Lamber8an  Model  

Deep  Undirected  

Directed  

Combines  the  elegant  proper8es  of  the  Lamber8an  model  with  the  Gaussian  DBM  model.  

Observed  Image  

Inferred  

(Tang et. Al., ICML 2012, Tang et. al. CVPR 2012)!

Page 18: Learning(Large)Scale(( Hierarchical(Models( · Learning(Large)Scale((Hierarchical(Models(Russ(Salakhutdinov(Department of Computer Science and Statistics ! University of Toronto

Deep  Lamber8an  Model  

Observed  Image  

Image  albedo  

Surface    normals  

Light    source  

(Tang et. Al., ICML 2012, Tang et. al. CVPR 2012)!

Page 19: Learning(Large)Scale(( Hierarchical(Models( · Learning(Large)Scale((Hierarchical(Models(Russ(Salakhutdinov(Department of Computer Science and Statistics ! University of Toronto

Deep  Lamber8an  Model  

Observed  Image  

Image  albedo  

Surface    normals  

Light    source  

Gaussian  Deep    Boltzmann  Machine   Albedo  DBM:  

Pretrained  using  Toronto  Face  Database  

Transfer  Learning  

Inference:  Varia8onal  Inference.  Learning:  Stochas8c  Approxima8on  

Inferred  

(Tang et. Al., ICML 2012, Tang et. al. CVPR 2012)!

Page 20: Learning(Large)Scale(( Hierarchical(Models( · Learning(Large)Scale((Hierarchical(Models(Russ(Salakhutdinov(Department of Computer Science and Statistics ! University of Toronto

Face  Religh8ng  

One  Test  Image  

Face  Religh8ng  Observed  Inferred  albedo  

Page 21: Learning(Large)Scale(( Hierarchical(Models( · Learning(Large)Scale((Hierarchical(Models(Russ(Salakhutdinov(Department of Computer Science and Statistics ! University of Toronto

Face  Religh8ng  

One  Test  Image  

Face  Religh8ng  Observed  Inferred  albedo  

What  about  building  structured  models  for  transfer  learning?  

Page 22: Learning(Large)Scale(( Hierarchical(Models( · Learning(Large)Scale((Hierarchical(Models(Russ(Salakhutdinov(Department of Computer Science and Statistics ! University of Toronto

Transfer  Learning  

Millions  of  unlabeled  images    

Some  labeled  images  

Bicycle  

Elephant  

Dolphin  

Tractor  

Background  Knowledge  

Learn  novel  concept  from  one  example  

Learn  to  Transfer  Knowledge  

Page 23: Learning(Large)Scale(( Hierarchical(Models( · Learning(Large)Scale((Hierarchical(Models(Russ(Salakhutdinov(Department of Computer Science and Statistics ! University of Toronto

Transfer  Learning  

Test:    What  is  this?  

Millions  of  unlabeled  images    

Some  labeled  images  

Bicycle  

Elephant  

Dolphin  

Tractor  

Background  Knowledge  

Learn  novel  concept  from  one  example  

Learn  to  Transfer  Knowledge  

Page 24: Learning(Large)Scale(( Hierarchical(Models( · Learning(Large)Scale((Hierarchical(Models(Russ(Salakhutdinov(Department of Computer Science and Statistics ! University of Toronto

Learning  Category  Hierarchy  Learning  to  share  the  knowledge  across  many  visual  categories.  

Learned  low-­‐level  generic  features  

…  

…  Learned  higher-­‐level  features  

Deep  Boltzmann  Machine  using  4  million  images  

(Salakhutdinov et. al., PAMI 2012, Srivastava and Salakhutdinov, 2013)!

Page 25: Learning(Large)Scale(( Hierarchical(Models( · Learning(Large)Scale((Hierarchical(Models(Russ(Salakhutdinov(Department of Computer Science and Statistics ! University of Toronto

Learning  Category  Hierarchy  Learning  to  share  the  knowledge  across  many  visual  categories.  

Learned  low-­‐level  generic  features  

…  

…  Learned  higher-­‐level  features  

Deep  Boltzmann  Machine  using  4  million  images  

“global”  

woman  

“human”  “fruit”  “aqua8c  animal”  

Learned  super-­‐class  hierarchy  

shark   ray  turtle  dolphin   baby   man  girl  pear  orange  apple  

Hierarchical  Bayes  

(Salakhutdinov et. al., PAMI 2012, Srivastava and Salakhutdinov, 2013)!

Page 26: Learning(Large)Scale(( Hierarchical(Models( · Learning(Large)Scale((Hierarchical(Models(Russ(Salakhutdinov(Department of Computer Science and Statistics ! University of Toronto

Research  Direc8ons  

•   Deep  Learning  

•   Learning  More  Structured  Models:  Transfer  Learning  

•   Mul8modal  Learning  

Page 27: Learning(Large)Scale(( Hierarchical(Models( · Learning(Large)Scale((Hierarchical(Models(Russ(Salakhutdinov(Department of Computer Science and Statistics ! University of Toronto

Data  –  Collec8on  of  Modali8es  •   Mul8media  content  on  the  web  -­‐  image  +  text  +  audio.  

•   Biomedical  Imaging  

•   Robo8cs  applica8ons.  

Audio  Vision  

Touch  sensors  Motor  control  

sunset,  pacificocean,  bakerbeach,  seashore,  ocean  

car,  automobile  

fMRI   PET  Scan   EEG   X-­‐ray  

Page 28: Learning(Large)Scale(( Hierarchical(Models( · Learning(Large)Scale((Hierarchical(Models(Russ(Salakhutdinov(Department of Computer Science and Statistics ! University of Toronto

Shared  Concept  “Modality-­‐free”  representa8on    

“Modality-­‐full”  representa8on    

“Concept”  

sunset,  pacific  ocean,  baker  beach,  seashore,  

ocean  

Page 29: Learning(Large)Scale(( Hierarchical(Models( · Learning(Large)Scale((Hierarchical(Models(Russ(Salakhutdinov(Department of Computer Science and Statistics ! University of Toronto

0  0  1  0  

0  Dense,  real-­‐valued  image  features  

Gaussian  model   Undirected  Topic  Model  

Mul8modal  DBM  

Word  counts  

(Srivastava and Salakhutdinov, NIPS 2013)!

Page 30: Learning(Large)Scale(( Hierarchical(Models( · Learning(Large)Scale((Hierarchical(Models(Russ(Salakhutdinov(Department of Computer Science and Statistics ! University of Toronto

Mul8modal  DBM  

0  0  1  0  

0  Dense,  real-­‐valued  image  features  

Gaussian  model  

Word  counts  

Undirected  Topic  Model  

(Srivastava and Salakhutdinov, NIPS 2013)!

Page 31: Learning(Large)Scale(( Hierarchical(Models( · Learning(Large)Scale((Hierarchical(Models(Russ(Salakhutdinov(Department of Computer Science and Statistics ! University of Toronto

Gaussian  model  

0  0  1  0  

0  

Mul8modal  DBM  

Word  counts  

Dense,  real-­‐valued  image  features  

Undirected  Topic  Model  

(Srivastava and Salakhutdinov, NIPS 2013)!

BoTom-­‐up  +  

Top-­‐down  

Page 32: Learning(Large)Scale(( Hierarchical(Models( · Learning(Large)Scale((Hierarchical(Models(Russ(Salakhutdinov(Department of Computer Science and Statistics ! University of Toronto

Text  Generated  from  Images  Image   Given  Text   Text  generated  by  the  model  

beach,  sea,  surf,  strand,  shore,  wave,  seascape,  sand,  ocean,  waves  

portrait,  girl,  woman,  lady,  blonde,  preTy,  gorgeous,  expression,  model  

night,  noTe,  traffic,  light,  lights,  parking,  darkness,  lowlight,  nacht,  glow  

fall,  autumn,  trees,  leaves,  foliage,  forest,  woods,  branches,  path  

pentax,  k10d,  pentaxda50200,  kangarooisland,  sa,  australiansealion  

mickikrimmel,  mickipedia,  headshot  

unseulpixel,  naturey,  crap  

<  no  text>  

Page 33: Learning(Large)Scale(( Hierarchical(Models( · Learning(Large)Scale((Hierarchical(Models(Russ(Salakhutdinov(Department of Computer Science and Statistics ! University of Toronto

Mul8modal  DBMs  

Samples  drawn  aSer  every  50  steps  of  Gibbs  updates  

Page 34: Learning(Large)Scale(( Hierarchical(Models( · Learning(Large)Scale((Hierarchical(Models(Russ(Salakhutdinov(Department of Computer Science and Statistics ! University of Toronto

Impact  of  Deep  Learning  

•   Speech  Recogni8on  

•   Computer  Vision  

•   Neuroscience  

•   Recommender  Systems    

•   Beginning:  Drug  Discovery  and  Medical  Image  Analysis    

Page 35: Learning(Large)Scale(( Hierarchical(Models( · Learning(Large)Scale((Hierarchical(Models(Russ(Salakhutdinov(Department of Computer Science and Statistics ! University of Toronto

Thank  you  

Page 36: Learning(Large)Scale(( Hierarchical(Models( · Learning(Large)Scale((Hierarchical(Models(Russ(Salakhutdinov(Department of Computer Science and Statistics ! University of Toronto