Complex Models for Big Data

Max Welling UvA Complex Models for Big Data

Upload: data-science-research-center

Post on 26-May-2015


Category:

Technology



DESCRIPTION

Max Welling (http://www.ics.uci.edu/~welling/) describes how big data, massive simulation, and advanced models go together to help us start solving challenging problems. He also describes his links to other computer science disciplines within the DSRC.

TRANSCRIPT

  • 1. Data Science Research Center (DSRC). Complex Models for Big Data. Max Welling, UvA.
  • 2. The Four Paradigms. We have added big data to computer simulation, experiment and theory. Not replaced it.
  • 3. Big Simulation. Computer simulations have become increasingly complex (e.g. weather, earthquake models). The Computational Wall: if a model has hundreds of parameters, how can we: 1) Find the parameter values that match the observations best? 2) Determine if we underfit (model too simple) or overfit (model too complex)? 3) Compare two models?
  • 4. Parameter Inference. Diagram labels: Parameters; Simulation; Observations; Parameter Update.
  • 5. Challenge I. The posterior probability cannot be computed in closed form. Solution: Markov Chain Monte Carlo Sampling (MCMC).
  • 6. Challenge II. We cannot run MCMC because the likelihood is not given in closed form (but rather as a simulation). Solution: Likelihood Free MCMC (or Approximate Bayesian Computation). Run many simulations and compare samples with observations. Source: Csillery, Katalin, et al. "Approximate Bayesian computation (ABC) in practice." Trends in Ecology & Evolution 25.7 (2010): 410-418.
  • 7. Challenge III. We need thousands of simulations to infer the posterior (infeasible if every simulation takes a day or so). Solution: learn log(P) using Gaussian Process Surrogate functions (GPS). If surrogate ~ log(P) with high confidence, then use the surrogate to draw a sample. If not: simulate until enough confidence. Figure label: Surrogate of log(P). Ted Meeds.
  • 8. Two Kinds of Complex Model. Diagram labels: Machine Learning; Computational Science; Model Capacity; Let the model speak; Let the data speak.
  • 9. 3x Exponential Growth in Machine Learning. Computer Power; Data Volume; Model Capacity.
  • 10. Growth in Model Capacity. Model Capacity over Time: 1943: First NN (N = +/- 10); 1988: NetTalk (N = +/- 20K); 2009: Hinton's Deep Belief Net (N = +/- 10M); 2013: Google/Y! (N = +/- 10B); 2020-2050: Human Brain (N = +/- 100T)?
  • 11. Deep Learning: Neural Nets Strike Back (again). 1943: NN invented (McCulloch & Pitts); 1970: NN discredited (Minsky & Papert); 1986: Backpropagation (Rumelhart, Hinton & Williams); 1995: SVM (Vapnik); 2009: Deep Learning (Hinton). Depth labels: 2 layers; 3 layers; many layers. Model size: 10B parameters. Used by: Yahoo!, Google, Microsoft, Baidu, IBM, Scyfer.
  • 12. Paradox. Why does model capacity grow exponentially? Raw Information: O(N). Predictive Information: log(N). Noise?
  • 13. Big Challenges from Industry. Scyfer connects industry to academia: inspire academia w/ relevant problems; deliver ML products to industry; host student projects; provide employment for our students = VALORISATION. What industry needs. What academics are interested in.
  • 14. Intelligent Autonomous Systems Lab - UvA. People: Max Welling (Machine Learning); Joris Mooij (Causality); Shimon Whiteson (Reinforcement Learning & Planning); Leo Dorst (Geometric Algebra); Ben Kröse (Ambient Robotics); Dariu Gavrila (Human-aware Intelligent Systems). Diagram labels: Visual Analytics; Business Analytics; Decision Theory; Understand and decide; Distributed Processing; Data; Reasoning; Knowledge representation; Large Scale Databases; Store and process; Software Eng.; System / Network Eng.; Analyze and model; Multimedia Retrieval; Modeling and simulation; Information Retrieval; Machine Learning.
  • 15. Our Future Need. Same diagram and names as slide 14.
  • 16. Questions?
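
Slide 6 describes Approximate Bayesian Computation as running many simulations and comparing the simulated samples with the observations. A minimal sketch of that idea follows, using plain rejection ABC (the simplest variant, not the likelihood-free MCMC named on the slide). Everything concrete here is an assumption made for illustration and does not come from the talk: the toy Gaussian `simulate` function, the uniform prior, the sample-mean summary statistic, and the tolerance `epsilon`.

```python
import random
import statistics


def simulate(theta, n, rng):
    """Hypothetical simulator: n draws from a Gaussian with mean theta, unit variance."""
    return [rng.gauss(theta, 1.0) for _ in range(n)]


def abc_rejection(observed, prior_sampler, n_draws, epsilon, rng):
    """Keep prior draws whose simulated data resemble the observations.

    The likelihood is never evaluated; only the simulator is called.
    The summary statistic (sample mean) is an assumption of this sketch.
    """
    obs_summary = statistics.fmean(observed)
    accepted = []
    for _ in range(n_draws):
        theta = prior_sampler(rng)                  # propose a parameter from the prior
        sim = simulate(theta, len(observed), rng)   # run the simulation
        if abs(statistics.fmean(sim) - obs_summary) < epsilon:
            accepted.append(theta)                  # simulated data match: keep theta
    return accepted


rng = random.Random(0)
observed = simulate(2.0, 50, rng)  # stand-in for real observations (true mean 2.0)
posterior_samples = abc_rejection(
    observed,
    prior_sampler=lambda r: r.uniform(-5.0, 5.0),  # assumed flat prior
    n_draws=5000,
    epsilon=0.2,
    rng=rng,
)
posterior_mean = statistics.fmean(posterior_samples) if posterior_samples else float("nan")
print(len(posterior_samples), posterior_mean)
```

The accepted values approximate posterior samples of the parameter. Most proposals are rejected, which illustrates why slide 7 calls the approach infeasible when a single simulation takes a day and motivates replacing some simulator calls with a learned surrogate.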