Converging disciplines

PROFESSORS RAPHAEL HAUSER & MASSIMILIANO PONTIL

Professors Raphael Hauser and Massimiliano Pontil discuss their ongoing research that is uniting the fields of convex optimisation and machine learning to derive algorithmic solutions to a host of practical problems

Could you offer an insight into your backgrounds? What sparked your respective interests in convex optimisation and machine learning?

RH: I first started working on convex optimisation during my doctoral studies in the late 1990s while at Cornell University. The research area was expanding at the time due to important algorithmic breakthroughs that made it possible to solve problems with thousands of decision variables. It was also discovered that many optimisation problems that are non-convex can be convexified by introducing extra variables. The corresponding higher-dimensional convex problems were easier and more robust to solve than the lower-dimensional non-convex problems. The field has evolved since then, and it is now possible to solve problem instances with hundreds of millions of variables.
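A classical textbook illustration of this lifting idea (not a specific problem from Hauser's own work) is the semidefinite relaxation of a binary quadratic programme:

$$
\min_{x \in \{-1,1\}^n} x^\top Q x
\qquad\longrightarrow\qquad
\min_{X \succeq 0,\ \operatorname{diag}(X) = \mathbf{1}} \langle Q, X \rangle ,
$$

obtained by introducing the matrix variable $X = x x^\top$ and dropping the non-convex constraint $\operatorname{rank}(X) = 1$. The lifted problem is a convex semidefinite programme in roughly $n^2$ variables whose optimal value bounds, and in many cases closely matches, that of the original non-convex problem.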

MP: I began working on machine learning during my PhD studies at the University of Genoa in Italy. My interest in the subject started when I read a paper entitled ‘Support vector networks’ (later called ‘Support vector machines’) by Corinna Cortes and Vladimir Vapnik. The paper describes an interesting method to find a decision function that optimally separates a set of data points belonging to two different classes. In a nutshell, a support vector machine involves solving a convex optimisation problem whose solution corresponds to the optimal decision function associated with a set of binary-labelled data points.
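For reference, the soft-margin support vector machine described in that paper can be written as the following convex programme (a textbook formulation, given here for orientation rather than quoted from the interview):

$$
\min_{w,\,b,\,\xi}\ \ \tfrac{1}{2}\|w\|^2 + C\sum_{i=1}^{n}\xi_i
\qquad\text{subject to}\qquad
y_i\,(w^\top x_i + b) \ge 1 - \xi_i,\quad \xi_i \ge 0,
$$

where the $(x_i, y_i)$ are the binary-labelled training points with $y_i \in \{-1,+1\}$ and $C > 0$ trades margin width against classification errors; the learned decision function is $f(x) = \operatorname{sign}(w^\top x + b)$.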

How did your collaboration begin?

RH: We met when we were visiting Professors Felipe Cucker and Steve Smale from the City University of Hong Kong, who were both interested in condition numbers for convex optimisation problems, an area in which I was working, and in machine learning theory, which is Pontil’s area of expertise.

Can you discuss machine learning and your early projects working together in the research field?

RH&MP: Machine learning is an area in which large-scale convex optimisation models appear naturally. This made it very exciting to start working together. Our first joint project was focused on an algorithm that searches for an optimal kernel among a convex set of kernels to learn a particular task. This project led us to discuss structured sparsity in machine learning. This refers to optimisation problems whose decision variables can be expressed as a parametric family with a particular structure that is naturally associated with the underlying application, and in which solutions are preferred such that very few of the parameters are non-zero.
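In the simplest standard instance of this setting (given for illustration; the interview does not specify the exact parameterisation used), the kernel is sought as a convex combination of fixed base kernels,

$$
K(\lambda) = \sum_{j=1}^{m} \lambda_j K_j, \qquad \lambda_j \ge 0, \qquad \sum_{j=1}^{m} \lambda_j = 1,
$$

and the weights $\lambda$ are optimised jointly with the decision function over this convex set. Solutions in which only a few $\lambda_j$ are non-zero select a small subset of the base kernels, which is one route into the structured sparsity questions discussed next.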

Optimisation models with structured sparsity constraints are generally intractable, but breakthroughs in compressed sensing have led to a huge and important body of work on sparsity-inducing regularisation terms in optimisation problems. Many of these models enable one to replace an intractable problem with a sparsity constraint by a tractable convex problem whose solution typically also solves the intractable one. Such models occur naturally in the context of machine learning.
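The best-known example of such a replacement (standard in the compressed sensing literature rather than specific to the authors' work) trades a combinatorial cardinality constraint for an $\ell_1$ regularisation term:

$$
\min_{w}\ \|Aw - b\|_2^2 \quad \text{s.t.}\quad \|w\|_0 \le k
\qquad\longrightarrow\qquad
\min_{w}\ \|Aw - b\|_2^2 + \lambda \|w\|_1 ,
$$

where $\|w\|_0$ counts the non-zero entries of $w$. The right-hand problem is convex, and under suitable conditions on $A$ its solution coincides with the sparse solution of the intractable left-hand problem.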

Can you describe your more recent collaborative work with singular value decomposition (SVD)?

RH: In work with Professor Charles Micchelli, Pontil discovered structured sparsity models that fit several existing and new problems in machine learning. Pontil and Micchelli also identified an algorithm for solving such problems using a type of block coordinate descent that required an SVD in each inner iteration. Since SVDs occur at each iteration, they form a bottleneck calculation that limits the size of the problem that the algorithm can handle. At Oxford, we concentrated on developing a novel parallel algorithm for the computation of leading-part SVDs of large-scale dense matrices, designed to reduce communication and node synchronicity requirements as much as possible. Once the algorithm was complete, Dr Daniel Goodman at the University of Manchester, UK, ran a set of experiments on a high-performance computer which showed that our new method leads to significant speed-ups, not only on a parallel machine, but also when the individual local calculations are run sequentially.
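As a point of reference for what a leading-part SVD computes, the sketch below approximates the top-k singular triplets of a dense matrix with a plain randomised subspace iteration in NumPy. It is a generic single-node illustration only, not the parallel, communication-minimising algorithm developed at Oxford; the function name and parameter choices are ours.

```python
import numpy as np

def leading_svd(A, k, n_iter=4, oversample=5, seed=0):
    """Approximate the top-k singular triplets of a dense matrix A.

    Generic randomised subspace iteration, shown only to illustrate what a
    'leading-part SVD' computes; not the parallel algorithm discussed here.
    """
    rng = np.random.default_rng(seed)
    # Sketch the column space of A with a random test matrix.
    Q, _ = np.linalg.qr(A @ rng.standard_normal((A.shape[1], k + oversample)))
    # A few power iterations sharpen the captured leading subspace.
    for _ in range(n_iter):
        Q, _ = np.linalg.qr(A.T @ Q)
        Q, _ = np.linalg.qr(A @ Q)
    # Project A onto the subspace and take a small dense SVD there.
    Ub, s, Vt = np.linalg.svd(Q.T @ A, full_matrices=False)
    return (Q @ Ub)[:, :k], s[:k], Vt[:k, :]

# Example: leading 10 singular values of a random 2000 x 500 matrix.
A = np.random.default_rng(1).standard_normal((2000, 500))
U, s, Vt = leading_svd(A, k=10)
print(s)
```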

What have been the most challenging aspects of your research? Have you been able to overcome these challenges?

MP: A fundamental difficulty in machine learning is identifying assumptions about the underlying data that capture important properties and lead to efficient computational algorithms. Often these assumptions have a simple and intuitive interpretation, but implementing them is non-trivial. We have been able to find new convex relaxations for matrix completion problems that prove useful in applications. At the same time, we were able to derive guarantees on the generalisation error of the learning algorithms under study. A small generalisation error means that the solution found by the algorithm will fit the training data well and will also perform well on new points drawn from the same distribution as the training data.
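To make the term precise (a standard definition rather than a quotation from the interview): if the training points are drawn independently from a distribution $P$ and $\ell$ is a loss function, generalisation is measured by comparing the expected risk of the learned predictor $f$ with its empirical risk,

$$
R(f) = \mathbb{E}_{(x,y)\sim P}\,\ell\bigl(f(x), y\bigr)
\qquad\text{versus}\qquad
\hat{R}(f) = \frac{1}{n}\sum_{i=1}^{n} \ell\bigl(f(x_i), y_i\bigr),
$$

and the guarantees mentioned above control the gap between these two quantities.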

Learning from example

A collaborative research project between the University of Oxford and University College London in the UK is making great strides in machine learning, a discipline with increasingly important roles in numerous STEM subjects

MACHINE LEARNING IS the science of getting a computer to learn without explicitly programming it to do so. It is a pervasive discipline, powering much of what we do every day. It underlies the ranking of the results returned during a Google search and filters junk mail into a spam folder on email systems. It forms the basis of artificial intelligence, and in the future, neural network algorithms that mimic how the human brain works could bring about truly intelligent machines.

Recent advances in machine learning theory have had significant impact on numerical optimisation, and scientists are forming strong connections across fields, paving the way for theoretical and algorithmic breakthroughs, and creating the potential to make machine learning methodologies applicable to many practical problems. A joint project between Professor Raphael Hauser of the University of Oxford and Professor Massimiliano Pontil of University College London (UCL) aims to develop structured sparsity methods for machine learning and convex optimisation. The project will facilitate and strengthen connections between the fields, transforming both mathematics and computer science.

ALGORITHM DEVELOPMENT

Many optimisation algorithms used in machine learning contain approximate singular value decompositions (SVDs), a linear algebra technique for factorising a matrix, as a bottleneck computation: the part of the program that determines its overall speed. In his part of the project, Hauser is working to develop highly scalable, loosely coupled, communication-minimising parallel algorithms to compute a leading-part SVD. His team has already identified a class of these algorithms and is currently completing their convergence analysis.

Hauser is also working to apply machine learning techniques to design novel statistical tests for sequence alignment. Determining the similarity of sequences is hugely important, particularly in bioinformatics. Arranging the sequences of DNA, RNA or proteins to identify similar regions can help scientists understand their functional, structural and evolutionary relationships. Beyond biology, sequence alignment techniques are also useful for natural language processing and analysing financial data. By exploiting the microstructure of optimal sequence alignments, Hauser and his collaborators Professor Heinrich Matzinger and Dr Saba Amsalu from the Georgia Institute of Technology, USA, aim to improve statistical tests for sequence homology. He has already made significant progress in this regard, proving that the asymptotic empirical distribution is unique for virtually all scoring functions. He has also developed an iterative sampling algorithm which effectively determines the order of fluctuation.
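The conventional one-dimensional score referred to here is the optimal-alignment score produced by dynamic programming. The sketch below computes a plain Needleman-Wunsch global alignment score for two sequences; it illustrates only the standard score whose statistics are being studied, not the microstructure-based tests developed by Hauser and his collaborators, and the scoring parameters are arbitrary illustrative choices.

```python
def alignment_score(a, b, match=2, mismatch=-1, gap=-2):
    """Optimal global alignment score (Needleman-Wunsch) of sequences a and b.

    Standard dynamic programme, shown only to illustrate the conventional
    one-dimensional alignment score discussed in the text.
    """
    m, n = len(a), len(b)
    # score[i][j] = best score for aligning the prefixes a[:i] and b[:j].
    score = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        score[i][0] = i * gap
    for j in range(1, n + 1):
        score[0][j] = j * gap
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            sub = score[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            score[i][j] = max(sub, score[i - 1][j] + gap, score[i][j - 1] + gap)
    return score[m][n]

print(alignment_score("GATTACA", "GCATGCU"))  # score of an illustrative pair
```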

ENCOURAGING EXCHANGE

As Professor of Computer Science at UCL, Pontil researches machine learning, with a focus on regularisation methods, convex optimisation and statistical estimation. He is aiming to make significant UK contributions to these interdisciplinary fields.

Many machine learning techniques involve problems of minimising an objective function over a large set of parameters. The objective function is often convex, and as a result, ideas from the field of convex optimisation are becoming increasingly important for the development of learning algorithms. In the past, machine learning has used ‘off-the-shelf’ methods, not fully exploiting the theory underlying the field. There is a clear need for more communication between machine learning and numerical optimisation. Linking these synergistic communities will have many benefits, enabling core optimisation to be applied more easily in machine learning while opening new doors in optimisation.
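Abstractly (a standard template rather than any particular method from the project), most of the learning problems in question take the regularised empirical risk minimisation form

$$
\min_{w \in \mathbb{R}^d}\ \frac{1}{n}\sum_{i=1}^{n} \ell\bigl(w;\, x_i, y_i\bigr) \;+\; \lambda\, \Omega(w),
$$

where the loss $\ell$ measures the fit to each observation and the regulariser $\Omega$ encodes prior structure such as sparsity. When both $\ell$ and $\Omega$ are convex, the full machinery of convex optimisation applies.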

STRUCTURED SPARSITY

The interplay between these two fields is particularly important in the use of sparsity-inducing optimisation problems. When the number of model parameters is greater than the number of observations, having a sparse choice of parameters is important for fast and accurate learning – a principle driving the use of sparsity-inducing models. Pontil believes there is therefore an opportunity to develop novel algorithms for matrix learning problems under structured sparsity constraints.

Structured sparsity has proven useful in many contexts, including collaborative filtering and multitask learning. These kinds of optimisation problems, however, have only recently been addressed in machine learning, and several fundamental issues remain unsolved. The most pressing of these concern algorithms that exploit the underlying sparsity assumptions and the statistical analysis of the learning methods; Pontil plans to tackle both.
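Two standard examples of such penalties on a parameter matrix $W$ (given for orientation; the article does not specify which regularisers the project uses) are the group penalty and the trace norm,

$$
\Omega_{\text{group}}(W) = \sum_{j} \|W_{\cdot j}\|_2
\qquad\text{and}\qquad
\Omega_{\text{trace}}(W) = \sum_{r} \sigma_r(W),
$$

where $W_{\cdot j}$ is the $j$-th column of $W$ and $\sigma_r(W)$ are its singular values. The first encourages entire columns, for instance features shared across tasks, to vanish; the second encourages low rank, the structure exploited in collaborative filtering, and its proximal step is exactly the SVD computation discussed earlier.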

CLOSING THE GAPS

The project is expected to proceed until August 2014, but the pair have plans that extend far beyond that. “There are many loose ends in the theoretical analysis of our SVD algorithm. We are also working on an implementation that will allow us to increase the size of problem we can handle,” Hauser explains. In the long term, this work might prove useful for data mining approaches: the analysis of large amounts of data to extract hitherto unknown patterns.

The interaction between machine learning and optimisation is just beginning, and is garnering interest from many governments, businesses and organisations. Simultaneously, it is becoming clear that convexity may require assumptions that are not suitable for many practical uses. As a result, there is growing interest in wider classes of non-convex learning models. Pontil’s work on learning low-rank tensors, which combines non-convex models with tensor representations, is a prime example. It has important applications in many areas, including user modelling, natural language processing and computer vision. This is not the only route Pontil hopes to explore: “Deep learning, which is closely related to older multilayer neural networks, provides another example of non-convex learning algorithms,” he explains. “This is exciting for companies involved in data science such as Google and Facebook. Furthermore, distributed and stochastic optimisation schemes for machine learning could have huge impact in scaling up learning algorithms to big data.”
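For orientation (a standard definition rather than a statement of Pontil's specific model), a third-order tensor $\mathcal{T}$ has rank at most $R$ if it can be written as

$$
\mathcal{T} = \sum_{r=1}^{R} a_r \otimes b_r \otimes c_r ,
$$

and, unlike in the matrix case, the set of low-rank tensors admits no simple convex description, which is why learning with low-rank tensor constraints leads naturally to non-convex optimisation problems.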


ASSORTED APPLICATIONS

The methods Hauser and Pontil are investigating, and in some cases developing, could be used to solve many problems of practical importance:

MARKETING SCIENCE – machine learning can help tackle a common problem in marketing science: predicting how likely it is that a customer will buy a particular product, based on their buying history. A product can be described by a series of variables, and a function models the relationship between a product and the customer’s desire to buy that product, represented by an integer score. It can often be assumed that the function depends linearly on the variables, and that only a few of those variables are relevant to the score. This leads to sparsity constraints on the parameters associated with the function. Data may be available for several customers and can be used to give recommendations. This has been studied intensively under the name of ‘collaborative filtering’ and has important implications for the retail industry, underlying the recommendations of familiar companies including Amazon and Netflix.
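A minimal sketch of the single-customer version of this model, assuming a linear score and an l1 penalty to force most product variables to be irrelevant (the data, variable names and penalty value are purely illustrative; the multi-customer, collaborative-filtering case pools such models across customers):

```python
import numpy as np
from sklearn.linear_model import Lasso

# Each row describes one product by a handful of variables (price, category
# flags, ratings, ...); y holds the customer's integer desirability scores.
rng = np.random.default_rng(0)
X = rng.standard_normal((200, 30))           # 200 products, 30 variables
true_w = np.zeros(30)
true_w[[2, 7, 11]] = [1.5, -2.0, 0.8]        # only three variables matter
y = np.round(X @ true_w + 0.1 * rng.standard_normal(200))

# The l1 penalty drives most coefficients to exactly zero, recovering a
# sparse set of relevant product variables from the purchase history.
model = Lasso(alpha=0.1).fit(X, y)
print("relevant variables:", np.flatnonzero(model.coef_))

# Predicted desirability score for a new (here: the first) product.
print(model.predict(X[:1]))
```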

MULTITASK LEARNING – collaborative filtering is an example of multitask learning. It is a machine learning framework that solves multiple related tasks simultaneously by exploiting the similarities between them. Outside of collaborative filtering, multitask learning also has applications in bioinformatics, computer vision, and natural language processing, among others.

SEQUENCE ALIGNMENT – finding pairs of homologous sequences is important in both genetics and natural language processing. However, the method conventionally used to solve it can lead to false positives. Hauser and his collaborators realised that the microstructure of optimal sequence alignments can be used to replace the standard one-dimensional score with a score containing more detailed information. This allows better separation of homologous and non-homologous pairs, resulting in fewer false results, and has been analysed both theoretically and practically.

RISK MANAGEMENT – Hauser has developed risk management systems that can be used in situations where data is available that describes individual risks (such as different business lines in a company), but whose interdependence is poorly understood. Basing capital charge (reserve capital to absorb losses and remain solvent) on a ‘pessimal’ overall risk exposure – in other words, the worst possible value a risk measure can take – is often judged to be unrealistic by practitioners and rejected. Applying a tensor regularisation term investigated by Pontil and collaborators, Hauser and his team developed a risk-management approach that is able to pessimise the overall risk exposure over a set of probability distributions that look much more realistic, rendering the approach more acceptable to practitioners.
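In generic terms (our notation, not taken from the article), such an approach computes the capital charge as a worst case over a constrained family of joint distributions rather than over all of them:

$$
\text{capital charge} \;=\; \sup_{P \in \mathcal{P}} \rho_P(L),
$$

where $L$ is the aggregate loss, $\rho_P$ is the chosen risk measure evaluated under the joint distribution $P$, and $\mathcal{P}$ collects the distributions consistent with the data on the individual risks and with whatever dependence information is judged realistic. Shrinking $\mathcal{P}$ from all conceivable dependence structures to this more plausible set is what makes the resulting charge acceptable to practitioners.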


STRUCTURED SPARSITY METHODS IN MACHINE LEARNING AND CONVEX OPTIMISATION

OBJECTIVES

• To develop improved parallel algorithms for approximating the leading-part singular value decomposition of large-scale matrices

• To design statistical significance tests for optimal sequence alignments, utilising machine learning techniques

• To develop a framework for learning matrices that exhibit structured sparsity constraints

KEY COLLABORATORS

Professor Heinrich Matzinger; Dr Saba Amsalu, Georgia Institute of Technology, USA • Dr Daniel Goodman, University of Manchester, UK • Sheng Fang; Sergey Shahverdyan, University of Oxford, UK • Professor Charles Micchelli, SUNY Albany, USA • Professor Alexandre Tsybakov, Ecole Polytechnique, France • Andreas Maurer, Munich, Germany

FUNDING

Engineering and Physical Sciences Research Council (EPSRC)

CONTACT

Raphael Hauser, Associate Professor in Numerical Mathematics

T +44 01865 615 308 E [email protected]

Massimiliano Pontil, Professor of Computer Science

T +44 02076 790 129 E [email protected]

RAPHAEL HAUSER received his PhD in Operations Research from Cornell University, USA. He is currently Associate Professor in Numerical Mathematics in the Numerical Analysis and Mathematical Finance Groups at the Oxford Mathematical Institute, and the Tanaka Fellow in Applied Mathematics at Pembroke College, Oxford.

MASSIMILIANO PONTIL received his PhD from the University of Genoa, Italy. He is Professor of Computer Science at University College London. His research interests are in the area of machine learning with a focus on regularisation methods, convex optimisation and statistical estimation.
