Information-Theoretic Methods in Data Science



Information-Theoretic Methods in Data Science

Learn about the state-of-the-art at the interface between information theory and data science with this first unified treatment of the subject. Written by leading experts in a clear, tutorial style, and using consistent notation and definitions throughout, it shows how information-theoretic methods are being used in data acquisition, data representation, data analysis, and statistics and machine learning.

Coverage is broad, with chapters on signal data acquisition, data compression, compressive sensing, data communication, representation learning, emerging topics in statistics, and much more. Each chapter includes a topic overview, definition of the key problems, emerging and open problems, and an extensive reference list, allowing readers to develop in-depth knowledge and understanding.

Providing a thorough survey of the current research area and cutting-edge trends, this is essential reading for graduate students and researchers working in information theory, signal processing, machine learning, and statistics.

Miguel R. D. Rodrigues is Professor of Information Theory and Processing in the Department of Electronic and Electrical Engineering, University College London, and a Turing Fellow at the Alan Turing Institute, London.

Yonina C. Eldar is a Professor in the Faculty of Mathematics and Computer Science at the Weizmann Institute of Science, a Fellow of the IEEE and EURASIP, and a member of the Israel Academy of Sciences and Humanities. She is the author of Sampling Theory (Cambridge, 2015), and co-editor of Compressed Sensing in Radar Signal Processing (Cambridge, 2019), Compressed Sensing (Cambridge, 2012), and Convex Optimization in Signal Processing and Communications (Cambridge, 2009).


“How should we model the opportunities created by access to data and to computing, and how robust and resilient are these models? What information is needed for analysis, how is it obtained and applied? If you are looking for answers to these questions, then there is no better place to start than this book.”
Robert Calderbank, Duke University

“This pioneering book elucidates the emerging cross-disciplinary area of information-theoretic data science. The book’s contributors are leading information theorists who compellingly explain the close connections between the fields of information theory and data science. Readers will learn about theoretical foundations for designing data collection and analysis systems, from data acquisition to data compression, computation, and learning.”
Al Hero, University of Michigan


Information-Theoretic Methods in Data Science

Edited by

MIGUEL R. D. RODRIGUES
University College London

YONINA C. ELDAR
Weizmann Institute of Science


University Printing House, Cambridge CB2 8BS, United Kingdom
One Liberty Plaza, 20th Floor, New York, NY 10006, USA
477 Williamstown Road, Port Melbourne, VIC 3207, Australia
314–321, 3rd Floor, Plot 3, Splendor Forum, Jasola District Centre, New Delhi – 110025, India
79 Anson Road, #06-04/06, Singapore 079906

Cambridge University Press is part of the University of Cambridge. It furthers the University’s mission by disseminating knowledge in the pursuit of education, learning, and research at the highest international levels of excellence.

www.cambridge.org
Information on this title: www.cambridge.org/9781108427135
DOI: 10.1017/9781108616799

© Cambridge University Press 2021

This publication is in copyright. Subject to statutory exception and to the provisions of relevant collective licensing agreements, no reproduction of any part may take place without the written permission of Cambridge University Press.

First published 2021

Printed in the United Kingdom by TJ International Ltd, Padstow, Cornwall

A catalogue record for this publication is available from the British Library.

ISBN 978-1-108-42713-5 Hardback

Cambridge University Press has no responsibility for the persistence or accuracy of URLs for external or third-party internet websites referred to in this publication and does not guarantee that any content on such websites is, or will remain, accurate or appropriate.


To my husband Shalomi and children Yonatan, Moriah, Tal, Noa, and Roei for their boundless love and for filling my life with endless happiness
YE

To my wife Eduarda and children Isabel and Diana for their unconditional love, encouragement, and support
MR


Contents

Preface page xiii
Notation xvii
List of Contributors xix

1 Introduction to Information Theory and Data Science 1
Miguel R. D. Rodrigues, Stark C. Draper, Waheed U. Bajwa, and Yonina C. Eldar
1.1 Classical Information Theory: A Primer 2
1.2 Source Coding: Near-Lossless Compression of Binary Sources 4
1.3 Channel Coding: Transmission over the Binary Symmetric Channel 8
1.4 Linear Channel Coding 12
1.5 Connecting Information Theory to Data Science 15
1.6 Information Theory and Data Acquisition 18
1.7 Information Theory and Data Representation 20
1.8 Information Theory and Data Analysis and Processing 26
1.9 Discussion and Conclusion 35
References 36

2 An Information-Theoretic Approach to Analog-to-Digital Compression 44
Alon Kipnis, Yonina C. Eldar, and Andrea J. Goldsmith
2.1 Introduction 44
2.2 Lossy Compression of Finite-Dimensional Signals 48
2.3 ADX for Continuous-Time Analog Signals 49
2.4 The Fundamental Distortion Limit 55
2.5 ADX under Uniform Sampling 63
2.6 Conclusion 69
References 70

3 Compressed Sensing via Compression Codes 72
Shirin Jalali and H. Vincent Poor
3.1 Compressed Sensing 72
3.2 Compression-Based Compressed Sensing 74
3.3 Definitions 76
3.4 Compressible Signal Pursuit 77
3.5 Compression-Based Gradient Descent (C-GD) 81
3.6 Stylized Applications 85
3.7 Extensions 92
3.8 Conclusions and Discussion 100
References 100

4 Information-Theoretic Bounds on Sketching 104
Mert Pilanci
4.1 Introduction 104
4.2 Types of Randomized Sketches 105
4.3 Background on Convex Analysis and Optimization 108
4.4 Sketching Upper Bounds for Regression Problems 110
4.5 Information-Theoretic Lower Bounds 114
4.6 Non-Parametric Problems 121
4.7 Extensions: Privacy and Communication Complexity 124
4.8 Numerical Experiments 126
4.9 Conclusion 127
A4.1 Proof of Theorem 4.3 128
A4.2 Proof of Lemma 4.3 129
References 130

5 Sample Complexity Bounds for Dictionary Learning from Vector- and Tensor-Valued Data 134
Zahra Shakeri, Anand D. Sarwate, and Waheed U. Bajwa
5.1 Introduction 134
5.2 Dictionary Learning for Vector-Valued Data 137
5.3 Dictionary Learning for Tensors 149
5.4 Extensions and Open Problems 157
References 160

6 Uncertainty Relations and Sparse Signal Recovery 163
Erwin Riegler and Helmut Bölcskei
6.1 Introduction 163
6.2 Uncertainty Relations in (C^m, ‖ · ‖_2) 165
6.3 Uncertainty Relations in (C^m, ‖ · ‖_1) 173
6.4 Sparse Signal Separation 181
6.5 The Set-Theoretic Null-Space Property 184
6.6 A Large Sieve Inequality in (C^m, ‖ · ‖_2) 187
6.7 Uncertainty Relations in L^1 and L^2 189
6.8 Proof of Theorem 6.4 189
6.9 Results for ||| · |||_1 and ||| · |||_2 192
References 193

7 Understanding Phase Transitions via Mutual Information and MMSE 197
Galen Reeves and Henry D. Pfister
7.1 Introduction 197
7.2 Problem Setup and Characterization 200
7.3 The Role of Mutual Information and MMSE 207
7.4 Proving the Replica-Symmetric Formula 215
7.5 Phase Transitions and Posterior Correlation 219
7.6 Conclusion 223
A7.1 Subset Response for the Standard Linear Model 223
References 225

8 Computing Choice: Learning Distributions over Permutations 229
Devavrat Shah
8.1 Background 229
8.2 Setup 232
8.3 Models 234
8.4 Sparse Model 236
8.5 Random Utility Model (RUM) 247
8.6 Discussion 255
References 258

9 Universal Clustering 263
Ravi Kiran Raman and Lav R. Varshney
9.1 Unsupervised Learning in Machine Learning 263
9.2 Universality in Information Theory 265
9.3 Clustering and Similarity 271
9.4 Distance-Based Clustering 276
9.5 Dependence-Based Clustering 280
9.6 Applications 291
9.7 Conclusion 292
References 293

10 Information-Theoretic Stability and Generalization 302
Maxim Raginsky, Alexander Rakhlin, and Aolin Xu
10.1 Introduction 302
10.2 Preliminaries 305
10.3 Learning Algorithms and Stability 308
10.4 Information Stability and Generalization 311
10.5 Information-Theoretically Stable Learning Algorithms 317
References 327

11 Information Bottleneck and Representation Learning 330
Pablo Piantanida and Leonardo Rey Vega
11.1 Introduction and Overview 330
11.2 Representation and Statistical Learning 333
11.3 Information-Theoretic Principles and Information Bottleneck 339
11.4 The Interplay between Information and Generalization 347
11.5 Summary and Outlook 356
References 356

12 Fundamental Limits in Model Selection for Modern Data Analysis 359
Jie Ding, Yuhong Yang, and Vahid Tarokh
12.1 Introduction 359
12.2 Fundamental Limits in Model Selection 360
12.3 An Illustration on Fitting and the Optimal Model 363
12.4 The AIC, the BIC, and Other Related Criteria 365
12.5 The Bridge Criterion – Bridging the Conflicts between the AIC and the BIC 369
12.6 Modeling-Procedure Selection 375
12.7 Conclusion 379
References 380

13 Statistical Problems with Planted Structures: Information-Theoretical and Computational Limits 383
Yihong Wu and Jiaming Xu
13.1 Introduction 383
13.2 Basic Setup 384
13.3 Information-Theoretic Limits 386
13.4 Computational Limits 401
13.5 Discussion and Open Problems 412
A13.1 Mutual Information Characterization of Correlated Recovery 415
A13.2 Proof of (13.7) ⇒ (13.6) and Verification of (13.7) in the Binary Symmetric SBM 418
References 419

14 Distributed Statistical Inference with Compressed Data 425
Wenwen Zhao and Lifeng Lai
14.1 Introduction 425
14.2 Basic Model 427
14.3 Interactive Communication 438
14.4 Identity Testing 445
14.5 Conclusion and Extensions 451
References 452

15 Network Functional Compression 455
Soheil Feizi and Muriel Médard
15.1 Introduction 455
15.2 Problem Setup and Prior Work 458
15.3 Network Functional Compression 463
15.4 Discussion and Future Work 473
15.5 Proofs 476
References 484

16 An Introductory Guide to Fano’s Inequality with Applications in Statistical Estimation 487
Jonathan Scarlett and Volkan Cevher
16.1 Introduction 487
16.2 Fano’s Inequality and Its Variants 492
16.3 Mutual Information Bounds 495
16.4 Applications – Discrete Settings 499
16.5 From Discrete to Continuous 510
16.6 Applications – Continuous Settings 516
16.7 Discussion 523
References 525

Index 529


Preface

Since its introduction in 1948, the field of information theory has proved instrumental in the analysis of problems pertaining to compressing, storing, and transmitting data. For example, information theory has allowed analysis of the fundamental limits of data communication and compression, and has shed light on practical communication system design for decades. Recent years have witnessed a renaissance in the use of information-theoretic methods to address problems beyond data compression, data communications, and networking, such as compressive sensing, data acquisition, data analysis, machine learning, graph mining, community detection, privacy, and fairness.
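
For concreteness, the two classical limits alluded to above can be stated compactly (standard formulations, using the notation defined later in this volume, and not tied to any particular chapter): a memoryless source X cannot be compressed losslessly below its entropy,

    R ≥ H(X) bits per symbol,

and a memoryless channel with input X and output Y supports reliable communication at every rate below, and at no rate above, its capacity

    C = max over P(x) of I(X; Y).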

In this book, we explore a broad set of problems on the interface of signal processing, machine learning, learning theory, and statistics where tools and methodologies originating from information theory can provide similar benefits. The role of information theory at this interface has indeed been recognized for decades. A prominent example is the use of information-theoretic quantities such as mutual information, metric entropy, and capacity in establishing minimax rates of estimation back in the 1980s. Here we intend to explore modern applications at this interface that are shaping data science in the twenty-first century.

There are of course some notable differences between standard information-theoretic tools and signal-processing or data analysis methods. Globally speaking, information theory tends to focus on asymptotic limits, using large blocklengths, and assumes the data is represented by a finite number of bits and viewed through a noisy channel. The standard results are not concerned with complexity but focus more on fundamental limits characterized via achievability and converse results. On the other hand, some signal-processing techniques, such as sampling theory, are focused on discrete-time representations but do not necessarily assume the data is quantized or that there is noise in the system. Signal processing is often concerned with concrete methods that are optimal, namely, achieve the developed limits, and have bounded complexity. It is therefore natural to combine these tools to address a broader set of problems, with analysis that allows for quantization, noise, finite samples, and complexity.

This book is aimed at providing a survey of recent applications of information-theoretic methods to emerging data-science problems. The potential reader of this book could be a researcher in the areas of information theory, signal processing, machine learning, statistics, applied mathematics, computer science, or a related research area, or a graduate student seeking to learn about information theory and data science and to scope out open problems at this interface. The particular design of this volume ensures that it can serve as both a state-of-the-art reference for researchers and a textbook for students.

The book contains 16 diverse chapters written by recognized leading experts worldwide, covering a large variety of topics that lie on the interface of signal processing, data science, and information theory. The book begins with an introduction to information theory, which serves as a background for the remaining chapters and also sets the notation to be used throughout the book. The following chapters are then organized into four categories: data acquisition (Chapters 2–4), data representation and analysis (Chapters 5–9), information theory and machine learning (Chapters 10 and 11), and information theory, statistics, and compression (Chapters 12–15). The last chapter, Chapter 16, connects several of the book’s themes via a survey of Fano’s inequality in a diverse range of data-science problems. The chapters are self-contained, covering the most recent research results in the respective topics, and can all be treated independently of each other. A brief summary of each chapter is given next.

Chapter 1 by Rodrigues, Draper, Bajwa, and Eldar provides an introduction to information theory concepts and serves two purposes: it provides background on classical information theory, and presents a taster of modern information theory applied to emerging data-science problems.

Chapter 2 by Kipnis, Eldar, and Goldsmith extends the notion of rate-distortion theory to continuous-time inputs, deriving bounds that characterize the minimal distortion that can be achieved in representing a continuous-time signal by a series of bits when the sampler is constrained to a given sampling rate. For an arbitrary stochastic input and given a total bitrate budget, the authors consider the lowest sampling rate required to sample the signal such that reconstruction of the signal from a bit-constrained representation of its samples results in minimal distortion. It turns out that often the signal can be sampled at sub-Nyquist rates without increasing the distortion.
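
As a baseline for this chapter, it may help to recall the classical discrete-time rate-distortion function, which the ADX framework extends to continuous time (this is the standard formulation, not the chapter's own notation):

    R(D) = min I(X; X̂), the minimum taken over all conditional distributions P(x̂ | x) with E[d(X, X̂)] ≤ D,

i.e., the fewest bits per source symbol needed to reproduce X within average distortion D.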

Chapter 3 by Jalali and Poor discusses the interplay between compressed sensing and compression codes. In particular, the authors consider the use of compression codes to design compressed sensing recovery algorithms. This allows the expansion of the class of structures used by compressed sensing algorithms to those used by data compression codes, which is a much richer class of inputs and relies on decades of developments in the field of compression.
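
For readers unfamiliar with the setup, the generic compressed sensing observation model (a standard formulation, not specific to the compression-based schemes of this chapter) is

    y = Ax + z,

where x ∈ R^n is the structured signal of interest, A is an m × n sensing matrix with m < n, and z is measurement noise; recovery algorithms exploit the structure of x to invert this underdetermined system.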

Chapter 4 by Pilanci develops information-theoretic lower bounds on sketching for solving large statistical estimation and optimization problems. The term sketching is used for randomized methods that aim to reduce data dimensionality in computationally intensive tasks for gains in space, time, and communication complexity. These bounds allow one to obtain interesting trade-offs between computation and accuracy and shed light on a variety of existing methods.
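
A canonical instance is sketched least squares (a standard sketch-and-solve formulation, given here for illustration): instead of solving min over x of ‖Ax − b‖_2² with an n × d data matrix A and n ≫ d, one draws a random m × n sketching matrix S with m ≪ n and solves the much smaller problem

    min over x of ‖SAx − Sb‖_2²,

trading statistical accuracy for computation; the chapter's lower bounds quantify the best possible terms of this trade.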

Chapter 5 by Shakeri, Sarwate, and Bajwa treats the problem of dictionary learning, which is a powerful signal-processing approach for data-driven extraction of features from data. The chapter summarizes theoretical aspects of dictionary learning for vector- and tensor-valued data and explores lower and upper bounds on the sample complexity of dictionary learning which are derived using information-theoretic tools. The dependence of sample complexity on various parameters of the dictionary learning problem is highlighted along with the potential advantages of taking the structure of tensor data into consideration in representation learning.
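
The generative model underlying these results, in its standard vector-valued form (dimensions here are illustrative), posits observations

    y = Dx + w,

where D is an unknown d × K dictionary, x ∈ R^K is a sparse coefficient vector, and w is noise; the sample complexity question is how many independent observations y are needed to recover D to a prescribed accuracy. In the tensor case, D additionally carries structure, for example of Kronecker type.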

Chapter 6 by Riegler and Bölcskei presents an overview of uncertainty relations for sparse signal recovery starting from the work of Donoho and Stark. These relations are then extended to richer data structures and bases, which leads to the recently discovered set-theoretic uncertainty relations in terms of Minkowski dimension. The chapter also explores the connection between uncertainty relations and the “large sieve,” a family of inequalities developed in analytic number theory. It is finally shown how uncertainty relations allow one to establish fundamental limits of practical signal recovery problems such as inpainting, declipping, super-resolution, and denoising of signals.
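
The starting point is the Donoho–Stark uncertainty principle, stated here in its standard form: if x ∈ C^m is nonzero and x̂ denotes its discrete Fourier transform, then

    N_t · N_f ≥ m,

where N_t and N_f are the numbers of nonzero entries of x and x̂, respectively; no signal can be sparsely supported in time and in frequency simultaneously, which is what makes sparse separation and recovery possible.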

Chapter 7 by Reeves and Pfister examines high-dimensional inference problems through the lens of information theory. The chapter focuses on the standard linear model, for which the performance of optimal inference is studied using the replica method from statistical physics. The chapter provides a tutorial introduction to these techniques and presents a new proof demonstrating their optimality in certain settings.
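
A key identity behind the interplay in the chapter title is the I-MMSE relation of Guo, Shamai, and Verdú (quoted here as standard background rather than from the chapter itself): for a signal X observed in Gaussian noise,

    d/dsnr I(X; √snr X + Z) = (1/2) mmse(snr),

where Z is standard Gaussian and mmse(snr) is the minimum mean-squared error of estimating X from the noisy observation; phase transitions manifest as discontinuities in the MMSE and corresponding kinks in the mutual information.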

Chapter 8 by Shah discusses the question of learning distributions over permutations of a given set of choices based on partial observations. This is central to capturing choice in a variety of contexts, such as understanding preferences of consumers over a collection of products based on purchasing and browsing data in the setting of retail and e-commerce. The chapter focuses on the learning task from marginal distributions of two types, namely, first-order marginals and pair-wise comparisons, and provides a comprehensive review of results in this area.
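
To make the two observation models concrete (standard definitions, given here for illustration): for a distribution over permutations σ of n items, the first-order marginals form the n × n doubly stochastic matrix

    M(i, j) = P[σ(i) = j],

the probability that item i is placed in position j, while pair-wise comparison data record the probabilities P[item i is ranked above item j]; the chapter asks what can be learned about the underlying distribution from such partial information.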

Chapter 9 by Raman and Varshney studies universal clustering, namely, clustering without prior access to the statistical properties of the data. The chapter formalizes the problem in information-theoretic terms, focusing on two main subclasses of clustering that are based on distance and dependence. A review of well-established clustering algorithms, their statistical consistency, and their computational and sample complexities is provided using fundamental information-theoretic principles.

Chapter 10 by Raginsky, Rakhlin, and Xu introduces information-theoretic measures of algorithmic stability and uses them to upper-bound the generalization bias of learning algorithms. Stability of a learning algorithm means that its output does not depend too much on any individual training example, and these results therefore shed light on the generalization ability of modern learning techniques.
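
A representative bound of this kind (a now-standard result in this line of work, stated under a sub-Gaussian assumption on the loss) controls the expected generalization gap of a learning algorithm with output W trained on a dataset S of n examples:

    | E[generalization gap] | ≤ √( 2σ² I(S; W) / n ),

where σ² is the sub-Gaussian parameter of the loss; the less the output W reveals about the particular sample S, the smaller I(S; W) and hence the gap.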

Chapter 11 by Piantanida and Rey Vega introduces the information bottleneck principle and explores its use in representation learning, namely, in the development of computational algorithms that learn the different explanatory factors of variation behind high-dimensional data. Using these tools, the authors obtain an upper bound on the generalization gap corresponding to the cross-entropy risk. This result provides an interesting connection between mutual information and generalization, and helps to explain why noise injection during training can improve the generalization ability of encoder models.
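
The information bottleneck principle itself, in its standard Lagrangian form, seeks a representation T of the input X that is maximally compressed yet maximally informative about the target Y:

    min over P(t | x) of I(X; T) − β I(T; Y),

where T obeys the Markov chain Y – X – T and β > 0 trades off compression against prediction.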


Chapter 12 by Ding, Yang, and Tarokh discusses fundamental limits of inference and prediction based on model selection principles from modern data analysis. Using information-theoretic tools, the authors analyze several state-of-the-art model selection techniques and introduce two recent advances in model selection approaches, one concerning a new information criterion and the other concerning modeling-procedure selection.
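
For reference, the two classical criteria at the heart of this discussion are, in their standard forms,

    AIC = −2 log L̂ + 2k,
    BIC = −2 log L̂ + k log n,

where L̂ is the maximized likelihood of a candidate model, k its number of parameters, and n the sample size; the bridge criterion of Section 12.5 is designed to reconcile the conflicting strengths of the two.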

Chapter 13 by Wu and Xu provides an exposition of some of the methods for determining the information-theoretical as well as computational limits for high-dimensional statistical problems with a planted structure. Planted structures refer to a ground truth structure (often of a combinatorial nature) which one is trying to discover in the presence of random noise. In particular, the authors discuss first- and second-moment methods for analyzing the maximum likelihood estimator, information-theoretic methods for proving impossibility results using mutual information and rate-distortion theory, and techniques originating from statistical physics. To investigate computational limits, they describe randomized polynomial-time reduction schemes that approximately map planted-clique problems to the problem of interest in total variation distance.
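
A canonical example of a planted structure is the binary symmetric stochastic block model appearing in the chapter's appendix (stated here in its standard form): n vertices carry hidden labels σ ∈ {±1}^n, and each pair of vertices is connected independently with probability

    p if σ_i = σ_j, and q otherwise, with p ≠ q;

the statistical task is to recover the hidden partition from the observed graph, and the interesting regimes are those where this is information-theoretically possible but perhaps computationally hard.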

Chapter 14 by Zhao and Lai considers information-theoretic models for distributed statistical inference problems with compressed data. The authors review several research directions and challenges related to applying these models to various statistical learning problems. In these applications, data are distributed across multiple terminals, which can communicate with each other via limited-capacity channels. Information-theoretic tools are used to characterize the fundamental limits of classical statistical inference problems using compressed data directly.
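
The centralized benchmark against which such limits are measured is the Chernoff–Stein lemma (a standard result, quoted here as background): when testing H_0: X ~ P_0 against H_1: X ~ P_1 from n i.i.d. samples with the type-I error held fixed, the type-II error probability decays as

    β_n = exp(−n D(P_0 ‖ P_1) + o(n));

the distributed versions studied in this chapter ask how such exponents degrade when the terminals must first compress their data.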

Chapter 15 by Feizi and Médard treats different aspects of the network functional compression problem. The goal is to compress a source of random variables for the purpose of computing a deterministic function at the receiver, where the sources and receivers are nodes in a network. Traditional data compression schemes are special cases of functional compression, in which the desired function is the identity function. It is shown that for certain classes of functions considerable compression is possible in this setting.
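
In the identity-function special case just mentioned, the problem reduces to classical distributed lossless source coding, whose limits are given by the Slepian–Wolf rate region (standard form): two correlated sources X and Y can be recovered from separately encoded descriptions if and only if the rates satisfy

    R_X ≥ H(X | Y), R_Y ≥ H(Y | X), R_X + R_Y ≥ H(X, Y);

functional compression asks how much further these rates can be lowered when only a function f(X, Y) is required at the receiver.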

Chapter 16 by Scarlett and Cevher provides a survey of Fano’s inequality and its use in various statistical estimation problems. In particular, the chapter overviews the use of Fano’s inequality for establishing impossibility results, namely, conditions under which a certain goal cannot be achieved by any estimation algorithm. The authors present several general-purpose tools and analysis techniques, and provide representative examples covering group testing, graphical model selection, sparse linear regression, density estimation, and convex optimization.
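
Fano's inequality itself, in its basic discrete form, bounds the error probability P_e of any estimator of X from an observation Y: if X takes values in a finite set T, then

    H(X | Y) ≤ H_b(P_e) + P_e log(|T| − 1),

where H_b(·) is the binary entropy function; rearranged, it lower-bounds P_e by a quantity involving H(X | Y) (or, in typical applications, the mutual information I(X; Y)), which is the engine behind the impossibility results surveyed in the chapter.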

Within the chapters, the authors point to various open research directions at the interface of information theory, data acquisition, data analysis, machine learning, and statistics that will certainly see increasing attention in the years to come.

We would like to end by thanking all the authors for their contributions to this book and for their hard work in presenting the material in a unified and accessible fashion.


Notation

z   scalar (or value of random variable Z)
Z   random variable
z   vector (or value of random vector Z)
Z   matrix (or random vector)
z_i   ith entry of vector z
Z_{i,j}   (i, j)th entry of matrix Z
Z^n = (Z_1, . . . , Z_n)   sequence of n random variables
z^n = (z_1, . . . , z_n)   value of sequence of n random variables Z^n
Z_i^j = (Z_i, . . . , Z_j)   sequence of j − i + 1 random variables
z_i^j = (z_i, . . . , z_j)   value of sequence of j − i + 1 random variables Z_i^j
‖ · ‖_p   p-norm
(·)^T   transpose operator
(·)^*   conjugate Hermitian operator
(·)^†   pseudo-inverse of the matrix argument
tr(·)   trace of the square matrix argument
det(·)   determinant of the square matrix argument
rank(·)   rank of the matrix argument
range(·)   range span of the column vectors of the matrix argument
λ_max(·)   maximum eigenvalue of the square matrix argument
λ_min(·)   minimum eigenvalue of the square matrix argument
λ_i(·)   ith largest eigenvalue of the square matrix argument
I   identity matrix (its size is determined from the context)
0   matrix with zero entries (its size is determined from the context)
T   standard notation for sets
|T|   cardinality of set T
R   set of real numbers
C   set of complex numbers
R^n   set of n-dimensional vectors of real numbers
C^n   set of n-dimensional vectors of complex numbers
j   imaginary unit
Re(x)   real part of the complex number x
Im(x)   imaginary part of the complex number x
|x|   modulus of the complex number x
arg(x)   argument of the complex number x
E[·]   statistical expectation
P[·]   probability measure
H(·)   entropy
H(·|·)   conditional entropy
h(·)   differential entropy
h(·|·)   conditional differential entropy
D(·‖·)   relative entropy
I(·; ·)   mutual information
I(·; ·|·)   conditional mutual information
N(µ, σ²)   scalar Gaussian distribution with mean µ and variance σ²
N(µ, Σ)   multivariate Gaussian distribution with mean µ and covariance matrix Σ
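
For readers new to these quantities, the discrete forms of the last group (standard definitions, consistent with the notation above) are

    H(X) = −Σ_x P[X = x] log P[X = x],
    I(X; Y) = H(X) − H(X | Y) = D(P_{X,Y} ‖ P_X P_Y),

so mutual information measures the reduction in uncertainty about X from observing Y, or equivalently the divergence of the joint distribution from the product of its marginals.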


Contributors

Waheed U. Bajwa
Department of Electrical and Computer Engineering
Rutgers University

Helmut Bölcskei
Department of Information Technology and Electrical Engineering
Department of Mathematics
ETH Zürich

Volkan Cevher
Laboratory for Information and Inference Systems
Institute of Electrical Engineering
School of Engineering, EPFL

Jie Ding
School of Statistics
University of Minnesota

Stark C. Draper
Department of Electrical and Computer Engineering
University of Toronto

Yonina C. Eldar
Faculty of Mathematics and Computer Science
Weizmann Institute of Science

Soheil Feizi
Department of Computer Science
University of Maryland, College Park

Andrea J. Goldsmith
Department of Electrical Engineering
Stanford University

Shirin Jalali
Nokia Bell Labs

Alon Kipnis
Department of Statistics
Stanford University

Lifeng Lai
Department of Electrical and Computer Engineering
University of California, Davis

Muriel Médard
Department of Electrical Engineering and Computer Science
Massachusetts Institute of Technology

Henry D. Pfister
Department of Electrical and Computer Engineering
Duke University

Pablo Piantanida
Laboratoire des Signaux et Systèmes
Université Paris Saclay
CNRS-CentraleSupélec
Montreal Institute for Learning Algorithms (Mila), Université de Montréal

Mert Pilanci
Department of Electrical Engineering
Stanford University

H. Vincent Poor
Department of Electrical Engineering
Princeton University

Maxim Raginsky
Department of Electrical and Computer Engineering
University of Illinois at Urbana-Champaign

Alexander Rakhlin
Department of Brain & Cognitive Sciences
Massachusetts Institute of Technology

Ravi Kiran Raman
Department of Electrical and Computer Engineering
University of Illinois at Urbana-Champaign

Galen Reeves
Department of Electrical and Computer Engineering
Department of Statistical Science
Duke University

Leonardo Rey Vega
Department of Electrical Engineering
School of Engineering
Universidad de Buenos Aires
CSC-CONICET

Erwin Riegler
Department of Information Technology and Electrical Engineering
ETH Zürich

Miguel R. D. Rodrigues
Department of Electronic and Electrical Engineering
University College London

Anand D. Sarwate
Department of Electrical and Computer Engineering
Rutgers University

Jonathan Scarlett
Department of Computer Science and Department of Mathematics
National University of Singapore

Devavrat Shah
Department of Electrical Engineering and Computer Science
Massachusetts Institute of Technology

Zahra Shakeri
Electronic Arts

Vahid Tarokh
Department of Electrical and Computer Engineering
Duke University

Lav R. Varshney
Department of Electrical and Computer Engineering
University of Illinois at Urbana-Champaign

Yihong Wu
Department of Statistics and Data Science
Yale University

Aolin Xu
Department of Electrical and Computer Engineering
University of Illinois at Urbana-Champaign

Jiaming Xu
Fuqua School of Business
Duke University

Yuhong Yang
School of Statistics
University of Minnesota

Wenwen Zhao
Department of Electrical and Computer Engineering
University of California, Davis
