5. nguyen - science of information

Science of Information, Computation and Fusion 15 March 2011 Tristan Nguyen Program Manager AFOSR/RSL Air Force Office of Scientific Research AFOSR Distribution A: Approved for public release; distribution is unlimited. 88ABW-2011-0776

Upload: afosr

Post on 28-Mar-2015

TRANSCRIPT

Page 1: 5. Nguyen - Science of Information

Science of Information, Computation and Fusion

15 March 2011

Tristan Nguyen

Program Manager

AFOSR/RSL

Air Force Office of Scientific Research

AFOSR

Distribution A: Approved for public release; distribution is unlimited. 88ABW-2011-0776

Page 2: 5. Nguyen - Science of Information

2011 AFOSR SPRING REVIEW

2311J/Z PORTFOLIO OVERVIEW

NAME: Tristan Nguyen

BRIEF DESCRIPTION OF PORTFOLIO:

To develop scientific foundations for information analytics from disparate sources through the fundamental understanding of data, structures of information, and their control for collection.

LIST SUB-AREAS IN PORTFOLIO:

• Control of Information Collection and Fusion

• High-dimensional Data Analytics for Information Discovery

• Sensor Fusion for Detection, Scheduling, Management

• Network Analysis and Coding

• Foundations of Science of Information and Computation

• Mathematics of Signal, Data, Information

Page 3: 5. Nguyen - Science of Information

Scientific Challenges

Some Technical Hurdles:

• How can meaningful information be inferred from massive, high-dimensional data that are collected from heterogeneous sources?

• What is “information” and how can it be computed? Is there a new theory of information?

• Lack of understanding of the gaps between the worlds of sensor processing and information science.

• Lack of (physical/conceptual) models to describe information, its dynamics, and its collection process.

Page 4: 5. Nguyen - Science of Information

Transformational Opportunities

Direct Potential Payoffs:

• Advance from sensing to information and better the collection-processing chain.

• Develop an automated reasoning capability that can manipulate or compute information.

• Develop a new information system and new concepts for management of information to tame the data deluge.

• Enhance human-machine performance through efficient, semi-autonomous algorithmic procedures.

Page 5: 5. Nguyen - Science of Information

Other Organizations That Fund Related Work

• ARO

- MURI (PM: Liyi Dai): Opportunistic Sensing for Object and Activity Recognition from Multi-Modal, Multi-Platform Data

- MURI (PM: Liyi Dai): Value of Information for Distributed Data Fusion (FY11)

- MURI (PM: John Lavery): Network-based Hard/Soft Information Fusion

- MURI (PM: Harry Chang): Quantum Stochastics and Control (FY11)

Page 6: 5. Nguyen - Science of Information

Other Organizations That Fund Related Work (Continued)

• ONR (PM: TBD, 6.1 & 6.2)

- Information Representation, Integration, Processing

• DDR&E (Director: Robin Quinlan, 6.2)

- DoD's Advanced Math Challenge

• NSF & DTRA (Director: Leland Jameson, PMs: Ngai Wong, Christian Whitchurch, Brandi Vann)

- Algorithms for Threat Detection

Page 7: 5. Nguyen - Science of Information

Program Trends

• Information space formulation for control and collection

• Foundation for a new theory of information and its computation

• Data analytics and data fusion

• New mathematics for signal, data, information

• Sensor fusion for detection, scheduling, management

• Network analysis and coding

Page 8: 5. Nguyen - Science of Information

FY06 MURI: Integrated Fusion, Performance Prediction and Sensor Management for ATE

Objective: An integrated theory for adaptive ATE that simultaneously addresses

• Information fusion

• Sensor management and control

• Directed sensor signal processing

Approach:

• Optimal, robust information fusion with graphical models and information-theoretic performance metrics

• Adaptive front-end signal processing to extract optimal feature sets from sparse aperture data

• Dynamic sensor resource management and control strategies for platform trajectories to achieve objectives

Team: The Ohio State University (Randy Moses, PI), MIT, Boston University, University of Michigan, University of Florida

AFRL Participation: RY, RW, RI

Recent Transitions

Page 9: 5. Nguyen - Science of Information

Recent Transitions (Cont'd)

OSU MURI team participation in AFRL Gotcha data collection exercises

• November 2007: Layered sensing collection

• August 2008: AFRL Radar/EO sensing

• Throughout 2009: Analysis of initial data releases in conjunction with RY

• Data also benefits:

- Other university programs

- ATR Center research

Technology Transfer

• AFOSR MURI: DARPA's ATIF, POSSE; SBIRs with AF

• Army STTR Topic Number A10a-T020: Topological Data Analysis and Wide Area Detection of Chemical and Biological Contamination

Page 10: 5. Nguyen - Science of Information

Program Roadmap

[Figure labels: Signal magnitudes; Salient features; Structures, relationships; Events; Behaviors, meaning; Symbolic; Conceptual. Arrow label: More abstract]

Major Themes

• Understanding of “Information Spectrum”:

- Data analytics & data fusion to extract meaningful information

- Formulation of a new theory of information

- Find a computing science for this new theory

• Understanding how “information” can adaptively be used to guide collection of data:

- Formalization & computation of information states

- Find a new control theory for information

Page 11: 5. Nguyen - Science of Information

Program Roadmap

Highlights of Some Projects

• Control of Information Collection and Fusion (akin to DDDAS)

• Data Analytics (beyond dimension reduction & classification)

- Ideal Point Topic Model (statistical modeling of data)

- Non-parametric Regression (algorithmic modeling)

- Topological Data Fusion (algorithmic modeling)

• Information & Computation

- Homotopical & Higher Algebraic Structures of Type Theory

Guiding Principles

• More than discovery of algorithms

• Understanding capabilities/limitations of scientific approaches

Page 12: 5. Nguyen - Science of Information

Control of Information Collection and Fusion

AFOSR FY2010 MURI TOPIC # 18

Control of Information Collection and Fusion

Goal: This MURI will forge a rigorous new perspective on the joint control of multiple information sources of disparate types to simultaneously achieve quantified informational and physical objectives…

Innovative Ideas:

- Inference of global information from local measurements

- Formulation of information states

- Study of the relations between information states and physical states

- Discovery of a new framework for control of information

• Jadbabaie, Koditschek, Kumar, Ribeiro

• Berkeley: Ramachandran, Sastry, Tomlin

• Minnesota: Jindal, Roumeliotis

• Illinois: Baryshnikov

• Melbourne: Howard, Moran

Page 13: 5. Nguyen - Science of Information

Control of Information Collection and Fusion (Cont'd)

Some Technical Tools:

• Shape Spaces

• Path Spaces

• Spaces of classification

• Piecewise Linear (PL) Representation

Page 14: 5. Nguyen - Science of Information

Control of Information Collection and Fusion (Cont'd)

[Figure labels: Simplicial Set; Functor; Realization Functor; Singular Set Functor; PL Network; Analytic Network; two unlabeled arrows marked "?"; Attributes, Beliefs, Motivation; Agency, Biases, Data. Source: Friedman, ArXiv '08]

Basic Questions:

1. Evolution of information states?

2. Open, closed loops?

3. Estimation & computation of information states?
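The third question can be made concrete under one common reading, which is an assumption here and not something the slides state: treat the information state as a posterior belief over a finite set of hidden states and update it by Bayes' rule after each measurement. The four-cell scenario and the sensor likelihoods below are hypothetical.

```python
import math

def bayes_update(belief, likelihood):
    """Posterior belief after one measurement.

    belief[i]     : prior probability of hidden state i (the information state)
    likelihood[i] : probability of the observed measurement given state i
    """
    unnormalized = [b * l for b, l in zip(belief, likelihood)]
    total = sum(unnormalized)
    if total == 0.0:
        raise ValueError("measurement has zero probability under the prior")
    return [u / total for u in unnormalized]

def entropy_bits(belief):
    """Shannon entropy of a belief in bits, with 0 * log 0 taken as 0."""
    return -sum(p * math.log2(p) for p in belief if p > 0.0)

# Hypothetical scenario: a target sits in one of four cells, prior uniform.
prior = [0.25, 0.25, 0.25, 0.25]
# Hypothetical sensor model: chance of a "detect" reading for each cell.
detect_likelihood = [0.9, 0.1, 0.1, 0.1]

posterior = bayes_update(prior, detect_likelihood)
uncertainty_before = entropy_bits(prior)
uncertainty_after = entropy_bits(posterior)
```

Comparing `uncertainty_before` with `uncertainty_after` gives one simple way to score how much a measurement changed the information state, which is the kind of quantity a collection controller could try to optimize.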

Page 15: 5. Nguyen - Science of Information

Ideal Point Topic Model

PIs: D. Blei (Princeton U), L. Carin (Duke U)

Motivation: The ideal point model was devised to discover voting patterns.

Goals:

1. Predict missing votes (incomplete data) or preferences on new items

2. Infer legislative behaviors based on observed roll call data

[Figure: Ideal Point Model. Labels: Observed roll call data; Users/Voters; Items/Bills; Limitation]
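A minimal sketch of how a fitted ideal point model could fill in missing votes, assuming the common one-dimensional logistic form in which a "yes" vote has probability sigmoid(ideal point × bill discrimination + bill difficulty). The slides do not give the parameterization, so the function names, parameters, and numbers below are all hypothetical, and the parameters are taken as already estimated.

```python
import math

def vote_probability(ideal_point, discrimination, difficulty):
    """P(yes) for one voter and one bill under a 1-D logistic ideal point model."""
    return 1.0 / (1.0 + math.exp(-(ideal_point * discrimination + difficulty)))

def fill_missing_votes(votes, ideal_points, discriminations, difficulties):
    """Return a copy of the roll-call matrix with None entries predicted.

    votes[u][d] is 1 (yes), 0 (no), or None (missing) for voter u and bill d.
    """
    filled = []
    for u, row in enumerate(votes):
        new_row = []
        for d, vote in enumerate(row):
            if vote is None:
                p = vote_probability(ideal_points[u], discriminations[d], difficulties[d])
                vote = 1 if p >= 0.5 else 0
            new_row.append(vote)
        filled.append(new_row)
    return filled

# Hypothetical fitted parameters: three voters, two bills.
ideal_points = [-1.5, 0.2, 2.0]
discriminations = [1.0, -2.0]
difficulties = [0.0, 0.5]
votes = [[0, None], [None, 1], [1, None]]

filled = fill_missing_votes(votes, ideal_points, discriminations, difficulties)
```

In this form every bill needs its own discrimination and difficulty, and those can only be estimated from votes already cast on that bill.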

Page 16: 5. Nguyen - Science of Information

Ideal Point Topic Model (Cont'd)

PIs: D. Blei (Princeton U), L. Carin (Duke U)

New: The Ideal Point Topic Model incorporates topics that are automatically generated.

[Figure labels: Users/Voters; Items/Bills; Predicted Missing Votes; New Item]

Goals:

1. Predict missing votes (incomplete data) or preferences on new items

2. Infer legislative behaviors based on observed roll call data

3. Associate topics with legislative behaviors

Page 17: 5. Nguyen - Science of Information

Ideal Point Topic Model (Cont'd)

PIs: D. Blei (Princeton U), L. Carin (Duke U)

Past Work: Topic Modeling (D. Blei et al., ~'03)

Goal: Given a corpus of documents, determine the topics in it.

Ideas in Topic Modeling: The corpus consists of documents, each of which consists of topics, which are groups of words.

[Figure labels: Discovered Topics; Number of words; Number of documents; Number of topics]

Assumptions: D, K, N are known a priori
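The generative story behind this kind of topic model can be sketched as follows, in the style of latent Dirichlet allocation. The sketch reads D as the number of documents, K as the number of topics, and N as the number of words per document; that mapping, the vocabulary size, and the Dirichlet parameters `alpha` and `eta` are assumptions made for illustration.

```python
import random

def sample_dirichlet(alpha, rng):
    """Draw a probability vector from Dirichlet(alpha) via normalized Gamma draws."""
    draws = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(draws)
    return [d / total for d in draws]

def sample_index(probs, rng):
    """Draw one index with the given probabilities."""
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]

def generate_corpus(D, K, N, vocab_size, alpha=0.5, eta=0.5, seed=0):
    """Generate D documents of N word ids each from K topics."""
    rng = random.Random(seed)
    # Each topic is a distribution over the vocabulary (a "group of words").
    topics = [sample_dirichlet([eta] * vocab_size, rng) for _ in range(K)]
    corpus = []
    for _ in range(D):
        # Each document mixes the K topics in its own proportions.
        proportions = sample_dirichlet([alpha] * K, rng)
        document = []
        for _ in range(N):
            topic = sample_index(proportions, rng)            # pick a topic
            document.append(sample_index(topics[topic], rng))  # pick a word from it
        corpus.append(document)
    return topics, corpus

topics, corpus = generate_corpus(D=5, K=3, N=20, vocab_size=12)
```

Topic discovery runs this story in reverse: given only `corpus`, it estimates `topics` and the per-document proportions.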

Page 18: 5. Nguyen - Science of Information

Ideal Point Topic Model (Cont'd)

PIs: D. Blei (Princeton U), L. Carin (Duke U)

Innovation:

- A combination of generative and discriminative probabilistic models is used to automatically identify and group variables based on observed data.

- Modular design enables addition of new and/or heterogeneous data into the model to improve prediction.

General Applications:

- Versatile model for predicting individuals and their preferences.

- Versatile model for many types of digital data simultaneously.

[Figure labels: Generative Model & de Finetti's Exchangeability; Discriminative Model]

Page 19: 5. Nguyen - Science of Information

Non-parametric Regression in High Dimensions

PIs: M. Wainwright, B. Yu, G. Raskutti (UCB)

Motivation: Non-parametric regression is one of the most important tools in statistical learning theory; it is used as an “algorithmic model” for data analysis (evaluation, prediction, interpolation, etc.).

Goals:

1. Derive results in high dimensions when only a small number of data samples is available.

2. Study information-theoretic limits.

Requirement: Efficient construction of a provably optimal solution

Example: Discovery of linkages in social networks

• p individuals are observed over n different events, yielding a data matrix X of dimension n × p, where n << p

• Unknown graph G with edges E specifies dependencies: [formula not recovered]

• Learn structure of G by solving a sequence of convex programs
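One standard way to learn a graph through a sequence of convex programs is neighborhood selection: regress each variable on all the others with an L1 penalty and connect it to the variables that keep a nonzero coefficient. The slides do not say which convex programs the PIs use, so the sketch below is a linear stand-in with hypothetical names and a hand-made toy dataset (n = 8 observations and p = 3 variables, kept small for readability instead of n << p).

```python
def soft_threshold(value, threshold):
    """Shrink value toward zero by threshold (the proximal step of the L1 penalty)."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0

def lasso(columns, target, lam, sweeps=50):
    """Coordinate descent for (1/2n)||target - sum_j b_j * columns[j]||^2 + lam * ||b||_1."""
    n = len(target)
    coef = [0.0] * len(columns)
    for _ in range(sweeps):
        for j, col in enumerate(columns):
            scale = sum(v * v for v in col) / n
            if scale == 0.0:
                continue
            # Residual computed without column j's current contribution.
            partial = [
                target[i] - sum(coef[k] * columns[k][i]
                                for k in range(len(columns)) if k != j)
                for i in range(n)
            ]
            rho = sum(col[i] * partial[i] for i in range(n)) / n
            coef[j] = soft_threshold(rho, lam) / scale
    return coef

def learn_graph(variables, lam):
    """One lasso per variable; an edge is kept if either endpoint selects the other."""
    edges = set()
    for j, target in enumerate(variables):
        others = [k for k in range(len(variables)) if k != j]
        coef = lasso([variables[k] for k in others], target, lam)
        for k, b in zip(others, coef):
            if b != 0.0:
                edges.add((min(j, k), max(j, k)))
    return edges

# Toy data: each variable is a list of 8 observations.
h1 = [1, -1, 1, -1, 1, -1, 1, -1]
h2 = [1, 1, -1, -1, 1, 1, -1, -1]
h3 = [1, 1, 1, 1, -1, -1, -1, -1]
x0 = [float(a) for a in h1]
x1 = [a + 0.5 * b for a, b in zip(h1, h2)]  # correlated with x0
x2 = [float(a) for a in h3]                 # uncorrelated with x0 and x1

edges = learn_graph([x0, x1, x2], lam=0.1)
```

Each call to `lasso` is a separate convex program, so the per-variable regressions can be solved independently. The penalty `lam` controls sparsity: larger values drop more edges.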

Page 20: 5. Nguyen - Science of Information

Non-parametric Regression in High Dimensions (Cont'd)

PIs: M. Wainwright, B. Yu, G. Raskutti (UCB)

[Figure labels: High Dimension (p); Small Sample Size (n << p); Sparsity, Low-dimensional Structure (s); Complexity of Algorithmic Model; Theme: Compressive Sensing; Innovative Approach]

Assume: unknown [formula not recovered], where |S| = s and univariate

Procedure: [not recovered]

Page 21: 5. Nguyen - Science of Information

Non-parametric Regression in High Dimensions (Cont'd)

PIs: M. Wainwright, B. Yu, G. Raskutti (UCB)

[Figure: Simulation & Empirical Validation of Theorem]

The theorem asserts how much data is needed.

Note: in practice, p >> n >> log(p)

Page 22: 5. Nguyen - Science of Information

Topological Methods for Data Fusion

PI: G. Carlsson et al. (Stanford U)

Data Analysis → Mathematical Tools

• Probing local measurements of abstract entities in an environment; quantification → Sheaves of sets, functions, maps, mathematical structures on a topological space; valuation

• Source separation using different sensing modes or measurements → Maps between topological spaces and their fibers

• Semantic representation and reasoning → Formal semantics, logic, type theory

• Fusion or integration (local-to-global, Gestalt), integration of modular components → Gluing, colimits, invariants (spatial, topological, metric, etc.)

• Invariants of an object (Gestalt), spaces of more abstract structures → Configuration spaces, moduli spaces

• Multi-scale interpretation → Persistent topology, multiscale analysis

Goal: To fuse data sets for analytics tasks and information extraction

Idea: Pure mathematics offers “natural” tools for higher-level analysis
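As a small illustration of persistent topology, the simplest case tracks connected components of a point cloud as a distance scale grows: every point starts as its own component, and a component "dies" at the scale where it merges with another. The sketch below computes those death scales with a union-find structure. It covers only dimension 0 and is not the PIs' fusion method; the function name and sample points are hypothetical.

```python
import itertools
import math

def zero_dim_persistence(points):
    """Death scales of connected components as the distance threshold grows.

    points: list of coordinate tuples. All components are born at scale 0;
    the single component that never dies is not reported.
    """
    parent = list(range(len(points)))

    def find(i):
        # Follow parent links to the component root, halving paths on the way.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # All pairwise distances, processed from shortest to longest.
    edges = sorted(
        (math.dist(points[i], points[j]), i, j)
        for i, j in itertools.combinations(range(len(points)), 2)
    )
    deaths = []
    for length, i, j in edges:
        root_i, root_j = find(i), find(j)
        if root_i != root_j:
            parent[root_i] = root_j   # two components merge at this scale
            deaths.append(length)
    return deaths

# Two tight pairs separated by a wide gap.
deaths = zero_dim_persistence([(0.0,), (1.0,), (5.0,), (5.5,)])
```

Short-lived components reflect local noise, while a component that survives to a much larger scale (here the gap between the two pairs) indicates cluster structure that persists across scales.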

Page 23: 5. Nguyen - Science of Information

Topological Methods for Data Fusion (Cont'd)

PI: G. Carlsson et al. (Stanford U)

Ontology Integration

Current Practices in Data Fusion or Integration:

1. Matching of points or concepts via geometric or graph methods.

2. Lack of ideas for merging previously matched concepts.

3. Ideas from the integration of ontologies and database schemas are not sufficient to handle digital data (they keep a great many humans in the loop).

4. Lack of objective qualitative or quantitative measurements of performance.

Page 24: 5. Nguyen - Science of Information

Topological Methods for Data Fusion (Cont'd)

PI: G. Carlsson (Stanford U)

[Figure labels: Refinement & M-Alignment of Ontologies (Mossakowski et al., '10); Fusion of overlapped samples & topological bootstrapping; similarity]

Innovation:

1. Matching and merging of concepts and points in data

2. Analysis of the effects of maps between datasets

3. Introduction of refinement via multi-resolutions

4. Incorporation of ideas in statistical learning to improve refinement

5. Introduction of means to ensure quality of performance

6. Analysis of possible obstructions to fusion

Page 25: 5. Nguyen - Science of Information

Homotopical & Higher Algebraic Structures of Type Theory

PI: S. Awodey (CMU)

Past Scientific Discoveries:

Curry-Howard Correspondence

Computation (Simply Typed λ-Calculus) ↔ Propositional Logic

• Types ↔ Formulas

• Terms ↔ Proofs

• Term Reductions ↔ Transformations of Proofs

Type theory encodes logical structures & their interpretation

• Martin-Löf originated a constructive (dependent) type theory.

• Thierry Coquand implemented a dependent type theory, Coq, a computer verification system & proof assistant.
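The correspondence can be seen directly in a proof assistant. The snippet below uses Lean 4 as a stand-in for Coq, which is a choice made for this illustration only. Each statement is a type, and each proof is an ordinary term of that type.

```lean
-- Implication is the function type; modus ponens is function application.
example (A B : Prop) (f : A → B) (a : A) : B :=
  f a

-- Conjunction is a pair type; swapping the components proves commutativity.
example (A B : Prop) : A ∧ B → B ∧ A :=
  fun ⟨a, b⟩ => ⟨b, a⟩

-- Transitivity of implication is function composition.
example (A B C : Prop) (f : A → B) (g : B → C) : A → C :=
  fun a => g (f a)
```

Simplifying such a term, for instance by reducing an applied function, corresponds to transforming the proof it encodes.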

Some Applications: Interactive/automated theorem provers are used in formal mathematical proof, formal verification, automated reasoning; dependently typed assembly languages; semi-automated information retrieval from databases; proof-carrying code.

Goal: Discover a new framework for computation of information

Page 26: 5. Nguyen - Science of Information

Homotopical & Higher Algebraic Structures of Type Theory (Cont'd)

PI: S. Awodey (CMU)

Recent Developments:

• Steve Awodey & Vladimir Voevodsky (IAS) independently discovered a deep connection between dependent type theory & homotopy theory.

• Some difficulties with dependent type theory can now be circumvented and generalized for greater flexibility and consistency.

• Homotopy theory can be used as a semantics for type theory.

• Voevodsky is building up a math library using the Coq proof assistant and his homotopy λ-Calculus.

Multidisciplinary Collaboration: mathematicians, logicians, computer scientists, philosophers.

• Workshop in Germany, Feb – Mar 2011

• Special Program at IAS, 2012-2013

Page 27: 5. Nguyen - Science of Information

Program Summary

New shifts in the program:

• Infusing more ideas from computer science, mathematics, logic, statistics, and engineering.

• Focusing on the forefront of data and information analytics:

- New theory for information, perhaps beyond Shannon's Information Theory

- New paradigm for computation with/on information, beyond Turing's model

- Understanding the connection between data and information, and their control through formal models

• Forging close connection with other programs in RSL.

• Starting to reach out to AFRL/RH.

• Reaching out to the research community in Europe and Canada.