Science of Information, Computation and Fusion
15 March 2011
Tristan Nguyen
Program Manager
AFOSR/RSL
Air Force Office of Scientific Research
AFOSR
Distribution A: Approved for public release; distribution is unlimited. 88ABW-2011-0776
2011 AFOSR SPRING REVIEW
2311J/Z PORTFOLIO OVERVIEW
NAME: Tristan Nguyen
BRIEF DESCRIPTION OF PORTFOLIO:
To develop scientific foundations for information analytics from
disparate sources through the fundamental understanding of data,
structures of information, and their control for collection.
LIST SUB-AREAS IN PORTFOLIO:
• Control of Information Collection and Fusion
• High-dimensional Data Analytics for Information Discovery
• Sensor Fusion for Detection, Scheduling, Management
• Network Analysis and Coding
• Foundations of Science of Information and Computation
• Mathematics of Signal, Data, Information
Scientific Challenges
Some Technical Hurdles:
• How can meaningful information be inferred from
massive, high-dimensional data that are collected from
heterogeneous sources?
• What is “information” and how can it be computed? Is
there a new theory of information?
• Lack of understanding of the gaps between the worlds
of sensor processing and information science.
• Lack of (physical/conceptual) models to describe
information, its dynamics, and its collection process.
Transformational Opportunities
Direct Potential Payoffs:
• Advance from sensing to information and better the
collection-processing chain.
• Develop an automated reasoning capability that can
manipulate or compute information.
• Develop a new information system and new concepts
for management of information to tame the data
deluge.
• Enhance human-machine performance through
efficient, semi-autonomous algorithmic procedures.
Other Organizations That Fund Related Work
• ARO
- MURI (PM: Liyi Dai): Opportunistic Sensing for Object
and Activity Recognition from Multi-Modal, Multi-
Platform Data
- MURI (PM: Liyi Dai): Value of Information for
Distributed Data Fusion (FY11)
- MURI (PM: John Lavery): Network-based Hard/Soft
Information Fusion
- MURI (PM: Harry Chang): Quantum Stochastics and
Control (FY11)
Other Organizations That Fund Related Work (Continued)
• ONR (PM: TBD, 6.1 & 6.2)
- Information Representation, Integration, Processing
• DDR&E (Director: Robin Quinlan, 6.2)
- DoD’s Advanced Math Challenge
• NSF & DTRA (Director: Leland Jameson, PMs: Ngai
Wong, Christian Whitchurch, Brandi Vann)
- Algorithms for Threat Detection
Program Trends
• Information space formulation for control and
collection
• Foundation for a new theory of information and its
computation
• Data analytics and data fusion
• New mathematics for signal, data, information
• Sensor fusion for detection, scheduling,
management
• Network analysis and coding
FY06 MURI: Integrated Fusion, Performance Prediction and Sensor Management for ATE
Objective: An integrated theory for
adaptive ATE that simultaneously
addresses
• Information fusion
• Sensor management and control
• Directed sensor signal processing
Approach:
• Optimal, robust information fusion with graphical models and
information-theoretic performance metrics
• Adaptive front-end signal processing to extract optimal feature sets
from sparse aperture data
• Dynamic sensor resource management and control strategies for
platform trajectories to achieve objectives
Team: The Ohio State University
(Randy Moses, PI), MIT, Boston
University, University of
Michigan, University of Florida
AFRL Participation: RY, RW, RI
Recent Transitions
OSU MURI team participation in AFRL
Gotcha data collection exercises
• November 2007: Layered sensing collection
• August 2008: AFRL Radar/EO sensing
• Throughout 2009: Analysis of initial data
releases in conjunction with RY
• Data also benefits:
• Other university programs
• ATR Center research
Recent Transitions (Cont’d)
Technology Transfer
• AFOSR MURI: DARPA’s ATIF, POSSE; SBIRs with AF
• Army STTR Topic Number A10a-T020: Topological Data Analysis and Wide
Area Detection of Chemical and Biological Contamination
Program Roadmap
[Figure: the “information spectrum”, with an axis labeled “More abstract”. Labels: signal magnitudes; salient features; structures, relationships; events; behaviors, meaning; symbolic; conceptual.]
Major Themes
• Understanding of “Information Spectrum”:
- Data analytics & data fusion to extract meaningful
information
- Formulation of a new theory of information
- Find a computing science for this new theory
• Understanding how “information” can adaptively be used to
guide collection of data:
- Formalization & computation of information states
- Find a new control theory for information
Program Roadmap
Highlights of Some Projects
• Control of Information Collection and Fusion (akin to DDDAS)
• Data Analytics (beyond dimension reduction & classification)
- Ideal Point Topic Model (statistical modeling of data)
- Non-parametric Regression (algorithmic modeling)
- Topological Data Fusion (algorithmic modeling)
• Information & Computation
- Homotopical & Higher Algebraic Structures of Type Theory
Guiding Principles
• More than discovery of algorithms
• Understanding capabilities/limitations of scientific approaches
Control of Information Collection and Fusion
AFOSR FY2010 MURI TOPIC # 18
Control of Information Collection and Fusion
Goal: This MURI will forge a rigorous new perspective on
the joint control of multiple information sources of disparate
types to simultaneously achieve quantified informational and
physical objectives…
Innovative Ideas:
- Inference of global information from local measurements
- Formulation of information states
- Study the relations between information states and
physical states
- Discovery of a new framework for control of information
• Jadbabaie
• Koditschek
• Kumar
• Ribeiro
• Berkeley
• Ramachandran
• Sastry
• Tomlin
• Minnesota
• Jindal
• Roumeliotis
• Illinois
• Baryshnikov
• Melbourne
• Howard
• Moran
Control of Information Collection and Fusion (Cont’d)
Some Technical Tools:
• Shape spaces
• Path spaces
• Spaces of classification
• Piecewise linear (PL) representation
Control of Information Collection and Fusion (Cont’d)
[Figure (Friedman, arXiv ’08). Labels: Realization Functor; Singular Set Functor; Simplicial Set; Functor; PL Network; Analytic Network; “?” (twice); Attributes, Beliefs, Motivation; Agency, Biases, Data.]
Basic Questions:
1. Evolution of information states?
2. Open, closed loops?
3. Estimation & computation of information states?
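One common reading of an “information state” in stochastic control is the posterior distribution over the physical state given the measurements so far. The slides do not fix a definition, so this is an assumption. Under it, the estimation question above can be sketched as one step of a discrete Bayes filter. The two-state model, the transition matrix, and the likelihood values are hypothetical.

```python
import numpy as np

def update_information_state(belief, transition, likelihood):
    """One predict-then-correct step of a discrete Bayes filter.

    belief:     (S,) current probability of each physical state
    transition: (S, S) matrix, transition[i, j] = P(next = j | current = i)
    likelihood: (S,) P(new measurement | state), one entry per state
    """
    predicted = belief @ transition            # propagate through the dynamics
    unnormalized = predicted * likelihood      # weight by the new measurement
    return unnormalized / unnormalized.sum()   # renormalize to a distribution

# Hypothetical example: target present (index 0) or absent (index 1).
belief = np.array([0.5, 0.5])
transition = np.array([[0.9, 0.1],
                       [0.1, 0.9]])
likelihood = np.array([0.9, 0.2])              # a detection is likelier if present
posterior = update_information_state(belief, transition, likelihood)
```

Here the belief vector plays the role of the information state, and a controller would choose the next measurement based on it rather than on the unobserved physical state.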
Ideal Point Topic Model
PIs: D. Blei (Princeton U), L. Carin (Duke U)
Motivation: The ideal point model is a model devised to discover voting patterns
Goals: 1. Predict missing votes (incomplete data) or preferences on new items
2. Infer legislative behaviors based on observed roll call data
[Figure: Ideal Point Model. Observed roll call data arranged as a matrix of users/voters by items/bills. Panel label: Limitation.]
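The classical one-dimensional ideal point model can be sketched as follows, as an illustration of the goals above rather than the PIs' implementation. Each voter u has an ideal point x[u], each bill d has a discrimination a[d] and an offset b[d], and the probability of a yes vote is sigmoid(x[u] * a[d] + b[d]). The fitting routine (regularized gradient ascent), the parameter names, and the toy vote matrix are assumptions made for this sketch.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def fit_ideal_points(votes, mask, steps=2000, lr=0.05, reg=0.01, seed=0):
    """Fit x (voters), a and b (bills) by gradient ascent on the
    regularized log-likelihood, using only the observed votes.

    votes: (U, D) array of 0/1 votes
    mask:  (U, D) boolean array, True where the vote was observed
    """
    rng = np.random.default_rng(seed)
    U, D = votes.shape
    x = rng.normal(scale=0.1, size=U)
    a = rng.normal(scale=0.1, size=D)
    b = np.zeros(D)
    for _ in range(steps):
        p = sigmoid(np.outer(x, a) + b)
        err = (votes - p) * mask       # zero contribution from missing votes
        grad_x = err @ a - reg * x
        grad_a = err.T @ x - reg * a
        grad_b = err.sum(axis=0) - reg * b
        x += lr * grad_x
        a += lr * grad_a
        b += lr * grad_b
    return x, a, b

def predict_votes(x, a, b):
    """Probability of a yes vote for every voter/bill pair."""
    return sigmoid(np.outer(x, a) + b)

# Toy roll call: two blocs of three voters with opposite preferences on 8 bills.
votes = np.array([[1, 1, 1, 1, 0, 0, 0, 0]] * 3 +
                 [[0, 0, 0, 0, 1, 1, 1, 1]] * 3, dtype=float)
mask = np.ones(votes.shape, dtype=bool)
mask[0, 7] = False                     # treat voter 0's vote on bill 7 as missing
x, a, b = fit_ideal_points(votes, mask)
p = predict_votes(x, a, b)             # p[0, 7] is the predicted missing vote
```

Because the only bill-side parameters are a and b, this model cannot say anything about a bill with no recorded votes, which is the kind of gap a topic-based extension is meant to address.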
Ideal Point Topic Model (Cont’d)
PIs: D. Blei (Princeton U), L. Carin (Duke U)
New: The Ideal Point Topic Model incorporates topics that are automatically generated
[Figure: matrix of users/voters by items/bills, showing predicted missing votes and a new item.]
Goals: 1. Predict missing votes (incomplete data) or preferences on new items
2. Infer legislative behaviors based on observed roll call data
3. Associate topics with legislative behaviors
Ideal Point Topic Model (Cont’d)
PIs: D. Blei (Princeton U), L. Carin (Duke U)
Past Work: Topic Modeling (D. Blei et al., ~’03)
Goal: Given a corpus of documents, determine
the topics in it.
Ideas in Topic Modeling: The corpus consists
of documents, each of which consists of
topics, which are groups of words.
[Figure: discovered topics, with annotations for the number of words, the number of documents, and the number of topics.]
Assumptions: D, K, N are known a priori
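Assuming the past work referenced here is latent Dirichlet allocation, its generative story can be sketched as below. D, K, and N are read as the number of documents, topics, and words per document. The vocabulary size V and the Dirichlet hyperparameters alpha and eta are hypothetical values chosen for illustration.

```python
import numpy as np

def sample_corpus(D, K, N, V, alpha=0.5, eta=0.1, seed=0):
    """Draw a synthetic corpus from the LDA generative process.

    Returns:
      topics: (K, V) word distribution of each topic
      theta:  (D, K) topic proportions of each document
      docs:   (D, N) word indices in the range [0, V)
    """
    rng = np.random.default_rng(seed)
    topics = rng.dirichlet(np.full(V, eta), size=K)     # each topic: dist. over words
    theta = rng.dirichlet(np.full(K, alpha), size=D)    # each doc: dist. over topics
    docs = np.empty((D, N), dtype=int)
    for d in range(D):
        z = rng.choice(K, size=N, p=theta[d])           # topic assignment per word
        for n in range(N):
            docs[d, n] = rng.choice(V, p=topics[z[n]])  # word drawn from its topic
    return topics, theta, docs

topics, theta, docs = sample_corpus(D=5, K=3, N=20, V=50)
```

Topic modeling runs this story in reverse: given only docs, it infers topics and theta.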
Ideal Point Topic Model (Cont’d)
PIs: D. Blei (Princeton U), L. Carin (Duke U)
Innovation:
- A combination of generative and
discriminative probabilistic models is
used to automatically identify and
group variables based on observed
data.
- Modular design enables addition
of new and/or heterogeneous data
into the model to improve prediction.
General Applications:
- Versatile model for predicting
individuals and their preferences.
- Versatile model for many types of
digital data simultaneously.
[Figure labels: Generative Model & de Finetti’s Exchangeability; Discriminative Model]
Non-parametric Regression in High Dimensions
PIs: M. Wainwright, B. Yu, G. Raskutti (UCB)
Motivation: Non-parametric regression is one of the most important tools in
statistical learning theory; it is used, as an “algorithmic model”, for data
analysis (evaluation, prediction, interpolation, etc.)
Goals: 1. Derive a result in high dimensions when only a small number of data
samples is available.
2. Study information-theoretic limits.
Requirement: Efficient construction of a solution with provable optimality guarantees
Example: Discovery of linkages in social networks
• p individuals are observed over n different events, yielding a data matrix X of
dimension n x p where n << p
• Unknown graph G with edges E specifies dependencies:
• Learn structure of G by solving a sequence of convex programs
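The slide does not say which convex programs are solved. One standard choice, used here as a stand-in, is neighborhood selection: regress each variable on all the others with an l1 (lasso) penalty and keep an edge when both endpoints select each other. The ISTA solver, the penalty level lam, and the toy data are assumptions of this sketch. The toy uses n > p for a stable demonstration, unlike the n << p regime on the slide.

```python
import numpy as np

def soft_threshold(v, t):
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def lasso_ista(X, y, lam, steps=500):
    """Minimize (1/2n)||y - X b||^2 + lam * ||b||_1 by proximal gradient (ISTA)."""
    n, p = X.shape
    L = np.linalg.eigvalsh(X.T @ X / n).max()    # Lipschitz constant of the gradient
    b = np.zeros(p)
    for _ in range(steps):
        grad = X.T @ (X @ b - y) / n
        b = soft_threshold(b - grad / L, lam / L)
    return b

def neighborhood_selection(X, lam):
    """Solve one lasso per node; keep edge (i, j) if each node selects the other."""
    X = (X - X.mean(axis=0)) / X.std(axis=0)     # standardize the columns
    n, p = X.shape
    support = np.zeros((p, p), dtype=bool)
    for j in range(p):
        others = [k for k in range(p) if k != j]
        coef = lasso_ista(X[:, others], X[:, j], lam)
        support[j, others] = np.abs(coef) > 1e-8
    return {(i, j) for i in range(p) for j in range(i + 1, p)
            if support[i, j] and support[j, i]}

# Toy data: variables 0 and 1 are strongly dependent, the other three are independent.
rng = np.random.default_rng(0)
n = 400
x0 = rng.normal(size=n)
x1 = x0 + 0.1 * rng.normal(size=n)
X = np.column_stack([x0, x1, rng.normal(size=(n, 3))])
edges = neighborhood_selection(X, lam=0.3)
```

Each node-wise regression is a separate convex program, so the p problems can be solved independently.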
Non-parametric Regression in High Dimensions (Cont’d)
PIs: M. Wainwright, B. Yu, G. Raskutti (UCB)
Theme: Compressive Sensing
• High Dimension (p)
• Small Sample Size (n << p)
• Sparsity, Low-dimensional Structure (s)
• Complexity of Algorithmic Model
Innovative Approach
Assume: unknown $f^*(x) = \sum_{j \in S} f_j^*(x_j)$, where |S| = s and each $f_j^*$ is univariate
Procedure:
Non-parametric Regression in High Dimensions (Cont’d)
PIs: M. Wainwright, B. Yu, G. Raskutti (UCB)
Simulation & Empirical Validation of Theorem
Note: in practice, p >> n >> log(p)
asserts how much data is needed
Topological Methods for Data Fusion
PI: G. Carlsson et al. (Stanford U)
Data Analysis tasks and the corresponding Mathematical Tools:
• Probing local measurements of abstract entities in an environment; quantification
- Sheaves of sets, functions, maps, mathematical structures on a topological space; valuation
• Source separation using different sensing modes or measurements
- Maps between topological spaces and their fibers
• Semantic representation and reasoning
- Formal semantics, logic, type theory
• Fusion or integration (local-to-global, Gestalt), integration of modular components
- Gluing, colimits, invariants (spatial, topological, metric, etc.)
• Invariants of an object (Gestalt), spaces of more abstract structures
- Configuration spaces, moduli spaces
• Multi-scale interpretation
- Persistent topology, multiscale analysis
Goal: To fuse data sets for analytics tasks and information extraction
Idea: Pure mathematics offers “natural” tools at higher-level analysis
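As a small illustration of the persistent topology and multiscale analysis mentioned above, the sketch below computes zero-dimensional persistence for a point cloud: every point starts as its own connected component at scale 0, and a component dies at the scale where it merges into another. The function name and the union-find approach are choices made for this sketch, not taken from the project.

```python
from itertools import combinations
import math

def h0_persistence(points):
    """Death scales of connected components in the Vietoris-Rips filtration.

    All components are born at scale 0. One component never dies, so a
    cloud of n points yields n - 1 finite death scales, in increasing order.
    """
    n = len(points)
    parent = list(range(n))

    def find(i):
        # Union-find root lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # Process pairwise distances from the smallest scale upward.
    edges = sorted((math.dist(points[i], points[j]), i, j)
                   for i, j in combinations(range(n), 2))
    deaths = []
    for dist, i, j in edges:
        ri, rj = find(i), find(j)
        if ri != rj:              # this edge merges two components
            parent[ri] = rj
            deaths.append(dist)
    return deaths

# Two tight pairs of points, far apart from each other.
cloud = [(0.0, 0.0), (0.0, 1.0), (5.0, 0.0), (5.0, 1.0)]
deaths = h0_persistence(cloud)
```

The large gap between the small death scales and the last one is the multiscale signal that the cloud has two clusters.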
Topological Methods for Data Fusion (Cont’d)
PI: G. Carlsson et al. (Stanford U)
Ontology Integration
Current Practices in Data Fusion or Integration:
1. Matching of points or concepts via geometric or graph methods.
2. Lack of ideas for merging previously matched concepts.
3. Ideas in integration of ontologies and database schemas are not
sufficient tools to handle digital data (a great deal of humans in
the loop).
4. Lack of objective qualitative or quantitative measurements of
performance.
Topological Methods for Data Fusion (Cont’d)
PI: G. Carlsson (Stanford U)
[Figure panels: Refinement & M-Alignment of Ontologies (Mossakowski et al., ’10); Fusion of overlapped samples & topological bootstrapping]
Innovation: 1. Matching and merging of concepts and points in data
2. Analysis of the effects of maps between datasets
3. Introduction of refinement via multi-resolutions
4. Incorporation of ideas in statistical learning to improve
refinement
5. Introduction of means to ensure quality of performance
6. Analysis of possible obstructions to fusion
Homotopical & Higher Algebraic Structures of Type Theory
PI: S. Awodey (CMU)
Past Scientific Discoveries:
Curry-Howard Correspondence:
• Computation (Simply Typed λ-Calculus) ↔ Propositional Logic
- Types ↔ Formulas
- Terms ↔ Proofs
- Term Reductions ↔ Transformations of Proofs
Type theory encodes logical structures & their interpretation
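The correspondence can be made concrete in a proof assistant. The Lean 4 sketch below is an illustration only (the slides discuss Coq, and these examples are not from the project): each proposition is a type, and each proof is a term of that type.

```lean
-- A proof of A → (B → A) is the function that returns its first argument.
example (A B : Prop) : A → B → A :=
  fun a _ => a

-- Modus ponens is function application.
example (A B : Prop) (f : A → B) (a : A) : B :=
  f a

-- Commutativity of conjunction is swapping the components of a pair.
example (A B : Prop) : A ∧ B → B ∧ A :=
  fun ⟨a, b⟩ => ⟨b, a⟩
```

In each case, type-checking the term is the same act as verifying the proof.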
• Martin-Löf originated a constructive (dependent) type theory.
• Thierry Coquand implemented a dependent type theory in Coq, a computer
verification system & proof assistant.
Some Applications: Interactive/automated theorem provers are used in
formal mathematical proof, formal verification, automated reasoning;
dependently typed assembly languages; semi-automated information retrieval
from databases; proof-carrying code.
Goal: Discover a new framework for computation of information
Homotopical & Higher Algebraic Structures of Type Theory (Cont’d)
PI: S. Awodey (CMU)
Recent Developments:
• Steve Awodey & Vladimir Voevodsky (IAS) independently
discovered a deep connection between dependent type theory &
homotopy theory.
• Some difficulties with dependent type theory can now be
circumvented and generalized for greater flexibility and consistency.
• Homotopy theory can be used as a semantics for type theory.
• Voevodsky is building up a math library using the Coq proof
assistant and his homotopy λ-Calculus.
Multidisciplinary Collaboration: mathematicians, logicians,
computer scientists, philosophers.
• Workshop in Germany, Feb – Mar 2011
• Special Program at IAS, 2012-2013
Program Summary
New shifts in the program:
• Infusing more ideas from computer science, mathematics, logic,
statistics, and engineering.
• Focusing on the forefront of data and information analytics:
- New theory for information, perhaps beyond Shannon’s
Information Theory
- New paradigm for computation with/on information, beyond
Turing’s model
- Understanding the connection between data and information,
and their control through formal models
• Forging close connections with other programs in RSL.
• Starting to reach out to AFRL/RH.
• Reaching out to the research community in Europe and Canada.