Transcript
Page 1: NSF-Relevant Challenges  in Computational Intelligence

Carnegie Mellon School of Computer Science

NSF-Relevant Challenges in Computational Intelligence

Jaime Carbonell ([email protected]) & Tom Mitchell, Guy Blelloch, Randy Bryant, et al.

School of Computer Science, Carnegie Mellon University

26-April-2007

I) Major Computational Intelligence Research Areas

II) Next-Generation Infrastructure (DISC)

Page 2: NSF-Relevant Challenges  in Computational Intelligence

Computational Intelligence

• Machine Learning
  Inductive learning algorithms, active learning
  Data mining & novel pattern detection

• Language Technologies
  Multilingual & next-generation search engines
  Machine translation (e.g., Arabic-English)

• Perception
  Computer vision, tactile sensing (e.g., in robotics)

• Planning & optimizing
  Reasoning & planning under uncertainty
  Non-linear optimization (beyond O.R.) w/uncertainty

• Key scientific applications
  Proteomics, genomics, computational biology
  Modeling human brain functions

Page 3: NSF-Relevant Challenges  in Computational Intelligence

Machine Learning

Object recognition

Data Mining

Speech Recognition

Automated Control learning

• Reinforcement learning

• Predictive modeling

• Pattern discovery

• Hidden Markov models

• Convex optimization

• Explanation-based learning

• ....

Extracting facts from text

Page 4: NSF-Relevant Challenges  in Computational Intelligence

Leveraging Existing Data Collecting Systems: 1999 Influenza Outbreak [Moore, 2002]

[Figure: signals plotted by week (1999-2000): influenza cultures, sentinel physicians, WebMD queries about ‘cough’ etc., school absenteeism, sales of cough and cold meds, sales of cough syrup, ER respiratory complaints, ER ‘viral’ complaints, influenza-related deaths]

Page 5: NSF-Relevant Challenges  in Computational Intelligence

Cluster Evolution and Density Change Detection: $d^2 F(r(t))/dt^2$

[Figure panels: Constant Event; New Unobfuscated Event; New Obfuscated Event; Growing Event]
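One way to read the second-derivative criterion on this slide is as a discrete second difference over a time series of cluster densities. The following is a minimal sketch, assuming F is a scalar density summary sampled at evenly spaced times; the function names, the sample readings, and the threshold are hypothetical and do not come from the slides.

```python
def second_difference(values, dt=1.0):
    """Central finite-difference estimate of d^2F/dt^2 at interior points."""
    return [
        (values[i - 1] - 2.0 * values[i] + values[i + 1]) / (dt * dt)
        for i in range(1, len(values) - 1)
    ]


def flag_density_changes(values, threshold, dt=1.0):
    """Return indices into values where |d^2F/dt^2| exceeds threshold."""
    accel = second_difference(values, dt)
    # accel[k] corresponds to values[k + 1], hence the index shift.
    return [k + 1 for k, a in enumerate(accel) if abs(a) > threshold]


# Hypothetical density readings: flat, then a sudden jump (a "new event").
density = [1.0, 1.0, 1.0, 1.0, 4.0, 4.0, 4.0]
flagged = flag_density_changes(density, threshold=1.0)
print(flagged)
```

A constant or steadily growing density yields a second difference near zero, so only abrupt changes in the growth rate are flagged.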

Page 6: NSF-Relevant Challenges  in Computational Intelligence

Classifier = Rocchio, Topic = Civil War (R76 in TREC10), Threshold = MLR

MLR threshold function: locally linear, globally non-linear
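For readers unfamiliar with the classifier named here, the following is a minimal sketch of a Rocchio-style prototype classifier with a decision threshold. It assumes documents are already dense term-weight vectors; the weights `beta` and `gamma`, the toy vectors, and the fixed threshold are hypothetical. In particular, the constant threshold is only a stand-in: the MLR threshold function itself is not specified in the slides and is not implemented here.

```python
import math


def centroid(vectors):
    """Component-wise mean of a list of equal-length vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def rocchio_prototype(positives, negatives, beta=1.0, gamma=0.5):
    """Prototype = beta * mean(positives) - gamma * mean(negatives)."""
    pos_c, neg_c = centroid(positives), centroid(negatives)
    return [beta * p - gamma * q for p, q in zip(pos_c, neg_c)]


def cosine(u, v):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def accept(doc, prototype, threshold):
    """Accept the document if its similarity to the prototype clears the threshold."""
    return cosine(doc, prototype) >= threshold


# Hypothetical two-term vectors: on-topic documents load on the first term.
prototype = rocchio_prototype([[1.0, 0.0], [0.8, 0.2]], [[0.0, 1.0]])
print(accept([1.0, 0.0], prototype, threshold=0.5))
print(accept([0.0, 1.0], prototype, threshold=0.5))
```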

Page 7: NSF-Relevant Challenges  in Computational Intelligence

Info-Age Bill of Rights

• Get the right information: Search Engines

• To the right people: Personalization

• At the right time: Anticipatory Analysis

• On the right medium: Speech Recognition

• In the right language: Machine Translation

• With the right level of detail: Summarization

Page 8: NSF-Relevant Challenges  in Computational Intelligence

MMR vs Current Search Engines

[Figure: a query and surrounding documents, with the MMR and IR selection paths; λ controls spiral curl]
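The role of λ can be made concrete with a small sketch, assuming MMR here refers to maximal marginal relevance: each step picks the document that maximizes λ times its similarity to the query minus (1 − λ) times its maximum similarity to anything already selected. The function name and the toy similarity values below are hypothetical.

```python
def mmr_rerank(query_sim, doc_sim, k, lam=0.7):
    """Greedy maximal-marginal-relevance selection.

    query_sim[i]  : similarity of document i to the query
    doc_sim[i][j] : similarity between documents i and j
    lam           : 1.0 = pure relevance ranking, lower values favor novelty
    """
    selected = []
    candidates = list(range(len(query_sim)))
    while candidates and len(selected) < k:
        def score(i):
            # Redundancy is the closest match among documents already chosen.
            redundancy = max((doc_sim[i][j] for j in selected), default=0.0)
            return lam * query_sim[i] - (1.0 - lam) * redundancy
        best = max(candidates, key=score)
        selected.append(best)
        candidates.remove(best)
    return selected


# Hypothetical data: documents 0 and 1 are near-duplicates, document 2 is distinct.
query_sim = [0.9, 0.85, 0.5]
doc_sim = [
    [1.0, 0.95, 0.1],
    [0.95, 1.0, 0.1],
    [0.1, 0.1, 1.0],
]
print(mmr_rerank(query_sim, doc_sim, k=3, lam=1.0))
print(mmr_rerank(query_sim, doc_sim, k=3, lam=0.5))
```

With λ = 1.0 the order is plain relevance ranking; with λ = 0.5 the distinct document is promoted ahead of the near-duplicate.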

Page 9: NSF-Relevant Challenges  in Computational Intelligence

Types of Machine Translation

[Figure: MT pyramid from Source (Arabic) to Target (English), with levels Direct (SMT, EBMT), Transfer Rules, and Interlingua; stages labeled Syntactic Parsing, Semantic Analysis, Sentence Planning, Text Generation]

Direct: SMT, EBMT (requires massive data resources)

Page 10: NSF-Relevant Challenges  in Computational Intelligence

2005 NIST Arabic-English MT

• Interlingual MT
  Grammars, semantics
  Best for focused domains

• Corpus-Based MT
  Pre-translated text (10-200M words)
  Target language text (100M – 1 Trillion words)
  Best for general MT

• Context-Based MT
  Improved variant of corpus-based MT
  Perfect client for DISC

[Figure: bar chart of BLEU score (axis 0.0 to 0.7) by system: Google, ISI, IBM + CMU, UMD, JHU-CU, Edinburgh, Systran, Mitre, FSC. Quality regions labeled on the chart: Topic Identification, Human Editable translation, Usable translation, Expert Human translator, Useless Region]

Page 11: NSF-Relevant Challenges  in Computational Intelligence

Arabic Statistical-MT Output

بكين 17 يناير / شينخوا / حث مسئولون صينيون وروس جميع الاطراف المعنية علي " التزام الهدوء وممارسة ضبط النفس " بشان القضية النووية الخاصة بجمهورية كوريا الديمقراطية الشعبية .

وقد التقي نائب وزير الخارجية الصيني يانغ ون تشانغ ونائب وزير الخارجية الروسي الكسندر لوسيوكوف علي مادبة غداء حيث دعيا الاطراف المعنية الي مواصلة السعي من اجل الحل السلمي من خلال الحوار في ظل الوضع المعقد الحالي .

Beijing January 17 / Shinhua / the Chinese and Russian officials urged all parties concerned to " remain calm and exercise restraint " over the nuclear issue of the Democratic People's Republic of Korea.

He met with vice Chinese foreign minister Yang Chang won the deputy of the Russian foreign minister Alexander Losyukov at a lunch with invited interested parties to continue the search for a peaceful solution through dialogue under the current complicated situation.

BLEU = .64
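To give a feel for what a BLEU figure measures, here is a simplified sketch: the geometric mean of clipped n-gram precisions (n = 1 to 4) multiplied by a brevity penalty. It assumes whitespace tokenization, a single reference, and no smoothing, so it will not reproduce the score reported on the slide; evaluation campaigns typically compute BLEU over a whole test set with multiple references. The function names and example sentences are hypothetical.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """All contiguous n-grams of a token list, as tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def simple_bleu(candidate, reference, max_n=4):
    """Single-reference, unsmoothed, sentence-level BLEU."""
    cand, ref = candidate.split(), reference.split()
    log_precision_sum = 0.0
    for n in range(1, max_n + 1):
        cand_counts = Counter(ngrams(cand, n))
        ref_counts = Counter(ngrams(ref, n))
        # Clip each candidate n-gram count by its count in the reference.
        overlap = sum(min(c, ref_counts[g]) for g, c in cand_counts.items())
        total = sum(cand_counts.values())
        if overlap == 0 or total == 0:
            return 0.0
        log_precision_sum += math.log(overlap / total)
    # Brevity penalty discourages candidates shorter than the reference.
    bp = 1.0 if len(cand) >= len(ref) else math.exp(1.0 - len(ref) / len(cand))
    return bp * math.exp(log_precision_sum / max_n)


reference = "the officials urged all parties to remain calm"
candidate = "the officials urged all sides to remain calm"
score = simple_bleu(candidate, reference)
print(round(score, 3))
```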

Page 12: NSF-Relevant Challenges  in Computational Intelligence

What About Minor Languages or Dialects without Massive Data?

Page 13: NSF-Relevant Challenges  in Computational Intelligence

Primary Sequence

MNGTEGPNFY VPFSNKTGVV RSPFEAPQYY LAEPWQFSML AAYMFLLIML GFPINFLTLY VTVQHKKLRT PLNYILLNLA VADLFMVFGG FTTTLYTSLH GYFVFGPTGC NLEGFFATLG GEIALWSLVV LAIERYVVVC KPMSNFRFGE NHAIMGVAFT WVMALACAAP PLVGWSRYIP EGMQCSCGID YYTPHEETNN ESFVIYMFVV HFIIPLIVIF FCYGQLVFTV KEAAAQQQES ATTQKAEKEV TRMVIIMVIA FLICWLPYAG VAFYIFTHQG SDFGPIFMTI PAFFAKTSAV YNPVIYIMMN KQFRNCMVTT LCCGKNPLGD DEASTTVSKT ETSQVAPA

3D Structure

Folding

Complex function within network of proteins

Normal

PROTEINS: Sequence → Structure → Function

(Borrowed from: Judith Klein-Seetharaman)

Page 14: NSF-Relevant Challenges  in Computational Intelligence

Primary Sequence

MNGTEGPNFY VPFSNKTGVV RSPFEAPQYY LAEPWQFSML AAYMFLLIML GFPINFLTLY VTVQHKKLRT PLNYILLNLA VADLFMVFGG FTTTLYTSLH GYFVFGPTGC NLEGFFATLG GEIALWSLVV LAIERYVVVC KPMSNFRFGE NHAIMGVAFT WVMALACAAP PLVGWSRYIP EGMQCSCGID YYTPHEETNN ESFVIYMFVV HFIIPLIVIF FCYGQLVFTV KEAAAQQQES ATTQKAEKEV TRMVIIMVIA FLICWLPYAG VAFYIFTHQG SDFGPIFMTI PAFFAKTSAV YNPVIYIMMN KQFRNCMVTT LCCGKNPLGD DEASTTVSKT ETSQVAPA

3D Structure

Folding

Complex function within network of proteins

Disease

PROTEINS: Sequence → Structure → Function

Page 15: NSF-Relevant Challenges  in Computational Intelligence

Predicting Protein Structures

• Protein structure is a key determinant of protein function

• Crystallography to resolve protein structures experimentally in vitro is very expensive; NMR can only resolve very small proteins

• The gap between the known protein sequences and structures: 3,023,461 sequences vs. 36,247 resolved structures (1.2%)

• Therefore we need to predict structures in silico

Page 16: NSF-Relevant Challenges  in Computational Intelligence

Linked Segmentation CRF

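• Node: secondary structure elements and/or simple fold

• Edges: local interactions and long-range inter-chain and intra-chain interactions

• L-SCRF: conditional probability of y given x is defined as

$$P(\mathbf{y}_1,\ldots,\mathbf{y}_R \mid \mathbf{x}_1,\ldots,\mathbf{x}_R) = \frac{1}{Z} \prod_{y_{i,j} \in V_G} \exp\Big(\sum_{k} \theta_{1,k}\, f_k(\mathbf{x}_i, y_{i,j})\Big) \prod_{\langle y_{i,j},\, y_{a,b} \rangle \in E_G} \exp\Big(\sum_{l} \theta_{2,l}\, g_l(\mathbf{x}_i, \mathbf{x}_a, y_{i,j}, y_{a,b})\Big)$$

where $Z$ is the normalizer, $V_G$ and $E_G$ are the nodes and edges of the graph $G$, and $f_k$, $g_l$ are node and edge feature functions with weights $\theta_{1,k}$, $\theta_{2,l}$.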

Joint Labels

Page 17: NSF-Relevant Challenges  in Computational Intelligence

Fold Alignment Prediction: β-Helix

• Predicted alignment for known β-helices on cross-family validation

Page 18: NSF-Relevant Challenges  in Computational Intelligence

fMRI to observe human brain activity

Machine learning to discover patterns in complex data

New discoveries about human brain function

Our algorithms have learned to distinguish whether a human subject is reading a word, e.g. ‘tools’ or ‘buildings’, with 90% accuracy

Data

Page 19: NSF-Relevant Challenges  in Computational Intelligence

Requisite Infrastructure

• Data Intensive SuperComputing (DISC) for tera-scale and peta-scale data repositories

• Advanced algorithms research
  Massively-parallel decomposition
  Scalability in analytics & learning
  Extracting compact models for run-time
  Planning, reasoning, learning w/uncertainty
  Active Learning (maximally reducing uncertainty)

• Domain expertise (e.g. proteomics, neural sciences, astronomy, network security, …)
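The active learning item in the list above can be illustrated with uncertainty sampling, a common and simple proxy for "maximally reducing uncertainty": query the label of the unlabeled example whose predicted class distribution has the highest entropy. This is a minimal sketch under that assumption, not the specific method the authors have in mind; the function names and the pool of predicted probabilities are hypothetical.

```python
import math


def entropy(probs):
    """Shannon entropy (in bits) of a discrete distribution."""
    return -sum(p * math.log2(p) for p in probs if p > 0.0)


def most_uncertain(pool_probs):
    """Index of the unlabeled example with the highest predictive entropy."""
    return max(range(len(pool_probs)), key=lambda i: entropy(pool_probs[i]))


# Hypothetical class probabilities from some current model, one row per
# unlabeled example.
pool = [
    [0.95, 0.05],
    [0.55, 0.45],
    [0.80, 0.20],
]
query_index = most_uncertain(pool)
print(query_index)
```

The selected example would be sent to a human labeler, the model retrained, and the loop repeated.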

Page 20: NSF-Relevant Challenges  in Computational Intelligence

System Comparison: Data

DISC

System collects and maintains data
• Shared, active data set

Computation colocated with storage
• Faster access

Conventional Supercomputers

Data stored in separate repository
• No support for collection or management

Brought into system for computation
• Time consuming
• Limits interactivity

Page 21: NSF-Relevant Challenges  in Computational Intelligence

Program Model Comparison

DISC

Application programs written in terms of high-level operations on data

Runtime system controls scheduling, load balancing, …

[Figure: layered stack: Hardware, Machine-Independent Programming Model, Runtime System, Application Programs]

Conventional Supercomputers

Programs described at very low level
• Specify detailed control of processing & communications

Rely on small # of software packages
• Written by specialists
• Limits classes of problems & solution methods

[Figure: layered stack: Hardware, Machine-Dependent Programming Model, Software Packages, Application Programs]
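The DISC side of this comparison (application code written as high-level operations on data, with a runtime that owns scheduling) can be sketched with a toy map/reduce helper. This is only an illustration using Python's standard thread pool on one machine; it is not a DISC API, and `run_map_reduce`, `count_words`, `merge_counts`, and the sample partitions are all hypothetical names and data.

```python
from concurrent.futures import ThreadPoolExecutor
from functools import reduce


def run_map_reduce(partitions, map_fn, reduce_fn, initial, workers=4):
    """Toy 'runtime': applies map_fn to every partition via a worker pool,
    then folds the partial results with reduce_fn. The caller never
    schedules work or moves data explicitly."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(map_fn, partitions))
    return reduce(reduce_fn, partials, initial)


def count_words(lines):
    """Map step: word counts for one data partition."""
    counts = {}
    for line in lines:
        for word in line.split():
            counts[word] = counts.get(word, 0) + 1
    return counts


def merge_counts(a, b):
    """Reduce step: combine two partial word-count dictionaries."""
    merged = dict(a)
    for word, n in b.items():
        merged[word] = merged.get(word, 0) + n
    return merged


# Hypothetical data set, already split into partitions.
partitions = [
    ["machine learning", "machine translation"],
    ["active learning"],
]
totals = run_map_reduce(partitions, count_words, merge_counts, {})
print(totals["machine"], totals["learning"])
```

The application supplies only `count_words` and `merge_counts`; how many workers run and which partition goes where is left to the helper, which is the division of labor the slide contrasts with low-level control of processing and communications.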

Page 22: NSF-Relevant Challenges  in Computational Intelligence

Final Thoughts

• Opportunities in Computational Intelligence
  Machine learning for tough problems: relevant novelty detection, structural learning, active learning
  Scientific applications: Computational X (X = biology, linguistics, astrophysics, chemistry, …)

• Next generation computational infrastructure
  DISC principle (beyond HPC, beyond grid, …)
  Algorithmic fundamentals

• International programs (on common problems)

