Exploring Massive Learning via a Prediction System
Omid Madani, Yahoo! Research, www.omadani.net

Upload: nathaniel-dean

Post on 21-Jan-2016


Page 1

Exploring Massive Learning via a Prediction System

Omid Madani

Yahoo! Research www.omadani.net

Page 2

Goal

Convey a taste of:
• the motivations/considerations/assumptions/speculations/hopes, …
• the game, a first system, and its algorithms

Page 3

Talk Overview

1. Motivational part

2. The approach:
  • The game (categories, …)
  • Algorithms
  • Some experiments

Page 4

Fill in the Blank(s)!

Would ---- like ------ ------- ----- ------ ?

(Filled in: “Would you like your coffee with sugar?”)

Page 5

What is this object?

Page 6

“Well, categorization is one of the most basic functions of living creatures. We live in a categorized world – table, chair, male, female, democracy, monarchy – every object and event is unique, but we act towards them as members of classes.” From an interview with Eleanor Rosch (psychologist, a pioneer on the phenomenon of “basic level” concepts)

“Concepts are the glue that holds our mental world together.” From “The Big Book of Concepts”, Gregory Murphy

Categorization is Fundamental!

Page 7

“Rather, the formation and use of categories is the stuff of experience.”

Philosophy in the Flesh, Lakoff and Johnson.

Page 8

Two Questions Arise

• Repeated and rapid classification…
• … in the presence of myriad classes

[Figure: a classification system receives instances x(1), x(2), … in turn and must categorize each new instance x]

In the presence of myriad categories:
1. How to categorize efficiently?
2. How to efficiently learn to categorize efficiently?

Page 9

Now, a 3rd Question ..

• How can so many inter-related categories be acquired?

• Programming them is unlikely to succeed or scale:
  • Limits of our explicit/conscious knowledge
  • Unknown/unfamiliar domains
  • The required scale
  • Making the system operational

Page 10

Learn? … How?

• “Supervised” learning (explicit human involvement) is likely inadequate:
  • The required scale, or a good signpost:
    • ~millions of categories and beyond
    • billions of weights and beyond
  • Inaccessible “knowledge” (see last slide!)
• Other approaches likely do not meet the needs (incomplete, different goals, etc.): active learning, semi-supervised learning, clustering, density learning, RL, etc.

Page 11

Desiderata/Requirements (or Speculations)

• Higher intelligence, such as “advanced” pattern recognition/generation (e.g. vision), may require:
  • Long-term learning (weeks, months, years, …)
  • Cumulative learning (learn these first, then these, then these, …)
  • Massive learning: myriad inter-related categories/concepts
  • Systems learning
  • Autonomy (relatively little human involvement)

What’s the learning task?

Page 12

This Work: An Exploration

• An avenue: “prediction games in infinitely rich worlds”

• Exciting part:
  • The world provides an unbounded learning opportunity! (The world is the validator; the system is the experimenter and actively builds much of its own concepts.)
  • The world enjoys many regularities (e.g. “hierarchical”)
• Based in part on “supervised” techniques! (“discriminative”, “feedback driven”; the supervisory signal doesn’t originate from humans)

Page 13

In a Nutshell

[Figure: a stream (…. 0011101110000….) feeds a Prediction System that repeatedly predicts, then observes & updates. At first it works with low-level or “hard-wired” categories (Text: characters, … Vision: edges, curves, …). After a while (much learning), it works with higher-level categories, i.e. bigger chunks (e.g. words, digits, phrases, phone numbers, faces, visual objects, home pages, sites, …).]

Page 14

The Game

• Repeat:
  • Hide part(s) of the stream
  • Predict (using context)
  • Update
  • Move on
• Objective: predict better, subject to efficiency constraints
• In the process, categories at different levels of size and abstraction should be learned
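The game loop can be sketched in a few lines. This is a minimal illustration under stated assumptions. The stream is segmented into single characters, since the first system begins at character level. The predictor is a hypothetical count-based model that uses only the previous item as context. The actual system learns and grows its own categories and uses context on both sides, so `CountPredictor` and `play_game` are illustrative names, not the system's API.

```python
from collections import Counter, defaultdict


class CountPredictor:
    """Hypothetical predictor: guesses the hidden item from the previous item (the context)."""

    def __init__(self):
        # counts[context][target] = how often `target` followed `context` so far
        self.counts = defaultdict(Counter)

    def predict(self, context):
        seen = self.counts[context]
        return seen.most_common(1)[0][0] if seen else None

    def update(self, context, target):
        self.counts[context][target] += 1


def play_game(stream, predictor):
    """Repeat: hide the next item, predict it from context, update, move on.

    Returns the number of correct predictions.
    """
    correct = 0
    for i in range(1, len(stream)):
        context, target = stream[i - 1], stream[i]  # stream[i] is the hidden part
        if predictor.predict(context) == target:
            correct += 1
        predictor.update(context, target)  # the world supplies the feedback
    return correct
```

On the stream "abababab", the first two guesses have nothing to go on. The remaining five are correct, so `play_game("abababab", CountPredictor())` returns 5.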

Page 15

Research Goals

• Conjecture: There is much value to be attained from this task

• Beyond language modeling: more advanced pattern recognition/generation

• If so, it should yield a wealth of new problems (=> fun)

Page 16

Overview

• Goal: Convey a taste of the motivations/considerations, the system, and its algorithms
• Motivation
• The approach:
  • The game (categories, …)
  • Algorithms
  • Some experiments

Page 17

Upshot

• Takes streams of text
• Makes categories (strings)
• Approximately three hours on 800k documents
• Large-scale discriminative learning (evidence it does better than language modeling)

Page 18

Caveat Emptor!

• Exploratory research

• Many open problems (many I’m not aware of … )

• The chosen algorithms, system organization, objective/performance measures, etc. are likely not even near the best possible

Page 19

Categories

• Building blocks (atoms!) of intelligence?

• Patterns that frequently occur:
  • External
  • Internal
• Useful for predicting other categories!
• They can have structure/regularities, in particular:
  1. Composition (~conjunctions) of other categories (Part-Of)
  2. Grouping (~disjunctions) (Is-A relations)

Page 20

Categories

• Low-level “primitive” examples: 0 and 1, or characters (“a”, “b”, …, “0”, “-”, …)
  • Provided to the system (easy to detect)
• Higher/composite levels:
  • Sequences of bits/characters
  • Words
  • Phrases
  • More general: phone number, contact info, resume, …

Page 21

Example Concept

• Area code is a concept that involves both composition and grouping:
  • Composition of 3 digits
  • A digit is a grouping, i.e., the set {0, 1, 2, …, 9} (2 is a digit)
• Other example concepts: phone number, address, resume page, face (in the visual domain), etc.
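The two structural relations in the area-code example can be made concrete with a small sketch. It assumes, for illustration only, that a grouping matches any one of its members and a composition matches its parts in sequence. `Grouping`, `Composition`, and `match` are hypothetical names. Grouping is not part of the first system described here.

```python
from dataclasses import dataclass
from typing import Tuple, Union


@dataclass(frozen=True)
class Grouping:
    """Is-A concept (~disjunction): matches any one of its members."""
    name: str
    members: frozenset

    def match(self, text: str, pos: int) -> int:
        """Return the end position of a match starting at `pos`, or -1 (first member found wins)."""
        for member in self.members:
            if text.startswith(member, pos):
                return pos + len(member)
        return -1


@dataclass(frozen=True)
class Composition:
    """Part-Of concept (~conjunction): matches its parts one after another."""
    name: str
    parts: Tuple["Concept", ...]

    def match(self, text: str, pos: int) -> int:
        for part in self.parts:
            pos = part.match(text, pos)
            if pos < 0:
                return -1
        return pos


Concept = Union[Grouping, Composition]

DIGIT = Grouping("digit", frozenset("0123456789"))   # the set {0, 1, ..., 9}
AREA_CODE = Composition("area code", (DIGIT, DIGIT, DIGIT))
```

For example, `AREA_CODE.match("212-555-0100", 0)` returns 3, and `DIGIT.match("2", 0)` returns 1 (2 is a digit).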

Page 22

Again, our goal, informally, is to build a system that acquires millions of useful concepts on its own.

Page 23

Questions for a First System

• Functionality? Architecture? Organization?
• Would many-class learning scale to millions of concepts?
• Choice of concept-building methods?
• How would various learning processes interact?

Page 24

Expedition: a First System

• Plays the game in text

• Begins at character level

• No segmentation, just a stream

• Makes and predicts larger sequences, via composition

• No grouping yet

Page 25

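Learning Episodes

[Figure: a window containing context and target moves over the stream “… New Jersey in …”. The target is the category to predict; the predictors (active categories) are the context categories around it. At the next time step the window moves forward, giving new predictors and a new target.]

In this example, the context contains one category on each side of the target.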
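Extracting learning episodes from an already-segmented stream can be sketched as follows. This is a minimal illustration, assuming the segmentation into categories is given. The slides stress that the segmentation itself changes as the system learns. `learning_episodes` and the window half-width `k` are hypothetical names, and `k=1` matches the one-category-per-side example.

```python
from typing import Iterator, List, Sequence, Tuple


def learning_episodes(categories: Sequence[str], k: int = 1) -> Iterator[Tuple[List[str], str]]:
    """Yield (predictors, target) pairs, with k context categories on each side of the target."""
    for i in range(k, len(categories) - k):
        predictors = list(categories[i - k:i]) + list(categories[i + 1:i + 1 + k])
        yield predictors, categories[i]
```

With a hypothetical segmentation `["loves", " New", " Jersey", " in"]`, the first episode has predictors `["loves", " Jersey"]` and target `" New"`. The next time step has predictors `[" New", " in"]` and target `" Jersey"`.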

Page 26

.. Some Time Later ..

[Figure: a window containing context and target over the stream “… loves New York life …”, with the predictors around the target (the category to predict)]

In terms of supervised learning/classification, in this learning activity (prediction games):
• The set of concepts grows over time
• Same for features/predictors (concepts ARE the predictors!)
• The instance representation (segmentation of the data stream) changes/grows over time

Page 27

Prediction/Recall

x = {f2, f3}

[Figure: bipartite graph with features f1 to f4 on one side and categories c1 to c5 on the other, joined by weighted edges (weights .04, .03, .02, .01, .01)]

1. Features are “activated”
2. Edges are activated
3. Receiving categories are activated
4. Categories are sorted/ranked

Sorted list: (c4, .05), (c3, .04), (c5, .01), (c1, .01)

1. Like the use of inverted indices
2. Sparse dot products
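The four recall steps amount to a sparse dot product over an inverted index. The sketch below assumes each feature stores a dictionary from category to edge weight. `recall` and `Index` are hypothetical names, and the weights in the usage example are made up, not taken from the slide.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

# Hypothetical inverted index: feature -> {category: edge weight}
Index = Dict[str, Dict[str, float]]


def recall(index: Index, active_features: Iterable[str]) -> List[Tuple[str, float]]:
    """Score and rank the categories reachable from the active features.

    Only edges leaving active features are touched, so the cost depends on
    those features' degrees, not on the total number of categories.
    """
    scores: Dict[str, float] = defaultdict(float)
    for f in active_features:                      # 1. features are activated
        for c, w in index.get(f, {}).items():      # 2. their edges are activated
            scores[c] += w                         # 3. receiving categories accumulate weight
    return sorted(scores.items(), key=lambda cw: cw[1], reverse=True)  # 4. rank
```

For example, with `index = {"f1": {"c2": 0.5}, "f2": {"c3": 0.5, "c4": 0.25}, "f3": {"c4": 0.5, "c1": 0.125}}`, `recall(index, ["f2", "f3"])` ranks c4 (0.75) first, then c3 (0.5), then c1 (0.125). Category c2 is never touched.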

Page 28

Updating a Feature’s Connections

[Figure: features f1 to f4 connected to categories c1 to c5; for an instance x with target category c_x, the connections of each active feature f ∈ x are updated]

1. Identify the connection
2. Increase its weight
3. Normalize/weaken the other weights
4. Drop tiny weights

Degrees are constrained.

One such update, applied to every category c:

$$w_{f,c} \leftarrow (1-\beta)\, w_{f,c} + \beta\, \delta_{c,c_x}, \qquad 0 \le \beta \le 1,$$

where $\delta_{c,c_x}$ is the Kronecker delta (1 if $c = c_x$, else 0).
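The four update steps and the Kronecker-delta rule can be sketched for a single feature's outgoing edges. This is a minimal illustration, assuming a fixed rate `beta` and a fixed drop threshold `min_weight`. Both parameter names and `update_connections` are hypothetical. One consequence of this rule is worth noting. If a feature's weights start out summing to at most 1, they still do after the update, because (1 - beta)·S + beta ≤ 1. With a drop threshold, at most 1/min_weight edges can survive. This is one way to read "degrees are constrained".

```python
from typing import Dict


def update_connections(weights: Dict[str, float], target: str,
                       beta: float = 0.1, min_weight: float = 0.01) -> None:
    """Update one feature's edge weights in place for target category c_x = `target`.

    Applies w[c] <- (1 - beta) * w[c] + beta * delta(c, target) to every category c.
    """
    if not 0.0 <= beta <= 1.0:
        raise ValueError("beta must be in [0, 1]")
    weights.setdefault(target, 0.0)                  # 1. identify (or create) the connection
    for c in list(weights):
        delta = 1.0 if c == target else 0.0          # Kronecker delta
        weights[c] = (1.0 - beta) * weights[c] + beta * delta  # 2.-3. increase target, weaken others
        if weights[c] < min_weight:                  # 4. drop tiny weights
            del weights[c]
```

Starting from no edges, two updates with `beta=0.5` (first toward c3, then toward c1) leave `{"c3": 0.25, "c1": 0.5}`.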

Page 29

Example Category Node (from Jane Austen’s)

[Figure: the category node “ther ” with prediction weights on edges from categories appearing before it, including “and ”, “heart”, “nei”, “toge”, “ far”, “ bro”, “love ”, and “by ” (weights shown: 0.13, 0.11, 0.10, 0.087, 0.07, 0.057, 0.052); the node also keeps local statistics (7.1, 0.41)]

A category node keeps track of various weights, such as edge (or prediction) weights and predictiveness weights, and other statistics (e.g. frequency, first/last time seen), and updates them when it is activated as a predictor or target.
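A category node's local bookkeeping can be sketched as a small record. This is a minimal illustration under stated assumptions. `CategoryNode`, its field names, and `on_activated` are hypothetical. The slides do not give the predictiveness update rule, so that field is left as a plain number.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class CategoryNode:
    """Hypothetical category node holding local weights and statistics."""
    name: str
    prediction_weights: Dict[str, float] = field(default_factory=dict)  # edge weights to other categories
    predictiveness: float = 0.0  # usefulness as a predictor (update rule not specified here)
    frequency: int = 0
    first_seen: Optional[int] = None
    last_seen: Optional[int] = None

    def on_activated(self, time: int) -> None:
        """Update the statistics when the node is activated as a predictor or target."""
        self.frequency += 1
        if self.first_seen is None:
            self.first_seen = time
        self.last_seen = time
```

Keeping `last_seen` on each node also makes recency-based removal cheap, since it needs only a comparison with the current time.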

Page 30

Network

• Categories and their edges form a network (a directed weighted graph, with different kinds of edges …)
• The network grows over time: millions of nodes and beyond

Page 31

When and How to Compose?

• Two major approaches:
  1. Pre-filter: don’t compose if certain conditions are not met (simplest: only consider possibilities that you see)
  2. Post-filter: compose and use, but remove if certain conditions are not met (e.g. if not seen recently enough, remove)
• I expect both are needed …
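The two simplest filters named above can be sketched as follows. The pre-filter only allows compositions of pairs that were actually observed. The post-filter removes categories not seen within a recency horizon. `pre_filter`, `post_filter`, `seen_pairs`, and `horizon` are hypothetical names. The horizon value would be a tuning choice.

```python
from typing import Dict, Set, Tuple


def pre_filter(c1: str, c2: str, seen_pairs: Set[Tuple[str, str]]) -> bool:
    """Simplest pre-filter: consider composing c1c2 only if c1 was seen followed by c2."""
    return (c1, c2) in seen_pairs


def post_filter(last_seen: Dict[str, int], now: int, horizon: int) -> Dict[str, int]:
    """Post-filter: keep only categories seen within the last `horizon` time points."""
    return {c: t for c, t in last_seen.items() if now - t <= horizon}
```

For example, `post_filter({"a": 10, "b": 500}, now=1000, horizon=600)` keeps only `"b"`.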

Page 32

Some Composition (Pre-filter) Heuristics

• FRAC: If you see c1 then c2 in the stream, then, with some probability, add c = c1c2
• MU: use the pointwise mutual information between c1 and c2, i.e. the log of $\frac{p(c_2 \mid c_1)}{p(c_2)}$
• IMPROVE: take string lengths into account and see whether joining is better
• BOUND: generate all strings under length $L_t$
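FRAC and MU can be sketched over an already-segmented stream of categories. This is a minimal illustration, assuming probabilities are estimated from adjacent-pair counts in a finite stream. The real system would maintain such statistics online. `frac_candidates`, `pmi`, and the `prob` parameter are hypothetical names.

```python
import math
import random
from collections import Counter
from typing import Sequence, Set


def frac_candidates(stream: Sequence[str], prob: float, rng: random.Random) -> Set[str]:
    """FRAC: for each adjacent pair c1 c2 in the stream, propose c1+c2 with probability `prob`."""
    candidates: Set[str] = set()
    for c1, c2 in zip(stream, stream[1:]):
        if rng.random() < prob:
            candidates.add(c1 + c2)
    return candidates


def pmi(stream: Sequence[str], c1: str, c2: str) -> float:
    """MU: log(p(c2 | c1) / p(c2)), estimated from counts in the stream."""
    unigrams = Counter(stream)
    bigrams = Counter(zip(stream, stream[1:]))
    pair_count = bigrams[(c1, c2)]
    if pair_count == 0:
        return float("-inf")  # never seen together
    followers_of_c1 = sum(n for (a, _), n in bigrams.items() if a == c1)
    p_c2_given_c1 = pair_count / followers_of_c1
    p_c2 = unigrams[c2] / len(stream)
    return math.log(p_c2_given_c1 / p_c2)
```

In the character stream `list("newnewxy")`, "e" always follows "n", so p(e | n) = 1, while p(e) = 2/8. The PMI is therefore log 4, a strong hint that "ne" is worth composing.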

Page 33

Prediction Objective

• Desirable: learn higher-level categories (bigger/abstract categories are useful externally)
• Question: how does this relate to improving predictions?
  1. Higher-level categories improve “context” and can save memory
  2. Bigger categories save time in playing the game (categories are atomic)

Page 34

Objective (evaluation criterion)

• The matching performance: number of bits (characters) correctly predicted per unit time or per prediction action
• Subject to constraints (space, time, …)
• How about entropy/perplexity? Categories are structured, so perplexity seems difficult to use.
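The slides do not define exactly how characters count as "correctly predicted". The sketch below makes one hypothetical assumption. Each prediction action proposes a string and is credited with the length of its common prefix with the text that actually follows. `matched_chars` and `matching_performance` are illustrative names.

```python
import os
from typing import Iterable, Tuple


def matched_chars(predicted: str, actual: str) -> int:
    """Characters correctly predicted, assumed here to be the common-prefix length."""
    return len(os.path.commonprefix([predicted, actual]))


def matching_performance(actions: Iterable[Tuple[str, str]]) -> float:
    """Average number of characters correctly predicted per prediction action."""
    scores = [matched_chars(p, a) for p, a in actions]
    return sum(scores) / len(scores) if scores else 0.0
```

For example, the pairs ("New York", "New Jersey"), (" said", " says"), and ("the", "a") earn 4, 3, and 0 characters, for an average of 7/3 per action.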

Page 35

Linearity and Non-Linearity (a motivation for new concept creation)

[Figure: the separate categories “n”, “e”, “w” versus the single composite category “new”, both used to predict what follows (“new????”)]

Which one predicts better (better constrains what comes next)? Aggregate the votes of “n”, “e”, and “w” to predict what comes next, or use “new”?
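A toy comparison shows why a composite category can constrain the future more than a linear aggregate of its parts. The linear scheme here is an assumed simple vote-summing rule, not the system's method. Each character votes, from its own offset, for what it has seen follow at that distance, ignoring the others. `linear_votes` and `composite_votes` are hypothetical names.

```python
from collections import Counter


def composite_votes(text: str, context: str) -> Counter:
    """Next-character counts after the whole composite category `context` (e.g. "new")."""
    votes: Counter = Counter()
    k = len(context)
    for i in range(len(text) - k):
        if text[i:i + k] == context:
            votes[text[i + k]] += 1
    return votes


def linear_votes(text: str, context: str) -> Counter:
    """Sum of independent votes: each character of `context` votes, from its own
    offset, for what follows the context, ignoring the other characters."""
    votes: Counter = Counter()
    n = len(context)
    for j, ch in enumerate(context):
        dist = n - j  # how many positions before the target this character sits
        for i in range(len(text) - dist):
            if text[i] == ch:
                votes[text[i + dist]] += 1
    return votes
```

In the text "news wet", the composite "new" is followed only by "s". The separate votes also put weight on "e", because "w" alone is followed by "e" in "wet".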

Page 36

Data

• Reuters RCV1: 800k news articles
• Several online books of Jane Austen, etc.
• Web search query logs

Page 37

Some Observations

• Ran on Reuters RCV1 (text body) (simply zcat dir/file*)
• ~800k articles
• >= 150 million learning/prediction episodes
• Over 10 million categories built
• 3-4 hours each pass (depends on parameters)

Page 38

Observations

• Performance on held-out data (one of the Reuters files):
  • Targets to predict are 8-9 characters long on average
  • Almost two characters correct on average, per prediction action
• Can overfit/memorize! (long categories)
• Current: stop category generation after the first pass

Page 39

Page 40

Some Example Categories (in order of first appearance and increasing length)

cat name= "<"
cat name= " t"
cat name= ".</"
cat name= "p>- "
cat name= " the "
cat name= "ation "
cat name= "of the "
cat name= "ing the "
cat name= "&quot;The "
cat name= "company said "
cat name= ", the company "
cat name= "said on Tuesday"
cat name= " said on Tuesday"
cat name= ",&quot; said one "
cat name= ",&quot; he said.</p>
cat name= "--------------------------------"
cat name= "--------------------------------------------------------"
cat name= "---------------------------------------------------------------</p>
cat name= ". Reuters has not verified these stories and does not vouch for their accuracy.</p>
cat name= "press on Tuesday. Reuters has not verified these stories and does not vouch for their accuracy.</p>
cat name= "press on Thursday. Reuters has not verified these stories and does not vouch for their accuracy.</p>
cat name= "press on Wednesday. Reuters has not verified these stories and does not vouch for their accuracy.</p>
cat name= "within 10 percentage points in either direction of the key 225-share Nikkei average over the next six month"
cat name= "ing and selling rates for leading world currencies and gold against the dollar on the London foreign exchange and bullion "

Page 41

Example “Recall” Paths

From processing one month of Reuters:

"Sinn Fei" (0.128) "n a seat" (0.527) " in the " (0.538) "talks." (0.468) "</p> <p>B" (0.0185) "rokers " **** The end: connection weight less than: 0.04

" Gas in S" (1) "cotland" (1.04) " and north" (1.18) "ern E" (0.572) "ngland" (0.165) ",&quot; a " (0.0542) "spokeswo" (0.551) "mansaid " (0.044) "the idea" (0.0869) " was to " (0.144) "quot" (0.164) "e the d" (0.0723) "ivision" (0.0671) " in N" (0.397) "ew York" (0.062) " where " (0.0557) "the main " (0.0474) "marque" (0.229) "swere " (0.253) "base" (0.264) "d. &quot;" (0.0451) "It will " (0.117) "certain" (0.0691) "ly b" (0.0892) "e New " (0.353) "York" (0.112) "party" (0.0917) "s is goin" (0.559) "g to " (0.149) "end.&quot;" (0.239) "</p> <p>T" (0.104) "wedish " (0.125) "Export" (0.0211) "Credi" **** The end: connection weight less than: 0.04

Page 42

Search Query Logs

"bureoofi" (1) "migration" (1.13) "andci" (1.04) "tizenship." (0.31) "com www," (0.11) "ictions" (0.116) "zenship." **** The end: this concept wasn't seen in last 1000000 time points.

Random Recall: "bureoofi" (1) "migration" (0.0129) "dept.com" **** The end: this concept wasn't seen in last 1000000 time points.
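The slides do not say exactly how recall paths are generated or what each printed number means. The sketch below is one plausible reading. From a start category, it repeatedly follows the strongest outgoing connection and stops when that connection is weaker than 0.04, the threshold in the printed paths. The start weight of 1.0, the `max_steps` cap against cycles, and the name `recall_path` are all hypothetical.

```python
from typing import Dict, List, Tuple

# Hypothetical graph: category -> {following category: connection weight}
Graph = Dict[str, Dict[str, float]]


def recall_path(graph: Graph, start: str, min_weight: float = 0.04,
                max_steps: int = 50) -> List[Tuple[str, float]]:
    """Greedily follow the strongest connection until it drops below min_weight."""
    path = [(start, 1.0)]
    current = start
    for _ in range(max_steps):
        successors = graph.get(current, {})
        if not successors:
            break
        nxt, w = max(successors.items(), key=lambda cw: cw[1])
        if w < min_weight:
            break  # "The end: connection weight less than: 0.04"
        path.append((nxt, w))
        current = nxt
    return path
```

A "random recall" variant would sample the next category in proportion to connection weight instead of taking the maximum. That variant is also an assumption about the slide, not a documented procedure.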

Page 43

Much Related Work!

• Online learning, cumulative learning, feature and concept induction, neural networks, clustering, Bayesian methods, language modeling, deep learning, “hierarchical” learning, importance/ubiquity of predictions/anticipations in the brain (“On Intelligence”, “natural computations”,…), models of neocortex (“circuits of the mind”), concepts and conceptual phenomena (e.g. “big book of concepts”), compression, ….

Page 44

Summary

• Large-scale learning and classification (data hungry, efficiency paramount)
• A systems approach: integration of multiple learning processes
• The system makes its own classes
• Driving objective: improve prediction (currently: “matching” performance)
• The underlying goal: effectively acquire complex concepts
• See www.omadani.net

Page 45

Current/Future

• Much work:
  • Integrate learning of groupings
  • Recognize/use “structural” categories? (learn to “parse”/segment?)
  • Prediction objective: OK?
  • Control over the input stream, etc.
  • Category generation: what are good methods?
  • Other domains (vision, …)
• Compare: language modeling, etc.