research exploring massive learning via a prediction system omid madani yahoo! research
![Page 1: Research Exploring Massive Learning via a Prediction System Omid Madani Yahoo! Research](https://reader035.vdocuments.mx/reader035/viewer/2022070415/5697c0051a28abf838cc5169/html5/thumbnails/1.jpg)
ResearchResearch
Exploring Massive Learning via a Prediction System
Omid Madani
Yahoo! Research www.omadani.net
![Page 2: Research Exploring Massive Learning via a Prediction System Omid Madani Yahoo! Research](https://reader035.vdocuments.mx/reader035/viewer/2022070415/5697c0051a28abf838cc5169/html5/thumbnails/2.jpg)
Goal
Convey a taste of the:
• motivations/considerations/assumptions/speculations/hopes, …
• The game, a 1st system, and its algorithms
![Page 3: Research Exploring Massive Learning via a Prediction System Omid Madani Yahoo! Research](https://reader035.vdocuments.mx/reader035/viewer/2022070415/5697c0051a28abf838cc5169/html5/thumbnails/3.jpg)
Talk Overview
1. Motivational part
2. The approach:
  • The game (categories, …)
  • Algorithms
  • Some experiments
![Page 4: Research Exploring Massive Learning via a Prediction System Omid Madani Yahoo! Research](https://reader035.vdocuments.mx/reader035/viewer/2022070415/5697c0051a28abf838cc5169/html5/thumbnails/4.jpg)
Fill in the Blank(s)!
Would ---- like ------ ------- ----- ------ ?
Would you like your coffee with sugar?
![Page 5: Research Exploring Massive Learning via a Prediction System Omid Madani Yahoo! Research](https://reader035.vdocuments.mx/reader035/viewer/2022070415/5697c0051a28abf838cc5169/html5/thumbnails/5.jpg)
What is this object?
![Page 6: Research Exploring Massive Learning via a Prediction System Omid Madani Yahoo! Research](https://reader035.vdocuments.mx/reader035/viewer/2022070415/5697c0051a28abf838cc5169/html5/thumbnails/6.jpg)
Categorization is Fundamental!
“Well, categorization is one of the most basic functions of living creatures. We live in a categorized world – table, chair, male, female, democracy, monarchy – every object and event is unique, but we act towards them as members of classes.” From an interview with Eleanor Rosch (psychologist, a pioneer on the phenomenon of “basic level” concepts)
“Concepts are the glue that holds our mental world together.” From “The Big Book of Concepts”, Gregory Murphy
![Page 7: Research Exploring Massive Learning via a Prediction System Omid Madani Yahoo! Research](https://reader035.vdocuments.mx/reader035/viewer/2022070415/5697c0051a28abf838cc5169/html5/thumbnails/7.jpg)
“Rather, the formation and use of categories is the stuff of experience.”
Philosophy in the Flesh, Lakoff and Johnson.
![Page 8: Research Exploring Massive Learning via a Prediction System Omid Madani Yahoo! Research](https://reader035.vdocuments.mx/reader035/viewer/2022070415/5697c0051a28abf838cc5169/html5/thumbnails/8.jpg)
Two Questions Arise
• Repeated and rapid classification…
• … in the presence of myriad classes
[Figure: instances $x^{(1)}, x^{(2)}, \ldots$ are fed to a classification system, which must decide the category of each x.]
In the presence of myriad categories:
1. How to categorize efficiently?
2. How to efficiently learn to categorize efficiently?
![Page 9: Research Exploring Massive Learning via a Prediction System Omid Madani Yahoo! Research](https://reader035.vdocuments.mx/reader035/viewer/2022070415/5697c0051a28abf838cc5169/html5/thumbnails/9.jpg)
Now, a 3rd Question ..
• How can so many inter-related categories be acquired?
• Programming them is unlikely to be successful or to scale:
  • Limits of our explicit/conscious knowledge
  • Unknown/unfamiliar domains
  • The required scale..
  • Making the system operational..
![Page 10: Research Exploring Massive Learning via a Prediction System Omid Madani Yahoo! Research](https://reader035.vdocuments.mx/reader035/viewer/2022070415/5697c0051a28abf838cc5169/html5/thumbnails/10.jpg)
Learn? … How?
• “Supervised” learning (explicit human involvement) likely inadequate:
  • Required scale, or a good sign post:
    • ~millions of categories and beyond..
    • Billions of weights, and beyond..
  • Inaccessible “knowledge” (see last slide!)
• Other approaches likely do not meet the needs (incomplete, different goals, etc.): active learning, semi-supervised learning, clustering, density learning, RL, etc.
![Page 11: Research Exploring Massive Learning via a Prediction System Omid Madani Yahoo! Research](https://reader035.vdocuments.mx/reader035/viewer/2022070415/5697c0051a28abf838cc5169/html5/thumbnails/11.jpg)
Desiderata/Requirements (or Speculations)
• Higher intelligence, such as “advanced” pattern recognition/generation (e.g. vision), may require
  • Long term learning (weeks, months, years, …)
  • Cumulative learning (learn these first, then these, then these, …)
  • Massive learning: myriad inter-related categories/concepts
  • Systems learning
  • Autonomy (relatively little human involvement)
What’s the learning task?
![Page 12: Research Exploring Massive Learning via a Prediction System Omid Madani Yahoo! Research](https://reader035.vdocuments.mx/reader035/viewer/2022070415/5697c0051a28abf838cc5169/html5/thumbnails/12.jpg)
This Work: An Exploration
• An avenue: “prediction games in infinitely rich worlds”
• Exciting part:
  • World provides unbounded learning opportunity! (world is the validator, the system is the experimenter!.. and actively builds much of its own concepts)
  • World enjoys many regularities (e.g. “hierarchical”)
• Based in part on “supervised” techniques!! (“discriminative”, “feedback driven”, supervisory signal doesn’t originate from humans)
![Page 13: Research Exploring Massive Learning via a Prediction System Omid Madani Yahoo! Research](https://reader035.vdocuments.mx/reader035/viewer/2022070415/5697c0051a28abf838cc5169/html5/thumbnails/13.jpg)
In a Nutshell
[Figure: a Prediction System reads a stream (…. 0011101110000 ….) in a loop of predict, then observe & update. At first it works with low level or “hard-wired” categories (Text: characters, .. Vision: edges, curves, …). After a while (much learning) it works with higher level categories (bigger chunks), e.g. words, digits, phrases, phone numbers, faces, visual objects, home pages, sites, …]
![Page 14: Research Exploring Massive Learning via a Prediction System Omid Madani Yahoo! Research](https://reader035.vdocuments.mx/reader035/viewer/2022070415/5697c0051a28abf838cc5169/html5/thumbnails/14.jpg)
The Game
• Repeat
  • Hide part(s) of the stream
  • Predict (use context)
  • Update
  • Move on
• Objective: predict better ... subject to efficiency constraints
• In the process: categories at different levels of size and abstraction should be learned
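The repeat/hide/predict/update/move-on loop can be sketched in a few lines. This is only an illustrative sketch, not the system's actual code: it assumes the context is the single category to the left of the hidden target, that prediction picks the highest-weight edge, and that `rate` is a hypothetical learning-rate parameter.

```python
from collections import defaultdict

def play_game(stream, rate=0.2):
    """Play a toy hide/predict/update game over a sequence of categories.

    Assumptions (not from the slides): one-category left context,
    highest-weight prediction, a fixed learning rate.
    """
    # weights[predictor][target] is the strength of the edge predictor -> target
    weights = defaultdict(lambda: defaultdict(float))
    correct = 0
    for i in range(1, len(stream)):
        context, target = stream[i - 1], stream[i]              # hide stream[i]
        edges = weights[context]
        guess = max(edges, key=edges.get) if edges else None    # predict
        correct += (guess == target)
        for c in edges:                                         # update: weaken all edges
            edges[c] *= (1 - rate)
        edges[target] += rate                                   # then boost the observed one
        # "move on" is the next loop iteration
    return weights, correct
```

On a perfectly regular stream such as `"abababab"`, the first prediction from each context misses (nothing has been learned yet) and every later one matches.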
![Page 15: Research Exploring Massive Learning via a Prediction System Omid Madani Yahoo! Research](https://reader035.vdocuments.mx/reader035/viewer/2022070415/5697c0051a28abf838cc5169/html5/thumbnails/15.jpg)
Research Goals
• Conjecture: There is much value to be attained from this task
• Beyond language modeling: more advanced pattern recognition/generation
• If so, should yield a wealth of new problems (=> Fun)
![Page 16: Research Exploring Massive Learning via a Prediction System Omid Madani Yahoo! Research](https://reader035.vdocuments.mx/reader035/viewer/2022070415/5697c0051a28abf838cc5169/html5/thumbnails/16.jpg)
Overview
• Goal: Convey a taste of the motivations/considerations, the system and algorithms, ..
• Motivation
• The approach:
  • The game (categories, …)
  • Algorithms
  • Some experiments
![Page 17: Research Exploring Massive Learning via a Prediction System Omid Madani Yahoo! Research](https://reader035.vdocuments.mx/reader035/viewer/2022070415/5697c0051a28abf838cc5169/html5/thumbnails/17.jpg)
Upshot
• Takes streams of text
• Makes categories (strings)
• Approx. three hours on 800k documents
• Large-scale discriminative learning (evidence: better than language modeling)
![Page 18: Research Exploring Massive Learning via a Prediction System Omid Madani Yahoo! Research](https://reader035.vdocuments.mx/reader035/viewer/2022070415/5697c0051a28abf838cc5169/html5/thumbnails/18.jpg)
Caveat Emptor!
• Exploratory research
• Many open problems (many I’m not aware of … )
• Chosen algorithms, system org, or objective/performance measures, etc., etc… are likely not even near the best possible
![Page 19: Research Exploring Massive Learning via a Prediction System Omid Madani Yahoo! Research](https://reader035.vdocuments.mx/reader035/viewer/2022070415/5697c0051a28abf838cc5169/html5/thumbnails/19.jpg)
Categories
• Building blocks (atoms!) of intelligence?
• Patterns that frequently occur
  • External
  • Internal..
• Useful for predicting other categories!
• They can have structure/regularities, in particular:
  1. Composition (~conjunctions) of other categories (Part-Of)
  2. Grouping (~disjunctions) (Is-A relations)
![Page 20: Research Exploring Massive Learning via a Prediction System Omid Madani Yahoo! Research](https://reader035.vdocuments.mx/reader035/viewer/2022070415/5697c0051a28abf838cc5169/html5/thumbnails/20.jpg)
Categories
• Low level “primitive” examples: 0 and 1 or characters (“a”, “b”, .., “0”, “-”, ..)
  • Provided to the system (easy to detect)
• Higher/composite levels:
  • Sequence of bits/characters
  • Words
  • Phrases
  • More general: phone number, contact info, resume, ...
![Page 21: Research Exploring Massive Learning via a Prediction System Omid Madani Yahoo! Research](https://reader035.vdocuments.mx/reader035/viewer/2022070415/5697c0051a28abf838cc5169/html5/thumbnails/21.jpg)
Example Concept
• Area code is a concept that involves both composition and grouping:
  • Composition of 3 digits
  • A digit is a grouping, i.e., the set {0,1,2,…,9} (2 is a digit)
• Other example concepts: phone number, address, resume page, face (in visual domain), etc.
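To make the two kinds of structure concrete, here is a hand-written sketch of the area-code concept. It only illustrates what composition and grouping mean; the names `DIGIT` and `is_area_code` are made up for this example, and a learning system would have to acquire such a concept rather than be given it.

```python
# Grouping (Is-A): a digit is any member of this set, e.g. "2" is a digit.
DIGIT = set("0123456789")

def is_area_code(s):
    """Composition (Part-Of): exactly three parts, each one a digit."""
    return len(s) == 3 and all(ch in DIGIT for ch in s)
```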
![Page 22: Research Exploring Massive Learning via a Prediction System Omid Madani Yahoo! Research](https://reader035.vdocuments.mx/reader035/viewer/2022070415/5697c0051a28abf838cc5169/html5/thumbnails/22.jpg)
Again, our goal, informally, is to build a system that acquires millions of useful concepts on its own.
![Page 23: Research Exploring Massive Learning via a Prediction System Omid Madani Yahoo! Research](https://reader035.vdocuments.mx/reader035/viewer/2022070415/5697c0051a28abf838cc5169/html5/thumbnails/23.jpg)
Questions for a First System
• Functionality? Architecture? Org?
• Would many-class learning scale to millions of concepts?
• Choice of concept building methods?
• How would various learning processes interact?
![Page 24: Research Exploring Massive Learning via a Prediction System Omid Madani Yahoo! Research](https://reader035.vdocuments.mx/reader035/viewer/2022070415/5697c0051a28abf838cc5169/html5/thumbnails/24.jpg)
Expedition: a First System
• Plays the game in text
• Begins at character level
• No segmentation, just a stream
• Makes and predicts larger sequences, via composition
• No grouping yet
![Page 25: Research Exploring Massive Learning via a Prediction System Omid Madani Yahoo! Research](https://reader035.vdocuments.mx/reader035/viewer/2022070415/5697c0051a28abf838cc5169/html5/thumbnails/25.jpg)
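Learning Episodes
[Figure: the stream “… New Jersey in …” shown at two consecutive time steps. At each step a window contains the context and the target; the target is the category to predict, and the context categories are the predictors (active categories). At the next time step the window moves on, giving a new target and new predictors.]
In this example, context contains one category on each side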
![Page 26: Research Exploring Massive Learning via a Prediction System Omid Madani Yahoo! Research](https://reader035.vdocuments.mx/reader035/viewer/2022070415/5697c0051a28abf838cc5169/html5/thumbnails/26.jpg)
.. Some Time Later ..
[Figure: the stream “… loves New York life …” with a window containing the context (predictors) and the target (category to predict).]
In terms of supervised learning/classification, in this learning activity (prediction games):
• The set of concepts grows over time
• Same for features/predictors (concepts ARE the predictors!)
• Instance representation (segmentation of the data stream) changes/grows over time ..
![Page 27: Research Exploring Massive Learning via a Prediction System Omid Madani Yahoo! Research](https://reader035.vdocuments.mx/reader035/viewer/2022070415/5697c0051a28abf838cc5169/html5/thumbnails/27.jpg)
Prediction/Recall
Instance: $x = \{f_2, f_3\}$
1. Features are “activated”
2. Edges are activated
3. Receiving categories are activated
4. Categories sorted/ranked
[Figure: bipartite graph from features f1 to f4 to categories c1 to c5; the activated edges carry weights 0.4, 0.3, 0.2, 0.1, 0.1.]
Sorted list: $(c_4, 0.5), (c_3, 0.4), (c_5, 0.1), (c_1, 0.1)$
1. Like use of inverted indices
2. Sparse dot products
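The recall steps amount to a sparse dot product over an inverted index from features to categories. The sketch below is an assumption-laden illustration: the slide does not say which edge carries which weight, so the `EDGES` table is hypothetical, chosen only so that the active features f2 and f3 reproduce the ranking shown on the slide.

```python
from collections import defaultdict

# Hypothetical index: feature -> {category: edge weight}.
# The assignment of weights to edges is made up for illustration.
EDGES = {
    "f1": {"c2": 0.6},
    "f2": {"c4": 0.3, "c3": 0.4},
    "f3": {"c4": 0.2, "c5": 0.1, "c1": 0.1},
    "f4": {"c2": 0.5},
}

def recall(active_features, edges=EDGES):
    """Score categories by summing the weights of edges from active features."""
    scores = defaultdict(float)
    for f in active_features:                    # 1. features are activated
        for c, w in edges.get(f, {}).items():    # 2. edges are activated
            scores[c] += w                       # 3. receiving categories are activated
    # 4. categories sorted/ranked (stable sort keeps first-seen order on ties)
    return sorted(scores.items(), key=lambda cw: -cw[1])
```

Only the edges of active features are touched, so the cost depends on how many edges each active feature has, not on the total number of categories.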
![Page 28: Research Exploring Massive Learning via a Prediction System Omid Madani Yahoo! Research](https://reader035.vdocuments.mx/reader035/viewer/2022070415/5697c0051a28abf838cc5169/html5/thumbnails/28.jpg)
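Updating a Feature’s Connections
[Figure: bipartite graph from features f1 to f4 to categories c1 to c5, showing a feature $f \in x$ and the true category $c_x$.]
1. Identify connection
2. Increase weight
3. Normalize/weaken weights
4. Drop tiny weights
Degrees are constrained
One such update: $\forall c,\; w_{f,c} \leftarrow (1-\beta)\, w_{f,c} + \beta\, [c = c_x]$, where $0 < \beta \le 1$ and $[c = c_x]$ is the Kronecker delta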
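A minimal sketch of such an update for one feature's outgoing edges, assuming an exponential-moving-average form. The parameter names `rate` and `w_min` and their default values are assumptions for illustration, not values from the talk.

```python
def update_feature(weights, target, rate=0.2, w_min=0.01):
    """Update one feature's edges after observing the true category `target`.

    weights: dict mapping category -> edge weight for this feature.
    Returns a new dict; the input is left unchanged.
    """
    # Steps 1-3: weaken every edge, then boost the edge to the observed category.
    updated = {c: (1 - rate) * w for c, w in weights.items()}
    updated[target] = updated.get(target, 0.0) + rate
    # Step 4: drop tiny weights.
    return {c: w for c, w in updated.items() if w >= w_min}
```

If a feature's weights start out summing to at most 1, they stay that way under this update, so dropping weights below `w_min` leaves at most about `1 / w_min` edges per feature. That is one way the degrees could stay constrained.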
![Page 29: Research Exploring Massive Learning via a Prediction System Omid Madani Yahoo! Research](https://reader035.vdocuments.mx/reader035/viewer/2022070415/5697c0051a28abf838cc5169/html5/thumbnails/29.jpg)
Example Category Node (from Jane Austen’s books)
[Figure: the category node “ther ” with categories appearing before it (“and ”, “heart”, “nei”, “toge”, “ far”, “ bro”, “love ”, “by ”) and prediction weights on the edges (0.087, 0.07, 0.057, 0.052, 0.13, 0.11, 0.10); the node also keeps local statistics (7.1, 0.41).]
A category node keeps track of various weights, such as edge (or prediction) weights and predictiveness weights, and other statistics (e.g. frequency, first/last time seen), and updates them when it is activated as a predictor or target.
![Page 30: Research Exploring Massive Learning via a Prediction System Omid Madani Yahoo! Research](https://reader035.vdocuments.mx/reader035/viewer/2022070415/5697c0051a28abf838cc5169/html5/thumbnails/30.jpg)
Network
• Categories and their edges form a network(a directed weighted graph, with different kinds of edges ... )
• The network grows over time: millions of nodes and beyond
![Page 31: Research Exploring Massive Learning via a Prediction System Omid Madani Yahoo! Research](https://reader035.vdocuments.mx/reader035/viewer/2022070415/5697c0051a28abf838cc5169/html5/thumbnails/31.jpg)
When and How to Compose?
• Two major approaches:
  1. Pre-filter: don’t compose if certain conditions are not met (simplest: only consider possibilities that you see)
  2. Post-filter: compose and use, but remove if certain conditions are not met (e.g. if not seen recently enough, remove)
• I expect both are needed …
![Page 32: Research Exploring Massive Learning via a Prediction System Omid Madani Yahoo! Research](https://reader035.vdocuments.mx/reader035/viewer/2022070415/5697c0051a28abf838cc5169/html5/thumbnails/32.jpg)
Some Composition (Prefilter) Heuristics
• FRAC: If you see c1 then c2 in the stream, then, with some probability, add c=c1c2
• MU: use the pointwise mutual information between c1 and c2: $\frac{p(c_2 \mid c_1)}{p(c_2)}$
• IMPROVE: take string lengths into account and see whether joining is better
• BOUND: Generate all strings under length Lt.
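The FRAC and MU prefilters can be sketched from simple counts. The function names, the probability, and the threshold below are assumptions for illustration only; the slides do not give the actual settings. The ratio p(c2 | c1) / p(c2) is estimated from counts, and its logarithm is the usual pointwise mutual information.

```python
import random

def frac_should_compose(prob=0.1, rng=random):
    """FRAC: on seeing c1 followed by c2, compose with a fixed probability."""
    return rng.random() < prob

def mu_ratio(count_c1, count_c2, count_c1c2, total):
    """Estimate p(c2 | c1) / p(c2) from counts.

    Values above 1 mean c2 follows c1 more often than c2 occurs on average.
    """
    p_c2_given_c1 = count_c1c2 / count_c1
    p_c2 = count_c2 / total
    return p_c2_given_c1 / p_c2

def mu_should_compose(count_c1, count_c2, count_c1c2, total, threshold=2.0):
    """MU: compose c1c2 only when the ratio clears a (hypothetical) threshold."""
    return mu_ratio(count_c1, count_c2, count_c1c2, total) >= threshold
```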
![Page 33: Research Exploring Massive Learning via a Prediction System Omid Madani Yahoo! Research](https://reader035.vdocuments.mx/reader035/viewer/2022070415/5697c0051a28abf838cc5169/html5/thumbnails/33.jpg)
Prediction Objective
• Desirable: learn higher level categories (bigger/abstract categories are useful externally)
• Question: how does this relate to improving predictions?
1. Higher level categories improve “context” and can save memory
2. Bigger, save time in playing the game (categories are atomic)
![Page 34: Research Exploring Massive Learning via a Prediction System Omid Madani Yahoo! Research](https://reader035.vdocuments.mx/reader035/viewer/2022070415/5697c0051a28abf838cc5169/html5/thumbnails/34.jpg)
Objective (evaluation criterion)
• The Matching Performance: number of bits (characters) correctly predicted per unit time or per prediction action
• Subject to constraints (space, time, ..)
• How about entropy/perplexity? Categories are structured, so perplexity seems difficult to use..
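One possible reading of matching performance per prediction action is sketched below. It assumes a prediction earns credit only when the predicted category equals the hidden one exactly, in which case it earns that category's length in characters; how the actual system credits partial matches is not stated on the slide.

```python
def matching_performance(episodes):
    """Average number of characters correctly predicted per prediction action.

    episodes: iterable of (predicted, actual) category strings.
    Assumption: exact match earns len(actual), anything else earns 0.
    """
    episodes = list(episodes)
    if not episodes:
        return 0.0
    matched = sum(len(actual) for predicted, actual in episodes
                  if predicted == actual)
    return matched / len(episodes)
```

Under this reading, predicting longer categories correctly pays more per action, which is one way the objective can reward building bigger categories.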
![Page 35: Research Exploring Massive Learning via a Prediction System Omid Madani Yahoo! Research](https://reader035.vdocuments.mx/reader035/viewer/2022070415/5697c0051a28abf838cc5169/html5/thumbnails/35.jpg)
Linearity and Non-Linearity (a motivation for new concept creation)
[Figure: two ways to predict what comes next after “new”: aggregate the votes of the separate categories “n”, “e”, and “w”, versus use the single composed category “new”.]
Which one predicts better? (better constrains what comes next)
![Page 36: Research Exploring Massive Learning via a Prediction System Omid Madani Yahoo! Research](https://reader035.vdocuments.mx/reader035/viewer/2022070415/5697c0051a28abf838cc5169/html5/thumbnails/36.jpg)
Data
• Reuters RCV1: 800k news articles
• Several online books of Jane Austen, etc.
• Web search query logs
![Page 37: Research Exploring Massive Learning via a Prediction System Omid Madani Yahoo! Research](https://reader035.vdocuments.mx/reader035/viewer/2022070415/5697c0051a28abf838cc5169/html5/thumbnails/37.jpg)
Some Observations
• Ran on Reuters RCV1 (text body) (simply zcat dir/file*)
• ~800k articles
• >= 150 million learning/prediction episodes
• Over 10 million categories built
• 3-4 hours each pass (depends on parameters)
![Page 38: Research Exploring Massive Learning via a Prediction System Omid Madani Yahoo! Research](https://reader035.vdocuments.mx/reader035/viewer/2022070415/5697c0051a28abf838cc5169/html5/thumbnails/38.jpg)
Observations
• Performance on held out (one of the Reuters files):
  • 8-9 characters long to predict on average
  • Almost two characters correct on average, per prediction action
• Can overfit/memorize! (long categories)
• Current: stop category generation after first pass
![Page 39: Research Exploring Massive Learning via a Prediction System Omid Madani Yahoo! Research](https://reader035.vdocuments.mx/reader035/viewer/2022070415/5697c0051a28abf838cc5169/html5/thumbnails/39.jpg)
![Page 40: Research Exploring Massive Learning via a Prediction System Omid Madani Yahoo! Research](https://reader035.vdocuments.mx/reader035/viewer/2022070415/5697c0051a28abf838cc5169/html5/thumbnails/40.jpg)
Some Example Categories (in order of first time appearance and increasing length)
cat name= "<"
cat name= " t"
cat name= ".</"
cat name= "p>- "
cat name= " the "
cat name= "ation "
cat name= "of the "
cat name= "ing the "
cat name= ""The "
cat name= "company said "
cat name= ", the company "
cat name= "said on Tuesday"
cat name= " said on Tuesday"
cat name= "," said one "
cat name= "," he said.</p>
cat name= "--------------------------------"
cat name= "--------------------------------------------------------"
cat name= "---------------------------------------------------------------</p>
cat name= ". Reuters has not verified these stories and does not vouch for their accuracy.</p>
cat name= "press on Tuesday. Reuters has not verified these stories and does not vouch for their accuracy.</p>
cat name= "press on Thursday. Reuters has not verified these stories and does not vouch for their accuracy.</p>
cat name= "press on Wednesday. Reuters has not verified these stories and does not vouch for their accuracy.</p>
cat name= "within 10 percentage points in either direction of the key 225-share Nikkei average over the next six month"
cat name= "ing and selling rates for leading world currencies and gold against the dollar on the London foreign exchange and bullion "
![Page 41: Research Exploring Massive Learning via a Prediction System Omid Madani Yahoo! Research](https://reader035.vdocuments.mx/reader035/viewer/2022070415/5697c0051a28abf838cc5169/html5/thumbnails/41.jpg)
Example “Recall” Paths
From processing one month of Reuters:
"Sinn Fei" (0.128) "n a seat" (0.527) " in the " (0.538) "talks." (0.468) "</p>
<p>B" (0.0185) "rokers " **** The end: connection weight less than: 0.04
" Gas in S" (1) "cotland" (1.04) " and north" (1.18) "ern E"(0.572) "ngland" (0.165) "," a " (0.0542) "spokeswo" (0.551)
"mansaid " (0.044) "the idea" (0.0869) " was to " (0.144) "quot" (0.164)"e the d" (0.0723) "ivision" (0.0671) " in N" (0.397) "ew York"(0.062) " where " (0.0557) "the main " (0.0474) "marque" (0.229) "swere " (0.253) "base" (0.264) "d. "" (0.0451) "It will " (0.117)"certain" (0.0691) "ly b" (0.0892) "e New " (0.353) "York" (0.112) "party" (0.0917) "s is goin" (0.559) "g to " (0.149) "end.""(0.239) "</p> <p>T" (0.104) "wedish " (0.125) "Export" (0.0211) "Credi" **** The end: connection weight less than: 0.04
![Page 42: Research Exploring Massive Learning via a Prediction System Omid Madani Yahoo! Research](https://reader035.vdocuments.mx/reader035/viewer/2022070415/5697c0051a28abf838cc5169/html5/thumbnails/42.jpg)
Search Query Logs
"bureoofi" (1) "migration" (1.13) "andci" (1.04) "tizenship." (0.31) "com
www," (0.11) "ictions" (0.116) "zenship." **** The end: this concept wasn't seen in last 1000000 time points.
Random Recall:"bureoofi" (1) "migration" (0.0129) "dept.com"
**** The end: this concept wasn't seen in last 1000000 time points.
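A recall path like the ones shown can be produced by repeatedly following an outgoing edge and stopping once the connection weight falls below a threshold. The sketch below assumes a greedy choice of the highest-weight edge; the talk does not say how the next category is chosen, and the "Random Recall" example suggests other strategies exist. The `edges` structure and `max_steps` guard are hypothetical.

```python
def recall_path(start, edges, min_weight=0.04, max_steps=100):
    """Follow highest-weight edges from `start` until the weight drops too low.

    edges: dict mapping category -> {next category: connection weight}.
    Returns a list of (category, weight of the edge used to reach it).
    """
    path = [(start, None)]
    current = start
    for _ in range(max_steps):          # guard against cycles
        successors = edges.get(current)
        if not successors:
            break
        best = max(successors, key=successors.get)
        weight = successors[best]
        path.append((best, weight))
        if weight < min_weight:         # "connection weight less than" threshold
            break
        current = best
    return path
```

As in the printed paths, the last hop is reported together with its below-threshold weight before the path ends.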
![Page 43: Research Exploring Massive Learning via a Prediction System Omid Madani Yahoo! Research](https://reader035.vdocuments.mx/reader035/viewer/2022070415/5697c0051a28abf838cc5169/html5/thumbnails/43.jpg)
Much Related Work!
• Online learning, cumulative learning, feature and concept induction, neural networks, clustering, Bayesian methods, language modeling, deep learning, “hierarchical” learning, importance/ubiquity of predictions/anticipations in the brain (“On Intelligence”, “natural computations”,…), models of neocortex (“circuits of the mind”), concepts and conceptual phenomena (e.g. “big book of concepts”), compression, ….
![Page 44: Research Exploring Massive Learning via a Prediction System Omid Madani Yahoo! Research](https://reader035.vdocuments.mx/reader035/viewer/2022070415/5697c0051a28abf838cc5169/html5/thumbnails/44.jpg)
Summary
• Large-scale learning and classification (data hungry, efficiency paramount)
• A systems approach: integration of multiple learning processes
• The system makes its own classes
• Driving objective: improve prediction (currently: “matching” performance)
• The underlying goal: effectively acquire complex concepts
• See www.omadani.net
![Page 45: Research Exploring Massive Learning via a Prediction System Omid Madani Yahoo! Research](https://reader035.vdocuments.mx/reader035/viewer/2022070415/5697c0051a28abf838cc5169/html5/thumbnails/45.jpg)
Current/Future
• Much work:
  • Integrate learning of groupings
  • Recognize/use “structural” categories? (learn to “parse”/segment?)
  • Prediction objective.. ok?
  • Control over input stream, etc..
  • Category generation.. What are good methods?
  • Other domains (vision, …)
  • Compare: language modeling, etc.