short course robust optimization and machine learning [.2 ...people.eecs.berkeley.edu › ~elghaoui...
Post on 04-Jul-2020
1 Views
Preview:
TRANSCRIPT
Robust Optimization &Machine Learning7. Text Analytics
Information Overload
Sparse MachineLearning
Topic imagingDynamic images
Cross-language imaging
ASRS Study
References
Short CourseRobust Optimization and Machine Learning
Lecture 7: Sparse Machine Learning for Text Analytics
Laurent El Ghaoui
EECS and IEOR DepartmentsUC Berkeley
Spring seminar TRANSP-OR, Zinal, Jan. 16-19, 2012
January 19, 2012
Robust Optimization &Machine Learning7. Text Analytics
Information Overload
Sparse MachineLearning
Topic imagingDynamic images
Cross-language imaging
ASRS Study
References
Outline
Information Overload
Sparse Machine Learning
Topic imagingDynamic imagesCross-language imaging
ASRS Study
References
Robust Optimization &Machine Learning7. Text Analytics
Information Overload
Sparse MachineLearning
Topic imagingDynamic images
Cross-language imaging
ASRS Study
References
Outline
Information Overload
Sparse Machine Learning
Topic imagingDynamic imagesCross-language imaging
ASRS Study
References
Robust Optimization &Machine Learning7. Text Analytics
Information Overload
Sparse MachineLearning
Topic imagingDynamic images
Cross-language imaging
ASRS Study
References
Information Overload
Avalanche of “information” in text format, e.g.I News articles, press releases, RSS feeds, TV captioning data.I 10-K filings, marketing brochures, financial analyst reports, and
other company-related documents.I Consumer reviews, blogs, emails, and other social media content.I Scientific papers, patents, law documents, bills, medical reports,
literature.
Robust Optimization &Machine Learning7. Text Analytics
Information Overload
Sparse MachineLearning
Topic imagingDynamic images
Cross-language imaging
ASRS Study
References
Information Overload
Avalanche of “information” in text format, e.g.I News articles, press releases, RSS feeds, TV captioning data.I 10-K filings, marketing brochures, financial analyst reports, and
other company-related documents.I Consumer reviews, blogs, emails, and other social media content.I Scientific papers, patents, law documents, bills, medical reports,
literature.
The top 20 most important news sources have generated ∼ 40,000news articles yesterday.
Robust Optimization &Machine Learning7. Text Analytics
Information Overload
Sparse MachineLearning
Topic imagingDynamic images
Cross-language imaging
ASRS Study
References
What might be useful?I Summarize large text databases.I Detect and visualize trends in term usage.I Compare how topics of interest are treated across different
sources.I Group similar text documents.I Provide interpretable visualizations.
Robust Optimization &Machine Learning7. Text Analytics
Information Overload
Sparse MachineLearning
Topic imagingDynamic images
Cross-language imaging
ASRS Study
References
What might be useful?I Summarize large text databases.I Detect and visualize trends in term usage.I Compare how topics of interest are treated across different
sources.I Group similar text documents.I Provide interpretable visualizations.
Approach: sparse machine learning tools to help in these tasks.
Robust Optimization &Machine Learning7. Text Analytics
Information Overload
Sparse MachineLearning
Topic imagingDynamic images
Cross-language imaging
ASRS Study
References
Outline
Information Overload
Sparse Machine Learning
Topic imagingDynamic imagesCross-language imaging
ASRS Study
References
Robust Optimization &Machine Learning7. Text Analytics
Information Overload
Sparse MachineLearning
Topic imagingDynamic images
Cross-language imaging
ASRS Study
References
Sparse Machine Learning
Let X be the term-by-document matrix, y a label vector.I Cardinality-penalized least-squares :
minw,b‖X T w + b1− y‖2
2 + λCard(w)
I Cardinality-penalized low-rank approximations (or, sparse PCA):
minw,v‖X − wvT‖F + λCard(w) + µCard(v).
Despite the hardness of these problems, we can solve themheuristically, at very high scale.
Robust Optimization &Machine Learning7. Text Analytics
Information Overload
Sparse MachineLearning
Topic imagingDynamic images
Cross-language imaging
ASRS Study
References
Why are these problems scalable?
Both of these problems allow to eliminate features (or documents)prior to solving the problem at very cheap cost, in an embarrassinglyparallel way. This is known as a SAFE elimination procedure.
For example, for the sparse low-rank approximation problem it can beproven that no feature appears if the corresponding variance is lessthan the penalty parameter λ.
Robust Optimization &Machine Learning7. Text Analytics
Information Overload
Sparse MachineLearning
Topic imagingDynamic images
Cross-language imaging
ASRS Study
References
SAFE for sparse PCA
The sparse PCA problem can be expressed as (n = number offeatures)
maxz : ‖z‖2=1
n∑i=1
max((xT
i z)2 − λ, 0).
I xi is the data for the i-th feature (i-th column of matrix X ).I For no cardinality penalty (λ = 0), reduces to an eigenvalue
problem.I When λ > 0, i-th feature can be removed safely when its
variance = ‖xi‖2 < λ.
In our text applications, cardinality penalty λ is very high , allowing togreatly reduce the size of the problem.
Robust Optimization &Machine Learning7. Text Analytics
Information Overload
Sparse MachineLearning
Topic imagingDynamic images
Cross-language imaging
ASRS Study
References
SAFE for the LASSO
LASSO is a convex approximation to cardinality-penalizedleast-squares. There is a SAFE procedure for it.
0.05 0.07 0.1 0.2 0.3 0.4 0.6 0.8 10
10
100
1,000
10,000
100,000140,000
!/!max
Fe
atu
re
s
Original
M
Lasso
λ = 0.04λmax
25LF M = 1, 000 LF ≤ 1, 000
PubMed data has 3M documents,150K words in dictionary. SAFE forLASSO brings downs that numberto 1000. This allows to load thedata and solve the LASSOproblem.
Robust Optimization &Machine Learning7. Text Analytics
Information Overload
Sparse MachineLearning
Topic imagingDynamic images
Cross-language imaging
ASRS Study
References
Outline
Information Overload
Sparse Machine Learning
Topic imagingDynamic imagesCross-language imaging
ASRS Study
References
Robust Optimization &Machine Learning7. Text Analytics
Information Overload
Sparse MachineLearning
Topic imagingDynamic images
Cross-language imaging
ASRS Study
References
What is topic imaging?
Topic image: A small set of terms that are semantically related to agiven topic (“the query”).
As a predictive problem: predict appearance of query term in adocument given the term use in that document.
I Predictive model must be interpretable: number of predictors(other terms) must be few (sparse classification).
I Model must be obtained fast .
Robust Optimization &Machine Learning7. Text Analytics
Information Overload
Sparse MachineLearning
Topic imagingDynamic images
Cross-language imaging
ASRS Study
References
What is topic imaging?
Topic image: A small set of terms that are semantically related to agiven topic (“the query”).
As a predictive problem: predict appearance of query term in adocument given the term use in that document.
I Predictive model must be interpretable: number of predictors(other terms) must be few (sparse classification).
I Model must be obtained fast .
Robust Optimization &Machine Learning7. Text Analytics
Information Overload
Sparse MachineLearning
Topic imagingDynamic images
Cross-language imaging
ASRS Study
References
What is topic imaging?
Topic image: A small set of terms that are semantically related to agiven topic (“the query”).
As a predictive problem: predict appearance of query term in adocument given the term use in that document.
I Predictive model must be interpretable: number of predictors(other terms) must be few (sparse classification).
I Model must be obtained fast .
Robust Optimization &Machine Learning7. Text Analytics
Information Overload
Sparse MachineLearning
Topic imagingDynamic images
Cross-language imaging
ASRS Study
References
Occurence vs. ClassificationTwo NYT op-ed columnists
Occurence analysis : top 10 words
Nicholas Kristof Roger Cohenmr obama
people iranobama said
said americanpresident president
world iraniannew israel
american statesyears newunited united
Robust Optimization &Machine Learning7. Text Analytics
Information Overload
Sparse MachineLearning
Topic imagingDynamic images
Cross-language imaging
ASRS Study
References
Occurence vs. ClassificationTwo NYT op-ed columnists
Occurence analysis : top 10 words Sparse classification
Nicholas Kristof Roger Cohenmr obama
people iranobama said
said americanpresident president
world iraniannew israel
american statesyears newunited united
Nicholas Kristof Roger Cohenvideos olmertdarfur persian
antibiotics chemicalfacebook mohammadsudanese alijanjaweed dialogueyoutube ceasesudan iranian
sweatshops tehraninvite holocaust
Robust Optimization &Machine Learning7. Text Analytics
Information Overload
Sparse MachineLearning
Topic imagingDynamic images
Cross-language imaging
ASRS Study
References
Image across time: “Microsoft”Data: The New York Times headlines, 1985-2007
Robust Optimization &Machine Learning7. Text Analytics
Information Overload
Sparse MachineLearning
Topic imagingDynamic images
Cross-language imaging
ASRS Study
References
“China”Data: The New York Times headlines, 1985-2007
Robust Optimization &Machine Learning7. Text Analytics
Information Overload
Sparse MachineLearning
Topic imagingDynamic images
Cross-language imaging
ASRS Study
References
“Cancer”Data: The New York Times headlines, 1985-2007
Robust Optimization &Machine Learning7. Text Analytics
Information Overload
Sparse MachineLearning
Topic imagingDynamic images
Cross-language imaging
ASRS Study
References
“Diabetes”Data: The New York Times headlines, 1985-2007
Robust Optimization &Machine Learning7. Text Analytics
Information Overload
Sparse MachineLearning
Topic imagingDynamic images
Cross-language imaging
ASRS Study
References
Topic imaging in foreign languagesI Translate query term.I Run topic imaging task on foreign press data in original language.I Translate the few terms in the resulting list.
Avoids huge translation task!
Robust Optimization &Machine Learning7. Text Analytics
Information Overload
Sparse MachineLearning
Topic imagingDynamic images
Cross-language imaging
ASRS Study
References
Topic imaging in foreign languagesI Translate query term.I Run topic imaging task on foreign press data in original language.I Translate the few terms in the resulting list.
Avoids huge translation task!
Query: can you guess?Source: People’s Daily, Feb-Apr 2011.
file:///C|/Users/Dai%20Xinyu/Desktop/result1.txt[2011-04-05 12:43:34]
LIBYA:
(The first column is chinese words of LIBYA, the second colume is the most related chinese words
of LIBYA from China DAILY NEWSPAPER in 2011,
the third column is the translation of the second column)
利比亚 欧佩克 opec
利比亚 武力 force
利比亚 局势 situation
利比亚 行动 action
利比亚 平民 civilians
利比亚 撤出 withdrawal
利比亚 空袭 airstrike
利比亚 北非 french-speaking
利比亚 瓦莱塔 valletta
利比亚 撤离 evacuate
利比亚 军机 planes
利比亚 人道主义 humanitarianism
利比亚 卡扎菲 qadhafi
DALAI:
达赖 中央政府 government
达赖 访 visit
达赖 藏传 tibetan
达赖 祖国 motherland
达赖 分裂 split
达赖 台 taiwan
达赖 宗教 religion
达赖 集团 clique
Robust Optimization &Machine Learning7. Text Analytics
Information Overload
Sparse MachineLearning
Topic imagingDynamic images
Cross-language imaging
ASRS Study
References
Outline
Information Overload
Sparse Machine Learning
Topic imagingDynamic imagesCross-language imaging
ASRS Study
References
Robust Optimization &Machine Learning7. Text Analytics
Information Overload
Sparse MachineLearning
Topic imagingDynamic images
Cross-language imaging
ASRS Study
References
ExampleDiscovery of emerging issues in flight security
After each commercial flight in the US, pilots generate “ASRS reports”to document flight-related issues.
Key problem: detect emerging issues that are not being classified intoexisting categories, e.g.:
I “Wake vortex” problem of the Boeing 757.I Increased number of runway incursions at LAX.
Robust Optimization &Machine Learning7. Text Analytics
Information Overload
Sparse MachineLearning
Topic imagingDynamic images
Cross-language imaging
ASRS Study
References
ExampleDiscovery of emerging issues in flight security
After each commercial flight in the US, pilots generate “ASRS reports”to document flight-related issues.
Key problem: detect emerging issues that are not being classified intoexisting categories, e.g.:
I “Wake vortex” problem of the Boeing 757.I Increased number of runway incursions at LAX.
Don’t search for a needle — picture the haystack!
Robust Optimization &Machine Learning7. Text Analytics
Information Overload
Sparse MachineLearning
Topic imagingDynamic images
Cross-language imaging
ASRS Study
References
Sparse PCA Imaging
CLE
DFW
ORDMIA
BOSLAX
STLPHLMDW
DCA SFOZZZEWRATL
LGA
LAS
PIT
HOU BWICYYZ
SEAJFK
clevelandProximitytaillinesconfusingnonstandardrptrorangelinesnowCoordinatedUniversalTimeGMTRouteerrordepictedoperatingbrasilia
Most distinguishing words for "CLE"
rwy19lrwy31ltxwydrwy34lrwy16lrwy34rtxwyerwy28crwy28rrwy28l
Specificstakeoff
rwyustaxi
clearancehold
aircraftclearshorttower
runway
Aviate
apologizedrwy33lrwy16rvehicletxwyhrwy15rtxwyfrwy1rrwy1lrwy12rS
pecifics
stoppedlanding
controllercross
intermediatefixfirstofficer
timepositionrwy23lline
Communicate
CLE DFW ORD MIA BOS LAX STL PHL MDW DCA SFO ZZZ EWR ATL LGA LAS PIT HOU BWI CYYZ SEA JFK
file:///Users/elghaoui/Dropbox/ASRS/Visualization/Runway%20Plot%20-%20v1/view.html
1 of 1 6/14/11 10:53 AM
Figure 2: A sparse PCA plot of the runway ASRS data. Here, each data point is an airport, withsize of the circles consistent with the number of reports for each airport.
CLEDFW
ORD
MIA
BOSLAX
STLPHLMDW
DCASFO
ZZZ
EWR
ATL
LGA
LASPIT
HOUBWICYYZ
SEA JFK clevelandProximitytaillinesconfusingnonstandardrptrorangelinesnowCoordinatedUniversalTimeGMTRouteerrordepictedoperatingbrasilia
Most distinguishing words for "CLE"
frequentlymotionpermissionopenworkloadportmiddleapologizeddirectvehicleM
anage
linetakeofftaxi
clearancehold
aircraftclearshorttower
runway
Aviate
soundflowsrejoineddoublechkingmarkingislandcarefulmaintainedanglevisionN
avigate
captaincrossing
secondofficerposition
intermediatefixtime
crossfirstofficercontrollerlanding
Communicate
CLE DFW ORD MIA BOS LAX STL PHL MDW DCA SFO ZZZ EWR ATL LGA LAS PIT HOU BWI CYYZ SEA JFK
file:///Users/elghaoui/Dropbox/ASRS/Visualization/Runway%20Plot%20-%20v1%20without%20Rwy-Txwy/view...
1 of 1 6/14/11 10:52 AM
Figure 3: A sparse PCA plot of the runway ASRS data, with runway features removed.
10
!"#$%&'("#)"*+()%,-%'&+("#+'(&.&+
+ /(01+-2+'1$&$+#-'&+%$,%$&$"'&+%$,-%'&+'1('+-%)*)"('$+2%-3+(+,(%4056(%+()%,-%'7+
Robust Optimization &Machine Learning7. Text Analytics
Information Overload
Sparse MachineLearning
Topic imagingDynamic images
Cross-language imaging
ASRS Study
References
Sparse PCA Imaging
CLE
DFW
ORDMIA
BOSLAX
STLPHLMDW
DCA SFOZZZEWRATL
LGA
LAS
PIT
HOU BWICYYZ
SEAJFK
clevelandProximitytaillinesconfusingnonstandardrptrorangelinesnowCoordinatedUniversalTimeGMTRouteerrordepictedoperatingbrasilia
Most distinguishing words for "CLE"
rwy19lrwy31ltxwydrwy34lrwy16lrwy34rtxwyerwy28crwy28rrwy28l
Specificstakeoff
rwyustaxi
clearancehold
aircraftclearshorttower
runway
Aviate
apologizedrwy33lrwy16rvehicletxwyhrwy15rtxwyfrwy1rrwy1lrwy12rS
pecifics
stoppedlanding
controllercross
intermediatefixfirstofficer
timepositionrwy23lline
Communicate
CLE DFW ORD MIA BOS LAX STL PHL MDW DCA SFO ZZZ EWR ATL LGA LAS PIT HOU BWI CYYZ SEA JFK
file:///Users/elghaoui/Dropbox/ASRS/Visualization/Runway%20Plot%20-%20v1/view.html
1 of 1 6/14/11 10:53 AM
Figure 2: A sparse PCA plot of the runway ASRS data. Here, each data point is an airport, withsize of the circles consistent with the number of reports for each airport.
CLEDFW
ORD
MIA
BOSLAX
STLPHLMDW
DCASFO
ZZZ
EWR
ATL
LGA
LASPIT
HOUBWICYYZ
SEA JFK clevelandProximitytaillinesconfusingnonstandardrptrorangelinesnowCoordinatedUniversalTimeGMTRouteerrordepictedoperatingbrasilia
Most distinguishing words for "CLE"
frequentlymotionpermissionopenworkloadportmiddleapologizeddirectvehicleM
anage
linetakeofftaxi
clearancehold
aircraftclearshorttower
runway
Aviate
soundflowsrejoineddoublechkingmarkingislandcarefulmaintainedanglevisionN
avigate
captaincrossing
secondofficerposition
intermediatefixtime
crossfirstofficercontrollerlanding
Communicate
CLE DFW ORD MIA BOS LAX STL PHL MDW DCA SFO ZZZ EWR ATL LGA LAS PIT HOU BWI CYYZ SEA JFK
file:///Users/elghaoui/Dropbox/ASRS/Visualization/Runway%20Plot%20-%20v1%20without%20Rwy-Txwy/view...
1 of 1 6/14/11 10:52 AM
Figure 3: A sparse PCA plot of the runway ASRS data, with runway features removed.
10
!"#$%&'("#)"*+()%,-%'&+("#+'(&.&+
+ /(01+-2+'1$+2-3%+#)%$04-"&+0-%%$&,-"#&+'-+-"$+-2+'1$+2-3%+5(&)0+,)6-'+'(&.&+7$*8+99:;)('$<<=>+
+ ++ +?1$+'$%@&+A$%$+
!"#$%!&'!(()*2-3"#>+
Robust Optimization &Machine Learning7. Text Analytics
Information Overload
Sparse MachineLearning
Topic imagingDynamic images
Cross-language imaging
ASRS Study
References
Sparse PCA Imaging
CLE
DFW
ORDMIA
BOSLAX
STLPHLMDW
DCA SFOZZZEWRATL
LGA
LAS
PIT
HOU BWICYYZ
SEAJFK
clevelandProximitytaillinesconfusingnonstandardrptrorangelinesnowCoordinatedUniversalTimeGMTRouteerrordepictedoperatingbrasilia
Most distinguishing words for "CLE"
rwy19lrwy31ltxwydrwy34lrwy16lrwy34rtxwyerwy28crwy28rrwy28l
Specificstakeoff
rwyustaxi
clearancehold
aircraftclearshorttower
runway
Aviate
apologizedrwy33lrwy16rvehicletxwyhrwy15rtxwyfrwy1rrwy1lrwy12rS
pecifics
stoppedlanding
controllercross
intermediatefixfirstofficer
timepositionrwy23lline
Communicate
CLE DFW ORD MIA BOS LAX STL PHL MDW DCA SFO ZZZ EWR ATL LGA LAS PIT HOU BWI CYYZ SEA JFK
file:///Users/elghaoui/Dropbox/ASRS/Visualization/Runway%20Plot%20-%20v1/view.html
1 of 1 6/14/11 10:53 AM
Figure 2: A sparse PCA plot of the runway ASRS data. Here, each data point is an airport, withsize of the circles consistent with the number of reports for each airport.
CLEDFW
ORD
MIA
BOSLAX
STLPHLMDW
DCASFO
ZZZ
EWR
ATL
LGA
LASPIT
HOUBWICYYZ
SEA JFK clevelandProximitytaillinesconfusingnonstandardrptrorangelinesnowCoordinatedUniversalTimeGMTRouteerrordepictedoperatingbrasilia
Most distinguishing words for "CLE"
frequentlymotionpermissionopenworkloadportmiddleapologizeddirectvehicleM
anage
linetakeofftaxi
clearancehold
aircraftclearshorttower
runway
Aviate
soundflowsrejoineddoublechkingmarkingislandcarefulmaintainedanglevisionN
avigate
captaincrossing
secondofficerposition
intermediatefixtime
crossfirstofficercontrollerlanding
Communicate
CLE DFW ORD MIA BOS LAX STL PHL MDW DCA SFO ZZZ EWR ATL LGA LAS PIT HOU BWI CYYZ SEA JFK
file:///Users/elghaoui/Dropbox/ASRS/Visualization/Runway%20Plot%20-%20v1%20without%20Rwy-Txwy/view...
1 of 1 6/14/11 10:52 AM
Figure 3: A sparse PCA plot of the runway ASRS data, with runway features removed.
10
!"#$%&'("#)"*+()%,-%'&+("#+'(&.&+
+ !"#$%&'()*+%,()%,-%'&+(%$+/-&'01+$2,-&$#+'-+(3)(4-"+5$*6+'(.$7-89+("#+:-//;"):(4-"+)&&;$&<+
Robust Optimization &Machine Learning7. Text Analytics
Information Overload
Sparse MachineLearning
Topic imagingDynamic images
Cross-language imaging
ASRS Study
References
Sparse PCA Imaging
CLE
DFW
ORDMIA
BOSLAX
STLPHLMDW
DCA SFOZZZEWRATL
LGA
LAS
PIT
HOU BWICYYZ
SEAJFK
clevelandProximitytaillinesconfusingnonstandardrptrorangelinesnowCoordinatedUniversalTimeGMTRouteerrordepictedoperatingbrasilia
Most distinguishing words for "CLE"
rwy19lrwy31ltxwydrwy34lrwy16lrwy34rtxwyerwy28crwy28rrwy28l
Specificstakeoff
rwyustaxi
clearancehold
aircraftclearshorttower
runway
Aviate
apologizedrwy33lrwy16rvehicletxwyhrwy15rtxwyfrwy1rrwy1lrwy12rS
pecifics
stoppedlanding
controllercross
intermediatefixfirstofficer
timepositionrwy23lline
Communicate
CLE DFW ORD MIA BOS LAX STL PHL MDW DCA SFO ZZZ EWR ATL LGA LAS PIT HOU BWI CYYZ SEA JFK
file:///Users/elghaoui/Dropbox/ASRS/Visualization/Runway%20Plot%20-%20v1/view.html
1 of 1 6/14/11 10:53 AM
Figure 2: A sparse PCA plot of the runway ASRS data. Here, each data point is an airport, withsize of the circles consistent with the number of reports for each airport.
CLEDFW
ORD
MIA
BOSLAX
STLPHLMDW
DCASFO
ZZZ
EWR
ATL
LGA
LASPIT
HOUBWICYYZ
SEA JFK clevelandProximitytaillinesconfusingnonstandardrptrorangelinesnowCoordinatedUniversalTimeGMTRouteerrordepictedoperatingbrasilia
Most distinguishing words for "CLE"
frequentlymotionpermissionopenworkloadportmiddleapologizeddirectvehicleM
anage
linetakeofftaxi
clearancehold
aircraftclearshorttower
runway
Aviate
soundflowsrejoineddoublechkingmarkingislandcarefulmaintainedanglevisionN
avigate
captaincrossing
secondofficerposition
intermediatefixtime
crossfirstofficercontrollerlanding
Communicate
CLE DFW ORD MIA BOS LAX STL PHL MDW DCA SFO ZZZ EWR ATL LGA LAS PIT HOU BWI CYYZ SEA JFK
file:///Users/elghaoui/Dropbox/ASRS/Visualization/Runway%20Plot%20-%20v1%20without%20Rwy-Txwy/view...
1 of 1 6/14/11 10:52 AM
Figure 3: A sparse PCA plot of the runway ASRS data, with runway features removed.
10
!"#$%&'("#)"*+()%,-%'&+("#+'(&.&+
+ !"#$$%&'()$*"%+()%,-%'&+(%$+/-&'01+$2,-&$#+'-+/("(*$/$"'+34-%0-(#5+("#+"(6)*(7-"+3$*8+/(%.)"*&+-"+%9"4(15+)&&9$&:+
Robust Optimization &Machine Learning7. Text Analytics
Information Overload
Sparse MachineLearning
Topic imagingDynamic images
Cross-language imaging
ASRS Study
References
LASSO for DFW!"#$%&'(()(*+&&,-..-/01"*2&3"*24&5#*)"*2&
airport term 1 term 2 term 3 term 4 term 5 term 6 term 7 term 8CLE Rwy23L Rwy24L Rwy24C Rwy23R Rwy5R Line Rwy6R Rwy5LDFW Rwy35C Rwy35L Rwy18L Rwy17R Rwy18R Rwy17C cross TowerORD Rwy22R Rwy27R Rwy32R Rwy27L Rwy32L Rwy22L Rwy9L Rwy4LMIA Rwy9L TxwyQ Rwy8R Line Rwy9R PilotInCommand TxwyM TakeoffBOS Rwy4L Rwy33L Rwy22R Rwy4R Rwy22L TxwyK Frequency CaptainLAX Rwy25R Rwy25L Rwy24L Rwy24R Speed cross Line TowerSTL Rwy12L Rwy12R Rwy30L Rwy30R Line cross short TxwyPPHL Rwy27R Rwy9L Rwy27L TxwyE amass TxwyK AirCarrier TxwyYMDW Rwy31C Rwy31R Rwy22L TxwyP Rwy4R midway Rwy22R TxwyYDCA TxwyJ Airplane turn Captain Line Traffic Landing shortSFO Rwy28L Rwy28R Rwy1L Rwy1R Rwy10R Rwy10L b747 CaptainZZZ hangar radio Rwy36R gate Aircraft Line Ground TowerERW Rwy22R Rwy4L Rwy22L TxwyP TxwyZ Rwy4R papa TxwyPBATL Rwy26L Rwy26R Rwy27R Rwy9L Rwy8R atlanta dixie crossLGA TxwyB4 ILS Line notes TxwyP hold vehicle TaxiwayLAS Rwy25R Rwy7L Rwy19L Rwy1R Rwy1L Rwy25L TxwyA7 Rwy19RPIT Rwy28C Rwy10C Rwy28L TxwyN1 TxwyE TxwyW Rwy28R TxwyVHOU Rwy12R Rwy12L citation Takeoff Heading Rwy30L Line TowerBWI TxwyP Rwy15R Rwy33L turn TxwyP1 Intersection TxwyE Taxiway
CYYZ TxwyQ TxwyH Rwy33R Line YYZ Rwy24R short torontoSEA Rwy34R Rwy16L Rwy34L Rwy16R AirCarrier FirstOfficer TxwyJ SMAJFK Rwy31L Rwy13R Rwy22R Rwy13L vehicle Rwy4L amass Rwy31R
Table 3: The terms recovered with LASSO image analysis of a few airports in the “runway” ASRSdata set.
DFW
Rwy35Cairline
reconfirmed
final
TxwyZ
a319
Rwy17C
RwyERlahso
chargeCommunication
Rwy35L
mu2 cbasimultaneouslylima
TxwyEL
Rwy18L
TxwyWM
xxxx
Rwy18R
TxwyA
xyz
Rwy13R
Rwy36Lswitch
AirlineTransportRating
Rwy17R
RwyXX
spot
abed
Rwy36R
Week
cleared
TxwyWLTxwyF
turn
md80
b757abcdRwy22RcbaTxwyEL
committed
downfield
received
cba
b727
danger
Crossing
TxwyY
statnews.org http://statnews.org/tree_dfw
1 of 1 6/14/11 9:40 PM
CYYZ
YYZdisplaced
may
Crossing
across
along
toronto
ClearAirTurbulencelooks
holdoverold bleed Report
TxwyQ
needed
TxwyH
separationholdover
Rwy33R
Knot
TxwyE
TxwyR
radio
initial
Ground
Line
protected
never
area
proceeded
stop
landed
final
length
fulldata
Minutealongcloseout
previous
rightfollowingwhile
conflictsign
inadvertently
distance
Knot
bleed
placed
toward
statnews.org http://statnews.org/tree_cyyz
1 of 1 6/14/11 9:39 PM
Figure 5: A tree LASSO analysis of the DFW (left panel) and CYYZ (right panel) airports,showing the LASSO image (inner circle) and for each term in that image, a further image.
Figure 6: Diagram of DFW.
12
• &6'($78(/&9*#79-.&*:$;-</=&#$&)-*79:.-*&*:$;-<>2-?#;-<&#$2(*/(97"$/&24-2&9-:/(&2*":@.(A&
• &5./"&#'($78(/&9"BB:$#9-7"$&#//:(/A&
C-?#&3-<&3D&
E:$;-<&FGE&
Robust Optimization &Machine Learning7. Text Analytics
Information Overload
Sparse MachineLearning
Topic imagingDynamic images
Cross-language imaging
ASRS Study
References
Outline
Information Overload
Sparse Machine Learning
Topic imagingDynamic imagesCross-language imaging
ASRS Study
References
Robust Optimization &Machine Learning7. Text Analytics
Information Overload
Sparse MachineLearning
Topic imagingDynamic images
Cross-language imaging
ASRS Study
References
Xinyu Dai, Jinzhu Jia, Laurent El Ghaoui, and Bin Yu.
SBA-term: Sparse bilingual association for terms.In Fifth IEEE International Conference on Semantic Computing, Palo Alto, CA, USA, 2011.
Laurent El Ghaoui, Vivian Viallon, and Tarek Rabbani.
Safe feature elimination for the LASSO.Submitted to Journal of Machine Learning Research, April 2011.Early draft: EECS Technical Report no. 126, September 2010.
B. Gawalt, J. Jia, L. Miratrix, L. El Ghaoui, B. Yu, and S. Clavier.
Discovering word associations in news media via feature selection and sparse classification.In Proc. 11th ACM SIGMM International Conference on Multimedia Information Retrieval, 2010.
Luke Miratrix, Jinzhu Jia, Brian Gawalt, Bin Yu, and Laurent El Ghaoui.
Summarizing large-scale, multiple-document news data: sparse methods and human validation.submitted to JASA.
Haipeng Shen and Jianhua Z. Huang.
Sparse principal component analysis via regularized low rank matrix approximation.J. Multivar. Anal., 99:1015–1034, July 2008.
Y. Zhang, A. d’Aspremont, and L. El Ghaoui.
Sparse PCA: Convex relaxations, algorithms and applications.In M. Anjos and J.B. Lasserre, editors, Handbook on Semidefinite, Cone and Polynomial Optimization: Theory,Algorithms, Software and Applications. Springer, 2011.To appear.
top related