Michael Alcorn, Sr. Software Engineer, Red Hat Inc. at MLconf SF 2017
TRANSCRIPT
![Page 1: Michael Alcorn, Sr. Software Engineer, Red Hat Inc. at MLconf SF 2017](https://reader035.vdocuments.mx/reader035/viewer/2022062504/5a65ee957f8b9aaf638b6151/html5/thumbnails/1.jpg)
REPRESENTATION LEARNING @ RED HAT
Michael A. Alcorn ([email protected])
Machine Learning Engineer - Information Retrieval
https://sites.google.com/view/michaelaalcorn/
![Page 2: Michael Alcorn, Sr. Software Engineer, Red Hat Inc. at MLconf SF 2017](https://reader035.vdocuments.mx/reader035/viewer/2022062504/5a65ee957f8b9aaf638b6151/html5/thumbnails/2.jpg)
Outline
  Background
  word2vec/url2vec
  doc2vec/account2vec
  Duplicate Detection
  (batter|pitcher)2vec
MLconf Blog
![Page 3: Michael Alcorn, Sr. Software Engineer, Red Hat Inc. at MLconf SF 2017](https://reader035.vdocuments.mx/reader035/viewer/2022062504/5a65ee957f8b9aaf638b6151/html5/thumbnails/3.jpg)
Background
Why?
  Small amount (zero?) of labeled data for task
  Lots of unlabeled data (labeled data for a different task?)
Can we use large amounts of unlabeled data to make better predictions?
Representation learning
  Not the same as traditional unsupervised learning!
  Excellent chapter in Goodfellow et al.'s Deep Learning textbook
  Article by Bengio et al.
Transfer learning
![Page 4: Michael Alcorn, Sr. Software Engineer, Red Hat Inc. at MLconf SF 2017](https://reader035.vdocuments.mx/reader035/viewer/2022062504/5a65ee957f8b9aaf638b6151/html5/thumbnails/4.jpg)
word2vec
NVIDIA - "Introduction to Neural Machine Translation with GPUs (Part 2)"
![Page 5: Michael Alcorn, Sr. Software Engineer, Red Hat Inc. at MLconf SF 2017](https://reader035.vdocuments.mx/reader035/viewer/2022062504/5a65ee957f8b9aaf638b6151/html5/thumbnails/5.jpg)
word2vec
Deeplearning4j - "Word2vec"
Mikolov et al. (2013)
![Page 6: Michael Alcorn, Sr. Software Engineer, Red Hat Inc. at MLconf SF 2017](https://reader035.vdocuments.mx/reader035/viewer/2022062504/5a65ee957f8b9aaf638b6151/html5/thumbnails/6.jpg)
word2vec
Analogies
"x is to y as ? is to z": x - y + z = ?
  bash - shellshock + heartbleed = openssl
  firefox - linux + windows = internet_explorer
  openshift - cloud + storage = gluster
  rhn_register - rhn + rhsm = subscription-manager
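The analogy arithmetic can be sketched as a nearest-neighbor lookup around x - y + z. This is a minimal illustration, not the code behind the slide: the 3-dimensional vectors are hand-picked assumptions chosen so the arithmetic works out, whereas real word2vec vectors are learned from a corpus, and the helper names `cosine` and `analogy` are hypothetical.

```python
import math

# Hypothetical toy embeddings (assumption: hand-picked, not learned).
vectors = {
    "bash":       [1.0, 1.0, 0.0],
    "shellshock": [1.0, 0.0, 0.0],
    "openssl":    [0.0, 1.0, 1.0],
    "heartbleed": [0.0, 0.0, 1.0],
    "firefox":    [1.0, 0.0, 1.0],
}

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(p * q for p, q in zip(a, b))
    norm_a = math.sqrt(sum(p * p for p in a))
    norm_b = math.sqrt(sum(q * q for q in b))
    return dot / (norm_a * norm_b)

def analogy(x, y, z, vectors):
    """Return the word closest to x - y + z, excluding the three inputs."""
    target = [a - b + c for a, b, c in zip(vectors[x], vectors[y], vectors[z])]
    candidates = [w for w in vectors if w not in (x, y, z)]
    return max(candidates, key=lambda w: cosine(vectors[w], target))

print(analogy("bash", "shellshock", "heartbleed", vectors))
```

Excluding the input words from the candidates is a common convention, since the nearest vector to x - y + z is often x itself.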
![Page 7: Michael Alcorn, Sr. Software Engineer, Red Hat Inc. at MLconf SF 2017](https://reader035.vdocuments.mx/reader035/viewer/2022062504/5a65ee957f8b9aaf638b6151/html5/thumbnails/7.jpg)
Naming Colors
Blog post by Janelle Shane mapping RGB values to color names
Results are pretty underwhelming for those in the know
Can word embeddings improve (GitHub)?
![Page 8: Michael Alcorn, Sr. Software Engineer, Red Hat Inc. at MLconf SF 2017](https://reader035.vdocuments.mx/reader035/viewer/2022062504/5a65ee957f8b9aaf638b6151/html5/thumbnails/8.jpg)
url2vec
Tasks concerning URLs
  Search - returning relevant content
  Troubleshooting - recommending related articles
Obvious method - look at text
Alternative/enhanced method - use customer browsing behavior as additional contextual clues
![Page 9: Michael Alcorn, Sr. Software Engineer, Red Hat Inc. at MLconf SF 2017](https://reader035.vdocuments.mx/reader035/viewer/2022062504/5a65ee957f8b9aaf638b6151/html5/thumbnails/9.jpg)
url2vec
How?
  Treat each day of browsing activity as a "sentence"
  Treat each URL as a "word"
  Run word2vec!
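The three steps above could look roughly like the following preprocessing sketch. The log format, the URL paths, and the `build_sentences` helper are assumptions made for illustration, as is the choice to group by visitor as well as by day; the slide does not say how the browsing data was stored.

```python
from collections import defaultdict

# Hypothetical browsing log: (visitor_id, ISO date, URL), already in time order.
events = [
    ("v1", "2017-11-01", "/solutions/1"),
    ("v1", "2017-11-01", "/solutions/2"),
    ("v2", "2017-11-01", "/solutions/2"),
    ("v1", "2017-11-02", "/docs/3"),
]

def build_sentences(events):
    """Group URLs into one "sentence" per visitor per day, preserving order."""
    sentences = defaultdict(list)
    for visitor, day, url in events:
        sentences[(visitor, day)].append(url)
    return list(sentences.values())

sentences = build_sentences(events)
for sentence in sentences:
    print(sentence)
```

The result is a list of token lists, which is the input shape that word2vec implementations such as gensim's `Word2Vec` class accept, so each URL ends up with its own learned vector.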
![Page 10: Michael Alcorn, Sr. Software Engineer, Red Hat Inc. at MLconf SF 2017](https://reader035.vdocuments.mx/reader035/viewer/2022062504/5a65ee957f8b9aaf638b6151/html5/thumbnails/10.jpg)
url2vec
https://access.redhat.com/solutions/25190
https://access.redhat.com/solutions/10107
Application: ScatterPlot3D
![Page 11: Michael Alcorn, Sr. Software Engineer, Red Hat Inc. at MLconf SF 2017](https://reader035.vdocuments.mx/reader035/viewer/2022062504/5a65ee957f8b9aaf638b6151/html5/thumbnails/11.jpg)
doc2vec
"NLP 05: From Word2vec to Doc2vec: a simple example with Gensim"
Le and Mikolov (2014)
![Page 12: Michael Alcorn, Sr. Software Engineer, Red Hat Inc. at MLconf SF 2017](https://reader035.vdocuments.mx/reader035/viewer/2022062504/5a65ee957f8b9aaf638b6151/html5/thumbnails/12.jpg)
customer2vec
Why?
  Data-driven segmentation
Same idea as url2vec except now we treat each account as a "document" of many "sentences" (different browsing days)
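A rough sketch of the account-as-document idea follows. The log rows, account ids, and the `build_documents` helper are hypothetical; only the grouping (account, then browsing day, then URLs) comes from the slide.

```python
from collections import defaultdict

# Hypothetical log rows: (account_id, ISO date, URL), already in time order.
events = [
    ("acct-a", "2017-11-01", "/solutions/1"),
    ("acct-a", "2017-11-01", "/solutions/2"),
    ("acct-a", "2017-11-02", "/docs/3"),
    ("acct-b", "2017-11-01", "/solutions/2"),
]

def build_documents(events):
    """Map each account to a "document": a list of per-day URL "sentences"."""
    by_account = defaultdict(lambda: defaultdict(list))
    for account, day, url in events:
        by_account[account][day].append(url)
    # Order each account's sentences chronologically by day.
    return {
        account: [urls for _, urls in sorted(days.items())]
        for account, days in by_account.items()
    }

documents = build_documents(events)
for account, sentences in documents.items():
    print(account, sentences)
```

One way to hand this to a doc2vec implementation, offered here as an assumption rather than the approach used in the talk, is to emit every day-sentence tagged with its account id so that each account receives a single learned vector.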
![Page 15: Michael Alcorn, Sr. Software Engineer, Red Hat Inc. at MLconf SF 2017](https://reader035.vdocuments.mx/reader035/viewer/2022062504/5a65ee957f8b9aaf638b6151/html5/thumbnails/15.jpg)
Duplicate Detection
There are a number of "duplicate" KCS solutions on the Customer Portal
  Muddy search results
How can we identify candidate duplicate documents?
  Obvious approach - compare text (e.g., tf-idf)
  Bag-of-words loses any structural meaning behind text
Can we learn better representations?
  Title is essentially a summary of the solution content
  Learn representations of body that are similar to title representations (like the DSSM; my code)
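The tf-idf baseline and its bag-of-words weakness can be illustrated with a small pure-Python sketch. The example documents, the whitespace tokenizer, and the helper names are assumptions chosen for illustration; a production system would more likely rely on a library vectorizer.

```python
import math
from collections import Counter

def tfidf_vectors(docs):
    """Return one {term: tf-idf weight} dict per whitespace-tokenized document."""
    tokenized = [doc.lower().split() for doc in docs]
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    n_docs = len(docs)
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        vectors.append({
            term: (count / len(tokens)) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return vectors

def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(weight * b.get(term, 0.0) for term, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

# Hypothetical solution texts: the first two share words but not meaning.
docs = [
    "client fails to mount nfs share",
    "nfs share fails to mount client",
    "kernel panic after update",
]
vecs = tfidf_vectors(docs)
print(cosine(vecs[0], vecs[1]))  # word order is ignored, so these look identical
print(cosine(vecs[0], vecs[2]))  # no shared terms
```

The first two texts describe different failures yet receive the same vector, which is the loss of structural meaning the slide refers to.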
![Page 16: Michael Alcorn, Sr. Software Engineer, Red Hat Inc. at MLconf SF 2017](https://reader035.vdocuments.mx/reader035/viewer/2022062504/5a65ee957f8b9aaf638b6151/html5/thumbnails/16.jpg)
Deep Semantic Similarity Model
Jianfeng Gao - "Deep Learning for Web Search and Natural Language Processing"
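A DSSM-style training objective for matching solution bodies to titles might be sketched as below. This only shows the loss, assuming the body and title vectors have already been produced by some encoder; the encoder itself, the example vectors, the `dssm_loss` name, and the smoothing value `gamma=10.0` are all assumptions rather than details from the talk.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(p * q for p, q in zip(a, b))
    norm_a = math.sqrt(sum(p * p for p in a))
    norm_b = math.sqrt(sum(q * q for q in b))
    return dot / (norm_a * norm_b)

def dssm_loss(body_vec, title_vecs, positive_index=0, gamma=10.0):
    """Negative log-probability of the matching title under a softmax
    over gamma-scaled cosine similarities."""
    scores = [gamma * cosine(body_vec, title) for title in title_vecs]
    max_score = max(scores)  # subtract the max for numerical stability
    log_norm = max_score + math.log(sum(math.exp(s - max_score) for s in scores))
    return log_norm - scores[positive_index]

# Hypothetical encoded vectors: one body, its own title, and two other titles.
body = [1.0, 0.0]
titles = [[0.9, 0.1], [0.0, 1.0], [-1.0, 0.0]]

loss_match = dssm_loss(body, titles, positive_index=0)
loss_mismatch = dssm_loss(body, titles, positive_index=1)
print(loss_match, loss_mismatch)
```

Minimizing this loss pulls a body's vector toward its own title and away from the sampled titles of other solutions, so near-duplicate solutions should end up close together in the learned space.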
![Page 17: Michael Alcorn, Sr. Software Engineer, Red Hat Inc. at MLconf SF 2017](https://reader035.vdocuments.mx/reader035/viewer/2022062504/5a65ee957f8b9aaf638b6151/html5/thumbnails/17.jpg)
(batter|pitcher)2vec (GitHub)
Can we learn meaningful representations of MLB players?
  Accurate representations could be used to simulate games and inform trades
  Find undervalued/overvalued players
![Page 19: Michael Alcorn, Sr. Software Engineer, Red Hat Inc. at MLconf SF 2017](https://reader035.vdocuments.mx/reader035/viewer/2022062504/5a65ee957f8b9aaf638b6151/html5/thumbnails/19.jpg)
(batter|pitcher)2vec (GitHub)
Can we learn meaningful representations of MLB players?
  Accurate representations could be used to simulate games and inform trades
  Find undervalued/overvalued players
Images: SI.com, NBCSports.com
![Page 20: Michael Alcorn, Sr. Software Engineer, Red Hat Inc. at MLconf SF 2017](https://reader035.vdocuments.mx/reader035/viewer/2022062504/5a65ee957f8b9aaf638b6151/html5/thumbnails/20.jpg)
(batter|pitcher)2vec
"Learning to Coach Football"
Wang and Zemel (2016)
![Page 21: Michael Alcorn, Sr. Software Engineer, Red Hat Inc. at MLconf SF 2017](https://reader035.vdocuments.mx/reader035/viewer/2022062504/5a65ee957f8b9aaf638b6151/html5/thumbnails/21.jpg)
THANK YOU!