on-demand.gputechconf.com/gtc/2017/presentation/s7611-ian-lumb...
www.univa.com Picture credit in notes
Earthquake Damage
▪ Destructive tsunamis occur frequently: about one a year.
▪ There have been 94 destructive tsunamis in the last hundred years.
▪ There have been 51,000 victims (not including Dec. 26, 2004).
▪ Future tsunami disasters are inevitable.
▪ The human population in low-lying coastal areas is growing.
▪ Education about tsunamis can save many lives.
Earth: Portrait of a Planet, 5th edition, by Stephen Marshak © 2015 W. W. Norton & Co.
Ian Lumb
Solutions Architect
GTC 2017 – San Jose
May 9, 2017
Mitigating Disasters with GPU-Based Deep Learning from Twitter?
Outline
▪ Tsunamis
▪ Earthquake-Tsunami Causality
▪ Deep Learning from Twitter?
▪ Deep Meaning from Twitter???
▪ Other Disasters
▪ Discussion
Geist, E.L., Titov, V.V., and Synolakis, C.E., 2006, Tsunami: wave of change: Scientific American, v. 294, pp. 56-63
Shocking Differences
Tsunami Advisories
Motivation
▪ Non-deterministic cause
▪ Uncertainty inherent in any attempt to predict earthquakes
o In situ measurements may reduce uncertainty
▪ Lead times
▪ Availability of actionable observations
▪ Communication of situation - advisories, warnings, etc.
▪ Cause-effect relationship
▪ Energy transfer - inputs ... coupling ... outputs
o ‘Geometry’ - bathymetry and topography
▪ Other factors - e.g., tides
▪ Established effect
▪ Far-field estimates of tsunami propagation (pre-computed) and coastal
inundation (real-time) have proven to be extremely accurate ...
o This requires a distributed array of deep-ocean tsunami detection buoys and a forecasting model
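Why lead times are tight can be seen from a back-of-envelope estimate. A tsunami is a long wave, so in the open ocean it travels at roughly the shallow-water speed c = sqrt(g·h), where h is the water depth. The minimal Python sketch below assumes a single uniform depth along the whole path (real bathymetry varies, and the wave slows sharply near shore); it is an illustration only, not the pre-computed propagation or real-time inundation models described above. The depth and distance in the usage lines are hypothetical.

```python
import math

G = 9.81  # gravitational acceleration, m/s^2


def long_wave_speed(depth_m: float) -> float:
    """Shallow-water (long-wave) speed c = sqrt(g * h), in m/s."""
    if depth_m <= 0:
        raise ValueError("depth must be positive")
    return math.sqrt(G * depth_m)


def travel_time_minutes(distance_km: float, depth_m: float) -> float:
    """Travel time over a path of uniform depth (an idealization), in minutes."""
    return distance_km * 1000.0 / long_wave_speed(depth_m) / 60.0


if __name__ == "__main__":
    # Hypothetical example: 1000 km of open ocean at a uniform 4000 m depth
    c = long_wave_speed(4000.0)
    print(f"speed: {c:.0f} m/s ({c * 3.6:.0f} km/h)")
    print(f"travel time: {travel_time_minutes(1000.0, 4000.0):.0f} min")
```

At a depth of about 4 km the wave moves at roughly 200 m/s, so a coastline 1000 km away has well under two hours of warning under this idealization.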
http://credit.pvamu.edu/MCBDA2016/Slides/Day2_Lumb_MCBDA1_Twitter_Tsunami.pdf
http://www.gitews.org/en/concept/
Traditional Data Sources
Deep Learning from Twitter?
http://credit.pvamu.edu/MCBDA2016/Slides/Day2_Lumb_MCBDA1_Twitter_Tsunami.pdf
Big Data’s 6Vs
http://credit.pvamu.edu/MCBDA2016/Slides/Day2_Lumb_MCBDA1_Twitter_Tsunami.pdf
Acquires tweets with the keyword “earthquake”
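use strict;
use warnings;
use Net::Twitter::Lite::WithAPIv1_1;

# OAuth credentials for a registered Twitter application (redacted)
my $nt = Net::Twitter::Lite::WithAPIv1_1->new(
    consumer_key        => 'xxxx...xxxxxxx',
    consumer_secret     => 'xxxxxx.....xxxxxxxxxx',
    access_token        => 'xxxxx....xxxxxxxxxxx',
    access_token_secret => 'xxxxx.....xxxxxxxxxxx',
    ssl                 => 1,
);

# Search recent tweets that contain the keyword "earthquake"
my $result = $nt->search("earthquake");

# Tweets may contain non-ASCII characters, so print them as UTF-8
binmode STDOUT, ':encoding(UTF-8)';
for my $status (@{ $result->{statuses} }) {
    print "$status->{text}\n";
}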
http://credit.pvamu.edu/MCBDA2016/Slides/Day2_Lumb_MCBDA1_Twitter_Tsunami.pdf
Perl Script Prototype
Deep Learning Workflow
After Karau et al., Learning Spark, O’Reilly, 2015
Deep Learning from Twitter?
Represent data
▪ Twitter data manually curated into ‘ham’ and ‘spam’
▪ In-memory representation via Spark RDDs
Extract features
▪ Term frequencies computed via Spark MLlib HashingTF ⇒ feature vectors
Develop model object
▪ Spark MLlib LogisticRegressionWithSGD used for classification
Evaluate model
http://credit.pvamu.edu/MCBDA2016/Slides/Day2_Lumb_MCBDA1_Twitter_Tsunami.pdf
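The represent / extract / model / evaluate steps above can be sketched without a Spark cluster. The minimal Python sketch below is not the author's Spark prototype and does not use Spark MLlib: it hand-rolls the hashing trick (the idea behind HashingTF) and logistic regression fitted by stochastic gradient descent (the idea behind LogisticRegressionWithSGD). The tweets, the ham/spam labels, the feature-space size, the learning rate and the epoch count are all hypothetical, and `zlib.crc32` merely stands in for whatever hash function Spark uses.

```python
import math
import random
import zlib

NUM_FEATURES = 1 << 10  # hypothetical size of the hashed feature space
HAM, SPAM = 1, 0        # hypothetical labels: 1 = relevant report, 0 = irrelevant

# Hypothetical hand-curated training tweets
TRAIN = [
    ("strong earthquake felt downtown buildings shaking", HAM),
    ("magnitude 6 earthquake off the coast tsunami warning issued", HAM),
    ("earthquake just hit here everything is shaking", HAM),
    ("win free tickets to the earthquake movie premiere", SPAM),
    ("click here for cheap earthquake insurance deals", SPAM),
    ("new earthquake movie trailer is out watch now", SPAM),
]


def hashing_tf(text, num_features=NUM_FEATURES):
    """Term frequencies stored at hashed indices (the hashing trick), as a sparse dict."""
    vec = {}
    for token in text.lower().split():
        idx = zlib.crc32(token.encode("utf-8")) % num_features
        vec[idx] = vec.get(idx, 0.0) + 1.0
    return vec


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def predict_proba(weights, bias, features):
    """Probability that a featurized tweet is ham."""
    return sigmoid(bias + sum(weights[i] * v for i, v in features.items()))


def train_sgd(examples, num_features=NUM_FEATURES, epochs=50, lr=0.5, seed=0):
    """Logistic regression fitted by per-example stochastic gradient descent."""
    rng = random.Random(seed)
    weights, bias = [0.0] * num_features, 0.0
    data = list(examples)
    for _ in range(epochs):
        rng.shuffle(data)
        for features, label in data:
            error = predict_proba(weights, bias, features) - label  # d(log loss)/dz
            for i, v in features.items():
                weights[i] -= lr * error * v
            bias -= lr * error
    return weights, bias


def accuracy(weights, bias, examples):
    """Fraction of examples whose thresholded prediction matches the label."""
    hits = sum((predict_proba(weights, bias, f) >= 0.5) == (y == HAM) for f, y in examples)
    return hits / len(examples)


if __name__ == "__main__":
    featurized = [(hashing_tf(text), label) for text, label in TRAIN]
    w, b = train_sgd(featurized)
    print("training accuracy:", accuracy(w, b, featurized))
    for text in ["felt strong shaking", "watch the movie trailer"]:
        print(text, "->", round(predict_proba(w, b, hashing_tf(text)), 3))
```

Because the toy data set is tiny, the sketch only scores the training tweets plus two unseen strings; a real evaluation would hold out part of the curated ham/spam data as a test set.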
Spark Prototype
Next Steps: Scaling …
[Figure: scaling out, in, up and down]
PyTorch
▪ Python package that provides
▪ Tensor computation – strong GPU acceleration, efficient memory usage
o Integrated with NVIDIA cuDNN and NCCL libraries
▪ Deep Neural Networks built on a tape-based autograd system
▪ Can leverage NumPy, SciPy and Cython as needed
▪ Available tutorials include Natural Language Processing (NLP)
▪ Revisited text classification via Bag-of-Words
http://pytorch.org/about/
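"Tape-based autograd" means that operations are recorded as they execute and gradients are then computed by replaying that record in reverse. The minimal Python sketch below illustrates the idea for scalars only; it is not PyTorch's implementation or API, and the names `Tape`, `Var` and `backward` are hypothetical.

```python
import math


class Tape:
    """Records each operation, in execution order, as it runs (define-by-run)."""

    def __init__(self):
        self.entries = []  # (output Var, [(input Var, local derivative), ...])


class Var:
    """A scalar value whose operations are recorded on a shared tape."""

    def __init__(self, value, tape):
        self.value = float(value)
        self.grad = 0.0
        self.tape = tape

    def _record(self, value, parents):
        out = Var(value, self.tape)
        self.tape.entries.append((out, parents))
        return out

    def __add__(self, other):
        return self._record(self.value + other.value, [(self, 1.0), (other, 1.0)])

    def __mul__(self, other):
        return self._record(self.value * other.value,
                            [(self, other.value), (other, self.value)])

    def tanh(self):
        t = math.tanh(self.value)
        return self._record(t, [(self, 1.0 - t * t)])


def backward(output):
    """Reverse-mode pass: replay the tape from the last operation to the first."""
    output.grad = 1.0
    for out, parents in reversed(output.tape.entries):
        for parent, local_grad in parents:
            parent.grad += local_grad * out.grad  # chain rule, accumulated


if __name__ == "__main__":
    tape = Tape()
    x, w, b = Var(2.0, tape), Var(3.0, tape), Var(1.0, tape)
    y = (x * w + b).tanh()  # a one-neuron "network"
    backward(y)
    print(y.value, x.grad, w.grad, b.grad)
```

Because the tape is built while the forward pass runs, ordinary Python control flow can change the recorded graph from one pass to the next; a fresh tape is used per forward pass here.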
PyTorch BoW Classifier
http://pytorch.org/tutorials/beginner/deep_learning_nlp_tutorial.html
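A Bag-of-Words classifier maps each tweet to a vector of word counts over a vocabulary, applies an affine map (one score per label) and a log-softmax, and is trained on the negative log-likelihood of the correct label. The minimal Python sketch below mirrors that structure in plain Python rather than PyTorch, so no `torch` calls appear and the gradient is written out by hand. The labelled tweets, the label names, the zero initialization and the learning rate are hypothetical.

```python
import math

# Hypothetical labelled tweets (already tokenized) and label indices
TRAIN = [
    ("strong earthquake felt near the coast".split(), "RELEVANT"),
    ("earthquake movie premiere tickets on sale".split(), "IRRELEVANT"),
    ("buildings shaking after the earthquake".split(), "RELEVANT"),
    ("watch the earthquake movie trailer".split(), "IRRELEVANT"),
]
LABEL_TO_IX = {"RELEVANT": 0, "IRRELEVANT": 1}


def build_vocab(sentences):
    """Assign each distinct word an index, in order of first appearance."""
    word_to_ix = {}
    for sentence in sentences:
        for word in sentence:
            word_to_ix.setdefault(word, len(word_to_ix))
    return word_to_ix


def make_bow_vector(sentence, word_to_ix):
    """Count each vocabulary word in the sentence; unknown words are skipped."""
    vec = [0.0] * len(word_to_ix)
    for word in sentence:
        if word in word_to_ix:
            vec[word_to_ix[word]] += 1.0
    return vec


def log_softmax(scores):
    m = max(scores)
    log_sum = m + math.log(sum(math.exp(s - m) for s in scores))
    return [s - log_sum for s in scores]


class BoWClassifier:
    """Affine map from a BoW vector to one score per label, then log-softmax."""

    def __init__(self, num_labels, vocab_size):
        self.W = [[0.0] * vocab_size for _ in range(num_labels)]
        self.b = [0.0] * num_labels

    def forward(self, bow):
        scores = [sum(w * x for w, x in zip(row, bow)) + b
                  for row, b in zip(self.W, self.b)]
        return log_softmax(scores)

    def sgd_step(self, bow, target, lr=0.1):
        """One gradient step on the negative log-likelihood; returns the pre-step loss."""
        probs = [math.exp(lp) for lp in self.forward(bow)]
        loss = -math.log(probs[target])
        for k, p in enumerate(probs):
            grad = p - (1.0 if k == target else 0.0)  # d(NLL)/d(score_k)
            self.b[k] -= lr * grad
            for j, x in enumerate(bow):
                self.W[k][j] -= lr * grad * x
        return loss


def predict(model, sentence, word_to_ix):
    log_probs = model.forward(make_bow_vector(sentence, word_to_ix))
    return max(range(len(log_probs)), key=log_probs.__getitem__)


if __name__ == "__main__":
    vocab = build_vocab(s for s, _ in TRAIN)
    model = BoWClassifier(len(LABEL_TO_IX), len(vocab))
    for epoch in range(100):
        total = sum(model.sgd_step(make_bow_vector(s, vocab), LABEL_TO_IX[y])
                    for s, y in TRAIN)
    print("final epoch loss:", round(total, 4))
    print(predict(model, "shaking felt near the coast".split(), vocab))
```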
Towards Deep Meaning …
▪ A feature vector is a feature vector - it is devoid of semantics
▪ The W3C’s Web Ontology Language (OWL) accounts for domain
specifics and can be used to disambiguate overloaded terms (e.g.,
“earthquake”) used in different contexts (e.g., geophysics vs. movies vs.
…)
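To make "disambiguating an overloaded term" concrete, the minimal Python sketch below uses a simplified Lesk-style heuristic: it picks the sense of "earthquake" whose signature words overlap most with a tweet's words. This is not OWL and involves no ontology; the sense names and signature word lists are hypothetical stand-ins for what classes and properties in a domain ontology would provide.

```python
import re

# Hypothetical sense inventory for the overloaded term "earthquake".
# In an OWL-based design these distinctions would come from classes and
# properties in a domain ontology rather than from hand-picked word lists.
SENSES = {
    "geophysics": {"magnitude", "epicenter", "shaking", "tsunami", "fault",
                   "aftershock", "felt"},
    "movies": {"movie", "film", "trailer", "premiere", "cast", "actor", "watch"},
}


def tokenize(text):
    """Lower-cased alphanumeric tokens, as a set."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def disambiguate(text, senses=SENSES):
    """Pick the sense whose signature overlaps most with the tweet's words.

    Returns None when no signature word appears; ties go to the first sense listed.
    """
    context = tokenize(text)
    scores = {sense: len(context & signature) for sense, signature in senses.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else None


if __name__ == "__main__":
    for tweet in ["M6.1 earthquake, strong shaking felt near the epicenter",
                  "Watch the new earthquake movie trailer",
                  "earthquake"]:
        print(disambiguate(tweet), "<-", tweet)
```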
PyTorch
▪ Python package that provides
▪ Tensor computation – strong GPU acceleration, efficient memory usage
o Integrated with NVIDIA cuDNN and NCCL libraries
▪ Deep Neural Networks built on a tape-based autograd system
▪ Can leverage NumPy, SciPy and Cython as needed
▪ Available tutorials include Natural Language Processing (NLP)
▪ Revisited text classification via Bag-of-Words
▪ Investigating word embeddings to expose semantic similarity
http://pytorch.org/about/
Word Embeddings for Semantic Similarity
▪ “… words appearing in similar contexts are related to each other
semantically.” (Guthrie, PyTorch NLP tutorial)
▪ Could word embeddings disambiguate use of terms (e.g.,
“earthquake”) in different contexts (e.g., geophysics vs. movies vs.
…)???
After Goodfellow et al., 2016
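The quoted idea (words in similar contexts are semantically related) can be illustrated with count-based context vectors and cosine similarity. The minimal Python sketch below is not the learned embeddings of the PyTorch tutorial; it counts neighbouring words within a hypothetical window of two over a hypothetical four-sentence corpus.

```python
import math
from collections import Counter, defaultdict

# Hypothetical toy corpus
CORPUS = [
    "a strong earthquake was felt downtown",
    "a strong quake was felt downtown",
    "the earthquake movie trailer is out",
    "the new trailer is out",
]


def context_vectors(sentences, window=2):
    """For every word, count the words appearing within `window` positions of it."""
    vectors = defaultdict(Counter)
    for sentence in sentences:
        tokens = sentence.lower().split()
        for i, word in enumerate(tokens):
            lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    vectors[word][tokens[j]] += 1
    return vectors


def cosine(u, v):
    """Cosine similarity of two sparse count vectors (0.0 if either is empty)."""
    dot = sum(count * v[key] for key, count in u.items())
    norm_u = math.sqrt(sum(c * c for c in u.values()))
    norm_v = math.sqrt(sum(c * c for c in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


if __name__ == "__main__":
    vecs = context_vectors(CORPUS)
    for a, b in [("earthquake", "quake"), ("earthquake", "trailer"), ("quake", "trailer")]:
        print(a, b, round(cosine(vecs[a], vecs[b]), 3))
```

"earthquake" and "quake" come out similar because they share contexts. Note, though, that "earthquake" gets a single vector mixing its geophysics and movie contexts: with one vector per word form, the representation alone does not separate the two uses, which is exactly the open question on this slide.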
Towards Deep Meaning (Revisited) …
▪ A feature vector is a feature vector - it is devoid of semantics
▪ Ignores inherent, overall credibility of a Tweet - e.g., as quantified by
TweetCred
▪ Twitter metadata (handles, hashtags and URLs) is treated the same as
Twitter data (the unstructured text that comprises the body of a
Tweet) when constructing feature vectors; i.e., the semantic value of
Twitter metadata is also ignored by Deep Learning
▪ The W3C’s Resource Description Framework (RDF) facilitates the
representation of metadata and thus exposes semantics
▪ The W3C’s Web Ontology Language (OWL) accounts for domain
specifics and can be used to disambiguate overloaded terms (e.g.,
“earthquake”) used in different contexts (e.g., geophysics vs. movies vs.
…)
▪ Deep Learning in combination with RDF/OWL semantics has the
potential to produce learned models in which knowledge is explicitly represented
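Separating a tweet's metadata from its body, so that each can carry its own meaning, is the first step toward an RDF-style representation. The minimal Python sketch below emits (subject, predicate, object) triples as plain tuples; it is not RDF serialization, and the `ex:` prefix, the predicate names and the sample tweet are hypothetical. A real implementation would map these to an established vocabulary and use an RDF library.

```python
import re

URL = re.compile(r"https?://\S+")
HANDLE = re.compile(r"@(\w+)")
HASHTAG = re.compile(r"#(\w+)")


def tweet_to_triples(tweet_id, text):
    """Emit (subject, predicate, object) triples for a tweet's metadata and body."""
    subject = f"ex:tweet/{tweet_id}"
    urls = URL.findall(text)
    rest = URL.sub(" ", text)  # drop URLs first so '#' or '@' inside them is not misread
    handles = HANDLE.findall(rest)
    hashtags = HASHTAG.findall(rest)
    # Body text: handles removed, hashtags kept as plain words, whitespace normalized
    body = " ".join(HASHTAG.sub(r"\1", HANDLE.sub(" ", rest)).split())
    triples = [(subject, "ex:mentions", f"ex:user/{h}") for h in handles]
    triples += [(subject, "ex:hasHashtag", f"ex:tag/{t.lower()}") for t in hashtags]
    triples += [(subject, "ex:links", u) for u in urls]
    triples.append((subject, "ex:text", body))
    return triples


if __name__ == "__main__":
    tweet = "Strong shaking in #LA right now @example_agency https://t.co/xyz"
    for triple in tweet_to_triples(1, tweet):
        print(triple)
```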
Discussion
▪ Credible tweets could be transformative: a Big Data source that can
complement traditional sources (e.g., scientific instruments)
▪ Working with 6V Twitter data can be challenging, though it also
presents interesting opportunities
▪ Curation of training data is extremely important, but also extremely
time consuming (as this is a manual process)
▪ Current research emphasizes Deep Learning, BUT RDF/OWL
semantics will need to play a role ultimately
▪ The approach can be generalized for application to natural and
anthropogenic disasters of all kinds
Acknowledgements
Collaborator: James Freemantle
Accounting for Oil Spills and more …
▪ Energy exploration via reflection seismology provides the
fundamental source of data that is subsequently processed and
interpreted for the identification of potential petroleum reservoirs
▪ Reservoir simulation is used to engineer the extraction of petroleum
reserves from reservoirs
▪ Drilling is used to ‘truth’ the results provided by interpretations and
simulations prior to production extraction
▪ Standard operating procedures (SOPs) ensure that extraction of oil from a
production reservoir is routinely monitored and reported upon - e.g., to
quantify rig safety and output (barrels/day)
▪ From exploration to extraction, this is a data-rich workflow
▪ Additional data sources become relevant when disasters occur (e.g.,
oil spills) - from re-purposed scientific instruments (e.g., weather
satellites) to social media (e.g., Twitter, Instagram, Snapchat, ...)
▪ Data-rich workflows can generate problems in Big Data Analytics