Loras College 2014 Business Analytics Symposium | Andy Stevens: Big Data Analytics
DESCRIPTION
This session will cover issues and advice for implementing Big Data Analytics in a Research and Development context. In addition to the basics, it will discuss the past, present and future and touch on relevant mathematics, statistics, science, technology, economics, business, history and even some literature. For more information on the Loras College 2014 Business Analytics Symposium, the Loras College MBA in Business Analytics or the Loras College Business Analytics Certificate, visit www.loras.edu/mba or www.loras.edu/bigdata.
TRANSCRIPT
Big Data Analytics, R&D
Robert Andrew Stevens, CFA
John Deere
Disclaimer
The information, views, and opinions contained in this presentation are those of the author and do not necessarily reflect the views and opinions of John Deere.
Outline = Favorite Quotes
1. “when you cannot express it in numbers, your knowledge is of a meagre and unsatisfactory kind”
2. “it takes all the running you can do, to keep in the same place”
3. “The future is already here – it’s just not evenly distributed”
4. “The essence of strategy is the timing of the sunk cost commitment”
5. “Americans can always be counted on to do the right thing...”
“when you cannot express it in numbers, your knowledge is of a meagre and unsatisfactory kind”
“I often say that when you can measure what you are speaking about, and express it in numbers, you know something about it; but when you cannot express it in numbers, your knowledge is of a meagre and unsatisfactory kind; it may be the beginning of knowledge, but you have scarcely, in your thoughts, advanced to the stage of science, whatever the matter may be.”
Lecture on “Electrical Units of Measurement” (3 May 1883), published in Popular Lectures Vol. I, p. 73; quoted in Encyclopaedia of Occupational Health and Safety (1998) by Jeanne Mager Stellman, p. 1992
http://en.wikiquote.org/wiki/William_Thomson
http://en.wikipedia.org/wiki/Lord_Kelvin
William Thomson, 1st Baron Kelvin
1824–1907
a.k.a.: Lord Kelvin
Occupation: mathematical physicist and engineer
What is Analytics?
Turning Data into Decisions
Production Viewed as a System *
[Figure. Diagram elements: Suppliers of Materials and Equipment; Receipt and Test of Materials; Production, Assembly, Inspection; Tests of Process, Machines, Methods, Costs; Distribution; Consumers; Consumer Research; Design and Redesign]
* Deming, W.E. Out of the Crisis, 1986 (p. 4)
Take Action!
The Road to Earlier Discovery and Shorter Decision Cycles
Big Data in R&D at John Deere
Primarily machine data: CAN and GPS
Volume: immeasurable
Velocity: fast and furious
Variety: nothing is the same
Value: TBD
“it takes all the running you can do, to keep in the same place”
The Red Queen's race is an incident that appears in Lewis Carroll's Through the Looking-Glass and involves the Red Queen, a representation of a Queen in chess, and Alice constantly running but remaining in the same spot.
“Well, in our country,” said Alice, still panting a little, “you'd generally get to somewhere else – if you run very fast for a long time, as we've been doing.”
“A slow sort of country!” said the Queen. “Now, here, you see, it takes all the running you can do, to keep in the same place. If you want to get somewhere else, you must run at least twice as fast as that!”
http://en.wikipedia.org/wiki/Red_Queen's_race
http://en.wikipedia.org/wiki/Lewis_Carroll
Charles Lutwidge Dodgson
1832–1898
Pen name: Lewis Carroll
Occupation: Writer, mathematician, Anglican cleric, photographer, artist
The Problem/Opportunity
Data generated
Data analyzed
Data captured and stored
[Remember: DIKW = Data Information Knowledge Wisdom ?]
Ideally, if nothing changes…
Today Transition Vision
But the data generated might grow faster than we can manage
[Ever hear of “The Internet of Things” ?]
Today Transition Vision
So, maybe we should try to do something like this…
[“If you want to get somewhere else, you must run at least twice as fast as that!”]
Today Transition Vision
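[A rough sketch of the Red Queen problem in Python. The starting volumes and growth rates are hypothetical, chosen only for illustration; they are not figures from the talk or from John Deere. It shows how the analyzed share of the data shrinks when data generated compounds faster than data analyzed.]

```python
def project(initial, annual_growth, years):
    """Compound a starting volume forward at a fixed annual growth rate."""
    return [initial * (1 + annual_growth) ** t for t in range(years + 1)]

# Hypothetical figures in arbitrary units (assumptions, not from the slides).
generated = project(100.0, 0.40, 5)  # data generated, growing 40% per year
analyzed = project(10.0, 0.20, 5)    # data analyzed, growing 20% per year

# Share of generated data that actually gets analyzed, year by year.
share = [a / g for a, g in zip(analyzed, generated)]
for year, s in enumerate(share):
    print(f"year {year}: analyzed share = {s:.1%}")
```

[Under these assumed rates the analyzed share falls every year even though the amount analyzed keeps growing; holding the share constant would require analysis to grow as fast as generation.]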
A Solution: Data Science
• Applies everywhere
• Practical/feasible?
• In R&D?
http://www.dataists.com/2010/09/the-data-science-venn-diagram
Data Science in R&D
1. Multidisciplinary Investigations (25%)
2. Models and Methods for Data (20%)
3. Computing with Data (15%)
4. Pedagogy (15%)
5. Tool Evaluation (5%)
6. Theory (20%)
W. S. Cleveland (2001). Data Science: An Action Plan for Expanding the Technical Areas of the Field of Statistics. ISI Review, 69, 21-26.
http://www.stat.purdue.edu/~wsc/papers/datascience.pdf
“The future is already here – it’s just not evenly distributed”
– William Gibson, quoted in The Economist, December 4, 2003
http://www.economist.com/printedition/2003-12-06
http://en.wikipedia.org/wiki/William_Gibson
William Gibson
1948–
CERN: Solving the Mysteries of the Universe with Big Data
The Large Hadron Collider Computing Challenge
• Data volume
– High rate, large number of channels, 4 experiments
– 15 PetaBytes of new data each year; 30 PB in 2013
• Overall compute power
– Event complexity, Nb. events, thousands users
– 200 k cores; 350 k cores
– 45 PB of disk storage; 150 PB Storage
http://openlab.web.cern.ch/sites/openlab.web.cern.ch/files/presentations/Jarp_Big_Data_Boston_final.pdf (09/12/13)
The Scientific Method
1. Formulation of a question
2. Hypothesis
3. Prediction
4. Testing
5. Analysis
http://en.wikipedia.org/wiki/Scientific_method
An 18th-century depiction of early experimentation in the field of chemistry
“The essence of strategy is the timing of the sunk cost commitment”
Verbal communication during UIUC MBA Strategic Management class
http://www.amazon.com/Economic-Foundations-Strategy-Organizational-Science/dp/1412905435
http://business.illinois.edu/facultyprofile/faculty_profile.aspx?ID=99
Professor of Business Administration and Caterpillar Chair of Business
University of Illinois at Urbana-Champaign
Joseph T. Mahoney
1958–
What happens to Q as P → 0?
• Change “Household” to “Firm”
• Change “chocolate” to “software”
• Now what happens to Q as P → 0?
• How could that happen in a Big Data Analytics, R&D context?
http://catalog.flatworldknowledge.com/bookhub/reader/2992?e=coopermicro-ch07_s01
Figure 7.1 The Demand Curve of an Individual Household
The One-Day MBA
http://www.engineeringtoolbox.com/cash-flow-diagrams-d_1231.html
http://en.wikipedia.org/wiki/Net_present_value
$NPV = \sum_{t=0}^{n} \frac{F_t}{(1+i)^t}$
F0 = Sunk cost investment
• Assuming Ft does not decrease* for t > 0, what happens to NPV as F0 → 0?
• How could that happen in a Big Data Analytics, R&D context?
• What are the implications for strategy?
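[A minimal sketch of the NPV formula above in Python. The discount rate and cash flows are hypothetical numbers chosen for illustration, not figures from the talk. F0 is entered as a negative number because it is cash paid out today.]

```python
def npv(rate, cash_flows):
    """Net present value: sum of F_t / (1 + i)^t, with t = 0 for the first entry."""
    return sum(f / (1 + rate) ** t for t, f in enumerate(cash_flows))

# Hypothetical inputs (assumptions for illustration only).
rate = 0.10                       # discount rate i
later_flows = [40.0, 40.0, 40.0]  # F_1..F_3, held fixed

# Shrink the up-front sunk cost F_0 toward zero and watch NPV respond.
for f0 in (-100.0, -50.0, 0.0):
    print(f"F0 = {f0:7.1f}  NPV = {npv(rate, [f0] + later_flows):8.2f}")
```

[With the later cash flows held fixed, each unit removed from the up-front cost adds one unit to NPV, because F0 is not discounted.]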
Avoid Sunk Cost Commitments and Vendor Lock-in with Open Source
• Apache: http://www.apache.org/
– Hadoop, Hive, Mahout, Pig, Spark…
• GRASS GIS: http://grass.osgeo.org/
• Java: http://www.java.com/ + Cassandra
• Julia: http://julialang.org/
• Perl: http://www.perl.org/
• Python: http://www.python.org/
• R: http://cran.us.r-project.org/ + RHIPE
• Scala: http://scala-lang.org/ + Scalding
• SQL:
– http://www.mysql.com/
– http://www.postgresql.org/ + PostGIS
“Americans can always be counted on to do the right thing...”
“...after they have exhausted all other possibilities.”
Also famous for: “We shall never surrender” “peace in our time”
And many others relevant to The War on Data
http://www.quotedb.com/quotes/2313
https://en.wikipedia.org/wiki/Winston_churchill
Sir Winston Churchill
1874–1965
Profession: Member of Parliament, statesman, soldier, journalist, historian, author, painter
Tips for winning The War on Data
Teamwork
Statistics
Partner with IT
Learn-Do-Teach
Replenish your toolbox
Math
Pop Quiz
What are the 3 most important things in Real Estate?
1. Location
2. Location
3. Location
What are the 3 most important things in Statistics?
1. Look at the data
2. Look at the data
3. Look at the data
… especially for Big Data Analytics:
1. Look at the data before you analyze it: Exploratory Data Analysis (EDA)
2. Look at the data while you analyze it: model diagnostics
3. Look at the data after you analyze it: visualization and communication
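[A small illustration of why “look at the data” matters, sketched in Python. It uses the first two datasets of Anscombe's quartet (Anscombe, 1973), which is not part of the original slides and is brought in here as an assumption-free public example: both sets share nearly the same mean and correlation, yet look nothing alike once listed or plotted.]

```python
from math import sqrt

def mean(values):
    return sum(values) / len(values)

def pearson_r(x, y):
    """Pearson correlation coefficient computed from first principles."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

# Anscombe's quartet, sets I and II (they share the same x values).
x = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
y1 = [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]
y2 = [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]

# Summary statistics alone: the two sets are nearly indistinguishable.
for name, y in (("set I", y1), ("set II", y2)):
    print(name, f"mean(y) = {mean(y):.2f}", f"r = {pearson_r(x, y):.2f}")

# Looking at the data: list y in order of increasing x.
for name, y in (("set I", y1), ("set II", y2)):
    print(name, [b for _, b in sorted(zip(x, y))])
```

[Listed in x order, set II rises smoothly and then turns down, while set I scatters around a straight line; the summary statistics hide that difference.]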
Other Survival Tips
• Visualization and Communication
– Tools: R & Rmd, GGobi, Tableau, ArcGIS/GRASS…
– Presentations: Tell them 3X, 5Ws
• Collaboration: working as a team
– File and code version control
– Google's R Style Guide
• Reproducible Research best practices
– Avoid errors like those of Potti (Duke) and Rogoff & Reinhart (Harvard)
• http://en.wikipedia.org/wiki/Anil_Potti
• http://en.wikipedia.org/wiki/Reinhart-Rogoff
Summary = Favorite Quotes
1. “when you cannot express it in numbers, your knowledge is of a meagre and unsatisfactory kind”
2. “it takes all the running you can do, to keep in the same place”
3. “The future is already here – it's just not evenly distributed”
4. “The essence of strategy is the timing of the sunk cost commitment”
5. “Americans can always be counted on to do the right thing...”
“Those who cannot remember the past are condemned to repeat it.” – George Santayana
Q & A
Contact Information
E-mail:[email protected] (business)
[email protected] (personal)
LinkedIn: http://www.linkedin.com/pub/robert-andrew-stevens-cfa/6a/a04/315
Twitter: https://twitter.com/RobertAndrewSt3
GitHub: https://github.com/robertandrewstevens