Data-Driven Future: What to Learn and What to Expect?
Denis Reznik
Data Architect at Intapp Kyiv
Microsoft Data Platform MVP
About me
•Denis Reznik
•Kyiv, Ukraine
•Data Architect at Intapp, Inc.
•Microsoft Data Platform MVP
•Co-Founder of Ukrainian Data Community
Agenda
•Data is a new Oil (c)
•Data and Science
•Data in Big Companies
•Data and Application Development
•Data-Driven Future
Data is a New Oil
“Data is the new oil. It’s valuable, but if unrefined it
cannot really be used. It has to be changed into gas,
plastic, chemicals, etc to create a valuable entity that
drives profitable activity; so must data be broken
down, analyzed for it to have value.”
(c) Clive Humby, UK Mathematician
Data and Science
•Thousands of years: Empirical
•Few hundreds of years: Theoretical
•Last fifty years: Computational, “Query the world”
•Last twenty years: eScience (Data Science), “Download the world”
Machine Learning
•Supervised Learning: Classification, Regression
•Unsupervised Learning
Linear Regression
Training Data -> Learning Algorithm -> h (Hypothesis)
h: Ocean Temperature -> Whales Population
DEMO
Linear Regression
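The demo itself is not part of this transcript. As a rough stand-in, here is a minimal sketch of fitting a linear hypothesis h with ordinary least squares in plain Python. The function names and all data values are hypothetical, chosen only to illustrate the idea of learning h from training data.

```python
def fit_line(xs, ys):
    """Fit y = intercept + slope * x by ordinary least squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Covariance of x and y, and variance of x (both unnormalized).
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def make_hypothesis(slope, intercept):
    """Return the hypothesis h that maps an input x to a prediction."""
    def h(x):
        return intercept + slope * x
    return h


# Hypothetical training data, invented for this sketch (not from the talk).
ocean_temperature = [10.0, 12.0, 14.0, 16.0, 18.0]
whale_population = [500.0, 460.0, 420.0, 380.0, 340.0]

slope, intercept = fit_line(ocean_temperature, whale_population)
h = make_hypothesis(slope, intercept)

# Predict the population for a temperature that is not in the training data.
print(h(15.0))
```

Here the learning algorithm is `fit_line`, the training data are the two lists, and `h` is the resulting hypothesis.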
Data in Big Companies
Parallel Processing
Temperature Sensor Datasets (n Items)
Q: How many times was the temperature above the norm during the last week?
A: 5
Time: 2 sec
Algorithmic Complexity: O(n)
Parallel Processing
Temperature Sensor Datasets (k datasets, n/k items in each one)
Q: How many times was the temperature above the norm during the last week?
A (per dataset): 1, 0, 3, 4
Time: 0.5 sec
Algorithmic Complexity: O(n/k)
Map-Reduce
Map -> COUNT(*) WHERE Value > 40
A (per dataset): 1, 0, 3, 4
Reduce -> COUNT(*)
A: 5
DEMO
Map-Reduce
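The demo is not included in this transcript. The following is a minimal sketch of the same idea in Python, assuming a threshold of 40 as on the Map slide. The partition contents, function names, and the use of a thread pool are illustrative assumptions, not the setup used in the talk. Note that the reduce step in this sketch adds up the partial counts produced by the map step.

```python
from concurrent.futures import ThreadPoolExecutor
from functools import reduce

THRESHOLD = 40  # "norm" temperature, taken from the Map slide


def map_count_above(readings, threshold=THRESHOLD):
    """Map step: count the readings in one partition above the threshold."""
    return sum(1 for value in readings if value > threshold)


def reduce_counts(partial_counts):
    """Reduce step: combine the per-partition counts into one total."""
    return reduce(lambda a, b: a + b, partial_counts, 0)


# Hypothetical sensor readings split into partitions (invented values).
partitions = [
    [38, 41, 39],
    [35, 36, 37],
    [42, 43, 39, 44],
    [41, 40, 45],
]

# Each partition is processed independently, so the map step can run in parallel.
with ThreadPoolExecutor() as pool:
    partial_counts = list(pool.map(map_count_above, partitions))

total = reduce_counts(partial_counts)
print(partial_counts, total)
```

Because each map call touches only its own partition, the work per worker is proportional to the partition size rather than to the full dataset.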
Data and Application Development
source: https://www.youtube.com/watch?v=t6kM2EM6so4
Index (B-Tree) - Seek
Root: 1 .. 1M
Intermediate: 1 .. 2K | 2K+1 .. 4K | … | 1M-2K .. 1M
Leaf: 1 .. 300 | 301 .. 800 | 801 .. 1.5K | 1.5K+1 .. 2K | …
SELECT * FROM Users WHERE Id = 523
Index (B-Tree) - Scan
Root: 1 .. 1M
Intermediate: 1 .. 2K | 2K+1 .. 4K | … | 1M-2K .. 1M
Leaf: 1 .. 300 | 301 .. 800 | 801 .. 1.5K | 1.5K+1 .. 2K | …
SELECT * FROM Users
Index (B-Tree) - Range Scan
Root: 1 .. 1M
Intermediate: 1 .. 2K | 2K+1 .. 4K | … | 1M-2K .. 1M
Leaf: 1 .. 300 | 301 .. 800 | 801 .. 1.5K | 1.5K+1 .. 2K | …
SELECT * FROM Users WHERE Id BETWEEN 700 AND 1700
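The difference between the three access patterns can be sketched in Python. This is not a real B-tree: it swaps in a sorted list with binary search (the standard `bisect` module) as a simplified stand-in for navigating from the root to a leaf. The key range and function names are assumptions made for illustration.

```python
from bisect import bisect_left, bisect_right

# Hypothetical sorted index keys (user Ids 1..2000), standing in for leaf pages.
user_ids = list(range(1, 2001))


def seek(keys, target):
    """Seek: jump to one key in O(log n) steps; return its position or None."""
    i = bisect_left(keys, target)
    if i < len(keys) and keys[i] == target:
        return i
    return None


def full_scan(keys):
    """Scan: visit every key, O(n)."""
    return [key for key in keys]


def range_scan(keys, low, high):
    """Range scan: seek to the first key >= low, then read until key > high."""
    start = bisect_left(keys, low)
    end = bisect_right(keys, high)
    return keys[start:end]


print(seek(user_ids, 523))                      # like WHERE Id = 523
print(len(full_scan(user_ids)))                 # like SELECT * without WHERE
print(len(range_scan(user_ids, 700, 1700)))     # like WHERE Id BETWEEN 700 AND 1700
```

The seek and the range scan both start with a logarithmic lookup; only the range scan and the full scan then read consecutive keys, which is why their cost grows with the number of rows returned.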
Hashtable
Keys: John Dow, John Snow, Jack Snack
Buckets: 0, 1, 2, 3, 4
Hash Function:
John Dow -> 0
Jack Snack -> 2
John Snow -> 0
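A minimal sketch of a hash table with five buckets and collision handling by chaining follows. The hash function here (sum of character codes modulo the bucket count) is a toy assumption, so the bucket numbers it produces differ from those on the slide. The class and method names are hypothetical.

```python
class ChainedHashTable:
    """Toy hash table: a fixed number of buckets, each holding a list of pairs."""

    def __init__(self, bucket_count=5):
        self.buckets = [[] for _ in range(bucket_count)]

    def bucket_index(self, key):
        # Toy hash function (an assumption for this sketch, not the slide's).
        return sum(ord(ch) for ch in key) % len(self.buckets)

    def put(self, key, value):
        bucket = self.buckets[self.bucket_index(key)]
        for i, (existing_key, _) in enumerate(bucket):
            if existing_key == key:
                bucket[i] = (key, value)  # overwrite an existing key
                return
        bucket.append((key, value))  # colliding keys share the bucket

    def get(self, key):
        bucket = self.buckets[self.bucket_index(key)]
        for existing_key, value in bucket:
            if existing_key == key:
                return value
        raise KeyError(key)


table = ChainedHashTable()
for position, name in enumerate(["John Dow", "John Snow", "Jack Snack"]):
    table.put(name, position)

# Two keys may land in the same bucket; the lookup still finds the right one
# by comparing keys inside that bucket.
print(table.bucket_index("John Dow"), table.bucket_index("John Snow"))
print(table.get("John Snow"))
```

With this toy hash function, "John Dow" and "John Snow" fall into the same bucket, which mirrors the collision shown on the slide.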
Data-Driven Future
• The amount of data is growing, and this is cool
• More and more decisions are based on data
• More and more applications are developed
• It is exciting to be a Software Engineer now!
Thank you!
Denis Reznik
Twitter: @denisreznik
Email: [email protected]
Blog: http://reznik.uneta.com.ua
Facebook: https://www.facebook.com/denis.reznik.5
LinkedIn: http://ua.linkedin.com/pub/denis-reznik/3/502/234