se2016 bigdata denis reznik "data driven future"

Post on 11-Apr-2017

Data-Driven Future

What to Learn and What to Expect?

Denis Reznik

Data Architect at Intapp Kyiv

Microsoft Data Platform MVP

About me

•Denis Reznik

•Kyiv, Ukraine

•Data Architect at Intapp, Inc.

•Microsoft Data Platform MVP

•Co-Founder of Ukrainian Data Community

Agenda

•Data is a new Oil (c)

•Data and Science

•Data in Big Companies

•Data and Application Development

•Data-Driven Future

Data is a New Oil

“Data is the new oil. It’s valuable, but if unrefined it cannot really be used. It has to be changed into gas, plastic, chemicals, etc. to create a valuable entity that drives profitable activity; so must data be broken down, analyzed for it to have value.”

(c) Clive Humby, UK Mathematician

Data and Science

•Thousands of years: Empirical

•Few hundreds of years: Theoretical

•Last fifty years: Computational, “Query the world”

•Last twenty years: eScience (Data Science), “Download the world”

Machine Learning

•Supervised Learning: Classification, Regression

•Unsupervised Learning

Linear Regression

Training Data -> Learning Algorithm -> h

Ocean Temperature -> h -> Whale Population

h - Hypothesis

DEMO

Linear Regression
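The demo itself is not part of the transcript. As a rough illustration of the idea on the slide (training data goes into a learning algorithm, which produces a hypothesis h mapping ocean temperature to whale population), here is a minimal sketch of ordinary least squares for a single feature. The function names and all data points are invented for illustration; they are not taken from the talk.

```python
def fit_line(xs, ys):
    """Ordinary least squares for one feature: returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Covariance of x and y, and variance of x (both unnormalized).
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def make_hypothesis(slope, intercept):
    """Build h(x) = intercept + slope * x from the fitted parameters."""
    return lambda x: intercept + slope * x


# Hypothetical training data (made up): ocean temperature in degrees C
# and observed whale population for the same period.
temperatures = [10, 12, 14, 16, 18]
populations = [200, 180, 160, 140, 120]

slope, intercept = fit_line(temperatures, populations)
h = make_hypothesis(slope, intercept)
print(f"h(x) = {intercept:.1f} + {slope:.1f} * x")
print("Predicted population at 20 degrees:", h(20))
```

A closed-form fit is used here only because it is short; a demo could equally well fit the same line with gradient descent or a library routine.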

Data in Big Companies

Parallel Processing

Temperature Sensor Datasets (n Items)

Q: How many times was the temperature above the norm during the last week?

A: 5

Time: 2 sec

Algorithmic Complexity: O(n)

Parallel Processing

Temperature Sensor Datasets (k datasets, n/k items in each one)

Q: How many times was the temperature above the norm during the last week?

A (per dataset): 1, 0, 3, 4

Time: 0.5 sec

Algorithmic Complexity: O(n/k)

Map-Reduce

Map -> COUNT(*) WHERE Value > 40

A (per dataset): 1, 0, 3, 4

Reduce -> COUNT(*)

A: 5

Reduce

DEMO

Map-Reduce
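The demo is not included in the transcript. The following is a minimal single-machine sketch of the same pattern: a map step counts readings above the threshold in each partition independently, and a reduce step combines the partial counts by adding them up. The threshold of 40 comes from the slide; the partition contents, function names, and the use of a thread pool in place of a real cluster are assumptions made for illustration.

```python
from concurrent.futures import ThreadPoolExecutor
from functools import reduce

THRESHOLD = 40  # "Value > 40" from the slide


def map_count(partition):
    """Map step: count readings above the threshold in one partition."""
    return sum(1 for value in partition if value > THRESHOLD)


def reduce_counts(left, right):
    """Reduce step: combine two partial counts by adding them."""
    return left + right


def count_above_threshold(partitions):
    """Run the map step on every partition in parallel, then reduce."""
    with ThreadPoolExecutor() as pool:
        partial_counts = list(pool.map(map_count, partitions))
    return partial_counts, reduce(reduce_counts, partial_counts, 0)


# Hypothetical sensor readings (made up), split into four partitions.
partitions = [
    [38, 41, 39],
    [35, 36, 40],
    [42, 45, 39],
    [41, 37, 44],
]

partial_counts, total = count_above_threshold(partitions)
print("Partial counts:", partial_counts)
print("Total:", total)
```

Each map call touches only its own partition, which is why the work can be spread across workers; the reduce step only sees one number per partition.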

Data and Application Development

source: https://www.youtube.com/watch?v=t6kM2EM6so4

Index (B-Tree) - Seek

Root: 1 .. 1M

Intermediate level: 1 .. 2K | 2K+1 .. 4K | ... | 1M-2K .. 1M

Leaf level: 1 .. 300 | 301 .. 800 | 801 .. 1.5K | 1.5K+1 .. 2K

SELECT * FROM Users WHERE Id = 523

Index (B-Tree) - Scan

SELECT * FROM Users

Index (B-Tree) - Range Scan

SELECT * FROM Users WHERE Id BETWEEN 700 AND 1700
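To make the three access patterns concrete, here is a minimal sketch that uses a sorted in-memory list with binary search as a stand-in for the index. This is not a real B-tree and not how any particular database engine implements its indexes; it only mirrors the cost profile: a seek and the start of a range scan locate their position in O(log n) steps, while a full scan visits every key. The class name, method names, and the scaled-down key range (1 .. 10,000 instead of 1 .. 1M) are assumptions made for illustration.

```python
from bisect import bisect_left, bisect_right


class SortedIndex:
    """Toy stand-in for an index: sorted keys plus binary search."""

    def __init__(self, ids):
        self.keys = sorted(ids)

    def seek(self, key):
        """Like WHERE Id = key: binary search for a single key."""
        i = bisect_left(self.keys, key)
        if i < len(self.keys) and self.keys[i] == key:
            return self.keys[i]
        return None

    def range_scan(self, low, high):
        """Like WHERE Id BETWEEN low AND high: find the start, read in order."""
        start = bisect_left(self.keys, low)
        end = bisect_right(self.keys, high)
        return self.keys[start:end]

    def scan(self):
        """Like SELECT * with no predicate: read every key."""
        return list(self.keys)


index = SortedIndex(range(1, 10_001))
print(index.seek(523))
print(len(index.range_scan(700, 1700)))
print(len(index.scan()))
```

In a real B-tree the binary search is replaced by a descent from the root through the intermediate pages to a leaf, and the range scan then follows the leaf pages in key order.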

Hashtable

Keys: John Dow, John Snow, Jack Snack

Buckets: 0, 1, 2, 3, 4

Hash Function:

John Dow -> 0

Jack Snack -> 2

John Snow -> 0
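A minimal sketch of a hash table with five buckets and separate chaining, so that two keys landing in the same bucket can coexist. The hash function here (sum of character codes modulo the bucket count) is a toy chosen for illustration, so the bucket numbers it produces differ from those on the slide; the class name and the stored values (numeric user ids) are likewise assumptions.

```python
class ChainedHashTable:
    """Toy hash table: a fixed number of buckets, each a list of pairs."""

    def __init__(self, bucket_count=5):
        self.buckets = [[] for _ in range(bucket_count)]

    def bucket_index(self, key):
        # Toy hash function (assumption): sum of character codes mod bucket count.
        return sum(ord(ch) for ch in key) % len(self.buckets)

    def put(self, key, value):
        bucket = self.buckets[self.bucket_index(key)]
        for i, (existing_key, _) in enumerate(bucket):
            if existing_key == key:
                bucket[i] = (key, value)  # overwrite an existing key
                return
        bucket.append((key, value))  # new key, possibly a collision

    def get(self, key):
        for existing_key, value in self.buckets[self.bucket_index(key)]:
            if existing_key == key:
                return value
        raise KeyError(key)


table = ChainedHashTable()
for user_id, name in enumerate(["John Dow", "John Snow", "Jack Snack"], start=1):
    table.put(name, user_id)

for name in ["John Dow", "John Snow", "Jack Snack"]:
    print(name, "-> bucket", table.bucket_index(name), "-> value", table.get(name))
```

With this toy hash, John Dow and John Snow fall into the same bucket, so the lookup has to compare keys inside the bucket to tell them apart.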

Data-Driven Future

• The amount of data is growing, and this is cool

• More and more decisions are based on data

• More and more applications are developed

• It is exciting to be a Software Engineer now!

Thank you!

Denis Reznik

Twitter: @denisreznik

Email: denisreznik@live.ru

Blog: http://reznik.uneta.com.ua

Facebook: https://www.facebook.com/denis.reznik.5

LinkedIn: http://ua.linkedin.com/pub/denis-reznik/3/502/234
