Java BigData Full Stack Development (version 2.0)

Java BigData Full Stack Development as is ... Alexey Zinovyev, Java Trainer at EPAM

Upload: alexey-zinoviev

Post on 11-Apr-2017

Category: Technology


TRANSCRIPT

Page 1: Java BigData Full Stack Development (version 2.0)

Java BigData Full Stack Development as is ...

Alexey Zinovyev, Java Trainer at EPAM

Page 2: Java BigData Full Stack Development (version 2.0)

About

With IT since 2007

With Java since 2009

With Hadoop since 2012

With EPAM since 2015

Page 3: Java BigData Full Stack Development (version 2.0)

Contacts

E-mail : [email protected]

Twitter : @zaleslaw @BigDataRussia

vk.com/big_data_russia Big Data Russia

vk.com/java_jvm Java & JVM langs

Page 4: Java BigData Full Stack Development (version 2.0)

The Good Old Days

Page 5: Java BigData Full Stack Development (version 2.0)

HRs & RMs are looking for Java developers

Page 6: Java BigData Full Stack Development (version 2.0)

Is the Java Dream Team waiting for you?

Page 7: Java BigData Full Stack Development (version 2.0)

Required Skills

• Advanced SQL

• Basic Linux

• Core Java & JVM

• Backend Development Experience

• Basic Computer Science Level

Page 8: Java BigData Full Stack Development (version 2.0)

REAL WORLD

Page 9: Java BigData Full Stack Development (version 2.0)

Let’s just use JavaScript on the frontend ONLY

Page 10: Java BigData Full Stack Development (version 2.0)

On the frontend ONLY?

Page 11: Java BigData Full Stack Development (version 2.0)

Cruel world

Page 12: Java BigData Full Stack Development (version 2.0)

Do you know any ML library for JS?

Page 13: Java BigData Full Stack Development (version 2.0)

Wild animals everywhere

Page 14: Java BigData Full Stack Development (version 2.0)

And what I tell you

Page 16: Java BigData Full Stack Development (version 2.0)

It’s Time for Java Superhero, yeah!

Page 17: Java BigData Full Stack Development (version 2.0)

Before discovering patterns you should ...

• Select small pieces

• Define default values for missing data

• Remove strange signals from the data

• Merge several tables into one if required
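
The four preparation steps above can be sketched in plain Java. This is only an illustration under stated assumptions: the `Reading` type, the `userId -> name` lookup table, and the fixed `[min, max]` range used to detect "strange signals" are all hypothetical and do not come from the slides.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class PreprocessingSketch {

    // Hypothetical row type: one measurement per user; value is null when missing.
    static final class Reading {
        final int userId;
        final Double value;

        Reading(int userId, Double value) {
            this.userId = userId;
            this.value = value;
        }
    }

    // 1. Select a small piece: keep only the first n rows.
    static List<Reading> sample(List<Reading> rows, int n) {
        return new ArrayList<>(rows.subList(0, Math.min(n, rows.size())));
    }

    // 2. Define a default value for missing data.
    static List<Reading> fillMissing(List<Reading> rows, double defaultValue) {
        List<Reading> out = new ArrayList<>();
        for (Reading r : rows) {
            out.add(r.value == null ? new Reading(r.userId, defaultValue) : r);
        }
        return out;
    }

    // 3. Remove strange signals: drop rows whose value is outside [min, max].
    //    Rows whose value is still missing are dropped as well.
    static List<Reading> removeOutliers(List<Reading> rows, double min, double max) {
        List<Reading> out = new ArrayList<>();
        for (Reading r : rows) {
            if (r.value != null && r.value >= min && r.value <= max) {
                out.add(r);
            }
        }
        return out;
    }

    // 4. Merge tables: join readings with a (hypothetical) userId -> name table.
    static List<String> joinWithNames(List<Reading> rows, Map<Integer, String> names) {
        List<String> out = new ArrayList<>();
        for (Reading r : rows) {
            String name = names.get(r.userId);
            if (name != null) {
                out.add(name + "=" + r.value);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<Reading> rows = Arrays.asList(
                new Reading(1, 3.0), new Reading(2, null),
                new Reading(3, 1000.0), new Reading(4, 5.0));
        Map<Integer, String> names = new HashMap<>();
        names.put(1, "alice");
        names.put(2, "bob");

        List<Reading> cleaned =
                removeOutliers(fillMissing(sample(rows, 3), 0.0), 0.0, 100.0);
        System.out.println(joinWithNames(cleaned, names));
    }
}
```

In a real project each step would normally run inside the storage or processing engine (SQL, Spark and so on) rather than over in-memory lists; the sketch only shows the order and intent of the steps.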

Page 18: Java BigData Full Stack Development (version 2.0)

How it really works

• Share your data with us

• Our magic manipulations

• Building an answering machine

• PROFIT!!!

Page 19: Java BigData Full Stack Development (version 2.0)

How to start?

Page 20: Java BigData Full Stack Development (version 2.0)

Page 21: Java BigData Full Stack Development (version 2.0)

WHAT IS BIG DATA?

Page 22: Java BigData Full Stack Development (version 2.0)

Joke about Excel

Page 23: Java BigData Full Stack Development (version 2.0)

5V

Page 24: Java BigData Full Stack Development (version 2.0)

Every 60 seconds…

Page 25: Java BigData Full Stack Development (version 2.0)

From Mobile Devices

Page 26: Java BigData Full Stack Development (version 2.0)

From Industry

Page 27: Java BigData Full Stack Development (version 2.0)

We started to keep and handle stupid new things!

Page 28: Java BigData Full Stack Development (version 2.0)

10^6 rows

in MySQL

Page 29: Java BigData Full Stack Development (version 2.0)

GB->TB->PB->?

Page 30: Java BigData Full Stack Development (version 2.0)

Is BigData about PBs?

Page 32: Java BigData Full Stack Development (version 2.0)

It’s hard to …

• .. store

• .. handle

• .. search in

• .. visualize

• .. send over the network

Page 33: Java BigData Full Stack Development (version 2.0)

Likes in Classmates (Odnoklassniki): how to count them?

Page 34: Java BigData Full Stack Development (version 2.0)

Crazy Zoo

2012

Page 35: Java BigData Full Stack Development (version 2.0)

Crazy Zoo

2016

Page 36: Java BigData Full Stack Development (version 2.0)

What will be covered in this training

Page 37: Java BigData Full Stack Development (version 2.0)

NOSQL

Page 38: Java BigData Full Stack Development (version 2.0)

What’s the problem with RDBMSs?

• Caching

• Master/Slave

• Cluster

• Table Partitioning

• Sharding

Page 39: Java BigData Full Stack Development (version 2.0)

Family

Page 40: Java BigData Full Stack Development (version 2.0)

Database

party

Page 41: Java BigData Full Stack Development (version 2.0)

Spring Data

Page 42: Java BigData Full Stack Development (version 2.0)

How to start?

Page 43: Java BigData Full Stack Development (version 2.0)

Java MongoDB Driver + Robomongo

Page 44: Java BigData Full Stack Development (version 2.0)

BIG DATA TOOL MASTER

VS

DATA SCIENTIST

Page 45: Java BigData Full Stack Development (version 2.0)

TRAIN

MODEL

Page 46: Java BigData Full Stack Development (version 2.0)

Datasets

• Facebook users, tweets

• Trade transactions

• Government

• Medicine (genomic data)

• Telecommunications

Page 47: Java BigData Full Stack Development (version 2.0)

Data Sources

• Relational Databases

• Data warehouses (Historical data)

• Files in CSV or in binary format

• Internet or e-mail

• Scientific and research tools (R, Octave, MATLAB)

Page 48: Java BigData Full Stack Development (version 2.0)

Hey, man, predict something!

Page 49: Java BigData Full Stack Development (version 2.0)

Man or sofa?

Page 53: Java BigData Full Stack Development (version 2.0)

Typical questions for DM

• Which loan applicants are high-risk?

• How do we detect phone card fraud?

• What is the revenue prediction for next year?

• Can you recommend music for users?

Page 54: Java BigData Full Stack Development (version 2.0)

Is the green circle a blue square or a red triangle? Let’s ask its neighbors!

kNN (k-nearest neighbors)
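
The "ask the neighbors" idea can be sketched in a few lines of plain Java: sort the labeled points by distance to the unknown point and take a majority vote among the first k. The `Point` type, the 2-D coordinates, and the labels are hypothetical and chosen only for illustration; ties are broken arbitrarily here, so an odd k is the safer choice with two classes.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class KnnSketch {

    // Hypothetical labeled 2-D point.
    static final class Point {
        final double x;
        final double y;
        final String label;

        Point(double x, double y, String label) {
            this.x = x;
            this.y = y;
            this.label = label;
        }
    }

    // Classify (x, y) by majority vote among its k nearest training points.
    static String classify(List<Point> training, double x, double y, int k) {
        List<Point> sorted = new ArrayList<>(training);
        // Squared Euclidean distance gives the same ordering as the true distance.
        sorted.sort(Comparator.comparingDouble(
                (Point p) -> (p.x - x) * (p.x - x) + (p.y - y) * (p.y - y)));

        Map<String, Integer> votes = new HashMap<>();
        for (Point p : sorted.subList(0, Math.min(k, sorted.size()))) {
            votes.merge(p.label, 1, Integer::sum);
        }

        String best = null;
        int bestCount = -1;
        for (Map.Entry<String, Integer> e : votes.entrySet()) {
            if (e.getValue() > bestCount) {
                best = e.getKey();
                bestCount = e.getValue();
            }
        }
        return best;
    }

    public static void main(String[] args) {
        List<Point> training = Arrays.asList(
                new Point(1, 1, "square"), new Point(1, 2, "square"), new Point(2, 1, "square"),
                new Point(5, 5, "triangle"), new Point(6, 5, "triangle"), new Point(5, 6, "triangle"));
        System.out.println(classify(training, 1.5, 1.5, 3));
        System.out.println(classify(training, 5.5, 5.5, 3));
    }
}
```

Sorting the whole training set costs O(n log n) per query, which is acceptable for a demonstration; on large datasets spatial indexes or approximate search are the usual replacements.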

Page 55: Java BigData Full Stack Development (version 2.0)

Collaborative Filtering

Page 56: Java BigData Full Stack Development (version 2.0)

Machine Learning vs Traditional Programming

Page 57: Java BigData Full Stack Development (version 2.0)

Data

Science

Page 58: Java BigData Full Stack Development (version 2.0)

Can a Java programmer be a Data Scientist?

Page 59: Java BigData Full Stack Development (version 2.0)

Sexy Data Scientist

Page 60: Java BigData Full Stack Development (version 2.0)

Real Data Scientist

Page 61: Java BigData Full Stack Development (version 2.0)

How to start?

Page 62: Java BigData Full Stack Development (version 2.0)

Weka

Page 63: Java BigData Full Stack Development (version 2.0)

HADOOP

Page 64: Java BigData Full Stack Development (version 2.0)

Hadoop and Data Knights

Page 65: Java BigData Full Stack Development (version 2.0)

Hadoop

Page 66: Java BigData Full Stack Development (version 2.0)

MapReduce in different languages

Page 67: Java BigData Full Stack Development (version 2.0)

MapReduce for WordCount
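
The slide image is not part of this transcript, so here is a minimal sketch of the WordCount idea in plain Java, without the Hadoop API. It simulates the three phases in memory: map emits a (word, 1) pair per word, shuffle groups the pairs by word, and reduce sums each group. The class and method names are hypothetical; a real Hadoop job would implement `Mapper` and `Reducer` classes and let the framework do the shuffle across machines.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class WordCountSketch {

    // Map phase: emit a (word, 1) pair for every word in the line.
    static List<Map.Entry<String, Integer>> map(String line) {
        List<Map.Entry<String, Integer>> out = new ArrayList<>();
        for (String word : line.toLowerCase().split("\\W+")) {
            if (!word.isEmpty()) {
                out.add(new SimpleEntry<>(word, 1));
            }
        }
        return out;
    }

    // Shuffle phase: group all emitted values by key (sorted by key, as Hadoop does).
    static Map<String, List<Integer>> shuffle(List<Map.Entry<String, Integer>> pairs) {
        Map<String, List<Integer>> grouped = new TreeMap<>();
        for (Map.Entry<String, Integer> pair : pairs) {
            grouped.computeIfAbsent(pair.getKey(), key -> new ArrayList<>()).add(pair.getValue());
        }
        return grouped;
    }

    // Reduce phase: sum the values collected for one word.
    static int reduce(String word, List<Integer> counts) {
        int sum = 0;
        for (int count : counts) {
            sum += count;
        }
        return sum;
    }

    // Driver: run map over every line, shuffle, then reduce every group.
    static Map<String, Integer> wordCount(List<String> lines) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        for (String line : lines) {
            pairs.addAll(map(line));
        }
        Map<String, Integer> result = new TreeMap<>();
        for (Map.Entry<String, List<Integer>> group : shuffle(pairs).entrySet()) {
            result.put(group.getKey(), reduce(group.getKey(), group.getValue()));
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(wordCount(Arrays.asList("to be or not to be", "To be is to do")));
    }
}
```

Because map works on one line at a time and reduce on one word at a time, both phases can run in parallel on different machines, which is the point of the model.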

Page 68: Java BigData Full Stack Development (version 2.0)

Hadoop

Jobs

Page 69: Java BigData Full Stack Development (version 2.0)

Hadoop frameworks

• Universal (MapReduce, Tez, RDD in Spark)

• Abstract (Pig, Spark Pipeline API)

• SQL-like (Hive, Impala, Spark SQL)

• Graph processing (Giraph, GraphX)

• Machine Learning (Mahout, MLlib)

• Stream processing (Spark Streaming, Storm)

Page 70: Java BigData Full Stack Development (version 2.0)

SPARK

Page 71: Java BigData Full Stack Development (version 2.0)

SPARK: the bloody son of MR

• MapReduce in memory

• Up to 50x faster than Hadoop

• RDD is a basic building block (immutable distributed collections of objects)

• Pipeline API (no need for Pig)

Page 72: Java BigData Full Stack Development (version 2.0)

Spark

Family

Page 73: Java BigData Full Stack Development (version 2.0)

MLlib supports

• Classification and regression

• Collaborative filtering

• Clustering

• Dimensionality reduction

• Optimization

Page 74: Java BigData Full Stack Development (version 2.0)

Code sample MLlib (K-Means)

import org.apache.spark.mllib.clustering.KMeans;
import org.apache.spark.mllib.clustering.KMeansModel;

// Assumes parsedData is a JavaRDD<Vector> of feature vectors
// and sc is a JavaSparkContext, both created earlier.

// Cluster the data into two classes using KMeans
int numClusters = 2;
int numIterations = 20;
KMeansModel clusters = KMeans.train(parsedData.rdd(), numClusters, numIterations);

// Evaluate clustering by computing Within Set Sum of Squared Errors
double WSSSE = clusters.computeCost(parsedData.rdd());
System.out.println("Within Set Sum of Squared Errors = " + WSSSE);

// Save and load model
clusters.save(sc.sc(), "myModelPath");
KMeansModel sameModel = KMeansModel.load(sc.sc(), "myModelPath");
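
To make the two numbers in the sample less magical, here is a small plain-Java sketch of what K-Means training and the Within Set Sum of Squared Errors compute. It is not Spark's implementation: it runs on in-memory arrays and takes the initial centers as an explicit argument (MLlib chooses them itself), and all names in it are hypothetical.

```java
import java.util.Arrays;

class KMeansSketch {

    // Squared Euclidean distance between two points of equal dimension.
    static double squaredDistance(double[] a, double[] b) {
        double sum = 0.0;
        for (int d = 0; d < a.length; d++) {
            double diff = a[d] - b[d];
            sum += diff * diff;
        }
        return sum;
    }

    // Index of the center closest to the point.
    static int nearest(double[] point, double[][] centers) {
        int best = 0;
        for (int c = 1; c < centers.length; c++) {
            if (squaredDistance(point, centers[c]) < squaredDistance(point, centers[best])) {
                best = c;
            }
        }
        return best;
    }

    // Lloyd's algorithm: assign each point to its nearest center,
    // then move every center to the mean of its assigned points.
    static double[][] train(double[][] points, double[][] initialCenters, int numIterations) {
        int k = initialCenters.length;
        int dim = initialCenters[0].length;
        double[][] centers = new double[k][];
        for (int c = 0; c < k; c++) {
            centers[c] = initialCenters[c].clone();
        }
        for (int it = 0; it < numIterations; it++) {
            double[][] sums = new double[k][dim];
            int[] counts = new int[k];
            for (double[] p : points) {
                int c = nearest(p, centers);
                counts[c]++;
                for (int d = 0; d < dim; d++) {
                    sums[c][d] += p[d];
                }
            }
            for (int c = 0; c < k; c++) {
                if (counts[c] == 0) {
                    continue; // an empty cluster keeps its previous center
                }
                for (int d = 0; d < dim; d++) {
                    centers[c][d] = sums[c][d] / counts[c];
                }
            }
        }
        return centers;
    }

    // WSSSE: sum of squared distances from each point to its nearest center.
    static double wssse(double[][] points, double[][] centers) {
        double total = 0.0;
        for (double[] p : points) {
            total += squaredDistance(p, centers[nearest(p, centers)]);
        }
        return total;
    }

    public static void main(String[] args) {
        double[][] points = {{0, 0}, {0, 2}, {10, 10}, {10, 12}};
        double[][] centers = train(points, new double[][] {{0, 0}, {10, 10}}, 20);
        System.out.println(Arrays.deepToString(centers));
        System.out.println("Within Set Sum of Squared Errors = " + wssse(points, centers));
    }
}
```

A lower WSSSE means tighter clusters, but it always decreases as the number of clusters grows, so it is mainly useful for comparing runs with the same number of clusters or for spotting the "elbow" where adding clusters stops helping.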

Page 75: Java BigData Full Stack Development (version 2.0)

MLlib

• .. borrows ideas from scikit-learn (Python lib) and Mahout

• .. runs fully on Spark and supports Spark’s Pipeline API

• .. represents a dataset as Spark SQL’s SchemaRDD (renamed DataFrame in Spark 1.3)

• .. supports Hive as an external data source

• .. works well for large datasets and parallelized algorithms

Page 76: Java BigData Full Stack Development (version 2.0)

It solves all problems!

Page 77: Java BigData Full Stack Development (version 2.0)

How to start?

Page 78: Java BigData Full Stack Development (version 2.0)

HDP Zoo

Page 79: Java BigData Full Stack Development (version 2.0)

Ok, Google!

Page 80: Java BigData Full Stack Development (version 2.0)

AWS Amazon

Page 81: Java BigData Full Stack Development (version 2.0)

Infrastructure issues are waiting for YOU!

Page 82: Java BigData Full Stack Development (version 2.0)

DEEP LEARNING

Page 83: Java BigData Full Stack Development (version 2.0)

Deep Learning helps us build a NEW FUTURE

Page 85: Java BigData Full Stack Development (version 2.0)

HOW TO LEARN?

Page 90: Java BigData Full Stack Development (version 2.0)

DIFFERENT WAYS

1. Read books and write ‘pet’ projects

2. Become a mentee in a mentoring program

3. Take a MOOC

4. Take a training course

5. Visit conferences

Page 91: Java BigData Full Stack Development (version 2.0)

Recommended Books
