big data from the trenches

Big Data from the trenches: Advice from the FSI industry. By: Azrul MADISA

Upload: azrul-madisa

Posted on 17-Jan-2017


TRANSCRIPT

Page 1: Big data from the trenches

Big Data from the trenches

Advice from the FSI industry

By: Azrul MADISA

Page 2: Big data from the trenches

About me…

• VP – Enterprise Data Architect @ Maybank

• Take care of Maybank’s data worldwide

• Nuts about data, analytics and software dev.

• Very hands-on, love to read

• Teach aikido to kids

Page 3: Big data from the trenches

Big Data landscape today

https://www.linkedin.com/pulse/big-data-still-thing-2016-landscape-matt-turck

Page 4: Big data from the trenches

Too many big data technologies?

Wait … what?

I have to know ALL that?

Page 5: Big data from the trenches

Let’s change the game a bit…

Use case

Page 6: Big data from the trenches

The data journey

Page 7: Big data from the trenches

The data journey

• Acquisition

• Dumping

• Tidy data

• Real-time analytics

• Analytical model

• Sandbox

Page 8: Big data from the trenches

Example: credit scoring and loan origination

• Acquisition: Screens

• Dumping: Data staging area

• Tidy data: Data warehouse

• Analytical model: Score card builder

• Real-time analytics: Decisioning

• Sandbox: Data scientist

Page 9: Big data from the trenches

Acquisition with quality

Page 11: Big data from the trenches

Acquisition with quality

• Manage data quality up front

• Human-factor data quality

Diagram: data entry in the Application → Data Staging (over-night); audit trail (weekly)
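A minimal Python sketch of managing human-factor data quality up front, assuming a hypothetical loan-application record with `name`, `ic_number` and `monthly_income` fields: each record is validated at the point of data entry, every capture is written to an audit trail, and a weekly report counts failing captures per operator. The field names, the ID pattern and the report shape are illustrative assumptions, not the author's implementation.

```python
import re
from collections import Counter
from datetime import datetime

# Hypothetical ID format, modelled on the slide example "720324-03-8891" (YYMMDD-NN-NNNN).
IC_PATTERN = re.compile(r"^\d{6}-\d{2}-\d{4}$")

# In-memory audit trail; a real system would persist it (assumption).
audit_trail = []


def validate(record):
    """Return a list of human-readable errors; an empty list means the record is clean."""
    errors = []
    if not (record.get("name") or "").strip():
        errors.append("name is empty")
    ic = record.get("ic_number") or ""
    if not IC_PATTERN.match(ic):
        errors.append("ic_number must look like 720324-03-8891")
    else:
        try:
            datetime.strptime(ic[:6], "%y%m%d")  # first six digits must be a real date
        except ValueError:
            errors.append("ic_number does not start with a valid YYMMDD date")
    income = record.get("monthly_income")
    if not isinstance(income, (int, float)) or income < 0:
        errors.append("monthly_income must be a non-negative number")
    return errors


def capture(record, operator):
    """Validate at data entry and log who entered what, before the over-night load."""
    errors = validate(record)
    audit_trail.append({"operator": operator, "at": datetime.now(), "errors": errors})
    return errors


def weekly_report(trail):
    """Count captures with errors per operator, e.g. to target training or screen fixes."""
    return Counter(entry["operator"] for entry in trail if entry["errors"])
```

Rejecting or flagging at capture time keeps bad records out of the staging area, while the weekly report points at the people or screens producing them.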

Page 12: Big data from the trenches

Acquisition with quality

• Non-human errors

• Use the PEWMA (Probabilistic Exponentially Weighted Moving Average) algorithm

https://aws.amazon.com/blogs/iot/anomaly-detection-using-aws-iot-and-aws-lambda/
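A minimal Python sketch of PEWMA for a single numeric stream (for example a sensor reading or a daily transaction count; the choice of stream is an assumption). The running mean and variance are exponentially weighted, but the weight on history, α_t = α(1 − β·P_t) with P_t the standard normal density of the point's z-score, rises towards α when a point is improbable, so outliers barely move the baseline. The first `training_period` points use a plain cumulative average. The defaults (α = 0.98, β = 0.98, 30 training points, a 3-sigma threshold) are assumptions to tune, not values from the slide or the linked AWS post.

```python
import math


class Pewma:
    """Probabilistic EWMA anomaly detector for a single numeric stream."""

    def __init__(self, alpha=0.98, beta=0.98, training_period=30, threshold=3.0):
        self.alpha = alpha                  # base weight on history
        self.beta = beta                    # how strongly probability modulates alpha
        self.training_period = training_period
        self.threshold = threshold          # |z| above this is flagged
        self.t = 0
        self.s1 = 0.0                       # weighted mean of x
        self.s2 = 0.0                       # weighted mean of x squared

    def update(self, x):
        """Score x against the current model, then fold x into it. Returns (z, is_anomaly)."""
        self.t += 1
        if self.t == 1:
            self.s1, self.s2 = x, x * x
            return 0.0, False
        mean = self.s1
        std = math.sqrt(max(self.s2 - mean * mean, 0.0))
        # A perfectly constant history has zero spread; this sketch then reports z = 0.
        z = (x - mean) / std if std > 0 else 0.0
        p = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)  # standard normal density
        if self.t <= self.training_period:
            a = 1.0 - 1.0 / self.t          # plain cumulative average while warming up
        else:
            a = self.alpha * (1.0 - self.beta * p)
        self.s1 = a * self.s1 + (1.0 - a) * x
        self.s2 = a * self.s2 + (1.0 - a) * x * x
        is_anomaly = self.t > self.training_period and abs(z) > self.threshold
        return z, is_anomaly
```

Each reading is passed to `update` as it arrives; a reading far outside the recent spread is flagged and, because it receives little weight, does not drag the baseline with it.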

Page 13: Big data from the trenches

Data sandbox

Page 14: Big data from the trenches

Creating a sandbox on the cloud

• Why cloud:

– Scale data discovery as needed

– Merging private with public data

– Less bureaucratic

• But…

– Customer data on the cloud is a no-no

Page 16: Big data from the trenches

Creating a sandbox on the cloud

• Masking

– Non-numerical data => No sweat!

– E.g.

• En. Abdul Jalil => 837x2unxy237e832!@

• 720324-03-8891 => 472376-84-8732

• Masking numerical data?

What if there were a way to mask numerical data while keeping its statistical properties intact?

Easier for the regulators to digest
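The "no sweat" masking of names and ID numbers can be sketched as keyed, deterministic pseudonymisation with HMAC-SHA256 from the Python standard library: the same input always maps to the same token, so masked tables can still be joined, while the key stays on-premise. `mask_digits` keeps the ID layout (digits stay digits, dashes stay in place), as in the 720324-03-8891 example. This is an assumed technique, not necessarily what the author used; it is one-way, and a reversible, standards-based alternative would be format-preserving encryption such as NIST FF1.

```python
import hashlib
import hmac

# Hypothetical key: in practice it would come from an on-premise key vault and never reach the cloud.
SECRET_KEY = b"replace-with-a-key-from-your-vault"


def mask_text(value, key=SECRET_KEY, length=18):
    """Replace free text (e.g. a customer name) with a deterministic keyed token."""
    return hmac.new(key, value.encode("utf-8"), hashlib.sha256).hexdigest()[:length]


def mask_digits(value, key=SECRET_KEY):
    """Replace each digit with a keyed pseudo-random digit, keeping dashes and length."""
    digest = hmac.new(key, value.encode("utf-8"), hashlib.sha256).digest()
    masked, i = [], 0
    for ch in value:
        if ch.isdigit():
            # Byte-mod-10 is slightly biased towards 0-5; acceptable for a sketch.
            masked.append(str(digest[i % len(digest)] % 10))
            i += 1
        else:
            masked.append(ch)
    return "".join(masked)
```

Because the token is deterministic, a customer masked in two different extracts receives the same token; the flip side is that anyone holding the key can confirm a guessed value, so the key must not leave the bank.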

Page 17: Big data from the trenches

Creating a sandbox on the cloud

• Random projection

• Usually used for dimension reduction

Original data (M × N) × Random matrix (N × N) = Masked data (M × N)
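A pure-Python sketch of the masking multiplication above, under one added assumption: the N × N random matrix is made orthogonal (Gaussian vectors orthonormalised with Gram-Schmidt). With an orthogonal matrix, row lengths and pairwise Euclidean distances between customers are preserved exactly, so distance-based work such as clustering or nearest-neighbour models gives the same answers on masked data. Individual column means and variances are not preserved, because columns get mixed, and orthogonal masking on its own is not a formal privacy guarantee: someone who knows a few original rows can attempt to recover the matrix.

```python
import math
import random


def random_orthogonal(n, seed=None):
    """Build an n x n orthogonal matrix by orthonormalising Gaussian vectors (modified Gram-Schmidt)."""
    rng = random.Random(seed)
    rows = []
    while len(rows) < n:
        v = [rng.gauss(0.0, 1.0) for _ in range(n)]
        for b in rows:
            dot = sum(x * y for x, y in zip(v, b))
            v = [x - dot * y for x, y in zip(v, b)]
        norm = math.sqrt(sum(x * x for x in v))
        if norm > 1e-9:  # skip the (vanishingly rare) near-dependent draw
            rows.append([x / norm for x in v])
    return rows


def mask(data, r):
    """Masked data (M x N) = original data (M x N) multiplied by r (N x N)."""
    n = len(r)
    return [[sum(row[k] * r[k][j] for k in range(n)) for j in range(n)] for row in data]


def distance(a, b):
    """Euclidean distance between two rows."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))
```

The matrix, like a key, stays on-premise; only the masked rows would go to the cloud sandbox.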

Page 18: Big data from the trenches

Fast real-time vs. batch analytics

Page 19: Big data from the trenches

Fast real-time analytics

• ‘Batch’ analytics:

Diagram: User → Application → over-night batch → Data warehouse → descriptive and predictive analytics → analytical model (monthly)

Page 20: Big data from the trenches

Fast real-time analytics

• ‘Batch’ analytics:

Diagram: User → Application → over-night batch → Data warehouse → descriptive and predictive analytics (monthly) → real-time decisioning

Page 22: Big data from the trenches

Fast real-time analytics

• So what is real-time analytics?

Diagram: User → Application → real-time analytics and decisioning, using an analytical model updated in real time. Predictive analytics: batch analytical model vs. real-time analytical model.
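To make "analytical model updated in real time" concrete, here is a minimal sketch of an online logistic regression that scores each event and then updates its weights from the observed outcome with one stochastic-gradient step, instead of waiting for the monthly batch rebuild. The model type, features, learning rate and class name are assumptions; the slide does not say which model is used.

```python
import math


class OnlineLogisticModel:
    """Logistic regression updated one event at a time (stochastic gradient descent)."""

    def __init__(self, n_features, learning_rate=0.1):
        self.weights = [0.0] * n_features
        self.bias = 0.0
        self.learning_rate = learning_rate

    def predict(self, features):
        """Probability of the positive outcome for one event."""
        z = self.bias + sum(w * x for w, x in zip(self.weights, features))
        # Numerically stable sigmoid.
        if z >= 0:
            return 1.0 / (1.0 + math.exp(-z))
        e = math.exp(z)
        return e / (1.0 + e)

    def update(self, features, outcome):
        """One gradient step on log-loss for a single (features, 0/1 outcome) event."""
        error = outcome - self.predict(features)
        for i, x in enumerate(features):
            self.weights[i] += self.learning_rate * error * x
        self.bias += self.learning_rate * error
```

In the batch-plus-real-time picture, the batch model could supply the starting weights, with `update` then called on each event as its outcome arrives.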

Page 23: Big data from the trenches

Fast real-time analytics

• Q-learning

• E.g. SMS advertisement campaign

Diagram: location, user info → Real-time Analytical Marketing System → SMS campaign

Page 24: Big data from the trenches

Fast real-time analytics

• Q-learning

• E.g. SMS advertisement campaign

Diagram: Real-time Analytical Marketing System; the customer changes behaviour (e.g. buys something else) and the system learns the new behaviour
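A minimal tabular Q-learning sketch of the SMS campaign: the state is a customer segment (assumed here to be derived from location and user info), the action is which category of SMS to send, and the reward is 1 when the customer responds. An epsilon-greedy policy keeps exploring, and a constant learning rate lets the Q-values drift when behaviour changes, which is how the system can learn new behaviour. The segment key, categories, rewards and parameters are all hypothetical. With `gamma=0` each SMS is treated as a one-step decision, which reduces Q-learning to a contextual bandit; a non-zero `gamma` would value sequences of messages.

```python
import random
from collections import defaultdict


class QLearningAgent:
    """Tabular Q-learning with an epsilon-greedy policy."""

    def __init__(self, actions, alpha=0.1, gamma=0.0, epsilon=0.1, seed=None):
        self.actions = list(actions)
        self.alpha = alpha            # learning rate: constant, so old evidence fades
        self.gamma = gamma            # discount on future value
        self.epsilon = epsilon        # share of random, exploratory sends
        self.q = defaultdict(float)   # (state, action) -> estimated value
        self.rng = random.Random(seed)

    def best_action(self, state):
        return max(self.actions, key=lambda a: self.q[(state, a)])

    def choose(self, state):
        if self.rng.random() < self.epsilon:
            return self.rng.choice(self.actions)
        return self.best_action(state)

    def update(self, state, action, reward, next_state):
        best_next = max(self.q[(next_state, a)] for a in self.actions)
        target = reward + self.gamma * best_next
        self.q[(state, action)] += self.alpha * (target - self.q[(state, action)])


# Simulated customer segment: responds to sports offers, then switches to movies.
agent = QLearningAgent(["sports", "concerts", "movies"], seed=7)
segment = "KL/young-professional"  # hypothetical segment key
for step in range(2000):
    preferred = "sports" if step < 1000 else "movies"
    action = agent.choose(segment)
    reward = 1.0 if action == preferred else 0.0
    agent.update(segment, action, reward, segment)
```

After the switch, sports offers stop earning responses, their Q-value decays, and exploratory movie offers take over as the greedy choice.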

Page 25: Big data from the trenches

Fast real-time analytics: Real-time analytics in action

Diagram: interest over time in concerts, movies and sports

Page 27: Big data from the trenches

Fast real-time analytics: Real-time analytics in action

Chart: interest (y-axis, 0 to 0.5) against messages (x-axis) for sports, concerts and movies, with curves labelled "Interest in concerts", "Interest in movies" and "Interest in sports"

Real-time analytical tracking and learning of people’s interest

Page 28: Big data from the trenches

Putting it all together under one architecture

Page 29: Big data from the trenches

Data architecture

• Some difficult questions around big data and analytics

– How can I invest in big data while managing cost?

– How can I “experiment” with big data while mitigating risks?

– How can I create a 360-degree view of data without boiling the ocean?

– How can I use overseas data without violating regulations?

Page 31: Big data from the trenches

Tiered data architecture

Diagram, top to bottom:

• Data consumer

• Data virtualization layer, exposing the official data model over SQL / REST / SOAP / MQ

• Stores: Data warehouse (staging, SQL access); Big Data infra (e.g. Hadoop); Real-time store; Master / Reference Data; Social / Cloud Public Data; Overseas Data

• Feeds: data sources (batch and real-time), overseas data sources, social network (batch)

Page 32: Big data from the trenches

Tiered data architecture

• Investment / level of support

– Master data: level 1 support

– Fast data: level 1 support

– Hot data: level 2 support

– Cold data: level 3 support

– Data virtualization: level 1 support

– Investment runs from CPU / memory (faster tiers) to storage (colder tiers)
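A minimal sketch of the data virtualization idea over hot and cold tiers, assuming one hypothetical `transactions` query served from a hot store (recent rows, e.g. the data warehouse) and a cold store (older rows, e.g. Hadoop) split at a configurable cut-off date. Consumers only call the façade, which applies one shared quality check; moving the cut-off migrates rows between tiers without changing what consumers get back. In-memory lists stand in for the real stores, and the class, field names and sample rows are illustrative.

```python
from datetime import date


class DataFacade:
    """Hypothetical virtualization façade over a hot and a cold store."""

    def __init__(self, rows, cutoff):
        self.cutoff = cutoff
        self.hot = [r for r in rows if r["date"] >= cutoff]   # e.g. data warehouse
        self.cold = [r for r in rows if r["date"] < cutoff]   # e.g. Hadoop archive

    @staticmethod
    def _passes_quality(row):
        # Single quality checkpoint shared by every consumer (illustrative rule).
        return row.get("amount") is not None

    def transactions(self, account, start, end):
        """Rows for one account between start and end inclusive, wherever they live."""
        stores = []
        if end >= self.cutoff:
            stores.append(self.hot)
        if start < self.cutoff:
            stores.append(self.cold)
        rows = [r for store in stores for r in store
                if r["account"] == account and start <= r["date"] <= end
                and self._passes_quality(r)]
        return sorted(rows, key=lambda r: r["date"])

    def move_cutoff(self, new_cutoff):
        """Re-tier the data; consumers keep calling transactions() unchanged."""
        everything = self.hot + self.cold
        self.cutoff = new_cutoff
        self.hot = [r for r in everything if r["date"] >= new_cutoff]
        self.cold = [r for r in everything if r["date"] < new_cutoff]


# Usage with invented rows: twelve monthly transactions plus one row failing the quality check.
rows = [{"account": "A1", "date": date(2016, m, 1), "amount": 100.0 * m} for m in range(1, 13)]
rows.append({"account": "A1", "date": date(2016, 6, 15), "amount": None})
facade = DataFacade(rows, cutoff=date(2016, 10, 1))
before = facade.transactions("A1", date(2016, 3, 1), date(2016, 11, 30))
facade.move_cutoff(date(2016, 7, 1))  # "experiment" with a different hot/cold split
after = facade.transactions("A1", date(2016, 3, 1), date(2016, 11, 30))
```

Because consumers never address a tier directly, the hot/cold boundary becomes a tuning knob for cost rather than an interface change.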

Page 33: Big data from the trenches

Tiered data architecture

• Invest where it matters

– Defer investment if needed

– Refocus investment without disrupting business

• Data virtualization

– Create a façade for data access

– Provide a standard interface for data

– Single data model, single access point, single quality checkpoint

• Allow ‘experimentation’

– E.g. move the cut-off point between hot and cold data

• Overseas data access

– Data stays where it is; only aggregated data is transferred back

– More palatable to regulators

• 360 view

– Data can be ‘joined’ through the data virtualization layer – no laborious ETL needed

• Single place to check for data quality
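The "data stays where it is, only aggregated data is transferred back" pattern can be sketched as each country computing count, sum and sum of squares per segment locally, with head office merging those summaries into global means and variances without ever receiving customer-level rows. The segment names and figures are invented, and which aggregates are acceptable to send abroad is a question for the relevant regulators, not something this code decides.

```python
class Aggregate:
    """Mergeable summary statistics, shipped across borders instead of raw rows."""

    def __init__(self, count=0, total=0.0, total_sq=0.0):
        self.count, self.total, self.total_sq = count, total, total_sq

    def add(self, x):
        self.count += 1
        self.total += x
        self.total_sq += x * x

    def merge(self, other):
        return Aggregate(self.count + other.count, self.total + other.total,
                         self.total_sq + other.total_sq)

    @property
    def mean(self):
        return self.total / self.count

    @property
    def variance(self):
        # Population variance via the naive sum-of-squares formula.
        return self.total_sq / self.count - self.mean ** 2


def local_summary(rows):
    """Runs inside each country: rows are (segment, value) pairs that never leave."""
    summary = {}
    for segment, value in rows:
        summary.setdefault(segment, Aggregate()).add(value)
    return summary


def combine(summaries):
    """Runs at head office on the per-country summaries only."""
    combined = {}
    for summary in summaries:
        for segment, agg in summary.items():
            combined[segment] = combined[segment].merge(agg) if segment in combined else agg
    return combined
```

For large values, a numerically stabler merge that carries each site's mean and sum of squared deviations would be preferable to the naive sum of squares used here.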

Page 34: Big data from the trenches

That’s all folks…

• LinkedIn:

– https://www.linkedin.com/in/azrul-madisa-6052419