big data : bits of history, words of advice

Posted on 22-Jan-2018

Page 1: Big Data : Bits of History, Words of Advice

Big Data : Bits of History, Words of Advice

Venu Vasudevan

GLSEC Big Data Meetup

Page 3

Big Data Past

[Figure: labels Big and Fast; data points: intelligent media, IoT, satellites]

Page 4

Big Data : Behavioral

• The ‘V’ view of Big Data challenges

• The number of V’s is up for debate

Page 5

Big Data : Architectural

[Figure: untidy data (firehose) → clean analytics, via paths labelled ‘fast & good’ and ‘slower & much better’; architectures shown: Lambda architecture, Lake architecture, Stream architecture]
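The Lambda architecture named on this slide pairs a fast path with a slower, more complete one and merges them at query time. The following is a minimal sketch, not the talk's implementation. It assumes events are hypothetical `(key, value)` pairs, a batch view recomputed periodically over all history, and a speed view covering only events since the last batch run. The function names are illustrative.

```python
from collections import Counter
from typing import Iterable, Tuple

# Hypothetical event shape: (key, value), e.g. (satellite id, event count).
Event = Tuple[str, int]


def aggregate(events: Iterable[Event]) -> Counter:
    """Sum values per key; both layers use this aggregation in this sketch."""
    totals: Counter = Counter()
    for key, value in events:
        totals[key] += value
    return totals


def query(key: str, batch_view: Counter, speed_view: Counter) -> int:
    """Serving layer: precomputed batch result plus the real-time delta."""
    return batch_view[key] + speed_view[key]


# Batch layer input: slow, periodic recomputation over all history.
history = [("sat-1", 3), ("sat-2", 5), ("sat-1", 2)]
# Speed layer input: only events that arrived since the last batch run.
recent = [("sat-1", 1)]

print(query("sat-1", aggregate(history), aggregate(recent)))  # 6
```

In practice the two layers usually run on different engines (a batch system versus a stream processor), and the speed view is discarded once a new batch view covers its events.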

Page 6

Technical

Page 7

Technical

Page 8

This Talk

[Figure: Behavioral View over a Technology Solution Stack, with a ‘Middleware’ layer (benefit of hindsight); labels: governance, culture (gap), data economics, ownership, food fights]

Page 9

3 data points

[Figure: labels Big and Fast; data points: intelligent media, IoT, satellites]

Page 10

Iridium

• mobile routers (10K mph), fixed people

• no repeated patterns

• satellites N-S movement

• earth E-W movement

• regular topology, irregular exceptions

• solar flares

• military satellite presence

Page 11

Fast Data Problem

• cellular frequency allocation (graph coloring problem)

• frequent fast recalculations (fast routers + semi-fast earth)

• transmit-no transmit (solar flares, military satellite presence)

• moving ‘seam’

[Figure: the moving seam; irregularities]
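The slide frames cellular frequency allocation as graph coloring: beams that interfere (adjacent vertices) must not share a frequency (color). The following is a minimal sketch that assumes a hypothetical interference graph given as a symmetric adjacency dict. Greedy coloring with largest-degree-first ordering is a swapped-in heuristic, not necessarily what Iridium used, and it does not guarantee the minimum number of frequencies.

```python
from typing import Dict, List, Set


def greedy_frequency_allocation(interference: Dict[str, Set[str]]) -> Dict[str, int]:
    """Assign each beam the lowest frequency index unused by any interfering beam.

    Beams are visited in decreasing order of degree (Welsh-Powell ordering),
    which tends to need fewer frequencies than an arbitrary order.
    """
    assignment: Dict[str, int] = {}
    order: List[str] = sorted(interference, key=lambda b: len(interference[b]), reverse=True)
    for beam in order:
        # Frequencies already taken by neighbours that have been assigned.
        taken = {assignment[n] for n in interference[beam] if n in assignment}
        freq = 0
        while freq in taken:
            freq += 1
        assignment[beam] = freq
    return assignment


# Hypothetical topology: A, B, C mutually interfere; D interferes only with C.
beams = {"A": {"B", "C"}, "B": {"A", "C"}, "C": {"A", "B", "D"}, "D": {"C"}}
print(greedy_frequency_allocation(beams))
```

Because satellites and the earth both move, the interference graph keeps changing. A cheap heuristic like this can be rerun on every topology change, which fits the "frequent fast recalculations" bullet.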

Page 12

Fast Data Problem

• cellular frequency allocation (graph coloring problem)

• frequent fast recalculations (fast routers + semi-fast earth)

• transmit-no transmit (solar flares, military satellite presence)

• moving ‘seam’

• + ‘France’

[Figure: the moving seam; irregularities; broadcast = +$$$; broadcast = -$$$ (lawsuit)]
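The bullets above list conditions that flip a beam between transmit and no-transmit: solar flares, military satellite presence, and, with ‘France’, footprints where broadcasting brings a lawsuit. The following is a minimal rule-check sketch. It assumes hypothetical rectangular exclusion zones in latitude/longitude and boolean flags for the other conditions. The zone used below is a placeholder, not a real border, and real beam footprints are far more complex.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Zone:
    """Hypothetical rectangular exclusion zone in degrees (placeholder, not a real border)."""
    lat_min: float
    lat_max: float
    lon_min: float
    lon_max: float

    def contains(self, lat: float, lon: float) -> bool:
        return self.lat_min <= lat <= self.lat_max and self.lon_min <= lon <= self.lon_max


def may_transmit(beam_lat: float, beam_lon: float, solar_flare: bool,
                 military_presence: bool, exclusion_zones: List[Zone]) -> bool:
    """Return True only if no condition forbids transmission for this beam centre."""
    if solar_flare or military_presence:
        return False
    return not any(z.contains(beam_lat, beam_lon) for z in exclusion_zones)


zones = [Zone(10.0, 20.0, 30.0, 40.0)]
print(may_transmit(15.0, 35.0, False, False, zones))  # False: inside the zone
print(may_transmit(0.0, 0.0, False, False, zones))    # True
```

The check is trivial for one beam. The hard part the slides point at is evaluating such constraints continuously for every moving footprint, at query throughput.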

Page 13

Fast Data Problem

• quest for (OO)DB technology to address ‘France’ as make-or-break use case

• query expressive power

• complex constraint satisfaction

• query handling throughput

• 3-4 month benchmarking effort

[Figure: seam; broadcast = +$$$; broadcast = -$$$ (lawsuit)]

Page 14

Fast Data Problem

• quest for (OO)DB technology to address ‘France’

• query expressive power

• query handling throughput

• 3-4 month benchmarking effort

• France solved ‘out-of-band’ (legally)

[Figure: seam; broadcast = +$$$; broadcast = -$$$ (lawsuit)]

Don’t overfit your architecture to an extreme requirement, unless it’s from an extreme (paying) user.

Page 15

Big Data Problem

• systems management

• manage 66 ‘nodes’

• nodes moving at 10K mph

• ‘seam’ moving at 20K mph

• sounds harder than trivial, but not too hard

Page 16

‘Pre’ Lambda Solution

• Dumb edge | smart core approach

• 15K events/sec/satellite

• 1M events/sec

• Fast & Approximate - FMEA: ‘compiled’ lookup table for failure modes

• Slow & Precise - Model-based reasoning on satellite models
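As a quick consistency check on the slide's figures, assume all 66 satellites from the previous slide each emit the quoted 15K events/sec. The aggregate rate is then:

```latex
66 \times 15{,}000\ \text{events/s} = 990{,}000\ \text{events/s} \approx 1\text{M events/s}
```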

[Figure: ‘Pre’ Lambda architecture: untidy satellite firehose (1M events/sec) → FMEA and Model-Based Reasoning → actionable insights]
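The two paths can be sketched as a precompiled FMEA table keyed on observed symptoms, with a slower reasoning step as the fallback when no entry exists. This is a minimal sketch, not the original system. The symptom codes, failure modes, and the `model_based_diagnosis` stand-in are all hypothetical, and the "slow" path here is only a placeholder for real model-based reasoning.

```python
from typing import Callable, Dict, FrozenSet, Tuple

Symptoms = FrozenSet[str]

# Fast & approximate: an FMEA table 'compiled' ahead of time (hypothetical entries).
FMEA_TABLE: Dict[Symptoms, str] = {
    frozenset({"battery_voltage_low", "panel_current_zero"}): "solar array failure",
    frozenset({"link_snr_low"}): "antenna misalignment",
}


def model_based_diagnosis(symptoms: Symptoms) -> str:
    """Slow & precise stand-in: a real system would reason over a satellite model."""
    return "unknown: escalate to model-based reasoning on " + ",".join(sorted(symptoms))


def diagnose(symptoms: Symptoms,
             slow_path: Callable[[Symptoms], str] = model_based_diagnosis) -> Tuple[str, str]:
    """Return (path_used, diagnosis), preferring the constant-time table lookup."""
    hit = FMEA_TABLE.get(symptoms)
    if hit is not None:
        return ("fast", hit)
    return ("slow", slow_path(symptoms))


print(diagnose(frozenset({"link_snr_low"})))  # ('fast', 'antenna misalignment')
```

An exact-match table is one plausible source of brittleness in this design. Any symptom combination not anticipated when the table was compiled falls through to the slow path.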

Page 17

‘Pre’ Lambda Solution

• Dumb edge | smart core approach

• 15K events/sec/satellite

• Fast & Approximate - FMEA: ‘compiled’ lookup table for failure modes

• Slow & Precise - Model-based reasoning on satellite models

• Simple, straightforward & wrong.

[Figure: ‘Pre’ Lambda architecture: untidy satellite firehose (1M events/sec) → FMEA and Model-Based Reasoning (real-time expert system) → actionable insights]

Yet, an architecture that has been ‘rinsed and repeated’ over the years.

Page 18

why does dumb edge smart cloud endure?

• edges are expensive ($2B)

• when edges go wrong (break/blow up/collide), they make headlines

[Figure: cost labels $ and $$$$$]

Page 19

why dumb edge smart cloud

• edges are expensive ($2B)

• when edges go wrong (break/blow up/collide), they make headlines

• nobody messes with an ‘edge’ once it works

• clouds don’t make for good news headlines

[Figure: $ at T-0; $$$$$ at T-30 yrs]

Page 20

why dumb edge smart cloud

• edges are expensive ($2B)

• when edges go wrong (break/blow up/collide), they make news headlines

• nobody messes with an ‘edge’ once it works

• thus, implementing an end-to-end architecture causes culture clashes

[Figure labels: ‘over my dead body’ vs. ‘iterate & refine’]

Page 21

an almost repeat (Industrial IoT)

• edges are messy & domain specific

• creating them means dealing with culture clashes

• but... an ounce of edge is worth a pound of cloud

[Figure: $$$$$ at T-30 yrs; $ at T-0]

Page 22

Things to consider

• Problem statement. What’s your ‘France’?

• colorful sub-problem; strategy overfit

• Architecture. Small fixes to the IT/OT gap can go a long way toward a simpler problem

• Technology Choices. Best practices & the risk of ‘rewardless risk’

• right choices: make average programmers productive with new tech

• frequent changes: turn great programmers into average ones

Page 23

Big Data to Deep Metadata

streaming video (TV): ~1 petabyte/day

[Figure: time scales second, minute, hour, day/week, epochal; use cases: detect & replace ads; create playlists by player, play, sentiment; identify minor characters with rabid fan following; rejuvenate old content; derive new content; ‘chapterize’ by player, play, sentiment]
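To put "~1 petabyte/day" in throughput terms, assume a decimal petabyte (10^15 bytes) arriving evenly over a 24-hour day. The sustained rate is then roughly:

```latex
\frac{10^{15}\ \text{bytes}}{86{,}400\ \text{s}} \approx 1.16 \times 10^{10}\ \text{bytes/s} \approx 11.6\ \text{GB/s} \approx 92.6\ \text{Gbit/s}
```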

Page 24

Platform Triage Challenge: new product, new market

• one core technology, many markets

• platform triaging challenge: what drives the platform?

• highest (but uncertain) $ potential?

• ‘extreme’ requirement?

• sparsest competition?

• use case outlier is your biggest customer

[Figure: deep metadata technology → SaaS data platform; markets: Advertising, Search, Video concept maps]

Page 25

ad replacement use case

• speed

• few days (on-demand content)

• few seconds (real-time rebroadcast with new ads)

• precision

• low: best effort, for low-cost international content for niche audiences

• high: frame level, for expensive content, e.g. sports or $10M/episode programming

• errors

• 90% accuracy: ok for long-tail content

• ‘five nines’ (99.999%) for premium content

[Figure: ad replacement opportunity space spanning precision, accuracy, and speed; the largest customer is marked]
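The three dimensions above can be captured as a requirements profile per content tier. The following is a minimal sketch that assumes two hypothetical corner points of the opportunity space built from the slide's endpoints. The field names are illustrative. It also makes the accuracy gap concrete: per 100,000 ad replacements, 90% accuracy means 10,000 errors, while ‘five nines’ means about 1.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AdReplacementProfile:
    """Hypothetical requirements profile for one content tier."""
    tier: str
    turnaround: str   # speed: how quickly replacement must happen
    precision: str    # how exactly ad-break boundaries must be located
    accuracy: float   # fraction of replacements expected to be correct

    def expected_errors(self, replacements: int) -> int:
        """Expected number of incorrect replacements out of `replacements`."""
        return round(replacements * (1 - self.accuracy))


# Two hypothetical corners of the opportunity space.
LONG_TAIL = AdReplacementProfile("long tail", "few days (on-demand)", "best effort", 0.90)
PREMIUM = AdReplacementProfile("premium", "few seconds (real-time rebroadcast)", "frame level", 0.99999)

for p in (LONG_TAIL, PREMIUM):
    print(p.tier, p.expected_errors(100_000))
```

Making the tiers explicit is one way to keep the platform from being silently designed around the most demanding corner.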

Page 26

Occam’s razor works (again)

• build to simplicity

• loose coupling between data engineering & equipment engineering

• modularize complexity

• ‘differentiate your product’ changes

• ‘necessary evil’ changes

[Figure: layered integration: data-only approach; + 1st-party integration (dynamically configure ad splicers); + 3rd-party knobs (dynamically refresh CDN)]
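The layering above (a data-only core, then optional 1st-party and 3rd-party integrations) can be sketched as a core that only emits ad-break metadata, with integrations subscribing as plug-ins. This is a minimal sketch under that assumption. `AdBreak`, `MetadataCore`, `configure_splicer`, and `refresh_cdn` are hypothetical names, and the two sinks only record what they would do.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass(frozen=True)
class AdBreak:
    """Data-only output: where an ad break starts and ends (seconds)."""
    channel: str
    start_s: float
    end_s: float


Sink = Callable[[AdBreak], None]


class MetadataCore:
    """Produces ad-break metadata; knows nothing about splicers or CDNs."""

    def __init__(self) -> None:
        self._sinks: List[Sink] = []

    def subscribe(self, sink: Sink) -> None:
        self._sinks.append(sink)

    def publish(self, ad_break: AdBreak) -> None:
        for sink in self._sinks:
            sink(ad_break)


actions: List[str] = []


def configure_splicer(b: AdBreak) -> None:
    # 1st-party integration stand-in: would dynamically configure an ad splicer.
    actions.append(f"splicer:{b.channel}@{b.start_s}")


def refresh_cdn(b: AdBreak) -> None:
    # 3rd-party knob stand-in: would trigger a CDN refresh.
    actions.append(f"cdn:{b.channel}")


core = MetadataCore()
core.subscribe(configure_splicer)
core.subscribe(refresh_cdn)
core.publish(AdBreak("ch1", 120.0, 150.0))
print(actions)  # ['splicer:ch1@120.0', 'cdn:ch1']
```

A data-only deployment is just the core with no subscribers. Each integration can be added or removed without touching the core, which is one way to keep ‘necessary evil’ changes separate from ‘differentiate your product’ changes.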

Page 27

Architecture

Page 28

but, what if...

• Data is untidy

• Interpretation is subjective/cultural

• Automation is aspirational but quixotic

Page 29

human-powered analytics

• some analytics tasks are too ‘slippery’ for machines

• data hard to characterize

• uneven video quality of ‘old’ archives

• untidy

• insights are subjective

Page 30

human-powered analytics

• some analytics tasks are too ‘slippery’ for machines

• need for human augmentation

• humans generate ‘training’ sets to bootstrap machine learning

• humans completely take over some tasks
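One common way to combine the two roles above is confidence-based routing. The machine labels what it is sure of, humans label the rest, and the human labels are kept as training data for the next model. The following is a minimal sketch of that idea, not the talk's architecture. It assumes a hypothetical classifier returning `(label, confidence)`, a hypothetical human-labeling callback, and an illustrative threshold of 0.8.

```python
from typing import Callable, Dict, List, Tuple

Classifier = Callable[[str], Tuple[str, float]]  # item -> (label, confidence)
HumanLabeler = Callable[[str], str]              # item -> label from a person


def hybrid_label(items: List[str], classify: Classifier, ask_human: HumanLabeler,
                 threshold: float = 0.8) -> Tuple[Dict[str, str], List[Tuple[str, str]]]:
    """Label items, routing low-confidence ones to humans.

    Returns (labels, training_set); human answers double as training examples.
    """
    labels: Dict[str, str] = {}
    training_set: List[Tuple[str, str]] = []
    for item in items:
        label, confidence = classify(item)
        if confidence >= threshold:
            labels[item] = label
        else:
            human_label = ask_human(item)
            labels[item] = human_label
            training_set.append((item, human_label))
    return labels, training_set
```

Lowering the threshold shifts work from people to machines as the model improves. Raising it does the reverse for content where errors are expensive.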

Page 31

machines vs humans

• crowdsourcing & human-powered computing

• has been the ‘next big thing’ for a while

• checkered history:

• uneven output

• fraud

• uneven throughput

Machines | Humans
fast | slow
brittle | malleable
objective | subjective
clear | nuanced

Page 32

machines vs humans

• much of that has changed

• Amazon Mechanical Turk

• 500K active users

• the ‘human machine’ can return substantial jobs in under 30 mins

• quantifiable like a machine for many media tasks: latency, quality, error rate, throughput
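Treating the crowd ‘as a machine’ means measuring it with the same metrics listed above. The following is a minimal sketch that assumes hypothetical task records with submit/return timestamps and a known-correct (gold) answer per task. Error rate against gold answers stands in for quality here.

```python
from dataclasses import dataclass
from statistics import median
from typing import Dict, List


@dataclass(frozen=True)
class TaskRecord:
    """One crowd task: timestamps in seconds, worker answer, known-correct answer."""
    submitted_s: float
    returned_s: float
    answer: str
    gold: str


def crowd_metrics(tasks: List[TaskRecord]) -> Dict[str, float]:
    """Median latency, error rate, and throughput for a batch of completed tasks."""
    latencies = [t.returned_s - t.submitted_s for t in tasks]
    errors = sum(1 for t in tasks if t.answer != t.gold)
    # Wall-clock window from first submission to last return.
    window = max(t.returned_s for t in tasks) - min(t.submitted_s for t in tasks)
    return {
        "median_latency_s": median(latencies),
        "error_rate": errors / len(tasks),
        "throughput_per_min": len(tasks) / (window / 60) if window > 0 else float("inf"),
    }
```

With these numbers in hand, a human task pool can be compared against an automated component on equal terms, or sized to meet a latency target.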

Page 33

Hybrid Architecture

Page 34

Things to consider

• Beware ‘France’ in other forms:

• customer with the loudest voice & a ‘holy grail’ hairball

• Dealing with data quality & variability

• crowdsourcing has come a long way as a credible ‘engine’

• If big data is the answer, what is the question? (have strong opinions, held weakly)

• decision rationalization

• process automation

• human ‘power tool’ (e.g. compelling visualization) vs imperfect automation

Page 35

startup data jiu-jitsu

• How to create a data-driven strategy before the data shows up?

• rationalize future SaaS revenue models

• justify product decisions in a data-driven manner

[Figure: cycle: need data for product ↔ need product for data]

Page 36

startup data jiu-jitsu

• How to create a data-driven strategy before the data shows up?

• how ‘intelligent’ can lighting control be with 50-100K users?

• how do people use dimmers (continuous or quantized)? UX implications
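Whether people treat a dimmer as continuous or quantized can be estimated from logged brightness settings. If a handful of levels account for most settings, a stepped UX may fit. The following is a minimal sketch that assumes hypothetical dimmer logs recorded as integer percentages. The cutoff of 3 levels covering 80% of settings is an illustrative choice, not a recommendation from the talk.

```python
from collections import Counter
from typing import List


def top_k_share(levels: List[int], k: int) -> float:
    """Fraction of all settings that fall on the k most frequent levels."""
    counts = Counter(levels)
    top = sum(n for _, n in counts.most_common(k))
    return top / len(levels)


def looks_quantized(levels: List[int], k: int = 3, share: float = 0.8) -> bool:
    """Heuristic: usage looks 'quantized' if k levels cover at least `share` of settings."""
    return top_k_share(levels, k) >= share


print(looks_quantized([100] * 5 + [50] * 3 + [10] * 2))  # True
print(looks_quantized(list(range(0, 100, 5))))           # False
```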

Page 37

data set dilemma

• standard sources (e.g. Kaggle & UCI) insufficient

• few ‘physical world’ datasets

• expensive to collect

• may be specialized (vendor-specific)

• dataset proxies for IoT actuation may not work

• energy utilization != switch usage

Page 38

big data, small start

• physical world data likely to be smaller (1-10 homes, few months)

• setup costs limit size of public datasets

• e.g. UMass Smart* light switch dataset

Page 39

big data, small start

• consider data ‘augmentation’

• standard practice in AI (deep learning): horizontal flips, random crops …

• under-used in data space

• may need some thought on perturbation models for your domain

[Figure: real vs. synthesized samples]

https://blog.keras.io/building-powerful-image-classification-models-using-very-little-data.html
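For physical-world event data, the analogue of flips and crops is a domain-specific perturbation model. The following is a minimal sketch that assumes light-switch events are hypothetical `(seconds since midnight, switched on?)` pairs. It uses two illustrative perturbations: a random contiguous crop, and one random time shift applied to the whole cropped sequence. Whether these preserve the behavior you care about is exactly the "some thought" the slide calls for.

```python
import random
from typing import List, Tuple

# Hypothetical event shape: (seconds since midnight, True if switched on).
SwitchEvent = Tuple[float, bool]


def time_shift(events: List[SwitchEvent], max_shift_s: float,
               rng: random.Random) -> List[SwitchEvent]:
    """Translate the whole sequence by one random offset.

    Order and gaps between events are preserved. Wrap-around at midnight is
    not handled; a real perturbation model would need to decide on that.
    """
    offset = rng.uniform(-max_shift_s, max_shift_s)
    return [(t + offset, on) for t, on in events]


def random_crop(events: List[SwitchEvent], length: int,
                rng: random.Random) -> List[SwitchEvent]:
    """Take a random contiguous window of `length` events."""
    start = rng.randint(0, len(events) - length)
    return events[start:start + length]


def augment(events: List[SwitchEvent], n: int, max_shift_s: float,
            crop_len: int, seed: int = 0) -> List[List[SwitchEvent]]:
    """Generate n synthesized variants from one real sequence (reproducible via seed)."""
    rng = random.Random(seed)
    return [time_shift(random_crop(events, crop_len, rng), max_shift_s, rng)
            for _ in range(n)]


real = [(7 * 3600.0, True), (8 * 3600.0, False), (19 * 3600.0, True), (23 * 3600.0, False)]
synthesized = augment(real, n=5, max_shift_s=1800, crop_len=2)
```

A fixed seed keeps the synthesized set reproducible, which helps when comparing models trained on real versus augmented data.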

Page 40

In short...

• big data success: equal parts tech & non-tech

• solving the right problem, not just solving the problem right

• revisit the problem, and what success means