Streaming Data Mining

Streaming Data Mining 05/30/2022 Streaming Data Mining 1

Uploaded by ankit-solanki

Posted on 27-Jun-2015


TRANSCRIPT

Page 1: Streaming data mining

Streaming Data Mining

04/13/2023 Streaming Data Mining 1

Page 2: Streaming data mining

Once upon a time.

• Life was easy
– E.g., an organization had only transaction data, and analysts were happy analyzing it.
– Competition was lower.
– Customers had fewer options for reviewing products.

• Wait! Web 2.0
– Customers who consumed data started generating data: tweets, blogs, Facebook comments, reviews…
– Another burst came with the mobile era:

• Apps recording customers' locations
• Actions within apps
• Patterns of app use


Page 3: Streaming data mining

[Figure: a Server connected to six DBs]

Page 4: Streaming data mining


It's All About the Numbers!


• 58M/day
• 500Tb/day
• 2.1M GB/hr
• 4B views/day

Page 5: Streaming data mining

So, it's GOOD to have data, right?

Page 6: Streaming data mining

Digging Into the Data

• Analyze data to understand customers.
• Identify patterns:

– Machine learning
– Statistical model building
– Natural language processing
– …


Page 7: Streaming data mining

Usual Pipeline in Data Mining.


Data of Entire Population → Sample Population → Cleaning and Preprocessing → Training and Testing Models → Production Server

Page 8: Streaming data mining
Page 9: Streaming data mining

Why?

Page 10: Streaming data mining
Page 11: Streaming data mining

Huge Training Data Set - Volume

• Organizations these days have huge datasets that can be used to train their models.

• But main memory is limited, which constrains:
– Machine learning algorithms
– Batch processing

• Why not sampling??
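One answer to "why not sampling?": the sampling itself can be done on a stream. Reservoir sampling (Algorithm R), swapped in here as an illustration rather than taken from these slides, keeps a uniform random sample of k items from a stream of unknown length in a single pass and O(k) memory. The function name `reservoir_sample` is hypothetical.

```python
import random
from typing import Iterable, List, Optional, TypeVar

T = TypeVar("T")


def reservoir_sample(stream: Iterable[T], k: int,
                     rng: Optional[random.Random] = None) -> List[T]:
    """Uniform random sample of k items from a stream of unknown length (Algorithm R)."""
    rng = rng or random.Random()
    reservoir: List[T] = []
    for i, item in enumerate(stream):
        if i < k:
            reservoir.append(item)      # fill the reservoir with the first k items
        else:
            j = rng.randint(0, i)       # inclusive bounds: j is in [0, i]
            if j < k:
                reservoir[j] = item     # keep item i with probability k / (i + 1)
    return reservoir
```

After n items, every item seen so far is in the reservoir with probability k/n, so the sample stays uniform however long the stream runs.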


Page 12: Streaming data mining

Streams - Velocity

• Ubiquitous Computing, Mobile Devices, Social Media.

• Potentially infinite in length

• Usual Strategy – Batch Mode.
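Applying batch mode to a stream usually means cutting it into fixed-size chunks and running the batch algorithm on each chunk. A minimal sketch, assuming a hypothetical helper `batches`; it also shows the catch: nothing downstream sees an element until its chunk has filled up.

```python
import itertools
from typing import Iterable, Iterator, List, TypeVar

T = TypeVar("T")


def batches(stream: Iterable[T], size: int) -> Iterator[List[T]]:
    """Group a (possibly infinite) stream into fixed-size batches; the last one may be shorter."""
    it = iter(stream)
    while True:
        batch = list(itertools.islice(it, size))  # blocks until `size` items arrive or the stream ends
        if not batch:
            return
        yield batch
```

Because it is a generator, it also works on an endless source such as `itertools.count()`, as long as the consumer only asks for a finite number of batches.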


Page 13: Streaming data mining

Contextual Trends.

• Trending topics on social media
• Weather
• Location
• Demographics
• Market dynamics

• Jargon alert: concept drift
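Concept drift can be detected by watching a model's error rate on the stream. The sketch below implements the Drift Detection Method (DDM) of Gama et al., a standard technique swapped in for illustration (the slides do not name a detector). It tracks the error rate p and its standard deviation s = sqrt(p(1 - p)/n), remembers the lowest p + s seen so far, and flags a warning at 2 and a drift at 3 of those standard deviations above that minimum. The class and method names are hypothetical.

```python
import math


class DDM:
    """Drift Detection Method: monitors a classifier's online error rate."""

    def __init__(self, min_samples: int = 30) -> None:
        self.min_samples = min_samples  # make no decision before this many outcomes
        self.reset()

    def reset(self) -> None:
        self.n = 0
        self.errors = 0
        self.p_min = math.inf  # error rate where p + s was lowest
        self.s_min = math.inf  # standard deviation at that same point

    def update(self, error: bool) -> str:
        """Feed one prediction outcome (True = misclassified); returns 'stable', 'warning' or 'drift'."""
        self.n += 1
        self.errors += int(error)
        p = self.errors / self.n
        s = math.sqrt(p * (1 - p) / self.n)
        if self.n < self.min_samples:
            return "stable"
        if p + s < self.p_min + self.s_min:
            self.p_min, self.s_min = p, s
        if p + s > self.p_min + 3 * self.s_min:
            self.reset()  # drift: start monitoring the new concept from scratch
            return "drift"
        if p + s > self.p_min + 2 * self.s_min:
            return "warning"
        return "stable"
```

In a spam filter, a "warning" could be used to start buffering recent labeled examples and a "drift" to retrain on them.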


Page 14: Streaming data mining

What do we want today?

Consume real-time data and extract insights.

Wait! Can I say "analyze streams"?

Page 15: Streaming data mining

Streaming Data Mining!

Page 16: Streaming data mining

Philosophy

• Continuous data records, a.k.a. data streams
• Bounded storage
• Single pass
• Real time
• Concept drift
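The "bounded storage" and "single pass" constraints can be illustrated with a classic one-pass statistic. This is Welford's online algorithm for mean and variance, swapped in as an example; the class name `RunningStats` is hypothetical. Each element is read once and memory stays constant no matter how long the stream runs.

```python
import math


class RunningStats:
    """Single-pass mean and variance (Welford's algorithm) in O(1) memory."""

    def __init__(self) -> None:
        self.n = 0
        self.mean = 0.0
        self._m2 = 0.0  # running sum of squared deviations from the current mean

    def update(self, x: float) -> None:
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self._m2 += delta * (x - self.mean)  # uses both the old and the updated mean

    @property
    def variance(self) -> float:
        """Sample variance; NaN until at least two values have been seen."""
        return self._m2 / (self.n - 1) if self.n > 1 else math.nan
```

Unlike the naive "sum of squares minus square of sum" formula, this update stays numerically stable on long streams.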


Page 17: Streaming data mining

So what? We have Hadoop…

• The big elephant doesn’t fit in here.
• Hadoop: batch processing
• We need Storm

– Storm is fast: a benchmark clocked it at over a million tuples processed per second per node.


Page 18: Streaming data mining

Algorithms.

• Conventional machine learning algorithms were designed for batch processing.
– The algorithm needs to load the entire dataset into memory.
– It computes the necessary statistics, for example entropy/information gain in decision trees.

• With streams?
– Streams are of infinite length.
– Storing everything, even if you can, puts a heavy (and costly, $$$$) load on the system's memory.
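One reason decision trees can be adapted to streams: for categorical attributes, entropy and information gain depend only on class counts, and counts can be updated one example at a time without keeping the examples. A minimal sketch; `SplitStats` and `entropy` are hypothetical names, and this is not how any particular library stores its statistics.

```python
import math
from collections import Counter, defaultdict


def entropy(counts: Counter) -> float:
    """Shannon entropy (bits) of a class distribution given as counts."""
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return -sum((c / total) * math.log2(c / total) for c in counts.values() if c > 0)


class SplitStats:
    """Class counts for one categorical attribute, updated incrementally."""

    def __init__(self) -> None:
        self.class_counts: Counter = Counter()
        self.value_class_counts = defaultdict(Counter)  # attribute value -> class counts

    def update(self, value, label) -> None:
        self.class_counts[label] += 1
        self.value_class_counts[value][label] += 1

    def information_gain(self) -> float:
        total = sum(self.class_counts.values())
        if total == 0:
            return 0.0
        # weighted entropy of the partitions induced by the attribute's values
        remainder = sum(
            sum(cc.values()) / total * entropy(cc)
            for cc in self.value_class_counts.values()
        )
        return entropy(self.class_counts) - remainder
```

Memory grows with the number of attribute values and classes, not with the number of examples seen.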


Page 19: Streaming data mining

Streaming Machine Learning

• When?
– The data volume is high.
– The rate at which data arrives is high.
– The data is unbounded: it keeps arriving, and we won't be able to fit it in memory.

• Requirements to adhere to:
– Each input element is processed at most once.
– Bounded space
– Bounded time
– Start predicting from t0.
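These requirements are commonly checked with prequential ("test-then-train") evaluation: each arriving example is first used to test the model, then to train it, and is then discarded. The sketch uses an online perceptron purely as a stand-in learner (an assumption, not a method from these slides); labels are assumed to be +1/-1, and `OnlinePerceptron` and `prequential` are hypothetical names.

```python
from typing import Iterable, List, Sequence, Tuple


class OnlinePerceptron:
    """Binary linear classifier updated one example at a time (labels +1 / -1)."""

    def __init__(self, n_features: int, lr: float = 1.0) -> None:
        self.w: List[float] = [0.0] * n_features
        self.b = 0.0
        self.lr = lr

    def predict(self, x: Sequence[float]) -> int:
        score = sum(wi * xi for wi, xi in zip(self.w, x)) + self.b
        return 1 if score >= 0 else -1

    def learn(self, x: Sequence[float], y: int) -> None:
        if self.predict(x) != y:  # perceptron rule: update only on mistakes
            self.w = [wi + self.lr * y * xi for wi, xi in zip(self.w, x)]
            self.b += self.lr * y


def prequential(stream: Iterable[Tuple[Sequence[float], int]],
                model: OnlinePerceptron) -> float:
    """Test-then-train over a stream; returns accuracy. Each example is used once."""
    correct = seen = 0
    for x, y in stream:
        correct += int(model.predict(x) == y)  # predict first: usable from t0
        seen += 1
        model.learn(x, y)                      # then learn, and drop the example
    return correct / seen if seen else 0.0
```

Because the model predicts before it has seen any labels, early mistakes count against it; that is the price of being usable from t0.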


Page 20: Streaming data mining

General Flow of Streaming Algorithms


Page 21: Streaming data mining

Spam Detection

• Models trained in the past with a traditional data mining strategy become obsolete as spammers find ways around them.

• Solution: VFDT (Hoeffding Tree) stream classification
– Train the model in a streaming setup.
– When a new spam pattern appears, people mark those messages as spam.
– Use them to retrain the model in real time.

Concept Drift! Win!
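The core of VFDT (the Hoeffding tree) is the Hoeffding bound: after n independent observations of a quantity with range R, the true mean lies within ε = sqrt(R² ln(1/δ) / (2n)) of the observed mean with probability 1 - δ. A leaf is split once the information gain of the best attribute beats the second best by more than ε, or once ε drops below a tie threshold τ. A minimal sketch of that split test; the function names are hypothetical, and the defaults δ = 1e-7 and τ = 0.05 are illustrative choices.

```python
import math


def hoeffding_bound(value_range: float, delta: float, n: int) -> float:
    """Epsilon such that the true mean is within epsilon of the mean of n samples w.p. 1 - delta."""
    return math.sqrt(value_range ** 2 * math.log(1.0 / delta) / (2.0 * n))


def should_split(best_gain: float, second_gain: float, n: int, n_classes: int,
                 delta: float = 1e-7, tie_threshold: float = 0.05) -> bool:
    """VFDT-style split test at a leaf that has seen n examples."""
    value_range = math.log2(n_classes)  # information gain lies in [0, log2(n_classes)]
    eps = hoeffding_bound(value_range, delta, n)
    # split when the winner is clear, or when the two are too close to ever separate
    return best_gain - second_gain > eps or eps < tie_threshold
```

With two classes and δ = 1e-7, ε is about 0.09 after 1,000 examples, so a gain gap of 0.4 justifies a split while a gap of 0.01 after 200 examples (ε ≈ 0.2) does not.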


Page 22: Streaming data mining

Answering Today's Big Data Needs

• Streaming data mining
– Storm
– MOA
– SAMOA
– Kafka
– …


Page 23: Streaming data mining

Thank You!

Ankit Solanki

Neil Shah