
Page 1: BigData Stream Mining

Albert Bifet, André C P L F de Carvalho, João Gama [email protected]

BigData Stream Mining

1

Page 2: BigData Stream Mining

- Learning from Data Streams
  - Motivation
  - Big Data Streams
  - Novelty detection
  - Clustering Learning
  - Predictive Learning
- Frequent Pattern Mining
  - Counting Algorithms
  - Frequent Items
- Tools and Applications

Main Topics

2

Page 3: BigData Stream Mining

- Traditional datasets
- Data streams
- Novelty detection
- Algorithms
- Examples
- Challenges

Motivation

3

Page 4: BigData Stream Mining

- DM techniques were developed for, and are usually applied to, static datasets
  - All the data are available
  - The machine learning algorithm induces a static decision model
  - Small to medium datasets

Data Mining

4

Page 5: BigData Stream Mining

- Previous practice
  - Few companies generate data
  - All the rest consume data
- Current practice
  - Everybody produces data
  - Everybody consumes data

5

Data production is changing

Page 6: BigData Stream Mining

- Machines are continuously collecting data
  - And sending them to other machines

6

Data explosion

Page 7: BigData Stream Mining

- Everybody is a movie maker, and wants a big audience
- Everybody has great video taste, and shares what they like
- Everybody is being watched, everywhere and all the time

7

Data Mining

Page 8: BigData Stream Mining

- Real-life problems are dynamic
- Data are generated continuously and at high speed
- Medium to large size
- Data streams
- New techniques and modifications of existing techniques

8

Data Mining

Page 9: BigData Stream Mining

9

https://www.domo.com

Data never sleeps

Page 10: BigData Stream Mining

10

https://www.domo.com

Page 11: BigData Stream Mining

11

Page 12: BigData Stream Mining

12

Page 13: BigData Stream Mining

http://www.flightradar24.com

13

Data never sleeps

Page 14: BigData Stream Mining

Data never sleeps

14

Page 15: BigData Stream Mining

15

Data never sleeps

Page 16: BigData Stream Mining

16

Day 1 - Afternoon

Day 2 - Morning

Data never sleeps

Page 17: BigData Stream Mining

Data never sleeps

17

Page 18: BigData Stream Mining

18

Data never sleeps

Page 19: BigData Stream Mining

19

Data never sleeps

Page 20: BigData Stream Mining

20

Real data from smartphones

Portugal

http://www.publico.pt/ciencia/noticia/telemoveis-fornecem-quase-em-tempo-real-mapas-da-densidade-populacional-portuguesa-1677020

Page 21: BigData Stream Mining

21

Real data from smartphones

Population dynamics between the main holiday period (July and August) and working periods in France. Credit: Catherine Linard http://phys.org/news/2014-10-cellphone-population-density.html#jCp

Page 22: BigData Stream Mining

- For each taxi in Porto, predict passenger demand
  - 30-minute horizon
- ECML/PKDD data science challenge

22

Real-time taxi demand prediction

Page 23: BigData Stream Mining

23

Data never sleeps

Page 24: BigData Stream Mining

- Walmart
  - Data center occupies 11,000 m²
  - > 1 million transactions per hour
  - Processes 40 petabytes per day
  - > 2,000 times the content of all books in the US Library of Congress, the world's largest library in space and number of books (> 155 million items)

André Ponce de Leon F de Carvalho 24

Data sources

24

Page 25: BigData Stream Mining

- YouTube
  - More than 1 billion users
  - Each day, billions of accesses and hundreds of millions of watching hours
  - The number of hours each person watches per month grows 50% each year
  - 300 hours of video uploaded each minute


Data sources

25

Page 26: BigData Stream Mining

26

Big Data relevance

http://hadoopadmin.com/big-data-hadoop-what-it-is-why-it-matters/sas-volume-variety-verlocity-value/

Page 27: BigData Stream Mining

27

Mismanaged data cost

Page 28: BigData Stream Mining

- The new characteristics of data:
  - Time and space: the objects of analysis exist in time and space, and often they are able to move
  - Dynamic environment: the objects exist in a dynamic and evolving environment
  - Information processing capability: the objects have limited information processing capabilities

28

A World in movement

Page 29: BigData Stream Mining

- The new characteristics of data:
  - Locality: the objects know only their local spatio-temporal environment
  - Distributed environment: objects are able to exchange information with other objects
- Main goal: real-time analysis
  - Decision models must evolve in correspondence with the evolving environment

29

A World in movement

Page 30: BigData Stream Mining

- These characteristics imply:
  - A switch from one-shot learning to continuously learning dynamic models that evolve over time
  - In the perspective induced by ubiquitous environments, finite training sets, static models, and stationary distributions will have to be completely rethought
  - The algorithms will have to use limited computational resources, in terms of processing, memory space and communication time

30

Challenges of Real Time Stream Mining

Page 31: BigData Stream Mining

Usual features of data streams (DS) and time series (TS):

Feature                | DS             | TS
Task                   | Classification | Regression
Data generation        | Asynchronous   | Synchronous
Labelled observations? | No             | Yes
Sequence dependence?   | No             | Yes

31

Time Series x Data Streams

Page 32: BigData Stream Mining

- Stock market
- Currency value
- Energy demand and consumption
- Hydroelectric energy generation
- Weather forecasting

Time series sources

32

Page 33: BigData Stream Mining

- Data arrive sequentially and, usually:
  - At high speed
  - In dynamic, time-changing environments
  - Without control over the arrival order
  - At different intervals between arrivals
- Streams usually have unlimited size
- The data distribution may change over time
- Arriving objects are unlabelled

33

Data streams main features

Page 34: BigData Stream Mining

- Data must be accessed only once
- Data cannot be stored in memory
  - After being processed, the object is discarded
- The decision model must be continuously updated
  - It must be able to detect novelties (novelty detection)
  - Model updates must be fast (concept drift)

34

Data streams solution requirements

Page 35: BigData Stream Mining

- DS mining can use incremental learning algorithms
  - The model is adapted as new examples become available
  - Training never stops
- Alternative: wait and train again with the expanded training set (retraining), ignoring the previous model
- Several incremental learning algorithms exist

35

Incremental Learning
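To make the contrast with retraining concrete, here is a minimal Python sketch of an incremental estimator: a running mean updated one example at a time, rather than recomputed over the ever-growing training set. The class name is illustrative, not a named algorithm from the slides.

```python
class RunningMean:
    """Incremental alternative to retraining: update the estimate per example
    instead of recomputing it over the full (and ever-growing) training set."""
    def __init__(self):
        self.n = 0
        self.mean = 0.0

    def update(self, x):
        self.n += 1
        self.mean += (x - self.mean) / self.n   # constant time, constant memory
        return self.mean

m = RunningMean()
for x in [2.0, 4.0, 6.0]:
    m.update(x)
print(m.mean)   # 4.0
```

The same update-in-place pattern underlies the stream learners discussed later: each arriving example touches the model once and is then discarded.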

Page 36: BigData Stream Mining

- Ability to identify new or unknown situations
- Usually a classification task
- Novelty, anomaly and outlier detection
  - Have different definitions in statistics and machine learning
  - Find patterns that differ from the normal, usual patterns

36

Novelty Detection

Page 37: BigData Stream Mining

- Anomaly
  - Exception to what is known
  - Few examples that are unexpected and do not represent a new concept
- A cohesive and representative group of examples representing a new concept
  - Can be a novelty
  - The decision model must be adapted to incorporate the anomaly
- Outlier
  - Abnormality or noise

37

Anomalies and Outliers

Page 38: BigData Stream Mining

- Concept evolution: a new concept (class) emerges in the stream
- Concept drift: a change in the profile (data distribution) of an existing concept (class)
- Recurring concepts: concepts that appeared in the past and disappeared may occur again in the future

38

Novelty Detection modalities

Page 39: BigData Stream Mining

(Figure: feature drift. Scatter of Variable 1 vs Variable 2 at Time n and at Time n + m.)

Adapted from Albert Bifet, Joao Gama, Ricard Gavalda,Georg Krempl, Mykola Pechenizkiy,Bernhard Pfahringer, Myra Spiliopoulou, Indre Zliobaite Advanced topics in Data Stream Mining, ECML PKDD 2012

39

Feature Drift

Page 40: BigData Stream Mining

(Figure: concept drift. Variable 1 vs Variable 2. Offline, first data: initial model. Online, new data at Time n + m: new models fitted as the distribution of an existing concept changes.)

Adapted from Albert Bifet, Joao Gama, Ricard Gavalda,Georg Krempl, Mykola Pechenizkiy,Bernhard Pfahringer, Myra Spiliopoulou, Indre Zliobaite Advanced topics in Data Stream Mining, ECML PKDD 2012

40

Concept Drift

Page 41: BigData Stream Mining

(Figure: concept evolution. Variable 1 vs Variable 2. Offline, first data: initial model. Online, new data at Time n + m: new models created as a new concept emerges.)

41

Concept Evolution

Adapted from Albert Bifet, Joao Gama, Ricard Gavalda,Georg Krempl, Mykola Pechenizkiy,Bernhard Pfahringer, Myra Spiliopoulou, Indre Zliobaite Advanced topics in Data Stream Mining, ECML PKDD 2012

Page 42: BigData Stream Mining

(Figure: concept re-occurrence. Variable 1 vs Variable 2 at Times n, n + m, n + m + k and n + m + k + l. The initial model is replaced by new models, and a previously seen concept returns later in the stream.)

42

Concept Re-occurrence

Adapted from Albert Bifet, Joao Gama, Ricard Gavalda,Georg Krempl, Mykola Pechenizkiy,Bernhard Pfahringer, Myra Spiliopoulou, Indre Zliobaite Advanced topics in Data Stream Mining, ECML PKDD 2012

Page 43: BigData Stream Mining

(Figure: mean of the data over time under different change profiles: abrupt change, incremental, gradual, outlier, reoccurring concepts, and multiple streams.)

43

Profiles of changes over time

Adapted from Albert Bifet, Joao Gama, Ricard Gavalda,Georg Krempl, Mykola Pechenizkiy,Bernhard Pfahringer, Myra Spiliopoulou, Indre Zliobaite Advanced topics in Data Stream Mining, ECML PKDD 2012

Page 44: BigData Stream Mining

- Data may become outdated and no longer useful
  - Outdated data should be discarded
- Several forgetting mechanisms exist; the choice depends on:
  - How we expect the changes in the data distribution to occur
  - The trade-off between reactivity and robustness to noise
    - Faster reactivity ⇒ more abrupt forgetting ⇒ higher risk of keeping noise

44

Forgetting mechanisms

Page 45: BigData Stream Mining

- Forgetting can be:
  - Abrupt (crisp forgetting)
    - At each time, a given observation is either kept in or removed from a learning window
  - Gradual (soft forgetting)
    - All observations are kept in a full memory
    - Observations are weighted to reflect their age (relevance)
    - The importance of an observation in the training set should decrease with aging

45

Forgetting mechanisms
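Both styles can be sketched in a few lines of Python; the window size and decay factor below are illustrative, not values from the slides.

```python
from collections import deque

def crisp_window(stream, w=3):
    """Abrupt (crisp) forgetting: keep only the last w observations."""
    window = deque(maxlen=w)
    for x in stream:
        window.append(x)          # the oldest item is dropped automatically
    return list(window)

def soft_weights(n, alpha=0.9):
    """Gradual (soft) forgetting: weight observation i (0 = oldest) by
    alpha raised to its age, so importance decreases with aging."""
    return [alpha ** (n - 1 - i) for i in range(n)]

print(crisp_window(range(10), w=3))   # [7, 8, 9]
print(soft_weights(4, alpha=0.5))     # [0.125, 0.25, 0.5, 1.0]
```

A learner would then train only on the window contents (crisp) or weight each example's contribution by its age (soft).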

Page 46: BigData Stream Mining

- We do not know all the classes during training
  - Only the classes in the training set are known
  - Unknown classes can appear in the test set
- This does not mean that unknown classes did not exist when the training set was obtained:
  - Data comes in a stream
  - The data distribution changed

46

Open Set Recognition

Page 47: BigData Stream Mining

- Non-profit movements to bring social benefits to people and communities
  - Some of them adopted by companies
- How does it occur?
  - Meetings
  - Events
  - Academic internships
  - Social networks
- Current trend: data stream mining for social good

47

Data Science for Social Good

Page 48: BigData Stream Mining

- Existing approaches
  - Using (open) data to solve civic problems
    - Usually want development of web/mobile apps
  - Using data science techniques to solve social problems
    - Mainly want insights from data scientists
- Data democratization
  - Allow anyone access to data
- The first U.S. Chief Data Scientist was named
  - Precision medicine, open data, data-driven decisions

48

Data Science for Social Good

Page 49: BigData Stream Mining

- Different forms of engagement
  - Challenges and competitions
    - Predictive data analytics for preventing fires: http://ibmhadoop.devpost.com/
  - University internships
  - Volunteering
  - Part-time jobs
  - Full-time jobs

49

http://www.kdnuggets.com/2014/07/data-for-good-data-driven-projects-social-good.html

Data Science for Social Good

Page 50: BigData Stream Mining

- Bring social benefits to people and communities:
  - Good health care for all
  - Economic development of poor countries
  - Good education for all
  - Clean and cheap energy
  - Citizenship
  - Environmental protection
  - Better and cleaner transport

50

Data Science for Social Good

Page 51: BigData Stream Mining

Education

- Monitor student performance
- Support the development of better teaching platforms, dynamically adapted to students' performance and needs
- Evaluate teachers and schools
- Replicate good experiences
- Act before it is too late

51

Data Science for Social Good

Page 52: BigData Stream Mining

Finance

- Improve the financial health of communities
- Support small businesses
- Direct social initiatives
- Fraud detection in the use of public resources

52

Data Science for Social Good

Page 53: BigData Stream Mining

Environmental

- Reduce global warming
- Decrease deforestation
- Reduce the effects of droughts
- Predict natural disasters
- Detect invasive species
- Increase species diversity

53

Data Science for Social Good

Page 54: BigData Stream Mining

Health care

- Monitor patient status in intensive care units
- Accelerate medical research and make it cheaper
  - Look at millions of patient records arriving in streams
- Discover epidemics
- Elderly fall prevention

54

Data Science for Social Good

Page 55: BigData Stream Mining

Relevant links

- Data Science for Social Good Fellowship
- DataLook
- civisanalytics.com
- digitalhumanitarians.com
- www.data4good.co
- http://www.meetup.com/DataKind-UK

55

Data Science for Social Good

Page 56: BigData Stream Mining

Big Data Stream Mining

Albert Bifet, André Carvalho, João Gama [email protected]

LIAAD-INESC TEC, University of Porto, Portugal

Page 57: BigData Stream Mining

- Learning from Data Streams
- Powerful Ideas
- Clustering Learning
- Predictive Learning
- Novelty Detection
- Frequent Pattern Mining

Page 58: BigData Stream Mining

Outline

- Learning from Data Streams
- Powerful Ideas
- Clustering Learning
- Predictive Learning
- Novelty Detection
- Frequent Pattern Mining

Page 59: BigData Stream Mining

Data Streams

Data streams: continuous flows of data generated at high speed in dynamic, time-changing environments. We need to maintain decision models in real time. Decision models must be capable of:

- incorporating new information at the speed data arrives;
- detecting changes and adapting the decision models to the most recent information;
- forgetting outdated information.

Unbounded training sets, dynamic models.

Page 60: BigData Stream Mining

Data Stream Processing

1. One example at a time, used at most once

2. Limited memory

3. Limited time

4. Anytime prediction

Page 61: BigData Stream Mining

Approximate Algorithms

Powerful ideas

- Summarization: compact and fast summaries to store sufficient statistics

- Approximation: how much information do we need to learn, with high probability, a hypothesis Ĥ that is within a small error of the true hypothesis H? Pr(|Ĥ − H| < ε|H|) > 1 − δ

- Estimation: useful for change detection

Page 62: BigData Stream Mining

Adaptive Learning Algorithms

A survey on concept drift adaptation, Gama, Zliobaite, Bifet et al, ACM-CSUR 2014

Page 63: BigData Stream Mining
Page 64: BigData Stream Mining

Clustering Data Streams

- New requirements in stream clustering:
  - Generate high-quality clusters in one scan
  - High-quality, efficient incremental clustering
  - Analysis at different time granularities
  - Tracking the evolution of clusters

- Clustering: a stream data reduction technique

Page 65: BigData Stream Mining

Cluster Feature Vector

Birch: Balanced Iterative Reducing and Clustering using Hierarchies, by Zhang,

Ramakrishnan, Livny 1996

Cluster Feature Vector: CF = (N, LS, SS)

- N: number of data points
- LS: linear sum of the points, LS = Σ_{i=1..N} x_i
- SS: sum of the squared points, SS = Σ_{i=1..N} x_i²

Constant space irrespective of the number of examples!

Page 66: BigData Stream Mining

Micro clusters

The sufficient statistics of a cluster A are CF_A = (N, LS, SS):

- N, the number of data objects
- LS, the linear sum of the data objects
- SS, the sum of the squared data objects

Properties:

- Centroid = LS/N
- Radius = sqrt(SS/N − (LS/N)²)
- Diameter = sqrt((2 · N · SS − 2 · LS²) / (N · (N − 1)))

Page 67: BigData Stream Mining

Micro clusters

Given the sufficient statistics of a cluster A, CF_A = (N_A, LS_A, SS_A), updates are:

- Incremental: a point x is added to the cluster:
  LS_A ← LS_A + x; SS_A ← SS_A + x²; N_A ← N_A + 1

- Additive: merging clusters A and B:
  LS_C ← LS_A + LS_B; SS_C ← SS_A + SS_B; N_C ← N_A + N_B

- Subtractive: CF(C1 − C2) = CF(C1) − CF(C2)
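The CF vector and its update rules translate almost directly into code. The following is a minimal Python sketch (the class and method names are illustrative, not from BIRCH itself) keeping per-dimension sums and deriving the centroid and radius:

```python
import math

class CF:
    """Cluster Feature vector CF = (N, LS, SS): constant space per cluster."""
    def __init__(self, dim):
        self.n = 0
        self.ls = [0.0] * dim    # linear sum per dimension
        self.ss = [0.0] * dim    # sum of squares per dimension

    def add(self, x):
        """Incremental update: absorb one point."""
        self.n += 1
        for d, v in enumerate(x):
            self.ls[d] += v
            self.ss[d] += v * v

    def merge(self, other):
        """Additive update: merge another cluster's CF into this one."""
        self.n += other.n
        self.ls = [a + b for a, b in zip(self.ls, other.ls)]
        self.ss = [a + b for a, b in zip(self.ss, other.ss)]

    def centroid(self):
        return [v / self.n for v in self.ls]

    def radius(self):
        return math.sqrt(sum(s / self.n - (l / self.n) ** 2
                             for s, l in zip(self.ss, self.ls)))

c = CF(2)
for p in [(0.0, 0.0), (2.0, 0.0), (1.0, 3.0)]:
    c.add(p)
print(c.centroid())   # [1.0, 1.0]
```

Note that nothing about the points themselves is stored: any number of examples collapses into the three sufficient statistics.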

Page 68: BigData Stream Mining

CluStream

CluStream: A Framework for Clustering Evolving Data Streams, Aggarwal, J. Han, J.

Wang, P. Yu (VLDB03)

- Divide the clustering process into online and offline components
  - Online: periodically stores summary statistics about the stream data
    - Micro-clustering: better quality than k-means
    - Incremental, online processing and maintenance
  - Offline: answers various user queries based on the stored summary statistics

- Tilted time framework: registers dynamic changes

- Limited overhead to achieve high efficiency, scalability, quality of results and power of evolution/change detection

Page 69: BigData Stream Mining

CluStream: Online Phase

Inputs:

- Maximum micro-cluster diameter Dmax

For each x in the stream:

- Find the nearest micro-cluster Mi
- IF the diameter of (Mi ∪ x) < Dmax
  - THEN assign x to that micro-cluster: Mi ← Mi ∪ x
  - ELSE start a new micro-cluster based on x
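The online loop above can be sketched in one dimension, under the simplifying assumptions that each micro-cluster is a plain (N, LS, SS) tuple and the nearest micro-cluster is the one with the closest centroid. Function names and the test values are illustrative:

```python
import math

def radius_after(mc, x):
    """Radius of the 1-d micro-cluster mc = (n, ls, ss) if point x joined it."""
    n, ls, ss = mc[0] + 1, mc[1] + x, mc[2] + x * x
    return math.sqrt(max(ss / n - (ls / n) ** 2, 0.0))  # guard fp round-off

def clustream_online(stream, d_max):
    """Online-phase sketch: absorb x into the nearest micro-cluster if the
    resulting diameter stays below d_max, otherwise start a new one."""
    mcs = []                                   # micro-clusters as (n, ls, ss)
    for x in stream:
        if mcs:
            # nearest micro-cluster by distance of x to the centroid ls/n
            i = min(range(len(mcs)), key=lambda j: abs(mcs[j][1] / mcs[j][0] - x))
            if 2 * radius_after(mcs[i], x) < d_max:
                n, ls, ss = mcs[i]
                mcs[i] = (n + 1, ls + x, ss + x * x)   # absorb x
                continue
        mcs.append((1, x, x * x))              # start a new micro-cluster at x
    return mcs

mcs = clustream_online([1.0, 1.2, 10.0, 9.8], d_max=2.0)
print(len(mcs))   # 2: one micro-cluster near 1.1, one near 9.9
```

The full CluStream algorithm additionally bounds the number of micro-clusters (deleting or merging the stalest ones), which this sketch omits.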

Page 70: BigData Stream Mining

Pyramidal Time Frame

- The micro-clusters are stored at snapshots

- The snapshots follow a pyramidal pattern

- The micro-clusters might be aggregated using tilted histograms

Page 71: BigData Stream Mining

Any Time Stream Clustering

The ClusTree: indexing micro-clusters for anytime stream mining, Kranen, Assent,

Baldauf, Seidl, KAIS 2011

Properties of anytime algorithms:

- Deliver a model at any time
- Improve the model if more time is available
- Model adaptation whenever an instance arrives
- Model refinement whenever time permits

In the ClusTree:

- An online component learns micro-clusters
- Any variety of online components can be utilized
- Micro-clusters are subject to exponential aging

Page 72: BigData Stream Mining

Clustering Evaluation

An effective evaluation measure for clustering on evolving data streams; Kremer,

Kranen, Jansen, Seidl, Bifet, Holmes, Pfahringer, KDD 2011

- Clusters may appear, fade, move, or merge
- Error types: missed points (unassigned), misplaced points (assigned to a different cluster), noise

- Cluster Mapping Measure (CMM)
  - External (uses a ground truth)
  - Normalized sum of penalties for these errors

Page 73: BigData Stream Mining

Cluster Evolution

Page 74: BigData Stream Mining

Analysis

- Find the cluster structure in the current window

- Find the cluster structure over time ranges, with granularity confined by the specification of window size and boundary

- Put different weights on different windows to mine various kinds of weighted cluster structures

- Mine the evolution of cluster structures based on the changes of their occurrences in a sequence of windows

Page 75: BigData Stream Mining

Bibliography: Cluster data streams

- Birch: an efficient data clustering method for very large databases. Zhang, T., Ramakrishnan, R., Livny, M. ACM SIGMOD 1996

- Clustering data streams: Theory and practice. Guha, S., Meyerson, A., Mishra, N., Motwani, R., O'Callaghan, L. IEEE TKDE 2003

- CluStream: A Framework for Clustering Evolving Data Streams. Aggarwal, C., Han, J., Wang, J., Yu, P. VLDB 2003

- Monic: modeling and monitoring cluster transitions. Spiliopoulou, M., Ntoutsi, I., Theodoridis, Y., Schult, R. ACM SIGKDD 2006

- The ClusTree: indexing micro-clusters for anytime stream mining. Kranen, P., Assent, I., Baldauf, C., Seidl, T. KAIS 2011

- An effective evaluation measure for clustering on evolving data streams. Kremer, Kranen, Jansen, Seidl, Bifet, Holmes, Pfahringer. KDD 2011

- Data stream clustering: A survey. Silva, J. A., Faria, E., Barros, R., Hruschka, E., Carvalho, A., Gama, J. ACM Computing Surveys 2013

Page 76: BigData Stream Mining
Page 77: BigData Stream Mining

Learning Decision Trees

The base idea:

- Which attribute to choose at each splitting node?
- A small sample can often be enough to choose the optimal splitting attribute
  - Collect sufficient statistics from a small set of examples
  - Estimate the merit of each attribute

How large should the sample be?

- The wrong idea: a fixed size, defined a priori without looking at the data
- The right idea: choose the sample size that allows differentiating between the alternatives

Page 78: BigData Stream Mining

Very Fast Decision Trees

Mining High-Speed Data Streams, P. Domingos, G. Hulten; KDD 2000

The base idea: a small sample can often be enough to choose the optimal splitting attribute

- Collect sufficient statistics from a small set of examples
- Estimate the merit of each attribute
- Use the Hoeffding bound to guarantee that the best attribute is really the best
  - Statistical evidence that it is better than the second best

Page 79: BigData Stream Mining

Very Fast Decision Trees: Main Algorithm

- Input: δ, the desired probability level
- Output: T, a decision tree
- Init: T ← empty leaf (root)
- While (TRUE)
  - Read the next example
  - Propagate the example through the tree, from the root to a leaf
  - Update the sufficient statistics at the leaf
  - If leaf(#examples) > Nmin
    - Evaluate the merit of each attribute
    - Let A1 be the best attribute and A2 the second best
    - Let ε = sqrt(R² ln(1/δ) / (2n))
    - If G(A1) − G(A2) > ε
      - Install a splitting test based on A1
      - Expand the tree with two descendant leaves
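The split test at the heart of the algorithm can be illustrated numerically. This sketch assumes the merit values G(·) have already been computed from the leaf's sufficient statistics, and uses R = 1, as for information gain with two classes; the function names are illustrative:

```python
import math

def hoeffding_bound(R, delta, n):
    """epsilon = sqrt(R^2 * ln(1/delta) / (2n)): with probability 1 - delta,
    the observed mean of n values with range R is within epsilon of the
    true mean."""
    return math.sqrt(R * R * math.log(1.0 / delta) / (2.0 * n))

def should_split(g_best, g_second, R, delta, n):
    """Split when the observed merit gap G(A1) - G(A2) exceeds the bound."""
    return g_best - g_second > hoeffding_bound(R, delta, n)

# For information gain with two classes, the range is R = log2(2) = 1.
print(should_split(0.8, 0.6, R=1.0, delta=1e-7, n=500))    # True: gap 0.2 > eps
print(should_split(0.62, 0.6, R=1.0, delta=1e-7, n=500))   # False: gap too small
```

As n grows the bound ε shrinks, so even a tiny merit gap eventually justifies a split; conversely, a large gap is resolved after few examples.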

Page 80: BigData Stream Mining

VFDT

Page 81: BigData Stream Mining

Concept-adapting VFDT

G. Hulten, L. Spencer, P. Domingos: Mining Time-Changing Data Streams KDD 2001

- Model consistent with a sliding window over the stream
- Keep sufficient statistics also at internal nodes

- Recheck periodically whether splits still pass the Hoeffding test
- If a test fails, grow an alternate subtree and swap it in when the accuracy of the alternate is better

- Processing updates: O(1) time, +O(W) memory
- Increase counters for the incoming instance, decrease counters for the instance leaving the window

Page 82: BigData Stream Mining

Hoeffding Adaptive Tree

A. Bifet, R. Gavalda: Adaptive Parameter-free Learning from Evolving Data Streams

IDA, 2009

- Replace frequency counters by estimators
  - No need for a window of examples
  - Sufficient statistics kept by estimators separately

- Parameter-free change detector + estimator with theoretical guarantees for subtree swap (ADWIN)

- Keeps the sliding window consistent with the no-change hypothesis

Page 83: BigData Stream Mining

Hoeffding Algorithms

- Classification: Mining high-speed data streams. Domingos, Hulten. KDD 2000

- Regression: Learning model trees from evolving data streams. Ikonomovska, Gama, Dzeroski. Data Min. Knowl. Discov. 2011

- Rules: Learning Decision Rules from Data Streams. Gama, Kosina. IJCAI 2011

- Clustering: Hierarchical Clustering of Time-Series Data Streams. Rodrigues, Gama. IEEE TKDE 20(5): 615-627 (2008)

- Multiple Models: Ensembles of Restricted Hoeffding Trees. Bifet, Frank, Holmes, Pfahringer. ACM TIST 2012; Ensembles of Adaptive Model Rules from High-Speed Data Streams. Duarte, Gama. BigMine 2014

- ...

Page 84: BigData Stream Mining

Option Trees

Speeding-Up Hoeffding-Based Regression Trees With Options, Ikonomovska, et al,

ICML 2011

Use option nodes to solve ties

Page 85: BigData Stream Mining

Rules

Problem: very large decision trees have context that is complex and hard to understand.

- Rules: self-contained, modular, easier to interpret, no need to cover the universe

- Each rule keeps sufficient statistics to: make predictions, expand the rule, detect changes and anomalies

Page 86: BigData Stream Mining

Adaptive Model Rules

Adaptive Model Rules from Data Streams, Almeida, Ferreira, Gama; ECML/PKDD

2013

- Ruleset: an ensemble of rules

- Rule prediction: mean or linear model

- Ruleset prediction:
  - Ordered: only the first rule covers the instance
  - Unordered: weighted average of the predictions of the rules covering instance x
    - Weights inversely proportional to the error

Page 87: BigData Stream Mining

AMRules Induction

- Rule creation: default rule expansion

- Rule expansion: split on the attribute maximizing the σ reduction
  - Hoeffding bound: ε = sqrt(R² ln(1/δ) / (2n))
  - Expand when σ1st/σ2nd < 1 − ε

- Evict a rule when the Page-Hinkley (P-H) test signals an alarm

- Detect and explain local anomalies

Page 88: BigData Stream Mining

Clustering Time-series

Hierarchical Clustering of Time-Series Data Streams. Rodrigues, Gama, TKDE, 2008

Using Pearson correlation as splitting criteria.

Page 89: BigData Stream Mining

Hoeffding Algorithms: Analysis

The number of examples required to expand a node depends only on the Hoeffding bound: ε decreases as 1/√N.

- Low-variance models: stable decisions with statistical support.

- Low overfitting: examples are processed only once.

- No need for pruning; decisions have statistical support.

- Convergence: Hoeffding algorithms become asymptotically close to a batch learner. The expected disagreement is δ/p, where p is the probability that an example falls into a leaf.

Page 90: BigData Stream Mining

Bibliography on Predictive Learning

- Mining High-Speed Data Streams. Domingos, Hulten. SIGKDD 2000

- Mining time-changing data streams. Hulten, Spencer, Domingos. KDD 2001

- Efficient Decision Tree Construction on Streaming Data. R. Jin, G. Agrawal. SIGKDD 2003

- Accurate Decision Trees for Mining High Speed Data Streams. J. Gama, R. Rocha, P. Medas. SIGKDD 2003

- Forest trees for on-line data. J. Gama, P. Medas, R. Rocha. SAC 2004

- Learning decision trees from dynamic data streams. Gama, Medas, Rodrigues. SAC 2005

- Decision trees for mining data streams. Gama, Fernandes, Rocha. Intelligent Data Analysis, Vol. 10, 2006

- Handling Time Changing Data with Adaptive Very Fast Decision Rules. Kosina, Gama. ECML-PKDD 2012

- Learning model trees from evolving data streams. Ikonomovska, Gama, Dzeroski. Data Min. Knowl. Discov. 2011

Page 91: BigData Stream Mining
Page 92: BigData Stream Mining

Definition

- Novelty detection refers to the automatic identification of unforeseen phenomena embedded in a large amount of normal data.

- Novelty is a relative concept with regard to our current knowledge: it must be defined in the context of a representation of our current knowledge.

- Especially useful when novel concepts represent abnormal or unexpected conditions
  - Expensive to obtain abnormal examples
  - Probably impossible to simulate all possible abnormal conditions

Page 93: BigData Stream Mining

Context

- In real problems, as time goes by:
  - The distribution of known concepts may change
  - New concepts may appear

- By monitoring the data stream, emerging concepts may be discovered

- Emerging concepts may represent:
  - An extension to a known concept (extension)
  - A novel concept (novelty)

- Several interesting applications: early detection of faults in jet engines, intrusion detection in computer networks, breaking news in a flow of text documents (news articles), bursts of gamma rays (astronomical data)

Page 94: BigData Stream Mining

One-Class Classification

Page 95: BigData Stream Mining

Autoassociator Networks

Concept-learning in the absence of counter-examples: an autoassociation-based approach. Nathalie Japkowicz, 1999

- Three-layer network

- The number of neurons in the output layer is equal to that in the input layer

- Train the network such that the output ~y is equal to the input ~x

- The network is trained to reproduce the input at the output layer

Page 96: BigData Stream Mining

Autoassociator Networks

To classify a test example ~x

- Propagate ~x through the network and let ~y be the corresponding output;

- If Σ_{i=1..k} (x_i − y_i)² < Threshold, then the example is considered to be from the normal class;

- Otherwise, ~x is a counter-example of the normal class.
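The decision rule itself is a one-liner once the network's output is available. This sketch assumes ~y has already been computed by propagating ~x through the trained autoassociator; the vectors and threshold below are illustrative:

```python
def is_normal(x, y, threshold):
    """Autoassociator decision rule: x is the input vector, y the network's
    reconstruction of it; accept as 'normal' when the squared reconstruction
    error stays below the threshold."""
    error = sum((xi - yi) ** 2 for xi, yi in zip(x, y))
    return error < threshold

print(is_normal([1.0, 0.0], [0.9, 0.1], threshold=0.5))   # True: error ~ 0.02
print(is_normal([1.0, 0.0], [0.0, 1.0], threshold=0.5))   # False: error 2.0
```

Since the network was trained only on normal examples, examples it reconstructs poorly are flagged as counter-examples of the normal class.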

Page 97: BigData Stream Mining

Novelty detection

- Training set (offline phase)
  - Dtr = {(X1, y1), (X2, y2), ..., (Xm, ym)}
  - Xi: vector of input attributes for the i-th example; yi: target attribute
  - yi ∈ Ytr, where Ytr = {c1, c2, ..., cL}

- When new data arrive (online phase)
  - Given a sequence of unlabelled examples Xnew
  - Goal: classify Xnew in Yall, where Yall = {c1, c2, ..., cL, ..., cK} and K > L

Page 98: BigData Stream Mining

Novelty Detection Systems

- ECSMiner: assumes that the class label of new examples becomes known

- OLINDDA: unsupervised, but restricted to binary classification problems

- MINAS (MultI-class learNing Algorithm for data Streams)
  - Does not use the class labels of new examples
  - Can deal with novelty detection in multi-class data stream problems

Page 99: BigData Stream Mining

OLINDDA algorithm

OnLIne Novelty and Drift Detection Algorithm
Spinosa, Carvalho, Gama: OLINDDA: a cluster-based approach for detecting novelty and concept drift in data streams. SAC 2007

- Offline and online phases

- Models: normal, extension and novelty

- Each model is represented by a set of clusters

- Not suitable for multi-class problems

Page 100: BigData Stream Mining

OLINDDA

Page 101: BigData Stream Mining

ECSMiner algorithm

Masud, Gao, Khan, Han, Thuraisingham: Classification and novel class detection in concept-drifting data streams under time constraints. TKDE 2011

Supervised algorithm integrating novel concepts and concept drift:

- Ensemble of classifiers
- Creates a new model when all examples in a chunk are labeled

- Assumes that all examples in the stream will be labeled (after a delay of Tl time units)

- An instance will be classified within Tc time units of its arrival

Page 102: BigData Stream Mining

Minas algorithm

MINAS: Multiclass Learning Algorithm for Novelty Detection in Data Streams. E. Faria, J. Gama, A. Carvalho. DAMI (to appear)

- Unsupervised algorithm for novelty detection in multi-class data stream problems
  - Represents each known class by a set of hyperspheres

- Uses offline (training) and online phases
  - In each phase, learns one or more classes

- A cohesive set of examples is necessary to learn new concepts or extensions
  - Isolated examples are not considered as novelty

Page 103: BigData Stream Mining

MINAS - Offline phase

- Learns a decision model based on the known concepts of the problem
  - Uses KMeans or CluStream

- Runs only once

- Each class is represented by a set of clusters (hyperspheres)

Page 104: BigData Stream Mining

MINAS - Online phase

- Receives new examples from the stream
- Classifies each new example:
  - In one of the known classes, or
  - As unknown

- Cohesive groups of unknown examples are used to detect new classes or extensions

Page 105: BigData Stream Mining

Minas

Page 106: BigData Stream Mining

Novelty Detection Bibliography

- Masud, Gao, Khan, Han, Thuraisingham. Classification and novel class detection in concept-drifting data streams under time constraints. TKDE 2011

- Spinosa, Carvalho, Gama. OLINDDA: a cluster-based approach for detecting novelty and concept drift in data streams. SAC 2007

- Faria, E., Gama, J., Carvalho, A. MINAS: Multiclass Learning Algorithm for Novelty Detection in Data Streams. DAMI (to appear)

- Angelov, P., Zhou, X. Evolving fuzzy-rule-based classifiers from data streams. IEEE Trans. Fuzzy Syst. 2008

- Tax, D., Duin, R. Growing a multi-class classifier with a reject option. Pattern Recognit. Lett. 2008

- Denis, F., Gilleron, R., Letouzey, F. Learning from positive and unlabeled examples. Theoretical Comput. Sci. 2005

- Cardoso, D., França, F. A Bounded Neural Network for Open Set Recognition. IJCNN 2015

Page 107: BigData Stream Mining
Page 108: BigData Stream Mining

Introduction

I Frequent pattern mining refers to finding patterns that occur more often than a pre-specified threshold value.

I Patterns refer to items, itemsets, or sequences.

I The threshold refers to the percentage of the pattern's occurrences relative to the total number of transactions. It is termed Support.

Page 109: BigData Stream Mining

Introduction

I Finding frequent patterns is the first step for the discovery of association rules of the form A → B.

I The Apriori algorithm represents a pioneering work for association rule discovery.
R. Agrawal and R. Srikant, Fast Algorithms for Mining Association Rules. VLDB 1994.

I An important step towards improving the performance of association rule discovery was FP-Growth.
J. Han, J. Pei, and Y. Yin. Mining Frequent Patterns without Candidate Generation. SIGMOD 2000.

Page 110: BigData Stream Mining

Introduction

I Many measurements have been proposed for finding thestrength of the rules.

I The most frequently used measure is support.
I The support Supp(X) of an itemset X is defined as the proportion of transactions in the data set that contain the itemset.

I Another frequently used measure is confidence.
I Confidence refers to the probability that set B occurs given that A already occurs in a transaction.
I Confidence(A → B) = Supp(A ∪ B) / Supp(A)
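Both measures can be computed directly from a transaction database. The sketch below is illustrative; the function names are our own.

```python
def support(itemset, transactions):
    """Supp(X): fraction of transactions containing every item of X."""
    return sum(1 for t in transactions if itemset <= t) / len(transactions)

def confidence(a, b, transactions):
    """Conf(A -> B) = Supp(A u B) / Supp(A)."""
    return support(a | b, transactions) / support(a, transactions)

transactions = [
    {"milk", "bread"},
    {"milk", "bread", "butter"},
    {"bread"},
    {"milk", "butter"},
]
print(support({"milk", "bread"}, transactions))       # 0.5
print(confidence({"milk"}, {"bread"}, transactions))  # 2/3
```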

Page 111: BigData Stream Mining

Frequent Pattern Mining in Data Streams

The process of frequent pattern mining over data streams differs from the conventional one as follows:

I The technique should be linear or sublinear: You Have Only One Look.

I Typical tasks: heavy hitters, top-k, frequent items, and itemsets.

Page 112: BigData Stream Mining

Frequent Items (Heavy Hitters) in Data Streams

Manku and Motwani proposed two seminal algorithms in this area:

I Sticky Sampling

I Lossy Counting

G. S. Manku and R. Motwani. Approximate Frequency Counts over DataStreams, in Proceedings of the 28th International Conference on Very LargeData Bases (VLDB), Hong Kong, China, August 2002.

Page 113: BigData Stream Mining

Sticky Sampling

Sticky sampling is a probabilistic technique.
I The user inputs three parameters
I Minimum support (s)
I Admissible error (ε)
I Probability of failure (δ)

I A simple data structure is maintained that has entries of data elements and their associated frequencies (e, f).

I The sampling rate decreases gradually as the number of processed data elements grows. It is governed by t = (1/ε) log(1/(s·δ)): the first 2t elements are sampled at rate 1, the next 2t at rate 2, the next 4t at rate 4, and so on.

Page 114: BigData Stream Mining

Sticky Sampling

I For each incoming element in the data stream, the data structure is checked for an entry
I If an entry exists, increment its frequency
I Otherwise, sample the element with the current sampling rate
I If selected, add a new entry; otherwise the element is ignored

I With every change in sampling rate, an unbiased coin is tossed for each entry, decreasing the frequency with every unsuccessful toss

I If the frequency goes down to zero, the entry is released
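The steps above can be sketched compactly; the rate-doubling schedule follows the Manku-Motwani paper, and the function name is ours. Being probabilistic, the output is only guaranteed with probability 1 − δ.

```python
import math
import random

def sticky_sampling(stream, s, eps, delta):
    """Sticky Sampling sketch: approximate counts (e, f) kept by sampling.
    Returns elements whose stored frequency is at least (s - eps) * N."""
    t = math.ceil((1 / eps) * math.log(1 / (s * delta)))
    counts, rate, seen, boundary = {}, 1, 0, 2 * t
    for e in stream:
        seen += 1
        if e in counts:
            counts[e] += 1                    # existing entry: count exactly
        elif random.random() < 1 / rate:      # new entry: sample at rate 1/r
            counts[e] = 1
        if seen == boundary:                  # sampling rate doubles here
            rate *= 2
            boundary += rate * t
            for key in list(counts):          # unbiased coin tosses:
                while counts[key] > 0 and random.random() < 0.5:
                    counts[key] -= 1          # decrement per unsuccessful toss
                if counts[key] == 0:
                    del counts[key]           # release empty entries
    return {e: f for e, f in counts.items() if f >= (s - eps) * seen}
```

On a stream of 1000 copies of one element with s = 0.3, ε = 0.1, δ = 0.01, the element is reported with very high probability, and its stored count never exceeds its true count.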

Page 115: BigData Stream Mining

Lossy Counting

I Lossy counting is a deterministic technique.
I The user inputs two parameters
I Minimum support (s)
I Admissible error (ε)

I The data structure has entries of data elements, their associated frequencies and maximum possible errors (e, f, Δ), where Δ is the maximum possible error in f.

I The stream is conceptually divided into buckets with a width w = 1/ε.

I Each bucket is labeled by the value ⌈N/w⌉, where N is the number of elements seen so far; labels start at 1 and increase by 1.

Page 116: BigData Stream Mining

Lossy Counting

I For a new incoming element, the data structure is checked
I If an entry exists, increment the frequency
I Otherwise, add a new entry with Δ = b_current − 1, where b_current is the current bucket label

I When switching to a new bucket, all entries with f + Δ < b_current are deleted.
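The bucket-based procedure above is deterministic and short enough to sketch fully (the function name is ours; pruning happens at each bucket boundary, which is equivalent to deleting entries with f + Δ < b_current after the switch):

```python
def lossy_counting(stream, eps):
    """Lossy Counting sketch: entries e -> (f, delta); each stored
    frequency underestimates the true one by at most eps * N."""
    width = int(1 / eps)                 # bucket width w = 1/eps
    counts, bucket = {}, 1               # bucket = current label b_current
    for n, e in enumerate(stream, start=1):
        if e in counts:
            f, delta = counts[e]
            counts[e] = (f + 1, delta)   # existing entry: increment f
        else:
            counts[e] = (1, bucket - 1)  # new entry: delta = b_current - 1
        if n % width == 0:               # bucket boundary: prune, then switch
            for key in list(counts):
                f, delta = counts[key]
                if f + delta <= bucket:
                    del counts[key]
            bucket += 1
    return counts

# 100 elements: "a" occurs 50 times, "b" 20 times, "c"/"d"/"e" 10 each.
stream = ["a", "b", "a", "c", "a", "b", "a", "d", "a", "e"] * 10
counts = lossy_counting(stream, eps=0.1)
print(counts)   # only "a" and "b" survive the pruning
```

Reporting the surviving entries with f ≥ (s − ε)N then yields the guarantees analyzed on the error-analysis slide.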

Page 117: BigData Stream Mining

Error Analysis

Output:

I Elements with counter values exceeding s × N − ε × N

How much do we undercount?

I If the current stream size is N and the window size is 1/ε, then the frequency error is at most the number of windows: #windows = ε × N

Approximation guarantees:

I Frequencies are underestimated by at most ε × N

I No false negatives

I False positives have true frequency at least s × N − ε × N

How many counters do we need?

I Worst case: (1/ε) log(εN) counters

Page 118: BigData Stream Mining

Pattern mining: definitions

Patterns: sets with a subpattern relation ⊂

{cheese, milk} ⊂ {milk, peanuts, cheese, butter}

(search → buy) ⊂ (home → search → cart → buy → exit)

Applications: market basket analysis, intrusion detection, churn prediction, feature selection, XML query analysis, query and clickstream analysis, anomaly detection . . .

Page 119: BigData Stream Mining

Pattern mining in streams: definitions

I The support of a pattern T in a stream S at time t is the probability that a pattern T′ drawn from S's distribution at time t satisfies T ⊂ T′

I Typical task: given access to S, at all times t, produce the set of patterns T with support at least ε at time t

I A pattern is closed if no superpattern has the same support.

I No information is lost if we focus only on closed patterns.

Page 120: BigData Stream Mining

Key data structure: Lattice of patterns, with counts

Page 121: BigData Stream Mining

Fundamentals

I A priori property: t ⊆ t′ ⇒ support(t) ≥ support(t′)

I Closed: none of its supersets has the same support.
Can generate all frequent itemsets and their support.

I Maximal: none of its supersets is frequent.
Can generate all frequent itemsets (without support).

I Maximal ⊆ Closed ⊆ Frequent ⊆ D
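The inclusions Maximal ⊆ Closed ⊆ Frequent can be checked on a toy database with a brute-force miner. This exhaustive enumeration is only feasible offline; the stream algorithms that follow exist precisely to avoid it.

```python
from itertools import combinations

def mine(transactions, min_count):
    """Brute-force frequent/closed/maximal itemsets (offline illustration)."""
    items = sorted(set().union(*transactions))
    supp = {}
    for r in range(1, len(items) + 1):
        for combo in combinations(items, r):
            c = sum(1 for t in transactions if set(combo) <= t)
            if c >= min_count:
                supp[frozenset(combo)] = c       # keep only frequent itemsets
    frequent = set(supp)
    closed = {x for x in frequent                # no superset with same support
              if not any(x < y and supp[y] == supp[x] for y in frequent)}
    maximal = {x for x in frequent               # no frequent superset at all
               if not any(x < y for y in frequent)}
    return frequent, closed, maximal

transactions = [{"a", "b", "c"}, {"a", "b"}, {"a", "c"}, {"a"}]
frequent, closed, maximal = mine(transactions, min_count=2)
print(maximal <= closed <= frequent)   # True: Maximal ⊆ Closed ⊆ Frequent
```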

Page 122: BigData Stream Mining

FP-Stream

C. Giannella, J. Han, J. Pei, X. Yan, P. S. Yu: Mining frequent patterns in data streams at multiple time granularities. NGDM (2003)

I Multiple time granularities

I Based on FP-Growth (depth-first search over itemset lattice)

I Pattern-tree with tilted-time window
Tilted-time window: logarithmically aggregated time slots (log number of levels; aggregate when a level is full and push the aggregate one level up)

I Time sensitive queries, emphasis on recent history

I High time and memory complexity
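One simple variant of the tilted-time idea keeps at most two aggregates per level and merges upward on overflow, so n base time slots are summarized in O(log n) entries. This is an illustrative sketch, not FP-Stream's exact scheme, and the function name is ours.

```python
def add_slot(levels, count):
    """Append the newest base time slot to a tilted-time window.
    Level i holds aggregates spanning 2**i base slots; when a level
    exceeds two entries, its two oldest merge and move one level up."""
    levels[0].append(count)
    i = 0
    while len(levels[i]) > 2:
        merged = levels[i].pop(0) + levels[i].pop(0)
        if i + 1 == len(levels):
            levels.append([])            # grow a new, coarser level
        levels[i + 1].append(merged)
        i += 1                           # an overflow may cascade upward
    return levels

levels = [[]]
for c in range(1, 9):                    # counts for 8 consecutive slots
    add_slot(levels, c)
print(levels)                            # 8 slots kept in 4 entries
```

Older history is thus stored at coarser granularity, which is what enables the time-sensitive queries emphasizing recent history.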

Page 123: BigData Stream Mining

Moment

Y. Chi , H. Wang, P. Yu , R. Muntz: Moment: Maintaining Closed

Frequent Itemsets over a Stream Sliding Window. ICDM 2004

I Keeps track of the boundary below frequent itemsets
I Closed Enumeration Tree (CET) (≈ prefix tree)

I Infrequent gateway nodes (infrequent)
I Unpromising gateway nodes (infrequent, dominated)
I Intermediate nodes (frequent, dominated)
I Closed nodes (frequent)

I When adding/removing transactions, most closed/infrequent nodes do not change

Page 124: BigData Stream Mining

Itemset mining

I MOMENT (Chi+ 04) (Sliding window, frequent closed, exact)

I CLOSTREAM (Yen+ 09) (Sliding window, all closed, exact)

I MFI (Li+ 09) (Transaction-sensitive window, frequent closed, exact)

I IncMine (Cheng+ 08) (Sliding window, frequent closed, approximate; faster for moderate approximation ratios)

Page 125: BigData Stream Mining

Sequence, trees, and graph mining

I Frequent subsequence mining: MILE (Chen+ 05), SMDS (Marascu-Masseglia 06), SSBE (Koper-Nguyen 11)

I Bifet+08: Frequent closed unlabeled subtree mining

I Bifet+11: Frequent closed labeled subtree mining; frequent closed labeled subgraph mining

Page 126: BigData Stream Mining

Bibliography on Frequent Items

I G. Cormode, S. Muthukrishnan. What's Hot and What's Not: Tracking Most Frequent Items Dynamically. PODS 2003.

I C. Jin, W. Qian, C. Sha, J. Yu, A. Zhou. Dynamically Maintaining Frequent Items Over A Data Stream. CIKM 2003.

I R. Rantzau. Processing Frequent Itemset Discovery Queries by Division and Set Containment Join Operators. DMKD 2003.

I G. S. Manku, R. Motwani. Approximate Frequency Counts over Data Streams. VLDB 2002.

I G. Cormode, F. Korn, S. Muthukrishnan, D. Srivastava. Finding Hierarchical Heavy Hitters in Data Streams. VLDB 2003.

I J. Han, J. Pei, Y. Yin. Mining Frequent Patterns without Candidate Generation. SIGMOD 2000.

I A. Metwally, D. Agrawal, A. El Abbadi. Efficient Computation of Frequent and Top-k Elements in Data Streams. ICDT 2005.

I Y. Chi, H. Wang, P. Yu, R. Muntz. Moment: Maintaining Closed Frequent Itemsets over a Stream Sliding Window. ICDM 2004.

I C. Giannella, J. Han, J. Pei, X. Yan, P. S. Yu. Mining Frequent Patterns in Data Streams at Multiple Time Granularities. NGDM 2003.

Page 127: BigData Stream Mining

Outline

1 Evaluation

2 Non Distributed Open Source Tools

3 Distributed Open Source Tools

4 Applications

Page 128: BigData Stream Mining

Data stream classification cycle

1 Process an example at a time, and inspect it only once (at most)

2 Use a limited amount of memory

3 Work in a limited amount of time

4 Be ready to predict at any point

Page 129: BigData Stream Mining

Evaluation

1 Error estimation: Hold-out or Prequential

2 Evaluation performance measures: Accuracy or κ-statistic

3 Statistical significance validation: McNemar or Nemenyi test

Evaluation Framework

Page 130: BigData Stream Mining

Error Estimation

Data available for testing

Hold out an independent test set

Apply the current decision model to the test set at regular time intervals

The loss estimated in the holdout is an unbiased estimator

Holdout Evaluation

Page 131: BigData Stream Mining

1. Error Estimation

No data available for testing

The error of a model is computed from the sequence of examples.

For each example in the stream, the current model first makes a prediction, and the example is then used to update the model.

Prequential or Interleaved-Test-Then-Train
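The test-then-train loop is only a few lines of code. The `predict`/`learn` interface and the toy majority-class learner below are hypothetical stand-ins for a real stream classifier.

```python
def prequential_accuracy(stream, model):
    """Interleaved test-then-train: predict first, then learn."""
    correct = 0
    for n, (x, y) in enumerate(stream, start=1):
        if model.predict(x) == y:    # test on the example first...
            correct += 1
        model.learn(x, y)            # ...then use it for training
    return correct / n

class MajorityClass:
    """Toy learner that always predicts the most frequent label seen."""
    def __init__(self):
        self.counts = {}
    def predict(self, x):
        return max(self.counts, key=self.counts.get) if self.counts else None
    def learn(self, x, y):
        self.counts[y] = self.counts.get(y, 0) + 1

stream = [(i, "pos") for i in range(10)]
print(prequential_accuracy(stream, MajorityClass()))   # 0.9
```

Only the very first prediction fails (the model has seen nothing yet); every example still serves both for testing and for training.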

Page 132: BigData Stream Mining

1. Error Estimation

Hold-out or Prequential?

Hold-out is more accurate, but needs data for testing.

Use prequential to approximate Hold-out

Estimate accuracy using sliding windows or fading factors

Hold-out or Prequential or Interleaved-Test-Then-Train

Page 133: BigData Stream Mining

2. Evaluation performance measures

                 Predicted Class+   Predicted Class-   Total
Correct Class+         75                  8             83
Correct Class-          7                 10             17
Total                  82                 18            100

Table: Simple confusion matrix example

Accuracy = 75/100 + 10/100 = (75/83)·(83/100) + (10/17)·(17/100) = 85%

Arithmetic mean = (75/83 + 10/17)/2 = 74.59%

Geometric mean = √((75/83) · (10/17)) = 72.90%
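These quantities follow from the four cells of the confusion matrix above; the per-class recalls (75/83 and 10/17) are the ingredients of both means:

```python
import math

tp, fn = 75, 8    # Correct Class+ row: predicted +, predicted -
fp, tn = 7, 10    # Correct Class- row: predicted +, predicted -
n = tp + fn + fp + tn

accuracy = (tp + tn) / n                        # 0.85
recall_pos = tp / (tp + fn)                     # 75/83
recall_neg = tn / (fp + tn)                     # 10/17
arithmetic = (recall_pos + recall_neg) / 2      # averages the class recalls
geometric = math.sqrt(recall_pos * recall_neg)  # penalizes imbalance harder
print(accuracy, round(arithmetic, 4), round(geometric, 4))
```

Both means sit well below the 85% accuracy because the minority class is predicted poorly, which motivates the unbalanced-class measures on the next slides.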

Page 134: BigData Stream Mining

2. Performance Measures with Unbalanced Classes

                 Predicted Class+   Predicted Class-   Total
Correct Class+         75                  8             83
Correct Class-          7                 10             17
Total                  82                 18            100

Table: Simple confusion matrix example

                 Predicted Class+   Predicted Class-   Total
Correct Class+       68.06               14.94           83
Correct Class-       13.94                3.06           17
Total                82                  18             100

Table: Confusion matrix for chance predictor

Page 135: BigData Stream Mining

2. Performance Measures with Unbalanced Classes

Kappa Statistic

p0: classifier's prequential accuracy

pc: probability that a chance classifier makes a correct prediction.

κ statistic

κ = (p0 − pc) / (1 − pc)

κ = 1 if the classifier is always correct

κ = 0 if the predictions coincide with the correct ones as often as those of the chance classifier

Forgetting mechanism for estimating prequential kappa

Sliding window of size w with the most recent observations
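Using the confusion matrix from the previous slides, κ can be computed directly; pc multiplies the row and column marginals, exactly as in the chance-predictor table:

```python
tp, fn, fp, tn = 75, 8, 7, 10     # confusion matrix from the earlier slide
n = tp + fn + fp + tn

p0 = (tp + tn) / n                                   # observed accuracy: 0.85
pc = (((tp + fn) / n) * ((tp + fp) / n)              # chance agreement:
      + ((fp + tn) / n) * ((fn + tn) / n))           # 0.83*0.82 + 0.17*0.18
kappa = (p0 - pc) / (1 - pc)
print(round(kappa, 4))   # well below 1: accuracy flatters this classifier
```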

Page 136: BigData Stream Mining

Outline

1 Evaluation

2 Non Distributed Open Source Tools

3 Distributed Open Source Tools

4 Applications

Page 137: BigData Stream Mining

VFML

Very Fast Machine Learning

Developed by Pedro Domingos and his team

Contains first implementation of Hoeffding Tree

VFDT: Very Fast Decision Tree
CVFDT: Concept-adapting Very Fast Decision Tree

Does not contain ensembles

Implemented in C

No longer maintained since 2003

Page 138: BigData Stream Mining

VW

Vowpal Wabbit

Developed by John Langford at Yahoo Research and Microsoft Research

Used in Microsoft Azure Machine Learning

Single Classifier until 2013

Distributed using MPI

Based on the Hashing Trick

Page 139: BigData Stream Mining

Sofia-ML

Developed by David Sculley, at Google

Good design of the software

Contains

Fast online learners
Fast k-means clustering

Page 140: BigData Stream Mining

{M}assive {O}nline {A}nalysis MOA (Bifet et al. 2010)

{M}assive {O}nline {A}nalysis is a framework for online learning from data streams.

It is closely related to WEKA

It includes a collection of offline and online algorithms, as well as tools for evaluation:

classification, regression
clustering
frequent pattern mining

Easy to extend

Easy to design and run experiments

Page 141: BigData Stream Mining

WEKA

Waikato Environment for Knowledge Analysis

Collection of state-of-the-art machine learning algorithms and data processing tools implemented in Java

Released under the GPL

Support for the whole process of experimental data mining

Preparation of input data
Statistical evaluation of learning schemes
Visualization of input data and the result of learning

Used for education, research and applications

Complements “Data Mining” by Witten & Frank & Hall

Page 142: BigData Stream Mining

MOA: the bird

The Moa (another native NZ bird) is not only flightless, like the Weka, but also extinct.

Page 143: BigData Stream Mining

Classification Experimental Setting

Page 144: BigData Stream Mining

Classification Experimental Setting

Evaluation procedures for Data Streams

Holdout

Interleaved Test-Then-Train or Prequential

Page 145: BigData Stream Mining

Classification Experimental Setting

Data Sources

Random Tree Generator

Random RBF Generator

LED Generator

Waveform Generator

Hyperplane

SEA Generator

STAGGER Generator

Page 146: BigData Stream Mining

Classification Experimental Setting

Classifiers

Naive Bayes

Decision stumps

Hoeffding Tree

Hoeffding Option Tree

Bagging and Boosting

ADWIN Bagging and Leveraging Bagging

Page 147: BigData Stream Mining

Clustering Experimental Setting

Page 148: BigData Stream Mining

Clustering Experimental Setting

Internal measures              External measures
Gamma                          Rand statistic
C Index                        Jaccard coefficient
Point-Biserial                 Fowlkes-Mallows Index
Log Likelihood                 Hubert Γ statistic
Dunn's Index                   Minkowski score
Tau                            Purity
Tau A                          van Dongen criterion
Tau C                          V-measure
Somer's Gamma                  Completeness
Ratio of Repetition            Homogeneity
Modified Ratio of Repetition   Variation of information
Adjusted Ratio of Clustering   Mutual information
Fagan's Index                  Class-based entropy
Deviation Index                Cluster-based entropy
Z-Score Index                  Precision
D Index                        Recall
Silhouette coefficient         F-measure

Table: Internal and external clustering evaluation measures.

Page 149: BigData Stream Mining

Clustering Experimental Setting

Clusterers

StreamKM++

CluStream

ClusTree

Den-Stream

D-Stream

CobWeb

Page 150: BigData Stream Mining

Web

http://www.moa.cms.waikato.ac.nz

Page 151: BigData Stream Mining

Easy Design of a MOA classifier

void resetLearningImpl ()

void trainOnInstanceImpl (Instance inst)

double[] getVotesForInstance (Instance i)

Page 152: BigData Stream Mining

Easy Design of a MOA clusterer

void resetLearningImpl ()

void trainOnInstanceImpl (Instance inst)

Clustering getClusteringResult()

Page 153: BigData Stream Mining

Extensions of MOA

Multi-label Classification

Active Learning

Regression

Closed Frequent Graph Mining

Twitter Sentiment Analysis

Page 154: BigData Stream Mining

streamDM C++

http://streamdm.noahlab.com.hk/

Page 155: BigData Stream Mining

Outline

1 Evaluation

2 Non Distributed Open Source Tools

3 Distributed Open Source Tools

4 Applications

Page 156: BigData Stream Mining

streams Framework

Developed by Christian Bockermann at University of Dortmund

Uses MOA for Machine Learning methods

Integrates with Storm

RapidMiner Streams Plugin

Page 157: BigData Stream Mining

Apache Mahout

Scalable machine learning library

Current version runs on Hadoop

Some methods use streaming to scale

New version in Scala, to run on Spark

Page 158: BigData Stream Mining

Jubatus

Developed by Nippon Telegraph and Telephone

Open source online machine learning and distributed computing framework

Implemented in C++

Page 159: BigData Stream Mining

Apache SAMOA(De Francisci & Bifet 2015)

SAMOA is a distributed streaming machine learning (ML) framework that contains a programming abstraction for distributed streaming ML algorithms.

Page 160: BigData Stream Mining

Apache SAMOA

[Architecture diagram: the SAMOA algorithm and API layer connects downward through an SPE-adapter to stream processing engines (S4, Storm, other SPEs) and through an ML-adapter to ML frameworks such as MOA, yielding the samoa-s4, samoa-storm, and samoa-other-SPEs modules.]

Page 161: BigData Stream Mining

Apache SAMOA


Page 162: BigData Stream Mining

SAMOA ML Developer API

Processing Item
Processor
Stream

Page 163: BigData Stream Mining

SAMOA ML Developer API

Page 164: BigData Stream Mining

Web

http://samoa-project.net/

Page 165: BigData Stream Mining

Apache Flink

Page 166: BigData Stream Mining

streamDM

http://streamdm.noahlab.com.hk/

Page 167: BigData Stream Mining

streamDM

New project specific designed for Spark Streaming

Spark Streaming: latency in seconds

Easy to integrate in Spark systems

Designed in Scala

Classification, Regression, Clustering, Frequent Pattern Mining

Page 168: BigData Stream Mining

Outline

1 Evaluation

2 Non Distributed Open Source Tools

3 Distributed Open Source Tools

4 Applications

Page 169: BigData Stream Mining

Twitter: A Massive Data Stream

Web 2.0

Micro-blogging service

Built to discover what is happening at any moment in time, anywhere in the world.

3 billion requests a day via its API.

Page 170: BigData Stream Mining

Twitter Streaming API

Twitter APIs

Streaming API

Two discrete REST APIs

Real-time access to Tweets

sampled form

filtered form

HTTP based

GET

POST

DELETE

Page 171: BigData Stream Mining

Sentiment Analysis on Twitter

Sentiment analysis

Classifying messages into two categories depending on whether they convey positive or negative feelings

Emoticons are visual cues associated with emotional states, which can be used to define class labels for sentiment classification

Positive Emoticons Negative Emoticons

Positive Emoticons   Negative Emoticons
:)                   :(
:-)                  :-(
: )                  : (
:D
=)

Table: List of positive and negative emoticons.
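Emoticon-based labeling (a form of distant supervision) can be sketched in a few lines; the emoticon lists follow the table above, and the function name is ours.

```python
POSITIVE = [":)", ":-)", ": )", ":D", "=)"]
NEGATIVE = [":(", ":-(", ": ("]

def label_tweet(text):
    """Derive a sentiment training label from emoticons; return None
    when the message has no emoticon or mixes both polarities."""
    pos = any(e in text for e in POSITIVE)
    neg = any(e in text for e in NEGATIVE)
    if pos and not neg:
        return "positive"
    if neg and not pos:
        return "negative"
    return None

print(label_tweet("great match today :)"))   # positive
print(label_tweet("stuck in traffic :("))    # negative
print(label_tweet("no emoticons here"))      # None
```

The resulting labels are noisy but free, which makes them a practical training signal at Twitter-stream scale.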

Page 172: BigData Stream Mining

Outline

Final Comments

Page 173: BigData Stream Mining

Open Challenges

Open Challenges

I Structured input and output

I Multi-target, multi-task and transfer learning

I Millions of classes

I Visualization

I Distributed Streams

I Representation learning

I Ease of use

Page 174: BigData Stream Mining

Lessons Learned

Learning from data streams:

I Learning is not one-shot: it is an evolving process;

I We need to monitor the learning process;

I Opens the possibility of reasoning about the learning process

Page 175: BigData Stream Mining

Reasoning about the Learning Process

Intelligent systems must:

I be able to adapt continuously to changing environmental conditions and evolving user habits and needs.

I be capable of predictive self-diagnosis.

The development of such self-configuring, self-optimizing, and self-repairing systems is a major scientific and engineering challenge.