Scalable Learning Technologies for Big Data Mining


Page 1: Scalable Learning Technologies for Big Data Mining

DASFAA 2015 Hanoi Tutorial

Scalable Learning Technologies for Big Data Mining

Gerard de Melo, Tsinghua University
http://gerard.demelo.org

Aparna Varde, Montclair State University
http://www.montclair.edu/~vardea/

Page 2: Scalable Learning Technologies for Big Data Mining

Big Data

Image: Caixin

Image: Corbis

Alibaba: 31 million orders per day! (2014)

Page 3: Scalable Learning Technologies for Big Data Mining

Big Data on the Web

Source: Coup Media 2013

Page 4: Scalable Learning Technologies for Big Data Mining

Big Data on the Web

Source: Coup Media 2013

Page 5: Scalable Learning Technologies for Big Data Mining

From Big Data to Knowledge

Image: Brett Ryder

Page 6: Scalable Learning Technologies for Big Data Mining

Learning from Data

            Previous Knowledge   Preparation Time   Passed Exam?
Student 1   80%                  48h                Yes
Student 2   50%                  75h                Yes
Student 3   95%                  24h                Yes
Student 4   60%                  24h                No
Student 5   80%                  10h                No

Page 7: Scalable Learning Technologies for Big Data Mining

Learning from Data

            Previous Knowledge   Preparation Time   Passed Exam?
Student 1   80%                  48h                Yes
Student 2   50%                  75h                Yes
Student 3   95%                  24h                Yes
Student 4   60%                  24h                No
Student 5   80%                  10h                No
Student A   30%                  20h                ?

Page 8: Scalable Learning Technologies for Big Data Mining

Learning from Data

            Previous Knowledge   Preparation Time   Passed Exam?
Student 1   80%                  48h                Yes
Student 2   50%                  75h                Yes
Student 3   95%                  24h                Yes
Student 4   60%                  24h                No
Student 5   80%                  10h                No
Student A   30%                  20h                ?
Student B   80%                  45h                ?

Page 9: Scalable Learning Technologies for Big Data Mining

Machine Learning

Diagram: Data with or without labels (e.g. D1 = [0.324, 0.739, 0.000, 0.112]) → Unsupervised or Supervised Learning → Classifier Model; labels for test data → Prediction (e.g. “Probably Spam!”)

Page 10: Scalable Learning Technologies for Big Data Mining

Data Mining

Diagram: World → Data Acquisition → Raw Data (e.g. 010010100101101110011) → Preprocessing + Feature Engineering → Data with or without labels (e.g. D1 = [0.324, 0.739, 0.000, 0.112]) → Unsupervised or Supervised Analysis → Classifier Model / Analysis Results → Prediction, Visualization, Use of new Knowledge

Page 11: Scalable Learning Technologies for Big Data Mining

Problem with Classic Methods: Scalability

http://www.whistlerisawesome.com/wp-content/uploads/2011/12/drinking-from-firehose.jpg

Page 12: Scalable Learning Technologies for Big Data Mining

Scaling Up

Page 13: Scalable Learning Technologies for Big Data Mining

Scaling Up: More Features

            Previous Knowledge   Preparation Time   Passed Exam?
Student 1   80%                  48h                Yes
Student 2   50%                  75h                Yes
Student 3   95%                  24h                Yes
Student 4   60%                  24h                No
Student 5   80%                  10h                No

Page 14: Scalable Learning Technologies for Big Data Mining

Scaling Up: More Features

            Previous Knowledge   Preparation Time   ... ... ... ... ...   Passed Exam?
Student 1   80%                  48h                ... ... ... ... ...   Yes
Student 2   50%                  75h                ... ... ... ... ...   Yes
Student 3   95%                  24h                ... ... ... ... ...   Yes
Student 4   60%                  24h                ... ... ... ... ...   No
Student 5   80%                  10h                ... ... ... ... ...   No

For example:
- words and phrases mentioned in the exam response
- Facebook likes, websites visited
- user interaction details in online learning

Could be many millions of features!

Page 15: Scalable Learning Technologies for Big Data Mining

Scaling Up: More Features

            Previous Knowledge   Preparation Time   ... ... ... ... ...   Passed Exam?
Student 1   80%                  48h                ... ... ... ... ...   Yes
Student 2   50%                  75h                ... ... ... ... ...   Yes
Student 3   95%                  24h                ... ... ... ... ...   Yes
Student 4   60%                  24h                ... ... ... ... ...   No
Student 5   80%                  10h                ... ... ... ... ...   No

Classic solution: Feature Selection

Page 16: Scalable Learning Technologies for Big Data Mining

Scaling Up: More Features

            Previous Knowledge   Preparation Time   ... Σ ... ... ...   Passed Exam?
Student 1   80%                  48h                ... ... ... ... ...   Yes
Student 2   50%                  75h                ... ... ... ... ...   Yes
Student 3   95%                  24h                ... ... ... ... ...   Yes
Student 4   60%                  24h                ... ... ... ... ...   No
Student 5   80%                  10h                ... ... ... ... ...   No

Scalable solution: buckets with sums (Σ) of the original features,
e.g. “Clicked on http://physics...” and “Clicked on http://icsi.berkeley...”

Page 17: Scalable Learning Technologies for Big Data Mining

Scaling Up: More Features

            F0    F1    F2    F3    ...   ...   Fn    Passed Exam?
Student 1   ...   ...   ...   ...   ...   ...   ...   Yes
Student 2   ...   ...   ...   ...   ...   ...   ...   Yes
Student 3   ...   ...   ...   ...   ...   ...   ...   Yes
Student 4   ...   ...   ...   ...   ...   ...   ...   No
Student 5   ...   ...   ...   ...   ...   ...   ...   No

Feature Hashing: Use a fixed feature dimensionality n. Hash the original feature ID (e.g. “Clicked on http://...”) to a bucket number in 0 to n-1. Normalize the features and use bucket-wise sums.

The small loss of precision is usually trumped by the big gains from being able to use more features.
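A minimal sketch of this hashing trick in Python (the feature names, URLs, and the choice of n are illustrative only; scikit-learn's FeatureHasher offers a ready-made implementation):

import hashlib
from collections import defaultdict

def hash_features(raw_features, n=2**20):
    """Map arbitrarily many raw feature IDs into a fixed n-dimensional
    sparse vector of bucket-wise sums."""
    vec = defaultdict(float)
    for name, value in raw_features.items():
        # Hash the original feature ID to a bucket number in 0..n-1
        bucket = int(hashlib.md5(name.encode("utf-8")).hexdigest(), 16) % n
        vec[bucket] += value  # colliding features simply add up
    return dict(vec)

# Hypothetical raw features for one student
x = hash_features({
    "Clicked on http://physics.example.org": 1.0,
    "Clicked on http://icsi.example.edu": 1.0,
    "PreparationTime": 48.0,
})
print(len(x), "non-zero buckets out of", 2**20)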

Page 18: Scalable Learning Technologies for Big Data Mining

Scaling Up: More Training Examples

Banko & Brill (2001): Word confusion experiments (e.g. “principal” vs. “principle”)

Page 19: Scalable Learning Technologies for Big Data Mining

Scaling Up: More Training Examples

Banko & Brill (2001): Word confusion experiments (e.g. “principal” vs. “principle”)

More data often trumps better algorithms.

Alon Halevy, Peter Norvig, Fernando Pereira (2009). The Unreasonable Effectiveness of Data.

Page 20: Scalable Learning Technologies for Big Data Mining

Scaling Up: More Training Examples

Léon Bottou. Learning with Large Datasets Tutorial. Text classification experiments.

Page 21: Scalable Learning Technologies for Big Data Mining

Background: Stochastic Gradient Descent

Images: http://en.wikipedia.org/wiki/File:Hill_climb.png
http://en.wikipedia.org/wiki/Hill_climbing#mediaviewer/File:Local_maximum.png

Move towards the optimum by approximating the gradient based on 1 (or a small batch of) random training examples.

The stochastic nature may help us escape local optima.

Improved variants: AdaGrad (Duchi et al. 2011), AdaDelta (Zeiler 2012)
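A minimal mini-batch SGD sketch in Python for logistic regression (the toy data, learning rate, and epoch count are illustrative assumptions; tools such as Vowpal Wabbit provide production-ready versions):

import numpy as np

def sgd_logistic(X, y, lr=0.1, epochs=100, batch_size=1, seed=0):
    """Mini-batch SGD for logistic regression; y has labels in {0, 1}."""
    rng = np.random.default_rng(seed)
    w, b = np.zeros(X.shape[1]), 0.0
    n = X.shape[0]
    for _ in range(epochs):
        for idx in np.array_split(rng.permutation(n), max(1, n // batch_size)):
            p = 1.0 / (1.0 + np.exp(-(X[idx] @ w + b)))  # predicted probabilities
            w -= lr * X[idx].T @ (p - y[idx]) / len(idx)  # gradient estimated from
            b -= lr * np.mean(p - y[idx])                 # this mini-batch only
    return w, b

# Toy data loosely modeled on the exam table above (features pre-normalized)
X = np.array([[0.80, 0.64], [0.50, 1.00], [0.95, 0.32],
              [0.60, 0.32], [0.80, 0.13]])
y = np.array([1, 1, 1, 0, 0])
print(sgd_logistic(X, y, lr=0.5, epochs=500))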

Page 22: Scalable Learning Technologies for Big Data Mining

Scaling Up: More Training Examples

Recommended Tool

Vowpal Wabbit by John Langford et al.
http://hunch.net/~vw/

Page 23: Scalable Learning Technologies for Big Data Mining

Scaling Up: More Training Examples

Parallelization?

Use a lock-free approach to updating the weight vector components (HogWild! by Niu, Recht, et al.)

Page 24: Scalable Learning Technologies for Big Data Mining

Problem: Where to Get Training Examples?

Labeled Data is expensive!

● Penn Chinese Treebank: 2 years for 4000 sentences

● Adaptation is difficult

● Wall Street Journal ≠ Novels ≠ Twitter

● For Speech Recognition, ideally need training data for each domain, voice/accent, microphone, microphone setup, social setting, etc.

http://en.wikipedia.org/wiki/File:Chronic_fatigue_syndrome.JPG

Page 25: Scalable Learning Technologies for Big Data Mining

Semi-Supervised Learning

Page 26: Scalable Learning Technologies for Big Data Mining

● Goal: When learning a model, use unlabeled data in addition to labeled data

● Example: Cluster-and-label

● Run a clustering algorithm on labeled and unlabeled data

● Assign the cluster's majority label to the unlabeled examples of every cluster

Image: Wikipedia

Semi-Supervised Learning
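A minimal cluster-and-label sketch in Python for the procedure above (the toy data, k, and the use of scikit-learn's KMeans are illustrative assumptions):

import numpy as np
from sklearn.cluster import KMeans

def cluster_and_label(X_lab, y_lab, X_unl, k=2):
    """Cluster labeled + unlabeled points together, then assign each
    unlabeled point the majority label of its cluster (if it has one)."""
    clusters = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(
        np.vstack([X_lab, X_unl]))
    n = len(X_lab)
    y_unl = np.full(len(X_unl), -1)  # -1 = cluster contained no labeled points
    for c in range(k):
        labels_in_c = y_lab[clusters[:n] == c]
        if len(labels_in_c):
            y_unl[clusters[n:] == c] = np.bincount(labels_in_c).argmax()
    return y_unl

X_lab = np.array([[0.0, 0.0], [0.1, 0.2], [5.0, 5.0], [5.2, 4.9]])
y_lab = np.array([0, 0, 1, 1])
X_unl = np.array([[0.2, 0.1], [4.8, 5.1], [5.1, 5.3]])
print(cluster_and_label(X_lab, y_lab, X_unl, k=2))  # e.g. [0 1 1]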

Page 27: Scalable Learning Technologies for Big Data Mining

● Bootstrapping or Self-Training (e.g. Yarowsky 1995)

– Use classifier to label the unlabelled examples

– Add the labels with the highest confidence to the training data and re-train

– Repeat

Semi-Supervised Learning
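A minimal self-training (bootstrapping) sketch in Python following the loop above (the classifier choice, confidence threshold, and batch size k are illustrative assumptions):

import numpy as np
from sklearn.linear_model import LogisticRegression

def self_train(X_lab, y_lab, X_unl, rounds=5, k=10, threshold=0.9):
    """Repeatedly label the unlabeled pool with the current classifier and
    absorb the most confident predictions into the training data."""
    clf = LogisticRegression()
    for _ in range(rounds):
        clf.fit(X_lab, y_lab)
        if len(X_unl) == 0:
            break
        proba = clf.predict_proba(X_unl)
        conf = proba.max(axis=1)
        best = np.argsort(-conf)[:k]          # k most confident examples ...
        best = best[conf[best] >= threshold]  # ... above the threshold
        if len(best) == 0:
            break
        X_lab = np.vstack([X_lab, X_unl[best]])
        y_lab = np.concatenate([y_lab, clf.classes_[proba[best].argmax(axis=1)]])
        X_unl = np.delete(X_unl, best, axis=0)
    return clf.fit(X_lab, y_lab)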

Page 28: Scalable Learning Technologies for Big Data Mining

● Co-Training (Blum & Mitchell 1998)

● Given: multiple (ideally independent) views of the same data (e.g. the left context and the right context of a word)

● Learn separate models for each view

● Allow different views to teach each other: Model 1 can generate labels that will be helpful to improve model 2 and vice versa.

Semi-Supervised Learning

Page 29: Scalable Learning Technologies for Big Data Mining

Semi-Supervised Learning: Transductive Setting

Image: Partha Pratim Talukdar

Page 30: Scalable Learning Technologies for Big Data Mining

Semi-Supervised Learning: Transductive Setting

Image: Partha Pratim Talukdar

Algorithms: Label Propagation (Zhu et al. 2003), Adsorption (Baluja et al. 2008),Modified Adsorption (Talukdar et al. 2009)
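A minimal transductive example using scikit-learn's LabelPropagation (a Zhu et al.-style implementation; Adsorption and Modified Adsorption are not part of scikit-learn). The toy data and kernel settings are illustrative assumptions:

import numpy as np
from sklearn.semi_supervised import LabelPropagation

# Six points; only two have known labels, -1 marks the unlabeled ones.
X = np.array([[0.0], [0.1], [0.2], [2.0], [2.1], [2.2]])
y = np.array([0, -1, -1, 1, -1, -1])

model = LabelPropagation(kernel="rbf", gamma=20)
model.fit(X, y)
print(model.transduction_)  # inferred labels for all points, e.g. [0 0 0 1 1 1]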

Page 31: Scalable Learning Technologies for Big Data Mining

● Sentiment Analysis:

● Look for Twitter tweets with emoticons like “:)”, “:(“

● Remove emoticons. Then use as training data!

Crimson Hexagon

Distant Supervision
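A minimal distant-supervision sketch in Python (the tweets are made up; in practice they would come from the Twitter API, and the resulting noisy labels would feed a sentiment classifier):

import re

tweets = [
    "Just got the new phone :) love it",
    "My flight was delayed again :(",
    "Great coffee this morning :)",
]
POS, NEG = [":)", ":-)", ":D"], [":(", ":-("]

def emoticon_label(text):
    """Use emoticons as noisy sentiment labels, then strip them from the text."""
    label = "positive" if any(e in text for e in POS) else \
            "negative" if any(e in text for e in NEG) else None
    return re.sub(r"[:;]-?[)(D]", "", text).strip(), label

print([emoticon_label(t) for t in tweets])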

Page 32: Scalable Learning Technologies for Big Data Mining

Representation Learning to Better Exploit Big Data

Page 33: Scalable Learning Technologies for Big Data Mining

Representations

Image: David Warde-Farley via Bengio et al. Deep Learning Book

Page 34: Scalable Learning Technologies for Big Data Mining

Representations

Inputs as bits: 0011001…

Images: Marc'Aurelio Ranzato

Note the sharing between classes

Page 35: Scalable Learning Technologies for Big Data Mining

Representations

Inputs as bits: 0011001…

Images: Marc'Aurelio Ranzato

Massive improvements in image object recognition (human-level?) and speech recognition.
Good improvements in NLP and IR-related tasks.

Page 36: Scalable Learning Technologies for Big Data Mining

Example

Google's image (Source: Jeff Dean, Google)

Page 37: Scalable Learning Technologies for Big Data Mining

Inspiration: The Brain

Source: Alex Smola

Input: delivered via dendrites from other neurons

Processing: Synapses may alter the input signals. The cell then combines all input signals.

Output: If there is enough activation from the inputs, an output signal is sent through a long cable (“axon”).

Page 38: Scalable Learning Technologies for Big Data Mining

Perceptron

Input: Features
Every feature f_i gets a weight w_i.

feature     weight
dog          7.2
food         3.4
bank        -7.3
delicious    1.5
train       -4.2

Diagram: Features f1, f2, f3, f4 feed into the neuron with weights w1, w2, w3, w4.

Page 39: Scalable Learning Technologies for Big Data Mining

Perceptron

Diagram: Features f1–f4 feed into the neuron with weights w1–w4, producing the neuron output.

Activation of the Neuron: Multiply the feature values of an object x with the feature weights:

$a(x) = \sum_i w_i f_i(x) = \mathbf{w}^T \mathbf{f}(x)$

Page 40: Scalable Learning Technologies for Big Data Mining

Perceptron

Diagram: Features f1–f4 feed into the neuron with weights w1–w4, producing the neuron output.

Output of the Neuron: Check whether the activation exceeds a threshold t = -b:

$\mathrm{output}(x) = g(\mathbf{w}^T \mathbf{f}(x) + b)$

e.g. g could return 1 (positive) if the activation is positive, -1 otherwise
e.g. 1 for “spam”, -1 for “not spam”
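A minimal perceptron sketch in Python matching the formulas above (the toy features and labels are illustrative assumptions):

import numpy as np

class Perceptron:
    """output(x) = g(w·f(x) + b) with g = sign, trained with the classic
    mistake-driven perceptron update."""
    def __init__(self, n_features, lr=1.0):
        self.w, self.b, self.lr = np.zeros(n_features), 0.0, lr

    def predict(self, f):
        # +1 if the activation exceeds the threshold t = -b, else -1
        return 1 if self.w @ f + self.b > 0 else -1

    def fit(self, F, y, epochs=10):
        for _ in range(epochs):
            for f, label in zip(F, y):
                if self.predict(f) != label:  # update only on mistakes
                    self.w += self.lr * label * f
                    self.b += self.lr * label
        return self

# Toy “spam” data: labels are +1 for spam, -1 for not-spam
F = np.array([[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]])
y = np.array([1, 1, -1, -1])
model = Perceptron(n_features=2).fit(F, y)
print([model.predict(f) for f in F])  # [1, 1, -1, -1]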

Page 41: Scalable Learning Technologies for Big Data Mining

Decision Surfaces

Decision Trees · Linear Classifiers (Perceptron, SVM) · Kernel-based Classifiers (Kernel Perceptron, Kernel SVM) · Multi-Layer Perceptron

Images: Vibhav Gogate

Linear classifiers: only a straight decision surface; the perceptron is not max-margin.
Kernel-based classifiers and multi-layer perceptrons: any decision surface.

Page 42: Scalable Learning Technologies for Big Data Mining

Deep Learning: Multi-Layer Perceptron

Diagram: Input Layer (Features f1–f4) → Hidden Layer (Neuron 1, Neuron 2) → Output Layer (Neuron → Output)

Page 43: Scalable Learning Technologies for Big Data Mining

Deep Learning: Multi-Layer Perceptron

Diagram: Input Layer (Features f1–f4) → Hidden Layer (Neurons 1, 2, …) → Output Layer (Neuron → Output)

Page 44: Scalable Learning Technologies for Big Data Mining

Deep Learning: Multi-Layer Perceptron

Diagram: Input Layer (Features f1–f4) → Hidden Layer (Neurons 1, 2, …) → Output Layer (two neurons → Output 1, Output 2)

Page 45: Scalable Learning Technologies for Big Data Mining

Deep Learning: Multi-Layer Perceptron

Input Layer (Feature Extraction): $f(x)$

Single-Layer:
$\mathrm{output}(x) = g(W f(x) + b)$

Three-Layer Network:
$\mathrm{output}(x) = g_2(W_2\, g_1(W_1 f(x) + b_1) + b_2)$

Four-Layer Network:
$\mathrm{output}(x) = g_3(W_3\, g_2(W_2\, g_1(W_1 f(x) + b_1) + b_2) + b_3)$
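A minimal NumPy sketch of this layered forward computation (the layer sizes, random weights, and activation functions are illustrative assumptions):

import numpy as np

def relu(a):
    return np.maximum(0.0, a)

def forward(x, layers, g_hidden=relu, g_out=lambda a: a):
    """Evaluate output(x) = g_L(W_L ... g_1(W_1 x + b_1) ... + b_L)
    for a list of (W, b) pairs."""
    h = x
    for i, (W, b) in enumerate(layers):
        a = W @ h + b
        h = g_out(a) if i == len(layers) - 1 else g_hidden(a)
    return h

rng = np.random.default_rng(0)
layers = [(rng.normal(size=(3, 4)), np.zeros(3)),  # W1, b1: 4 features -> 3 hidden
          (rng.normal(size=(2, 3)), np.zeros(2))]  # W2, b2: 3 hidden -> 2 outputs
print(forward(np.array([0.8, 0.5, 0.0, 1.0]), layers))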

Page 46: Scalable Learning Technologies for Big Data Mining

Deep Learning: Multi-Layer Perceptron

Page 47: Scalable Learning Technologies for Big Data Mining

Deep Learning: Computing the Output

Simply evaluate the output function (for each node, compute an output based on the node's inputs).

Diagram: Inputs x1, x2 → hidden nodes z1, z2, z3 → Outputs y1, y2

Page 48: Scalable Learning Technologies for Big Data Mining

Deep Learning: Training

Compute the error on the output; if it is non-zero, do a stochastic gradient step on the error function to fix it.

Backpropagation: the error is propagated back from the output nodes towards the input layer.

Diagram: Inputs x1, x2 → hidden nodes z1, z2, z3 → Outputs y1, y2

Page 49: Scalable Learning Technologies for Big Data Mining

Deep Learning: Training

Exploit the chain rule to compute the gradient.

Backpropagation: the error is propagated back from the output nodes towards the input layer.

Compute the error on the output; if it is non-zero, do a stochastic gradient step on the error function to fix it.

For $y = f(x)$ and $z = g(y)$:
$\frac{\partial z}{\partial x} = \frac{\partial z}{\partial y} \frac{\partial y}{\partial x}$

We are interested in the gradient, i.e. the partial derivatives of the output function z = g(y) with respect to all inputs and weights, including those at a deeper part of the network.
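A minimal backpropagation sketch in NumPy for a two-layer network with squared error (the layer sizes, sigmoid activation, and toy data are illustrative assumptions):

import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def train_step(x, t, W1, b1, W2, b2, lr=0.1):
    """One stochastic gradient step, applying the chain rule layer by layer."""
    z = sigmoid(W1 @ x + b1)                      # forward: hidden activations
    y = sigmoid(W2 @ z + b2)                      # forward: outputs
    delta_out = (y - t) * y * (1 - y)             # dE/da at the output layer
    delta_hid = (W2.T @ delta_out) * z * (1 - z)  # error propagated back
    W2 -= lr * np.outer(delta_out, z); b2 -= lr * delta_out
    W1 -= lr * np.outer(delta_hid, x); b1 -= lr * delta_hid
    return 0.5 * np.sum((y - t) ** 2)             # E = 0.5 * ||y - t||^2

rng = np.random.default_rng(0)
W1, b1 = rng.normal(size=(3, 2)), np.zeros(3)
W2, b2 = rng.normal(size=(2, 3)), np.zeros(2)
x, t = np.array([1.0, 0.5]), np.array([1.0, 0.0])
for _ in range(1000):
    err = train_step(x, t, W1, b1, W2, b2)
print("final error:", err)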

Page 50: Scalable Learning Technologies for Big Data Mining

Dropout Technique

Basic Idea: While training, randomly drop inputs (set the feature to zero).

Effect: Training on variations of the original training data (an artificial increase of the training data size). The trained network relies less on the existence of specific features.

Reference: Hinton et al. (2012)

Also: Maxout Networks by Goodfellow et al. (2013)
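A minimal sketch of the (inverted) dropout variant in NumPy, which rescales the surviving units during training so nothing needs to change at test time (the drop probability is an illustrative assumption):

import numpy as np

def dropout(h, p_drop=0.5, training=True, rng=None):
    """Randomly zero units of an activation vector while training."""
    if not training or p_drop == 0.0:
        return h
    rng = rng or np.random.default_rng()
    mask = (rng.random(h.shape) >= p_drop).astype(h.dtype)
    return h * mask / (1.0 - p_drop)  # rescale the survivors

h = np.array([0.2, 1.5, 0.7, 0.0, 2.3])
print(dropout(h, p_drop=0.5, rng=np.random.default_rng(0)))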

Page 51: Scalable Learning Technologies for Big Data Mining

Deep Learning: Convolutional Neural Networks

Image: http://torch.cogbits.com/doc/tutorials_supervised/

Reference: Yann LeCun's work

Page 52: Scalable Learning Technologies for Big Data Mining

Deep Learning: Recurrent Neural Networks

Source: Bayesian Behavior Lab, Northwestern University

Page 53: Scalable Learning Technologies for Big Data Mining

Deep Learning: Recurrent Neural Networks

Source: Bayesian Behavior Lab, Northwestern University

Then we can do backpropagation. Challenge: vanishing/exploding gradients.

Page 54: Scalable Learning Technologies for Big Data Mining

Deep Learning: Long Short-Term Memory Networks

Source: Bayesian Behavior Lab, Northwestern University

Page 55: Scalable Learning Technologies for Big Data Mining

Deep Learning: Long Short-Term Memory Networks

Deep LSTMs for Sequence-to-Sequence Learning

Sutskever et al. 2014 (Google)

Page 56: Scalable Learning Technologies for Big Data Mining

Deep Learning: Long Short-Term Memory Networks

French Original: La dispute fait rage entre les grands constructeurs aéronautiques à propos de la largeur des sièges de la classe touriste sur les vols long-courriers, ouvrant la voie à une confrontation amère lors du salon aéronautique de Dubaï qui a lieu ce mois-ci.

LSTM's English Translation: The dispute is raging between large aircraft manufacturers on the size of the tourist seats on the long-haul flights, leading to a bitter confrontation at the Dubai Airshow in the month of October.

Ground Truth English Translation: A row has flared up between leading plane makers over the width of tourist-class seats on long-distance flights, setting the tone for a bitter confrontation at this month's Dubai Airshow.

Sutskever et al. 2014 (Google)

Page 57: Scalable Learning Technologies for Big Data Mining

Deep Learning: Neural Turing Machines

Source: Bayesian Behavior Lab, Northwestern University

Page 58: Scalable Learning Technologies for Big Data Mining

Deep Learning: Neural Turing Machines

Source: Bayesian Behavior Lab, Northwestern University

Page 59: Scalable Learning Technologies for Big Data Mining

Deep Learning: Neural Turing Machines

Source: Bayesian Behavior Lab, Northwestern University

Page 60: Scalable Learning Technologies for Big Data Mining

Deep Learning: Neural Turing Machines

Source: Bayesian Behavior Lab, Northwestern University

Page 61: Scalable Learning Technologies for Big Data Mining

Deep Learning: Neural Turing Machines

Source: Bayesian Behavior Lab, Northwestern University

Page 62: Scalable Learning Technologies for Big Data Mining

Deep Learning: Neural Turing Machines

Learning to sort!

The vectors for the numbers are random.

Page 63: Scalable Learning Technologies for Big Data Mining

Big Data in Feature Engineering and Representation Learning

Page 64: Scalable Learning Technologies for Big Data Mining

● Language Models for Autocompletion

Web Semantics: Statistics from Big Data as Features

Page 65: Scalable Learning Technologies for Big Data Mining

Source: Wang et al. An Overview of Microsoft Web N-gram Corpus and Applications

Word Segmentation

Page 66: Scalable Learning Technologies for Big Data Mining

● NP Coordination

Source: Bansal & Klein (2011)

Parsing: Ambiguity

Page 67: Scalable Learning Technologies for Big Data Mining

Source: Bansal & Klein (2011)

Parsing: Web Semantics

Page 68: Scalable Learning Technologies for Big Data Mining

● Lapata & Keller (2004): The Web as a Baseline (also: Bergsma et al. 2010)

● “big fat Greek wedding”but not “fat Greek big wedding”

Source: Shane Bergsma

Adjective Ordering

Page 69: Scalable Learning Technologies for Big Data Mining

Source: Bansal & Klein 2012

Coreference Resolution

Page 70: Scalable Learning Technologies for Big Data Mining

Source: Bansal & Klein 2012

Coreference Resolution

Page 71: Scalable Learning Technologies for Big Data Mining

● Data Sparsity: e.g. most words are rare (in the “long tail”) → missing in training data

● Solution (Blitzer et al. 2006, Koo & Collins 2008, Huang & Yates 2009, etc.)

● Cluster together similar features

● Use clustered features instead of / in addition to original features

Brown Corpus

Source: Baroni & Evert

Distributional Semantics

Page 72: Scalable Learning Technologies for Big Data Mining

Even worse: Arnold Schwarzenegger

Spelling Correction

Page 73: Scalable Learning Technologies for Big Data Mining

Vector Representations

Diagram: a scatter plot of word vectors, with “petronia” near “sparrow” and “bird”, and “parched” near “arid” and “dry”.

Put words into a vector space (e.g. with d=300 dimensions).

Page 74: Scalable Learning Technologies for Big Data Mining

Word Vector Representations

Tomas Mikolov et al.Proc. ICLR 2013.

Available from https://code.google.com/p/word2vec/
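A minimal sketch of training such vectors with the gensim re-implementation of word2vec (the toy corpus is illustrative; real vectors are trained on billions of tokens, and the vector_size keyword is called size in older gensim releases):

from gensim.models import Word2Vec

sentences = [
    ["the", "sparrow", "is", "a", "small", "bird"],
    ["the", "desert", "is", "arid", "and", "dry"],
    ["a", "petronia", "is", "a", "kind", "of", "sparrow"],
]

# vector_size is the dimensionality d (e.g. 300 in practice); sg=1 selects skip-gram
model = Word2Vec(sentences, vector_size=100, window=5, min_count=1, sg=1, epochs=50)

print(model.wv.most_similar("sparrow", topn=3))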

Page 75: Scalable Learning Technologies for Big Data Mining

Wikipedia

Page 76: Scalable Learning Technologies for Big Data Mining

● Exploit edit history, especially on Simple English Wikipedia

● “collaborate” → “work together”“stands for” → “is the same as”

Text Simplification

Page 77: Scalable Learning Technologies for Big Data Mining

Answering Questions

IBM's Jeopardy!-winning Watson system


Page 78: Scalable Learning Technologies for Big Data Mining

Knowledge Integration

Page 79: Scalable Learning Technologies for Big Data Mining

UWN/MENTA

A multilingual extension of WordNet with word senses and taxonomical information in over 200 languages

www.lexvo.org/uwn/

Page 80: Scalable Learning Technologies for Big Data Mining

WebChild: Common-Sense Knowledge

AAAI 2014, WSDM 2014, AAAI 2011

Page 81: Scalable Learning Technologies for Big Data Mining

Challenge: From Really Big Data to Real Insights

Image: Brett Ryder

Page 82: Scalable Learning Technologies for Big Data Mining

Big Data Mining in Practice

Page 83: Scalable Learning Technologies for Big Data Mining

Gerard de Melo (Tsinghua University, Beijing, China), Aparna Varde (Montclair State University, NJ, USA)

DASFAA, Hanoi, Vietnam, April 2015

1

Page 84: Scalable Learning Technologies for Big Data Mining

Dr. Aparna Varde

2

Page 85: Scalable Learning Technologies for Big Data Mining

Internet-based computing: shared resources, software & data provided on demand, like the electricity grid

Follows a pay-as-you-go model

3

Page 86: Scalable Learning Technologies for Big Data Mining

Several technologies, e.g., MapReduce & Hadoop

MapReduce: Data-parallel programming model for clusters of commodity machines
• Pioneered by Google
• Processes 20 PB of data per day

Hadoop: Open-source framework for distributed storage and processing of very large data sets
• HDFS (Hadoop Distributed File System) for storage
• MapReduce for processing
• Developed by Apache

4

Page 87: Scalable Learning Technologies for Big Data Mining

• Scalability
  – To large data volumes
  – Scan 100 TB on 1 node @ 50 MB/s = 24 days
  – Scan on a 1000-node cluster = 35 minutes

• Cost-efficiency
  – Commodity nodes (cheap, but unreliable)
  – Commodity network (low bandwidth)
  – Automatic fault-tolerance (fewer admins)
  – Easy to use (fewer programmers)

5

Page 88: Scalable Learning Technologies for Big Data Mining

Data type: key-value records

Map function:
(K_in, V_in) → list(K_inter, V_inter)

Reduce function:
(K_inter, list(V_inter)) → list(K_out, V_out)

6

Page 89: Scalable Learning Technologies for Big Data Mining

MapReduce Example (Word Count)

Input (three splits):
  “the quick brown fox”
  “the fox ate the mouse”
  “how now brown cow”

Map output:
  Map 1: (the, 1), (quick, 1), (brown, 1), (fox, 1)
  Map 2: (the, 1), (fox, 1), (ate, 1), (the, 1), (mouse, 1)
  Map 3: (how, 1), (now, 1), (brown, 1), (cow, 1)

Shuffle & Sort: group the intermediate values by key.

Reduce output:
  Reducer 1: (brown, 2), (fox, 2), (how, 1), (now, 1), (the, 3)
  Reducer 2: (ate, 1), (cow, 1), (mouse, 1), (quick, 1)

7
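A minimal word-count sketch as two Hadoop Streaming scripts in Python (the file names and the exact streaming-jar invocation are illustrative assumptions):

# mapper.py — emit (word, 1) for every word read from stdin
import sys
for line in sys.stdin:
    for word in line.strip().split():
        print("%s\t%d" % (word, 1))

# reducer.py — sum the counts per word (input arrives sorted by key)
import sys
current_word, current_count = None, 0
for line in sys.stdin:
    word, count = line.rstrip("\n").split("\t", 1)
    if word == current_word:
        current_count += int(count)
    else:
        if current_word is not None:
            print("%s\t%d" % (current_word, current_count))
        current_word, current_count = word, int(count)
if current_word is not None:
    print("%s\t%d" % (current_word, current_count))

Run with something like: hadoop jar hadoop-streaming.jar -input in/ -output out/ -mapper mapper.py -reducer reducer.py -file mapper.py -file reducer.py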

Page 90: Scalable Learning Technologies for Big Data Mining

• 40 nodes/rack, 1000–4000 nodes in a cluster
• 1 Gbps bandwidth within a rack, 8 Gbps out of the rack
• Node specs (Facebook): 8–16 cores, 32 GB RAM, 8 × 1.5 TB disks, no RAID

Diagram: an aggregation switch connecting the rack switches.

8

Page 91: Scalable Learning Technologies for Big Data Mining

• Files are split into 128 MB blocks
• Blocks are replicated across several data nodes (often 3)
• The name node stores metadata (file names, locations, etc.)
• Optimized for large files and sequential reads
• Files are append-only

Diagram: a Namenode holding the metadata for File1 (blocks 1–4) and Datanodes each storing a subset of the replicated blocks.

9

Page 92: Scalable Learning Technologies for Big Data Mining

Hive: Relational D/B on Hadoop, developed at Facebook

Provides an SQL-like query language

10

Page 93: Scalable Learning Technologies for Big Data Mining

Supports table partitioning, complex data types, sampling, and some query optimization

These help discover knowledge via various tasks, e.g.,
• Search for relevant terms
• Operations such as word count
• Aggregates like MIN, AVG

11

Page 94: Scalable Learning Technologies for Big Data Mining

/* Find documents of the enron table with word frequencies within the range of 75 and 80 */
SELECT DISTINCT D.DocID
FROM docword_enron D
WHERE D.count > 75 AND D.count < 80
LIMIT 10;

OK
1853
…
11578
16653
Time taken: 64.788 seconds

12

Page 95: Scalable Learning Technologies for Big Data Mining

/* Create a view to find the count for WordID=90 and DocID=40, for the nips table */
CREATE VIEW Word_Freq AS
SELECT D.DocID, D.WordID, V.word, D.count
FROM docword_Nips D
JOIN vocabNips V ON D.WordID = V.WordID AND D.DocId = 40 AND D.WordId = 90;

OK
Time taken: 1.244 seconds

13

Page 96: Scalable Learning Technologies for Big Data Mining

/* Find documents which use the word "rational" from the nips table */
SELECT D.DocID, V.word
FROM docword_Nips D
JOIN vocabnips V ON D.wordID = V.wordID AND V.word = "rational"
LIMIT 10;

OK
434 rational
275 rational
158 rational
…
290 rational
422 rational
Time taken: 98.706 seconds

14

Page 97: Scalable Learning Technologies for Big Data Mining

/* Find the average frequency of all words in the enron table */
SELECT AVG(count)
FROM docWord_enron;

OK
1.728152608060543
Time taken: 68.2 seconds

15

Page 98: Scalable Learning Technologies for Big Data Mining

Query Execution Time for HQL & MySQL on big data sets

Similar claims for other SQL packages

16

Page 99: Scalable Learning Technologies for Big Data Mining

17

Server Storage Capacity

Max Storage per instance

Page 100: Scalable Learning Technologies for Big Data Mining

18

Page 101: Scalable Learning Technologies for Big Data Mining

Hive supports rich data types: Map, Array & Struct; Complex types

It supports queries with SQL Filters, Joins, Group By, Order By etc.

Here is when (original) Hive users miss SQL….

No support in Hive to update data after insert

No (or little) support in Hive for relational semantics (e.g., ACID)

No "delete from" command in Hive - only bulk delete is possible

No concept of primary and foreign keys in Hive

19

Page 102: Scalable Learning Technologies for Big Data Mining

Ensure the dataset is already compliant with integrity constraints before load

Ensure that only compliant data rows are loaded: SELECT & a temporary staging table

Check for referential constraints using an Equi-Join and query on those rows that comply

20

Page 103: Scalable Learning Technologies for Big Data Mining

Providing more power than Hive, Hadoop & MR

21

Page 104: Scalable Learning Technologies for Big Data Mining

Cloudera’s Impala: More efficient SQL-compliant analytic database

Hortonworks’ Stinger: Driving the future of Hive with enterprise SQL at Hadoop scale

Apache’s Mahout: Machine Learning algorithms for Big Data

Spark: Lightning-fast framework for Big Data

MLlib: Machine Learning Library of Spark

MLbase: Platform base for MLlib in Spark

22

Page 105: Scalable Learning Technologies for Big Data Mining

Fully integrated, state-of-the-art analytic D/B to leverage the flexibility & scalability of Hadoop

Combines benefits:
• Hadoop: flexibility, scalability, cost-effectiveness
• SQL: performance, usability, semantics

23

Page 106: Scalable Learning Technologies for Big Data Mining

MPP: Massively Parallel Processing

Page 107: Scalable Learning Technologies for Big Data Mining

NDV: function for approximate distinct counting

For a table w/ 1 billion rows:

COUNT(DISTINCT):
• precise answer
• slow for large-scale data

NDV() function:
• approximate result
• much faster

25

Page 108: Scalable Learning Technologies for Big Data Mining

Hardware Configuration
• Generates less CPU load than Hive
• Typical performance gains: 3x–4x
• Impala cannot go faster than the hardware permits!

Query Complexity
• Single-table aggregation queries: less gain
• Queries with at least one join: gains of 7–45x

Main Memory as Cache
• If the data accessed by the query is in the cache, the speedup is greater
• Typical gains with cache: 20x–90x

26

Page 109: Scalable Learning Technologies for Big Data Mining

No – there remain many viable use cases for MR & Hive:
• Long-running data transformation workloads & traditional DW frameworks
• Complex analytics on limited, structured data sets

Impala is a complement to these approaches:
• Supports cases with very large data sets
• Especially to get focused result sets quickly

27

Page 110: Scalable Learning Technologies for Big Data Mining

Drive the future of Hive with enterprise SQL at Hadoop scale

3 main objectives:
• Speed: Sub-second query response times
• Scale: From GB to TB & PB
• SQL: Transactions & SQL:2011 analytics for Hive

28

Page 111: Scalable Learning Technologies for Big Data Mining

Wider use cases with modifications to data

BEGIN, COMMIT, ROLLBACK for multi-stmt transactions

29

Page 112: Scalable Learning Technologies for Big Data Mining

Hybrid engine with LLAP (Live Long and Process):
• Caching & data reuse across queries
• Multi-threaded execution
• High-throughput I/O
• Granular column-level security

30

Page 113: Scalable Learning Technologies for Big Data Mining

Common Table Expressions

Sub-queries: correlated & uncorrelated

Rollup, Cube, Standard Aggregates

Inner, Outer, Semi & Cross Joins

Non Equi-Joins

Set Functions: Union, Except & Intersect

Most sub-queries, nested and otherwise

31

Page 114: Scalable Learning Technologies for Big Data Mining

32

Page 115: Scalable Learning Technologies for Big Data Mining

Supervised and Unsupervised Learning Algorithms

33

Page 116: Scalable Learning Technologies for Big Data Mining

ML algorithms on distributed frameworks

good for mining big data on the cloud

Supervised Learning: e.g. Classification

Unsupervised Learning: e.g. Clustering

The word Mahout means “elephant rider” in Hindi (from India), an interesting analogy

34

Page 117: Scalable Learning Technologies for Big Data Mining

Clustering (Unsupervised) • K-means, Fuzzy k-means, Streaming k-means etc.

Classification (Supervised) • Random Forest, Naïve Bayes etc.

Collaborative Filtering (Semi-Supervised)

• Item Based, Matrix Factorization etc.

Dimensionality Reduction (For Learning)

• SVD, PCA etc.

Others

• LDA for Topic Models, Sparse TF-IDF Vectors from Text etc. 35

Page 118: Scalable Learning Technologies for Big Data Mining

Input: Big Data from emails
• Goal: automatically classify text into various categories

Prior to classifying the text, TF-IDF is applied:
• Term Frequency – Inverse Document Frequency
• TF-IDF increases with the frequency of a word in the document, offset by the frequency of the word in the corpus

Naïve Bayes is used for classification:
• A simple classifier using posterior probability
• Each attribute is assumed to be distributed independently of the others

36
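A minimal sketch of the same TF-IDF + Naïve Bayes idea using scikit-learn instead of Mahout (the example emails and labels are made up; in the case study the data lives in HDFS and the vectors are built with Mahout):

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

emails = [
    "how do I build tf-idf vectors with mahout",
    "naive bayes training job fails on the mahout cluster",
    "hive query with join and group by runs slowly",
    "create external table in hive over hdfs files",
    "meeting moved to thursday afternoon",
]
labels = ["Mahout", "Mahout", "Hive", "Hive", "Other"]

# TF-IDF vectorization followed by a multinomial Naive Bayes classifier
model = make_pipeline(TfidfVectorizer(), MultinomialNB())
model.fit(emails, labels)
print(model.predict(["partition my hive table by date"]))  # e.g. ['Hive']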

Page 119: Scalable Learning Technologies for Big Data Mining

Building the Model

Training Data → Pre-Process → Training Algorithm → Model

Historical data with reference decisions:
• Collected a set of e-mails organized in directories labeled with the predictor categories: Mahout, Hive, Other
• Stored the emails as text in HDFS on an EC2 virtual server

Pre-processing, using Apache Mahout:
• Convert the text files to HDFS SequenceFile format
• Create TF-IDF weighted sparse vectors

Build and evaluate the model with Mahout’s implementation of the Naïve Bayes classifier.

Result: a classification model which takes vectorized text documents as input and assigns one of three document topics: Mahout, Hive, Other.

37

Page 120: Scalable Learning Technologies for Big Data Mining

Using the Model to Classify New Data

New Data → Model → Output

• Store a set of new email documents as text in HDFS on an EC2 virtual server
• Pre-process using Apache Mahout’s libraries
• Use the existing model to predict topics for the new text files

This was implemented in Java:
• With the Apache Mahout libraries
• Apache Maven to manage dependencies and build the project
• The executable JAR file is submitted with the project documentation
• The program works with data files stored in HDFS

For each input document, the model returns one of the following categories: Mahout, Hive, Other

38

Page 121: Scalable Learning Technologies for Big Data Mining

Text Classification – Results

Input: a new email document → Output: Predicted Category: Hive

Possible uses:
• Automatic email sorting
• Automatic news classification
• Topic modeling

39

Page 122: Scalable Learning Technologies for Big Data Mining

Lightning-fast processing for Big Data

Open-source cluster computing, developed in the AMPLab at UC Berkeley

Advanced DAG execution engine for cyclic data flow & in-memory computing

Very well-suited for large-scale Machine Learning

40

Page 123: Scalable Learning Technologies for Big Data Mining

Spark runs much faster than Hadoop & MR:
100x faster in memory
10x faster on disk

Page 124: Scalable Learning Technologies for Big Data Mining

42

Page 125: Scalable Learning Technologies for Big Data Mining

Uses TimSort (derived from merge-sort & insertion-sort), faster than quick-sort

Exploits cache locality due to in-memory computing

Fault-tolerant when scaling, well-designed for failure-recovery

Deploys power of cloud through enhanced N/W & I/O intensive throughput

43

Page 126: Scalable Learning Technologies for Big Data Mining

Spark Core: Distributed task dispatching, scheduling, basic I/O

Spark SQL: Has SchemaRDD (built on RDDs, Resilient Distributed Datasets); SQL support with CLI, ODBC/JDBC

Spark Streaming: Uses Spark’s fast scheduling for stream analytics; code written for batch analytics can be used for streams

MLlib: Distributed machine learning framework (10x the speed of Mahout)

GraphX: Distributed graph processing framework with an API

44

Page 127: Scalable Learning Technologies for Big Data Mining

MLbase: tools & interfaces to bridge the gap b/w operational & investigative ML

A platform to support MLlib, the distributed Machine Learning Library on top of Spark

45

Page 128: Scalable Learning Technologies for Big Data Mining

MLlib: Distributed ML library for classification, regression, clustering & collaborative filtering

MLI: API for feature extraction, algorithm development, high-level ML programming abstractions

ML Optimizer: Simplifies ML problems for end users by automating model selection

46

Page 129: Scalable Learning Technologies for Big Data Mining

Classification • Support Vector Machines (SVM), Naive Bayes, decision trees

Regression

• linear regression, regression trees

Collaborative Filtering • Alternating Least Squares (ALS)

Clustering

• k-means

Optimization • Stochastic gradient descent (SGD), Limited-memory BFGS

Dimensionality Reduction

• Singular value decomposition (SVD), principal component analysis (PCA)

47
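A minimal PySpark sketch using MLlib's RDD-based k-means (the toy points and k are illustrative assumptions; real jobs would load data from HDFS, e.g. via sc.textFile):

from pyspark import SparkContext
from pyspark.mllib.clustering import KMeans

sc = SparkContext(appName="mllib-kmeans-sketch")

# Toy 2-D points distributed as an RDD
data = sc.parallelize([[0.0, 0.0], [0.1, 0.2], [9.0, 9.1], [9.2, 8.8]])

model = KMeans.train(data, k=2, maxIterations=10)  # train a k-means model
print(model.clusterCenters)
print(model.predict([0.05, 0.1]))  # cluster index for a new point

sc.stop()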

Page 130: Scalable Learning Technologies for Big Data Mining

• Easily interpretable
• Ensembles are top performers
• Support for categorical variables
• Can handle missing data
• Distributed decision trees scale well to massive datasets

48

Page 131: Scalable Learning Technologies for Big Data Mining

Uses modified version of k-means

Feature extraction & selection

• Extraction: lot of time and tools

• Selection: domain expertise

• Wrong selection of features: bad quality clusters

Glassbeam’s SCALAR platform • SPL (Semiotic Parsing Language)

• Makes feature engineering easier & faster

49

Page 132: Scalable Learning Technologies for Big Data Mining

50

Page 133: Scalable Learning Technologies for Big Data Mining

High-dimensional data: not all features are important for building the model & answering the questions

Many applications: reduce the dimensions before building the model

MLlib: 2 algorithms for dimensionality reduction
• Principal Component Analysis (PCA)
• Singular Value Decomposition (SVD)

51

Page 134: Scalable Learning Technologies for Big Data Mining

52

Page 135: Scalable Learning Technologies for Big Data Mining

53

Page 136: Scalable Learning Technologies for Big Data Mining

Grow into unified platform for data scientists

Reduce time to market with platforms like Glassbeam’s SCALAR for feature engineering

Include more ML algorithms

Introduce enhanced filters

Improve visualization for better performance

54

Page 137: Scalable Learning Technologies for Big Data Mining

Processing Streaming Data on the Cloud

55

Page 138: Scalable Learning Technologies for Big Data Mining

Apache Storm
• Reliably process unbounded streams; do for real-time processing what Hadoop did for batch processing

Apache Flink
• Fast & reliable large-scale data processing engine with batch- & stream-based alternatives

56

Page 139: Scalable Learning Technologies for Big Data Mining

Integrates queueing & D/B

Nimbus node
• Uploads computations
• Distributes code on the cluster
• Launches workers on the cluster
• Monitors the computation

ZooKeeper nodes
• Coordinate the Storm cluster

Supervisor nodes
• Interact w/ Nimbus through ZooKeeper
• Start & stop workers w/ signals from Nimbus

57

Page 140: Scalable Learning Technologies for Big Data Mining

Tuple: an ordered list of elements, e.g., a “4-tuple” (7, 1, 3, 7)

Stream: an unbounded sequence of tuples

Spout: a source of streams in a computation (e.g. the Twitter API)

Bolt: processes input streams & produces output streams, running functions

Topology: the overall calculation, as a network of spouts and bolts

58

Page 141: Scalable Learning Technologies for Big Data Mining

Exploits in-memory data streaming & adds iterative processing into system

Makes system super fast for data-intensive & iterative jobs

59

Performance Comparison

Page 142: Scalable Learning Technologies for Big Data Mining

Requires few config parameters

Built-in optimizer finds best way to run program

Supports all Hadoop I/O & data types

Runs MR operators unmodified & faster

Reads data from HDFS

60

Execution of Flink

Page 143: Scalable Learning Technologies for Big Data Mining

Summary and Ongoing Work

61

Page 144: Scalable Learning Technologies for Big Data Mining

MapReduce & Hadoop: Pioneering tech

Hive: SQL-like, good for querying

Impala: Complementary to Hive, overcomes its drawbacks

Stinger: Drives future of Hive w/ advanced SQL semantics

Mahout: ML algorithms (Sup & Unsup)

Spark: Framework more efficient & scalable than Hadoop

MLlib: Machine Learning Library of Spark

MLbase: Platform supporting MLlib

Storm: Stream processing for cloud & big data

Flink: Both stream & batch processing

62

Page 145: Scalable Learning Technologies for Big Data Mining

Store & process Big Data? • MR & Hadoop - Classical technologies

• Spark – Very large data, Fast & scalable

Query over Big Data?

• Hive – Fundamental, SQL-like

• Impala – More advanced alternative

• Stinger - Making Hive itself better

ML supervised / unsupervised?

• Mahout - Cloud based ML for big data

• MLlib - Super large data sets, super fast

Mine over streaming big data?

• Only streams – Storm

• Batch & Streams – Flink

63

Page 146: Scalable Learning Technologies for Big Data Mining

Big Data Skills • Cloud Technology • Deep Learning • Business Perspectives • Scientific Domains

Salary $100k +

• Big Data Programmers

• Big Data Analysts

Univ Programs & concentrations • Data Analytics • Data Science

64

Page 147: Scalable Learning Technologies for Big Data Mining

Big Data Concentration being developed in the CS Dept
• http://cs.montclair.edu/
• Data Mining, Remote Sensing, HCI, Parallel Computing, Bioinformatics, Software Engineering

Global Education Programs available for visiting and exchange students
• http://www.montclair.edu/global-education/

Please contact me for details

65

Page 148: Scalable Learning Technologies for Big Data Mining

Include more cloud intelligence in big data analytics

Further bridge gap between Hive & SQL

Add features from standalone ML packages to Mahout, MLlib …

Extend big data capabilities to focus more on PB & higher scales

Enhance mining of big data streams w/ cloud services & deep learning

Address security & privacy issues on a deeper level for cloud & big data

Build lucrative applications w/ scalable technologies for big data

Conduct domain-specific research, e.g., Cloud & GIS, Cloud & Green Computing

66

Page 149: Scalable Learning Technologies for Big Data Mining

[1] M. Bansal, D. Klein. Coreference Semantics from Web Features. In Proceedings of ACL 2012.
[2] R. Bekkerman, M. Bilenko, J. Langford (Eds.). Scaling Up Machine Learning: Parallel and Distributed Approaches. Cambridge University Press, 2011.
[3] T. Brants, A. Franz. Web 1T 5-Gram Version 1. Linguistic Data Consortium, 2006.
[4] R. Collobert, J. Weston, L. Bottou, M. Karlen, K. Kavukcuoglu, P. Kuksa. Natural Language Processing (Almost) from Scratch. Journal of Machine Learning Research (JMLR), 2011.
[5] G. de Melo. Exploiting Web Data Sources for Advanced NLP. In Proceedings of COLING 2012.
[6] G. de Melo, K. Hose. Searching the Web of Data. In Proceedings of ECIR 2013. Springer LNCS.
[7] J. Dean, S. Ghemawat. MapReduce: Simplified Data Processing on Large Clusters. In USENIX OSDI-04, San Francisco, CA, pp. 137-149.
[8] C. Engle, A. Lupher, R. Xin, M. Zaharia, M. Franklin, S. Shenker, I. Stoica. Shark: Fast Data Analysis Using Coarse-Grained Distributed Memory. In Proceedings of SIGMOD 2013, ACM, pp. 689-692.
[9] A. Ghoting, P. Kambadur, E. Pednault, R. Kannan. NIMBLE: A Toolkit for the Implementation of Parallel Data Mining and Machine Learning Algorithms on MapReduce. In Proceedings of KDD 2011, ACM, New York, NY, USA, pp. 334-342.
[10] K. Hammond, A. Varde. Cloud-Based Predictive Analytics. In ICDM-13 KDCloud Workshop, Dallas, Texas, December 2013.

67

Page 150: Scalable Learning Technologies for Big Data Mining

[11] K. Hose, R. Schenkel, M. Theobald, G. Weikum. Database Foundations for Scalable RDF Processing. Reasoning Web 2011, pp. 202-249.
[12] R. Kiros, R. Salakhutdinov, R. Zemel. Multimodal Neural Language Models. In Proceedings of ICML 2014.
[13] T. Kraska, A. Talwalkar, J. Duchi, R. Griffith, M. Franklin, M. Jordan. MLbase: A Distributed Machine Learning System. In Conference on Innovative Data Systems Research, 2013.
[14] Q. V. Le, T. Mikolov. Distributed Representations of Sentences and Documents. In Proceedings of ICML 2014.
[15] J. Leskovec, A. Rajaraman, J. Ullman. Mining of Massive Datasets. Cambridge University Press, 2011.
[16] T. Mikolov, I. Sutskever, K. Chen, G. S. Corrado, J. Dean. Distributed Representations of Words and Phrases and Their Compositionality. In NIPS 26, pp. 3111-3119, 2013.
[17] R. Nayak, P. Senellart, F. Suchanek, A. Varde. Discovering Interesting Information with Advances in Web Technology. SIGKDD Explorations 14(2): 63-81, 2012.
[18] M. Riondato, J. DeBrabant, R. Fonseca, E. Upfal. PARMA: A Parallel Randomized Algorithm for Approximate Association Rules Mining in MapReduce. In Proceedings of CIKM 2012, ACM, pp. 85-94.
[19] F. Suchanek, A. Varde, R. Nayak, P. Senellart. The Hidden Web, XML and the Semantic Web: Scientific Data Management Perspectives. EDBT 2011, Uppsala, Sweden, pp. 534-537.

68

Page 151: Scalable Learning Technologies for Big Data Mining

[20] N. Tandon, G. de Melo, G. Weikum. Deriving a Web-Scale Common Sense Fact Database. In Proceedings of AAAI 2011.
[21] J. Tancer, A. Varde. The Deployment of MML for Data Analytics over the Cloud. In ICDM-11 KDCloud Workshop, December 2011, Vancouver, Canada, pp. 188-195.
[22] A. Thusoo, J. Sarma, N. Jain, Z. Shao, P. Chakka, S. Anthony, H. Liu, P. Wyckoff, R. Murthy. Hive: A Petabyte Scale Data Warehouse Using Hadoop, 2009.
[23] J. Turian, L. Ratinov, Y. Bengio. Word Representations: A Simple and General Method for Semi-Supervised Learning. In Proceedings of ACL 2010.
[24] A. Varde, F. Suchanek, R. Nayak, P. Senellart. Knowledge Discovery over the Deep Web, Semantic Web and XML. DASFAA 2009, Brisbane, Australia, pp. 784-788.
[25] T. White. Hadoop: The Definitive Guide. O'Reilly, 2011.
[26] http://flink.incubator.apache.org
[27] https://github.com/twitter/scalding
[28] http://hortonworks.com/labs/stinger/
[29] http://linkeddata.org/
[30] http://mahout.apache.org/
[31] https://storm.apache.org

69

Page 152: Scalable Learning Technologies for Big Data Mining

Contact Information

Gerard de Melo ([email protected]) [http://gerard.demelo.org]

Aparna Varde ([email protected]) [http://www.montclair.edu/~vardea]

70