01 machine learning introduction
TRANSCRIPT
Machine Learning for Data Mining: Introduction
Andres Mendez-Vazquez
May 13, 2015
1 / 56
Outline
1 Why are we interested in Analyzing Data?
   Intuitive Definition: The 3V's
   Complexity
   Data Everywhere
2 Machine Learning
   Machine Learning Process
   Features
   Classification
   Clustering Analysis
3 Data Mining
   Definition
   Applications
   Example: Frequent Itemsets
4 Hardware Support
   ASICs
   GPUs
5 Projects
   What projects can you do?
2 / 56
Intuitive Definition: Volume
When looking at the Volume of information, we have:
Volumes of it: Terabytes (10^12), Petabytes (10^15) and UP!!!
Examples of these volumes are:
1 Records
2 Transactions
3 Web Searches
4 etc.
4 / 56
However
Something Notable
What constitutes truly "high" volume varies by industry and even geography!!!
Simply look at the DNA data for a cellular cycle.
Example
5 / 56
Intuitive Definition: Variety
When looking at the structure of the information, we have:
Variety like there is no tomorrow:
   It is structured, semi-structured, unstructured
So
Do you have some examples of structures in information?
6 / 56
Intuitive Definition: Velocity
When looking at the Velocity of this information?
Data in Motion!!!
Velocity:
   Dynamic Generation
   Real-Time Generation
Problems with that: Latency
Lag time between capture or generation and when it is available!!!
7 / 56
For example
Imagine that I have a stream of m = 10^25 integers with values drawn from [a_1, ..., a_n] with n = 10,000,000.
Now, somebody asks you to find the most frequent item!!!
A naive algorithm
1 Take a hash table with a counter.
2 Then, put the numbers in the hash table.
Problems
Which problems do we have?
8 / 56
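The naive algorithm above can be sketched in a few lines of Python (an illustrative sketch, not part of the original slides). The catch the slide hints at: the hash table needs one counter per distinct item, which is hopeless for a stream of 10^25 integers.

```python
from collections import Counter

def most_frequent(stream):
    """Naive approach: one counter per distinct item in a hash table."""
    counts = Counter()
    for x in stream:
        counts[x] += 1          # O(1) expected time per update
    item, freq = counts.most_common(1)[0]
    return item, freq

# Toy stream; a real one could hold 10^25 items, far beyond any memory.
stream = [3, 1, 3, 2, 3, 1, 2, 3]
print(most_frequent(stream))  # (3, 4)
```

The memory usage grows with the number of distinct items, which is exactly the problem that sketching algorithms address.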
However
There is the Count-Min Sketch Algorithm
Invented by Charikar, Chen and Farach-Colton in 2004
With Properties
   Space Used: O((1/ε) log(1/δ) · (log m + log n))
   Error Probability: δ
   Error: ε
9 / 56
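A minimal Count-Min Sketch can be sketched as follows (an illustrative sketch, not from the slides; the bound above assumes width w ≈ e/ε and depth d ≈ ln(1/δ), and the analysis assumes pairwise-independent hashes, which Python's salted built-in `hash` only approximates here):

```python
import random

class CountMinSketch:
    """d hash rows of width w; estimates never undercount, and overcount
    by at most eps * (stream length) with probability >= 1 - delta."""
    def __init__(self, width, depth, seed=0):
        rng = random.Random(seed)
        self.width = width
        self.table = [[0] * width for _ in range(depth)]
        # One salt per row, so each row uses a different hash function.
        self.salts = [rng.getrandbits(32) for _ in range(depth)]

    def _index(self, salt, item):
        return hash((salt, item)) % self.width

    def add(self, item, count=1):
        for row, salt in zip(self.table, self.salts):
            row[self._index(salt, item)] += count

    def estimate(self, item):
        # Every row overestimates (collisions only add), so take the min.
        return min(row[self._index(salt, item)]
                   for row, salt in zip(self.table, self.salts))
```

Usage: feed the stream through `add`, then query `estimate` for any item; space is d·w counters regardless of how many distinct items the stream contains.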
Complexity
Given all these things
It is necessary to correlate and share data across entities.
It is necessary to link, match, and transform data across business entities and systems.
With this...
Complexity goes through the roof!!!
11 / 56
And it is through the roof!!! [Figure: the Linking Open Data community project graph]
12 / 56
Cautionary Tale
Something Notable
In 1880 the USA made a Census of the Population covering different aspects:
   Population
   Mortality
   Agriculture
   Manufacturing
However
Once the data was collected, it took 7 years to say something!!!
13 / 56
Ahhh...
Thus, Hollerith came with the following machine (circa 1890)!!!
14 / 56
Hollerith Tabulating Machine
It was basically a sorter and counter
Using punched cards as memory.
And mercury sensors.
Example
15 / 56
It was FAST!!!
It took only!!!
2 years!!!
Nevertheless, in 1837
Babbage's Analytical Engine was
The First General Computer!!!
Turing-complete!!!
Way more complex than the tabulator!!! 53 years earlier!!!
16 / 56
Funny!!!
17 / 56
The Problem
Actually, it never reached completion because
Babbage was a terrible project manager!!!
18 / 56
Data is Everywhere!
Lots of data is being collected and warehoused
   Web data, e-commerce
   Purchases at department/grocery stores
   Bank/credit card transactions
   Social networks
Many Places
20 / 56
The Staggering Numbers
An Ocean of Data
How much data in the world?
   800 Terabytes, 2000
   160 Exabytes, 2006
   500 Exabytes (Internet), 2009
   2.7 Zettabytes, 2012
   35 Zettabytes by 2020
Generation
How much data is generated in ONE day?
   7 TB, Twitter
   10 TB, Facebook
Source: "Big data: The next frontier for innovation, competition, and productivity", McKinsey Global Institute 2011
21 / 56
Type of Data
Thus
Relational Data (Tables/Transactions/Legacy Data)
Text Data (Web)
Semi-structured Data (XML)
And more...
Graph Data
   Social Networks, Semantic Web (RDF), ...
Streaming Data
   You can only scan the data once
22 / 56
The Ever Growing Landscape
23 / 56
Machine Learning
Definition
Algorithms or techniques that enable a computer (machine) to "learn" from data. Related to many areas such as data mining, statistics, information theory, etc.
Algorithm Types:
   Unsupervised Learning
   Supervised Learning
   Reinforcement Learning
Examples
   Artificial Neural Networks (ANN)
   Support Vector Machines (SVM)
   Expectation-Maximization (EM)
   Deterministic Annealing (DA)
24 / 56
Machine Learning Process
Process
1 Feature Extraction/Feature Generation
2 Clustering ≈ Class Identification ≈ Unsupervised Learning
3 Classification ≈ Supervised Learning
Then...
We start thinking: We need to process a lot of data...
Or...
LARGE SCALE MACHINE LEARNING
26 / 56
Feature Generation/Dimensionality Reduction
Feature Generation
Given a set of measurements, the goal is to discover compact and informative representations of the obtained data.
Examples
1 The Karhunen–Loève transform ≈ Principal Component Analysis
   Popular for feature generation and dimensionality reduction
2 The Singular Value Decomposition
   Used for dimensionality reduction
28 / 56
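The PCA-via-SVD connection the slide alludes to can be sketched as follows (an illustrative sketch assuming NumPy is available; center the data, take the SVD, and project onto the leading right singular vectors):

```python
import numpy as np

def pca(X, k):
    """Project the rows of X onto the top-k principal components via SVD."""
    Xc = X - X.mean(axis=0)                 # center the data first
    U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:k].T                    # coordinates in the reduced space

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))               # 100 samples, 5 features
Z = pca(X, 2)
print(Z.shape)  # (100, 2)
```

The rows of `Vt` are the principal directions; keeping only the first k rows is what makes this a dimensionality reduction.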
Dimensionality Reduction/Feature Extraction
Definition
The process of transforming high-dimensional data into a low-dimensional representation to improve accuracy, aid understanding, or remove noise.
Why?
Curse of dimensionality: Complexity grows exponentially in volume as extra dimensions are added.
29 / 56
Feature Selection
Which features should be used for the classifier?
Why? The Curse of Dimensionality!!!
Hypothesis testing to discriminate good features
30 / 56
What can be done?
Measures for Class Separability
Example: Between-class scatter matrix:

S_b = \sum_{i=1}^{M} P_i (\mu_i - \mu_0)(\mu_i - \mu_0)^T   (1)

Where:
   \mu_0 is the global mean vector, \mu_0 = \sum_{i=1}^{M} P_i \mu_i.
   \mu_i is the mean of class \omega_i.
   P_i \approx n_i / N.
31 / 56
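Equation (1) translates directly into code (an illustrative sketch assuming NumPy; `X` holds one sample per row and `y` the class labels):

```python
import numpy as np

def between_class_scatter(X, y):
    """S_b = sum_i P_i (mu_i - mu_0)(mu_i - mu_0)^T, with P_i = n_i / N."""
    mu0 = X.mean(axis=0)                    # global mean vector
    N, d = X.shape
    Sb = np.zeros((d, d))
    for c in np.unique(y):
        Xc = X[y == c]
        Pi = len(Xc) / N                    # class prior estimate n_i / N
        diff = (Xc.mean(axis=0) - mu0).reshape(-1, 1)
        Sb += Pi * (diff @ diff.T)          # rank-1 outer product per class
    return Sb

# Two classes separated along the first feature only.
X = np.array([[0.0, 0.0], [0.0, 1.0], [10.0, 0.0], [10.0, 1.0]])
y = np.array([0, 0, 1, 1])
print(between_class_scatter(X, y)[0, 0])  # 25.0
```

A large trace of S_b relative to the within-class scatter indicates well-separated class means, which is the separability criterion being measured.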
What can be done?
Feature Subset Selection
Examples:
   Filter Approach
      All combinations of features are scored with a separability measure.
   Wrapper Approach:
      Use the chosen classifier itself to find the best set.
32 / 56
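The filter approach can be sketched on toy data (an illustrative sketch; the separability score here, squared distance between class means, is a deliberately simple stand-in for measures like the scatter matrices above):

```python
from itertools import combinations

# Toy data: two classes, three features; feature 0 separates the classes,
# features 1 and 2 are noise-like.
class_a = [(0.1, 5.0, 2.0), (0.2, 4.0, 3.0), (0.0, 6.0, 2.5)]
class_b = [(1.9, 5.5, 2.2), (2.1, 4.5, 2.8), (2.0, 5.0, 2.4)]

def mean(rows, j):
    return sum(r[j] for r in rows) / len(rows)

def separability(subset):
    # Simple filter score: summed squared distance between class means.
    return sum((mean(class_a, j) - mean(class_b, j)) ** 2 for j in subset)

# Filter approach: score every 1-feature subset, keep the best.
best = max(combinations(range(3), 1), key=separability)
print(best)  # (0,)
```

A wrapper approach would replace `separability` with the cross-validated accuracy of the actual classifier trained on each subset, which is more faithful but far more expensive.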
Classification
Definition
A procedure that divides data into a given set of categories, based on a training set, in a supervised way.
What do we want from classification?
Generalization vs. Specification
   Hard to achieve both
Avoid overfitting/overtraining:
   Early stopping
   Holdout validation
   K-fold cross validation
   Leave-one-out cross validation
34 / 56
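Of the remedies listed above, K-fold cross validation is easy to sketch (an illustrative sketch, not from the slides): split the n samples into k folds, and in turn hold each fold out for validation while training on the rest.

```python
def k_fold_splits(n, k):
    """Yield (train_indices, val_indices) pairs for k-fold cross validation."""
    # Distribute n samples over k folds as evenly as possible.
    fold_sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        val = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n))
        yield train, val
        start += size

for train, val in k_fold_splits(6, 3):
    print(val)  # [0, 1] then [2, 3] then [4, 5]
```

Averaging the validation error over the k folds gives a less noisy estimate of generalization than a single holdout split; leave-one-out is the k = n special case.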
Avoid Overfitting/Overtraining
[Figure: validation error and training error vs. model complexity, showing the underfitting and overfitting regimes]
35 / 56
Examples of Classification Algorithms
Many Possible Algorithms
   Linear Classifiers: Perceptron
   Probability Classifiers: Naive Bayes
   Kernel Method Classifiers: Support Vector Machines
   Non-Linear Classifiers: Artificial Neural Networks
   Graph Model Classifiers: ...
36 / 56
Clustering Analysis
Definition
Grouping unlabeled data into clusters, for the purpose of inferring hidden structures or information.
Using, for example
Dissimilarity measurements
   Angle: Inner product, ...
   Non-metric: Rank, Intensity, ...
   Distance: Euclidean (l2), Manhattan (l1), ...
38 / 56
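The distance and angle measures listed above can be sketched directly (an illustrative sketch, not from the slides):

```python
import math

def euclidean(x, y):
    """l2 distance."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

def manhattan(x, y):
    """l1 distance."""
    return sum(abs(a - b) for a, b in zip(x, y))

def cosine_similarity(x, y):
    """Angle-based measure: normalized inner product."""
    dot = sum(a * b for a, b in zip(x, y))
    nx = math.sqrt(sum(a * a for a in x))
    ny = math.sqrt(sum(b * b for b in y))
    return dot / (nx * ny)

print(euclidean((0, 0), (3, 4)))          # 5.0
print(manhattan((0, 0), (3, 4)))          # 7
print(cosine_similarity((1, 0), (0, 1)))  # 0.0
```

The choice of measure shapes the clusters: Euclidean favors compact spherical groups, Manhattan is less sensitive to single large coordinate differences, and cosine compares directions while ignoring magnitudes.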
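The dissimilarity measures named above fit in a few lines of Python; the sample vectors are purely illustrative.

```python
import math

def euclidean(p, q):
    """l2 distance between two points."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

def manhattan(p, q):
    """l1 distance: sum of absolute coordinate differences."""
    return sum(abs(a - b) for a, b in zip(p, q))

def cosine_similarity(p, q):
    """Angle-based measure: inner product divided by the norms."""
    dot = sum(a * b for a, b in zip(p, q))
    norm_p = math.sqrt(sum(a * a for a in p))
    norm_q = math.sqrt(sum(b * b for b in q))
    return dot / (norm_p * norm_q)

p, q = [1.0, 2.0], [4.0, 6.0]
# euclidean(p, q) -> 5.0, manhattan(p, q) -> 7.0
```

Which measure is appropriate depends on the data: angle-based measures ignore vector magnitude, which matters for e.g. document vectors, while the l1/l2 distances do not.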
Example
39 / 56
Examples of Clustering Algorithms

Clustering
1 Basic Clustering Algorithms
  1 K-means
2 Clustering Based on Cost Functions
  1 Fuzzy C-means
  2 Possibilistic C-means
3 Hierarchical Clustering
  1 Entropy-based
4 Clustering Based on Graph Theory
40 / 56
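A minimal sketch of K-means (Lloyd's algorithm), the first algorithm in the list: alternately assign each point to its nearest centroid, then recompute each centroid as the mean of its cluster. The sample points, fixed iteration count, and seed are illustrative assumptions.

```python
import random

def kmeans(points, k, iters=50, seed=0):
    """Lloyd's algorithm on a list of coordinate tuples."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # initialize from k distinct data points
    for _ in range(iters):
        # Assignment step: each point goes to its nearest centroid (squared l2).
        clusters = [[] for _ in range(k)]
        for p in points:
            j = min(range(k),
                    key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centroids[c])))
            clusters[j].append(p)
        # Update step: each centroid becomes the mean of its cluster.
        for j, cl in enumerate(clusters):
            if cl:
                centroids[j] = tuple(sum(xs) / len(cl) for xs in zip(*cl))
    return centroids, clusters

# Two well-separated groups (illustrative data, not from the lecture).
pts = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
centroids, clusters = kmeans(pts, 2)
```

A fixed iteration count keeps the sketch short; a practical implementation would stop when the assignments no longer change, and would restart from several initializations, since K-means only finds a local optimum.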
What Is Data Mining?

Data mining (knowledge discovery in databases):
Extraction of interesting information or patterns from data in large databases.

Alternative names and their “inside stories”:
Knowledge discovery (mining) in databases (KDD)
Knowledge extraction
Data/pattern analysis
Data archeology
Business intelligence
etc.
42 / 56
Examples: What is (not) Data Mining?

What is not Data Mining?
1 Look up a phone number in a phone directory
2 Query a Web search engine for information about “Amazon”

What is Data Mining?
1 Discover that certain names are more prevalent in certain US locations (O’Brien, O’Rourke, O’Reilly, ... in the Boston area)
2 Group together similar documents returned by a search engine according to their context (e.g., Amazon rainforest, Amazon.com)
43 / 56
Data Mining Applications

Applications
Mining the Web for structured data
Near-neighbor search in high-dimensional data
  Frequent itemsets and Association Rules
Structure of the web graph
  PageRank
  Link Analysis
Proximity on graphs
Mining data streams
Large-scale supervised machine learning techniques
45 / 56
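As a sketch of the PageRank item above: plain power iteration on a link graph. The three-page graph is hypothetical and the damping factor 0.85 is the conventional choice; a fixed iteration count stands in for a proper convergence test.

```python
def pagerank(links, damping=0.85, iters=100):
    """Power iteration over a link graph; links[u] lists the pages u points to."""
    pages = sorted(links)
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}  # start from the uniform distribution
    for _ in range(iters):
        new = {p: (1 - damping) / n for p in pages}  # teleportation mass
        for u in pages:
            out = links[u]
            if out:
                share = damping * rank[u] / len(out)
                for v in out:
                    new[v] += share
            else:
                # Dangling node: spread its rank uniformly so total mass is conserved.
                for v in pages:
                    new[v] += damping * rank[u] / n
        rank = new
    return rank

# Hypothetical three-page web: A and C both link to B; B links back to both.
ranks = pagerank({"A": ["B"], "B": ["A", "C"], "C": ["B"]})
```

B receives links from both other pages, so it ends up with the highest rank; A and C are symmetric and tie.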
Example: Frequent Itemsets

Based on the Market-Basket Model
1 On the one hand, we have items.
2 On the other, we have baskets, sometimes called “transactions.”
  1 Each basket consists of a set of items (an itemset).
  2 They are small.

Examples
1 {Cat, and, dog, bites}
2 {Yahoo, news, claims, cat, dog, and, produced, viable, offspring}
3 {Cat, killer, likely, is, a, big, dog}
4 {Professional, free, advice, on, dog, training, puppy}
47 / 56
Example: Frequent Itemsets

Then, we do the following

Transaction ID | Cat | Dog | and | a | mated
             1 |  1  |  1  |  1  | 0 |   0
             2 |  1  |  1  |  1  | 1 |   1
             3 |  1  |  1  |  0  | 1 |   0
             4 |  0  |  1  |  0  | 0 |   0

48 / 56
Combinatorial Problem

Problem
How many subsets do we have? With n possible items, there are 2^n candidate itemsets, so exhaustive enumeration is infeasible.

But we can do the following
Given an itemset x in a database D with a set of transactions {t_i}_{i∈I}:

supp(x, D) = |{t_i ∈ D | x ⊆ t_i}|   (2)

Then, setting a threshold ε
How many frequent (supp(x, D) > ε) itemsets are there?
49 / 56
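Using the four example baskets from the earlier slide, the support count supp(x, D) and a brute-force frequent-itemset scan can be sketched as follows. The threshold and the size cap are illustrative; note that the slide's baskets mix "Cat" and "cat", which a real pipeline would normalize first.

```python
from itertools import combinations

# The four example baskets from the slide, as sets of word-items.
baskets = [
    {"Cat", "and", "dog", "bites"},
    {"Yahoo", "news", "claims", "cat", "dog", "and", "produced", "viable", "offspring"},
    {"Cat", "killer", "likely", "is", "a", "big", "dog"},
    {"Professional", "free", "advice", "on", "dog", "training", "puppy"},
]

def support(itemset, db):
    """supp(x, D): number of transactions containing every item of x."""
    return sum(1 for t in db if itemset <= t)

def frequent_itemsets(db, threshold, max_size=2):
    """Brute-force scan of all itemsets up to max_size; only viable for tiny examples."""
    items = sorted(set().union(*db))
    return [set(c)
            for k in range(1, max_size + 1)
            for c in combinations(items, k)
            if support(set(c), db) > threshold]
```

With threshold 3, only {dog} qualifies, since "dog" is the one item appearing in all four baskets. Real algorithms such as Apriori avoid the exponential scan by only extending itemsets that are already frequent.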
Hardware Solutions: ASICs

Application-Specific Integrated Circuit (ASIC)
An ASIC is an integrated circuit customized for a particular use, rather than intended for general-purpose use.

It allows for
1 Lower power consumption.
2 Better cooling approaches.

Example: From Microsoft Research
51 / 56
Hardware Solutions: GPUs

IDEAS
Based on the CUDA parallel computing architecture from Nvidia.
Emphasis on executing many concurrent LIGHT threads instead of one HEAVY thread, as in CPUs.

Hardware for the 8800
53 / 56
Advantages
Massively parallel
Hundreds of cores, millions of threads
High throughput

Limitations
May not be applicable to all tasks
Generic hardware (CPUs) is closing the gap
54 / 56
Projects

Possible topics are:
Oil exploration detection.
Association Rule preprocessing project.
Neural network-based financial market forecasting project.
Page ranking: improving over the Google matrix.
Influence maximization in social networks.
Web word relevance measures.
Recommendation systems.

There are more possibilities at https://www.kaggle.com/competitions
56 / 56