serious data mining - sim-flanders · iminds dept. medical it ku leuven esat-stadius serious data...
TRANSCRIPT
![Page 1: Serious Data Mining - SIM-Flanders · iMinds Dept. MEDICAL IT KU Leuven ESAT-STADIUS Serious Data Mining Prof.Dr. Bart De Moor Bart.DeMoor@iminds.be 1](https://reader031.vdocuments.mx/reader031/viewer/2022031507/5c94dbe409d3f2a67b8baa68/html5/thumbnails/1.jpg)
iMinds Dept. MEDICAL IT
KU Leuven ESAT-STADIUS
Serious Data Mining
Prof.Dr. Bart De Moor
![Page 2: Serious Data Mining - SIM-Flanders · iMinds Dept. MEDICAL IT KU Leuven ESAT-STADIUS Serious Data Mining Prof.Dr. Bart De Moor Bart.DeMoor@iminds.be 1](https://reader031.vdocuments.mx/reader031/viewer/2022031507/5c94dbe409d3f2a67b8baa68/html5/thumbnails/2.jpg)
Content
Big DataWhatWho
Six issues DataCompute InfrastructureStorage InfrastructureAnalyticsVisualizationSecurity & Privacy
Machine learning as a commodity
Expertise of ESAT-STADIUS, KU Leuven
Books & Spin-offsAlgorithmsApplications 2
![Page 3: Serious Data Mining - SIM-Flanders · iMinds Dept. MEDICAL IT KU Leuven ESAT-STADIUS Serious Data Mining Prof.Dr. Bart De Moor Bart.DeMoor@iminds.be 1](https://reader031.vdocuments.mx/reader031/viewer/2022031507/5c94dbe409d3f2a67b8baa68/html5/thumbnails/3.jpg)
1 million = 1 000 0001 billion = 1 000 000 0001 trillion = 1 000 000 000 0001 quadrillion = 1 000 000 000 000 000
1 kB = 1 000 1 MB = 1 000 0001 GB = 1 000 000 0001 TB = 1 000 000 000 0001 PB = 1 000 000 000 000 000
1 TB = large university library= 212 DVD discs = 1430 CDs= 3 year music in CD quality
3
![Page 4: Serious Data Mining - SIM-Flanders · iMinds Dept. MEDICAL IT KU Leuven ESAT-STADIUS Serious Data Mining Prof.Dr. Bart De Moor Bart.DeMoor@iminds.be 1](https://reader031.vdocuments.mx/reader031/viewer/2022031507/5c94dbe409d3f2a67b8baa68/html5/thumbnails/4.jpg)
4
![Page 5: Serious Data Mining - SIM-Flanders · iMinds Dept. MEDICAL IT KU Leuven ESAT-STADIUS Serious Data Mining Prof.Dr. Bart De Moor Bart.DeMoor@iminds.be 1](https://reader031.vdocuments.mx/reader031/viewer/2022031507/5c94dbe409d3f2a67b8baa68/html5/thumbnails/5.jpg)
5
![Page 6: Serious Data Mining - SIM-Flanders · iMinds Dept. MEDICAL IT KU Leuven ESAT-STADIUS Serious Data Mining Prof.Dr. Bart De Moor Bart.DeMoor@iminds.be 1](https://reader031.vdocuments.mx/reader031/viewer/2022031507/5c94dbe409d3f2a67b8baa68/html5/thumbnails/6.jpg)
6
![Page 7: Serious Data Mining - SIM-Flanders · iMinds Dept. MEDICAL IT KU Leuven ESAT-STADIUS Serious Data Mining Prof.Dr. Bart De Moor Bart.DeMoor@iminds.be 1](https://reader031.vdocuments.mx/reader031/viewer/2022031507/5c94dbe409d3f2a67b8baa68/html5/thumbnails/7.jpg)
7
![Page 8: Serious Data Mining - SIM-Flanders · iMinds Dept. MEDICAL IT KU Leuven ESAT-STADIUS Serious Data Mining Prof.Dr. Bart De Moor Bart.DeMoor@iminds.be 1](https://reader031.vdocuments.mx/reader031/viewer/2022031507/5c94dbe409d3f2a67b8baa68/html5/thumbnails/8.jpg)
Moore’s law:
computing power
doubles
every 18 months
Next GenerationSequencing
Carlson’s law:
complexity/cost
evolves
exponentially
8
![Page 9: Serious Data Mining - SIM-Flanders · iMinds Dept. MEDICAL IT KU Leuven ESAT-STADIUS Serious Data Mining Prof.Dr. Bart De Moor Bart.DeMoor@iminds.be 1](https://reader031.vdocuments.mx/reader031/viewer/2022031507/5c94dbe409d3f2a67b8baa68/html5/thumbnails/9.jpg)
Genome data
9
• Human genome project (2003)
– 13 year project
– $300 million value with 2002 technology
• Personal genome (2007)
– Genome of James Watson, 2 months
– $1 000 000
• €1000-genome
– Expected 2012-2020
1,00E-07
1,00E-06
1,00E-05
1,00E-04
1,00E-03
1,00E-02
1,00E-01
1,00E+00
1,00E+01
1,00E+02
1,00E+03
1,00E+04
1,00E+05
1,00E+06
1,00E+07
1,00E+08
1,00E+09
1,00E+10
1,00E+11
1990 1995 2000 2002 2005 2007 2010 2015
Cost per base pair
Genome cost
GS-FLX Roche Applied Science 454
Sequencers
![Page 10: Serious Data Mining - SIM-Flanders · iMinds Dept. MEDICAL IT KU Leuven ESAT-STADIUS Serious Data Mining Prof.Dr. Bart De Moor Bart.DeMoor@iminds.be 1](https://reader031.vdocuments.mx/reader031/viewer/2022031507/5c94dbe409d3f2a67b8baa68/html5/thumbnails/10.jpg)
Tsunami of medical data
PACS
UZ Leuven
1,6 PetaByte
Genomics core
HiSeq 2000 full
speed exome
sequencing
1 TeraByte / week
1 small
animal
image
1
GigaByte1 CD-ROM
750
MegaByte
sequencing all newborns
by 2020 (125k births /
year)
125 PetaByte / year
index of 20
million
Biomedical
PubMed
records
23 GigaByte
1 slice mouse
brain MSI at
10 μm
resolution
81 GigaByte
raw NGS data
of 1 full genome
1 TeraByte
10
![Page 11: Serious Data Mining - SIM-Flanders · iMinds Dept. MEDICAL IT KU Leuven ESAT-STADIUS Serious Data Mining Prof.Dr. Bart De Moor Bart.DeMoor@iminds.be 1](https://reader031.vdocuments.mx/reader031/viewer/2022031507/5c94dbe409d3f2a67b8baa68/html5/thumbnails/11.jpg)
11
![Page 12: Serious Data Mining - SIM-Flanders · iMinds Dept. MEDICAL IT KU Leuven ESAT-STADIUS Serious Data Mining Prof.Dr. Bart De Moor Bart.DeMoor@iminds.be 1](https://reader031.vdocuments.mx/reader031/viewer/2022031507/5c94dbe409d3f2a67b8baa68/html5/thumbnails/12.jpg)
Data explosion in finance
Growing ~ 30-50% every year, half of this is unstructured!
Data storage became 30% cheaper, yet budgets for data storage are still rising.
12
![Page 13: Serious Data Mining - SIM-Flanders · iMinds Dept. MEDICAL IT KU Leuven ESAT-STADIUS Serious Data Mining Prof.Dr. Bart De Moor Bart.DeMoor@iminds.be 1](https://reader031.vdocuments.mx/reader031/viewer/2022031507/5c94dbe409d3f2a67b8baa68/html5/thumbnails/13.jpg)
How big is big?
?
Microsoft
1 million servers (2013)
Amazon
?
13
![Page 14: Serious Data Mining - SIM-Flanders · iMinds Dept. MEDICAL IT KU Leuven ESAT-STADIUS Serious Data Mining Prof.Dr. Bart De Moor Bart.DeMoor@iminds.be 1](https://reader031.vdocuments.mx/reader031/viewer/2022031507/5c94dbe409d3f2a67b8baa68/html5/thumbnails/14.jpg)
14
![Page 15: Serious Data Mining - SIM-Flanders · iMinds Dept. MEDICAL IT KU Leuven ESAT-STADIUS Serious Data Mining Prof.Dr. Bart De Moor Bart.DeMoor@iminds.be 1](https://reader031.vdocuments.mx/reader031/viewer/2022031507/5c94dbe409d3f2a67b8baa68/html5/thumbnails/15.jpg)
15
![Page 16: Serious Data Mining - SIM-Flanders · iMinds Dept. MEDICAL IT KU Leuven ESAT-STADIUS Serious Data Mining Prof.Dr. Bart De Moor Bart.DeMoor@iminds.be 1](https://reader031.vdocuments.mx/reader031/viewer/2022031507/5c94dbe409d3f2a67b8baa68/html5/thumbnails/16.jpg)
16
![Page 17: Serious Data Mining - SIM-Flanders · iMinds Dept. MEDICAL IT KU Leuven ESAT-STADIUS Serious Data Mining Prof.Dr. Bart De Moor Bart.DeMoor@iminds.be 1](https://reader031.vdocuments.mx/reader031/viewer/2022031507/5c94dbe409d3f2a67b8baa68/html5/thumbnails/17.jpg)
Content
Big DataWhatWho
Six issues DataCompute InfrastructureStorage InfrastructureAnalyticsVisualizationSecurity & Privacy
Machine learning as a commodity
Expertise of ESAT-STADIUS, KU Leuven
Books & Spin-offsAlgorithmsApplications 17
![Page 18: Serious Data Mining - SIM-Flanders · iMinds Dept. MEDICAL IT KU Leuven ESAT-STADIUS Serious Data Mining Prof.Dr. Bart De Moor Bart.DeMoor@iminds.be 1](https://reader031.vdocuments.mx/reader031/viewer/2022031507/5c94dbe409d3f2a67b8baa68/html5/thumbnails/18.jpg)
Big Data
18
![Page 19: Serious Data Mining - SIM-Flanders · iMinds Dept. MEDICAL IT KU Leuven ESAT-STADIUS Serious Data Mining Prof.Dr. Bart De Moor Bart.DeMoor@iminds.be 1](https://reader031.vdocuments.mx/reader031/viewer/2022031507/5c94dbe409d3f2a67b8baa68/html5/thumbnails/19.jpg)
Data
19
![Page 20: Serious Data Mining - SIM-Flanders · iMinds Dept. MEDICAL IT KU Leuven ESAT-STADIUS Serious Data Mining Prof.Dr. Bart De Moor Bart.DeMoor@iminds.be 1](https://reader031.vdocuments.mx/reader031/viewer/2022031507/5c94dbe409d3f2a67b8baa68/html5/thumbnails/20.jpg)
Compute infrastructure
20
![Page 21: Serious Data Mining - SIM-Flanders · iMinds Dept. MEDICAL IT KU Leuven ESAT-STADIUS Serious Data Mining Prof.Dr. Bart De Moor Bart.DeMoor@iminds.be 1](https://reader031.vdocuments.mx/reader031/viewer/2022031507/5c94dbe409d3f2a67b8baa68/html5/thumbnails/21.jpg)
Storage
21
![Page 22: Serious Data Mining - SIM-Flanders · iMinds Dept. MEDICAL IT KU Leuven ESAT-STADIUS Serious Data Mining Prof.Dr. Bart De Moor Bart.DeMoor@iminds.be 1](https://reader031.vdocuments.mx/reader031/viewer/2022031507/5c94dbe409d3f2a67b8baa68/html5/thumbnails/22.jpg)
22
![Page 23: Serious Data Mining - SIM-Flanders · iMinds Dept. MEDICAL IT KU Leuven ESAT-STADIUS Serious Data Mining Prof.Dr. Bart De Moor Bart.DeMoor@iminds.be 1](https://reader031.vdocuments.mx/reader031/viewer/2022031507/5c94dbe409d3f2a67b8baa68/html5/thumbnails/23.jpg)
AnalyticsFlowchart
23
![Page 24: Serious Data Mining - SIM-Flanders · iMinds Dept. MEDICAL IT KU Leuven ESAT-STADIUS Serious Data Mining Prof.Dr. Bart De Moor Bart.DeMoor@iminds.be 1](https://reader031.vdocuments.mx/reader031/viewer/2022031507/5c94dbe409d3f2a67b8baa68/html5/thumbnails/24.jpg)
Visualization
24
![Page 25: Serious Data Mining - SIM-Flanders · iMinds Dept. MEDICAL IT KU Leuven ESAT-STADIUS Serious Data Mining Prof.Dr. Bart De Moor Bart.DeMoor@iminds.be 1](https://reader031.vdocuments.mx/reader031/viewer/2022031507/5c94dbe409d3f2a67b8baa68/html5/thumbnails/25.jpg)
Security
25
![Page 26: Serious Data Mining - SIM-Flanders · iMinds Dept. MEDICAL IT KU Leuven ESAT-STADIUS Serious Data Mining Prof.Dr. Bart De Moor Bart.DeMoor@iminds.be 1](https://reader031.vdocuments.mx/reader031/viewer/2022031507/5c94dbe409d3f2a67b8baa68/html5/thumbnails/26.jpg)
Content
Big DataWhatWho
Six issues DataCompute InfrastructureStorage InfrastructureAnalyticsVisualizationSecurity & Privacy
Machine learning as a commodity
Expertise of ESAT-STADIUS, KU Leuven
Books & Spin-offsAlgorithmsApplications 26
![Page 27: Serious Data Mining - SIM-Flanders · iMinds Dept. MEDICAL IT KU Leuven ESAT-STADIUS Serious Data Mining Prof.Dr. Bart De Moor Bart.DeMoor@iminds.be 1](https://reader031.vdocuments.mx/reader031/viewer/2022031507/5c94dbe409d3f2a67b8baa68/html5/thumbnails/27.jpg)
Big Data Landscape
More and moreanalytics as a commodity!
27
![Page 28: Serious Data Mining - SIM-Flanders · iMinds Dept. MEDICAL IT KU Leuven ESAT-STADIUS Serious Data Mining Prof.Dr. Bart De Moor Bart.DeMoor@iminds.be 1](https://reader031.vdocuments.mx/reader031/viewer/2022031507/5c94dbe409d3f2a67b8baa68/html5/thumbnails/28.jpg)
Machine Learning as a commodity
28
![Page 29: Serious Data Mining - SIM-Flanders · iMinds Dept. MEDICAL IT KU Leuven ESAT-STADIUS Serious Data Mining Prof.Dr. Bart De Moor Bart.DeMoor@iminds.be 1](https://reader031.vdocuments.mx/reader031/viewer/2022031507/5c94dbe409d3f2a67b8baa68/html5/thumbnails/29.jpg)
Big Data LandscapeMany possible applications!
Energy
Industry
Environment
Social networks
Fraud and predictive analysis
Health
…
Focus onSerious Big Data
29
![Page 30: Serious Data Mining - SIM-Flanders · iMinds Dept. MEDICAL IT KU Leuven ESAT-STADIUS Serious Data Mining Prof.Dr. Bart De Moor Bart.DeMoor@iminds.be 1](https://reader031.vdocuments.mx/reader031/viewer/2022031507/5c94dbe409d3f2a67b8baa68/html5/thumbnails/30.jpg)
Content
Big DataWhatWho
Six issues DataCompute InfrastructureStorage InfrastructureAnalyticsVisualizationSecurity & Privacy
Machine learning as a commodity
Expertise of ESAT-STADIUS, KU Leuven
Books & Spin-offsAlgorithmsApplications 30
![Page 31: Serious Data Mining - SIM-Flanders · iMinds Dept. MEDICAL IT KU Leuven ESAT-STADIUS Serious Data Mining Prof.Dr. Bart De Moor Bart.DeMoor@iminds.be 1](https://reader031.vdocuments.mx/reader031/viewer/2022031507/5c94dbe409d3f2a67b8baa68/html5/thumbnails/31.jpg)
Analytics
Big Data Analytics
Numerical algorithms Rule-based
31
![Page 32: Serious Data Mining - SIM-Flanders · iMinds Dept. MEDICAL IT KU Leuven ESAT-STADIUS Serious Data Mining Prof.Dr. Bart De Moor Bart.DeMoor@iminds.be 1](https://reader031.vdocuments.mx/reader031/viewer/2022031507/5c94dbe409d3f2a67b8baa68/html5/thumbnails/32.jpg)
Main tasks
Prediction Segmentation Anomalies
Regression Clustering
Classification
Outlier
32
![Page 33: Serious Data Mining - SIM-Flanders · iMinds Dept. MEDICAL IT KU Leuven ESAT-STADIUS Serious Data Mining Prof.Dr. Bart De Moor Bart.DeMoor@iminds.be 1](https://reader031.vdocuments.mx/reader031/viewer/2022031507/5c94dbe409d3f2a67b8baa68/html5/thumbnails/33.jpg)
What can we do?
Shopping cart analysis
Fraud detection
Face recognition
Movie reccomendation
Just-in-time production
Credit worthiness
Disease spreadingTraffic management
33
![Page 34: Serious Data Mining - SIM-Flanders · iMinds Dept. MEDICAL IT KU Leuven ESAT-STADIUS Serious Data Mining Prof.Dr. Bart De Moor Bart.DeMoor@iminds.be 1](https://reader031.vdocuments.mx/reader031/viewer/2022031507/5c94dbe409d3f2a67b8baa68/html5/thumbnails/34.jpg)
black blond
orange
blue
long
short
Hair color
length
Color clothes
FeauturesClustersSimilarityDecision
Clustering/Classification
34
![Page 35: Serious Data Mining - SIM-Flanders · iMinds Dept. MEDICAL IT KU Leuven ESAT-STADIUS Serious Data Mining Prof.Dr. Bart De Moor Bart.DeMoor@iminds.be 1](https://reader031.vdocuments.mx/reader031/viewer/2022031507/5c94dbe409d3f2a67b8baa68/html5/thumbnails/35.jpg)
Forecasting
35
![Page 36: Serious Data Mining - SIM-Flanders · iMinds Dept. MEDICAL IT KU Leuven ESAT-STADIUS Serious Data Mining Prof.Dr. Bart De Moor Bart.DeMoor@iminds.be 1](https://reader031.vdocuments.mx/reader031/viewer/2022031507/5c94dbe409d3f2a67b8baa68/html5/thumbnails/36.jpg)
Deep Learning● Neural networks.● New algorithms.● Multiple layers on
top of each other.● Each layer learns a
more complex representation.
● Learn feature hierarchies.
![Page 37: Serious Data Mining - SIM-Flanders · iMinds Dept. MEDICAL IT KU Leuven ESAT-STADIUS Serious Data Mining Prof.Dr. Bart De Moor Bart.DeMoor@iminds.be 1](https://reader031.vdocuments.mx/reader031/viewer/2022031507/5c94dbe409d3f2a67b8baa68/html5/thumbnails/37.jpg)
Stadius - Books
37
![Page 38: Serious Data Mining - SIM-Flanders · iMinds Dept. MEDICAL IT KU Leuven ESAT-STADIUS Serious Data Mining Prof.Dr. Bart De Moor Bart.DeMoor@iminds.be 1](https://reader031.vdocuments.mx/reader031/viewer/2022031507/5c94dbe409d3f2a67b8baa68/html5/thumbnails/38.jpg)
Stadius - Spin-offs
38
![Page 39: Serious Data Mining - SIM-Flanders · iMinds Dept. MEDICAL IT KU Leuven ESAT-STADIUS Serious Data Mining Prof.Dr. Bart De Moor Bart.DeMoor@iminds.be 1](https://reader031.vdocuments.mx/reader031/viewer/2022031507/5c94dbe409d3f2a67b8baa68/html5/thumbnails/39.jpg)
DATA
ANALYTICS
Supervised learning Unsupervised learning Statistical methodsOther analytical
techniques
General
methodology: Dusan
Popovic, Marc
Claesen, Jack Simm,
Peter Roelants
Esemble Methods :
Dusan Popovic, Marc
Claesen, Jaak Simm
Semi-supervised
learning : Marc
Claesen
Large-scale learning :
Marc Claesen, Nico
Verbeeck
Hyperparameter
optimization : Marc
Claesen, Dusan
Popovic
Optimization meta-
heuristics : Dusan
Popovic, Maira
Rodriguez
Neural Networks :
Peter Roelants, Dusan
Popovic
Deep learning : Peter
Roelants
ICA : Nico Verbeeck
Clustering : Yousef el
Alamaat
Matrix factorization :
Jaak Simm
Manifold learning :
Xian Mao
Decision Trees :
Dusan Popovic, Jaak
Simm
Regularized
regression & GLM :
Yousef el Alamaat,
Arnaud Installe
Wavelet compression
:
Nico Verbeeck
Visualization : Toni
Verbeiren, Ryo Sakai
Multi-task learning :
Jaak Simm
Data fusion : Dusan
Popovic, Pooya Zakeri
, Gorana Nikolic
General
methodology:
Yousef el Alamaat,
Dusan Popovic, Marc
Claesen
SVMs & Kernel
Methods : Marc
Claesen, Jaak Simm,
Pooya Zakeri
Survival analysis :
Marc Claesen
Algorithms in Stadius
39
![Page 40: Serious Data Mining - SIM-Flanders · iMinds Dept. MEDICAL IT KU Leuven ESAT-STADIUS Serious Data Mining Prof.Dr. Bart De Moor Bart.DeMoor@iminds.be 1](https://reader031.vdocuments.mx/reader031/viewer/2022031507/5c94dbe409d3f2a67b8baa68/html5/thumbnails/40.jpg)
Big DataApplications
Large-scalescience
Bio-Informatics
Data Assimilation
Health Smart City
Electricity
Internet of Things
Traffic
Finance
Risk Assesment
Stock Trading
Customer Relation
Management
Fraud Detection
Churn
Text Mining
Stadius
40
![Page 41: Serious Data Mining - SIM-Flanders · iMinds Dept. MEDICAL IT KU Leuven ESAT-STADIUS Serious Data Mining Prof.Dr. Bart De Moor Bart.DeMoor@iminds.be 1](https://reader031.vdocuments.mx/reader031/viewer/2022031507/5c94dbe409d3f2a67b8baa68/html5/thumbnails/41.jpg)
BIOMEDICAL APPLICATIONS
Diseases/Bodily
processes/Tissues
Technological platforms/data
types
Cancer : Dusan Popovic, Xian
Mao
Diabetes : Marc Claesen
Endometriosis : Yousef el
Alamat, Dusan Popovic, Nico
Verbeek
Mass Spectrometry: Nico
Verbeek, Yousef el Alamat, Xian
Mao
Genomics : Inge Thijs, Dusan
Popovic, Maira Rodriguez, Ryo
Sakai
Proteomics : Inge Thijs, Yousef
el Alamat, Xian Mao, Ryo Sakai,
Pooya Zakeri
Microarrays : Dusan Popovic,
Griet Laenen, Pooya Zakeri
miRNA : Jaak Simm
RNA sequencing : Jaak Simm
NGS : Amin Ardeshirdavani, Jaak
Simm
Metagenomics : Jaak Simm
Rare genetic disorders : Dusan
Popovic, Ryo Sakai
IT support
Object-oriented programming
: Arnaud Installe, Amin
Ardeshirdavani, Gorana Nikolic,
Marc Claesen, Toni
Verbeiren,Xian Mao
Web applications : Amin
Ardeshirdavani, Gorana Nikolic,
Toni Verbeiren
Databases : Arnaud Installe,
Amin Ardeshirdavani, Gorana
Nikolic
Big data technologies (ex.
Hadoop , MapReduce) : Amin
Ardeshirdavani, Jaak Simm, Toni
Verbeiren
Distributed computing : Toni
Verbeiren
Glucose level monitoring : Tom
Van Herpe
Mouse brain : Nico Verbeek,
Yousef el Alamat, Xian Mao
High-performance computing :
Jaak Simm
Bacteria&Fungi : Jaak SimmAgent-based systems : Maira
Rodriguez
Clinical data : Arnaud Installe,
Dusan Popovic, Yousef el Alamat
Annotation data : Pooya Zakeri
Biological networks : Griet
Laenen, Pooya Zakeri
Drug discovery : Griet Laenen,
Pooya Zakeri
41
![Page 42: Serious Data Mining - SIM-Flanders · iMinds Dept. MEDICAL IT KU Leuven ESAT-STADIUS Serious Data Mining Prof.Dr. Bart De Moor Bart.DeMoor@iminds.be 1](https://reader031.vdocuments.mx/reader031/viewer/2022031507/5c94dbe409d3f2a67b8baa68/html5/thumbnails/42.jpg)
Energy
Industry
Environment
Social networks
Fraud and predictive analysis
Health
42
![Page 43: Serious Data Mining - SIM-Flanders · iMinds Dept. MEDICAL IT KU Leuven ESAT-STADIUS Serious Data Mining Prof.Dr. Bart De Moor Bart.DeMoor@iminds.be 1](https://reader031.vdocuments.mx/reader031/viewer/2022031507/5c94dbe409d3f2a67b8baa68/html5/thumbnails/43.jpg)
Power grid
43
![Page 44: Serious Data Mining - SIM-Flanders · iMinds Dept. MEDICAL IT KU Leuven ESAT-STADIUS Serious Data Mining Prof.Dr. Bart De Moor Bart.DeMoor@iminds.be 1](https://reader031.vdocuments.mx/reader031/viewer/2022031507/5c94dbe409d3f2a67b8baa68/html5/thumbnails/44.jpg)
Electric load forecastingProblem
Energy Production
Energy Demand
How to forecast the demand?
44
![Page 45: Serious Data Mining - SIM-Flanders · iMinds Dept. MEDICAL IT KU Leuven ESAT-STADIUS Serious Data Mining Prof.Dr. Bart De Moor Bart.DeMoor@iminds.be 1](https://reader031.vdocuments.mx/reader031/viewer/2022031507/5c94dbe409d3f2a67b8baa68/html5/thumbnails/45.jpg)
Electric load forecasting
Data sets
Elia KMI
Transmission system operator
Hourly data
10 substationsconsidered
Meteorological Institute
Weatherconditions
45
![Page 46: Serious Data Mining - SIM-Flanders · iMinds Dept. MEDICAL IT KU Leuven ESAT-STADIUS Serious Data Mining Prof.Dr. Bart De Moor Bart.DeMoor@iminds.be 1](https://reader031.vdocuments.mx/reader031/viewer/2022031507/5c94dbe409d3f2a67b8baa68/html5/thumbnails/46.jpg)
Electric load forecasting
Seasonal Effects
What hour? What day? What week?
Weather Effects
Temperature? Sunny?
Accurate forecast
Model update: Every week!
46
![Page 47: Serious Data Mining - SIM-Flanders · iMinds Dept. MEDICAL IT KU Leuven ESAT-STADIUS Serious Data Mining Prof.Dr. Bart De Moor Bart.DeMoor@iminds.be 1](https://reader031.vdocuments.mx/reader031/viewer/2022031507/5c94dbe409d3f2a67b8baa68/html5/thumbnails/47.jpg)
47
250 transformer substationsEvery 15 min, 5 years
1 post, 1 week 1 post, four seasons
![Page 48: Serious Data Mining - SIM-Flanders · iMinds Dept. MEDICAL IT KU Leuven ESAT-STADIUS Serious Data Mining Prof.Dr. Bart De Moor Bart.DeMoor@iminds.be 1](https://reader031.vdocuments.mx/reader031/viewer/2022031507/5c94dbe409d3f2a67b8baa68/html5/thumbnails/48.jpg)
Electric load forecasting
48
![Page 49: Serious Data Mining - SIM-Flanders · iMinds Dept. MEDICAL IT KU Leuven ESAT-STADIUS Serious Data Mining Prof.Dr. Bart De Moor Bart.DeMoor@iminds.be 1](https://reader031.vdocuments.mx/reader031/viewer/2022031507/5c94dbe409d3f2a67b8baa68/html5/thumbnails/49.jpg)
Seasonalities in the load: day, week, year, holidays
49
![Page 50: Serious Data Mining - SIM-Flanders · iMinds Dept. MEDICAL IT KU Leuven ESAT-STADIUS Serious Data Mining Prof.Dr. Bart De Moor Bart.DeMoor@iminds.be 1](https://reader031.vdocuments.mx/reader031/viewer/2022031507/5c94dbe409d3f2a67b8baa68/html5/thumbnails/50.jpg)
6 posts, 1 yearSeasonalities, calender holidays !
50
![Page 51: Serious Data Mining - SIM-Flanders · iMinds Dept. MEDICAL IT KU Leuven ESAT-STADIUS Serious Data Mining Prof.Dr. Bart De Moor Bart.DeMoor@iminds.be 1](https://reader031.vdocuments.mx/reader031/viewer/2022031507/5c94dbe409d3f2a67b8baa68/html5/thumbnails/51.jpg)
51
![Page 52: Serious Data Mining - SIM-Flanders · iMinds Dept. MEDICAL IT KU Leuven ESAT-STADIUS Serious Data Mining Prof.Dr. Bart De Moor Bart.DeMoor@iminds.be 1](https://reader031.vdocuments.mx/reader031/viewer/2022031507/5c94dbe409d3f2a67b8baa68/html5/thumbnails/52.jpg)
- Seasonalities = a priori information (regress Monday on Monday !)- Normalization:
- remove effects of temperature, cloudiness,…. - remove effect of holidays – calender days (use dummies)
- Calculate ‘eigenprofiles’ = daily shape per post
52
![Page 53: Serious Data Mining - SIM-Flanders · iMinds Dept. MEDICAL IT KU Leuven ESAT-STADIUS Serious Data Mining Prof.Dr. Bart De Moor Bart.DeMoor@iminds.be 1](https://reader031.vdocuments.mx/reader031/viewer/2022031507/5c94dbe409d3f2a67b8baa68/html5/thumbnails/53.jpg)
Energy
Industry
Environment
Social networks
Fraud and predictive analysis
Health
53
![Page 54: Serious Data Mining - SIM-Flanders · iMinds Dept. MEDICAL IT KU Leuven ESAT-STADIUS Serious Data Mining Prof.Dr. Bart De Moor Bart.DeMoor@iminds.be 1](https://reader031.vdocuments.mx/reader031/viewer/2022031507/5c94dbe409d3f2a67b8baa68/html5/thumbnails/54.jpg)
Top 99 %
Bottom 1 %
steam 15.2 T/h
Top 99.75 % (99.8%)
Bottom 0.25 % (0.2 %)
steam 20 T/h
Top 99.8 % (99.8%)
Bottom 0.31 % (0.2 %)
steam 20 T/h
Setpoint
changes
Idealrank top
3 -> 2
54
![Page 55: Serious Data Mining - SIM-Flanders · iMinds Dept. MEDICAL IT KU Leuven ESAT-STADIUS Serious Data Mining Prof.Dr. Bart De Moor Bart.DeMoor@iminds.be 1](https://reader031.vdocuments.mx/reader031/viewer/2022031507/5c94dbe409d3f2a67b8baa68/html5/thumbnails/55.jpg)
Modelling for control
900
901
902
903
900
901
902
903
100 200 300 400 500 600 700 800 900 1000
900
901
902
903
No control, Quasi steady state and fast control
100
0 10 20 30 40 50 60 70 80 90 100
Histograms
No c
on
trol
Qu
asi s
tea
dy
co
ntro
lF
ast d
yn
am
ic
co
ntro
l
55
![Page 56: Serious Data Mining - SIM-Flanders · iMinds Dept. MEDICAL IT KU Leuven ESAT-STADIUS Serious Data Mining Prof.Dr. Bart De Moor Bart.DeMoor@iminds.be 1](https://reader031.vdocuments.mx/reader031/viewer/2022031507/5c94dbe409d3f2a67b8baa68/html5/thumbnails/56.jpg)
vgl,
hgl
qbg
vopw,
hopw
qman
qopw
K7 A
v1,
h1
E
vs, hs
qK7
qA qE
vs2, hs2
qs
vs3, hs3
D
qs2 qs3
vvg, hvg
qhopw
vs4, hs4
qs4 qhs
vhopw,
hhopw
qvs qD
q2
qzbopw qzb1
qgopw
qgafw
v3,
h3
qvopw
hvopw
qK18
q3 q4
vw,
hw
qK19 qK30
qbgopw
qK7lg
qgl
vbg,
hbg
qK7bg
qzb3
q7 Demer
Zw
artew
ater
Zwartebeek
Vlootgracht
Schulensmeer
Webbekom
Herk
G
ete
Beg
ijn
eb
eek
Velp
e
Leu
geb
eek
Gro
te L
eig
ra
ch
t
Ho
uw
ersb
eek
q6 q5
qK31
qzb2
v4,
h4
vzb,
hzb
qzw
q1
vlg,
hlg
qK24B
hbgopw
qK24A
vzw,
hzw
qzwopw
qhs2
v2,
h2
qh
qsa
MPC Estimator
disturbances
Model Based Predictive Control for Flood Regulation: Demer
56
![Page 57: Serious Data Mining - SIM-Flanders · iMinds Dept. MEDICAL IT KU Leuven ESAT-STADIUS Serious Data Mining Prof.Dr. Bart De Moor Bart.DeMoor@iminds.be 1](https://reader031.vdocuments.mx/reader031/viewer/2022031507/5c94dbe409d3f2a67b8baa68/html5/thumbnails/57.jpg)
57
![Page 58: Serious Data Mining - SIM-Flanders · iMinds Dept. MEDICAL IT KU Leuven ESAT-STADIUS Serious Data Mining Prof.Dr. Bart De Moor Bart.DeMoor@iminds.be 1](https://reader031.vdocuments.mx/reader031/viewer/2022031507/5c94dbe409d3f2a67b8baa68/html5/thumbnails/58.jpg)
Energy
Industry
Environment
Social networks
Fraud and predictive analysis
Health
58
![Page 59: Serious Data Mining - SIM-Flanders · iMinds Dept. MEDICAL IT KU Leuven ESAT-STADIUS Serious Data Mining Prof.Dr. Bart De Moor Bart.DeMoor@iminds.be 1](https://reader031.vdocuments.mx/reader031/viewer/2022031507/5c94dbe409d3f2a67b8baa68/html5/thumbnails/59.jpg)
Data assimilation is the common name given to several numerical techniques thatcombine the outputs of a numerical model with observational data in order toimprove the quality of the model predictions.
Some data assimilation techniques: 3DVAR, 4DVAR, Ensemble Kalman Filter (EnKF)and its variants, Optimal Interpolation (OI), particle filters, etc.
Measurements
Numerical model
( 1) ( ), ( ) ( )k k k k x f x u Gw
Data assimilation techniques
EnKFDEnKF
ETKF
Parameters of the algorithms
Better estimation of
x(k)OI
Data Assimilation
59
![Page 60: Serious Data Mining - SIM-Flanders · iMinds Dept. MEDICAL IT KU Leuven ESAT-STADIUS Serious Data Mining Prof.Dr. Bart De Moor Bart.DeMoor@iminds.be 1](https://reader031.vdocuments.mx/reader031/viewer/2022031507/5c94dbe409d3f2a67b8baa68/html5/thumbnails/60.jpg)
0 50 100 150 200 250 3000
20
40
60
80
100
120
140
Concentr
ation [
ug/m
3]
Time [Hours]
Measurements
Aurora (Model)
OI
DEnKF
O3 air-quality stationsAverage of the O3 concentration over
the validation stations
The Deterministic Ensemble Kalman Filter (DEnKF) and the OI technique have beenused to improve the O3 estimates of the Air-quality model AURORA.
- Validation stations
- Assimilation stations Starting date: May 28th, 2005 at midnight
3 E 4 E 5 E 6 E
50 N
51 N
2
46
40
73
22
784
1667
56
19
Data Assimilation
60
![Page 61: Serious Data Mining - SIM-Flanders · iMinds Dept. MEDICAL IT KU Leuven ESAT-STADIUS Serious Data Mining Prof.Dr. Bart De Moor Bart.DeMoor@iminds.be 1](https://reader031.vdocuments.mx/reader031/viewer/2022031507/5c94dbe409d3f2a67b8baa68/html5/thumbnails/61.jpg)
o - Validation station
x - Assimilation station
3/g m
Optimal Interpolation
3/g m
DEnKF
3/g m
Free-run of Aurora
3/g m
Average of the O3 concentration field over the 14 day period
61
![Page 62: Serious Data Mining - SIM-Flanders · iMinds Dept. MEDICAL IT KU Leuven ESAT-STADIUS Serious Data Mining Prof.Dr. Bart De Moor Bart.DeMoor@iminds.be 1](https://reader031.vdocuments.mx/reader031/viewer/2022031507/5c94dbe409d3f2a67b8baa68/html5/thumbnails/62.jpg)
PM10 air-quality stationsAverage of the PM10 concentration over
the validation stations
The Deterministic Ensemble Kalman Filter (DEnKF) and the OI technique have beenused to improve the PM10 estimates of the Air-quality model AURORA.
- Validation stations
- Assimilation stations Starting date: January 20th, 2010 at midnight
3 E 4 E 5 E 6 E
50 N
51 N 3
11
13
18
0 50 100 150 200 2500
20
40
60
80
100
120
140
Concentr
ation [
ug/m
3]
Time [Hours]
Measurements
Aurora
DEnKF
OI
Data Assimilation
62
![Page 63: Serious Data Mining - SIM-Flanders · iMinds Dept. MEDICAL IT KU Leuven ESAT-STADIUS Serious Data Mining Prof.Dr. Bart De Moor Bart.DeMoor@iminds.be 1](https://reader031.vdocuments.mx/reader031/viewer/2022031507/5c94dbe409d3f2a67b8baa68/html5/thumbnails/63.jpg)
Average of the PM10 concentration field
o - Validation station
x - Assimilation station
Optimal Interpolation
3/g m
DEnKF
3/g m
Free-run of Aurora
3/g m
63
![Page 64: Serious Data Mining - SIM-Flanders · iMinds Dept. MEDICAL IT KU Leuven ESAT-STADIUS Serious Data Mining Prof.Dr. Bart De Moor Bart.DeMoor@iminds.be 1](https://reader031.vdocuments.mx/reader031/viewer/2022031507/5c94dbe409d3f2a67b8baa68/html5/thumbnails/64.jpg)
Average of the PM10 concentration field
3/g m 3/g m64
![Page 65: Serious Data Mining - SIM-Flanders · iMinds Dept. MEDICAL IT KU Leuven ESAT-STADIUS Serious Data Mining Prof.Dr. Bart De Moor Bart.DeMoor@iminds.be 1](https://reader031.vdocuments.mx/reader031/viewer/2022031507/5c94dbe409d3f2a67b8baa68/html5/thumbnails/65.jpg)
Energy
Industry
Environment
Social networks
Fraud and predictive analysis
Health
65
![Page 66: Serious Data Mining - SIM-Flanders · iMinds Dept. MEDICAL IT KU Leuven ESAT-STADIUS Serious Data Mining Prof.Dr. Bart De Moor Bart.DeMoor@iminds.be 1](https://reader031.vdocuments.mx/reader031/viewer/2022031507/5c94dbe409d3f2a67b8baa68/html5/thumbnails/66.jpg)
Journal Clustering
Find all aboutspecific topic?
66
![Page 67: Serious Data Mining - SIM-Flanders · iMinds Dept. MEDICAL IT KU Leuven ESAT-STADIUS Serious Data Mining Prof.Dr. Bart De Moor Bart.DeMoor@iminds.be 1](https://reader031.vdocuments.mx/reader031/viewer/2022031507/5c94dbe409d3f2a67b8baa68/html5/thumbnails/67.jpg)
Journal Clustering
67
![Page 68: Serious Data Mining - SIM-Flanders · iMinds Dept. MEDICAL IT KU Leuven ESAT-STADIUS Serious Data Mining Prof.Dr. Bart De Moor Bart.DeMoor@iminds.be 1](https://reader031.vdocuments.mx/reader031/viewer/2022031507/5c94dbe409d3f2a67b8baa68/html5/thumbnails/68.jpg)
Journal Clustering
68
![Page 69: Serious Data Mining - SIM-Flanders · iMinds Dept. MEDICAL IT KU Leuven ESAT-STADIUS Serious Data Mining Prof.Dr. Bart De Moor Bart.DeMoor@iminds.be 1](https://reader031.vdocuments.mx/reader031/viewer/2022031507/5c94dbe409d3f2a67b8baa68/html5/thumbnails/69.jpg)
Journal Clustering
69
![Page 70: Serious Data Mining - SIM-Flanders · iMinds Dept. MEDICAL IT KU Leuven ESAT-STADIUS Serious Data Mining Prof.Dr. Bart De Moor Bart.DeMoor@iminds.be 1](https://reader031.vdocuments.mx/reader031/viewer/2022031507/5c94dbe409d3f2a67b8baa68/html5/thumbnails/70.jpg)
Author Collaboration Clustering
70
![Page 71: Serious Data Mining - SIM-Flanders · iMinds Dept. MEDICAL IT KU Leuven ESAT-STADIUS Serious Data Mining Prof.Dr. Bart De Moor Bart.DeMoor@iminds.be 1](https://reader031.vdocuments.mx/reader031/viewer/2022031507/5c94dbe409d3f2a67b8baa68/html5/thumbnails/71.jpg)
Author Collaboration Clustering
71
![Page 72: Serious Data Mining - SIM-Flanders · iMinds Dept. MEDICAL IT KU Leuven ESAT-STADIUS Serious Data Mining Prof.Dr. Bart De Moor Bart.DeMoor@iminds.be 1](https://reader031.vdocuments.mx/reader031/viewer/2022031507/5c94dbe409d3f2a67b8baa68/html5/thumbnails/72.jpg)
Golub around the world commemoration February 29 2008
72
![Page 73: Serious Data Mining - SIM-Flanders · iMinds Dept. MEDICAL IT KU Leuven ESAT-STADIUS Serious Data Mining Prof.Dr. Bart De Moor Bart.DeMoor@iminds.be 1](https://reader031.vdocuments.mx/reader031/viewer/2022031507/5c94dbe409d3f2a67b8baa68/html5/thumbnails/73.jpg)
138 seed papers + all cited and citing publications
Result: 4943 nodes, 6216 edges
Link based clustering identifiestopically homogeneous clusters.
13 papers are writtenby another L. Ljung.
Web of Science based literature network for Lennart Ljung
73
![Page 74: Serious Data Mining - SIM-Flanders · iMinds Dept. MEDICAL IT KU Leuven ESAT-STADIUS Serious Data Mining Prof.Dr. Bart De Moor Bart.DeMoor@iminds.be 1](https://reader031.vdocuments.mx/reader031/viewer/2022031507/5c94dbe409d3f2a67b8baa68/html5/thumbnails/74.jpg)
“Bibliometrics”
“Patent analysis” “Information retrieval”
“Social aspects”
“Webometrics”
Community detection
74
![Page 75: Serious Data Mining - SIM-Flanders · iMinds Dept. MEDICAL IT KU Leuven ESAT-STADIUS Serious Data Mining Prof.Dr. Bart De Moor Bart.DeMoor@iminds.be 1](https://reader031.vdocuments.mx/reader031/viewer/2022031507/5c94dbe409d3f2a67b8baa68/html5/thumbnails/75.jpg)
Energy
Industry
Environment
Social networks
Fraud and predictive analysis
Health
75
![Page 76: Serious Data Mining - SIM-Flanders · iMinds Dept. MEDICAL IT KU Leuven ESAT-STADIUS Serious Data Mining Prof.Dr. Bart De Moor Bart.DeMoor@iminds.be 1](https://reader031.vdocuments.mx/reader031/viewer/2022031507/5c94dbe409d3f2a67b8baa68/html5/thumbnails/76.jpg)
Customer Intelligence
76
![Page 77: Serious Data Mining - SIM-Flanders · iMinds Dept. MEDICAL IT KU Leuven ESAT-STADIUS Serious Data Mining Prof.Dr. Bart De Moor Bart.DeMoor@iminds.be 1](https://reader031.vdocuments.mx/reader031/viewer/2022031507/5c94dbe409d3f2a67b8baa68/html5/thumbnails/77.jpg)
Customer Intelligence
Score & Target
Analyze & Model
Measure response
Contact 77
![Page 78: Serious Data Mining - SIM-Flanders · iMinds Dept. MEDICAL IT KU Leuven ESAT-STADIUS Serious Data Mining Prof.Dr. Bart De Moor Bart.DeMoor@iminds.be 1](https://reader031.vdocuments.mx/reader031/viewer/2022031507/5c94dbe409d3f2a67b8baa68/html5/thumbnails/78.jpg)
Short
Duration Long
Duration High
Frequency International Same
Destination Off
Peak Call
Forwarding Behaviour
Change Direct call
selling X X X X PABX fraud
X X X X X Freephone
fraud X X X X Premium
rate fraud X X X X Subscription
fraud X Handset
theft X X X X X
Call frequency
Average call duration
Fraud detection on mobile phone network
78
![Page 79: Serious Data Mining - SIM-Flanders · iMinds Dept. MEDICAL IT KU Leuven ESAT-STADIUS Serious Data Mining Prof.Dr. Bart De Moor Bart.DeMoor@iminds.be 1](https://reader031.vdocuments.mx/reader031/viewer/2022031507/5c94dbe409d3f2a67b8baa68/html5/thumbnails/79.jpg)
Energy
Industry
Environment
Social networks
Fraud and predictive analysis
Health
79
![Page 80: Serious Data Mining - SIM-Flanders · iMinds Dept. MEDICAL IT KU Leuven ESAT-STADIUS Serious Data Mining Prof.Dr. Bart De Moor Bart.DeMoor@iminds.be 1](https://reader031.vdocuments.mx/reader031/viewer/2022031507/5c94dbe409d3f2a67b8baa68/html5/thumbnails/80.jpg)
Participatory
IOTA app:population based assessment of ovariantumour malignancy:
IOTA app available in iTunes app store and on http://homes.esat.kuleuven.be/~sistawww/biomed/iota/
Leuven
Malmö
Monza
London
Maurepas Paris
RomeNapels
Milan
# patients: 1066 + 1938
Lund
Lublin
Genk
Ontario, Canada
Bologna
Sardinia
Beijing, China
80
![Page 81: Serious Data Mining - SIM-Flanders · iMinds Dept. MEDICAL IT KU Leuven ESAT-STADIUS Serious Data Mining Prof.Dr. Bart De Moor Bart.DeMoor@iminds.be 1](https://reader031.vdocuments.mx/reader031/viewer/2022031507/5c94dbe409d3f2a67b8baa68/html5/thumbnails/81.jpg)
Performance
Performance of an expert Performance of IOTA models
Performance of old models
Performance of non-experts
You share, we care ! 81
![Page 82: Serious Data Mining - SIM-Flanders · iMinds Dept. MEDICAL IT KU Leuven ESAT-STADIUS Serious Data Mining Prof.Dr. Bart De Moor Bart.DeMoor@iminds.be 1](https://reader031.vdocuments.mx/reader031/viewer/2022031507/5c94dbe409d3f2a67b8baa68/html5/thumbnails/82.jpg)
© Armstrong SA et al. Nat Genet. 2002 Jan;30(1):41-7. 12 600 genes 72 patients
- 28 Acute Lymphoblastic Leukemia (ALL)- 24 Acute Myeloid Leukemia (AML)- 20 Mixed Linkage Leukemia (MLL)
Genomic markers for Leukemia
82
![Page 83: Serious Data Mining - SIM-Flanders · iMinds Dept. MEDICAL IT KU Leuven ESAT-STADIUS Serious Data Mining Prof.Dr. Bart De Moor Bart.DeMoor@iminds.be 1](https://reader031.vdocuments.mx/reader031/viewer/2022031507/5c94dbe409d3f2a67b8baa68/html5/thumbnails/83.jpg)
High-throughputgenomics
Data analysis Candidate genes
Information sources
Candidate prioritization
Validation
Endeavour: Aerts et al., Nature Biotechnology, 2006
Genomic Data Fusion
83
![Page 84: Serious Data Mining - SIM-Flanders · iMinds Dept. MEDICAL IT KU Leuven ESAT-STADIUS Serious Data Mining Prof.Dr. Bart De Moor Bart.DeMoor@iminds.be 1](https://reader031.vdocuments.mx/reader031/viewer/2022031507/5c94dbe409d3f2a67b8baa68/html5/thumbnails/84.jpg)
84
![Page 85: Serious Data Mining - SIM-Flanders · iMinds Dept. MEDICAL IT KU Leuven ESAT-STADIUS Serious Data Mining Prof.Dr. Bart De Moor Bart.DeMoor@iminds.be 1](https://reader031.vdocuments.mx/reader031/viewer/2022031507/5c94dbe409d3f2a67b8baa68/html5/thumbnails/85.jpg)
85
![Page 86: Serious Data Mining - SIM-Flanders · iMinds Dept. MEDICAL IT KU Leuven ESAT-STADIUS Serious Data Mining Prof.Dr. Bart De Moor Bart.DeMoor@iminds.be 1](https://reader031.vdocuments.mx/reader031/viewer/2022031507/5c94dbe409d3f2a67b8baa68/html5/thumbnails/86.jpg)
86