International Master's Degree Programme in
Computational Big Data Analytics
CBDA
Data analytics is not just numbers in a table or graphs on a paper!
Data analytics is a means for society, industry, and science to control uncertainty and to make discoveries!
Costello et al. A community effort to assess and improve drug sensitivity prediction algorithms. Nature Biotechnology, 2014.
A. Ilin, H. Valpola and E. Oja. Exploratory Analysis of Climate Data Using Source Separation Methods. Neural Networks, 19(2):155-167, 2006.
The spatial patterns of the four leading interannual components extracted from climate data.
José Caldas, Nils Gehlenborg, Ali Faisal, Alvis Brazma, and Samuel Kaski. Probabilistic retrieval and visualization of biologically relevant microarray experiments. Bioinformatics, 25:i145–i153, 2009.
Jaakko Peltonen and Samuel Kaski. Generative Modeling for Maximizing Precision and Recall in Information Visualization. In Geoffrey Gordon, David Dunson, and Miroslav Dudik, eds., Proceedings of AISTATS 2011, the 14th International Conference on Artificial Intelligence and Statistics. JMLR W&CP, vol. 15, 2011.
Big data for big business - analytics are no longer optional (The Globe and Mail, August 2015)
Lawyers Are Turning to Big Data Analysis (The National Law Journal, July 2015)
Why big data isn't always the answer (ComputerWorld, August 2015)
Intel Unveils Analytics Technologies for Big Data, IoT (eWeek, August 2015)
Making Sense of Our Big Data World: Statistics for the 99% (Business 2 Community, August 2015)
Put big data to work with Cortana Analytics (TechRepublic, July 2015)
How the age of Big Data made statistics the hottest job around (Canadian Business, April 2015)
'Big data' useful but caution is still needed (Daily Record, August 2015)
How To Identify A Good/Bad Data Scientist In A Job Interview? (LinkedIn, August 2015)
What can big data do for small startups? (VentureBurn, August 2015)
Growth in big data draws women to statistics (FWC.com, February 2015)
Data Scientist: The Sexiest Job of the 21st Century (Harvard Business Review, October 2012)
Why your kids will want to be data scientists (CNBC, June 2014)
The roots of statistics are in probability theory, which began from the investigation of games of chance.
Statistics in CBDA
Statistics in the CBDA programme:
Large data sets include many kinds of variation. Expertise is needed to go from mere measurements to models and understanding.
It is hard to judge by eye alone which of the possible trends are ”real” and which are only coincidences. Computers can search for possible trends among large sets of alternatives, but they need to be told how to evaluate the goodness of the findings.
Statistics studies in CBDA tell:
● what kinds of statistical structure and trends to look for
● how to measure whether they are ”real”
● which tools and methods to use to find them and to present the results
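One common way to measure whether an apparent difference is ”real” is a permutation test. The sketch below is an illustration only, not course material: the function name, the made-up measurements, and the choice of the mean difference as the statistic are all assumptions. It shuffles the group labels many times and counts how often chance alone produces a difference at least as large as the observed one.

```python
import random

def permutation_p_value(group_a, group_b, n_permutations=10_000, seed=0):
    """Two-sided permutation test for a difference in group means."""
    rng = random.Random(seed)
    n_a = len(group_a)
    observed = abs(sum(group_a) / n_a - sum(group_b) / len(group_b))
    pooled = list(group_a) + list(group_b)
    n_b = len(pooled) - n_a
    hits = 0
    for _ in range(n_permutations):
        # Relabel the measurements at random: under "no real difference"
        # every split of the pooled data is equally likely.
        rng.shuffle(pooled)
        diff = abs(sum(pooled[:n_a]) / n_a - sum(pooled[n_a:]) / n_b)
        # Small tolerance guards against floating-point summation order.
        if diff >= observed - 1e-12:
            hits += 1
    # Add-one correction keeps the estimate strictly above zero.
    return (hits + 1) / (n_permutations + 1)

# Made-up measurements, for illustration only.
clear_shift = permutation_p_value([5.1, 5.3, 5.0, 5.2, 5.4],
                                  [4.1, 4.0, 4.3, 4.2, 3.9])
no_shift = permutation_p_value([5.1, 4.0, 5.0, 4.2, 5.4],
                                [5.3, 4.1, 5.2, 4.3, 3.9])
print(clear_shift, no_shift)
```

A small value suggests that the difference is unlikely to be a coincidence; a large value means that random relabelling produces similar differences easily.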
Statistics is versatile data analysis, including management of chance and variation, extraction of information from data, and modeling.
Statistics has a close connection to data mining and machine learning; in CBDA this connection becomes strongly visible.
An important modern trend is computational statistics, where interesting nonlinear characteristics are sought from data sets, and complicated models are solved e.g. by advanced and distributed optimization and computation methods. CBDA teaching in statistics and computer science enables you to use computational statistics.
Our teaching familiarizes you with the central theory, the most important methods of data acquisition and analysis, and how to apply these in a computer-based fashion.
Distributions, prediction, hypothesis testing, time series analysis, multivariate methods, information visualization, learning from multiple sources...
Poor use of measurements and statistics can lead to false and misleading conclusions
”The numbers have taken over. Numbers lie and are misused. They are used to prove just anything. People believe in numbers even if they have been computed incorrectly.”
”The amount of random chance is too large” (discussion of conclusions of a research study)
Oakland A's GM Billy Beane is handicapped with the lowest salary constraint in baseball. If he ever wants to win the World Series, Billy must find a competitive advantage. Billy is about to turn baseball on its ear when he uses statistical data to analyze and place value on the players he picks for the team.
"geek-stats book turned into a movie with a lot of heart"
"persuasively exposed front office tension between ... old school "eye-balling" of players and newer models of data-driven statistical analysis"
Texts from IMDB, Wikipedia
Thomas Bayes, b. 1702
Pierre-Simon Laplace, b. 1749
Blaise Pascal, b. 1623
Carl Friedrich Gauss, b. 1777
Karl Pearson, b. 1857
Ronald Fisher, b. 1890
Peter Hall, University of Melbourne
George Box, University of Wisconsin Madison
Bradley Efron, Stanford University
Robert Tibshirani, Stanford University
Trevor Hastie, Stanford University
Jianqing Fan, Princeton University
Peter J. Bickel, University of California Berkeley
James Stephen Marron, University of North Carolina Chapel Hill
Donald Rubin, Harvard University
Erich Leo Lehmann, University of California Berkeley
Raymond Carroll, Texas A&M University
Theodore W. Anderson, Stanford University
Kanti V. Mardia, University of Leeds
Dan-Yu Lin, University of North Carolina Chapel Hill
David Donoho, Stanford University
James Berger, Duke University
Gareth O. Roberts, University of Warwick
David O. Siegmund, Stanford University
John W. Tukey, Princeton University
Enno Mammen, University of Mannheim
David Ruppert, Moscow State Pedagogical University
Ingram Olkin, Stanford University
David A. Freedman, University of California Berkeley
Ole Barndorff-Nielsen, Aarhus University
Alan Gelfand, Duke University
Wolfgang Karl Härdle, Humboldt University of Berlin
Michael B. Woodroofe, University of Michigan
Joseph G. Ibrahim, University of North Carolina Chapel Hill
George Casella, University of Florida
Hans-Georg Muller, University of California Davis
Andrew Gelman, Columbia University
Peter Buhlmann, ETH Zurich
Alexandre Tsybakov, CREST & Universite Paris VI
Peter J. Rousseeuw, University of Antwerp
Hira Lal Koul, Michigan State University
Peter Diggle, Lancaster University
Iain M. Johnstone, Stanford University
Bernard W. Silverman, University of Oxford
Jerome H. Friedman, The MITRE Corporation
Harvey Goldstein, University of Bristol
Holger Dette, Ruhr University Bochum
David B. Dunson, Duke University
Hirotugu Akaike, Institute of Statistical Mathematics
Christian P. Robert, Paris Dauphine University
Jon A. Wellner, University of Washington
Alan Agresti, University of Florida
Irene Gijbels, Catholic University of Leuven
Stephen L. Portnoy, University of Illinois Urbana-Champaign
Norman R. Draper, University of Wisconsin Madison
Noel Cressie, Ohio State University
Paul Rosenbaum, University of Pennsylvania
Nancy Reid, University of Toronto
Marc Hallin, Universite Libre de Bruxelles
Marc Yor, Pierre and Marie Curie University
Bruce Lindsay, Pennsylvania State University
Murad Taqqu, Boston University
William E. Strawderman, Rutgers, the State University of New Jersey
Persi Diaconis, Stanford University
Luc Devroye, McGill University
Leo Breiman, University of California Berkeley
Adrian Raftery, University of Washington
Ricardo Fraiman, Universidad de San Andres Buenos Aires
Peter M. Robinson, London School of Economics and Political Science
Richard David Gill, Leiden University
You, University of Tampere
A data scientist, combining expertise in statistics and computer science, will work in cooperation with experts from other fields.
Application areas:
● Technology and natural sciences (technometrics, chemometrics)
● Biology (biometrics, see e.g. http://www.uta.fi/hes/tutkimus/tutkimusryhmat/Biometria.html)
● Medicine (epidemiology)
● Economics (econometrics)
● Social and behavioral sciences (demometrics, psychometrics)
Jobs for Data Analytics Experts (Data Scientists)
See also (in Finnish) http://www.uta.fi/opiskelu/selvitykset/matematiikka_tilastotiede_sijoittuminen.pdf
Optional studies can influence which field the student ends up in.
An example of statistics jobs (in Finnish): http://www.luonnontieteet.fi/tyo/tilastotiede
Examples of statistics jobs and employers (in Finnish): http://www.uta.fi/rekrytointi/opiskelijalle_ja_tyonhakijalle/uraseuranta/oppiainekoosteet/tilastotiede.html
The career and recruitment services of the University of Tampere (http://www.uta.fi/rekrytointi) monitor the placement of graduates in working life: http://www.uta.fi/opiskelu/tyoelama/seurannat/index.html
A slightly old report on 2011 master's degree graduates: http://www.uta.fi/opiskelu/tyoelama/seurannat/maisterit/index/sijoittumisseuranta%202011.pdf (one year after graduation, all statistics students were in permanent or temporary jobs or working as researchers funded by grants)
Tales from students of mathematics and statistics about studies and placement in working life: http://www.uta.fi/opiskelu/selvitykset/matematiikka_tilastotiede_sijoittuminen.pdf
“researcher in government research institute”, “mathematician in a government agency research unit”, “head of quality control in an industrial company”, “Data Mining analyst”
Jobs for Data Analytics Experts (Data Scientists), information from graduates
Master's programme in Computational Big Data Analytics (CBDA)
General Studies in Master's Degree Programmes given in English 2015-18, 1–22 ECTS
General studies in the Master's degree programmes given in English are different depending on the student's educational background. Please choose below only one of the three options A, B or C.
A) General studies for international students 12–22 cr
Compulsory studies 12 cr
● SISYY006 Orientation, 2 cr
● SISYY005 Study Skills and Personal Study Planning, 2 cr
● KKENMP3 Scientific Writing, 5 cr
● KKSU1 Finnish Elementary Course 1, 3 cr
Free-choice studies 0–10 cr
● YKYYKV1 Finnish Society and Culture, 3–5 cr
● YKYYV07 Introduction to Science and Research, 2–5 cr

B) General studies for students with education in Finnish and BSc degree taken outside SIS 9–18 cr
Compulsory studies 9–13 cr
Swedish course is required only if no Swedish studies were taken in the Bachelor's degree.
● SISYY006 Orientation, 2 cr
● SISYY005 Study Skills and Personal Study Planning, 2 cr
● KKENMP3 Scientific Writing, 5 cr
● KKRULUK Ruotsin kielen kirjallinen ja suullinen viestintä (Written and Oral Communication in Swedish), 4 cr
Free-choice studies 0–5 cr
● YKYYV07 Introduction to Science and Research, 2–5 cr

C) General studies for students who have taken their BSc degree at SIS 1–11 cr
Compulsory studies 1 cr
Basics of Information Literacy 1 cr is not required, only Personal study planning 1 cr from SISYY005.
● SISYY005 Study Skills and Personal Study Planning, 2 cr
Free-choice studies 0–10 cr
Scientific Writing is recommended if the Master's thesis is written in English.
● KKENMP3 Scientific Writing, 5 cr
● YKYYV07 Introduction to Science and Research, 2–5 cr
Master's programme in Computational Big Data Analytics (CBDA)
Advanced Studies in Big Data Analytics 85 cr

Compulsory Advanced Courses in Big Data Analytics 50 cr
● MTTTS11 Master's Seminar and Thesis, 40 cr
● MTTTS12 Introduction to Bayesian Analysis 1, 5 cr
● TIETS01 Algorithms, 5 cr

Advanced Courses in Methods of Computational Data-Analytics 15– cr
● TIETS07 Neurocomputing, 5 cr
● TIETS11 Data Mining, 5 cr
● TIETS31 Knowledge Discovery, 5–10 cr
● TIETS39 Machine Learning Algorithms, 5 cr
● TIETS33 Advanced Course in Computer Science, 1–10 cr

Advanced Courses in Methods of Statistical Data-Analytics 20– cr
● MTTTS13 Introduction to Bayesian Analysis 2, 5 cr
● MTTTS14 Statistical Modeling 1, 5 cr
● MTTTS15 Statistical Modeling 2, 5 cr
● MTTTS16 Learning from Multiple Sources, 5 cr
● MTTTS17 Dimensionality Reduction and Visualization, 5 cr
● MTTTS18 Time Series Analysis 1, 5 cr
● MTTTS19 Advanced Regression Methods, 5 cr
● MTTTS21 Statistical Inference 2, 5 cr
● MTTS1 Other course (advanced)
Master's programme in Computational Big Data Analytics (CBDA)
Other and optional Studies in Big Data Analytics Programme 13–29 cr

Compulsory Introductory Studies 5 cr
● TIETA17 Introduction to Big Data Processing, 5 cr

Complementing Studies
Complementing studies determined based on previous education

Optional Studies
Recommended studies in Applications of Data-Analytics
● TIETS05 Digital Image Processing, 5 cr
● MTTTS20 Basics of Financial Data-Analysis and Risk Theory, 5 cr
● ITIS13 Information retrieval methods, 5 cr
● ITIS16 Information practices literature, 5–20 cr
● MTTA3 Internship, 2–10 cr
CBDA Courses Fall 2015
I: Introduction to Bayesian Analysis 1
Prior and posterior distributions, Bayes estimators, posterior predictive distribution, interval estimation and hypothesis testing, single-parameter models, simple multiparameter models.

I-II: Learning from Multiple Sources
Data fusion, transfer learning, multitask learning, multiview learning, and learning under covariate shift.

II: Time Series Analysis 1
Simple time series models, stationary time series models (ARMA), nonstationary and seasonal time series models (SARIMA), time series regression, periodogram.

(Master's thesis and seminar run every fall and spring.)

I: Introduction to Big Data Processing
Typical characteristics and common applications of big data; basics of distributed file systems, databases and computing; practical data processing skills with MapReduce / Apache Hadoop.

I-IV: Information practices literature
Literature package on one of: Information practices; Information retrieval systems; Interactive information retrieval; Task-based information retrieval.

I-II: Knowledge Discovery
Phases of the process of knowledge discovery and its nature; basic data preprocessing, data mining and postprocessing tasks and methods; application in practical knowledge discovery tasks; advanced methods in knowledge discovery; data management issues.
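The MapReduce model named under Introduction to Big Data Processing can be illustrated with a word count. The sketch below is a single-process imitation in plain Python and does not use the Apache Hadoop API. The function names and the two sample documents are assumptions made for illustration. In a real cluster the map and reduce phases would run in parallel on different machines.

```python
from collections import defaultdict

def map_phase(document):
    """Map: emit a (word, 1) pair for every word in one document."""
    return [(word.lower(), 1) for word in document.split()]

def shuffle_phase(pairs):
    """Shuffle: group all emitted values by their key."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped

def reduce_phase(grouped):
    """Reduce: combine the values of each key into a single count."""
    return {key: sum(values) for key, values in grouped.items()}

def word_count(documents):
    # Each document is mapped independently, which is what makes
    # the map phase easy to distribute.
    pairs = [pair for doc in documents for pair in map_phase(doc)]
    return reduce_phase(shuffle_phase(pairs))

counts = word_count(["big data analytics", "Big data processing"])
print(counts)
```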
CBDA Courses Spring 2016
III: Introduction to Bayesian Analysis 2
Markov chains, MCMC methods, model checking and comparison, commonly used statistical models, such as hierarchical and regression models, binomial and count data models.

III-IV: Dimensionality Reduction and Visualization
Properties of high-dimensional data; feature selection; linear feature extraction; graphical excellence; human perception; nonlinear dimensionality reduction; neighbor embedding methods; graph visualization.

IV: Statistical Inference 2
Roles of modeling in statistical inference; principles of data reduction; estimation: risk, loss of estimators, ... large sample properties; likelihood-based methods, likelihood-based tests and confidence regions.

III: Data Mining
Premises, objectives, relevance, and basic methods of data mining; properties of data and measurements, preprocessing methods, some data mining algorithms and their applications, for instance, for classification and prediction of data.

I-IV: Information practices literature
Literature package on one of: Information practices; Information retrieval systems; Interactive information retrieval; Task-based information retrieval.

IV: Machine Learning Algorithms
Basic and advanced machine learning methods for data mining, pattern recognition and other problems.
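As a small illustration of the MCMC methods listed under Introduction to Bayesian Analysis 2, here is a random-walk Metropolis sampler. It is a minimal sketch and not the course's own implementation: the function names, the step size, the burn-in length, and the standard normal target are assumptions chosen to keep the example short.

```python
import math
import random

def metropolis(log_density, start, n_samples, step=1.0, seed=0):
    """Random-walk Metropolis sampler for a one-dimensional target."""
    rng = random.Random(seed)
    current = start
    current_lp = log_density(current)
    samples = []
    for _ in range(n_samples):
        # Propose a move from a symmetric Gaussian around the current state.
        proposal = current + rng.gauss(0.0, step)
        proposal_lp = log_density(proposal)
        # Accept with probability min(1, target(proposal) / target(current)).
        accept_prob = math.exp(min(0.0, proposal_lp - current_lp))
        if rng.random() < accept_prob:
            current, current_lp = proposal, proposal_lp
        samples.append(current)
    return samples

def standard_normal_log_density(x):
    # Log density up to an additive constant, which Metropolis does not need.
    return -0.5 * x * x

draws = metropolis(standard_normal_log_density, start=3.0, n_samples=20_000)
kept = draws[2_000:]  # discard burn-in
sample_mean = sum(kept) / len(kept)
sample_var = sum((x - sample_mean) ** 2 for x in kept) / (len(kept) - 1)
print(sample_mean, sample_var)
```

For this target the sample mean should be close to 0 and the sample variance close to 1, which gives a simple check that the chain has reached the right distribution.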
CBDA Statistics Courses
Fall 2016 (preliminary!)

I: Introduction to Bayesian Analysis 1
Prior and posterior distributions, Bayes estimators, posterior predictive distribution, interval estimation and hypothesis testing, single-parameter models, simple multiparameter models.

I-II: Learning from Multiple Sources
Data fusion, transfer learning, multitask learning, multiview learning, and learning under covariate shift.

II: Possibly ”Basics of financial data analysis and risk theory 5 cr”, or another course

Spring 2017 (preliminary!)

III: Statistical Modeling 1
Multinomial and ordinal regression, nonlinear regression, parametric survival analysis, counting process models, semiparametric hazard models.

III-IV: Dimensionality Reduction and Visualization
Properties of high-dimensional data; feature selection; linear feature extraction; graphical excellence; human perception; nonlinear dimensionality reduction; neighbor embedding methods; graph visualization.

IV: Statistical Modeling 2
Normal mixed model and extensions, growth curve models, models for panel discrete (binary, count, categorical) observations, analysis of missing data, mixture or latent class regression, hierarchical and latent structure models.
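The prior and posterior distributions listed under Introduction to Bayesian Analysis 1 can be illustrated with the simplest single-parameter case, a Beta prior on a success probability with binomial data. The sketch below assumes a uniform Beta(1, 1) prior and made-up counts (7 successes, 3 failures); the function names are hypothetical.

```python
def beta_binomial_update(prior_a, prior_b, successes, failures):
    """Conjugate update: Beta(a, b) prior -> Beta(a + s, b + f) posterior."""
    return prior_a + successes, prior_b + failures

def beta_mean(a, b):
    """Mean of a Beta(a, b) distribution."""
    return a / (a + b)

# Uniform prior and made-up data, for illustration only.
post_a, post_b = beta_binomial_update(1, 1, successes=7, failures=3)

# The posterior mean is the Bayes estimator under squared-error loss, and
# it is also the posterior predictive probability that the next trial succeeds.
posterior_mean = beta_mean(post_a, post_b)
print(post_a, post_b, posterior_mean)
```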