
Upload: vuminh

Post on 11-May-2018


International Master's Degree Programme in

Computational Big Data Analytics


CBDA

Make BIG SENSE out of BIG DATA

Data analytics is not just numbers in a table or graphs on paper!

Data analytics is a means for society, industry, and science to control uncertainty and to make discoveries!

Costello et al. A community effort to assess and improve drug sensitivity prediction algorithms. Nature Biotechnology, 2014.

Source: Morningstar Stock Report, morningstar.fi

A. Ilin, H. Valpola and E. Oja. Exploratory Analysis of Climate Data Using Source Separation Methods. Neural Networks, 19(2):155-167, 2006.

The spatial patterns of the four leading interannual components extracted from climate data.


José Caldas, Nils Gehlenborg, Ali Faisal, Alvis Brazma, and Samuel Kaski. Probabilistic retrieval and visualization of biologically relevant microarray experiments. Bioinformatics, 25:i145–i153, 2009.

Jaakko Peltonen and Samuel Kaski. Generative Modeling for Maximizing Precision and Recall in Information Visualization. In Geoffrey Gordon, David Dunson, and Miroslav Dudik, eds., Proceedings of AISTATS 2011, the 14th International Conference on Artificial Intelligence and Statistics. JMLR W&CP, vol. 15, 2011.

Big data for big business - analytics are no longer optional (The Globe and Mail, August 2015)

Lawyers Are Turning to Big Data Analysis (The National Law Journal, July 2015)

Why big data isn't always the answer (ComputerWorld, August 2015)

Intel Unveils Analytics Technologies for Big Data, IoT (eWeek, August 2015)

Making Sense of Our Big Data World: Statistics for the 99% (Business 2 Community, August 2015)

Put big data to work with Cortana Analytics (TechRepublic, July 2015)

How the age of Big Data made statistics the hottest job around (Canadian Business, April 2015)

'Big data' useful but caution is still needed (Daily Record, August 2015)

How To Identify A Good/Bad Data Scientist In A Job Interview? (LinkedIn, August 2015)

What can big data do for small startups? (VentureBurn, August 2015)

Growth in big data draws women to statistics (FWC.com, February 2015)

Data Scientist: The Sexiest Job of the 21st Century (Harvard Business Review, October 2012)

Why your kids will want to be data scientists (CNBC, June 2014)

The roots of statistics are in probability theory, which began from the investigation of games of chance.

Statistics in CBDA

Statistics in the CBDA programme:
Large data sets include many kinds of variation. Expertise is needed to go from mere measurements to models and understanding.

It is hard to judge by eye alone which of the possible trends are "real" and which are only coincidences. Computers can search for possible trends among large sets of alternatives, but they need to be told how to evaluate the goodness of the findings.

Statistics studies in CBDA tell:
● what kinds of statistical structure and trends to look for
● how to measure whether they are "real"
● which tools and methods to use to find them and to present the results
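One common way to measure whether a trend is "real" is a permutation test: shuffle one variable many times and see how often chance alone produces a correlation as strong as the observed one. The sketch below is only an illustration, not course material. The data are simulated, and the function names `pearson_r` and `permutation_p_value` are made up for this example.

```python
import random


def pearson_r(xs, ys):
    """Sample Pearson correlation between two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5


def permutation_p_value(xs, ys, n_perm=2000, seed=0):
    """Share of shuffles whose |r| is at least as large as the observed |r|."""
    rng = random.Random(seed)
    observed = abs(pearson_r(xs, ys))
    shuffled = list(ys)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # break any link between xs and ys
        if abs(pearson_r(xs, shuffled)) >= observed:
            hits += 1
    # The +1 terms count the observed ordering as one of the permutations.
    return (hits + 1) / (n_perm + 1)


# Simulated (hypothetical) data: one series with a trend, one without.
rng = random.Random(1)
x = list(range(30))
trend = [0.5 * xi + rng.gauss(0, 3) for xi in x]  # upward trend plus noise
noise = [rng.gauss(0, 3) for _ in x]              # no trend at all

p_trend = permutation_p_value(x, trend)
p_noise = permutation_p_value(x, noise)
print(f"trend series: p = {p_trend:.4f}")
print(f"noise series: p = {p_noise:.4f}")
```

A small p-value says that shuffled data rarely look as trended as the observed data, which is evidence that the trend is more than a coincidence.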

CBDA

Statistics is versatile data analysis, including the management of chance and variation, extraction of information from data, and modeling.

Statistics has a close connection to data mining and machine learning; in CBDA this connection becomes strongly visible.

An important modern trend is computational statistics, where interesting nonlinear characteristics are sought from data sets, and complicated models are solved e.g. by advanced and distributed optimization and computation methods. CBDA teaching in statistics and computer science enables you to use computational statistics.

Our teaching familiarizes you with the central theory, the most important methods of data acquisition and analysis, and how to apply these in a computer-based fashion.
Distributions, prediction, hypothesis testing, time series analysis, multivariate methods, information visualization, learning from multiple sources...


Poor use of measurements and statistics can lead to false and misleading conclusions

"The numbers have taken over. Numbers lie and are misused. They are used to prove just anything. People believe in numbers even if they have been computed incorrectly."

"The amount of random chance is too large" (discussion of conclusions of a research study)

Oakland A's GM Billy Beane is handicapped with the lowest salary constraint in baseball. If he ever wants to win the World Series, Billy must find a competitive advantage. Billy is about to turn baseball on its ear when he uses statistical data to analyze and place value on the players he picks for the team.

"geek-stats book turned into a movie with a lot of heart"

"persuasively exposed front office tension between ... old school "eye-balling" of players and newer models of data-driven statistical analysis"

Texts from IMDB, Wikipedia

Thomas Bayes, b. 1702
Pierre-Simon Laplace, b. 1749
Blaise Pascal, b. 1623
Carl Friedrich Gauss, b. 1777
Karl Pearson, b. 1857
Ronald Fisher, b. 1890

Peter Hall, University of Melbourne
George Box, University of Wisconsin Madison
Bradley Efron, Stanford University
Robert Tibshirani, Stanford University
Trevor Hastie, Stanford University
Jianqing Fan, Princeton University
Peter J. Bickel, University of California Berkeley
James Stephen Marron, University of North Carolina Chapel Hill
Donald Rubin, Harvard University
Erich Leo Lehmann, University of California Berkeley
Raymond Carroll, Texas A&M University
Theodore W. Anderson, Stanford University
Kanti V. Mardia, University of Leeds
Dan-Yu Lin, University of North Carolina Chapel Hill
David Donoho, Stanford University
James Berger, Duke University
Gareth O. Roberts, University of Warwick
David O. Siegmund, Stanford University
John W. Tukey, Princeton University
Enno Mammen, University of Mannheim
David Ruppert, Moscow State Pedagogical University
Ingram Olkin, Stanford University
David A. Freedman, University of California Berkeley
Ole Barndorff-Nielsen, Aarhus University
Alan Gelfand, Duke University
Wolfgang Karl Härdle, Humboldt University of Berlin
Michael B. Woodroofe, University of Michigan
Joseph G. Ibrahim, University of North Carolina Chapel Hill
George Casella, University of Florida
Hans-Georg Muller, University of California Davis
Andrew Gelman, Columbia University
Peter Buhlmann, ETH Zurich
Alexandre Tsybakov, CREST & Universite Paris VI
Peter J. Rousseeuw, University of Antwerp
Hira Lal Koul, Michigan State University
Peter Diggle, Lancaster University
Iain M. Johnstone, Stanford University
Bernard W. Silverman, University of Oxford
Jerome H. Friedman, The MITRE Corporation
Harvey Goldstein, University of Bristol
Holger Dette, Ruhr University Bochum
David B. Dunson, Duke University
Hirotugu Akaike, Institute of Statistical Mathematics
Christian P. Robert, Paris Dauphine University
Jon A. Wellner, University of Washington
Alan Agresti, University of Florida
Irene Gijbels, Catholic University of Leuven
Stephen L. Portnoy, University of Illinois Urbana-Champaign
Norman R. Draper, University of Wisconsin Madison
Noel Cressie, Ohio State University
Paul Rosenbaum, University of Pennsylvania
Nancy Reid, University of Toronto
Marc Hallin, Universite Libre de Bruxelles
Marc Yor, Pierre and Marie Curie University
Bruce Lindsay, Pennsylvania State University
Murad Taqqu, Boston University
William E. Strawderman, Rutgers, the State University of New Jersey
Persi Diaconis, Stanford University
Luc Devroye, McGill University
Leo Breiman, University of California Berkeley
Adrian Raftery, University of Washington
Ricardo Fraiman, Universidad de San Andres Buenos Aires
Peter M. Robinson, London School of Economics and Political Science
Richard David Gill, Leiden University


You, University of Tampere

A data scientist, combining expertise in statistics and computer science, will work in cooperation with experts from other fields.

Application areas:

● Technology and natural sciences (technometrics, chemometrics)
● Biology (biometrics, see e.g. http://www.uta.fi/hes/tutkimus/tutkimusryhmat/Biometria.html)
● Medicine (epidemiology)
● Economics (econometrics)
● Social and behavioral sciences (demometrics, psychometrics)

Jobs for Data Analytics Experts (Data Scientists)

Examples of Finnish jobs for data analysts

See also (in Finnish) http://www.uta.fi/opiskelu/selvitykset/matematiikka_tilastotiede_sijoittuminen.pdf

Optional studies can influence which field the student ends up in.

An example of statistics jobs (in Finnish): http://www.luonnontieteet.fi/tyo/tilastotiede

Examples of statistics jobs and employers (in Finnish): http://www.uta.fi/rekrytointi/opiskelijalle_ja_tyonhakijalle/uraseuranta/oppiainekoosteet/tilastotiede.html

The career and recruitment services of the University of Tampere (http://www.uta.fi/rekrytointi) monitor the placement of graduates in working life: http://www.uta.fi/opiskelu/tyoelama/seurannat/index.html

A slightly old report on 2011 master's degree graduates: http://www.uta.fi/opiskelu/tyoelama/seurannat/maisterit/index/sijoittumisseuranta%202011.pdf (1 year after graduation, all statistics students were in permanent or temporary jobs or working as researchers funded by grants)

Tales from students of mathematics and statistics about studies and placement in working life: http://www.uta.fi/opiskelu/selvitykset/matematiikka_tilastotiede_sijoittuminen.pdf
“researcher in government research institute”, “mathematician in a government agency research unit”, “head of quality control in an industrial company”, “Data Mining analyst”

Jobs for Data Analytics Experts (Data Scientists), information from graduates

Structure of CBDA studies: upcoming courses

Master's programme in Computational Big Data Analytics (CBDA)
General Studies in Master's Degree Programmes given in English 2015-18, 1–22 ECTS

General studies in the Master's degree programmes given in English are different depending on the student's educational background. Please choose below only one of the three options A, B or C.

A) General studies for international students 12–22 cr
Compulsory studies 12 cr
● SISYY006 Orientation, 2 cr
● SISYY005 Study Skills and Personal Study Planning, 2 cr
● KKENMP3 Scientific Writing, 5 cr
● KKSU1 Finnish Elementary Course 1, 3 cr
Free-choice studies 0–10 cr
● YKYYKV1 Finnish Society and Culture, 3–5 cr
● YKYYV07 Introduction to Science and Research, 2–5 cr

B) General studies for students with education in Finnish and BSc degree taken outside SIS 9–18 cr
Compulsory studies 9–13 cr
Swedish course is required only if no Swedish studies were taken in the Bachelor's degree.
● SISYY006 Orientation, 2 cr
● SISYY005 Study Skills and Personal Study Planning, 2 cr
● KKENMP3 Scientific Writing, 5 cr
● KKRULUK Ruotsin kielen kirjallinen ja suullinen viestintä (Written and Oral Communication in Swedish), 4 cr
Free-choice studies 0–5 cr
● YKYYV07 Introduction to Science and Research, 2–5 cr

C) General studies for students who have taken their BSc degree at SIS 1–11 cr
Compulsory studies 1 cr
Basics of Information Literacy 1 cr is not required, only Personal study planning 1 cr from SISYY005.
● SISYY005 Study Skills and Personal Study Planning, 2 cr
Free-choice studies 0–10 cr
Scientific Writing is recommended if the Master's thesis is written in English.
● KKENMP3 Scientific Writing, 5 cr
● YKYYV07 Introduction to Science and Research, 2–5 cr

Master's programme in Computational Big Data Analytics (CBDA)
Advanced Studies in Big Data Analytics 85 cr

Compulsory Advanced Courses in Big Data Analytics 50 cr
● MTTTS11 Master's Seminar and Thesis, 40 cr
● MTTTS12 Introduction to Bayesian Analysis 1, 5 cr
● TIETS01 Algorithms, 5 cr

Advanced Courses in Methods of Computational Data-Analytics 15– cr
● TIETS07 Neurocomputing, 5 cr
● TIETS11 Data Mining, 5 cr
● TIETS31 Knowledge Discovery, 5–10 cr
● TIETS39 Machine Learning Algorithms, 5 cr
● TIETS33 Advanced Course in Computer Science, 1–10 cr

Advanced Courses in Methods of Statistical Data-Analytics 20– cr
● MTTTS13 Introduction to Bayesian Analysis 2, 5 cr
● MTTTS14 Statistical Modeling 1, 5 cr
● MTTTS15 Statistical Modeling 2, 5 cr
● MTTTS16 Learning from Multiple Sources, 5 cr
● MTTTS17 Dimensionality Reduction and Visualization, 5 cr
● MTTTS18 Time Series Analysis 1, 5 cr
● MTTTS19 Advanced Regression Methods, 5 cr
● MTTTS21 Statistical Inference 2, 5 cr
● MTTS1 Other course (advanced)

Master's programme in Computational Big Data Analytics (CBDA)
Other and optional Studies in Big Data Analytics Programme 13–29 cr

Compulsory Introductory Studies 5 cr
● TIETA17 Introduction to Big Data Processing, 5 cr

Complementing Studies
Complementing studies determined based on previous education

Optional Studies
Recommended studies in Applications of Data-Analytics
● TIETS05 Digital Image Processing, 5 cr
● MTTTS20 Basics of Financial Data-Analysis and Risk Theory, 5 cr
● ITIS13 Information retrieval methods, 5 cr
● ITIS16 Information practices literature, 5–20 cr
● MTTA3 Internship, 2–10 cr

CBDA Courses Fall 2015

I: Introduction to Bayesian Analysis 1
Prior and posterior distributions, Bayes estimators, posterior predictive distribution, interval estimation and hypothesis testing, single-parameter models, simple multiparameter models.
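As a taste of these topics, here is a minimal sketch of a single-parameter model: a Beta prior on a success probability, updated with binomial data. It is an illustration under stated assumptions, not course material. The data (7 successes in 10 trials), the uniform prior, and the helper names `beta_binomial_update` and `beta_mean` are all hypothetical.

```python
import random


def beta_binomial_update(a, b, successes, failures):
    """Conjugate update: Beta(a, b) prior plus binomial data gives a Beta posterior."""
    return a + successes, b + failures


def beta_mean(a, b):
    """Mean of a Beta(a, b) distribution."""
    return a / (a + b)


# Hypothetical data: 7 successes in 10 trials, uniform Beta(1, 1) prior.
prior_a, prior_b = 1.0, 1.0
post_a, post_b = beta_binomial_update(prior_a, prior_b, successes=7, failures=3)

# Under squared-error loss the Bayes estimator is the posterior mean.
posterior_mean = beta_mean(post_a, post_b)

# For this model the posterior predictive probability that the next
# trial is a success equals the posterior mean.
predictive_success = posterior_mean

# Interval estimation by simulation: central 95% credible interval
# read off from sorted draws from the posterior.
rng = random.Random(0)
draws = sorted(rng.betavariate(post_a, post_b) for _ in range(20000))
lower = draws[int(0.025 * len(draws))]
upper = draws[int(0.975 * len(draws))]

print(f"posterior: Beta({post_a:g}, {post_b:g})")
print(f"posterior mean: {posterior_mean:.4f}")
print(f"95% credible interval: ({lower:.3f}, {upper:.3f})")
```

The conjugate prior keeps the posterior in closed form, so the simulation is needed only for the interval.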

I-II: Learning from Multiple Sources
Data fusion, transfer learning, multitask learning, multiview learning, and learning under covariate shift

II: Time Series Analysis 1
Simple time series models, stationary time series models (ARMA), nonstationary and seasonal time series models (SARIMA), time series regression, periodogram.
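The simplest stationary model in the ARMA family is the AR(1) process x_t = phi * x_{t-1} + e_t. The sketch below simulates one and recovers phi from the lag-1 sample autocorrelation. It is only an illustration: the value phi = 0.7, the series length, and the function names are assumptions made for this example.

```python
import random


def simulate_ar1(phi, n, sigma=1.0, seed=0):
    """Simulate n values of x_t = phi * x_{t-1} + e_t with Gaussian noise e_t."""
    rng = random.Random(seed)
    x = [0.0]  # arbitrary starting value; its effect fades when |phi| < 1
    for _ in range(n - 1):
        x.append(phi * x[-1] + rng.gauss(0.0, sigma))
    return x


def lag1_autocorrelation(x):
    """Lag-1 sample autocorrelation, which estimates phi for an AR(1) series."""
    n = len(x)
    mean = sum(x) / n
    num = sum((x[t] - mean) * (x[t - 1] - mean) for t in range(1, n))
    den = sum((v - mean) ** 2 for v in x)
    return num / den


series = simulate_ar1(phi=0.7, n=5000)
phi_hat = lag1_autocorrelation(series)
print(f"true phi = 0.7, estimated phi = {phi_hat:.3f}")
```

The estimate should be close to 0.7 for a series this long; shorter series give noisier estimates.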

(Master's thesis and seminar run every fall and spring.)

I: Introduction to Big Data Processing
Typical characteristics and common applications of big data; basics of distributed file systems, databases and computing; practical data processing skills with MapReduce / Apache Hadoop
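The MapReduce idea can be sketched in a few lines of plain Python as a word count with map, shuffle, and reduce phases. This runs in a single process and does not use Hadoop or any distributed API. The function names and the two sample documents are made up for this example.

```python
from collections import defaultdict


def map_phase(document):
    """Map: emit a (word, 1) pair for every word in one document."""
    for word in document.lower().split():
        yield word, 1


def shuffle(pairs):
    """Shuffle: group all emitted values by key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: combine the values of one key into a single count."""
    return key, sum(values)


# Hypothetical input; in a real cluster each document could sit on a different node.
documents = ["big data big sense", "make sense of big data"]

mapped = [pair for doc in documents for pair in map_phase(doc)]
counts = dict(reduce_phase(key, values) for key, values in shuffle(mapped).items())
print(counts)
```

Because each map call sees only one document and each reduce call sees only one key, both phases can be spread across many machines, which is what a framework such as Hadoop takes care of.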

I-IV: Information practices literature
Literature package on either: Information practices; Information retrieval systems; Interactive information retrieval; task-based information retrieval

I-II: Knowledge Discovery
Phases of the process of knowledge discovery and its nature; basic data preprocessing, data mining and postprocessing tasks and methods; application in practical knowledge discovery tasks; advanced methods in knowledge discovery; data management issues

CBDA Courses Spring 2016

III: Introduction to Bayesian Analysis 2
Markov chains, MCMC methods, model checking and comparison, commonly used statistical models, such as hierarchical and regression models, binomial and count data models.
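To give a flavour of MCMC, the sketch below runs a random-walk Metropolis sampler on the posterior of a binomial success probability with a uniform prior. It is a minimal illustration, not course material. The data (7 successes, 3 failures), the step size, the burn-in length, and the function names are assumptions. For this model the exact posterior is Beta(8, 4) with mean 2/3, which gives a check on the sampler.

```python
import math
import random


def log_posterior(theta, successes, failures):
    """Unnormalised log posterior: uniform prior times binomial likelihood."""
    if not 0.0 < theta < 1.0:
        return -math.inf
    return successes * math.log(theta) + failures * math.log(1.0 - theta)


def metropolis(successes, failures, n_samples=20000, step=0.2, seed=0):
    """Random-walk Metropolis sampler with a Gaussian proposal."""
    rng = random.Random(seed)
    theta = 0.5
    current = log_posterior(theta, successes, failures)
    samples = []
    for _ in range(n_samples):
        proposal = theta + rng.gauss(0.0, step)
        candidate = log_posterior(proposal, successes, failures)
        # Accept with probability min(1, posterior ratio);
        # 1 - random() lies in (0, 1], so the logarithm is always defined.
        if math.log(1.0 - rng.random()) < candidate - current:
            theta, current = proposal, candidate
        samples.append(theta)
    return samples


samples = metropolis(successes=7, failures=3)
kept = samples[2000:]  # discard burn-in draws
mcmc_mean = sum(kept) / len(kept)
print(f"MCMC posterior mean: {mcmc_mean:.3f} (exact value is 2/3)")
```

The sampler needs the posterior only up to a constant, which is why MCMC also works for models with no closed-form posterior.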

III-IV: Dimensionality Reduction and Visualization
Properties of high-dim data; Feature Selection; Linear feature extraction; Graphical excellence; Human perception; Nonlinear dimensionality reduction; Neighbor embedding methods; Graph visualization.

IV: Statistical Inference 2
Roles of Modeling in Statistical Inference, Principles of Data Reduction, Estimation: Risk, Loss of estimators,... Large sample properties
Likelihood-Based Methods, likelihood-based tests and confidence regions

III: Data Mining
Premises, objectives, relevance, and basic methods of data mining; properties of data and measurements, preprocessing methods, some data mining algorithms and their applications, for instance, for classification and prediction of data.

I-IV: Information practices literature
Literature package on either: Information practices; Information retrieval systems; Interactive information retrieval; task-based information retrieval

IV: Machine Learning Algorithms
Basic and advanced machine learning methods for data mining, pattern recognition and other problems

CBDA Statistics Courses: Fall 2016 (preliminary!) and Spring 2017 (preliminary!)

I: Introduction to Bayesian Analysis 1
Prior and posterior distributions, Bayes estimators, posterior predictive distribution, interval estimation and hypothesis testing, single-parameter models, simple multiparameter models.

I-II: Learning from Multiple Sources
Data fusion, transfer learning, multitask learning, multiview learning, and learning under covariate shift

II: Possibly "Basics of financial data analysis and risk theory 5 cr", or another course

III: Statistical Modeling 1
Multinomial and ordinal regression, nonlinear regression, parametric survival analysis, counting process models, semiparametric hazard models.

III-IV: Dimensionality Reduction and Visualization
Properties of high-dim data; Feature Selection; Linear feature extraction; Graphical excellence; Human perception; Nonlinear dimensionality reduction; Neighbor embedding methods; Graph visualization.

IV: Statistical Modeling 2
Normal mixed model and extensions, growth curve models, models for panel discrete (binary, count, categorical) observations, analysis of missing data, mixture or latent class regression, hierarchical and latent structure models

Data analytics is management of knowledge and uncertainty.

As long as there is uncertainty in the world, there is a need for data analytics.