dmtm 2015 - 01 course introduction

Prof. Pier Luca Lanzi Course Introduction Data Mining and Text Mining (UIC 583 @ Politecnico di Milano)

Upload: pier-luca-lanzi

Post on 13-Aug-2015


TRANSCRIPT

Prof. Pier Luca Lanzi

Course Introduction
Data Mining and Text Mining (UIC 583 @ Politecnico di Milano)

Data Mining and Text Mining

•  Prof. Pier Luca Lanzi
   Dipartimento di Elettronica, Informazione e Bioingegneria
   [email protected]
   voice: 02 23993472
   http://www.deib.polimi.it/people/lanzi

•  Office Hours: Wednesday, from 14:30 until 16:00

Course Structure

•  Introduction to basic Data Mining and Text Mining methods (24 hours)

•  Advanced Techniques and Applications (16 hours)

•  Final Project will involve an application to real-world data


Course Outline

•  What is Data Mining?
•  Data and knowledge representation
•  Data exploration and preparation
•  Data Mining tasks
   §  Associations
   §  Clustering
   §  Classification
•  Advanced techniques and applications
   §  Text Mining
   §  Graph Mining
   §  Data Streams

syllabus


Course Material

•  “Data Mining and Analysis: Fundamental Concepts and Algorithms,” Mohammed Zaki and Wagner Meira Jr., Cambridge University Press, 2014. http://www.dataminingbook.info

•  “Mining of Massive Datasets,” by A. Rajaraman, J. Ullman. http://www.mmds.org

•  Course slides available on BEEP and articles distributed during the course

•  Software
   §  R & Rstudio (http://www.rstudio.com)
   §  Python/IPython (numpy, scipy, scikit, etc.)
   §  BigML (http://www.bigml.com)
   §  Rapid Miner/Weka (http://rapid-i.com/)

Additional Material

•  “The Elements of Statistical Learning: Data Mining, Inference, and Prediction,” Second Edition, February 2009, Trevor Hastie, Robert Tibshirani, Jerome Friedman (http://statweb.stanford.edu/~tibs/ElemStatLearn/)

•  “An Introduction to Data Science,” Jeffrey Stanton (https://ischool.syr.edu/media/documents/2012/3/DataScienceBook1_1.pdf)

•  Ian H. Witten, Eibe Frank. “Data Mining: Practical Machine Learning Tools and Techniques with Java Implementations” 2nd Edition.


R Help Websites

•  Quick-R: http://www.statmethods.net

•  R Cookbook: http://www.cookbook-r.com

•  R Bloggers: http://www.r-bloggers.com

•  R on Stack Overflow: http://stackoverflow.com/tags/r/info

•  Google R Style Guide: https://google-styleguide.googlecode.com/svn/trunk/Rguide.xml


http://www.kdnuggets.com/2012/08/poll-analytics-data-mining-programming-languages.html

http://xkcd.com/353/


Evaluation

•  May 2015 First Midterm (15 points)

•  June 2015 Second Midterm (18 points)

•  July 2015 Full exam for those who failed midterms


Challenges and exercises might be proposed during the course to substitute for part of the written exam

There is also another way to pass the exam

http://www.kaggle.com

http://www.drivendata.org/

http://tunedit.org/data-competitions