Research Focus of UH-DMML

Page 1: Research Focus of UH-DMML

Department of Computer Science

Research Focus of UH-DMML

Christoph F. Eick

Data Mining

Geographical Information Systems (GIS)

High Performance Computing

Machine Learning

Data Analysis

Output: Graduated 12 PhD students and 80 Master Students

Page 2: Research Focus of UH-DMML

Research Areas

1. Clustering and Summary Generation
2. Spatial Data Mining and Analyzing Spatial Data
3. Association Analysis (Correlation Mining, Co-location Mining, Sequence Mining)
4. Helping Scientists to Understand and Summarize their Data
5. Classification and Prediction

Page 3: Research Focus of UH-DMML

Characteristics of the Work We Do

1. One focus is on developing novel data mining (and other) algorithms and novel interestingness (and other) measures.
2. Other research centers on developing methods to make sense of data and to summarize data.
3. Application-driven approach: find interesting and important datasets, then develop frameworks and algorithms that produce “something useful” for those datasets.
4. Some of our work is experimental in nature.
5. Occasionally, we try to solve theoretical problems, but this is not the main focus!
6. Work is kind of “hands on”.
7. Team work is encouraged.

Page 4: Research Focus of UH-DMML

Current and Recent Research Projects

1. Mining POI Datasets
2. Patch-based Prediction Techniques
3. Doing Things With and For Polygons
4. Non-Traditional Clustering Algorithms
5. Co-location Mining
6. …

Page 5: Research Focus of UH-DMML

Mining POI Datasets

Motivation: A lot of POI datasets (e.g. in Google Earth) are becoming available now.
http://bloomington.in.gov/documents/viewDocument.php?document_id=2455;dir=building/buildingfootprints/shape
https://data.cityofchicago.org/Buildings/Building-Footprints/w2v3-isjw

Example: Buildings of the City of Chicago (830,000 polygons)

Challenges:
Extract valuable knowledge from such datasets (Data Mining)
Facilitate querying and visualizing of such datasets (HPC / BigData Initiative)

Page 6: Research Focus of UH-DMML

Patch-based Prediction Techniques

a. New Algorithms for Regression Tree Induction
b. New Decision Tree Induction Algorithms
c. Multi-Target Regression
d. Spatial Prediction Techniques

Page 7: Research Focus of UH-DMML

Doing Things With and For Polygons

1. Clustering Polygons
2. Using Polygons as Models for Spatial Clusters
3. Fitting Polygons to Point Clouds
4. Computing Boundaries Between Spatial Clusters
5. Measuring Emptiness in Polygons

Page 8: Research Focus of UH-DMML

Non-Traditional Clustering Algorithms

Clustering Algorithms With Plug-in Fitness Functions
Mining Spatio-Temporal Datasets
Parallel Computing With a Lot of Cores
Prototype-based Clustering
Randomized Hill Climbing
Agglomerative Clustering and Hotspot Discovery Algorithms
Polygonal Clustering and Clustering Polygons
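
Two of the topics named on this slide, clustering with a plug-in fitness function and randomized hill climbing, can be combined in a small illustration. The sketch below is only a minimal example under stated assumptions, not the group's actual algorithms: the names `neg_sse` and `hill_climb_clustering` are hypothetical, the fitness function (negative sum of squared distances to cluster centroids) is just one possible plug-in, and the search simply reassigns one random point at a time and keeps the change only if the fitness improves.

```python
import random
from typing import Callable, List, Sequence, Tuple

Point = Tuple[float, float]
# A plug-in fitness function scores a whole clustering; higher is better.
Fitness = Callable[[Sequence[Point], Sequence[int], int], float]


def neg_sse(points: Sequence[Point], labels: Sequence[int], k: int) -> float:
    """Example plug-in (an assumption): negative sum of squared
    distances of each point to its cluster centroid."""
    total = 0.0
    for c in range(k):
        members = [p for p, lab in zip(points, labels) if lab == c]
        if not members:
            continue  # an empty cluster contributes nothing
        cx = sum(p[0] for p in members) / len(members)
        cy = sum(p[1] for p in members) / len(members)
        total += sum((p[0] - cx) ** 2 + (p[1] - cy) ** 2 for p in members)
    return -total


def hill_climb_clustering(
    points: Sequence[Point],
    k: int,
    fitness: Fitness,
    iterations: int = 2000,
    seed: int = 0,
) -> Tuple[List[int], float]:
    """Randomized hill climbing over cluster assignments.

    Starts from a random assignment, then repeatedly moves one random
    point to a random other cluster, keeping the move only if the
    plug-in fitness strictly improves.
    """
    rng = random.Random(seed)
    labels = [rng.randrange(k) for _ in points]
    best = fitness(points, labels, k)
    if not points:
        return labels, best
    for _ in range(iterations):
        i = rng.randrange(len(points))
        old, new = labels[i], rng.randrange(k)
        if new == old:
            continue
        labels[i] = new
        score = fitness(points, labels, k)
        if score > best:
            best = score      # accept the improving move
        else:
            labels[i] = old   # revert the move
    return labels, best
```

Because the search only ever calls `fitness(points, labels, k)`, a different objective (for example, one that rewards regions with unusually high values of some variable) could be swapped in without touching the search loop. Like any hill climber, this sketch can stop in a local optimum.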

Page 9: Research Focus of UH-DMML

Helping Scientists to Make Sense Out of their Data

Figure 1: Co-location regions involving deep and shallow ice on Mars

Figure 2: Interestingness hotspots where both income and CTR are high.

Figure 3: Analyzing the Composition of Cities

Page 10: Research Focus of UH-DMML

Potential “Future” Topics

Trajectory Classification and Prediction
Creating Parallel Versions of Clustering Algorithms
Models for the Evolution of Spatial Datasets
Urban Computing
Educational Data Mining

Figure: Ozone Hotspot Evolution (3p, 5p, 7p)

Page 11: Research Focus of UH-DMML

Some UH-DMML Graduates 1

Dr. Wei Ding, Assistant Professor, Department of Computer Science, University of Massachusetts, Boston

Sharon M. Tuttle, Professor, Department of Computer Science, Humboldt State University, Arcata, California

Tae-wan Ryu, Professor, Department of Computer Science, California State University, Fullerton

Page 12: Research Focus of UH-DMML

Some UH-DMML Graduates 2

Ruth Miller, PhD: Washington University in St. Louis, Postdoc, Midwest Alcohol Research Center, Department of Psychiatry; Adjunct Instructor, Department of Computer Science

Chun-sheng Chen, PhD: Amazon, Seattle (analyzing web traffic)

Rachsuda Jiamthapthaksin, PhD: Lecturer, Assumption University, Bangkok, Thailand

Justin Thomas, MS: Section Supervisor at Johns Hopkins University Applied Physics Laboratory

Mei-kang Wu, MS: Microsoft, Bellevue, Washington

Jing Wang, MS: AOL, California

Page 13: Research Focus of UH-DMML

UH-DMML Mission Statement

The Data Mining and Machine Learning Group at the University of Houston aims to develop data analysis, data mining, and machine learning techniques and to apply those techniques to challenging problems in geology, astronomy, urban computing, ecology, environmental sciences, web advertising, and medicine. In general, our research group has a strong background in the areas of clustering and spatial data mining. Areas of our current research include: clustering algorithms with plug-in fitness functions, association analysis, mining related spatial datasets, patch-based prediction techniques, summarizing the composition of spatial datasets, change and progression analysis, and data mining with a lot of cores.

Website: http://www2.cs.uh.edu/~UH-DMML/index.html

Research Group Publications: http://www2.cs.uh.edu/~ceick/pub.html

Data Mining Course Website: http://www2.cs.uh.edu/~ceick/DM/DM.html

Machine Learning Course Website: http://www2.cs.uh.edu/~ceick/ML/ML.html

Page 14: Research Focus of UH-DMML

Reading Material

Urban Computing/Spatial Clustering: SIGKDD Urban Computing Workshop 2013 Paper

Agglomerative Clustering: R. Jiamthapthaksin, C. F. Eick, and S. Lee, GAC-GEO: A Generic Agglomerative Clustering Framework for Geo-referenced Datasets, in Knowledge and Information Systems (KAIS).

Patch-based Prediction Techniques: MLDM 2013 Paper, ACM-GIS 2010 Paper

Data Mining with a Lot of Cores: ParCo 2011 Paper

GIS/Creating Polygon Models: ACM-GIS 2013 Submission

Machine Learning Course Website: http://www2.cs.uh.edu/~ceick/ML/ML.html

Co-location Mining: ACM-GIS 2008 Paper

Spatial Clustering and Association Analysis: W. Ding, C. F. Eick, X. Yuan, J. Wang, and J.-P. Nicot, A Framework for Regional Association Rule Mining and Scoping in Spatial Datasets, Geoinformatica (2011) 15:1-28, DOI 10.1007/s10707-010-0111-6, January 2011.

Supervised Clustering: TAI 2005 Paper

Page 15: Research Focus of UH-DMML

What Courses Should You Take to Conduct Research in this Research Group?

I. Data Mining
II. Machine Learning
III. Parallel Programming, AI, Software Design, Data Structures, Databases, Big Data, Visualization, Evolutionary Computing, Image Processing, GIS courses, Geometry, Optimization.