Research Group Data and Web Science
Mannheim, 11 February 2019
Data and Web Science Group
• 7 Professors, 5 Post-docs, 20 PhD students
Research Areas
− Artificial Intelligence (Prof. Heiner Stuckenschmidt)
• Knowledge representation formalisms and reasoning techniques for information extraction and integration
− Data Analysis (Prof. Rainer Gemulla)
• Methods for analyzing and mining large datasets as well as their practical realizations and applications
− Natural Language Processing (Prof. Simone Ponzetto)
• Knowledge acquisition, knowledge-rich language understanding, Computational Social Science and Digital Humanities
− Statistical Natural Language Processing (Prof. Goran Glavaš)
• Modeling the meaning of language, understanding text, and structuring knowledge from text
− Image Processing (Prof. Dr.-Ing. Margret Keuper)
• Image Segmentation, Motion Segmentation, Efficient Video Segmentation, Semantic Segmentation, Multiple Object Tracking
− Web-based Systems (Prof. Chris Bizer)
• Large-scale data integration; evolution of the World Wide Web from a medium for publishing documents into a global dataspace
− Data Science (Prof. Dr. Heiko Paulheim)
• Web data as background knowledge in data mining, and data mining methods to create and improve large-scale knowledge bases
Research Goals
DWS Overall Research Goals:
1. Methods for understanding large and heterogeneous data
2. Application of these methods in different contexts
[Figure: research overview with the layers Information Extraction, Data Integration, Data Mining & Reasoning, and Applications; further labels: Schema.org Data, IsA Database, Web Tables, Social Sciences, Web Search, Data Analytics, Business Applications]
Teaching Overview: Courses
Courses offered across the FSS (spring) and HWS (fall) semesters:
• Decision Support
• Data Mining II
• Web Mining
• Web Data Integration
• Semantic Web Technologies
• Information Retrieval
• Text Analytics
• Data Mining I
• Large-Scale Data Management
• Data Mining and Matrices
• Image Processing
• Database Technology (MMDS)
• Computer Vision
• Relational Learning
• Hot Topics in Machine Learning
IE 500: Data Mining I
• Content: the basics of “torturing data”:
1. Cluster Analysis: How to automatically organize your MP3 collection?
2. Classification: Will your bank grant you a loan?
3. Regression: How to determine the price of a house?
4. Association Analysis: Which products to place together
in a supermarket to maximize customer purchases?
5. Text Mining: Do students on Twitter like or dislike this lecture?
• Exercises
• Experiment with RapidMiner or Python
• Student project:
• Mine some data of your choice
• Teaching staff:
• Prof. Dr. Christian Bizer (Lectures)
• Anna Primpeli, Oliver Lehmberg (Exercises)
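For the association-analysis topic, the two standard rule measures, support and confidence, can be sketched in a few lines of Python. This is a minimal sketch under assumptions: the basket data and the rule {diapers} → {beer} are made up for illustration, and the exercises may well use RapidMiner operators instead of hand-written code.

```python
# Minimal sketch of support and confidence for association rules.
# TRANSACTIONS is hypothetical example data, not course material.
TRANSACTIONS = [
    {"bread", "milk"},
    {"bread", "diapers", "beer", "eggs"},
    {"milk", "diapers", "beer", "cola"},
    {"bread", "milk", "diapers", "beer"},
    {"bread", "milk", "diapers", "cola"},
]


def support(itemset, transactions):
    """Fraction of transactions that contain every item of `itemset`."""
    hits = sum(1 for basket in transactions if itemset <= basket)
    return hits / len(transactions)


def confidence(antecedent, consequent, transactions):
    """support(X ∪ Y) / support(X): how often Y is bought when X is bought."""
    return support(antecedent | consequent, transactions) / support(antecedent, transactions)


if __name__ == "__main__":
    lhs, rhs = {"diapers"}, {"beer"}
    print(f"support    = {support(lhs | rhs, TRANSACTIONS):.2f}")
    print(f"confidence = {confidence(lhs, rhs, TRANSACTIONS):.2f}")
```

With these toy baskets, diapers and beer appear together in 3 of 5 baskets (support 0.60), and 3 of the 4 baskets with diapers also contain beer (confidence 0.75). An algorithm such as Apriori would enumerate candidate itemsets and keep only rules above chosen thresholds.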
• Data integration is the process of consolidating data from
heterogeneous data sources into a single uniform representation.
• Data integration is critical within many application domains
• Business: CRM, Business Intelligence
• Science: Exploitation of existing research data
• The Web: Comparison shopping, job search
• Topics of the Course
1. The Data Integration Process
2. Web Data Formats
3. Schema Mapping and Data Translation
4. Identity Resolution
5. Data Fusion
[Figure: data sources DB1, DB3, and DB4 consolidated into integrated data]
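Identity resolution, step 4 of the process above, decides which records in different sources describe the same real-world entity. A minimal sketch, assuming hypothetical product records, lower-case whitespace tokenization, Jaccard similarity on token sets, and a fixed threshold of 0.5; real systems typically add blocking to avoid comparing every pair and tune or learn the matching rule.

```python
# Hypothetical product records from two sources (illustrative only).
SOURCE_A = {"a1": "Apple iPhone 12 64GB black", "a2": "Samsung Galaxy S21 128GB"}
SOURCE_B = {"b1": "iPhone 12 Apple black 64GB", "b2": "Google Pixel 5 128GB"}


def tokens(text: str) -> set[str]:
    """Lower-case whitespace tokenization."""
    return set(text.lower().split())


def jaccard(x: str, y: str) -> float:
    """|A ∩ B| / |A ∪ B| over token sets; 0.0 if both strings are empty."""
    a, b = tokens(x), tokens(y)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def match(source_a: dict[str, str], source_b: dict[str, str],
          threshold: float = 0.5) -> list[tuple[str, str, float]]:
    """Naive pairwise comparison: keep every pair whose similarity reaches the threshold."""
    pairs = []
    for id_a, text_a in source_a.items():
        for id_b, text_b in source_b.items():
            similarity = jaccard(text_a, text_b)
            if similarity >= threshold:
                pairs.append((id_a, id_b, similarity))
    return pairs


if __name__ == "__main__":
    print(match(SOURCE_A, SOURCE_B))
```

Token-set similarity ignores word order, which is why "Apple iPhone 12 64GB black" and "iPhone 12 Apple black 64GB" are matched, while the two phones that share only "128GB" stay well below the threshold.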
IE 670 + IE 683: Web Data Integration
⚫ Lecture (IE 670)
• Introduces the principal methods of data integration
• Discusses how to evaluate data integration results
• Instructor: Prof. Dr. Christian Bizer
• Grading: Written Exam
⚫ Student Projects (IE 683)
• Teams of five students carry out a data integration project:
1. data gathering
2. schema matching and data translation
3. identity resolution
4. data quality assessment and data fusion
• Teams will use commercial data integration tools
as well as the Java data integration framework Winte.r
• Instructors: Anna Primpeli, Oliver Lehmberg
• Grading: Project report and presentation
• Advanced Data Mining methods
• Dimensionality Reduction
• Anomaly Detection
• Time Series Analysis
• Parameter Tuning
• Ensemble Learning
• Neural Networks & Deep Learning
• Organization:
• Lectures and Exercises
• Participation in Data Mining Cup
• Teaching Staff
• Prof. Dr. Heiko Paulheim (Lectures)
• Nicolas Heist (Exercises)
IE 672: Data Mining II
starts next week!
• MMDS fundamental course
− Foundations of Relational Databases
− Relational Modeling
− Normal Forms
− Query Processing and Optimization
− Transactions, Concurrency, and Recovery
• Teaching Staff:
• Prof. Dr. Heiko Paulheim (Lectures)
• Sven Hertling (Exercises)
CS 460: Database Technology
IE 650: Semantic Web Technologies
• Prerequisites:
• Basic programming skills (e.g., Java, Python)
• Topics:
• Understanding the vision of the Semantic Web
• Acquaintance with foundations of W3C standards for building
semantic web applications
• Data Integration and Access: XML, RDF and SPARQL
• Knowledge Representation: RDFS and OWL
• Ontology Management: Engineering, Learning and Alignment
• Programming skills and IT competence: practical usage of
technologies for building semantic web applications
• Teaching staff:
• Prof. Dr. Heiko Paulheim (Lecture)
• Sven Hertling (Exercises)
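SPARQL queries are built from triple patterns whose variables are bound by matching them against RDF triples. The sketch below evaluates a conjunction of such patterns in plain Python, roughly what `SELECT ?city WHERE { ?city rdf:type ex:City . ?city ex:locatedIn ex:Germany }` asks for. Assumptions: the `ex:` triples are hypothetical, terms are plain strings, and a real application would use an RDF store or library rather than this naive nested-loop join.

```python
# Hypothetical triples (subject, predicate, object); "ex:" is an illustrative prefix.
TRIPLES = [
    ("ex:Mannheim", "rdf:type", "ex:City"),
    ("ex:Mannheim", "ex:locatedIn", "ex:Germany"),
    ("ex:Heidelberg", "rdf:type", "ex:City"),
    ("ex:Heidelberg", "ex:locatedIn", "ex:Germany"),
    ("ex:Germany", "rdf:type", "ex:Country"),
]


def is_variable(term):
    return term.startswith("?")


def match_pattern(pattern, triple, binding):
    """Extend `binding` so that `pattern` matches `triple`, or return None."""
    extended = dict(binding)
    for p_term, t_term in zip(pattern, triple):
        if is_variable(p_term):
            if extended.get(p_term, t_term) != t_term:
                return None  # variable already bound to a different value
            extended[p_term] = t_term
        elif p_term != t_term:
            return None  # constant term does not match
    return extended


def evaluate(patterns, triples):
    """Join triple patterns left to right; shared variables must agree."""
    bindings = [{}]
    for pattern in patterns:
        next_bindings = []
        for binding in bindings:
            for triple in triples:
                extended = match_pattern(pattern, triple, binding)
                if extended is not None:
                    next_bindings.append(extended)
        bindings = next_bindings
    return bindings


if __name__ == "__main__":
    query = [("?city", "rdf:type", "ex:City"), ("?city", "ex:locatedIn", "ex:Germany")]
    for row in evaluate(query, TRIPLES):
        print(row["?city"])
```

Each pattern filters and extends the bindings produced so far, so the shared variable `?city` acts as the join key between the two patterns.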
IE 560: Decision Support
$$\text{action} = \arg\max_{a} EU(a \mid e)$$
• Decision-making is an important part of all science-based professions
• Specialists apply their knowledge in a given area to make informed decisions.
• Models that help to formulate and algorithmically solve decision-making problems
− Find a solution that maximizes the expected benefit of the outcome.
• Topics include:
• Probabilistic Graphical Models
• Decision Theory and Decision Networks
• Game Theory and Mechanism Design
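The decision rule of this course, action = argmax_a EU(a | e), picks the action with the highest expected utility given evidence e, where EU(a | e) = Σ_s P(s | a, e) · U(s) sums over the possible outcomes s. A minimal sketch, assuming a made-up umbrella decision with hand-picked probabilities and utilities; in a real model such numbers would come from, for example, a decision network.

```python
# Hypothetical decision problem: P(outcome | action, e) for one fixed forecast e,
# and a utility for each outcome. All numbers are illustrative assumptions.
OUTCOME_PROBS = {
    "take_umbrella": {"dry_carrying_umbrella": 1.0},
    "leave_umbrella": {"dry_unencumbered": 0.6, "wet": 0.4},
}
UTILITY = {"dry_carrying_umbrella": 70.0, "dry_unencumbered": 100.0, "wet": 0.0}


def expected_utility(action):
    """EU(a | e) = sum over outcomes s of P(s | a, e) * U(s)."""
    return sum(p * UTILITY[outcome] for outcome, p in OUTCOME_PROBS[action].items())


def best_action():
    """argmax over actions a of EU(a | e)."""
    return max(OUTCOME_PROBS, key=expected_utility)


if __name__ == "__main__":
    for action in OUTCOME_PROBS:
        print(action, expected_utility(action))
    print("choose:", best_action())
```

With these numbers EU(take_umbrella) = 70 and EU(leave_umbrella) = 0.6 · 100 + 0.4 · 0 = 60, so the sketch chooses to take the umbrella.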
• Lectures & Exercises
• Teaching staff:
• Prof. Dr. Heiner Stuckenschmidt (Lecture)
• Dr. Melisachew Wudage Chekol (Exercises)
• Literature:
• Stuart Russell and Peter Norvig: Artificial Intelligence: A Modern Approach. Pearson, 2013.
• Chapters 2, 7, 10, 11, and 13–17
• A reader will be available
IE 689: Relational Learning
Example Problem: learn a relational definition of active components
Solution:
active(M) :- ring_size_5(M, R), element(Y, R), bond(M, Y, Z, 2).
Lecture:
Prof. Dr. Heiner Stuckenschmidt
Every second Monday 12:00-13:30
Exercises/Tutorial:
Manuel Fink / Dr. Christian Meilicke
Every second Monday 12:00-13:30
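Relational learning induces clauses such as active(M) :- ring_size_5(M, R), element(Y, R), bond(M, Y, Z, 2) from examples and background facts. The sketch below does not learn anything; it only shows what this clause means by checking which molecules it covers. Assumptions: the facts are hypothetical, `element(Y, R)` is read as "atom Y belongs to ring R", and the last argument of `bond` is read as the bond type.

```python
# Hypothetical background knowledge (illustrative facts, not a real dataset).
RING_SIZE_5 = {("m1", "r1"), ("m2", "r2")}              # ring_size_5(M, R)
ELEMENT = {("a1", "r1"), ("a2", "r1"), ("a3", "r2")}    # element(Y, R)
BOND = {("m1", "a1", "a4", 2), ("m2", "a3", "a5", 1)}    # bond(M, Y, Z, Type)


def active(m):
    """True iff some 5-ring R of M contains an atom Y that has a type-2 bond in M."""
    for mol, ring in RING_SIZE_5:
        if mol != m:
            continue
        atoms_in_ring = {atom for atom, r in ELEMENT if r == ring}
        if any(b_mol == m and b_atom in atoms_in_ring and b_type == 2
               for b_mol, b_atom, _other, b_type in BOND):
            return True
    return False


if __name__ == "__main__":
    for molecule in ["m1", "m2", "m3"]:
        print(molecule, active(molecule))
```

Here the clause covers m1 (its ring atom a1 has a type-2 bond) but not m2 (its only bond is type 1) or m3 (no facts at all). A learner would search for the clause that covers the positive examples and excludes the negative ones.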
IE 660: Text Analytics
• Methods to automatically process natural language from a
computational / algorithmic perspective
• An introduction to NLP in three main blocks:
− Computational linguistics
− Machine Learning and NLP
− Applications
• Lectures + Exercises (2+2 SWS)
• Course personnel:
• Prof. Dr. Simone Ponzetto
• Prof. Dr. Goran Glavaš
• Topics:
• Finite state methods
• Language models (N-Gram models)
• Semantics in a sparse/dense vector space
• Sequence labeling
• Neural networks and deep learning for NLP
• Applications: machine translation, sentiment analysis, etc.
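An N-gram language model estimates the probability of a word from the preceding N−1 words. A minimal bigram (N = 2) sketch with maximum-likelihood estimates and no smoothing, assuming a tiny made-up corpus and the sentence-boundary markers `<s>` and `</s>`:

```python
from collections import Counter

# Hypothetical toy corpus of tokenized sentences.
CORPUS = [["i", "like", "data"], ["i", "like", "text"], ["you", "like", "data"]]


def train_bigram(sentences):
    """Count bigrams and their histories; each sentence is padded with <s> and </s>."""
    history_counts, bigram_counts = Counter(), Counter()
    for sentence in sentences:
        tokens = ["<s>"] + sentence + ["</s>"]
        history_counts.update(tokens[:-1])
        bigram_counts.update(zip(tokens[:-1], tokens[1:]))
    return history_counts, bigram_counts


def bigram_prob(word, prev, history_counts, bigram_counts):
    """MLE estimate P(word | prev) = c(prev, word) / c(prev); 0.0 for unseen histories."""
    if history_counts[prev] == 0:
        return 0.0
    return bigram_counts[(prev, word)] / history_counts[prev]


def sentence_prob(sentence, history_counts, bigram_counts):
    """Product of the bigram probabilities over the padded sentence."""
    tokens = ["<s>"] + sentence + ["</s>"]
    prob = 1.0
    for prev, word in zip(tokens[:-1], tokens[1:]):
        prob *= bigram_prob(word, prev, history_counts, bigram_counts)
    return prob


if __name__ == "__main__":
    hist, bigrams = train_bigram(CORPUS)
    print(bigram_prob("data", "like", hist, bigrams))  # c(like, data) / c(like) = 2/3
    print(sentence_prob(["i", "like", "data"], hist, bigrams))
```

For "i like data" the sketch multiplies P(i | <s>) = 2/3, P(like | i) = 1, P(data | like) = 2/3 and P(</s> | data) = 1, giving 4/9. Practical models add smoothing (e.g., add-one or Kneser-Ney) so that unseen bigrams do not receive probability zero.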
IE 671: Web Mining
• Approaches to mine knowledge from the Web
• Web Usage Mining
• Web Structure Mining
• Web Content Mining
• Course Structure:
• Lectures and exercises
• Projects (during the second half)
• Teaching staff:
• Prof. Dr. Simone Ponzetto
• Prof. Dr. Goran Glavaš
• Dr. Dmitry Ustalov
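Web structure mining analyzes the link graph between pages. One classic example, chosen here for illustration rather than taken from the course outline, is PageRank. A minimal power-iteration sketch, assuming a hypothetical three-page graph in which every link target is also a key, a damping factor of 0.85, and a fixed number of iterations instead of a convergence test:

```python
def pagerank(links: dict[str, list[str]], damping: float = 0.85,
             iterations: int = 50) -> dict[str, float]:
    """Power iteration; a page without out-links spreads its rank over all pages."""
    pages = list(links)
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}
    for _ in range(iterations):
        # Random-jump share that every page receives.
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page in pages:
            targets = links[page] or pages  # dangling page: link to every page
            share = damping * rank[page] / len(targets)
            for target in targets:
                new_rank[target] += share
        rank = new_rank
    return rank


if __name__ == "__main__":
    graph = {"A": ["B", "C"], "B": ["C"], "C": ["A"]}
    for page, score in sorted(pagerank(graph).items()):
        print(page, round(score, 3))
```

The ranks always sum to 1. In this toy graph C collects links from both A and B and ends up with the highest rank, while B, linked only by half of A's out-links, ends up lowest.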
IE 663 + IE 681: Information Retrieval
• Lectures (IE 663)
• Boolean and vector space retrieval models
• Probabilistic and language-modeling retrieval
• Semantic and Latent Retrieval
• Web search: Link-based algorithms
• Teaching staff:
• Prof. Goran Glavaš (Lectures)
• Robert Litschko (Exercises)
• Team Project (IE 681)
• Build your own search engine!
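The vector space retrieval model represents documents and queries as term-weight vectors and ranks documents by cosine similarity. A minimal sketch, assuming raw term frequency times idf = log(N / df) as the weighting (one of several common variants), whitespace tokenization, and three made-up documents:

```python
import math
from collections import Counter

# Hypothetical document collection.
DOCS = ["web data integration", "text mining of web pages", "image segmentation"]


def build_index(docs):
    """Return tf-idf vectors (term -> weight) for each document plus the idf table."""
    tokenized = [doc.lower().split() for doc in docs]
    df = Counter(term for tokens in tokenized for term in set(tokens))
    idf = {term: math.log(len(docs) / count) for term, count in df.items()}
    vectors = [{t: tf * idf[t] for t, tf in Counter(tokens).items()} for tokens in tokenized]
    return vectors, idf


def cosine(u, v):
    """Cosine similarity of two sparse vectors; 0.0 if either vector is all zeros."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def search(query, docs):
    """Rank (document index, score) pairs by cosine similarity to the query, best first."""
    vectors, idf = build_index(docs)
    query_vec = {t: tf * idf.get(t, 0.0) for t, tf in Counter(query.lower().split()).items()}
    scores = [(i, cosine(query_vec, vec)) for i, vec in enumerate(vectors)]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    for doc_id, score in search("web mining", DOCS):
        print(f"{score:.3f}  {DOCS[doc_id]}")
```

For the query "web mining", the rare term "mining" has a higher idf than "web", which appears in two of the three documents, so "text mining of web pages" ranks first.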
• What you need to know to work with Big Data
• Fundamental concepts and computational paradigms for
large-scale data management and Big Data
CS 560: Large-Scale Data Management
• Teaching staff:
• Prof. Rainer Gemulla (lectures)
• Daniel Ruffinelli (exercises/tutorials)
• 2 SWS lecture, 2 SWS tutorial, 6 ECTS
• Lecture: concepts, methods, systems
• Tutorial: In-depth discussion, exercises, hands-on
assignments
• Prerequisites
• Database Systems I or equivalent
• Programming experience
• Passing requirements
• Written exam
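A well-known computational paradigm for large-scale data processing is MapReduce; whether and in what form the course covers it is an assumption here. The sketch simulates the map, shuffle, and reduce phases on a single machine for the classic word-count task; a real system (e.g., Hadoop) distributes these phases across a cluster and performs the shuffle itself.

```python
from collections import defaultdict
from itertools import chain


def map_phase(doc):
    """Mapper: emit a (word, 1) pair for every token of one document."""
    return [(word, 1) for word in doc.lower().split()]


def shuffle(pairs):
    """Group intermediate values by key (done by the framework in a real system)."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reducer: sum the counts for one word."""
    return key, sum(values)


def word_count(docs):
    intermediate = chain.from_iterable(map_phase(doc) for doc in docs)
    return dict(reduce_phase(key, values) for key, values in shuffle(intermediate).items())


if __name__ == "__main__":
    print(word_count(["big data", "big ideas"]))
```

Because each mapper call sees only one document and each reducer call sees only one key, both phases can run in parallel; only the shuffle needs to move data between machines.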
IE 673: Data Mining and Matrices
• Matrices & tensors are powerful data representations
• Data points, sets, graphs, relational data, knowledge bases, ...
• Course goal: Learn how to analyze such data
• Course covers theory and applications of dimensionality reduction, embeddings, denoising, discovery of latent structure, visualization, prediction, clustering, pattern mining, topic modelling, …
• Focus is on unsupervised and semi-supervised learning & matrix decompositions
• Instructor: Rainer Gemulla
• Tutor: Daniel Ruffinelli
• 2 SWS lecture, 2 SWS tutorial, 6 ECTS
• IE 500 Data Mining I recommended
• Gain hands-on experience
• Smaller exercises to deepen lecture material
• Homework assignments to analyze real data
• Learn R
• Passing requirements
• Regular assignments
• Final exam or oral examination
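Matrix decompositions such as the singular value decomposition (SVD) underlie several of the listed techniques, for example dimensionality reduction and discovery of latent structure. A minimal pure-Python sketch, assuming a small dense matrix given as a list of rows: power iteration on AᵀA finds the leading singular triplet (σ, u, v), and σ·u·vᵀ is then the best rank-1 approximation in the least-squares sense. In practice one would call a numerical library (the course itself uses R) rather than this loop.

```python
import math


def matvec(a, x):
    """Matrix-vector product for a matrix stored as a list of rows."""
    return [sum(a_ij * x_j for a_ij, x_j in zip(row, x)) for row in a]


def transpose(a):
    return [list(column) for column in zip(*a)]


def top_singular_triplet(a, iterations=100):
    """Power iteration on A^T A; returns (sigma, u, v) for the largest singular value.

    Assumes the all-ones start vector is not orthogonal to the leading right
    singular vector and that A is not the zero matrix (otherwise we divide by zero).
    """
    a_t = transpose(a)
    v = [1.0] * len(a[0])
    for _ in range(iterations):
        w = matvec(a_t, matvec(a, v))  # w = A^T A v
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    av = matvec(a, v)
    sigma = math.sqrt(sum(x * x for x in av))
    u = [x / sigma for x in av]
    return sigma, u, v


def rank_one_approx(a):
    """Best rank-1 approximation sigma * u v^T (Eckart-Young theorem)."""
    sigma, u, v = top_singular_triplet(a)
    return [[sigma * u_i * v_j for v_j in v] for u_i in u]


if __name__ == "__main__":
    a = [[1.0, 2.0], [2.0, 4.0], [3.0, 6.0]]  # exactly rank 1
    print(top_singular_triplet(a)[0])          # sqrt(70) for this matrix
    print(rank_one_approx(a))
```

For a matrix that is exactly rank 1, such as the example, the approximation reproduces the matrix up to rounding; for real data the residual A − σuvᵀ contains the structure not captured by the leading latent factor.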
IE 674: Hot Topics in Machine Learning
Machine learning
How can we build computer systems that automatically improve with
experience, and what are the fundamental laws that govern all
learning processes?
Goal: in-depth understanding of underlying algorithms and concepts
Focus: basics + selected “hot topics” and their applications
• Instructor: Rainer Gemulla
• Tutor: TBA
• 2 SWS lecture, 2 SWS tutorial, 6 ECTS
• Recommended prerequisites
• IE 500 Data Mining I, IE 560 Decision support
• Basic knowledge of probability and statistics
• Gain hands-on experience
• Smaller exercises to deepen lecture material
• Homework assignments to analyze real data
• Learn Python, NumPy, scikit-learn, PyTorch, Stan, ...
• Passing requirements
• Written exam or oral examination
• Assignments
CS 647: Image Processing
• Lecture contents
• Basics of Imaging
• Noise and basic operations
• Variational Methods
• Image Feature Extraction
• Segmentation
• Image Sequences and Motion
• Organization
• Lectures and Exercises
• Gain practical Python and C++ coding experience in the exercises
• Teaching Staff
• Prof. Margret Keuper (Lectures and Exercises)
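Among the basic operations for handling noise, a median filter is a standard way to remove salt-and-pepper noise while keeping edges sharper than a mean filter would. A minimal sketch, assuming a grayscale image stored as a list of rows of intensities and clipping the window at the image borders (one of several possible border-handling choices):

```python
from statistics import median


def median_filter(image, radius=1):
    """Replace each pixel by the median of its (2r+1) x (2r+1) neighbourhood, clipped at borders."""
    height, width = len(image), len(image[0])
    output = []
    for y in range(height):
        row = []
        for x in range(width):
            window = [image[yy][xx]
                      for yy in range(max(0, y - radius), min(height, y + radius + 1))
                      for xx in range(max(0, x - radius), min(width, x + radius + 1))]
            row.append(median(window))
        output.append(row)
    return output


if __name__ == "__main__":
    noisy = [[10, 10, 10],
             [10, 255, 10],  # a single "salt" pixel
             [10, 10, 10]]
    print(median_filter(noisy))
```

The isolated bright pixel is replaced by the value of its neighbourhood, because one outlier cannot shift the median of a window in which most values are equal; a mean filter would instead smear it over its neighbours.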
CS 646: Higher Level Computer Vision
• Lecture contents
• Object Detection
• Semantic Image Segmentation
• Optical Flow
• Video and Motion Segmentation
• Deep Learning for Computer Vision
• Organization
• Lectures and Exercises
• Gain practical Python and MATLAB coding experience in the exercises
• Teaching Staff
• Margret Keuper (Lectures and Exercises)
CS 707: Data and Web Science Seminar
• Learn about recent advancements in data and web science
• Read, understand, explore, present, and peer-review scientific literature
• This term: Graph Mining and Learning from Graphs
• Topics: graph mining, graph representation learning, graph analysis frameworks, applications
Instructors: Kiril Gashteovski, Rainer Gemulla
CS 709: Text Analytics Seminar
• Goals:
• Examine and explore cutting-edge research in the areas of natural language processing, computational linguistics, and information retrieval
• Learn how to read and interpret scientific work in
this area of research
• Learn how to write a survey/overview paper on
the assigned topic, covering a specific task
• Instructors:
• Simone Ponzetto, Goran Glavaš
• Not offered this term
CS 710: Knowledge Graphs Seminar
• Gain insights into…
• Construction of Knowledge Graphs
• Contents of open knowledge graphs
• Application Areas
• Instructor: Heiko Paulheim
CS 715: Large Scale Data Integration Seminar
− Covers current topics in the area of large-scale schema matching, identity resolution, data fusion, set completion, data search, data exploration, and data profiling
− Concrete topics change from semester to semester
− You summarize a current research topic in a concise report
− You systematically compare different state of the art methods
− You give a presentation about your topic
− Good start for writing your master thesis at the chair
• To work on:
• Data and Web Mining projects
• Information Extraction and Integration projects
• Knowledge Representation and Reasoning projects
• Natural Language Processing projects
• Implement open source tools
• 30-60 h/month contracts are possible
• Contact PostDoc or Professor responsible for the
project/area that you are interested in.
• Include your CV and transcript of records.
• Good start for writing your master thesis within the group.
DWS hires good students!