
Page 1: CSC411- Machine Learning and Data Mining

CSC411- Machine Learning and Data Mining

Tutorial 10 – March 23rd, 2007

University of Toronto (Mississauga Campus)

Page 2: CSC411- Machine Learning and Data Mining

Case 1: To improve its business, a national supermarket chain starts a project to keep track of its customers. Regular customers can collect points or receive discounts by using their store card on each purchase. Temporary customers, who are not store members, are all assigned to the same temporary store card. The supermarket is now hiring a data mining analyst to help with this project.

Question: If you were the data mining analyst, how would you design the project, and what data would you need for it?

Data Mining and Machine Learning Applications
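One possible starting point for Case 1, sketched in Python. Everything here is an assumption made for illustration: the record layout, the card IDs, the `TEMP` identifier for the shared temporary card, and the sample baskets. The sketch shows why the shared card matters: basket-level patterns (which items are bought together) can use every purchase, while per-customer histories can only be built for members.

```python
from collections import Counter, defaultdict
from itertools import combinations

TEMP_CARD = "TEMP"  # hypothetical ID of the single shared temporary card

# Hypothetical purchase records: (card_id, items bought in one visit)
transactions = [
    ("C001", {"milk", "bread", "eggs"}),
    ("C002", {"milk", "bread"}),
    ("TEMP", {"bread", "eggs"}),
    ("C001", {"milk", "eggs"}),
    ("TEMP", {"milk", "bread"}),
]

def pair_counts(records):
    """Count how often each pair of items appears in the same basket."""
    counts = Counter()
    for _card, items in records:
        for pair in combinations(sorted(items), 2):
            counts[pair] += 1
    return counts

def member_histories(records, temp_card=TEMP_CARD):
    """Group baskets by card, skipping the shared temporary card."""
    history = defaultdict(list)
    for card, items in records:
        if card != temp_card:
            history[card].append(items)
    return dict(history)

pairs = pair_counts(transactions)          # uses all baskets, members or not
members = member_histories(transactions)   # only cards tied to one customer
print(pairs.most_common(3))
print({card: len(baskets) for card, baskets in members.items()})
```

A real design would also need dates, store locations, and prices; those fields are left out here to keep the sketch short.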

Page 3: CSC411- Machine Learning and Data Mining

Case 2: Researchers have found that individuals respond or react differently to the same drug treatment. For example, two smokers have the same smoking history, yet one is diagnosed with lung cancer and the other is not. Single Nucleotide Polymorphisms (SNPs) are an important resource for explaining these phenomena. One possible project is to study the association between the SNPs and the DNA sequences.

Question: If you were the researcher, how would you design this project?

Data Mining and Machine Learning Applications
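One way to read Case 2 is as an association study: does carrying a particular SNP variant go together with an outcome such as lung cancer? The Python sketch below computes a Pearson chi-square statistic for a 2x2 table. The counts are made up purely for illustration, and the choice of a chi-square test is an assumption, not something the tutorial prescribes.

```python
def chi_square_2x2(a, b, c, d):
    """Pearson chi-square statistic for the 2x2 table [[a, b], [c, d]]."""
    n = a + b + c + d
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    if 0 in (row1, row2, col1, col2):
        raise ValueError("every row and column needs at least one observation")
    return n * (a * d - b * c) ** 2 / (row1 * row2 * col1 * col2)

# Made-up counts, for illustration only.
# Rows: carries the variant allele / does not. Columns: cancer / no cancer.
variant_cancer, variant_healthy = 30, 20
common_cancer, common_healthy = 20, 30

stat = chi_square_2x2(variant_cancer, variant_healthy,
                      common_cancer, common_healthy)

# 3.841 is the 5% critical value for chi-square with 1 degree of freedom.
print(f"chi-square = {stat:.2f}, exceeds 3.841: {stat > 3.841}")
```

With many SNPs tested at once, a single 5% threshold would produce many false positives, so a real study would need some correction for multiple testing.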

Page 4: CSC411- Machine Learning and Data Mining

Cancer – Different Fates

This slide is copied from the National Cancer Institute, Understanding Cancer series: Genetic Variation (SNPs): http://www.nci.nih.gov/cancertopics/understandingcancer/geneticvariation

Page 5: CSC411- Machine Learning and Data Mining

SNPs May Be the Solution

[Figure labels: SNPs A, SNPs B, SNPs C, SNPs D]

This slide is copied from the National Cancer Institute, Understanding Cancer series: Genetic Variation (SNPs): http://www.nci.nih.gov/cancertopics/understandingcancer/geneticvariation

Page 6: CSC411- Machine Learning and Data Mining

What Is Variation in the Genome?

[Figure labels: Common Sequence Variations; Polymorphism; Deletions; Translocations; Insertions; Chromosome]

This slide is copied from the National Cancer Institute, Understanding Cancer series: Genetic Variation (SNPs): http://www.nci.nih.gov/cancertopics/understandingcancer/geneticvariation

Page 7: CSC411- Machine Learning and Data Mining

SNPs Are the Most Common Type of Variation

[Figure labels: Most of the population; At least 1 percent of the population; Common sequence; Variant sequence; SNP site; G to C]

This slide is copied from the National Cancer Institute, Understanding Cancer series: Genetic Variation (SNPs): http://www.nci.nih.gov/cancertopics/understandingcancer/geneticvariation
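The two ideas on this slide, a single-base difference at one site (here G to C) and a variant found in at least 1 percent of the population, can be sketched in Python. The sequences, the carrier counts, and the function names are hypothetical, chosen only to illustrate the definitions.

```python
def snp_sites(common, variant):
    """Return (position, common_base, variant_base) for each single-base difference."""
    if len(common) != len(variant):
        raise ValueError("sequences must be aligned to the same length")
    return [(i, c, v) for i, (c, v) in enumerate(zip(common, variant)) if c != v]

def is_common_variant(carriers, population, threshold=0.01):
    """True if the variant is carried by at least `threshold` of the population."""
    return carriers / population >= threshold

common_seq = "AAGGTTA"   # hypothetical common sequence
variant_seq = "AACGTTA"  # same sequence with G replaced by C at index 2

sites = snp_sites(common_seq, variant_seq)
print(sites)
print(is_common_variant(15, 1000))  # 1.5% of a hypothetical sample
print(is_common_variant(5, 1000))   # 0.5% of a hypothetical sample
```

The comparison assumes the two sequences are already aligned; insertions and deletions, which shift positions, would need a proper alignment step first.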

Page 8: CSC411- Machine Learning and Data Mining

The Genome Contains Genes

[Figure labels: Gene 1, Coding region, Protein 1; Gene 2, Coding region, Protein 2; Noncoding region; Noncoding region]

This slide is copied from the National Cancer Institute, Understanding Cancer series: Genetic Variation (SNPs): http://www.nci.nih.gov/cancertopics/understandingcancer/geneticvariation

Page 9: CSC411- Machine Learning and Data Mining

Variation in the Human Genome

Person 1 Person 2

= Variations in DNA

This slide is copied from the National Cancer Institute, Understanding Cancer series: Genetic Variation (SNPs): http://www.nci.nih.gov/cancertopics/understandingcancer/geneticvariation

Page 10: CSC411- Machine Learning and Data Mining

Variations Causing No Changes

= Variations in DNA that cause no changes

This slide is copied from the National Cancer Institute, Understanding Cancer series: Genetic Variation (SNPs): http://www.nci.nih.gov/cancertopics/understandingcancer/geneticvariation

Page 11: CSC411- Machine Learning and Data Mining

Variations Causing Harmless Changes

= Variations in DNA that cause harmless changes

This slide is copied from the National Cancer Institute, Understanding Cancer series: Genetic Variation (SNPs): http://www.nci.nih.gov/cancertopics/understandingcancer/geneticvariation

Page 12: CSC411- Machine Learning and Data Mining

Variations Causing Harmful Changes

= Variation in DNA that causes harmful change

[Figure labels: No Disease; No Disease; Hemophilia]

This slide is copied from the National Cancer Institute, Understanding Cancer series: Genetic Variation (SNPs): http://www.nci.nih.gov/cancertopics/understandingcancer/geneticvariation

Page 13: CSC411- Machine Learning and Data Mining

Variations Causing Latent Changes

[Figure labels: Many years later; Many years later]

= Variations in DNA that cause latent effects

This slide is copied from the National Cancer Institute, Understanding Cancer series: Genetic Variation (SNPs): http://www.nci.nih.gov/cancertopics/understandingcancer/geneticvariation