Analyzing human population genetic history through the study of genetic variation Mark Mata Mentor: Eleazar Eskin UCLA Zar Lab SoCalBSI 2009

Post on 19-Dec-2015


Analyzing human population genetic history through the study of genetic variation

Mark Mata
Mentor: Eleazar Eskin

UCLA Zar Lab
SoCalBSI 2009

Background

To study human population genetic history is to study part of human evolution.

Human evolution is one of the fundamental questions in science. We ask ourselves questions such as:

Where do we come from?
Why are we all different?
How are we all different?

Background

The ZarLab studies the most recent events in human evolution: now that we have modern humans, what variations have occurred in our genes since our ancient African ancestors?

To answer this question, our group is looking at human variation to produce a genetic history of these changes.

Why do we care?

Many diseases are caused by variations that have occurred in our genetic history

Better understanding of our genetic history and human variation may eventually lead to better treatment plans

Personalized medicine: “The right drug, in the right dose, to the right person, at the right time.”

PerkinElmer website: http://las.perkinelmer.com/content/snps/genotyping.asp#snps

Human Variation

Modern humans share 99.9% of their DNA

The remaining 0.1% accounts for the variation between humans

Of this, 80% of the variation is the result of SNPs

SNP (single-nucleotide polymorphism): a position in the genome where two different bases are present in the population. The base at a SNP on a chromosome is referred to as the “allele”

A haplotype is the sequence of alleles on a genome

The other 20% comes from deletions or insertions on the genome

PerkinElmer website: http://las.perkinelmer.com/content/snps/genotyping.asp#snps

International HapMap Project

Study done by the International HapMap Consortium

“…create a public, genome-wide database of common human sequence variation…”

Identified SNPs and compiled the SNP alleles into a database of haplotypes for four different populations (Phase 1)

The population used was a group of 60 Mormons in Utah
  Have been widely studied in the past
  Western and Northern European descent
  Have very detailed records
  Used their chromosome 19

“A haplotype map of the human genome” by: The International HapMap Consortium. Nature. Published 27 October 2005

My Project

Goals

Reconstruct human genetic history
  This is a very difficult problem

Sub-problem: Identify recent genetic events
  Make the assumption that these new genetic events are rare or very few in number
  Easier to classify and identify relationships when compared to older, more common haplotypes
  These new events are important because they identify shared recent ancestry
  Disease-causing variations could be from recent events

Identifying Recent Genetic Events

1. Select a region in a haplotype and find the frequency of variation

2. Group variations into common and rare

3. Find recent point mutations

4. Find recent recombinations

Workflow

Individual’s Haplotypes:
TTTTTTTTTTTTTTT
AAAAAAAAAAAAAAA
TTTTTTTTTTTTTTT
AAAAAAAAAAAAAAA
TTTTTTTTTTTTTTT
AAAAAAAAAAAAAAA
TTTTTTTTTTTTTTT
AAAAAAAAATTTTTT
AATTTTTTTTTTTTT
TTTTTTATTTTTTTT
AAAAAAAAAAAAAAA
AAAAAAAAAAAAAAA

Frequency of Variation:
Common
AAAAAAAAAA – 49%
TTTTTTTTTT – 48%
Rare
AAAAAAAAAT – 1%
AATTTTTTTT – 1%
TTTTTTATTT – 1%

Identify Events:
AAAAAAAAAT*
AA|TTTTTTTT
TTTTTTA*TTT

Frequency of Variation

Individual’s Haplotype   Region
TTTTTTTTTTTTTTT          TTTTTTTTTT
AAAAAAAAAAAAAAA          AAAAAAAAAA
TTTTTTTTTTTTTTT          TTTTTTTTTT
AAAAAAAAAAAAAAA          AAAAAAAAAA
TTTTTTTTTTTTTTT          TTTTTTTTTT
AAAAAAAAAAAAAAA          AAAAAAAAAA
TTTTTTTTTTTTTTT          TTTTTTTTTT
AAAAAAAAATTTTTT          AAAAAAAAAT
AATTTTTTTTTTTTT          AATTTTTTTT
TTTTTTATTTTTTTT          TTTTTTATTT
AAAAAAAAAAAAAAA          AAAAAAAAAA
AAAAAAAAAAAAAAA          AAAAAAAAAA

How Many
AAAAAAAAAA - 59
TTTTTTTTTT - 58
AAAAAAAAAT - 1
AATTTTTTTT - 1
TTTTTTATTT - 1

Frequency of Variation

Individual’s Haplotype
TTTTTTTTTT|TTTTT
AAAAAAAAAA|AAAAA
TTTTTTTTTT|TTTTT
AAAAAAAAAA|AAAAA
TTTTTTTTTT|TTTTT
AAAAAAAAAA|AAAAA
TTTTTTTTTT|TTTTT
AAAAAAAAAT|TTTTT
AATTTTTTTT|TTTTT
TTTTTTATTT|TTTTT
AAAAAAAAAA|AAAAA
AAAAAAAAAA|AAAAA

How Many / Frequency of Variation
AAAAAAAAAA – 59/120 ~49%
TTTTTTTTTT – 58/120 ~48%
AAAAAAAAAT – 1/120 ~1%
AATTTTTTTT – 1/120 ~1%
TTTTTTATTT – 1/120 ~1%
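The counting in step 1 can be sketched in a few lines of Python. This is a minimal illustration, not the project's code: the function name `region_frequencies` and the toy population are assumptions, built only to reproduce the counts shown on the slide (59, 58, 1, 1, 1 out of 120 haplotypes).

```python
from collections import Counter


def region_frequencies(haplotypes, start=0, width=10):
    """Count each distinct subsequence in the window [start, start + width).

    Returns {subsequence: (count, fraction of all haplotypes)}.
    """
    regions = [h[start:start + width] for h in haplotypes]
    counts = Counter(regions)
    total = len(regions)
    return {seq: (n, n / total) for seq, n in counts.items()}


# Toy population (assumption): 120 haplotypes of 15 SNPs each,
# chosen to match the counts on the slide.
haplotypes = (
    ["AAAAAAAAAAAAAAA"] * 59
    + ["TTTTTTTTTTTTTTT"] * 58
    + ["AAAAAAAAATTTTTT", "AATTTTTTTTTTTTT", "TTTTTTATTTTTTTT"]
)

freqs = region_frequencies(haplotypes)
for seq, (n, fraction) in sorted(freqs.items(), key=lambda kv: -kv[1][0]):
    print(f"{seq} - {n}/{len(haplotypes)} ~{fraction:.0%}")
```

Using `collections.Counter` keeps the count and the frequency in one pass; the window position and width are parameters so the same function can later be reused for other regions.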

Grouping Variations

Subsequences are classified as either common or rare haplotypes

Make the assumption that new genetic events are rare or very few in number

A cutoff of 5% frequency was used: subsequences at or above 5% are common, those below 5% are rare

The 5% figure comes from the International HapMap Consortium study

“A haplotype map of the human genome” by: The International HapMap Consortium. Nature. Published 27 October 2005

Grouping Variations

Individual’s Genes
TTTTTTTTTT|TTTTT
AAAAAAAAAA|AAAAA
TTTTTTTTTT|TTTTT
AAAAAAAAAA|AAAAA
TTTTTTTTTT|TTTTT
AAAAAAAAAA|AAAAA
TTTTTTTTTT|TTTTT
AAAAAAAAAT|TTTTT
AATTTTTTTT|TTTTT
TTTTTTATTT|TTTTT
AAAAAAAAAA|AAAAA
AAAAAAAAAA|AAAAA

Frequency of Variation
AAAAAAAAAA – 49%
TTTTTTTTTT – 48%
AAAAAAAAAT – 1%
AATTTTTTTT – 1%
TTTTTTATTT – 1%

Group
Common:
AAAAAAAAAA
TTTTTTTTTT
Rare:
AAAAAAAAAT
AATTTTTTTT
TTTTTTATTT
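The grouping step amounts to a single threshold test. The sketch below assumes the frequencies have already been computed; the function name `group_by_frequency` and the dictionary layout are illustrative choices, while the 5% cutoff is the one stated on the slides.

```python
def group_by_frequency(freqs, cutoff=0.05):
    """Split subsequences into common (frequency >= cutoff) and rare (< cutoff)."""
    common = sorted(seq for seq, f in freqs.items() if f >= cutoff)
    rare = sorted(seq for seq, f in freqs.items() if f < cutoff)
    return common, rare


# Frequencies from the example region (counts out of 120 haplotypes).
freqs = {
    "AAAAAAAAAA": 59 / 120,
    "TTTTTTTTTT": 58 / 120,
    "AAAAAAAAAT": 1 / 120,
    "AATTTTTTTT": 1 / 120,
    "TTTTTTATTT": 1 / 120,
}

common, rare = group_by_frequency(freqs)
print("Common:", common)
print("Rare:", rare)
```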

Recent Events

Make comparisons to identify two forms of variation:
  Point mutations
  Recombination events

Common:
AAAAAAAAAA
TTTTTTTTTT

Rare:
AAAAAAAAAT
AATTTTTTTT
TTTTTTATTT

Point Mutations

[Figure: workflow diagram with three columns (Individual’s Haplotypes, Frequency of Variation, Identify Events); the point mutations AAAAAAAAAT* and TTTTTTA*TTT are marked in the events column]

Recent Events

Point mutations
  Are found by comparing a common haplotype with a rare haplotype
  A difference of one shows that a rare haplotype is a point mutation of a common haplotype
  Marked by a “*” next to the point mutation

Common: TTTTTTTTTT
Rare:   TTTTTTATTT
        TTTTTTA*TTT
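The "difference of one" test is a Hamming-distance check. The following is a minimal sketch under that reading; the helper names `hamming` and `mark_point_mutation` are hypothetical, and placing the “*” directly after the changed base follows the notation on the slide.

```python
def hamming(a, b):
    """Number of positions at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must have the same length")
    return sum(x != y for x, y in zip(a, b))


def mark_point_mutation(rare, commons):
    """Return the rare haplotype with '*' after the mutated base, or None.

    A rare haplotype counts as a point mutation if it differs from some
    common haplotype at exactly one position.
    """
    for common in commons:
        if hamming(rare, common) == 1:
            i = next(k for k, (x, y) in enumerate(zip(rare, common)) if x != y)
            return rare[:i + 1] + "*" + rare[i + 1:]
    return None


commons = ["AAAAAAAAAA", "TTTTTTTTTT"]
for rare in ["AAAAAAAAAT", "AATTTTTTTT", "TTTTTTATTT"]:
    print(rare, "->", mark_point_mutation(rare, commons))
```

In this example AATTTTTTTT is not reported, because it differs from TTTTTTTTTT at two positions and from AAAAAAAAAA at eight.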

Recombination

[Figure: workflow diagram with three columns (Individual’s Haplotypes, Frequency of Variation, Identify Events); the recombination AA|TTTTTTTT is marked in the events column]

Recent Events

Recombination
  Combine portions of two common haplotypes and see if they form a rare haplotype

Common:
AAAAAAAAAA
TTTTTTTTTT

Possible Recombinations:
AA|TTTTTTTT
AAA|TTTTTTT
AAAA|TTTTTT
AAAAA|TTTTT
AAAAAA|TTTT
AAAAAAA|TTT
AAAAAAAA|TT

Rare Mutations

Marked by a “|” at the border between one haplotype and another haplotype

Possible Recombinations:
AA|TTTTTTTT
AAA|TTTTTTT
AAAA|TTTTTT
AAAAA|TTTTT
AAAAAA|TTTT
AAAAAAA|TTT
AAAAAAAA|TT

Actual Recombinations:
AA|TTTTTTTT
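The recombination search can be sketched as trying every breakpoint between a prefix of one common haplotype and the suffix of another. This is an illustration, not the project's implementation: the function names are hypothetical, and the `min_segment=2` limit is an assumption inferred from the candidate list on the slides, which runs from AA|TTTTTTTT to AAAAAAAA|TT.

```python
def possible_recombinations(left, right, min_segment=2):
    """All prefix-of-left + suffix-of-right combinations, with the breakpoint marked."""
    n = len(left)
    return [
        left[:k] + "|" + right[k:]
        for k in range(min_segment, n - min_segment + 1)
    ]


def mark_recombination(rare, commons, min_segment=2):
    """Return the rare haplotype with '|' at the breakpoint, or None if no pair fits."""
    for left in commons:
        for right in commons:
            if left == right:
                continue
            for candidate in possible_recombinations(left, right, min_segment):
                if candidate.replace("|", "") == rare:
                    return candidate
    return None


commons = ["AAAAAAAAAA", "TTTTTTTTTT"]
print(possible_recombinations("AAAAAAAAAA", "TTTTTTTTTT"))
for rare in ["AAAAAAAAAT", "AATTTTTTTT", "TTTTTTATTT"]:
    print(rare, "->", mark_recombination(rare, commons))
```

With the two-base minimum on each side, AAAAAAAAAT is not treated as a recombination, so it stays a point-mutation candidate, as on the slides.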

Sample input and output

chr-haplotypes.txt:
Indv1
TTTTTTTTTTTTTTT
Indv1
AAAAAAAAATTTTTT
Indv2
AATTTTTTTTTTTTT
Indv2
TTTTTTATTTTTTTT

new_chr-haplotypes.txt:
Indv1
T T T T T T T T T T
Indv1
A A A A A A A A A T*
Indv2
A A|T T T T T T T T
Indv2
T T T T T T A*T T T


Visualization Tool

Expanding to the Whole Chromosome

Now that we have a way to look for variations in regions of a chromosome, we can expand the technique to look for variations in a whole chromosome

We used a technique of overlapping windows

AAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
|AAAAAAAAAA|
     |AAAAAAAAAA|
          |AAAAAAAAAA|
               |AAAAAAAAAA|
                    |AAAAAAAAAA|
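A sliding-window generator makes the overlapping-windows idea concrete. The sketch below assumes a window width of 10 SNPs and a step of 5, which matches the "first 10, slide over 5 and next 10" example later in the deck; the function name `overlapping_windows` is hypothetical, and trailing SNPs that do not fill a whole window are simply ignored here.

```python
def overlapping_windows(length, width=10, step=5):
    """Return (start, end) index pairs of overlapping windows over a sequence."""
    return [(start, start + width) for start in range(0, length - width + 1, step)]


chromosome = "A" * 30
for start, end in overlapping_windows(len(chromosome)):
    # Indent each window by its start position to show the overlap.
    print(" " * start + chromosome[start:end])
```

For the 30-SNP example this yields five windows starting at positions 0, 5, 10, 15, and 20, the same number drawn on the slide.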

Overlapping Windows

[Figure: workflow diagram with three columns (Individual’s Haplotypes, Frequency of Variation, Identify Events), showing the events AAAAAAAAAT*, AA|TTTTTTTT and TTTTTTA*TTT]

Overlapping

Recombination events that looked like point mutations

Common: AAAAAAAAAAAAAAA
        TTTTTTTTTTTTTTT
Rare:   AAAAAAAAATTTTTT

First 10
Common: AAAAAAAAAA
Rare:   AAAAAAAAAT*

Slide over 5 and next 10
Common: AAAAAAAAAA
        TTTTTTTTTT
Rare:   AAAA|TTTTTT

AAAAAAAAA|T*TTTTT

AAAAAAAAA|TTTTTT
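The example above can be replayed in code: classify each overlapping window, map the findings back to whole-haplotype positions, and let a recombination breakpoint override a point mutation found at the same position. This is a sketch under stated assumptions, not the project's algorithm: the function names are hypothetical, the window width of 10 and step of 5 follow the slide, the two-base minimum segment follows the earlier candidate list, and the override rule is inferred from the AAAAAAAAA|T*TTTTT to AAAAAAAAA|TTTTTT step.

```python
def point_mutation_index(region, commons):
    """Index of the single mismatch against some common region, else None."""
    for common in commons:
        diff = [i for i, (x, y) in enumerate(zip(region, common)) if x != y]
        if len(diff) == 1:
            return diff[0]
    return None


def breakpoint_index(region, commons, min_segment=2):
    """Breakpoint k such that region == left[:k] + right[k:], else None."""
    n = len(region)
    for left in commons:
        for right in commons:
            if left == right:
                continue
            for k in range(min_segment, n - min_segment + 1):
                if left[:k] + right[k:] == region:
                    return k
    return None


def annotate(haplotype, commons, width=10, step=5):
    """Mark point mutations ('*') and breakpoints ('|') found in any window."""
    mutations, breaks = set(), set()
    for start in range(0, len(haplotype) - width + 1, step):
        region = haplotype[start:start + width]
        local = [c[start:start + width] for c in commons]
        if region in local:
            continue  # this window matches a common haplotype, nothing to mark
        m = point_mutation_index(region, local)
        b = breakpoint_index(region, local)
        if m is not None:
            mutations.add(start + m)
        if b is not None:
            breaks.add(start + b)
    # Assumption: a mutation sitting exactly at a breakpoint is explained
    # by the recombination, so only the breakpoint is kept.
    mutations -= breaks
    out = []
    for i, base in enumerate(haplotype):
        if i in breaks:
            out.append("|")
        out.append(base)
        if i in mutations:
            out.append("*")
    return "".join(out)


COMMON = ["AAAAAAAAAAAAAAA", "TTTTTTTTTTTTTTT"]
for rare in ["AAAAAAAAATTTTTT", "AATTTTTTTTTTTTT", "TTTTTTATTTTTTTT"]:
    print(rare, "->", annotate(rare, COMMON))
```

In the first window AAAAAAAAAT looks like a point mutation at position 9; in the second window AAAATTTTTT is a recombination whose breakpoint maps to the same position, so the final annotation keeps only the “|”.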

Applying to a Population’s Chromosome

Now that we have a technique to look for new variations in a whole chromosome

We can apply it to a population and identify regions where recent genetic events took place

Identified Recent Genetic Events

In chromosome 19:
Unique point mutations = 13723
Unique recombination events = 4065
Total unique events = 15697

Total point mutations = 46072
Total recombination events = 11381
Total number of events = 57453

Average point mutations per individual = 383
Average recombination events per individual = 94
Average events per individual = 478
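The totals and averages can be cross-checked with simple arithmetic. The divisor of 120 is an assumption: it is the number of haplotypes in the frequency example (two per person for the 60 individuals), and it is the value that reproduces the reported averages when the result is truncated to a whole number. The unique counts are left out because the slides do not explain how the unique total is derived from them.

```python
total_point_mutations = 46072
total_recombinations = 11381
haplotypes = 120  # assumption: 60 individuals x 2 chromosome copies

total_events = total_point_mutations + total_recombinations
print("Total number of events:", total_events)

# Integer division truncates, which matches the whole numbers on the slide.
print("Average point mutations:", total_point_mutations // haplotypes)
print("Average recombination events:", total_recombinations // haplotypes)
print("Average events:", total_events // haplotypes)
```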

Point Mutations

[Figure: plot of point mutations; axis label: SNP Position in the Haplotype]

Recombination Events

[Figure: plot of recombination events; axis labels: Haplotype, SNP Position in the Haplotype]

Point Mutations and Recombination Events

[Figure: plot of point mutations and recombination events; axis labels: Haplotype, SNP Position in the Haplotype]

Conclusion

We have developed an algorithm for identifying recent genetic events in an individual

More point mutations were identified than recombination events

Some regions of the genome contain many recent genetic events, while other regions contain few

Future Work

Run the algorithm over the whole genome

Extend the algorithm to multiple populations

Identify recent events that are unique to a population vs. ones that are shared

Identify genetic relations between common haplotypes

Create a chronological order of recent events in an individual

Adapt the algorithm for high-throughput sequencing data

UCLA ZarLab
  Dr. Eleazar Eskin
  All the lab people

SoCalBSI
  Dr. Jamil Momand
  Dr. Sandra Sharp
  Dr. Nancy Warter-Perez
  Dr. Wendie Johnston
  Dr. Beverly Krilowicz
  Dr. Silvia Heubach
  Dr. Jennifer Faust
  Ronnie Cheng

Funded By: SoCalBSI 2009 Interns

Determining ancestors

The other ancestors are determined through SNP differences of 2 or more

My Project

Red line: Point mutation

Blue line: Ancestor to common relationship

Black dashed line: Haplotype resulted from a crossover mutation

Graph

The graph is generated by Graphviz, a graph visualization program
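Graphviz reads graphs written in its DOT language, so one way to produce such a graph is to emit DOT text from the detected events. The sketch below is an assumption about how that could look, not the project's code: the function name `to_dot`, the edge lists, and the node name "ancestor" are hypothetical, while the colors and the dashed style follow the legend on the previous slide.

```python
def to_dot(point_mutations, ancestor_links, recombinations):
    """Build DOT text: red = point mutation, blue = ancestor to common,
    black dashed = haplotype produced by recombination."""
    lines = ["digraph haplotypes {"]
    for src, dst in point_mutations:
        lines.append(f'  "{src}" -> "{dst}" [color=red];')
    for src, dst in ancestor_links:
        lines.append(f'  "{src}" -> "{dst}" [color=blue];')
    for src, dst in recombinations:
        lines.append(f'  "{src}" -> "{dst}" [color=black, style=dashed];')
    lines.append("}")
    return "\n".join(lines)


dot_text = to_dot(
    point_mutations=[("AAAAAAAAAA", "AAAAAAAAAT"), ("TTTTTTTTTT", "TTTTTTATTT")],
    ancestor_links=[("ancestor", "AAAAAAAAAA"), ("ancestor", "TTTTTTTTTT")],
    recombinations=[("AAAAAAAAAA", "AATTTTTTTT"), ("TTTTTTTTTT", "AATTTTTTTT")],
)
print(dot_text)
```

Saving the printed text to a file such as `graph.dot` and running `dot -Tpng graph.dot -o graph.png` would render it as an image.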


Graph