snps map of fungi - fasta · snps map of fungi • the aim was to look for snps that could be ......

11/4/15 · Mapping SNPs in Fungi · Omon Isi · SNPs MAP of Fungi: The aim was to look for SNPs that could be used as a unique ID for fungal strains. The genome sequence for Pleurotus ostreatus was downloaded. FASTQ files were downloaded for Tremella fusiformis. FASTQ sequences were mapped against the reference genome. Why?


Post on 17-May-2020


TRANSCRIPT

Page 1: SNPs MAP of Fungi - FASTA · SNPs MAP of Fungi • The aim was to look for SNPs that could be ... Pipeline FastQC Paired end sequencing fastq Galaxy 11/4/15 6 Bam Normal Bam Tumor

11/4/15

1

Mapping SNPs in Fungi

Omon Isi

SNPs MAP of Fungi

•  The aim was to look for SNPs that could be used as a unique ID for fungal strains

•  Genome sequence for Pleurotus ostreatus was downloaded

•  FASTQ files were downloaded for Tremella fusiformis

•  FASTQ sequences were mapped against the reference genome. Why?


Results

1. There were only 1017 SNPs found

2. This is highly unlikely, so why?

3. The reference genome was not adequate; it is a different species
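To put a count like 1017 SNPs in context, it can help to recount the calls and express them per megabase of reference. The following is a minimal sketch, assuming the calls are available as a VCF; the helper names `count_snps` and `snps_per_mb`, the toy records, and the 34 Mb genome size are all hypothetical and not taken from the slides.

```shell
#!/usr/bin/env bash

# Count SNP records in a VCF read from stdin: skip header lines (those
# starting with '#') and keep records whose REF and ALT are single bases.
# Records with ALT "." (no variant) are not counted.
count_snps() {
  awk -F'\t' '!/^#/ && length($4) == 1 && length($5) == 1 && $5 != "." { n++ }
              END { print n + 0 }'
}

# SNPs per megabase, given a SNP count and a genome size in bases.
snps_per_mb() {
  awk -v n="$1" -v size="$2" 'BEGIN { printf "%.2f\n", n / (size / 1000000) }'
}

# Made-up three-record VCF for illustration: two SNPs and one insertion.
printf '##fileformat=VCFv4.2\n#CHROM\tPOS\tID\tREF\tALT\nscaffold_1\t100\t.\tA\tG\nscaffold_1\t250\t.\tC\tCT\nscaffold_2\t42\t.\tT\tC\n' | count_snps

# Density for 1017 SNPs on a hypothetical 34 Mb reference (assumed size).
snps_per_mb 1017 34000000
```

A very low density, together with a low fraction of mapped reads, would be consistent with the conclusion that the reference was too distant from the sequenced species.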


Conclusions/Lessons

I have learned how to do SNP calling using Galaxy

The fact that only related organisms should be used as a reference genome is obvious.


Identification of somatic mutations and copy number variations in CLL

Jian Yan, Thomas De Raedt, Omar Ahmad

-  Normal and Leukemia whole genome sequencing
-  add stats
-  analyzed with GATK
-  identified a number of driver mutations

-  Normal and Leukemia whole genome sequencing
-  Illumina HiSeq 2000
-  Paired End Reads (100bp)


Pipeline

[Pipeline diagram (Galaxy). Boxes: Paired end sequencing fastq; FastQC; FastQ Groomer; BWA versus hg19; Bam Normal and Bam Tumor; GATK Re-align; Realigned BAM; SAMTOOLS Mpileup (version hg19); Varscan; SNP-indel Validation; Copy Number Variation => gene list]


[Pipeline diagram (Galaxy). Boxes: Paired end sequencing fastq; FastQ Groomer; BWA versus hg19; Bam Normal and Bam Tumor; SAMTOOLS Mpileup (version hg19); Varscan]

SAMTOOLS: Mpileup
-  Select right output file

Issues with running Varscan for Tumors on Galaxy
-  No option to subtract normal
-  No option to set p-value for Normal-Tumor comparison


#!/usr/bin/bash
#----------------------------script------------------------------------------
#------------1---------------
#----------------calling SNPs and LOH-------------
# bash script to call somatic mutations from tumor and normal pair
# author: Jian Yan, UCSD
bam=/mnt/silencer2/home/j4yan/CSHL/bam                     # directory location for bam file
script=/mnt/silencer2/home/j4yan/CSHL/script               # directory location for VarScan
ref=/mnt/silencer2/home/j4yan/bowtie_index/hg19/hg19.fa    # directory for reference genome
output=/mnt/silencer2/home/j4yan/CSHL/output/SNP           # location for output
for i in 1 5
do
    echo "$i starts"
    # pile up the normal and tumor BAMs (-B: disable BAQ, -q 1: minimum mapping quality 1)
    samtools mpileup -B -q 1 -f $ref $bam/CLL00${i}_normal.bam > $output/CLL00${i}.nor.mpileup
    samtools mpileup -B -q 1 -f $ref $bam/CLL00${i}_tumor.bam > $output/CLL00${i}.tum.mpileup
    # call somatic variants from the normal/tumor pair
    java -jar $script/VarScan.v2.3.9.jar somatic $output/CLL00${i}.nor.mpileup $output/CLL00${i}.tum.mpileup $output/out.CLL00${i}.basename --min-coverage 10 --min-var-freq 0.08 --somatic-p-value 0.05
    # split SNP and indel calls by status and confidence
    java -jar $script/VarScan.v2.3.9.jar processSomatic $output/out.CLL00${i}.basename.snp
    java -jar $script/VarScan.v2.3.9.jar processSomatic $output/out.CLL00${i}.basename.indel
    # filter the high-confidence somatic SNPs, using the indel calls
    java -jar $script/VarScan.v2.3.9.jar somaticFilter $output/out.CLL00${i}.basename.snp.Somatic.hc --indel-file $output/out.CLL00${i}.basename.indel --output-file $output/out.CLL00${i}.basename.snp.Somatic.hc.filter
    echo "$i finished"
done

Varscan SNP output

chrom  position   ref  var  nr1  nr2  n_freq  n_gt  tr1  tr2  t_freq  t_gt  Status   VarP  SomaticP                tr1+  tr1-  tr2+  tr2-  nr1+  nr1-  nr2+  nr2-
chr2   81426719   C    T    104  2    1.89%   C     60   52   46.43%  Y     Somatic  1.0   2.677960782616631E-16   34    26    28    24    57    47    1     1
chr2   92323206   A    C    28   0    0%      A     15   4    21.05%  M     Somatic  1.0   0.0217307207131436      11    4     2     2     14    14    0     0
chr2   92323213   G    T    29   0    0%      G     15   4    21.05%  K     Somatic  1.0   0.019919827320381504    10    5     2     2     14    15    0     0
chr2   92325170   T    C    24   1    4%      T     18   6    25%     Y     Somatic  1.0   0.04320114983152931     10    8     3     3     16    8     1     0
chr2   97366152   C    A    122  3    2.4%    C     77   33   30%     M     Somatic  1.0   1.1652835525027084E-9   23    54    6     27    33    89    1     2
chr2   133020331  A    T    16   0    0%      A     11   4    26.67%  W     Somatic  1.0   0.04338153503893158     2     9     2     2     8     8     0     0
chr2   155555169  A    C    18   0    0%      A     15   6    28.57%  M     Somatic  1.0   0.016632016632015884    11    4     6     0     13    5     0     0
chr2   190352891  G    T    11   0    0%      G     5    7    58.33%  K     Somatic  1.0   0.0032305828509893624   3     2     6     1     10    1     0     0
chr2   198266834  T    C    308  10   3.14%   T     168  153  47.66%  Y     Somatic  1.0   2.506589526816947E-43   106   62    93    60    192   116   4     6
chr2   216234847  T    G    21   0    0%      T     27   7    20.59%  K     Somatic  1.0   0.026510009906234835    27    0     7     0     20    1     0     0
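A table like the Varscan SNP output can be narrowed further after the fact. The sketch below assumes the VarScan somatic column order, in which the call status is field 13 and the somatic p-value is field 15; the function name `filter_somatic` and the 0.01 cutoff are illustrative choices, not part of the original scripts.

```shell
#!/usr/bin/env bash

# Keep the header line plus rows whose status is "Somatic" and whose
# somatic p-value (field 15) is at or below the cutoff given as $1.
# The default cutoff of 0.05 mirrors the --somatic-p-value used earlier.
filter_somatic() {
  local max_p="${1:-0.05}"
  awk -v max_p="$max_p" '
    NR == 1 { print; next }
    $13 == "Somatic" && ($15 + 0) <= (max_p + 0) { print }
  '
}

# Three rows copied from the Varscan SNP output, with a stricter cutoff.
printf '%s\n' \
  "chrom position ref var nr1 nr2 n_freq n_gt tr1 tr2 t_freq t_gt Status VarP SomaticP" \
  "chr2 81426719 C T 104 2 1.89% C 60 52 46.43% Y Somatic 1.0 2.677960782616631E-16 34 26 28 24 57 47 1 1" \
  "chr2 92325170 T C 24 1 4% T 18 6 25% Y Somatic 1.0 0.04320114983152931 10 8 3 3 16 8 1 0" \
  "chr2 190352891 G T 11 0 0% G 5 7 58.33% K Somatic 1.0 0.0032305828509893624 3 2 6 1 10 1 0 0" \
  | filter_somatic 0.01
```

Because awk splits on any whitespace by default, the same function should work on the tab-separated files VarScan writes as well as on the space-separated rows shown here.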

SNP-indel Validation

[Figure: BAM in IGV (Integrative Genomics Viewer), Normal and Tumor tracks: Mutation in Coding Sequence SF3B1]


[Figure: Mutation in Coding Sequence SF3B1]

[Figure: Variant validation]


VarScan to call CNV

#!/usr/bin/bash
# Part 1: call CNVs
bam=/mnt/silencer2/home/j4yan/CSHL/bam
script=/mnt/silencer2/home/j4yan/CSHL/script
ref=/mnt/silencer2/home/j4yan/bowtie_index/hg19/hg19.fa
output=/mnt/silencer2/home/j4yan/CSHL/output/CNV
pileup=/mnt/silencer2/home/j4yan/CSHL/output
for i in 1 5
do
    echo "$i starts"
    samtools mpileup -B -q 1 -f $ref $bam/CLL00${i}_normal.bam > $pileup/CLL00${i}.nor.mpileup
    samtools mpileup -B -q 1 -f $ref $bam/CLL00${i}_tumor.bam > $pileup/CLL00${i}.tum.mpileup
    # calculate the copy number coverage of tumor and normal cells
    java -jar $script/VarScan.v2.3.9.jar copynumber $pileup/CLL00${i}.nor.mpileup $pileup/CLL00${i}.tum.mpileup $output/out.CLL00${i}.basename
    # call copy number variants
    java -jar $script/VarScan.v2.3.9.jar copyCaller $output/out.CLL00${i}.basename.copynumber --output-file $output/out.CLL00${i}.basename.copynumber.called --homdel-file $output/out.CLL00${i}.basename.copynumber.hmodel
    echo "$i finished"
done

Varscan: Copy Number Variation => gene list

output

[j4yan@silencer CNV]$ head out.CLL005.basename.copynumber
chrom  chr_start  chr_stop  num  nd    td    log2    gc
chr1   10028      10112     85   11.5  16.7  0.541   51.8
chr1   131349     131448    100  25.1  24.8  -0.015  64.0
chr1   131449     131548    100  23.1  20.7  -0.155  58.0
chr1   131549     131617    69   14.8  15.7  0.093   59.4
chr1   133364     133463    100  17.5  16.1  -0.121  66.0
chr1   133464     133491    28   12.5  11.0  -0.189  71.4
chr1   133544     133643    100  14.5  6.3   -1.208  62.0
chr1   567545     567607    63   11.3  8.0   -0.495  46.0
chr1   657741     657756    16   10.3  23.6  1.192   62.5

[j4yan@silencer CNV]$ head out.CLL005.basename.copynumber.called
chrom  chr_start  chr_stop  num  nd     td     adj.log  gc    region   raw_ratio
chr1   131349     131448    100  25.1   24.8   0.019    64.0  neutral  -0.015
chr1   131449     131548    100  23.1   20.7   -0.136   58.0  neutral  -0.155
chr1   657862     657961    100  56.6   53.7   -0.076   52.0  neutral  -0.075
chr1   657962     658061    100  51.4   44.3   -0.194   57.0  neutral  -0.213
chr1   658378     658477    100  21.7   17.3   -0.303   60.0  del      -0.327
chr1   761960     762059    100  163.8  136.4  -0.284   47.0  del      -0.264
chr1   762060     762159    100  756.5  764.6  -0.011   46.0  neutral  0.015
chr1   762160     762259    100  861.3  847.2  -0.005   57.0  neutral  -0.024
chr1   762260     762359    100  343.9  336.3  -0.003   62.0  neutral  -0.032
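The relationship between the depth columns, the log2 column and the region labels can be sketched in a few lines. This is an approximation under stated assumptions: that the raw log2 value is log2(tumor depth / normal depth) with no further normalization, and that regions are labelled with symmetric amplification and deletion thresholds of 0.25 (which I understand to be the copyCaller defaults). The function names `raw_log2` and `call_region` are hypothetical, and how VarScan treats values exactly on the threshold is not checked here.

```shell
#!/usr/bin/env bash

# Raw log2 ratio from normal depth ($1) and tumor depth ($2).
# Assumes no adjustment for differing library sizes.
raw_log2() {
  awk -v nd="$1" -v td="$2" 'BEGIN { printf "%.3f\n", log(td / nd) / log(2) }'
}

# Label an adjusted log ratio ($1) as amp, del or neutral using a
# symmetric threshold ($2, default 0.25, an assumed default).
call_region() {
  local threshold="${2:-0.25}"
  awk -v r="$1" -v t="$threshold" 'BEGIN {
    if (r + 0 >= t + 0)         print "amp"
    else if (r + 0 <= -(t + 0)) print "del"
    else                        print "neutral"
  }'
}

# Depths from the first copynumber row (nd 11.5, td 16.7); the printed
# value is only expected to be close to the reported 0.541, because the
# depths in the table are rounded.
raw_log2 11.5 16.7

# Adjusted log ratios taken from the .called table.
call_region -0.303
call_region -0.194
```

With these assumptions the two example ratios reproduce the `del` and `neutral` labels shown in the `.called` output.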


# Part 2, using DNAcopy to perform statistics

# R script
# (on current Bioconductor releases, BiocManager::install("DNAcopy") replaces the two biocLite lines)
source("http://bioconductor.org/biocLite.R")
biocLite("DNAcopy")
library(DNAcopy)
cn <- read.table("Desktop/CSHL/out.CLL005.basename.copynumber.called", header=TRUE)   # read table
CNA.object <- CNA(genomdat = cn$adjusted_log_ratio, chrom = cn$chrom, maploc = cn$chr_start, data.type = 'logratio')
# Creates a 'copy number array' data object used for DNA copy number analyses by programs such
# as circular binary segmentation (CBS).
CNA.smoothed <- smooth.CNA(CNA.object)
# Detect outliers and smooth the data prior to analysis by programs such as circular binary segmentation (CBS).
segment <- segment(CNA.smoothed, verbose=0, min.width=2, undo.SD=3)
# This function implements the circular binary segmentation (CBS) algorithm of Olshen and Venkatraman (2004).
p.segment <- segments.p(segment)
# This program computes pseudo p-values and confidence intervals for the change-points found by the circular binary
# segmentation (CBS) algorithm.
pdf("Desktop/CSHL/CLL005.CNV.pdf")
plot(segment, type="w")
dev.off()
write.table(p.segment, file="Desktop/CSHL/CLL005.copynumber.called.segments.p_value", sep="\t")

[Figure: image view output of the Copy Number Variation analysis in R]


Copy number gain

# output of the table
row    ID          chrom    loc.start  loc.end   num.mark  seg.mean  bstat             pval                  lcl       ucl
"99"   "Sample.1"  "chr14"  22670491   22749509  21        2.0536    7.80717368159807  1.43726069686465e-13  22749309  22749509
"100"  "Sample.1"  "chr14"  22749609   22788857  9         2.6009    6.28011088585941  4.68660642046018e-09  22788857  22891686
"101"  "Sample.1"  "chr14"  22891586   22934605  9         3.1491    10.8815481513653  2.80606427200614e-26  22934605  22934605
"235"  "Sample.1"  "chr7"   38313022   38339628  7         2.5386    8.23033979314098  2.08277896231573e-15  38331348  38339628

Genes involved:
TCRA: T cell receptor alpha (chr14)
TRGC2: T cell receptor gamma 2, chain C region (chr7)


Copy number loss

row    ID          chrom    loc.start  loc.end   num.mark  seg.mean  bstat             pval                  lcl       ucl
"28"   "Sample.1"  "chr10"  37823494   37823658  3         -1.3527   8.97662044342639  1.0579677959203e-16   37823658  37823658
"52"   "Sample.1"  "chr12"  2783504    2783604   2         -1.3865   7.55817926989938  2.74030322197209e-11  2783604   2783604
"79"   "Sample.1"  "chr13"  50035024   50038158  6         -1.3845   4.82793481236089  6.40097713084299e-05  50035424  50042087
"151"  "Sample.1"  "chr19"  45322781   45322881  2         -1.306    7.1938280311902   2.12377619682129e-10  45322881  45324000
"180"  "Sample.1"  "chr20"  55108449   55108535  5         -2.1962   18.400935829044   6.80717991820467e-73  55108535  55108535

Genes involved:
chr10: gene deserts:
MTRNR2L7: MT-RNR2-Like 7, plays a role as a neuroprotective and antiapoptotic factor.
LINC00993: long intergenic non-protein coding RNA 993
CACNA1C: Calcium Channel, Voltage-Dependent, L Type, Alpha 1C Subunit (chr12)
SETDB2: Histone-Lysine N-Methyltransferase (chr13)
BCAM: Basal Cell Adhesion Molecule (Lutheran Blood Group) (chr19)
FAM209B: Family With Sequence Similarity 209, Member B (chr20)
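Picking gain and loss segments out of the segment table can be scripted rather than done by eye. The sketch below assumes the layout DNAcopy's `segments.p` result has when written with `write.table` (a quoted row name, then ID, chrom, loc.start, loc.end, num.mark, seg.mean, bstat, pval, lcl, ucl), so that data rows have 11 fields and seg.mean is field 7. The function name `label_segments` and the cutoff of 1 are hypothetical; the slides do not say which cutoff was used.

```shell
#!/usr/bin/env bash

# Read a segment table on stdin and print chrom, start, end, seg.mean and
# a gain/loss label for segments whose mean is at or beyond +/- cutoff ($1).
# Rows without 11 fields (such as the header line written by write.table)
# and segments inside the cutoff are skipped.
label_segments() {
  local cutoff="${1:-1}"
  awk -v c="$cutoff" '
    NF == 11 {
      gsub(/"/, "", $3)                  # strip the quotes around chrom
      mean = $7 + 0
      if (mean >= c + 0)         call = "gain"
      else if (mean <= -(c + 0)) call = "loss"
      else next
      print $3, $4, $5, $7, call
    }
  '
}

# One gain row and one loss row copied from the tables above.
printf '"99"\t"Sample.1"\t"chr14"\t22670491\t22749509\t21\t2.0536\t7.80717368159807\t1.43726069686465e-13\t22749309\t22749509\n"180"\t"Sample.1"\t"chr20"\t55108449\t55108535\t5\t-2.1962\t18.400935829044\t6.80717991820467e-73\t55108535\t55108535\n' \
  | label_segments 1
```

The resulting intervals could then be intersected with a gene annotation to arrive at gene lists like the ones above.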


Thank You Danny!!

And all other Teachers, Staff and Classmates