

Haixiao Dong and Ryan Oliveira

Statistical Genomics (CROPS 545), Spring 2017 Homework #4

(1-3) An R software package was developed to perform GWAS with a general linear model (GLM) in which SNPs, user-input covariates, and principal components (PCs) from principal components analysis (PCA) serve as the fixed effects. The inclusion of PCA is a central feature of the package and figures prominently in its name: Association Leveraging PrincipAl Components Analysis (ALPACA). However, the PCs are not used if any of their columns are found to be linearly dependent on the user-provided covariates.

The inputs are phenotypes (y), genotypes (X), covariates (C), chromosome positions for SNPs (GM), a manual p-value cutoff (cutoff), and, when running a simulation, the positions of known QTNs (QTN.position). The covariates, cutoff value, and QTN positions are all optional. If covariates or QTN positions are not supplied, they are not used in the analysis (without covariates, SNPs and PCs alone are the fixed effects). If no manual cutoff is supplied, the cutoff is a simple Bonferroni correction applied to an alpha value of 0.05.

The function returns a list that includes the p-value for the association of each marker with the phenotype and a list of ordered SNPs. When it is used to run a simulation with known QTNs, it also outputs power, false discovery rate (FDR), and type I error. The program creates a QQ plot, a Manhattan plot, and a principal component plot; if known QTNs are supplied, it additionally generates a plot of power versus FDR. Finally, a CSV file is written that lists all tested SNPs with their chromosome numbers and positions, their p-values, and their ranks by p-value.
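The linear-dependence check on the PCs described above can be illustrated with a rank comparison. This is a minimal sketch in Python, not code taken from ALPACA.R: the function names `matrix_rank` and `pcs_are_dependent`, the absolute tolerance, and the choice to compare ranks without an intercept column are all assumptions made for illustration. The idea is that if appending the PC columns to the covariate matrix raises the rank by fewer than the number of PCs, at least one PC column is linearly dependent.

```python
def matrix_rank(rows, tol=1e-9):
    """Rank of a matrix (list of row lists) via Gaussian elimination
    with partial pivoting. `tol` is an absolute tolerance (assumption)."""
    m = [[float(v) for v in r] for r in rows]
    if not m:
        return 0
    n_rows, n_cols = len(m), len(m[0])
    rank = 0
    for col in range(n_cols):
        if rank == n_rows:
            break
        # Use the largest remaining entry in this column as the pivot.
        pivot = max(range(rank, n_rows), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < tol:
            continue  # no usable pivot: this column adds no rank
        m[rank], m[pivot] = m[pivot], m[rank]
        for r in range(rank + 1, n_rows):
            factor = m[r][col] / m[rank][col]
            for c in range(col, n_cols):
                m[r][c] -= factor * m[rank][c]
        rank += 1
    return rank


def pcs_are_dependent(covariates, pcs, tol=1e-9):
    """True if the PC columns do not each add one unit of rank when
    appended to the covariate columns (hypothetical helper)."""
    combined = [list(c) + list(p) for c, p in zip(covariates, pcs)]
    n_pcs = len(pcs[0])
    return matrix_rank(combined, tol) < matrix_rank(covariates, tol) + n_pcs


# Made-up data: four individuals, two covariates.
C = [[1, 0], [0, 1], [1, 1], [2, 1]]
pc_dependent = [[1], [1], [2], [3]]    # equals the sum of the two covariates
pc_independent = [[1], [0], [0], [0]]  # not a combination of the covariates
```

With these inputs, `pcs_are_dependent(C, pc_dependent)` is true and `pcs_are_dependent(C, pc_independent)` is false, so under this sketch only the second set of PCs would be kept as fixed effects.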

The software package is attached as ALPACA.R. There is also a user manual PDF and an associated R file for the examples used in the manual (ALPACAExamples.R).


(4) GWAS using ALPACA.R was performed on the genotypes, phenotypes, and covariates provided on ZZLab.net. Because the principal components generated by ALPACA were found to be linearly dependent on the provided covariates, they were eliminated from the analysis. Four SNPs were found to be significant and are described in Table 1.

Table 1. Significant SNPs detected

| SNP position | SNP ID | p-value |
|---|---|---|
| 2715 | PZA03058.17 | 0 |
| 2717 | PZA03058.21 | 0 |
| 1013 | PZA02699.1 | 1.35e-10 |
| 937 | PZA00615.3 | 8.69e-7 |

The Manhattan and QQ plots (Fig 1) display these results. Because the first two SNPs are very close together in the SNP list (positions 2715 and 2717), they appear as a single marker on the Manhattan plot but are distinct on the QQ plot. The generated Bonferroni cutoff, 1.62 × 10⁻⁵, is visible as a green line on the Manhattan plot.

Fig 1. Manhattan plot and QQ plot for Question 4
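The default cutoff used in this analysis is a Bonferroni correction, alpha divided by the number of tests. The sketch below, written in Python rather than taken from ALPACA.R, shows that calculation and how a manual cutoff would override it. The function names and the strict `<` comparison are assumptions, and the short p-value list is made up. Working backwards, the reported cutoff of 1.62 × 10⁻⁵ at alpha = 0.05 implies roughly 3,086 tested markers; because the reported cutoff is rounded, that count is only approximate.

```python
def bonferroni_cutoff(n_tests, alpha=0.05):
    """Bonferroni-corrected significance threshold: alpha / number of tests."""
    if n_tests <= 0:
        raise ValueError("n_tests must be positive")
    return alpha / n_tests


def significant_markers(p_values, cutoff=None, alpha=0.05):
    """Return ((index, p-value) pairs sorted by p-value, cutoff used).
    A manual cutoff, if given, replaces the Bonferroni default."""
    if cutoff is None:
        cutoff = bonferroni_cutoff(len(p_values), alpha)
    hits = [(i, p) for i, p in enumerate(p_values) if p < cutoff]  # strict '<' is an assumption
    return sorted(hits, key=lambda pair: pair[1]), cutoff


# Marker count implied by the reported cutoff (approximate, since 1.62e-5 is rounded).
implied_markers = 0.05 / 1.62e-5

# Made-up p-values for four markers; the default cutoff is 0.05 / 4 = 0.0125.
hits, used_cutoff = significant_markers([0.5, 1e-8, 0.03, 2e-6])

# The weakest significant SNP in Table 1 (8.69e-7) is still below the reported cutoff.
weakest_passes = 8.69e-7 < 1.62e-5
```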


(5) GWAS using ALPACA.R and GWAS by correlation (using the function GWASbyCor from ZZLab) were run on simulated phenotypes using myG2P from ZZLab.net. In the Manhattan and QQ plots (Fig 2), ALPACA ("GLM Manhattan") resolves the same number of QTNs as GWASbyCor with fewer false positives. Its QQ plot also lies closer to the expected line, separating from it only at the most significant p-values. Based on 100 replicates for each combination of number of QTNs (NQTN) and heritability (h2), the area under the curve of power versus FDR (Fig 3) is always higher for GWAS using ALPACA ("GLM") than for GWASbyCor, indicating that ALPACA is more specific at resolving quantitative trait nucleotides.
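The power, FDR, and type I error used in this comparison can be sketched as follows. This is an illustrative Python version under common textbook definitions, not the calculation in ALPACA.R or GWASbyCor, which may differ (for example, by counting markers near a QTN as hits). The function names are hypothetical, positions are 0-based list indices, and the p-values are made up.

```python
def power_fdr_type1(p_values, qtn_positions, cutoff):
    """Power, FDR, and type I error at a fixed p-value cutoff.
    power  = detected QTNs / all QTNs
    FDR    = false positives / all declared markers
    type I = false positives / all non-QTN markers"""
    qtn = set(qtn_positions)
    declared = {i for i, p in enumerate(p_values) if p < cutoff}
    true_pos = len(declared & qtn)
    false_pos = len(declared - qtn)
    n_null = len(p_values) - len(qtn)
    power = true_pos / len(qtn) if qtn else 0.0
    fdr = false_pos / len(declared) if declared else 0.0
    type1 = false_pos / n_null if n_null else 0.0
    return power, fdr, type1


def power_fdr_curve(p_values, qtn_positions):
    """(FDR, power) after declaring the top k markers, for k = 1..m.
    Assumes at least one known QTN."""
    qtn = set(qtn_positions)
    order = sorted(range(len(p_values)), key=lambda i: p_values[i])
    curve, found = [], 0
    for k, idx in enumerate(order, start=1):
        found += idx in qtn  # True counts as 1
        curve.append(((k - found) / k, found / len(qtn)))
    return curve


# Made-up example: 10 markers, known QTNs at indices 0, 2, and 3.
p = [1e-9, 0.2, 1e-7, 0.5, 1e-6, 0.9, 0.4, 0.3, 0.6, 0.7]
qtns = [0, 2, 3]
power, fdr, type1 = power_fdr_type1(p, qtns, cutoff=0.05 / len(p))
curve = power_fdr_curve(p, qtns)
```

Here markers 0, 2, and 4 pass the cutoff, giving a power of 2/3, an FDR of 1/3, and a type I error of 1/7. Averaging such curves over replicates and comparing the area under them is one way to reproduce the kind of comparison shown in Fig 3.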

Fig 2. Manhattan plots and QQ plots for GWAS by GLM or Cor


Fig 3. Comparison of Power and FDR for GWAS by GLM or Cor
