Digital Biology Laboratory: Bioinformatics Application Areas

Next Generation Sequencing: The latest trend in molecular biology is conducting genomic studies using next-generation sequencing (NGS) technology. NGS technologies can be used in SNP and GWAS studies of genotype-phenotype relationships, RNA-Seq transcriptomics studies of gene expression, microRNA studies of gene regulation, ChIP-Seq analyses of transcription factor binding, and bisulphite sequencing for epigenomics studies. Our lab has performed computational analyses in all of these application areas. The results of NGS analyses are visualized in a UCSC Genome Browser instance that we have set up locally. We also conduct downstream computational analyses, such as differential expression analysis of transcriptomics datasets, hierarchical clustering, gene enrichment tests, and pathway modeling, all of which yield meaningful biological inferences.

Plant Informatics: We have conducted various computational studies of Arabidopsis, soybean, and maize. We integrate the soybean data and analyses into the Soybean Knowledge Base (SoyKB), a comprehensive one-stop web resource for soybean research that we developed. SoyKB integrates all types of multi-omics high-throughput datasets along with miRNA data, functional annotations, and pathway information. It offers many useful tools, such as a 3D Protein Structure Viewer, Affymetrix Probe ID Search, Gene Family Search, and tools for soybean breeders. We are also developing an innovative cyber-studio that allows soybean researchers to generate and test biological hypotheses using the multi-omics datasets as evidence.

Cancer Studies: We performed computational analyses of DNA methylation in cancer. DNA methylation plays an important role in regulating gene expression in both normal and dysfunctional cells, and it is also considered a hallmark of various types of cancer. We studied DNA methylation patterns in different cancer cell lines and in various cancer-related genes. We also studied the enzyme activities of DNA methyltransferases and their relationships to cancers.
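The NGS paragraph lists a gene enrichment test among the lab's downstream analyses but does not name the test. One common choice is the one-sided hypergeometric test, which asks how likely it is to see at least k annotated genes in a study list purely by chance. The minimal sketch below assumes that test; the function name `enrichment_p_value` and its parameter names are illustrative and are not taken from the lab's software.

```python
from math import comb


def enrichment_p_value(N: int, K: int, n: int, k: int) -> float:
    """One-sided hypergeometric p-value, P(X >= k).

    Illustrative sketch (assumed test, not the lab's pipeline).
    N: genes in the background (e.g. all genes measured on the platform)
    K: background genes annotated to the term or pathway being tested
    n: genes in the study list (e.g. differentially expressed genes)
    k: genes in the study list that carry the annotation
    """
    if not (0 <= K <= N and 0 <= k <= n <= N):
        raise ValueError("inconsistent gene counts")
    # Count the ways to draw n genes containing exactly i annotated ones, for every i >= k.
    favourable = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, min(K, n) + 1))
    # Divide by the total number of ways to draw n genes from the background.
    return favourable / comb(N, n)
```

For example, `enrichment_p_value(20, 5, 4, 4)` gives 5/4845 (about 0.001): all four study genes carry an annotation held by only 5 of the 20 background genes. When many terms or pathways are tested at once, the resulting p-values are normally corrected for multiple testing, for instance with the Benjamini-Hochberg procedure.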

Cytotoxin-associated gene A (cagA), a marker of Helicobacter pylori, has been shown to be the major virulence factor causing gastroduodenal diseases. To better understand the relationship between the CagA sequence and its virulence with respect to gastric cancer, we developed a systematic entropy-based method to identify cancer-related residues in CagA and applied a supervised machine learning method to classify strains as cancer or non-cancer. Our method provides not only a useful tool for predicting the association between novel CagA strains and disease, but also a new general framework for detecting biological sequence biomarkers in population studies.

Protein Structure Prediction: The 3D structure of a protein holds the key to understanding its biological function at the molecular level. Knowledge of protein structure also allows researchers to identify and characterize disease targets and provides a rational approach to drug design. We collaborate with numerous biologists to predict structures for their proteins of interest, including researchers from the University of Missouri, the National Institute of Environmental Health Sciences in the USA, the First Affiliated Hospital of Nanjing Medical University in China, the National Laboratory of Protein Engineering and Plant Genetic Engineering in China, and the Spanish National Cancer Research Centre. MUFOLD structural models were also used to build protein complexes with cryo-EM data.

Protein Modifications: Proteins usually exist in chemically modified forms. Musite, a novel open-source software toolkit specifically designed for large-scale prediction of both general and kinase-specific phosphorylation sites, has also become a general tool for predicting other post-translational modification sites, such as acetylation, methylation, and sumoylation sites. It has been applied to a variety of organisms, such as A. thaliana, B. napus, C. elegans, D. melanogaster, G. max, H. sapiens, M. musculus, M. truncatula, O. sativa, S. cerevisiae, and Z. mays.
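The flyer does not specify how the entropy-based CagA method scores residues. As an assumed stand-in, the sketch below scores each column of a multiple sequence alignment by information gain: the Shannon entropy of the pooled column minus the size-weighted entropy within the cancer and non-cancer groups, so columns whose residues differ systematically between the groups score high. The function names are hypothetical, and the sequences are assumed to be pre-aligned with gaps written as '-'.

```python
from collections import Counter
from math import log2
from typing import Sequence


def shannon_entropy(column: Sequence[str]) -> float:
    """Shannon entropy (in bits) of the residue distribution in one alignment column."""
    counts = Counter(column)
    total = len(column)
    return -sum((c / total) * log2(c / total) for c in counts.values())


def residue_information_gain(cancer: Sequence[str], non_cancer: Sequence[str]) -> list[float]:
    """Score each alignment column by how well its residues separate the two groups.

    Assumed stand-in for the flyer's entropy-based method, not the lab's code.
    cancer, non_cancer: pre-aligned sequences of equal length, gaps written as '-'.
    """
    if not cancer or not non_cancer:
        raise ValueError("both groups need at least one sequence")
    length = len(cancer[0])
    if any(len(s) != length for s in [*cancer, *non_cancer]):
        raise ValueError("all sequences must be aligned to the same length")
    n_c, n_n = len(cancer), len(non_cancer)
    n = n_c + n_n
    scores = []
    for j in range(length):
        col_c = [s[j] for s in cancer]
        col_n = [s[j] for s in non_cancer]
        pooled = shannon_entropy(col_c + col_n)
        # Size-weighted entropy that remains once the cancer/non-cancer label is known.
        within = (n_c / n) * shannon_entropy(col_c) + (n_n / n) * shannon_entropy(col_n)
        scores.append(pooled - within)
    return scores
```

Top-scoring columns could then serve as input features for a supervised cancer versus non-cancer classifier; the flyer does not say which classifier or score cut-off was used.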

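Window-based modification-site predictors typically begin by extracting a fixed-length peptide centered on every candidate residue (S, T, or Y for phosphorylation; K for lysine acetylation or sumoylation) and then compute features from those windows. The sketch below shows only this first step and is not Musite's code; the function name, the default half-width of 7 residues, and the '-' padding at the termini are all illustrative assumptions.

```python
def candidate_windows(
    sequence: str, residues: str = "STY", half_width: int = 7, pad: str = "-"
) -> list[tuple[int, str]]:
    """Return (1-based position, window) for every candidate residue in `sequence`.

    Hypothetical helper: the defaults (S/T/Y sites, 7 residues on each side,
    '-' padding) are illustrative assumptions, not Musite's settings.
    """
    if half_width < 0 or len(pad) != 1:
        raise ValueError("half_width must be >= 0 and pad a single character")
    # Pad both termini so windows near the ends keep the full length 2 * half_width + 1.
    padded = pad * half_width + sequence + pad * half_width
    windows = []
    for i, aa in enumerate(sequence):
        if aa in residues:
            # Residue i sits at index i + half_width in `padded`, so this slice is centered on it.
            windows.append((i + 1, padded[i : i + 2 * half_width + 1]))
    return windows
```

For example, `candidate_windows("MSKT", half_width=2)` returns `[(2, "-MSKT"), (4, "SKT--")]`, and passing `residues="K"` switches the same step to lysine-targeted modifications.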
[Figure: selected gene tree, Laurent_SoybeanAffy_Bj (Default Interpretation), colored by Laurent_SoybeanAffy_Bj (Default Interpretation); gene list: 1-Way ANOVA Bj (1436).]
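The gene list labeled "1-Way ANOVA Bj (1436)" indicates that genes were filtered with a one-way ANOVA, but the flyer gives neither the software settings nor the significance threshold. The following is therefore only a minimal sketch of the F statistic such a filter computes for a single gene: the between-group mean square divided by the within-group mean square, with k - 1 and n - k degrees of freedom for k groups and n samples. The function name and input layout are hypothetical.

```python
from statistics import fmean


def one_way_anova_f(groups: list[list[float]]) -> float:
    """F statistic of a one-way ANOVA for a single gene.

    groups: one list of expression values per condition (hypothetical input layout).
    """
    k = len(groups)
    n = sum(len(g) for g in groups)
    if k < 2 or any(len(g) == 0 for g in groups) or n <= k:
        raise ValueError("need at least two non-empty groups and n > k samples")
    grand_mean = fmean(x for g in groups for x in g)
    group_means = [fmean(g) for g in groups]
    # Between-group sum of squares: spread of the group means around the grand mean.
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, group_means))
    # Within-group sum of squares: spread of each sample around its own group mean.
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, group_means) for x in g)
    ms_between = ss_between / (k - 1)
    ms_within = ss_within / (n - k)
    # Raises ZeroDivisionError if every group is constant (ms_within == 0).
    return ms_between / ms_within
```

For `[[1, 2, 3], [4, 5, 6]]` this gives F = 13.5 with (1, 4) degrees of freedom. In practice, `scipy.stats.f_oneway` computes the same statistic together with its p-value, and a p-value cut-off (often after multiple-testing correction) decides which genes enter the list.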

The research has been funded by NIH, NSF, DOE, USDA, US Army, United Soybean Board, Missouri Soybean Merchandising Council, Missouri Life Science Trust Fund, Monsanto Research Fund, Cerner Corporation, National Center for Soybean Biotechnology, and University of Missouri.

For more details, please visit http://digbio.missouri.edu.