Journal of Bioinformatics and Computational Biology, Vol. 9, No. 6 (2011) v–vii © Imperial College Press
DOI: 10.1142/S021972001100580X
INTRODUCTION
Some New Results and Tools for Protein Function Prediction,
RNA Target Site Prediction, Genotype Calling,
Environmental Genomics, and More
This issue of the Journal of Bioinformatics and Computational Biology contains
six original research articles presenting new results and tools for a wide variety of
bioinformatics problems. In addition, it contains a review on some recently released
free bioinformatics software packages and a tutorial on data-driven normalization
strategies for miRNA qPCR data. A brief introduction to these works is given
below.
Proteins are important building blocks that contribute to key processes within
cells. The elucidation of mechanisms underlying protein functionality is an important
pursuit and remains a challenging task in computational biology.1 Sequence
similarity search methods like BLAST and their refinements (e.g. Ref. 2) are the
primary tools for this problem. However, a non-negligible proportion of protein
sequences do not have identifiable informative homologs in current databases. In
this issue, Wang and Li3 propose a new approach to infer protein function from
protein–protein interaction networks. Their sequential neighborhood propagation
method is a semi-supervised approach that is able to predict the function of unlabeled
proteins farther away from labeled ones. Experiments show that this approach
has higher sensitivity and precision than many competing methods.
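To make the idea of propagating labels through an interaction network concrete, here is a minimal sketch in Python. It is a generic iterative neighborhood-averaging scheme, not the sequential linear neighbourhood propagation method of Wang and Li; the toy network, the protein names, and the function `propagate_labels` are all hypothetical.

```python
def propagate_labels(edges, seeds, iterations=50):
    """Spread a function score over an undirected interaction network.

    edges: iterable of (protein_a, protein_b) pairs (hypothetical toy data).
    seeds: dict mapping labeled proteins to a score in [0, 1].
    Labeled proteins stay clamped to their known score; every other
    protein repeatedly takes the mean score of its neighbors.
    """
    neighbors = {}
    for a, b in edges:
        neighbors.setdefault(a, set()).add(b)
        neighbors.setdefault(b, set()).add(a)

    scores = {p: seeds.get(p, 0.0) for p in neighbors}
    for _ in range(iterations):
        updated = {}
        for p, nbrs in neighbors.items():
            if p in seeds:
                updated[p] = seeds[p]
            else:
                updated[p] = sum(scores[n] for n in nbrs) / len(nbrs)
        scores = updated
    return scores


# Toy chain P1 - P2 - P3 - P4: P1 is assumed to have the function,
# P4 is assumed not to have it, P2 and P3 are unlabeled.
toy_edges = [("P1", "P2"), ("P2", "P3"), ("P3", "P4")]
toy_seeds = {"P1": 1.0, "P4": 0.0}
result = propagate_labels(toy_edges, toy_seeds)
```

In this toy chain, P2 ends up with a higher score than P3 because it lies closer to the positively labeled protein, which illustrates how label information can reach proteins that are not direct neighbors of a labeled one.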
Similarity of GO terms is important in a variety of analyses such as alignment of
biological pathways,4 detection of reproducible group signatures from gene expression
data,5 etc. In this issue, Alvarez and Yan6 present a method for measuring the
similarity between a pair of GO terms. Their method takes into account the shortest
path between the GO terms, the depth of their nearest common ancestor, and
the similarity of the definitions of the two GO terms. Although the method does
not use information from annotated gene products, it achieves highly competitive
performance compared to other methods.
The importance of RNA–RNA interactions in the regulation of crucial functions
in an organism is now clearly recognized. In this issue, Poolsap et al.7 present a novel
approach to predict the binding sites of target RNAs for an antisense RNA. The
method uses profiles of intermolecular interactions and achieves lower time complexity
compared with earlier methods while delivering good prediction accuracy.
J. Bioinform. Comput. Biol. 2011.09:v-vii. Downloaded from www.worldscientific.com by 85.74.84.134 on 10/23/12. For personal use only.
The number of GWAS based on SNP chips has increased significantly in recent
years. Algorithms for genotype calling for SNP chip data are thus important. Many
existing solutions are inaccurate for SNPs with low minor allele frequency and/or
are computationally inefficient. In this issue, Fu and Xu8 present a new two-stage
genotype calling method that combines unsupervised classification (first stage) and
supervised classification (second stage) techniques. Experiments show that this
method outperforms or matches existing methods in terms of accuracy, especially
in small-sample-size situations, and is several times more efficient.
The next article in this issue9 is a most unusual paper. In most cases, we do
not really know what the correct output of a bioinformatics program should be, so
how to test and debug bioinformatics software is a very interesting issue. Sadi et al.
observe that, in many cases, while we do not know what the output should be,
we do have a fairly good idea of how much the output should
change when a given change is made to the input. This leads them
to present an approach called metamorphic testing that exploits this observation.
In metamorphic testing, we determine whether changes in the output of a program
correlate as expected with changes to its input.
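A minimal sketch of a metamorphic test in Python follows. The program under test (`closest_pair`), the deliberately faulty variant, and the chosen relation (reordering the input sequences must not change the reported closest pair) are illustrative assumptions, not the relations or programs studied by Sadi et al.

```python
def hamming(a, b):
    """Number of mismatching positions between two equal-length sequences."""
    return sum(x != y for x, y in zip(a, b))


def closest_pair(seqs):
    """Program under test: names of the two most similar sequences.

    seqs: dict mapping a name to an aligned sequence (hypothetical toy input).
    """
    names = sorted(seqs)
    pairs = [(hamming(seqs[a], seqs[b]), a, b)
             for i, a in enumerate(names) for b in names[i + 1:]]
    _, a, b = min(pairs)
    return frozenset((a, b))


def buggy_closest_pair(seqs):
    """Deliberately faulty variant: ignores the data, returns the first two names."""
    return frozenset(list(seqs)[:2])


def order_relation_holds(program, seqs):
    """Metamorphic relation: reversing the input order must not change the output.

    The correct answer is never needed, only agreement between the two runs.
    """
    reversed_input = dict(reversed(list(seqs.items())))
    return program(seqs) == program(reversed_input)


toy = {"A": "ACGTACGT", "B": "ACGTACGA", "C": "TTGTACGA", "D": "TTTTTTTT"}
good_passes = order_relation_holds(closest_pair, toy)         # True
buggy_passes = order_relation_holds(buggy_closest_pair, toy)  # False
```

The faulty variant happens to return the right pair on the original input, so a single run would not expose it; the violated relation does.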
Environmental genomics requires computational infrastructure at a new level of
complexity, especially that associated with large quantities of short metagenomic
sequences.10,11 In this issue, Nebel et al.12 describe JAguc, a standalone platform
for environmental diversity analysis. JAguc provides many built-in operations and
analysis functions on sampling saturation curves, rank abundance plots, etc. in a
nice graphical user interface. Its implementation makes effective use of multicore
features of modern computers; thus it can handle large sample sizes well.
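For readers unfamiliar with the two kinds of plots mentioned, the sketch below computes the underlying numbers in Python. It is not JAguc code; the function names, the toy sample, and the simple subsampling scheme are assumptions made for illustration.

```python
import random
from collections import Counter


def rank_abundance(assignments):
    """Return taxon counts sorted from most to least abundant."""
    return sorted(Counter(assignments).values(), reverse=True)


def saturation_curve(assignments, step=1, seed=0):
    """Distinct taxa seen after 0, step, 2*step, ... reads (and after the last read).

    Reads are drawn in one random order without replacement; a curve that
    flattens suggests that further sampling would reveal few new taxa.
    """
    reads = list(assignments)
    random.Random(seed).shuffle(reads)
    seen = set()
    curve = [0]
    for i, taxon in enumerate(reads, start=1):
        seen.add(taxon)
        if i % step == 0 or i == len(reads):
            curve.append(len(seen))
    return curve


# Hypothetical sample: each entry is the taxon assigned to one sequence read.
sample = ["otu1"] * 6 + ["otu2"] * 3 + ["otu3"] * 2 + ["otu4"]
ranks = rank_abundance(sample)            # [6, 3, 2, 1]
curve = saturation_curve(sample, step=3)  # starts at 0, ends at 4 taxa
```

Plotting `ranks` against rank position gives a rank abundance plot, and plotting `curve` against the number of reads drawn gives a sampling saturation curve.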
The remaining two articles of this issue are a review on software packages for 2D
gel image analysis13 and a tutorial on normalization strategies for miRNA quantitative
real-time PCR arrays.14 The former offers interesting insights and a comparison
of two recently released free academic software packages for 2D gel image analysis.
The latter compares several data-driven normalization methods for TaqMan
low-density qPCR arrays, assesses their performance using endogenous controls,
and shows that these data-driven methods reduce variation and represent robust
alternatives to using endogenous controls.
Limsoon Wong
Managing Editor
References
1. Hawkins T, Kihara D, Function prediction of uncharacterized proteins, J Bioinform Comput Biol 5(1):1–30, 2007.
2. Alexandrov K, Sobolev B, Filimonov D, Poroikov V, Recognition of protein function using local similarity, J Bioinform Comput Biol 6(4):709–725, 2008.
3. Wang J, Li Y, Sequential linear neighbourhood propagation for semi-supervised protein function prediction, J Bioinform Comput Biol 9(6):663–679, 2011.
4. Gamalielsson J, Olsson B, Gene Ontology-based semantic alignment of biological pathways by evolutionary search, J Bioinform Comput Biol 6(4):825–842, 2008.
5. Licamele L, Getoor L, A method for the detection of meaningful and reproducible group signatures from gene expression profiles, J Bioinform Comput Biol 9(3):431–451, 2011.
6. Alvarez MA, Yan C, A graph-based semantic similarity measure for the Gene Ontology, J Bioinform Comput Biol 9(6):681–695, 2011.
7. Poolsap U, Kato Y, Sato K, Akutsu T, Using binding profiles to predict binding sites of target RNAs, J Bioinform Comput Biol 9(6):697–713, 2011.
8. Fu B, Xu J, A new genotype calling method for Affymetrix SNP arrays, J Bioinform Comput Biol 9(6):715–728, 2011.
9. Sadi MS, Kuo FC, Ho JWK, Charleston MA, Chen TY, Verification of phylogenetic inference programs using metamorphic testing, J Bioinform Comput Biol 9(6):729–747, 2011.
10. Kaplarevic M, Murray AE, Cary SC, Gao GR, EnGenIUS – environmental genome informational utility system, J Bioinform Comput Biol 6(6):1193–1211, 2008.
11. Ye Y, Tang H, An ORFome assembly approach to metagenomics sequences analysis, J Bioinform Comput Biol 7(3):455–471, 2009.
12. Nebel ME, Wild S, Holzhauser M et al., JAguc – a software package for environment diversity analyses, J Bioinform Comput Biol 9(6):749–773, 2011.
13. Wu Y, Zhang L, Comparison of two academic software packages for analyzing 2D gel images, J Bioinform Comput Biol 9(6):775–794, 2011.
14. Deo A, Carlsson J, Lindlof A, How to choose a normalization strategy for miRNA quantitative real-time (qPCR) arrays, J Bioinform Comput Biol 9(6):795–812, 2011.