Bioinformatics. Group of authors

      Several studies have attempted to answer the “which method is better” question by running systematic analyses on test datasets (Pearson 1995; Agarawal and States 1998; Chen 2003). In one such study, Brenner et al. (1998) tested the methods against a dataset of known homologies documented in the Structural Classification of Proteins database (SCOP; Chapter 12). They found that FASTA performed better than BLAST in finding relationships between proteins having >30% sequence identity, and that the performance of all methods declined below 30% identity. Importantly, although the statistical values reported by BLAST slightly underestimated the true extent of errors when looking for known relationships, BLAST and FASTA (with ktup = 2) were both able to detect most known relationships, leading the authors to call them both “appropriate for rapid initial searches.”
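      Benchmarks of this kind score each search method against a set of relationships that are already known. The sketch below shows one simple way such bookkeeping could be done: for a given E-value cutoff, it reports the fraction of known relationships recovered and the number of false hits per query. It is an illustration only, not the procedure of any study cited above; the Hit record, the coverage_and_errors function, and the toy data are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Hit:
    """One reported match (hypothetical record layout, for illustration only)."""
    query: str
    subject: str
    evalue: float      # significance value reported by the search program
    homologous: bool   # ground truth taken from a reference classification


def coverage_and_errors(hits, total_true_pairs, n_queries, evalue_cutoff):
    """Return (coverage, errors_per_query) for hits at or below the cutoff.

    coverage         = true relationships found / all known true relationships
    errors_per_query = false hits accepted / number of queries run
    """
    accepted = [h for h in hits if h.evalue <= evalue_cutoff]
    true_found = sum(1 for h in accepted if h.homologous)
    false_found = len(accepted) - true_found
    return true_found / total_true_pairs, false_found / n_queries


# Toy data, invented purely to exercise the function.
hits = [
    Hit("q1", "s1", 1e-30, True),
    Hit("q1", "s2", 1e-5, True),
    Hit("q1", "s3", 0.5, False),
    Hit("q2", "s4", 1e-3, False),
    Hit("q2", "s5", 2.0, True),
]

coverage, errors_per_query = coverage_and_errors(
    hits, total_true_pairs=4, n_queries=2, evalue_cutoff=0.01
)
print(coverage, errors_per_query)  # prints: 0.5 0.5
```

      Sweeping the cutoff from strict to permissive traces out the trade-off between sensitivity and false hits, which is what allows two programs to be compared on the same test set.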

      Internet Resources

BLAST
    European Bioinformatics Institute (EBI): www.ebi.ac.uk/blastall
    National Center for Biotechnology Information (NCBI): blast.ncbi.nlm.nih.gov
BLAST-Like Alignment Tool (BLAT): genome.ucsc.edu/cgi-bin/hgBlat
NCBI Conserved Domain Database (CDD): ncbi.nlm.nih.gov/cdd
Cancer Genome Anatomy Project (CGAP): ocg.cancer.gov/programs/cgap
FASTA
    EBI: www.ebi.ac.uk/Tools/sss/fasta
    University of Virginia: fasta.bioch.virginia.edu
RefSeq: ncbi.nlm.nih.gov/refseq
Structural Classification of Proteins (SCOP): scop.berkeley.edu
Swiss-Prot: www.uniprot.org

      Further Reading

      1 Altschul, S.F., Boguski, M.S., Gish, W., and Wootton, J.C. (1994). Issues in searching molecular sequence databases. Nat. Genet. 6: 119–129. A review of the issues that are of importance in using sequence similarity search programs, including potential pitfalls.

      2 Fitch, W. (2000). Homology: a personal view on some of the problems. Trends Genet. 16: 227–231. A classic treatise on the importance of using precise terminology when describing the relationships between biological sequences.

      3 Henikoff, S. and Henikoff, J.G. (2000). Amino acid substitution matrices. Adv. Protein Chem. 54: 73–97. A comprehensive review covering the factors critical to the construction of protein scoring matrices.

      4 Koonin, E. (2005). Orthologs, paralogs, and evolutionary genomics. Annu. Rev. Genet. 39: 309–338. An in-depth explanation of orthologs, paralogs, and their subtypes, with a discussion of their evolutionary origin and strategies for their detection.

      5 Pearson, W.R. (2016). Finding protein and nucleotide similarities with FASTA. Curr. Protoc. Bioinf. 53: 3.9.1–3.9.23. An in-depth discussion of the FASTA algorithm, including worked examples and additional information regarding run options and use scenarios.

      6 Wheeler, D.G. (2003). Selecting the right protein scoring matrix. Curr. Protoc. Bioinf. 1: 3.5.1–3.5.6. A discussion of PAM, BLOSUM, and specialized scoring matrices, with guidance regarding the proper choice of matrices for particular types of protein-based analyses.

      References

      1 Agarawal, P. and States, D.J. (1998). Comparative accuracy of methods for protein similarity search. Bioinformatics. 14: 40–47.

      2 Altschul, S.F. (1991). Amino acid substitution matrices from an information theoretic perspective. J. Mol. Biol. 219: 555–565.

      3 Altschul, S.F. and Koonin, E.V. (1998). Iterated profile searches with PSI-BLAST: a tool for discovery in protein databases. Trends Biochem. Sci. 23: 444–447.

      4 Altschul, S.F., Gish, W., Miller, W. et al. (1990). Basic local alignment search tool. J. Mol. Biol. 215: 403–410.

      5 Altschul, S.F., Madden, T.L., Schäffer, A.A. et al. (1997). Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res. 25: 3389–3402.

      6 Brenner, S.E., Chothia, C., and Hubbard, T.J.P. (1998). Assessing sequence comparison methods with reliable structurally identified evolutionary relationships. Proc. Natl. Acad. Sci. USA. 95: 6073–6078.

      7 Bücher, P., Karplus, K., Moeri, N., and Hofmann, K. (1996). A flexible motif search technique based on generalized profiles. Comput. Chem. 20: 3–23.

      8 Chen, Z. (2003). Assessing sequence comparison methods with the average precision criterion. Bioinformatics. 19: 2456–2460.

      9 Dayhoff, M.O., Schwartz, R.M., and Orcutt, B.C. (1978). A model of evolutionary change in proteins. In: Atlas of Protein Sequence and Structure, vol. 5 (ed. M.O. Dayhoff), 345–352. Washington, DC: National Biomedical Research Foundation.

      10 Doolittle, R.F. (1981). Similar amino acid sequences: chance or common ancestry. Science 214: 149–159.

      11 Doolittle, R.F. (1989). Similar amino acid sequences revisited. Trends Biochem. Sci. 14: 244–245.

      12 Gonnet,

