Bioinformatics. Group of authors


      Running a FASTA Search

      The University of Virginia provides a web front-end for issuing FASTA queries. Various protein and nucleotide databases are available, and up to two databases can be selected for use in a single run. From this page, the user can also specify the scoring matrix to be used, gap and extension penalties, and the value for ktup. The default values for ktup are 2 for protein-based searches and 6 for nucleotide-based searches; lowering the value of ktup increases the sensitivity of the run, at the expense of speed. The user can also limit the results returned to particular E values.
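      The effect of ktup on sensitivity can be illustrated with a minimal sketch of the word-lookup step alone. This is a simplified illustration under stated assumptions, not FASTA's actual implementation: the function names `index_words` and `word_hits` and the two short peptide strings are hypothetical, and the later stages of a real search (joining hits on diagonals, rescoring, and final alignment) are omitted.

```python
from collections import defaultdict


def index_words(seq, ktup):
    """Map every ktup-length word in seq to the positions where it starts."""
    index = defaultdict(list)
    for i in range(len(seq) - ktup + 1):
        index[seq[i:i + ktup]].append(i)
    return index


def word_hits(query, subject, ktup):
    """Return (query_pos, subject_pos) pairs where identical ktup-words occur."""
    index = index_words(query, ktup)
    hits = []
    for j in range(len(subject) - ktup + 1):
        word = subject[j:j + ktup]
        # .get() avoids inserting new keys into the defaultdict
        for i in index.get(word, []):
            hits.append((i, j))
    return hits


# Made-up peptides that differ at two positions (illustrative only).
query = "MKVLAAGIV"
subject = "MKILAAGLV"

hits_k1 = word_hits(query, subject, 1)
hits_k2 = word_hits(query, subject, 2)
print(len(hits_k1), len(hits_k2))
```

      With the smaller word size, many more seed matches are found between the same two sequences, which is why a lower ktup is more sensitive but requires more work in the subsequent steps.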

      Statistical Significance of Results

      As before, the E value from a FASTA search represents the number of hits with an equal or better score that would be expected to occur purely by chance. Pearson (2016) puts forth the following guidelines for inferring homology from protein-based searches, which differ slightly from those previously described for BLAST: an E value < 10⁻⁶ almost certainly implies homology. When E < 10⁻³, the query and found sequences are almost always homologous, but the user should verify that the highest-scoring unrelated sequence has an E value near 1.
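      As a rough sketch, the thresholds above might be applied to a hit list as follows. The function names and the wording of the returned labels are hypothetical, and the fall-through case for E ≥ 10⁻³ is an assumption, since the guidelines quoted here do not address it. The helper `p_value_from_e` uses the standard Poisson relationship between an expected count of chance hits, E, and the probability of observing at least one such hit, 1 − exp(−E).

```python
import math


def p_value_from_e(e_value):
    """Probability of at least one chance hit, assuming a Poisson model."""
    # -expm1(-E) equals 1 - exp(-E) but stays accurate for very small E.
    return -math.expm1(-e_value)


def interpret_protein_e_value(e_value):
    """Apply the protein-search thresholds quoted in the text (labels are illustrative)."""
    if e_value < 1e-6:
        return "homology almost certain"
    if e_value < 1e-3:
        return "homology likely; check that the best unrelated hit has E near 1"
    # Assumption: the quoted guidelines say nothing about larger E values.
    return "no inference from E value alone"


for e in (1e-8, 1e-4, 0.5):
    print(e, interpret_protein_e_value(e), p_value_from_e(e))
```

      For small values, E and the corresponding probability are nearly identical, which is why the two are often used interchangeably; they diverge as E approaches and exceeds 1.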

      Comparing FASTA and BLAST

      [Figure: Search summary from a protein–protein FASTA search, using the sequence of histone H2B.3 from Hydractinia echinata as the query and BLOSUM62 as the scoring matrix. The header indicates that the query is against the Swiss-Prot database. The histogram shows the distribution of all similarity scores computed for this search. The left-most column provides a normalized similarity score, and the column marked opt gives the number of sequences with that score.]

      [Figure: Hit list for the protein–protein FASTA search.]

       FASTA begins the search by looking for exact matches of words, while BLAST allows for conservative substitutions in the first step.

       BLAST allows for automatic masking of sequences, while FASTA does not.

       FASTA will return one and only one alignment for a sequence in the hit list, while BLAST can return multiple results for the same sequence, each result representing a distinct HSP.

       Because FASTA uses a version of the more rigorous Smith–Waterman alignment method, it generally produces better final alignments and is more apt to find distantly related sequences than BLAST. For highly similar sequences, the two programs perform comparably.

       When comparing translated DNA sequences with protein sequences or vice versa, FASTA (specifically, FASTX/FASTY for translated DNA → protein and TFASTX/TFASTY for protein → translated DNA) allows for frameshifts.

       BLAST runs faster than FASTA,

