Bioinformatics. Group of authors

Low-complexity regions are defined simply as regions of biased composition (Wootton and Federhen 1993). These may include homopolymeric runs, short-period repeats, or the subtle over-representation of several residues in a sequence. The biological role of these low-complexity regions is not understood; they are thought to result from either DNA replication errors or unequal crossing-over events. It is important to determine whether sequences of interest contain low-complexity regions. Because such regions are often similar across unrelated proteins, they tend to cause problems in sequence alignments and can lead to false-positive results. Finally, before issuing the query, be sure to check the box marked “Show results in a new window.” This leaves the original query window (or tab) in place, making it easier to go back and refine or change the search parameters as needed.
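One common way to make "biased composition" concrete is to measure the Shannon entropy of the residues in a sliding window: a homopolymeric run has entropy near zero, while a window of all-different residues has high entropy. The Python sketch below illustrates that idea only. It is a simplified entropy filter, not the SEG program of Wootton and Federhen, and the function name `mask_low_complexity`, the window size, and the entropy threshold are hypothetical choices for illustration.

```python
import math
from collections import Counter


def shannon_entropy(window: str) -> float:
    """Shannon entropy, in bits, of the residue composition of a window."""
    n = len(window)
    return -sum((c / n) * math.log2(c / n) for c in Counter(window).values())


def mask_low_complexity(seq: str, window: int = 12, threshold: float = 2.2) -> str:
    """Replace residues that fall in any low-entropy window with 'X'.

    `window` and `threshold` are illustrative values; this is a simplified
    entropy filter, not a reimplementation of SEG.
    """
    seq = seq.upper()
    masked = [False] * len(seq)
    # Slide a fixed-size window along the sequence and flag low-entropy windows.
    for start in range(len(seq) - window + 1):
        if shannon_entropy(seq[start:start + window]) < threshold:
            for i in range(start, start + window):
                masked[i] = True
    return "".join("X" if flag else residue for residue, flag in zip(seq, masked))


# Hypothetical sequence: a polyglutamine run flanked by compositionally diverse segments.
seq = "ACDEFGHIKLMN" + "Q" * 12 + "ACDEFGHIKLMN"
print(seq)
print(mask_low_complexity(seq))
```

With these settings the 12 Q residues are masked, along with five residues on each side, because windows that straddle the boundary are still dominated by Q. The exact extent of masking depends entirely on the window and threshold chosen.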

      Understanding the BLAST Output

[Figure: The BLASTP results.]
[Figure: The BLASTP hit list.]
[Figure: Detailed information on a representative BLASTP hit. The header gives the identity of the hit, as well as its score and E value.]
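The score and E value in a hit's header are linked by the Karlin–Altschul statistics underlying BLAST: for a bit score S′, the E value is approximately E = m · n · 2^(−S′), where m and n are the effective lengths of the query and the database. The minimal Python sketch below assumes those effective lengths are already known (BLAST computes them internally); the function name and the search-space numbers are hypothetical.

```python
def evalue_from_bit_score(bit_score: float, m: int, n: int) -> float:
    """Expected number of chance hits scoring at least `bit_score`.

    m and n are the effective lengths of the query and the database
    (hypothetical inputs here; BLAST derives them internally).
    """
    return m * n * 2.0 ** (-bit_score)


# Hypothetical search space: a 300-residue query against a 1e8-residue database.
e = evalue_from_bit_score(50.0, m=300, n=100_000_000)
print(f"E = {e:.2e}")  # prints E = 2.66e-05
```

Because the bit score sits in the exponent, each additional bit halves the E value, and searching a database twice as large doubles the E value of a hit with the same score.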

      Suggested BLAST Cut-Offs

      As alluded to previously, the listing of a hit in a BLAST report does not automatically mean that the hit is biologically significant. Over time, based on both methodical testing and the personal experience of many investigators, a number of guidelines have been proposed for drawing a boundary that separates meaningful hits from the rest. For nucleotide-based searches, look for E values of 10⁻⁶ or less and sequence identities of 70% or more. For protein-based searches, look for hits with E values of 10⁻³ or less and sequence identities of 25% or more. Using less stringent cut-offs risks entering what is called the “twilight zone,” the low-identity region in which any conclusions regarding the relationship between two sequences are questionable at best (Doolittle 1981, 1989; Vogt et al. 1995; Rost 1999).
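As a rough illustration of applying these cut-offs as a first-pass filter, the Python sketch below reads hits in the default 12-column BLAST+ tabular format (`-outfmt 6`). It assumes the results were saved in that format; the helper names `passes_cutoffs` and `filter_hits` and the example rows are hypothetical. The filter applies the thresholds mechanically, so anything it keeps or discards still deserves the scrutiny described in the following paragraph.

```python
# Default column order of BLAST+ tabular output (-outfmt 6).
FIELDS = ["qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
          "qstart", "qend", "sstart", "send", "evalue", "bitscore"]

# Suggested cut-offs from the text: (maximum E value, minimum percent identity).
CUTOFFS = {"nucleotide": (1e-6, 70.0), "protein": (1e-3, 25.0)}


def passes_cutoffs(line: str, search_type: str) -> bool:
    """True if one tabular BLAST line meets both suggested cut-offs."""
    row = dict(zip(FIELDS, line.rstrip("\n").split("\t")))
    max_evalue, min_identity = CUTOFFS[search_type]
    return float(row["evalue"]) <= max_evalue and float(row["pident"]) >= min_identity


def filter_hits(lines, search_type: str) -> list:
    """Keep data lines (skipping blanks and '#' comments) that pass the cut-offs."""
    return [line for line in lines
            if line.strip() and not line.startswith("#")
            and passes_cutoffs(line, search_type)]


# Hypothetical protein-search rows (query, subject, ..., evalue, bitscore).
example = [
    "q1\tsubjA\t31.50\t120\t80\t2\t1\t120\t5\t124\t2e-05\t45.1",
    "q1\tsubjB\t22.00\t110\t84\t3\t3\t112\t9\t118\t1e-04\t40.2",
    "q1\tsubjC\t40.00\t60\t36\t0\t10\t69\t1\t60\t0.5\t25.0",
]
print([line.split("\t")[1] for line in filter_hits(example, "protein")])  # ['subjA']
```

Here subjB is rejected on identity (22% is below 25%) even though its E value passes, and subjC is rejected on its E value despite 40% identity, which shows why both criteria are checked together.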

      The reader is cautioned not to apply these cut-offs (or any other set of suggested cut-offs) blindly, particularly in the region right around the dividing line. Users should always check whether an appropriate scoring matrix was used. Likewise, they should manually inspect the pairwise alignments and investigate the biology behind any putative homology by reading the literature, to convince themselves whether hits on either side of the suggested cut-offs actually make good biological sense.
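One way to avoid applying cut-offs blindly is to sort hits into three groups rather than two, setting aside those close to the dividing line for manual inspection. The Python sketch below does this for a single hit. The function name `classify_hit` and both margins (one order of magnitude in E value, five percentage points in identity) are hypothetical choices, not published thresholds.

```python
import math


def classify_hit(evalue: float, pident: float, max_evalue: float, min_identity: float,
                 log10_margin: float = 1.0, identity_margin: float = 5.0) -> str:
    """Return 'pass', 'fail', or 'inspect' for a single hit.

    A hit is flagged 'inspect' when its E value lies within `log10_margin`
    orders of magnitude of the E-value cut-off, or its identity lies within
    `identity_margin` percentage points of the identity cut-off, on either side.
    Both margins are hypothetical defaults.
    """
    # Orders of magnitude by which the hit beats (+) or misses (-) the E-value cut-off.
    e_gap = math.inf if evalue == 0 else math.log10(max_evalue) - math.log10(evalue)
    # Percentage points by which the hit beats (+) or misses (-) the identity cut-off.
    id_gap = pident - min_identity
    if abs(e_gap) <= log10_margin or abs(id_gap) <= identity_margin:
        return "inspect"
    return "pass" if e_gap > 0 and id_gap > 0 else "fail"


# Protein-search cut-offs from the text: E <= 1e-3, identity >= 25%.
for e, pid in [(1e-20, 60.0), (5e-4, 60.0), (1e-10, 27.0), (2.0, 10.0)]:
    print(e, pid, classify_hit(e, pid, 1e-3, 25.0))
```

E values are compared on a log scale because they span many orders of magnitude; a hit at 5e-4 is numerically close to a 1e-3 cut-off, while one at 1e-20 is far beyond it. Hits labelled "inspect" are the ones whose alignments and supporting literature most need a closer look.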
