Bioinformatics. Group of authors

a BLAT or BLAST search by following the link at the top of any page. Earlier in this chapter, Figure 4.11 outlined how to use BLAT to look for a lizard homolog of the human ADAM18 gene. Ensembl data can also be searched with the more sensitive BLAST algorithm, including the TBLASTN program, which compares a protein query with a nucleotide database translated in all six reading frames. Copy and paste the FASTA-formatted protein sequence of NCBI RefSeq NP_001307242.1 into the Sequence data box on the BLAST page and carry out a TBLASTN search against the anole lizard genomic sequence. The sequence alignment of the top hit is shown in Figure 4.21. The human protein query is on the top line, and the translated lizard genomic sequence is on the second. The sequences share only 32% identity, but the alignment spans 650 amino acids, and some key sequence features are conserved; note the alignment of almost every cysteine residue. Thus, this lizard genomic sequence is indeed a homolog of human ADAM18. The BLAST algorithm, although about two orders of magnitude slower than BLAT for the same query, is able to find a lizard ortholog of the human protein.
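The phrase "translated in all six reading frames" can be made concrete with a short sketch. The Python below is illustrative only and is not how TBLASTN is implemented; the function names are hypothetical, and the codon table is the standard genetic code. It translates a DNA string in the three forward frames and the three frames of the reverse complement, which are the six candidate protein sequences a TBLASTN-style search compares against the protein query.

```python
from itertools import product

# Standard genetic code, written in the conventional TCAG codon order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def reverse_complement(dna):
    """Return the reverse complement of an A/C/G/T string."""
    return dna.upper().translate(str.maketrans("ACGT", "TGCA"))[::-1]


def translate(dna):
    """Translate complete codons from position 0; unknown codons become 'X'."""
    dna = dna.upper()
    return "".join(
        CODON_TABLE.get(dna[i:i + 3], "X") for i in range(0, len(dna) - 2, 3)
    )


def six_frame_translations(dna):
    """Map frame labels (+1..+3, -1..-3) to translated protein strings."""
    frames = {}
    for strand, seq in (("+", dna), ("-", reverse_complement(dna))):
        for offset in range(3):
            frames[f"{strand}{offset + 1}"] = translate(seq[offset:])
    return frames


if __name__ == "__main__":
    for label, protein in six_frame_translations("ATGGCCTGTTAA").items():
        print(label, protein)
```

Stop codons appear as `*` in the output. A real search tool scores alignments against each frame rather than simply printing the translations.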

      In this example, we will identify the mouse orthologs of the human mRNA reference sequences that are associated with common diseases or traits. The starting point is the output of the UCSC Table Browser: the mRNA reference sequences that overlap a variant from the GWAS Catalog. From these we will pull out the corresponding Ensembl gene and transcript identifiers, and then link to the mouse orthologs. The initial step is to retrieve the RefSeq accession numbers that overlap a variant from the GWAS Catalog by reproducing the search shown in Figure 4.12d, this time changing the output format to sequence. Copy and paste the output from the Table Browser into a text editor and reduce it to a list that contains only the accession numbers. Note that BioMart does not accept the accession.version format used by NCBI, so an accession number like NM_001042682.1 must be rewritten as NM_001042682.
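Reducing the Table Browser output to bare accession numbers can also be scripted instead of done by hand. The sketch below is one possible approach, not part of the original workflow: it assumes only that RefSeq mRNA or non-coding RNA accessions (NM_ or NR_ followed by digits, with an optional version) appear somewhere in the pasted text. The function name and the sample header layout are hypothetical.

```python
import re

# NM_/NR_ accession with an optional ".version" suffix. The lookbehind lets
# the accession follow an underscore or punctuation, but not a letter or digit.
ACCESSION_RE = re.compile(r"(?<![A-Za-z0-9])(N[MR]_\d+)(?:\.\d+)?")


def extract_accessions(text):
    """Return unique version-less accessions in order of first appearance."""
    seen = {}
    for match in ACCESSION_RE.finditer(text):
        seen.setdefault(match.group(1), None)
    return list(seen)


if __name__ == "__main__":
    # Hypothetical pasted output; real header lines may look different.
    pasted = """>example_NM_001042682.1 some description
ACGTACGT
>NM_000546.6
ACGT
>example_NM_001042682.1 duplicate entry
"""
    print("\n".join(extract_accessions(pasted)))
```

The resulting list, one accession per line and without version suffixes, is in the form the text says BioMart expects.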

      Retrieving the mouse orthologs of the NCBI reference sequences must be done as a separate step, as it is not possible