Bioinformatics. Group of authors
may change on a weekly basis. Each Ensembl page has a link at the bottom called View in archive site. The archive site provides links to older versions of that page, including previous annotation sets on the same genome assembly, as well as prior genome assemblies.
The Ensembl Browser provides many of the same types of resources and tools as the UCSC Genome Browser. Sequences can be aligned to the assembled genomes using either BLAT or BLAST, and data can be returned in various tabular formats using BioMart (Kinsella et al. 2011). Data and software can be retrieved from the Downloads menu, available from most browser pages. In the Tools menu, Ensembl provides a number of additional tools to manipulate data, including the Variant Effect Predictor (VEP) (McLaren et al. 2016), which predicts functional consequences of known and unknown variants; File Chameleon, which reformats files available on the Ensembl FTP site; and Assembly Converter, which, like UCSC's liftOver, converts coordinates between genome assemblies. The Help & Documentation menu provides substantial written and video-based information about how to navigate and interpret the Ensembl site, far beyond the level of detail presented in this chapter.
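The coordinate conversion performed by Assembly Converter can also be sketched programmatically. The example below is a minimal sketch and not the web tool itself: it assumes the public Ensembl REST service at rest.ensembl.org and its map endpoint, neither of which is discussed in this chapter. The function names build_map_url and convert_region are hypothetical, and the region in the usage note is an arbitrary placeholder.

```python
import json
import urllib.request

# Assumption: the public Ensembl REST server, not mentioned in the chapter text.
REST_SERVER = "https://rest.ensembl.org"


def build_map_url(species, source_assembly, chrom, start, end,
                  target_assembly, strand=1):
    """Build the URL that asks the REST 'map' endpoint to convert a region."""
    # Regions are written as chromosome:start..end:strand
    region = f"{chrom}:{start}..{end}:{strand}"
    return (f"{REST_SERVER}/map/{species}/{source_assembly}/{region}/"
            f"{target_assembly}?content-type=application/json")


def convert_region(species, source_assembly, chrom, start, end,
                   target_assembly, strand=1):
    """Fetch the converted coordinates (requires network access)."""
    url = build_map_url(species, source_assembly, chrom, start, end,
                        target_assembly, strand)
    with urllib.request.urlopen(url, timeout=30) as response:
        payload = json.load(response)
    # Each mapping pairs an 'original' region with its 'mapped' counterpart.
    return [mapping["mapped"] for mapping in payload.get("mappings", [])]
```

Calling convert_region("human", "GRCh37", "X", 1000000, 1000100, "GRCh38") would request the GRCh38 equivalent of a placeholder GRCh37 region. A region can map to zero, one, or several target regions, which is why the sketch returns a list.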
Ensembl also provides ways for users to upload their own data into the browser. Properly formatted tracks can be added to the display by selecting the Custom tracks option from the left side of any species-specific page. The data can be uploaded to Ensembl from a file on the user's computer or, if the file is saved on a web server, the browser can read it from a URL. Users who create an account at Ensembl can save track data to the Ensembl database server and view the data later from any computer. To share custom tracks or even a customized view of the Genome Browser with colleagues, click on the Share this Page link on the left sidebar. Ensembl also supports Track Hubs, both public ones registered on the EMBL-EBI Track Hub Registry and private ones.
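As an illustration of what a "properly formatted" track might look like, the following minimal sketch writes features in BED format, one of the common text formats for custom tracks. The helper name format_bed_track is hypothetical, and the track name, chromosome naming, and coordinates are made-up placeholders rather than real annotations.

```python
def format_bed_track(track_name, description, features):
    """Return BED-formatted text for a list of features.

    Each feature is a tuple (chrom, start, end, name, strand). BED uses a
    0-based start and an exclusive end, so start must be smaller than end.
    """
    lines = [f'track name="{track_name}" description="{description}"']
    for chrom, start, end, name, strand in features:
        if strand not in ("+", "-"):
            raise ValueError(f"strand must be '+' or '-', got {strand!r}")
        if not 0 <= start < end:
            raise ValueError(f"invalid interval {start}-{end} for {name}")
        # Columns: chrom, start, end, name, score (0 = unused here), strand
        fields = [chrom, str(start), str(end), name, "0", strand]
        lines.append("\t".join(fields))
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Placeholder features only; these are not real gene coordinates.
    demo = [("12", 1000, 2000, "region_a", "+"),
            ("12", 5000, 5500, "region_b", "-")]
    with open("custom_track.bed", "w") as handle:
        handle.write(format_bed_track("demo", "Example regions", demo))
```

The resulting file could then be uploaded through the Custom tracks option or placed on a web server and supplied as a URL.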
Figure 4.13 The home page of the Ensembl Genome Browser, showing a query for the human gene PAH. The browser suggests results based on the search term submitted. By default, the search box interfaces with the most recent version of the genome assembly, GRCh38, at the time of this writing. A link to the previous human genome assembly, GRCh37, is provided at the bottom of the page. Older assemblies from other organisms are available in the Ensembl archives.
Figure 4.14 The Gene tab for the human PAH gene. This landing page provides links to many gene-specific resources.
Like the UCSC Genome Browser home page, the home page of Ensembl is a stepping-off point for many Ensembl resources. Links to commonly used tools, such as BLAST and BLAT, are provided on the top and middle sections of the page, and recent data updates are highlighted in the right column. The home page for each genome can be accessed by selecting the organism name in the pull-down menu in the Browse a Genome section in the center of the page. A search box at the top of the page provides access to Ensembl. To search for the human PAH gene, select Human from the pull-down menu and type the term PAH in the search box. Ensembl will provide several suggested hits, including a direct link to the human PAH gene (Figure 4.13).
Ensembl data displays are organized in tabs. The Gene tab (Figure 4.14) has links to a number of gene-specific views and resources. For example, from the index on the left side of the Gene tab view, the Comparative Genomics → Orthologues link lists the computationally predicted orthologs of the selected gene that Ensembl has identified among the available genome assemblies (Herrero et al. 2016; Figure 4.15). The Location tab provides a graphical view of the genomic context of the gene, similar to the view available at UCSC. The link to the Location tab is at the top of the Gene tab view in Figure 4.14. The Location tab view is shown in Figure 4.16 and depicts, at three different zoom levels, the genomic context of the PAH gene on the GRCh38 genome assembly. The PAH gene has been mapped to chromosome 12, and the top panel shows a cartoon of that chromosome, with the region surrounding the PAH gene outlined in a red box. This red box is expanded in the middle panel of the figure, which shows ∼1 Mb of chromosome 12 around the PAH gene. The genes are shown as colored blocks, with their identifiers noted below them. The region outlined in red in this middle section is further expanded in the large bottom panel, which zooms in on the PAH gene itself. Individual tracks are visible in this view. Note the track called Contigs, a blue bar that represents the underlying assembled contigs. By convention, any transcripts shown above this track are transcribed from left to right. Transcripts drawn below the Contigs track, such as the PAH transcripts, are transcribed on the opposite strand, from right to left.
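The strand convention described above can be summarized in a few lines of code. This is only a sketch of the display rule as stated in the text; encoding the strand as 1 (forward) or -1 (reverse) is an assumption made for the example, and the function name describe_transcript is hypothetical.

```python
def describe_transcript(strand):
    """Describe where a transcript is drawn relative to the Contigs track.

    strand: 1 for the forward strand, -1 for the reverse strand
    (an encoding assumed for this sketch).
    """
    placements = {
        1: ("above", "left to right"),
        -1: ("below", "right to left"),
    }
    if strand not in placements:
        raise ValueError(f"strand must be 1 or -1, got {strand!r}")
    side, direction = placements[strand]
    return f"drawn {side} the Contigs track, transcribed {direction}"


# The PAH transcripts are drawn below the Contigs track, i.e. reverse strand.
print(describe_transcript(-1))
```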
Figure 4.15 Computationally predicted orthologs of the human PAH gene, from the Comparative Genomics → Orthologues link in Figure 4.14. Ensembl provides a detailed analysis of the orthologs calculated for each gene. Orthologs are grouped by taxonomic group, such as primates, rodents, and sauropsids. Links to individual orthologs are shown at the bottom of the page.
The default human gene set used by Ensembl is the GENCODE Comprehensive set (Box 4.2). Ensembl displays 18 PAH isoforms, each with a slightly different pattern of exons (Figure 4.16). Coding exons are depicted as solid blocks, non-coding exons as outlined blocks, and introns as the lines that connect them. The transcripts are color coded to indicate their status: gold transcripts are protein coding and have been annotated by both the Ensembl and HAVANA teams at the WTSI, red transcripts are protein coding and have been annotated by either Ensembl or HAVANA, and blue transcripts are processed transcripts that are non-protein coding. Clicking on a transcript pops up a box with additional information about it, including its accession number, transcript type, and gene prediction source (Box 4.4; Figure 4.16).
Figure 4.16 The Location tab for the human PAH gene. The Location tab is divided into three sections. The top section shows a cartoon of human chromosome 12, with the region surrounding the PAH gene outlined in a red box. Other red and green lines on the cartoon indicate assembly exceptions, or regions of alternative