Bioinformatics. Various authors
Figure 4.5 The genomic context of the human HIF1A gene, after displaying RefSeq Curated genes in full mode. Each RefSeq transcript is now drawn on a separate line, so that individual exons, as well as the direction of transcription, are visible. Compare this rendition with Figure 4.2, where all RefSeq transcripts are condensed on a single line.
Figure 4.6 The Get Genomic Sequence page that provides an interface for users to retrieve the sequence for a feature of interest. Click on an individual transcript in the GENCODE or RefSeq track to open a page with additional details for that transcript. On either of those details pages, click the link for Genomic Sequence to open the page displayed here, which provides choices for retrieving sequences upstream or downstream of the transcript, as well as intron or exon sequences. In this example, retrieve the sequence 1000 nt upstream of the annotated transcription start site. Shown in the inset is the result of retrieving the FASTA-formatted sequence 1000 nt upstream of the HIF1A transcript.
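The coordinate arithmetic behind an "upstream of the transcription start site" request can be sketched in a few lines of Python. This is a minimal illustration, not the Genome Browser's own code: the function names, the 0-based half-open coordinate convention, and the example coordinates in the usage note are all assumptions made for this sketch. The key point is that "upstream" depends on strand: for a plus-strand transcript it lies at smaller coordinates, and for a minus-strand transcript it lies at larger coordinates and the retrieved sequence must be reverse-complemented to read 5′ to 3′.

```python
def upstream_interval(tx_start, tx_end, strand, flank=1000):
    """Return the 0-based, half-open interval `flank` nt upstream of the TSS.

    tx_start and tx_end are assumed to be 0-based, half-open transcript
    coordinates on the forward strand of the chromosome.
    """
    if strand == "+":
        # The TSS is at tx_start; upstream lies at smaller coordinates.
        # Clip at 0 so the interval never runs off the chromosome start.
        return max(0, tx_start - flank), tx_start
    if strand == "-":
        # The TSS is at tx_end; upstream lies at larger coordinates.
        # Clipping at the chromosome end would need the chromosome length,
        # which this sketch does not take as input.
        return tx_end, tx_end + flank
    raise ValueError(f"strand must be '+' or '-', got {strand!r}")


_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq):
    """Reverse-complement a DNA string; needed for minus-strand features."""
    return seq.translate(_COMPLEMENT)[::-1]
```

For a hypothetical plus-strand transcript spanning 5000 to 9000, `upstream_interval(5000, 9000, "+")` gives the interval 4000 to 5000; for the same span on the minus strand it gives 9000 to 10000, and the sequence of that interval would then be passed through `reverse_complement`.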
Further down on the graphical view shown in Figure 4.3 are tracks from the ENCODE Regulation super-track: Layered H3K27Ac and DNase Clusters. These data were generated by the Encyclopedia of DNA Elements (ENCODE) Consortium between 2003 and 2012 (ENCODE Project Consortium 2012). The ENCODE Consortium has developed reagents and tools to identify all functional elements in the human genome sequence. The Layered H3K27Ac track indicates regions where there are modified histones that may indicate active enhancers (Box 4.3).
Box 4.3 Histone Marks
Histone proteins package DNA into chromosomes. Post-translational modifications of these histones can affect gene expression, as well as DNA replication and repair, by changing chromatin structure or recruiting histone modifiers (Lawrence et al. 2016). The post-translational modifications include methylation, phosphorylation, acetylation, ubiquitylation, and sumoylation. Histone H3 is primarily acetylated on lysine residues, methylated at arginine or lysine, or phosphorylated on serine or threonine. Histone H4 is primarily acetylated on lysine, methylated at arginine or lysine, or phosphorylated on serine.
Histone modification (or “marking”) is identified by the name of the histone, the residue on which it is marked, and the type of mark. Thus, H3K27Ac is histone H3 that is acetylated on lysine 27, while H3K79me2 is histone H3 that is dimethylated on lysine 79. Different histone marks are associated with different types of chromatin structure. Some are more likely to be found near enhancers and others near promoters and, while some increase expression of nearby genes, others decrease it. For example, H3K4me3 is associated with active promoters, and H3K27me3 is associated with developmentally controlled repressive chromatin states.
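The naming scheme described above (histone, residue, position, type of mark) is regular enough to decode mechanically. The following Python sketch is one way to do so, under stated assumptions: the function and class names are invented for this illustration, and the two-letter abbreviations `ph`, `ub`, and `su` for phosphorylation, ubiquitylation, and sumoylation are a common convention assumed here rather than taken from the text, which only shows `Ac` and `me`.

```python
import re
from typing import NamedTuple, Optional


class HistoneMark(NamedTuple):
    histone: str            # e.g. "H3"
    residue: str            # one-letter amino acid code, e.g. "K"
    position: int           # residue number, e.g. 27
    mark: str               # lower-cased modification code, e.g. "ac"
    degree: Optional[int]   # 1, 2 or 3 for me1/me2/me3, otherwise None


_PATTERN = re.compile(r"(H1|H2A|H2B|H3|H4)([A-Z])(\d+)([A-Za-z]{2})(\d?)")
_RESIDUES = {"K": "lysine", "R": "arginine", "S": "serine", "T": "threonine"}
# "ac" and "me" appear in the text; "ph", "ub" and "su" are assumed codes.
_VERBS = {"ac": "acetylated", "me": "methylated", "ph": "phosphorylated",
          "ub": "ubiquitylated", "su": "sumoylated"}
_DEGREES = {1: "mono", 2: "di", 3: "tri"}


def parse_histone_mark(name):
    """Split a name such as 'H3K27Ac' into its components."""
    match = _PATTERN.fullmatch(name)
    if match is None:
        raise ValueError(f"not a recognised histone mark name: {name!r}")
    histone, residue, position, mark, degree = match.groups()
    mark = mark.lower()  # accept both 'Ac' and 'ac', 'Me3' and 'me3'
    if residue not in _RESIDUES or mark not in _VERBS:
        raise ValueError(f"unsupported residue or mark in {name!r}")
    return HistoneMark(histone, residue, int(position), mark,
                       int(degree) if degree else None)


def describe(name):
    """Spell out a histone mark name in words."""
    hm = parse_histone_mark(name)
    prefix = _DEGREES.get(hm.degree, "") if hm.mark == "me" else ""
    return (f"histone {hm.histone} {prefix}{_VERBS[hm.mark]} "
            f"on {_RESIDUES[hm.residue]} {hm.position}")
```

With these definitions, `describe("H3K79me2")` returns `"histone H3 dimethylated on lysine 79"`, matching the reading given in the box.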
The DNase Clusters track depicts regions where chromatin is hypersensitive to cutting by the DNaseI enzyme. In these hypersensitive regions, the nucleosome structure is less compacted, meaning that the DNA is available to bind transcription factors. Thus, regulatory regions, especially promoters, tend to be DNase sensitive. The track settings for the ENCODE Regulation super-track allow other ENCODE tracks to be added to the browser window, including additional histone modification and DNaseI hypersensitivity data. Changing the display of the H3K4Me3 peaks from hide to full highlights the peaks in the H3K4Me3 track near the 5′ ends of the HIF1A and SNAPC1 transcripts that overlap with DNase hypersensitive sites (Figure 4.7, blue highlights). These peaks may represent promoter elements that regulate the start of transcription.
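Spotting histone peaks that coincide with DNase hypersensitive sites is done by eye in the browser, but the underlying test is a simple interval overlap. The Python sketch below illustrates that test; it is not how the Genome Browser itself computes anything, and the function names, the 0-based half-open coordinate convention, and the coordinates in the usage note are assumptions for this example. All intervals are assumed to lie on the same chromosome.

```python
def overlaps(a, b):
    """True if half-open intervals a=(start, end) and b=(start, end) share
    at least one base. Intervals that merely touch do not overlap."""
    return a[0] < b[1] and b[0] < a[1]


def peaks_overlapping_sites(peaks, sites):
    """Return the peaks that overlap at least one site.

    This compares every peak with every site, which is fine for the
    handful of features visible in one browser window; genome-wide data
    would call for sorted intervals or an interval tree instead.
    """
    return [peak for peak in peaks
            if any(overlaps(peak, site) for site in sites)]
```

For example, with hypothetical H3K4Me3 peaks `[(100, 200), (500, 600)]` and hypothetical DNase sites `[(150, 180), (700, 800)]`, only the first peak is returned, making it the candidate promoter region under the reasoning above.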
The UCSC Genome Browser displays data from NCBI's Single Nucleotide Polymorphism Database (dbSNP) in four SNP tracks. Common SNPs contains SNPs and small insertions and deletions (indels) from NCBI's dbSNP that have a minor allele frequency of at least 1% and are mapped to a single location in the genome. Researchers looking for disease-causing SNPs can use this track to filter their data, hypothesizing that their variant of interest will be rare and therefore not displayed in this track. Flagged SNPs are those that are deemed by NCBI to be clinically associated, while Mult. SNPs have been mapped to more than one region in the genome. NCBI filters out most multiple-mapping SNPs as they may not be true SNPs, so there are not many variants in this track. All SNPs includes all SNPs from the three subcategories. dbSNP is in a continuous state of growth, and new data are incorporated a few times each year as a new release, or new build, of dbSNP. These four SNP tracks are available for a few of the most recent builds of dbSNP, indicated by the number in the track name. Thus, for example, Common SNPs (150) are SNPs found in ≥1% of samples from dbSNP build 150.
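The Common SNPs criteria and the rare-variant filtering strategy described above can be expressed as two small Python functions. This is a sketch under stated assumptions, not dbSNP's or UCSC's actual pipeline: the function names, the representation of a variant by its minor allele frequency and number of genomic mappings, and the placeholder identifiers in the usage note are all invented for illustration.

```python
def is_common(maf, n_mappings, threshold=0.01):
    """Apply the two Common SNPs criteria from the text: minor allele
    frequency of at least 1% and a single mapped location.

    maf may be None when no frequency is known; such variants are
    treated as not common.
    """
    return maf is not None and maf >= threshold and n_mappings == 1


def drop_common(candidates, common_ids):
    """Keep only candidate variant IDs that are absent from the common set,
    on the hypothesis that a disease-causing variant will be rare."""
    common = set(common_ids)
    return [variant for variant in candidates if variant not in common]
```

With placeholder identifiers, `drop_common(["var_a", "var_b", "var_c"], ["var_b"])` returns `["var_a", "var_c"]`: the variant that also appears among the common SNPs is set aside, and the remaining two stay on the candidate list.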
By default, the Common SNPs (150) track is displayed in dense mode, with all variants in the region compressed onto a single line. Variants in the Common SNPs track are color coded by function. Open the Track Settings for this track in order to modify the display (Figure 4.8). Set the Display mode to pack in order to show each variant separately. At the same time, modify the Coloring Options so that SNPs in UTRs of transcripts are set to blue and SNPs in coding regions of transcripts are set to green if they are synonymous (no change to the protein sequence) or red if they are non-synonymous (altering the protein sequence), with all remaining classes of SNPs set to display in black. Note the changes in the resulting browser window, with the green synonymous and blue untranslated SNPs clearly visible (Figure 4.9).
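The coloring scheme chosen here amounts to a lookup from functional class to color with black as the fallback. A minimal Python sketch of that mapping follows; the class labels (`"utr"`, `"coding-synonymous"`, `"coding-nonsynonymous"`) are simplified names assumed for this example and are not necessarily the labels used in the Track Settings page.

```python
# Colors chosen in the text; the keys are simplified, assumed class labels.
SNP_COLORS = {
    "utr": "blue",                    # SNPs in untranslated regions
    "coding-synonymous": "green",     # no change to the protein sequence
    "coding-nonsynonymous": "red",    # alters the protein sequence
}


def snp_color(function_class):
    """Return the display color for a SNP's functional class.

    Any class not listed above (intronic, intergenic, and so on)
    falls back to black, as in the configuration described in the text.
    """
    return SNP_COLORS.get(function_class, "black")
```

Under this mapping, `snp_color("coding-synonymous")` is `"green"` and an unlisted class such as `"intron"` is `"black"`.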
Figure 4.7 The genomic context of the human HIF1A gene, after changing the display of the H3K4Me3 peaks from hide to full. The H3K4Me3 track is part of the ENCODE Regulation super-track. Below the graphic display window in Figure 4.5, open the ENCODE Regulation super-track in the Regulation menu. Change the track display from hide to full to reproduce the page shown here. Note that the H3K4Me3 peaks, which can indicate promoter regions (Box 4.3), overlap with the transcription start sites of the SNAPC1 and HIF1A genes (light blue highlight). These regions also overlap with the DNase HS track, indicating that the chromatin should be available to bind transcription factors in this region. The highlights were added within the Genome Browser using the Drag-and-select tool. This tool is accessed by clicking anywhere in the Scale track at the top of the Genome Browser display and dragging the selection window across a region of interest. The Drag-and-select tool provides options to Highlight the selected region or Zoom directly to it.
Figure 4.8 Configuring the track settings for the Common SNPs (150) track. Set the Coloring Options so that all SNPs are black, except for untranslated SNPs (blue), coding-synonymous SNPs