Bioinformatics. Group of authors
[VOL] AND, OR, or NOT
For each of the papers found and shown in the Summary view in Figure 2.2, the user is presented with the title of the paper, its authors, and the citation. To look at any of the papers resulting from the search, the user can simply click on any of the hyperlinked titles. For this example, consider the third reference in the list, by Srour et al. (2010). Clicking on the title takes the user to the Abstract view shown in Figure 2.3. This view presents the title of the paper, the list of authors, their institutional affiliations, and the abstract itself. Below the abstract is a gray bar labeled “MeSH terms, Substances”; clicking on the plus sign at the end of the gray bar reveals cataloging information (MeSH terms, for medical subject headings) and indexed substances related to the manuscript. Several alternative formats are available for displaying this information, and they can be selected using the Format pull-down menu found in the upper left corner of the window. Switching to MEDLINE format produces the MEDLINE layout, with short (two- to four-letter) codes identifying the contents of each field running down the left-hand side of the entry (e.g. the author field is again denoted by the code AU). Lists of entries in this format can be saved to the desktop and easily imported into third-party bibliography management programs.
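To make the tagged layout concrete, the following is a minimal sketch of how a saved MEDLINE-format entry could be read into a dictionary keyed by field code. It assumes the usual layout, in which the code is padded to four characters, followed by a hyphen and a space, and wrapped values continue on lines indented by six spaces. The function name `parse_medline` and the sample record are hypothetical and are not taken from the paper discussed above.

```python
from collections import defaultdict


def parse_medline(text):
    """Parse one MEDLINE-format record into {field code: [values]}.

    Assumes the field code occupies the first four columns, column five
    is a hyphen, and continuation lines start with six spaces.
    """
    record = defaultdict(list)
    tag = None
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        if line[:6].strip() == "" and tag is not None:
            # Continuation of the previous field's value.
            record[tag][-1] += " " + line.strip()
        elif len(line) > 5 and line[4] == "-":
            # New field: code in columns 1-4, value from column 7 onward.
            tag = line[:4].strip()
            record[tag].append(line[6:].strip())
    return dict(record)


# Made-up record for illustration only (not a real PubMed entry).
SAMPLE = """\
PMID- 00000000
TI  - A made-up title that is long enough to wrap onto
      a second line.
AU  - Doe J
AU  - Roe R
"""

parsed = parse_medline(SAMPLE)
print(parsed["AU"])
```

Repeated codes such as AU are collected into a list, which is why every value is stored as a list even when a field occurs only once.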
Figure 2.3 An example of a PubMed record in Abstract format, as returned through Entrez. This Abstract view is for the third reference shown in Figure 2.2. This view provides connections to related articles, sequence information, and the full-text journal article through the Discovery Column that runs down the right-hand side of the page. See text for details.
The column on the right-hand side of this window, aptly named the Discovery Column, provides access to the full-text version of the paper and, more importantly, contains many useful links to additional information related to this manuscript. The Similar articles section provides one of the entry points from which the user can take advantage of the neighboring and hard link relationships described earlier and, in the examples that follow, we will return to this page several times to illustrate a selected cross-section of the kinds of information available to the user. To begin this journey, if the user clicks on the See all link at the bottom of that section, Entrez will return a list of 104 references related to the original Srour et al. (2010) paper at the time of this writing; the first six of these papers are shown in Figure 2.4. The first paper in the list is the same Srour et al. paper because, by definition, it is most related to itself (the “parent” entry). The order in which the related papers follow is based on statistical similarity. Thus, the entry listed immediately after the parent is deemed to be the closest to it in subject matter. By scanning the titles, the user can easily find related information on other studies, as well as quickly amass a bibliography of relevant references. This is a particularly useful and time-saving function when one is writing grants or papers, as abstracts can easily be scanned and papers of real interest can be identified quickly.
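The same neighbor and hard-link relationships can also be requested programmatically through the NCBI E-utilities ELink service. The sketch below only builds the request URLs and does not contact NCBI. The helper name `elink_url` and the placeholder PMID are assumptions made for illustration; the real PMID of the paper of interest would need to be substituted.

```python
from urllib.parse import urlencode

EUTILS_BASE = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/"


def elink_url(pmid, target_db="pubmed", with_scores=False):
    """Build an ELink URL for records linked to a PubMed entry.

    target_db="pubmed" asks for neighbors (similar articles);
    another database name, such as "gene", asks for hard links.
    """
    params = {"dbfrom": "pubmed", "db": target_db, "id": str(pmid)}
    if with_scores:
        # neighbor_score asks ELink to report similarity scores as well.
        params["cmd"] = "neighbor_score"
    return EUTILS_BASE + "elink.fcgi?" + urlencode(params)


# Placeholder identifier, NOT the PMID of the paper discussed in the text.
PLACEHOLDER_PMID = "12345678"

neighbors_url = elink_url(PLACEHOLDER_PMID, with_scores=True)
gene_url = elink_url(PLACEHOLDER_PMID, target_db="gene")
print(neighbors_url)
print(gene_url)
```

Fetching either URL returns XML listing the linked record identifiers. For the neighbor query, the returned scores would be expected to reflect the same similarity ordering shown in the web interface.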
Figure 2.4 Neighbors to an entry found in PubMed. The original entry from Figure 2.3 (Srour et al. 2010) is at the top of the list, indicating that this is the parent entry. Additional neighbors to each of the papers in this list can be found by clicking the Similar articles link found below each entry. See text for details.
Figure 2.5 The Entrez Gene page for the DCC (deleted in colorectal carcinoma) netrin-1 receptor from human. The entry indicates that this is a protein-coding gene at map location 18q21.2, and information on the genomic context of DCC, as well as alternative gene names and information on the encoded protein, is provided. An extensive collection of links to other National Center for Biotechnology Information (NCBI) and external databases is also provided. See text for details.
Returning to the Abstract view presented in Figure 2.3, at the bottom of the Discovery Column is a series of hard-link connections to other databases within the Entrez system that can take the user directly to an extensive set of information related to the content of the publication of interest. Here, selecting the Gene link takes the user to Entrez Gene, a feature of Entrez that provides a wealth of information about the gene in question (Figure 2.5). The data are gathered from a variety of sources, including RefSeq. Here, we see that DCC is the official symbol of a protein-coding gene for a netrin-1 receptor in humans. The Genomic context section of this page indicates that DCC lies at map location 18q21.2. Immediately below, summary information on the genomic region, transcripts, and products of the DCC gene is presented graphically, with genomic coordinates provided. Additional content not shown in the figure can be found by scrolling down the Gene page, where the user will find relevant functional information (such as gene expression data), associated phenotypes, information on protein–protein interactions, pathway information, Gene Ontology assignments, and homologies to similar sequences in selected organisms. Shortcut links to these sections can be found in the Table of contents at the top of the Discovery Column. Further down the Discovery Column are extensive lists of links to additional resources provided through NCBI and other sources. One link of note is the SNP: Gene View link, which takes the user to data derived from dbSNP (Figure 2.6). The information found within dbSNP goes beyond single-nucleotide polymorphisms (SNPs) and also includes data on other short genetic variations such as short insertions and deletions, short tandem repeats, and microsatellites. Here, we will focus on the table shown in Figure 2.6, which is a straightforward way to view information about individual SNPs.
Each SNP entry occupies two or more lines of the table, with one line showing the contig reference (the more common allele) and the others showing the SNP (the less common allele or alleles). Consider the first three lines of the table, showing a contig reference G for which there are two documented SNPs, changing the G at that position to either an A or a C. At the protein level, this changes the amino acid at position 2 of the DCC protein from glutamic acid to lysine (for the G-to-A substitution) or to glutamine (for the G-to-C substitution). These rows are colored red since these are “non-synonymous SNPs”; that is, the SNP produces a discrete change at the amino acid level. In contrast, consider the first set of green rows in the table, with the green indicating that this is a “synonymous SNP,” where the codons for the contig reference (G) and the SNP allele (A) ultimately produce the same amino acid (Glu). This is not altogether surprising, because the SNP falls in the wobble (third) position of the codon, where there is often redundancy in the genetic code. Additional information on human SNPs can be found in Chapter 15.
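The distinction between the red and green rows can be illustrated with a short sketch that translates a reference codon and its variant using the standard genetic code. The reference codon GAG used below is an assumption chosen for illustration: the text states only that the reference amino acid is glutamic acid, which is encoded by GAA or GAG. The function name `classify_snp` is likewise hypothetical.

```python
from itertools import product

# Standard genetic code, with codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def classify_snp(ref_codon, position, alt_base):
    """Substitute alt_base at the 0-based position of ref_codon.

    Returns (variant codon, reference amino acid, variant amino acid,
    "synonymous" or "non-synonymous").
    """
    alt_codon = ref_codon[:position] + alt_base + ref_codon[position + 1:]
    ref_aa = CODON_TABLE[ref_codon]
    alt_aa = CODON_TABLE[alt_codon]
    kind = "synonymous" if ref_aa == alt_aa else "non-synonymous"
    return alt_codon, ref_aa, alt_aa, kind


# Assumed reference codon GAG (Glu, E); the actual codon may be GAA.
print(classify_snp("GAG", 0, "A"))  # first-position G-to-A: Glu to Lys
print(classify_snp("GAG", 0, "C"))  # first-position G-to-C: Glu to Gln
print(classify_snp("GAG", 2, "A"))  # wobble-position G-to-A: Glu unchanged
```

Under this assumption, the two first-position substitutions change the encoded amino acid, whereas the third-position substitution does not, mirroring the non-synonymous and synonymous rows described above.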