Bioinformatics. Group of authors
much more straightforward than the neighboring approaches described above. Hard links are applied between entries in different databases and exist wherever there is a logical connection between entries. For instance, if a PubMed entry describes the sequencing of a chromosomal region containing a gene of interest, a hard link is established between the PubMed entry and the corresponding nucleotide entry for that gene. If an open reading frame in that gene codes for a known protein, a hard link is established between the nucleotide entry and the protein entry. If the protein entry has an experimentally deduced structure, a hard link would be placed between the protein entry and the structural entry.
Searches can begin anywhere within the Entrez ecosystem – there are no constraints on the user as to where the foray into this information space must begin. However, depending on which database is used as the jumping-off point, different database fields will be available for searching. This stands to reason, as the entries in different databases are necessarily organized differently, reflecting the biological nature of the entities that each database is trying to catalog.
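The chain of hard links described above can be pictured as a small graph in which each record is a node and each hard link is an edge. The following Python sketch is only an illustration of that idea: the record identifiers (PM-1, NT-1, and so on) are hypothetical placeholders, and treating every link as traversable in both directions is an assumption made here so that a walk can start from any database.

```python
from collections import defaultdict, deque

# Hypothetical (database, identifier) pairs; real Entrez identifiers differ.
HARD_LINK_PAIRS = [
    (("pubmed", "PM-1"), ("nucleotide", "NT-1")),
    (("nucleotide", "NT-1"), ("protein", "PR-1")),
    (("protein", "PR-1"), ("structure", "ST-1")),
]


def build_links(pairs):
    """Store each hard link in both directions (an assumption of this sketch)."""
    links = defaultdict(list)
    for first, second in pairs:
        links[first].append(second)
        links[second].append(first)
    return links


def reachable(start, links):
    """Breadth-first walk over hard links; returns records in visit order."""
    seen = [start]
    queue = deque([start])
    while queue:
        record = queue.popleft()
        for neighbor in links.get(record, []):
            if neighbor not in seen:
                seen.append(neighbor)
                queue.append(neighbor)
    return seen


links = build_links(HARD_LINK_PAIRS)
from_pubmed = reachable(("pubmed", "PM-1"), links)
from_structure = reachable(("structure", "ST-1"), links)
print(from_pubmed)
print(from_structure)
```

Starting from the literature record or from the structure record reaches the same four entries, only in the opposite order, which mirrors the point that the starting database does not restrict what can be found.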
The Entrez Discovery Pathway
The best way to illustrate the integrated nature of the Entrez system and to drive home the power of neighboring is by considering some biological examples. The simplest way to query Entrez is through the use of individual search terms, coupled together by Boolean operators such as AND, OR, or NOT. Consider the case in which one wants to retrieve all available information on a gene named DCC (deleted in colorectal carcinoma), limiting the returned information to publications where an investigator named Guy A. Rouleau is an author. There is a very simple query interface at the top of the NCBI home page, consisting of a pull-down menu from which the user selects the database to search and a text box where the query terms can be entered. In this case, to search for published papers, PubMed would be selected from the pull-down menu and, within the text box to the right, the user would type DCC AND "Rouleau GA" [AU]. The [AU] qualifying the second search term indicates to Entrez that this is an author term, so only the author field in entries should be considered when evaluating this part of the search statement. The result of the query is shown in Figure 2.2. Here, three entries matching the query were found in PubMed. The user can narrow the query further by adding terms, either to focus on a more specific aspect of this gene or simply because the initial query returned too many entries. A list of available field delimiters is given in Table 2.1.
Figure 2.2 Results of a text-based Entrez query against PubMed using Boolean operators and field delimiters. The initial query (DCC AND "Rouleau GA" [AU]) is shown in the search box near the top of the window, with the three papers identified using this query following below. Each entry gives the title of the manuscript, the names of the authors, and the citation information. The actual record can be retrieved by clicking on the name of the manuscript.
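The same query can also be assembled programmatically. The sketch below is not part of the web interface described above: it assumes the public NCBI E-utilities ESearch endpoint with its db and term parameters, and the helper names build_query and esearch_url are hypothetical. It only builds the URL and sends no request.

```python
from urllib.parse import urlencode

# Base URL of the NCBI E-utilities ESearch service (assumed here; the text
# above describes only the interactive search box).
ESEARCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"

BOOLEAN_OPERATORS = {"AND", "OR", "NOT"}


def build_query(*terms, operator="AND"):
    """Join already-formatted search terms with one Boolean operator."""
    if operator not in BOOLEAN_OPERATORS:
        raise ValueError(f"unsupported Boolean operator: {operator}")
    return f" {operator} ".join(terms)


def esearch_url(database, query):
    """Return an ESearch URL for the query; no network request is made."""
    return ESEARCH_URL + "?" + urlencode({"db": database, "term": query})


# The query from the text: gene name DCC, restricted to author Rouleau GA.
query = build_query("DCC", '"Rouleau GA" [AU]')
url = esearch_url("pubmed", query)
print(query)
print(url)
```

Keeping the query string separate from the URL makes it easy to check that the text typed into the search box and the text sent to a service are identical before any encoding is applied.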
Table 2.1 Entrez Boolean search statements.
General syntax: search term [tag] Boolean operator search term [tag] ... where [tag] =
| Tag | Field | Notes and examples |
| --- | --- | --- |
| [ACCN] | Accession | |
| [AD] | Affiliation | |
| [ALL] | All fields | |
| [AU] | Author name | Lentz R [AU] yields all of Lentz RA, Lentz RB, etc.; "Lentz R" [AU] yields only Lentz R |
| [AUID] | Unique author identifier | Such as an ORCID ID |
| [ECNO] | Enzyme Commission numbers | |
| [EDAT] | Entrez date | YYYY/MM/DD, YYYY/MM, or YYYY; insert a colon for date range, e.g. 2016:2018 |
| [GENE] | Gene name | |
| [ISS] | Issue of journal | |
| [JOUR] | Journal title, official abbreviation, or ISSN number | Journal of Biological Chemistry; J Biol Chem; 0021-9258 |
| [LA] | Language | |
| [MAJR] | MeSH major topic | One of the major topics discussed in the article |
| [MH] | MeSH terms | Controlled vocabulary of biomedical terms (subject) |
| [ORGN] | Organism | |
| [PDAT] | Publication date | YYYY/MM/DD, YYYY/MM, or YYYY; insert a colon for date range, e.g. 2016:2018 |
| [PMID] | PubMed ID | |
| [PROT] | Protein name | For sequence records |
| [PT] | Publication type | Includes Review, Clinical Trial, Lectures, Letter, Technical Report |
| [SH] | MeSH subheading | Used to modify MeSH terms, e.g. stenosis [MH] AND pharmacology [SH] |
| [SUBS] | Substance name | Name of chemical discussed in article |
| [SI] | Secondary source ID | Names of secondary source databanks and/or accession numbers of sequences discussed in article |
| [TITL] | Title word | Only words in the definition line (not available in Structure database) |
| [WORD] | Text words | All words and numbers in the title and abstract, MeSH terms, subheadings, chemical substance names, personal name as subject, and MEDLINE secondary sources |
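The general syntax of Table 2.1 (search term [tag] Boolean operator search term [tag]) lends itself to a small formatting helper. The following Python sketch is one possible way to compose such statements; the function names tagged and date_range are hypothetical, the tag list is copied from Table 2.1, and the placement of the tag after a date range is an assumption based on the general syntax line.

```python
# Field tags as listed in Table 2.1.
KNOWN_TAGS = {
    "ACCN", "AD", "ALL", "AU", "AUID", "ECNO", "EDAT", "GENE", "ISS",
    "JOUR", "LA", "MAJR", "MH", "ORGN", "PDAT", "PMID", "PROT", "PT",
    "SH", "SUBS", "SI", "TITL", "WORD",
}

DATE_TAGS = {"EDAT", "PDAT"}


def tagged(term, tag, exact=False):
    """Format a search term followed by its field tag.

    With exact=True the term is wrapped in double quotes, which for an
    author name restricts the match to that exact name (see the [AU] row).
    """
    tag = tag.upper()
    if tag not in KNOWN_TAGS:
        raise ValueError(f"tag not listed in Table 2.1: {tag}")
    body = f'"{term}"' if exact else term
    return f"{body} [{tag}]"


def date_range(start, end, tag="PDAT"):
    """Format a date range using a colon, e.g. 2016:2018 [PDAT]."""
    tag = tag.upper()
    if tag not in DATE_TAGS:
        raise ValueError(f"not a date tag: {tag}")
    return f"{start}:{end} [{tag}]"


broad_author = tagged("Lentz R", "AU")
exact_author = tagged("Lentz R", "AU", exact=True)
mesh_query = " AND ".join([tagged("stenosis", "MH"), tagged("pharmacology", "SH")])
dated_query = " AND ".join([tagged("DCC", "GENE"), date_range("2016", "2018")])
print(broad_author)
print(exact_author)
print(mesh_query)
print(dated_query)
```

Validating tags against the table catches typing errors such as [AUTH] before a query is submitted, although the actual set of tags accepted by a given Entrez database may differ from this list.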