Bioinformatics. Various authors

nodes in a taxonomic tree, with the most general grouping (Eukaryota) given first.

       OS   Drosophila melanogaster (fruit fly)
       OC   Eukaryota; Metazoa; Ecdysozoa; Arthropoda; Hexapoda; Insecta; Pterygota;
       OC   Neoptera; Holometabola; Diptera; Brachycera; Muscomorpha; Ephydroidea;
       OC   Drosophilidae; Drosophila; Sophophora.
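As an illustration of how the OS and OC lines can be read programmatically, the following is a minimal Python sketch. The function name `parse_taxonomy` is hypothetical, and the sketch assumes each line starts with its two-letter line code followed by whitespace, as in the ENA excerpt above.

```python
def parse_taxonomy(lines):
    """Return (organism, lineage) from ENA-style OS and OC lines."""
    organism_parts = []
    lineage = []
    for line in lines:
        # Split the two-letter line code from the rest of the line.
        code, _, rest = line.strip().partition(" ")
        rest = rest.strip()
        if code == "OS":
            organism_parts.append(rest)
        elif code == "OC":
            # Taxa are separated by semicolons; the final taxon ends with a period.
            for name in rest.rstrip(".").split(";"):
                name = name.strip()
                if name:
                    lineage.append(name)
    return " ".join(organism_parts), lineage


record = [
    "OS   Drosophila melanogaster (fruit fly)",
    "OC   Eukaryota; Metazoa; Ecdysozoa; Arthropoda; Hexapoda; Insecta; Pterygota;",
    "OC   Neoptera; Holometabola; Diptera; Brachycera; Muscomorpha; Ephydroidea;",
    "OC   Drosophilidae; Drosophila; Sophophora.",
]

organism, lineage = parse_taxonomy(record)
print(organism)
print(" > ".join(lineage))
```

Because the OC lines list the most general grouping first, `lineage[0]` is the root-most node and `lineage[-1]` the most specific one.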

      Each record must have at least one reference or citation, noted within what are called reference blocks. These reference blocks offer scientific credit and set a context explaining why this particular sequence was determined. The reference blocks take the following form.

       RN   [1]
       RP   1-2881
       RX   DOI; 10.1074/jbc.271.27.16393.
       RX   PUBMED; 8663200.
       RA   Lavoie C.A., Lachance P.E., Sonenberg N., Lasko P.;
       RT   "Alternatively spliced transcripts from the Drosophila eIF4E gene produce
       RT   two different Cap-binding proteins";
       RL   J Biol Chem 271(27):16393-16398(1996).
       XX
       RN   [2]
       RP   1-2881
       RA   Lasko P.F.;
       RT   ;
       RL   Submitted (09-APR-1996) to the INSDC.
       RL   Paul F. Lasko, Biology, McGill University, 1205 Avenue Docteur Penfield,
       RL   Montreal, QC H3A 1B1, Canada
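To make the structure of a reference block concrete, here is a small Python sketch that groups the R-prefixed lines by reference number and pulls out authors, title, and the PubMed identifier. The helper names (`parse_references`, `summarize`) are hypothetical. The sketch assumes one line code per line, as in ENA flat files, and uses an abbreviated copy of the block above as input.

```python
def parse_references(lines):
    """Group ENA-style reference lines (RN, RP, RX, RA, RT, RL) by reference."""
    refs = []
    current = None
    for line in lines:
        code, value = line[:2], line[2:].strip()
        if code == "RN":
            # A new reference block starts at each RN line, e.g. "RN   [1]".
            current = {"number": int(value.strip("[]")), "fields": {}}
            refs.append(current)
        elif current is not None and code.startswith("R"):
            current["fields"].setdefault(code, []).append(value)
        # Spacer lines such as "XX" are ignored.
    return refs


def summarize(ref):
    """Extract authors, title, and PubMed ID (if any) from one parsed reference."""
    fields = ref["fields"]
    authors = " ".join(fields.get("RA", [])).rstrip(";").split(", ")
    title = " ".join(fields.get("RT", [])).rstrip(";").strip('"')
    pubmed = None
    for xref in fields.get("RX", []):
        db, _, ident = xref.partition(";")
        if db.strip() == "PUBMED":
            pubmed = ident.strip().rstrip(".")
    return {"number": ref["number"], "authors": authors, "title": title, "pubmed": pubmed}


block = [
    "RN   [1]",
    "RP   1-2881",
    "RX   PUBMED; 8663200.",
    "RA   Lavoie C.A., Lachance P.E., Sonenberg N., Lasko P.;",
    'RT   "Alternatively spliced transcripts from the Drosophila eIF4E gene produce',
    'RT   two different Cap-binding proteins";',
    "RL   J Biol Chem 271(27):16393-16398(1996).",
    "XX",
    "RN   [2]",
    "RP   1-2881",
    "RA   Lasko P.F.;",
    "RT   ;",
    "RL   Submitted (09-APR-1996) to the INSDC.",
]

summaries = [summarize(r) for r in parse_references(block)]
for s in summaries:
    print(s)
```

The second reference is the direct submission, so its title comes back empty and it has no PubMed identifier.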

      Some headers may contain COMMENT (DDBJ/GenBank) or CC (ENA) lines. These lines can include a great variety of notes and comments (descriptors) that refer to the entire record. Often, genome centers will use these lines to provide contact information and to confer acknowledgments. Comments also may include the history of the sequence. If the sequence of a particular record is updated, the comment will contain a pointer to the previous versions of the record. Alternatively, if an earlier version of the record is retrieved, the comment will point forward to the newer version, as well as backwards, if there was a still earlier version. Finally, there are database cross-reference lines (marked DR) that provide links to allied databases containing information related to the sequence of interest. Here, a cross-reference to FlyBase can be seen in the complete header for this record in Appendix 1.1. Note that the corresponding DDBJ/GenBank header in Appendix 1.2 does not contain these cross-references.

      The Feature Table

      Early on in the collaboration between INSDC partner organizations, an effort was made to come up with a common way to represent the biological information found within a given database record. This common representation is called the feature table. It consists of feature keys (a single word or abbreviation indicating the described biological property), location information denoting where the feature is located within the sequence, and qualifiers providing additional descriptive information about the feature. The online INSDC feature table documentation is extensive and describes in great detail what features are allowed and what qualifiers can be used with each individual feature. Wording within the feature table uses common biological research terminology wherever possible and is consistent between DDBJ, ENA, and GenBank entries.

      Here, we will dissect the feature table for the eukaryotic initiation factor 4E (eIF4E) gene from Drosophila melanogaster, shown in its entirety in both Appendices 1.3 (in ENA format) and 1.4 (in DDBJ/GenBank format). This particular sequence is alternatively spliced, producing two distinct gene products, 4E-I and 4E-II. The first block of information in the feature table is always the source feature, indicating the biological source of the sequence and additional information relating to the entire sequence. This feature must be present in all INSDC entries, as all DNA or RNA sequences derive from some specific biological source, including synthetic DNA.

       FT   source          1..2881
       FT                   /organism="Drosophila melanogaster"
       FT                   /chromosome="3"
       FT                   /map="67A8-B2"
       FT                   /mol_type="genomic DNA"
       FT                   /db_xref="taxon:7227"
       FT   gene            80..2881
       FT                   /gene="eIF4E"

       FT   mRNA            join(80..224,892..1458,1550..1920,1986..2085,2317..2404,
       FT                   2466..2881)
       FT                   /gene="eIF4E"
       FT                   /product="eukaryotic initiation factor 4E-I"
       FT   mRNA            join(80..224,1550..1920,1986..2085,2317..2404,2466..2881)
       FT                   /gene="eIF4E"
       FT                   /product="eukaryotic initiation factor 4E-II"
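The two mRNA features differ only in their join() locations, and a little arithmetic on those coordinates shows how the two transcripts relate. The Python sketch below is illustrative only: the helper names `parse_join` and `spliced_length` are hypothetical, and it handles only joins of plain `start..end` ranges like the ones in this record.

```python
import re


def parse_join(location):
    """Parse 'join(a..b,c..d,...)' into 1-based, inclusive (start, end) tuples."""
    return [(int(a), int(b)) for a, b in re.findall(r"(\d+)\.\.(\d+)", location)]


def spliced_length(spans):
    """Total number of bases covered by the joined spans."""
    return sum(end - start + 1 for start, end in spans)


# Locations copied from the two mRNA features above.
mrna_4e1 = "join(80..224,892..1458,1550..1920,1986..2085,2317..2404,2466..2881)"
mrna_4e2 = "join(80..224,1550..1920,1986..2085,2317..2404,2466..2881)"

spans_1 = parse_join(mrna_4e1)
spans_2 = parse_join(mrna_4e2)

# Spans present in the 4E-I transcript but absent from 4E-II.
unique_to_1 = [span for span in spans_1 if span not in spans_2]

print(len(spans_1), spliced_length(spans_1))
print(len(spans_2), spliced_length(spans_2))
print(unique_to_1)
```

Run on this record, the comparison shows that the 4E-I transcript carries one additional segment, 892..1458, that is skipped in the 4E-II transcript; all other joined segments are shared.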

345                      Single position within the sequence
345..500                 A continuous range of positions bounded by and including the indicated positions
<345..500                A continuous range of positions, where the exact lower boundary is not known; the feature begins somewhere prior to position 345 but ends at position 500
345..>500                A continuous range of positions, where the exact upper boundary is not known; the feature begins at position 345 but ends somewhere after position 500
<1..888                  The feature starts before the first sequenced base and continues to position 888
(102.110)                Indicates that the exact location is unknown, but that it is one of the positions between 102 and 110, inclusive
123^124                  Points to a site between positions 123 and 124
123^177                  Points to a site between two adjacent nucleotides or amino acids anywhere between positions 123 and 177
join(12..78,134..202)    Regions 12–78 and 134–202 are joined to form one contiguous sequence
complement(4918..5126)   The sequence complementary to that found from 4918 to 5126 in the sequence record
J00194:100..202
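The location forms listed above can be recognized with a handful of patterns. The following Python sketch, with a hypothetical function name `parse_location` and an ad hoc dictionary layout, classifies each form. It is a simplified illustration, not a full implementation of the INSDC location grammar. The last form, an accession number followed by a colon and a range, is treated here as a range in another database entry, which is how the INSDC feature table documentation defines it.

```python
import re


def parse_location(loc):
    """Classify one feature-table location string into a small dict."""
    loc = loc.strip()

    # Operators wrap other locations.
    m = re.fullmatch(r"complement\((.+)\)", loc)
    if m:
        return {"type": "complement", "of": parse_location(m.group(1))}
    m = re.fullmatch(r"join\((.+)\)", loc)
    if m:
        # Simplification: assumes the joined parts contain no nested commas.
        parts = [parse_location(p) for p in m.group(1).split(",")]
        return {"type": "join", "parts": parts}

    # (102.110): exactly one position, somewhere in the inclusive interval.
    m = re.fullmatch(r"\((\d+)\.(\d+)\)", loc)
    if m:
        return {"type": "one_of", "start": int(m.group(1)), "end": int(m.group(2))}

    # 123^124: a site between two positions.
    m = re.fullmatch(r"(\d+)\^(\d+)", loc)
    if m:
        return {"type": "between", "start": int(m.group(1)), "end": int(m.group(2))}

    # 345..500, <345..500, 345..>500: a range, possibly with an unknown boundary.
    m = re.fullmatch(r"(<?)(\d+)\.\.(>?)(\d+)", loc)
    if m:
        return {
            "type": "range",
            "start": int(m.group(2)),
            "end": int(m.group(4)),
            "start_open": m.group(1) == "<",
            "end_open": m.group(3) == ">",
        }

    # 345: a single position.
    if loc.isdigit():
        return {"type": "single", "position": int(loc)}

    # J00194:100..202: a location in another entry, identified by accession.
    m = re.fullmatch(r"([A-Z]+\d+(?:\.\d+)?):(.+)", loc)
    if m:
        return {"type": "remote", "accession": m.group(1), "of": parse_location(m.group(2))}

    raise ValueError(f"unrecognized location: {loc!r}")


examples = [
    "345", "345..500", "<345..500", "345..>500", "(102.110)", "123^124",
    "join(12..78,134..202)", "complement(4918..5126)", "J00194:100..202",
]
for example in examples:
    print(example, "->", parse_location(example))
```

Handling the operators first keeps the function recursive and short: `complement(...)` and `join(...)` simply delegate their contents back to the same function.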