Compilation of tRNA sequences and sequences of tRNA genes
September 2007 edition
Mathias Sprinzl*1, Konstantin S. Vassilenko2
1 Laboratorium für Biochemie, Universität Bayreuth, 95440 Bayreuth, Germany
2 Institute of Protein Research, Russian Academy of Sciences, 142290 Pushchino, Moscow Region, Russia
*To whom correspondence should be addressed
Tel.: +49 921 552420
Fax: +49 921 552432
Email: mathias.sprinzl@uni-bayreuth.de
INTRODUCTION
In addition to the 3279 sequences of the last edition from 1998 (1), the new Compilation of tRNA Sequences and Sequences of tRNA Genes contains the completely new Genomic tRNA Compilation, which includes the sequences of tRNA genes from complete genomes published up to September 2004.
The current database consists of three parts:
1. Genomic tRNA Compilation (MS Excel® file, zipped)
2. Compilation of tRNA Sequences (MS Excel® file, zipped)
3. Compilation of tRNA Genes (MS Excel® file, zipped)
Genomic tRNA Compilation is a compilation of the sequences of cytoplasmic tRNA genes derived from complete genome sequences deposited in DNA databases. Because the sequences of tRNA genes originating from cellular organelles (e.g. the mitochondria of mammalian cells) frequently cannot be fitted to the general cloverleaf scheme, they were not included in the Genomic tRNA Compilation. Specialised databases dealing with these sequences exist (see links below).
The current Genomic tRNA Compilation consists of about 7600 tRNA gene sequences from 131 organisms covering archaea, bacteria, and higher and lower eukarya (this compilation was last updated in 2004). It includes the tRNA gene sequences collected in GtRDB (2) as well as those from additional complete genomes found in DNA databases. The tRNA genes were identified by the sequencing teams using common tRNA search programs [e.g. tRNAscan-SE (4)]. If the genomes of several strains of the same organism were sequenced, the tRNA genes of each strain were added to the database independently.
Compilation of tRNA Sequences is a summary of tRNA sequences, including modified bases, with references to the corresponding publications. The references are restricted to the first publication of the complete sequence unless additional information (e.g. base modifications, corrections) was obtained later; in such cases further references were added. This compilation is updated up to September 2007. The table contains the known tRNA sequences of all organisms, including organelles. It is the continuation of the original tRNA compilation first published in 1978.
Compilation of tRNA Genes is a summary of the published sequences of tRNA genes that were sequenced individually rather than as part of a whole genome. It contains tRNA gene sequences of all organisms and organelles. This table contains about 350 sequences of cytoplasmic tRNA genes that are not included in the Genomic tRNA Database. Most entries in this table include a reference to the publication in which the sequence was reported.
PRESENTATION OF SEQUENCES
Sequences are presented as MS Excel® workbooks. The collected information is split into separate indexed tables according to the type of data (specificity, sequence, organism, etc.), and the descriptions of individual genes are summarised in a main worksheet that holds the relations between the data tables. Data are retrieved by filling in a query form, which accepts simple search criteria and lets the user select the type of data to be displayed. The search result is presented as a table describing the genes found: unique id, amino acid specificity, anticodon sequence, organism name, literature reference, sequence, basepairing and additional comments. The Genomic tRNA Compilation contains additional information on taxonomy, strain, original database source and position of the gene in the genome.
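The result table can be pictured as a list of records carrying the fields listed above. The following Python sketch models one result row and a query that keeps only rows matching every supplied criterion, loosely mirroring the query form. The class `TRNARecord`, the function `query` and all field names are hypothetical; the actual workbook layout and the behaviour of its query form are not reproduced here, and the example rows are invented.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional


@dataclass
class TRNARecord:
    """One row of a search result (hypothetical field names)."""
    uid: str            # unique identifier
    amino_acid: str     # amino acid specificity
    anticodon: str      # anticodon sequence
    organism: str       # organism name
    reference: str      # literature reference
    sequence: str       # aligned sequence, '-' marks gaps
    basepairing: str    # '=' Watson-Crick, '*' GU pair
    comments: str = ""  # additional comments


def query(records: Iterable[TRNARecord],
          amino_acid: Optional[str] = None,
          anticodon: Optional[str] = None,
          organism: Optional[str] = None) -> List[TRNARecord]:
    """Return the records that satisfy every criterion that is not None.

    Amino acid and anticodon must match exactly; the organism criterion is a
    case-insensitive substring match (an assumption, not the form's behaviour).
    """
    hits = []
    for rec in records:
        if amino_acid is not None and rec.amino_acid != amino_acid:
            continue
        if anticodon is not None and rec.anticodon != anticodon:
            continue
        if organism is not None and organism.lower() not in rec.organism.lower():
            continue
        hits.append(rec)
    return hits


# Invented toy rows, for illustration only.
rows = [
    TRNARecord("RF0001", "Phe", "GAA", "Organism A", "toy", "GCG-A", "=..=."),
    TRNARecord("RL0002", "Leu", "CAG", "Organism B", "toy", "GC-GA", "==..."),
]
print([r.uid for r in query(rows, amino_acid="Phe")])  # ['RF0001']
```

Leaving a criterion as `None` simply disables it, which matches the idea of entering only the search criteria of interest.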
The sequences are aligned in the way that is most compatible with tRNA phylogeny and the known three-dimensional structures of tRNA (3, 4). The corresponding numbering system is shown in Figure 1. Positions in a particular sequence that are not filled (gaps in the generalised structure) are indicated by a dash. All nucleotide insertions are commented on and denoted by underlining at the place of insertion.
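Because unfilled positions are written as dashes, the contiguous sequence and the occupied alignment columns can be recovered with a few lines of Python. This is a minimal sketch: the helper names are hypothetical, columns are counted as plain 1-based string offsets rather than with the numbering system of Figure 1, and underlined insertions cannot be represented in a plain string, so they are not handled.

```python
from typing import List

GAP = "-"  # dash marks an unfilled position in the generalised structure


def ungap(aligned: str) -> str:
    """Drop gap characters to obtain the contiguous sequence."""
    return aligned.replace(GAP, "")


def occupied_columns(aligned: str) -> List[int]:
    """1-based alignment columns that hold a nucleotide (not a gap).

    These are simple string offsets, not the position labels of the
    tRNA numbering system.
    """
    return [i + 1 for i, ch in enumerate(aligned) if ch != GAP]


print(ungap("GC-GA-U"))            # GCGAU
print(occupied_columns("GC-GA"))   # [1, 2, 4, 5]
```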
These compilations use a one-letter code for all nucleotides, including modified ones. For the standard nucleotides adenosine, cytidine, guanosine, thymidine and uridine, the usual abbreviations A, C, G, T and U, respectively, are used. Modified nucleotides are designated by other ASCII characters (see the sheet "Help" in the corresponding MS Excel® file). The terminology and structures of the modified nucleosides occurring in tRNAs follow (5) and (6).
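Since the standard nucleotides use A, C, G, T and U and every other character (apart from the gap dash) stands for a modified nucleotide, modified positions can be located without knowing the full symbol table. The Python sketch below does this; the `legend` dictionary is a hypothetical placeholder that a reader would fill from the "Help" sheet, and the symbols used in the example are arbitrary rather than taken from that sheet.

```python
from typing import Dict, List, Tuple

STANDARD = set("ACGTU")  # adenosine, cytidine, guanosine, thymidine, uridine
GAP = "-"


def modified_positions(seq: str) -> List[Tuple[int, str]]:
    """Return (1-based column, symbol) for every modified-nucleotide symbol.

    Columns count gap characters too, so they refer to the aligned string.
    """
    return [(i + 1, ch) for i, ch in enumerate(seq)
            if ch not in STANDARD and ch != GAP]


def describe(seq: str, legend: Dict[str, str]) -> List[Tuple[int, str]]:
    """Translate modified symbols via a user-supplied legend (hypothetical)."""
    return [(pos, legend.get(sym, f"unlisted symbol {sym!r}"))
            for pos, sym in modified_positions(seq)]


print(modified_positions("GC%A-#U"))  # [(3, '%'), (6, '#')]
```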
Each sequence in the Compilation of tRNA Sequences and the Compilation of tRNA Genes has a unique six-character identification code: 'D' or 'R' for DNA or RNA, respectively; a one-letter code for the amino acid (X for initiator methionine, Z for selenocysteine); a three-digit code specifying the organism; and one digit for the isoacceptor number.
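The six-character identification code has a fixed layout, so it can be split by position. The following Python sketch parses a code into its four fields, assuming, as the text states, that the organism code and the isoacceptor number consist of decimal digits. `SeqID` and `parse_id` are hypothetical names, and the example code is invented, not taken from the compilation.

```python
import string
from dataclasses import dataclass

DIGITS = set(string.digits)
SPECIAL_AA = {"X": "initiator methionine", "Z": "selenocysteine"}


@dataclass(frozen=True)
class SeqID:
    molecule: str       # 'DNA' or 'RNA'
    amino_acid: str     # one-letter amino acid code
    organism_code: str  # three-digit organism code
    isoacceptor: int    # isoacceptor number (one digit)

    def amino_acid_name(self) -> str:
        """Spell out the two special codes; other letters are returned as-is."""
        return SPECIAL_AA.get(self.amino_acid, self.amino_acid)


def parse_id(code: str) -> SeqID:
    """Split a six-character identification code into its fields."""
    if len(code) != 6:
        raise ValueError(f"expected 6 characters, got {len(code)}: {code!r}")
    kind, aa, org, iso = code[0], code[1], code[2:5], code[5]
    if kind not in ("D", "R"):
        raise ValueError(f"first character must be 'D' or 'R': {code!r}")
    if not set(org + iso) <= DIGITS:
        raise ValueError(f"organism code and isoacceptor must be digits: {code!r}")
    return SeqID("DNA" if kind == "D" else "RNA", aa, org, int(iso))


sid = parse_id("RZ0012")  # invented example
print(sid.molecule, sid.amino_acid_name(), sid.organism_code, sid.isoacceptor)
# RNA selenocysteine 001 2
```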
In the basepairing annotation, nucleotides involved in Watson-Crick pairs are marked with '=' and those in GU pairs with '*'; tertiary interactions are not annotated.
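Assuming the basepairing annotation is a string aligned column by column with the sequence (the exact layout in the workbook is not described here), the number of pairs can be tallied and the '*' marks checked against the bases they annotate. In this hypothetical Python sketch, both partners of a pair are assumed to carry the mark, so the counts are halved; a modified G or U under a '*' would be flagged by the check, since only the plain letters are accepted.

```python
from typing import Dict, List


def count_pairs(annotation: str) -> Dict[str, int]:
    """Count base pairs from '=' (Watson-Crick) and '*' (GU) marks.

    Each nucleotide of a pair is marked, so two marks make one pair.
    """
    return {"watson_crick": annotation.count("=") // 2,
            "gu": annotation.count("*") // 2}


def suspicious_gu_marks(sequence: str, annotation: str) -> List[int]:
    """1-based columns marked '*' whose base is neither G nor U."""
    if len(sequence) != len(annotation):
        raise ValueError("sequence and annotation must be the same length")
    return [i + 1 for i, (nt, mark) in enumerate(zip(sequence, annotation))
            if mark == "*" and nt not in "GU"]


print(count_pairs("===**===..**"))              # {'watson_crick': 3, 'gu': 2}
print(suspicious_gu_marks("GCAUGC", "..*.*."))  # [3]
```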
In addition to the plain-text table, the search result can be explored by displaying the sequences in cloverleaf form (Figure 1). The sequences found can be scrolled through one by one, or the sequence of interest can be selected directly from the result table. The display uses a colour code for the different structural features of the canonical cloverleaf model.
Simple statistics on the occurrence of particular bases at given positions and on basepairing preferences can also be obtained on a special data sheet.
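The kind of statistics offered on that sheet can be approximated from aligned sequences in Python: per-column base frequencies, and counts of base combinations at two columns assumed to pair. This is only a sketch of the idea, not the workbook's own calculation; the function names are hypothetical, gaps are excluded from the counts, and which columns pair must be supplied by the user.

```python
from collections import Counter
from typing import Dict, List, Sequence

GAP = "-"


def column_frequencies(aligned: Sequence[str]) -> List[Dict[str, float]]:
    """Relative base frequencies per alignment column, ignoring gaps."""
    if not aligned:
        return []
    width = len(aligned[0])
    if any(len(s) != width for s in aligned):
        raise ValueError("aligned sequences must all have the same length")
    freqs = []
    for col in range(width):
        counts = Counter(s[col] for s in aligned if s[col] != GAP)
        total = sum(counts.values())
        freqs.append({nt: n / total for nt, n in counts.items()} if total else {})
    return freqs


def pair_counts(aligned: Sequence[str], i: int, j: int) -> Counter:
    """Count base combinations at 1-based columns i and j (assumed to pair)."""
    return Counter(s[i - 1] + s[j - 1] for s in aligned
                   if s[i - 1] != GAP and s[j - 1] != GAP)


seqs = ["GCAC", "GUAC", "ACAU"]  # invented toy alignment
print(column_frequencies(seqs)[0])  # G in 2 of 3 sequences, A in 1 of 3
print(pair_counts(seqs, 1, 4))      # Counter({'GC': 2, 'AU': 1})
```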
Useful links:
The RNA Modification Database
http://medlib.med.utah.edu/RNAmods
A database for plant mitochondrial tRNA genes and molecules
http://www.ba.itb.cnr.it/PLMItRNA/
Compilation of mammalian mitochondrial tRNA genes
http://mamit-trna.u-strasbg.fr
GtRDB: The Genomic tRNA Database
http://gtrnadb.ucsc.edu/
ACKNOWLEDGEMENT
This project was supported by the Fonds der Chemischen Industrie and the Universität Bayreuth. We are grateful for advice, cooperation and help with data collection to Todd Michael Johnson Lowe, Genetics,
REFERENCES
1. Sprinzl M., Horn C., Brown M., Ioudovitch A. and Steinberg S. (1998) Nucl. Acids Res. 26, 148-153.
2. Rainaldi G., Volpicella M., Licciulli F., Liuni S., Gallerani R. and Ceci R. (2003) Nucl. Acids Res. 31, 436-438.
3. Helm M., Brule H., Friede D., Giege R., Putz D. and Florentz C. (2000) RNA 6, 1356-1379.
4. Lowe T.M. and Eddy S.R. (1997) Nucl. Acids Res. 25, 955-964.
5. Schimmel P.R., Söll D. and Abelson J.N. (Eds.) (1979) Transfer RNA: Structure, properties and recognition. Cold Spring Harbor Laboratory, pp. 518-519.
6. Steinberg S.V. and Kisselev L.L. (1992) Biochimie 74, 337-351.
7. Limbach P.A., Crain P.F. and McCloskey J.A. (1994) Nucl. Acids Res. 22, 2183-2196.
8. Crain P.F. and McCloskey J.A. (1997) Nucl. Acids Res. 25, 126-127.
tRNA database searching engine
An Internet service for finding records in the database according to multiple search criteria; complex sequence-based queries can be formed (updated for the data in the Compilation of tRNA Genes and the Compilation of tRNA Sequences up to the end of 1998).
tRNA-Editor
Researchers who wish to perform an advanced search for tRNA sequences according to several criteria (e.g. anticodon, amino acid specificity, modified nucleoside), or to print the requested sequences in cloverleaf form, can download the corresponding Windows 3.1 software as a 900 kB ZIP archive (updated for the data in the Compilation of tRNA Genes and the Compilation of tRNA Sequences up to the end of 1998).