The genetic code only needed to be cracked once because, with some rare exceptions, it is universal. That means nearly all organisms use the same codons to specify the placement of each of the 20 amino acids during protein synthesis.
A codon table can therefore be constructed, and any coding region of nucleotides can be read with it to determine the amino acid sequence of the encoded protein.
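A codon table like the one described can be sketched in a few lines of Python. This is a minimal illustration, not taken from the source. It builds the standard genetic code (NCBI translation table 1) from the conventional TCAG-ordered string of one-letter amino acid codes. It then reads a coding sequence in frame, stopping at the first stop codon. The names `CODON_TABLE` and `translate` are illustrative.

```python
# Build the standard genetic code (NCBI translation table 1).
# Codons are enumerated with each position cycling through T, C, A, G;
# the string lists the one-letter amino acid for each codon in that order,
# with '*' marking the three stop codons.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

CODON_TABLE = {
    first + second + third: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, first in enumerate(BASES)
    for j, second in enumerate(BASES)
    for k, third in enumerate(BASES)
}


def translate(cds: str, to_stop: bool = True) -> str:
    """Read a coding sequence codon by codon in frame 1 and return the protein."""
    cds = cds.upper().replace("U", "T")  # accept mRNA as well as DNA
    protein = []
    # Step through complete codons only; a trailing partial codon is ignored.
    for pos in range(0, len(cds) - len(cds) % 3, 3):
        amino_acid = CODON_TABLE[cds[pos:pos + 3]]
        if amino_acid == "*" and to_stop:
            break
        protein.append(amino_acid)
    return "".join(protein)


print(translate("ATGGCTCTGTAA"))  # prints MAL
```

Passing `to_stop=False` keeps stop codons in the output as `*`, which is convenient when scanning whole ORFeomes rather than single genes.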
A look at the codon table reveals that the code is redundant, meaning that most amino acids can be encoded by more than one codon, and several by four or six possible codons. The amino acid sequence of a protein from any type of organism is usually determined by sequencing the gene that encodes the protein and then reading the genetic code from the DNA sequence.
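The sketch below makes the redundancy concrete. It is an illustration, not from the source, and counts how many sense codons map to each amino acid in the standard code. It shows that 18 of the 20 amino acids have more than one codon, and that leucine, serine and arginine each have six.

```python
from collections import Counter, defaultdict

# Standard genetic code (NCBI table 1), codons in TCAG order at each position.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}

# Number of sense codons (stop codons excluded) that specify each amino acid.
degeneracy = Counter(aa for aa in CODON_TABLE.values() if aa != "*")

# Group amino acids by how many synonymous codons they have.
by_class = defaultdict(list)
for amino_acid, n_codons in sorted(degeneracy.items()):
    by_class[n_codons].append(amino_acid)

for n_codons in sorted(by_class):
    print(n_codons, "".join(by_class[n_codons]))
# prints:
# 1 MW
# 2 CDEFHKNQY
# 3 I
# 4 AGPTV
# 6 LRS
```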
These large data sets were then analyzed using data mining tools [19] and direct database queries. These studies aimed to identify major differences in codon-triplet context between the fungal ORFeomes stored in the main database (Table 1). A similar methodology was used to count amino acid triplets generated from the same ORFeome sequences.
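The source does not specify exactly how triplets were counted. The sketch below assumes that a codon-triplet is any run of three consecutive in-frame codons within an ORF, in a window that slides one codon at a time. The same function counts amino acid triplets when given protein sequences. The function names `split_codons` and `count_triplets` are hypothetical.

```python
from collections import Counter


def split_codons(orf: str) -> list:
    """Split an ORF into in-frame codons, dropping a trailing partial codon."""
    orf = orf.upper()
    return [orf[i:i + 3] for i in range(0, len(orf) - 2, 3)]


def count_triplets(sequences) -> Counter:
    """Count every window of three consecutive symbols (codons or amino acids).

    Assumption: windows overlap and advance by one symbol.
    """
    counts = Counter()
    for symbols in sequences:
        for i in range(len(symbols) - 2):
            counts[tuple(symbols[i:i + 3])] += 1
    return counts


orfs = ["ATGAAAAAAAAATAA", "ATGCAGCAGTAA"]
codon_triplets = count_triplets(split_codons(orf) for orf in orfs)
print(codon_triplets[("AAA", "AAA", "AAA")])  # prints 1

# Amino acid triplets: pass protein strings; each residue is one symbol.
aa_triplets = count_triplets(["MKKK", "MQQ"])
print(aa_triplets[("K", "K", "K")])  # prints 1
```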
For this, codons were translated to the respective amino acids using standard genetic code rules, or using non-standard decoding of the leucine-CTG codons as serine in Candida albicans and Debaryomyces hansenii [20-22]. Finally, new algorithms were implemented to count codon and amino acid repetitions on an ORFeome-wide scale. All results obtained were compared with values expected for a random distribution of codons, which were calculated from the frequencies of individual codons or amino acids in each genome.
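Alternative decoding can be handled by swapping a single entry in the codon table. The sketch below is illustrative, not the authors' code. It builds the standard table and, optionally, decodes CTG as serine, the reassignment the text describes for Candida albicans and Debaryomyces hansenii. The helper names `build_table` and `translate_orf` are hypothetical.

```python
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"


def build_table(ctg_as_serine: bool = False) -> dict:
    """Standard genetic code, optionally with leucine-CTG decoded as serine."""
    table = {
        a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
        for i, a in enumerate(BASES)
        for j, b in enumerate(BASES)
        for k, c in enumerate(BASES)
    }
    if ctg_as_serine:
        table["CTG"] = "S"  # CUG codon read as Ser instead of Leu
    return table


def translate_orf(orf: str, table: dict) -> str:
    """Translate complete in-frame codons, stopping at the first stop codon."""
    protein = []
    for pos in range(0, len(orf) - 2, 3):
        amino_acid = table[orf[pos:pos + 3].upper()]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


orf = "ATGCTGTCTTAA"
print(translate_orf(orf, build_table()))                    # prints MLS
print(translate_orf(orf, build_table(ctg_as_serine=True)))  # prints MSS
```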
The tools described above made it possible to carry out a comparative analysis of codon-triplets in 11 fungal ORFeomes (Table 1). Clear patterns of codon-triplet preferences and rejections were identified for each ORFeome and, as for codon-pair contexts [4], such patterns were specific to each ORFeome (Tables 2 and 3; Additional file 1, Figure S1). This first analysis also showed that the percentage of codon-triplets that vanished from the ORFeomes was much higher than expected from a random distribution of triplets in these ORFeomes (Figure 2A).
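The quantities compared in Figure 2 can be sketched as follows, under two assumptions. First, the possible triplet space is every combination of a chosen codon alphabet, for example the 61 sense codons. Second, the random expectation treats each triplet position as an independent draw from the genome's codon frequencies. Under that assumption a triplet with probability p = f_a f_b f_c is absent from N triplet positions with probability (1 - p)^N. Overlapping windows are not truly independent, so this is only an approximation, and the authors' exact procedure may differ. The function names are hypothetical.

```python
from collections import Counter
from itertools import product


def percent_absent(counts: Counter, alphabet) -> float:
    """Observed % of all len(alphabet)**3 triplets that never occur."""
    allowed = set(alphabet)
    possible = len(allowed) ** 3
    seen = sum(1 for triplet in counts if set(triplet) <= allowed)
    return 100.0 * (possible - seen) / possible


def expected_percent_absent(freqs: dict, n_triplets: int) -> float:
    """Expected % absent if each triplet position were an independent draw.

    freqs maps each codon in the alphabet to its frequency in the genome.
    """
    codons = list(freqs)
    possible = len(codons) ** 3
    absent = sum(
        (1.0 - freqs[a] * freqs[b] * freqs[c]) ** n_triplets
        for a, b, c in product(codons, repeat=3)
    )
    return 100.0 * absent / possible


def percent_in_top(counts: Counter, top: int = 10) -> float:
    """% of all triplet occurrences accounted for by the `top` most frequent."""
    total = sum(counts.values())
    return 100.0 * sum(n for _, n in counts.most_common(top)) / total


counts = Counter({("AAA", "AAA", "AAA"): 5, ("AAA", "CCC", "AAA"): 3,
                  ("CCC", "CCC", "CCC"): 2})
print(percent_absent(counts, ["AAA", "CCC"]))  # prints 62.5
print(expected_percent_absent({"AAA": 0.5, "CCC": 0.5}, n_triplets=10))
print(percent_in_top(counts, top=1))  # prints 50.0
```

With the real 61-codon alphabet the expectation sums over 226,981 triplets, which is still fast in plain Python.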
The human pathogen Candida albicans had a higher percentage of such triplets. Conversely, analysis of the 10 most frequent codon-triplets (Figure 2B) showed an even distribution across these fungal ORFeomes, with the exception of C.
Overall, in C.
Figure 2. Major differences in codon-triplet contexts in fungal genomes. To characterize codon-triplet distributions in the 11 fungal species studied, we calculated the percentage of codon-triplets that did not appear in the fungal ORFeomes (panel A). Additionally, the fraction corresponding to the 10 most frequent codon-triplets was quantified (panel B).
In both cases, C. Bars represent observed percentages, while blue dots indicate values expected from a random codon-triplet distribution.
Since codon-triplet choice was an ORFeome-specific feature that could influence mRNA decoding efficiency (see above), the stronger bias found in C.
Consequently, the relative tRNA abundance, given by gene copy number per codon or per amino acid, was lower in C. Interestingly, C. The divergence of decoding preferences between C.
Relative tRNA gene copy number is lower in C. To compare ORFeomes, the data obtained for individual amino acids were averaged into a single column for each organism. Values are presented as tRNA gene copy number per cognate amino acid.
The relatively low number of tRNA genes in C. To clarify these important points, we scanned the 5'-upstream sequences of the C.
However, we were unable to identify such putative conserved Pol III enhancers (data not shown). Therefore, one is left with the intriguing possibility that tRNA limitation and generalized near-cognate decoding may be yet another unique feature of the C. This may explain the strong bias of C.
This puzzling result requires experimental confirmation through in vivo tRNA quantification to clarify whether tRNA limitation is a feature of the C. Finally, we cannot exclude that codon-triplet biases arise from constraints of protein primary structure. Indeed, our study on a genetic code alteration in C.
However, tri-peptide biases would only be relevant for this study if they were significantly different in C. Rather, the main differences between C. The high frequency of repeated codons and amino acids in fungal ORFeomes and the high percentage of triplets of identical codons and amino acids in C. For this, the distribution of isolated codons, identical codon-pairs, identical codon-triplets and identical codon-strings was determined (Figure 5).
Isolated codons were underrepresented in all ORFeomes, particularly in C. However, this effect was minimized in pairs of identical codons, where observed values and those expected from a random distribution were similar (Figure 5B). Indeed, the distribution of the latter was remarkably different between C.
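Classifying codons as isolated, identical pairs, identical triplets or longer strings amounts to run-length encoding the codon sequence of each ORF. The sketch below assumes that "strings" means runs of four or more identical codons, since the source does not state the cutoff. The names `LABELS` and `run_classes` are hypothetical.

```python
from collections import Counter
from itertools import groupby

# Assumed cutoff: runs of 4 or more identical codons are called "string".
LABELS = {1: "isolated", 2: "pair", 3: "triplet"}


def run_classes(codons) -> Counter:
    """Count runs of identical consecutive codons by run-length class."""
    classes = Counter()
    for _, run in groupby(codons):
        length = sum(1 for _ in run)
        classes[LABELS.get(length, "string")] += 1
    return classes


orf_codons = ["CAA", "CAA", "CAA", "GCT", "CAG", "CAG",
              "TCT", "TCT", "TCT", "TCT"]
print(run_classes(orf_codons))
```

`itertools.groupby` groups only adjacent equal items, which is exactly the "consecutive identical codons" notion used here.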
High repetition of identical codons and amino acids in C. The percentage of each ORFeome composed of identical codon-triplets was determined. The percentage of these triplets (panel A) and of their respective amino acids (panel B) was much higher in C. Bars represent the observed percentages, while blue dots indicate expected values.
Figure 5. Low frequency of isolated non-repeated codons in C. Since codon repeats were very frequent in C. This bias was reversed for repetitions of two or more identical codons, which again was exacerbated in C.
We then analyzed the amino acid composition of the repeated codon-triplets, and again strong biases were observed (Figure 6). Of all amino acids, Gln was the most frequent and was also rarely present as an isolated amino acid across all ORFeomes (blue bars, first column of Figure 6). Once more, C.
Figure 6. Specificity of amino acid repeats.
The degeneracy of the genetic code prompted us to determine whether amino acid repeats would provide a better picture of the frequency of repeated features in the fungal ORFeomes.
For this, the repeats were quantified and displayed as shown. In the diagram, for each species, the top line of each column corresponds to cases in which the amino acid appeared isolated in ORFs, the second line to isolated pairs of identical amino acids, and so on, so that lines further down each column correspond to longer amino acid strings.
As expected, amino acid repeats were biased, as indicated by the color scale used in the map, where light blue corresponds to repressed repeats and brown indicates preferred repeats. Yellow represents repeats whose observed and expected frequencies were similar. Amino acid repeats were specific to each amino acid.
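The repeat map can be reproduced in outline. For each amino acid, tabulate how often it occurs in runs of length 1, 2, 3 and so on, then label each cell by comparing observed and expected counts. In the sketch below the expected counts are supplied by the caller. The log2 cutoff of 0.5 and the pseudocount of 1 are hypothetical choices standing in for the unstated thresholds behind the color scale.

```python
import math
from collections import Counter, defaultdict
from itertools import groupby


def repeat_table(proteins) -> dict:
    """Map amino acid -> Counter of run length -> number of runs."""
    table = defaultdict(Counter)
    for protein in proteins:
        for amino_acid, run in groupby(protein):
            table[amino_acid][sum(1 for _ in run)] += 1
    return table


def classify(observed: int, expected: float, cutoff: float = 0.5) -> str:
    """Label a cell of the map from log2(observed / expected).

    A pseudocount of 1 avoids log(0); the cutoff is a hypothetical choice.
    """
    ratio = math.log2((observed + 1) / (expected + 1))
    if ratio < -cutoff:
        return "repressed"  # light blue in the map
    if ratio > cutoff:
        return "preferred"  # brown
    return "similar"        # yellow


table = repeat_table(["MQQQAS", "QSS"])
print(dict(table["Q"]))  # prints {3: 1, 1: 1}
print(classify(observed=63, expected=15))  # prints preferred
```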
Finally, the distribution of the above repetitions was analyzed for the synonymous codons of each amino acid (Additional file 1, Figures S3A and S3B).
The high proportion of triplets that vanished from fungal ORFeomes (Figure 2A) prompted us to investigate whether particular codon-context trends could be identified that would explain the repression of particular codon combinations.
No significant differences could be detected between ORFeomes or codon positions, and the results obtained with codons starting with any base N were redundant (data not shown). To overcome this redundancy and highlight only the major effects, the data were averaged. A well-defined pattern of preferences and rejections linked to the second and third bases of codons in absent codon-triplets then became apparent (Figure 7).
This indicated that neither the first base of the codon nor the position of the codon in the triplet contributed to triplet disappearance.
Figure 7. Bias of codon-triplets that vanished from ORFeomes.
The number of possible codon-triplet combinations that were not present in fungal ORFeomes was surprisingly high. In order to elucidate why these triplets disappeared from ORFeomes, the respective codons were further studied, namely by counting the number of times each codon appeared in the first, second or third position of the triplets.
No significant differences were found between species or between codon-triplet positions. Also, the first base of all codons produced redundant results. Conversely, NAA codons were underrepresented in this group (yellow bar). As before, CTG reassignment in C. For this, absent triplets that contained CTN codons, i. Each species had its own preference pattern. The CTA codon was absent mainly in triplets of A.
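The caption's procedure has two steps. First, count where each codon sits within the absent triplets. Second, collapse the first base to N, since it gave redundant results. The sketch below illustrates both steps. The function names and the toy three-codon alphabet are illustrative assumptions.

```python
from collections import Counter
from itertools import product


def absent_triplets(observed: set, alphabet) -> list:
    """All triplets over the alphabet that never occur in the ORFeome."""
    return [t for t in product(alphabet, repeat=3) if t not in observed]


def positional_counts(triplets) -> list:
    """One Counter per triplet position (first, second, third)."""
    positions = [Counter(), Counter(), Counter()]
    for triplet in triplets:
        for index, codon in enumerate(triplet):
            positions[index][codon] += 1
    return positions


def collapse_first_base(counter: Counter) -> Counter:
    """Merge codons that differ only at the first base, e.g. CAA and GAA -> NAA."""
    merged = Counter()
    for codon, n in counter.items():
        merged["N" + codon[1:]] += n
    return merged


observed = {("AAA", "AAA", "AAA"), ("AAA", "CAA", "AAA")}
absent = absent_triplets(observed, ["AAA", "CAA", "GCT"])
first, second, third = positional_counts(absent)
print(len(absent))  # prints 25
print(collapse_first_base(second))
```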
Moreover, in D. Such a dramatic genetic event imposed negative pressure on CTG usage and eliminated most of these codons. Interestingly, a high number of "old" leucine-CTGs were replaced by "new" serine-CTGs that evolved from mutation of serine rather than leucine codons [30]. Since serine codons are often present in codon repetitions while leucine codons are strongly repressed (Figure 6), we took advantage of this genetic code alteration to shed new light on the evolutionary dynamics of codon and amino acid repetitions in yeasts.
Furthermore, since leucine is hydrophobic and serine is polar, we hypothesized that constraints imposed by protein structure would be visible as alterations in the context of CTG-containing triplets.
This was done by determining the upstream and downstream codon neighbor combinations that were preferred in leucine- or serine-bearing triplets (leucine and serine neighbor signatures). We then counted the number of times each signature appeared above the expected threshold when the middle codon of the triplet was CTG (Figure 8A). As expected, leucine and serine had clear neighborhood preferences, but this context signature was lost for CTGs in C.
Figure 8. Amino acid context signatures detect genetic code alterations.
To determine whether genetic code alterations could give rise to a specific triplet signature, the frequencies of amino acid contexts having leucine or serine in the middle position ex. Whenever this difference was higher than 0. The expected values were calculated for all the contexts and subtracted from the observed values.
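The observed-minus-expected comparison in the Figure 8 caption can be sketched for amino acid contexts X-M-Y, where M is the middle residue (for example L or S). In this illustration, the expected count of a context is the number of tri-peptide windows multiplied by the product of the three amino acid frequencies. The threshold used to call a context part of a signature is truncated in the source, so `threshold` here is a hypothetical parameter.

```python
from collections import Counter
from itertools import product


def context_excess(proteins, middle: str) -> dict:
    """Observed minus expected counts for every context (X, Y) around `middle`."""
    residues = Counter()
    contexts = Counter()
    windows = 0
    for protein in proteins:
        residues.update(protein)
        for i in range(len(protein) - 2):
            windows += 1
            if protein[i + 1] == middle:
                contexts[(protein[i], protein[i + 2])] += 1
    if middle not in residues:
        return {}
    total = sum(residues.values())
    freq = {aa: n / total for aa, n in residues.items()}
    # Expected count assumes residues are independent of their neighbors.
    return {
        (x, y): contexts[(x, y)] - windows * freq[x] * freq[middle] * freq[y]
        for x, y in product(freq, repeat=2)
    }


def signature(excess: dict, threshold: float) -> set:
    """Contexts whose observed count exceeds expectation by more than threshold."""
    return {context for context, diff in excess.items() if diff > threshold}


excess = context_excess(["ALA"], middle="L")
print(signature(excess, threshold=0.5))  # prints {('A', 'A')}
```

Comparing the signature obtained for CTG-decoded residues with the leucine and serine signatures is one way to see which identity a context "looks like".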
These considerations result in the behavior illustrated by the plot in Figure 5. Since we consider all positions of a protein sequence to be equally important, we perform position-unspecific alignment.
As we do not model the chemical properties of amino acids, all amino acid substitutions are scored equally. Consequently, we simply assign a score of 7.
Point mutations can occur in any nucleotide sequence of a digital coding system: its mRNA sequence and the codon parts of its tRNA rules. The number of mutations that occur in a new coding system is drawn from a Gaussian distribution whose mean and standard deviation are Codonevo parameters.
The value is floored to give an integer number of mutations, and negative values are converted to zero. Particular types of mutation (substitutions, deletions and insertions) occur with respective probabilities of 0. The location of a point mutation is chosen at random. Mutation rates remain constant throughout the entire simulation.
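The mutation step can be sketched as follows. This is not Codonevo's code. The nucleotide alphabet, the representation of a tRNA rule as a (codon, amino acid) pair, and all probability and distribution values are hypothetical placeholders, since the actual probabilities are truncated in the text. The sketch draws the number of mutations from a Gaussian, floors it and clamps negatives to zero. It then picks the mutation type and location at random. It also drops a tRNA rule whose codon loses its only nucleotide, the rule the text states for single-nucleotide codons.

```python
import math
import random

NUCLEOTIDES = "ACGU"  # assumed alphabet


def number_of_mutations(rng: random.Random, mean: float, sd: float) -> int:
    """Gaussian draw, floored to an integer; negative values become zero."""
    return max(0, math.floor(rng.gauss(mean, sd)))


def point_mutation(seq: str, rng: random.Random, p_sub: float, p_del: float) -> str:
    """Substitute, delete or insert one nucleotide at a random position.

    `seq` must be non-empty. Insertion happens with the remaining
    probability 1 - p_sub - p_del.
    """
    pos = rng.randrange(len(seq))
    r = rng.random()
    if r < p_sub:
        new_base = rng.choice([b for b in NUCLEOTIDES if b != seq[pos]])
        return seq[:pos] + new_base + seq[pos + 1:]
    if r < p_sub + p_del:
        return seq[:pos] + seq[pos + 1:]
    ins = rng.randrange(len(seq) + 1)  # insertions may also go after the last base
    return seq[:ins] + rng.choice(NUCLEOTIDES) + seq[ins:]


def mutate_rule(rules: list, rng: random.Random, p_sub: float, p_del: float) -> list:
    """Mutate the codon of one randomly chosen (codon, amino_acid) rule."""
    i = rng.randrange(len(rules))
    codon, amino_acid = rules[i]
    mutated = point_mutation(codon, rng, p_sub, p_del)
    if not mutated:  # deleting a single-nucleotide codon removes the rule
        return rules[:i] + rules[i + 1:]
    return rules[:i] + [(mutated, amino_acid)] + rules[i + 1:]


rng = random.Random(1)
print(number_of_mutations(rng, mean=2.0, sd=1.0))
print(mutate_rule([("A", "M"), ("GCU", "A")], rng, p_sub=0.0, p_del=1.0))
```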
If a deletion occurs in a tRNA rule whose codon consists of a single nucleotide, the rule is removed entirely. The second type of evolutionary event in Codonevo is gene duplication.
In this case, a tRNA rule can be duplicated with a probability defined by a parameter of the Codonevo program (see the Codonevo manual).
This process is continued iteratively over the entire mRNA. In addition, to make the process more natural, a bonus score is given to mimic stacking interactions between matching nucleotides, with each pair of adjacent matching nucleotides receiving an additional score of 0.
It can be argued that our translation algorithm does not necessarily find the optimal sequence of best-matching tRNA rules across the entire mRNA. However, the step-wise approach in our algorithm reflects the natural course of translation, since in natural translation each tRNA is selected on the basis of the local mRNA sequence, independently of any sequence downstream of the position currently being translated.
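The step-wise decoding described above can be sketched as a greedy loop. At each mRNA position, the rule whose codon best matches the local sequence is chosen, its amino acid is emitted, and decoding resumes after that codon. This is a hedged illustration rather than Codonevo's implementation. Matching is assumed to be base identity. Each adjacent pair of matching nucleotides earns a stacking bonus whose real value is truncated in the text (0.5 here is a placeholder). Ties go to the rule listed first.

```python
def match_score(codon: str, window: str, stack_bonus: float) -> float:
    """Matches between codon and mRNA window, plus a bonus per adjacent matching pair."""
    matches = [a == b for a, b in zip(codon, window)]
    stacked = sum(1 for left, right in zip(matches, matches[1:]) if left and right)
    return sum(matches) + stack_bonus * stacked


def translate(mrna: str, rules: list, stack_bonus: float = 0.5) -> str:
    """Greedy, left-to-right decoding with (codon, amino_acid) tRNA rules."""
    protein = []
    pos = 0
    while pos < len(mrna):
        # Only the local window is inspected; downstream sequence is ignored.
        codon, amino_acid = max(
            rules,
            key=lambda rule: match_score(rule[0], mrna[pos:pos + len(rule[0])], stack_bonus),
        )
        protein.append(amino_acid)
        pos += len(codon)  # codons are non-empty, so decoding always advances
    return "".join(protein)


rules = [("AUG", "M"), ("GCU", "A"), ("GC", "G")]
print(translate("AUGGCU", rules))  # prints MA
```

Because `max` inspects only the window starting at the current position, the loop never looks ahead, which mirrors the local, step-wise choice argued for in the text.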
Dynamic histogram of codon and mRNA length evolution for the simulation corresponding to the bottom plot in Figure 1. Other bars indicate the number of tRNA rules with the codon length specified underneath each bar.
We are grateful to Dr. John Atkins for stimulating discussions that inspired this work and to Dr. Andrew Firth for indispensable comments and careful reading of the manuscript.
We are also grateful to Dr. Wills for his comments and pointers to the relevant literature. Analyzed the data: PVB.
Abstract
The genetic code appears to be optimized in its robustness to missense errors and frameshift errors.
Introduction
The ribosome is a sophisticated multifunctional nano-machine that is responsible for protein biosynthesis in all cellular organisms.
Results
Brief outline of the model
The overall scheme of the model is outlined in Figure 1.
Results of simulations
In a number of computer simulations, the initial population of digital coding systems contained a set of tRNA rules with codons eight nucleotides long (octuplet genetic codes).
Figure 2. Dynamics of codon sizes in an evolving population of coding systems starting with octuplet genetic codes.
Figure 3. Effect of (A) mRNA length and (B) the parameter k, the contribution of coding system size to the replication function, on the speed of codon length evolution.
Discussion
The possibility that triplet decoding evolved from codes with longer codons is highly attractive, as it can explain how accurate protein biosynthesis could have been organized in the past without assistance from the modern decoding apparatus.
Figure 4. The codon size reduction hypothesis for the origin of the triplet genetic code.
Then, the remaining set is assigned to be the set of coding systems in the population of the following generation. The parameter O represents the environmental limitation of energy and food resources in a real-world system and is defined as a parameter in the Codonevo program.
Replication function
We assume that the reproductive rate of a coding system positively correlates with the accuracy of the protein sequence produced by that coding system.
Figure 5.