Codon usage bias in Archaea
Emery, Laura R.
Synonymous codon usage bias has been extensively studied in Bacteria and Eukaryotes and yet there has been little investigation in the third domain of life, the Archaea. In this thesis I therefore examine the coding sequences of nearly 70 species of Archaea to explore patterns of codon bias. Heterogeneity in codon usage among genes was initially explored for a single species, Methanococcus maripaludis, where patterns were explained by a single major trend associated with expression level and attributed to natural selection. Unlike the bacterium Escherichia coli, selection was largely restricted to two-fold degenerate sites. Analyses of patterns of codon usage bias within genomes were extended to the other species of Archaea, where variation was more commonly explained by heterogeneity in G+C content and asymmetric base composition. By comparison with bacterial genomes, far fewer trends were found to be associated with expression level, implying a reduced prevalence of translational selection among Archaea. The strength of selected codon usage bias (S) was estimated for 67 species of Archaea, and revealed that natural selection has had less impact in shaping patterns of codon usage across Archaea than across many species of Bacteria. Variation in S was explained by the combined effects of growth rate and optimal growth temperature, with species growing at high temperatures exhibiting weaker than expected selection given growth rate. Such a relationship is expected if temperature kinetically modulates growth rate via its impact upon translation elongation, since rapid elongation rates at high temperatures reduce the selective benefit of optimal codon usage for the efficiency of translation. Consistent with this, growth temperature is negatively correlated with minimal generation time, and numbers of rRNA operons and tRNA genes are reduced at high growth temperatures. 
The large fraction of thermophiles among Archaea, relative to Bacteria, accounts for the lower values of S observed. Two major trends were found to describe variation in codon usage among archaeal genomes: the first was attributed to GC3s, and the second was associated with arginine codon usage and was linked with both growth temperature and the genome-wide excess of G over C content. The latter is unlikely to reflect thermophilic adaptation since the codon primarily underlying the trend appears to be selectively disfavoured. No correlation was observed between genome-wide GC3s and optimal growth temperature, nor was GC3s associated with aerobiosis. The identities of optimal codons were explored and found to be invariant across U- and C-ending two-fold degenerate amino acid groups. The identity of optimal codons and anticodons across four- and six-fold degenerate amino acid groups was found to vary with mutational bias. As was first observed in M. maripaludis, selected codon usage bias was consistently greater across two-fold relative to four-fold degenerate amino acid groups across Archaea. This broad pattern could reflect ancestral patterns of optimal codon divergence, prevalent among four-fold but not two-fold degenerate amino acid groups. Consistent with this, the strength of selected codon usage bias was found to be reduced following the divergence of optimal codons, which implies that optimal codon divergence typically proceeds following the relaxation of selection. Finally, a method was developed to partition the strength of selection (S) into separate components reflecting selection for translational efficiency (Seff) and selection for translational accuracy (Sacc) by comparing the codon usage across conserved and nonconserved amino acid residues. 
While estimates of Sacc are somewhat sensitive to the designation of conserved sites, a general pattern emerged whereby accuracy-selected codon usage bias was consistently strongest across a subset of the most highly conserved sites. Several estimates of Sacc were consistently higher than the 95% range of null values regardless of the dataset, providing evidence for accuracy-selected codon usage bias in these species.
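As an illustration of the GC3s statistic referred to above, the following is a minimal sketch rather than the method used in the thesis. It assumes one common definition: the fraction of G or C at the third position of synonymous codons, excluding methionine, tryptophan and stop codons. It also assumes the standard genetic code; the function name `gc3s` and the lookup table are hypothetical names introduced for this example.

```python
from itertools import product

# Standard genetic code in TCAG order (an assumption; the thesis does not
# state which translation table was applied).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}

# Codons with no synonymous alternative (Met, Trp) and stop codons are excluded.
EXCLUDED = {"M", "W", "*"}


def gc3s(cds):
    """Return the G+C fraction at third positions of synonymous codons.

    `cds` is an in-frame coding sequence. Codons containing ambiguous bases
    and any trailing partial codon are skipped. Returns NaN if no
    synonymous codon is present.
    """
    cds = cds.upper().replace("U", "T")
    gc = total = 0
    for i in range(0, len(cds) - len(cds) % 3, 3):
        codon = cds[i:i + 3]
        aa = CODON_TABLE.get(codon)
        if aa is None or aa in EXCLUDED:
            continue
        total += 1
        if codon[2] in "GC":
            gc += 1
    return gc / total if total else float("nan")
```

For a genome-wide value under the same assumptions, the counts could be pooled across all coding sequences rather than averaged per gene; which of the two the thesis uses is not stated in the abstract.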