Evolution of codon usage and base composition
Item status: Restricted Access
Embargo end date: 31/12/2100
Perry, Richard Henry John
This thesis addresses questions of genome architecture and base composition. The first part of the thesis concerns codon usage. I first analyse thousands of bacterial species, estimating the strength of selection acting on codon usage and examining how optimal codons change with genomic base composition and tRNA abundance. I report that selection on codon usage increases along the length of highly expressed genes; in particular, the first quarter of a gene is under significantly weaker selection. Furthermore, factors affecting genomic base composition can eventually change which codons are optimal if the shift in base composition is strong enough; however, these patterns differ substantially between amino acids. The debate over translational efficiency versus translational accuracy was addressed by comparing sites that differ in their level of conservation. Conservation levels were defined using a phylogenetic method that allows a site's extent of conservation to change across the tree. The results show that selection for translational accuracy acts strongly on the 10% most conserved sites but is relatively weak compared with selection for efficiency at other sites. I also detect a reduction in apparent selection on codon usage at the 10% least conserved sites, which is likely to be caused by conflicting positive selection on amino acids. Finally, although patterns differ between amino acids, the general relationship with conservation is similar.

As much of the variation in codon usage is determined by variation in base composition, the second part of the thesis investigates base composition itself. Intragenomic variation in base composition was found to be far higher than expected for GC-rich bacteria.
The non-core part of the genome contributes more to this variation than the core part, suggesting that processes such as horizontal transfer of AT-rich genes may be involved. Second, base composition is modelled as evolving under Brownian motion and, as an extension, under the Ornstein-Uhlenbeck process, which allows for multiple optima across the tree. The model with multiple optima fits the data better than standard Brownian motion or Brownian motion with multiple diffusion coefficients. Finally, I investigate a case in which a previous codon usage analysis was seriously confounded by an unusual genome architecture, namely abnormal regional base composition, in two species of eukaryotic parasites in the genus Theileria. In both species, the background G+C content of the four syntenic chromosomes is at most 37%. In many orthologous regions, however, T. annulata has a reduced G+C content of 28% while T. parva has an elevated G+C content of 41%. Several factors coincide with this remarkable divergence: increased divergence at all types of site, recombination hotspots in T. parva, and an increased frequency of tandem repeats and DNA sequence motifs in both species. The evolutionary origins of these unusual patterns are discussed.
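The abstract contrasts Brownian motion (BM) with an Ornstein-Uhlenbeck (OU) process in which different parts of the tree can be pulled towards different base-composition optima. The sketch below is a minimal illustration of the per-branch likelihoods, not the thesis's implementation. It assumes that trait values (for example GC fractions) at internal nodes are already known, whereas real comparative-method software integrates over them. It also assumes that each branch has been pre-assigned to a regime with its own optimum. Under BM a child value is Normal(parent, sigma2 * t). Under OU its mean decays from the parent value towards the regime optimum theta at rate alpha, and its variance is sigma2 / (2 * alpha) * (1 - exp(-2 * alpha * t)). All function names, regime labels and numbers below are hypothetical toy values, not data from the thesis.

```python
import math
from typing import Dict, List, Tuple

# Hypothetical branch record: (parent value, child value, branch length, regime label).
Branch = Tuple[float, float, float, str]


def normal_logpdf(x: float, mean: float, var: float) -> float:
    """Log-density of Normal(mean, var) evaluated at x."""
    return -0.5 * (math.log(2.0 * math.pi * var) + (x - mean) ** 2 / var)


def bm_branch_loglik(x_child: float, x_parent: float, t: float, sigma2: float) -> float:
    """Brownian motion: X(t) | X(0) = x_parent ~ Normal(x_parent, sigma2 * t)."""
    return normal_logpdf(x_child, x_parent, sigma2 * t)


def ou_branch_loglik(x_child: float, x_parent: float, t: float,
                     sigma2: float, alpha: float, theta: float) -> float:
    """Ornstein-Uhlenbeck: the expected value decays from x_parent towards theta."""
    decay = math.exp(-alpha * t)
    mean = theta + (x_parent - theta) * decay
    var = sigma2 / (2.0 * alpha) * (1.0 - decay ** 2)
    return normal_logpdf(x_child, mean, var)


def tree_loglik_bm(branches: List[Branch], sigma2: float) -> float:
    """Sum of BM branch log-likelihoods (single diffusion coefficient)."""
    return sum(bm_branch_loglik(c, p, t, sigma2) for p, c, t, _ in branches)


def tree_loglik_ou(branches: List[Branch], sigma2: float, alpha: float,
                   optima: Dict[str, float]) -> float:
    """Sum of OU branch log-likelihoods; each branch uses its own regime's optimum."""
    return sum(ou_branch_loglik(c, p, t, sigma2, alpha, optima[r])
               for p, c, t, r in branches)


# Toy GC fractions on a four-branch tree (illustrative values only).
toy_branches: List[Branch] = [
    (0.50, 0.62, 0.3, "gc_rich"),
    (0.50, 0.38, 0.4, "at_rich"),
    (0.62, 0.66, 0.2, "gc_rich"),
    (0.38, 0.33, 0.2, "at_rich"),
]
toy_optima = {"gc_rich": 0.65, "at_rich": 0.32}

bm_ll = tree_loglik_bm(toy_branches, sigma2=0.01)
ou_ll = tree_loglik_ou(toy_branches, sigma2=0.01, alpha=2.0, optima=toy_optima)
print(f"BM log-likelihood: {bm_ll:.3f}")
print(f"OU (two optima) log-likelihood: {ou_ll:.3f}")
```

Here sigma2, alpha and the optima are fixed by hand, so the comparison only shows how a regime-specific pull changes each branch's expected value. A proper comparison, as in the thesis's model fitting, would estimate these parameters by maximum likelihood and penalise the extra OU parameters, for example with AIC.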