This thesis aims to address issues relating to genome architecture and base composition.
The first part of this thesis addresses questions relating to codon usage.
First, I investigate thousands of bacterial species, using a detailed analysis of the
strength of selection acting on codon usage, while also investigating how optimal
codons change with respect to genomic base composition and tRNA
abundance. I report that selection on codon usage increases along the length
of highly expressed genes; in particular, the first quarter of a gene is under significantly
weaker selection. Further, it is clear that factors affecting genomic base composition
can eventually lead to changes in optimal codons if the shift in base composition
is strong enough; however, these patterns differ substantially between amino acids.
The debate over translational efficiency versus accuracy was addressed by comparing
sites with differing levels of conservation. Conservation levels were defined using a
phylogenetic method that allows a site's extent of conservation to change across
the tree. The results show that selection for translational accuracy acts strongly on the
10% most conserved sites but is relatively weak compared with selection for efficiency
at other sites. I also detect a reduction in apparent selection on codon usage at the
10% least conserved sites, which is likely caused by conflicting positive
selection on amino acids. Finally, although patterns differ between amino acids,
the general relationship with conservation is similar.
As much of the variation in codon usage is determined by variation in base composition,
base composition itself is investigated in the second part of the thesis. First, the
observed variation in intragenomic base composition in bacteria was found to
be far higher than expected for GC-rich bacteria. The non-core part of the genome
contributes to this variation to a greater extent than the core part, suggesting that
processes such as the horizontal transfer of AT-rich DNA may be involved. Second, base
composition is modelled as Brownian motion and, as an extension, as an Ornstein-Uhlenbeck
process, which allows for multiple optima across the tree. The model
with multiple optima fits the data better than standard Brownian motion or Brownian
motion with multiple diffusion coefficients.
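To make the contrast between the two models concrete, the sketch below uses the standard transition distributions of each process. Under Brownian motion (BM) a trait such as G+C content drifts with no preferred value, so its variance grows linearly with time. Under an Ornstein-Uhlenbeck (OU) process the trait is pulled toward an optimum theta at rate alpha, so its variance levels off at sigma^2 / (2 * alpha). The function names, the parameter names (x0, sigma, alpha, theta) and the segment list standing in for "different optima on different branches" are hypothetical illustrations. This is not the thesis's model-fitting procedure, and the Gaussian model ignores the fact that G+C content is bounded between 0 and 1.

```python
import math
import random


def bm_transition(x0: float, sigma: float, t: float) -> tuple[float, float]:
    """Mean and variance of a Brownian-motion trait after time t."""
    # BM has no pull toward any value: the mean stays at the starting value
    # and the variance grows linearly with elapsed time.
    return x0, sigma ** 2 * t


def ou_transition(x0: float, sigma: float, alpha: float, theta: float,
                  t: float) -> tuple[float, float]:
    """Mean and variance of an Ornstein-Uhlenbeck trait after time t."""
    # The mean decays exponentially from x0 toward the optimum theta ...
    decay = math.exp(-alpha * t)
    mean = theta + (x0 - theta) * decay
    # ... and the variance saturates at sigma^2 / (2 * alpha) instead of growing forever.
    var = sigma ** 2 / (2 * alpha) * (1 - decay ** 2)
    return mean, var


def simulate_ou_path(x0, sigma, alpha, theta_per_segment, dt, rng):
    """Simulate one lineage whose optimum shifts between segments.

    theta_per_segment is a hypothetical list of (optimum, n_steps) pairs,
    standing in for branches of a tree that carry different optima.
    Each step samples the exact OU transition, so there is no discretisation error.
    """
    path = [x0]
    x = x0
    for theta, n_steps in theta_per_segment:
        for _ in range(n_steps):
            mean, var = ou_transition(x, sigma, alpha, theta, dt)
            x = rng.gauss(mean, math.sqrt(var))
            path.append(x)
    return path


# Hypothetical G+C fractions: start at 0.50, optimum shifts from 0.60 to 0.35.
rng = random.Random(1)
path = simulate_ou_path(0.50, 0.01, 0.5, [(0.60, 200), (0.35, 200)], 0.1, rng)
print(round(path[200], 3), round(path[-1], 3))
```

Sampling the exact transition rather than using an Euler step means dt only controls how finely the path is recorded, not the accuracy of the simulation. Setting alpha close to zero recovers BM-like behaviour, which is why BM is nested within the OU model.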
Finally, I investigate a case in which a previous codon usage analysis was seriously
confounded by an unusual genome architecture of abnormal regional base composition
in two species of eukaryotic parasite in the genus Theileria. In both species, the
background G+C content across the four syntenic chromosomes is at most 37%. In
many orthologous regions, however, T. annulata has a reduced G+C content of 28%,
whereas T. parva has an elevated G+C content of 41%. Several factors coincide with
this remarkable divergence: increased divergence at all types of site, recombination
hot spots in T. parva, and an increased frequency of tandem repeats and DNA sequence
motifs in both species. The evolutionary origins of these unusual patterns will be