Edinburgh Research Archive

Theoretical analyses of codon usage patterns in DNA

Authors

Wright, Francis George

Abstract

The growing body of DNA sequence data has revealed that the majority of genes show unequal usage of the alternative codons for each amino acid. Three theoretical analyses of codon usage patterns were undertaken: an exploration of patterns of codon usage, an investigation of G+C content and codon usage in human genes, and the development of a simple measure of bias in synonymous codon usage. Codon usage data from a wide range of genomes and genes were analysed using correspondence analysis, a multivariate data reduction method, to extract the main features present. Over 40 per cent of the codon usage variation was displayed in two 2D plots of the first four dimensions produced by the correspondence analysis. The first dimension accounted for about one-fifth of the codon usage variation of the 428 genes studied. This dimension appeared to be very similar to third position G+C content, confirming the latter as the most important factor influencing codon usage patterns. Intra-specific codon usage variation in multicellular organisms and inter-specific codon usage variation in unicellular organisms were well explained by G+C variation at synonymous sites. Intra-specific codon usage variation in unicellular organisms was almost independent of G+C content. The intra-specific codon usage patterns of yeast and E. coli showed considerable variation that confirmed the known bias of highly expressed genes in these two species. Distantly related bacterial species like E. coli and B. subtilis appeared more similar in codon usage than E. coli and yeast. The known DNA base compositional bias of animal mitochondrial genomes accounted for a large proportion of the overall codon usage variation. The study of G+C content and codon usage in 135 human genes revealed correlations between the G+C content of each of the three codon positions, suggesting that G+C content at nonsynonymous sites and G+C content at synonymous sites are correlated.
These results are consistent with a model of human codon usage where each gene is subject to mutation pressure to alter G+C content. However, such mutational changes are subject to selective constraints and only conservative amino-acid replacements appear to be tolerated. A simple unbiased estimator of synonymous codon usage bias, N̂c, has been developed based on the effective number of alleles concept used in population genetics. N̂c is a distance measure from equal usage of synonymous codons and has a range from 20 (total bias, i.e. only one codon used for each amino acid) to 61 (no bias). Appropriate statistical methods for the analysis of codon usage patterns are briefly discussed in the final chapter, along with recent work on models based on population genetics theory.
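
The bias measure described above can be illustrated with a short sketch. It assumes the homozygosity-based form commonly used for the effective number of codons, in which each amino acid with n observed codons and codon frequencies p_i contributes F = (n Σ p_i² − 1) / (n − 1), and the F values are averaged within each degeneracy class. The abstract does not give the formula, so the thesis's own estimator may differ in detail. The function name, the standard genetic code table, and the decision to raise an error when a degeneracy class cannot be estimated are all assumptions made for this illustration.

```python
from collections import Counter, defaultdict
from itertools import product

# Standard genetic code (an assumption; other codes need a different table).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}

# Number of synonymous codons for each amino acid (stop codons excluded).
DEGENERACY = Counter(a for a in CODE.values() if a != "*")


def effective_number_of_codons(seq):
    """Hypothetical helper: codon usage bias on a 20 (total bias) to 61 (none) scale."""
    seq = seq.upper()
    codons = [seq[i:i + 3] for i in range(0, len(seq) - 2, 3)]

    # Count sense codons only; stops and unrecognised triplets are skipped.
    counts = Counter(c for c in codons if CODE.get(c, "*") != "*")
    by_aa = defaultdict(list)
    for codon, n in counts.items():
        by_aa[CODE[codon]].append(n)

    # Homozygosity F for each amino acid, grouped by degeneracy class.
    f_by_class = defaultdict(list)
    for aa, ns in by_aa.items():
        n, k = sum(ns), DEGENERACY[aa]
        if k == 1 or n < 2:
            continue  # Met/Trp carry no choice; n < 2 gives no estimate
        f = (n * sum((x / n) ** 2 for x in ns) - 1) / (n - 1)
        f_by_class[k].append(f)

    # Two single-codon amino acids (Met, Trp), then 9 two-fold,
    # 1 three-fold, 5 four-fold and 3 six-fold amino acids.
    nc = 2.0
    for k, n_amino_acids in ((2, 9), (3, 1), (4, 5), (6, 3)):
        fs = f_by_class.get(k)
        if not fs or sum(fs) <= 0:
            # Simplification: no fallback for missing or degenerate classes.
            raise ValueError(f"cannot estimate the {k}-fold class")
        nc += n_amino_acids / (sum(fs) / len(fs))

    # Sampling noise can push the estimate above 61, so cap it there.
    return min(61.0, nc)
```

A gene that uses a single codon for every amino acid gives F = 1 in each class and therefore 2 + 9 + 1 + 5 + 3 = 20, while a gene that uses all synonymous codons equally reaches the cap of 61, matching the range stated in the abstract.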
