Theoretical analyses of codon usage patterns in DNA
Authors
Wright, Francis George
Abstract
The growing body of DNA sequence data has revealed that the majority
of genes show unequal usage of the alternative codons for each amino acid.
Three theoretical analyses of codon usage patterns were undertaken: an
exploration of patterns of codon usage, an investigation of G+C content and
codon usage in human genes, and the development of a simple measure of
bias in synonymous codon usage.
Codon usage data from a wide range of genomes and genes were
analysed using correspondence analysis, a multivariate data reduction method,
to extract the main features present. Over 40 per cent of the codon usage
variation was captured by the first four dimensions of the correspondence
analysis, which were displayed as two 2D plots.
The first dimension accounted for about one-fifth of the codon usage
variation of the 428 genes studied. This dimension appeared to be very
similar to third-position G+C content, confirming the latter as the most
important factor influencing codon usage patterns. Intra-specific codon usage
variation in multicellular organisms and inter-specific codon usage variation in
unicellular organisms were well explained by G+C variation at synonymous
sites. Intra-specific codon usage variation in unicellular organisms was almost
independent of G+C content. The intra-specific codon usage patterns of yeast
and E. coli showed considerable variation that confirmed the known bias of
highly expressed genes in these two species. Distantly related bacterial
species like E. coli and B. subtilis appeared more similar in codon usage than
E. coli and yeast. The known DNA base compositional bias of animal
mitochondrial genomes accounted for a large proportion of the overall codon
usage variation.
The study of G+C content and codon usage in 135 human genes revealed
correlations among the G+C contents of the three codon positions,
suggesting that G+C content at nonsynonymous sites is correlated with that
at synonymous sites. These results are consistent with a model of
human codon usage where each gene is subject to mutation pressure to alter
G+C content. However, such mutational changes are subject to selective
constraints and only conservative amino-acid replacements appear to be
tolerated.
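As a rough illustration of the quantity compared across codon positions, the following sketch computes the G+C fraction at each of the three positions of an in-frame coding sequence. The function name and the example sequence are hypothetical, and third-position G+C is used here only as a simple stand-in for synonymous-site G+C; the thesis may define the sites differently.

```python
def gc_by_codon_position(cds):
    """Return the G+C fraction at codon positions 1, 2 and 3.

    Assumes `cds` is an in-frame coding sequence; any trailing
    partial codon is ignored.
    """
    cds = cds.upper()
    usable = len(cds) - len(cds) % 3
    if usable == 0:
        raise ValueError("sequence contains no complete codon")
    fractions = []
    for pos in range(3):
        # Every third base, starting at this codon position.
        bases = cds[pos:usable:3]
        gc = sum(base in "GC" for base in bases)
        fractions.append(gc / len(bases))
    return tuple(fractions)


# Hypothetical three-codon example: ATG GCC AAG
gc1, gc2, gc3 = gc_by_codon_position("ATGGCCAAG")
```

Computing these three values for every gene in a data set gives the per-gene figures whose pairwise correlations the paragraph above describes.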
A simple unbiased estimator of synonymous codon usage bias, N̂c,
has been developed based on the effective number of alleles concept used in
population genetics. N̂c is a distance measure from equal usage of
synonymous codons and has a range from 20 (total bias, i.e. only one codon
used for each amino acid) to 61 (no bias).
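The abstract does not give the formula, so the following is only a sketch under stated assumptions. It uses the commonly published form of the effective number of codons: a sample-size-corrected homozygosity F is computed for each amino acid, averaged within each degeneracy class, and combined as 2 + 9/F2 + 1/F3 + 5/F4 + 3/F6, capped at 61. The standard genetic code, the function names, the decision to skip amino acids with non-positive F, and the fallback for a missing three-fold class are assumptions of this sketch.

```python
from collections import Counter, defaultdict

# Standard genetic code, codons ordered T, C, A, G at each position (assumed).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}

# Synonymous codons for each amino acid, stop codons excluded.
SYNONYMS = defaultdict(list)
for codon, aa in CODON_TABLE.items():
    if aa != "*":
        SYNONYMS[aa].append(codon)


def homozygosity(counts):
    """Sample-size-corrected homozygosity F, or None if fewer than 2 codons."""
    n = sum(counts)
    if n < 2:
        return None
    sum_sq = sum((c / n) ** 2 for c in counts)
    return (n * sum_sq - 1) / (n - 1)


def effective_number_of_codons(cds):
    """Estimate the effective number of codons for an in-frame sequence."""
    cds = cds.upper()
    usable = len(cds) - len(cds) % 3
    codons = Counter(cds[i:i + 3] for i in range(0, usable, 3))
    mean_f = {}
    for degeneracy in (2, 3, 4, 6):
        values = []
        for syn in SYNONYMS.values():
            if len(syn) != degeneracy:
                continue
            f = homozygosity([codons[c] for c in syn])
            # Simplification of this sketch: skip non-positive estimates.
            if f is not None and f > 0:
                values.append(f)
        mean_f[degeneracy] = sum(values) / len(values) if values else None
    # Isoleucine is the only three-fold amino acid; if it cannot be
    # estimated, fall back to the mean of the two- and four-fold classes.
    if mean_f[3] is None and mean_f[2] is not None and mean_f[4] is not None:
        mean_f[3] = (mean_f[2] + mean_f[4]) / 2
    if any(v is None for v in mean_f.values()):
        raise ValueError("too few codons to estimate every degeneracy class")
    nc = 2 + 9 / mean_f[2] + 1 / mean_f[3] + 5 / mean_f[4] + 3 / mean_f[6]
    return min(nc, 61.0)


# Hypothetical test sequences: one codon per amino acid, and all 61 equally.
biased = "".join(syn[0] * 5 for syn in SYNONYMS.values())
uniform = "".join(c * 10 for c, aa in CODON_TABLE.items() if aa != "*")
nc_biased = effective_number_of_codons(biased)    # lower end of the range
nc_uniform = effective_number_of_codons(uniform)  # upper end of the range
```

The constant 2 accounts for methionine and tryptophan, which have a single codon each, so the two extremes reproduce the stated range of 20 to 61.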
Appropriate statistical methods for the analysis of codon usage patterns
are briefly discussed in the final chapter, along with recent work on models
based on population genetics theory.

