Molecular evolution in two species of Drosophila
Marion de Procé, Sophie
Studying evolution at the level of DNA sequences allows the detection of past and recent natural selection. Natural selection has generally been seen as a force acting on protein-coding nucleotide sequences only. However, a number of studies have recently shown that introns and intergenic sequences can also be subject to natural selection. The main aim of this thesis was to detect natural selection in non-coding sequences using Drosophila species, a widely used population genetics model. I used different methods to determine whether evidence for natural selection could be found in two lesser-known species, Drosophila americana and Drosophila miranda.

In Chapter 2, I obtained sequences for a large number of genes in D. miranda from BAC sequences, and compared these with sequences from its close relative, D. pseudoobscura. As in previous studies in D. melanogaster, I found a negative relationship between intron length and intron divergence, suggesting that longer introns are under selective constraint. I also found a negative correlation between the rate of non-synonymous substitutions and codon usage bias, suggesting that fast-evolving genes have a lower codon usage bias, consistent with strong positive selection interfering with weak selection for codon usage.

Secondly, in Chapter 3, I gathered polymorphism data for a smaller number of genes in D. americana in order to distinguish between positive and negative selection using methods that require both polymorphism and divergence data. I found that introns are subject to evolutionary forces similar to those acting on synonymous sites. I failed to detect a significant relationship between intron length and divergence or polymorphism. Surprisingly, the direction of this relationship seems to be the opposite of that in previous findings, with longer introns being more diverged than shorter introns. First introns show lower polymorphism and divergence than non-first introns, suggesting that they may be more constrained, although the difference is not significant.

Using the same D. americana dataset, I then focussed, in Chapter 4, on insertions and deletions to test the hypothesis that insertions are favoured to compensate for the deletion bias in Drosophila. I used a maximum-likelihood method that takes into account demographic history, in this case a recent population expansion, and then calculates the selection coefficients. Although the result was not significant, the values suggest positive selection acting on insertions, as expected.

In Chapter 5, using the same maximum-likelihood method, I looked at GC to AT polymorphisms in the D. americana intron dataset. If no selection is acting, one expects to observe as many GC to AT changes as AT to GC changes, with similar mean frequencies. I find evidence for a preference for GC in introns in my dataset. I also investigated codon usage bias using changes between preferred and unpreferred codons, and the results suggest that there is selection for codon usage bias. Using LDhat on the D. americana dataset, I find that recombination estimates are not significantly different between introns and coding sequences, which is of significance in relation to interpretations of differences in the apparent strength of selection on non-coding and synonymous sites.

Finally, in Chapter 6, I looked at a factor that can affect natural selection: gene expression. I used gene expression data from seven Drosophila species to test the hypothesis that genes on the 4th chromosome, or Muller element F, which has low crossing-over, have higher gene expression than genes on other chromosomes, as previously found. I find that microarray data yield opposite results to the EST data, suggesting that gene expression is actually lower on Muller element F.
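
The Chapter 5 expectation, that GC to AT and AT to GC changes should be equally numerous when no selection is acting, can be illustrated with a simple exact binomial test on the two counts. This is only a sketch of that null expectation, not the maximum-likelihood method used in the thesis: it ignores allele frequencies and demographic history. The function name and the counts below are hypothetical and are not data from the thesis.

```python
from math import comb


def binomial_two_sided_p(k: int, n: int, p: float = 0.5) -> float:
    """Exact two-sided binomial p-value.

    Sums the probabilities of every outcome that is no more likely
    than the observed count k out of n trials.
    """
    probs = [comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(n + 1)]
    observed = probs[k]
    # Small relative tolerance so that ties are counted despite rounding.
    total = sum(q for q in probs if q <= observed * (1 + 1e-9))
    return min(1.0, total)


# Hypothetical counts of polymorphisms in each direction (illustration only).
gc_to_at = 30
at_to_gc = 18

# Under the null of no directional bias, each change is equally likely
# to be GC->AT or AT->GC, so the GC->AT count is Binomial(n, 0.5).
p_value = binomial_two_sided_p(gc_to_at, gc_to_at + at_to_gc)
print(f"GC->AT: {gc_to_at}, AT->GC: {at_to_gc}, two-sided p = {p_value:.3f}")
```

A small p-value would indicate only that the two counts are unequal. Distinguishing selection from other causes requires the frequency information and the demographic model that the maximum-likelihood approach takes into account.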