Improved Bayesian methods for detecting recombination and rate heterogeneity in DNA sequence alignments
dc.contributor.advisor
Husmeier, Dirk
en
dc.contributor.author
Mantzaris, Alexander Vassilios
en
dc.date.accessioned
2012-01-18T10:39:58Z
dc.date.available
2012-01-18T10:39:58Z
dc.date.issued
2011-11-24
dc.description.abstract
DNA sequence alignments are usually not homogeneous. Mosaic structures
may result as a consequence of recombination or rate heterogeneity. Interspecific
recombination, in which DNA subsequences are transferred between different
(typically viral or bacterial) strains, may result in a change of the topology of
the underlying phylogenetic tree. Rate heterogeneity corresponds to a change of
the nucleotide substitution rate. Various methods for simultaneously detecting
recombination and rate heterogeneity in DNA sequence alignments have recently
been proposed, based on complex probabilistic models that combine phylogenetic
trees with factorial hidden Markov models or multiple changepoint processes. The
objective of my thesis is to identify potential shortcomings of these models and
explore ways to improve them.
One shortcoming that I have identified is related to an approximation made in
various recently proposed Bayesian models. The Bayesian paradigm requires the
solution of an integral over the space of parameters. To render this integration
analytically tractable, these models assume that the vectors of branch lengths
of the phylogenetic tree are independent among sites. While this approximation
reduces the computational complexity considerably, I show that it leads to the
systematic prediction of spurious topology changes in the Felsenstein zone, that
is, the area in the branch lengths configuration space where maximum parsimony
consistently infers the wrong topology due to long-branch attraction. I demonstrate
these failures by using two Bayesian hypothesis tests, based on an inter- and
an intra-model approach to estimating the marginal likelihood. I then propose a
revised model that addresses these shortcomings, and demonstrate its improved
performance on a set of synthetic DNA sequence alignments systematically generated
around the Felsenstein zone.
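The approximation described above can be written out as a short sketch. The notation is assumed for illustration and is not taken from the thesis: D_t is the alignment column at site t, N is the number of sites in a segment, S is the tree topology of that segment, and w is the vector of branch lengths. The first line is the marginal likelihood with one branch length vector shared by all sites; the second is the tractable form obtained when each site is given its own independent vector w_t.

```latex
\begin{align}
P(D \mid S) &= \int \prod_{t=1}^{N} P(D_t \mid S, \mathbf{w}) \, P(\mathbf{w}) \, d\mathbf{w} \\
P(D \mid S) &\approx \prod_{t=1}^{N} \int P(D_t \mid S, \mathbf{w}_t) \, P(\mathbf{w}_t) \, d\mathbf{w}_t
\end{align}
```

In the second form the integral factorises over sites, which is what makes it analytically manageable, but the sites no longer share information about the branch lengths.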
The core model explored in my thesis is a phylogenetic factorial hidden Markov
model (FHMM) for detecting two types of mosaic structures in DNA sequence
alignments, related to recombination and rate heterogeneity. The focus of my
work is on improving the modelling of the latter aspect. Earlier research efforts by
other authors have modelled different degrees of rate heterogeneity with separate
hidden states of the FHMM. Their work fails to appreciate the intrinsic difference
between two types of rate heterogeneity: long-range regional effects, which are
potentially related to differences in the selective pressure, and the short-term
periodic patterns within the codons, which merely capture the signature of the
genetic code.
I have improved these earlier phylogenetic FHMMs in two respects. Firstly,
by sampling the rate vector from the posterior distribution with reversible jump Markov chain Monte Carlo (RJMCMC), I
have made the modelling of regional rate heterogeneity more flexible, and I infer
the number of different degrees of divergence directly from the DNA sequence
alignment, thereby dispensing with the need to arbitrarily select this quantity
in advance. Secondly, I explicitly model within-codon rate heterogeneity via a
separate rate modification vector. In this way, the within-codon effect of rate
heterogeneity is imposed on the model a priori, which facilitates the learning of
the biologically more interesting effect of regional rate heterogeneity a posteriori.
I have carried out simulations on synthetic DNA sequence alignments, which have
borne out my conjecture. The existing model, which does not explicitly include
the within-codon rate variation, has to model both effects with the same modelling
mechanism. As expected, it failed to disentangle these two effects. In
contrast, I have found that my new model clearly separates within-codon rate
variation from regional rate heterogeneity, resulting in more accurate predictions.
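The separation of the two effects can be illustrated with a minimal Python sketch. It rests on assumptions that the abstract does not state: that the regional rate and the within-codon modifier combine multiplicatively, that the alignment is in frame and starts at the first codon position, and that the modifier has one entry per codon position. The function name `effective_site_rates` and all parameter names are hypothetical.

```python
import numpy as np


def effective_site_rates(regional_rates, region_of_site, codon_modifiers):
    """Per-site rate = regional rate of the site's region * codon-position modifier.

    Assumption (not from the thesis): the two effects multiply, and site 0
    is the first position of a codon.
    """
    regional_rates = np.asarray(regional_rates, dtype=float)
    region_of_site = np.asarray(region_of_site, dtype=int)
    codon_modifiers = np.asarray(codon_modifiers, dtype=float)
    if codon_modifiers.shape != (3,):
        raise ValueError("codon_modifiers must have exactly three entries")
    # Codon position (0, 1, 2) of every site, repeating along the alignment.
    codon_position = np.arange(region_of_site.size) % 3
    # Look up the regional rate for each site and scale it by the modifier.
    return regional_rates[region_of_site] * codon_modifiers[codon_position]


# Two regions (slow, fast) over six sites; third codon positions evolve faster.
rates = effective_site_rates(
    regional_rates=[0.5, 2.0],
    region_of_site=[0, 0, 0, 1, 1, 1],
    codon_modifiers=[0.5, 0.5, 2.0],
)
print(rates)
```

Because the periodic pattern is carried by the fixed three-entry modifier, the regional rates only have to account for variation on the scale of whole regions, which is the division of labour the paragraph above describes.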
en
dc.identifier.uri
http://hdl.handle.net/1842/5735
dc.language.iso
en
dc.publisher
The University of Edinburgh
en
dc.relation.hasversion
Husmeier, D. and Mantzaris, A. V. (2008) Addressing the shortcomings of three recent Bayesian methods for detecting interspecific recombination in DNA sequence alignments. Statistical Applications to Genetics and Molecular Biology (SAGMB), 7, 166–172.
en
dc.relation.hasversion
Mantzaris, A. V. and Husmeier, D. (2009) Distinguishing regional from within-codon rate heterogeneity in DNA sequence alignments. In Pattern Recognition in Bioinformatics.
en
dc.subject
rate heterogeneity
en
dc.subject
recombination
en
dc.subject
DNA sequence alignments
en
dc.subject
Bayesian models
en
dc.subject
phylogenetic factorial hidden Markov model
en
dc.subject
FHMM
en
dc.subject
within-codon rate
en
dc.title
Improved Bayesian methods for detecting recombination and rate heterogeneity in DNA sequence alignments
en
dc.type
Thesis or Dissertation
en
dc.type.qualificationlevel
Doctoral
en
dc.type.qualificationname
PhD Doctor of Philosophy
en
Files
Original bundle
- Name:
- Mantzaris2011.pdf
- Size:
- 2.73 MB
- Format:
- Adobe Portable Document Format

