Laplace transform in population genetics: from theory to efficient algorithms
Date
06/12/2022
Author
Bisschop, Gertjan
Abstract
Extracting information on the selective and demographic past of populations contained in samples of genome sequences requires a description of the distribution of the underlying genealogies. Using the Laplace transform, this distribution can be generated with a simple recursive procedure, regardless of model complexity. Assuming an infinite-sites mutation model, the probability of observing specific configurations of linked variants within small haplotype blocks can be recovered from the Laplace transform of the joint distribution of branch lengths. However, the repeated differentiation required to compute these probabilities has proven to be a serious computational bottleneck in earlier implementations.
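As a worked illustration of the differentiation step described above, consider a single branch of random length $T$ on which mutations fall as a Poisson process with mean $\theta T$. The symbols $\theta$, $\omega$ and $\lambda$ are chosen here for illustration and are not taken from the thesis. The probability of observing $k$ mutations then follows from the $k$-th derivative of the Laplace transform of $T$:

```latex
\begin{align}
  \mathcal{L}(\omega) &= \mathbb{E}\!\left[e^{-\omega T}\right], \\
  P(k) &= \mathbb{E}\!\left[\frac{(\theta T)^{k}}{k!}\,e^{-\theta T}\right]
        = \frac{(-\theta)^{k}}{k!}\,
          \left.\frac{\mathrm{d}^{k}\mathcal{L}(\omega)}{\mathrm{d}\omega^{k}}\right|_{\omega=\theta}.
\end{align}
% Simplest case: T exponentially distributed with rate \lambda
\begin{align}
  \mathcal{L}(\omega) &= \frac{\lambda}{\lambda+\omega}, &
  P(k) &= \frac{\lambda}{\lambda+\theta}\left(\frac{\theta}{\lambda+\theta}\right)^{k}.
\end{align}
```

With several branch types, each type gets its own dummy variable and the joint probability of a mutation configuration involves mixed partial derivatives of the joint transform, so the number of derivatives to evaluate grows quickly with the number of mutation types and their counts.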
In this thesis, I extend existing work on this theoretical framework in three ways. First, I incorporate a description of the impact of hard sweeps on the genealogies of nearby neutral sites. Second, the recursive nature of this approach not only makes the theory easily extendable, but also implies the possibility of graph-based algorithms to query the joint distribution of branch lengths. I devise algorithms that drastically reduce the computational cost of deriving mutation configuration probabilities. This work has been implemented in an open-source Python module, agemo. Finally, the efficient library is used to develop a fully fledged demographic inference tool for fitting models of isolation with migration (IM) to genomic data. Fitting these models to smaller chunks of sequence allows us to also infer both background selection and barriers to gene flow. The software is designed to be modular and user-friendly. It facilitates the entire model fitting workflow, from parsing variants to a simulation-based bootstrap on the model estimates.
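The recursive construction mentioned in the abstract can be sketched for the simplest possible case: a sample of k lineages from a single panmictic population, with time scaled so that each pair coalesces at rate 1. This is a minimal sketch under those assumptions. It does not use agemo, and the names `events` and `laplace_total_length` are hypothetical. Each state contributes a factor rate / (total rate + sum of dummy variables), multiplied by the transform of the state that the event leads to.

```python
from fractions import Fraction
from functools import lru_cache
from math import comb


def events(k):
    """Hypothetical event list for a state with k lineages: (rate, next_state)."""
    # Single panmictic population: the only event is a coalescence,
    # which happens at total rate C(k, 2) and leaves k - 1 lineages.
    return [(Fraction(comb(k, 2)), k - 1)]


@lru_cache(maxsize=None)
def laplace_total_length(k, omega):
    """E[exp(-omega * total branch length)] for a sample of k lineages."""
    if k == 1:
        # One lineage left: the genealogy is complete.
        return Fraction(1)
    evs = events(k)
    total_rate = sum(rate for rate, _ in evs)
    # Each of the k lineages accrues length, so each adds omega to the denominator.
    denom = total_rate + k * omega
    return sum(rate / denom * laplace_total_length(nxt, omega) for rate, nxt in evs)


omega = Fraction(1, 2)
pair = laplace_total_length(2, omega)  # equals 1 / (1 + 2 * omega)
trio = laplace_total_length(3, omega)
print(pair, trio)
```

Evaluating the transform at omega equal to the mutation rate per unit branch length gives the probability that the sample carries no mutations at all. Models with more event types (migration, population splits) would add entries to the event list, and the memoised states then form a graph rather than a simple chain.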