Machine learning-based approaches for functional variant classification across mammals
dc.contributor.advisor
Prendergast, James
dc.contributor.advisor
Hassan, Musa
dc.contributor.advisor
Chue Hong, Neil
dc.contributor.advisor
Talenti, Andrea
dc.contributor.author
Zhao, Rongrong
dc.date.accessioned
2024-08-01T12:03:21Z
dc.date.available
2024-08-01T12:03:21Z
dc.date.issued
2024-08-01
dc.description.abstract
As the world’s population continues to grow, so does the demand for livestock products. However, increasing livestock production results in more greenhouse gas emissions and greater pressure on scarce resources such as potable water and land. It is therefore vital to improve the productivity of livestock, including through advanced genomics breeding approaches and genome editing, so that more can be produced without increasing animal numbers. The pivotal challenge in using advanced genomics breeding approaches is identifying the causal functional variants associated with the productivity traits of interest in livestock species. As in humans, genome-wide association studies (GWAS) have identified numerous genomic regions associated with diseases and traits in livestock, but it is difficult to determine the causal variants in these regions because of factors such as linkage disequilibrium (LD). The overarching aim of my PhD was to use data-driven computational methods, such as machine learning, to improve the initial detection of novel functional variants in livestock species and ultimately to enable improved livestock breeding. This research focused on developing a reusable variant annotation pipeline for mammalian species with a broad range of features, and on demonstrating the utility of these features and machine learning approaches in predicting mammalian functional regulatory variants in both humans and cattle.
Datasets suitable for machine learning are largely lacking in livestock. To address this and facilitate a diverse range of downstream projects, I first developed a reusable variant annotation pipeline in Nextflow for use across platforms and species. The pipeline provides a wide range of annotations, including sequence conservation, gene annotations, sequence context, and predicted functional genomic data from other machine learning tools such as Enformer, which can then be used in downstream variant analyses and employed as features in machine learning approaches for variant classification across species.
I then applied this pipeline to develop machine learning models for predicting where functional human variants have direct orthologues in livestock species and may therefore be relevant to understanding livestock phenotypes. I demonstrate that it is possible to assign probabilities to whether a human variant will be found in other species from its annotations. Hundreds of human regulatory variants were identified with conserved functional impacts on gene expression in livestock species. This observation suggests it is possible to leverage information from well-annotated species, such as humans, to help with the prediction of regulatory variants and other functional variants in less well-annotated livestock species.
To explore the efficacy of using the annotation pipeline with machine learning approaches to predict functional variants, I applied them to directly predicting regulatory variants across humans and cattle. I compared the performance of various approaches to predicting cattle regulatory variants, with and without incorporating annotations from humans. The models incorporating human annotations and those based on cattle annotations alone demonstrated comparable performance, with the model relying on cattle annotations performing slightly better.
Overall, the variant annotation pipeline and the machine learning models proposed in this thesis can be used to uncover the underlying characteristics of functional variants and to prioritise functional variants related to important traits in livestock species for downstream genome editing or marker-assisted breeding.
en
dc.identifier.uri
https://hdl.handle.net/1842/42047
dc.identifier.uri
http://dx.doi.org/10.7488/era/4769
dc.language.iso
en
en
dc.publisher
The University of Edinburgh
en
dc.rights.license
CC BY-NC-ND 4.0 (Attribution-NonCommercial-NoDerivatives 4.0 International)
en
dc.rights.uri
https://creativecommons.org/licenses/by-nc-nd/4.0/
en
dc.subject
functional variant classification across mammals
en
dc.subject
genome-wide association studies (GWAS)
en
dc.subject
livestock
en
dc.subject
linkage disequilibrium
en
dc.subject
machine learning
en
dc.subject
cattle regulatory variants
en
dc.title
Machine learning-based approaches for functional variant classification across mammals
en
dc.type
Thesis or Dissertation
en
dc.type.qualificationlevel
Doctoral
en
dc.type.qualificationname
PhD Doctor of Philosophy
en
Files
Original bundle
- Name:
- ZhaoR_2024.pdf
- Size:
- 36.25 MB
- Format:
- Adobe Portable Document Format