Edinburgh Research Archive

Novel network-based approach to integrate biological datasets for patient disease prediction

Authors

Ryan, Barry

Abstract

Understanding the underlying genetic, molecular and environmental factors of disease is crucial to advance medicine from intervention to prevention. High-throughput omics technologies now enable cellular-scale analyses, yielding novel insights across many conditions. However, progress has been more limited in complex diseases, where heterogeneity in presentation, manifestation and response to treatment obscures underlying mechanisms and hampers the development of effective interventions. To advance our understanding of these conditions, we need methods that accommodate patient heterogeneity and jointly analyse multiple molecular layers, capturing the interplay among cellular components across omics. The integrative analysis of multiple omic measures using machine learning is a promising approach to obtaining a detailed understanding of the key drivers of disease. In this thesis we present a novel network-based framework for integrating multi-omic datasets for patient disease prediction. The methodology, named Multi-Omic Graph Diagnosis (MOGDx), combines Patient Similarity Network representations with neural network encodings of omics using a Graph Neural Network (GNN) methodology. As such, the method accounts for patient heterogeneity while learning complex relationships between multiple omics. It addresses limitations observed in competing methodologies: it uses Similarity Network Fusion and imputation-based methods to include patients with partial data, and it uses inductive GNNs, allowing the framework to generalise to unseen data and new patients. Importantly, the framework incorporates a biologically interpretable encoder-based architecture, which gives users the opportunity to derive molecular insights from the prediction task. The improved performance of MOGDx is demonstrated on three cancer classification tasks from The Cancer Genome Atlas. 
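To make the general idea of combining a patient similarity network with omic encodings more concrete, the sketch below builds a k-nearest-neighbour patient graph from one omic matrix and applies a single graph-convolution step to per-patient feature encodings. This is an illustrative sketch under stated assumptions, not the MOGDx implementation: the helper names `knn_similarity_graph` and `gcn_layer` are hypothetical, cosine similarity and a k-NN graph stand in for the thesis's network construction (Similarity Network Fusion is not shown), the propagation rule is the standard symmetric-normalised graph convolution, and a random linear projection stands in for a trained omic encoder. All data are synthetic.

```python
import numpy as np


def knn_similarity_graph(features: np.ndarray, k: int) -> np.ndarray:
    """Hypothetical helper: symmetric, unweighted k-NN patient similarity network.

    features: (n_patients, n_features) matrix for a single omic layer.
    Cosine similarity is an illustrative choice, not necessarily MOGDx's metric.
    """
    norms = np.linalg.norm(features, axis=1, keepdims=True)
    unit = features / np.clip(norms, 1e-12, None)  # row-normalise to unit length
    sim = unit @ unit.T                            # pairwise cosine similarity
    np.fill_diagonal(sim, -np.inf)                 # a patient is never its own neighbour
    n = sim.shape[0]
    neighbours = np.argsort(-sim, axis=1)[:, :k]   # indices of the k most similar patients
    adj = np.zeros((n, n))
    rows = np.repeat(np.arange(n), k)
    adj[rows, neighbours.ravel()] = 1.0
    return np.maximum(adj, adj.T)                  # symmetrise: keep an edge if either side chose it


def gcn_layer(adj: np.ndarray, h: np.ndarray, weight: np.ndarray) -> np.ndarray:
    """Hypothetical helper: one graph-convolution step with ReLU.

    Each patient's new representation mixes its own encoding with those of
    its neighbours, using the symmetric normalisation D^-1/2 (A + I) D^-1/2.
    """
    a_hat = adj + np.eye(adj.shape[0])             # add self-loops
    d_inv_sqrt = 1.0 / np.sqrt(a_hat.sum(axis=1))
    a_norm = d_inv_sqrt[:, None] * a_hat * d_inv_sqrt[None, :]
    return np.maximum(a_norm @ h @ weight, 0.0)


rng = np.random.default_rng(0)
omic = rng.normal(size=(6, 20))                    # 6 patients, 20 features (synthetic)
adj = knn_similarity_graph(omic, k=2)
encoded = omic @ rng.normal(size=(20, 8))          # stand-in for a learned omic encoder
hidden = gcn_layer(adj, encoded, rng.normal(size=(8, 3)))
print(hidden.shape)                                # (6, 3)
```

The graph carries patient-to-patient structure while the node features carry the omic signal, so each layer lets a patient's prediction draw on similar patients. Because the layer's weights act on features rather than on a fixed set of nodes, a model of this kind can in principle be applied to graphs containing new patients, which is the inductive property the abstract refers to.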
MOGDx was further applied to the Parkinson’s Progression Markers Initiative dataset to explore the genetic, molecular and environmental influences on Parkinson’s disease (PD). PD is a neurodegenerative disease with a heterogeneous patient population. There are known genetic influences on the disease, with approximately 10% of cases attributable to pathogenic variants in a small number of genes. Patients with PD exhibit a wide range of symptoms. As a result, the interplay among genetic, molecular and environmental influences in PD is not yet fully understood. We use MOGDx to perform cross-sectional and longitudinal analyses, showing strong epigenetic modification in individuals with a genetic predisposition to the disease. We further use MOGDx to group patients with similar symptoms and show improved interpretability when making predictions on the full heterogeneous PD population. Finally, using a genetic signal to subtype the PD population, we highlight symptomatic and multi-omic differences that characterise each genetic subtype.
