Experimental reproducibility in high-throughput multi-omic analysis systems
Item status: Restricted Access
Embargo end date: 06/07/2020
Godwin, Duncan Richard Iain
The reproducibility of scientific studies is an important issue facing modern biology. A large number of studies published today cannot be reproduced, and the situation has been described as a reproducibility crisis. It has been shown that the inclusion of computational analysis within a study adds a further level of complexity to reproducing its findings. Even the reproduction of only the computational component of a study is fraught with difficulty. Even when a researcher is provided with the source data, a list of the tools used and a protocol, it can still be difficult to produce the same results. One reason for this is that differences between tools, versions, configurations, dependencies, operating systems and hardware all contribute to variation in the results. The work presented here addresses the problem of reproducibility through the design and implementation of a novel reproducible analysis system, Cumulus. The Cumulus system combines technologies such as virtualisation and high-throughput workflow systems to automate the process of fully recording an analysis environment. Recording an analysis environment allows it to be shared and reliably reproduced by other researchers. Automating this process enables reproduction of bioinformatic analyses by high-throughput analysis systems. The thesis then shows how the Cumulus system was applied to reproduce and amend a published RNA-seq analysis and to create a novel proteomic analysis pipeline. This proteomic pipeline was then used in the analysis of a pilot study to identify binding partners of the Nanog protein that are dependent on a part of the protein previously shown to be required for the maintenance of pluripotency. This analysis resulted in the identification of a novel Nanog interactome.
In addition, a further set of tools is presented, including the Stembio Visualisation Framework, which enables the construction of interactive visualisations using the Cumulus system. The initial application of this framework has been accepted as part of a publication in the Journal of Experimental Medicine.