Show simple item record

dc.contributor.advisor: Haley, Christopher [en]
dc.contributor.advisor: Knott, Sara [en]
dc.contributor.author: Hemani, Gibran [en]
dc.date.accessioned: 2012-10-04T13:11:08Z
dc.date.available: 2012-10-04T13:11:08Z
dc.date.issued: 2012-06-30
dc.identifier.uri: http://hdl.handle.net/1842/6472
dc.description.abstract: Of central importance in the dissection of the components that govern complex traits is understanding the architecture of natural genetic variation. Genetic interaction, or epistasis, constitutes one aspect of this, but epistatic analysis has been largely avoided in genome wide association studies because of statistical and computational difficulties. This thesis explores both issues in the context of two-locus interactions. Initially, through simulation and deterministic calculations it was demonstrated that not only can epistasis maintain deleterious mutations at intermediate frequencies when under selection, but that it may also have a role in the maintenance of additive variance. Based on the epistatic patterns that are evolutionarily persistent, and the frequencies at which they are maintained, it was shown that exhaustive two dimensional search strategies are the most powerful approaches for uncovering both additive variance and the other genetic variance components that are co-precipitated. However, while these simulations demonstrate encouraging statistical benefits, two dimensional searches are often computationally prohibitive, particularly with the marker densities and sample sizes that are typical of genome wide association studies. To address this issue different software implementations were developed to parallelise the two dimensional triangular search grid across various types of high performance computing hardware. Of these, particularly effective was using the massively-multi-core architecture of consumer level graphics cards. While the performance will continue to improve as hardware improves, at the time of testing the speed was 2-3 orders of magnitude faster than CPU based software solutions that are in current use.
Not only does this software enable epistatic scans to be performed routinely at minimal cost, but it is now feasible to empirically explore the false discovery rates introduced by the high dimensionality of multiple testing. Through permutation analysis it was shown that the significance threshold for epistatic searches is a function of both marker density and population sample size, and that because of the correlation structure that exists between tests the threshold estimates currently used are overly stringent. Although the relaxed threshold estimates constitute an improvement in the power of two dimensional searches, detection is still most likely limited to relatively large genetic effects. Through direct calculation it was shown that, in contrast to the additive case where the decay of estimated genetic variance was proportional to falling linkage disequilibrium between causal variants and observed markers, for epistasis this decay was exponential. One way to rescue poorly captured causal variants is to parameterise association tests using haplotypes rather than single markers. A novel statistical method that uses a regularised parameter selection procedure on two locus haplotypes was developed, and through extensive simulations it can be shown that it delivers a substantial gain in power over single marker based tests. Ultimately, this thesis seeks to demonstrate that many of the obstacles in epistatic analysis can be ameliorated, and with the current abundance of genomic data gathered by the scientific community direct search may be a viable method to qualify the importance of epistasis. [en]
dc.contributor.sponsor: Biotechnology and Biological Sciences Research Council (BBSRC) [en]
dc.language.iso: en
dc.publisher: The University of Edinburgh [en]
dc.relation.hasversion: Hemani G, Theocharidis A, Wei W, Haley CS. EpiGPU: exhaustive pairwise epistasis scans parallelized on consumer level graphics cards. Bioinformatics (2011) 27 (11): 1462-1465. [en]
dc.relation.hasversion: Hadjipavlou G, Hemani G, Leach R, Louro B, Nadaf J, Rowe S, de Koning DJ. Extensive QTL and association analyses of the QTLMAS 2009 Data. BMC Proceedings (2010) 4:S1 11. [en]
dc.subject: epistasis [en]
dc.subject: genome-wide association study [en]
dc.subject: evolution [en]
dc.subject: high performance computing [en]
dc.subject: GPGPU programming [en]
dc.subject: machine learning [en]
dc.subject: multiple testing [en]
dc.title: Dissecting genetic interactions in complex traits [en]
dc.type: Thesis or Dissertation [en]
dc.type.qualificationlevel: Doctoral [en]
dc.type.qualificationname: PhD Doctor of Philosophy [en]


