Genomic signature of trait-associated variants
Kindt, Alida Sophie Dorothea
Genome-wide association studies have been used extensively to study hundreds of phenotypes and have identified thousands of associated SNPs whose underlying biology and causal mechanisms are as yet largely unknown. Many previous studies attempted to clarify the causal biology by investigating overlaps of trait-associated variants with functional annotations, but lacked statistical rigour and examined incomplete subsets of the available functional annotations. Additionally, it has been difficult to disentangle the relative contributions of different annotations that may be strongly correlated with one another. In this thesis, we address these shortcomings and strengthen and extend the previously obtained results. Two methods, permutations and logistic regression, are applied in statistically rigorous analyses of genomic annotations and their observed enrichment or depletion of trait-associated SNPs. The genomic annotations range from genic regions and regulatory features to measures of conservation and aspects of chromatin structure. Logistic regressions in a number of trait-specific subsets identify genomic annotations influencing SNPs associated with both normal variation (e.g., eye or hair colour) and diseases, suggesting some generalities in the biological underpinnings of phenotypes. SNPs associated with phenotypes of the immune system are investigated, and the results highlight the distinct aetiology of this subset. Despite the heterogeneity of the studied cancers, SNPs associated with different cancers are particularly enriched for conserved regions, unlike those in all other trait subsets. Nonetheless, chromatin states are, perhaps surprisingly, among the most influential genomic annotations in all trait subsets. Evolutionarily conserved regions are rarely among the top genomic annotations despite their widespread use in prioritisation methods for follow-up studies.
We identify a common set of enriched or depleted genomic annotations that significantly influence all traits, but also highlight trait-‐specific differences. These annotations may be used for the computational prioritisation of variants implicated in phenotypes of interest. The approaches developed for this thesis are further applied to studies of a specific human complex trait (height) and gene expression in atherosclerosis.
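As a rough illustration of the permutation idea mentioned above, the following sketch tests whether trait-associated SNPs overlap an annotation more often than expected by chance. It is a minimal toy example under stated assumptions, not the procedure used in the thesis: the function name `permutation_enrichment`, the boolean-list encoding, and the toy data are all hypothetical, and the null distribution is built by plain label shuffling, whereas a real analysis would typically draw control SNPs matched on properties such as allele frequency or linkage disequilibrium.

```python
import random


def permutation_enrichment(is_trait_snp, in_annotation, n_perm=1000, seed=0):
    """One-sided permutation p-value for enrichment (hypothetical helper).

    is_trait_snp  -- list of bools, True if the SNP is trait-associated
    in_annotation -- list of bools, True if the SNP lies in the annotation
    """
    rng = random.Random(seed)

    # Observed number of trait-associated SNPs that fall inside the annotation.
    observed = sum(t and a for t, a in zip(is_trait_snp, in_annotation))

    # Shuffle the trait labels to break any link with the annotation.
    labels = list(is_trait_snp)
    at_least_as_extreme = 0
    for _ in range(n_perm):
        rng.shuffle(labels)
        overlap = sum(t and a for t, a in zip(labels, in_annotation))
        if overlap >= observed:
            at_least_as_extreme += 1

    # Add-one correction so the estimated p-value is never exactly zero.
    p_value = (at_least_as_extreme + 1) / (n_perm + 1)
    return observed, p_value


# Toy data (assumed, not from the thesis): 200 SNPs, the first 40 lie in the
# annotation; 20 SNPs are trait-associated, 15 of them inside the annotation.
in_annotation = [True] * 40 + [False] * 160
is_trait_snp = [True] * 15 + [False] * 25 + [True] * 5 + [False] * 155

observed, p_value = permutation_enrichment(is_trait_snp, in_annotation)
print(observed, p_value)
```

Testing for depletion instead would only require counting permutations with an overlap less than or equal to the observed one. Disentangling correlated annotations, as described above, is the role of the logistic regression, in which several annotations enter one model as predictors of trait-association status.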