Copy Number Variants in the human genome and their association with quantitative traits
Copy number variants (CNVs), which comprise deletions, insertions and inversions of genomic sequence, are a major form of genetic variation between individual genomes. CNVs are common in the genomes of humans and other species. However, they have not been extensively characterized because their ascertainment is challenging.
I reviewed current CNV studies and CNV discovery methods, especially the algorithms that infer CNVs from whole-genome single nucleotide polymorphism (SNP) arrays, and compared the performance of three analytical tools in order to identify the best method of CNV identification. I then applied this method to identify CNV events in three European population isolates (the island of Vis in Croatia, the islands of Orkney in Scotland and villages in South Tyrol in Italy) using Illumina genome-wide array data comprising more than 300,000 SNPs. I analyzed and compared CNV features across these three populations, including CNV frequency, genomic distribution, gene content, segmental duplication overlap and GC content. Using the pedigree information available for each population, I investigated the inheritance and segregation of CNVs in families. I also tested for association between CNVs and quantitative traits measured in the study samples.
CNVs were widely found in study samples and reference genomes. Discrepancies were found between the sets of CNVs called by different analytical tools. I detected 4016 CNVs in 1964 individuals, out of a total of 2789 participants from the three population isolates; these CNVs clustered into 743 copy number variable regions (CNVRs). Features of these CNVRs, including frequency and distribution, differed significantly between the Orcadian, South Tyrolean and Dalmatian population samples. This suggests population-specific CNVR identity and origin; consistent with this inference, CNV variation within each population could be used to measure genetic relatedness.
Finally, I found that individuals with extreme values of some metabolic traits carried rare CNVs overlapping known genes more often than individuals with moderate trait values did.