Lost pigs and broken genes: the search for causes of embryonic loss in the pig and the assembly of a more contiguous reference genome
Warr, Amanda Susan
The pig is an economically important species, with pork being the most widely consumed meat in the world. Genomic technologies have the potential to improve reproduction, health and efficiency in the pig industry. Additionally, pigs are more similar to humans than species commonly used as medical models, and improved genomic resources for the pig may facilitate its use in medical modelling. The cost of DNA sequencing has greatly decreased in recent years, allowing more researchers to incorporate next-generation sequencing into their projects. Many bioinformatic tools are designed to accept a reference genome as a truth against which individuals are compared; however, most of the available reference genome sequences are low-quality drafts. It is important to understand the limitations of available reference genomes in order to make full use of the sequencing technologies available. Chapter 2 assesses the quality of the published pig draft reference genome sequence, Sscrofa10.2, using short-read sequencing data from the individual from which the genome was assembled. Regions where the reads disagree with the assembly are identified as low-confidence regions, and a filter is produced to reduce the impact of these regions on genomic analyses. Chapter 3 uses exome sequencing data from 96 pigs to identify variants that are predicted to truncate proteins. Through the application of filters, including the filter designed in Chapter 2, these variants are reduced to a short list of those likely to have an impact on phenotype, specifically variants that may be associated with reproductive phenotypes and embryonic lethality.
Additionally, variants from the 96 pig exomes are imputed into a larger set of 446 pigs, each genotyped for ~60,000 single nucleotide polymorphisms (SNPs) with the PorcineSNP60 BeadChip (Illumina). The imputed exome variants are tested for association with two reproductive phenotypes in a genome-wide association study (GWAS), and a number of candidate genes are identified. Chapter 4 uses an alternative method of identifying phenotype-altering variants, whole-genome sequencing of a trio of individuals (sire, dam and affected individual), to search for a genetic cause of foetal mummification in pigs. Finally, Chapter 5 focuses on improving the available resources for the pig by reassembling the pig genome using the latest long-read sequencing technologies, producing a much-improved assembly, Sscrofa11.1. The assembly is one of the most contiguous reference genomes currently available, with a contig N50 of 48.2 Mb, only 103 gaps remaining (excluding the Y chromosome) and two closed chromosomes. This project improves the available genomic resources for the pig, and identifies several putative causal variants and candidate genes underlying important traits in a commercial population.