Visualising the bacterial pangenome: an analysis of the genetic content of Staphylococcus aureus
Item status: Restricted Access
Embargo end date: 13/03/2024
Harling-Lee, Joshua D.
Bacterial genomes can be highly variable, with genetic material being frequently gained, lost, altered, and exchanged. Modern sequencing capabilities have enabled the sequencing of tens of thousands of genomes, and large-scale computational analysis can now be leveraged to explore how genetic variation is linked to virulence, niche adaptation, antibiotic resistance, and host transition events. Understanding the dynamics that underpin the movement of genetic material, the factors that influence gene gain and loss, and the associations between accessory genes and ecology is important for monitoring pathogen emergence, developing new therapeutic strategies, and stewarding antibiotic usage for the benefit of both humans and animals.

This study began by exploring how the latest graph-based analysis and visualisation methods could be utilised in the analysis of the increasingly large and complex datasets available for key pathogen species. Using exemplar datasets from Staphylococcus aureus and Legionella pneumophila, we demonstrated the construction and analysis of genome-genome and gene-gene similarity networks, and gene synteny networks. This approach yields interactive and visually informative graphical representations of large-scale genomic data, allowing rapid insight into and exploration of the bacterial pangenome, particularly when combined with information from existing analysis tools.

A large-scale analysis was then conducted to investigate the pangenome diversity of S. aureus, a major human and animal pathogen. We first established a dataset of over 50,000 S. aureus genome sequences, then downsampled the 20 best-represented lineages. Through comparison of fundamental genome characteristics, we identified significant pseudogene accumulation in some ecologically restricted clones. We tested the extent to which lineages shared accessory genes, finding that certain groups are more likely to exchange genes, and that some genes are independent of any lineage restriction. Consistent with this, we also identified unique combinations of known restriction-modification systems in each lineage. These data provide broad insight into the population-level exchange of genetic material between major S. aureus lineages.

Finally, we applied our graph-based methods alongside standard large-scale analysis techniques to study the pangenome of S. aureus in the context of bovine mastitis, a major burden on the global dairy industry. Using a globally distributed and population-wide dataset of 4,841 genomes, we identified sets of accessory genes associated with the bovine host. Many of these genes were carried in different lineages, with limited sharing of such genes between key lineages. There was also limited evidence for any geographical impact on the accessory genome. Analysis of genome structure revealed that many bovine-associated genes are enriched in specific genomic locations, and further analysis of these regions revealed additional host-related gene combinations. Together, these data present evidence for multiple routes to bovine colonisation, and further elucidate the role of the accessory genome in host transition and colonisation.

Overall, the studies presented in this thesis utilise the latest analysis and visualisation techniques to better understand the pangenome dynamics of S. aureus through the study of large-scale genome sequence datasets. In particular, we have investigated how the differing pangenome characteristics of distinct S. aureus lineages may relate to their diverse ecologies, especially with regard to adaptation to the bovine host. Our findings will be of relevance to monitoring outbreaks of pathogenic lineages and the rise of antimicrobial resistance, and to determining the risk and management of emergent novel clones.