Using (meta)genomic approaches to improve the accuracy of rumen microbiome analysis

Smith, Rebecca Hannah

Files

Smith2023.pdf (10.15 MB)

Date

2023-09-28

Abstract

The rumen is home to a rich microbiota that metabolises the lignocellulosic feed ingested by the animal and produces short-chain fatty acids (SCFA), which the host uses for growth. The rumen microbiota is critical to agriculture and food security, contributing directly to the production of meat and milk. Culturing has provided some insight into the microbes that live in the rumen, but most rumen microbes have yet to be cultured. Culture-independent molecular approaches, such as shotgun metagenomics and 16S rRNA gene profiling, have therefore become attractive complementary methods for studying the rumen microbiome. Such analyses rely on reference information in databases that have traditionally been populated with cultured microbes. This introduces an inherent bias, because cultured taxa are over-represented in reference databases. First, this thesis examines the impact of reference database choice on the taxonomic classification of rumen metagenomic data. To measure classification accuracy at the read level, ground-truth data were simulated from known rumen isolate genomes (from the Hungate1000 project). This study demonstrated that the choice of reference database strongly affected classification results and that, for the rumen specifically, classification accuracy depended on how well microbes from this environment were represented in the reference database. Custom reference databases containing culture-derived genomes from the rumen increased the classification rate and accuracy at all taxonomic levels. Including uncultured metagenome-assembled genomes (rumen MAGs) in the reference databases improved the classification rate, but yielded only limited improvements in classification accuracy because of incomplete and informal taxonomic labels.
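As a rough illustration of the read-level evaluation described above, the sketch below compares per-read taxonomic calls with simulated ground truth at a single rank. It assumes two definitions that the abstract does not spell out: classification rate as the fraction of all simulated reads that receive a label at that rank, and accuracy as the fraction of classified reads whose label matches the true lineage. The function `evaluate_rank`, the `RankMetrics` type and the toy reads are hypothetical and are not taken from the thesis.

```python
from dataclasses import dataclass
from typing import Dict, Optional

# Hypothetical representation: a lineage maps a rank name (e.g. "genus")
# to a taxon label, or to None when the read is unassigned at that rank.
Lineage = Dict[str, Optional[str]]


@dataclass
class RankMetrics:
    classification_rate: float  # classified reads / all simulated reads
    accuracy: float             # correct calls / classified reads


def evaluate_rank(truth: Dict[str, Lineage],
                  predicted: Dict[str, Lineage],
                  rank: str) -> RankMetrics:
    """Score per-read calls against simulated ground truth at one rank.

    Reads missing from `predicted`, or with no label at `rank`, are
    counted as unclassified.
    """
    total = len(truth)
    classified = 0
    correct = 0
    for read_id, true_lineage in truth.items():
        call = predicted.get(read_id, {}).get(rank)
        if call is None:
            continue  # unclassified at this rank
        classified += 1
        if call == true_lineage.get(rank):
            correct += 1
    rate = classified / total if total else 0.0
    accuracy = correct / classified if classified else 0.0
    return RankMetrics(rate, accuracy)


# Toy example with made-up reads and labels.
truth = {
    "r1": {"genus": "Prevotella"},
    "r2": {"genus": "Butyrivibrio"},
    "r3": {"genus": "Ruminococcus"},
    "r4": {"genus": "Prevotella"},
    "r5": {"genus": "Prevotella"},
}
predicted = {
    "r1": {"genus": "Prevotella"},  # correct
    "r2": {"genus": "Prevotella"},  # misclassified
    "r4": {"genus": None},          # unassigned at genus level
    "r5": {"genus": "Prevotella"},  # correct; r3 has no call at all
}
metrics = evaluate_rank(truth, predicted, "genus")
print(metrics)  # RankMetrics(classification_rate=0.6, accuracy=0.6666666666666666)
```

Keeping the two metrics separate matters here: a database can assign labels to more reads (a higher classification rate) while those labels still fail to match the ground truth (little change in accuracy).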
Importantly, this work shows that standard, widely used reference databases classified rumen data with poor accuracy, and suggests that custom reference databases are needed to substantially improve classification accuracy.

To further explore the use of MAGs as representative genomes of uncultured species, cultured and uncultured rumen genomes were then compared to identify differences that might limit the use of MAGs. Bacteria were isolated from rumen samples and their genomes sequenced. These cultured genomes were then phylogenetically clustered with rumen MAGs to create genome pairs, each consisting of a cultured genome and a MAG thought to belong to the same microbial strain. The presence of functions relevant to the rumen environment was compared within every genome pair. For all functions relating to nitrogen metabolism and to SCFA and alcohol conversions, the presence or absence of the relevant gene pathways was the same for both genomes in each of the seven genome pairs. Carbohydrate-active enzymes (CAZymes) showed more variation: only three of the genome pairs had identical predictions of which functions were present or absent. This work also suggested an association between the species and how similar the two genomes in a pair (culture-derived genome and MAG) are to one another. In particular, there appears to be less variation between a culture-derived genome and a MAG for species with a relatively closed pangenome, and more variation for species with a more open pangenome. The microbiome field is moving towards high-throughput methods, accompanied by new approaches that need to be evaluated for their suitability and accuracy.
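The within-pair comparison of predicted functions described above can be expressed as a set operation: a function is discordant when it is predicted present in exactly one genome of a pair. The sketch below assumes that each genome's annotation has already been reduced to a set of identifiers for functions predicted present (for example, pathway or CAZyme family names) and that one panel of tested functions is shared by all pairs. The pair names, identifiers and helper functions are hypothetical, and the calls are made up rather than results from the thesis.

```python
from typing import Dict, Set, Tuple

# Hypothetical input: (functions present in the culture-derived genome,
#                      functions present in the paired MAG)
GenomePair = Tuple[Set[str], Set[str]]


def discordant_functions(pair: GenomePair, tested: Set[str]) -> Set[str]:
    """Tested functions predicted present in exactly one genome of the pair."""
    cultured, mag = pair
    return (cultured ^ mag) & tested  # symmetric difference, limited to the panel


def concordance_summary(pairs: Dict[str, GenomePair],
                        tested: Set[str]) -> Tuple[int, Dict[str, Set[str]]]:
    """Count pairs whose presence/absence calls agree for every tested
    function, and list the discordant functions for the remaining pairs."""
    identical = 0
    differences: Dict[str, Set[str]] = {}
    for name, pair in pairs.items():
        diff = discordant_functions(pair, tested)
        if diff:
            differences[name] = diff
        else:
            identical += 1
    return identical, differences


# Toy panel and pairs (made-up calls).
tested = {"GH5", "GH43", "CE1", "PL1"}
pairs = {
    "pair_1": ({"GH5", "GH43"}, {"GH5", "GH43"}),
    "pair_2": ({"GH5", "CE1"}, {"GH5"}),
    "pair_3": ({"PL1"}, {"PL1", "GH43"}),
}
n_identical, diffs = concordance_summary(pairs, tested)
print(n_identical, diffs)  # 1 {'pair_2': {'CE1'}, 'pair_3': {'GH43'}}
```

Reporting which functions differ, rather than only a yes/no per pair, makes it easier to see whether disagreements concentrate in one functional category.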
This work concludes that MAGs may be useful as reference genomes for as-yet uncultured microbes, and highlights factors that may make a MAG more or less suitable as an accurate, representative genome.

Lastly, this thesis presents a project that evaluated the suitability of published pipelines for the functional classification of metagenomic data in an industry setting. Two pipelines, HUMAnN3 and Carnelian, were chosen based on the needs of the business. Two simulated metagenomic datasets, one generated from members of the human gut microbiome and one from members of the rumen microbiome, were annotated with functional information from the UniProt UniRef90 database to create a ground truth. Each dataset was then classified by each pipeline, and the functional classification results were compared with the ground-truth annotations to assess accuracy. The HUMAnN3 pipeline produced the functional classifications that most closely matched the ground-truth annotations. Building on the work presented in Chapter 2, this work highlights issues of reference database bias when classifying microbial function. The project concluded that HUMAnN3 was the most suitable pipeline for the business to incorporate into its services.

Overall, this thesis investigated the accuracy of popular methods for classifying the taxonomy and function of metagenomic data. It revealed that the limited number of rumen microbial reference genomes is likely to be a major issue in rumen microbiome research, substantially reducing classification accuracy. Furthermore, this work demonstrates that MAGs can resemble culture-derived genomes. Given the pressing need for ruminal reference genomes, using MAGs as representative reference genomes for uncultured microbes has the potential to revolutionise the rumen microbiome field.
However, robust classification relies on consistent and accurate taxonomic labelling, including of reference genomes, whether they are culture-derived or metagenome-derived. Continued improvement of reference databases is required to ensure accurate and valuable insights into the critically important, yet incompletely understood, rumen environment.
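To make the ground-truth comparison in the functional classification project more concrete, the sketch below scores a pipeline's output, treated as a set of detected function identifiers, against the set annotated in the simulated data, using precision, recall and F1. These set-based metrics are an assumption for illustration only: the abstract does not state which accuracy measures were used, and real pipeline outputs (for example, abundance tables) would first need to be reduced to sets of UniRef90 identifiers. The identifiers and pipeline names below are placeholders.

```python
from typing import Dict, NamedTuple, Set


class SetScores(NamedTuple):
    precision: float  # fraction of predicted functions found in the ground truth
    recall: float     # fraction of ground-truth functions that were predicted
    f1: float         # harmonic mean of precision and recall


def score_functions(predicted: Set[str], truth: Set[str]) -> SetScores:
    """Set-based precision, recall and F1 of predicted function IDs."""
    tp = len(predicted & truth)
    precision = tp / len(predicted) if predicted else 0.0
    recall = tp / len(truth) if truth else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return SetScores(precision, recall, f1)


# Placeholder identifiers and pipeline names (hypothetical).
truth = {"UniRef90_A", "UniRef90_B", "UniRef90_C", "UniRef90_D"}
outputs: Dict[str, Set[str]] = {
    "pipeline_x": {"UniRef90_A", "UniRef90_B", "UniRef90_E"},
    "pipeline_y": {"UniRef90_A"},
}
scores = {name: score_functions(pred, truth) for name, pred in outputs.items()}
best = max(scores, key=lambda name: scores[name].f1)
for name, s in scores.items():
    print(f"{name}: precision={s.precision:.2f} recall={s.recall:.2f} f1={s.f1:.2f}")
print("closest to ground truth:", best)
```

F1 is used here only to rank the two placeholder pipelines; an evaluation that cares about relative abundance, not just detection, would need an abundance-aware measure instead.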

URI

https://hdl.handle.net/1842/40971
http://dx.doi.org/10.7488/era/3722

This item appears in the following Collection(s)

Royal (Dick) School of Veterinary Studies thesis and dissertation collection