Using (meta)genomic approaches to improve the accuracy of rumen microbiome analysis
Authors
Smith, Rebecca Hannah
Abstract
The rumen is home to a rich microbiota that metabolises the lignocellulosic
feed ingested by the animal, and produces short-chain fatty acids (SCFA)
that can be used by the host animal for growth. The rumen microbiota is
critical to agriculture and food security, directly contributing to the production
of meat and milk.
Culturing has provided some insight into the microbes that live in the rumen,
but the majority of the rumen microbiota have yet to be cultured. Culture-independent molecular approaches such as shotgun metagenomics and 16S
rRNA gene profiling have therefore become appealing additional methods for
studying the rumen microbiome. Such analyses rely on reference information
available in databases, which are traditionally populated by cultured
microbes. This poses an inherent bias, as taxa that have been cultured will
be over-represented in reference databases.
Firstly, this thesis examines the impact of reference database choice on the
results of taxonomic classification of rumen metagenomic data. To measure
classification accuracy at the read level, ground-truth data were simulated
from known rumen isolate genomes (from the Hungate1000 project). This
study demonstrated that the choice of reference database strongly
affected classification results and that, for the rumen specifically, the accuracy of
classification depended on representation of microbes from this environment
in the reference database. The use of custom reference databases that
contained culture-derived genomes from the rumen increased classification
rate and accuracy at all taxonomic levels. When metagenome-assembled genomes
from uncultured rumen microbes (rumen MAGs) were included in reference databases,
the classification rate improved, but classification accuracy improved only
slightly because of incomplete and informal taxonomy labels. Importantly,
this work highlights that standard, widely used reference databases
classified rumen data with poor accuracy, and suggests that custom reference
databases are needed to substantially improve classification accuracy.
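The distinction drawn above between classification rate and classification accuracy can be illustrated with a minimal sketch. It assumes that rate is the fraction of simulated reads given any label and that accuracy is the fraction of labelled reads whose label matches the ground truth; these definitions, the function name `classification_metrics`, and the toy read labels are illustrative assumptions, not the method or data used in the thesis.

```python
from typing import Dict, Optional, Tuple


def classification_metrics(
    truth: Dict[str, str],
    predicted: Dict[str, Optional[str]],
) -> Tuple[float, float]:
    """Return (classification rate, accuracy among classified reads).

    Assumed definitions (not taken from the thesis):
      rate     = classified reads / all simulated reads
      accuracy = correctly labelled reads / classified reads
    """
    total = len(truth)
    classified = 0
    correct = 0
    for read_id, true_taxon in truth.items():
        label = predicted.get(read_id)  # None means the read was left unclassified
        if label is None:
            continue
        classified += 1
        if label == true_taxon:
            correct += 1
    rate = classified / total if total else 0.0
    accuracy = correct / classified if classified else 0.0
    return rate, accuracy


# Toy, invented example: four simulated reads with known genus of origin.
truth = {"r1": "Prevotella", "r2": "Prevotella", "r3": "Butyrivibrio", "r4": "Ruminococcus"}
predicted = {"r1": "Prevotella", "r2": "Bacteroides", "r3": None, "r4": "Ruminococcus"}

rate, accuracy = classification_metrics(truth, predicted)
print(f"classification rate: {rate:.2f}, accuracy: {accuracy:.2f}")
```

Under these assumed definitions, a database can raise the rate (more reads labelled) without raising accuracy if the added labels are incomplete or inconsistent, which is the pattern described for the MAG-containing databases.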
To further explore the use of MAGs as representative genomes of uncultured
species, cultured and uncultured rumen genomes were then compared to
identify any differences that could limit the usefulness of MAGs.
Bacteria from rumen samples were cultured and isolate genomes
sequenced. These cultured genomes were then phylogenetically clustered
with rumen MAGs, to create genome pairs consisting of a cultured genome
and MAG that were thought to belong to the same microbial strain. The
presence of certain functions relevant to the rumen environment was
compared for all genome pairs. For all functions relating to nitrogen
metabolism, and SCFA and alcohol conversions, the presence or absence of
relevant gene pathways was observed to be the same (i.e. they were present
or absent for both genomes in the pair) for all seven genome pairs.
Carbohydrate-active enzymes (CAZymes) showed more variation, with only
three of the genome pairs having identical predictions of functions being
present or absent. This work also suggested an association between the
species and how similar the two genomes in a pair (culture-derived genome and MAG)
are to one another. In particular, it would seem that there is less variation
between a culture-derived genome and a MAG for species that have a
relatively closed pangenome, and more variation for species with a more
open pangenome.
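The presence/absence comparison described above can be sketched as a simple agreement check for one genome pair. The function name `function_agreement`, the function labels, and the toy gene content are hypothetical and invented for illustration; the thesis does not specify how the comparison was implemented.

```python
from typing import Dict, Iterable, List, Set


def function_agreement(
    cultured: Set[str],
    mag: Set[str],
    functions: Iterable[str],
) -> Dict[str, bool]:
    """Map each function to True when both genomes agree on it,
    i.e. it is present in both or absent from both."""
    return {f: (f in cultured) == (f in mag) for f in functions}


# Toy, invented data for a single genome pair (not results from the thesis).
functions: List[str] = ["nitrate_reduction", "butyrate_production", "GH5", "GH13"]
cultured_genome = {"butyrate_production", "GH5", "GH13"}
mag_genome = {"butyrate_production", "GH13"}

agreement = function_agreement(cultured_genome, mag_genome, functions)
discordant = [f for f, same in agreement.items() if not same]
print("discordant functions:", discordant)
```

A pair counts as identical for a functional category only when the list of discordant functions in that category is empty.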
The microbiome field is moving towards high-throughput methods,
accompanied by new approaches that need to be evaluated for their
suitability and accuracy. This work concludes that MAGs may be useful as
reference genomes for as-yet uncultured microbes, and highlights factors
that may make a MAG more or less suitable as an accurate and
representative genome.
Lastly, this thesis presents a project that evaluated the suitability of published
pipelines for the functional classification of metagenomic data in an industry
setting. Two pipelines, HUMAnN3 and Carnelian, were chosen based on the
needs of the business. Two simulated metagenomic datasets, one generated
from microbes that are members of the human gut microbiome and one from
members of the rumen microbiome, were annotated with functional
information from UniRef90/UniProt90 to create a ground truth. The datasets
were then classified by each pipeline, and the functional classification results
were compared with the ground-truth annotations of each dataset to assess
accuracy. This work found that the functional classifications produced by
HUMAnN3 most closely matched the ground-truth annotations. Building on the work
presented in Chapter 2, this work highlights issues of reference database
bias when classifying microbial function. This project concluded that the
HUMAnN3 pipeline was the most suitable for the business to incorporate into
their offered services.
Overall, this thesis investigated the accuracy of popular methods to classify
the taxonomy and function of metagenomic data. It revealed that the limited
number of rumen microbial reference genomes is likely to be a major issue in
the rumen microbiome field of research, significantly reducing the accuracy of
classification. Furthermore, this work demonstrates that MAGs can resemble
culture-derived genomes. Given the pressing need for ruminal reference
genomes, the use of MAGs as representative reference genomes for
uncultured microbes has the potential to revolutionise the rumen microbiome
field. However, robust classification relies on consistent and accurate
taxonomic labelling, including that of reference genomes regardless of
whether they are culture-derived or metagenome-derived. Continuing
improvement in reference databases is required to ensure accurate and
valuable insights into the critically important, yet incompletely understood,
rumen environment.