Is mutational meltdown a threat to the mega diverse genus Begonia?
Date
19/09/2023
Author
Michel, Thibauld
Abstract
Begonia is one of the most species-rich angiosperm genera, studied for its rapid
species radiation in tropical regions, and high morphological diversity. Typical
populations are isolated and many display characteristics of narrow endemism.
Endemic populations are prone to inbreeding and vulnerable to anthropogenic
disturbance, while being isolated and difficult to access for population size
estimation. For these rare species, herbarium specimens are the most accessible
material available, even though only a few specimens are typically collected from
any single population.
We have developed a pipeline to use genomic data recovered from a single
herbarium specimen to estimate the degree of inbreeding and the demographic
history of the population. This pipeline has been designed to process low-coverage
ancient DNA datasets from non-model organisms and assess the inbreeding
coefficient using several genomic homozygosity estimators.
The pipeline integrates several tools to manage ancient DNA (aDNA) damage
patterns, duplicated genes, and problematic baits, and to determine homozygosity
patterns in fresh and historical specimens.
The pipeline includes mapDamage, a tool that quantifies G to A and C to T
nucleotide substitutions in the data and recalibrates the quality scores of the
alignment files, minimizing the bias caused by aDNA damage patterns.
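As a rough illustration of what quantifying damage-type substitutions involves, the sketch below tallies reference-to-read mismatches over gap-free aligned sequence pairs and reports the share attributable to the deamination signature commonly reported for aDNA (C>T and its reverse-complement G>A). It is not mapDamage's implementation; the function names and the toy sequences are assumptions made for this example.

```python
from collections import Counter


def count_substitutions(pairs):
    """Count mismatches, keyed as 'REF>READ', over gap-free aligned pairs.

    `pairs` is an iterable of (reference, read) strings of equal length.
    Positions involving anything other than A, C, G or T are ignored.
    """
    counts = Counter()
    for ref, read in pairs:
        if len(ref) != len(read):
            raise ValueError("aligned sequences must have the same length")
        for r, q in zip(ref.upper(), read.upper()):
            if r != q and r in "ACGT" and q in "ACGT":
                counts[f"{r}>{q}"] += 1
    return counts


def damage_fraction(counts):
    """Fraction of all mismatches that are C>T or G>A (0.0 if none)."""
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return (counts["C>T"] + counts["G>A"]) / total


# Toy (hypothetical) alignments, not real Begonia data.
toy_pairs = [("ACGTC", "ATGTT"), ("GGCA", "AGCA"), ("AAAA", "AACA")]
toy_counts = count_substitutions(toy_pairs)
toy_fraction = damage_fraction(toy_counts)
```

A high damage fraction concentrated at read ends is the kind of signal that motivates adjusting base qualities before variant calling.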
Target capture baits matching multiple regions of the genome have been
identified, characterised, and removed from the analysis to prevent incorrect
variant calls downstream.
Many paralogous genes are found in Begonia genomes due to an early whole
genome duplication event in the history of the genus. As this can introduce a bias
in the variant calling step of the pipeline, we have implemented a step to detect
baits capturing sequences from paralogous genes in our analysis. Three methods
have been considered for this: deviation from the genotype frequencies expected
in a mapping population, detection of an unexpected level of heterozygosity
(HDplot tool), or detection of multiple contigs aligning to the same bait (HybPiper
pipeline). This analysis used genome skims from a mapping population to test
the approaches. The three methods showed low overlap in the baits they flagged
as capturing paralogs, with only 73 baits detected by all three.
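The comparison between detection methods comes down to set operations on the bait identifiers each method flags. A minimal sketch, assuming each method's output has already been reduced to a list of bait IDs; the method labels and bait names below are hypothetical placeholders, not results from the study.

```python
def overlap_summary(method_hits):
    """Summarise agreement between paralog-detection methods.

    `method_hits` maps a method name to an iterable of flagged bait IDs.
    Returns the baits flagged by every method, by at least one method,
    and by exactly one method (per method).
    """
    if not method_hits:
        raise ValueError("at least one method is required")
    sets = {name: set(ids) for name, ids in method_hits.items()}
    flagged_by_all = set.intersection(*sets.values())
    flagged_by_any = set.union(*sets.values())
    unique = {}
    for name, ids in sets.items():
        # Baits flagged by the other methods, excluding the current one.
        others = set().union(*(s for n, s in sets.items() if n != name))
        unique[name] = ids - others
    return {"all": flagged_by_all, "any": flagged_by_any, "unique": unique}


# Hypothetical bait IDs for illustration only.
toy_hits = {
    "genotype_frequencies": ["bait01", "bait02", "bait03"],
    "hdplot": ["bait02", "bait03", "bait04"],
    "hybpiper": ["bait03", "bait05"],
}
toy_overlap = overlap_summary(toy_hits)
```

Whether to discard the union or only the intersection of flagged baits is a trade-off between retaining data and tolerating paralogs; the sketch reports both so that either choice can be made explicitly.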
Historical herbarium specimens from a single population are scarce, and for any
given time point only a few specimens can be expected to be available for
analysis. In many cases, a single specimen is all that is available to represent
the whole population. Therefore, rather than using inbreeding coefficients based
on allele frequencies, we use Runs of Homozygosity (ROH) to estimate inbreeding,
which requires only a single sample. To measure ROH with Hyb-Seq data, we needed
to know which parts of the genome the Begonia baits capture contiguously. The
length of genome captured by the bait set was calculated for the four most
complete Begonia genomes available to determine the length of the syntenic
regions that can be captured.
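One way to picture the captured-length calculation is to merge bait hits that lie close together on the same scaffold into contiguous blocks and sum the block lengths. The sketch below does this under stated assumptions: half-open (BED-style) coordinates, and a hypothetical `max_gap` parameter for how far apart two hits may be while still counting as contiguous. Neither is taken from the thesis.

```python
def merge_bait_hits(hits, max_gap=0):
    """Merge (scaffold, start, end) bait hits into contiguous blocks.

    Coordinates are assumed half-open. Two hits on the same scaffold are
    merged when the gap between them is at most `max_gap` bases.
    """
    blocks = []
    for scaffold, start, end in sorted(hits):
        last = blocks[-1] if blocks else None
        if last and last[0] == scaffold and start - last[2] <= max_gap:
            last[2] = max(last[2], end)  # extend the current block
        else:
            blocks.append([scaffold, start, end])
    return [tuple(block) for block in blocks]


def captured_length(blocks):
    """Total number of bases spanned by the merged blocks."""
    return sum(end - start for _, start, end in blocks)


# Hypothetical bait coordinates for illustration only.
toy_bait_hits = [
    ("scaffold1", 100, 220),
    ("scaffold1", 200, 320),
    ("scaffold1", 5000, 5120),
    ("scaffold2", 10, 130),
]
toy_blocks = merge_bait_hits(toy_bait_hits, max_gap=50)
toy_length = captured_length(toy_blocks)
```

The choice of `max_gap` directly bounds the longest ROH that can be observed, so it would need to be justified against the bait design and the genome assembly in use.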
This was a key step in establishing the last part of the pipeline, which
calculates the size of ROH. We used PLINK to detect and quantify ROHs from VCF
files produced by variant calling. The estimators derived are the total length of
ROH in the dataset (SROH), the total number of ROH in the dataset (NROH), and
the frequency of ROH for each sample (FROH). Plotting SROH against NROH on a
scatter plot provides an estimate of the relative size of the population and
gives clues about admixture with another population, a bottleneck event, or
consanguinity. The FROH estimator is less informative but scales linearly with
the population size estimated from the NROH/SROH plot. It has been used to study
the biogeography of the specimens and has been mapped onto their phylogenetic
reconstruction to investigate the patterns of homozygosity.
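The three estimators can be derived from a per-sample list of ROH segment lengths, such as the per-segment lengths reported by a PLINK `--homozyg` run. The sketch below is illustrative only: it assumes FROH is computed as SROH divided by the length of genome that can be surveyed (a common convention, which may differ from the definition used in the thesis), and `min_kb` is a hypothetical filter parameter.

```python
def roh_summary(segments_kb, captured_kb, min_kb=0.0):
    """Compute NROH, SROH and FROH for one sample.

    segments_kb: lengths of the sample's ROH segments, in kilobases.
    captured_kb: length of genome that can be surveyed, in kilobases.
    min_kb:      segments shorter than this are ignored (assumed filter).
    """
    if captured_kb <= 0:
        raise ValueError("captured_kb must be positive")
    kept = [length for length in segments_kb if length >= min_kb]
    sroh = sum(kept)
    return {
        "NROH": len(kept),            # number of ROH segments
        "SROH": sroh,                 # total ROH length in kb
        "FROH": sroh / captured_kb,   # share of surveyed genome in ROH
    }


# Hypothetical segment lengths for illustration only.
toy_summary = roh_summary([100.0, 250.0, 50.0], captured_kb=4000.0)
toy_filtered = roh_summary([100.0, 250.0, 50.0], captured_kb=4000.0, min_kb=100.0)
```

Plotting `SROH` against `NROH` for every sample then gives the scatter plot described above: many short segments and a few long segments fall in different regions of the plot, which is what allows different demographic histories to be told apart.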
We have analysed two sets of target-capture data with the pipeline, one with
Arabian Begonia, and the second with Begonia from Papua New Guinea.
The first set comprises 43 Arabian Begonia specimens from the Socotran
archipelago, including the species B. socotrana and B. samhaensis, and combines
silica-dried and herbarium-dried historical specimens. Examination of
the Hyb-Seq Socotran dataset revealed uneven coverage across the baits. This
capture was used to show the limitations of the pipeline, as phylogenetic
reconstruction was not successful beyond species level and the ROH
estimates were not significant.
The second set of target capture data included 160 samples from the New
Guinea Highlands, from silica-dried and herbarium-dried historical specimens. In
the pipeline output, 10 specimens showed high homozygosity levels indicating
a bottleneck in their demographic history, 3 outliers were suspected to be inbred,
60 were found to be from a large population or to show introgression, and 87
did not display sufficiently significant homozygosity patterns and were filtered
out by the pipeline. Mapping FROH metrics to the phylogeny shows a group within
section Petermannia with consistently high homozygosity levels. Biogeographical
analysis of the distribution of the samples did not reveal any clear relation between
patterns of homozygosity and geographic location of the populations sampled.
The data analysis has revealed a higher genetic diversity than expected in the
Papua New Guinea Begonia collected and has given clues about the origin of
the homozygosity patterns observed, which seem more closely related to
phylogenetic relationships than to microevolution at the population level.