Edinburgh Research Archive

To cut a short story long: development of full-length RNA-sequencing approaches to resolve transcript-level expression dynamics in Atlantic salmon

Authors

Eve, Oliver T. H.

Abstract

Atlantic salmon is a finfish of significant cultural, ecological and commercial importance, representing the United Kingdom’s main aquaculture species. There is currently a major opportunity to apply genomics to improve the sustainability, efficiency and welfare of the aquaculture sector. This includes a drive to perform functional annotation of genomes to identify genes and other elements that shape the traits of aquaculture species. The AQUA-FAANG consortium aimed to produce comprehensive functional annotations for six key aquaculture species, including Atlantic salmon. The work in this thesis was carried out under the AQUA-FAANG umbrella. Long-read RNA-sequencing (RNA-seq) technologies are powerful tools for functional annotation of gene expression, with great scope to resolve complex transcript variants that cannot be accurately assessed using traditional short-read methods. However, long-read RNA-seq is yet to be applied and benchmarked in many aquaculture species. As such, my work aimed to develop a robust full-length RNA-seq method in Atlantic salmon using long-read technology to examine transcriptional diversity and conduct expression analyses resolved to individual transcript variants. My work focused on two distinct study systems in which extensive transcriptional regulation occurs: 1) embryogenesis, the stage of ontogeny where the adult body plan is established, and 2) immune function in response to acute viral and bacterial stimulation, with the aim of improving understanding of innate immune function. I developed a full-length RNA-seq method using the Oxford Nanopore Technologies platform, which involved optimising total RNA extraction and mRNA isolation protocols, as well as cDNA library generation and subsequent sequencing on the PromethION device. A custom transcriptome assembly pipeline was optimised to generate the first nanopore-based long-read transcriptome for Atlantic salmon, which was used as the reference for further analyses reported in this thesis.
The long-read transcriptome consisted of 266,222 transcripts and 35,480 genes, with a transcript-to-gene ratio of 7.50 in comparison with 2.65 in the current Ensembl reference annotation (Ssal_v3.1). Furthermore, 60% of transcript models were deemed to contain a novel splice site, indicating that my full-length RNA-seq method captured extensive novel transcript diversity not annotated in the current reference assembly. To examine transcript expression dynamics in response to viral and bacterial infection, I developed a differential transcript expression and usage analysis workflow, adapting existing bioinformatic tools. My analysis captured complex dynamics of alternative transcript expression for antiviral and antibacterial genes involved in the interferon-JAK/STAT pathway and proinflammatory responses, respectively. A novel fusion transcript between pctk2 and an undescribed locus containing a FIP2-like coding sequence was found to be upregulated in response to both viral and bacterial stimulation. A separate pipeline was developed to assess transcript expression during embryogenesis using a complex timecourse design that sampled embryos at six stages (from blastulation to the late-eyed stage). Using a dimensionality reduction technique called self-organising maps (SOM), combined with a generalised linear model and quasi-likelihood F-test method, I optimised a differential transcript expression workflow and developed an approach to examine differential transcript usage across developmental stages.
This resulted in a comprehensive description of transcript expression throughout early development and the discovery of alternative transcript usage events within individual genes. These included an exon-chaining event in the coding sequence of slc25a3b (mitochondrial phosphate carrier, PiC), which caused expression of unique isoforms at the blastulation and late-eyed stages of development, whilst a 5’ UTR difference in the tagl gene led to different isoforms being expressed during blastulation and somitogenesis. The full-length sequencing method captured many mono-exonic (intronless) gene and transcript models not present in the reference annotation. Over a third of these models were found to contain a complete or partial ORF, indicating that they are protein-coding, whilst approximately 25% of mono-exonic transcripts were found to overlap repetitive regions. Additionally, I identified a previously undescribed retrogene family that is widespread throughout the genome. Overall, this thesis reports approaches for robust full-length RNA-seq analysis in a non-model species with a complex genome. This work has furthered our understanding of the transcript-level expression dynamics underpinning early development and immune function in Atlantic salmon, with possible applications in aquaculture research.