Genome-scale transcriptomic and epigenomic analysis of stem cells
Authors
Halbritter, Florian
Abstract
Embryonic stem cells (ESCs) are a special type of cell marked by two key properties: the
capacity to create an unlimited number of identical copies of themselves (self-renewal) and the
ability to give rise to differentiated progeny that can contribute to all tissues of the adult body
(pluripotency). Decades of past research have identified many of the genetic determinants of
the state of these cells, such as the transcription factors Pou5f1, Sox2 and Nanog. Many other
transcription factors and, more recently, epigenetic determinants like histone modifications,
have been implicated in the establishment, maintenance and loss of pluripotent stem cell
identity.
The study of these regulators has been boosted by technological advances in the field of
high-throughput sequencing (HTS) that have made it possible to investigate the binding and
modification of many proteins on a genome-wide level, resulting in an explosion of the amount
of genomic data available to researchers. The challenge is now to effectively use these data
and to integrate the manifold measurements into coherent and intelligible models that will
actually help to better understand the way in which gene expression in stem cells is regulated
to maintain their precarious identity.
In this thesis, I first explore the potential of HTS by describing two pilot studies using
the technology to investigate global differences in the transcriptional profiles of different cell
populations. In both cases, I was able to identify a number of promising candidates that mark
and, possibly, explain the phenotypic and functional differences between the cells studied.
The pilot studies highlighted a strong requirement for specialised software to deal with
the analysis of HTS data. I have developed GeneProf, a powerful computational framework
for the integrated analysis of functional genomics experiments. This software platform solves
many recurring data analysis challenges and streamlines, simplifies and standardises data analysis
workflows, promoting transparent and reproducible methodologies. The software offers a
graphical, user-friendly interface and integrates expert knowledge to guide researchers through
the analysis process. All primary analysis results are supplemented with a range of informative
plots and summaries that ease the interpretation of the results. Behind the scenes, computationally
demanding tasks are handled remotely on a distributed network of high-performance
computers, removing rate-limiting requirements on local hardware set-up. A flexible and modular
software design lays the foundations for a scalable and extensible framework that will be
expanded to address an even wider range of data analysis tasks in future.
Using GeneProf, billions of data points from over a hundred published studies have been
re-analysed. The results of these analyses are stored in a web-accessible database as part
of the GeneProf system, building up an accessible resource for all life scientists. All results,
together with details about the analysis procedures used, can be browsed and examined in
detail and all final and intermediate results are available and can instantly be reused and
compared with new findings.
In an attempt to elucidate the regulatory mechanisms of ESCs, I use this knowledge base
to identify high-confidence candidate genes relevant to stem cell characteristics by comparing
the transcriptional profiles of ESCs with those of other cell types. Doing so, I describe 229
genes with highly ESC-specific transcription. I then integrate the expression data for these
ESC-specific genes with genome-wide transcription factor binding and histone modification data.
After investigating the global characteristics of these "regulatory inputs", I employ machine
learning methods to first cluster subgroups of genes with ESC-specific expression patterns and
then to define a "regulatory code" that marks one of the subgroups based on their regulatory
signatures.
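The first step above, comparing the transcriptional profile of ESCs with those of other cell types, can be illustrated with a minimal sketch. The abstract does not state which specificity measure the thesis uses, so the z-score-like measure below is an assumption made purely for illustration. The function names, the threshold, the cell type labels, the control gene Actb and all expression values are likewise hypothetical and are not taken from GeneProf or from the thesis.

```python
from statistics import mean, pstdev


def esc_specificity(expression, esc_label="ESC"):
    """Score how strongly each gene is expressed in ESCs relative to other cell types.

    expression maps gene -> {cell_type: expression value}.
    The score is (ESC value - mean of the others) / spread of the others.
    This measure is an illustrative assumption, not the thesis's method.
    """
    scores = {}
    for gene, by_type in expression.items():
        others = [value for cell_type, value in by_type.items() if cell_type != esc_label]
        spread = pstdev(others)
        # Avoid division by zero when all non-ESC values are identical.
        denominator = spread if spread > 0 else 1.0
        scores[gene] = (by_type[esc_label] - mean(others)) / denominator
    return scores


def select_specific(scores, threshold=2.0):
    """Return the genes whose score reaches the (hypothetical) threshold, sorted by name."""
    return sorted(gene for gene, score in scores.items() if score >= threshold)


# Made-up toy values for illustration only; these are not measurements.
toy_expression = {
    "Nanog": {"ESC": 10.0, "fibroblast": 1.0, "neuron": 1.5, "liver": 0.5},
    "Actb": {"ESC": 9.0, "fibroblast": 9.5, "neuron": 8.5, "liver": 9.0},
}

toy_scores = esc_specificity(toy_expression)
specific_genes = select_specific(toy_scores)
print(specific_genes)
```

In this toy setting only the gene that is high in ESCs and low elsewhere passes the threshold, while the uniformly expressed control gene scores zero. A real analysis would also need normalisation across experiments and some handling of replicates, neither of which is sketched here.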
The tightly co-regulated core cluster of genes identified in this analysis contains many
known members of the transcriptional circuitry of ESCs and a number of novel candidates
that I deem worthy of further investigation thanks to their similarity to their better known
counterparts. Integrating these candidates and the regulatory code that drives them into our
models of the workings of ESCs might eventually help to refine the ways in which we derive,
culture and manipulate these cells, with all the prospective benefits for research and medicine.