Large-scale analysis of microarray data to identify molecular signatures of mouse pluripotent stem cells
Date
08/07/2018
Author
McGlinchey, Aidan James
Abstract
Publicly available microarray data constitutes a huge resource for researchers in biological
science. A wealth of microarray data is available for the mouse as a model organism. Pluripotent
embryonic stem (ES) cells are able to give rise to all of the adult tissues of the organism and, as
such, are much-studied for their myriad applications in regenerative medicine. Fully differentiated,
somatic cells can also be reprogrammed to pluripotency to give induced pluripotent stem cells
(iPSCs). ES cells progress through a range of cellular states, from ground state pluripotency,
through the primed state ready for differentiation, to differentiation itself.
Microarray data available in public, online repositories is annotated with several important
fields, although this accompanying annotation often contains issues which can limit its usefulness
for human and/or programmatic interpretation in downstream analysis. This thesis assembles and
makes available to the research community the largest-to-date pluripotent mouse ES cell (mESC)
microarray dataset and details the manual annotation of those samples for several key fields to
allow further investigation of the pluripotent state in mESCs.
Microarray samples from a given laboratory or experiment are known to be similar to each
other due to batch effects. The same has been postulated about samples which use the same cell
line. This work therefore precedes the investigation of transcriptional events in mESCs with an
investigation into whether a sample's cell line or source laboratory is a greater contributor to the
similarity between samples in this collected pluripotent mESC dataset using a method employing
Random Submatrix Total Variability, and so named RaSToVa. Further, an extension of the same
permutation and analysis method is developed to enable Discovery of Annotation-Linked Gene
Expression Signatures (DALGES), and this is applied to the gathered data to provide the first large-scale
analysis of transcriptional profiles and biological pathway activity of three commonly used
mESC cell lines and a selection of iPSC samples, seeking insight into potential biological
differences between them.
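
As a rough illustration of the kind of question asked above (are samples that share an annotation, such as cell line or source laboratory, more alike than arbitrary samples?), the following is a minimal sketch of a generic permutation test on total variability. It is not the thesis's RaSToVa or DALGES implementation, whose details are not given in this abstract. The function names, the use of summed per-gene variance as "total variability", and the toy data are all assumptions made for illustration.

```python
import random
from statistics import pvariance


def total_variability(rows):
    """Sum of per-gene population variances across the given samples.

    `rows` is a list of samples, each a list of expression values
    (one value per gene). This definition is an assumption for the sketch.
    """
    genes = zip(*rows)  # transpose: one tuple of values per gene
    return sum(pvariance(g) for g in genes)


def permutation_p_value(expr, labels, label, n_perm=1000, seed=0):
    """Compare the samples carrying `label` against random sample subsets.

    Returns the observed total variability of the labelled samples and the
    fraction of random same-size subsets that are at least as tight.
    """
    rng = random.Random(seed)
    idx = [i for i, lab in enumerate(labels) if lab == label]
    observed = total_variability([expr[i] for i in idx])
    hits = 0
    for _ in range(n_perm):
        subset = rng.sample(range(len(expr)), len(idx))
        if total_variability([expr[i] for i in subset]) <= observed:
            hits += 1
    # Add-one correction so the estimated p-value is never exactly zero.
    return observed, (hits + 1) / (n_perm + 1)


# Toy data (hypothetical): six samples, two genes, two annotation groups.
expr = [[1.0, 1.0], [1.1, 0.9], [0.9, 1.1],
        [5.0, 5.0], [5.1, 4.9], [4.9, 5.1]]
labels = ["A", "A", "A", "B", "B", "B"]
observed, p = permutation_p_value(expr, labels, "A")
```

Running the same comparison once with cell-line labels and once with laboratory labels would give two sets of values that could be contrasted, under the assumptions stated above.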
This work then goes on to re-order the pluripotent mESC data by markers of known
pluripotency states, from ground state pluripotency through primed pluripotency to earliest
differentiation, and analyses changes in gene expression and biological pathway activity across this
spectrum using differential expression and a window-scanning approach. This analysis seeks to
recapitulate transcriptional patterns known to occur in mESCs and reveals the existence of putative
“early” and “late” naïve pluripotent states, thereby identifying several lines of enquiry for
in-laboratory investigation.
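
The window-scanning idea mentioned above can be pictured with a small sketch: samples are ordered by a marker score, and a fixed-size window slides along that ordering, summarising one gene's expression at each position. This is only an illustrative assumption about how such a scan might look, not the procedure used in the thesis. The function name, window size, step and toy values are hypothetical.

```python
def window_scan(marker_scores, gene_values, window=3, step=1):
    """Mean expression of one gene in sliding windows over ordered samples.

    Samples are sorted by `marker_scores` (ascending), then a window of
    `window` samples moves along the ordering in increments of `step`.
    """
    order = sorted(range(len(marker_scores)), key=marker_scores.__getitem__)
    ordered = [gene_values[i] for i in order]
    means = []
    for start in range(0, len(ordered) - window + 1, step):
        chunk = ordered[start:start + window]
        means.append(sum(chunk) / window)
    return means


# Toy example (hypothetical values): five samples, one gene.
scores = [3, 1, 2, 5, 4]
values = [30, 10, 20, 50, 40]
print(window_scan(scores, values))  # [20.0, 30.0, 40.0]
```

A trend in the window means along the ordering would suggest that the gene's expression changes across the marker-defined spectrum. Comparing adjacent windows is one simple way such changes might be located.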