Dynamics of bivalent chromatin during development in mammals
Mammalian cell types and tissues have diverse functional roles within an organism, but they can be derived through the differentiation of embryonic stem cells (ESCs). ESCs are pluripotent cells with self-renewal properties. During development, subsets of genes in ESCs are activated or silenced to establish cell-type-specific functions. Gene expression changes occur transiently in early developmental stages, driven by signals that are received and executed by a variety of transcription factors (TFs), regulatory elements (promoters, enhancers) and epigenetic modifications of chromatin. Post-translational modifications of histone tails are regulated by chromatin modifiers and reshape chromatin architecture. Polycomb group (PcG) and Trithorax group (TrxG) proteins are the most commonly studied histone modifiers. They were first discovered in Drosophila as repressors (H3K27me3) and activators (H3K4me3), respectively, of Homeobox (Hox) genes, and they are conserved in mammals. Bivalent chromatin is defined as the simultaneous presence of silencing (H3K27me3) and activating (H3K4me3) histone marks and was first discovered as a feature of many developmental gene promoters in ESCs. Bivalent promoters are thought to be ‘poised’ for later activation or repression during differentiation, owing to the presence of the two counteracting histone modifications and a paused variant of RNA polymerase II (RNAPII), accompanied by low to intermediate levels of expression. By integrative analysis of publicly available ChIP sequencing (ChIP-seq) datasets in murine and human ESCs, we predicted 3,659 and 4,979 high-confidence (HC) bivalent promoters in mouse and human ESCs, respectively. Using a peak-based method, we obtained a set of bivalent promoters highly enriched for developmental regulators. Over 85% of Polycomb targets were bivalent, and their expression was particularly sensitive to TF perturbation.
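The peak-based idea of calling a promoter bivalent can be illustrated with a minimal sketch, assuming one already has promoter windows and peak calls for each mark as half-open (chromosome, start, end) intervals. The function names, toy coordinates and the simple "any overlap" rule are hypothetical illustrations; the abstract does not specify the promoter window, the peak caller, or the additional filters used to define the high-confidence sets.

```python
from typing import Dict, List, Tuple

# (chromosome, start, end) with half-open coordinates [start, end)
Interval = Tuple[str, int, int]


def overlaps(a: Interval, b: Interval) -> bool:
    """True if two half-open intervals on the same chromosome share at least one base."""
    return a[0] == b[0] and a[1] < b[2] and b[1] < a[2]


def classify_promoters(promoters: Dict[str, Interval],
                       k4_peaks: List[Interval],
                       k27_peaks: List[Interval]) -> Dict[str, str]:
    """Hypothetical rule: label each promoter by which histone-mark peaks overlap it."""
    labels = {}
    for gene, prom in promoters.items():
        has_k4 = any(overlaps(prom, p) for p in k4_peaks)    # H3K4me3 peak present?
        has_k27 = any(overlaps(prom, p) for p in k27_peaks)  # H3K27me3 peak present?
        if has_k4 and has_k27:
            labels[gene] = "bivalent"
        elif has_k4:
            labels[gene] = "active"
        elif has_k27:
            labels[gene] = "H3K27me3-only"
        else:
            labels[gene] = "unmarked"
    return labels


# Hypothetical toy data (gene names and coordinates are made up).
promoters = {
    "GeneA": ("chr1", 1000, 3000),
    "GeneB": ("chr1", 10000, 12000),
    "GeneC": ("chr2", 500, 2500),
    "GeneD": ("chr2", 8000, 9000),
}
k4_peaks = [("chr1", 1500, 2500), ("chr1", 10500, 11000)]
k27_peaks = [("chr1", 2000, 4000), ("chr2", 1000, 1500)]

labels = classify_promoters(promoters, k4_peaks, k27_peaks)
for gene, label in labels.items():
    print(gene, label)
```

The linear scan keeps the logic readable but costs one comparison per promoter-peak pair; for genome-wide peak sets, a sorted sweep or interval tree (as used by tools such as BEDTools `intersect`) would be the usual choice.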
Moreover, murine HC bivalent promoters were occupied by both Polycomb repressive complexes (PRC1 and PRC2) and grouped into four distinct clusters with different biological functions. HC bivalent and active promoters were CpG-rich, whereas H3K27me3-only promoters lacked CpG islands. Binding enrichment of distinct sets of regulators distinguished bivalent from active promoters, and a ‘TCCCC’ sequence motif was specifically enriched in bivalent promoters. Using single-cell RNA sequencing (scRNA-seq), we examined gene expression heterogeneity and how it may affect the outcome of differentiation. We collected single-cell gene expression profiles for 32 human and 39 murine ESCs and studied the correlation between gene characteristics, such as network connectivity, and the coefficient of variation (CV) of expression across single cells. We further characterized properties unique to genes with high CV. Highly expressed genes tended to have a low CV and were enriched for cell cycle genes. In contrast, high-CV genes were co-expressed with other high-CV genes, were enriched for bivalent promoters and showed enrichment for response to DNA damage and DNA repair. According to Polycomb occupancy and three RNAPII variants, bivalent promoters in ESCs fell into four distinct classes with different biological functions. To study the dynamics of epigenetic and transcriptional control at promoters during development, we collected ChIP-seq data for two chromatin modifications (H3K4me3 and H3K27me3) and RNAPII (8WG16 antibody), as well as expression data (RNA-seq), across eight mouse cell types (ESCs and seven committed cell types). Hierarchical clustering of 22,179 unique gene promoters across cell types showed that H3K4me3 peaks were in agreement with the expression data, whereas H3K27me3 and RNAPII peaks were not highly consistent with the hierarchical tree of gene expression.
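The coefficient of variation is the standard deviation of a gene's expression across cells divided by its mean. A minimal sketch follows, assuming a dictionary that maps hypothetical gene names to per-cell expression values and using the sample standard deviation; the abstract does not say whether values were normalized or log-transformed, or how genes with zero mean were handled, so those choices here are assumptions.

```python
import math
import statistics
from typing import Dict, List, Tuple


def coefficient_of_variation(values: List[float]) -> float:
    """CV = sample standard deviation / mean (assumption: sample, not population, SD).

    Returns nan when the mean is zero, because the ratio is undefined.
    """
    mean = statistics.mean(values)
    if mean == 0:
        return float("nan")
    return statistics.stdev(values) / mean


def rank_by_cv(expr: Dict[str, List[float]]) -> List[Tuple[str, float]]:
    """Return (gene, CV) pairs sorted from highest to lowest CV, skipping undefined CVs."""
    scored = [(gene, coefficient_of_variation(vals)) for gene, vals in expr.items()]
    scored = [(gene, cv) for gene, cv in scored if not math.isnan(cv)]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Hypothetical toy data: expression of four genes in four single cells.
toy = {
    "geneX": [10.0, 10.0, 10.0, 10.0],  # constant expression -> CV = 0
    "geneY": [0.0, 2.0, 0.0, 2.0],      # sporadic expression -> high CV
    "geneZ": [5.0, 6.0, 5.0, 6.0],      # mild variation -> low CV
    "geneOff": [0.0, 0.0, 0.0, 0.0],    # never detected -> CV undefined, dropped
}
ranked = rank_by_cv(toy)
for gene, cv in ranked:
    print(f"{gene}\t{cv:.3f}")
```

One design caveat: for count data, CV naturally decreases as the mean increases (for pure Poisson noise, CV = 1/sqrt(mean)), so comparisons between genes with very different expression levels are usually made within bins of similar mean expression.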
Unsupervised clustering of ChIP-seq and RNA-seq profiles resulted in 31 distinct profiles, which were subsequently narrowed down to nine major profile groups across cell types. TF enrichment at individual clusters, based on ChIP-seq data, did not fully agree with the classification into major profile groups. Taken together, these results suggest that three major epigenetic profiles (active, bivalent and latent) are conserved across the species and cell types in our study. These states appear largely unaffected by their respective expression profiles and therefore capture only a fraction of the transcriptional information; adding other chromatin marks could enrich them. The H3K27me3-only state has low CpG density and shows stronger signatures in differentiated cell types. Transcriptional control is tighter at active than at bivalent promoters, and the different occupancy levels of PcG subunits and RNAPII may be reflected in the expression variance of bivalent genes, a fraction of which are involved in developmental functions while others are more tissue-specific. In addition, the pausing patterns of RNAPII are strikingly similar across the progenitor cell types, suggesting that RNAPII pausing correlates with the developmental potential of a cell type. Finally, this analysis will serve as a resource for future studies of transcriptional regulation during development.
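The contrast in CpG content between bivalent or active promoters and H3K27me3-only promoters is commonly quantified with the observed/expected CpG ratio, (number of CpG × sequence length) / (number of C × number of G). The sketch below applies the classic Gardiner-Garden and Frommer thresholds (length of at least 200 bp, GC fraction above 0.5, observed/expected CpG above 0.6) to a whole sequence. The abstract does not state which CpG island definition or window was used, so these thresholds, the function names and the test sequences are assumptions for illustration.

```python
from typing import Tuple


def cpg_stats(seq: str) -> Tuple[float, float]:
    """Return (GC fraction, observed/expected CpG ratio) for a DNA sequence."""
    s = seq.upper()
    n = len(s)
    if n == 0:
        return 0.0, 0.0
    c, g = s.count("C"), s.count("G")
    cpg = s.count("CG")  # "CG" cannot overlap itself, so str.count is exact here
    gc_fraction = (c + g) / n
    obs_exp = (cpg * n) / (c * g) if c and g else 0.0
    return gc_fraction, obs_exp


def looks_like_cpg_island(seq: str, min_len: int = 200,
                          min_gc: float = 0.5, min_obs_exp: float = 0.6) -> bool:
    """Assumed Gardiner-Garden & Frommer-style thresholds, applied to the whole sequence
    (no sliding window)."""
    gc, oe = cpg_stats(seq)
    return len(seq) >= min_len and gc > min_gc and oe > min_obs_exp


# Hypothetical test sequences.
cpg_rich = "CG" * 150  # 300 bp, GC-rich and CpG-rich
cpg_poor = "GCA" * 80  # 240 bp, GC-rich (~67%) but contains no CpG dinucleotide
print(looks_like_cpg_island(cpg_rich), looks_like_cpg_island(cpg_poor))  # prints: True False
```

The second sequence shows why GC content alone is not enough: a promoter can be GC-rich yet CpG-depleted, which is the kind of low CpG density described for the H3K27me3-only state.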