Analysis, interpretation, and visualisation of DamID-seq experiments
DNA adenine methyltransferase identification with sequencing (DamID-seq) is a technique for measuring protein-DNA interactions across the genome. Unlike chromatin immunoprecipitation with sequencing (ChIP-seq), it does not require validated antibodies, precipitation steps, or chemical crosslinking, and it can be used with small numbers of cells. The technique was first developed in Drosophila nearly two decades ago, but because of technical limitations only a handful of experiments using mammalian cells have been published. Optimising mammalian DamID-seq in our lab highlighted the need to survey potential sources of bias, to develop accurate analysis methods, and to investigate how DamID-seq and ChIP-seq compare for detecting protein-DNA interactions. Here, I describe several variables that influence the accuracy of DamID-seq experiments, present the Daim software package (pronounced “Dime”) for the comprehensive analysis of DamID-seq data, and assess the sensitivity and specificity of DamID-seq compared with competing techniques. In particular, I show that differences in the experimental procedure (polymerase usage and restriction digest) and features of the sequencing data (fragment length and nucleotide content) generate systematic bias and technical variation. I also demonstrate that DamID-seq data can be repurposed to measure Dam-accessible DNA in the genome, with results comparable to those of other chromatin accessibility techniques (ATAC-seq, DNase-seq, and FAIRE-seq). Daim incorporates methods for preprocessing, normalisation, and identification of DNA binding and accessibility sites, together with several options for functional and sequence analysis of the results. I demonstrate its use with data for the transcription factors Oct4 and Sox2 in mouse embryonic stem cells, embryonic fibroblasts, and neural stem cells, across a range of cell numbers.
Finally, I show that DNA binding and accessibility sites vary substantially both between and within techniques. No clear cause for these differences could be identified, so any biological conclusions drawn from them require careful consideration. These results show that Daim can be used successfully for the analysis, interpretation, and visualisation of DamID-seq experiments, and that comprehensive results require treating different techniques as complementary rather than competing.