Multigene datasets for deep phylogeny
Though molecular phylogenetics has been very successful in reconstructing the evolutionary history of species, some phylogenies, particularly those involving ancient events, have proven difficult to resolve. One approach to improving the resolution of deep phylogenies is to increase the amount of data by including multiple genes assembled from public sequence databases. Using modern phylogenetic methods and abundant computing power, the vast amount of sequence data available in public databases can be brought to bear on difficult phylogenetic problems. In this thesis I outline the motivation for assembling large multigene datasets and lay out the obstacles associated with doing so. I discuss the various methods by which these obstacles can be overcome and describe a bioinformatics solution, TaxMan, that can be used to rapidly assemble very large datasets of aligned genes in a largely automated fashion. I also explain the design and features of TaxMan from a biological standpoint and present the results of benchmarking studies. I illustrate the use of TaxMan to assemble large multigene datasets for two groups of taxa: the subphylum Chelicerata and the superphylum Lophotrochozoa. Chelicerata is a diverse group of arthropods with an uncertain phylogeny. When a set of mitochondrial genes is used to analyse the relationships between the chelicerate orders, the conclusions are highly dependent upon the evolutionary model used and are affected by the presence of systematic compositional bias in mitochondrial genomes. Lophotrochozoa is a recently proposed group of protostome phyla. A number of distinct phylogenetic hypotheses concerning the relationships between lophotrochozoan phyla have been proposed. I compare the phylogenetic conclusions given by analysis of nuclear and mitochondrial protein-coding and rRNA genes to evaluate support for some of these hypotheses.