Dynamically and partially reconfigurable hardware architectures for high performance microarray bioinformatics data analysis
Hussain, Hanaa Mohammad
The field of Bioinformatics and Computational Biology (BCB) is a multidisciplinary field that has emerged due to the computational demands of current state-of-the-art biotechnology. BCB deals with the storage, organization, retrieval, and analysis of biological datasets, which have grown in size and complexity in recent years especially after the completion of the human genome project. The advent of Microarray technology in the 1990s has resulted in the new concept of high throughput experiment, which is a biotechnology that measures the gene expression profiles of thousands of genes simultaneously. As such, Microarray requires high computational power to extract the biological relevance from its high dimensional data. Current general purpose processors (GPPs) has been unable to keep-up with the increasing computational demands of Microarrays and reached a limit in terms of clock speed. Consequently, Field Programmable Gate Arrays (FPGAs) have been proposed as a low power viable solution to overcome the computational limitations of GPPs and other methods. The research presented in this thesis harnesses current state-of-the-art FPGAs and tools to accelerate some of the most widely used data mining methods used for the analysis of Microarray data in an effort to investigate the viability of the technology as an efficient, low power, and economic solution for the analysis of Microarray data. Three widely used methods have been selected for the FPGA implementations: one is the un-supervised Kmeans clustering algorithm, while the other two are supervised classification methods, namely, the K-Nearest Neighbour (K-NN) and Support Vector Machines (SVM). These methods are thought to benefit from parallel implementation. This thesis presents detailed designs and implementations of these three BCB applications on FPGA captured in Verilog HDL, whose performance are compared with equivalent implementations running on GPPs. In addition to acceleration, the benefits of current dynamic partial reconfiguration (DPR) capability of modern Xilinx’ FPGAs are investigated with reference to the aforementioned data mining methods. Implementing K-means clustering on FPGA using non-DPR design flow has outperformed equivalent implementations in GPP and GPU in terms of speed-up by two orders and one order of magnitude, respectively; while being eight times more power efficient than GPP and four times more than a GPU implementation. As for the energy efficiency, the FPGA implementation was 615 times more energy efficient than GPPs, and 31 times more than GPUs. Over and above, the FPGA implementation outperformed the GPP and GPU implementations in terms of speed-up as the dimensionality of the Microarray data increases. Additionally, the DPR implementations of the K-means clustering have shown speed-up in partial reconfiguration time of ~5x and 17x over full chip reconfiguration for single-core and eight-core implementations, respectively. Two architectures of the K-NN classifier have been implemented on FPGA, namely, A1 and A2. The K-NN implementation based on A1 architecture achieved a speed-up of ~76x over an equivalent GPP implementation whereas the A2 architecture achieved ~68x speedup. Furthermore, the FPGA implementation outperformed the equivalent GPP implementation when the dimensionality of data was increased. In addition, The DPR implementations of the K-NN classifier have achieved speed-ups in reconfiguration time between ~4x to 10x over full chip reconfiguration when reconfiguring portion of the classifier or the complete classifier. Similar to K-NN, two architectures of the SVM classifier were implemented on FPGA whereby the former outperformed an equivalent GPP implementation by ~61x and the latter by ~49x. As for the DPR implementation of the SVM classifier, it has shown a speed-up of ~8x in reconfiguration time when reconfiguring the complete core or when exchanging it with a K-NN core forming a multi-classifier. The aforementioned implementations clearly show FPGAs to be an efficacious, efficient and economic solution for bioinformatics Microarrays data analysis.
The following license files are associated with this item: