High efficiency coarse-grained customised dynamically reconfigurable architecture for digital image processing and compression technologies
Digital image processing and compression technologies have significant market potential, especially the JPEG2000 standard which offers outstanding codestream flexibility and high compression ratio. Strong demand for high performance digital image processing and compression system solutions is forcing designers to seek proper architectures that offer competitive advantages in terms of all performance metrics, such as speed and power. Traditional architectures such as ASIC, FPGA and DSPs have limitations in either low flexibility or high power consumption. On the other hand, through the provision of a degree of flexibility similar to that of a DSP and performance and power consumption advantages approaching that of an ASIC, coarse-grained dynamically reconfigurable architectures are proving to be strong candidates for future high performance digital image processing and compression systems. This thesis investigates dynamically reconfigurable architectures and especially the newly emerging RICA paradigm. Case studies such as Reed- Solomon decoder and WiMAX OFDM timing synchronisation engine are implemented in order to explore the potential of RICA-based architectures and the possible optimisation approaches such as eliminating conditional branches, reducing memory accesses and constructing kernels. Based on investigations in this thesis, a novel customised dynamically reconfigurable architecture targeting digital image processing and compression applications is devised, which can be tailored to adopt different applications. A demosaicing engine based on the Freeman algorithm is designed and implemented on the proposed architecture as the pre-processing module in a digital imaging system. An efficient data buffer rotating scheme is designed with the aim of reducing memory accesses. Meanwhile an investigation targeting mapping the demosaicing engine onto a dual-core RICA platform is performed. After optimisation, the performance of the proposed engine is carefully evaluated and compared in aspects of throughput and consumed computational resources. When targeting the JPEG2000 standard, the core tasks such as 2-D Discrete Wavelet Transform (DWT) and Embedded Block Coding with Optimal Truncation (EBCOT) are implemented and optimised on the proposed architecture. A novel 2-D DWT architecture based on vector operations associated with RICA paradigm is developed, and the complete DWT application is highly optimised for both throughput and area. For the EBCOT implementation, a novel Partial Parallel Architecture (PPA) for the most computationally intensive module in EBCOT, termed Context Modeling (CM), is devised. Based on the algorithm evaluation, an ARM core is integrated into the proposed architecture for performance enhancement. A Ping-Pong memory switching mode with carefully designed communication scheme between RICA based architecture and ARM is proposed. Simulation results demonstrate that the proposed architecture for JPEG2000 offers significant advantage in throughput.